Libratus, the artificial intelligence that defeated four top professional poker players earlier this year, uses a three-pronged approach to master a game with more decision points than atoms in the universe, scientists say. In a study published in the journal Science, researchers from Carnegie Mellon University in the US detailed how their AI achieved superhuman performance by breaking the game into computationally manageable parts and, based on its opponents' gameplay, fixing potential weaknesses in its strategy during the competition.
AI programs have defeated top humans in checkers, chess and Go - all challenging games, but ones in which both players know the exact state of the game at all times. Poker players, by contrast, contend with hidden information - what cards their opponents hold and whether an opponent is bluffing. In a 20-day competition involving 120,000 hands at Rivers Casino in Pittsburgh in January, Libratus became the first AI to defeat top human players at heads-up no-limit Texas Hold'em poker - the primary benchmark and long-standing challenge problem for imperfect-information game-solving by AIs. Libratus beat each of the players individually in the two-player game and collectively amassed more than USD 1.8 million in chips.
"The techniques in Libratus do not use expert domain knowledge or human data and are not specific to poker. Thus they apply to a host of imperfect-information games," researchers said. Such hidden information is ubiquitous in real-world strategic interactions, including business negotiation, cybersecurity, finance, strategic pricing and military applications. Libratus includes three main modules, the first of which computes an abstraction of the game that is smaller and easier to solve than the full game, which has about 10 to the power of 161 decision points. It then creates its own detailed strategy for the early rounds of Texas Hold'em and a coarse strategy for the later rounds.
This strategy is called the blueprint strategy. In the final rounds of the game, a second module constructs a new, finer-grained abstraction based on the state of play. It also computes a strategy for this subgame in real time, using the blueprint strategy as guidance to balance strategies across different subgames - something that needs to be done to achieve safe subgame solving. The third module is designed to improve the blueprint strategy as the competition proceeds. Typically, AIs use machine learning to find mistakes in the opponent's strategy and exploit them. However, that also opens the AI to exploitation if the opponent shifts strategy, said Tuomas Sandholm, a Carnegie Mellon computer science professor and co-author of the study. Instead, Libratus' self-improver module analyses opponents' bet sizes to detect potential holes in Libratus' blueprint strategy.
Libratus then adds these missing decision branches, computes strategies for them, and adds them to the blueprint. In addition to beating the human pros, Libratus was evaluated against the best prior poker AIs. "The techniques that we developed are largely domain independent and can thus be applied to other strategic imperfect-information interactions, including non-recreational applications," researchers said. "Due to the ubiquity of hidden information in real-world strategic interactions, we believe the paradigm introduced in Libratus will be critical to the future growth and widespread application of AI," they said.
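For readers curious how the self-improver's search for "holes" might look in practice, the idea can be illustrated with a toy Python sketch. It is not the researchers' code: the bet sizes, the function names and the scoring rule (how often a bet size was seen, multiplied by how far it lies from the nearest size the blueprint already covers) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical bet sizes (as fractions of the pot) the blueprint already covers.
BLUEPRINT_BET_SIZES = [0.5, 1.0, 2.0]


def nearest_gap(bet, known_sizes):
    """Distance from an observed bet to the closest size in the abstraction."""
    return min(abs(bet - size) for size in known_sizes)


def pick_branches_to_add(observed_bets, known_sizes, top_k=2, bucket=0.25):
    """Rank observed bet sizes by frequency times distance from the abstraction.

    The scoring rule is an illustrative assumption, not Libratus's actual method.
    """
    # Group near-identical bets into buckets so they are counted together.
    counts = Counter(round(bet / bucket) * bucket for bet in observed_bets)
    scores = {
        size: count * nearest_gap(size, known_sizes)
        for size, count in counts.items()
    }
    # Sizes already in the abstraction score zero and are dropped.
    candidates = [size for size, score in scores.items() if score > 0]
    candidates.sort(key=lambda size: scores[size], reverse=True)
    return candidates[:top_k]


# Made-up opponent bets from one day of play, as fractions of the pot.
observed = [0.75, 0.75, 0.75, 1.5, 1.5, 0.5, 1.0, 4.0]
branches = pick_branches_to_add(observed, BLUEPRINT_BET_SIZES)
print(branches)
```

In this made-up example the sketch would flag the large overbet and the 1.5-pot bet as the branches most worth solving next, since neither sits close to a size the blueprint already handles.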