Libratus Poker Ai

Machines have triumphed again. Libratus, a powerful computer program, has crushed its human opponents at a heads-up no-limit Texas hold’em poker tournament held at Rivers Casino in Pittsburgh, Pennsylvania, winning $1,776,250 over 120,000 hands.

It’s a landmark achievement in AI game playing, said Tuomas Sandholm, co-creator of Libratus and a machine-learning professor at Carnegie Mellon University (CMU).

Libratus AI Defeated Top Pros in 20 Days of Poker Play. Byron Spice, Sunday, December 17, 2017. In a paper published today in Science, Computer Science Professor Tuomas Sandholm and Ph.D. student Noam Brown detail how their poker AI, Libratus, achieved superhuman performance.

“Heads-up no-limit Texas hold’em is – in a way – the last frontier standing within the foreseeable future. Of course, new things can come later. But of all of the games where AI research has been significantly conducted – by which I mean multiple decades of research – all the other games like Othello, checkers, chess, Go, limit Texas hold’em, Jeopardy! ... and so forth are such that the best AI has surpassed the best humans.

“But heads up no-limit Texas hold’em remained elusive in that never before has it been possible to beat the absolute top no limit Texas hold’em professionals. And in this event, this actually happened. So this is a landmark really in AI game playing.”

It was trebles for Sandholm and his PhD student, Noam Brown, on Twitter. Andrew Ng, a prominent AI researcher at Baidu and Stanford University, said the achievement was comparable to IBM’s Deep Blue, which beat Garry Kasparov at chess, and DeepMind’s AlphaGo, which beat Lee Sedol at Go.

CMU just made history: AI beats top humans at Texas Hold'em poker. A stunning accomplishment, comparable to Deep Blue & AlphaGo!

— Andrew Ng (@AndrewYNg) January 31, 2017

Over 20 days, four human poker players stared at multiple computer screens for ten hours a day with mounting frustration as they were repeatedly thrashed by their superior opponent, Libratus.

It was “demoralizing” to wake up and lose every day, said Jason Les, a professional poker player, who finished fourth in the competition.

“I’m just so impressed with the quality of poker Libratus plays. We make a living trying to find vulnerabilities and strategies – that’s what we do every day when we play heads up no-limit. So if the public had any doubt about the quality of this technology, I can tell you from our experience, we tried everything we could but it was too strong.”

It was hardly a close match, with Libratus swooping in to take the lead from the very first day. In the evenings, the professional players Jason Les, Dong Kim, Daniel McAulay and Jimmy Chou would get together to compare notes. They analyzed the game and tried to come up with strategies to defeat the enemy, and it did work for a while.

In the early days of the competition, there were signs of hope as Les beat the machine to end on a positive $49,072, while his teammates were still in the negative. The poker pros fought mercilessly and bounced back to narrow Libratus’ win. They even seized their first six-figure win.

  1. In a stunning victory completed tonight, the Libratus Poker AI, created by Noam Brown et al. at Carnegie Mellon University, has beaten four human professional players at No-Limit Hold'em. For the first time in history, the poker-playing world is facing a future of machines taking over the game of No-Limit Hold'em.
  2. Libratus is an artificial intelligence computer program designed to play poker, specifically heads up no-limit Texas hold 'em. Libratus' creators intend for it to be generalisable to other, non-Poker-specific applications. It was developed at Carnegie Mellon University, Pittsburgh.
  3. Libratus, an artificial intelligence developed by Carnegie Mellon University, made history by defeating four of the world’s best professional poker players in a marathon 20-day poker competition, called “Brains Vs. Artificial Intelligence: Upping the Ante” at Rivers Casino in Pittsburgh.
  4. Carnegie Mellon University’s Libratus, an artificial intelligence computer program designed to play poker, started the year by proving it could beat four human poker pros. Now, a pair of university researchers behind the program are ending the year by telling the world exactly how the AI program managed to do it.

Second session of the day I go +40k and Dong goes +30k for the human team to put up its first six figure day of +110k #BrainsVsAI

— Jason Les (@heyitscheet) January 17, 2017

The taste of victory was short-lived, as Libratus came back stronger and scooped a huge win eight days into the competition. As it continued to play poker, the machine learned to adjust its strategies, improving over time.

The constant upgrade in difficulty is what made it challenging for the players. It’s “extremely tough as the AI keeps getting better,” Kim told viewers while answering questions over a live stream on Twitch.

Libratus upped its game, crushing the chances of victory for team mortal, and charged to the finish line to win a whopping $1,776,250 – equivalent to 14.7 big-blinds per hundred or 147 milli-big-blinds per hand.
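The two win-rate units quoted above are the same number on different scales, which a few lines of arithmetic make explicit. The sketch below is only illustrative: the function names are made up for this example, and the $100 big blind is an assumption, since the article does not state the blind size.

```python
def mbb_per_hand(total_won, hands, big_blind):
    """Win rate in milli-big-blinds per hand."""
    return total_won / hands / big_blind * 1000


def bb_per_100(mbb_hand):
    """Convert milli-big-blinds per hand to big blinds per 100 hands."""
    return mbb_hand / 1000 * 100


# 147 mbb/hand and 14.7 bb/100 describe the same win rate.
print(bb_per_100(147))

# ASSUMPTION: a $100 big blind, which is not given in the article.
print(round(mbb_per_hand(1_776_250, 120_000, big_blind=100), 1))
```

Under that assumed blind size, the quoted total and hand count work out to roughly 148 mbb/hand, in line with the figure reported above.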

The large score is of “statistical significance” and a convincing win for the computer, the researchers say. It wasn’t down to a simple run of good cards, as the game was set up in a way to minimize the effect of luck. The four players were split into two teams of two people. One team plays in the open while the other team is locked in a room with no phones or outside communication. The locked-away team is dealt the same cards as the open team but with places switched: the open team humans get the locked-away AI's hole cards, the locked-away humans get the open AI's hole cards, and so on. This is supposed to cancel out the effect of either side simply running good.
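A toy simulation shows why mirroring the deals reduces the role of luck. This is a deliberately simplified sketch, not the tournament's actual scoring: it assumes card luck is a purely additive term that flips sign when the seats are swapped, and the function name, the per-hand skill edge and the luck scale are all hypothetical.

```python
import random


def simulate(num_hands, skill_edge, luck_scale, seed=0):
    """Return (unpaired_total, paired_total) human winnings in big blinds."""
    rng = random.Random(seed)
    unpaired = 0.0
    paired = 0.0
    for _ in range(num_hands):
        # How much this particular deal favours the open-room human's seat.
        card_luck = rng.gauss(0.0, luck_scale)
        # Open room: the human holds the favoured (or unfavoured) cards.
        open_room = skill_edge + card_luck
        # Locked room: same cards, seats swapped, so the luck term flips sign.
        locked_room = skill_edge - card_luck
        unpaired += open_room
        paired += (open_room + locked_room) / 2
    return unpaired, paired


unpaired, paired = simulate(num_hands=1000, skill_edge=-0.15, luck_scale=10.0)
print(f"single room: {unpaired:.1f} bb, mirrored average: {paired:.1f} bb")
```

In this idealised model the mirrored average recovers the skill edge exactly. In real play the cancellation is only partial, because the two humans do not play the same cards the same way.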

It's not all bad news for the humans, as they take away a proportion of the $200,000 prize depending on how well they played relative to each other.

All you need is a supercomputer and algorithms

The exact details on how Libratus works will remain unclear until the researchers analyze the results and publish their work in a paper. However, Sandholm and Brown have provided snippets of information. It’s not the first time CMU has built a poker bot to challenge humans. The previous 'Brains vs AI' poker match in 2015 saw Claudico, Libratus’s predecessor, lose to Dong Kim, Jason Les, Bjorn Li, and Doug Polk – the number one poker player at the time.

Poker is a difficult game for machines to master as it’s an imperfect-information game. Players do not have equal knowledge about the game state due to hidden cards. Many researchers, including the team who recently published a paper on their own poker computer program DeepStack, use a technique called counterfactual regret minimization (CFR) to solve imperfect-information games.

Counterfactual values that represent possible outcomes are picked, and the computer chooses the best possible move based on a decision tree and knowledge of previous strategies learned through training.
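The core update inside CFR is regret matching: play each action in proportion to how much you regret not having played it so far. The sketch below applies it to rock-paper-scissors in self-play. It is a minimal illustration of the building block, not full CFR and not Libratus's or DeepStack's code; the game, the function names and the biased starting regret are choices made for this example.

```python
# Row player's payoff; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def regret_matching(cum_regret):
    """Mix over actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(cum_regret)] * len(cum_regret)


def self_play(iterations=50000):
    """Return each player's average strategy after regret-matching self-play."""
    # Player 0 starts biased towards rock so the run does not sit at uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        row, col = strategies
        # Expected payoff of each pure action against the opponent's current mix.
        values = [
            [sum(PAYOFF[a][b] * col[b] for b in range(3)) for a in range(3)],
            [-sum(PAYOFF[a][b] * row[a] for a in range(3)) for b in range(3)],
        ]
        for p in range(2):
            expected = sum(strategies[p][a] * values[p][a] for a in range(3))
            for a in range(3):
                # Regret: how much better action a would have done than the mix.
                regrets[p][a] += values[p][a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


for player, avg in enumerate(self_play()):
    print(player, [round(x, 3) for x in avg])
```

The strategies at individual iterations keep cycling; it is the average strategy that approaches the equilibrium mix of one third each. CFR applies the same update at every decision point of a game tree, weighted by the probability of reaching it.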

An important factor lies in the improved “end-game solving,” according to a paper [PDF] by CMU's Sandholm and Brown. “Unlike perfect-information games, imperfect-information games cannot be decomposed into subgames that are solved independently. Thus more computationally intensive equilibrium-finding techniques are used, and abstraction – in which a smaller version of the game is generated and solved – is essential. Endgame solving is the process of computing a (presumably) better strategy for just an endgame than what can be computationally afforded for the full game,” the paper's abstract reads.
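To make "abstraction" concrete, here is one very simple form of it: collapsing the continuous range of no-limit bet sizes onto a handful of sizes that a solver can afford to reason about. The bet sizes and the nearest-size rule are hypothetical choices for illustration, and the paper's own abstraction and translation methods are more sophisticated than this.

```python
# Hypothetical abstraction: only three bet sizes, as fractions of the pot.
ABSTRACT_BETS = [0.5, 1.0, 2.0]


def nearest_abstract_bet(bet_fraction, sizes=ABSTRACT_BETS):
    """Map an observed bet (fraction of pot) to the closest abstract size."""
    return min(sizes, key=lambda size: abs(size - bet_fraction))


for observed in (0.6, 1.4, 5.0):
    print(observed, "->", nearest_abstract_bet(observed))
```

Rounding like this loses information, which is one reason a finer strategy computed for just the endgame can improve on the full-game solution.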

Libratus's approach to tackling this is similar to that of DeepStack, another computer program that also bests human players at no-limit Texas hold’em.

Both programs – Libratus and DeepStack – try to home in on the best possible winning strategy by attempting to solve for the Nash equilibrium – a solution in game theory in which no player can gain by unilaterally changing his or her strategy while the opponents keep theirs. Every player has picked the best line of attack given their rivals' strategies, basically.

The software can't compute an exact Nash equilibrium, though, due to the complexity of poker and the way in which the gameplay is abstracted into mathematical form. Getting close to the equilibrium is key to winning, and it’s an area where Claudico was weak whereas Libratus was rather good.
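"Close to the equilibrium" can be measured as exploitability: how much a perfectly informed opponent could win against a given strategy. The sketch below computes it for rock-paper-scissors, whose equilibrium is to play each move a third of the time. The game and the function name are illustrative choices, not part of either poker program.

```python
# Row player's payoff; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def exploitability(row_strategy):
    """Best payoff an opponent can get by countering the row player's mix."""
    counter_values = [
        -sum(row_strategy[a] * PAYOFF[a][b] for a in range(3))
        for b in range(3)
    ]
    # The game's equilibrium value is 0, so any positive number is a leak.
    return max(counter_values)


print(exploitability([1 / 3, 1 / 3, 1 / 3]))  # equilibrium mix
print(exploitability([0.5, 0.25, 0.25]))      # over-plays rock
```

The equilibrium mix cannot be exploited at all, while the rock-heavy mix gives up 0.25 per round to an opponent who always plays paper.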

It’s difficult to compare the abilities of Libratus with DeepStack without them playing against each other. However, Libratus definitely has the edge in computational power, powered by the Bridges system at the Pittsburgh Supercomputing Center, which can achieve 1.35 PFLOPS – or more than a quadrillion floating-point math calculations per second.

Libratus gobbled up approximately 19 million core hours of computing, equivalent to 3,300 laptops, and generated over 2,600TB of data throughout the tournament. DeepStack was more modest: it's essentially a neural network with seven layers and uses deep learning algorithms – whereas Libratus used reinforcement learning to approximate a Nash equilibrium.

Could poker-playing AI lead to General AI?

“Since the earliest days of AI research, beating top human players has been a powerful measure of progress in the field,” Sandholm said earlier.

“That was achieved with chess in 1997, with Jeopardy! in 2009 and with the board game Go just last year. Poker poses a far more difficult challenge than these games, as it requires a machine to make extremely complicated decisions based on incomplete information while contending with bluffs, slow play and other ploys.”

The victory for Libratus has ignited fear over the state of online poker. Many viewers watching the live stream on Jason Les’ Twitch channel flooded the chatroom with 'RIP online poker' messages.

But the fear of losing to bots online or possible cheating with poker bots is exaggerated. Poker is normally played in a multi-player environment, and having to consider more than one player makes solving for the Nash equilibrium much more complex. Basically, today's robo-players triumph in heads-up battles, not at a table of five, six, or more players vying to pick up the pot.

Building a poker bot as good as Libratus is also a major task, as it requires a healthy-sized supercomputer. Libratus probably won’t be playing anyone online anytime soon, as it costs too much to run.

Sandholm doesn’t see Libratus as a threat. Instead it adds “whole new depths to the game” and has made it “more interesting,” rather than killing it, he said during a live interview on Twitch.

The algorithms aren’t game-dependent. They can be applied in imperfect information environments to find the best strategies and can be adapted for negotiation and bargaining – applicable for cyber security, finance and the military.

It might even pave the way to general AI, says Brown. “If the field of AI is to achieve its goal of general AI, it needs to be able to address this problem of uncertainty which comes up a lot in real life. We see these algorithms are being used in this bot – it’s really advancing the field for those problems. How do you deal with uncertainty in real life?”

Libratus, the artificial intelligence (AI) engine designed by Professor Tuomas Sandholm at Carnegie Mellon University (CMU) and his graduate student Noam Brown, has made an impression on Jason Les, one of the world’s top poker players. Poker News, the poker industry’s online news magazine, recently interviewed Les. A couple of his answers were telling: asked which is the better name for his firstborn child and which is the more annoying opponent, Claudico or Libratus, he answered Libratus both times.[1],[2]

In January, Les and three more of the world’s top poker players (Dong Kim, Daniel McAulay, and Jimmy Chou) were challenged to 20 days of No-limit Heads-up Texas Hold ‘em poker at the Brains versus Artificial Intelligence tournament in Pittsburgh’s Rivers Casino. Libratus beat all four opponents, taking away more than $1.7 million in chips.

Les also considered being selected to play against Libratus his proudest poker accomplishment to date.

During the tournament, Dong Kim said about Libratus, “I didn’t realize how good it was until today. I felt like I was playing against someone who was cheating, like it could see my cards. I’m not accusing it of cheating. It was just that good.”[3] Kim also noted during the tournament that “he and his fellow humans have no real chance of winning.”[4]

Texas Hold ‘em, The Holy Grail

“In the area of game theory,” said Sandholm, “in which I’ve been working since 1989, No-Limit Heads-Up Texas Hold ‘em is the holy grail of imperfect-information games.” According to Sandholm, this version of poker was really the last game that had not been cracked by AI, in the sense of becoming better than humans. AI has already beaten top experts of other games, such as chess and Go, but Texas Hold ‘em is different. In those games, the game play is open. All players know where the pieces are at any time and the strategic possibilities presented, and thus can directly solve endgame strategies. In Texas Hold ‘em, players have a limited amount of information available about the opponents’ cards because some cards are played on the table and some are held privately in each player’s hand. The private cards represent hidden information that each player uses to devise their strategy. Thus, it is reasonable to assume that each player’s strategy is rational and designed to win the greatest amount of chips.

“Besides being a limited information play, the state space is huge. There are 10^160 different situations the player can face. You can’t solve all those possibilities, and you can’t solve a sub-game of the game tree with information from a sub-game only. How you play a sub-game actually depends on how you play in totally different parts of the game. It takes totally different algorithms from a game like chess, checkers, or Go.”

Learning and Reasoning in a Limited Information Environment

Libratus is an AI system designed to learn in a limited information environment. It consists of three modules:

  1. A module that uses game-theoretic reasoning, just with the rules of the game as input, ahead of the actual game, to compute a blueprint strategy for playing the game. According to Sandholm, “It uses Nash equilibrium approximation, but with a new version of the Monte Carlo counterfactual regret minimization (MCCFR) algorithm that makes the MCCFR faster, and it also mitigates the issues involved with imperfect-recall abstraction.”
  2. A subgame solver that recalculates strategy on the fly so the software can refine its blueprint strategy with each move.
  3. A post-play analyzer that reviews the opponent’s plays that exploited holes in Libratus’s strategies. “Then, Libratus re-computes a more refined strategy for those parts of the state space, essentially finding and fixing the holes in its strategy to play even closer to Nash equilibrium in those spots,” stated Sandholm.

From Claudico to Libratus

Les and Kim also played against Sandholm’s previous AI poker player, Claudico, in 2015. And, while the two programs were authored by the same people, Libratus is a different application. “Between Claudico and Libratus, we developed a better equilibrium-finding algorithm for module one, so we could do much finer-grained abstractions. The endgame solver of module two is a big improvement. In Claudico, we found during the 2015 tournament, that the application ran well with or without the endgame solver we had built. Libratus’ endgame solver improves play by 80 to 100 milli-big blinds per hand (mbb/hand),[5] increases safety, or non-exploitability, and it is a nested endgame solver, whereas Claudico was not. In module three, Claudico essentially continued overnight what it did in module one, while in Libratus, it actually fixes exploitable holes in its own strategy. That’s an entirely new aspect,” explained Sandholm.

Sandholm and his students have been working on poker optimization for 13 years. He typically had two PhD students working with him, “but the actual code for Libratus was written from scratch,” stated Sandholm. He and his student, Noam Brown, spent a little over a year writing Libratus. “In 2015, we couldn’t beat the best players with Claudico, and many wondered how much we could accomplish in just a little over a year of work on Libratus,” added Sandholm. “In fact, the international betting sites were putting us at from four to five to one odds against.” It turned out not to be a safe bet.

Bridges, the Supercomputing Poker Machine

For the tournament, Libratus ran on the newest supercomputer, called Bridges, at the Pittsburgh Supercomputing Center (PSC). The system is a unique configuration built for traditional types of applications, like simulating astrophysical phenomena, and for machine learning workloads, like Libratus. Bridges was built by Hewlett Packard Enterprise (HPE), using Intel® Xeon® processors and Intel® Omni-Path Architecture (Intel® OPA), a new networking technology from Intel that is now being used in the world’s fastest supercomputers. Bridges comprises 846 nodes of computing power, with both “regular memory” nodes and “high-memory” nodes, the latter often used in

WIRED. Retrieved 2017-01-24.

[4] Ibid.

[5] Mbb/g is a normalizing measure of how well a player plays.

# # #

This article was produced as part of Intel’s High Performance Computing (HPC) editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC community through advanced technology. The publisher of the content has final editing rights and determines what articles are published.