Computer scientists from the University of Alberta in Canada have programmed an AI poker player that can never lose across a series of hands of heads-up limit Texas hold'em.
Named Cepheus, the program uses a strategy for a two-player game of heads-up limit Texas hold'em poker that's so brilliant, statistical analysis says that even if a person spends their entire lifetime playing poker against this program, the program will never lose. The AI will only ever come out on top, or break even. It will never make a mistake, even not knowing what cards its opponent is holding.
And it's not about winning every hand - it will get dealt dud cards as often as the next guy - but the AI has figured out how to turn even the worst situations around to come out on top. "[It] will lose if it's dealt an inferior hand, but it will minimise its losses as best as is mathematically possible and will slowly but surely take your money by making the "perfect" decision in any given scenario," says Jason Koebler at Motherboard. "Heads-up limit hold'em, it can be said, has been 'solved'."
The difference between 'limit' and 'no limit' poker comes down to money - you're either limited to set amounts you can bet, or you can bet as much as you like. The latter would be impossible for a computer program to solve, because predicting random and limitless amounts of money is a feat in and of itself. But that doesn't make what this program can do any less impressive.
Compare what it's doing to other games that have been 'solved' by AI players. Even our feeble human brains can figure out how to be unbeatable in naughts and crosses. Games like chess and checkers take the complexity way up a notch because of all the different possibilities at every turn, but all the information is still there on the board. The opponent can't hide anything, other than their strategy, but that doesn't matter, because the AI already knows every possible play, and has already figured out the perfect strategy for countering each move before the game has even started.
But what about poker? The program would be cheating if it knew what was on the two hidden cards its opponent holds in every round. As Koebler notes, Cepheus must somehow know how to navigate the 3 x 10^14 possible decisions in a game of limit poker, in which, at any given moment, not all the information is known to it. The University of Alberta team refers to this kind of game, where not all information is known, as an 'imperfect-information' game.
"The solutions for imperfect-information games require computers to handle the additional complication of not knowing exactly what the game's state is, such as not knowing an opponent's hand," one of the team, Neil Burch, told Jeremy Hsu at IEEE Spectrum. "Such techniques require more computer memory and computing power."
How much memory, exactly? Roughly 262 terabytes. Yikes. And what does it do with all that memory? Cepheus runs an algorithm called CFR+, which was invented by the team as an enhancement of an existing algorithm known as the counterfactual regret minimisation (CFR).
Regret minimisation is, essentially, about learning from your mistakes. So, as Arielle Duhaime-Ross explains at The Verge, if Cepheus thinks about the possibility of raising a bet, and decides to play randomly instead and loses, it will retrace its steps, figure out how much it could have won if it had raised, and will store that amount as a 'regret value'.
This value is placed on every possible opportunity that this same decision is available to the computer, so it will avoid making the same mistake. This is very different from the way humans play - if we take a big loss, we focus on trying to win that back, rather than on how to use that information to perfect the rest of our game. Cepheus will continue to update itself with these regret values until it reaches what the team calls "perfect play".
"CFR+ still works like the old CFR algorithms by gradually developing better solutions through playing thousands and hundreds of thousands of hands of poker," Hsu writes at IEEE Spectrum. "But it can develop a very good solution much faster than any past CFR algorithm by being more efficient; basically the equivalent of taking fewer, bigger steps toward the best solution."
The team published their findings in the journal Science.
According to IEEE Spectrum, once the team had figured out the strategy, they managed to whittle the memory requirements down to less than 11 terabytes to store the counterfactual values, and 6 terabytes to compute the main strategy. It ended up taking 24 trillion hands of poker over 70 days, and 200 computers running the CFR+ algorithm with 32 GB of RAM and 24 central processing units to train Cepheus to 'solve' the game. "We could continue to train it, and it would continue to get better," one of the team, Michael Bowlin, told Arielle Duhaime-Ross at The Verge. "But we stopped at this point because we can't tell it apart from being perfect."
The team is now going to work on adjusting the algorithm to play heads-up no limit poker. They know, because of the variables, it will probably be impossible to create an 'unbeatable' AI player, but they hope to create a player that can beat the best human players in the world. That's the goal. And they're also thinking about how to use the technology to help governments and private companies create better security systems that can't be hacked.
Want to try your hand against Cepheus? Click here to play against it. I just hope you enjoy the feeling of losing.
Sources: Motherboard, The Verge, IEEE Spectrum