Pluribus, a newly minted algorithm designed by academics from Carnegie Mellon and Facebook, made headlines recently for having beaten the world’s top human professional poker players in six-player no-limit Texas hold’em poker.
Having started with one-on-one games like chess and the much more complex Go, algorithms have moved over the past five years into the arena of more complex, multi-player games. This presents unique challenges in the sheer scale of possibilities. Because such games require some anticipation of other agents, and some game theory, to solve, they also start to resemble real-life systems more closely, from military applications to complex multi-agent systems like the economy.
It is useful to pause on Pluribus as a short case study, and on its lessons for the financial profession about what is to come, probably sooner than we think.
Firstly, Pluribus took six copies of itself and trained purely on its own (self-play), in isolation from human players. It can dissect each move, exploring alternate ‘what if’ scenarios that cannot be replayed in an actual game, and building a ‘regret’ indicator for not having taken other actions that could have made it (probabilistically) more money.
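The ‘regret’ idea can be illustrated with a toy far simpler than poker. What follows is a minimal Python sketch of regret matching in self-play on rock-paper-scissors. It is not Pluribus’s actual training procedure (the authors describe a form of Monte Carlo counterfactual regret minimisation over a vastly larger game tree), and the function names, the starting regrets and the iteration count are all assumptions chosen for illustration. Each player asks, for every action, how much better it would have done than its current mix, accumulates that regret, and then plays each action in proportion to its positive regret.

```python
# Toy regret matching in self-play on rock-paper-scissors.
# Illustrative only: names, starting regrets and iteration count are assumptions.

# Payoff to a player: PAYOFF[my_action][opponent_action].
# Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = 3


def strategy_from_regret(regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def action_values(opponent_strategy):
    """Expected payoff of each pure action against the opponent's mix."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(N))
        for a in range(N)
    ]


def self_play(iterations, initial_regrets):
    """Run two regret-matching players against each other; return average strategies."""
    regrets = [list(r) for r in initial_regrets]
    strategy_sums = [[0.0] * N for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regret(r) for r in regrets]
        for player in range(2):
            values = action_values(strategies[1 - player])
            earned = sum(strategies[player][a] * values[a] for a in range(N))
            for a in range(N):
                # 'What if' step: how much better would action a have done?
                regrets[player][a] += values[a] - earned
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy):
    """Best payoff an opponent could earn against this strategy (0 is unexploitable)."""
    return max(action_values(strategy))


# Player 1 starts with a hypothetical bias towards rock so the dynamics are not trivial.
average = self_play(20000, [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
for player, strategy in enumerate(average):
    print(player, [round(p, 3) for p in strategy], round(exploitability(strategy), 3))
```

In this toy, the average strategy of each player drifts towards an even mix of the three actions, which is the unexploitable way to play rock-paper-scissors. No human play is involved at any point, which is the feature of self-play the paragraph above describes.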
Unlike humans, it does not play to the observed tendencies of its opponents. It does not look for bluffs, for tells or for nervous twitches, and it doesn’t take into account whether your voice sounds nervous when you move your chips into the centre of the table and theatrically declare ‘All in’. That is very much a human way of playing the game. Pluribus hasn’t learnt from humans, nor is it trying to replicate human behaviour in order to win. It is finding its own strategy in the game, and pursuing it relentlessly. In this sense, it is truly an ‘AI’ in concept rather than an intelligent automation; it is not bound by ‘human intuition’ and is therefore less transparent to us. For financial services, we may be significantly more concerned with interpretability, even for elite expert systems.
Secondly, Pluribus does not (solely) rely on mathematical game theory for its strategy. This is because in multi-player games with imperfect information, there is no unique ‘solution’ to something like poker; there is only a vast tree of probabilities. The approach instead, as with Go, is to let the data and training create optimality, rather than mathematical theory. Almost apologetically, the authors wave goodbye to theory, and let the algorithm find its own pattern that is ‘capable of consistently defeating elite human professionals’. As an analogy for finance, it does not start with an economic theory that it wants to confirm, as is the case for macro-economists, quants and financial pundits globally. It uses statistical methods, and if economic theories emerge, they are integrated so long as they are efficient.
A successful strategy here is well defined: win more money in repeated games. In finance, success may take many forms, from academic publication and suitability to short-horizon and long-horizon outcomes. The efficiency of a theory may likewise be defined in multiple ways, from academic publication to credibility. Economic theories are patterns observed by academics, mostly confirmed by data and statistics, and deeply embedded in our language and our discourse. These algorithms take some lessons from them in their framework for pattern recognition, but essentially fill in the blanks afterward.
Thirdly, the efficiency is staggering. Pluribus was trained in 8 days and required less than 512 GB of memory, at a cost equivalent to about $144 in cloud computing. In playing, Pluribus runs on two CPUs and uses less than 128 GB of memory; in other words, a single well-equipped machine rather than a supercomputer. By comparison, AlphaGo, tackling a similarly hard problem, used something like 280 GPUs, while Deep Blue in 1997 used 480 custom chips. As in the story of the personal computer, the multiplying of available processing power is empowering these projects. Consider the global supply chain, or the vast networks of agents trading on a stock exchange, which dwarf a small six-player poker game.
Finally, the authors published the win rate of the algorithm. In academic terms it is strong, but not completely convincing. While the media is quick to publicise that we have solved a problem, the data is a little less conclusive, although it has been published in ‘Science’, which is a peer-reviewed journal. It doesn’t feel like the complete thing at this point, but a kind of early design version that has been made public. Like quantitative models in finance, it has a ‘hit rate’ associated with winning, and although that rate is higher (arguably more so than for factor models, for example), it has not yet ‘conquered’ the game in its entirety.
This case study offers a few takeaways that we may consider for the financial services industry. Enormous computing power married with almost naïve practical applications of data science is conquering increasingly complex problems. These problems are starting to resemble those found in applied industries, and will move from ‘games’ to applications soon. This innovation is almost entirely theory-agnostic, and is likely to establish (and even encourage) non-intuitive and entirely self-taught solutions. ‘Solving’ the financial markets is likely to remain outside the reasonable bounds for these models, but when innovation comes in leaps and bounds, it is harder to forecast what is reasonable. In short, the academic foundations of finance and macro-economics are likely to face significant challenge.
The business of developing unique algorithms like Pluribus opens opportunities for algorithms to be seen as standalone products that can be traded or shared (in the case of open-sourcing), and that form the basis of a value proposition. Financial services industries may find themselves buying and selling these algorithms, and associated services, as this field gets increasingly competitive and as algorithms are pitted against algorithms in competitive services like the provision of credit and insurance. While current efforts are focused on consolidating (messy) data and providing some useful analytics to empower decision making, the next phase is likely to be the honing of models that start to use that data well, and perhaps better than humans.
As a disruptor, Pluribus makes the business model of online gambling much more difficult. Algorithms can disguise themselves as human players, or they can sit with human players as advisors, making them a necessary competitive advantage. These algorithms, which are already in practice in online gambling, for example, and in other systems of change with complex inputs and a definitive ‘win’ scenario, will continue to evolve into competitive markets and industries. Trading algorithms that place trades selectively, and some elements of high-frequency trading, are already in this space, specifically in the pursuit of short-term efficiency gains that often depend on forecasting other trades in the market.
Michael Kollo is a former general manager of quantitative solutions and risk at HESTA