Towards an AI for Rock, Paper, Scissors
Last Updated on May 20, 2020 by Editorial Team
Author(s): Benjamin Peterson
Rock Paper Scissors (RPS) is a decidedly non-trivial game with a fascinating strategy. Here I present a simple RPS AI with code and some high-level notes on the general parameters of the game and the nature of human-vs-human and AI-vs-AI RPS strategy.
An online RPS game
First of all, a basic implementation of an RPS opponent, with source code, is available at my site. Have a look! Now read on…
The game of Rock Paper Scissors
Rock Paper Scissors (hereafter RPS) is very much an under-studied game. Its minimalism makes it unique among well-known games, but its powerful underlying principles are extremely widely applicable and have fascinating implications. Before considering the RPS strategy, this article will briefly sum up the core principles and history of RPS.
Key principles: Non-randomness
It is absolutely crucial to understand that RPS has no random component. The outcome is entirely determined by the decisions of the players. This is why RPS is played to a fast rhythm:
- If RPS is played slowly, or with no time limit at all, the players have time to EITHER come up with a genuinely random value to base their play on, OR to overthink their own throw and the opponent’s throw until the result is so convoluted that it might as well be random. Both these eventualities have the same outcome — random play and an equal expectation of wins over time for either player.
- If RPS is played rapidly and to a rhythm, players do not have time either to find a source of random moves or to overthink. As a result, the outcome is determined by the player’s ability to quickly select a strategy, discard a losing strategy, stay calm, and analyze their opponent’s behavior.
For this reason, while the RPS play varies between regions, there’s always a rapid countdown or chant and quick turnaround time. In the ’90s, for example, when RPS was commonly played for money in some parts of Japan, a ‘best of three’ game could be over in four or five seconds.
Key principles: Sansukumi
Sansukumi (Japanese link) is a Japanese expression referring to three forces that hold one another in check. This concept originates in China but appears early in Japanese literature, associated both with games and with a variety of edifying philosophical concepts. For example, “The emperor commands the citizens, the citizens command the slaves, and the slaves threaten to rebel against the emperor.”
Sansukumi is often used as an important element in balancing computer games; the Infantry/Cavalry/Artillery sansukumi is common in war games, for example.
History and context
Gesture games are a large family of games that have been independently discovered at various points in history. East Asia is particularly rich in both extant and historical gesture games, of which RPS has been the most internationally successful example.
RPS is a specifically Japanese member of this family. The Japanese name for RPS is ‘janken’; Janken is descended from an earlier game called mushi-ken, in which the three throws represent frog, snake, and slug (or possibly centipede, depending on how we reconstruct still earlier variants).
Like mushi-ken, RPS was a minimal sansukumi game with only three options. Other sansukumi gesture games throughout Asia may have five or even four options, and Japan, in particular, is rich in variants that introduce a larger meta-game around the core sansukumi concept — such as yakyuuken (Japanese link) in which competing players are required to sing and dance between throws!
Gesture games not based on sansukumi are often based on odd/even outcomes — including odds and evens but also more complex variants. Whether these odd/even games are fully analogous to sansukumi games is an interesting question.
Although the history of gesture games in general, and Asian sansukumi games in particular, is interesting in itself, it seems fair to say that RPS manages to capture almost all the interesting concepts of the whole family of games in a very minimal set of rules. This, perhaps, explains its success.
Since the mid-20th century, Janken has spread across the world beyond Japan, taking on many names (Paper Scissors Stone, Roshambo…) and developing its own variants and rule sets. During the 1980s, Janken was taken quite seriously, and at this time there was both substantial analysis of strategy in China and Japan, and some large-scale tournaments. Unfortunately, the profile of the game has dropped over the last 20 years or so; there are still some large scale games such as this one((Japanese link), but there is no strong governing body either in Japan or globally. At the same time, interest in RPS as an AI challenge has risen…
RPS strategy is a complex topic that no individual fully understands. Generally, we can separate strategy into three areas:
- Core strategy. Core strategy revolves around creating a high-speed, almost automatic behavior in which a player, without thinking, adapts their throws to their opponent in such a way as to win more than half the time.
- Fringe strategy. This covers strategy relating to team play, timing, and psychology.
- The metagame. Events outside of the game itself, which nevertheless affect the outcome.
The core strategy is generally modeled by serious players, not in terms of the throws rock, paper, and scissors, but in terms of these options:
- As the winner of the last throw: Play the same throw as last time
- As the winner of the last throw: Play the throw that would beat the last throw
- As the winner of the last throw: Play the throw that would lose to the last throw (i.e., your opponent’s last throw)
- As the loser of the last throw: Copy the winning throw
- As the loser of the last throw: Play the throw that would beat the winning throw
- As the loser of the last throw: Play the throw that would lose to the winning throw (i.e., repeat your own last throw)
The reason for this is that the actual symbols rock, paper, and scissors don’t have as much resonance with people as concepts like ‘I won, so I’ll keep doing the same thing’ or ‘I lost, so I’ll do something different.’ The six options above are the basic atoms of RPS; after each throw, each player selects between three of those atoms.
From these atoms, the player assembles in real-time a strategy that they hope will beat their opponent. For example, a player might find themselves thinking first:
“I won, so I’ll throw the same hand again.”
And then, after another few rounds:
“Playing the same throw after winning is no longer working, I’ll shift to playing the loser’s throw after winning.”
And perhaps a few rounds later:
“Playing the loser’s throw after winning is giving me a 50% success rate, but if I switch to repeating a throw after winning, I should be able to bank one or two extra wins before my opponent adapts…”
There are some broad statistical observations that can help determine the initial strategy:
- Rock accounts for about 36% of throws, Paper for 34%, and scissors for 30% overall. These ratios seem to be true over a variety of times, places, and game types.
- Winners repeat their last throw far more often than losers do.
But only a good player can evolve a strategy in real-time and consistently win.
‘Fringe’ strategy is an important part of the game. In particular, fringe strategy relates to timing (the speed of the game and handling of draws) and to team play.
Timing is a key parameter of RPS. The faster the game of RPS, the more likely players are to play ‘default’ moves (throwing rock more, repeating winning throws more), and the less able they are to achieve pseudo-randomness. Faster games of RPS show a greater divergence in win rate between players. This difference is even visible between Japanese chants (“Jan, ken, PON!” with the throw taking place on the third syllable) and slower American changes (“One two three GO!” with a throw on the fourth syllable).
Team play adds another dimension to RPS. Some individuals do less well against others for obscure reasons. Swapping team members, or changing the order of play, can have a profound effect on team games.
The strictness of rules is also important. When rules are very tight, players on a losing streak have few options. When rules are less tight, the losing player may change hands or vary timing in the hope of breaking out of the rut.
Finally, the scoring rules exert a strong influence. RPS is usually played in sets of ‘best of three’ — but not always. How many throws must a player make before having ‘time off’? How frequently does the leading player ‘bank’ their success? Serious RPS is often played as a rapid sequence of three or five ‘best of three’ games with no break in between them — so there’s ample time to learn an opponent’s behavior, but also ample opportunity to _lose despite winning more throws_ if the winning throws are not distributed efficiently.
No discussion of RPS strategy is complete without considering the metagame; in this respect (and in absolutely no other, I feel!), RPS is strangely similar to golf. The metagame (meaning the actions and decisions that are not part of RPS rules but nevertheless affect the outcome of a game) can have a profound effect on outcome over time. Some of the key points to consider are:
- Casual players are more random than serious players and thus harder to beat and also harder to be beaten by.
- RPS players are vulnerable to what in poker is called ‘tilt. ‘ This can be described as a tendency toward error caused by an involuntary perception of ‘unfairness’ in the game. Tilted players will often change throws unnecessarily or repeat patterns because they feel they ‘should’ win. In team play, ’tilted’ players can be rested and given time to recover.
- Money. Especially in the past, RPS has been played for considerable amounts of money. Money management during a game is crucial and gives all players something extra to worry about. When playing for cash, money is typically handled by team leaders or friends to avoid an emotional effect on players.
- Alcohol. It’s sad but true — not all RPS players maintain peak physical condition, and when the play takes place in bars and pubs, alcohol is often a factor. For example, some analysts maintain that players who have consumed alcohol are more likely to repeat a losing throw.
- Bluffing and mind games. I’ve seen a player stroll up to his opponent just before play and say, “Planning on throwing out rock again? Great, I’ll be waiting.” As with poker, tolerance for this sort of technique varies from venue to venue.
Analysis: Building an RPS AI
In the end, my experience over time is that good players beat bad players, perhaps 60% of the time. That’s not a very great difference in expected outcomes, but the brevity of individual RPS games is so small that the difference rapidly adds up. Can we easily achieve an RPS AI that manages at least this outcome?
Well, yes and no.
- It’s surprisingly easy to construct an RPS AI that beats most people by a significant margin initially.
- It’s much harder to construct an RPS AI that beats *everyone*, or that keeps on winning consistently in the long term.
The difficulty of winning in the long term is particularly interesting. A simple AI strategy is usually enough to beat an unprepared human opponent, but the human will eventually learn the AI’s behavior — perhaps consciously but perhaps intuitively — and the AI will stop winning. The elements of a successful AI then are twofold:
- A model with which to construct winning behaviors
- A meta-model which triggers a change to a different strategy when the opponent stops losing
Let’s consider how to build a framework first. The first throw, in fact, is the most difficult for an AI. Should it be completely random? Should it be based on the recent history of throws, or only on the recent history of *first* throws? To avoid this question, the algorithm provided with this article simply plays an endless string of throws, not divided into the ‘best of three’ groups.
Leaving aside the problem of the first throw, the basic choices come down to the six choices described above, of which three are valid at any given moment (not counting the first throw). At each step, the algorithm will have to choose one option from three, given a series of observations that consist of the throw history up until now. We can express this as a categorization problem, categorizing situations as follows:
- Situations in which repeating the last throw is best.
- Situations in which playing the throw that would beat the last throw is best
- Situations in which playing the throw that loses to the last throw is best
If we do accept this as a categorization problem, then a perceptron is the obvious approach. Perceptrons are well suited to expressing the level of confidence in an outcome given a set of observations.
In this case, our perceptron’s input nodes consist of a set of observed features for each of the last n rounds of play, and the ‘activation function’ consists of a simple weighted aggregate of the inputs that can be compared to find the most probable opponent throw.
There are other possible approaches, too:
- Monte Carlo approaches have been suggested, but it’s unlikely that a sufficiently large population of throws *made under extremely similar circumstances* could be obtained as the basis for MC simulation. I also suspect that the combinatorial space of the problem isn’t large enough to make MC a good approach.
- A Markov chain (a contextual Markov chain, not a pure Markov process) seems like a tempting approach, but as with Monte Carlo, part of the challenge would be in building up a corpus of observed throws from which to project the next throw. This approach may be very powerful but would require both an interesting Markov chain algorithm and a pre-existing body of game histories from which scenarios relevant to the current game could be selected. Further research on this would be interesting.
- Bayesian logic seems like a good fit since we are taking a sequence of observations (throws) and adjusting an expected outcome. An important challenge, however, would be this: how would the Bayesian model react to sudden changes in opponent strategy? If an opponent who has thrown scissors twenty times suddenly throws paper four times, can the Bayesian model be re-weighted to represent the fact that the next throw is likely to be paper? I’m not sure.
I should also, by the way, mention a very different approach adopted by the University of Tokyo, as shown here. Shame on you, University of Tokyo!
A perceptron for RPS
Building a perceptron along the above lines is not difficult. A perceptron is just a very degenerate case of a neural net, a case in which there is one input per feature, and they’re all connected to a single output. In fact, a perceptron is just a list of weights — a list of weights to which feedback can be applied to train the model. We’ll need three perceptrons, one for each possible outcome. For each perceptron, the feature vector will be a history of the game (or of the most recent throws), and the weights will relate those features to an expected outcome, obtained by training the perceptron in the usual [neural net] way.
The feature vector must certainly be a ‘history of the game,’ but in practice, there is much leeway in what features we define. The obvious approach is to use features that are the same as our output categories, so for each level of history we store, we make three observations (‘did the opponent repeat a throw? did the opponent play the throw that would lose to their previous throw?’ etc.). Although obvious, this approach is probably not optimal; for example, encoding the winner of each throw as an input to the net turns out to greatly improve prediction.
The number of historical throws to consider as inputs are not trivial to decide. At first glance, one might expect a longer history to be a better predictor, but the experiment shows that the optimum depth of history is about 5 to 7 throws. This suggests that people, when trying to be random, consider a context of roughly 5 to 7 throws. A history above this number tends to result in a worse outcome, suggesting that throws even a few seconds ago are poor predictors of a player’s next decision. The depth of history that can be usefully exploited seems to increase with more sophisticated predictors.
The example provided with this article is effective but limited implementation. There are no hidden nodes, but most importantly, the feature vector simply consists of ‘rock,’ ‘paper,’ and ‘scissors’ — no attempt is made to model the effect of who won the throw, and how any throw relates to the previous throw. This is enough to do surprisingly well for a while against many human opponents but still significantly weaker than an implementation with a more interesting feature vector.
Varying the strategy
Having built a predictor, we now need to know when NOT to use the predictor. It turns out that unlike a human player, an AI does not need to make a conscious effort to switch models. Rather, the AI can run multiple models in parallel, and in addition to predicting the next throw from each model, we *also* predict the likelihood of a given model being the best one for a given throw.
Now, we can see a use for the somewhat naive model on this page; we need a population of different perceptrons with different features sets. At each throw, we back-test each perceptron model to see how well it is doing. We expect a sufficiently skilled opponent (AI or not) to change model now and then, and by selecting between models on each throw, we should be able to adapt as soon as we have suffered a couple of losses.
A ‘couple’ of losses is a very vague term, and it highlights another challenge — identifying a cue to change models, given that we expect clusters of losses to occur occasionally in even a very successful model. Again, it’s likely that the best approach is not to identify discrete features that trigger model change, but rather to maintain a rolling view of each model’s power based on back-testing; however, it’s possible that a sufficiently good meta-model could identify ‘inflection points’ in the game at which the model should be changed.
This article has only scratched the surface of the history, strategy, and technology of RPS. I hope, though, that this article has at least highlighted the point that RPS is anything but a game of chance. I look forward to the day when RPS AIs (forbidden, of course, from using random number generators) compete at the highest levels in this fascinating discipline.
The TypeScript source code for the RPS opponent discussed on this page can be found here. Feel free to use it for whatever you like. Please credit me if appropriate.
Towards an AI for Rock, Paper, Scissors was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.
Published via Towards AI