Combinatorial Games: From Theoretical Solving to AI Algorithms

Eric Duchêne⋆
1 Combinatorial games
1.1 Introduction
Playing combinatorial games is a common activity for the general public. Indeed, the games of Go, Chess or Checkers are rather familiar to all of us. However, the underlying mathematical theory that enables one to compute the winner of a given game, or more generally, to build a sequence of winning moves, is rather recent. It was settled by Berlekamp, Conway and Guy only in the late 70s [2], [8]. The current section will present the highlights of this beautiful theory.
In order to avoid any confusion, first note that combinatorial game theory (here shortened as CGT) is very different from the so-called "economic" game theory introduced by Von Neumann and Morgenstern. I often consider that a preliminary activity to tackle CGT issues is the reading of Siegel's book [31], which gives a strong and formal background on CGT. Strictly speaking, a combinatorial game must satisfy the criteria given in Definition 1 below.
⋆ Supported by the ANR-14-CE25-0006 project of the French National Research Agency and the CNRS PICS-07315 project.
Definition 1 (Combinatorial game). A combinatorial game is a game satisfying the following criteria:
– There are exactly two players, usually called Left and Right, who alternate moves.
– Both players have perfect information on the game position, and no chance device is involved.
– The number of game positions is finite, and the game always ends.
– There is no draw: under the normal play convention, the player who makes the last move wins (equivalently, the first player unable to move loses).
Examples of such games are Nim [6] or Domineering [20]. In the first one, game positions are tuples of non-negative integers $(a_1, \ldots, a_n)$. A move consists of strictly decreasing exactly one of the values $a_i$ for some $1 \leq i \leq n$, provided the resulting position remains valid. The first player unable to move loses. In other words, reaching the position $(0, \ldots, 0)$ is a winning move. The game Domineering is played on a rectangular grid. The two players alternately place a domino on the grid under the following condition: Left must place his dominoes vertically and Right horizontally. Once again, the first player unable to place a domino loses. Figure 1 illustrates a position for this game, where Left started and wins, since Right cannot place any additional horizontal domino.
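To make these rules concrete, here is a minimal Python sketch of Domineering move generation. The representation of a position as a set of free (row, column) cells, as well as the helper names, are illustrative assumptions and not taken from the text.

    def left_moves(free):
        # Left places dominoes vertically: pairs of free cells stacked in a column.
        return [((r, c), (r + 1, c)) for (r, c) in free if (r + 1, c) in free]

    def right_moves(free):
        # Right places dominoes horizontally: pairs of free cells within a row.
        return [((r, c), (r, c + 1)) for (r, c) in free if (r, c + 1) in free]

    def play(free, move):
        # Placing a domino simply removes its two cells from the free set.
        return free - set(move)

    # Example: on an empty 2x2 grid, each player has two available moves.
    grid = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
    assert len(left_moves(grid)) == 2 and len(right_moves(grid)) == 2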
Definition 2 (Game tree). Given a game G with starting position S, the game tree associated to (G, S) is a semi-ordered rooted tree defined as follows:
– The root vertex corresponds to the starting position S.
– All the game positions reachable by Left (resp. Right) in a single move from S are set as left (resp. right) children of the root.
– The subtrees rooted at these children are built recursively in the same way.
Figure 2 gives an example of such a game tree for Domineering (the starting position is the one shown in the figure). For convenience, only the top three levels of the tree are depicted; there is one additional level when fully expanded.
Now, playing any game on its game tree consists of alternately moving a token from the root towards a leaf. Each player must follow an edge corresponding to his direction (i.e., solid edges for Left and dashed ones for Right). In the normal play convention, the first player who moves the token onto a leaf of the tree is the winner. We will see later on that this tree representation is very useful, both to compute exact and approximate strategies.
In view of Definition 1, one can remark that the specified conditions are too strong to cover some of the well-known abstract 2-player games. For example, Chess and Checkers may have draw outcomes, which is not allowed in a combinatorial game. This is due to the fact that some game positions can be visited several times during the play. Such games are called loopy. In games like Go, Dots and Boxes or Othello, the winner is determined by a score and not according to the player making the last move. However, such games remain very close to combinatorial games. Some keys can be found in the literature to deal with their resolution ([31], chap. 6 for loopy games, and [24] for an overview of scoring game theory). In addition, first attempts to build an "absolute" theory that would cover normal and misère play conventions, loopy games and scoring games have recently been made [23]. Note that the concepts and issues introduced in the current survey also make sense in this extended framework.
More precisely, given a game and one of its positions, the main questions raised by CGT are the following:
– Problem 1: Can one determine the winner of the game, i.e., compute its outcome?
– Problem 2: Can one compute the value of a given game position?
– Problem 3: Can one provide a winning strategy, i.e., a sequence of optimal moves for the winner whatever his opponent's moves are?
For each of the above questions, I will give some elements of answer relative to the known theory.
The first problem is the determination of the winner of a given game, also called its outcome. In a strict combinatorial game (i.e., a game satisfying the conditions of Definition 1), there are only four possible outcomes [31]:
– L: Left wins, whoever plays first.
– R: Right wins, whoever plays first.
– N: the first player (the Next one to play) wins.
– P: the second player (the Previous one) wins.
This property can be easily deduced from the game tree, by labeling the vertices from the leaves to the root. Consequently, such an algorithm allows one to compute the outcome of a game in time polynomial in the size of the tree. Yet, a game position often has a far smaller input size than the size of its corresponding game tree. For example, a position $(a_1, \ldots, a_n)$ of Nim has an input size $O(\sum_{i=1}^{n} \log_2(a_i))$, which is far smaller than the number of positions in the game tree. Hence, computing the whole game tree is generally not the right key to determine effectively the answer to Problem 1.
Note that for loopy games, the outcome Draw is added to the list of the
possible outcomes.
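As an illustration, here is a minimal Python sketch of this leaf-to-root labeling under the normal play convention, written on top of the hypothetical Domineering helpers sketched above (positions must be hashable, e.g., frozensets, for the memoization to work).

    from functools import lru_cache

    def left_options(pos):
        return [play(pos, m) for m in left_moves(pos)]

    def right_options(pos):
        return [play(pos, m) for m in right_moves(pos)]

    @lru_cache(maxsize=None)
    def left_wins_playing_first(pos):
        # A player with no available move loses: any() on an empty list is False.
        return any(not right_wins_playing_first(p) for p in left_options(pos))

    @lru_cache(maxsize=None)
    def right_wins_playing_first(pos):
        return any(not left_wins_playing_first(p) for p in right_options(pos))

    def outcome(pos):
        lw, rw = left_wins_playing_first(pos), right_wins_playing_first(pos)
        if lw and rw:
            return "N"  # the first player wins
        if lw:
            return "L"  # Left wins, whoever plays first
        if rw:
            return "R"  # Right wins, whoever plays first
        return "P"      # the second player wins

    # On the 2x2 grid above, the first player to move wins: outcome(grid) == "N".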
The concept of game value was first defined by Conway in [8]. In his theory, each game position is assigned a numeric value among the set of surreal numbers. Roughly speaking, it corresponds to the number of moves ahead that Left has over his opponent. For instance, a Domineering position has value $-2$ when Right can place two more dominoes than Left before being blocked. A more formal definition can be found in [31]. Just note that Conway's values are defined recursively and can also be computed from the game tree.
Knowing the value of a game allows one to deduce its outcome. For example, all games having a strictly positive value have outcome L, and all games having value zero have outcome P. Moreover, this knowledge is even more paramount when the game splits into sums: a game G can then be considered as a set of independent smaller games whose values allow one to compute the overall value of G. Consider the example depicted by Fig. 3: this game position can be considered as a sum of smaller independent positions. In the case of Nim, Bouton [6] showed that a winning strategy consists of always moving to a position $(a'_1, \ldots, a'_n)$ whose bitwise sum $a'_1 \oplus \cdots \oplus a'_n$ equals 0 (meaning that it will be losing for the other player).
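The following Python sketch illustrates Bouton's strategy; the function name and the returned (heap, new value) convention are choices made for the example.

    from functools import reduce
    from operator import xor

    def nim_winning_move(heaps):
        s = reduce(xor, heaps, 0)  # bitwise sum of the heap sizes
        if s == 0:
            return None  # P-position: every move gives the win away
        for i, a in enumerate(heaps):
            if a ^ s < a:
                # Decreasing heap i to a ^ s makes the overall bitwise sum vanish.
                return (i, a ^ s)

    # Example: from (10, 7, 5), whose bitwise sum is 8, move the first heap to 2.
    assert nim_winning_move((10, 7, 5)) == (0, 2)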
A game G is in general considered as tractable when the following properties hold:
– Problem 1 and Problem 3 can be solved in polynomial time for any starting position S of G.
– Winning strategies in G can be consumed in at most an exponential number of moves.
– These two properties remain valid for any sum of two game positions of G.
Table 2. Complexity of some well-known games.

    Game          Complexity
    Tic Tac Toe   PSPACE-complete
    Othello       PSPACE-complete
    Hex           PSPACE-complete
    Amazons       PSPACE-complete
    Checkers      EXPTIME-complete
    Chess         EXPTIME-complete
    Go            EXPTIME-complete
These complexity results are generally stated for generalized versions of the games, played on boards of arbitrary size or on graphs. In 2009, Hearn and Demaine wrote a rich book about the complexity of many combinatorial games and puzzles [16]. While this list confirms that games belong to the decision problems of highest complexity, some of them admit a lower one. The game of Nim is one of them, and is luckily not the only one. For example, many games played on tuples of integers admit a polynomial winning strategy derived from tools arising from arithmetic, algebra or combinatorics on words; see the recent survey [11], which summarizes some of these games. Moreover, some games on graphs that are PSPACE-complete in general have a more affordable complexity on particular families of graphs. For example, Node Kayles is proved to be polynomial on paths and cographs [4]. This is also the case for Geography played on undirected graphs [19]. Finally, note that the complexity of Domineering is still an open problem.
A natural question arises when reading the above table: what makes a game harder than another one? While there is obviously no universal answer, Fraenkel suggests several relevant criteria in [17].
– The average branching factor, i.e., the average number of available moves from a position (around 35 for Chess and 250 for the game of Go).
– The total number of game positions ($10^{18}$ for Checkers, $10^{171}$ for the game of Go).
– The existence of cycles. In other words, loopy games are harder than non-loopy ones.
– Impartial or partizan. A game is said to be impartial if both players always have the same available moves. This implies that the game tree is symmetric. Nim is an example of an impartial game, whereas Domineering and all the games mentioned in Table 2 are not: such games are called partizan. Impartial games are in general easier to solve, since their Conway values are more "controlled".
– The fact that the game can be decomposed into sums of smaller independent games (as is the case for Domineering).
– The number of final positions.
[Figure 4: a MiniMax game tree with leaf evaluations, on which alpha-beta pruning operates.]
The principle of alpha-beta pruning is the following: if, after having explored some of the branches, it turns out that the overall value of the root is at least v, then one can prune all the unexplored branches whose values are guaranteed to be less than v. The ordering of the branches in the game tree then turns out to be paramount, as it can considerably increase the efficiency of the algorithm. In addition to this technique, one can also mention the use of transposition tables (adjoined to alpha-beta pruning) to speed up the search in the game tree.
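A compact Python sketch of MiniMax search with alpha-beta pruning is given below; the helpers children and evaluate are assumed to be supplied by the game at hand, and Left is taken as the maximizing player.

    import math

    def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
        kids = children(node)      # assumed move generator
        if not kids:
            return evaluate(node)  # assumed leaf evaluation
        if maximizing:
            value = -math.inf
            for child in kids:     # exploring good branches first prunes more
                value = max(value, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:  # the minimizer will avoid this branch: prune
                    break
            return value
        value = math.inf
        for child in kids:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value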
The combination of both MiniMax and Monte Carlo methods is called MCTS, which stands for Monte Carlo Tree Search. Since its introduction, it has been the object of much research on AI for games. This success is mainly explained by the significant improvements made by the computer Go programs that use this technique. Moreover, it has also shown very good performance on problems for which other techniques performed poorly (e.g., some problems in combinatorial optimization, puzzles, multi-player games, scheduling, operations research...). Another great advantage of MCTS is that no strong expert knowledge is needed to implement a good algorithm. Hence it can be considered for problems for which humans do not have a strong background. In addition, MCTS can be stopped at any time to provide the current best solution, and the tree built so far can be reused for the next step.
In what follows, we give the necessary information to understand the essence of MCTS applied to games. For additional material, the reader may refer to the more exhaustive survey [7].
The basic MCTS algorithm consists of progressively building the game tree, guided by the results of its previous explorations. Unlike the standard MiniMax algorithm, the tree is built in an asymmetric manner: the in-depth search is considered only for the most promising branches, which are chosen according to a tuned selection policy. This policy relies on the values of the nodes of the tree. Roughly speaking, the value of a node $v_i$ corresponds to the percentage of winning random simulations when $v_i$ is played. Of course, this value becomes more and more accurate as the tree grows.
[Figure 5: the four stages of an MCTS iteration (descent, growth, rollout and update), illustrated on a game tree whose nodes carry winning percentages; the rollout ends in a final position.]
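The following Python sketch outlines one iteration of these four stages. The Node class and the helpers children, random_playout and select_child are assumptions made for the example (a UCB-based select_child is sketched further below), and a full implementation would also flip the reward according to the player to move.

    import random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.wins, self.visits = [], 0, 0

    def mcts_iteration(root):
        node = root
        # Descent: follow the selection policy down the explored part of the tree.
        while node.children:
            node = select_child(node)
        # Growth: attach the children of the selected node to the tree.
        node.children = [Node(s, parent=node) for s in children(node.state)]
        if node.children:
            node = random.choice(node.children)
        # Rollout: play the game to its end with random moves.
        reward = random_playout(node.state)  # 1 for a won simulation, else 0
        # Update: back up the result along the path to the root.
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent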
The many enhancements that have been proposed for MCTS can be classified according to the stage they impact. Table 3 summarizes the most important enhancements brought to MCTS.
Table 3. Most important enhancements brought to MCTS.

    Stage     Improvement
    Descent   UCT (2006) [22]
    Descent   RAVE (2007) [15]
    Descent   Criticality (2009) [10]
    Growth    FPU (2007) [35]
    Rollout   Pool-RAVE (2011) [26]
    Rollout   NST (2012) [33]
    Rollout   BHRF (2016) [14]
    Update    Fuego reward (2010) [13]
One of the most important features of the algorithm is the node selection policy during the descent. At each step of this stage, MCTS chooses the node that maximizes (or minimizes, according to whether it is Left or Right's turn) some quantity. A formula that is frequently used is called Upper Confidence Bounds (UCB). It associates to each node $v_i$ of the tree the following value:
$$V(v_i) + C \times \sqrt{\frac{\ln N}{n_i}},$$
where $V(v_i)$ is the percentage of winning simulations involving $v_i$, $n_i$ is the total number of simulations involving $v_i$, $N$ is the number of times its parent has been visited, and $C$ is a tunable parameter. This formula is well known in the context of bandit problems (choose sequentially amongst $n$ actions the best one in order to maximize the cumulative reward). In particular, it allows one to deal with the exploration-exploitation dilemma, i.e., to find a balance between exploring unvisited nodes and reinforcing the statistics of the best ones. The combination of MCTS and UCB is called UCT [22].
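In Python, a UCB-based descent step could look as follows (the value $C = \sqrt{2}$ is a common default, and the Node fields are those of the sketch above):

    import math

    def select_child(node, C=math.sqrt(2)):
        def ucb(child):
            if child.visits == 0:
                return math.inf  # always try unvisited children first
            # V(v_i) + C * sqrt(ln N / n_i), N being the parent's visit count
            return (child.wins / child.visits
                    + C * math.sqrt(math.log(node.visits) / child.visits))
        return max(node.children, key=ucb)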
A second common enhancement of MCTS during the descent is the RAVE estimator (Rapid Action-Value Estimator [15]). It consists of considering each move of the rollout as being as important as the first move. In other words, the moves visited during the rollout stage will also affect the values of the same moves in the tree. On Fig. 5, imagine the move E3 is played during the simulation depicted with dashed lines. RAVE will then modify the UCB value of the node E3 of the tree (the RAVE formula will not be given here).
MCTS has also been widely studied in order to increase the quality of the random simulations. A first way to mimic the strategy of a good player is to consider evaluation functions based on expert knowledge. In [34], moves are categorized according to several criteria: location on the board, capturing or blocking potential, and proximity to the last move. The approach is then to evaluate the probability that a move belonging to a category will be played by a real player. This probability is determined by analyzing a huge sample of real games played by either humans or computers. Of course, this strategy is fully specific to the game on which MCTS is applied. More generic approaches have been considered, such as NST [33], BHRF [14] or Pool-RAVE [26]. In the first two, good sequences of moves are kept in memory: indeed, it is rather frequent that, given successive attacking moves of a player, there is a usual sequence of answers by the opponent to defend himself. In the last one, the random rollout policy is biased by the values in the game tree, i.e., good moves visited in the tree are likely to be played during a simulation.
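As an illustration of this last idea, here is a small Python sketch of a rollout move choice biased in the spirit of Pool-RAVE; the pool of good moves, the probability p and the helper legal_moves are assumptions made for the example.

    import random

    def biased_rollout_move(state, pool, p=0.5):
        # With probability p, play one of the best moves recorded in the tree
        # (the "pool") when legal; otherwise fall back to a uniform random move.
        moves = legal_moves(state)  # assumed move generator
        good = [m for m in moves if m in pool]
        if good and random.random() < p:
            return random.choice(good)
        return random.choice(moves)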
4 Perspectives
One may wonder whether deep learning approaches such as AlphaGo [32] make the above techniques obsolete. This is not the case: the neural network approach proposed by Google requires a wide set of expert knowledge, together with a large amount of computing power over a long period of time. However, there are some games for which neither is available. In particular, General Game Playing is a real challenge for AI algorithms, as the rules of the game are given at most 20 minutes before running the program. Supervised learning techniques like those of AlphaGo are thus almost impossible to set up, and standard MCTS enhancements are currently the most effective ones for this kind of problem. In addition, one can also look into adapting MCTS to problems of higher uncertainty, such as multi-player games or games having randomness in their rules (the use of dice, for example). First steps have already been made in that direction [36].
References
1. L. V. Allis, Searching for Solutions in Games and Artificial Intelligence, PhD thesis, University of Limburg, Maastricht, The Netherlands (1994).
2. E. Berlekamp, J. H. Conway, and R. K. Guy, Winning ways for your mathematical
plays, Vol. 1, Second edition. A K Peters, Ltd., Natick, MA (2001).
3. A. Bernstein and M. Roberts, Computer v. Chess-Player, Scientific American 198 (1958), 96–105.
4. H. L. Bodlaender and D. Kratsch, Kayles and nimbers, J. Algorithms 43 (2002), 106–119.
5. E. Bonnet and A. Saffidine, Complexité des Jeux (in French), Bulletin de la ROADEF 31 (2014), 9–12.
6. C. L. Bouton, Nim, a game with a complete mathematical theory, Annals of Math.
3 (1905), 35–39.
7. C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S.
Tavener, D. Perez, S. Samothrakis and S. Colton, A Survey of Monte Carlo Tree
Search Methods, IEEE Transactions on computational intelligence and AI in games
4 (1) (2012), 1–43.
8. J. H. Conway, On Numbers and Games, Academic Press Inc. (1976).
9. R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, in Proc. 5th Int. Conf. Comput. and Games, Turin, Italy (2006), 72–83.
10. R. Coulom, Criticality: a Monte-Carlo Heuristic for Go Programs, invited talk at the University of Electro-Communications, Tokyo, Japan (2009).
11. E. Duchêne, A.S. Fraenkel, V. Gurvich, N.B. Ho, C. Kimberling and U. Larsson,
Wythoff Wisdom, Preprint.
12. D. J. Edwards and T. P. Hart, The α-β heuristic, Artificial Intelligence Project RLE and MIT Computation Center, Memo 30 (1963).
13. M. Enzenberger, M. Müller, B. Arneson and R. Segal, Fuego: an open-source framework for board games and Go engine based on Monte Carlo tree search, IEEE Trans. Comput. Intell. AI Games 2(4) (2010), 259–270.
14. A. Fabbri, F. Armetta, E. Duchêne and S. Hassas, A Self-Acquiring Knowledge Process for MCTS, International Journal on Artificial Intelligence Tools 25(1) (2016).
15. S. Gelly, D. Silver, Combining online and offline knowledge in UCT, in Proceedings
of the International Conference on Machine Learning (ICML), ed. by Z. Ghahramani,
ACM, New York (2007), 273-280.
16. R. A. Hearn and E. D. Demaine, Games, Puzzles, and Computation, A K Peters
(2009).
17. A.S. Fraenkel, Nim is easy, chess is hard - but why??, J. Internat. Computer Games
Assoc. 29 (2006), 203–206.
18. A.S. Fraenkel, Complexity, appeal and challenges of combinatorial games, Theo-
retical Computer Science 313 (2004), 393–415.
19. A.S. Fraenkel and S. Simonson, Geography, Theoretical Computer Science 110 (1993), 197–214.
20. M. Gardner, Mathematical Games: Cram, crosscram and quadraphage: new games
having elusive winning strategies, Scientific American 230 (1974), 106–108.
21. A. Junghanns and J. Schaeffer, Sokoban: Enhancing general single-agent search methods using domain knowledge, Artificial Intelligence 129(1) (2001), 219–251.
22. L. Kocsis and C. Szepesvári, Bandit based Monte-Carlo planning, Lecture Notes in Artificial Intelligence 4212, Springer, Berlin (2006), 282–293.
23. U. Larsson, R.J. Nowakowski and C. Santos, Absolute Combinatorial Game Theory,
arXiv:1606.01975 (2016).
24. U. Larsson, R.J. Nowakowski and C. Santos, When waiting moves you in scoring
combinatorial games, arXiv:1505.01907 (2015).
25. G. Renault and S. Schmidt, On the complexity of the misère version of three games played on graphs, preprint.
26. A. Rimmel, F. Teytaud and O. Teytaud, Biasing Monte-Carlo simulations through
RAVE values, International Conference on Computers and Games (2011), 59–68.
27. L. Rougetet, Combinatorial games and machines, in: A Bridge between Conceptual Frameworks: Sciences, Society and Technology Studies (R. Pisano, ed.), Springer, Dordrecht (2015), 475–494.
28. T. J. Schaefer, On the complexity of some two-person perfect-information games, J. Comput. System Sci. 16 (1978), 185–225.
29. J. Schaeffer, N. Burch, Y. Björnsson, A. Kishimoto, M. Müller, R. Lake, P. Lu and S. Sutphen, Checkers Is Solved, Science 317(5844) (2007), 1518–1522.
30. C. Shannon, Programming a computer for playing chess, Philosophical Magazine Series 7 41(314) (1950), 256–275.
31. A. N. Siegel, Combinatorial Game Theory, Graduate Studies in Mathematics 146, American Mathematical Society (2013).
32. D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J.
Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., Mastering the
game of Go with deep neural networks and tree search, Nature 529(7587) (2016),
484–489.
33. M.J.W. Tak, M.H.M. Winands and Y. Björnsson, N-grams and the last-good-reply policy applied in general game playing, IEEE Trans. Comput. Intell. AI Games 4(2)
(2012), 73-83.
34. Y. Tsuruoka, D. Yokoyama and T. Chikayama, Game-tree search algorithm based
on realization probability, ICGA J. 25(3) (2002), 132–144.
35. Y. Wang and S. Gelly, Modifications of UCT and sequence-like simulations for Monte-Carlo Go, IEEE Symposium on Computational Intelligence and Games, Honolulu, Hawaii (2007), 175–182.
36. M. Winands, Monte-Carlo Tree Search in Board Games, Handbook of Digital
Games and Entertainment Technologies, Springer (2015), 1–30.