Commit 22e571d

antmarakis authored and norvig committed
Games: Alpha-Beta + Updates (aimacode#546)
* Update games.py
* Create test_games.py
* Create games.ipynb
1 parent 940f0c9 commit 22e571d

File tree: 3 files changed (+154 -36 lines)


games.ipynb

Lines changed: 140 additions & 11 deletions
@@ -20,6 +20,7 @@
 " * Tic-Tac-Toe\n",
 " * Figure 5.2 Game\n",
 "* Min-Max\n",
+"* Alpha-Beta\n",
 "* Players\n",
 "* Let's Play Some Games!"
 ]
@@ -347,7 +348,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"`utility`: Returns the value of the terminal state for a player ('MAX' and 'MIN')."
+"`utility`: Returns the value of the terminal state for a player ('MAX' and 'MIN'). Note that for 'MIN' the value returned is the negative of the utility."
 ]
 },
 {
@@ -363,19 +364,21 @@
 },
 {
 "cell_type": "code",
-"execution_count": 12,
+"execution_count": 3,
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"3\n"
+"3\n",
+"-3\n"
 ]
 }
 ],
 "source": [
-"print(fig52.utility('B1', 'MAX'))"
+"print(fig52.utility('B1', 'MAX'))\n",
+"print(fig52.utility('B1', 'MIN'))"
 ]
 },
 {
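The sign convention added in this hunk can be sketched in a few lines. Below, `TinyGame` is a hypothetical stand-in (not the repository's `Fig52Game`) that stores terminal utilities from MAX's point of view and negates them for MIN:

```python
class TinyGame:
    """Terminal utilities stored from MAX's point of view (illustrative only)."""

    def __init__(self):
        # Utilities of three terminal states, as valued by MAX
        self.utils = {'B1': 3, 'B2': 12, 'B3': 8}

    def utility(self, state, player):
        # For MIN the sign flips: a state worth +3 to MAX is worth -3 to MIN
        u = self.utils[state]
        return u if player == 'MAX' else -u

game = TinyGame()
print(game.utility('B1', 'MAX'))  # 3
print(game.utility('B1', 'MIN'))  # -3
```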
@@ -472,9 +475,20 @@
 "source": [
 "# MIN-MAX\n",
 "\n",
-"This algorithm (often called *Minimax*) computes the next move for a player (MIN or MAX) at their current state. It recursively computes the minimax value of successor states, until it reaches terminals (the leaves of the tree). Using the `utility` value of the terminal states, it computes the values of parent states until it reaches the initial node (the root of the tree). The algorithm returns the move that returns the optimal value of the initial node's successor states.\n",
+"## Overview\n",
 "\n",
-"Below is the code for the algorithm:"
+"This algorithm (often called *Minimax*) computes the next move for a player (MIN or MAX) at their current state. It recursively computes the minimax value of successor states until it reaches terminal states (the leaves of the tree). Using the `utility` value of the terminal states, it then computes the values of parent states until it reaches the initial node (the root of the tree).\n",
+"\n",
+"It is worth noting that the algorithm works in a depth-first manner."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Implementation\n",
+"\n",
+"In the implementation we use two functions, `max_value` and `min_value`, to calculate the best move for MAX and MIN respectively. These functions interact in an alternating recursion: one calls the other until a terminal state is reached. When the recursion halts, we are left with a score for each move and we return the maximum. Despite returning the maximum, this works for MIN too, since MIN's values are the negatives of the utilities (the order of values is reversed, so higher is better for MIN as well)."
 ]
 },
 {
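The alternating `max_value`/`min_value` recursion described in the added cell can be sketched on a plain dict encoding of the Figure 5.2 tree. The `successors` and `utils` tables below are hypothetical stand-ins for the repository's `Game` objects:

```python
# Figure 5.2 tree as plain tables: moves from each state, and terminal utilities
successors = {'A': {'a1': 'B', 'a2': 'C', 'a3': 'D'},
              'B': {'b1': 'B1', 'b2': 'B2', 'b3': 'B3'},
              'C': {'c1': 'C1', 'c2': 'C2', 'c3': 'C3'},
              'D': {'d1': 'D1', 'd2': 'D2', 'd3': 'D3'}}
utils = {'B1': 3, 'B2': 12, 'B3': 8, 'C1': 2, 'C2': 4, 'C3': 6,
         'D1': 14, 'D2': 5, 'D3': 2}

def max_value(state):
    if state in utils:                     # terminal: return its utility
        return utils[state]
    return max(min_value(s) for s in successors[state].values())

def min_value(state):
    if state in utils:
        return utils[state]
    return min(max_value(s) for s in successors[state].values())

def minimax_decision(state):
    # MAX picks the move whose successor has the highest value once MIN replies
    moves = successors[state]
    return max(moves, key=lambda a: min_value(moves[a]))

print(minimax_decision('A'))  # a1: MIN(B)=3 beats MIN(C)=2 and MIN(D)=2
```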
@@ -492,6 +506,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"## Example\n",
+"\n",
 "We will now play the Fig52 game using this algorithm. Take a look at the Fig52Game from above to follow along.\n",
 "\n",
 "It is the turn of MAX to move, and he is at state A. He can move to B, C or D, using moves a1, a2 and a3 respectively. MAX's goal is to maximize the end value. So, to make a decision, MAX needs to know the values at the aforementioned nodes and pick the greatest one. After MAX, it is MIN's turn to play. So MAX wants to know what will the values of B, C and D be after MIN plays.\n",
@@ -546,6 +562,119 @@
 "print(minimax_decision('A', fig52))"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# ALPHA-BETA\n",
+"\n",
+"## Overview\n",
+"\n",
+"While *Minimax* is great for computing a move, it can get tricky when the number of game states gets bigger. The algorithm needs to search all the leaves of the tree, whose number increases exponentially with the tree's depth.\n",
+"\n",
+"For Tic-Tac-Toe, where the depth of the tree is 9 (after the 9th move, the game ends), we can have at most 9! terminal states (at most because not all terminal nodes are at the last level of the tree; some are higher up because the game ended before the 9th move). This isn't so bad, but for more complex problems like chess, we have over $10^{40}$ terminal nodes. Unfortunately we have not found a way to cut the exponent away, but we have nevertheless found ways to alleviate the workload.\n",
+"\n",
+"Here we examine *pruning* the game tree, which means removing parts of it that we do not need to examine. The particular type of pruning is called *alpha-beta*, and the search as a whole is called *alpha-beta search*.\n",
+"\n",
+"To showcase which parts of the tree we don't need to search, we will take a look at the example `Fig52Game`.\n",
+"\n",
+"In the example game, we need to find the best move for player MAX at state A, which is the maximum value of MIN's possible moves at successor states.\n",
+"\n",
+"`MAX(A) = MAX( MIN(B), MIN(C), MIN(D) )`\n",
+"\n",
+"`MIN(B)` is the minimum of 3, 12, 8, which is 3. So the above formula becomes:\n",
+"\n",
+"`MAX(A) = MAX( 3, MIN(C), MIN(D) )`\n",
+"\n",
+"The next move we will check is c1, which leads to a terminal state with a utility of 2. Before we continue searching under state C, let's plug the new value into our formula:\n",
+"\n",
+"`MAX(A) = MAX( 3, MIN(2, c2, ..., cN), MIN(D) )`\n",
+"\n",
+"We do not know how many moves state C allows, but we know that the first one results in a value of 2. Do we need to keep searching under C? The answer is no. The value MIN will pick at C will be at most 2. Since MAX already has the option to pick something greater than that, 3 from B, he does not need to keep searching under C.\n",
+"\n",
+"In *alpha-beta* we make use of two additional parameters for each state/node, *a* and *b*, that describe bounds on the possible moves. The parameter *a* denotes the best choice (highest value) for MAX along that path, while *b* denotes the best choice (lowest value) for MIN. As we go along we update *a* and *b* and prune a branch when the value of the node is worse than the value of *a* (for MAX) or *b* (for MIN).\n",
+"\n",
+"In the above example, after the search under state B, MAX had an *a* value of 3. So, when searching under node C we found a value less than that, 2, and we stopped searching under C."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Implementation\n",
+"\n",
+"Like *minimax*, we again make use of the functions `max_value` and `min_value`, but this time we utilise the *a* and *b* values, updating them and stopping the recursive call if we end up on nodes with values worse than *a* and *b* (for MAX and MIN respectively). The algorithm finds the maximum value and returns the move that results in it.\n",
+"\n",
+"The implementation:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 2,
+"metadata": {
+"collapsed": true
+},
+"outputs": [],
+"source": [
+"%psource alphabeta_search"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Example\n",
+"\n",
+"We will play the Fig52 Game with the *alpha-beta* search algorithm. It is the turn of MAX to play at state A."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 8,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"a1\n"
+]
+}
+],
+"source": [
+"print(alphabeta_search('A', fig52))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The optimal move for MAX is a1, for the reasons given above. MIN will pick move b1 for B, resulting in a value of 3 and updating MAX's *a* value to 3. Then, when we find under C a node of value 2, we stop searching under that sub-tree since it is less than *a*. From D we get a value of 2. So, the best move for MAX is the one resulting in a value of 3, which is a1.\n",
+"\n",
+"Below we see the best moves for MIN starting from B, C and D respectively. Note that in these cases the algorithm works the same way as *minimax*, since all the nodes below the aforementioned states are terminal."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 7,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"b1\n",
+"c1\n",
+"d3\n"
+]
+}
+],
+"source": [
+"print(alphabeta_search('B', fig52))\n",
+"print(alphabeta_search('C', fig52))\n",
+"print(alphabeta_search('D', fig52))"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -561,7 +690,7 @@
 "The `random_player` is a function that plays random moves in the game. That's it. There isn't much more to this guy. \n",
 "\n",
 "## alphabeta_player\n",
-"The `alphabeta_player`, on the other hand, calls the `alphabeta_full_search` function, which returns the best move in the current game state. Thus, the `alphabeta_player` always plays the best move given a game state, assuming that the game tree is small enough to search entirely.\n",
+"The `alphabeta_player`, on the other hand, calls the `alphabeta_search` function, which returns the best move in the current game state. Thus, the `alphabeta_player` always plays the best move given a game state, assuming that the game tree is small enough to search entirely.\n",
 "\n",
 "## play_game\n",
 "The `play_game` function will be the one that will actually be used to play the game. You pass as arguments to it an instance of the game you want to play and the players you want in this game. Use it to play AI vs AI, AI vs human, or even human vs human matches!"
@@ -580,7 +709,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 3,
 "metadata": {
 "collapsed": true
 },
@@ -672,7 +801,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -681,13 +810,13 @@
 "'a1'"
 ]
 },
-"execution_count": 6,
+"execution_count": 5,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"alphabeta_full_search('A', game52)"
+"alphabeta_search('A', game52)"
 ]
 },
 {

games.py

Lines changed: 5 additions & 5 deletions
@@ -42,7 +42,7 @@ def min_value(state):
 # ______________________________________________________________________________
 
 
-def alphabeta_full_search(state, game):
+def alphabeta_search(state, game):
     """Search game to determine best action; use alpha-beta pruning.
     As in [Figure 5.7], this version searches all the way to the leaves."""
 
@@ -71,7 +71,7 @@ def min_value(state, alpha, beta):
             beta = min(beta, v)
         return v
 
-    # Body of alphabeta_search:
+    # Body of alphabeta_cutoff_search:
     best_score = -infinity
     beta = infinity
     best_action = None
@@ -83,7 +83,7 @@ def min_value(state, alpha, beta):
     return best_action
 
 
-def alphabeta_search(state, game, d=4, cutoff_test=None, eval_fn=None):
+def alphabeta_cutoff_search(state, game, d=4, cutoff_test=None, eval_fn=None):
     """Search game to determine best action; use alpha-beta pruning.
     This version cuts off search and uses an evaluation function."""
 
@@ -114,7 +114,7 @@ def min_value(state, alpha, beta, depth):
             beta = min(beta, v)
         return v
 
-    # Body of alphabeta_search starts here:
+    # Body of alphabeta_cutoff_search starts here:
     # The default test cuts off at depth d or at a terminal state
     cutoff_test = (cutoff_test or
                    (lambda state, depth: depth > d or
@@ -154,7 +154,7 @@ def random_player(game, state):
 
 
 def alphabeta_player(game, state):
-    return alphabeta_full_search(state, game)
+    return alphabeta_search(state, game)
 
 
 # ______________________________________________________________________________
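The default cutoff in `alphabeta_cutoff_search` ("depth d exceeded, or terminal state") can be sketched in isolation. Only the `cutoff_test = (cutoff_test or ...)` pattern comes from the diff above; `terminal_test`, the states, and `d = 4` here are illustrative stand-ins:

```python
d = 4  # search depth limit, matching the function's default parameter

def terminal_test(state):
    # Stand-in for the real game's terminal test
    return state in ('win', 'loss', 'draw')

cutoff_test = None  # the caller supplied nothing...
cutoff_test = (cutoff_test or
               (lambda state, depth: depth > d or terminal_test(state)))

print(cutoff_test('mid-game', 5))  # True: depth limit exceeded
print(cutoff_test('win', 1))       # True: terminal state
print(cutoff_test('mid-game', 1))  # False: keep searching
```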

tests/test_games.py

Lines changed: 9 additions & 20 deletions
@@ -1,10 +1,3 @@
-"""A lightweight test suite for games.py"""
-
-# You can run this test suite by doing: py.test tests/test_games.py
-# Of course you need to have py.test installed to do this.
-
-import pytest
-
 from games import *
 
 # Creating the game instances
@@ -36,27 +29,27 @@ def test_minimax_decision():
     assert minimax_decision('D', f52) == 'd3'
 
 
-def test_alphabeta_full_search():
-    assert alphabeta_full_search('A', f52) == 'a1'
-    assert alphabeta_full_search('B', f52) == 'b1'
-    assert alphabeta_full_search('C', f52) == 'c1'
-    assert alphabeta_full_search('D', f52) == 'd3'
+def test_alphabeta_search():
+    assert alphabeta_search('A', f52) == 'a1'
+    assert alphabeta_search('B', f52) == 'b1'
+    assert alphabeta_search('C', f52) == 'c1'
+    assert alphabeta_search('D', f52) == 'd3'
 
     state = gen_state(to_move='X', x_positions=[(1, 1), (3, 3)],
                       o_positions=[(1, 2), (3, 2)])
-    assert alphabeta_full_search(state, ttt) == (2, 2)
+    assert alphabeta_search(state, ttt) == (2, 2)
 
     state = gen_state(to_move='O', x_positions=[(1, 1), (3, 1), (3, 3)],
                       o_positions=[(1, 2), (3, 2)])
-    assert alphabeta_full_search(state, ttt) == (2, 2)
+    assert alphabeta_search(state, ttt) == (2, 2)
 
     state = gen_state(to_move='O', x_positions=[(1, 1)],
                       o_positions=[])
-    assert alphabeta_full_search(state, ttt) == (2, 2)
+    assert alphabeta_search(state, ttt) == (2, 2)
 
     state = gen_state(to_move='X', x_positions=[(1, 1), (3, 1)],
                       o_positions=[(2, 2), (3, 1)])
-    assert alphabeta_full_search(state, ttt) == (1, 3)
+    assert alphabeta_search(state, ttt) == (1, 3)
 
 
 def test_random_tests():
@@ -67,7 +60,3 @@ def test_random_tests():
 
     # The player 'X' (one who plays first) in TicTacToe never loses:
     assert ttt.play_game(alphabeta_player, random_player) >= 0
-
-
-if __name__ == '__main__':
-    pytest.main()
