AlphaToe

Applying the deep learning techniques from AlphaGo to play tic-tac-toe.

These are the code examples to go with my talk, the slides for which are in AlphaToe.pdf.

As well as the slides, the file script/policy_gradient.py is a good starting point for the project. All networks are built using TensorFlow.

Setup

To get running, start by creating a virtualenv or conda environment with TensorFlow installed. Current instructions for this are at: https://www.tensorflow.org/versions/r0.11/get_started/os_setup.html#anaconda-installation

Then run the file policy_gradient.py.

This has been tested with Python 2.7 and 3.5.
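
Before running the scripts, it can help to confirm that the active environment matches the setup described above. The following is a minimal sketch, not part of this repository: the function name `check_environment` and the constant `TESTED_PYTHON_VERSIONS` are hypothetical, and the check only reports whether the interpreter is one of the tested versions and whether `tensorflow` can be imported. It does not verify the TensorFlow version.

```python
from __future__ import print_function

import importlib
import sys

# Versions the README reports as tested. Other versions are untested,
# not necessarily broken. (Hypothetical constant, for illustration only.)
TESTED_PYTHON_VERSIONS = [(2, 7), (3, 5)]


def check_environment(module_name="tensorflow"):
    """Return (python_is_tested, module_is_importable) for this interpreter."""
    python_is_tested = tuple(sys.version_info[:2]) in TESTED_PYTHON_VERSIONS
    try:
        importlib.import_module(module_name)
        module_is_importable = True
    except ImportError:
        # Raised when the module is not installed in the active environment.
        module_is_importable = False
    return python_is_tested, module_is_importable


if __name__ == "__main__":
    python_ok, tf_ok = check_environment()
    print("Python %d.%d is a tested version: %s"
          % (sys.version_info[0], sys.version_info[1], python_ok))
    print("TensorFlow importable: %s" % tf_ok)
```

If the second line reports `False`, the virtualenv or conda environment is probably not activated, or TensorFlow is not installed in it.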
