# MiniGo
This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).

MiniGo is a minimalist Go engine modeled after AlphaGo Zero, built on MuGo. The current implementation consists of three main modules: the DualNet model, Monte Carlo Tree Search (MCTS), and Go domain knowledge. The **model** module is currently our focus.

This implementation retains the model training and validation features, and also provides evaluation between two Go models.

## DualNet Model
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. 8 feature planes consist of binary values
indicating the presence of the current player's stones; a further 8 feature
planes represent the corresponding features for the opponent's stones. The final
feature plane represents the color to play: it has a constant value of 1 if
black is to play, or 0 if white is to play. Check `features.py` for more details.
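As a rough illustration of how such a stack fits together (this is a sketch, not the exact `features.py` layout; `make_feature_stack` and its arguments are hypothetical), the planes can be assembled with NumPy:

```python
import numpy as np

def make_feature_stack(player_history, opponent_history, black_to_play, board_size=9):
    """Assemble a [board_size, board_size, 17] binary feature stack.

    player_history / opponent_history: lists of 8 boolean arrays of shape
    [board_size, board_size], marking stone presence at recent positions.
    Illustrative sketch only, not the exact features.py implementation.
    """
    planes = list(player_history) + list(opponent_history)
    # Final plane: constant 1 if black is to play, 0 if white is to play.
    color_plane = np.full((board_size, board_size), 1.0 if black_to_play else 0.0)
    planes.append(color_plane)
    return np.stack(planes, axis=-1).astype(np.float32)

# Empty 9 x 9 board, black to play:
empty = [np.zeros((9, 9), dtype=bool)] * 8
stack = make_feature_stack(empty, empty, black_to_play=True)
print(stack.shape)  # (9, 9, 17)
```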

In the MiniGo implementation, the input features are processed by a residual
tower consisting of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
 1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity

Each residual block applies the following modules sequentially to its input:
 1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity
 4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
 5. Batch normalization
 6. A skip connection that adds the input to the block
 7. A rectifier non-linearity

Note: num_filter is 128 for a 19 x 19 board, and 32 for a 9 x 9 board.
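The wiring of a residual block can be sketched as follows (a minimal illustration of the seven steps above; the `conv*`/`bn*` layers are stand-in callables, not the actual TensorFlow layers):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, conv1, bn1, conv2, bn2):
    """Wiring of one residual block: conv, BN, ReLU, conv, BN, skip
    connection, ReLU. The conv*/bn* layers are caller-supplied callables;
    only the composition of the modules is fixed here."""
    y = relu(bn1(conv1(x)))
    y = bn2(conv2(y))
    return relu(y + x)  # the skip connection adds the block's input

# With identity stand-ins for every layer, the wiring is easy to trace:
ident = lambda t: t
x = np.array([[-1.0, 2.0]])
print(residual_block(x, ident, ident, ident, ident))  # [[0. 4.]]
```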

The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
 1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity
 4. A fully connected linear layer that outputs a vector of size (board_size * board_size + 1), corresponding to logit probabilities for all intersections and the pass move

The value head applies the following modules:
 1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity
 4. A fully connected linear layer to a hidden layer of size 256 for a 19 x 19
    board and 64 for a 9 x 9 board
 5. A rectifier non-linearity
 6. A fully connected linear layer to a scalar
 7. A tanh non-linearity outputting a scalar in the range [-1, 1]
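At the shape level, the two heads can be sketched like this (random stand-in weights; the 1 x 1 convolutions and batch normalization are elided, and none of these variable names come from the MiniGo code):

```python
import numpy as np

rng = np.random.default_rng(0)
board_size = 9
n_moves = board_size * board_size + 1  # every intersection plus the pass move

# Toy stand-in for the residual tower output: [board_size, board_size, filters].
tower_out = rng.standard_normal((board_size, board_size, 32))
flat = tower_out.reshape(-1)

# Policy head: a fully connected projection to one logit per move.
policy_w = 0.01 * rng.standard_normal((flat.size, n_moves))
policy_logits = flat @ policy_w  # shape: (n_moves,)

# Value head: fully connected hidden layer, ReLU, projection to a scalar,
# then tanh to squash the prediction into [-1, 1].
hidden = np.maximum(flat @ (0.01 * rng.standard_normal((flat.size, 64))), 0.0)
value = float(np.tanh(hidden @ (0.01 * rng.standard_normal(64))))

print(policy_logits.shape, -1.0 <= value <= 1.0)  # (82,) True
```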

The overall network depth, in the 10- or 20-block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
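The residual-tower layer counts follow directly from the block structure described above:

```python
def tower_depth(residual_blocks):
    """Parameterized conv layers in the residual tower: 1 for the initial
    convolutional block plus 2 per residual block."""
    return 1 + 2 * residual_blocks

print(tower_depth(9))   # 19 conv layers in the 10-block network
print(tower_depth(19))  # 39 conv layers in the 20-block network
```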

## Getting Started
Please follow the [instructions](https://github.com/tensorflow/minigo/blob/master/README.md#getting-started) in the original MiniGo repo to set up the environment.

## Training Model
One iteration of reinforcement learning consists of the following steps:
 - Bootstrap: initializes a random model
 - Selfplay: plays games with the latest model, producing data used for training
 - Gather: groups games played with the same model into larger files of tf.Examples
 - Train: trains a new model with the selfplay results from the most recent N
   generations
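The four steps above can be sketched as one pass of the loop (the function names are hypothetical stubs standing in for the corresponding `minigo.py` stages, not its actual API):

```python
# Hypothetical stubs standing in for the real pipeline stages.
def bootstrap():
    return "model-000"  # initialize a random model

def selfplay(model):
    return [f"game played by {model}"]  # selfplay games with the latest model

def gather(games):
    return f"{len(games)} gathered examples"  # group games into training files

def train(model, examples):
    return "model-001"  # train the next model generation

model = bootstrap()
examples = gather(selfplay(model))
model = train(model, examples)
print(model)  # model-001
```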

 Run `minigo.py`.
 ```
 python minigo.py
 ```

## Validating Model
 Run `minigo.py` with the `--validation` argument:
 ```
 python minigo.py --validation
 ```
 The `--validation` argument generates a holdout dataset for model validation.

## Evaluating MiniGo Models
 Run `minigo.py` with the `--evaluation` argument:
 ```
 python minigo.py --evaluation
 ```
 The `--evaluation` argument invokes an evaluation between the latest model and the current best model.

## Testing Pipeline
As the whole RL pipeline may take hours to train even for a 9 x 9 board, we provide a dummy model with a `--debug` mode for testing purposes.

 Run `minigo.py` with the `--debug` argument:
 ```
 python minigo.py --debug
 ```
 The `--debug` argument is for testing purposes with a dummy model.

Validation and evaluation can also be tested with the dummy model by combining their corresponding arguments with `--debug`.
To test validation, run the following command:
 ```
 python minigo.py --debug --validation
 ```
To test evaluation, run the following command:
 ```
 python minigo.py --debug --evaluation
 ```
To test both validation and evaluation, run the following command:
 ```
 python minigo.py --debug --validation --evaluation
 ```

## MCTS and Go features (TODO)
Code cleanup on MCTS and Go features.