# MiniGo
This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).

MiniGo is a minimalist Go engine modeled after AlphaGo Zero, built on MuGo. The current implementation consists of three main modules: the DualNet model, Monte Carlo Tree Search (MCTS), and Go domain knowledge. The **model** module is currently our focus.

This implementation retains the model training and validation features, and also provides evaluation between two Go models.

## DualNet Model
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. 8 feature planes consist of binary values
indicating the presence of the current player's stones; a further 8 feature
planes represent the corresponding features for the opponent's stones. The final
feature plane represents the color to play: it has a constant value of 1 if
black is to play, or 0 if white is to play. Check `features.py` for more details.
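As a rough illustration of how such a stack fits together (this is a sketch, not the exact `features.py` layout; `make_feature_stack` and its arguments are hypothetical), the planes can be assembled with NumPy:

```python
import numpy as np

def make_feature_stack(player_history, opponent_history, black_to_play, board_size=9):
    """Assemble a [board_size, board_size, 17] binary feature stack.

    player_history / opponent_history: lists of 8 boolean arrays of shape
    [board_size, board_size], marking stone presence at recent positions.
    Illustrative sketch only, not the exact features.py implementation.
    """
    planes = list(player_history) + list(opponent_history)
    # Final plane: constant 1 if black is to play, 0 if white is to play.
    color_plane = np.full((board_size, board_size), 1.0 if black_to_play else 0.0)
    planes.append(color_plane)
    return np.stack(planes, axis=-1).astype(np.float32)

# Empty 9 x 9 board, black to play:
empty = [np.zeros((9, 9), dtype=bool)] * 8
stack = make_feature_stack(empty, empty, black_to_play=True)
print(stack.shape)  # (9, 9, 17)
```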

In the MiniGo implementation, the input features are processed by a residual
tower consisting of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
 1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity

Each residual block applies the following modules sequentially to its input:
 1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity
 4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
 5. Batch normalization
 6. A skip connection that adds the input to the block
 7. A rectifier non-linearity

Note: num_filter is 128 for a 19 x 19 board, and 32 for a 9 x 9 board.
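The wiring of a residual block can be sketched as follows (a minimal illustration of the seven steps above; the `conv*`/`bn*` layers are stand-in callables, not the actual TensorFlow layers):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, conv1, bn1, conv2, bn2):
    """Wiring of one residual block: conv, BN, ReLU, conv, BN, skip
    connection, ReLU. The conv*/bn* layers are caller-supplied callables;
    only the composition of the modules is fixed here."""
    y = relu(bn1(conv1(x)))
    y = bn2(conv2(y))
    return relu(y + x)  # the skip connection adds the block's input

# With identity stand-ins for every layer, the wiring is easy to trace:
ident = lambda t: t
x = np.array([[-1.0, 2.0]])
print(residual_block(x, ident, ident, ident, ident))  # [[0. 4.]]
```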

The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
 1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity
 4. A fully connected linear layer that outputs a vector of size (board_size * board_size + 1), corresponding to logit probabilities for all intersections and the pass move

The value head applies the following modules:
 1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
 2. Batch normalization
 3. A rectifier non-linearity
 4. A fully connected linear layer to a hidden layer of size 256 for a 19 x 19
    board and 64 for a 9 x 9 board
 5. A rectifier non-linearity
 6. A fully connected linear layer to a scalar
 7. A tanh non-linearity outputting a scalar in the range [-1, 1]
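At the shape level, the two heads can be sketched like this (random stand-in weights; the 1 x 1 convolutions and batch normalization are elided, and none of these variable names come from the MiniGo code):

```python
import numpy as np

rng = np.random.default_rng(0)
board_size = 9
n_moves = board_size * board_size + 1  # every intersection plus the pass move

# Toy stand-in for the residual tower output: [board_size, board_size, filters].
tower_out = rng.standard_normal((board_size, board_size, 32))
flat = tower_out.reshape(-1)

# Policy head: a fully connected projection to one logit per move.
policy_w = 0.01 * rng.standard_normal((flat.size, n_moves))
policy_logits = flat @ policy_w  # shape: (n_moves,)

# Value head: fully connected hidden layer, ReLU, projection to a scalar,
# then tanh to squash the prediction into [-1, 1].
hidden = np.maximum(flat @ (0.01 * rng.standard_normal((flat.size, 64))), 0.0)
value = float(np.tanh(hidden @ (0.01 * rng.standard_normal(64))))

print(policy_logits.shape, -1.0 <= value <= 1.0)  # (82,) True
```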

The overall network depth, in the 10- or 20-block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
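The residual-tower layer counts follow directly from the block structure described above:

```python
def tower_depth(residual_blocks):
    """Parameterized conv layers in the residual tower: 1 for the initial
    convolutional block plus 2 per residual block."""
    return 1 + 2 * residual_blocks

print(tower_depth(9))   # 19 conv layers in the 10-block network
print(tower_depth(19))  # 39 conv layers in the 20-block network
```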

## Getting Started
Please follow the [instructions](https://github.com/tensorflow/minigo/blob/master/README.md#getting-started) in the original MiniGo repo to set up the environment.

## Training Model
One iteration of reinforcement learning consists of the following steps:
 - Bootstrap: initializes a random model
 - Selfplay: plays games with the latest model, producing data used for training
 - Gather: groups games played with the same model into larger files of tf.Examples
 - Train: trains a new model with the selfplay results from the most recent N
   generations
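The four steps above can be sketched as one pass of the loop (the function names are hypothetical stubs standing in for the corresponding `minigo.py` stages, not its actual API):

```python
# Hypothetical stubs standing in for the real pipeline stages.
def bootstrap():
    return "model-000"  # initialize a random model

def selfplay(model):
    return [f"game played by {model}"]  # selfplay games with the latest model

def gather(games):
    return f"{len(games)} gathered examples"  # group games into training files

def train(model, examples):
    return "model-001"  # train the next model generation

model = bootstrap()
examples = gather(selfplay(model))
model = train(model, examples)
print(model)  # model-001
```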

 Run `minigo.py`.
 ```
 python minigo.py
 ```

## Validating Model
 Run `minigo.py` with the `--validation` argument:
 ```
 python minigo.py --validation
 ```
 The `--validation` argument generates a holdout dataset for model validation.

## Evaluating MiniGo Models
 Run `minigo.py` with the `--evaluation` argument:
 ```
 python minigo.py --evaluation
 ```
 The `--evaluation` argument invokes an evaluation between the latest model and the current best model.

## Testing Pipeline
As the whole RL pipeline may take hours to train even for a 9 x 9 board, we provide a dummy model with a `--debug` mode for testing purposes.

 Run `minigo.py` with the `--debug` argument:
 ```
 python minigo.py --debug
 ```
 The `--debug` argument is for testing purposes with a dummy model.

Validation and evaluation can also be tested with the dummy model by combining their corresponding arguments with `--debug`.
To test validation, run the following command:
 ```
 python minigo.py --debug --validation
 ```
To test evaluation, run the following command:
 ```
 python minigo.py --debug --evaluation
 ```
To test both validation and evaluation, run the following command:
 ```
 python minigo.py --debug --validation --evaluation
 ```

## MCTS and Go features (TODO)
Code cleanup on MCTS and Go features.