This repository contains an implementation of Epistemic AlphaZero, a modification of AlphaZero
that uses Epistemic Monte Carlo Tree Search (E-MCTS). We use JAX to make efficient use of GPU acceleration.
Our framework is compatible with pgx environments, and we implement two new ones: DeepSea and Subleq (see `src/envs/`).
See also emctx, a fork of mctx that supports epistemic uncertainty propagation as described in the E-MCTS paper.

- The program is in `src/`:
  - The entry point is `main.py`.
  - Self-play (i.e. environment interaction) is in `selfplay.py`.
  - Replay buffer reanalyze is in `reanalyze.py`.
  - Evaluation (i.e. determining strength) is in `evaluate.py`.
  - Network training (i.e. policy and value improvement) is in `train.py`.
  - Config options are in `config.py`, and the context that is created from them is in `context.py`.
  - Custom environments are in `envs/`.
  - Network architectures and hashing algorithms (for uncertainty estimation) are in `network/`.
- Scripts for submitting experiments and analysis are in `scripts/`.
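
To illustrate how hashing can yield an uncertainty estimate, here is a minimal count-based sketch. It is not the code in `network/`; the class name `HashCounter`, the bucket count, and the `1 / (n + 1)` formula are all assumptions chosen for illustration, and observations are assumed to be small tuples of integers (as in a grid world such as DeepSea).

```python
from collections import Counter


class HashCounter:
    """Count-based novelty estimate over hashed observations (illustrative only)."""

    def __init__(self, num_buckets=2**16):
        # Hypothetical table size; smaller tables cause more hash collisions.
        self.num_buckets = num_buckets
        self.counts = Counter()

    def _bucket(self, observation):
        # Assumes the observation is a sequence of ints, so hashing is deterministic.
        return hash(tuple(observation)) % self.num_buckets

    def update(self, observation):
        # Record one more visit to the bucket this observation falls into.
        self.counts[self._bucket(observation)] += 1

    def uncertainty(self, observation):
        # Assumed form: 1 for an unseen bucket, shrinking as visits accumulate.
        return 1.0 / (self.counts[self._bucket(observation)] + 1)


counter = HashCounter()
start = (0, 0)
before = counter.uncertainty(start)
counter.update(start)
counter.update(start)
after = counter.uncertainty(start)
```

Under these assumptions, a state that has never been visited reports the maximum uncertainty, and repeated visits reduce it, which is the signal an exploration bonus needs.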

To install and run:

- Install Python.
- Install `pipenv` with `pip install --user pipenv`.
- Run `pipenv install` in this directory to install the required dependencies.
- Run `pipenv run python src/main.py` with optional configuration specified as space-separated `parameter=value`.
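
The following sketch shows one way space-separated `parameter=value` overrides can be parsed on top of default options. It is not taken from `config.py`: the `Config` fields (`seed`, `learning_rate`, `env_name`) and the helper `parse_overrides` are hypothetical names used only to illustrate the format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # Hypothetical options for illustration; the real ones are defined in config.py.
    seed: int = 0
    learning_rate: float = 1e-3
    env_name: str = "deep_sea"


def parse_overrides(args, base=Config()):
    """Apply `parameter=value` tokens on top of `base` and return a new Config."""
    updates = {}
    for token in args:
        name, sep, raw = token.partition("=")
        if not sep:
            raise ValueError(f"expected parameter=value, got {token!r}")
        if not hasattr(base, name):
            raise ValueError(f"unknown parameter {name!r}")
        # Coerce the string to the type of the default (int, float, or str here).
        updates[name] = type(getattr(base, name))(raw)
    return replace(base, **updates)


# Equivalent to passing `seed=3 learning_rate=0.01` on the command line.
cfg = parse_overrides(["seed=3", "learning_rate=0.01"])
```

Coercing by the default's type keeps the command line free of quoting, but it would need special handling for boolean options, since `bool("False")` is `True` in Python.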