Epistemic AlphaZero utilizes uncertainty to explore and learn even when AlphaZero gets stuck.

emcts/e-alphazero

e-alphazero

This repository contains an implementation of Epistemic AlphaZero, a modification of AlphaZero that uses Epistemic Monte Carlo Tree Search (E-MCTS). We use JAX to make efficient use of GPU acceleration. Our framework is compatible with pgx environments, and we implement two new ones, DeepSea and Subleq (see src/envs/).

Subleq Results

See also emctx, a fork of mctx which supports epistemic uncertainty propagation as described in the E-MCTS paper.
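To give a feel for what propagating epistemic uncertainty through a search tree can look like, here is a minimal sketch in plain Python. It is not the emctx API and not necessarily the exact update used in this repository: the function names, the discount `gamma`, and the exploration weight `beta` are assumptions made for this example. It backs up a value estimate and its variance along a single root-to-leaf path, then scores the result optimistically by adding a multiple of the standard deviation.

```python
import math


def backup_path(rewards, reward_vars, leaf_value, leaf_var, gamma=0.99):
    """Back up a value and its epistemic variance from a leaf to the root.

    `rewards` and `reward_vars` hold the per-step reward means and variances
    along one path, ordered from root to leaf. All names are hypothetical.
    """
    value, var = leaf_value, leaf_var
    for r, r_var in zip(reversed(rewards), reversed(reward_vars)):
        value = r + gamma * value
        # Assuming independent terms: Var[r + gamma * V] = Var[r] + gamma^2 * Var[V].
        var = r_var + gamma ** 2 * var
    return value, var


def optimistic_score(value, var, beta=1.0):
    """Upper-confidence style score: mean plus beta standard deviations."""
    return value + beta * math.sqrt(var)


if __name__ == "__main__":
    v, s2 = backup_path([1.0, 0.0], [0.0, 0.5], leaf_value=2.0, leaf_var=1.0, gamma=0.5)
    print(v, s2, optimistic_score(v, s2, beta=2.0))
```

A larger `beta` makes the search favour branches whose value is still uncertain, which is the intuition behind exploring where a standard search would get stuck.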

Structure

  • The program in src/:
    • The entry is main.py.
    • Self-play (i.e. environment interaction) is in selfplay.py.
    • Replay buffer reanalyze is in reanalyze.py.
    • Evaluation (i.e. determining strength) is in evaluate.py.
    • Network training (i.e. policy and value improvement) is in train.py.
    • Config options are in config.py, and the context that is created from them is in context.py.
  • Custom environments are in envs/.
  • Network architectures and hashing algorithms (for uncertainty estimation) are in network/.
  • Scripts for submitting experiments and analysis are in scripts/.
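One common way to turn hashing into an uncertainty signal is visit counting: hash each state-action pair, count how often each hash has been seen, and treat rarely seen pairs as more uncertain. The sketch below illustrates that idea with the Python standard library only. Whether network/ uses this exact scheme is an assumption, and the class name, method names, and the 1 / (1 + count) formula are hypothetical.

```python
import hashlib
from collections import Counter


class HashCountUncertainty:
    """Count-based uncertainty over hashed (observation, action) pairs."""

    def __init__(self):
        self.counts = Counter()

    @staticmethod
    def _key(observation: bytes, action: int) -> str:
        # Hash the raw observation bytes together with the action index.
        payload = observation + action.to_bytes(4, "little", signed=True)
        return hashlib.sha256(payload).hexdigest()

    def update(self, observation: bytes, action: int) -> None:
        """Record one visit of the given pair."""
        self.counts[self._key(observation, action)] += 1

    def uncertainty(self, observation: bytes, action: int) -> float:
        """Return 1.0 for unseen pairs, shrinking towards 0 with more visits."""
        return 1.0 / (1 + self.counts[self._key(observation, action)])


if __name__ == "__main__":
    estimator = HashCountUncertainty()
    estimator.update(b"state-0", 1)
    print(estimator.uncertainty(b"state-0", 1), estimator.uncertainty(b"state-0", 2))
```

An exact hash such as SHA-256 only groups identical observations; schemes that should generalise across similar states typically use a coarser or locality-sensitive hash instead.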

Usage

  1. Install Python.
  2. Install pipenv with pip install --user pipenv.
  3. Run pipenv install in this directory to install the required dependencies.
  4. Run pipenv run python src/main.py, optionally followed by configuration overrides given as space-separated parameter=value pairs.
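The parameter=value convention in step 4 can be sketched as follows. This is only an illustration of how such overrides might be parsed onto a config object; it is not the code in config.py, and `ExampleConfig`, its fields, and `parse_overrides` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ExampleConfig:
    # Hypothetical options for illustration; see config.py for the real ones.
    seed: int = 0
    learning_rate: float = 1e-3
    env_id: str = "deep_sea"
    debug: bool = False


def parse_overrides(config, args):
    """Apply `parameter=value` strings to a dataclass config, keeping field types."""
    known = {f.name for f in fields(config)}
    updates = {}
    for arg in args:
        key, sep, raw = arg.partition("=")
        if not sep or key not in known:
            raise ValueError(f"expected a known parameter=value pair, got {arg!r}")
        current = getattr(config, key)
        if isinstance(current, bool):
            # bool("false") would be True, so booleans are parsed explicitly.
            updates[key] = raw.lower() in ("1", "true", "yes")
        else:
            # Convert the string using the type of the default value.
            updates[key] = type(current)(raw)
    return replace(config, **updates)


if __name__ == "__main__":
    print(parse_overrides(ExampleConfig(), ["seed=3", "learning_rate=0.01", "debug=true"]))
```

In a real entry point the list of strings would typically come from `sys.argv[1:]`; unknown parameter names raise an error here so that typos do not silently fall back to defaults.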
