-
Chess Environment & Encoding [done]
- Board stepping [done]
- Observation encoding [done]
- Action encoding [done]
-
Neural Network Model [done]
- Shared ResNet with policy and value heads [done]
-
Monte Carlo tree search [done]
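
For reference, the selection step of an AlphaZero-style tree search can be sketched as below. This is a minimal illustration and not necessarily what this project does: the `Node` class, `select_child`, and the `c_puct` value are hypothetical names and defaults, and the `+ 1` under the square root is one common implementation choice so that priors break ties before any child has been visited.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Search-tree node (hypothetical layout, for illustration only)."""
    prior: float                 # policy-head probability for the move leading here
    visit_count: int = 0
    value_sum: float = 0.0       # sum of backed-up values, from the parent's point of view
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited node.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Return the (move, child) pair with the highest PUCT score Q + U."""
    total_visits = sum(child.visit_count for child in node.children.values())

    def score(child: Node) -> float:
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visit_count)
        return child.q() + u

    return max(node.children.items(), key=lambda item: score(item[1]))
```

With no visits yet, the child with the largest prior is selected; once a child accumulates visits with poor values, the exploration bonus shifts the choice to its siblings.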
-
Self play loop [done]
- Play [done]
- Store data [done]
- Export data as PyTorch dataset-compatible format [done]
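
One possible shape for the store/export steps is sketched below, using only the standard library. The names `SelfPlayBuffer` and `add_game`, the tuple layout, and the +1/-1 side-to-move convention are assumptions made for illustration, not this project's actual format. Implementing `__len__` and `__getitem__` is what makes an object usable as a map-style dataset by a PyTorch `DataLoader`.

```python
from typing import List, Sequence, Tuple

# (observation, policy target, value target)
Sample = Tuple[Sequence[float], Sequence[float], float]


class SelfPlayBuffer:
    """Stores self-play positions and exposes them as a map-style dataset."""

    def __init__(self) -> None:
        self._samples: List[Sample] = []

    def add_game(self, positions, outcome: int) -> None:
        """Add one finished game.

        positions: iterable of (observation, policy, player), where player is
                   +1 for White to move and -1 for Black (assumed convention).
        outcome:   +1 White win, -1 Black win, 0 draw (assumed convention).
        """
        for observation, policy, player in positions:
            # Value target is the final result from the side to move's perspective.
            value = float(outcome * player)
            self._samples.append((observation, policy, value))

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, index: int) -> Sample:
        return self._samples[index]
```

In a real pipeline the observations and policies would typically be converted to tensors, either inside `__getitem__` or by the loader's collate function.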
-
Training pipeline [done]
-
Evaluation and Rating [TODO]
-
Optimization and Scaling [TODO]
Interesting Papers:
- Monte-Carlo tree search as regularized policy optimization https://arxiv.org/abs/2007.12509
  Improves MCTS for low simulation counts (Nsim)