Releases: enpasos/muzero
v0.7.0
The agent learns to play a perfect game in the tic-tac-toe integration test. Perfect means that every possible decision in the decision tree is correct and stable from epoch to epoch. It also means that the agent is not exploitable in any way. The result goes beyond non-exploitability, however: when several actions lead to the same reward under optimal play, the agent selects them with the same probability. This means that the agent does not specialize, but remains broadly positioned.
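The "same probability for equally rewarded actions" property can be made concrete with a small sketch. The code below is not taken from the repository or its integration test; it is a minimal illustration, assuming a plain negamax solver and the hypothetical names `OptimalPolicySketch`, `value`, and `uniformOptimalPolicy`. It computes the game-theoretic value of every legal move and spreads probability uniformly over the best ones, which is the kind of reference policy a learned policy could be compared against.

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, not from the muzero repository.
class OptimalPolicySketch {

    // Board cells: 0 = empty, 1 = X, -1 = O. Indices 0..8, row by row.
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    // Returns 1 if X has three in a row, -1 if O has, 0 otherwise.
    static int winner(int[] board) {
        for (int[] l : LINES) {
            int sum = board[l[0]] + board[l[1]] + board[l[2]];
            if (sum == 3) return 1;
            if (sum == -3) return -1;
        }
        return 0;
    }

    // Negamax value (+1 win, 0 draw, -1 loss) from the view of the player to move.
    static int value(int[] board, int player) {
        int w = winner(board);
        if (w != 0) return w * player;
        int best = Integer.MIN_VALUE;
        for (int i = 0; i < 9; i++) {
            if (board[i] != 0) continue;
            board[i] = player;
            best = Math.max(best, -value(board, -player));
            board[i] = 0;
        }
        return best == Integer.MIN_VALUE ? 0 : best; // full board without winner: draw
    }

    // Uniform distribution over all moves that achieve the optimal value.
    // Assumes a non-terminal position with at least one empty cell.
    static double[] uniformOptimalPolicy(int[] board, int player) {
        int[] q = new int[9];
        int best = Integer.MIN_VALUE;
        for (int i = 0; i < 9; i++) {
            if (board[i] != 0) continue;
            board[i] = player;
            q[i] = -value(board, -player);
            board[i] = 0;
            best = Math.max(best, q[i]);
        }
        int count = 0;
        for (int i = 0; i < 9; i++) {
            if (board[i] == 0 && q[i] == best) count++;
        }
        double[] policy = new double[9];
        for (int i = 0; i < 9; i++) {
            if (board[i] == 0 && q[i] == best) policy[i] = 1.0 / count;
        }
        return policy;
    }

    public static void main(String[] args) {
        // Empty board, X to move.
        System.out.println(Arrays.toString(uniformOptimalPolicy(new int[9], 1)));
    }
}
```

On the empty board every opening move leads to a draw under optimal play, so the reference policy assigns equal probability to all nine cells. In a position with a single winning move, all probability mass lands on that move. An agent that "does not specialize" in the sense above would match this distribution rather than concentrating on one of several equally good moves.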
This release uses the latest versions of the most important libraries: PyTorch 2.1.1, JDK 21, Spring Boot 3.2.0, and Gradle 8.5. DJL is used at version 0.26.0-SNAPSHOT.
The model is fully encapsulated and stable.
The code needs to be refactored and cleaned up.