MuZero General

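An open-source implementation of the MuZero algorithm, based on DeepMind's paper and pseudocode. The project provides an easy-to-understand, extensible MuZero framework.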

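Introduction

MuZero is a state-of-the-art reinforcement learning algorithm that masters Go, chess, shogi, and Atari games without being given the rules of the environment. It does so by learning a model of the environment's dynamics, which it uses to predict rewards, values, and policies.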
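
The idea of planning inside a learned model can be sketched in a few lines. The three functions below follow the structure described in the MuZero paper (representation, dynamics, prediction), but the arithmetic inside them is a made-up placeholder standing in for trained neural networks. None of these names or signatures are taken from this project's code.

```python
from typing import List, Tuple

State = List[float]  # hypothetical hidden-state type used only in this sketch


def representation(observation: List[float]) -> State:
    """Encode a raw observation into a hidden state (placeholder math)."""
    return [0.5 * x for x in observation]


def dynamics(state: State, action: int) -> Tuple[State, float]:
    """Predict the next hidden state and the immediate reward (placeholder math)."""
    next_state = [s + 0.1 * action for s in state]
    reward = sum(next_state) / len(next_state)
    return next_state, reward


def prediction(state: State, num_actions: int = 2) -> Tuple[List[float], float]:
    """Predict a policy (uniform here) and a value for a hidden state."""
    policy = [1.0 / num_actions] * num_actions
    value = sum(state)
    return policy, value


def unroll(observation: List[float], actions: List[int]):
    """Roll a sequence of actions forward inside the model only.

    The real environment is never queried after the first observation.
    """
    state = representation(observation)
    trajectory = []
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = prediction(state)
        trajectory.append((action, reward, policy, value))
    return trajectory
```

Calling `unroll([1.0, 2.0], [0, 1])` returns one `(action, reward, policy, value)` tuple per action, all computed from the model's own hidden states.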

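This project aims to provide a clear, well-commented implementation that researchers and developers can learn from or apply to new game environments.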

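Features

Core architecture: supports fully connected networks and residual networks (ResNet), implemented in PyTorch.
Distributed computing: uses Ray for multi-threaded, asynchronous execution, with multi-GPU support for training and self-play.
Monitoring: TensorBoard integration for real-time training monitoring.
Model management: checkpoints and model weights are saved automatically.
Extensibility: modular design that makes it easy to add new games (see the games/ directory).
Documentation: extensively commented code and accompanying documentation.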

Implemented Games

Cartpole
Lunar Lander
Gridworld
Tic-tac-toe
Connect4
Gomoku
Twenty-One / Blackjack
Atari Breakout

Installation

Make sure your Python version is >= 3.6.

git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general

# Install dependencies
pip install -r requirements.lock

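Quick Start

Launch MuZero:

python muzero.py

This opens the Gomoku menu, where you can choose:

Train: start self-play and train the model.
Load model: load a pretrained model.
Diagnose model: compare virtual trajectories against the real environment.
Play: play against the trained AI.

To train a specific game directly (for example Cartpole):

python muzero.py cartpole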
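
One plausible way for a game name such as `cartpole` to be resolved is a dynamic import of the matching module under games/. The sketch below assumes that each game module exposes a `Game` class next to its `MuZeroConfig` class. The `load_game` helper and the `Game` attribute are hypothetical and are not confirmed by this README.

```python
import importlib


def load_game(game_name: str, package: str = "games"):
    """Import <package>.<game_name> and return (Game class, config instance).

    Assumption: the module defines both `Game` and `MuZeroConfig`.
    """
    module = importlib.import_module(f"{package}.{game_name}")
    return module.Game, module.MuZeroConfig()


# Usage from the repository root (not executed here):
#     Game, config = load_game("cartpole")
```

With this layout, adding a new game only requires a new file in games/; the entry point does not need to know about it in advance.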

Monitoring Training Progress

Run the following in a new terminal window:

tensorboard --logdir ./results

Then open http://localhost:6006 in your browser.

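Configuration

Each game's hyperparameters are defined in the MuZeroConfig class of that game's file (under the games/ directory).

For example, the following settings can be changed in games/gomoku.py:

self.board_size: board size
self.num_simulations: number of MCTS simulations
self.training_steps: total number of training steps
self.lr_init: initial learning rate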
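
As a rough illustration, the four attributes above might appear in a config class like the one below. Only the attribute names come from this README. The values are invented for the example and are not the project's defaults, and the real class contains many more settings.

```python
class MuZeroConfig:
    """Hypothetical excerpt of a per-game configuration class."""

    def __init__(self):
        self.board_size = 11         # board side length (illustrative value)
        self.num_simulations = 50    # MCTS simulations per move (illustrative value)
        self.training_steps = 10000  # total training steps (illustrative value)
        self.lr_init = 0.003         # initial learning rate (illustrative value)


config = MuZeroConfig()
print(config.num_simulations, config.lr_init)
```

Raising `num_simulations` generally makes each move's search more thorough at the cost of slower self-play, so it is a common first knob to adjust.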

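Architecture Overview

The project is organized as follows:

MuZero (muzero.py): the program entry point, responsible for coordinating the workers.
Workers (Ray):
- SelfPlay: plays games against itself and stores the resulting game data in the Replay Buffer.
- Trainer: samples data from the Replay Buffer and trains the neural networks.
- SharedStorage: stores the latest model weights so that the workers can stay in sync.
- ReplayBuffer: stores past game experience.
Models (models.py): defines MuZero's three core networks (Representation, Dynamics, Prediction).
Games (games/): contains the implementation and configuration of each game.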
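
The data flow between the four workers can be sketched with plain Python objects. This is a deliberately simplified stand-in: the real workers run concurrently as Ray actors, whereas this sketch alternates them in a single loop, uses a single float as the "weights", and uses placeholder game data. All method names here are hypothetical.

```python
import random
from collections import deque


class SharedStorage:
    """Holds the latest weights so that every worker can read them."""

    def __init__(self):
        self.weights = 0.0  # a single float stands in for network weights

    def get_weights(self):
        return self.weights

    def set_weights(self, weights):
        self.weights = weights


class ReplayBuffer:
    """Keeps the most recent games, discarding the oldest when full."""

    def __init__(self, capacity=100):
        self.games = deque(maxlen=capacity)

    def save_game(self, game):
        self.games.append(game)

    def sample(self, rng):
        return rng.choice(list(self.games))


class SelfPlay:
    """Generates games with the current weights and stores them."""

    def __init__(self, storage, buffer, rng):
        self.storage, self.buffer, self.rng = storage, buffer, rng

    def play_game(self):
        weights = self.storage.get_weights()
        game = [weights + self.rng.random() for _ in range(5)]  # placeholder data
        self.buffer.save_game(game)


class Trainer:
    """Samples stored games and publishes updated weights."""

    def __init__(self, storage, buffer, rng):
        self.storage, self.buffer, self.rng = storage, buffer, rng
        self.step = 0

    def train_step(self):
        game = self.buffer.sample(self.rng)
        weights = self.storage.get_weights()
        target = sum(game) / len(game)
        # Placeholder update: nudge the weights toward the sampled game's mean.
        self.storage.set_weights(weights + 0.1 * (target - weights))
        self.step += 1


def run(num_iterations=10, seed=0):
    rng = random.Random(seed)
    storage, buffer = SharedStorage(), ReplayBuffer()
    self_play = SelfPlay(storage, buffer, rng)
    trainer = Trainer(storage, buffer, rng)
    for _ in range(num_iterations):
        self_play.play_game()  # SelfPlay -> ReplayBuffer
        trainer.train_step()   # ReplayBuffer -> Trainer -> SharedStorage
    return trainer.step, len(buffer.games), storage.get_weights()
```

The key point is the loop of dependencies: SelfPlay reads weights from SharedStorage and writes games to ReplayBuffer, while Trainer reads games from ReplayBuffer and writes weights back to SharedStorage.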

For a detailed architecture analysis, see architecture_analysis.md in this directory.

Authors

Werner Duvaud
Aurèle Hainaut
Paul Lenoir
Contributors

If you use this codebase in your research, please cite:

@misc{muzero-general,
  author       = {Werner Duvaud and Aurèle Hainaut},
  title        = {MuZero General: Open Reimplementation of MuZero},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}
