Status: Archive (code is provided as-is, no updates expected)
This is the code for implementing the MADDPG algorithm presented in the paper: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. It is configured to be run in conjunction with environments from the Multi-Agent Particle Environments (MPE). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.
Update: the original implementation for policy ensemble and policy estimation can be found here. The code is provided as-is.
- To install, `cd` into the root directory and type `pip install -e .`

- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)
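For example, from the repository root (the directory name `maddpg` below is only illustrative; use wherever you cloned the code):

```
cd maddpg
pip install -e .
```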
We demonstrate here how the code can be used in conjunction with the Multi-Agent Particle Environments (MPE).
- Download and install the MPE code here by following the `README`.

- Ensure that `multiagent-particle-envs` has been added to your `PYTHONPATH` (e.g. in `~/.bashrc` or `~/.bash_profile`).

- To run the code, `cd` into the `experiments` directory and run `train.py`:

  `python train.py --scenario simple`

- You can replace `simple` with any environment in the MPE you'd like to run.
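For reference, here is a minimal sketch of roughly how `experiments/train.py` constructs an MPE environment; the `multiagent` API shown follows the MPE `README` and may differ between versions:

```python
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

# Load an MPE scenario module and build the world it describes.
scenario = scenarios.load("simple.py").Scenario()
world = scenario.make_world()

# Wrap the world in a multi-agent Gym-style environment, passing the
# scenario's callbacks for resets, rewards, and observations.
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)

obs_n = env.reset()  # one observation per agent
```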
`train.py` accepts the following command-line options.

Environment options:

- `--scenario`: defines which environment in the MPE is to be used (default: `"simple"`)
- `--max-episode-len`: maximum length of each episode for the environment (default: `25`)
- `--num-episodes`: total number of training episodes (default: `60000`)
- `--num-adversaries`: number of adversaries in the environment (default: `0`)
- `--good-policy`: algorithm used for the 'good' (non-adversary) policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})
- `--adv-policy`: algorithm used for the adversary policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})
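For example, to train on the MPE predator-prey scenario `simple_tag` with three adversaries, using MADDPG for the good agents and DDPG for the adversaries (scenario names come from the MPE repo):

```
python train.py --scenario simple_tag --num-adversaries 3 --good-policy maddpg --adv-policy ddpg
```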
Core training parameters:

- `--lr`: learning rate (default: `1e-2`)
- `--gamma`: discount factor (default: `0.95`)
- `--batch-size`: batch size (default: `1024`)
- `--num-units`: number of units in the MLP (default: `64`)
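As a rough illustration of what `--num-units` controls, the actor and critic networks are small fully connected MLPs along the lines of the sketch below (simplified from the model defined in `experiments/train.py`, which uses the TensorFlow 1.x `tf.contrib.layers` API):

```python
import tensorflow as tf
import tensorflow.contrib.layers as layers

def mlp_model(input, num_outputs, scope, reuse=False, num_units=64):
    # Two hidden layers of `num_units` ReLU units, then a linear output layer.
    with tf.variable_scope(scope, reuse=reuse):
        out = input
        out = layers.fully_connected(out, num_outputs=num_units, activation_fn=tf.nn.relu)
        out = layers.fully_connected(out, num_outputs=num_units, activation_fn=tf.nn.relu)
        out = layers.fully_connected(out, num_outputs=num_outputs, activation_fn=None)
        return out
```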
Checkpointing:

- `--exp-name`: name of the experiment, used as the file name to save all results (default: `None`)
- `--save-dir`: directory where intermediate training results and model will be saved (default: `"/tmp/policy/"`)
- `--save-rate`: model is saved every time this number of episodes has been completed (default: `1000`)
- `--load-dir`: directory where training state and model are loaded from (default: `""`)
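For example, to checkpoint a run under an experiment name (the paths below are illustrative):

```
python train.py --scenario simple --exp-name simple_run --save-dir ./policy/simple_run/ --save-rate 1000
```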
Evaluation:

- `--restore`: restores the previous training state stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), and continues training (default: `False`)
- `--display`: displays to the screen the trained policy stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), but does not continue training (default: `False`)
- `--benchmark`: runs benchmarking evaluations on a saved policy, saving results to the `benchmark-dir` folder (default: `False`)
- `--benchmark-iters`: number of iterations to run benchmarking for (default: `100000`)
- `--benchmark-dir`: directory where benchmarking data is saved (default: `"./benchmark_files/"`)
- `--plots-dir`: directory where training curves are saved (default: `"./learning_curves/"`)
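For example, to visualize or benchmark the policy saved by the illustrative run above:

```
python train.py --scenario simple --load-dir ./policy/simple_run/ --display
python train.py --scenario simple --load-dir ./policy/simple_run/ --benchmark
```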
Code structure:

- `./experiments/train.py`: contains code for training MADDPG on the MPE
- `./maddpg/trainer/maddpg.py`: core code for the MADDPG algorithm
- `./maddpg/trainer/replay_buffer.py`: replay buffer code for MADDPG
- `./maddpg/common/distributions.py`: useful distributions used in `maddpg.py`
- `./maddpg/common/tf_util.py`: useful tensorflow functions used in `maddpg.py`
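At a high level, `experiments/train.py` alternates between stepping the environment with each agent's current policy and updating every agent's trainer from its replay buffer. The sketch below is a simplified reconstruction of that loop (method names follow `MADDPGAgentTrainer` in `maddpg/trainer/maddpg.py`; bookkeeping such as reward logging and model saving is omitted):

```python
def run_training_loop(env, trainers, max_steps, max_episode_len=25):
    """Simplified sketch of the interaction/update loop in train.py.

    `trainers` is a list of per-agent trainer objects exposing action(),
    experience(), preupdate(), and update().
    """
    obs_n = env.reset()
    episode_step = 0
    for step in range(max_steps):
        # Each agent acts from its own (decentralized) policy.
        action_n = [agent.action(obs) for agent, obs in zip(trainers, obs_n)]
        new_obs_n, rew_n, done_n, info_n = env.step(action_n)
        episode_step += 1
        terminal = episode_step >= max_episode_len

        # Every agent stores the joint transition in its replay buffer.
        for i, agent in enumerate(trainers):
            agent.experience(obs_n[i], action_n[i], rew_n[i], new_obs_n[i], done_n[i], terminal)
        obs_n = new_obs_n

        if all(done_n) or terminal:
            obs_n = env.reset()
            episode_step = 0

        # Centralized training, decentralized execution: each agent's update
        # can draw on all agents' replay buffers for its centralized critic.
        for agent in trainers:
            agent.preupdate()
        for agent in trainers:
            agent.update(trainers, step)
```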
If you used this code for your experiments or found it helpful, consider citing the following paper:
@article{lowe2017multi,
title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},
author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},
journal={Neural Information Processing Systems (NIPS)},
year={2017}
}