Status: Archive (code is provided as-is, no updates expected)
This is the code for implementing the MADDPG algorithm presented in the paper: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. It is configured to be run in conjunction with environments from the Multi-Agent Particle Environments (MPE). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.
Update: the original implementation for policy ensemble and policy estimation can be found here. The code is provided as-is.
- To install, `cd` into the root directory and type `pip install -e .`

- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)
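For example, from the repository root (the directory name `maddpg` below is only illustrative; use wherever you cloned the code):

```
cd maddpg
pip install -e .
```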
We demonstrate here how the code can be used in conjunction with the Multi-Agent Particle Environments (MPE).
- Download and install the MPE code here by following the `README`.

- Ensure that `multiagent-particle-envs` has been added to your `PYTHONPATH` (e.g. in `~/.bashrc` or `~/.bash_profile`).

- To run the code, `cd` into the `experiments` directory and run `train.py`:

  `python train.py --scenario simple`

- You can replace `simple` with any environment in the MPE you'd like to run.
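For reference, here is a minimal sketch of roughly how `experiments/train.py` constructs an MPE environment; the `multiagent` API shown follows the MPE `README` and may differ between versions:

```python
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

# Load an MPE scenario module and build the world it describes.
scenario = scenarios.load("simple.py").Scenario()
world = scenario.make_world()

# Wrap the world in a multi-agent Gym-style environment, passing the
# scenario's callbacks for resets, rewards, and observations.
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)

obs_n = env.reset()  # one observation per agent
```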
`train.py` accepts the following command-line options.

Environment options:

- `--scenario`: defines which environment in the MPE is to be used (default: `"simple"`)
- `--max-episode-len`: maximum length of each episode for the environment (default: `25`)
- `--num-episodes`: total number of training episodes (default: `60000`)
- `--num-adversaries`: number of adversaries in the environment (default: `0`)
- `--good-policy`: algorithm used for the 'good' (non-adversary) policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})
- `--adv-policy`: algorithm used for the adversary policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})
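For example, to train on the MPE predator-prey scenario `simple_tag` with three adversaries, using MADDPG for the good agents and DDPG for the adversaries (scenario names come from the MPE repo):

```
python train.py --scenario simple_tag --num-adversaries 3 --good-policy maddpg --adv-policy ddpg
```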
Core training parameters:

- `--lr`: learning rate (default: `1e-2`)
- `--gamma`: discount factor (default: `0.95`)
- `--batch-size`: batch size (default: `1024`)
- `--num-units`: number of units in the MLP (default: `64`)
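As a rough illustration of what `--num-units` controls, the actor and critic networks are small fully connected MLPs along the lines of the sketch below (simplified from the model defined in `experiments/train.py`, which uses the TensorFlow 1.x `tf.contrib.layers` API):

```python
import tensorflow as tf
import tensorflow.contrib.layers as layers

def mlp_model(input, num_outputs, scope, reuse=False, num_units=64):
    # Two hidden layers of `num_units` ReLU units, then a linear output layer.
    with tf.variable_scope(scope, reuse=reuse):
        out = input
        out = layers.fully_connected(out, num_outputs=num_units, activation_fn=tf.nn.relu)
        out = layers.fully_connected(out, num_outputs=num_units, activation_fn=tf.nn.relu)
        out = layers.fully_connected(out, num_outputs=num_outputs, activation_fn=None)
        return out
```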
Checkpointing:

- `--exp-name`: name of the experiment, used as the file name to save all results (default: `None`)
- `--save-dir`: directory where intermediate training results and model will be saved (default: `"/tmp/policy/"`)
- `--save-rate`: model is saved every time this number of episodes has been completed (default: `1000`)
- `--load-dir`: directory where training state and model are loaded from (default: `""`)
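For example, to checkpoint a run under an experiment name (the paths below are illustrative):

```
python train.py --scenario simple --exp-name simple_run --save-dir ./policy/simple_run/ --save-rate 1000
```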
Evaluation:

- `--restore`: restores the previous training state stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), and continues training (default: `False`)
- `--display`: displays to the screen the trained policy stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), but does not continue training (default: `False`)
- `--benchmark`: runs benchmarking evaluations on a saved policy, saving results to the `benchmark-dir` folder (default: `False`)
- `--benchmark-iters`: number of iterations to run benchmarking for (default: `100000`)
- `--benchmark-dir`: directory where benchmarking data is saved (default: `"./benchmark_files/"`)
- `--plots-dir`: directory where training curves are saved (default: `"./learning_curves/"`)
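For example, to visualize or benchmark the policy saved by the illustrative run above:

```
python train.py --scenario simple --load-dir ./policy/simple_run/ --display
python train.py --scenario simple --load-dir ./policy/simple_run/ --benchmark
```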
Code structure:

- `./experiments/train.py`: contains code for training MADDPG on the MPE
- `./maddpg/trainer/maddpg.py`: core code for the MADDPG algorithm
- `./maddpg/trainer/replay_buffer.py`: replay buffer code for MADDPG
- `./maddpg/common/distributions.py`: useful distributions used in `maddpg.py`
- `./maddpg/common/tf_util.py`: useful tensorflow functions used in `maddpg.py`
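At a high level, `experiments/train.py` alternates between stepping the environment with each agent's current policy and updating every agent's trainer from its replay buffer. The sketch below is a simplified reconstruction of that loop (method names follow `MADDPGAgentTrainer` in `maddpg/trainer/maddpg.py`; bookkeeping such as reward logging and model saving is omitted):

```python
def run_training_loop(env, trainers, max_steps, max_episode_len=25):
    """Simplified sketch of the interaction/update loop in train.py.

    `trainers` is a list of per-agent trainer objects exposing action(),
    experience(), preupdate(), and update().
    """
    obs_n = env.reset()
    episode_step = 0
    for step in range(max_steps):
        # Each agent acts from its own (decentralized) policy.
        action_n = [agent.action(obs) for agent, obs in zip(trainers, obs_n)]
        new_obs_n, rew_n, done_n, info_n = env.step(action_n)
        episode_step += 1
        terminal = episode_step >= max_episode_len

        # Every agent stores the joint transition in its replay buffer.
        for i, agent in enumerate(trainers):
            agent.experience(obs_n[i], action_n[i], rew_n[i], new_obs_n[i], done_n[i], terminal)
        obs_n = new_obs_n

        if all(done_n) or terminal:
            obs_n = env.reset()
            episode_step = 0

        # Centralized training, decentralized execution: each agent's update
        # can draw on all agents' replay buffers for its centralized critic.
        for agent in trainers:
            agent.preupdate()
        for agent in trainers:
            agent.update(trainers, step)
```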
If you used this code for your experiments or found it helpful, consider citing the following paper:
@article{lowe2017multi,
title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},
author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},
journal={Neural Information Processing Systems (NIPS)},
year={2017}
}