This repository contains the implementation of our NeurIPS 2021 paper Distributional Reinforcement Learning for Multi-Dimensional Reward Functions.
The three maze environments are implemented in dopamine/environment/maze.py,
and the MD3QN agent is implemented in dopamine/agents/mmdqn_nd/mmdqn_nd_agent.py.
Run
conda env create -f environment.yml
from the root of this repository to install all dependencies for this project.
Run python ./script/batch_general.py to launch the experiments for both the policy evaluation setting and the control setting.
batch_general.py provides an interactive prompt for entering the configurations, and then runs the experiments with those configurations. It supports three modes:
- python ./script/batch_general.py runs all experiments with the entered configurations.
- python ./script/batch_general.py export exports the commands for all experiments (without running them) into ./scripts-export.
- python ./script/batch_general.py test prints the commands for the experiments (without running them).
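If you use the export mode, the exported commands can later be launched with a small helper like the one below. This is a minimal sketch, assuming the export mode writes one runnable shell script per experiment into ./scripts-export; the helper itself is not part of this repository.

```python
# Hypothetical helper: run every command file exported by
#   python ./script/batch_general.py export
# Assumption: ./scripts-export contains one shell script per experiment.
import glob
import subprocess

for script in sorted(glob.glob("./scripts-export/*")):
    print(f"Running {script}")
    subprocess.run(["bash", script], check=True)
```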
To reproduce Figure 2 in our paper (the modeled joint distribution in the policy evaluation setting on Maze),
run batch_general.py with the following configurations:
- clip_reward: False
- icml_setting: False
- game_name_file: maze.txt
- alg_name: MMDQNND
- seed: 0
- network_type: v21
- bandwidth_type: v3
- kscale_type: v11
- use_priority: False
- evaluation_setting: True
- eval_policy_path: use default value (press Enter)
- iterations: 20
- exp_name: use default value (press Enter)
- resume: False
- gpu_func: use default value (press Enter)
The results will be saved in dopamine_runs/MultiRewardMaze-v0_MMDQNND_seed-0_network-v21_bw-v3_kscale-v11_eval/evaluation_plots/iter-{iteration}/episode-{episode}.pickle
You can then use the script plot-maze.py to reproduce the plots in Figure 2.
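For a quick look at a single saved result, something like the following can be used. This is a minimal sketch, not the plotting code used for the paper: it assumes each episode-{episode}.pickle stores an array-like of sampled joint returns with one column per reward dimension, and the iter-19/episode-0 path components are example values only; the exact pickle contents depend on the MD3QN evaluation code.

```python
# Minimal sketch: inspect one saved evaluation result from the Maze run above.
# Assumption: the pickle holds joint-return samples of shape (num_samples, num_reward_dims).
import pickle

import matplotlib.pyplot as plt
import numpy as np

path = ("dopamine_runs/MultiRewardMaze-v0_MMDQNND_seed-0_network-v21_"
        "bw-v3_kscale-v11_eval/evaluation_plots/iter-19/episode-0.pickle")
with open(path, "rb") as f:
    samples = np.asarray(pickle.load(f))

# Scatter the first two reward dimensions of the modeled joint distribution.
plt.scatter(samples[:, 0], samples[:, 1], s=5)
plt.xlabel("return dimension 0")
plt.ylabel("return dimension 1")
plt.savefig("joint_distribution_example.png")
```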
To generate the data needed for Figure 3 (the policy optimization experiments on Atari games), run python ./script/batch_general.py with the following configurations:
- clip_reward: True
- icml_setting: True
- game_name_file: default value (press Enter)
- alg_name: DQN or HRA or MMDQN or MMDQNND
- seed: 0 or 1 or 2
- if running MD3QN:
  - network_type: v21
  - bandwidth_type: v3
  - kscale_type: v11
  - use_priority: False
- evaluation_setting: False
- iterations: 200
- exp_name: use default value (press Enter)
- resume: False
- gpu_func: use default value (press Enter)
The results will be saved in dopamine_runs/{game_name}_MMDQNND_icml_seed-{seed}_network-v21_bw-v3_kscale-v11 for MD3QN,
and in dopamine_runs/{game_name}_HRA_icml_seed-{seed} for HRA.
We provide plot-atari.py to generate Figure 3 in our paper: copy all the results for HRA, MMDQN, and MD3QN from dopamine_runs/ into the data/ folder, then run python plot-atari.py. A sketch of this step is shown below.
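The copy-then-plot step could be scripted roughly as follows. This is a minimal sketch under the assumptions that the result directories in dopamine_runs/ follow the naming patterns above and that plot-atari.py reads everything it needs from data/; adjust the filter if your run names differ.

```python
# Minimal sketch: collect finished HRA / MMDQN / MD3QN (MMDQNND) runs into data/
# and then generate Figure 3 with plot-atari.py.
import shutil
import subprocess
from pathlib import Path

src_root = Path("dopamine_runs")
dst_root = Path("data")
dst_root.mkdir(exist_ok=True)

# Copy every matching run directory into data/.
for run_dir in src_root.iterdir():
    if run_dir.is_dir() and any(tag in run_dir.name for tag in ("HRA", "MMDQN", "MMDQNND")):
        shutil.copytree(run_dir, dst_root / run_dir.name, dirs_exist_ok=True)

# Generate Figure 3 from the copied results.
subprocess.run(["python", "plot-atari.py"], check=True)
```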
To reproduce the experiments for Appendix A.3.3 (RL with multiple constraints), run python ./script/batch_general.py with the following configurations:
- clip_reward: False
- icml_setting: False
- game_name_file: maze-v3.txt
- alg_name: MMDQNND
- seed: 0
- network_type: v21
- bandwidth_type: v3
- kscale_type: v11
- use_priority: False
- evaluation_setting: False
- eval_policy_path: use default value (press Enter)
- iterations: 20
- exp_name: use default value (press Enter)
- resume: False
- gpu_func: use default value (press Enter)
- use_marginal: False for the first run (using MD3QN's joint distribution)
- use_marginal: True for the second run (using MD3QN's marginal distribution information)