Implementation of our NeurIPS 2023 paper "Distributional Pareto-Optimal Multi-Objective Reinforcement Learning".
To install the environment (except for ReacherBullet), run:
conda create -n dpmorl python=3.8
conda activate dpmorl
pip install -r requirements.txt
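As a quick sanity check that the core dependencies were installed (this assumes requirements.txt includes NumPy, PyTorch, and Gym; see that file for the authoritative list), you can try:

# Quick import check; the exact package list is defined by requirements.txt.
import numpy as np
import torch
import gym

print(np.__version__, torch.__version__, gym.__version__)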
Before training policies, DPMORL first requires generating utility functions. To generate them, run:
python main_generate_utility.py
The generated utility functions are saved in utility-model/dim-2, and the visualizations are saved in utility-plot/dim-2.
You can run
python main_generate_utility.py --reward_shape 3 --num_utility_function 100
to configure the reward dimension and the number of generated utility functions.
We have provided part of our generated utility functions in utility-model-selected and utility-plot-selected.
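Conceptually, each generated utility function is a nondecreasing mapping from a vector return to a scalar. The repository's own parameterization and file format may differ; the hypothetical MonotoneUtility sketch below only illustrates one common way to obtain such a monotone function, by constraining the network weights to be nonnegative:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneUtility(nn.Module):
    """Hypothetical example: a small MLP that is nondecreasing in every input
    dimension because its effective weights are constrained to be nonnegative."""

    def __init__(self, reward_dim=2, hidden=64):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, reward_dim))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden))
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, returns):
        # Softplus keeps the weights nonnegative, so increasing any return
        # component can never decrease the utility value.
        h = torch.tanh(F.linear(returns, F.softplus(self.w1), self.b1))
        return F.linear(h, F.softplus(self.w2), self.b2)

utility = MonotoneUtility(reward_dim=2)
print(utility(torch.tensor([[1.0, 2.0]])))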
To train policies with DPMORL as in the paper, run this command:
python -u main_policy.py --lamda=0.1 --env [env] --reward_two_dim --exp_name [exp_name]
The available environment names are listed in env.txt.
The --reward_two_dim flag makes DPMORL run on the first two dimensions of the reward function. To run DPMORL on other reward dimensions (e.g., dimensions 0, 1, and 2), replace --reward_two_dim with --reward_dim_indices=[0,1,2].
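As an illustration of what selecting reward dimensions means (a conceptual sketch, not the repository's actual code path), the indices simply pick which components of the environment's vector reward DPMORL trains on:

import numpy as np

# Hypothetical illustration of selecting reward dimensions.
vector_reward = np.array([0.3, -1.2, 0.7, 0.0])  # a 4-dimensional reward
reward_dim_indices = [0, 1, 2]                   # matches --reward_dim_indices=[0,1,2]
selected = vector_reward[reward_dim_indices]     # the reward seen by DPMORL
print(selected)  # [ 0.3 -1.2  0.7]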
You can also run . run_policy_parallel.sh to train DPMORL on all environments in parallel.
After training finishes, evaluate the policies learned by DPMORL by running . run_test.sh.
To visualize the return distributions of policies learned by DPMORL, run python plot_utility_returns.py [exp_name]. The results are saved in the experiments/[exp_name] directory; test_final_*.png visualizes the return distributions of all policies learned by DPMORL.
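If you want to produce a custom plot of the return distributions yourself, independently of plot_utility_returns.py, a minimal matplotlib sketch could look like the following; the array shapes and data here are placeholders, not the repository's actual evaluation output format:

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: for each learned policy, sampled episodic 2-dimensional
# returns stored as an array of shape (num_episodes, 2).
returns_per_policy = [np.random.randn(200, 2) + i for i in range(3)]

for i, returns in enumerate(returns_per_policy):
    plt.scatter(returns[:, 0], returns[:, 1], s=5, alpha=0.4, label=f"policy {i}")

plt.xlabel("return of objective 0")
plt.ylabel("return of objective 1")
plt.legend()
plt.savefig("return_distributions.png")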
Run stats.py to compute all the evaluation metrics for DPMORL and the baseline methods. The implementation includes the constraint satisfaction and variance objectives.
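For reference, two of the quantities mentioned above can be estimated from sampled vector returns roughly as follows; the threshold and data layout are placeholders rather than the values used in stats.py:

import numpy as np

# Placeholder sampled 2-dimensional returns for one policy, shape (episodes, 2).
returns = np.random.randn(1000, 2)

# Variance objective: per-objective variance of the return distribution.
variance_per_objective = returns.var(axis=0)

# Constraint satisfaction: fraction of episodes whose return on objective 0
# stays above a placeholder threshold.
threshold = 0.0
constraint_satisfaction_rate = (returns[:, 0] >= threshold).mean()

print(variance_per_objective, constraint_satisfaction_rate)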