
Trustworthy Human-AI Collaboration ✨: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving

Webpage | Code | Video | Paper

Demo video: PE-RLHF.mp4

🌟 Highlights

  • 2024-09-16 Code is now released!
  • 2024-09-12 Explore our project page, now live here🔗!
  • 2024-09-01 Our paper is available on arXiv📄!

1. Requirements 📦

For an optimal experience, we recommend using conda to set up a new environment for PE-RLHF.

# Clone the code to local
git clone https://github.com/zilin-huang/PE-RLHF.git
cd PE-RLHF

# Create virtual environment
conda create -n PE-RLHF python=3.7
conda activate PE-RLHF

# Install basic dependencies
pip install -e .

conda install cudatoolkit=11.0
conda install -c nvidia cudnn
# Now you can run the PE-RLHF training scripts in the MetaDrive environment.
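
For a quick sanity check before launching any training script, you can confirm that the core dependencies resolve. This is a minimal sketch assuming the editable install above pulls in metadrive and ray; adjust it if your dependency list differs.

# sanity_check.py: minimal sketch; metadrive and ray are assumed to be installed by `pip install -e .`
import metadrive
import ray

print("metadrive loaded from:", metadrive.__file__)
print("ray loaded from:", ray.__file__)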

2. Quick Start 🚀

Since the main experiment of PE-RLHF takes about one hour and requires a steering wheel (Logitech G920), we also provide an easy task so users can quickly experience PE-RLHF.

# use PE-RLHF environment
conda activate PE-RLHF 
cd pe_rlhf/run_main_exp/
python train_pe_rlhf_keyboard_easy.py --exp-name PE_RLHF --local-dir PATH_TO_SAVE_DIR --num-gpus=[your_gpu_num]

For example:

python train_pe_rlhf_keyboard_easy.py --exp-name PE_RLHF --local-dir /home/sky-lab/codes/PE-RLHF/pe_rlhf/run_main_exp --num-gpus=1

Now, the human is authorized to take over the vehicle by pressing W/A/S/D and to guide or safeguard the agent to the destination (press E to pause the simulation). Since there is only one map in this task, 10 minutes or 5,000 transitions is enough for the agent to learn a policy.

📝 Note: To make it easier to get started and test our code, we have uploaded a pretrained value estimator, sac_expert.npz, in the utils/saved_expert folder. Meanwhile, we set "warmup_ts": 0, so the info interface will show warmup: False, i.e., no warm-up is needed.
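
For reference, a quick way to peek at the bundled checkpoint with plain numpy, assuming you run from the repository root (no project code is needed):

import numpy as np

# Path assumes the repository root as the working directory
expert = np.load("pe_rlhf/utils/saved_expert/sac_expert.npz")
# List the arrays stored in the pretrained value estimator
print(expert.files)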

3. Main Experiment 💡

To reproduce the main experiment reported in the paper, run the following script:

python train_pe_rlhf.py --exp-name PE_RLHF --local-dir PATH_TO_SAVE_DIR --num-gpus=[your_gpu_num]

If a steering wheel is not available, set controller="keyboard" in the script to train the agent with keyboard input; a sketch of the setting is shown below. After launching this script, about one hour is required for the human to assist the PE-RLHF agent in learning a generalizable driving policy by training on 50 different maps.
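
For reference, here is a minimal, illustrative sketch of that setting as an environment-config entry; the actual config dict in train_pe_rlhf.py contains many more keys, so treat this as a sketch rather than the full configuration.

# Only the key being changed is shown; the real config has many more entries
env_config = {
    "controller": "keyboard",  # drive with W/A/S/D instead of the Logitech G920 wheel
}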

📊 Visualizing Results: The training results will be saved in the run_main_exp/PE_RLHF folder. To evaluate PE-RLHF, run run_main_exp/eval_pe_rlhf.py.

4. Training Baselines 🖥️

4.1 Physics-based Methods

The physics-based methods (e.g., the IDM-MOBIL model) do not require training a model. To evaluate their performance, you can directly run:

# use previous PE-RLHF environment
conda activate PE-RLHF  
cd pe_rlhf/run_baselines
# eval physics-based experiment
python eval_IDM_MOBIL.py

📊 Visualizing Results: The testing results for the physics-based methods will be saved in the idm_mobil_results.csv file, which can be inspected as sketched below.
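
For a quick look at that file, a minimal sketch using only the standard library (the column names depend on what eval_IDM_MOBIL.py writes, so they are not assumed here):

import csv

# Print how many evaluation rows were written and which columns are present
with open("idm_mobil_results.csv") as f:
    rows = list(csv.DictReader(f))
print(len(rows), "rows; columns:", list(rows[0].keys()) if rows else "none")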

4.2 RL and Safe RL Methods

For SAC/PPO/PPO-Lag/SAC-Lag, no additional setup is required to run the training scripts.

# use previous PE-RLHF environment
conda activate PE-RLHF  
cd pe_rlhf/run_baselines
# launch RL and safe RL experiment
python train_[ppo/sac/sac_lag/ppo_lag].py --num-gpus=[your_gpu_num]

For example:

python train_ppo.py --num-gpus=1

📝 Note: The results reported in our paper for RL and safe RL methods were averaged over five runs with different random seeds. To save compute, you can change num_seeds=5 to num_seeds=1 in train_[ppo/sac/sac_lag/ppo_lag].py, as illustrated below.
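
A minimal illustration of that one-line change (the exact location of the variable inside each training script may differ):

num_seeds = 1  # reduced from 5; the paper results average over five random seeds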

📊 Visualizing Results: The training and testing results for RL and safe RL methods will be saved in the run_baselines/[ppo/sac/sac_lag/ppo_lag] folder. You can view them with TensorBoard.

For example:

tensorboard --logdir=. --port=8080

4.3 Human Demonstration Dataset

A human demonstration dataset is required to run the offline RL, IL, and offline RLHF methods. You can collect human demonstrations by running:

# use previous PE-RLHF environment
conda activate PE-RLHF 
cd pe_rlhf/utils
# launch human demonstration data collection experiment
python collect_human_data_set.py

📝 Note: For your convenience, we also provide a high-quality demonstration dataset collected by a human expert on the PE-RLHF team. It contains approximately 49,000 transitions in the training environment and achieves a 100% success rate, an episodic return of 388.16, and a low safety violation rate of 0.03. You can download it directly here.

Next, move the dataset, for example human_traj_100_new.json, from the Downloads directory to the pe_rlhf directory (which is at the same level as the algo, run_baselines, run_main_exp, and utils folders).

For example, you can use the following command:

mv ~/Downloads/human_traj_100_new.json /home/sky-lab/codes/PE-RLHF/pe_rlhf/
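
Once the file is in place, a quick, hedged way to check that it loads (the internal JSON layout is not assumed; only the overall size is printed):

import json

# Rough size check of the demonstration file; adjust the path if you saved it elsewhere
with open("pe_rlhf/human_traj_100_new.json") as f:
    data = json.load(f)
print(type(data).__name__, "with", len(data), "top-level entries")
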
4.4 IL Methods

If you wish to run BC/CQL, extra setup is required as follows:

# ray needs to be updated to 1.2.0
pip install ray==1.2.0
cd pe_rlhf/run_baselines
# launch BC/CQL experiment
python train_[bc/cql].py --num-gpus=0 # do not use gpu

⚙️ Issue: BC/CQL will encounter the following error:

File "/home/zilin/anaconda3/envs/PE-RLHF/lib/python3.7/site-packages/ray/rllib/utils/torch_ops.py", line 105, in mapping
    tensor = torch.from_numpy(np.asarray(item))
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

💡 Solution: Modify lines 104-105 of the original code, i.e., tensor = torch.from_numpy(np.asarray(item)). The new code is as follows:

else:
    # tensor = torch.from_numpy(np.asarray(item))   # original line 105
    if isinstance(item, bool):
        item = int(item)  # convert Python bools to ints first
    # cast object arrays to float64 so torch.from_numpy accepts them
    tensor = torch.from_numpy(np.asarray(item).astype(float))

📝 Note: Since the BC and CQL computation runs on the CPU, it requires a relatively large amount of CPU memory. To reduce resource usage, you can change num_seeds=5 to num_seeds=1 in train_[bc/cql].py. We also set bc_iters=tune.grid_search([5_0000, 10_0000]) in train_cql.py; changing it to bc_iters=5_0000 will also save computational resources (see the sketch below).
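
A minimal illustration of both settings (the exact location of these variables inside the scripts may differ):

num_seeds = 1      # reduced from 5 to save CPU memory and time
bc_iters = 5_0000  # 50,000 iterations; replaces the tune.grid_search([5_0000, 10_0000]) sweep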

4.5 GAIL and Offline RLHF Methods

To run GAIL/HG-DAgger/IWR, please create a new conda environment and install the GPU version of torch:

# Create virtual environment
conda create -n PE-RLHF-torch python=3.7
conda activate PE-RLHF-torch
# Install basic dependencies
pip install -e .
# install torch
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
conda install cudatoolkit=11.0

Now, GAIL can be trained by:

cd pe_rlhf/run_baselines 
python train_gail.py

📝 Note: You can set the maximum number of training cycles in train_gail.py by modifying parser.add_argument('--max_epoch', default=3000).

IWR/HG-Dagger can be trained by:

cd pe_rlhf/run_baselines 
python train_[IWR/hg_dagger].py

📝 Note: IWR and HG-Dagger first run through a render-less warm-up period; then the MetaDrive render window pops up with the human-in-the-loop capability activated to allow human takeover. You can set the number of training rounds by modifying NUM_ITS = 5 in train_[IWR/hg_dagger].py. You can also adjust BC_WARMUP_DATA_USAGE = 30000 in the same file to set a different number of warm-up transitions (both knobs are shown below).
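
For reference, the two knobs mentioned above as they might appear in train_[IWR/hg_dagger].py (values are the ones quoted in this README; their exact location in the file may differ):

NUM_ITS = 5                   # number of human-in-the-loop training rounds
BC_WARMUP_DATA_USAGE = 30000  # transitions used for the render-less BC warm-up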

📊 Visualizing Results: The trained models for IWR and HG-Dagger will be saved in the run_baselines folder. Notably, to evaluate IWR and HG-Dagger, you should run algo/IWR/eval_IWR.py or algo/HG_Dagger/eval_hg_dagger.py.

4.6 Online RLHF Methods

If you want to run HACO:

cd pe_rlhf/run_baselines 
python train_haco.py --num-gpus=1

📝 Note: If a steering wheel is not available, uncomment line 25 in train_haco.py, i.e., # "controller": "keyboard", # use keyboard or not. This allows you to control the agent with the keyboard.

📊 Visualizing Results: The training results will be saved in the run_baselines/HACO folder. To evaluate HACO, run run_baselines/eval_haco.py.

5. Ablation Study 📊

pe_rlhf_human_in_the_loop_env.py is the core file implementing PE-RLHF. You can examine the sensitivity of PE-RLHF and perform ablation experiments by adjusting the settings in lines 52-67 of that file. You can also do this by adjusting lines 30-35 of train_pe_rlhf.py.

You can also achieve this by setting command-line parameters. For example:

python train_pe_rlhf.py --exp-name PE_RLHF --local-dir /home/sky-lab/codes/PE-RLHF/pe_rlhf/run_main_exp --num-gpus=1 --pe_rlhf-ensemble --ckpt-freq 10

Contributions 🤝

Your thoughts and contributions are a green signal for us! 🚦

If you have any suggestions or would like to collaborate on this repo, feel free to create an issue/PR or send us an email ([email protected]). 📬

Citation 🔖

If you find our paper and code useful, please kindly cite us via:

@article{huang2024trustworthy,
        title={Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving},
        author={Huang, Zilin and Sheng, Zihao and Chen, Sikai},
        journal={arXiv preprint arXiv:2409.00858},
        year={2024}
      }

Other Awesome Projects from Our Team 📌

Our team is actively involved in various innovative projects in the realm of autonomous driving. Here are some other exciting repositories that you might find interesting:

📝 License

The content of this repository is released under the MIT License.
