DecAP: Decaying Action Priors for Accelerated Imitation Learning of Torque-Based Legged Locomotion Policies
Follow these steps to set up the legged_gym environment with DecAP:
We recommend using Python 3.8:
```bash
conda create --name=decap python=3.8
conda activate decap
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html
```

If you need a newer CUDA build (for example, on recent GPUs), you can replace the PyTorch install afterwards:

```bash
pip uninstall torch -y
pip install torch==2.2.2+cu121 torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu121
```

- Download Isaac Gym Preview 3 (Preview 2 will not work!) from NVIDIA Isaac Gym.
- Install Isaac Gym:

```bash
cd isaacgym/python
pip install -e .
```

- Test the installation:

```bash
cd examples
python 1080_balls_of_solitude.py
```

- For troubleshooting, refer to `isaacgym/docs/index.html`.
Clone and install DecAP:

```bash
git clone https://github.com/marmotlab/decaying_action_priors.git
cd decaying_action_priors/rsl_rl && pip install -e .
cd .. && pip install -e .
```

You are now ready to train and run policies with DecAP!
Trained policies are stored in the logs folder. Each folder contains:
- Torque policies trained with DecAP + Imitation
- Torque policies trained with Imitation alone
Log folders are named as `decap_[reward_scale_value]` or `imi_[reward_scale_value]`.
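For example, runs trained with an imitation reward scale of 0.75 would appear along these lines (illustrative, assuming legged_gym's usual `logs/<experiment_name>/<run>` layout):

```
logs/<experiment_name>/decap_0.75x   # DecAP + imitation, reward scale 0.75
logs/<experiment_name>/imi_0.75x     # imitation alone, reward scale 0.75
```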
To run a trained torque policy:
- Set the following in your config file:

```
control_type = torques
action_scale = 8.0
```
- Run:

```bash
python legged_gym/scripts/play.py --task=go1_flat --load_run=decap_0.75x
```
- You can use different robots and policies for comparison.
Key parameters for DecAP are in `legged_gym/envs/param_config.yaml`:
- `control_type`:
  - `decap_torques` (default): trains torques using DecAP
  - `torques`: for inference or imitation-only training
  - `position`: for position control
- `gamma` and `k`: DecAP hyperparameters (see the paper for details)
- `path_to_imitation_data`: set according to the robot (choose from the commented options)
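A minimal sketch of what these entries might look like (values are illustrative, not the shipped defaults):

```yaml
control_type: decap_torques                  # decap_torques | torques | position
gamma: 0.995                                 # DecAP decay rate (illustrative value)
k: 1                                         # DecAP decay interval (illustrative value)
path_to_imitation_data: imitation_data/go1   # pick from the options commented in the file
```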
To train a policy:

```bash
python legged_gym/scripts/train.py --task={task_name}
```

- Supported tasks: `go1_flat`, `cassie`, `yuna`, `h1`
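For example, to train a Go1 torque policy with DecAP (the default `control_type: decap_torques`):

```bash
python legged_gym/scripts/train.py --task=go1_flat
```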
Imitation Rewards Used (see the `{robot}_config` files):
- Joint Angles
- End-effector Position
- Foot Height
- Base Height
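As a rough illustration of how such tracking terms are often computed, here is an exponential-kernel sketch under assumed names, not the repository's exact implementation; the actual terms and scales live in the robot config and environment files:

```python
import torch

def joint_angle_imitation_reward(q, q_ref, sigma=0.5):
    """Illustrative imitation reward on joint angles.

    Approaches 1 when the measured joint angles q match the reference
    q_ref from the imitation data, and decays with the squared tracking
    error; sigma is a hypothetical temperature parameter.
    """
    err = torch.sum(torch.square(q - q_ref), dim=-1)
    return torch.exp(-err / sigma)
```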
Imitation Reward Scales Tested:
0.75, 1.5, 7.5, 15.0
To train using imitation only (no DecAP), set `control_type: torques` in `legged_gym/envs/param_config.yaml`.
- You can generate imitation data by training your own position policy, by using any existing position policy, or with an optimal controller.
- The `imitation_data` folder contains example data.
- Imitation rewards are defined in the respective robot files in the `envs` folder.
- The decaying action priors are set in the `compute_torques` function in the same files (a sketch of the idea follows below).
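A minimal sketch of the DecAP idea in a `compute_torques`-style function, assuming a geometric decay of the prior weight by `gamma` every `k` iterations; see the paper and the robot files for the exact schedule and gains:

```python
import torch

def compute_torques_decap(actions, q, qd, q_ref, kp, kd,
                          gamma, k, iteration, action_scale=8.0):
    """Blend policy torques with a decaying PD action prior (sketch).

    A PD controller tracking the imitation targets q_ref supplies a
    torque prior whose weight decays over training, so the torque
    policy gradually takes over. All names here are assumptions for
    illustration, not the repository's exact signature.
    """
    # PD torque prior that tracks the imitation (position) targets.
    tau_prior = kp * (q_ref - q) - kd * qd

    # Prior weight decays geometrically as training progresses.
    decay = gamma ** (iteration // k)

    # Scaled policy output plus the decaying action prior.
    return action_scale * actions + decay * tau_prior
```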
Planned and ongoing updates:
- Move DecAP params to YAML
- Go2 support
- H1 Humanoid support
- Sim-to-Sim Mujoco support
- Code for hardware deployment
If you find this work useful, please consider citing:
```bibtex
@article{Sood2023DecAPD,
  title={DecAP: Decaying Action Priors for Accelerated Imitation Learning of Torque-Based Legged Locomotion Policies},
  author={Shivam Sood and Ge Sun and Peizhuo Li and Guillaume Sartoretti},
  journal={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2023},
  pages={2809-2815},
  url={https://api.semanticscholar.org/CorpusID:263830010}
}
```

We used the codebase from Legged Gym and RSL RL:
- Rudin, Nikita, et al. "Learning to walk in minutes using massively parallel deep reinforcement learning." CoRL 2022.