Liang Pan1,2 · Zeshi Yang3 · Zhiyang Dou2 · Wenjia Wang2 · Buzhen Huang4 · Bo Dai2,5 · Taku Komura2 · Jingbo Wang1
1Shanghai AI Lab 2The University of Hong Kong 3Independent Researcher 4Southeast University 5Feeling AI
CVPR 2025
Oral Presentation (Top 3.3%)
Also a Spotlight at the 1st Workshop on Humanoid Agents at CVPR 2025
Long-horizon Task Completion in a Complex Dynamic Environment
- [2025-04-07] Released the full code. Please make sure to download the latest datasets and models from Hugging Face.
- [2025-04-06] Released three skill composition tasks with pre-trained models.
- [2025-04-05] TokenHSI has been selected as an oral paper at CVPR 2025!
- [2025-04-03] Released long-horizon task completion with a pre-trained model.
- [2025-04-01] We just updated the Getting Started section. You can play TokenHSI now!
- [2025-03-31] We've released the codebase and checkpoint for the foundational skill learning part.
- Release foundational skill learning
- Release policy adaptation - skill composition
- Release policy adaptation - object shape variation
- Release policy adaptation - terrain shape variation
- Release policy adaptation - long-horizon task completion
Follow the instructions below:
- Create a new conda environment and install PyTorch:

  ```
  conda create -n tokenhsi python=3.8
  conda activate tokenhsi
  conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
  pip install -r requirements.txt
  ```
- Install IsaacGym Preview 4:

  ```
  cd IsaacGym_Preview_4_Package/isaacgym/python
  pip install -e .

  # add your conda env path to ~/.bashrc
  export LD_LIBRARY_PATH="your_conda_env_path/lib:$LD_LIBRARY_PATH"
  ```
- Install pytorch3d (optional, if you want to run the long-horizon task completion demo):

  We use pytorch3d to rapidly render height maps of dynamic objects for thousands of simulation environments.

  ```
  conda install -c fvcore -c iopath -c conda-forge fvcore iopath
  pip install git+https://github.com/facebookresearch/[email protected]
  ```
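  For intuition only, the snippet below is a minimal sketch of rendering a top-down height map with pytorch3d. It is not the implementation used in this repo; the mesh file, camera height, and image size are assumptions.

  ```python
  # Minimal height-map sketch (NOT the repo's code); "box.obj", dist=3.0, and
  # image_size=64 are placeholder choices.
  import torch
  from pytorch3d.io import load_objs_as_meshes
  from pytorch3d.renderer import (
      FoVOrthographicCameras, RasterizationSettings, MeshRasterizer, look_at_view_transform,
  )

  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  meshes = load_objs_as_meshes(["box.obj"], device=device)  # hypothetical object mesh

  # Orthographic camera looking straight down; "up" is chosen to avoid a
  # degenerate top-down view direction.
  R, T = look_at_view_transform(dist=3.0, elev=90.0, azim=0.0,
                                up=((0.0, 0.0, 1.0),), device=device)
  cameras = FoVOrthographicCameras(device=device, R=R, T=T)
  rasterizer = MeshRasterizer(
      cameras=cameras,
      raster_settings=RasterizationSettings(image_size=64, blur_radius=0.0, faces_per_pixel=1),
  )

  # zbuf stores per-pixel depth (-1 where no face is hit); converting camera depth
  # to height above the ground gives an (N, H, W) height map, batchable over envs.
  depth = rasterizer(meshes).zbuf[..., 0]
  height_map = torch.where(depth > 0, 3.0 - depth, torch.zeros_like(depth))
  print(height_map.shape)
  ```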
- Download SMPL body models and organize them as follows:

  ```
  |-- assets
  |   |-- body_models
  |       |-- smpl
  |           |-- SMPL_FEMALE.pkl
  |           |-- SMPL_MALE.pkl
  |           |-- SMPL_NEUTRAL.pkl
  |           |-- ...
  |-- lpanlib
  |-- tokenhsi
  ```
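A quick hand-rolled sanity check (not part of the repo; the `assets/body_models/smpl` path is assumed from the tree above) can confirm the files are in place:

```python
# Hypothetical layout check -- adjust smpl_dir if your tree differs.
from pathlib import Path

smpl_dir = Path("assets/body_models/smpl")
for name in ["SMPL_FEMALE.pkl", "SMPL_MALE.pkl", "SMPL_NEUTRAL.pkl"]:
    path = smpl_dir / name
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```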
We provide two methods to generate the motion and object data.
- Download the pre-processed data from Hugging Face. Please follow the instructions on the dataset page. (A rough download sketch is shown after this list.)
- Generate the data from source:

  - Download AMASS (SMPL-X Neutral), SAMP, and OMOMO.

  - Modify the dataset paths in the `tokenhsi/data/dataset_cfg.yaml` file:

    ```yaml
    # Motion datasets, please use your own paths
    amass_dir: "/YOUR_PATH/datasets/AMASS"
    samp_pkl_dir: "/YOUR_PATH/datasets/samp"
    omomo_dir: "/YOUR_PATH/datasets/OMOMO/data"
    ```

  - We still need to download the pre-processed data from Hugging Face, but in this case only the object data is required.

  - Run the following script:

    ```
    bash tokenhsi/scripts/gen_data.sh
    ```
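For the Hugging Face download steps above, a hypothetical programmatic sketch (not the official instructions; the repo ID and target directory below are placeholders, so use the dataset page for the real identifier and layout) could look like:

```python
# Placeholder download sketch using huggingface_hub; repo_id and local_dir are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="YOUR_ORG/tokenhsi-data",  # placeholder -- see the dataset page for the real ID
    repo_type="dataset",
    local_dir="tokenhsi/data",         # assumed target directory
)
```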
- Download the checkpoints from Hugging Face. Please follow the instructions on the model page.
- Single task policy trained with AMP

  - Path-following

    ```
    # test
    sh tokenhsi/scripts/single_task/traj_test.sh
    # train
    sh tokenhsi/scripts/single_task/traj_train.sh
    ```

  - Sitting

    ```
    # test
    sh tokenhsi/scripts/single_task/sit_test.sh
    # train
    sh tokenhsi/scripts/single_task/sit_train.sh
    ```

  - Climbing

    ```
    # test
    sh tokenhsi/scripts/single_task/climb_test.sh
    # train
    sh tokenhsi/scripts/single_task/climb_train.sh
    ```

  - Carrying

    ```
    # test
    sh tokenhsi/scripts/single_task/carry_test.sh
    # train
    sh tokenhsi/scripts/single_task/carry_train.sh
    ```
- TokenHSI's unified transformer policy

  - Foundational Skill Learning

    ```
    # test
    sh tokenhsi/scripts/tokenhsi/stage1_test.sh
    # eval
    sh tokenhsi/scripts/tokenhsi/stage1_eval.sh carry # we need to specify a task to eval, e.g., traj, sit, climb, or carry
    # train
    sh tokenhsi/scripts/tokenhsi/stage1_train.sh
    ```

    If you successfully run the test command, you will see:
  - Policy Adaptation - Skill Composition

    - Traj + Carry

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_train.sh
      ```

      If you successfully run the test command, you will see:

    - Sit + Carry

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_train.sh
      ```

      If you successfully run the test command, you will see:

    - Climb + Carry

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_train.sh
      ```

      If you successfully run the test command, you will see:
  - Policy Adaptation - Object Shape Variation

    - Carrying: Box-2-Chair

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_train.sh
      ```

      If you successfully run the test command, you will see:

    - Carrying: Box-2-Table

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_train.sh
      ```

      If you successfully run the test command, you will see:
  - Policy Adaptation - Terrain Shape Variation

    - Path-following

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_train.sh
      ```

      If you successfully run the test command, you will see:

    - Carrying

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_train.sh
      ```

      If you successfully run the test command, you will see:
  - Policy Adaptation - Long-horizon Task Completion

    ```
    # test
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_test.sh
    # eval
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_eval.sh
    # train
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_train.sh
    ```
| Keyboard | Function |
|---|---|
| F | Focus on the humanoid |
| Right Click + WASD | Change the viewport |
| Shift + Right Click + WASD | Change the viewport quickly |
| K | Visualize lines |
| L | Start recording screenshots; press again to stop |
The recorded screenshots are saved in output/imgs/. You can use lpanlib/others/video.py to generate an MP4 video from the recorded images:

```
python lpanlib/others/video.py --imgs_dir output/imgs/example_path --delete_imgs
```
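If you prefer not to use the helper script, a hypothetical alternative (not part of the repo; it assumes the screenshots are PNG files and that the imageio and imageio-ffmpeg packages are installed) is:

```python
# Hypothetical alternative to lpanlib/others/video.py; the glob pattern and fps are assumptions.
import glob
import imageio.v2 as imageio

frames = sorted(glob.glob("output/imgs/example_path/*.png"))
with imageio.get_writer("output/imgs/example_path.mp4", fps=30) as writer:
    for f in frames:
        writer.append_data(imageio.imread(f))
```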
If you find our work helpful, please cite:
@inproceedings{pan2025tokenhsi,
title={TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization},
author={Pan, Liang and Yang, Zeshi and Dou, Zhiyang and Wang, Wenjia and Huang, Buzhen and Dai, Bo and Komura, Taku and Wang, Jingbo},
booktitle={CVPR},
year={2025},
}
@inproceedings{pan2024synthesizing,
title={Synthesizing physically plausible human motions in 3d scenes},
author={Pan, Liang and Wang, Jingbo and Huang, Buzhen and Zhang, Junyu and Wang, Haofan and Tang, Xu and Wang, Yangang},
booktitle={2024 International Conference on 3D Vision (3DV)},
pages={1498--1507},
year={2024},
organization={IEEE}
}

Please also consider citing the following papers that inspired TokenHSI.
@article{tessler2024maskedmimic,
title={Maskedmimic: Unified physics-based character control through masked motion inpainting},
author={Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
journal={ACM Transactions on Graphics (TOG)},
volume={43},
number={6},
pages={1--21},
year={2024},
publisher={ACM New York, NY, USA}
}
@article{he2024hover,
title={Hover: Versatile neural whole-body controller for humanoid robots},
author={He, Tairan and Xiao, Wenli and Lin, Toru and Luo, Zhengyi and Xu, Zhenjia and Jiang, Zhenyu and Kautz, Jan and Liu, Changliu and Shi, Guanya and Wang, Xiaolong and others},
journal={arXiv preprint arXiv:2410.21229},
year={2024}
}

This repository builds upon the following awesome open-source projects:
- ASE: Contributes to the physics-based character control codebase
- PACER: Contributes to the procedural terrain generation and the trajectory-following task
- rl_games: Contributes to the reinforcement learning code
- OMOMO/SAMP/AMASS/3D-Front: Used for the reference dataset construction
- InterMimic: Used for the GitHub repo README design
This codebase is released under the MIT License.
Please note that it also relies on external libraries and datasets, each of which may be subject to its own license and terms of use.