TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

Liang Pan1,2 · Zeshi Yang3 · Zhiyang Dou2 · Wenjia Wang2 · Buzhen Huang4 · Bo Dai2,5 · Taku Komura2 · Jingbo Wang1
1Shanghai AI Lab 2The University of Hong Kong 3Independent Researcher 4Southeast University 5Feeling AI
CVPR 2025
🏆 Oral Presentation (Top 3.3%)
Also a Spotlight in the 1st Workshop on Humanoid Agents at CVPR 2025

🏠 About

Introducing TokenHSI, a unified model that enables physics-based characters to perform diverse human-scene interaction tasks. It excels at seamlessly unifying multiple foundational HSI skills within a single transformer network and flexibly adapting learned skills to challenging new tasks, including skill composition, object/terrain shape variation, and long-horizon task completion.
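
To make the idea concrete, here is a minimal, hypothetical sketch of a task-tokenized policy: proprioception and each task's observation are embedded as separate tokens, a shared transformer fuses them, and inactive task tokens are masked out. The module and parameter names are illustrative only, not the released implementation.

```python
# Hypothetical sketch of task tokenization (illustrative only, not the released code).
import torch
import torch.nn as nn

class TaskTokenizedPolicy(nn.Module):
    def __init__(self, prop_dim, task_dims, token_dim=128, num_actions=28):
        super().__init__()
        self.prop_embed = nn.Linear(prop_dim, token_dim)
        # One tokenizer per foundational task (e.g., traj, sit, climb, carry).
        self.task_embeds = nn.ModuleList(nn.Linear(d, token_dim) for d in task_dims)
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(token_dim, num_actions)

    def forward(self, prop_obs, task_obs_list, task_mask):
        # prop_obs: (B, prop_dim); task_obs_list[i]: (B, task_dims[i]); task_mask: (B, num_tasks) bool
        tokens = [self.prop_embed(prop_obs)]
        tokens += [emb(obs) for emb, obs in zip(self.task_embeds, task_obs_list)]
        x = torch.stack(tokens, dim=1)                      # (B, 1 + num_tasks, token_dim)
        # Mask out tokens of tasks that are not active for this environment.
        pad_mask = torch.cat([torch.zeros_like(task_mask[:, :1]), ~task_mask], dim=1)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        return self.action_head(x[:, 0])                    # act from the proprioception token
```

Adapting to a new task then roughly amounts to training a new task tokenizer against the shared transformer, which is the spirit of the policy adaptation stages below; see the paper for the actual architecture and training details.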

📹 Demo


Long-horizon Task Completion in a Complex Dynamic Environment

🔥 News

  • [2025-04-07] Released full code. Please make sure to download the latest datasets and models from Hugging Face.
  • [2025-04-06] Released three skill composition tasks with pre-trained models.
  • [2025-04-05] TokenHSI has been selected as an oral paper at CVPR 2025! 🎉
  • [2025-04-03] Released long-horizon task completion with a pre-trained model.
  • [2025-04-01] We just updated the Getting Started section. You can play TokenHSI now!
  • [2025-03-31] We've released the codebase and checkpoint for the foundational skill learning part.

πŸ“ TODO List

  • Release foundational skill learning
  • Release policy adaptation - skill composition
  • Release policy adaptation - object shape variation
  • Release policy adaptation - terrain shape variation
  • Release policy adaptation - long-horizon task completion

📖 Getting Started

Dependencies

Follow these instructions:

  1. Create a new conda environment and install PyTorch

    conda create -n tokenhsi python=3.8
    conda activate tokenhsi
    conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
    
  2. Install IsaacGym Preview 4

    cd IsaacGym_Preview_4_Package/isaacgym/python
    pip install -e .
    
    # add your conda env path to ~/.bashrc
    export LD_LIBRARY_PATH="your_conda_env_path/lib:$LD_LIBRARY_PATH"
    
  3. Install pytorch3d (optional, if you want to run the long-horizon task completion demo)

    We use pytorch3d to rapidly render height maps of dynamic objects for thousands of simulation environments (a minimal sketch of this idea follows these setup steps).

    conda install -c fvcore -c iopath -c conda-forge fvcore iopath
    pip install "git+https://github.com/facebookresearch/pytorch3d.git"  # pin the specific pytorch3d tag you need here
    
  4. Download SMPL body models and organize them as follows:

    |-- assets
    |-- body_models
        |-- smpl
            |-- SMPL_FEMALE.pkl
            |-- SMPL_MALE.pkl
            |-- SMPL_NEUTRAL.pkl
            |-- ...
    |-- lpanlib
    |-- tokenhsi
    

Motion & Object Data

We provide two ways to obtain the motion and object data.

  • Download pre-processed data from Hugging Face. Please follow the instructions on the dataset page.

  • Generate data from source:

    1. Download AMASS (SMPL-X Neutral), SAMP, and OMOMO.

    2. Modify the dataset paths in the tokenhsi/data/dataset_cfg.yaml file.

      # Motion datasets, please use your own paths
      amass_dir: "/YOUR_PATH/datasets/AMASS"
      samp_pkl_dir: "/YOUR_PATH/datasets/samp"
      omomo_dir: "/YOUR_PATH/datasets/OMOMO/data"
      
    3. You still need to download the pre-processed data from Hugging Face, but only the object data is required.

    4. Run the following script:

      bash tokenhsi/scripts/gen_data.sh
      

Checkpoints

Download checkpoints from Hugging Face. Please follow the instructions on the model page.

πŸ•ΉοΈ Play TokenHSI!

  • Single task policy trained with AMP

    • Path-following

      # test
      sh tokenhsi/scripts/single_task/traj_test.sh
      # train
      sh tokenhsi/scripts/single_task/traj_train.sh
      
    • Sitting

      # test
      sh tokenhsi/scripts/single_task/sit_test.sh
      # train
      sh tokenhsi/scripts/single_task/sit_train.sh
      
    • Climbing

      # test
      sh tokenhsi/scripts/single_task/climb_test.sh
      # train
      sh tokenhsi/scripts/single_task/climb_train.sh
      
    • Carrying

      # test
      sh tokenhsi/scripts/single_task/carry_test.sh
      # train
      sh tokenhsi/scripts/single_task/carry_train.sh
      
  • TokenHSI's unified transformer policy

    • Foundational Skill Learning

      # test
      sh tokenhsi/scripts/tokenhsi/stage1_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage1_eval.sh carry # we need to specify a task to eval, e.g., traj, sit, climb, or carry.
      # train
      sh tokenhsi/scripts/tokenhsi/stage1_train.sh
      

      If you successfully run the test command, you will see the corresponding demo.

    • Policy Adaptation - Skill Composition

      • Traj + Carry

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_train.sh
        

        If you successfully run the test command, you will see the corresponding demo.

      • Sit + Carry

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_train.sh
        

        If you successfully run the test command, you will see the corresponding demo.

      • Climb + Carry

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_train.sh
        

        If you successfully run the test command, you will see the corresponding demo.

    • Policy Adaptation - Object Shape Variation

      • Carrying: Box-2-Chair

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_object_chair_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_object_chair_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_object_chair_train.sh
        

        If you successfully run the test command, you will see the corresponding demo.

      • Carrying: Box-2-Table

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_object_table_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_object_table_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_object_table_train.sh
        

        If you successfully run the test command, you will see the corresponding demo.

    • Policy Adaptation - Terrain Shape Variation

      • Path-following

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_train.sh
        

        If you successfully run the test command, you will see the corresponding demo.

      • Carrying

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_train.sh


        If you successfully run the test command, you will see the corresponding demo.

    • Policy Adaptation - Long-horizon Task Completion

      # test
      sh tokenhsi/scripts/tokenhsi/stage2_longterm_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_longterm_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_longterm_train.sh
      

Viewer Shortcuts

| Keyboard | Function |
| --- | --- |
| F | focus on humanoid |
| Right Click + WASD | change view port |
| Shift + Right Click + WASD | change view port fast |
| K | visualize lines |
| L | record screenshot, press again to stop recording |

The recorded screenshots are saved in output/imgs/. You can use lpanlib/others/video.py to generate an mp4 video from the recorded images.

python lpanlib/others/video.py --imgs_dir output/imgs/example_path --delete_imgs

🔗 Citation

If you find our work helpful, please cite:

@inproceedings{pan2025tokenhsi,
  title={TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization},
  author={Pan, Liang and Yang, Zeshi and Dou, Zhiyang and Wang, Wenjia and Huang, Buzhen and Dai, Bo and Komura, Taku and Wang, Jingbo},
  booktitle={CVPR},
  year={2025},
}

@inproceedings{pan2024synthesizing,
  title={Synthesizing physically plausible human motions in 3d scenes},
  author={Pan, Liang and Wang, Jingbo and Huang, Buzhen and Zhang, Junyu and Wang, Haofan and Tang, Xu and Wang, Yangang},
  booktitle={2024 International Conference on 3D Vision (3DV)},
  pages={1498--1507},
  year={2024},
  organization={IEEE}
}

Please also consider citing the following papers that inspired TokenHSI.

@article{tessler2024maskedmimic,
  title={Maskedmimic: Unified physics-based character control through masked motion inpainting},
  author={Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
  journal={ACM Transactions on Graphics (TOG)},
  volume={43},
  number={6},
  pages={1--21},
  year={2024},
  publisher={ACM New York, NY, USA}
}

@article{he2024hover,
  title={Hover: Versatile neural whole-body controller for humanoid robots},
  author={He, Tairan and Xiao, Wenli and Lin, Toru and Luo, Zhengyi and Xu, Zhenjia and Jiang, Zhenyu and Kautz, Jan and Liu, Changliu and Shi, Guanya and Wang, Xiaolong and others},
  journal={arXiv preprint arXiv:2410.21229},
  year={2024}
}

πŸ‘ Acknowledgements and πŸ“š License

This repository builds upon the following awesome open-source projects:

  • ASE: Contributes to the physics-based character control codebase
  • PACER: Contributes to the procedural terrain generation and the trajectory-following task
  • rl_games: Contributes to the reinforcement learning code
  • OMOMO/SAMP/AMASS/3D-Front: Used for the reference dataset construction
  • InterMimic: Used for the GitHub repo README design

This codebase is released under the MIT License.
Please note that it also relies on external libraries and datasets, each of which may be subject to its own license and terms of use.

🌟 Star History

Star History Chart
