Liang Pan1,2 · Zeshi Yang3 · Zhiyang Dou2 · Wenjia Wang2 · Buzhen Huang4 · Bo Dai2,5 · Taku Komura2 · Jingbo Wang1
1Shanghai AI Lab 2The University of Hong Kong 3Independent Researcher 4Southeast University 5Feeling AI
CVPR 2025
Oral Presentation (Top 3.3%)
Also a Spotlight at the 1st Workshop on Humanoid Agents at CVPR 2025
Long-horizon Task Completion in a Complex Dynamic Environment
- [2025-04-07] Released the full code. Please make sure to download the latest datasets and models from Hugging Face.
- [2025-04-06] Released three skill composition tasks with pre-trained models.
- [2025-04-05] TokenHSI has been selected as an oral paper at CVPR 2025!
- [2025-04-03] Released long-horizon task completion with a pre-trained model.
- [2025-04-01] We just updated the Getting Started section. You can play TokenHSI now!
- [2025-03-31] We've released the codebase and checkpoint for the foundational skill learning part.
- Release foundational skill learning
- Release policy adaptation - skill composition
- Release policy adaptation - object shape variation
- Release policy adaptation - terrain shape variation
- Release policy adaptation - long-horizon task completion
Follow the instructions below:
- Create a new conda environment and install PyTorch:

  ```
  conda create -n tokenhsi python=3.8
  conda activate tokenhsi
  conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
  pip install -r requirements.txt
  ```
- Install IsaacGym Preview 4:

  ```
  cd IsaacGym_Preview_4_Package/isaacgym/python
  pip install -e .

  # add your conda env path to ~/.bashrc
  export LD_LIBRARY_PATH="your_conda_env_path/lib:$LD_LIBRARY_PATH"
  ```
- Install pytorch3d (optional, if you want to run the long-horizon task completion demo):

  We use pytorch3d to rapidly render height maps of dynamic objects for thousands of simulation environments.

  ```
  conda install -c fvcore -c iopath -c conda-forge fvcore iopath
  pip install git+https://github.com/facebookresearch/[email protected]
  ```
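  For intuition only, the snippet below is a minimal sketch of rendering a top-down height map with pytorch3d. It is not the implementation used in this repo; the mesh file, camera height, and image size are assumptions.

  ```python
  # Minimal height-map sketch (NOT the repo's code); "box.obj", dist=3.0, and
  # image_size=64 are placeholder choices.
  import torch
  from pytorch3d.io import load_objs_as_meshes
  from pytorch3d.renderer import (
      FoVOrthographicCameras, RasterizationSettings, MeshRasterizer, look_at_view_transform,
  )

  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  meshes = load_objs_as_meshes(["box.obj"], device=device)  # hypothetical object mesh

  # Orthographic camera looking straight down; "up" is chosen to avoid a
  # degenerate top-down view direction.
  R, T = look_at_view_transform(dist=3.0, elev=90.0, azim=0.0,
                                up=((0.0, 0.0, 1.0),), device=device)
  cameras = FoVOrthographicCameras(device=device, R=R, T=T)
  rasterizer = MeshRasterizer(
      cameras=cameras,
      raster_settings=RasterizationSettings(image_size=64, blur_radius=0.0, faces_per_pixel=1),
  )

  # zbuf stores per-pixel depth (-1 where no face is hit); converting camera depth
  # to height above the ground gives an (N, H, W) height map, batchable over envs.
  depth = rasterizer(meshes).zbuf[..., 0]
  height_map = torch.where(depth > 0, 3.0 - depth, torch.zeros_like(depth))
  print(height_map.shape)
  ```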
- Download SMPL body models and organize them as follows:

  ```
  |-- assets
  |   |-- body_models
  |       |-- smpl
  |           |-- SMPL_FEMALE.pkl
  |           |-- SMPL_MALE.pkl
  |           |-- SMPL_NEUTRAL.pkl
  |           |-- ...
  |-- lpanlib
  |-- tokenhsi
  ```
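A quick hand-rolled sanity check (not part of the repo; the `assets/body_models/smpl` path is assumed from the tree above) can confirm the files are in place:

```python
# Hypothetical layout check -- adjust smpl_dir if your tree differs.
from pathlib import Path

smpl_dir = Path("assets/body_models/smpl")
for name in ["SMPL_FEMALE.pkl", "SMPL_MALE.pkl", "SMPL_NEUTRAL.pkl"]:
    path = smpl_dir / name
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```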
We provide two methods to generate the motion and object data.
- Download the pre-processed data from Hugging Face. Please follow the instructions on the dataset page. (A rough download sketch is shown after this list.)
- Generate the data from source:

  - Download AMASS (SMPL-X Neutral), SAMP, and OMOMO.

  - Modify the dataset paths in the `tokenhsi/data/dataset_cfg.yaml` file:

    ```yaml
    # Motion datasets, please use your own paths
    amass_dir: "/YOUR_PATH/datasets/AMASS"
    samp_pkl_dir: "/YOUR_PATH/datasets/samp"
    omomo_dir: "/YOUR_PATH/datasets/OMOMO/data"
    ```

  - We still need to download the pre-processed data from Hugging Face, but in this case only the object data is required.

  - Run the following script:

    ```
    bash tokenhsi/scripts/gen_data.sh
    ```
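For the Hugging Face download steps above, a hypothetical programmatic sketch (not the official instructions; the repo ID and target directory below are placeholders, so use the dataset page for the real identifier and layout) could look like:

```python
# Placeholder download sketch using huggingface_hub; repo_id and local_dir are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="YOUR_ORG/tokenhsi-data",  # placeholder -- see the dataset page for the real ID
    repo_type="dataset",
    local_dir="tokenhsi/data",         # assumed target directory
)
```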
- Download the checkpoints from Hugging Face. Please follow the instructions on the model page.
- Single task policy trained with AMP

  - Path-following

    ```
    # test
    sh tokenhsi/scripts/single_task/traj_test.sh
    # train
    sh tokenhsi/scripts/single_task/traj_train.sh
    ```

  - Sitting

    ```
    # test
    sh tokenhsi/scripts/single_task/sit_test.sh
    # train
    sh tokenhsi/scripts/single_task/sit_train.sh
    ```

  - Climbing

    ```
    # test
    sh tokenhsi/scripts/single_task/climb_test.sh
    # train
    sh tokenhsi/scripts/single_task/climb_train.sh
    ```

  - Carrying

    ```
    # test
    sh tokenhsi/scripts/single_task/carry_test.sh
    # train
    sh tokenhsi/scripts/single_task/carry_train.sh
    ```
- TokenHSI's unified transformer policy

  - Foundational Skill Learning

    ```
    # test
    sh tokenhsi/scripts/tokenhsi/stage1_test.sh
    # eval
    sh tokenhsi/scripts/tokenhsi/stage1_eval.sh carry # we need to specify a task to eval, e.g., traj, sit, climb, or carry
    # train
    sh tokenhsi/scripts/tokenhsi/stage1_train.sh
    ```

    If you successfully run the test command, you will see:
  - Policy Adaptation - Skill Composition

    - Traj + Carry

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_train.sh
      ```

      If you successfully run the test command, you will see:

    - Sit + Carry

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_train.sh
      ```

      If you successfully run the test command, you will see:

    - Climb + Carry

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_train.sh
      ```

      If you successfully run the test command, you will see:
  - Policy Adaptation - Object Shape Variation

    - Carrying: Box-2-Chair

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_train.sh
      ```

      If you successfully run the test command, you will see:

    - Carrying: Box-2-Table

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_train.sh
      ```

      If you successfully run the test command, you will see:
  - Policy Adaptation - Terrain Shape Variation

    - Path-following

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_train.sh
      ```

      If you successfully run the test command, you will see:

    - Carrying

      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_train.sh
      ```

      If you successfully run the test command, you will see:
  - Policy Adaptation - Long-horizon Task Completion

    ```
    # test
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_test.sh
    # eval
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_eval.sh
    # train
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_train.sh
    ```
| Keyboard | Function |
|---|---|
| F | Focus on the humanoid |
| Right Click + WASD | Change the viewport |
| Shift + Right Click + WASD | Change the viewport quickly |
| K | Visualize lines |
| L | Start recording screenshots; press again to stop |
The recorded screenshots are saved in output/imgs/. You can use lpanlib/others/video.py to generate an MP4 video from the recorded images:

```
python lpanlib/others/video.py --imgs_dir output/imgs/example_path --delete_imgs
```
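If you prefer not to use the helper script, a hypothetical alternative (not part of the repo; it assumes the screenshots are PNG files and that the imageio and imageio-ffmpeg packages are installed) is:

```python
# Hypothetical alternative to lpanlib/others/video.py; the glob pattern and fps are assumptions.
import glob
import imageio.v2 as imageio

frames = sorted(glob.glob("output/imgs/example_path/*.png"))
with imageio.get_writer("output/imgs/example_path.mp4", fps=30) as writer:
    for f in frames:
        writer.append_data(imageio.imread(f))
```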
If you find our work helpful, please cite:
@inproceedings{pan2025tokenhsi,
title={TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization},
author={Pan, Liang and Yang, Zeshi and Dou, Zhiyang and Wang, Wenjia and Huang, Buzhen and Dai, Bo and Komura, Taku and Wang, Jingbo},
booktitle={CVPR},
year={2025},
}
@inproceedings{pan2024synthesizing,
title={Synthesizing physically plausible human motions in 3d scenes},
author={Pan, Liang and Wang, Jingbo and Huang, Buzhen and Zhang, Junyu and Wang, Haofan and Tang, Xu and Wang, Yangang},
booktitle={2024 International Conference on 3D Vision (3DV)},
pages={1498--1507},
year={2024},
organization={IEEE}
}

Please also consider citing the following papers that inspired TokenHSI.
@article{tessler2024maskedmimic,
title={Maskedmimic: Unified physics-based character control through masked motion inpainting},
author={Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
journal={ACM Transactions on Graphics (TOG)},
volume={43},
number={6},
pages={1--21},
year={2024},
publisher={ACM New York, NY, USA}
}
@article{he2024hover,
title={Hover: Versatile neural whole-body controller for humanoid robots},
author={He, Tairan and Xiao, Wenli and Lin, Toru and Luo, Zhengyi and Xu, Zhenjia and Jiang, Zhenyu and Kautz, Jan and Liu, Changliu and Shi, Guanya and Wang, Xiaolong and others},
journal={arXiv preprint arXiv:2410.21229},
year={2024}
}

This repository builds upon the following awesome open-source projects:
- ASE: Contributes to the physics-based character control codebase
- PACER: Contributes to the procedural terrain generation and the trajectory-following task
- rl_games: Contributes to the reinforcement learning code
- OMOMO/SAMP/AMASS/3D-Front: Used for the reference dataset construction
- InterMimic: Used for the GitHub repo README design
This codebase is released under the MIT License.
Please note that it also relies on external libraries and datasets, each of which may be subject to its own license and terms of use.