Project page | Paper | Bilibili
Generating high-quality whole-body human-object interaction motion sequences is becoming increasingly important in fields such as animation, VR/AR, and robotics. The main challenge lies in determining how much each hand should be involved, given the complex shapes and sizes of objects and their differing motion trajectories, while ensuring realistic grasping and coordinated movement across all body parts. In contrast to existing work, which either generates human interaction motion without detailed hand grasping poses or only models a static grasping pose, we propose a simple yet effective framework that jointly models the relationship between the body, the hands, and the given object motion sequences within a single diffusion model. To help the network perceive the object's spatial position and learn more natural grasping poses, we introduce novel contact-aware losses and a carefully designed, data-driven guidance. Experimental results demonstrate that our approach outperforms the state-of-the-art method and generates plausible whole-body motion sequences.
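The actual losses and guidance are defined in the paper. Purely as an illustration of what a contact-aware term can look like, the sketch below penalizes the distance from predicted hand vertices to their nearest sampled object points, restricted to vertices labeled as in contact. All tensor names and shapes here are hypothetical; this is not the loss used in this repository.

```python
import torch

def contact_aware_loss(hand_verts, obj_points, contact_mask):
    """Illustrative contact term only; see the paper for the actual losses.

    hand_verts:   (B, T, V, 3) predicted hand vertices
    obj_points:   (B, T, N, 3) sampled object surface points
    contact_mask: (B, T, V)    float mask, 1 where a vertex should be in contact
    """
    # Pairwise distances between every hand vertex and every object point.
    dists = torch.cdist(hand_verts.flatten(0, 1), obj_points.flatten(0, 1))  # (B*T, V, N)
    # Distance from each hand vertex to its closest object point.
    nearest = dists.min(dim=-1).values.view(*contact_mask.shape)             # (B, T, V)
    # Penalize only the vertices that are supposed to touch the object.
    return (nearest * contact_mask).sum() / contact_mask.sum().clamp(min=1)
```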
This package has the following requirements:
- Python >= 3.8.0
- PyTorch >= 1.13.0 (CUDA 11.6)
- PyTorch3D >= 0.7.5
- Kaolin == 0.15.0
- SMPL-X
- bps_torch
- aitviewer
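A quick sanity check that the environment is usable (this snippet is not part of the repository; it only imports the packages listed above):

```python
# Environment sanity check (not part of the repository).
import torch
import pytorch3d
import kaolin
import smplx       # noqa: F401
import bps_torch   # noqa: F401
import aitviewer   # noqa: F401

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
print("kaolin:", kaolin.__version__)
```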
To install the dependencies, please follow these steps:
- Clone this repository.
- Install the dependencies: `pip install -r requirements.txt`
- Download the SMPL-X model and place it in `./para_models/smplx`.
- Download the GRAB dataset from the GRAB website and follow the instructions there to extract the files. Save the raw data in `../DATASETS/GRAB`.
- Sample 4000 points for each object in the GRAB dataset (see the illustrative sketch after this list): `python data/preprocess_GRAB_objects.py`
- Pre-process the GRAB dataset for our setting: `python data/process_GRAB.py`
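The two scripts above perform the actual preprocessing. Purely to illustrate the point-sampling step, 4000 surface points can be drawn from a mesh with PyTorch3D roughly as follows (the mesh path is a placeholder, not the repository's layout):

```python
from pytorch3d.io import load_obj
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.structures import Meshes

# Placeholder path; data/preprocess_GRAB_objects.py handles the real GRAB objects.
verts, faces, _ = load_obj("path/to/object_mesh.obj")
mesh = Meshes(verts=[verts], faces=[faces.verts_idx])

points = sample_points_from_meshes(mesh, num_samples=4000)  # (1, 4000, 3)
print(points.shape)
```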
Please download the checkpoint trained on the GRAB dataset and place it in the folder structure shown below.
DiffGrasp
├── work_dir
│   ├── DiffGrasp
│   │   ├── snapshots
│   │   │   ├── E300_model.pt
│   │   │   │
│   │   │   │
.
.
.
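Before training or inference, you can quickly verify that the model files and data are where the instructions above expect them (hypothetical helper, not part of the repository):

```python
from pathlib import Path

# Paths taken from the instructions above; adjust them if you placed things elsewhere.
for p in ["./para_models/smplx",
          "../DATASETS/GRAB",
          "./work_dir/DiffGrasp/snapshots/E300_model.pt"]:
    print(f"{p}: {'found' if Path(p).exists() else 'MISSING'}")
```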
To train DiffGrasp, run:
python train_DiffGrasp.py --mode=training
If you have downloaded the checkpoint, you can run inference directly; otherwise, please train DiffGrasp first.
python train_DiffGrasp.py --mode=inference
We use aitviewer to visualize the results. On the server side, run:
python ait_vis.py
Launch an empty viewer with the command:
python -m aitviewer.server
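ait_vis.py together with the empty viewer is the repository's intended client-server workflow. If you only want to inspect a mesh sequence locally, a minimal aitviewer sketch looks roughly like this (the .npy files and their shapes are assumptions for illustration):

```python
import numpy as np
from aitviewer.renderables.meshes import Meshes
from aitviewer.viewer import Viewer

# Hypothetical exports: per-frame vertices (T, V, 3) and a shared face array (F, 3).
vertices = np.load("body_vertices.npy")
faces = np.load("body_faces.npy")

viewer = Viewer()
viewer.scene.add(Meshes(vertices, faces, name="DiffGrasp result"))
viewer.run()
```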
If you find this work useful, please consider citing:
@article{Zhang2025DiffGrasp,
title={DiffGrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model},
author={Zhang, Yonghao and He, Qiang and Wan, Yanguang and Zhang, Yinda and Deng, Xiaoming and Ma, Cuixia and Wang, Hongan},
url={https://ojs.aaai.org/index.php/AAAI/article/view/33120},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2025}
}
