- Code cleanup and refactoring
- Integrate motionrender for visualization
- Release test set of OOD Oakink Objects
- Uploaded Pretrained GraspVAE Checkpoint
For environment setup instructions, please refer to projects/mdm_hand/environment.md.
- Register for a MANO account.
- Download the MANO model files.
Place the downloaded files under mdm_hand/data/body_models/mano.
The directory structure should look like this:
data/
├── body_models/
│   └── mano/
│       ├── MANO_LEFT.pkl
│       └── MANO_RIGHT.pkl
├── grab/
│   ├── grab_frames
│   └── grab_seq20fps
└── oakink/
    └── oakink_aligned
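As a quick sanity check, the layout can be verified with a short script; this is a minimal sketch (the data root path is an assumption, adjust it to where you placed the data):
# Verify the expected data layout (hypothetical helper, not part of the repo)
from pathlib import Path

DATA_ROOT = Path("projects/mdm_hand/data")  # assumed data root; adjust if yours differs
expected = [
    "body_models/mano/MANO_LEFT.pkl",
    "body_models/mano/MANO_RIGHT.pkl",
    "grab/grab_frames",
    "grab/grab_seq20fps",
    "oakink/oakink_aligned",
]
for rel in expected:
    path = DATA_ROOT / rel
    print(("OK      " if path.exists() else "MISSING ") + str(path))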
For convenience, the processed data can be downloaded from Hugging Face by running the following script:
cd projects/mdm_hand/data
wget https://huggingface.co/datasets/jojo23333/LatentHOI-data/resolve/main/grab_frames.tar.gz
wget https://huggingface.co/datasets/jojo23333/LatentHOI-data/resolve/main/grab_seq20fps.tar.gz
tar -xzvf grab_frames.tar.gz
tar -xzvf grab_seq20fps.tar.gz
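If wget is not available, the same archives can be fetched with the huggingface_hub Python client; a minimal sketch (filenames taken from the commands above, download location is illustrative):
# Alternative download via the huggingface_hub Python client
from huggingface_hub import hf_hub_download

for fname in ["grab_frames.tar.gz", "grab_seq20fps.tar.gz"]:
    local_path = hf_hub_download(
        repo_id="jojo23333/LatentHOI-data",
        filename=fname,
        repo_type="dataset",  # the archives live in a dataset repo
        local_dir=".",        # place the archives in the current directory
    )
    print("downloaded to", local_path)
The archives still need to be extracted with tar as shown above.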
We also provide the preprocessed OakInk split used in our code, with the mesh coordinate directions calibrated to the GRAB objects:
cd projects/mdm_hand/data
unzip oakink.zip
All data preparation scripts should be run from the projects/mdm_hand/datasets/GRAB directory.
GRAB Dataset
# For VAE training (single-frame hand data; left hands not in contact are omitted)
python grab/grab_preprocessing_adapt_flat_hand.py
# For Diffusion model training (sequence data)
python grab/grab_preprocessing_all_seq.py
DexYCB Dataset
# For VAE training
python grab/dexycb_preprocessing_all_seq.py
# For Diffusion model training (with --seq flag)
python grab/dexycb_preprocessing_all_seq.py --seq
VAE Training
# GRAB
python -m tools.train_vae --num-gpus 1 --resume --config config/VAE/VAE_grab.yaml
# DexYCB
python -m tools.train_vae --num-gpus 1 --resume --config config/VAE/VAE_dexycb.yaml
Or you can use my pretrained GraspVAE checkpoint here: https://drive.google.com/drive/folders/13dvExxUbENk9DF4XhNBO1NKAC7tx0Em8?usp=sharing
Diffusion Model Training
In the configs, set DIFFUSION.VAE_CHECKPOINT to the path of the VAE checkpoint trained above.
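If you prefer to script that change, a minimal sketch with PyYAML is shown below; it assumes the config is plain YAML with a nested DIFFUSION block, and the checkpoint path is a placeholder:
# Point DIFFUSION.VAE_CHECKPOINT at a trained VAE checkpoint (sketch; config structure assumed)
import yaml

cfg_path = "config/grab/LDM_pretrain_vae.yaml"
vae_ckpt = "output/vae_grab/model_final.pth"  # hypothetical path to your trained VAE checkpoint

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)
cfg.setdefault("DIFFUSION", {})["VAE_CHECKPOINT"] = vae_ckpt
with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
Note that rewriting the file this way drops YAML comments; editing the config by hand works just as well.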
# GRAB
python -m tools.train_diff --num-gpus 2 --mode ldm --resume --config config/grab/LDM_pretrain_vae.yaml
# DexYCB
python -m tools.train_diff --num-gpus 2 --mode ldm --resume --config config/dexycb/LDM_pretrain_vae.yaml
# Generate for Oakink split
python -m tools.train_diff --mode ldm --eval-only --config config/oakink/ldm_oakink.yaml TEST.BATCH_SIZE 9
During training, intermediate results for evaluation are stored at the frequency defined by TEST.EVAL_PERIOD. <path_to_vis_folder> should contain the evaluated results as .pth files; the commands below will visualize/evaluate all .pth files in that folder.
# Basic evaluation
python -m tools.eval_motion -f <path_to_vis_folder>
# With visualization (generates videos)
python -m tools.eval_motion -f <path_to_vis_folder> --vis
# With physics evaluation
python -m tools.eval_motion -f <path_to_vis_folder> --eval
# For DexYCB dataset
python -m tools.eval_motion -f <path_to_vis_folder> --dex
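The stored results are regular PyTorch files, so they can also be inspected directly; a minimal sketch (the exact contents of the .pth files are not documented here, so only the top-level keys are printed):
# Inspect the evaluation dumps in <path_to_vis_folder>
from pathlib import Path
import torch

vis_folder = Path("path/to/vis_folder")  # replace with your <path_to_vis_folder>
for pth_file in sorted(vis_folder.glob("*.pth")):
    result = torch.load(pth_file, map_location="cpu")
    keys = list(result.keys()) if isinstance(result, dict) else type(result)
    print(pth_file.name, keys)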
Visualization is integrated into the evaluation process. Use the --vis flag with the evaluation command to generate videos of the hand motions:
python -m tools.eval_motion -f <path_to_vis_folder> --vis
For headless/offscreen rendering, modify the AITViewer backend to use EGL in aitviewer/viewer.py line 129:
self.window = base_window_cls(
title=title,
size=size,
fullscreen=C.fullscreen,
resizable=C.resizable,
gl_version=self.gl_version,
aspect_ratio=None,
vsync=C.vsync,
samples=self.samples,
cursor=True,
backend="egl"
)
The ffmpeg bundled with the conda environment might not recognize the presets used in the rendering commands. Solution options:
- Download and replace the conda environment ffmpeg as described in StyleSDF issue #20
- Use the system's global ffmpeg installation
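Before swapping binaries, it can help to confirm which ffmpeg is actually being resolved; a minimal sketch using only the standard library:
# Check which ffmpeg binary is on PATH and print its version string
import shutil
import subprocess

ffmpeg_path = shutil.which("ffmpeg")
print("ffmpeg resolved to:", ffmpeg_path)
if ffmpeg_path:
    out = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
    print(out.stdout.splitlines()[0])  # first line identifies the build in use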
As reported in AITViewer issue #53, run rendering commands in sub-processes.
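A minimal sketch of that workaround, launching each rendering/evaluation run in its own sub-process (the command mirrors the eval_motion calls above; the folder list is illustrative):
# Run each visualization pass in a separate sub-process
import subprocess
import sys

vis_folders = ["output/eval/epoch_100", "output/eval/epoch_200"]  # hypothetical result folders
for folder in vis_folders:
    cmd = [sys.executable, "-m", "tools.eval_motion", "-f", folder, "--vis"]
    subprocess.run(cmd, check=True)  # each run gets a fresh process and a fresh rendering context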
A shared-tensor error can occur when tensors stored on the dataset object are shared with dataloader workers. Solution:
mean_latent, std_latent = copy.deepcopy(torch.chunk(mean_latent, 2, dim=-1))
dataset.mean_latent, dataset.std_latent = mean_latent.numpy(), std_latent.numpy()
Adding .numpy() converts the tensors to NumPy arrays, which resolves the shared-tensor issue.
@InProceedings{Muchen_LatentHOI,
author = {Li, Muchen and Christen, Sammy and Wan, Chengde and Cai, Yujun and Liao, Renjie and Sigal, Leonid and Ma, Shugao},
title = {LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion.},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {17416-17425}
}