Rao Fu* · Dingxi Zhang* · Alex Jiang · Wanjia Fu · Austin Funk · Daniel Ritchie · Srinath Sridhar
- [2025/09/04] For full object poses, access our Globus repository here. There are 3.3k object motion sequences in total, split across 4 zip files (zip 1, zip 2, zip 3, zip 4). Object meta annotations are provided here. Please check out the README_object for usage.
- [2025/07/21] We provide a hand-object mesh visualizer. It demonstrates hand-object temporal alignment and camera parameter usage.
- [2025/07/09] Object meshes can be downloaded here. We also provide smoother 3D hand poses aligned with the object coordinate system, derived from MANO parameters, available here. Note: the previously provided `keypoints_3d_mano` were also generated from MANO parameters, but were normalized and recentered to better support motion generation training.
- [2025/04/30] For multi-view RGB videos, access our Globus repository here. Download each `.tar.gz` separately (each file contains 10 views; 51 camera views in total).
- [2025/04/02] We are pleased to release our full hand pose dataset, available for download here (including all `keypoints_3d`, `keypoints_3d_mano`, and `params`). Complete text annotations are available here. We used the `rewritten_annotation` for model training.

More data coming soon!
Understanding bimanual human hand activities is a critical problem in AI and robotics. We cannot build large models of bimanual activities because existing datasets lack the scale, coverage of diverse hand activities, and detailed annotations. We introduce GigaHands, a massive annotated dataset capturing 34 hours of bimanual hand activities from 56 subjects and 417 objects, totaling 14k motion clips derived from 183 million frames paired with 84k text annotations. Our markerless capture setup and data acquisition protocol enable fully automatic 3D hand and object estimation while minimizing the effort required for text annotation. The scale and diversity of GigaHands enable broad applications, including text-driven action synthesis, hand motion captioning, and dynamic radiance field reconstruction.
We store our dataset on Globus; access the raw data here.
You can download 1 demo sequence here, or 5 demo sequences here.
Details
```text
gigahands_demo/
├── hand_pose/
│   ├── p--/
│   │   ├── bboxes/
│   │   ├── keypoints_2d/
│   │   ├── keypoints_3d/
│   │   ├── keypoints_3d_mano/
│   │   ├── mano_vid/
│   │   ├── params/
│   │   ├── rgb_vid/
│   │   │   └── brics-odroid--camx/
│   │   │       ├── xxx.mp4
│   │   │       └── xxx.txt
│   │   ├── repro_2d_vid/
│   │   ├── repro_3d_vid/
│   │   └── optim_params.txt
│   └── ...
└── object_pose/
    ├── p--/
    │   ├── mesh/
    │   ├── pose/
    │   ├── render/
    │   └── segmentation/
    └── ...
```

The dataset directory should look like this:
```text
./dataset/GigaHands/
├── multiview_rgb_vids/
│   └── p<participant id>-<scene>/
│       └── brics-odroid-<camera id>/
│           ├── brics-odroid-<camera id>_<sequence 0 timestamp>.mp4
│           ├── brics-odroid-<camera id>_<sequence 1 timestamp>.mp4
│           └── ...
├── hand_poses/
│   └── p<participant id>-<scene>/
│       ├── keypoints_3d/        # 3D hand keypoints (triangulated from multi-view 2D keypoints)
│       ├── keypoints_3d_mano/   # 3D hand keypoints (extracted from MANO params and normalized; smoother)
│       ├── params/              # MANO parameters
│       └── optim_params.txt     # camera parameters
├── object_poses/
│   └── <scene name>/
│       └── <object name>/
│           └── p<participant id>-<scene>_<sequence id>/
│               └── pose         # object 6DoF poses
├── object_meta/                 # all scanned and generated meshes
│   ├── <scene name>/
│   │   └── <object name>/
│   └── scene_wise_round1/
│       └── <scene name>_annotated_round1.csv  # object mesh occurrence and tracking success <--> scene-sequence id
├── annotations_v2.jsonl         # text annotations
├── instruction_script.json      # original instructions for filming
└── multiview_camera_video_map.csv  # scene-sequence id <-> multiview RGB video id mapping
```

Download the multiview RGB videos from here, hand annotations and camera parameters from here, smoothed 3D hand keypoints from here, object poses from here, object meshes from here, text annotations from here, the original instruction script grouped by scenario, scene, and activity from here, the scene-sequence mapping with video id meta file from here, and the object mesh mapping with scene-sequence id from here.
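After downloading, a quick way to sanity-check the layout is to walk the tree above and list which sessions have both multi-view videos and hand poses. The sketch below relies only on the directory names shown above; the root path and the helper name `list_sequences` are illustrative, not part of the released tooling.

```python
from pathlib import Path

def list_sequences(dataset_root: str) -> None:
    """Walk the GigaHands layout above and report, per session, how many
    multi-view clips each camera folder holds and whether hand poses exist.
    Illustrative only; not part of the released tooling."""
    root = Path(dataset_root)
    for session_dir in sorted((root / "multiview_rgb_vids").glob("p*-*")):
        hand_dir = root / "hand_poses" / session_dir.name
        has_hand_poses = (hand_dir / "keypoints_3d").is_dir()
        for cam_dir in sorted(session_dir.glob("brics-odroid-*")):
            n_clips = len(list(cam_dir.glob("*.mp4")))
            print(f"{session_dir.name} | {cam_dir.name} | "
                  f"{n_clips} clips | hand poses: {has_hand_poses}")

if __name__ == "__main__":
    list_sequences("./dataset/GigaHands")  # adjust to your download location
```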
This code requires:
- Python 3.8+
- conda3 or miniconda3
- CUDA capable GPU (one is enough)
- Create a virtual environment and install necessary dependencies
conda create -n gigahands python==3.8
conda activate gigahands
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge ffmpeg
pip install -r requirements.txt

- Install EasyMocap
cd third-party/EasyMocap
python setup.py develop

- Download the MANO models and place the `MANO_*.pkl` files under `body_models/smplh`.
- Download the pretrained models by running `bash dataset/download_pretrained_models.sh`, which should look like:
./checkpoints/GigaHands/
./checkpoints/GigaHands/GPT/ # Text-to-motion generation model
./checkpoints/GigaHands/VQVAE/ # Motion autoencoder
./checkpoints/GigaHands/text_mot_match/ # Motion & Text feature extractors for evaluation

Example use of the hand pose and object pose annotations: after downloading the hand poses, object poses, and object meshes, run the script below to visualize the hand-object meshes.
python render_mesh_video.py \
--dataset_root <data_root_path> \
--scene_name 17_instruments \
--session_name p003-instrument \
--seq_id 33 \
--object_name ukelele_scan \
--mesh_name ukelele-simplified1_1.obj \
--render_camera brics-odroid-011_cam0 \
--save_root visualizations
You will see videos of the rendered hand-object meshes in the visualizations directory.
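To render several sequences from the same session in one pass, you can wrap render_mesh_video.py in a small driver. The sketch below simply re-invokes the command with the flags documented above; the sequence ids are placeholders and must exist in your download.

```python
import subprocess

# Example sequence ids to render; replace with ids that exist in your download.
SEQ_IDS = [33, 34, 35]

for seq_id in SEQ_IDS:
    # Re-invoke the documented visualizer once per sequence.
    subprocess.run([
        "python", "render_mesh_video.py",
        "--dataset_root", "<data_root_path>",       # placeholder, as above
        "--scene_name", "17_instruments",
        "--session_name", "p003-instrument",
        "--seq_id", str(seq_id),
        "--object_name", "ukelele_scan",
        "--mesh_name", "ukelele-simplified1_1.obj",
        "--render_camera", "brics-odroid-011_cam0",
        "--save_root", "visualizations",
    ], check=True)
```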
Example use of the text-hand annotations. The visualizer below is customized for training text-hand models.
python visualize_hands.py

You will see videos of the MANO render results and the reprojected keypoints in the visualizations directory.
Example use of the multi-view videos. The visualizer below is customized for multi-view video loading.
python multiview_videoloader.py \
--video_root_dir <your-path-to-multiview_rgb_vids> \
--session <session name, i.e. p001-folder> \
--seqid <sequence id, i.e. 17> \
--out_dir <your-path-to-hand_poses (which contains optim_params.txt)>

You will see 3 concatenated multi-view frames from the targeted session and sequence id under the visualizations directory.
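If you only need a quick look at the raw clips without the official loader, the sketch below reads the first frame of one sequence from every camera folder with OpenCV. It assumes the `<camera id>_<timestamp>.mp4` naming shown in the layout above and does not handle temporal alignment or camera parameters, which multiview_videoloader.py takes care of.

```python
import cv2
from pathlib import Path

def first_frames(session_dir: str, seq_index: int = 0) -> dict:
    """Grab the first frame of the seq_index-th clip in every brics-odroid-*
    camera folder. Assumes the <camera id>_<timestamp>.mp4 naming above and
    that sorting clip names by timestamp orders sequences consistently."""
    frames = {}
    for cam_dir in sorted(Path(session_dir).glob("brics-odroid-*")):
        clips = sorted(cam_dir.glob("*.mp4"))
        if seq_index >= len(clips):
            continue
        cap = cv2.VideoCapture(str(clips[seq_index]))
        ok, frame = cap.read()
        cap.release()
        if ok:
            frames[cam_dir.name] = frame
    return frames

# Example session folder name; adjust to your download.
frames = first_frames("./dataset/GigaHands/multiview_rgb_vids/p001-folder")
print(f"Loaded first frames from {len(frames)} cameras")
```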
Sampling results from customized descriptions:
python gen_motion_custom.py --resume-pth ./checkpoints/GigaHands/VQVAE/net_last.pth --resume-trans ./checkpoints/GigaHands/GPT/net_best_fid.pth --input-text ./input.txt

The results are saved in the folder output.
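The snippet below shows one way to prepare input.txt and run the sampling command programmatically. The one-description-per-line format is an assumption based on common text-to-motion pipelines, not a documented specification, and the prompts themselves are made up.

```python
import subprocess

# Hypothetical prompts; one description per line is an assumption,
# not a documented input.txt specification.
prompts = [
    "the left hand holds a cup while the right hand pours water into it",
    "both hands fold a piece of paper in half",
]
with open("input.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")

# Run the documented sampling command on the prompts we just wrote.
subprocess.run([
    "python", "gen_motion_custom.py",
    "--resume-pth", "./checkpoints/GigaHands/VQVAE/net_last.pth",
    "--resume-trans", "./checkpoints/GigaHands/GPT/net_best_fid.pth",
    "--input-text", "./input.txt",
], check=True)
```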
Training motion VQ-VAE:
python3 train_vq_hand.py \
--batch-size 256 \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 512 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname GigaHands \
--vq-act relu \
--quantizer ema_reset \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name VQVAE \
--window-size 128

Training T2M-GPT model:
python3 train_t2m_trans_hand.py \
--exp-name GPT \
--batch-size 128 \
--num-layers 9 \
--embed-dim-gpt 1024 \
--nb-code 512 \
--n-head-gpt 16 \
--block-size 51 \
--ff-rate 4 \
--drop-out-rate 0.1 \
--resume-pth output/VQVAE/net_last.pth \
--vq-name VQVAE \
--out-dir output \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--dataname GigaHands \
--down-t 2 \
--depth 3 \
--quantizer ema_reset \
--eval-iter 10000 \
--pkeep 0.5 \
--dilation-growth-rate 3 \
--vq-act relu

- Release demo data
- Release hand pose data
- Release multi-view video data
- Release object pose data (3.3k) and meshes
- Release hand-object-motion and meshes correspondence file
- Release inference code for text-to-motion task
- Release training code for text-to-motion task
If you find our work useful in your research, please cite:
@inproceedings{fu2025gigahands,
title={Gigahands: A massive annotated dataset of bimanual hand activities},
author={Fu, Rao and Zhang, Dingxi and Jiang, Alex and Fu, Wanjia and Funk, Austin and Ritchie, Daniel and Sridhar, Srinath},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={17461--17474},
year={2025}
}
We appreciate help from:
- Public code such as EasyMocap, text-to-motion, TM2T, MDM, and T2M-GPT.
- This research was supported by AFOSR grant FA9550-21-1-0214, NSF CAREER grant #2143576, and ONR DURIP grant N00014-23-1-2804. We would like to thank the OpenAI Research Access Program for API support and extend our gratitude to Ellie Pavlick, Tianran Zhang, Carmen Yu, Angela Xing, Chandradeep Pokhariya, Sudarshan Harithas, Hongyu Li, Chaerin Min, Xindi Qu, Xiaoquan Liu, Hao Sun, Melvin He and Brandon Woodard.
GigaHands is released under the Creative Commons Attribution-NonCommercial 4.0 International License. See the LICENSE file for details.