Rao Fu* · Dingxi Zhang* · Alex Jiang · Wanjia Fu · Austin Funk · Daniel Ritchie · Srinath Sridhar
- [2025/09/04] For full object poses, access our Globus repository here. There are 3.3k object motion sequences in total, split across 4 zip files (zip 1, zip 2, zip 3, zip 4). Object meta annotations are provided here. Please check out the README_object for usage.
- [2025/07/21] We provide a hand-object mesh visualizer. It demonstrates hand-object temporal alignment and camera parameter usage.
- [2025/07/09] Object meshes can be downloaded here. We also provide smoother 3D hand poses aligned with the object coordinate system, derived from MANO parameters, available here. Note: the previously provided `keypoints_3d_mano` were also generated from MANO parameters, but were normalized and recentered to better support motion generation training.
- [2025/04/30] For multi-view RGB videos, access our Globus repository here. Download each `.tar.gz` separately (each file contains 10 views; 51 camera views in total).
- [2025/04/02] We are pleased to release our full hand pose dataset, available for download here (including all `keypoints_3d`, `keypoints_3d_mano`, and `params`). Complete text annotations are available here. We used the `rewritten_annotation` for model training.

More data coming soon!
Understanding bimanual human hand activities is a critical problem in AI and robotics. We cannot build large models of bimanual activities because existing datasets lack the scale, coverage of diverse hand activities, and detailed annotations. We introduce GigaHands, a massive annotated dataset capturing 34 hours of bimanual hand activities from 56 subjects and 417 objects, totaling 14k motion clips derived from 183 million frames paired with 84k text annotations. Our markerless capture setup and data acquisition protocol enable fully automatic 3D hand and object estimation while minimizing the effort required for text annotation. The scale and diversity of GigaHands enable broad applications, including text-driven action synthesis, hand motion captioning, and dynamic radiance field reconstruction.
We store our dataset on Globus; access the raw data here.
You can download 1 demo sequence here, or 5 demo sequences here.
Details
```text
gigahands_demo/
├── hand_pose/
│   ├── p--/
│   │   ├── bboxes/
│   │   ├── keypoints_2d/
│   │   ├── keypoints_3d/
│   │   ├── keypoints_3d_mano/
│   │   ├── mano_vid/
│   │   ├── params/
│   │   ├── rgb_vid/
│   │   │   └── brics-odroid--camx/
│   │   │       ├── xxx.mp4
│   │   │       └── xxx.txt
│   │   ├── repro_2d_vid/
│   │   ├── repro_3d_vid/
│   │   └── optim_params.txt
│   └── ...
└── object_pose/
    ├── p--/
    │   ├── mesh/
    │   ├── pose/
    │   ├── render/
    │   └── segmentation/
    └── ...
```

The dataset directory should look like this:
```text
./dataset/GigaHands/
├── multiview_rgb_vids/
│   └── p<participant id>-<scene>/
│       └── brics-odroid-<camera id>/
│           ├── brics-odroid-<camera id>_<sequence 0 timestamp>.mp4
│           ├── brics-odroid-<camera id>_<sequence 1 timestamp>.mp4
│           └── ...
├── hand_poses/
│   └── p<participant id>-<scene>/
│       ├── keypoints_3d/        # 3D hand keypoints (triangulated from multi-view 2D keypoints)
│       ├── keypoints_3d_mano/   # 3D hand keypoints (extracted from MANO params and normalized; smoother)
│       ├── params/              # MANO parameters
│       └── optim_params.txt     # camera parameters
├── object_poses/
│   └── <scene name>/
│       └── <object name>/
│           └── p<participant id>-<scene>_<sequence id>/
│               └── pose         # object 6DoF poses
├── object_meta/                 # all scanned and generated meshes
│   ├── <scene name>/
│   │   └── <object name>/
│   └── scene_wise_round1/
│       └── <scene name>_annotated_round1.csv  # object mesh occurrence and tracking success <--> scene-sequence id
├── annotations_v2.jsonl         # text annotations
├── instruction_script.json      # original instructions for filming
└── multiview_camera_video_map.csv  # scene-sequence id <-> multiview RGB video id mapping
```

Download the multiview RGB videos from here, hand annotations and camera parameters from here, smoothed 3D hand keypoints from here, object poses from here, object meshes from here, text annotations from here, the original instruction script grouped by scenario, scene, and activity from here, the scene-sequence mapping with video id meta file from here, and the object mesh mapping with scene-sequence id from here.
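After downloading, a quick way to sanity-check the layout is to walk the tree above and list which sessions have both multi-view videos and hand poses. The sketch below relies only on the directory names shown above; the root path and the helper name `list_sequences` are illustrative, not part of the released tooling.

```python
from pathlib import Path

def list_sequences(dataset_root: str) -> None:
    """Walk the GigaHands layout above and report, per session, how many
    multi-view clips each camera folder holds and whether hand poses exist.
    Illustrative only; not part of the released tooling."""
    root = Path(dataset_root)
    for session_dir in sorted((root / "multiview_rgb_vids").glob("p*-*")):
        hand_dir = root / "hand_poses" / session_dir.name
        has_hand_poses = (hand_dir / "keypoints_3d").is_dir()
        for cam_dir in sorted(session_dir.glob("brics-odroid-*")):
            n_clips = len(list(cam_dir.glob("*.mp4")))
            print(f"{session_dir.name} | {cam_dir.name} | "
                  f"{n_clips} clips | hand poses: {has_hand_poses}")

if __name__ == "__main__":
    list_sequences("./dataset/GigaHands")  # adjust to your download location
```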
This code requires:
- Python 3.8+
- conda3 or miniconda3
- CUDA capable GPU (one is enough)
- Create a virtual environment and install necessary dependencies
conda create -n gigahands python==3.8
conda activate gigahands
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge ffmpeg
pip install -r requirements.txt

- Install EasyMocap
cd third-party/EasyMocap
python setup.py develop

- Download the MANO models and place the `MANO_*.pkl` files under `body_models/smplh`.
- Download the pretrained models by running `bash dataset/download_pretrained_models.sh`, which should look like:
./checkpoints/GigaHands/
./checkpoints/GigaHands/GPT/ # Text-to-motion generation model
./checkpoints/GigaHands/VQVAE/ # Motion autoencoder
./checkpoints/GigaHands/text_mot_match/ # Motion & Text feature extractors for evaluation

Example use of the hand pose and object pose annotations: after downloading the hand poses, object poses, and object meshes, run the script below to visualize the hand-object meshes.
python render_mesh_video.py \
--dataset_root <data_root_path> \
--scene_name 17_instruments \
--session_name p003-instrument \
--seq_id 33 \
--object_name ukelele_scan \
--mesh_name ukelele-simplified1_1.obj \
--render_camera brics-odroid-011_cam0 \
--save_root visualizations
You will see videos of the rendered hand-object meshes in the visualizations directory.
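To render several sequences from the same session in one pass, you can wrap render_mesh_video.py in a small driver. The sketch below simply re-invokes the command with the flags documented above; the sequence ids are placeholders and must exist in your download.

```python
import subprocess

# Example sequence ids to render; replace with ids that exist in your download.
SEQ_IDS = [33, 34, 35]

for seq_id in SEQ_IDS:
    # Re-invoke the documented visualizer once per sequence.
    subprocess.run([
        "python", "render_mesh_video.py",
        "--dataset_root", "<data_root_path>",       # placeholder, as above
        "--scene_name", "17_instruments",
        "--session_name", "p003-instrument",
        "--seq_id", str(seq_id),
        "--object_name", "ukelele_scan",
        "--mesh_name", "ukelele-simplified1_1.obj",
        "--render_camera", "brics-odroid-011_cam0",
        "--save_root", "visualizations",
    ], check=True)
```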
Example use of the text-hand annotations. The visualizer below is customized for training text-hand models.
python visualize_hands.py

You will see videos of the MANO render results and the reprojected keypoints in the visualizations directory.
Example use of the multi-view videos. The visualizer below is customized for multi-view video loading.
python multiview_videoloader.py \
--video_root_dir <your-path-to-multiview_rgb_vids> \
--session <session name, i.e. p001-folder> \
--seqid <sequence id, i.e. 17> \
--out_dir <your-path-to-hand_poses (which contains optim_params.txt)>

You will see 3 concatenated multi-view frames from the targeted session and sequence id under the visualizations directory.
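If you only need a quick look at the raw clips without the official loader, the sketch below reads the first frame of one sequence from every camera folder with OpenCV. It assumes the `<camera id>_<timestamp>.mp4` naming shown in the layout above and does not handle temporal alignment or camera parameters, which multiview_videoloader.py takes care of.

```python
import cv2
from pathlib import Path

def first_frames(session_dir: str, seq_index: int = 0) -> dict:
    """Grab the first frame of the seq_index-th clip in every brics-odroid-*
    camera folder. Assumes the <camera id>_<timestamp>.mp4 naming above and
    that sorting clip names by timestamp orders sequences consistently."""
    frames = {}
    for cam_dir in sorted(Path(session_dir).glob("brics-odroid-*")):
        clips = sorted(cam_dir.glob("*.mp4"))
        if seq_index >= len(clips):
            continue
        cap = cv2.VideoCapture(str(clips[seq_index]))
        ok, frame = cap.read()
        cap.release()
        if ok:
            frames[cam_dir.name] = frame
    return frames

# Example session folder name; adjust to your download.
frames = first_frames("./dataset/GigaHands/multiview_rgb_vids/p001-folder")
print(f"Loaded first frames from {len(frames)} cameras")
```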
Sampling results from customized descriptions:
python gen_motion_custom.py --resume-pth ./checkpoints/GigaHands/VQVAE/net_last.pth --resume-trans ./checkpoints/GigaHands/GPT/net_best_fid.pth --input-text ./input.txt

The results are saved in the folder output.
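The snippet below shows one way to prepare input.txt and run the sampling command programmatically. The one-description-per-line format is an assumption based on common text-to-motion pipelines, not a documented specification, and the prompts themselves are made up.

```python
import subprocess

# Hypothetical prompts; one description per line is an assumption,
# not a documented input.txt specification.
prompts = [
    "the left hand holds a cup while the right hand pours water into it",
    "both hands fold a piece of paper in half",
]
with open("input.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")

# Run the documented sampling command on the prompts we just wrote.
subprocess.run([
    "python", "gen_motion_custom.py",
    "--resume-pth", "./checkpoints/GigaHands/VQVAE/net_last.pth",
    "--resume-trans", "./checkpoints/GigaHands/GPT/net_best_fid.pth",
    "--input-text", "./input.txt",
], check=True)
```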
Training motion VQ-VAE:
python3 train_vq_hand.py \
--batch-size 256 \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 512 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname GigaHands \
--vq-act relu \
--quantizer ema_reset \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name VQVAE \
--window-size 128

Training T2M-GPT model:
python3 train_t2m_trans_hand.py \
--exp-name GPT \
--batch-size 128 \
--num-layers 9 \
--embed-dim-gpt 1024 \
--nb-code 512 \
--n-head-gpt 16 \
--block-size 51 \
--ff-rate 4 \
--drop-out-rate 0.1 \
--resume-pth output/VQVAE/net_last.pth \
--vq-name VQVAE \
--out-dir output \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--dataname GigaHands \
--down-t 2 \
--depth 3 \
--quantizer ema_reset \
--eval-iter 10000 \
--pkeep 0.5 \
--dilation-growth-rate 3 \
--vq-act relu

- Release demo data
- Release hand pose data
- Release multi-view video data
- Release object pose data (3.3k) and meshes
- Release hand-object-motion and meshes correspondence file
- Release inference code for text-to-motion task
- Release training code for text-to-motion task
If you find our work useful in your research, please cite:
@inproceedings{fu2025gigahands,
title={Gigahands: A massive annotated dataset of bimanual hand activities},
author={Fu, Rao and Zhang, Dingxi and Jiang, Alex and Fu, Wanjia and Funk, Austin and Ritchie, Daniel and Sridhar, Srinath},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={17461--17474},
year={2025}
}
We appreciate help from:
- Public code such as EasyMocap, text-to-motion, TM2T, MDM, and T2M-GPT.
- This research was supported by AFOSR grant FA9550-21-1-0214, NSF CAREER grant #2143576, and ONR DURIP grant N00014-23-1-2804. We would like to thank the OpenAI Research Access Program for API support and extend our gratitude to Ellie Pavlick, Tianran Zhang, Carmen Yu, Angela Xing, Chandradeep Pokhariya, Sudarshan Harithas, Hongyu Li, Chaerin Min, Xindi Qu, Xiaoquan Liu, Hao Sun, Melvin He and Brandon Woodard.
GigaHands is released under the Creative Commons Attribution-NonCommercial 4.0 International License. See the LICENSE file for details.