A minimal solution for hand motion capture from a single color camera at over 100 fps. Easy to use: plug in and run.
This project provides the core components for hand motion capture:
- estimating joint locations from a monocular RGB image (DetNet)
- estimating joint rotations from locations (IKNet)
We focus on:
- ease of use (all you need is a webcam)
- time efficiency (on our 1080Ti, 8.9ms for DetNet, 0.9ms for IKNet)
- robustness to occlusion, hand-object interaction, fast motion, and changes in scale and viewpoint
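As a rough sanity check on the timing figures above, the two reported latencies can be summed to get the per-frame network cost and the frame rate it implies. This is only a sketch: the helper name `fps_from_latencies_ms` is hypothetical and not part of this project, and the calculation ignores camera capture and any pre- or post-processing.

```python
def fps_from_latencies_ms(*latencies_ms):
    """Frame rate implied by running the given stages back to back."""
    total_ms = sum(latencies_ms)
    if total_ms <= 0:
        raise ValueError("total latency must be positive")
    return 1000.0 / total_ms


# Reported numbers on a 1080Ti: 8.9 ms (DetNet) + 0.9 ms (IKNet) = 9.8 ms.
fps = fps_from_latencies_ms(8.9, 0.9)
print(f"{fps:.1f} fps")  # roughly 102 fps, consistent with "over 100 fps"
```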
Some links: [video] [paper] [supp doc] [webpage]
The training code has not been released because the author has not had time to prepare it. That said, the training part should not be difficult to implement. Feel free to open an issue if you run into any problems.
Here is a PyTorch version implemented by @MengHao666. I have not checked it personally, but I believe it is worth trying. Many thanks to @MengHao666!
Here is a project that connects this repo to Unity. It looks very cool, and many thanks to @vinnik-dmitry07!
Please check requirements.txt. All dependencies are available via pip and conda.
- Download the MANO model from here and unzip it.
- In `config.py`, set `OFFICIAL_MANO_PATH` to the left hand model.
- Run `python prepare_mano.py`. This produces the converted MANO model, compatible with this project, at `config.HAND_MESH_MODEL_PATH`.
- Download models from here.
- Put `detnet.ckpt.*` in `model/detnet`, and `iknet.ckpt.*` in `model/iknet`.
- Check `config.py` and make sure all required files are in place.
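The file layout described in the steps above can be checked before launching the demo. The following is a minimal sketch, not code from this project: the function name `missing_checkpoints` and its `extra_files` parameter are hypothetical, and it assumes the default directories named above (`model/detnet` and `model/iknet`) relative to the repository root. If your `config.py` points elsewhere, adjust the paths accordingly.

```python
import glob
import os


def missing_checkpoints(root=".", extra_files=()):
    """Return the patterns/paths under `root` that do not exist on disk."""
    # Checkpoint patterns taken from the setup steps (assumed default layout).
    patterns = [
        os.path.join(root, "model", "detnet", "detnet.ckpt.*"),
        os.path.join(root, "model", "iknet", "iknet.ckpt.*"),
    ]
    missing = [p for p in patterns if not glob.glob(p)]
    # Optional extra paths, e.g. the converted MANO model file.
    missing += [f for f in extra_files if not os.path.exists(f)]
    return missing


for item in missing_checkpoints():
    print("missing:", item)
```

An empty result means every checkpoint pattern matched at least one file. The path of the converted MANO model can be passed through `extra_files` once you know where `config.HAND_MESH_MODEL_PATH` points.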
- Run `python app.py`.
- Put your right hand in front of the camera. The pre-trained model is for the left hand, but the input is flipped internally.
- Press ESC to quit.
- Although the model is robust to varying scales, ideally the image should be 1.3x larger than the hand bounding box. A good bounding box may result in better accuracy. You can track the bounding box using the model's 2D predictions.
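The flipping and bounding-box advice above can be sketched with NumPy. This is an illustration under stated assumptions, not the project's own preprocessing: the function names `flip_horizontal` and `square_crop_from_keypoints` are hypothetical, the 2D predictions are assumed to be an (N, 2) array of (x, y) pixel coordinates, and "1.3x larger" is interpreted here as a square crop whose side is 1.3 times the longer side of the keypoint bounding box.

```python
import numpy as np


def flip_horizontal(image):
    """Mirror an H x W (x C) image left to right, e.g. right hand -> left hand."""
    return image[:, ::-1]


def square_crop_from_keypoints(uv, image_shape, scale=1.3):
    """Square crop box around 2D keypoints, enlarged by `scale`.

    uv: (N, 2) array of (x, y) pixel coordinates (assumed layout).
    image_shape: (height, width, ...) of the frame.
    Returns (x0, y0, x1, y1), clipped to the image bounds.
    """
    uv = np.asarray(uv, dtype=np.float64)
    lo, hi = uv.min(axis=0), uv.max(axis=0)
    center = (lo + hi) / 2.0
    # Half the side of the enlarged square box.
    half = scale * float((hi - lo).max()) / 2.0
    height, width = image_shape[:2]
    x0 = int(max(0, np.floor(center[0] - half)))
    y0 = int(max(0, np.floor(center[1] - half)))
    x1 = int(min(width, np.ceil(center[0] + half)))
    y1 = int(min(height, np.ceil(center[1] + half)))
    return x0, y0, x1, y1


frame = np.zeros((480, 640, 3), dtype=np.uint8)
keypoints = [[100, 100], [200, 150]]  # made-up 2D predictions
x0, y0, x1, y1 = square_crop_from_keypoints(keypoints, frame.shape)
crop = flip_horizontal(frame[y0:y1, x0:x1])
```

To track the hand over time, the box computed from the predictions on one frame can be used to crop the next frame. Near the image border the box is clipped, so the crop may not be exactly square.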
We found that the model may fail on some "simple" poses. We think this is because such poses were not present in the training data. We are working on a v2 with further extended data to tackle this problem.
Please check wrappers.py.
We also provide an optimization-based IK solver here.
The detection model is trained with the following datasets:
The IK model is trained with the poses shipped with MANO.
This is the official implementation of the paper "Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data" (CVPR 2020).
The quantitative numbers reported in the paper can be found in plot.py.
If you find the project helpful, please consider citing us:
@InProceedings{zhou2020monocular,
  author = {Zhou, Yuxiao and Habermann, Marc and Xu, Weipeng and Habibie, Ikhsanul and Theobalt, Christian and Xu, Feng},
  title = {Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}