This repo, together with image-play and pose-hg-train (branch image-play), holds the code for reproducing the results in the following paper:
Forecasting Human Dynamics from Static Images
Yu-Wei Chao, Jimei Yang, Brian Price, Scott Cohen, Jia Deng
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Check out the project site for more details.
- The role of this repo is to implement training step 2 (Sec. 3.3), i.e. pre-training a 3D skeleton converter to recover 3D joint locations from 2D heatmaps.
- This is later used to initialize the 3D skeleton converter sub-network in training step 3 (Sec. 3.3), i.e. training the full system.
Please cite Skeleton2D3D if it helps your research:
@INPROCEEDINGS{chao:cvpr2017,
author = {Yu-Wei Chao and Jimei Yang and Brian Price and Scott Cohen and Jia Deng},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
title = {Forecasting Human Dynamics from Static Images},
year = {2017},
}
This repo contains one submodule (pose-hg-train), so make sure you clone with --recursive:
git clone --recursive https://github.com/ywchao/skeleton2d3d.git
- Download Pre-Computed Models and Prediction
- Dependencies
- Setting Up Human3.6M
- Setting Up Penn Action
- Training 3D Skeleton Converter on Ground-Truth Heatmaps
- Fine-Tuning 3D Skeleton Converter on Predicted Heatmaps
- Comparison with Zhou et al. [40]
- Evaluation
If you just want to run the training of the full system, i.e. image-play, you can simply download the pre-computed models and prediction (108M) and skip the remaining content.
./scripts/fetch_s2d3d_models_prediction.sh
./scripts/setup_symlinks_models.sh
This will populate the exp folder with precomputed_s2d3d_models_prediction and set up a set of symlinks.
You can also now set up Human3.6M and run the evaluation demo with the downloaded prediction. This will ensure exact reproduction of the paper's results.
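To confirm that the symlinks were created, something like the following Python sketch can list every symbolic link under exp together with its target. The helper name list_symlinks is hypothetical and not part of this repo; only the exp folder name comes from the text above.

```python
import os


def list_symlinks(root):
    """Return sorted (link_path, target) pairs for every symlink under root."""
    links = []
    # os.walk does not follow directory symlinks by default, so links to
    # folders show up in dirnames and links to files in filenames.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append((path, os.readlink(path)))
    return sorted(links)


if __name__ == "__main__":
    # "exp" is the folder populated by the download scripts.
    for link, target in list_symlinks("exp"):
        print(f"{link} -> {target}")
```

An empty listing suggests that the setup script has not been run from the repo root.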
To proceed to the remaining content, make sure the following are installed.
- Torch7
- We used commit bd5e664 (2016-10-17) with CUDA 8.0.27 RC and cuDNN v5.1 (cudnn-8.0-linux-x64-v5.1).
- All our models were trained on a GeForce GTX TITAN X GPU.
- matio-ffi
- torch-hdf5
- MATLAB
The Human3.6M dataset is used for training and evaluation.
- Download the Human3.6M dataset. Only the Poses RawAngles and Videos files are required:
Poses_RawAngles_S1.tgz Poses_RawAngles_S5.tgz Poses_RawAngles_S6.tgz Poses_RawAngles_S7.tgz Poses_RawAngles_S8.tgz Poses_RawAngles_S9.tgz Poses_RawAngles_S11.tgz Videos_S1.tgz Videos_S5.tgz Videos_S6.tgz Videos_S7.tgz Videos_S8.tgz Videos_S9.tgz Videos_S11.tgz
Place these files under external/Human3.6M.
- Extract the files:
for i in external/Human3.6M/*.tgz; do tar zxvf $i -C external/Human3.6M; done
This will populate the external/Human3.6M folder with S1, S5, S6, S7, S8, S9, and S11.
- Download the Human3.6M dataset code:
./h36m_utils/fetch_h36m_code.sh
This will populate the h36m_utils folder with Release-v1.1.
- Generate meta files. Start MATLAB by running matlab under skeleton2d3d. You should see the message "added paths for the experiment!" followed by the MATLAB prompt >>. Run the following command:
H36MDataBase.instance;
Set the data path to external/Human3.6M and the config file directory to h36m_utils/Release-v1.1. This will create a new file H36M.conf under skeleton2d3d.
- Preprocess data for training and evaluation:
matlab -r "generate_data_h36m; quit"
This will populate the data/h36m folder with frames, train.mat, and val.mat.
- Optional: Visualize 3D pose sequences:
matlab -r "vis_3d_pose; quit"
The output will be saved in output/vis_3d_pose.
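Before extracting, it can help to check that all 14 archives are in place. The following Python sketch builds the expected file names from the list above, reports any that are missing, and otherwise extracts them as the tar loop does. The function names are hypothetical helpers and not part of this repo.

```python
import glob
import os
import tarfile

# Subject IDs and prefixes taken from the archive names listed above.
SUBJECTS = [1, 5, 6, 7, 8, 9, 11]
PREFIXES = ["Poses_RawAngles", "Videos"]


def expected_archives():
    """Return the 14 archive names required for setup."""
    return [f"{p}_S{s}.tgz" for p in PREFIXES for s in SUBJECTS]


def missing_archives(root):
    """Return the expected archives that are not present under root."""
    return [n for n in expected_archives()
            if not os.path.isfile(os.path.join(root, n))]


def extract_all(root):
    """Extract every .tgz under root into root, like the shell loop."""
    for path in sorted(glob.glob(os.path.join(root, "*.tgz"))):
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(root)


if __name__ == "__main__":
    root = os.path.join("external", "Human3.6M")
    missing = missing_archives(root)
    if missing:
        print("Missing archives:", ", ".join(missing))
    else:
        extract_all(root)
```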
The Penn Action dataset is used for running prediction.
- Download the Penn Action dataset to external. external should contain Penn_Action.tar.gz. Extract the files:
tar zxvf external/Penn_Action.tar.gz -C external
This will populate the external folder with a folder Penn_Action with frames, labels, tools, and README.
- Preprocess Penn Action by cropping the images:
matlab -r "prepare_penn_crop; quit"
This will populate the data/penn-crop folder with frames and labels.
- Generate validation set and preprocess annotations:
matlab -r "generate_valid_penn; quit"
python tools/preprocess.py
This will populate the data/penn-crop folder with valid_ind.txt, train.h5, val.h5, and test.h5.
We begin with training a 3D skeleton converter on Human3.6M. As the first step, we use ground-truth heatmaps as input to the network.
- Before starting, make sure to remove the symlinks from the download section, if any:
find exp -type l -delete
- Optional: Visualize training examples. Each example consists of input ground-truth heatmaps and a ground-truth 3D pose. The heatmaps are artificially generated by projecting the 3D pose onto the image plane. This is done in Torch7 each time we load a training sample. We provide a way to visualize this process in MATLAB:
matlab -r "vis_pose_proj; quit"
The output will be saved in output/vis_pose_proj.
- Start training:
./scripts/h36m/res-64.sh $GPU_ID
The output will be saved in exp/h36m/res-64.
- Optional: Visualize training loss and accuracy:
matlab -r "plot_loss_err; quit"
The output will be saved to output/plot_res-64.pdf.
- Optional: Visualize prediction on a subset of the validation set:
matlab -r "vis_preds_h36m; quit"
The output will be saved in output/vis_res-64/h36m_val. The predicted pose is colored in blue, green, and red, and the ground-truth pose is colored in black, cyan, and magenta.
- Optional: Run prediction on Penn Action. Given the Human3.6M trained model, we can run prediction on Penn Action. Again, we use ground-truth heatmaps as input to the network. Note that Penn Action contains unlabeled joints, which will introduce empty heatmaps that were not seen during training.
./scripts/penn-crop/res-64-pred.sh $GPU_ID
The output will be saved in exp/penn-crop/res-64.
- Optional: Visualize prediction on a subset of the validation set:
matlab -r "vis_preds_penn; quit"
The output will be saved in output/vis_res-64/penn_val.
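The ground-truth heatmap generation described above is implemented in Torch7 in this repo. As a rough illustration only, the idea can be sketched in pure Python. The pinhole camera model, focal length, principal point, 64x64 resolution, and Gaussian width below are all assumptions chosen for the example, not the values used by the training code. An unlabeled joint (None) yields an all-zero map, mirroring the empty heatmaps that arise on Penn Action.

```python
import math


def project(joint, focal=100.0, center=(32.0, 32.0)):
    """Pinhole projection of a 3D point (x, y, z), z > 0, to pixel (u, v).

    The camera parameters are illustrative assumptions.
    """
    x, y, z = joint
    return (focal * x / z + center[0], focal * y / z + center[1])


def gaussian_heatmap(uv, size=64, sigma=1.0):
    """Render a size x size map with a Gaussian peak at pixel uv.

    uv=None stands for an unlabeled joint and gives an all-zero map.
    """
    if uv is None:
        return [[0.0] * size for _ in range(size)]
    u, v = uv
    return [[math.exp(-((c - u) ** 2 + (r - v) ** 2) / (2.0 * sigma ** 2))
             for c in range(size)] for r in range(size)]


def pose_to_heatmaps(pose3d, size=64):
    """One heatmap per joint; None entries stand for unlabeled joints."""
    return [gaussian_heatmap(None if j is None else project(j), size)
            for j in pose3d]


def argmax2d(heatmap):
    """Return (row, col) of the largest value in a 2D list."""
    best = max((val, r, c)
               for r, row in enumerate(heatmap)
               for c, val in enumerate(row))
    return best[1], best[2]


if __name__ == "__main__":
    # Two joints: one labeled, one unlabeled.
    heatmaps = pose_to_heatmaps([(0.5, -0.25, 5.0), None])
    print("peak of joint 0 at (row, col):", argmax2d(heatmaps[0]))
    print("joint 1 map is empty:", all(v == 0.0 for row in heatmaps[1] for v in row))
```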
Rather than ground-truth heatmaps, the 3D skeleton converter is often expected to take predicted heatmaps as input. We next fine-tune the pre-trained 3D skeleton converter on heatmaps produced by an hourglass network.
- Obtain a trained hourglass model. This is done with the submodule pose-hg-train.
Option 1 (recommended): Download pre-computed hourglass models (50M):
cd pose-hg-train
./scripts/fetch_hg_models.sh
./scripts/setup_symlinks_models.sh
cd ..
This will populate the pose-hg-train/exp folder with precomputed_hg_models and set up a set of symlinks.
Option 2: Train your own models.
- Start training:
./scripts/h36m/hg-256-res-64-hg0-hgfix.sh $GPU_ID
The output will be saved in exp/h36m/hg-256-res-64-hg0-hgfix.
- Optional: Visualize training loss, error, and accuracy:
matlab -r "plot_loss_err_acc; quit"
The output will be saved to output/plot_hg-256-res-64-hg0-hgfix.pdf.
- Optional: Visualize prediction on a subset of the validation set:
matlab -r "vis_preds_h36m_hg; quit"
The output will be saved in output/vis_hg-256-res-64-hg0-hgfix/h36m_val. The predicted pose is colored in blue, green, and red, and the ground-truth pose is colored in black, cyan, and magenta.
- Optional: Run prediction on Penn Action. Again, rather than using ground-truth heatmaps as in the last section, we use predicted heatmaps as input here.
./scripts/penn-crop/hg-256-res-64-hg0-hgfix-pred.sh $GPU_ID
The output will be saved in exp/penn-crop/hg-256-res-64-hg0-hgfix.
- Optional: Visualize prediction on a subset of the validation set:
matlab -r "vis_preds_penn_hg; quit"
The output will be saved in output/vis_hg-256-res-64-hg0-hgfix/penn_val.
This demo shows how we compare 3D pose recovery with Zhou et al. [40] in the paper (Sec. 4.2).
- Fine-tune the hourglass network on Human3.6M. We will use the hourglass output as input to Zhou et al.'s method. Our goal is to evaluate the 3D pose output on Human3.6M. Since the hourglass model from pose-hg-train is trained on MPII and Penn Action, we first fine-tune it on Human3.6M:
./scripts/h36m/hg-256.sh $GPU_ID
The output will be saved in exp/h36m/hg-256.
- Run prediction:
./scripts/h36m/hg-256-pred.sh $GPU_ID
The output will be saved in exp/h36m/hg-256.
- Download Zhou et al.'s MATLAB code:
./shapeconvex/fetch_shapeconvex.sh
This will populate the shapeconvex folder with release.
- Learn pose dictionary on Human3.6M:
matlab -r "shapeconvex_dl; quit"
The output will be saved to shapeconvex/shapeDict_h36m.mat.
- Run 3D pose estimation:
matlab -r "shapeconvex_run; quit"
The output will be saved to shapeconvex/res_hg-256-pred/h36m_val.
- Optional: Visualize prediction on a subset of the validation set:
matlab -r "shapeconvex_vis; quit"
The output will be saved to shapeconvex/vis_hg-256-pred/h36m_val.
- Finally, for a fair comparison, we also need to fine-tune our 3D skeleton converter using the fine-tuned hourglass.
./scripts/h36m/hg-256-res-64-hg1-hgfix.sh $GPU_ID
The output will be saved in exp/h36m/hg-256-res-64-hg1-hgfix.
This demo runs the MATLAB evaluation script and reproduces our results in the paper (Tab. 2). If you are using the pre-computed prediction and also want to output Zhou et al.'s results, make sure to first run steps 3, 4, and 5 in the last section.
- Compute mean per joint position errors (MPJPE):
matlab -r "eval_run; quit"
This will print out the MPJPE values.
- Optional: Visualize Zhou et al.'s and our results.
matlab -r "vis_run; quit"
The output will be saved in evaluation/shapeconvex and evaluation/skeleton2d3d.
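For reference, the mean per joint position error is the Euclidean distance between each predicted 3D joint and its ground-truth counterpart, averaged over all joints and poses. The Python sketch below illustrates only that definition. It is not the MATLAB evaluation script, and it applies no pose alignment or unit conversion, since the text above does not state whether eval_run does either.

```python
import math


def mean_per_joint_position_error(pred, gt):
    """Average Euclidean distance between matching 3D joints.

    pred and gt are lists of poses; each pose is a list of (x, y, z) joints.
    """
    total, count = 0.0, 0
    for pose_pred, pose_gt in zip(pred, gt):
        for joint_pred, joint_gt in zip(pose_pred, pose_gt):
            total += math.dist(joint_pred, joint_gt)
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


if __name__ == "__main__":
    # One pose with two joints: errors of 5 and 0 average to 2.5.
    pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
    gt = [[(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]]
    print(mean_per_joint_position_error(pred, gt))
```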