Pose Extraction & Rendering Code for SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
This repository contains the 3D pose extraction & rendering code for SCAIL (Studio-Grade Character Animation via In-Context Learning), a framework that enables high-fidelity character animation under diverse and challenging conditions, including large motion variations, stylized characters, and multi-character interactions.
When processing multi-person data, we segment each person, extract their poses individually, and then render them together to obtain the multi-person pose representation.
Our multi-stage pose extraction pipeline provides robust pose estimates under multi-person interactions.
By using 3D poses instead of 2D keypoint-based methods, our model can recognize occlusion relationships and preserve motion characteristics during augmentation and retargeting.
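A minimal conceptual sketch of this flow is below. The helper names (`segment_persons`, `extract_3d_pose`, `render_poses_3d`) are hypothetical placeholders for illustration only, not the actual SCAIL-Pose API; in the released code the corresponding steps are handled by the segmentation model, NLFPose, and the renderer.

```python
# Hypothetical sketch of the multi-person flow: segment -> per-person 3D pose -> joint render.
# These helpers are placeholders for illustration only, not the real SCAIL-Pose functions.
from typing import Any, List

def segment_persons(frame: Any) -> List[Any]:
    """Return one instance mask per person (e.g. from a SAM2-style segmenter)."""
    raise NotImplementedError

def extract_3d_pose(frame: Any, mask: Any) -> Any:
    """Return a 3D pose for the masked person (e.g. from an NLF-style estimator)."""
    raise NotImplementedError

def render_poses_3d(poses: List[Any], image_size: Any) -> Any:
    """Render all 3D poses into one image; depth ordering preserves occlusions."""
    raise NotImplementedError

def process_multi_person(frames: List[Any], image_size: Any) -> List[Any]:
    rendered = []
    for frame in frames:
        masks = segment_persons(frame)                        # one mask per person
        poses = [extract_3d_pose(frame, m) for m in masks]    # per-person 3D pose
        rendered.append(render_poses_3d(poses, image_size))   # render all poses jointly
    return rendered
```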
- Inference Code for 3D Pose Extraction & Rendering
- Inference Code for 3D Pose Retarget
- Inference Code for Multi-Human Pose Extraction & Rendering
- Further Support of SAM3 & SAM3D
Make sure you have already cloned the main repo; this repo should be cloned under the main repo folder:
SCAIL/
├── examples
├── sat
├── configs
├── ...
└── SCAIL-Pose
Change dir to this pose extraction & rendering folder:
cd SCAIL-Pose/
We recommend using mmpose for the environment setup; you can refer to the official mmpose installation guide. Note that the example in the guide uses Python 3.8, but we recommend Python >= 3.10 for compatibility with SAMURAI. Once you have set up the environment, install the required packages with the following commands:
conda activate openmmlab
pip install -r requirements.txt
# [optional] SAM2 is only needed for multi-human extraction; skip this step if you only need single-human extraction
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
cd ..

First, download the pretrained weights for pose extraction & rendering. The commands below download the NLFPose (TorchScript), DWPose (ONNX), and YOLOX (ONNX) weights. You can also download the weights manually and put them into the pretrained_weights folder.
mkdir pretrained_weights && cd pretrained_weights
# download NLFPose Model Weights
wget https://github.com/isarandi/nlf/releases/download/v0.3.2/nlf_l_multi_0.3.2.torchscript
# download DWPose Model Weights & Detection Model Weights
mkdir DWPose
wget -O DWPose/dw-ll_ucoco_384.onnx \
https://huggingface.co/yzd-v/DWPose/resolve/main/dw-ll_ucoco_384.onnx
wget -O DWPose/yolox_l.onnx \
https://huggingface.co/yzd-v/DWPose/resolve/main/yolox_l.onnx
cd ..

The weights should be organized as follows:
pretrained_weights/
├── nlf_l_multi_0.3.2.torchscript
└── DWPose/
├── dw-ll_ucoco_384.onnx
└── yolox_l.onnx
[Optional] Then download SAM2 weights for segmentation if you need to use multi-human extraction & rendering. Run the following commands:
cd sam2/checkpoints && \
./download_ckpts.sh && \
cd ../..

Default Extraction & Rendering:
python NLFPoseExtract/process_pose.py --subdir <path_to_the_example_pair> --resolution [512, 896]
Extraction & Rendering using 3D Retarget:
python NLFPoseExtract/process_pose.py --subdir <path_to_the_example_pair> --use_align --resolution [512, 896]
Multi-Human Extraction & Rendering:
python NLFPoseExtract/process_pose_multi.py --subdir <path_to_the_example_pair> --resolution [512, 896]
Note that the examples are in the main repo folder; you can also use your own images or videos. After extraction and rendering, the results are saved in the example folder, and you can continue to use that folder to generate character animations in the main repo.
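If you want to process several example pairs in one go, a small wrapper like the one below works. The `../examples` path and the loop are assumptions for illustration; the command itself mirrors the default extraction & rendering call above:

```python
# Hypothetical batch wrapper: run default extraction & rendering over several
# example pairs. The ../examples path is an assumption; adjust to your layout,
# and add flags such as --use_align if you want 3D retargeting.
import subprocess
from pathlib import Path

EXAMPLES_ROOT = Path("../examples")  # examples live in the main repo folder

for pair_dir in sorted(p for p in EXAMPLES_ROOT.iterdir() if p.is_dir()):
    cmd = f"python NLFPoseExtract/process_pose.py --subdir {pair_dir} --resolution [512, 896]"
    print("Running:", cmd)
    subprocess.run(cmd, shell=True, check=True)
```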
If you find this work useful in your research, please cite:
@article{yan2025scail,
title={SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations},
author={Yan, Wenhao and Ye, Sheng and Yang, Zhuoyi and Teng, Jiayan and Dong, ZhenHui and Wen, Kairui and Gu, Xiaotao and Liu, Yong-Jin and Tang, Jie},
journal={arXiv preprint arXiv:2512.05905},
year={2025}
}