Pose Extraction & Rendering Code for SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
This repository contains the 3D pose extraction & rendering code for SCAIL (Studio-Grade Character Animation via In-Context Learning), a framework that enables high-fidelity character animation under diverse and challenging conditions, including large motion variations, stylized characters, and multi-character interactions. The main repo is at zai-org/SCAIL.
We connect estimated 3D human keypoints according to skeletal topology and represent bones as spatial cylinders. The resulting 3D skeleton is rasterized to obtain 2D motion guidance signals.
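The idea above can be sketched in a few lines. The following is an illustrative toy example, not the repository's actual renderer: it projects camera-space 3D joints with a pinhole camera and rasterizes each bone as a thick 2D segment, approximating the projected silhouette of a spatial cylinder. All joint coordinates, the skeleton topology, and the camera intrinsics here are made up for illustration.

```python
import numpy as np

def project(joints_3d, f=500.0, cx=128.0, cy=128.0):
    """Pinhole projection of (N, 3) camera-space joints to (N, 2) pixels."""
    z = joints_3d[:, 2:3]
    return joints_3d[:, :2] * f / z + np.array([cx, cy])

def rasterize_bones(joints_2d, bones, size=256, radius=3.0):
    """Draw each bone as a thick segment into a (size, size) mask."""
    ys, xs = np.mgrid[0:size, 0:size]
    pix = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2) pixel coords
    canvas = np.zeros((size, size), dtype=np.uint8)
    for a, b in bones:
        p, q = joints_2d[a], joints_2d[b]
        d = q - p
        L2 = float(d @ d) + 1e-8
        # Closest point on segment pq for every pixel, clamped to [0, 1].
        t = np.clip(((pix - p) @ d) / L2, 0.0, 1.0)
        dist = np.linalg.norm(pix - (p + t[..., None] * d), axis=-1)
        canvas[dist <= radius] = 255  # pixels within the "cylinder" radius
    return canvas

# Toy skeleton: head -> neck -> hip, plus two arms attached to the neck.
joints = np.array([[0.0, -0.5, 2.0], [0.0, -0.2, 2.0], [0.0, 0.4, 2.0],
                   [-0.3, 0.0, 2.0], [0.3, 0.0, 2.0]])
bones = [(0, 1), (1, 2), (1, 3), (1, 4)]
mask = rasterize_bones(project(joints), bones)
```

The resulting mask plays the role of the 2D motion guidance signal; the real pipeline additionally encodes per-bone colors and depth ordering.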
When processing multi-character data, we segment each character, extract their poses, and then render them together to achieve multi-character pose extraction.
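The final "render them together" step can be pictured as a simple compositing pass. This is a hypothetical sketch (the helper name and color choices are not the repository's API): each character's skeleton is rendered into its own layer, then the layers are merged into one guidance image with a distinct color per character to reduce identity exchange.

```python
import numpy as np

def composite_layers(layers, colors):
    """layers: list of (H, W) uint8 skeleton masks; colors: matching (R, G, B) tuples."""
    h, w = layers[0].shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for mask, color in zip(layers, colors):
        out[mask > 0] = color  # later characters overwrite earlier ones
    return out

# Two toy single-character renders occupying different parts of the frame.
a = np.zeros((64, 64), dtype=np.uint8); a[10:50, 5:25] = 255
b = np.zeros((64, 64), dtype=np.uint8); b[10:50, 40:60] = 255
guidance = composite_layers([a, b], [(255, 0, 0), (0, 255, 0)])
```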
Our multi-stage pose extraction pipeline provides robust estimations under multi-character interactions, benefiting from NLFPose's reliable depth estimation.
Using this representation, our framework further resolves a common trade-off: pose representations typically cannot both prevent identity leakage and preserve rich motion information.
- 2025.12.16: ❤️ Huge thanks to KJ for the adaptation work: the pose extraction & rendering has been partly adapted to ComfyUI in ComfyUI-SCAIL-Pose! We look forward to ComfyUI support for multi-character tracking (to avoid characters exchanging colors) and for multi-character facial keypoints.
- 2025.12.16: 🔔 NOTE that our code should be able to support as many characters as the segmentation node can handle. By default we use 2, but you can use more if you want. We recommend using different colors for overlapping characters to alleviate ID exchange.
- 2025.12.16: 👀 Our Vision: in this project, we use taichi for skeletal rendering with a fixed color scheme. We welcome the community to modify the nodes to support arbitrary skeletons, such as skeletons generated by text-controlled motion generation (e.g., Being-M0) or skeletons exported from rendering software (Unity, Blender). We hope these improvements will help the SCAIL project build a stronger ecosystem and truly move toward a studio-grade level.
- Inference Code for 3D Pose Extraction & Rendering
- Inference Code for 3D Pose Retarget
- Inference Code for Multi-Human Pose Extraction & Rendering
Make sure you have already cloned the main repo; this repo should be cloned under the main repo folder:
SCAIL/
├── examples
├── sat
├── configs
├── ...
├── SCAIL-Pose
Change dir to this pose extraction & rendering folder:
cd SCAIL-Pose/
We recommend using mmpose for the environment setup; you can refer to the official mmpose installation guide. Note that the example in the guide uses Python 3.8; however, we recommend Python >= 3.10 for compatibility with SAMURAI. Once you have set up the environment, run the following commands to install the required packages.
conda activate openmmlab
pip install -r requirements.txt
# [optional] sam2 is only for multi-human extraction purposes, you can skip this step if you only need single human extraction
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
cd ..
First, download pretrained weights for pose extraction & rendering. The script below downloads NLFPose (torchscript), DWPose (onnx), and YOLOX (onnx) weights. You can also download the weights manually and put them into the pretrained_weights folder.
mkdir pretrained_weights && cd pretrained_weights
# download NLFPose Model Weights
wget https://github.com/isarandi/nlf/releases/download/v0.3.2/nlf_l_multi_0.3.2.torchscript
# download DWPose Model Weights & Detection Model Weights
mkdir DWPose
wget -O DWPose/dw-ll_ucoco_384.onnx \
https://huggingface.co/yzd-v/DWPose/resolve/main/dw-ll_ucoco_384.onnx
wget -O DWPose/yolox_l.onnx \
https://huggingface.co/yzd-v/DWPose/resolve/main/yolox_l.onnx
cd ..
The weights should be formatted as follows:
pretrained_weights/
├── nlf_l_multi_0.3.2.torchscript
└── DWPose/
├── dw-ll_ucoco_384.onnx
└── yolox_l.onnx
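Before moving on, it can be useful to confirm the layout matches. The following is a small hypothetical helper (not part of the repo) that checks the expected weight files exist under pretrained_weights/:

```python
import os

# Relative paths of the weight files listed above.
EXPECTED = [
    "nlf_l_multi_0.3.2.torchscript",
    os.path.join("DWPose", "dw-ll_ucoco_384.onnx"),
    os.path.join("DWPose", "yolox_l.onnx"),
]

def missing_weights(root="pretrained_weights"):
    """Return the relative paths of any expected weight files that are absent."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(root, p))]
```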
[Optional] Then download SAM2 weights for segmentation if you need to use multi-human extraction & rendering. Run the following commands:
cd sam2/checkpoints && \
./download_ckpts.sh && \
cd ../..
Default Extraction & Rendering:
python NLFPoseExtract/process_pose.py --subdir <path_to_the_example_pair> --resolution [512, 896]
Extraction & Rendering using 3D Retarget:
python NLFPoseExtract/process_pose.py --subdir <path_to_the_example_pair> --use_align --resolution [512, 896]
Multi-Human Extraction & Rendering:
python NLFPoseExtract/process_pose_multi.py --subdir <path_to_the_example_pair> --resolution [512, 896]
Note that the examples are in the main repo folder; you can also use your own images or videos. After extraction and rendering, the results are saved in the example folder, and you can continue using that folder to generate character animations in the main repo.
If you find this work useful in your research, please cite:
@article{yan2025scail,
title={SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations},
author={Yan, Wenhao and Ye, Sheng and Yang, Zhuoyi and Teng, Jiayan and Dong, ZhenHui and Wen, Kairui and Gu, Xiaotao and Liu, Yong-Jin and Tang, Jie},
journal={arXiv preprint arXiv:2512.05905},
year={2025}
}