Pose Extraction & Rendering Code for SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
This repository works as a submodule of SCAIL (Studio-Grade Character Animation via In-Context Learning), a framework that enables high-fidelity character animation under diverse and challenging conditions, including large motion variations, stylized characters, and multi-character interactions. Please follow the instructions in that repository to extract and render poses.
We connect estimated 3D human keypoints according to skeletal topology and represent bones as spatial cylinders. The resulting 3D skeleton is rasterized to obtain 2D motion guidance signals.
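As a rough illustration of this idea (not the renderer shipped in this repository), the sketch below projects 3D bones with a pinhole camera and rasterizes them with a z-buffer so that nearer bones occlude farther ones. Each bone is approximated by discs splatted along its axis, a cheap stand-in for a true cylinder. The function names, the two-bone topology, and the camera parameters (`focal`, `radius`, `size`) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical topology: two independent bones (joint index pairs).
BONES = [(0, 1), (2, 3)]

def project(points, focal, center):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels."""
    z = points[:, 2:3]
    return points[:, :2] * focal / z + center

def render_skeleton(joints, bones, size=64, focal=64.0, radius=0.05, samples=64):
    """Rasterize bones as depth-tested capsules (approximate cylinders).

    Returns (bone_id_image, depth_image); bone_id is -1 where nothing is drawn.
    """
    center = np.array([size / 2.0, size / 2.0])
    ids = np.full((size, size), -1, dtype=np.int64)
    depth = np.full((size, size), np.inf)
    ys, xs = np.mgrid[0:size, 0:size]
    for bone_id, (a, b) in enumerate(bones):
        # Sample points along the 3D bone axis and splat one disc per sample.
        for t in np.linspace(0.0, 1.0, samples):
            p = (1.0 - t) * joints[a] + t * joints[b]
            u, v = project(p[None], focal, center)[0]
            r_px = radius * focal / p[2]  # radius shrinks with distance
            mask = (xs - u) ** 2 + (ys - v) ** 2 <= r_px ** 2
            closer = mask & (p[2] < depth)  # z-buffer test
            ids[closer] = bone_id
            depth[closer] = p[2]
    return ids, depth

# A vertical bone at depth 2 crossing in front of a horizontal bone at depth 3.
joints = np.array([
    [0.0, -0.3, 2.0], [0.0, 0.3, 2.0],
    [-0.3, 0.0, 3.0], [0.3, 0.0, 3.0],
])
ids, depth = render_skeleton(joints, BONES)
```

Where the two bones overlap in the image, the pixel keeps the identity of the nearer bone. A purely 2D stick figure cannot encode that occlusion order.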
When processing multi-character data, we segment each character, extract their poses, and then render them together to achieve multi-character pose extraction.
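One plausible way to "render them together" is to render each character into its own depth map and keep, per pixel, the character closest to the camera. The sketch below shows only that merging step, under the assumption that per-character depth maps share one camera; `composite_by_depth` is a hypothetical helper, not a function from this repository.

```python
import numpy as np

def composite_by_depth(depth_layers):
    """Merge per-character depth maps of identical shape.

    Each layer holds camera-space depth, with np.inf where the character
    is absent. Returns (owner, depth): owner is the index of the nearest
    character per pixel, or -1 where no character is present.
    """
    stack = np.stack(depth_layers)      # (num_characters, H, W)
    owner = np.argmin(stack, axis=0)    # nearest character per pixel
    depth = np.min(stack, axis=0)
    owner[np.isinf(depth)] = -1         # background pixels
    return owner, depth

inf = np.inf
character_a = np.array([[1.0, inf], [inf, inf]])
character_b = np.array([[2.0, 3.0], [inf, inf]])
owner, depth = composite_by_depth([character_a, character_b])
```

In this toy input both characters cover the top-left pixel, and character 0 wins because it is nearer; the top-right pixel belongs to character 1, and the bottom row stays background.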
Our multi-stage pose extraction pipeline provides robust estimates under multi-character interactions, benefiting from NLFPose’s reliable depth estimation.
Using this representation, our framework further addresses the challenge that pose representations typically cannot both prevent identity leakage and preserve rich motion information.
If you find this work useful in your research, please cite:
@article{yan2025scail,
title={SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations},
author={Yan, Wenhao and Ye, Sheng and Yang, Zhuoyi and Teng, Jiayan and Dong, ZhenHui and Wen, Kairui and Gu, Xiaotao and Liu, Yong-Jin and Tang, Jie},
journal={arXiv preprint arXiv:2512.05905},
year={2025}
}