HumanLift: Single-Image 3D Human Reconstruction with 3D-Aware Diffusion Priors and Facial Enhancement
Jie Yang1, Bo-Tao Zhang1,2, Feng-Lin Liu1,2, Hongbo Fu3, Yu-Kun Lai4, Lin Gao1,2
1. Institute of Computing Technology, Chinese Academy of Sciences 2. University of Chinese Academy of Sciences
3. Hong Kong University of Science and Technology 4. Cardiff University
SIGGRAPH ASIA 2025
HumanLift lifts a single reference image to an animatable 3D human, enabling view-consistent, photorealistic full-body image synthesis with high-quality facial details.
Install the basic dependencies for multi-view generation (based on DiffSynth):
```bash
pip install torch torchvision numpy==1.23 Pillow huggingface_hub
```

Then install the following to obtain SMPL condition images:
```bash
# Install PyTorch3D
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

# Install mmcv-full
pip install "mmcv-full>=1.3.17,<1.6.0" -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.1/index.html

# Install mmhuman3d
pip install "git+https://github.com/open-mmlab/mmhuman3d.git"
```

Install the 3D Gaussian Splatting package:
```bash
pip install gsplat
```

To obtain an animatable 3D human, set up the LHM environment and download the pretrained models.
Estimate SMPL-X parameters and render multi-view images from an input image:
```bash
python pose_estimation/video2motion.py \
    --input_path ./images/2.jpg \
    --output_path ./motion \
    --visualize
```

Generate multi-view RGB images using the input image and semantic maps.
Download the required model checkpoints from Google Drive and place them in the `ckpt` directory (refer to `inference_wan_rgb.py` for the expected path structure).
- Copy `./images/` to `data/data/` and rename it to `test`
- Copy `./motion/` to `data/output/` and rename it to `test` (both copy steps are sketched after this list)
- Update the model paths in `inference_wan_rgb.py` (the downloaded Wan2.1-14B and the fine-tuned weights from our Google Drive)
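The two copy/rename steps can be scripted; a minimal sketch, assuming the `./images` and `./motion` folders produced earlier and the `data/` layout named in the list above:

```bash
# Stage the reference images and the estimated SMPL-X motion
# under the layout expected by inference_wan_rgb.py.
mkdir -p data/data data/output
cp -r ./images data/data/test    # reference image(s)
cp -r ./motion data/output/test  # SMPL-X motion from pose estimation
```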
```bash
python inference_wan_rgb.py
```

- Remove backgrounds from the generated RGB images and save them as transparent RGBA
- Pad the images to 832×832 resolution
- Copy the processed images to `3-gs_recon/data/test/images/` and rename them sequentially as `lgt0_r_0000.png` to `lgt0_r_0080.png` (see the sketch after this list)
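One way to script this post-processing, assuming the `rembg` CLI and ImageMagick 7 (`magick`) are installed; the `generated_rgb/` input folder is a hypothetical name:

```bash
mkdir -p 3-gs_recon/data/test/images
i=0
for f in generated_rgb/*.png; do
    # Remove the background (writes an RGBA PNG with a transparent background).
    rembg i "$f" /tmp/rgba.png
    # Pad to 832x832 with transparent borders and rename sequentially.
    magick /tmp/rgba.png -background none -gravity center -extent 832x832 \
        "3-gs_recon/data/test/images/$(printf 'lgt0_r_%04d.png' "$i")"
    i=$((i+1))
done
```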
```bash
python train.py -s data/test -m output/test
```

Edit `train.sh` to set:
- the dataset path (`dataset_path`)
- the Wan2.1-14B model path (see the sketch below)
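For reference, a hypothetical sketch of the relevant lines in `train.sh`; only `dataset_path` is named in this README, so the second variable name and both paths are assumptions:

```bash
# train.sh (assumed layout; adjust to the released script)
dataset_path=data/test            # dataset staged in the reconstruction step
wan_model_path=./ckpt/Wan2.1-14B  # hypothetical variable for the Wan2.1-14B path
```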
```bash
bash train.sh
```
⚠️ This section provides an alternative animation method that may yield lower quality compared to the main reconstruction pipeline.
- Use WeShopAI Fashion Model Pose Change to generate a T‑pose image (image A) with the prompt:
"a full-body portrait of a person standing with arms and legs spread apart".
- Set `IMAGE_INPUT` in `predict.sh` and run: `bash predict.sh`
- The SMPL-rendered images will be saved in `tmp/test/smplimagesrgb`.
- Use Photoshop to align the T‑pose image (image A) with the first SMPL rendering (`000000.png`), producing an aligned reference image (image B).
- Why Photoshop? Current SMPL estimation models are not designed for orthographic camera alignment.
- Place the SMPL renderings and image B into HumanWan-Dit and modify the hyperparameters in `inference_wan_rgb.py`.
- Run: `python inference_wan_rgb.py`
- Remove the backgrounds from all 81 multi-view images.
- Save them in RGBA format with transparency (a batch sketch follows).
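One way to batch both steps, assuming the `rembg` command-line tool is installed; the folder names are hypothetical:

```bash
# rembg's folder mode writes RGBA PNGs with transparent backgrounds
# for every image in the input directory.
rembg p multiview_rgb/ multiview_rgba/
```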
Set the following paths in `inference.sh` (sketched below):
- `IMAGE_INPUT`: path to image A (T‑pose)
- `MOTION_SEQS_DIR`: SMPL motion folder
- `DATASET_DIR`: RGBA multi-view images folder
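For example, the corresponding lines in `inference.sh` might look like this; the variable names follow the list above, while the paths are hypothetical placeholders:

```bash
# inference.sh (assumed values)
IMAGE_INPUT=./images/tpose_A.png  # image A (T-pose)
MOTION_SEQS_DIR=./motion          # SMPL motion folder
DATASET_DIR=./multiview_rgba      # RGBA multi-view images folder
```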
Then run:
```bash
bash inference.sh
```

We thank the following open-source projects:
DiffSynth, LHM,
WeShopAI Fashion Model Pose Change,
gsplat, and many other inspiring works.
If you use this work, please cite our SIGGRAPH Asia 2025 paper. For questions or problems, please open an issue on the repository.
```bibtex
@inproceedings{humanlift2025,
  author = {Yang, Jie and Zhang, Bo-Tao and Liu, Feng-Lin and Fu, Hongbo and Lai, Yu-Kun and Gao, Lin},
  title = {HumanLift: Single-Image 3D Human Reconstruction with 3D-Aware Diffusion Priors and Facial Enhancement},
  year = {2025},
  url = {https://doi.org/10.1145/3757377.3763839},
  doi = {10.1145/3757377.3763839},
  booktitle = {SIGGRAPH Asia 2025 Conference Papers (SA Conference Papers '25)},
  articleno = {31},
  numpages = {12},
  series = {SA Conference Papers '25}
}
```