PyTorch implementation of SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets.
- Release the pretrained model and inference code.
- Release the training code (VAE and DiT).
- Release the HGS-1M dataset with processing pipeline (contact [email protected]).
- Install the dependencies
# xformers is required! please refer to https://github.com/facebookresearch/xformers for details.
# for example, we use torch 2.1.0 + cuda 11.8
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -U xformers --index-url https://download.pytorch.org/whl/cu118
# a modified gaussian splatting (+ depth, alpha rendering)
git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization
git clone https://github.com/graphdeco-inria/gaussian-splatting.git
# for mesh extraction
pip install git+https://github.com/NVlabs/nvdiffrast
# other dependencies
pip install -r requirements.txt
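After installation, a quick sanity check can confirm the key packages are present. This is a minimal stdlib-only sketch (the package names follow the pip commands above; adjust if your environment differs):

```python
# Quick sanity check that the dependencies installed above are present.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg in ("torch", "torchvision", "xformers",
                "diff-gaussian-rasterization", "nvdiffrast"):
        v = installed_version(pkg)
        print(f"{pkg}: {v if v else 'NOT INSTALLED'}")
```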
- Download the SMPL-X model and put it into the folder
/core/modules/deformers/smplx/SMPLX. Then, download the following files (refer to E3Gen) and place them in /core/modules/deformers/template:
- SMPL-X segmentation file (smplx_vert_segmentation.json)
- SMPL-X UV (smplx_uv.obj)
- SMPL-X FLAME correspondence (SMPL-X__FLAME_vertex_ids.npy)
- FLAME mesh template with mouth (head_template_mesh_mouth.obj)
- FLAME mesh template (head_template.obj)
- FLAME mask (FLAME_masks.pkl)
- Extract templates
cd core/modules/deformers
# preprocess UV; obtain a new UV for smplx_mouth.obj
python preprocess_smplx.py
# save the subdivided SMPL-X mesh and its corresponding UV
python subdivide_smplx.py
# save parameters for initialization
python utils_smplx.py
python utils_uvpos.py
- Download the pretrained weights. Our VAE and DiT weights can be downloaded from Hugging Face:
# VAE
cd ckpt/autoencoder
wget https://huggingface.co/Mr-Hang/SIGMAN/resolve/main/autoencoder.safetensors
# DiT
cd ../transformer
wget https://huggingface.co/Mr-Hang/SIGMAN/resolve/main/transformer.safetensors
# Image Encoder
mkdir -p ../sapiens_1b
cd ../sapiens_1b
wget https://huggingface.co/facebook/sapiens-pretrain-1b-torchscript/resolve/main/sapiens_1b_epoch_173_torchscript.pt2
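For reference, Hugging Face /blob/ URLs point at the HTML viewer page rather than the raw file; wget and curl need the /resolve/ form, as used above. A tiny helper (ours, not part of the repo) converts one to the other:

```python
def to_resolve_url(url: str) -> str:
    """Convert a huggingface.co /blob/ page URL to the /resolve/ download URL.

    /blob/ serves the web file viewer; /resolve/ serves the raw file bytes,
    which is what wget/curl need.
    """
    return url.replace("/blob/", "/resolve/", 1)

print(to_resolve_url(
    "https://huggingface.co/Mr-Hang/SIGMAN/blob/main/autoencoder.safetensors"))
# https://huggingface.co/Mr-Hang/SIGMAN/resolve/main/autoencoder.safetensors
```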
# Generation
python scripts/test_DiT.py --image_path demo/images/demo.jpg --pose_path demo/poses/smplx_demo.npz
# Eval of VAE
python scripts/test_vae.py
We use an .npy file to record the root directory of every item used for training and evaluation. To train the model, organize each data item in the following format:
├── UV
├── camera_full_calibration.json
├── smplx.npz
├── rgb_map
│   ├── 0000.jpg
│   ├── 0001.jpg
│   ├── ...
│   └── 0090.jpg
└── mask_map
    ├── 0000.png
    ├── 0001.png
    ├── ...
    └── 0090.png
To obtain the UV maps for training the VAE, please refer to:
bash core/proj_UV/runs.sh
Then, train the VAE or DiT:
# VAE
# --disc_start specifies the training step at which the GAN loss is enabled
accelerate launch --config_file configs/training.yaml --main_process_ip=${MASTER_ADDR} --main_process_port=${MASTER_PORT} --machine_rank=${RANK} \
train_vae.py vae_b --workspace /output_folder --batch_size 8 --wandb_name xxx --disc_start xxx
# DiT
accelerate launch --config_file configs/training.yaml --main_process_ip=${MASTER_ADDR} --main_process_port=${MASTER_PORT} --machine_rank=${RANK} \
train_DiT.py DiT --workspace /output_folder --batch_size xxx
@article{yang2025sigman,
title={SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets},
author={Yang, Yuhang and Liu, Fengqi and Lu, Yixing and Zhao, Qin and Wu, Pingyu and Zhai, Wei and Yi, Ran and Cao, Yang and Ma, Lizhuang and Zha, Zheng-Jun and others},
journal={arXiv preprint arXiv:2504.06982},
year={2025}
}