Official repository for the paper
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models, CVPR 2025 (Oral).
Felix Taubner (1,2), Ruihang Zhang (1), Mathieu Tuli (3), David B. Lindell (1,2)
(1) University of Toronto, (2) Vector Institute, (3) LG Electronics
TL;DR: CAP4D turns any number of reference images into an animatable avatar.
# 1. Clone repo
git clone https://github.com/felixtaubner/cap4d/
cd cap4d
# 2. Create conda environment for CAP4D:
conda create --name cap4d_env python=3.10
conda activate cap4d_env
# 3. Install requirements
pip install -r requirements.txt
# 4. Set python path
export PYTHONPATH=$(realpath "./"):$PYTHONPATH
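Optionally, you can store PYTHONPATH inside the conda environment so you don't have to re-export it in every new shell (a convenience; conda env config vars requires conda >= 4.8):
# Persist the repo path in the cap4d_env environment
conda env config vars set PYTHONPATH=$(realpath "./")
# Re-activate the environment so the variable takes effect
conda activate cap4d_env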
Next, install PyTorch3D following its installation instructions. Make sure to install it with CUDA support; we recommend installing from source:
export FORCE_CUDA=1
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
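To sanity-check the PyTorch3D build (a minimal check, assuming the cap4d_env environment is active):
# Should print the PyTorch3D version and True if CUDA is available
python -c "import torch, pytorch3d; print(pytorch3d.__version__, torch.cuda.is_available())"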
Set up your FLAME account at the FLAME website and set the username and password environment variables:
export FLAME_USERNAME=your_flame_user_name
export FLAME_PWD=your_flame_password
Download the FLAME and MMDM weights using the provided scripts:
# 1. Download FLAME blendshapes
# set your flame username and password
bash scripts/download_flame.sh
# 2. Download CAP4D MMDM weights
bash scripts/download_mmdm_weights.sh
If the FLAME download script did not work, download FLAME2023 from the FLAME website and place flame2023_no_jaw.pkl in data/assets/flame/.
Then, fix the FLAME pkl file to be compatible with newer numpy versions:
python scripts/fixes/fix_flame_pickle.py --pickle_path data/assets/flame/flame2023_no_jaw.pkl
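To verify the fix, try loading the patched file under your installed numpy version (a quick check, assuming the model file is a plain dict as in standard FLAME releases):
# Should print the FLAME model keys without raising an error
python -c "import pickle; d = pickle.load(open('data/assets/flame/flame2023_no_jaw.pkl', 'rb'), encoding='latin1'); print(sorted(d.keys()))"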
Run the pipeline in debug settings to test the installation:
bash scripts/test_pipeline.sh
Check whether a video is exported to examples/debug_output/tesla/sequence_00/renders.mp4.
If it appears to show a blurry cartoon Nikola Tesla, you're all set!
Run the provided scripts to generate and animate the example avatars, each with a single command:
bash scripts/generate_felix.sh
bash scripts/generate_lincoln.sh
bash scripts/generate_tesla.sh
The output directories contain exported animations which you can view in real time.
Open the real-time viewer in your browser (powered by Brush). Click Load file and
upload the exported animation found in examples/output/{SUBJECT}/animation_{ID}/exported_animation.ply.
See below for how to run your custom inference on your own reference images/videos and driving videos.
Coming soon! For now, only generation using the provided identities with precomputed FlowFace annotations is supported.
Install Pixel3DMM using the provided script. Note that this step is prone to errors due to package version mismatches; please report any errors as an issue!
export FLAME_USERNAME=your_flame_user_name
export FLAME_PWD=your_flame_password
export PIXEL3DMM_PATH=$(realpath "../PATH/TO/pixel3dmm") # set this to where you would like to clone the Pixel3DMM repo (absolute path)
export CAP4D_PATH=$(realpath "./") # set this to the cap4d directory (absolute path)
bash scripts/install_pixel3Dmm.sh
Run tracking and conversion on reference images/videos using the provided script. Note: if the input is a directory of frames, it is assumed to be a discontinuous set of (monocular!) images. If the input is a file, it is assumed to be a continuous monocular video.
export PIXEL3DMM_PATH=$(realpath "../PATH/TO/pixel3dmm")
export CAP4D_PATH=$(realpath "./")
mkdir -p examples/output/custom/
# For more information on the arguments:
bash scripts/track_video_pixel3dmm.sh --help
# Process a directory of (reference) images
bash scripts/track_video_pixel3dmm.sh examples/input/felix/images/cam0/ examples/output/custom/reference_tracking/
# Optional: process a driving (or reference) video
bash scripts/track_video_pixel3dmm.sh examples/input/animation/example_video.mp4 examples/output/custom/driving_video_tracking/
Note that results will be slightly worse than with FlowFace tracking, since the MMDM is trained with FlowFace.
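After tracking finishes, the output directories should contain the fitted FLAME sequence (fit.npz) and camera trajectory files used by the animation step below; a quick way to check (exact file names may vary):
# Inspect the tracking outputs
ls examples/output/custom/reference_tracking/
ls examples/output/custom/driving_video_tracking/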
# Generate images with a single reference image
python cap4d/inference/generate_images.py --config_path configs/generation/default.yaml --reference_data_path examples/output/custom/reference_tracking/ --output_path examples/output/custom/mmdm/
Note: the generation script will use all visible CUDA devices. The more devices available, the faster it runs! This will take hours and requires lots of RAM (ideally > 64 GB) to run smoothly.
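If you want to restrict generation to specific GPUs, limit the visible devices with the standard CUDA environment variable (the device indices below are just an example):
# Run image generation on GPUs 0 and 1 only
CUDA_VISIBLE_DEVICES=0,1 python cap4d/inference/generate_images.py --config_path configs/generation/default.yaml --reference_data_path examples/output/custom/reference_tracking/ --output_path examples/output/custom/mmdm/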
# Train the 4D Gaussian avatar on the reference and generated images
python gaussianavatars/train.py --config_path configs/avatar/default.yaml --source_paths examples/output/custom/mmdm/reference_images/ examples/output/custom/mmdm/generated_images/ --model_path examples/output/custom/avatar/ --interval 5000
Once the avatar is generated, it can be animated with the driving video computed in step 1 or with the provided animations.
# Animate the avatar with provided animation files
python gaussianavatars/animate.py --model_path examples/output/custom/avatar/ --target_animation_path examples/input/animation/sequence_00/fit.npz --target_cam_trajectory_path examples/input/animation/sequence_00/orbit.npz --output_path examples/output/custom/animation_00/ --export_ply 1 --compress_ply 0
# Animate the avatar with driving video (computed using Pixel3DMM)
python gaussianavatars/animate.py --model_path examples/output/custom/avatar/ --target_animation_path examples/output/custom/driving_video_tracking/fit.npz --target_cam_trajectory_path examples/output/custom/driving_video_tracking/cam_static.npz --output_path examples/output/custom/animation_example/ --export_ply 1 --compress_ply 0
The --target_animation_path argument contains FLAME expressions and pose, while the (optional) --target_cam_trajectory_path argument contains the relative camera trajectory.
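To peek at what an animation file contains before rendering, you can list the arrays stored in the .npz archives (a quick check; the exact key names are not documented here):
# Print the arrays stored in the FLAME fit and camera trajectory files
python -c "import numpy as np; print(sorted(np.load('examples/input/animation/sequence_00/fit.npz').files))"
python -c "import numpy as np; print(sorted(np.load('examples/input/animation/sequence_00/orbit.npz').files))"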
We provide a convenient script to run full inference using your reference images and optionally a driving video.
export PIXEL3DMM_PATH=$(realpath "../PATH/TO/pixel3dmm")
export CAP4D_PATH=$(realpath "./")
# Generate avatar with custom input images/videos.
bash scripts/generate_avatar.sh --help
bash scripts/generate_avatar.sh {INPUT_VIDEO_PATH} {OUTPUT_PATH} [{QUALITY}] [{DRIVING_VIDEO_PATH}]
# Example: default-quality generation with input images and a driving video
bash scripts/generate_avatar.sh examples/input/felix/images/cam0/ examples/output/felix_custom/ default examples/input/animation/example_video.mp4
Open the real-time viewer in your browser (powered by Brush). Click Load file and
upload the exported animation found in
examples/output/custom/animation_00/exported_animation.ply or
examples/output/custom/animation_example/exported_animation.ply.
The MMDM code is based on ControlNet. The 4D Gaussian avatar code is based on GaussianAvatars. Special thanks to the authors for making their code public!
Related work:
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models
- GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
- FlowFace: 3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
- StableDiffusion: High-Resolution Image Synthesis with Latent Diffusion Models
- Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
Awesome concurrent work:
- Pippo: High-Resolution Multi-View Humans from a Single Image
- Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
@inproceedings{taubner2025cap4d,
author = {Taubner, Felix and Zhang, Ruihang and Tuli, Mathieu and Lindell, David B.},
title = {{CAP4D}: Creating Animatable {4D} Portrait Avatars with Morphable Multi-View Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
pages = {5318-5330}
}
This work was developed in collaboration with and with sponsorship from LG Electronics. We gratefully acknowledge their support and contributions throughout the course of this project.