Peng Sun1,2 · Yi Jiang2 · Tao Lin1
1Westlake University 2Zhejiang University
Official PyTorch implementation of UCGM: A unified framework for training, sampling, and understanding continuous generative models (diffusion, flow-matching, consistency models).
Generated samples from two 675M diffusion transformers trained with UCGM on ImageNet-1K 512×512.
Left: A multi-step model (Steps=40, FID=1.48) | Right: A few-step model (Steps=2, FID=1.75)
Samples generated without classifier-free guidance or other guidance techniques.
- 🚀 Unified Framework: Train/sample diffusion, flow-matching, and consistency models in one system
- 🔌 Plug-and-Play Acceleration: UCGM-S boosts pre-trained models. For example, given a model from REPA-E (on ImageNet 256×256), it cuts sampling steps by 84% (NFE=500 → NFE=80) while improving FID (1.26 → 1.06)
- 🥇 SOTA Performance: UCGM-T-trained models outperform peers at low steps (1.21 FID @ 30 steps on ImageNet 256×256, 1.48 FID @ 40 steps on 512×512)
- ⚡ Few-Step Mastery: Just 2 steps? Still strong (1.42 FID on 256×256, 1.75 FID on 512×512)
- 🚫 Guidance-Free: UCGM-T-trained models need no classifier-free guidance, making sampling simpler and faster
- 🏗️ Architecture & Dataset Flexibility: Compatible with diverse datasets (ImageNet, CIFAR, etc.) and VAEs/neural architectures (CNNs, Transformers)
- 📖 Check more features in our paper!
1. Download the necessary files from Huggingface, including:
   - Checkpoints of various VAEs
   - Statistic files for datasets
   - Reference files for FID calculation
2. Place the downloaded `outputs` and `buffers` folders at the same directory level as this `README.md`.
3. For dataset preparation (skip if not training models), run:

   ```bash
   bash scripts/data/in1k256.sh
   ```

Accelerate any continuous generative model (diffusion, flow-matching, etc.) with UCGM-S. Results marked with ⚡ denote UCGM-S acceleration.
NFE = Number of Function Evaluations (sampling computation cost)
| Method | Model Size | Dataset | Resolution | NFE | FID | NFE (⚡) | FID (⚡) | Model |
|---|---|---|---|---|---|---|---|---|
| REPA-E | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 40×2 | 1.06 | Link |
| Lightning-DiT | 675M | ImageNet | 256×256 | 250×2 | 1.35 | 50×2 | 1.21 | Link |
| DDT | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 50×2 | 1.27 | Link |
| EDM2-S | 280M | ImageNet | 512×512 | 63 | 2.56 | 40 | 2.53 | Link |
| EDM2-L | 778M | ImageNet | 512×512 | 63 | 2.06 | 50 | 2.04 | Link |
| EDM2-XXL | 1.5B | ImageNet | 512×512 | 63 | 1.91 | 40 | 1.88 | Link |
| DDT | 675M | ImageNet | 512×512 | 250×2 | 1.28 | 150×2 | 1.24 | Link |
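To make the NFE column concrete, here is a minimal, hypothetical sketch of how function evaluations accumulate during sampling. The `velocity_model`, Euler update, and guidance scale are illustrative assumptions, not UCGM's actual sampler; the ×2 entries in the table reflect classifier-free guidance, which calls the network twice per step.

```python
def euler_sample(velocity_model, x, steps, cfg=False):
    """Euler integration from t=1 (noise) to t=0, counting network calls (NFE)."""
    nfe = 0
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        v = velocity_model(x, t)                 # one function evaluation (conditional)
        nfe += 1
        if cfg:
            v_uncond = velocity_model(x, t)      # second, unconditional call per step
            nfe += 1
            v = v_uncond + 2.0 * (v - v_uncond)  # guided velocity (scale 2.0 is illustrative)
        x = x - dt * v                           # Euler step toward t=0
        t -= dt
    return x, nfe

# Stub network standing in for a real model:
toy = lambda x, t: 0.1 * x
_, nfe_plain = euler_sample(toy, 1.0, steps=40)           # → NFE = 40
_, nfe_cfg = euler_sample(toy, 1.0, steps=250, cfg=True)  # → NFE = 250×2 = 500
```

This is why UCGM-T's guidance-free models report plain step counts, while guided baselines report NFE as steps×2.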
💻 Usage Examples: Generate images and evaluate FID using a REPA-E trained model:

```bash
bash scripts/run_eval.sh ./configs/sampling_multi_steps/in1k256_sit_xl_repae_linear.yaml
```

Train multi-step and few-step models (diffusion, flow-matching, consistency) with UCGM-T. All models sample efficiently without guidance.
| Encoders | Model Size | Resolution | Dataset | NFE | FID | Model |
|---|---|---|---|---|---|---|
| VA-VAE | 675M | 256×256 | ImageNet | 30 | 1.21 | Link |
| VA-VAE | 675M | 256×256 | ImageNet | 2 | 1.42 | Link |
| DC-AE | 675M | 512×512 | ImageNet | 40 | 1.48 | Link |
| DC-AE | 675M | 512×512 | ImageNet | 2 | 1.75 | Link |
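The 2-step entries above follow the general pattern of consistency-style sampling: denoise in one jump, partially re-noise, then denoise again. The sketch below is a generic illustration of that pattern, not UCGM's exact procedure; `denoiser`, the noise levels, and the re-noising rule are assumptions.

```python
import random

def two_step_sample(denoiser, noise, t_max=1.0, t_mid=0.5):
    """Generic consistency-style 2-step sampling: denoise, re-noise, denoise (NFE = 2)."""
    x = denoiser(noise, t_max)        # step 1: jump from pure noise directly to data
    fresh = random.gauss(0.0, 1.0)
    x_noised = x + t_mid * fresh      # re-noise to an intermediate noise level
    return denoiser(x_noised, t_mid)  # step 2: final denoise refines the sample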
💻 Usage Examples
Generate Images:
```bash
# Generate samples using our pretrained few-step model
bash scripts/run_eval.sh ./configs/training_few_steps/in1k256_tit_xl_vavae.yaml
```

Train Models:
```bash
# Train a new multi-step model (full training)
bash scripts/run_train.sh ./configs/training_multi_steps/in1k512_tit_xl_dcae.yaml

# Convert to a few-step model (requires a pretrained multi-step checkpoint)
bash scripts/run_train.sh ./configs/training_few_steps/in1k512_tit_xl_dcae.yaml
```

❗ Note for few-step training:
- Requires initialization from a multi-step checkpoint
- Prepare your checkpoint file with both `model` and `ema` keys:

  ```python
  { "model": multi_step_ckpt["ema"], "ema": multi_step_ckpt["ema"] }
  ```
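This key remapping can be scripted. A minimal sketch, assuming the multi-step checkpoint is a dict with an `"ema"` entry; in practice you would read and write it with `torch.load` / `torch.save`, and the file paths shown in comments are hypothetical.

```python
def make_few_step_init(multi_step_ckpt):
    """Build a few-step init checkpoint: both keys start from the multi-step EMA weights."""
    ema = multi_step_ckpt["ema"]
    return {"model": ema, "ema": ema}

# In practice (paths are hypothetical):
#   ckpt = torch.load("outputs/in1k512_multi_step.pt", map_location="cpu")
#   torch.save(make_few_step_init(ckpt), "outputs/in1k512_few_step_init.pt")
```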
If you find this repository helpful for your project, please consider citing our work:
@article{sun2025unified,
title = {Unified continuous generative models},
author = {Sun, Peng and Jiang, Yi and Lin, Tao},
journal = {arXiv preprint arXiv:2505.07447},
year = {2025},
url = {https://arxiv.org/abs/2505.07447},
archiveprefix = {arXiv},
eprint = {2505.07447},
primaryclass = {cs.LG}
}
Apache License 2.0 - See LICENSE for details.