ConvUNets have been overlooked... but they outperform Diffusion Transformers!
6/11/2025: We have released the code of DiC! 🔥🔥🔥 Weights and the SiT and REPA versions are coming very soon.
3/3/2025: Code & weights are in the final stage of inspection; we will release them ASAP.
2/27/2025: DiC has been accepted to CVPR 2025! 🎉🎉
🤔 In this work, we set out to build a diffusion model from Conv3x3 that is simple yet efficient.
🔧 We redesign the model's architecture and blocks to tap the full potential of Conv3x3.
🚀 The proposed DiC ConvUNets are more powerful than DiTs, and much faster!
This repo is largely based on the official DiT repo. Weights and the SiT and REPA versions will be open-sourced very soon.
Torch model script: dic_models.py
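As a rough illustration of how the model script might be used (assuming dic_models.py follows the DiT-style model registry of the codebase this repo builds on; the registry name, model key, and constructor arguments below are guesses, not the repo's confirmed API):

```python
# Hypothetical usage sketch: DiC_models, the "DiC-XL" key, and the constructor
# arguments are assumptions modeled on the DiT registry pattern.
import torch
from dic_models import DiC_models  # assumed registry name

model = DiC_models["DiC-XL"](input_size=32, num_classes=1000)  # hypothetical args
x = torch.randn(1, 4, 32, 32)     # a batch of 256x256 VAE latents
t = torch.randint(0, 1000, (1,))  # diffusion timestep
y = torch.randint(0, 1000, (1,))  # class label
out = model(x, t, y)
print(out.shape)
```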
Please run pip install -r requirements.txt to install the required packages.
(Optional) Please download the VAE from this link. Alternatively, the VAE can be downloaded automatically.
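For reference, here is a minimal sketch of loading the VAE with diffusers, following the convention of the DiT codebase this repo is based on (the exact checkpoint name is an assumption; check train.py for the one actually used):

```python
# Minimal sketch: load the SD VAE via diffusers, as in the DiT codebase.
# "stabilityai/sd-vae-ft-ema" is an assumption carried over from DiT.
import torch
from diffusers.models import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to("cuda").eval()

with torch.no_grad():
    img = torch.randn(1, 3, 256, 256, device="cuda")             # dummy image in [-1, 1]
    latent = vae.encode(img).latent_dist.sample().mul_(0.18215)  # DiT's latent scale factor
    print(latent.shape)  # (1, 4, 32, 32) for 256x256 inputs
```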
Here we provide two ways to train a DiC model: 1. training on the original ImageNet dataset; 2. training on preprocessed VAE features (recommended).
Training Data Preparation
Use the original ImageNet dataset + the VAE encoder. First, download ImageNet and arrange it as follows:
imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
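As an optional sanity check (not part of the repo's scripts), you can verify that the layout above is readable with torchvision's ImageFolder:

```python
# Optional sanity check on the ImageNet layout above (illustrative, not a repo script).
from torchvision.datasets import ImageFolder

ds = ImageFolder("imagenet/train")  # path is illustrative
print(f"{len(ds.classes)} classes, {len(ds)} images")  # expect 1000 classes, ~1.28M images
```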
Then run the following command:
torchrun --nnodes=1 --nproc_per_node=8 train.py --data-path={path to imagenet/train} --image-size=256 --model={model name} --epochs={iteration//5000} # fp32 Training
accelerate launch --mixed_precision fp16 train_accelerate.py --data-path {path to imagenet/train} --image-size=256 --model={model name} --epochs={iteration//5000} # fp16 Training
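A note on --epochs={iteration//5000}: assuming the DiT default global batch size of 256 (an assumption; adjust the divisor if you change it), one pass over ImageNet's ~1.28M training images is roughly 5,000 iterations, so a target iteration budget converts to epochs as follows:

```python
# Convert a target iteration budget into the --epochs argument.
# Assumes a global batch size of 256 (the DiT default), i.e.
# 1,281,167 images / 256 ≈ 5,000 iterations per epoch.
target_iterations = 400_000         # e.g. a DiT-style 400K-iteration schedule
epochs = target_iterations // 5000  # -> 80
print(epochs)
```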
Training Feature Preparation (RECOMMENDED)
Following Fast-DiT, we recommend loading VAE features directly for faster training. You don't need to download the enormous ImageNet dataset (>100G); instead, a much smaller "VAE feature" dataset (~21G for ImageNet 256x256) is available here on HuggingFace and MindScope. Please follow these steps:
- Download imagenet_feature.tar.
- Unzip the tarball by running tar -xf imagenet_feature.tar. The extracted directory is organized as follows:
  imagenet_feature/
  ├── imagenet256_features/ # VAE features
  └── imagenet256_labels/ # labels
- Append the argument --feature-path={path to imagenet_feature} to the training command.
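If you would rather extract the features yourself than download the tarball, the sketch below follows Fast-DiT's general approach (encode each image with the VAE and save latents and labels as .npy files). The paths, file naming, and VAE checkpoint here are assumptions; check Fast-DiT's extract_features.py for the exact format the training script expects.

```python
# Sketch of pre-extracting VAE features, in the spirit of Fast-DiT.
# Paths, file names, and the VAE checkpoint are illustrative assumptions.
import os
import numpy as np
import torch
from torchvision import transforms
from torchvision.datasets import ImageFolder
from diffusers.models import AutoencoderKL

device = "cuda"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # map pixels to [-1, 1]
])
dataset = ImageFolder("imagenet/train", transform=transform)

os.makedirs("imagenet_feature/imagenet256_features", exist_ok=True)
os.makedirs("imagenet_feature/imagenet256_labels", exist_ok=True)

with torch.no_grad():
    for i, (x, y) in enumerate(dataset):
        x = x.unsqueeze(0).to(device)
        # 0.18215 is the SD-VAE latent scale factor used by DiT/Fast-DiT.
        latent = vae.encode(x).latent_dist.sample().mul_(0.18215)
        np.save(f"imagenet_feature/imagenet256_features/{i}.npy", latent.cpu().numpy())
        np.save(f"imagenet_feature/imagenet256_labels/{i}.npy", np.asarray([y]))
```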
Pretrained weights are coming soon. Please stay tuned!
Run the following command for parallel sampling:
torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py --ckpt={path to checkpoint} --image-size=256 --model={model name} --cfg-scale={cfg scale}
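Following the DiT codebase, sample_ddp.py gathers the generated images into a single .npz file that can be fed to the guided-diffusion evaluation suite for FID and related metrics. A quick sanity check on that file (the path below is illustrative):

```python
# Sanity-check the .npz written by sample_ddp.py; "arr_0" is the key used
# by the DiT codebase, and the file path here is illustrative.
import numpy as np

samples = np.load("samples/DiC-XL-size-256-cfg-1.5.npz")["arr_0"]
print(samples.shape, samples.dtype)  # e.g. (50000, 256, 256, 3) uint8
```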
If you find this repo useful, please cite:
@article{tian2025dic,
author = {Yuchuan Tian and
Jing Han and
Chengcheng Wang and
Yuchen Liang and
Chao Xu and
Hanting Chen},
title = {DiC: Rethinking Conv3x3 Designs in Diffusion Models},
journal = {CoRR},
volume = {abs/2501.00603},
year = {2025},
url = {https://doi.org/10.48550/arXiv.2501.00603},
doi = {10.48550/ARXIV.2501.00603},
eprinttype = {arXiv},
eprint = {2501.00603},
timestamp = {Mon, 10 Feb 2025 21:52:20 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2501-00603.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
We acknowledge the authors of the following repos:
https://github.com/facebookresearch/DiT (Codebase)
https://github.com/YuchuanTian/U-DiT (Codebase)
https://github.com/chuanyangjin/fast-DiT (FP16 training; Training on features)
https://github.com/openai/guided-diffusion (Metric evaluation)