SparseDiT: Token Sparsification for Efficient Diffusion Transformer (NeurIPS 2025)
Official PyTorch Implementation
Paper by Shuning Chang, Pichao Wang, Jiasheng Tang, Fan Wang, Yi Yang.
The code is based on DiT.
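As a rough illustration of the token-sparsification idea (processing only the highest-scoring tokens and dropping the rest), here is a toy NumPy sketch. The L2-norm importance score and the keep ratio are placeholders for illustration only, not the paper's actual selection module:

```python
import numpy as np

def sparsify_tokens(tokens: np.ndarray, keep_ratio: float = 0.5):
    """Keep the top-scoring fraction of tokens.

    tokens: (N, D) array of token embeddings.
    Returns (kept_tokens, kept_indices). The L2-norm score below is a
    placeholder; SparseDiT's real selection criterion differs.
    """
    n_keep = max(1, int(round(tokens.shape[0] * keep_ratio)))
    scores = np.linalg.norm(tokens, axis=1)       # placeholder importance score
    kept = np.argsort(scores)[::-1][:n_keep]      # indices of the top-n_keep tokens
    kept = np.sort(kept)                          # preserve original token order
    return tokens[kept], kept

# Example: keep half of 8 random tokens
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
sparse_x, idx = sparsify_tokens(x, keep_ratio=0.5)
print(sparse_x.shape)  # (4, 4)
```

Downstream attention/MLP blocks then run on the reduced token set, which is where the compute savings come from.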
First, download and set up the repo:
git clone https://github.com/changsn/SparseDit.git
cd SparseDit

We provide an environment.yml file that can be used to create a Conda environment:
conda env create -f environment.yml
conda activate SparseDiT

Please refer to Fast-DiT to extract the ImageNet VAE features and to download the pre-trained models.
accelerate launch --multi_gpu --num_processes 8 --mixed_precision fp16 train.py --model DiT-XL/2 --pretrained /path/to/pre-trained/model --feature-path /path/to/store/features --image-size 512
You can set your hyper-parameters according to our log files.
To evaluate SparseDiT-DiT-XL-512x512 on ImageNet with N GPUs, run:
torchrun --nnodes=1 --nproc_per_node=N sample_ddp.py --model DiT-XL/2 --num-fid-samples 50000 --image-size 512 --seed 1
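The sampling script (inherited from DiT) writes each sample as a PNG and finally packs them into a single .npz in the layout ADM's evaluator expects. A minimal sketch of that packing step, using synthetic uint8 images in place of real samples:

```python
import numpy as np

def pack_samples(samples, out_path):
    """Stack uint8 HWC images into one (N, H, W, 3) array and save it as .npz,
    the format ADM's evaluator reads (array stored under the default key 'arr_0')."""
    batch = np.stack(samples).astype(np.uint8)
    np.savez(out_path, batch)  # stored under 'arr_0'
    return batch.shape

# Stand-in for 4 generated 512x512 samples
fake = [np.zeros((512, 512, 3), dtype=np.uint8) for _ in range(4)]
print(pack_samples(fake, "samples.npz"))  # (4, 512, 512, 3)
```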
The above command generates a folder of 50,000 samples as well as a .npz file. We integrate code from ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics. To evaluate, run:
python evaluator.py --ref_batch /path/to/reference.npz --sample_batch /path/to/sampling.npz
Replace /path/to/reference.npz with VIRTUAL_imagenet512, which you can find in ADM's TensorFlow evaluation suite.
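For reference, the core FID computation is the Fréchet distance between Gaussian fits of the reference and sample Inception activations. The sketch below takes the activation statistics as given (the evaluator computes them with an Inception network) and uses the fact that Tr(sqrt(C1 C2)) equals the sum of square roots of the eigenvalues of C1 C2:

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID between N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 * sqrt(cov1 @ cov2)).
    For PSD covariances, the eigenvalues of cov1 @ cov2 are real and
    non-negative, so the trace of the matrix square root is the sum of
    their square roots."""
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2 * tr_sqrt)

# Identical statistics give FID ~ 0; a shifted mean gives a positive score.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 8))
mu, cov = acts.mean(axis=0), np.cov(acts, rowvar=False)
print(frechet_distance(mu, cov, mu, cov))        # ~0 (up to float error)
print(frechet_distance(mu + 1.0, cov, mu, cov))  # positive
```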
If you use this code for a paper, please cite:
@misc{chang2025sparsedittokensparsificationefficient,
title={SparseDiT: Token Sparsification for Efficient Diffusion Transformer},
author={Shuning Chang and Pichao Wang and Jiasheng Tang and Fan Wang and Yi Yang},
year={2025},
eprint={2412.06028},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.06028},
}