DilateQuant is a novel quantization-aware training (QAT) framework for accelerating diffusion models. It maintains high-quality image generation at 4-bit and 6-bit precision. Specifically, we observe that the in-channel weights are unsaturated and exploit this property to narrow the wide range of activations. By dilating the unsaturated channels to a constrained range, Weight Dilation (WD) steadily reduces quantization errors and ensures that QAT converges. Furthermore, we design a flexible quantizer (TPQ) and introduce a novel knowledge distillation strategy (BKD) to further enhance performance while significantly improving training efficiency.
🔹 Overview Methods
🔹 Weight Dilation
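The weight dilation idea described above can be illustrated with a minimal sketch. This is not the repository's implementation: it assumes a plain linear layer, symmetric per-output-channel weight ranges, and pure-Python lists, and the helper names (`dilate_weights`, `scale_activations`, `linear`) are hypothetical. Each in-channel of the weight is scaled up as far as possible without exceeding the existing range of any output channel, and the matching activation channel is scaled down by the same factor, so the layer output is unchanged while the activation range shrinks.

```python
def dilate_weights(weight, eps=1e-12):
    """weight: nested list of shape [out_channels][in_channels].

    Returns (dilated_weight, scales) with one scale >= 1 per in-channel.
    Assumption: each output channel has a symmetric range [-absmax, absmax].
    """
    out_ch, in_ch = len(weight), len(weight[0])
    # Range of every output channel; dilation must not enlarge it.
    row_absmax = [max(abs(w) for w in row) for row in weight]
    scales = []
    for k in range(in_ch):
        # Largest factor keeping every weight of in-channel k inside the
        # range of its own output channel (zero weights impose no limit).
        s = min(
            (row_absmax[j] / abs(weight[j][k])
             for j in range(out_ch) if abs(weight[j][k]) > eps),
            default=1.0,
        )
        scales.append(max(s, 1.0))  # saturated channels keep a scale of 1
    dilated = [[weight[j][k] * scales[k] for k in range(in_ch)]
               for j in range(out_ch)]
    return dilated, scales


def scale_activations(x, scales):
    """Divide each activation channel by the scale of its in-channel."""
    return [[v / s for v, s in zip(row, scales)] for row in x]


def linear(x, weight):
    """Reference linear layer: y = x @ weight^T."""
    return [[sum(a * w for a, w in zip(xr, wr)) for wr in weight] for xr in x]


# Toy example: in-channel 0 is saturated, in-channel 1 is not.
weight = [[0.5, 0.1], [-0.4, 0.2]]
x = [[1.0, 8.0]]
dilated, scales = dilate_weights(weight)
x_scaled = scale_activations(x, scales)
print(scales)                     # per-in-channel dilation factors
print(linear(x, weight))          # output before dilation
print(linear(x_scaled, dilated))  # output after dilation (mathematically equal)
```

In this toy case only the second in-channel is dilated, which halves the largest activation value while leaving the per-output-channel weight ranges untouched.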
This repository provides the official implementation of DilateQuant, including the complete calibration, training, and inference code. The evaluation and deployment settings are the same as those used in EDA-DM.
Clone this repository, then create and activate a conda environment named dilatequant using the following commands:
git clone https://github.com/BienLuky/DilateQuant.git
cd DilateQuant
conda env create -f env.yaml
conda activate dilatequant
For Latent Diffusion and Stable Diffusion experiments, first download the relevant checkpoints following the instructions in the latent-diffusion and stable-diffusion repos from CompVis. We currently use sd-v1-4.ckpt for Stable Diffusion.
Then use the following commands to run:
# CIFAR-10 (DDIM)
bash scripts/for_cifar.sh
# LSUN Bedroom (LDM-4)
bash scripts/for_bedroom.sh
# LSUN Church (LDM-8)
bash scripts/for_church.sh
# ImageNet (LDM-4)
bash scripts/for_imagenet.sh
# COCO (Stable Diffusion)
bash scripts/for_coco.sh
This work is built upon EDA-DM as the baseline. We adopt a random sampling strategy to construct the calibration set and employ TPQ to assign separate quantization parameters to each diffusion timestep. During distillation-based optimization, we use an index-based approach to update the quantization parameters associated with each diffusion timestep.
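The per-timestep parameters and index-based update described above can be sketched as follows. This is a simplified illustration rather than the repository's TPQ code: the class name `TimestepQuantizer`, its methods, the symmetric uniform quantizer, and the plain gradient step are all assumptions made for the example. The key point is that one quantization scale is stored per diffusion timestep, and both quantization and updates select the entry by timestep index.

```python
class TimestepQuantizer:
    """Hypothetical activation quantizer with one scale per diffusion timestep."""

    def __init__(self, num_timesteps, n_bits=4):
        # Symmetric signed integer grid, e.g. [-8, 7] for 4 bits.
        self.qmin = -(2 ** (n_bits - 1))
        self.qmax = 2 ** (n_bits - 1) - 1
        # Separate quantization parameter for every timestep.
        self.scales = [1.0] * num_timesteps

    def calibrate(self, t_index, values):
        """Initialise the scale of one timestep from calibration activations."""
        absmax = max(abs(v) for v in values)
        self.scales[t_index] = max(absmax, 1e-8) / self.qmax

    def quantize(self, t_index, values):
        """Quantize and dequantize values with the scale selected by timestep index."""
        s = self.scales[t_index]
        return [min(max(round(v / s), self.qmin), self.qmax) * s for v in values]

    def update(self, t_index, grad, lr=1e-3):
        """Index-based update: only the entry of this timestep changes."""
        self.scales[t_index] -= lr * grad


# Toy usage: activations at different timesteps have different ranges.
quantizer = TimestepQuantizer(num_timesteps=3, n_bits=4)
quantizer.calibrate(0, [-7.0, 3.5])
quantizer.calibrate(2, [0.7, -0.2])
print(quantizer.scales)
print(quantizer.quantize(0, [3.4, -7.0, 100.0]))
quantizer.update(2, grad=10.0, lr=1e-3)  # leaves timesteps 0 and 1 untouched
print(quantizer.scales)
```

Storing the scales in an indexable container means samples from different timesteps can share a batch while each one still uses, and updates, only its own parameter.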
We deploy the quantized models on an RTX 3090 GPU.
| Task | Method | Calib. | Prec. (W/A) | FID ↓ | sFID ↓ | IS ↑ |
|---|---|---|---|---|---|---|
| ImageNet 256 × 256, LDM-4, steps = 20, eta = 0.0, scale = 3.0 | FP | – | 32/32 | 11.69 | 7.67 | 364.72 |
| | PTQD | 1024 | 6/6 | 16.38 | 17.79 | 146.78 |
| | EDA-DM | 1024 | 6/6 | 11.52 | 8.02 | 360.77 |
| | TFMQ-DM | 10240 | 6/6 | 7.83 | 8.23 | 311.32 |
| | EfficientDM | 102.4K | 6/6 | 8.69 | 8.10 | 309.52 |
| | QuEST | 5120 | 6/6 | 8.45 | 9.36 | 310.12 |
| | DilateQuant | 1024 | 6/6 | 8.25 | 7.66 | 312.30 |
| | PTQD | 1024 | 4/4 | 245.84 | 107.63 | 2.88 |
| | EDA-DM | 1024 | 4/4 | 20.02 | 36.66 | 204.93 |
| | TFMQ-DM | 10240 | 4/4 | 258.81 | 152.42 | 2.40 |
| | TCAQ-DM | – | 4/4 | 30.69 | 18.92 | 86.11 |
| | EfficientDM | 102.4K | 4/4 | 12.08 | 14.75 | 122.12 |
| | QuEST | 5120 | 4/4 | 38.43 | 29.27 | 69.58 |
| | DilateQuant | 1024 | 4/4 | 8.01 | 13.92 | 257.24 |
| Task | Method | Framework | Calib. | Data | Time | Memory |
|---|---|---|---|---|---|---|
| CIFAR-10 32 × 32 | EDA-DM | PTQ | 5120 | 0 | 0.97 h | 3019 MB |
| | LSQ | QAT | – | 50K | 13.89 h | 9974 MB |
| | EfficientDM | V-QAT | 819.2K | 0 | 2.98 h | 9546 MB |
| | DilateQuant | V-QAT | 5120 | 0 | 1.08 h | 3439 MB |
| ImageNet 256 × 256 | QuEST | V-QAT | 5120 | 0 | 15.25 h | 20642 MB |
| | DilateQuant | V-QAT | 1024 | 0 | 6.56 h | 14680 MB |
This code was developed based on EDA-DM. We thank torch-fidelity, pytorch-fid, and clip-score for IS, sFID, FID and CLIP score computation.
If you find this work useful in your research, please consider citing our paper:
@article{liu2024dilatequant,
title={DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation},
author={Liu, Xuewen and Li, Zhikai and Gu, Qingyi},
journal={arXiv preprint arXiv:2409.14307},
year={2024}
}
@article{liu2024enhanced,
title={Enhanced distribution alignment for post-training quantization of diffusion models},
author={Liu, Xuewen and Li, Zhikai and Xiao, Junrui and Gu, Qingyi},
journal={arXiv e-prints},
pages={arXiv--2401},
year={2024}
}


