Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers

[NeurIPS 2025, OPT] Official implementation.

Denoising diffusion models exhibit remarkable generative capabilities but remain challenging to train due to their inherent stochasticity, where high-variance gradient estimates lead to slow convergence. Previous work has shown that magnitude preservation helps stabilize training in the U-Net architecture. This work explores whether that effect extends to the Diffusion Transformer (DiT) architecture. To this end, we propose a magnitude-preserving design that stabilizes training without normalization layers. Motivated by the goal of maintaining activation magnitudes, we additionally introduce rotation modulation, a novel conditioning method that uses learned rotations instead of traditional scaling or shifting. Through empirical evaluations and ablation studies on small-scale models, we show that magnitude-preserving strategies significantly improve performance, notably reducing FID scores by $\sim$12.8%. Further, we show that rotation modulation combined with scaling is competitive with AdaLN while requiring $\sim$5.4% fewer parameters. This work provides insights into conditioning strategies and magnitude control.

Fig 1. DiT-S/4 samples without (left) and with (right) magnitude-preserving layers.
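
Rotation modulation, introduced in the abstract above, conditions the network with learned rotations instead of the scale-and-shift used by AdaLN. The snippet below is a minimal, hypothetical sketch of one way such a layer could be parametrized, assuming channels are rotated in pairs with angles predicted from the conditioning embedding; the module name and details are illustrative, not the paper's exact implementation.

import torch
import torch.nn as nn

class RotationModulation(nn.Module):
    """Hypothetical sketch: condition features via learned rotations.

    Channels are grouped into pairs and each pair is rotated by an angle
    predicted from the conditioning embedding. Rotations are orthogonal,
    so the norm of each pair, and hence the activation magnitude, is left
    unchanged, unlike scale/shift modulation.
    """

    def __init__(self, cond_dim: int, hidden_dim: int):
        super().__init__()
        assert hidden_dim % 2 == 0, "channels are rotated in pairs"
        # One rotation angle per channel pair, predicted from the condition.
        self.to_angles = nn.Linear(cond_dim, hidden_dim // 2)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, hidden_dim), cond: (batch, cond_dim)
        theta = self.to_angles(cond).unsqueeze(1)         # (batch, 1, hidden_dim // 2)
        cos, sin = torch.cos(theta), torch.sin(theta)
        x1, x2 = x[..., 0::2], x[..., 1::2]               # split channels into pairs
        y1 = cos * x1 - sin * x2                          # 2D rotation of each pair
        y2 = sin * x1 + cos * x2
        return torch.stack((y1, y2), dim=-1).flatten(-2)  # re-interleave the pairs

Because each 2x2 rotation is orthogonal, the per-token feature norm is unchanged, which is exactly the property the magnitude-preserving design aims for.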

This project builds upon key concepts from the following research papers:

  • Peebles & Xie (2023) explore the application of transformer architectures to diffusion models, achieving state-of-the-art performance on various generation tasks;
  • Karras et al. (2024) introduce the idea of preserving activation and weight magnitudes throughout the network, which improves training stability and the quality of generated outputs; see the sketch after this list.
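
As a rough illustration of the magnitude-preservation idea from Karras et al. (2024), the sketch below shows a weight-normalized linear layer whose weight rows are kept at unit norm, so uncorrelated, unit-variance inputs produce approximately unit-variance outputs without LayerNorm or learnable gains. The class name and details are a simplified assumption, not necessarily this repository's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MPLinear(nn.Module):
    """Simplified sketch of a magnitude-preserving linear layer."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                # "Forced" weight normalization: re-project the stored weights
                # onto the unit sphere so their magnitude cannot drift upward
                # over the course of training.
                self.weight.copy_(F.normalize(self.weight, dim=1))
        # Normalize again (differentiably) so every output feature sees a
        # unit-norm weight vector; its dot product with a unit-variance input
        # then has unit variance as well.
        return F.linear(x, F.normalize(self.weight, dim=1))

(Karras et al. additionally keep the stored weights at unit per-element RMS and fold a 1/sqrt(fan_in) factor into the forward pass; the sketch above collapses both steps into a single row normalization.)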

🚧 Code Status: Work in Progress

We're actively developing this repo. Contributions and feedback are welcome!

Training

python train.py --data-path /path/to/data --results-dir /path/to/results --model DiT-S/2 --num-steps 400_000 <map feature flags>

Magnitude Preservation Flags

Customize the training process by enabling the following flags (a short sketch of the MP-SiLU and MP-residual operations follows this list):

  • --use-cosine-attention - Controls weight growth in attention layers.
  • --use-weight-normalization - Applies magnitude preservation in linear layers.
  • --use-forced-weight-normalization - Controls weight growth in linear layers.
  • --use-mp-residual - Enables magnitude preservation in residual connections.
  • --use-mp-silu - Uses a magnitude-preserving version of the SiLU nonlinearity.
  • --use-no-layernorm - Disables transformer layer normalization.
  • --use-mp-pos-enc - Activates magnitude-preserving positional encoding.
  • --use-mp-embedding - Uses magnitude-preserving embeddings.
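
For context on the --use-mp-silu and --use-mp-residual flags, below is a hedged sketch of magnitude-preserving SiLU and residual connections in the spirit of Karras et al. (2024). The 0.596 constant is the standard deviation of SiLU under a unit Gaussian input, and the blend factor t is an illustrative default, not necessarily this repository's setting.

import torch
import torch.nn.functional as F

def mp_silu(x: torch.Tensor) -> torch.Tensor:
    # SiLU rescaled so that a unit-variance Gaussian input yields a roughly
    # unit-variance output (0.596 is approximately std of silu(z), z ~ N(0, 1)).
    return F.silu(x) / 0.596

def mp_residual(skip: torch.Tensor, branch: torch.Tensor, t: float = 0.3) -> torch.Tensor:
    # Magnitude-preserving residual: blend the branch into the skip path with
    # weight t, then divide by the expected magnitude of the blend so the
    # result stays at unit scale (assuming uncorrelated, unit-variance inputs).
    return torch.lerp(skip, branch, t) / ((1 - t) ** 2 + t ** 2) ** 0.5

Dividing by sqrt((1 - t)^2 + t^2) is what distinguishes this from a plain residual sum, which would let activation magnitudes grow with depth.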

Sampling

python sample.py --result-dir /path/to/results/<dir> --class-label <class label>

Citation

@misc{bill2025exploringmagnitudepreservationrotation,
      title={Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers}, 
      author={Eric Tillman Bill and Cristian Perez Jensen and Sotiris Anagnostidis and Dimitri von Rütte},
      year={2025},
      eprint={2505.19122},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.19122}, 
}
