This repository provides the official implementation for the Overshooting Sampler and AMO Sampler introduced in the CVPR 2025 paper: AMO Sampler: Enhancing Text Rendering with Overshooting.
Paper Link: https://arxiv.org/abs/2411.19415
State-of-the-art text-to-image models like Stable Diffusion 3 (SD3), Flux, and AuraFlow often struggle to accurately render written text within generated images, resulting in misspelled or inconsistent text. This work introduces AMO (Attention Modulated Overshooting) Sampler, a training-free method with minimal computational overhead that significantly enhances text rendering quality in pre-trained rectified flow models.
We propose the Overshooting Sampler and the AMO Sampler, which alternate between over-simulating the learned ODE and reintroducing noise (Langevin dynamics) to correct the compounding errors of standard Euler steps.
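The alternation described above can be illustrated with a small, dependency-free sketch. It is not the repository's scheduler code. It assumes the rectified flow convention x_t = t * x_1 + (1 - t) * x_0 (noise at t = 0, data at t = 1), and the names `overshoot_coeffs` and `overshoot_step` are hypothetical helpers introduced only for this illustration; see the paper for the exact formulation.

```python
import math
import random


def overshoot_coeffs(t, s, c):
    """Return (o, a, b) for one overshooting step from time t to s > t.

    Assumed convention: x_t = t * x_1 + (1 - t) * x_0, where x_0 is
    standard Gaussian noise (t = 0) and x_1 is data (t = 1).
    """
    o = min(s + c * (s - t), 1.0)  # overshoot target time, clipped to 1
    a = s / o                      # rescales the data part from time o back to s
    # Extra noise so that the total noise scale at time s is (1 - s) again.
    b = math.sqrt(max((1.0 - s) ** 2 - (a * (1.0 - o)) ** 2, 0.0))
    return o, a, b


def overshoot_step(z, v, t, s, c, rng):
    """Euler-integrate past s up to o, then re-noise back to time s."""
    o, a, b = overshoot_coeffs(t, s, c)
    return [
        a * (zi + (o - t) * vi) + b * rng.gauss(0.0, 1.0)
        for zi, vi in zip(z, v)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # start from pure noise
    target = 1.0                                 # toy one-point "dataset"
    times = [i / 20 for i in range(21)]          # 20 sampling steps
    for t, s in zip(times[:-1], times[1:]):
        # Toy velocity field pointing at the target (stands in for the model).
        v = [(target - zi) / (1.0 - t) for zi in z]
        z = overshoot_step(z, v, t, s, c=2.0, rng=rng)
    print(max(abs(zi - target) for zi in z))
```

With `c = 0` the overshoot time equals `s`, so `a = 1` and `b = 0` and the step reduces to a plain Euler update. Larger `c` integrates further ahead and injects correspondingly more noise on the way back.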
Key Features:
- Implementation of the Overshooting Sampler and AMO Sampler.
- Training-free method, compatible with pre-trained Rectified Flow models (e.g., SD3, Flux).
- Demonstrated 32.3% (SD3) and 35.9% (Flux) improvement in text rendering accuracy.
- Minimal computational overhead compared to standard Euler samplers.
Steps:
- Clone the repository recursively to include the modified diffusers submodule:

      git clone --recursive git@github.com:hxixixh/amo-release.git

  If you cloned non-recursively, run the following inside the repository directory:

      git submodule update --init --recursive

- Create and activate a virtual environment:

      conda create -n amo python=3.12
      conda activate amo

- Install PyTorch matching your system. Visit the PyTorch website for the correct command based on your OS and CUDA version.
- Install the modified diffusers submodule and other dependencies:

      # Install the modified diffusers library from the submodule
      pip install -e ./diffusion-amo
      # Install other requirements
      pip install -r requirements.txt
This code requires pre-trained Rectified Flow models like Stable Diffusion 3 Medium or FLUX.1-schnell. Our scripts assume models are downloaded from the Hugging Face Hub. Authentication might be required (huggingface-cli login). Models will typically be downloaded and cached automatically by the diffusers library upon first use.
Use the run.py script to generate images. Key arguments:
- --scheduler: Choose the sampler (euler, overshoot).
- --num_inference_steps: Number of steps (e.g., 20).
- --model_type: Base model (flux or sd3).
- --use_att: Enable Attention Modulation for the AMO sampler (use True for AMO; omit or set to False for standard Overshooting).
- --prompt: Text prompt to generate (alternatively, modify run.py to read from prompts.txt).
- --overshooting_strength (-c): The 'c' parameter for the Overshooting/AMO samplers (default: 2.0). See the paper for details.
Examples:
- Euler Sampler (Baseline):

      python run.py --scheduler="euler" --num_inference_steps=20 --model_type="flux"

- Overshooting Sampler:

      python run.py --scheduler="overshoot" --num_inference_steps=20 --model_type="flux" --overshooting_strength=2.0

- AMO Sampler:

      python run.py --scheduler="overshoot" --use_att=True --num_inference_steps=20 --model_type="flux" --overshooting_strength=2.0
The file prompts.txt contains 100 diverse prompts used for human evaluation in the paper.
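To run the whole prompt set without editing run.py, one option is a small driver that calls run.py once per prompt and sampler. The sketch below uses only the flags documented above. It assumes prompts.txt holds one prompt per line, which is not confirmed here, and `load_prompts`, `build_command`, and `SAMPLERS` are hypothetical names introduced for this illustration.

```python
import shlex
import subprocess
from pathlib import Path

# The three documented configurations, expressed as run.py flags.
SAMPLERS = {
    "euler": ["--scheduler=euler"],
    "overshoot": ["--scheduler=overshoot", "--overshooting_strength=2.0"],
    "amo": ["--scheduler=overshoot", "--use_att=True",
            "--overshooting_strength=2.0"],
}


def load_prompts(path):
    """Read one prompt per line, skipping blank lines (assumed layout)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_command(prompt, sampler, model_type="flux", steps=20):
    """Build the argv list for a single run.py invocation."""
    return [
        "python", "run.py",
        f"--model_type={model_type}",
        f"--num_inference_steps={steps}",
        *SAMPLERS[sampler],
        f"--prompt={prompt}",
    ]


def main(prompt_file="prompts.txt", dry_run=True):
    """Print (and optionally execute) one command per prompt and sampler."""
    for prompt in load_prompts(prompt_file):
        for sampler in SAMPLERS:
            cmd = build_command(prompt, sampler)
            print(shlex.join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)


# Only runs when executed from the repository root, where prompts.txt lives.
if __name__ == "__main__" and Path("prompts.txt").exists():
    main(dry_run=True)
```

Each invocation reloads the model, so this is slower than reading the prompts inside run.py. How run.py names its output files is not described here, so check that before batching to avoid overwriting results.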
This implementation utilizes a modified version of the diffusers library (v0.30.1), included as the diffusion-amo submodule.
Key Modifications:
- New Scheduler: diffusion-amo/src/diffusers/schedulers/scheduling_stochastic_rf_discrete_overshot.py implements the core Overshooting sampling logic, extending the standard Diffusers scheduler framework.
- Modified Pipeline: diffusion-amo/src/diffusers/pipelines/flux/pipeline_flux.py and diffusion-amo/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py implement the core AMO sampling logic with attention modulation, extending the standard Diffusers pipeline framework.
The samplers introduce parameters like overshooting strength (c), which can be configured during scheduler initialization (see run.py).
This project is licensed under the Apache 2.0 License. See the LICENSE file for details. The diffusion-amo submodule retains its original Apache 2.0 license from the diffusers library.
If you find this work useful for your research, please cite our paper:
@article{hu2024amo,
title={AMO Sampler: Enhancing Text Rendering with Overshooting},
author={Hu, Xixi and Xu, Keyang and Liu, Bo and Liu, Qiang and Fei, Hongliang},
journal={arXiv preprint arXiv:2411.19415},
year={2024}
}

Please feel free to open an issue on GitHub if you encounter problems or have suggestions.