This folder contains plug-and-play inference scripts for five Diffusers video models, with SDPA replaced by sageattn.
Supported models:
- CogVideoX: CogVideoX-2B and CogVideoX1.5-5B
- WAN: Wan2.1-T2V-1.3B, Wan2.1-T2V-14B and Wan2.2-T2V-A14B
- HunyuanVideo: SageAttention is officially supported in the inference code via CLI flags (--use_sageattn and --sage_blocks_range). Please refer to the Hugging Face model card for the implementation guide.
- Mochi
- LTX-Video: 0.9.7-dev and spatial upscaler
Replacing scaled_dot_product_attention is easy. Taking CogVideoX as an example, just add the following code and run:
from sageattention import sageattn
import torch.nn.functional as F
F.scaled_dot_product_attention = sageattn

Specifically, run:

cd example
python cogvideox_infer.py --model cogvideox-2b --compile --attention_type sage

You can get a lossless video in ./example/videos/<model>/<attention_type>/ faster than with --attention_type sdpa.
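The assignment to F.scaled_dot_product_attention is a global monkey patch, so it affects every later caller in the process. If you want the swap to apply only around one pipeline call, a reversible patch is one option. The sketch below is a minimal illustration, not part of the example scripts: `patched_attr`, `fake_sdpa`, `fake_sageattn`, and the `F` namespace are hypothetical stand-ins so that it runs without torch or sageattention installed.

```python
import contextlib
import types


@contextlib.contextmanager
def patched_attr(owner, name, replacement):
    """Temporarily set owner.<name> to replacement and restore it on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield original
    finally:
        # Restore the original even if the body raised.
        setattr(owner, name, original)


# Hypothetical stand-ins for torch.nn.functional and sageattn.
def fake_sdpa(q, k, v):
    return "sdpa"


def fake_sageattn(q, k, v):
    return "sage"


F = types.SimpleNamespace(scaled_dot_product_attention=fake_sdpa)

with patched_attr(F, "scaled_dot_product_attention", fake_sageattn):
    inside = F.scaled_dot_product_attention(None, None, None)
after = F.scaled_dot_product_attention(None, None, None)

print(inside, after)
```

In real use you would pass torch.nn.functional as `owner` and sageattn as `replacement`, and run the pipeline inside the `with` block. Note that any module that executed `from torch.nn.functional import scaled_dot_product_attention` before the patch holds its own reference and is not affected by either approach.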
Note: If you set --compile, the first run includes compilation time and will be slower than subsequent runs. Run it twice to get an accurate speed measurement.
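One way to avoid measuring the compilation cost is to discard a warm-up run before timing. The helper below is a minimal sketch under that assumption; `time_after_warmup` and `fake_pipeline` are hypothetical names, and the workload is a stand-in that sleeps longer on its first call to mimic compilation.

```python
import time


def time_after_warmup(fn, warmup=1, repeats=1):
    """Call fn `warmup` times untimed, then return the mean time of `repeats` calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        # For GPU work, call torch.cuda.synchronize() here and again after fn()
        # so that queued kernels are included in the measurement.
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Stand-in workload: slow on the first call (like compilation), fast afterwards.
calls = {"count": 0}


def fake_pipeline():
    calls["count"] += 1
    time.sleep(0.2 if calls["count"] == 1 else 0.01)


steady = time_after_warmup(fake_pipeline, warmup=1, repeats=3)
print(f"steady-state time per run: {steady:.3f}s")
```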
Note: torch.compile is generally incompatible with enable_sequential_cpu_offload(). Don't use them together.
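If you write your own inference script, you can reject this combination at argument-parsing time. The following is a sketch only: `--compile` mirrors the flag used above, while `--sequential_cpu_offload` is a hypothetical flag name that is not taken from the example scripts.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of an option guard")
    parser.add_argument("--compile", action="store_true")
    # Hypothetical flag, used here only to illustrate the guard.
    parser.add_argument("--sequential_cpu_offload", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.compile and args.sequential_cpu_offload:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--compile cannot be combined with --sequential_cpu_offload")
    return args


args = parse_args(["--compile"])
print(args.compile, args.sequential_cpu_offload)
```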
To have finer control over where to use SageAttention, you can modify a small subset of the source code. For example, in modify_mochi.py, you can replace the MochiAttnProcessor2_0 from diffusers with your own attention class.
Note: In Diffusers pipelines, HunyuanVideo uses attention_mask, which is not supported by the sageattn API. As a workaround, you can follow SageAttention Issue #115 to modify the official attention implementation from the HunyuanVideo repo to split text tokens vs image tokens, then apply SageAttention only to the large image-token self-attention (mask-free), while keeping the masked/text part on SDPA/FlashAttention.
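A simpler, coarser alternative to the token split described in the issue is a dispatcher that routes only mask-free calls to the fast kernel and leaves masked calls on the original implementation. The sketch below illustrates that routing idea and is not the method from Issue #115. `make_attention`, `fake_sage`, and `fake_sdpa` are hypothetical names; the stand-ins let it run without torch, and it assumes both real callables accept an `is_causal` keyword.

```python
def make_attention(fast_attn, fallback_attn):
    """Build an attention function that uses fast_attn only when no mask is given."""

    def attention(q, k, v, attn_mask=None, is_causal=False):
        if attn_mask is None:
            # Mask-free path: safe to use the kernel without mask support.
            return fast_attn(q, k, v, is_causal=is_causal)
        # Masked path: keep the implementation that understands attn_mask.
        return fallback_attn(q, k, v, attn_mask=attn_mask, is_causal=is_causal)

    return attention


# Hypothetical stand-ins for sageattn and F.scaled_dot_product_attention.
def fake_sage(q, k, v, is_causal=False):
    return "sage"


def fake_sdpa(q, k, v, attn_mask=None, is_causal=False):
    return "sdpa"


attention = make_attention(fake_sage, fake_sdpa)
unmasked = attention("q", "k", "v")
masked = attention("q", "k", "v", attn_mask=[[True, False]])
print(unmasked, masked)
```

With this routing, a model that passes a mask on every call gets no speedup, which is why the issue's approach of splitting text and image tokens is needed to make the large self-attention mask-free.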
Install xDiT (xfuser >= 0.3.5) and diffusers (>= 0.32.0.dev0) from source, then run:
# install latest xDiT(xfuser).
pip install "xfuser[flash_attn]"
# install latest diffusers (>=0.32.0.dev0), needed by latest xDiT.
git clone https://github.com/huggingface/diffusers.git
cd diffusers && python3 setup.py bdist_wheel && cd dist && python3 -m pip install *.whl
# then run parallel sage attention inference.
./run_parallel.sh
