Plug-and-play Example

This folder contains plug-and-play inference scripts for five Diffusers video models, with SDPA (scaled_dot_product_attention) replaced by sageattn.

Supported models:

CogVideoX: CogVideoX-2B and CogVideoX1.5-5B
WAN: Wan2.1-T2V-1.3B, Wan2.1-T2V-14B and Wan2.2-T2V-A14B
HunyuanVideo: SageAttention is officially supported in the inference code via CLI flags (--use_sageattn and --sage_blocks_range). Please refer to the Hugging Face model card for the implementation guide.
Mochi
LTX-Video: 0.9.7-dev and spatial upscaler

Replacing scaled_dot_product_attention takes only a few lines.
Using CogVideoX as an example, add the following code and run:

from sageattention import sageattn
import torch.nn.functional as F

# Route every later call to F.scaled_dot_product_attention through SageAttention.
F.scaled_dot_product_attention = sageattn
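The one-line patch works because sageattn accepts (batch, heads, seq_len, head_dim) tensors, the layout SDPA uses. It only affects code that looks up F.scaled_dot_product_attention at call time, and sageattn takes no attn_mask or dropout_p arguments. The sketch below is a hedged alternative, not part of this repo: make_sdpa_replacement is a hypothetical helper, and it assumes sageattn's keyword arguments tensor_layout, is_causal and sm_scale. It sends mask-free, dropout-free calls to SageAttention and everything else to the original SDPA.

```python
from typing import Any, Callable


def make_sdpa_replacement(fast_attn: Callable[..., Any],
                          fallback_attn: Callable[..., Any]) -> Callable[..., Any]:
    """Build a drop-in for F.scaled_dot_product_attention (hypothetical helper).

    Calls without a mask or dropout go to fast_attn (e.g. sageattn); anything
    else goes to fallback_attn (the original SDPA).
    """
    def attention(query, key, value, attn_mask=None, dropout_p=0.0,
                  is_causal=False, scale=None, **kwargs):
        if attn_mask is not None or dropout_p != 0.0 or kwargs:
            # Features sageattn does not take (mask, dropout, extra SDPA kwargs): keep SDPA.
            return fallback_attn(query, key, value, attn_mask=attn_mask,
                                 dropout_p=dropout_p, is_causal=is_causal,
                                 scale=scale, **kwargs)
        # SDPA tensors are (batch, heads, seq_len, head_dim), i.e. the "HND" layout.
        return fast_attn(query, key, value, tensor_layout="HND",
                         is_causal=is_causal, sm_scale=scale)
    return attention
```

Usage, assuming the imports shown above: `F.scaled_dot_product_attention = make_sdpa_replacement(sageattn, F.scaled_dot_product_attention)`. The right-hand side is evaluated before the assignment, so the original SDPA is captured as the fallback.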

Specifically, to run CogVideoX-2B with SageAttention:

cd example
python cogvideox_infer.py --model cogvideox-2b --compile --attention_type sage

You will get a lossless video in ./example/videos/<model>/<attention_type>/, generated faster than with --attention_type sdpa.

Note: If you set --compile, the first run will be slower than subsequent runs because it includes compilation. Run the script twice to measure the actual speed.
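If you time the pipeline from your own script instead, the same advice applies: discard warm-up runs so compilation is not counted. The helper below is a minimal, hypothetical sketch (benchmark is not part of this repo). `run` would be something like `lambda: pipe(prompt)`, and passing `torch.cuda.synchronize` as `sync` waits for queued GPU work before each measurement.

```python
import time
from typing import Callable, List, Optional


def benchmark(run: Callable[[], object], warmup: int = 1, repeats: int = 3,
              sync: Optional[Callable[[], None]] = None) -> List[float]:
    """Return wall-clock seconds for each timed run, after untimed warm-up runs."""
    if warmup < 0 or repeats < 1:
        raise ValueError("warmup must be >= 0 and repeats >= 1")
    for _ in range(warmup):
        run()  # e.g. pays the one-time torch.compile cost; not timed
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()  # wait for asynchronous GPU work before stopping the clock
        timings.append(time.perf_counter() - start)
    return timings
```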

Note: torch.compile is generally incompatible with enable_sequential_cpu_offload(). Don't use them together.
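One way to enforce this in an inference script is an argparse mutually exclusive group. This is a sketch only: --compile matches the flag used above, but --sequential_cpu_offload is a hypothetical flag name, and the actual scripts may expose offloading differently or not at all.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Video inference (sketch)")
    speed = parser.add_mutually_exclusive_group()
    speed.add_argument("--compile", action="store_true",
                       help="wrap the transformer with torch.compile")
    # Hypothetical flag: argparse rejects it when combined with --compile.
    speed.add_argument("--sequential_cpu_offload", action="store_true",
                       help="call pipe.enable_sequential_cpu_offload()")
    return parser
```

Passing both flags makes argparse print a usage error and exit with status 2.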

Modify Attention From Source Code

For finer control over where SageAttention is used, you can modify a small part of the model's source code. For example, modify_mochi.py shows how to replace MochiAttnProcessor2_0 from diffusers with your own attention class.
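A common reason for this finer control is to apply SageAttention only in some transformer blocks. The sketch below shows the selection logic only. parse_block_range and assign_attention are hypothetical helpers, and the inclusive "start-end" string format is an assumption. It is not necessarily the format of HunyuanVideo's --sage_blocks_range. A custom attention class built for block i could then call the function mapped to i.

```python
from typing import Callable, Dict, Iterable

AttnFn = Callable[..., object]


def parse_block_range(spec: str) -> range:
    """Parse "start-end" (inclusive) or a single index such as "7" (assumed format)."""
    start_str, sep, end_str = spec.partition("-")
    if not sep:
        index = int(spec)
        return range(index, index + 1)
    start, end = int(start_str), int(end_str)
    if start > end:
        raise ValueError(f"empty block range: {spec!r}")
    return range(start, end + 1)


def assign_attention(num_blocks: int, sage_blocks: Iterable[int],
                     sage_attn: AttnFn, default_attn: AttnFn) -> Dict[int, AttnFn]:
    """Map each block index to the attention function that block should use."""
    selected = set(sage_blocks)
    out_of_range = sorted(b for b in selected if not 0 <= b < num_blocks)
    if out_of_range:
        raise ValueError(f"block indices out of range: {out_of_range}")
    return {i: (sage_attn if i in selected else default_attn) for i in range(num_blocks)}
```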

Note: In Diffusers pipelines, HunyuanVideo uses an attention_mask, which the sageattn API does not support. As a workaround, you can follow SageAttention Issue #115 and modify the official attention implementation from the HunyuanVideo repo to split text tokens from image tokens. You can then apply SageAttention only to the large, mask-free self-attention over image tokens, while the masked text part stays on SDPA/FlashAttention.
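When image queries attend to both image and text keys, splitting the keys changes the softmax normalization. The two partial results must be recombined, and the sketch below shows one exact way to do it using each part's log-sum-exp (LSE). This is standard softmax algebra, not necessarily what Issue #115 does. It assumes each kernel can report the LSE of its scores, and it works on plain Python lists for a single query rather than real tensors. Padded text tokens would simply be left out of the text subset.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def partial_attention(q: Sequence[float], keys: Sequence[Sequence[float]],
                      values: Sequence[Sequence[float]], scale: float) -> Tuple[Vector, float]:
    """Softmax attention of one query over a subset of keys.

    Returns the normalized output and the log-sum-exp of the scaled scores.
    """
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, peak + math.log(total)


def merge_partials(out_a: Vector, lse_a: float, out_b: Vector, lse_b: float) -> Vector:
    """Combine outputs computed over two disjoint key sets into the full-softmax output."""
    peak = max(lse_a, lse_b)
    lse = peak + math.log(math.exp(lse_a - peak) + math.exp(lse_b - peak))
    # Each part is re-weighted by its share of the total softmax denominator.
    w_a, w_b = math.exp(lse_a - lse), math.exp(lse_b - lse)
    return [w_a * a + w_b * b for a, b in zip(out_a, out_b)]
```

The image-token part (mask-free) is where SageAttention would run; the text part keeps its mask on SDPA/FlashAttention.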

Parallel SageAttention Inference

Install xDiT (xfuser >= 0.3.5) and diffusers (>= 0.32.0.dev0, built from source), then run:

# Install the latest xDiT (xfuser).
pip install "xfuser[flash_attn]"
# Install the latest diffusers (>= 0.32.0.dev0), which the latest xDiT requires.
git clone https://github.com/huggingface/diffusers.git
cd diffusers && python3 setup.py bdist_wheel && cd dist && python3 -m pip install *.whl
# Then run parallel SageAttention inference.
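Before launching ./run_parallel.sh, it can help to confirm that the installed versions meet the minimums above. The script below is a minimal sketch with hypothetical helper names. It compares only the numeric release part, so pre-release, dev and local suffixes are ignored. That is exact for the diffusers minimum (0.32.0.dev0 is the earliest 0.32.0 build) but would, for example, accept 0.3.5rc1 for xfuser.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str, width: int = 3) -> Tuple[int, ...]:
    """Numeric release part, zero-padded: "0.32.0.dev0" -> (0, 32, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * max(0, width - len(parts))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release numbers only; suffixes such as .dev0 or +cu121 are ignored."""
    width = max(len(release_tuple(installed, 0)), len(release_tuple(minimum, 0)))
    return release_tuple(installed, width) >= release_tuple(minimum, width)


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist, minimum in [("xfuser", "0.3.5"), ("diffusers", "0.32.0.dev0")]:
        found = installed_version(dist)
        ok = found is not None and meets_minimum(found, minimum)
        print(f"{dist}: installed={found} required>={minimum} ok={ok}")
```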
./run_parallel.sh
