Attention-Gym is a flexible and efficient framework built on Triton, designed to help researchers and developers rapidly implement, test, and validate innovative attention mechanisms. With support for sparse and quantized attention, it provides a powerful base environment for experimenting with new algorithms and optimizing existing ones.
Requirements:
- python>=3.9
- torch>=2.3.0
- triton>=3.0.0
- NVIDIA GPUs (Compute Capability 8.0+)

Note: the FP8 dtype is only supported on NVIDIA GPUs with Compute Capability 9.0+.
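A quick way to check these requirements is sketched below (assumes a CUDA-capable GPU is visible to PyTorch; this is an illustrative snippet, not part of the package):

```python
import torch
import triton

print("torch:", torch.__version__, "| triton:", triton.__version__)

# Compute capability must be >= 8.0; FP8 kernels additionally require >= 9.0 (e.g. Hopper).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
```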
```bash
pip install -e .
```
Currently supported:
- flash_attention2
- sliding_tile_attention
- sageattn_qk_int8_pv_fp16
- sageattn_qk_int8_pv_fp8
- sparge_sageattn_qk_int8_pv_fp16
- sparge_sageattn_qk_int8_pv_fp8
Basic usage:
```python
import attention_gym

out = attention_gym.sageattn_qk_int8_pv_fp16_triton(q, k, v, tensor_layout="HND", is_causal=False)
```
`q`, `k`, and `v` are FP16/BF16 tensors with shape `(batch_size, head_num, seq_len, head_dim)` under the default `tensor_layout="HND"`. For the shape `(batch_size, seq_len, head_num, head_dim)`, set `tensor_layout="NHD"`. `is_causal` determines whether a causal mask is applied.
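For example, a minimal end-to-end sketch (the tensor sizes below are arbitrary and assume a CUDA device):

```python
import torch
import attention_gym

batch_size, head_num, seq_len, head_dim = 2, 8, 1024, 64

# Default layout "HND": (batch_size, head_num, seq_len, head_dim)
q = torch.randn(batch_size, head_num, seq_len, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
out = attention_gym.sageattn_qk_int8_pv_fp16_triton(q, k, v, tensor_layout="HND", is_causal=False)

# Layout "NHD": (batch_size, seq_len, head_num, head_dim), here with a causal mask
q_nhd, k_nhd, v_nhd = (t.transpose(1, 2).contiguous() for t in (q, k, v))
out_nhd = attention_gym.sageattn_qk_int8_pv_fp16_triton(
    q_nhd, k_nhd, v_nhd, tensor_layout="NHD", is_causal=True
)
```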
To run the tests:
```bash
pytest tests/test_sageattn_qk_int8_pv_fp16.py
```
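As a quick sanity check outside the test suite, one can also compare the Triton kernel's output against PyTorch's reference attention (a minimal sketch; since QK is quantized to INT8, some numerical error relative to the reference is expected):

```python
import torch
import torch.nn.functional as F
import attention_gym

q = torch.randn(1, 8, 512, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = attention_gym.sageattn_qk_int8_pv_fp16_triton(q, k, v, tensor_layout="HND", is_causal=False)
ref = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# Report the maximum absolute deviation from the full-precision reference.
print((out - ref).abs().max())
```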
To run the benchmarks:
```bash
python benchmarks/benchmark_sage1.py
```
Here we compare the end-to-end performance and accuracy of each algorithm's original CUDA implementation (by the algorithm's authors) against the Attention-Gym Triton implementation.
| Algorithm | CUDA Time | Triton Time | Env |
|---|---|---|---|
| STA | 1639.61s | 1853.24s | wanx2.1-14B, H20, 2 GPUs |
| sparge_sage2 | 260s | 268s | wanx2.1-1.3B, H20, 1 GPU |
| sage2 | 348.95s | 359.94s | wanx2.1-1.3B, H20, 1 GPU |
We learned from the design of, and reused some code from, the following projects: triton, FastVideo, SpargeAttn, SageAttention.