[TorchInductor] Add ALiBi (Attention with Linear Biases) Fused Attention Pattern #144338
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144338
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c5fd035 with merge base 96176e3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
Can you say more about this note from the description?

> If you get error: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: RuntimeError: Duplicate pattern: expand_default = CallFunction(aten.expand.default, KeywordArg('query'), Ignored())
> -> run export PYTORCH_GEN_PATTERNS=1 in the terminal to generate the attention pattern.

We want to serialize the pattern ahead of time, as with the rest of the attention fusions, because the additional compilation time of generating it at runtime is not insignificant. That is what the PYTORCH_GEN_PATTERNS=1 flow is for.

Can you serialize this? See: torchgen/fuse/gen_patterns.py.
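For anyone picking this up later, here is a minimal sketch of the ahead-of-time flow being requested, assuming the command is run from the repo root. Only the env var name (`PYTORCH_GEN_PATTERNS`) and the script path (`torchgen/fuse/gen_patterns.py`) come from this thread; everything else is illustrative, not the exact workflow in the PR.

```python
# Sketch only: regenerate the serialized attention patterns ahead of time,
# per the review comment above, instead of relying on the runtime
# PYTORCH_GEN_PATTERNS workaround mentioned in the PR description.
import os
import subprocess

# With PYTORCH_GEN_PATTERNS=1 set, pattern registration re-traces and
# serializes the patterns rather than loading the checked-in copies.
env = dict(os.environ, PYTORCH_GEN_PATTERNS="1")
subprocess.run(
    ["python", "torchgen/fuse/gen_patterns.py"],  # path from the comment above
    env=env,
    check=True,
)
```

The regenerated serialized-pattern files would then be checked in alongside the new `_sfdp_pattern_alibi`, so no runtime pattern generation is needed.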
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Summary

This PR adds support for ALiBi (Attention with Linear Biases) in TorchInductor's fused-attention pass. ALiBi applies a position-based bias to attention scores, improving extrapolation for language modeling tasks. With this addition, ALiBi-based attention can leverage PyTorch's optimized `_scaled_dot_product_attention` kernel.

Changes

- `_sfdp_pattern_alibi(...)`: Recognizes [Q @ Kᵀ / √d + alibi_bias] → softmax → dropout → matmul(V).
- `_sfdp_replacement_alibi(...)`: Fuses the pattern into `_scaled_dot_product_attention` using `attn_mask=alibi_bias`.
- New test `_test_sdpa_rewriter_alibi` in `TestSDPAPatternRewriterTemplate`.

Notes

If you get the error `torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: RuntimeError: Duplicate pattern: expand_default = CallFunction(aten.expand.default, KeywordArg('query'), Ignored())`, run `export PYTORCH_GEN_PATTERNS=1` in the terminal to generate the attention pattern.
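To make the Changes list concrete, here is a hedged sketch of what a pattern/replacement pair of this shape could look like, modeled on the existing `_sfdp_pattern_*` helpers. The argument names (`alibi_bias`, `inv_scale`, `dropout_p`) and the use of the public `F.scaled_dot_product_attention` (with its `scale` kwarg) are illustrative assumptions, not the exact code in this PR.

```python
import torch
import torch.nn.functional as F

# Sketch of the traced pattern the matcher should recognize:
# (Q @ K^T / sqrt(d) + alibi_bias) -> softmax -> dropout -> @ V.
# Here inv_scale plays the role of sqrt(d) in the description above.
def _sfdp_pattern_alibi(query, key, value, alibi_bias, inv_scale, dropout_p):
    scores = torch.matmul(query, key.transpose(-2, -1)) / inv_scale
    scores = scores + alibi_bias              # additive linear position bias
    weights = scores.softmax(dim=-1)
    weights = F.dropout(weights, p=dropout_p)
    return torch.matmul(weights, value)


# Sketch of the fused replacement: the ALiBi bias rides along as attn_mask,
# which scaled_dot_product_attention adds to the scores before softmax.
def _sfdp_replacement_alibi(query, key, value, alibi_bias, inv_scale, dropout_p):
    return F.scaled_dot_product_attention(
        query,
        key,
        value,
        attn_mask=alibi_bias,
        dropout_p=dropout_p,
        is_causal=False,
        scale=1.0 / inv_scale,
    )
```

The ALiBi bias itself (per-head slope times relative key position, broadcast to [batch, heads, q_len, k_len]) is computed outside the pattern and simply flows into the fused kernel as a regular tensor input.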
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov