[TorchInductor] Add ALiBi (Attention with Linear Biases) Fused Attention Pattern #144338

Open
vyom1611 wants to merge 4 commits into base: main

Conversation


@vyom1611 vyom1611 commented Jan 7, 2025

Summary

This PR adds support for ALiBi (Attention with Linear Biases) to TorchInductor’s fused-attention pattern matching. ALiBi applies a position-based linear bias to attention scores, improving length extrapolation for language modeling tasks. With this addition, ALiBi-based attention can leverage PyTorch’s optimized _scaled_dot_product_attention kernel.
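For context only (this is not code from the PR's diff), ALiBi gives head h, with slope m_h, a bias of m_h · (j − i) on the logit where query position i attends to key position j, with slopes forming a geometric sequence per the ALiBi paper. A minimal sketch with a hypothetical helper name:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build a (num_heads, seq_len, seq_len) ALiBi bias (illustrative, not this PR's code)."""
    # Slopes are a geometric sequence starting at 2**(-8/num_heads), as in the ALiBi paper.
    start = 2.0 ** (-8.0 / num_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(num_heads)])
    # Relative distance j - i between key position j and query position i.
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]               # (seq_len, seq_len)
    return slopes[:, None, None] * rel[None, :, :]  # (num_heads, seq_len, seq_len)
```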

Changes

  • New ALiBi Pattern & Replacement (a rough sketch of this pair is shown after this list)
    • _sfdp_pattern_alibi(...): Recognizes [Q @ Kᵀ / √d + alibi_bias] → softmax → dropout → matmul(V).
    • _sfdp_replacement_alibi(...): Fuses the matched subgraph into _scaled_dot_product_attention, passing attn_mask=alibi_bias.
  • Test
    • Added _test_sdpa_rewriter_alibi to TestSDPAPatternRewriterTemplate.
    • Confirms forward/backward correctness under dropout.
    • If you hit the error torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: RuntimeError: Duplicate pattern: expand_default = CallFunction(aten.expand.default, KeywordArg('query'), Ignored()), run export PYTORCH_GEN_PATTERNS=1 in the terminal to regenerate the attention patterns.
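For reference, a rough sketch of what the pattern/replacement pair might look like, modeled on the existing _sfdp_pattern_*/_sfdp_replacement_* functions in torch/_inductor/fx_passes/fuse_attention.py. The function names come from this PR's description; the exact signatures and argument order are assumptions and may differ from the actual diff:

```python
import torch

# Sketch only: the real functions are traced by the Inductor pattern matcher,
# not called directly at runtime.

def _sfdp_pattern_alibi(query, key, value, alibi_bias, inv_scale, dropout_p):
    # [Q @ K^T / sqrt(d) + alibi_bias] -> softmax -> dropout -> matmul(V)
    scores = torch.matmul(query, key.transpose(-2, -1)).div(inv_scale) + alibi_bias
    attn = torch.nn.functional.dropout(scores.softmax(dim=-1), p=dropout_p)
    return attn.matmul(value)


def _sfdp_replacement_alibi(query, key, value, alibi_bias, inv_scale, dropout_p):
    # Fused form: hand the bias to SDPA as attn_mask so the backend applies it in-kernel.
    return torch.nn.functional.scaled_dot_product_attention(
        query,
        key,
        value,
        attn_mask=alibi_bias,
        dropout_p=dropout_p,
        is_causal=False,
        scale=1.0 / inv_scale,
    )
```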

Notes

  • If FlashAttention does not support ALiBi directly, PyTorch gracefully falls back to the MATH or MEM-EFFICIENT SDPA backends.
  • Combining ALiBi with a causal mask can be done by summing the bias and the mask if needed (see the sketch after this list).
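To make the second note concrete (an illustrative assumption about usage, not code from this PR): since scaled_dot_product_attention accepts a single attn_mask, a causal mask can be folded into the ALiBi bias by adding -inf above the diagonal before passing it in:

```python
import torch

def alibi_with_causal_mask(bias: torch.Tensor) -> torch.Tensor:
    """Fold a causal mask into an ALiBi bias of shape (..., seq_len, seq_len)."""
    seq_len = bias.size(-1)
    # Strictly upper-triangular entries (key position after query position) become -inf.
    causal = torch.full((seq_len, seq_len), float("-inf")).triu(diagonal=1)
    # Softmax then zeroes out masked positions; visible positions keep the linear bias.
    return bias + causal
```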

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov


linux-foundation-easycla bot commented Jan 7, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.


pytorch-bot bot commented Jan 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144338

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c5fd035 with merge base 96176e3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@vyom1611
Author

vyom1611 commented Jan 7, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing label on Jan 7, 2025
@zou3519 zou3519 requested a review from eellison January 9, 2025 18:36
@zou3519 zou3519 added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jan 9, 2025
@eellison eellison requested a review from drisspg January 16, 2025 19:30
Contributor

@eellison eellison left a comment


Can you say more about

If you get error: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: RuntimeError: Duplicate pattern: expand_default = CallFunction(aten.expand.default, KeywordArg('query'), Ignored()),
-> run export PYTORCH_GEN_PATTERNS=1 in the terminal to generate the attention pattern.

We want to serialize the pattern ahead of time, as with the rest of the attention fusions, because the additional compilation time of generating the pattern at runtime is not insignificant. This is what PYTORCH_GEN_PATTERNS=1 does. Can you serialize this? See: torchgen/fuse/gen_patterns.py.
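Not from this PR or the reviewer's comment, but as a rough sketch of the mechanism being requested (the details here are assumptions based on the existing fuse_attention module): torchgen/fuse/gen_patterns.py re-traces the registered SDPA patterns with PYTORCH_GEN_PATTERNS=1 set, so each pattern, including a new ALiBi one, is written out ahead of time (presumably under torch/_inductor/fx_passes/serialized_patterns/) instead of being traced at compile time. Roughly:

```python
# Illustrative sketch only; see torchgen/fuse/gen_patterns.py in the PyTorch repo for the real script.
import os

# With this env var set, the pattern matcher serializes traced patterns to disk
# instead of expecting pre-serialized files to already exist.
os.environ["PYTORCH_GEN_PATTERNS"] = "1"

import torch
from torch._subclasses import FakeTensorMode
from torch._inductor.fx_passes import fuse_attention

with FakeTensorMode():
    # Registers every _sfdp_pattern_*/_sfdp_replacement_* pair, tracing and
    # dumping each pattern as a serialized module.
    fuse_attention._sfdp_init()
```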

Contributor

github-actions bot commented May 5, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 5, 2025
Labels
module: inductor, open source, Stale, topic: not user facing, triaged
Projects
None yet
4 participants