Fix attention for large sizes#2903

Merged
awni merged 1 commit into main from fix_attn_large_size
Dec 13, 2025
Conversation

@awni (Member) commented Dec 12, 2025

Close #2894

@awni awni requested a review from angeloskath December 12, 2025 21:09
@awni (Member, Author) commented Dec 12, 2025

The change in the mma loader is just to speed it up so we don't lose performance from using an int64 stride for the mask.
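For context, the underlying bug (issue #2894) is an offset overflow: a contiguous boolean mask with more than 2^31 elements produces row offsets that no longer fit in a 32-bit signed integer, so indexing wraps around. A minimal sketch of the arithmetic (illustrative shapes only, not MLX internals):

```python
# Illustrative only: why 32-bit offsets break for masks with > 2^31 elements.
import numpy as np

def mask_offset(row, stride, dtype):
    """Element offset of a mask row in a flat buffer, in a given int width."""
    return dtype(row) * dtype(stride)

seq = 50_000   # a (seq, seq) bool mask has 2.5e9 elements, past 2^31 - 1
stride = seq   # row stride of the contiguous mask

with np.errstate(over="ignore"):
    off32 = mask_offset(49_000, stride, np.int32)  # wraps past INT32_MAX
off64 = mask_offset(49_000, stride, np.int64)      # exact

print(int(off32))  # -1844967296: a wrapped, negative (wrong) offset
print(int(off64))  # 2450000000
```

With a wrapped offset, the kernel reads the wrong mask element, which matches the incorrect results reported in the linked issue.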

@angeloskath (Member) left a comment

This looks great! I presume you ran some tests to check if there is any regression...

@awni (Member, Author) commented Dec 13, 2025

> I presume you ran some tests to check if there is any regression...

Yes, I ran a benchmark for SDPA alone and a model prefill benchmark; there is no change.

In fact, just changing to int64 without changing the loader caused a consistent 1-2% slowdown on an M2 Ultra (so not that bad). Changing the loader brought the performance back.

@awni awni merged commit 47d2505 into main Dec 13, 2025
12 checks passed
@awni awni deleted the fix_attn_large_size branch December 13, 2025 14:54

Development

Successfully merging this pull request may close these issues.

mx.fast.scaled_dot_product_attention produces incorrect results with boolean masks > 2^31 elements
