Flex attention strides #152683


Open
ngc92 opened this issue May 2, 2025 · 0 comments
Labels
module: flex attention · module: higher order operators (torch.cond and similar) · module: pt2-dispatcher (PT2 dispatcher-related issues, e.g., aotdispatch, functionalization, faketensor, custom-op) · oncall: pt2 · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


ngc92 commented May 2, 2025

📚 The doc issue

The FlexAttention docs nicely document the expected shapes of the inputs, but do not specify anything about strides. In contrast, cuDNN, for example, documents that strides can be chosen freely except that the last dimension must be contiguous. Knowing the available striding options is important, since it informs, e.g., whether the QKV matmuls can be merged into a single matmul.
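For concreteness, here is a minimal sketch (sizes and the fused projection are illustrative, not from the docs) of why the stride contract matters: a single fused QKV matmul yields Q, K, and V as non-contiguous views of one buffer.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Illustrative sizes (batch, heads, sequence, head dim) -- not from the docs.
B, H, S, D = 2, 8, 128, 64

x = torch.randn(B, S, H * D)
w_qkv = torch.randn(3 * H * D, H * D)

# One fused projection instead of three separate Q/K/V matmuls.
qkv = (x @ w_qkv.T).view(B, S, 3, H, D)

# Q, K, V are now non-contiguous views of a single buffer: the last
# dimension is contiguous, but the sequence stride is 3*H*D, not H*D.
q, k, v = (qkv[:, :, i].transpose(1, 2) for i in range(3))  # each (B, H, S, D)
assert not q.is_contiguous()

# Whether flex_attention accepts such strides directly, or forces a
# copy/.contiguous() first, is exactly what the docs leave unspecified.
out = flex_attention(q, k, v)
```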

Also (and independently), return_lse is missing from the output documentation.
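For the second point, a sketch of the undocumented output (assuming the current behavior, where return_lse=True makes the call return a tuple):

```python
# return_lse=True additionally returns the logsumexp of the attention
# scores (shape (B, H, S) under the sizes above), but this tuple
# return is not documented in the output section.
out, lse = flex_attention(q, k, v, return_lse=True)
```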

Suggest a potential alternative/fix

No response

cc @chauhang @penguinwu @zou3519 @ydwu4 @bdhirsh @Chillee @drisspg @yanboliang @BoyuanFeng

@mikaylagawarecki added the triaged label May 2, 2025
@pytorch-bot bot added the module: pt2-dispatcher label May 2, 2025