Flex attention strides #152683
Labels
module: flex attention
module: higher order operators
torch.cond and similar
module: pt2-dispatcher
PT2 dispatcher-related issues (e.g., aotdispatch, functionalization, faketensor, custom-op)
oncall: pt2
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 The doc issue
The FlexAttention docs nicely document the expected shapes of the inputs, but say nothing about their strides. In contrast, cuDNN, for example, documents that strides can be chosen freely except that the last dimension must be contiguous. Knowing which stride layouts are supported is important, since that determines, e.g., whether the Q, K, and V projections can be merged into a single matmul.
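To illustrate why this matters, here is a hedged sketch (the shapes and fused weight are illustrative, not from the docs): fusing the QKV projections into one matmul produces `q`, `k`, and `v` as views of a single buffer, so they are non-contiguous with only the last dimension having stride 1, i.e. exactly the layout cuDNN permits. Whether `flex_attention` accepts such views is the question the docs currently leave open.

```python
import torch

B, H, S, D = 2, 4, 128, 64  # batch, heads, sequence length, head dim (arbitrary)

x = torch.randn(B, S, H * D)
w_qkv = torch.randn(3 * H * D, H * D)  # fused QKV projection weight (illustrative)

qkv = x @ w_qkv.T                                     # single matmul: (B, S, 3*H*D)
qkv = qkv.view(B, S, 3, H, D).permute(2, 0, 3, 1, 4)  # (3, B, H, S, D)
q, k, v = qkv.unbind(0)                               # views, each (B, H, S, D)

print(q.is_contiguous())  # False: strided view into the fused buffer
print(q.stride(-1))       # 1: last dim is contiguous, as cuDNN requires
```

If FlexAttention only supports contiguous inputs, each of these views would need a `.contiguous()` copy before the call, which is exactly the cost the fused matmul was meant to avoid.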
Also (and independently), `return_lse` is missing from the output documentation.

Suggest a potential alternative/fix
No response
cc @chauhang @penguinwu @zou3519 @ydwu4 @bdhirsh @Chillee @drisspg @yanboliang @BoyuanFeng