[FlexAttention] explicitly create grad_q w/ strides #152641
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152641
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit b87b57e with merge base 64957db.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
We should probably still figure out why `query` got mutated, since we're still using it for dtype and device.
stride=[sympy.sympify(s) for s in grad_query_strides],
dtype=query.get_dtype(),
device=query.get_device(),
)
In your test, what different strides do you get for `grad_query` before/after your change?
I could believe this is right. But one thing I'm surprised by is that it looks like `empty_like()` effectively calls these same two functions under the hood when you run it with the default `MemoryFormat::Preserve` (`infer_dense_strides` + `empty_strided`). See here:
`infer_dense_strides(self.sizes(), self.strides());`
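The stride choice the reviewer describes for preserve format can be sketched in plain Python: if the input is non-overlapping and dense, its strides are copied; otherwise dense strides are inferred that keep the input's dimension order. This is a simplified illustration, not ATen's code. The three functions are hypothetical stand-ins that only borrow ATen's names, and the handling of zero (broadcast) strides and of ties between equal strides is an assumption that may differ from ATen.

```python
from typing import List, Sequence


# Hypothetical pure-Python stand-ins for the ATen helpers of the same names.
# Simplified: zero (broadcast) strides and ties between equal strides are not
# handled the way ATen handles them.

def is_non_overlapping_and_dense(sizes: Sequence[int], strides: Sequence[int]) -> bool:
    """True if the strides are a gap-free permutation of a contiguous buffer."""
    # Size-1 dims can have any stride, so skip them; order the rest innermost first.
    dims = sorted((d for d in range(len(sizes)) if sizes[d] != 1),
                  key=lambda d: strides[d])
    expected = 1
    for d in dims:
        # Each dim must step over exactly the elements of all dims inside it.
        if strides[d] != expected:
            return False
        expected *= sizes[d]
    return True


def infer_dense_strides(sizes: Sequence[int], strides: Sequence[int]) -> List[int]:
    """Dense strides keeping the input's dim order (innermost = smallest stride)."""
    # Ties on stride treat the later dim as innermost (an assumption).
    order = sorted(range(len(sizes)), key=lambda d: (strides[d], -d))
    out = [0] * len(sizes)
    running = 1
    for d in order:
        out[d] = running
        running *= max(sizes[d], 1)
    return out


def preserve_format_strides(sizes: Sequence[int], strides: Sequence[int]) -> List[int]:
    """Strides eager empty_like would pick under preserve format (simplified)."""
    if is_non_overlapping_and_dense(sizes, strides):
        return list(strides)  # dense input: copy its strides as-is
    return infer_dense_strides(sizes, strides)  # otherwise keep only the dim order


# Query stored as (B, S, H, D) = (2, 8, 4, 16) and viewed as (B, H, S, D).
print(preserve_format_strides((2, 4, 8, 16), (512, 16, 64, 1)))  # [512, 16, 64, 1]
# Non-dense view: every other row of torch.randn(8, 6).t(), i.e. sizes (3, 8).
print(preserve_format_strides((3, 8), (2, 6)))  # [1, 3]
```

The inference branch only matters for non-dense inputs; there the result is packed with no gaps but still keeps the input's dimension ordering.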
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -i "the command is hanging / not making progress"
❌ 🤖 pytorchbot command failed:
Try
@pytorchbot merge -i
Merge started: Your change will be merged while ignoring the following 0 checks.
@pytorchbot merge -f "This seems to be in a weird state"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -f "This seems to be in a weird state"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use
@pytorchbot cherry-pick --onto release/2.7 -c critical
Cherry picking #152641: The cherry-pick PR is at #153641, and it is recommended to link a critical cherry-pick PR with an issue. The following tracker issues are updated:
[FlexAttention] explicitly create grad_q w/ strides (#152641)
Fixes: #147463
Inductor's lowering for `empty_like` does not match the behavior of eager: the strides it produces do not match preserve format.
#144699
Pull Request resolved: #152641
Approved by: https://github.com/xmfan
(cherry picked from commit a6ea63a)
Co-authored-by: drisspg
Stack from ghstack (oldest at bottom):
Fixes: #147463
Inductor's lowering for `empty_like` does not match the behavior of eager: the strides it produces do not match preserve format.
#144699
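To see why the allocation strides matter for `grad_q`, consider a query that is a transposed view, as is common in attention code. The sketch below assumes a hypothetical query allocated as (B, S, H, D) = (2, 8, 4, 16) and viewed as (B, H, S, D). Because that view is dense, eager `empty_like` under preserve format would copy its strides, whereas plain row-major (contiguous) strides for the same sizes differ. `contiguous_strides` is a hypothetical helper, and this illustrates the gap only; it does not claim to show exactly what the Inductor lowering produced.

```python
from typing import List, Sequence


def contiguous_strides(sizes: Sequence[int]) -> List[int]:
    """Row-major (contiguous_format) strides for the given sizes (hypothetical helper)."""
    strides = [1] * len(sizes)
    # Walk from the second-innermost dim outward; each stride is the product
    # of the sizes of all dims inside it.
    for d in range(len(sizes) - 2, -1, -1):
        strides[d] = strides[d + 1] * max(sizes[d + 1], 1)
    return strides


# Hypothetical query: allocated as (B, S, H, D) = (2, 8, 4, 16), then viewed
# as (B, H, S, D) with transpose(1, 2).
q_sizes = [2, 4, 8, 16]
q_strides = [512, 16, 64, 1]  # dense, so preserve format would copy these

# A gradient buffer allocated with contiguous strides would not match q.
grad_q_contig = contiguous_strides(q_sizes)
print(grad_q_contig)               # [512, 128, 16, 1]
print(grad_q_contig == q_strides)  # False
```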
cc @msaroufim @jerryzh168 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @eellison