Fix allocation logic: unconnected alloc/logical #5185
base: jj/allocation_PR_0
1. Refactor buffer allocation to use the allocation domain instead of the logical domain.
2. Fix the special path for projecting from allocation to logical when projection is not possible: we now compute the correct extent instead of returning the allocation buffer as-is. This allows the layout op to return a tensor with the correct logical size while still allocating a buffer large enough to accommodate the padding requirement.
Stacked PRs
Breaking original PR #5170 into three:
#5186 Fix allocation logic: non-divisible split
#5185 Fix allocation logic: unconnected alloc/logical <- this one
#5184 Allow non-device split on allocation domain
This PR
Context
PreprocessGroupedMatmulInputSf op has:
The existing allocation logic allocates a buffer that matches the logical sizes/strides. This is not the right behavior, because the allocation domain could have a larger extent. We also cannot use the allocation sizes/strides directly, because the consumer of the tensor expects a tensor matching the logical size.
We updated the logic to use the allocation domain for buffer allocation. We then slice into the buffer using the logical domain to produce the correctly sized output.
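The allocate-then-slice idea can be sketched in plain Python. This is only an illustration, not the code in this PR: `contiguous_strides`, `allocate_then_slice`, and `max_offset` are hypothetical helpers, and the sketch assumes the simple case where every allocation dimension maps one-to-one to a logical dimension and is at least as large (for example, because of padding).

```python
from math import prod


def contiguous_strides(sizes):
    """Row-major strides (in elements) for the given sizes."""
    strides = []
    running = 1
    for extent in reversed(sizes):
        strides.append(running)
        running *= extent
    return tuple(reversed(strides))


def allocate_then_slice(logical_sizes, alloc_sizes):
    """Allocate by allocation extents, then describe a logical-sized view.

    Hypothetical helper: assumes a one-to-one dimension mapping and that
    each allocation extent covers the matching logical extent.
    """
    if len(logical_sizes) != len(alloc_sizes):
        raise ValueError("this sketch assumes a one-to-one dimension mapping")
    if any(l > a for l, a in zip(logical_sizes, alloc_sizes)):
        raise ValueError("allocation extent must cover the logical extent")
    # The buffer is sized by the (possibly padded) allocation extents.
    buffer = [0.0] * prod(alloc_sizes)
    # The view reports logical sizes but keeps the allocation strides,
    # so element (i, j) still lands at its padded position.
    view_sizes = tuple(logical_sizes)
    view_strides = contiguous_strides(alloc_sizes)
    return buffer, view_sizes, view_strides


def max_offset(sizes, strides):
    """Largest element offset touched by a view with these sizes/strides."""
    return sum((s - 1) * st for s, st in zip(sizes, strides))


buffer, sizes, strides = allocate_then_slice((3, 5), (4, 8))
```

With a logical shape of (3, 5) padded to an allocation of (4, 8), the buffer holds 32 elements while the consumer sees a (3, 5) tensor whose strides skip over the padding.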
In the case of PreprocessGroupedMatmulInputSf, there is no correct way to slice into the buffer for indexing, so we give up on producing correct strides and use a naive stride instead. This is safe because we do not use indexing logic on the output.
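A rough sketch of this fallback, again in plain Python rather than the PR's actual code. It takes "naive stride" to mean contiguous row-major strides over the logical sizes, which is an assumption, and `naive_strides` and `allocate_unconnected` are hypothetical names. The example extents are made up and do not reproduce the real layout of PreprocessGroupedMatmulInputSf.

```python
from math import prod


def naive_strides(sizes):
    """Contiguous row-major strides over the given sizes (assumed meaning of 'naive')."""
    strides = []
    running = 1
    for extent in reversed(sizes):
        strides.append(running)
        running *= extent
    return tuple(reversed(strides))


def allocate_unconnected(logical_sizes, alloc_sizes):
    """Size the buffer by the allocation domain, report logical sizes.

    Hypothetical helper for the case where allocation and logical domains
    are not connected, so strides cannot be derived from the allocation.
    """
    buffer_numel = prod(alloc_sizes)
    sizes = tuple(logical_sizes)
    strides = naive_strides(sizes)
    # The naive view must still fit inside the allocated buffer.
    highest = sum((s - 1) * st for s, st in zip(sizes, strides))
    if highest >= buffer_numel:
        raise ValueError("logical view does not fit in the allocated buffer")
    return buffer_numel, sizes, strides


# Made-up extents: a (10, 6) logical tensor backed by a padded (128, 8) allocation.
numel, sizes, strides = allocate_unconnected((10, 6), (128, 8))
```

The returned strides do not describe where elements really live in the padded buffer. They only give the output a well-formed shape, which is acceptable as long as nothing indexes the output through them.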
Code change