[Codegen] Fix ConvertAccGemmToGemm to handle read-write arguments correctly. #22975
Conversation
The current pass unconditionally converts an accumulating GEMM to a
non-accumulating GEMM. This transformation is only required when the
`outs` argument comes from a read-only buffer. When it comes from a
read-write buffer, the accumulating GEMM can be handled as is and
does not need to be converted to a non-accumulating GEMM.
For a function input like
```
func.func @acc_gemm(%lhs : tensor<?x?xf32>, %rhs: tensor<?x?xf32>,
%init : tensor<?x?xf32> {iree.abi.output = 0}) -> tensor<?x?xf32> {
%0 = linalg.matmul ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
return %0 : tensor<?x?xf32>
}
```
The dispatch sees a single `init` binding that is read-write. In
that case we don't need to convert the `linalg.matmul` into a
non-accumulating GEMM; it can be (and currently is) handled
natively.
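The decision the pass now makes can be modeled with a short sketch. This is a plain-Python stand-in for the C++ pass logic, not IREE code; the `Binding` type and function names are hypothetical. The idea: convert the accumulating matmul only when the `outs` operand binds to a read-only buffer.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    """Hypothetical stand-in for a dispatch binding descriptor."""
    name: str
    read_only: bool


def needs_acc_to_nonacc_conversion(init_binding: Binding) -> bool:
    # Convert only when the accumulator (`outs`) comes from a
    # read-only buffer. A read-write binding can be accumulated
    # into directly, so the matmul is left as-is.
    return init_binding.read_only


# A read-only init (e.g. a constant bias) forces the rewrite;
# a read-write init (as with {iree.abi.output = 0}) does not.
assert needs_acc_to_nonacc_conversion(Binding("bias", read_only=True))
assert not needs_acc_to_nonacc_conversion(Binding("init", read_only=False))
```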
Signed-off-by: MaheshRavishankar <[email protected]>
jtuyls left a comment:
LGTM
compiler/src/iree/compiler/Codegen/Common/ConvertAccGEMMToGEMMPass.cpp
I would like to verify this issue is not again caused by PR #19546.
I will note, though, that in practice with `{iree.abi.output = 0}` I am still seeing a read-only tensor, and this pass is still creating the elementwise add. Ideally I would like to be able to generate the read-write tensor and see how the underlying codegen handles it.
Here is the IR dump.
I'll take a look. I tried a matmul and it was working as expected.
Seems like the
`IR Dump After GPUCombineLayoutTransformationPass (iree-codegen-gpu-combine-layout-transformation)`
pass is turning the read-write tensor into a read-only tensor, but I turned it off (with `--iree-llvmgpu-test-combine-layout-transformation=false`) and can confirm things work out for that shape.
But I tried a shape that does need padding, and there is an issue with a large private allocation; see gist here.
However, turning it into an elementwise add also has the same issue as documented in #22919, so that is an unrelated problem.
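To spell out what "turning it into an elementwise add" means here: the conversion rewrites `C = init + A·B` as a zero-initialized matmul followed by an elementwise add of `init`, which is numerically equivalent. A minimal plain-Python illustration (hypothetical helper names, not IREE code):

```python
def matmul_acc(a, b, init):
    """Accumulating matmul: out[i][j] = init[i][j] + sum_k a[i][k]*b[k][j]."""
    n, k, m = len(a), len(b), len(b[0])
    return [[init[i][j] + sum(a[i][p] * b[p][j] for p in range(k))
             for j in range(m)] for i in range(n)]


def matmul_then_add(a, b, init):
    """The rewritten form: zero-initialized matmul, then elementwise add."""
    zeros = [[0] * len(init[0]) for _ in init]
    c = matmul_acc(a, b, zeros)
    return [[c[i][j] + init[i][j] for j in range(len(init[0]))]
            for i in range(len(init))]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
init = [[1, 0], [0, 1]]
# Both forms compute the same result.
assert matmul_acc(a, b, init) == matmul_then_add(a, b, init)
```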
…re_to_buffer`. Signed-off-by: MaheshRavishankar <[email protected]>
Signed-off-by: MaheshRavishankar <[email protected]>
Thanks @nirvedhmeshram. I looked further and adapted the pass to handle
Signed-off-by: MaheshRavishankar <[email protected]>