
Conversation

@MaheshRavishankar (Collaborator) commented Dec 23, 2025

The current pass unconditionally converts an accumulating GEMM to a non-accumulating GEMM. This transformation is only required when the `outs` operand comes from a read-only buffer. When it comes from a read-write buffer, the accumulating GEMM can be handled as is and does not need to be converted to a non-accumulating GEMM.

For a function input like

```
func.func @acc_gemm(%lhs : tensor<?x?xf32>, %rhs: tensor<?x?xf32>,
    %init : tensor<?x?xf32> {iree.abi.output = 0}) -> tensor<?x?xf32> {
  %0 = linalg.matmul ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
      outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}
```

the dispatch sees a single `init` binding that is read-write. In those cases we don't need to convert the `linalg.matmul` into a non-accumulating GEMM; this case can be (and is currently) handled natively.
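
By contrast, when the `outs` operand does come from a read-only buffer, the pass rewrites the GEMM into a non-accumulating form. The following is a minimal hand-written sketch of that rewrite, not actual pass output; the use of `tensor.empty`/`linalg.fill` for the zero accumulator and `linalg.add` for adding the init back are assumptions about how the rewrite is expressed (the pass may emit a `linalg.generic` instead).

```
func.func @non_acc_gemm(%lhs : tensor<?x?xf32>, %rhs : tensor<?x?xf32>,
    %init : tensor<?x?xf32>) -> tensor<?x?xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %zero = arith.constant 0.0 : f32
  // Create and zero-fill a temporary accumulator of the same shape as %init.
  %d0 = tensor.dim %init, %c0 : tensor<?x?xf32>
  %d1 = tensor.dim %init, %c1 : tensor<?x?xf32>
  %empty = tensor.empty(%d0, %d1) : tensor<?x?xf32>
  %filled = linalg.fill ins(%zero : f32) outs(%empty : tensor<?x?xf32>) -> tensor<?x?xf32>
  // Non-accumulating matmul into the zero-filled temporary.
  %mm = linalg.matmul ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
      outs(%filled : tensor<?x?xf32>) -> tensor<?x?xf32>
  // Add the original init back elementwise to recover the accumulating semantics.
  %add = linalg.add ins(%mm, %init : tensor<?x?xf32>, tensor<?x?xf32>)
      outs(%empty : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %add : tensor<?x?xf32>
}
```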

ci-extra: test_torch

@jtuyls (Contributor) left a comment

LGTM

@nirvedhmeshram (Contributor) commented

I would like to verify this issue is not again caused by this PR #19546.
I will update once I do a check; please hold off on landing until then.

@nirvedhmeshram (Contributor) left a comment

I will note, though, that in practice with `{iree.abi.output = 0}` I am still seeing a read-only tensor and this pass is still creating the elementwise op; ideally I would have liked to be able to generate the read-write tensor and see how the underlying codegen is handling it.
Here is the IR dump.

@MaheshRavishankar (Collaborator, Author) commented

> I will note, though, that in practice with `{iree.abi.output = 0}` I am still seeing a read-only tensor and this pass is still creating the elementwise op; ideally I would have liked to be able to generate the read-write tensor and see how the underlying codegen is handling it.
> Here is the IR dump.

I'll take a look. I tried a matmul and it was working as expected.

@nirvedhmeshram (Contributor) left a comment

Seems like the

`// -----// IR Dump After GPUCombineLayoutTransformationPass (iree-codegen-gpu-combine-layout-transformation) //----- //`

pass is turning the read-write tensor into a read-only tensor, but I turned it off (with `--iree-llvmgpu-test-combine-layout-transformation=false`) and can confirm things work out for that shape. I then tried a shape that does need padding, and there is an issue with a large private allocation; see gist here.

However, turning it into an elementwise add also has the same issue as documented in #22919, so that is an unrelated problem.

@MaheshRavishankar (Collaborator, Author) commented

> Seems like the
>
> `// -----// IR Dump After GPUCombineLayoutTransformationPass (iree-codegen-gpu-combine-layout-transformation) //----- //`
>
> pass is turning the read-write tensor into a read-only tensor, but I turned it off (with `--iree-llvmgpu-test-combine-layout-transformation=false`) and can confirm things work out for that shape. I then tried a shape that does need padding, and there is an issue with a large private allocation; see gist here.
>
> However, turning it into an elementwise add also has the same issue as documented in #22919, so that is an unrelated problem.

Thanks @nirvedhmeshram. I looked further and adapted the pass to handle `iree_codegen.load_from_buffer` and `iree_codegen.store_to_buffer`. Hopefully that makes these cases easier. There is still a private allocation being created for the shape that needs padding; I think this is indeed a pre-existing problem, but hopefully this makes it easier to handle.
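
For reference, here is a hand-written sketch of the kind of dispatch the updated pass is meant to recognize: the accumulator is read from and written back to the same binding through the `iree_codegen` buffer-access ops, so the `linalg.matmul` can stay in accumulating form. This is not actual compiler output, and the assembly syntax of the `iree_codegen` ops is written from memory, so it may not be exact.

```
func.func @acc_gemm_dispatch(%lhs : tensor<128x64xf32>, %rhs : tensor<64x128xf32>,
    %result_binding : memref<128x128xf32>) {
  // The accumulator is loaded from the read-write result binding...
  %init = iree_codegen.load_from_buffer %result_binding
      : memref<128x128xf32> -> tensor<128x128xf32>
  // ...used as the accumulating outs of the matmul...
  %0 = linalg.matmul ins(%lhs, %rhs : tensor<128x64xf32>, tensor<64x128xf32>)
      outs(%init : tensor<128x128xf32>) -> tensor<128x128xf32>
  // ...and stored back to the same binding.
  iree_codegen.store_to_buffer %0, %result_binding
      : tensor<128x128xf32> into memref<128x128xf32>
  return
}
```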

@MaheshRavishankar merged commit 55430fd into iree-org:main Dec 30, 2025
98 of 102 checks passed
keshavvinayak01 pushed a commit that referenced this pull request Jan 27, 2026
…rectly. (#22975)
