[DispatchCreation] Set split reduction size for GEMM with large k dim #22357
Conversation
Signed-off-by: yzhang93 <[email protected]>
Force-pushed from f74c75e to 3bdf63c
Signed-off-by: yzhang93 <[email protected]>
```mlir
// RUN: iree-opt %s --pass-pipeline="builtin.module(util.func(iree-dispatch-creation-set-split-reduction-sizes))" --split-input-file > %t
// RUN: FileCheck %s < %t
```
We should pipe to FileCheck.
Yeah, sure. This was simply copied from other tests, but I can modify those as well.
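For reference, the piped form the reviewer is suggesting would look something like the following (a sketch only; the pass-pipeline string is taken from the test above, and piping `iree-opt` output directly into `FileCheck` is the usual lit idiom rather than going through a temporary `%t` file):

```mlir
// RUN: iree-opt %s --pass-pipeline="builtin.module(util.func(iree-dispatch-creation-set-split-reduction-sizes))" --split-input-file | FileCheck %s
```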
```cpp
for (int64_t i = 0; i < tileSizes.size(); i++) {
  int64_t lowerBound = llvm::divideCeil(tileSizes[i], limitParallelLoops);
```
Use `enumerate(tileSizes)`.
Done.
MaheshRavishankar
left a comment
Just a few nits and clarifications. Otherwise looks good.
```cpp
  return std::nullopt;
}

if (linalgOp.getNumParallelLoops() < 2) {
```
Why? You plan to revisit that later?
This is to guarantee that the op is matmul-like, similar to this check:
```cpp
linalgOp.getNumParallelLoops() >= 2;
```
That is more for naming purposes. It's fine for now, but if there is a batch dimension, that will be three parallel loops (and you are accounting for that below).
```cpp
SmallVector<int64_t> tileSizes = std::move(*maybeSizes);
int64_t outputSize = mSize * nSize * batchSize;
int64_t limitParallelLoops;
if (outputSize < 16 * 16) {
```
I am not fully understanding the link between outputSize and limitParallelLoops: limitParallelLoops seems to increase as outputSize gets smaller. I think a better name might help, and I can suggest one once I understand it better.
limitParallelLoops is the cap on the number of parallel loops produced by split reduction; it does not account for the number of workgroups. When the output size is small, the op is likely to be distributed to fewer workgroups, so we need more parallel loops from split reduction to fill the machine.
Ok, that makes sense.
Signed-off-by: yzhang93 <[email protected]>
[DispatchCreation] Set split reduction size for GEMM with large k dim (iree-org#22357)

This PR adds basic support for setting the split reduction size for matmul-like ops with a large K dim. Note that the constant thresholds are empirically chosen based on limited data (1x1 filter weight backward convs) and may not generalize to all cases. It's challenging to find a single threshold that applies to all shapes. The bottom line is to improve performance for extremely large K cases while not degrading many smaller shapes.

Signed-off-by: yzhang93 <[email protected]>
… weight backward convs (#22491)

This PR is a follow-up for #22275. It removes the constraint of splitting only the input channel dimension and adds support for splitting across multiple dimensions. The heuristics for setting multi-dimension tile sizes are similar to those for GEMM in #22357. More than half of the tracked weight backward shapes benefit from this change.

Example runtime comparison for `convbfp16 -n 16 -c 16 -H 225 -W 225 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC --iter 100`:

- Without split reduction: 19352.8 ms
- Split only the input channel dimension: 1445.1 ms
- Split multiple reduction dimensions: 371.7 ms

Signed-off-by: yzhang93 <[email protected]>