Conversation

@yzhang93
Contributor

Weight backward convolutions have a special CHWN layout in which the filter sizes (corresponding to the output image sizes of the forward convolution) are typically large, while the output spatial dimensions are small. The reduction is therefore long relative to the amount of parallel work, which makes the split reduction strategy particularly effective. This PR adds support for splitting these convolutions along the input channel dimension.
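
For readers unfamiliar with the term, here is a minimal sketch of the general split reduction idea, using a plain dot product rather than a convolution. It is not IREE code; the function name `split_reduction_dot` and the chunking scheme are assumptions made for illustration. The long reduction is cut into `num_splits` chunks that could each run on a separate workgroup, and a small second step combines the partial results.

```python
def split_reduction_dot(a, b, num_splits):
    """Dot product computed as `num_splits` partial sums plus a final combine.

    Illustrative only: each partial sum is independent of the others, so in a
    real compiler each one could be mapped to its own workgroup.
    """
    assert len(a) == len(b)
    assert num_splits > 0
    n = len(a)
    chunk = -(-n // num_splits)  # ceiling division: elements per chunk

    partials = []
    for s in range(num_splits):
        lo, hi = s * chunk, min((s + 1) * chunk, n)
        # Independent partial reduction over one chunk of the reduction axis.
        partials.append(sum(x * y for x, y in zip(a[lo:hi], b[lo:hi])))

    # Final (much smaller) reduction over the partial results.
    return sum(partials), partials


a = list(range(1, 13))   # 1, 2, ..., 12
b = [2] * 12
total, partials = split_reduction_dot(a, b, num_splits=4)
# partials == [12, 30, 48, 66], total == 156
```

The split pays off when the original reduction is long and there are few independent outputs to keep the device busy, which is the situation the description above attributes to weight backward convolutions.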

Some experimentally determined thresholds are applied to filter out cases that won't benefit from split reduction. The specific checks are:

  • When the batch and output channel sizes are large, the workload tends to be distributed across many workgroups already, so split reduction has little to no effect.
  • When the input spatial sizes are small while the batch and output channel sizes are relatively larger (medium size), split reduction often has no effect or even degrades performance.
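
The two checks above could be sketched as a simple gating function. This is a hypothetical illustration, not the logic merged in this PR: the function name `should_split_reduction`, the way sizes are combined, and all three threshold values are placeholders chosen for the example, since the description does not state the real ones.

```python
def should_split_reduction(
    batch,
    out_channels,
    in_height,
    in_width,
    large_parallel=64 * 1024,  # HYPOTHETICAL threshold, not from the PR
    medium_parallel=4096,      # HYPOTHETICAL threshold, not from the PR
    small_spatial=32 * 32,     # HYPOTHETICAL threshold, not from the PR
):
    """Return True if split reduction looks worthwhile for this shape."""
    parallel_size = batch * out_channels
    spatial_size = in_height * in_width

    # Check 1: large batch and output channel sizes already spread the
    # workload across many workgroups, so splitting adds little.
    if parallel_size >= large_parallel:
        return False

    # Check 2: small input spatial sizes combined with medium-sized batch and
    # output channel sizes often see no gain, or a slowdown.
    if spatial_size <= small_spatial and parallel_size >= medium_parallel:
        return False

    return True


# Small parallel extent, large spatial extent: a candidate for splitting.
print(should_split_reduction(16, 16, 225, 225))
# Large batch and output channels: rejected by the first check.
print(should_split_reduction(512, 512, 225, 225))
```

Keeping the thresholds as parameters makes it easy to re-tune them against measured shapes, which matches the experimental nature of the cutoffs described above.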

@yzhang93 yzhang93 force-pushed the split_reduction_backward_conv branch from d447256 to 5cbe5df October 10, 2025 21:26
Collaborator

@MaheshRavishankar MaheshRavishankar left a comment

Mostly looks good. Just a minor comment.

Signed-off-by: yzhang93 <[email protected]>
@yzhang93 yzhang93 merged commit d4d74cb into iree-org:main Oct 16, 2025
46 checks passed
weidel-p pushed a commit to weidel-p/iree that referenced this pull request Oct 21, 2025
…e-org#22275)

Signed-off-by: yzhang93 <[email protected]>
Signed-off-by: Philipp <[email protected]>
yzhang93 added a commit that referenced this pull request Nov 3, 2025
… weight backward convs (#22491)

This PR is a follow-up to #22275.
It removes the constraint that only the input channel dimension can be
split and adds support for splitting across multiple dimensions. The
heuristics for setting multi-dimension tile sizes are similar to those
used for GEMM in #22357. More than half of the tracked weight backward
shapes benefit from this change.

Example runtime comparison for
`convbfp16 -n 16 -c 16 -H 225 -W 225 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1
-l 1 -j 1 -m conv -g 1 -F 4 -t 1 --in_layout NHWC --out_layout NHWC
--fil_layout NHWC --iter 100`

- Without split reduction: 19352.8 ms
- Split only the input channel dimension: 1445.1 ms
- Split multiple reduction dimensions: 371.7 ms
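
To put these numbers in context, the following sketch works out the problem sizes and speedups for the command above. It assumes the usual MIOpen driver flag meanings (`-n` batch, `-c` input channels, `-H`/`-W` input size, `-k` output channels, `-y`/`-x` filter size, `-p`/`-q` padding, `-u`/`-v` stride, `-l`/`-j` dilation) and the textbook weight-gradient formulation, in which the reduction runs over the batch and the forward output spatial positions. How these dimensions map onto IREE's CHWN representation is not spelled out here, so treat the dimension naming as an assumption.

```python
def conv_out_size(in_size, filt, pad, stride, dilation):
    """Standard convolution output-size formula for one spatial dimension."""
    return (in_size + 2 * pad - dilation * (filt - 1) - 1) // stride + 1


# Shape taken from the benchmark command line above.
n, c, h, w, k, y, x = 16, 16, 225, 225, 64, 3, 3
out_h = conv_out_size(h, y, pad=1, stride=1, dilation=1)  # 225
out_w = conv_out_size(w, x, pad=1, stride=1, dilation=1)  # 225

# Weight gradient: one result element per (k, y, x, c), each one reducing
# over every batch element and every forward output position.
result_elements = k * y * x * c        # 9216
reduction_length = n * out_h * out_w   # 810000

# Reported runtimes in ms, copied from the list above.
no_split, split_ic, split_multi = 19352.8, 1445.1, 371.7
speedup_ic = no_split / split_ic            # about 13.4x
speedup_multi = no_split / split_multi      # about 52.1x
speedup_multi_vs_ic = split_ic / split_multi  # about 3.9x

print(result_elements, reduction_length)
print(round(speedup_ic, 1), round(speedup_multi, 1), round(speedup_multi_vs_ic, 1))
```

Under these assumptions each of the 9216 result elements sums 810000 products, which illustrates why a shape like this is dominated by its reduction and responds so strongly to splitting it.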

---------

Signed-off-by: yzhang93 <[email protected]>
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 19, 2025
… weight backward convs (iree-org#22491)

Signed-off-by: yzhang93 <[email protected]>
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
…e-org#22275)

Signed-off-by: yzhang93 <[email protected]>
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
… weight backward convs (iree-org#22491)

Signed-off-by: yzhang93 <[email protected]>