[Codegen] Use safer hoisting in OptimizeTensorInsertExtractSlices #23280
|
I'm going to convert to draft until I have the pipeline tests under control. There is an annoying issue with broadcast in the VectorDistribute pipeline. |
|
Here's the issue with broadcast in VectorDistribute: #23283 |
|
The hack we could do is to prevent hoisting from scf.forall ops in OptimizeTensorInsertExtractSlices, since that is what is causing the bug, but it is just masking the real issue IMO. Otherwise, maybe this deserves a longer form discussion. |
I commented a potential fix on the issue that should fix the DPS chain. This PR fixes a real issue in vector distribute where we aren't hoisting out the redundant computation, so we should try to fix it. If it doesn't work, maybe add an option to not fuse out of scf.forall and enable it in VectorDistribute, and I'll have a look at it. |
|
The suggestion from @Groverkss worked, so once #23285 is merged, I'll rebase this one and it should be good to go. Thanks for the quick suggestion Kunwar! |
81c9ff4 to ac8b09d
Signed-off-by: Max Dawkins <[email protected]>
ac8b09d to f503a98
…3280) Use the `moveLoopInvariantCodeFromGuaranteedLoops` transform instead of the `moveLoopInvariantCode` transform in the OptimizeTensorInsertExtractSlices pass. This transform is safer, because it validates that loops will be executed at least once before hoisting loop invariant code. Hoisting from loops that may not execute is not an optimization, so this is a better version of the transformation. The new safer transform also hoists from linalg.generic ops, so the `moveLoopInvariantCodeFromGenericOps` is removed, since it is no longer used. This PR also removes the `_batch_matmul_narrow_n_2_dispatch_4_unpack_i32` test, which was doing nothing but checking that a tensor.empty op gets hoisted from an scf.for loop (which cannot be guaranteed to execute). Hoisting empty tensors is not the job of this pass, and the test is verbose, so the test is simply removed. Signed-off-by: Max Dawkins <[email protected]>
Use the `moveLoopInvariantCodeFromGuaranteedLoops` transform instead of the `moveLoopInvariantCode` transform in the OptimizeTensorInsertExtractSlices pass. This transform is safer, because it validates that loops will be executed at least once before hoisting loop invariant code. Hoisting from loops that may not execute is not an optimization, so this is a better version of the transformation.

The new safer transform also hoists from linalg.generic ops, so the `moveLoopInvariantCodeFromGenericOps` is removed, since it is no longer used.

This PR also removes the `_batch_matmul_narrow_n_2_dispatch_4_unpack_i32` test, which was doing nothing but checking that a tensor.empty op gets hoisted from an scf.for loop (which cannot be guaranteed to execute). Hoisting empty tensors is not the job of this pass, and the test is verbose, so the test is simply removed.
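
To illustrate the "guaranteed to execute" condition described above, here is a minimal toy sketch in Python. It is not the IREE implementation: `ForLoop`, `guaranteed_to_execute`, and `should_hoist` are hypothetical names invented for this example, and the real transform presumably relies on a more general bounds analysis than the static comparison shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForLoop:
    """Hypothetical stand-in for an scf.for loop.

    A bound is None when it is only known at runtime.
    """
    lower: Optional[int]
    upper: Optional[int]
    step: int = 1


def guaranteed_to_execute(loop: ForLoop) -> bool:
    """Return True only when the bounds prove at least one iteration."""
    if loop.lower is None or loop.upper is None:
        # Dynamic bounds: the trip count could be zero, so nothing is proven.
        return False
    return loop.step > 0 and loop.lower < loop.upper


def should_hoist(loop: ForLoop, op_is_loop_invariant: bool) -> bool:
    """Hoist an op only if it is invariant AND the loop surely runs.

    Hoisting out of a zero-trip loop would execute work that the
    original program never performed.
    """
    return op_is_loop_invariant and guaranteed_to_execute(loop)


if __name__ == "__main__":
    print(should_hoist(ForLoop(0, 8), True))     # static bounds, runs 8 times
    print(should_hoist(ForLoop(0, None), True))  # dynamic upper bound
    print(should_hoist(ForLoop(4, 4), True))     # statically zero iterations
```

Under this toy model, the removed test corresponds to the second case: an invariant op inside an scf.for whose execution cannot be proven, so it stays in the loop.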