Tags: xgupta/iree
[LinalgExt] Add OuterReduction tiling strategy for ArgCompareOp (iree-org#23102)

This PR extends ArgCompareOp's PartialReductionOpInterface to support the OuterReduction tiling strategy in addition to the existing OuterParallel (Split-Reduction) strategy.

## Example

For `arg_compare` on `tensor<64x4096xf32>` with reduction `dim=1` and `tile_size=128`:

**OuterParallel** (existing) - each chunk writes to a separate slot:

```mlir
// Partial results: tensor<64x32xf32>, tensor<64x32xi32> (32 chunks)
%results:2 = scf.forall (%chunk_idx) = (0) to (4096) step (128)
    shared_outs(%val = %init_val, %idx = %init_idx) {
  %slice = tensor.extract_slice %input[0, %chunk_idx] [64, 128] [1, 1]
  %partial:2 = iree_linalg_ext.arg_compare dim(1) ins(%slice) outs(...)
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %partial#0 into %val[0, %chunk_idx] [64, 1]
    tensor.parallel_insert_slice %partial#1 into %idx[0, %chunk_idx] [64, 1]
  }
}
%final:2 = linalg.reduce ins(%results#0, %results#1) dims=[1]
```

**OuterReduction** (this PR) - accumulates in place each iteration:

```mlir
// Partial results: tensor<64x128xf32>, tensor<64x128xi32> (tile shape)
%results:2 = scf.for %iv = 0 to 4096 step 128
    iter_args(%val = %init_val, %idx = %init_idx) {
  %slice = tensor.extract_slice %input[0, %iv] [64, 128] [1, 1]
  %updated:2 = linalg.generic ins(%slice) outs(%val, %idx) {
  ^bb0(%new: f32, %acc_val: f32, %acc_idx: i32):
    %global_idx = arith.addi %iv, %local_idx  // track position
    %cmp = arith.cmpf ogt, %new, %acc_val
    %sel_val = arith.select %cmp, %new, %acc_val
    %sel_idx = arith.select %cmp, %global_idx, %acc_idx
    linalg.yield %sel_val, %sel_idx
  }
  scf.yield %updated#0, %updated#1
}
%final:2 = linalg.reduce ins(%results#0, %results#1) dims=[1]
```

This is one necessary step for plumbing ArgCompare through the VectorDistribute pipeline.

Issue: iree-org#23005

---------

Signed-off-by: Bangtian Liu <[email protected]>
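The difference between the two strategies in the ArgCompareOp commit above can be modeled on a single row in plain Python. This is only an illustrative sketch, not IREE code: the function names are hypothetical, the accumulator is assumed to start at negative infinity, the row length is assumed to be a multiple of the tile size, and the comparator is fixed to "greater than" (mirroring `arith.cmpf ogt`). Tie-breaking between equal values is out of scope here, so the sketch is only meant for rows with distinct values.

```python
def final_reduce(pairs):
    """Reduce a list of (value, index) pairs to the pair with the largest value."""
    best_val, best_idx = pairs[0]
    for val, idx in pairs[1:]:
        if val > best_val:
            best_val, best_idx = val, idx
    return best_val, best_idx


def outer_parallel(row, tile):
    """Each chunk produces one (value, index) slot: len(row) // tile partials."""
    partial = []
    for start in range(0, len(row), tile):
        chunk = row[start:start + tile]
        best = 0
        for j in range(1, len(chunk)):
            if chunk[j] > chunk[best]:
                best = j
        # Store the global index, not the chunk-local one.
        partial.append((chunk[best], start + best))
    return final_reduce(partial)


def outer_reduction(row, tile):
    """A tile-shaped accumulator is updated in place on every iteration."""
    acc_val = [float("-inf")] * tile  # assumed initial value
    acc_idx = [0] * tile
    for iv in range(0, len(row), tile):
        for lane in range(tile):
            new = row[iv + lane]
            if new > acc_val[lane]:
                acc_val[lane] = new
                acc_idx[lane] = iv + lane  # global index = loop offset + lane
    return final_reduce(list(zip(acc_val, acc_idx)))
```

With 4096 elements and a tile size of 128, `outer_parallel` reduces 32 partial results at the end, while `outer_reduction` reduces 128, matching the partial-result shapes quoted in the commit message.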
iree-bazel-* improvements for handling multiple targets + options. (iree-org#23330)

iree-bazel-try:
- supports --features, useful for --features=thin_lto
- workaround for thin_lto + Wno-unused-command-line-argument
- --copt and --linkopt in a way that ensures the entire build is configured with the options (prior only the try target was, which was useful in isolated testing but not when benchmarking/etc)
- uses a new output base (so features/copt/linkopt don't pollute the normal build, good for concurrent try + build/test)
- fixed caching of files passed in by path
- fixed files passed in by path to have original source locations

iree-bazel-test/build/fuzz:
- support multiple targets (`iree-bazel-test //:a //:b`)
- this allows multiple fuzzers to run in batched mode

iree-bazel-cquery:
- added to match iree-bazel-query so we have the pair

misc:
- `target_compatible_with`/platform `select` support in bazel-to-cmake
- fixing benchmark warnings about missing unit
- fixed a bug in cc_benchmark dropping extra args on benchmark tests

---------

Co-authored-by: Claude <[email protected]>
[GPU] MmaSchedule configuration crashes when lacking PerfTflops (iree-org#23303)

getPerfTflops may return a null dictionary. In these cases we should treat it as empty.

Signed-off-by: Rob Suderman <[email protected]>
[Codegen] Use safer hoisting in OptimizeTensorInsertExtractSlices (iree-org#23280)

Use the `moveLoopInvariantCodeFromGuaranteedLoops` transform instead of the `moveLoopInvariantCode` transform in the OptimizeTensorInsertExtractSlices pass. This transform is safer, because it validates that loops will be executed at least once before hoisting loop invariant code. Hoisting from loops that may not execute is not an optimization, so this is a better version of the transformation.

The new safer transform also hoists from linalg.generic ops, so the `moveLoopInvariantCodeFromGenericOps` is removed, since it is no longer used.

This PR also removes the `_batch_matmul_narrow_n_2_dispatch_4_unpack_i32` test, which was doing nothing but checking that a tensor.empty op gets hoisted from an scf.for loop (which cannot be guaranteed to execute). Hoisting empty tensors is not the job of this pass, and the test is verbose, so the test is simply removed.

Signed-off-by: Max Dawkins <[email protected]>
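The reasoning in the hoisting commit above (hoisting only pays off when the loop is known to run at least once) can be illustrated with a toy model. This is a hedged sketch over concrete integer bounds, not the MLIR transform itself; all function names below are hypothetical, and the real pass has to reason about bounds that may only be partially known at compile time.

```python
def trip_count(lb, ub, step):
    """Number of iterations of `for i in range(lb, ub, step)` for a positive step."""
    if step <= 0:
        raise ValueError("step must be positive")
    return max(0, (ub - lb + step - 1) // step)


def invariant_evaluations(lb, ub, step, hoist):
    """Count how many times a loop-invariant computation is executed.

    Hoisted code runs exactly once, regardless of the trip count.
    Unhoisted code runs once per iteration.
    """
    if hoist:
        return 1
    return trip_count(lb, ub, step)


def guarded_hoist(lb, ub, step):
    """Decide to hoist only when the loop is guaranteed to execute."""
    return trip_count(lb, ub, step) >= 1
```

For a loop from 0 to 4096 with step 128, hoisting cuts the invariant work from 32 evaluations to 1. For an empty loop (0 to 0), unconditional hoisting would add one evaluation that the original program never performed, which is why the guarded version declines to hoist.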
[e2e] Increase test timeout for gfx1250 (iree-org#23286)

Expose timeout as an optional argument in `iree_native_test` in matmul tests. Regression tests already know how to translate the bazel timeout parameter to seconds.

Assisted-by: claude
[NFC] Make status test macros take ownership of iree_status_t. (iree-org#23276)

Adds `ConsumeForTest` overloads that wrap raw `iree_status_t` in `iree::Status` RAII wrappers, ensuring automatic cleanup on test failure.

The macros `IREE_EXPECT_OK`, `IREE_ASSERT_OK`, `IREE_EXPECT_STATUS_IS`, and `IREE_ASSERT_STATUS_IS` now consume the status they test. For lvalue status variables, the source is cleared to a code-only value so any existing `iree_status_ignore`/`iree_status_free` calls become harmless no-ops. This allows incremental migration without breaking existing tests.

Most tests (outside of tokenizer) have been updated. tokenizer is being reworked and the next feature branch merge will adopt this behavior.

---------

Co-authored-by: Claude <[email protected]>
[CPU][NFC] Fix incorrect mmt4d dimension names in comments. (iree-org#23234)

The comments in KernelDispatch.cpp had the mmt4d dimension naming backwards. The six dimensions are M1, N1, K1, M0, N0, K0.

- Result shape: BxM0xN0xM1xN1 → BxM1xN1xM0xN0
- getMmt4dInnerTileSizes returns M0/N0, not M1/N1
- Iteration domain: m0, n0, k0, m1, n1, k1 → M1, N1, K1, M0, N0, K0

Signed-off-by: hanhanW <[email protected]>
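The corrected naming in the mmt4d commit above can be checked against the operand layouts of `linalg.mmt4d`, where the LHS is M1xK1xM0xK0, the RHS is N1xK1xN0xK0, and the accumulator is M1xN1xM0xN0 (outer dimensions carry the suffix 1, inner tile dimensions the suffix 0). The helpers below are a hypothetical sketch for the non-batched case and are not taken from KernelDispatch.cpp.

```python
def mmt4d_iteration_domain(lhs_shape, rhs_shape):
    """Return the six mmt4d loop extents in the order M1, N1, K1, M0, N0, K0."""
    m1, k1, m0, k0 = lhs_shape        # LHS layout: M1 x K1 x M0 x K0
    n1, rhs_k1, n0, rhs_k0 = rhs_shape  # RHS layout: N1 x K1 x N0 x K0
    if (k1, k0) != (rhs_k1, rhs_k0):
        raise ValueError("LHS and RHS disagree on K1/K0")
    # Dicts preserve insertion order, so the key order documents the domain order.
    return {"M1": m1, "N1": n1, "K1": k1, "M0": m0, "N0": n0, "K0": k0}


def mmt4d_result_shape(lhs_shape, rhs_shape):
    """Return the accumulator shape M1 x N1 x M0 x N0."""
    d = mmt4d_iteration_domain(lhs_shape, rhs_shape)
    return (d["M1"], d["N1"], d["M0"], d["N0"])
```

For example, an LHS of shape `(8, 16, 4, 2)` and an RHS of shape `(6, 16, 5, 2)` give a result shape of `(8, 6, 4, 5)`: the outer M1 and N1 come first, followed by the inner tiles M0 and N0.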
Integrate LLVM@5c35af8f1e6ebc7c32 (iree-org#23252)

Reverts carried forward:
* Local revert of llvm/llvm-project#169614 due to iree-org#22649

Other changes:
* Fixes lit tests to account for llvm/llvm-project#174452
Reapply "LLVM Integrate@6cc18a8e4338 (iree-org#23226)" (iree-org#23236)

This reverts commit 8ca6c8f13398c5bbe961e9bc874d6b3de398e5e8.

Also uses the new `visitNonControlFlowArguments` API introduced in llvm/llvm-project#175815.