Tags: anderspitman/iree
Fix low-frequency typos in compiler (non-Codegen). NFC. (4/6) (iree-org#23604) Preparation for adding a typos pre-commit spell checker (6/6). Co-authored-by: Claude Opus 4.6 <[email protected]>
[Codegen][GPU] Clean up prefetch pipeline stages flag to support integer values (iree-org#23568) Replaces the boolean `--iree-llvmgpu-enable-prefetch` flag with an integer `--iree-llvmgpu-prefetch-num-stages` flag backed by std::optional<uint64_t>. When unset (default), each code path uses its own heuristic default. 0 or 1 disables pipelining, and 2+ enables pipelining with the specified number of stages. ci-extra: test_torch --------- Signed-off-by: Yu-Zhewen <[email protected]>
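The flag semantics described above can be sketched in a few lines. This is only an illustration under stated assumptions: `resolvePrefetchStages` and `heuristicDefault` are hypothetical names, not taken from the IREE source, and treating a heuristic default of 0 or 1 as "disabled" is an assumption that mirrors the rule for explicit values.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the IREE code base).
// `flagValue` models the optional integer flag; `heuristicDefault` models the
// per-code-path default that applies when the flag is unset.
// Returns the number of pipeline stages, or 0 when pipelining is disabled.
inline uint64_t resolvePrefetchStages(std::optional<uint64_t> flagValue,
                                      uint64_t heuristicDefault) {
  // Unset flag: fall back to the caller's heuristic default.
  uint64_t stages = flagValue.value_or(heuristicDefault);
  // 0 or 1 disables pipelining; 2 or more enables it with that many stages.
  return stages >= 2 ? stages : 0;
}
```

For example, `resolvePrefetchStages(std::nullopt, 2)` yields 2, while an explicit value of 1 yields 0 regardless of the heuristic.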
Add --exclude-libs=ALL to libIREECompiler.so shared library (iree-org#23574)

When building `libIREECompiler.so` in `TheRock` I hit the following error when trying to load the library:

```
LLVM ERROR: Option 'pbqp' already exists!
```

I'll put the full Claude-generated description of what happened below, but the long and the short of it is that `libIREECompiler.so` seems to be exporting quite a few symbols when linking LLVM's static `.a`'s. In my case, `TheRock` has another library that loads `libLLVM.so`, and when loaded it [interposes](https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic) some of the global command line registration machinery. Because of the interposition, `libIREECompiler.so`'s command line registration got routed to `libLLVM.so`'s; luckily, rather than some incredibly hard to diagnose error, the double command line registration crashes on startup.

`--exclude-libs,ALL` hides symbols from static libraries when linking the final `libIREECompiler.so`, preventing LLVM's symbols from being exported and therefore being interposable.

---

Claude generated description

```
Add --exclude-libs=ALL to libIREECompiler.so shared library

libIREECompiler.so statically links LLVM/MLIR but exports ~175K internal
symbols via its version script (api_version.ld uses `global: *`). The
visibility design assumes LLVM is compiled with -fvisibility=hidden, but
LLVM's own CMake build only applies -fvisibility-inlines-hidden (not
-fvisibility=hidden) to the vast majority of its compilation units
(2250/2720 LLVM files, 1049/1145 MLIR files lack the flag). This means all
non-inline LLVM symbols default to visibility("default") and leak through
the version script.

When libIREECompiler.so is loaded via dlopen(RTLD_LOCAL) into a process that
already has libLLVM.so in the global scope (e.g. via HIP runtime's
libamd_comgr.so → libLLVM.so.22.0git), the ELF dynamic linker searches the
global scope first when resolving relocations. LLVM function calls in
libIREECompiler.so that go through the PLT/GOT resolve to libLLVM.so's
copies rather than the compiler's own static LLVM. This causes LLVM
cl::Option static initializers to register duplicate options into
libLLVM.so's global registry, crashing with:

  LLVM ERROR: Option 'pbqp' already exists!

The fix adds --exclude-libs=ALL to the SharedImpl linker options, which
forces all symbols from static archives (.a files) to hidden visibility at
link time. Only symbols from object files (.o) compiled with explicit
__attribute__((visibility("default"))) annotations (i.e. the IREE and MLIR
C API functions marked with IREE_EMBED_EXPORTED / MLIR_CAPI_EXPORTED)
remain visible. This reduces exported dynamic symbols from ~274K to ~2.9K.

This matches the pattern already used by MLIR's own MLIR-C shared library
(mlir/lib/CAPI/CMakeLists.txt):

  target_link_options(MLIR-C PRIVATE "-Wl,-exclude-libs,ALL")
```

ci-extra: all

---------

Signed-off-by: Bangtian Liu <[email protected]> Co-authored-by: Claude Opus 4.6 <[email protected]> Co-authored-by: Bangtian Liu <[email protected]>
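The failure mode described above can be illustrated with a toy model. This is a sketch, not LLVM's `cl::Option` machinery: `OptionRegistry`, `RegisterSeparately`, and `RegisterInterposed` are hypothetical names, and the only behavior modeled is that a registry rejects a name it already holds (LLVM aborts in that case; the sketch returns `false`).

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a process-wide option registry. It models only
// the "option names must be unique" rule, nothing else about LLVM.
class OptionRegistry {
 public:
  // Returns false when `name` is already registered.
  bool Register(const std::string& name) { return names_.insert(name).second; }

 private:
  std::set<std::string> names_;
};

// Intended behavior: each library registers into its own private registry,
// so the same option name can exist in both without conflict.
inline bool RegisterSeparately() {
  OptionRegistry compilerCopy;
  OptionRegistry sharedLlvmCopy;
  return sharedLlvmCopy.Register("pbqp") && compilerCopy.Register("pbqp");
}

// Interposed behavior: both registrations land in the same registry, so the
// second one is rejected as a duplicate.
inline bool RegisterInterposed() {
  OptionRegistry sharedLlvmCopy;
  return sharedLlvmCopy.Register("pbqp") && sharedLlvmCopy.Register("pbqp");
}
```

Hiding the statically linked symbols keeps the two copies separate, which corresponds to the first function rather than the second.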
Fix Vulkan driver crash from UNIMPLEMENTED `query_capabilities`. (iree-org#23582) iree-org#23576 added `query_capabilities` to the device vtable but left the Vulkan implementation as a bare `UNIMPLEMENTED` return. Every other driver (local_task, local_sync, HIP, CUDA, Metal, null, AMDGPU) got the correct `memset + ok_status` stub. The Vulkan driver was the sole holdout, which meant device group creation (now required by the HAL module) failed immediately for any Vulkan device. This broke every Vulkan CI job, but I didn't notice because the AMD GPU runners have been flaky for months and I'd stopped reading those failures carefully. Lesson learned. Co-authored-by: Claude <[email protected]>
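A minimal sketch of the difference between the two stubs follows. All names here (`DeviceCapabilities`, `Status`, `QueryCapabilitiesUnimplemented`, `QueryCapabilitiesStub`, `CreateDeviceGroup`) are hypothetical stand-ins rather than the real IREE HAL types or signatures; the sketch assumes only what the message states, namely that device group creation depends on the query succeeding.

```cpp
#include <cstring>

// Hypothetical stand-ins, not the real IREE HAL types.
struct DeviceCapabilities {
  unsigned flags;
  unsigned queue_count;
};
enum class Status { kOk, kUnimplemented };

// The broken shape: a bare "unimplemented" return.
inline Status QueryCapabilitiesUnimplemented(DeviceCapabilities* out) {
  (void)out;
  return Status::kUnimplemented;
}

// The "memset + ok_status" shape: report empty capabilities and succeed.
inline Status QueryCapabilitiesStub(DeviceCapabilities* out) {
  std::memset(out, 0, sizeof(*out));
  return Status::kOk;
}

// Models a caller that cannot proceed unless the query succeeds.
inline bool CreateDeviceGroup(Status (*query)(DeviceCapabilities*)) {
  DeviceCapabilities caps;
  return query(&caps) == Status::kOk;
}
```

With the first function the caller fails immediately; with the second it proceeds with zeroed capabilities.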
[Async] Fix multishot CTS test flakes: use blocking waits for completions (iree-org#23577) Maybe this time? The advantage of the crappy gh runners is that they do expose very particular timing issues, I suppose. DrainPending uses an immediate (0ms) timeout and breaks on the first empty poll; it is designed for draining backlogs, not waiting for future events. Sends complete eagerly via writev at submit time, so PollUntil during the send loop is satisfied by send completions alone. The recv may not have been processed by the poll thread yet when DrainPending runs, and it immediately gives up. Replace DrainPending with blocking PollUntil loops that wait until the expected completions actually arrive, matching the pattern already used in MultishotAccept_ListenerClose and MultishotRecv_ConnectionClose. Co-authored-by: Claude <[email protected]>
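The distinction between draining and blocking can be sketched with a toy queue. This is an illustration built on `std::mutex` and `std::condition_variable`, not the IREE async API: `CompletionQueue` and `ReceiveTwoFromWorker` are hypothetical, and only the method names `DrainPending` and `PollUntil` are borrowed from the message above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical completion queue that only counts completions.
class CompletionQueue {
 public:
  // Called by a producer when one operation completes.
  void Post() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++pending_;
    }
    cv_.notify_all();
  }

  // Non-blocking: consumes whatever is already queued and returns at once,
  // so it reports 0 if the completion has not arrived yet.
  std::size_t DrainPending() {
    std::lock_guard<std::mutex> lock(mu_);
    std::size_t count = pending_;
    pending_ = 0;
    return count;
  }

  // Blocking: waits until at least `expected` completions have arrived,
  // then consumes everything queued.
  std::size_t PollUntil(std::size_t expected) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return pending_ >= expected; });
    std::size_t count = pending_;
    pending_ = 0;
    return count;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::size_t pending_ = 0;
};

// Demo: a worker posts two completions while the caller blocks for both.
// The result does not depend on which thread runs first.
inline std::size_t ReceiveTwoFromWorker() {
  CompletionQueue queue;
  std::thread worker([&queue] {
    queue.Post();
    queue.Post();
  });
  std::size_t received = queue.PollUntil(2);
  worker.join();
  return received;
}
```

Calling `DrainPending` in place of `PollUntil(2)` in the demo could return 0, 1, or 2 depending on scheduling, which is the kind of flake the message describes.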
[Async] Split CTS sync tests into parallel binaries and eliminate timing deps (iree-org#23553)

Split the monolithic sync_tests binary (which ran all sync CTS tests serially per backend) into four parallel test targets:

- sync_notification_tests (cancellation, notification, signal)
- sync_platform_tests (platform-specific fences, primitive relays)
- sync_relay_tests (portable notification-to-notification relays)
- sync_semaphore_tests (timeline semantics, async ops, linked chains)

This reduces the io_uring CTS critical path from ~13s to ~1.3s by allowing Bazel/CTest to schedule the four binaries concurrently and eliminating all timing-based synchronization from the relay tests. The original relay tests used waiter threads with 20ms settle waits and polling loops with 100ms timeouts: timing dependencies that are fragile on slow CI runners and accounted for ~4s of the old 13s critical path.

The new approach uses two verification strategies:

For most relay tests: verify relay side effects via epoch queries on the sink notification. DrainPending processes relay CQEs synchronously with immediate-timeout polls; after it returns, the epoch reflects whether the relay fired. No threads, no polling loops, no timing.

For TSAN cross-thread coverage: a dedicated NotificationRelayWakesCrossThread test uses an iree_notification_t gate with an epoch-checking predicate. The baseline epoch is captured before spawning the waiter thread, avoiding the three-actor race inherent in iree_async_notification_wait (which captures its baseline epoch internally at function entry; if the relay fires first, the waiter sees the already-incremented epoch as baseline and blocks until timeout). The gate's prepare/commit/cancel protocol makes this race-free across all thread interleavings.

The same gate pattern fixes the PrimitiveToNotification test in relay_posix_test.cc, which had the same race bug. All other POSIX and Windows relay tests now use the simpler DrainPending + synchronous check pattern, removing WaitPrimitiveReadable and WaitEventSignaled helpers.

Tested: ASAN 400/400 runs (4 targets x 100), TSAN clean, Windows IOCP, macOS kqueue.

---------

Co-authored-by: Claude <[email protected]>
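The baseline-epoch idea can be sketched with standard C++ primitives. This is a simplified stand-in, not `iree_notification_t` or its prepare/commit/cancel protocol: `EpochNotification`, `WaitPast`, and `SignalThenWait` are hypothetical names. The point shown is that the baseline is read before the waiter thread exists, so a signal that fires early is still observed as a change.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical epoch-counting notification.
class EpochNotification {
 public:
  uint64_t Epoch() {
    std::lock_guard<std::mutex> lock(mu_);
    return epoch_;
  }

  // Advances the epoch and wakes all waiters.
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++epoch_;
    }
    cv_.notify_all();
  }

  // Blocks until the epoch differs from the caller-supplied baseline.
  // Returns immediately if the signal already happened.
  void WaitPast(uint64_t baseline) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return epoch_ != baseline; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  uint64_t epoch_ = 0;
};

// Demo: the baseline is captured before the waiter is spawned, so the waiter
// terminates whether Signal() runs before or after WaitPast() is entered.
// Returns the epoch the waiter observed after waking.
inline uint64_t SignalThenWait() {
  EpochNotification notification;
  const uint64_t baseline = notification.Epoch();
  uint64_t observed = 0;
  std::thread waiter([&] {
    notification.WaitPast(baseline);
    observed = notification.Epoch();
  });
  notification.Signal();
  waiter.join();
  return observed;
}
```

If the waiter instead read its own baseline on entry, a signal delivered first would leave it waiting on an epoch that never changes again, which is the race the message describes.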
[Codegen][Common] Add InsertBatchDimForBatchlessConv pass for 2D Conv (iree-org#23351) Most of it has been taken from iree-org#21955 (comment) (CC: @hanhanW); refactored the logic, cleaned up, and tested for the generic conv tests locally. Context: Upstream MLIR's linalg::inferConvolutionDims and related APIs (matchConvolutionOpOfType, isaConvolutionOpInterface) expect convolutions to have a batch dimension. These APIs are used by DownscaleConv patterns and vectorization to recognize and optimize convolution operations. However, IREE's dispatch formation pipeline strips unit dimensions (including N=1 batch dimensions) via fold-unit-extent-dims. So after the generalization step, a conv_2d_nhwc_hwcf with N=1 becomes a 6-loop generic op instead of the expected 7-loop structure, causing the upstream APIs to fail pattern matching. This pass restores the batch dimension for such "batchless" 2D convolutions by inserting tensor.expand_shape on inputs/outputs and tensor.collapse_shape on results. This allows the upstream convolution detection APIs to recognize the operation, enabling DownscaleConv and vectorization patterns to apply. The reshape operations are propagated to dispatch boundaries and folded into dispatch tensor load/store operations, resulting in zero runtime cost. Lit tests added for supported operations: - Conv2DNhwcHwcf, Conv2DNchwFchw - PoolingNhwcSum/Max/Min, PoolingNhwcMaxUnsigned/MinUnsigned - PoolingNchwSum/Max - DepthwiseConv2DNhwcHwc Signed-off-by: Abhishek Varma <[email protected]>
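The shape bookkeeping behind the expand/collapse pair can be sketched as follows. This models only static shapes as vectors of integers; it is not MLIR code and does not use `tensor.expand_shape` or `tensor.collapse_shape`. `Shape`, `ExpandWithUnitBatch`, and `CollapseUnitBatch` are hypothetical names, and the example shape is made up for illustration.

```cpp
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Prepends a unit batch dimension, e.g. [H, W, C] -> [1, H, W, C].
inline Shape ExpandWithUnitBatch(const Shape& shape) {
  Shape expanded;
  expanded.reserve(shape.size() + 1);
  expanded.push_back(1);
  expanded.insert(expanded.end(), shape.begin(), shape.end());
  return expanded;
}

// Drops the leading unit batch dimension again, e.g. [1, H, W, C] -> [H, W, C].
// Assumes `shape` is non-empty and shape[0] == 1.
inline Shape CollapseUnitBatch(const Shape& shape) {
  return Shape(shape.begin() + 1, shape.end());
}
```

Because the added dimension has extent 1, expanding and then collapsing returns the original shape and the element count never changes, which is consistent with the reshapes being foldable at no runtime cost.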
[docs] Fix invalid flag names and typos (iree-org#23542) I'm testing a new multi-model review tool, and it reported some outdated docs: - Remove nonexistent `--iree-codegen-gpu-native-math-precision` flag from SDXL golden outputs guide (found by GPT-5.3) - Fix `CMake_BUILD_TYPE` -> `CMAKE_BUILD_TYPE` case typo in Emscripten build docs (found by GPT-5.3) - Fix `iree-import-tf -help` -> `--help` in TensorFlow guide; the tool uses Python argparse which requires `-h` or `--help` (found by GPT-5.3) Findings from Gemini Flash confirmed issues 1 and 3 independently. Co-authored-by: Claude Opus 4.6 <[email protected]>
Integrate third-party/benchmark to v1.9.5 (iree-org#23532) This pulls in google/benchmark#2108, which fixes a build error when using extremely new Clangs, which can occur during gfx1250 testing. Note that this does bump us past google/benchmark#1836, which gets rid of a division by number of threads on certain reported time values. I don't know if this impacts us, but I'm going to flag it anyway.