
iree-3.11.0rc20260302

Fix low-frequency typos in compiler (non-Codegen). NFC. (4/6) (iree-org#23604)

Preparation for adding a typos pre-commit spell checker (6/6).

Co-authored-by: Claude Opus 4.6 <[email protected]>

Mar 1, 2026
fb7e890
zip
tar.gz

iree-3.11.0rc20260301

[Codegen][GPU] Clean up prefetch pipeline stages flag to support integer values (iree-org#23568)

Replaces the boolean `--iree-llvmgpu-enable-prefetch` flag with an
integer `--iree-llvmgpu-prefetch-num-stages` flag backed by
std::optional<uint64_t>. When unset (default), each code path uses its
own heuristic default. 0 or 1 disables pipelining, and 2+ enables
pipelining with the specified number of stages.
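
The flag semantics above can be sketched as a small model. This is an illustrative Python sketch, not IREE's C++ implementation: `resolve_prefetch_stages` and `heuristic_default` are hypothetical names, and returning 0 to mean "pipelining disabled" is a convention chosen for this example only.

```python
from typing import Optional


def resolve_prefetch_stages(flag_value: Optional[int], heuristic_default: int) -> int:
    """Return the stage count to pipeline with, or 0 if pipelining is off.

    flag_value models --iree-llvmgpu-prefetch-num-stages (None when unset).
    heuristic_default is a hypothetical stand-in for whatever value a given
    code path would choose on its own.
    """
    if flag_value is None:
        # Flag unset: defer to the code path's own heuristic.
        return heuristic_default
    if flag_value < 0:
        raise ValueError("stage count must be non-negative")
    if flag_value < 2:
        # 0 or 1 stages: pipelining is disabled.
        return 0
    # 2 or more: pipeline with exactly the requested number of stages.
    return flag_value


unset_result = resolve_prefetch_stages(None, heuristic_default=2)
disabled_result = resolve_prefetch_stages(1, heuristic_default=2)
explicit_result = resolve_prefetch_stages(3, heuristic_default=2)
```

Modelling the flag as an optional integer keeps "not specified" distinct from "explicitly disabled", which a plain boolean cannot express.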

ci-extra: test_torch

---------

Signed-off-by: Yu-Zhewen <[email protected]>

Feb 27, 2026
c18957b
zip
tar.gz

iree-3.11.0rc20260228

[Codegen][GPU] Clean up prefetch pipeline stages flag to support integer values (iree-org#23568)

Replaces the boolean `--iree-llvmgpu-enable-prefetch` flag with an
integer `--iree-llvmgpu-prefetch-num-stages` flag backed by
std::optional<uint64_t>. When unset (default), each code path uses its
own heuristic default. 0 or 1 disables pipelining, and 2+ enables
pipelining with the specified number of stages.

ci-extra: test_torch

---------

Signed-off-by: Yu-Zhewen <[email protected]>

Feb 27, 2026
c18957b
zip
tar.gz

iree-3.11.0rc20260227

Add --exclude-libs=ALL to libIREECompiler.so shared library (iree-org#23574)

When building `libIREECompiler.so` in `TheRock` I hit the following
error when trying to load the library:

```
LLVM ERROR: Option 'pbqp' already exists!
```

I'll put the full Claude-generated description of what happened below,
but the long and the short of it is `libIREECompiler.so` seems to be
exporting quite a few symbols when linking LLVM's static `.a`'s. In my
case, `TheRock` has another library that loads `libLLVM.so`, and when
loaded it
[interposes](https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic)
some of the global command line registration machinery. Because of the
interposition, `libIREECompiler.so`'s command line registration got
routed to `libLLVM.so`'s; luckily, rather than some incredibly hard to
diagnose error, the double command line registration crashes on startup.
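
The failure mode can be illustrated with a toy model. This is a Python sketch under loud assumptions: `OptionRegistry` and `load_library` are hypothetical stand-ins for LLVM's option registration and for a library's static initializers, and they only mimic the observable behavior (a second registration of the same name is fatal).

```python
class OptionRegistry:
    """Toy stand-in for a process-wide command line option registry."""

    def __init__(self):
        self._options = set()

    def register(self, name: str) -> None:
        if name in self._options:
            raise RuntimeError(f"Option '{name}' already exists!")
        self._options.add(name)


def load_library(static_options, registry: OptionRegistry) -> None:
    """Run a library's 'static initializers' against the given registry."""
    for name in static_options:
        registry.register(name)


# No interposition: each library registers into its own private registry.
llvm_registry = OptionRegistry()
compiler_registry = OptionRegistry()
load_library(["pbqp"], llvm_registry)
load_library(["pbqp"], compiler_registry)  # fine, the registries are separate

# Interposition: the second library's registration calls are routed to the
# first library's registry, so the same option is registered twice.
try:
    load_library(["pbqp"], llvm_registry)
    interposed_error = None
except RuntimeError as err:
    interposed_error = str(err)
```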

`--exclude-libs,ALL` hides symbols from static libraries when linking
the final `libIREECompiler.so`, preventing LLVM's symbols from being
exported and therefore being interposable.
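
One way to sanity-check a change like this is to count the defined dynamic symbols before and after. The sketch below is a Python helper that parses text in the style of `nm -D --defined-only` output; the helper name and the sample lines are illustrative assumptions, not output captured from a real build.

```python
def count_exported_symbols(nm_output: str) -> int:
    """Count defined symbols in `nm -D --defined-only` style output.

    Defined symbols appear as '<address> <type> <name>'. Undefined symbols
    have no address column, so they split into only two fields and are
    skipped.
    """
    count = 0
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            count += 1
    return count


# Illustrative sample only.
sample = """\
0000000000001139 T ireeCompilerGlobalInitialize
0000000000001150 T mlirContextCreate
                 U malloc
"""
sample_count = count_exported_symbols(sample)
```

Feeding it the real output for the library built with and without the linker option would show whether the exported set actually shrank.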

---

Claude-generated description

```
Add --exclude-libs=ALL to libIREECompiler.so shared library
libIREECompiler.so statically links LLVM/MLIR but exports ~175K internal
symbols via its version script (api_version.ld uses `global: *`). The
visibility design assumes LLVM is compiled with -fvisibility=hidden, but
LLVM's own CMake build only applies -fvisibility-inlines-hidden — not
-fvisibility=hidden — to the vast majority of its compilation units
(2250/2720 LLVM files, 1049/1145 MLIR files lack the flag). This means
all non-inline LLVM symbols default to visibility("default") and leak
through the version script.

When libIREECompiler.so is loaded via dlopen(RTLD_LOCAL) into a process
that already has libLLVM.so in the global scope (e.g. via HIP runtime's
libamd_comgr.so → libLLVM.so.22.0git), the ELF dynamic linker searches
the global scope first when resolving relocations. LLVM function calls
in libIREECompiler.so that go through the PLT/GOT resolve to
libLLVM.so's copies rather than the compiler's own static LLVM. This
causes LLVM cl::Option static initializers to register duplicate options
into libLLVM.so's global registry, crashing with:

  LLVM ERROR: Option 'pbqp' already exists!

The fix adds --exclude-libs=ALL to the SharedImpl linker options, which
forces all symbols from static archives (.a files) to hidden visibility
at link time. Only symbols from object files (.o) compiled with explicit
__attribute__((visibility("default"))) annotations — i.e. the IREE and
MLIR C API functions marked with IREE_EMBED_EXPORTED / MLIR_CAPI_EXPORTED
— remain visible. This reduces exported dynamic symbols from ~274K to
~2.9K.

This matches the pattern already used by MLIR's own MLIR-C shared
library (mlir/lib/CAPI/CMakeLists.txt):

  target_link_options(MLIR-C PRIVATE "-Wl,-exclude-libs,ALL")
  ```
  
ci-extra: all

---------

Signed-off-by: Bangtian Liu <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
Co-authored-by: Bangtian Liu <[email protected]>

Feb 27, 2026
1fe030b
zip
tar.gz

iree-3.11.0rc20260226

Fix Vulkan driver crash from UNIMPLEMENTED `query_capabilities`. (iree-org#23582)

iree-org#23576 added `query_capabilities` to the device vtable but left the
Vulkan implementation as a bare `UNIMPLEMENTED` return. Every other
driver (local_task, local_sync, HIP, CUDA, Metal, null, AMDGPU) got the
correct `memset + ok_status` stub. The Vulkan driver was the sole
holdout, which meant device group creation — now required by the HAL
module — failed immediately for any Vulkan device.

This broke every Vulkan CI job, but I didn't notice because the AMD GPU
runners have been flaky for months and I'd stopped reading those
failures carefully. Lesson learned.

Co-authored-by: Claude <[email protected]>

Feb 25, 2026
6423b51
zip
tar.gz

iree-3.11.0rc20260225

[Async] Fix multishot CTS test flakes: use blocking waits for completions (iree-org#23577)

Maybe this time? The advantage of the crappy gh runners is that they do
expose very particular timing issues, I suppose.

DrainPending uses immediate (0ms) timeout and breaks on the first empty
poll — designed for draining backlogs, not waiting for future events.
Sends complete eagerly via writev at submit time, so PollUntil during
the send loop is satisfied by send completions alone. The recv may not
have been processed by the poll thread yet when DrainPending runs, and
it immediately gives up.

Replace DrainPending with blocking PollUntil loops that wait until the
expected completions actually arrive, matching the pattern already used
in MultishotAccept_ListenerClose and MultishotRecv_ConnectionClose.
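
The difference between the two waiting styles can be sketched with a plain queue. This is a Python model, not the CTS code: `drain_pending` and `poll_until` here are hypothetical re-implementations that only mirror the behavior described above.

```python
import queue
import threading


def drain_pending(completions: queue.Queue) -> list:
    """Collect whatever is already queued; give up at the first empty poll."""
    drained = []
    while True:
        try:
            drained.append(completions.get_nowait())
        except queue.Empty:
            return drained


def poll_until(completions: queue.Queue, expected: int, timeout_s: float) -> list:
    """Block until `expected` completions have arrived.

    Raises queue.Empty if a single poll waits longer than timeout_s.
    """
    received = []
    while len(received) < expected:
        received.append(completions.get(timeout=timeout_s))
    return received


completions = queue.Queue()
release = threading.Event()


def poll_thread():
    # The recv completion is only posted once the poll thread gets to it.
    release.wait()
    completions.put("recv")


worker = threading.Thread(target=poll_thread)
worker.start()
early = drain_pending(completions)       # recv not processed yet: nothing seen
release.set()
late = poll_until(completions, 1, 5.0)   # blocks until the recv arrives
worker.join()
```

The immediate-timeout drain is correct for clearing a backlog but reports nothing when the event is still in flight, which is the flake described above.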

Co-authored-by: Claude <[email protected]>

Feb 25, 2026
0dca45a
zip
tar.gz

iree-3.11.0rc20260224

[Async] Split CTS sync tests into parallel binaries and eliminate timing deps (iree-org#23553)

Split the monolithic sync_tests binary (which ran all sync CTS tests
serially per backend) into four parallel test targets:
  - sync_notification_tests (cancellation, notification, signal)
  - sync_platform_tests (platform-specific fences, primitive relays)
  - sync_relay_tests (portable notification-to-notification relays)
  - sync_semaphore_tests (timeline semantics, async ops, linked chains)

This reduces the io_uring CTS critical path from ~13s to ~1.3s by
allowing Bazel/CTest to schedule the four binaries concurrently and
eliminating all timing-based synchronization from the relay tests.

The original relay tests used waiter threads with 20ms settle waits and
polling loops with 100ms timeouts — timing dependencies that are fragile
on slow CI runners and accounted for ~4s of the old 13s critical path.

The new approach uses two verification strategies:

For most relay tests: verify relay side effects via epoch queries on the
sink notification. DrainPending processes relay CQEs synchronously with
immediate-timeout polls; after it returns, the epoch reflects whether
the relay fired. No threads, no polling loops, no timing.

For TSAN cross-thread coverage: a dedicated
NotificationRelayWakesCrossThread test uses an iree_notification_t gate
with an epoch-checking predicate. The baseline epoch is captured before
spawning the waiter thread, avoiding the three-actor race inherent in
iree_async_notification_wait (which captures its baseline epoch
internally at function entry — if the relay fires first, the waiter sees
the already-incremented epoch as baseline and blocks until timeout). The
gate's prepare/commit/cancel protocol makes this race-free across all
thread interleavings.
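
The baseline-capture race can be sketched in isolation. This is a toy Python model: the `Notification` class below is hypothetical, not the IREE API, and it illustrates only why the baseline epoch must be captured before the signal can fire. It does not model the prepare/commit/cancel protocol.

```python
import threading


class Notification:
    """Toy epoch-based notification."""

    def __init__(self):
        self._cond = threading.Condition()
        self._epoch = 0

    def query_epoch(self) -> int:
        with self._cond:
            return self._epoch

    def signal(self) -> None:
        with self._cond:
            self._epoch += 1
            self._cond.notify_all()

    def wait_since(self, baseline: int, timeout_s: float) -> bool:
        """Return True once the epoch differs from `baseline`, else False."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._epoch != baseline, timeout_s)

    def wait(self, timeout_s: float) -> bool:
        """Racy variant: captures its baseline epoch at function entry."""
        return self.wait_since(self.query_epoch(), timeout_s)


n = Notification()
baseline = n.query_epoch()   # captured before the signal can fire
n.signal()                   # the "relay" fires before the waiter starts

# The caller-supplied baseline still observes the change.
race_free = n.wait_since(baseline, timeout_s=1.0)
# Capturing the baseline at entry misses it and blocks until timeout.
racy = n.wait(timeout_s=0.05)
```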

The same gate pattern fixes the PrimitiveToNotification test in
relay_posix_test.cc, which had the same race bug. All other POSIX and
Windows relay tests now use the simpler DrainPending + synchronous check
pattern, removing WaitPrimitiveReadable and WaitEventSignaled helpers.

Tested: ASAN 400/400 runs (4 targets x 100), TSAN clean, Windows IOCP,
macOS kqueue.

---------

Co-authored-by: Claude <[email protected]>

Feb 23, 2026
018e603
zip
tar.gz

iree-3.11.0rc20260223

[Codegen][Common] Add InsertBatchDimForBatchlessConv pass for 2D Conv (iree-org#23351)

Most of this is taken from iree-org#21955 (comment) (CC: @hanhanW). I
refactored the logic, cleaned it up, and tested it locally against the
generic conv tests.

Context: 
Upstream MLIR's linalg::inferConvolutionDims and related APIs
(matchConvolutionOpOfType, isaConvolutionOpInterface) expect
convolutions to have a batch dimension. These APIs are used by
DownscaleConv patterns and vectorization to recognize and optimize
convolution operations.

However, IREE's dispatch formation pipeline strips unit dimensions
(including N=1 batch dimensions) via fold-unit-extent-dims. So after the
generalization step, a conv_2d_nhwc_hwcf with N=1 becomes a 6-loop
generic op instead of the expected 7-loop structure, causing the
upstream APIs to fail pattern matching.

This pass restores the batch dimension for such "batchless" 2D
convolutions by inserting tensor.expand_shape on inputs/outputs and
tensor.collapse_shape on results. This allows the upstream convolution
detection APIs to recognize the operation, enabling DownscaleConv and
vectorization patterns to apply.

The reshape operations are propagated to dispatch boundaries and folded
into dispatch tensor load/store operations, resulting in zero runtime
cost.
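
The shape bookkeeping can be sketched as follows. This is a shape-only Python model, not the pass itself: the helper names are hypothetical, and the output-shape formula assumes unit strides, no dilation, and no padding.

```python
def expand_batch(shape):
    """Model tensor.expand_shape adding a leading unit batch dimension."""
    return (1, *shape)


def collapse_batch(shape):
    """Model tensor.collapse_shape dropping the leading unit batch dimension."""
    if shape[0] != 1:
        raise ValueError("expected a unit batch dimension")
    return tuple(shape[1:])


def conv2d_nhwc_hwcf_output_shape(input_shape, filter_shape):
    """Output shape of an NHWC x HWCF conv (unit stride, no padding)."""
    n, h, w, c = input_shape
    kh, kw, filter_c, f = filter_shape
    if c != filter_c:
        raise ValueError("channel mismatch between input and filter")
    return (n, h - kh + 1, w - kw + 1, f)


batchless_input = (8, 8, 3)      # HWC, batch dim already folded away
conv_filter = (3, 3, 3, 16)      # HWCF

expanded_input = expand_batch(batchless_input)                # NHWC with N=1
batched_output = conv2d_nhwc_hwcf_output_shape(expanded_input, conv_filter)
final_output = collapse_batch(batched_output)                 # back to HWF
```

Because only a unit dimension is added and removed, the element count and layout are unchanged, which is consistent with the reshapes folding away at dispatch boundaries.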

Lit tests added for supported operations:
- Conv2DNhwcHwcf, Conv2DNchwFchw
- PoolingNhwcSum/Max/Min, PoolingNhwcMaxUnsigned/MinUnsigned
- PoolingNchwSum/Max
- DepthwiseConv2DNhwcHwc

Signed-off-by: Abhishek Varma <[email protected]>

Feb 23, 2026
40c28a1
zip
tar.gz

iree-3.11.0rc20260222

[docs] Fix invalid flag names and typos (iree-org#23542)

I'm testing a new multi-model review tool; it reported some outdated
docs:

- Remove nonexistent `--iree-codegen-gpu-native-math-precision` flag
from SDXL golden outputs guide (found by GPT-5.3)
- Fix `CMake_BUILD_TYPE` -> `CMAKE_BUILD_TYPE` case typo in Emscripten
build docs (found by GPT-5.3)
- Fix `iree-import-tf -help` -> `--help` in TensorFlow guide; the tool
uses Python argparse which requires `-h` or `--help` (found by GPT-5.3)

Findings from Gemini Flash independently confirmed issues 1 and 3.

Co-authored-by: Claude Opus 4.6 <[email protected]>

Feb 21, 2026
c23abe7
zip
tar.gz

iree-3.11.0rc20260221

Integrate third-party/benchmark to v1.9.5 (iree-org#23532)

This pulls in google/benchmark#2108, which
fixes a build error when using extremely new Clangs, which can occur
during gfx1250 testing.

Note that this does bump us past
google/benchmark#1836, which gets rid of a
division by number of threads on certain reported time values. I don't
know if this impacts us, but I'm going to flag it anyway.

Feb 21, 2026
36df853
zip
tar.gz

Tags: anderspitman/iree