Add iree_hal_device_group_t to own device topology lifecycle #23576
Merged
Conversation
krzysz00
reviewed
Feb 25, 2026
Contributor
krzysz00
left a comment
Looks pretty reasonable to me, though I don't have the context for it
... ah, it's a draft, still looks good
65d9cd6 to
f719600
Compare
AWoloszyn
approved these changes
Feb 25, 2026
The device group (coming next) needs to push topology information into each device after building the topology matrix. This adds the vtable method and public API that the group will call, with trivial memcpy implementations in all 8 drivers. Also fixes the Vulkan driver which was missing the topology_info struct field and had stub implementations that asserted false / returned UNIMPLEMENTED. Co-Authored-By: Claude <[email protected]>
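A minimal sketch of the shape this commit describes, assuming a vtable entry that each driver fills with a plain copy into its device struct. Every `demo_*` name and every field below is hypothetical and chosen for illustration; none of it is the actual IREE declaration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for the per-device topology info (fields are assumed).
typedef struct demo_topology_info_t {
  const void* topology;     // borrowed pointer to the matrix owned by the group
  uint32_t topology_index;  // this device's index in the matrix
  uint64_t can_wait_from;   // one bit per peer device
} demo_topology_info_t;

typedef struct demo_device_t demo_device_t;

// Hypothetical vtable holding only the new method.
typedef struct demo_device_vtable_t {
  void (*assign_topology_info)(demo_device_t* device,
                               const demo_topology_info_t* info);
} demo_device_vtable_t;

struct demo_device_t {
  const demo_device_vtable_t* vtable;
  demo_topology_info_t topology_info;  // per-device copy stored by the driver
};

// Driver-side implementation: a plain copy into the device struct.
static void demo_driver_assign_topology_info(demo_device_t* device,
                                             const demo_topology_info_t* info) {
  memcpy(&device->topology_info, info, sizeof(device->topology_info));
}

static const demo_device_vtable_t demo_driver_vtable = {
    .assign_topology_info = demo_driver_assign_topology_info,
};

// Public entry point the group would call; it only dispatches via the vtable.
static void demo_device_assign_topology_info(demo_device_t* device,
                                             const demo_topology_info_t* info) {
  device->vtable->assign_topology_info(device, info);
}
```

Copying the struct by value means later queries read device-local memory, which fits the lock-free query goal stated in the next commit. The `topology` pointer inside the copy is still borrowed from whoever owns the matrix.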
The device group owns a set of retained HAL devices and an immutable topology matrix built during creation. It is the runtime representation of cooperating devices — topology is computed once from device capabilities and optional driver-specific refinement, then pushed into each device via assign_topology_info for lock-free concurrent queries. The builder pattern (initialize/add_device/finalize) is the only way to create a group. Finalize invalidates the builder, queries capabilities, computes edges via edge_from_capabilities with same-driver refinement, finalizes the topology, computes per-device bitmaps (can_wait_from, can_signal_to, can_import_from, can_p2p_with), and assigns topology info to each device. Also introduces hal/testing/mock_device — a lightweight testonly HAL device with configurable identifier and capabilities for testing HAL infrastructure without real hardware or a full driver stack. Co-Authored-By: Claude <[email protected]>
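The per-device bitmap step can be pictured as folding one row and one column of the edge matrix into bitfields. The sketch below assumes a flat `device_count x device_count` matrix of flag bytes indexed as `edges[src * device_count + dst]`. The flag names, the row/column convention, and the `demo_*` identifiers are all assumptions for illustration; the real edge descriptors and interop modes are richer than this.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical per-edge capability flags for the edge src -> dst.
enum {
  DEMO_EDGE_WAIT = 1u << 0,    // dst can wait on semaphores from src
  DEMO_EDGE_SIGNAL = 1u << 1,  // src can signal semaphores owned by dst
  DEMO_EDGE_IMPORT = 1u << 2,  // dst can import buffers from src
  DEMO_EDGE_P2P = 1u << 3,     // src can access dst memory directly
};

// Bitmaps named after the ones listed in the commit message.
typedef struct demo_bitmaps_t {
  uint32_t can_wait_from;
  uint32_t can_signal_to;
  uint32_t can_import_from;
  uint32_t can_p2p_with;
} demo_bitmaps_t;

// Folds the row and column of device `self` into one bit per peer.
// Assumes device_count <= 32 so every peer fits in a uint32_t.
static demo_bitmaps_t demo_compute_bitmaps(const uint8_t* edges,
                                           uint32_t device_count,
                                           uint32_t self) {
  demo_bitmaps_t bitmaps = {0, 0, 0, 0};
  for (uint32_t peer = 0; peer < device_count; ++peer) {
    const uint8_t incoming = edges[peer * device_count + self];  // peer -> self
    const uint8_t outgoing = edges[self * device_count + peer];  // self -> peer
    if (incoming & DEMO_EDGE_WAIT) bitmaps.can_wait_from |= 1u << peer;
    if (outgoing & DEMO_EDGE_SIGNAL) bitmaps.can_signal_to |= 1u << peer;
    if (incoming & DEMO_EDGE_IMPORT) bitmaps.can_import_from |= 1u << peer;
    if (outgoing & DEMO_EDGE_P2P) bitmaps.can_p2p_with |= 1u << peer;
  }
  return bitmaps;
}
```

Because the matrix is immutable after finalize, these bitmaps can be computed once and copied into each device, so a query such as "can I wait on device 3" becomes a single bit test.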
iree_hal_module_create now takes an iree_hal_device_group_t* instead of a flat (device_count, devices[]) pair. The module retains the group and delegates all device access through iree_hal_device_group_device_count / iree_hal_device_group_device_at, eliminating the flexible array member from iree_hal_module_t and simplifying allocation. Adds iree_hal_device_group_create_from_device as a convenience for the common single-device case (7 of 9 callers). The multi-device callers (context_util.c tooling path, Python bindings) use the builder API. Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Claude <[email protected]>
f719600 to
cf49ac8
Compare
benvanik
added a commit
that referenced
this pull request
Feb 25, 2026
#23576 added `query_capabilities` to the device vtable but left the Vulkan implementation as a bare `UNIMPLEMENTED` return. Every other driver (local_task, local_sync, HIP, CUDA, Metal, null, AMDGPU) got the correct `memset + ok_status` stub. The Vulkan driver was the sole holdout, which meant device group creation — now required by the HAL module — failed immediately for any Vulkan device. This broke every Vulkan CI job, but I didn't notice because the AMD GPU runners have been flaky for months and I'd stopped reading those failures carefully. Lesson learned. Co-Authored-By: Claude <[email protected]>
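To illustrate why a single `UNIMPLEMENTED` return was enough to break group creation, here is a hedged sketch contrasting the two stub styles. The status enum, the capabilities struct, and all `demo_*` functions are hypothetical stand-ins, not IREE's types or the actual driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical status codes and capabilities struct.
typedef enum demo_status_e {
  DEMO_STATUS_OK = 0,
  DEMO_STATUS_UNIMPLEMENTED = 1,
} demo_status_t;

typedef struct demo_capabilities_t {
  uint64_t flags;
  uint32_t numa_node;
} demo_capabilities_t;

// The failing style: any caller that requires capabilities gets an error.
static demo_status_t demo_query_capabilities_unimplemented(
    demo_capabilities_t* out_capabilities) {
  (void)out_capabilities;
  return DEMO_STATUS_UNIMPLEMENTED;
}

// The memset + ok style: report "no special capabilities" and succeed.
static demo_status_t demo_query_capabilities_stub(
    demo_capabilities_t* out_capabilities) {
  memset(out_capabilities, 0, sizeof(*out_capabilities));
  return DEMO_STATUS_OK;
}

typedef demo_status_t (*demo_query_fn_t)(demo_capabilities_t*);

// Group creation queries every device, so the first failure aborts the group.
static demo_status_t demo_group_query_all(const demo_query_fn_t* queries,
                                          uint32_t count,
                                          demo_capabilities_t* out_capabilities) {
  for (uint32_t i = 0; i < count; ++i) {
    demo_status_t status = queries[i](&out_capabilities[i]);
    if (status != DEMO_STATUS_OK) return status;
  }
  return DEMO_STATUS_OK;
}
```

A zeroed capabilities struct is a conservative answer: the group can still be built, and the topology simply records no special interop for that device.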
pravg-amd
pushed a commit
to nod-ai/amd-shark-ai
that referenced
this pull request
Mar 5, 2026
…ange
This commit updates shortfin to work with IREE PR #23576 (commit eacda0d,
Feb 25, 2026), which introduced iree_hal_device_group_t to own device
topology lifecycle.
Background:
-----------
IREE PR #23576 refactored the HAL module API to fix a lifetime ownership
problem. Previously, devices received raw pointers to topology data with
no guarantees that those pointers remained valid. The new device_group
abstraction owns the device lifecycle and ensures topology pointer
validity.
API Changes:
------------
The iree_hal_module_create() signature changed from 8 to 7 parameters:
Before:
  iree_hal_module_create(instance, policy, device_count, devices[],
                         flags, debug_sink, allocator, out_module)
After:
  iree_hal_module_create(instance, policy, device_group,
                         flags, debug_sink, allocator, out_module)
Implementation:
---------------
1. Added hal_device_group_ptr smart pointer wrapper in iree_helpers.h
   using the SHORTFIN_IREE_DEF_PTR macro
2. Updated program.cc to create device groups before module creation:
   - Single device: Uses iree_hal_device_group_create_from_device()
   - Multiple devices: Uses builder pattern with
     iree_hal_device_group_builder_* APIs
Benefits:
---------
- Clear lifetime semantics (module retains group; group retains devices)
- Topology safety (pointers guaranteed valid for device lifetime)
- Simplified allocation (no flexible array members)
References:
-----------
- IREE PR: iree-org/iree#23576
- Commit: eacda0d84b0a357d5ea701fec9137346fc724f59
Co-Authored-By: Claude Opus 4.6 <[email protected]>
kimm240
pushed a commit
to kimm240/iree
that referenced
this pull request
May 8, 2026
…org#23576)
kimm240
pushed a commit
to kimm240/iree
that referenced
this pull request
May 8, 2026
…e-org#23582)
The topology matrix from #23573 needs an owner with a clear lifetime contract. Today devices are passed to the HAL module as a flat array — there is no object that retains the devices, owns the topology, and guarantees the topology pointer remains valid for the duration of execution. Every device holds a raw pointer to the topology it was assigned, so if the topology is freed while devices are still alive, those pointers dangle.
`iree_hal_device_group_t` is that owner. It takes already-created devices, builds the immutable topology matrix from their capabilities, pushes topology info into each device, and retains all of them. The group's lifetime brackets the devices': whoever holds the devices long-term (the HAL module, the CTS harness, a Python session) retains the group, and the group retains the devices, so the topology pointer in each device is guaranteed valid.

### Creation API

The builder pattern matches the topology builder it wraps: stack-allocate, add devices, finalize. Finalize is a consuming operation — it queries capabilities from all devices, computes edge descriptors, calls driver-specific refinement for same-driver pairs, builds the topology matrix, assigns topology info into each device via the new vtable method, and produces the immutable group. The builder is zeroed after finalize (whether it succeeds or fails) and cannot be reused.
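As a rough illustration of the consuming finalize described above, the sketch below models a stack-allocated builder that is zeroed whether finalize succeeds or fails. All `demo_*` names, the fixed capacity, and the bool return convention are assumptions made for the example. The real builder returns status values and also queries capabilities and builds the topology, which is elided here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUILDER_CAPACITY 4  // hypothetical capacity limit

typedef struct demo_device_t {
  int ref_count;
} demo_device_t;

// Stack-allocated builder; in this sketch it only borrows the devices.
typedef struct demo_group_builder_t {
  uint32_t device_count;
  demo_device_t* devices[DEMO_BUILDER_CAPACITY];
} demo_group_builder_t;

// Immutable result; retains every device it holds.
typedef struct demo_group_t {
  uint32_t device_count;
  demo_device_t* devices[DEMO_BUILDER_CAPACITY];
} demo_group_t;

static void demo_group_builder_initialize(demo_group_builder_t* builder) {
  memset(builder, 0, sizeof(*builder));
}

// Rejects NULL devices, duplicates, and additions past capacity.
static bool demo_group_builder_add_device(demo_group_builder_t* builder,
                                          demo_device_t* device) {
  if (!device || builder->device_count >= DEMO_BUILDER_CAPACITY) return false;
  for (uint32_t i = 0; i < builder->device_count; ++i) {
    if (builder->devices[i] == device) return false;
  }
  builder->devices[builder->device_count++] = device;
  return true;
}

// Consumes the builder: it is zeroed up front, so it is empty on every path.
static bool demo_group_builder_finalize(demo_group_builder_t* builder,
                                        demo_group_t** out_group) {
  *out_group = NULL;
  demo_group_builder_t consumed = *builder;
  memset(builder, 0, sizeof(*builder));
  if (consumed.device_count == 0) return false;  // empty builds are invalid
  demo_group_t* group = (demo_group_t*)calloc(1, sizeof(*group));
  if (!group) return false;
  group->device_count = consumed.device_count;
  for (uint32_t i = 0; i < consumed.device_count; ++i) {
    group->devices[i] = consumed.devices[i];
    ++group->devices[i]->ref_count;  // the group retains each device
  }
  // Capability queries, edge computation, and topology assignment go here.
  *out_group = group;
  return true;
}

static void demo_group_free(demo_group_t* group) {
  for (uint32_t i = 0; i < group->device_count; ++i) {
    --group->devices[i]->ref_count;  // release what finalize retained
  }
  free(group);
}
```

Zeroing the builder before any fallible step keeps the "cannot be reused" rule simple to state: after finalize returns, the builder is always empty, regardless of the outcome.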
For the common single-device case (7 of 9 callers), `iree_hal_device_group_create_from_device` wraps the builder sequence into a one-liner.

### `assign_topology_info` vtable method

Devices need to receive their topology info after the matrix is built: the topology doesn't exist yet when the device is created, and the device's index in the matrix isn't known until group creation. This is a new vtable method on `iree_hal_device_t` that the group calls during finalize. All existing driver implementations (local-sync, local-task, CUDA, HIP, Vulkan, Metal, AMDGPU, null) store the info into their device struct. The method is called exactly once per device, during group creation.

### HAL module integration

`iree_hal_module_create` now takes an `iree_hal_device_group_t*` instead of `(device_count, devices[])`. The module retains the group and delegates all device access through `iree_hal_device_group_device_count` / `iree_hal_device_group_device_at`. This eliminates the flexible array member from `iree_hal_module_t`, simplifies allocation (fixed-size struct instead of variable-size), and makes the lifetime contract explicit: the module holds the group, the group holds the devices and topology.

All callers are updated: CLI tooling, the high-level runtime session, TFLite bindings, Python bindings, PJRT, ConstEval, simple_embedding samples, check_test, and the CTS test harness.
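The allocation and ownership change can be sketched as follows, assuming simple integer reference counts. The `demo_*` structs and functions are hypothetical. They contrast a module that carries a trailing `devices[]` array with one that holds a single retained group pointer and reads devices through count/at accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct demo_device_t {
  int ref_count;
} demo_device_t;

typedef struct demo_group_t {
  int ref_count;
  uint32_t device_count;
  demo_device_t** devices;  // retained by the group
} demo_group_t;

// Old shape (for contrast): variable-size struct with a flexible array member.
typedef struct demo_module_flat_t {
  uint32_t device_count;
  demo_device_t* devices[];
} demo_module_flat_t;

static size_t demo_module_flat_alloc_size(uint32_t device_count) {
  return sizeof(demo_module_flat_t) + device_count * sizeof(demo_device_t*);
}

// New shape: fixed-size struct holding one retained group.
typedef struct demo_module_t {
  demo_group_t* group;
} demo_module_t;

static uint32_t demo_group_device_count(const demo_group_t* group) {
  return group->device_count;
}

// Returns NULL for an out-of-range index.
static demo_device_t* demo_group_device_at(const demo_group_t* group,
                                           uint32_t i) {
  return i < group->device_count ? group->devices[i] : NULL;
}

static demo_module_t* demo_module_create(demo_group_t* group) {
  demo_module_t* module = (demo_module_t*)malloc(sizeof(*module));
  if (!module) return NULL;
  ++group->ref_count;  // the module retains the group
  module->group = group;
  return module;
}

static void demo_module_destroy(demo_module_t* module) {
  --module->group->ref_count;  // release the group retained at creation
  free(module);
}

// Module-side device lookup goes through the group accessors.
static demo_device_t* demo_module_lookup_device(const demo_module_t* module,
                                                uint32_t i) {
  return demo_group_device_at(module->group, i);
}
```

With this shape the module's allocation size no longer depends on the device count, and the ownership chain (module, then group, then devices and topology) is visible in the struct definitions.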
### Testing

A mock device (`hal/testing/mock_device`) provides controllable capabilities for testing topology construction without requiring real hardware. The device group tests exercise builder validation (empty builds, duplicate devices, capacity limits), single-device and multi-device group creation, topology correctness (self-edges, cross-device edges with expected interop modes), the convenience function, and lifetime ordering (group outlives devices). The CTS test harness creates a device group in `SetUpTestSuite` so every CTS test runs with topology info assigned.

### Where this is going

The device group is the scheduling domain for the causal execution system. When the AMDGPU driver gets its frontier-integrated semaphores and queue operations, the device group's topology matrix is what tells the scheduler whether a semaphore can be waited on natively or needs handle import, whether a buffer can be read directly or needs DMA transfer, and what the relative cost is. The group also becomes the natural attachment point for collective channel creation and multi-device resource pools.