Add iree_hal_device_group_t to own device topology lifecycle #23576

Merged: benvanik merged 4 commits into main from users/benvanik/device-group on Feb 25, 2026

@benvanik (Collaborator)

The topology matrix from #23573 needs an owner with a clear lifetime contract. Today devices are passed to the HAL module as a flat array — there is no object that retains the devices, owns the topology, and guarantees the topology pointer remains valid for the duration of execution. Every device holds a raw pointer to the topology it was assigned, so if the topology is freed while devices are still alive, those pointers dangle.

iree_hal_device_group_t is that owner. It takes already-created devices, builds the immutable topology matrix from their capabilities, pushes topology info into each device, and retains all of them. The group's lifetime brackets the devices': whoever holds the devices long-term (the HAL module, the CTS harness, a Python session) retains the group, and the group retains the devices, so the topology pointer in each device is guaranteed valid.
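
The ownership chain described above can be sketched with a toy reference-counted model. Everything here (the `toy_` types and functions, the field names, the choice to clear the device pointer on group destruction) is a hypothetical illustration and not the IREE API; it only shows why a raw topology pointer in each device stays valid while the group is alive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical toy types for illustration; NOT the IREE definitions.
typedef struct toy_topology_t {
  size_t device_count;
} toy_topology_t;

typedef struct toy_device_t {
  int ref_count;
  const toy_topology_t* topology;  // Raw, unowned pointer into the group.
} toy_device_t;

typedef struct toy_group_t {
  int ref_count;
  toy_topology_t topology;  // Owned by value; lives as long as the group.
  size_t device_count;
  toy_device_t** devices;   // Each entry holds one retained reference.
} toy_group_t;

static toy_device_t* toy_device_create(void) {
  toy_device_t* device = calloc(1, sizeof(*device));
  if (device) device->ref_count = 1;
  return device;
}

static void toy_device_retain(toy_device_t* device) { ++device->ref_count; }

static void toy_device_release(toy_device_t* device) {
  if (--device->ref_count == 0) free(device);
}

// Retains every device and points each one at the group-owned topology.
static toy_group_t* toy_group_create(size_t count, toy_device_t** devices) {
  if (count == 0) return NULL;  // An empty group is rejected in this sketch.
  toy_group_t* group = calloc(1, sizeof(*group));
  if (!group) return NULL;
  group->devices = calloc(count, sizeof(*group->devices));
  if (!group->devices) {
    free(group);
    return NULL;
  }
  group->ref_count = 1;
  group->device_count = count;
  group->topology.device_count = count;
  for (size_t i = 0; i < count; ++i) {
    toy_device_retain(devices[i]);
    group->devices[i] = devices[i];
    devices[i]->topology = &group->topology;
  }
  return group;
}

// Dropping the last group reference frees the topology, so the device
// pointers are cleared first (a choice made for this sketch only).
static void toy_group_release(toy_group_t* group) {
  assert(group->ref_count > 0);
  if (--group->ref_count > 0) return;
  for (size_t i = 0; i < group->device_count; ++i) {
    group->devices[i]->topology = NULL;
    toy_device_release(group->devices[i]);
  }
  free(group->devices);
  free(group);
}
```

In this model a caller can release its own device references right after creating the group: the group's references keep the devices alive, and the topology they point at is stored inside the group itself.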

Creation API

The builder pattern matches the topology builder it wraps: stack-allocate, add devices, finalize. Finalize is a consuming operation — it queries capabilities from all devices, computes edge descriptors, calls driver-specific refinement for same-driver pairs, builds the topology matrix, assigns topology info into each device via the new vtable method, and produces the immutable group. The builder is zeroed after finalize (whether it succeeds or fails) and cannot be reused.
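
As a rough illustration of the consuming-finalize contract, the sketch below models a stack-allocated builder that is zeroed on both the success and failure paths. The `demo_` names, the fixed capacity of 4, and the `bool` return values are assumptions made for this example; the real builder works with IREE status codes and device types that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_DEVICES 4  // Hypothetical capacity, chosen for the sketch.

// Hypothetical builder and group; devices are opaque pointers here.
typedef struct demo_builder_t {
  size_t count;
  const void* devices[DEMO_MAX_DEVICES];
} demo_builder_t;

typedef struct demo_group_t {
  size_t count;
  const void* devices[DEMO_MAX_DEVICES];
} demo_group_t;

static void demo_builder_initialize(demo_builder_t* builder) {
  assert(builder != NULL);
  memset(builder, 0, sizeof(*builder));
}

// Rejects NULL devices, duplicates, and additions beyond capacity.
static bool demo_builder_add_device(demo_builder_t* builder,
                                    const void* device) {
  assert(builder != NULL);
  if (!device || builder->count >= DEMO_MAX_DEVICES) return false;
  for (size_t i = 0; i < builder->count; ++i) {
    if (builder->devices[i] == device) return false;
  }
  builder->devices[builder->count++] = device;
  return true;
}

// Consumes the builder: it is zeroed whether or not a group is produced.
static bool demo_builder_finalize(demo_builder_t* builder,
                                  demo_group_t* out_group) {
  assert(builder != NULL && out_group != NULL);
  bool ok = builder->count > 0;  // An empty build is an error.
  if (ok) {
    out_group->count = builder->count;
    memcpy(out_group->devices, builder->devices, sizeof(builder->devices));
  }
  memset(builder, 0, sizeof(*builder));
  return ok;
}
```

Zeroing on every path means a second finalize on the same builder fails as an empty build instead of silently producing a second group over the same devices.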

For the common single-device case (7 of 9 callers), iree_hal_device_group_create_from_device wraps the builder sequence into a one-liner.

assign_topology_info vtable method

Devices need to receive their topology info after the matrix is built — the topology doesn't exist yet when the device is created, and the device's index in the matrix isn't known until group creation. This is a new vtable method on iree_hal_device_t that the group calls during finalize. All existing driver implementations (local-sync, local-task, CUDA, HIP, Vulkan, Metal, AMDGPU, null) store the info into their device struct. The method is called exactly once per device, during group creation.
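
A minimal sketch of this kind of vtable dispatch follows, assuming a made-up info struct with a topology pointer and a device index. The `vt_` names, the struct fields, and the `void` return type are hypothetical; the actual `iree_hal_device_t` vtable entry and topology info layout are not shown in this PR description.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical info payload; the real struct layout is not shown here.
typedef struct vt_topology_info_t {
  const void* topology;   // Unowned pointer to the group's matrix.
  uint32_t device_index;  // This device's row/column in the matrix.
} vt_topology_info_t;

typedef struct vt_device_t vt_device_t;

// Hypothetical vtable with a single entry for this sketch.
typedef struct vt_device_vtable_t {
  void (*assign_topology_info)(vt_device_t* device,
                               const vt_topology_info_t* info);
} vt_device_vtable_t;

struct vt_device_t {
  const vt_device_vtable_t* vtable;
  vt_topology_info_t topology_info;  // Per-device copy filled in by the group.
};

// A driver implementation that stores the info by copying it.
static void vt_null_assign_topology_info(vt_device_t* device,
                                         const vt_topology_info_t* info) {
  memcpy(&device->topology_info, info, sizeof(*info));
}

static const vt_device_vtable_t vt_null_vtable = {
    vt_null_assign_topology_info,
};

// Public entry point the group would call once per device during finalize.
static void vt_device_assign_topology_info(vt_device_t* device,
                                           const vt_topology_info_t* info) {
  assert(device != NULL && info != NULL);
  device->vtable->assign_topology_info(device, info);
}
```

Because each device keeps its own copy of the info, later queries read a plain struct field and need no call back into the group.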

HAL module integration

iree_hal_module_create now takes an iree_hal_device_group_t* instead of (device_count, devices[]). The module retains the group and delegates all device access through iree_hal_device_group_device_count / iree_hal_device_group_device_at. This eliminates the flexible array member from iree_hal_module_t, simplifies allocation (fixed-size struct instead of variable-size), and makes the lifetime contract explicit: the module holds the group, the group holds the devices and topology.

All callers are updated — CLI tooling, the high-level runtime session, TFLite bindings, Python bindings, PJRT, ConstEval, simple_embedding samples, check_test, and the CTS test harness.

Testing

A mock device (hal/testing/mock_device) provides controllable capabilities for testing topology construction without requiring real hardware. The device group tests exercise builder validation (empty builds, duplicate devices, capacity limits), single-device and multi-device group creation, topology correctness (self-edges, cross-device edges with expected interop modes), the convenience function, and lifetime ordering (group outlives devices). The CTS test harness creates a device group in SetUpTestSuite so every CTS test runs with topology info assigned.

Where this is going

The device group is the scheduling domain for the causal execution system. When the AMDGPU driver gets its frontier-integrated semaphores and queue operations, the device group's topology matrix is what tells the scheduler whether a semaphore can be waited on natively or needs handle import, whether a buffer can be read directly or needs DMA transfer, and what the relative cost is. The group also becomes the natural attachment point for collective channel creation and multi-device resource pools.
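
To make the kind of query described above concrete, here is a small sketch of a dense N-by-N edge matrix with a derived per-device bitmap. The `topo_` names, the edge fields, the row-major `src * N + dst` indexing, and the capacity of 8 are all assumptions for illustration; the real topology matrix and its bitmaps (such as `can_wait_from`) may be laid out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOPO_MAX_DEVICES 8  // Hypothetical capacity for the sketch.

// Hypothetical edge descriptor for the ordered pair (src, dst).
typedef struct topo_edge_t {
  bool native_wait;  // dst can wait on src's semaphores without import.
  bool direct_read;  // dst can read src's buffers without a DMA transfer.
  uint32_t cost;     // Relative cost; lower is cheaper.
} topo_edge_t;

typedef struct topo_matrix_t {
  size_t device_count;
  topo_edge_t edges[TOPO_MAX_DEVICES * TOPO_MAX_DEVICES];
} topo_matrix_t;

// Row-major lookup: one edge per ordered (src, dst) pair.
static const topo_edge_t* topo_edge(const topo_matrix_t* matrix, size_t src,
                                    size_t dst) {
  assert(src < matrix->device_count && dst < matrix->device_count);
  return &matrix->edges[src * matrix->device_count + dst];
}

// Builds a bitmap for dst: bit i is set if dst can natively wait on device i.
static uint32_t topo_native_wait_mask(const topo_matrix_t* matrix,
                                      size_t dst) {
  uint32_t mask = 0;
  for (size_t src = 0; src < matrix->device_count; ++src) {
    if (topo_edge(matrix, src, dst)->native_wait) mask |= (uint32_t)1 << src;
  }
  return mask;
}
```

Precomputing such a bitmap per device turns a scheduling question like "can this wait be done natively?" into a single bit test.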

@benvanik benvanik requested a review from AWoloszyn February 25, 2026 00:59
@benvanik benvanik added the hal/api IREE's public C hardware abstraction layer API label Feb 25, 2026

@krzysz00 (Contributor) left a comment:

Looks pretty reasonable to me, though I don't have the context for it

... ah, it's a draft, still looks good

Comment thread compiler/src/iree/compiler/ConstEval/Runtime.cpp Outdated
@benvanik benvanik marked this pull request as ready for review February 25, 2026 02:38
@benvanik benvanik added the post-merge-review Ben's special place. People can pick these up and review them for forward fixes if interested. label Feb 25, 2026
@benvanik benvanik force-pushed the users/benvanik/device-group branch from 65d9cd6 to f719600 Compare February 25, 2026 06:05
benvanik and others added 4 commits February 25, 2026 08:17
The device group (coming next) needs to push topology information into
each device after building the topology matrix. This adds the vtable
method and public API that the group will call, with trivial memcpy
implementations in all 8 drivers. Also fixes the Vulkan driver which
was missing the topology_info struct field and had stub implementations
that asserted false / returned UNIMPLEMENTED.

Co-Authored-By: Claude <[email protected]>

The device group owns a set of retained HAL devices and an immutable
topology matrix built during creation. It is the runtime representation
of cooperating devices — topology is computed once from device
capabilities and optional driver-specific refinement, then pushed into
each device via assign_topology_info for lock-free concurrent queries.

The builder pattern (initialize/add_device/finalize) is the only way to
create a group. Finalize invalidates the builder, queries capabilities,
computes edges via edge_from_capabilities with same-driver refinement,
finalizes the topology, computes per-device bitmaps (can_wait_from,
can_signal_to, can_import_from, can_p2p_with), and assigns topology info
to each device.

Also introduces hal/testing/mock_device — a lightweight testonly HAL
device with configurable identifier and capabilities for testing HAL
infrastructure without real hardware or a full driver stack.

Co-Authored-By: Claude <[email protected]>

iree_hal_module_create now takes an iree_hal_device_group_t* instead of
a flat (device_count, devices[]) pair. The module retains the group and
delegates all device access through iree_hal_device_group_device_count /
iree_hal_device_group_device_at, eliminating the flexible array member
from iree_hal_module_t and simplifying allocation.

Adds iree_hal_device_group_create_from_device as a convenience for the
common single-device case (7 of 9 callers). The multi-device callers
(context_util.c tooling path, Python bindings) use the builder API.

Co-Authored-By: Claude <[email protected]>
@benvanik benvanik force-pushed the users/benvanik/device-group branch from f719600 to cf49ac8 Compare February 25, 2026 16:17
@benvanik benvanik removed the post-merge-review Ben's special place. People can pick these up and review them for forward fixes if interested. label Feb 25, 2026
@benvanik benvanik merged commit eacda0d into main Feb 25, 2026
53 of 60 checks passed
@benvanik benvanik deleted the users/benvanik/device-group branch February 25, 2026 17:01
benvanik added a commit that referenced this pull request Feb 25, 2026
#23576 added `query_capabilities` to the device vtable but left the
Vulkan implementation as a bare `UNIMPLEMENTED` return. Every other
driver (local_task, local_sync, HIP, CUDA, Metal, null, AMDGPU) got
the correct `memset + ok_status` stub. The Vulkan driver was the sole
holdout, which meant device group creation — now required by the HAL
module — failed immediately for any Vulkan device.

This broke every Vulkan CI job, but I didn't notice because the AMD
GPU runners have been flaky for months and I'd stopped reading those
failures carefully. Lesson learned.

Co-Authored-By: Claude <[email protected]>
pravg-amd pushed a commit to nod-ai/amd-shark-ai that referenced this pull request Mar 5, 2026
…ange

This commit updates shortfin to work with IREE PR #23576 (commit eacda0d,
Feb 25, 2026), which introduced iree_hal_device_group_t to own device
topology lifecycle.

Background:
-----------
IREE PR #23576 refactored the HAL module API to fix a lifetime ownership
problem. Previously, devices received raw pointers to topology data with
no guarantees that those pointers remained valid. The new device_group
abstraction owns the device lifecycle and ensures topology pointer
validity.

API Changes:
------------
The iree_hal_module_create() signature changed from 8 to 7 parameters:

Before:
  iree_hal_module_create(instance, policy, device_count, devices[],
                        flags, debug_sink, allocator, out_module)

After:
  iree_hal_module_create(instance, policy, device_group,
                        flags, debug_sink, allocator, out_module)

Implementation:
---------------
1. Added hal_device_group_ptr smart pointer wrapper in iree_helpers.h
   using the SHORTFIN_IREE_DEF_PTR macro

2. Updated program.cc to create device groups before module creation:
   - Single device: Uses iree_hal_device_group_create_from_device()
   - Multiple devices: Uses builder pattern with
     iree_hal_device_group_builder_* APIs

Benefits:
---------
- Clear lifetime semantics (module retains group; group retains devices)
- Topology safety (pointers guaranteed valid for device lifetime)
- Simplified allocation (no flexible array members)

References:
-----------
- IREE PR: iree-org/iree#23576
- Commit: eacda0d84b0a357d5ea701fec9137346fc724f59

Co-Authored-By: Claude Opus 4.6 <[email protected]>
kimm240 pushed a commit to kimm240/iree that referenced this pull request May 8, 2026
…org#23576)
kimm240 pushed a commit to kimm240/iree that referenced this pull request May 8, 2026
…e-org#23582)