Merged
Update xegpu op descriptions.
silee2 committed Aug 20, 2025
commit 2e985ce836322927ae7c63a2db5f503636c8c645
83 changes: 60 additions & 23 deletions mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td
@@ -511,6 +511,8 @@ def XeGPU_CreateDescOp: XeGPU_Op<"create_tdesc", [Pure, ViewLikeOpInterface]> {
match the dimension of offsets. It may also have a second dimension corresponding to
the chunk_size if the chunk size is larger than 1.

This op is not available in SIMT mode.

Example 1: It assumes subgroup size is 4, and accesses a[0], a[16], a[32], a[64]
```mlir
%a = memref.alloc() : memref<1024xf32>
@@ -618,6 +620,15 @@ def XeGPU_PrefetchOp : XeGPU_Op<"prefetch", []> {
: memref<1024xf32>, vector<4xindex>
```

Example 3 (SIMT mode):
SIMT mode only accepts the offsets variant.
```mlir
xegpu.prefetch %0[%1] {l1_hint = #xegpu.cache_hint<cached>,
l2_hint = #xegpu.cache_hint<cached>,
l3_hint = #xegpu.cache_hint<cached>}
: memref<256xf32>, vector<1xindex>
```

}];

let arguments = (ins XeGPU_GatherScatterSourceType: $source,
@@ -671,8 +682,18 @@ def XeGPU_LoadGatherOp : XeGPU_Op<"load", [MemoryEffects<[MemRead]>]> {
The mask operand masks out memory access so that it is safe to pass out-of-boundary
addresses/offsets as long as they are masked. It applies to slots of SIMD lanes.

In SIMT mode, the result vector represents the data to be loaded by each work-item.
Each work-item receives a `chunk_size` number of elements.
In SIMT mode, the result is a 1D vector that represents the data to be loaded by
each work-item.

`source` represents the memory region to be loaded from, which can be either a
tensor_desc or a 1D memref or pointer (ui64, ui32, i64 or i32).
In case of tensor_desc, offsets come from the producer create_tdesc op.
tensor_desc cannot be used in SIMT mode.
`offsets` represents offsets from source. Required if `source` is not a TensorDescType.
offsets is a vector of `index` type and vector length is either the subgroup size
or 1 in SIMT mode.
`mask` is a vector of `i1` type, which is used to mask out the memory access.
mask is a vector of size equal to the subgroup size, or 1 in SIMT mode.

Example 1:
```mlir
@@ -692,16 +713,7 @@ def XeGPU_LoadGatherOp : XeGPU_Op<"load", [MemoryEffects<[MemRead]>]> {
vector<16xi1> -> vector<16x8xf32>
```

Example 3 (SIMT mode):
```mlir
%2 = xegpu.load %1, %0 <{l1_hint = #xegpu.cache_hint<cached>,
l2_hint = #xegpu.cache_hint<uncached>,
l3_hint = #xegpu.cache_hint<uncached>}>
: !xegpu.tensor_desc<16x8xf32, #xegpu.scatter_tdesc_attr<memory_space=global, chunk_size=8>>
vector<16xi1> -> vector<8xf32>
```

Example 4:
Example 3:
A variant accepts memref as base pointer and an offset instead of scattered TensorTdesc.
It combines "create scattered TensorTdesc" and "load with scattered TensorTdesc".
The source operand could be a raw pointer (uint64_t). Please refer to create_tdesc
Contributor

Suggested change
The source operand could be a raw pointer (uint64_t). Please refer to create_tdesc
The source operand could be a raw pointer (ui64, ui32, i64 or i32). Please refer to create_tdesc

@@ -716,6 +728,16 @@ def XeGPU_LoadGatherOp : XeGPU_Op<"load", [MemoryEffects<[MemRead]>]> {
: memref<1024xf32>, vector<16xi1>, vector<16xindex> -> vector<16xf32>
```

Example 4 (SIMT mode):
SIMT mode only accepts the offsets variant. chunk_size can be inferred from result
type. In this example, chunk_size is 8.
```mlir
%3 = xegpu.load %1[%2], %0 <{l1_hint = #xegpu.cache_hint<cached>,
l2_hint = #xegpu.cache_hint<uncached>,
l3_hint = #xegpu.cache_hint<uncached>}>
: memref<128xf32>, vector<1xindex>, vector<1xi1> -> vector<8xf32>
```
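
To make the SIMT-mode wording concrete, here is a minimal Python model of what a single work-item observes for this load. It is not MLIR and not part of XeGPU. The name `simt_load`, the use of `None` for a masked-off result, and the assumptions that the `chunk_size` elements are contiguous and that the offset counts elements are illustrative only, not statements taken from the op definition.

```python
from typing import List, Optional, Sequence


def simt_load(source: Sequence[float], offset: int, mask: bool,
              chunk_size: int) -> List[Optional[float]]:
    """Hypothetical model of one work-item's SIMT-mode gather load.

    Assumptions (not from the op definition): the offset is in elements,
    the chunk is contiguous, and a masked-off load yields unspecified
    data, modeled here as None.
    """
    if not mask:
        # No memory is touched, so an out-of-bounds offset is safe here.
        return [None] * chunk_size
    # The result is a 1D vector whose length is the chunk size.
    return [source[offset + i] for i in range(chunk_size)]


# Shapes mirror the SIMT example: memref<128xf32> -> vector<8xf32>.
memory = [float(i) for i in range(128)]
loaded = simt_load(memory, offset=16, mask=True, chunk_size=8)
# A masked-off access never reads memory, even with a wild offset.
skipped = simt_load(memory, offset=10_000, mask=False, chunk_size=8)
```

In the op itself there is no `chunk_size` argument; the model takes it explicitly only because Python has no result type to infer it from.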

}];

let arguments = (ins XeGPU_GatherScatterSourceType: $source,
@@ -785,8 +807,19 @@ def XeGPU_StoreScatterOp : XeGPU_Op<"store", [MemoryEffects<[MemWrite]>]> {
has transpose effect, which is similar to `load_gather`. Therefore, a transpose attribute is
introduced on purpose, making sure users are aware of this implicit transformation.

In SIMT mode, the input vector represents the data to be stored by each work-item.
Each work-item stores a `chunk_size` number of elements.
In SIMT mode, the value is a 1D vector that represents the data to be stored by
each work-item.

`value` represents the data to be stored.
`dest` represents the memory region to be stored to, which can be either a
tensor_desc or a 1D memref or pointer (ui64, ui32, i64 or i32).
In case of tensor_desc, offsets come from the producer create_tdesc op.
tensor_desc cannot be used in SIMT mode.
`offsets` represents offsets from dest. Required if `dest` is not a TensorDescType.
offsets is a vector of `index` type and vector length is either the subgroup size
or 1 in SIMT mode.
`mask` is a vector of `i1` type, which is used to mask out the memory access.
mask is a vector of size equal to the subgroup size, or 1 in SIMT mode.

Example 1:
```mlir
@@ -804,15 +837,7 @@ def XeGPU_StoreScatterOp : XeGPU_Op<"store", [MemoryEffects<[MemWrite]>]> {
: vector<16x8xf32>, !xegpu.tensor_desc<16x8xf32, #xegpu.scattered_tdesc_attr<chunk_size=8>>, vector<16xi1>
```

Example 3 (SIMT mode):
```mlir
xegpu.store %0, %1, %2 <{l1_hint = #xegpu.cache_hint<uncached>,
l2_hint = #xegpu.cache_hint<write_back>,
l3_hint = #xegpu.cache_hint<write_through>}>
: vector<8xf32>, !xegpu.tensor_desc<16x8xf32, #xegpu.scattered_tdesc_attr<chunk_size=8>> vector<16xi1>
```

Example 4:
Example 3:
A variant accepts memref as base pointer and an offset instead of scattered TensorTdesc.
It combines "create scattered TensorTdesc" and "store with scattered TensorTdesc".
The dest operand could be a raw pointer (uint64_t).
Contributor

Suggested change
The dest operand could be a raw pointer (uint64_t).
The dest operand could be a raw pointer (ui64, ui32, i64 or i32).

Contributor Author

Thanks! Missed that one.

@@ -828,6 +853,16 @@ def XeGPU_StoreScatterOp : XeGPU_Op<"store", [MemoryEffects<[MemWrite]>]> {
: memref<1024xf32>, vector<16xi1>, vector<16xindex> -> vector<16xf32>
```

Example 4 (SIMT mode):
SIMT mode only accepts the offsets variant. chunk_size can be inferred from value
type. In this example, chunk_size is 8.
```mlir
xegpu.store %0, %1[%2], %3 <{l1_hint = #xegpu.cache_hint<uncached>,
l2_hint = #xegpu.cache_hint<write_back>,
l3_hint = #xegpu.cache_hint<write_through>}>
: vector<8xf32>, memref<256xf32>, vector<1xindex>, vector<1xi1>
```
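
The per-work-item behavior of this store can be sketched with a small Python model. It is not MLIR and not part of XeGPU. The name `simt_store` and the assumptions that the offset counts elements and that the chunk is written contiguously are illustrative only; the chunk size is taken from the length of `value`, mirroring how the description says it is inferred from the value type.

```python
from typing import List, Sequence


def simt_store(dest: List[float], value: Sequence[float], offset: int,
               mask: bool) -> None:
    """Hypothetical model of one work-item's SIMT-mode scatter store.

    Assumptions (not from the op definition): the offset is in elements
    and the chunk is written to contiguous locations.
    """
    if not mask:
        # Masked-off: nothing is written, so any offset is safe.
        return
    # chunk_size is implied by the length of the 1D value vector.
    for i, element in enumerate(value):
        dest[offset + i] = element


# Shapes mirror the SIMT example: vector<8xf32> into memref<256xf32>.
memory = [0.0] * 256
simt_store(memory, value=[1.0] * 8, offset=32, mask=True)
# A masked-off store leaves memory untouched, even with a wild offset.
simt_store(memory, value=[9.0] * 8, offset=10_000, mask=False)
```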

}];

let arguments = (ins
@@ -896,6 +931,8 @@ def XeGPU_UpdateOffsetOp: XeGPU_Op<"update_offset",
update the offset per work-item, so its offsets contain values representing
shifts for each work-item.

This op is not available in SIMT mode.

Example:
```mlir
%off = arith.constant dense<[32, 32, 32, 32]> : vector<4xindex>