[MLIR][XeGPU] Update XeGPU create_tdesc, update_offset, load, store and prefetch. #154653
@@ -511,6 +511,8 @@ def XeGPU_CreateDescOp: XeGPU_Op<"create_tdesc", [Pure, ViewLikeOpInterface]> {
   match the dimension of offsets. It may also have a second dimension corresponding to
   the chunk_size if the chunk size is larger than 1.

+  This op is not available in SIMT mode.
+
   Example 1: It assumes subgroup size is 4, and accesses a[0], a[16], a[32], a[64]
   ```mlir
   %a = memref.alloc() : memref<1024xf32>
@@ -618,6 +620,15 @@ def XeGPU_PrefetchOp : XeGPU_Op<"prefetch", []> {
       : memref<1024xf32>, vector<4xindex>
   ```

+  Example 3 (SIMT mode):
+  SIMT mode only accepts the offsets variant.
+  ```mlir
+  xegpu.prefetch %0[%1] {l1_hint = #xegpu.cache_hint<cached>,
+                         l2_hint = #xegpu.cache_hint<cached>,
+                         l3_hint = #xegpu.cache_hint<cached>}
+    : memref<256xf32>, vector<1xindex>
+  ```
+
   }];

   let arguments = (ins XeGPU_GatherScatterSourceType: $source,
@@ -671,8 +682,18 @@ def XeGPU_LoadGatherOp : XeGPU_Op<"load", [MemoryEffects<[MemRead]>]> {
   The mask operand masks out memory access so that it is safe to pass out-of-boundary
   addresses/offsets as long as they are masked. It applies to slots of SIMD lanes.

-  In SIMT mode, the result vector represents the data to be loaded by each work-item.
-  Each work-item recieves a `chunk_size` number of elements.
+  In SIMT mode, the result is a 1D vector that represents the data to be loaded by
+  each work-item.
+
+  `source` represents the memory region to be loaded from, which can be either a
+  tensor_desc or a 1D memref or pointer (ui64, ui32, i64 or i32).
+  In case of tensor_desc, offsets come from the producer create_tdesc op.
+  tensor_desc cannot be used in SIMT mode.
+  `offsets` represents offsets from `source`. Required if `source` is not a TensorDescType.
+  `offsets` is a vector of `index` type whose length is either the subgroup size,
+  or 1 in SIMT mode.
+  `mask` is a vector of `i1` type, which is used to mask out the memory access.
+  `mask` is a vector whose length equals the subgroup size, or 1 in SIMT mode.
+
   Example 1:
   ```mlir
@@ -692,16 +713,7 @@ def XeGPU_LoadGatherOp : XeGPU_Op<"load", [MemoryEffects<[MemRead]>]> {
       vector<16xi1> -> vector<16x8xf32>
   ```

-  Example 3 (SIMT mode):
-  ```mlir
-  %2 = xegpu.load %1, %0 <{l1_hint = #xegpu.cache_hint<cached>,
-                           l2_hint = #xegpu.cache_hint<uncached>,
-                           l3_hint = #xegpu.cache_hint<uncached>}>
-        : !xegpu.tensor_desc<16x8xf32, #xegpu.scatter_tdesc_attr<memory_space=global, chunk_size=8>>
-          vector<16xi1> -> vector<8xf32>
-  ```
-
-  Example 4:
+  Example 3:
   A variant accepts memref as base pointer and an offset instead of scattered TensorTdesc.
   It combines "create scattered TensorTdesc" and "load with scattered TensorTdesc".
   The source operand could be a raw pointer (uint64_t). Please refer to create_tdesc
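The load description (per-lane `offsets`, a `mask` that makes masked-off out-of-bounds offsets safe, `chunk_size` adding a second dimension, and the SIMT rule that each work-item passes length-1 `offsets`/`mask` vectors and gets a 1D result), together with the note that the offsets variant combines create_tdesc and load, can be summarized by a small executable model. This is a hypothetical Python sketch, not XeGPU or MLIR code: the function names `create_tdesc`, `load_tdesc` and `load_offsets` are illustrative, memory is modeled as a flat list, offsets are taken as element offsets (consistent with create_tdesc Example 1, where offsets 0, 16, 32, 64 access a[0], a[16], a[32], a[64]), and the `fill` value returned for masked-off lanes is an arbitrary assumption, since the text only says masked accesses are safe.

```python
from typing import List, Sequence

# Hypothetical reference model, not an XeGPU/MLIR API. Memory is a flat list
# and offsets are element offsets (assumption consistent with create_tdesc
# Example 1: offsets [0, 16, 32, 64] access a[0], a[16], a[32], a[64]).


def create_tdesc(offsets: Sequence[int], chunk_size: int = 1) -> List[List[int]]:
    """Per-lane element indices, shape [len(offsets), chunk_size].

    Lane i covers chunk_size contiguous elements starting at offsets[i].
    """
    return [[off + c for c in range(chunk_size)] for off in offsets]


def load_tdesc(mem: Sequence[int], tdesc: List[List[int]],
               mask: Sequence[bool], fill: int = 0) -> List[List[int]]:
    """Masked gather through a descriptor.

    A masked-off lane is never read, so its indices may be out of bounds.
    `fill` is an arbitrary stand-in for that lane's result (assumption).
    """
    if len(mask) != len(tdesc):
        raise ValueError("mask length must equal the number of lanes")
    return [[mem[i] for i in lane] if m else [fill] * len(lane)
            for lane, m in zip(tdesc, mask)]


def load_offsets(mem: Sequence[int], offsets: Sequence[int],
                 mask: Sequence[bool], chunk_size: int = 1,
                 fill: int = 0) -> List[List[int]]:
    """Offsets variant: create_tdesc and load fused into one step."""
    return load_tdesc(mem, create_tdesc(offsets, chunk_size), mask, fill)


if __name__ == "__main__":
    mem = list(range(1024))
    # Subgroup view: 4 lanes, chunk_size=2. Lane 2 is masked off, so its
    # out-of-bounds offset (5000) is never dereferenced.
    simd = load_offsets(mem, [0, 16, 5000, 64], [True, True, False, True], 2)
    print(simd)  # [[0, 1], [16, 17], [0, 0], [64, 65]]
    # SIMT view: one work-item passes length-1 offsets/mask vectors and gets
    # a 1D row of chunk_size elements, the same row as its lane above.
    print(load_offsets(mem, [16], [True], 2)[0])  # [16, 17]
```

In this model the subgroup-level result has shape [subgroup size, chunk_size], matching the `vector<16xi1> -> vector<16x8xf32>` signature in the load examples, and the SIMT result is simply one row of it; `load_offsets` is defined as `create_tdesc` followed by `load_tdesc` to mirror the "combines" wording rather than to claim how the lowering is implemented.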
Suggested change:
- The source operand could be a raw pointer (uint64_t). Please refer to create_tdesc
+ The source operand could be a raw pointer (ui64, ui32, i64 or i32). Please refer to create_tdesc
Suggested change:
- The dest operand could be a raw pointer (uint64_t).
+ The dest operand could be a raw pointer (ui64, ui32, i64 or i32).
Thanks! Missed that one.