Add auto-bitcasts between x86amx and i32x256 for AMX intrinsics #140763
Some changes occurred in compiler/rustc_codegen_ssa |
I have changed the detection logic to be name-based. This approach can't have false negatives, but it behaves in nontrivial ways with functions that export themselves under the name of an LLVM intrinsic, for example:

// things from earlier definition
#[export_name = "llvm.x86.tdpbsud.internal"]
#[target_feature(enable = "amx-int8")]
pub extern "unadjusted" fn bar(m: u16, n: u16, k: u16, a: Tile, b: Tile, c: Tile) -> Tile {
    unsafe { tdpbuud(m, n, k, a, b, c) }
}

I was honestly surprised that this code even compiles! Exporting a function masquerading as an LLVM intrinsic is vile! The LLVM IR produced is

; ModuleID = 'test.acdeec3141bb4e39-cgu.0'
source_filename = "test.acdeec3141bb4e39-cgu.0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: nounwind
define x86_amx @llvm.x86.tdpbsud.internal(i16 %m, i16 %n, i16 %k, x86_amx %a, x86_amx %b, x86_amx %c) unnamed_addr #0 {
start:
  %0 = tail call x86_amx @llvm.x86.tdpbuud.internal(i16 noundef %m, i16 noundef %n, i16 noundef %k, x86_amx %a, x86_amx %b, x86_amx %c) #0
  %_0 = bitcast x86_amx %0 to <256 x i32>
  ret <256 x i32> %_0
}
; Function Attrs: nounwind
declare x86_amx @llvm.x86.tdpbuud.internal(i16, i16, i16, x86_amx, x86_amx, x86_amx) unnamed_addr #0
attributes #0 = { nounwind }
!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}
!0 = !{i32 8, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}
!2 = !{!"rustc version 1.88.0-dev"}

I couldn't find any more edge cases, but I will try to make the check stricter.
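To make the false positive concrete, here is a minimal toy model of name-based detection. It is not the code from this PR: `ItemKind`, `looks_like_llvm_x86_intrinsic`, and `is_intrinsic_declaration` are hypothetical names, and the `llvm.x86.` prefix test is an assumption made for illustration. It shows why a check that only looks at the symbol name also matches a function with a body that was renamed through `#[export_name]` (in the IR above, such a definition got an `x86_amx` signature while its body still returns `<256 x i32>`), and one possible way a stricter check could tell the two apart.

```rust
/// Hypothetical model of what the backend knows about an item.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ItemKind {
    /// A bodiless foreign declaration, e.g. `extern { fn f(); }` with a link name.
    ForeignDeclaration,
    /// A function with a body, possibly renamed via `#[export_name]`.
    Definition,
}

/// Name-only check (assumed prefix): matches anything that looks like an
/// LLVM x86 intrinsic symbol, regardless of how the item was defined.
fn looks_like_llvm_x86_intrinsic(symbol: &str) -> bool {
    symbol.starts_with("llvm.x86.")
}

/// Stricter check: the name must match AND the item must have no body.
fn is_intrinsic_declaration(symbol: &str, kind: ItemKind) -> bool {
    looks_like_llvm_x86_intrinsic(symbol) && kind == ItemKind::ForeignDeclaration
}

fn main() {
    let symbol = "llvm.x86.tdpbsud.internal";

    // The name-only check cannot distinguish these two items.
    println!("name only: {}", looks_like_llvm_x86_intrinsic(symbol));

    // The stricter check accepts the declaration and rejects the definition.
    println!(
        "declaration: {}",
        is_intrinsic_declaration(symbol, ItemKind::ForeignDeclaration)
    );
    println!(
        "definition: {}",
        is_intrinsic_declaration(symbol, ItemKind::Definition)
    );
}
```

Whether "has no body" is the right extra condition for the real check is a design question for the PR; the sketch only illustrates that some information beyond the name is needed.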
Some changes occurred in compiler/rustc_codegen_gcc |
I managed to resolve the false positive (which resulted in #140822). This can now be extended to more use-cases, e.g. using |
I tried adding support for AMX tile types to Rust. They are very simple: the LLVM intrinsics operate on x86amx types, and all we need to do to call those intrinsics is insert bitcasts between i32x256 and x86amx before and after the function call (as in this file in LLVM).

I tested the codegen for this fragment (test.rs).
The LLVM IR generated is the output of
rustc +stage1 --emit=llvm-ir --crate-type=rlib -O test.rs && cat test.ll
and the ASM generated is the output of
rustc +stage1 --emit=asm --crate-type=rlib -O test.rs && cat test.s
(note: the tests were done on x86_64-unknown-linux-gnu).

This is pretty similar to the Clang codegen (https://godbolt.org/z/G19rjo3Ke).
Reviews are welcome, as I am not too confident in the code (I am still not sure whether the checks for AMX are strict enough; I will try to strengthen them).
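The wrapping described above (cast every vector argument to a tile, call the intrinsic, cast the result back) can be sketched as a toy emitter that produces LLVM IR text. This is only an illustration, not the codegen in this PR: `Ty`, `wrap_amx_call`, and the `%name.tile` / `%res` value names are hypothetical, and the sketch uses the AMX-specific cast intrinsics that Clang emits rather than plain `bitcast`.

```rust
/// Parameter types the toy emitter understands (an assumption of this sketch).
#[derive(Clone, Copy)]
enum Ty {
    /// A scalar shape parameter such as `m`, `n`, or `k`.
    I16,
    /// A `<256 x i32>` vector that must be passed to the intrinsic as a tile.
    Vec256xI32,
}

/// Emit IR text that casts vector arguments to `x86_amx`, calls `intrinsic`,
/// and casts the tile result back to `<256 x i32>`.
fn wrap_amx_call(intrinsic: &str, params: &[(&str, Ty)]) -> Vec<String> {
    let mut lines = Vec::new();
    let mut call_args = Vec::new();

    for (name, ty) in params {
        match ty {
            // Scalars are passed through unchanged.
            Ty::I16 => call_args.push(format!("i16 %{name}")),
            // Vectors get a cast-to-tile inserted before the call.
            Ty::Vec256xI32 => {
                lines.push(format!(
                    "%{name}.tile = call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> %{name})"
                ));
                call_args.push(format!("x86_amx %{name}.tile"));
            }
        }
    }

    // The intrinsic itself only ever sees x86_amx values.
    lines.push(format!(
        "%res.tile = call x86_amx @{intrinsic}({})",
        call_args.join(", ")
    ));
    // The result is cast back so the Rust side keeps seeing a vector.
    lines.push(
        "%res = call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx %res.tile)"
            .to_string(),
    );
    lines
}

fn main() {
    let params = [
        ("m", Ty::I16),
        ("n", Ty::I16),
        ("k", Ty::I16),
        ("a", Ty::Vec256xI32),
        ("b", Ty::Vec256xI32),
        ("c", Ty::Vec256xI32),
    ];
    for line in wrap_amx_call("llvm.x86.tdpbuud.internal", &params) {
        println!("{line}");
    }
}
```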
Unresolved Questions
- Are bitcasts good enough? Clang uses llvm.x86.cast.vector.to.tile.v256i32 and llvm.x86.cast.tile.to.vector.v256i32; is there any functional difference from bitcasts? It turns out bitcast can cause miscompilation (https://reviews.llvm.org/D99152), so we have to use the AMX-specific casts.
- Should this apply only to i32x256, or to all vector types of size 8192? The LLVM file I referenced only does this for i32x256, but there is really no reason to be restrictive.

@rustbot label O-x86_64 T-compiler
r? codegen
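For the second unresolved question, the size condition itself is easy to state. An AMX tile holds at most 16 rows of 64 bytes, which is 8192 bits, and i32x256 is 256 × 32 = 8192 bits. A minimal sketch of a width-based check follows; `TILE_BITS` and `is_tile_sized` are hypothetical names, not part of this PR.

```rust
/// Size of a full AMX tile in bits: 16 rows of 64 bytes each.
const TILE_BITS: u32 = 16 * 64 * 8;

/// Returns true if a vector with `lanes` elements of `lane_bits` bits each
/// is exactly as wide as a full tile. Overflow counts as "not tile sized".
fn is_tile_sized(lane_bits: u32, lanes: u32) -> bool {
    lane_bits.checked_mul(lanes) == Some(TILE_BITS)
}

fn main() {
    // (lane width in bits, lane count) for a few candidate vector shapes.
    let candidates = [(32, 256), (8, 1024), (16, 512), (32, 128)];
    for (lane_bits, lanes) in candidates {
        println!(
            "{lanes} lanes of {lane_bits} bits: tile sized = {}",
            is_tile_sized(lane_bits, lanes)
        );
    }
}
```

A check like this would accept shapes such as 1024 lanes of 8 bits or 512 lanes of 16 bits in addition to i32x256; whether the cast intrinsics should be generated for those shapes is exactly the open question.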