Add CPUID for AvxVnniInt8 and AvxVnniInt16 #113956
6714249
to
141d643
Compare
@tannergooding This is the first of the 2 PRs needed for the AVX VNNI INT* API introduction #112586
src/coreclr/jit/hwintrinsic.cpp
Outdated
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT8
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT8_V512
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT16
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT16_V512
Any reason we're not adding the APIs at the same time? They look like they should be generally table driven, so it should be a minimal change on top...
We can do that too. For all other ISAs, we generally did CPUID and API introduction as separate PRs. Also, it becomes easier to run superpmi once the CPUID PR goes in. Let me know what you'd prefer.
> For all other ISAs, we generally did CPUID and API introduction as separate PRs.
For some of the others, like AVX10.2, we've done it incrementally because of the number of APIs and total work required.
That is, checking in the CPUID support first allowed a reduction of conflicts and parallelization of adding a large number of intrinsic APIs across several PRs.
In this case, there's only a very small number of APIs that are likely entirely table driven, so there's little to no risk of conflicts or additional churn.
Doing it all at once lets us build confidence that the CPUID checks and the end-to-end story are correct, since the change is self-contained and the CPUID and other tests can be added at the same time.
> Also, it becomes easier to run superpmi once the CPUID PR goes in
There's not much need to run SPMI for net new intrinsics that nothing is using yet; we're going to get zero diffs.
Okay. Thanks for the review. I will switch this PR to add everything together and then update you.
src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt
141d643
to
98fc970
Compare
@tannergooding @saucecontrol I have added the CPUID, API surface, JIT handling and template tests here.
@@ -233,6 +233,10 @@ public static InstructionSetSupport ConfigureInstructionSetSupport(string instru
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("vpclmul_v512");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2_v512");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avxvnniint8");
`avxvnniint8` and `avxvnniint16` should be added with the other VEX instruction sets, where `avxvnni` is. (You'll have conflicts with #114575 depending on which lands first.)
Okay. I was thinking that we add `AvxVnniInt*` in both places. Does that make sense? We want `AvxVnniInt*` to be available independently as well as with `Avx10.2`.
They only need to be added under the AVX check. AVX10v2 implies AVX, so both blocks will execute in that case.
n.b. the `_V512` ones would stay where they are.
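The placement described in this comment can be sketched as follows. This is a minimal illustration in C++, not the C# helper from the diff: the function name `BuildOptimisticSet`, the use of `std::set<std::string>`, and the `_v512` string names are assumptions made for the example. Only `avxvnni`, `avxvnniint8`, `avxvnniint16`, and `avx10v2` appear in the discussion itself.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the optimistic instruction set builder.
// 'baseline' holds the instruction sets that were explicitly enabled.
std::set<std::string> BuildOptimisticSet(const std::set<std::string>& baseline)
{
    std::set<std::string> optimistic;

    // As stated in the review: AVX10v2 implies AVX, so the AVX block
    // also runs when only AVX10v2 was requested.
    const bool hasAvx10v2 = baseline.count("avx10v2") != 0;
    const bool hasAvx     = hasAvx10v2 || baseline.count("avx") != 0;

    if (hasAvx)
    {
        // VEX-encoded sets are grouped together, next to avxvnni.
        optimistic.insert("avxvnni");
        optimistic.insert("avxvnniint8");
        optimistic.insert("avxvnniint16");
    }

    if (hasAvx10v2)
    {
        // The _V512 variants stay in the AVX10v2 block
        // (string names assumed for this sketch).
        optimistic.insert("avxvnniint8_v512");
        optimistic.insert("avxvnniint16_v512");
    }

    return optimistic;
}
```

With only `avx` enabled, the sketch adds the VEX sets and leaves the `_v512` ones out. With `avx10v2` enabled, both blocks run, so nothing needs to be listed twice.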
You'll need to add NAOT smoke tests for AVX10v2 as well, under https://github.com/dotnet/runtime/tree/main/src/tests/nativeaot/SmokeTests/HardwareIntrinsics
They should check that with Avx2 (today, or Avx after #114575) enabled, AvxVnniInt8+16 are optimistically supported (return null) and that with Avx10v2 enabled, they are enabled too (return true).
I'm not sure why AVX10v1 doesn't have a config too already, actually.
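The expectation for the smoke tests can be written down as a small three-state table. The sketch below is a hypothetical model in C++, not the actual test code: the enum `Expected`, the function `ExpectedVnniIntSupport`, and the lowercase set names are invented for illustration, and the "not supported" fallback is an assumption that the comment does not spell out.

```cpp
#include <set>
#include <string>

// Hypothetical model of the three outcomes mentioned in the review:
//   Supported        -> "return true"
//   CheckedAtRuntime -> optimistically supported, "return null"
//   NotSupported     -> assumed fallback for this sketch
enum class Expected { NotSupported, CheckedAtRuntime, Supported };

Expected ExpectedVnniIntSupport(const std::set<std::string>& enabled)
{
    if (enabled.count("avx10v2") != 0)
    {
        // With Avx10v2 enabled, AvxVnniInt8/16 are enabled too.
        return Expected::Supported;
    }

    if (enabled.count("avx2") != 0)
    {
        // With Avx2 enabled (Avx after #114575), support is optimistic.
        return Expected::CheckedAtRuntime;
    }

    return Expected::NotSupported;
}
```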
> You'll need to add NAOT smoke tests for AVX10v2 as well, under https://github.com/dotnet/runtime/tree/main/src/tests/nativeaot/SmokeTests/HardwareIntrinsics
> They should check that with Avx2 (today, or Avx after #114575) enabled, AvxVnniInt8+16 are optimistically supported (return null) and that with Avx10v2 enabled, they are enabled too (return true). I'm not sure why AVX10v1 doesn't have a config too already, actually.
Looking into this.
True. We cannot say that Vex encodings of AvxVnniInt* will be supported just because Evex encodings are supported. But where do you want to put the ISA checks? If Avx10.2 is not available and AvxVnniInt* is, that should suffice to assume Vex encoding support, in my view.
> If Avx10.2 is not available and AvxVnniInt* is, that should suffice to assume Vex encoding support
That's correct. You can assume that no AVX10.2 means the support must come from AVX-VNNI-INT* and that VEX is supported.
However, `Avx10.2` currently implies `AvxVnniInt*`, which means you have no way to check that the instructions are supported but that VEX encoding is not. If AVX10.2 only guarantees EVEX encoding support, you need an additional ISA you can check for VEX support, which is why I suggested a third ISA for each of these.
> where do you want to put the ISA checks?
I believe you'll want to look at `emitter::TakesEvexPrefix`, where typically we will only choose EVEX encoding when necessary (for kmask, embedded broadcast, or extended registers). There needs to be an additional check for these new instructions that says that when the VEX support ISA is not present, they must be EVEX.
Again, this is only if the spec has not been revised since the last publication to indicate that the VEX form is guaranteed to be valid.
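The extra check suggested here could look roughly like the sketch below. It is a hypothetical model, not the real `emitter::TakesEvexPrefix`: the `EmitContext` struct, its field names, and the function `TakesEvexPrefixSketch` are invented for illustration, and the "VEX support" ISA is the proposed third ISA, which does not exist in the PR.

```cpp
// Hypothetical description of one instruction at emit time.
struct EmitContext
{
    bool needsKMask;           // uses an opmask register
    bool needsEmbBroadcast;    // uses embedded broadcast
    bool needsExtendedRegs;    // uses registers that only EVEX can encode
    bool isVnniIntInstruction; // one of the new AVX-VNNI-INT8/16 instructions
    bool vexFormSupported;     // the proposed "VEX support" ISA is present
};

// Returns true when the instruction must be emitted with an EVEX prefix.
bool TakesEvexPrefixSketch(const EmitContext& ctx)
{
    // Usual rule from the comment: pick EVEX only when a feature requires it.
    if (ctx.needsKMask || ctx.needsEmbBroadcast || ctx.needsExtendedRegs)
    {
        return true;
    }

    // Additional rule: without the VEX support ISA, the new instructions
    // have no valid VEX form and must be EVEX.
    if (ctx.isVnniIntInstruction && !ctx.vexFormSupported)
    {
        return true;
    }

    return false;
}
```

The second rule only matters if AVX10.2 guarantees the EVEX form alone. If the VEX form is always valid, `vexFormSupported` is always true and the function reduces to the usual rule.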
I think we have a conclusion here. `Avx10.2` will support `Evex` as well as `Vex` versions of the `Avx-Vnni-Int*` instructions. This should clear up most of our discussion, since we can now move forward assuming that we will have support for either `Vex` only or both `Vex` and `Evex` versions of these instructions. The change should be reflected in my latest commit.
@saucecontrol for review.
e6cf454
to
90fa072
Compare
d220726
to
6c9e153
Compare
c373304
to
640af72
Compare
src/coreclr/jit/instrsxarch.h
Outdated
INST3(LAST_AVX10v2_INSTRUCTION, "LAST_AVX10v2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_TT_NONE, INS_FLAGS_None)

INST3(FIRST_AVXVNNIINT16_INSTRUCTION, "FIRST_AVXVNNIINT16_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_TT_NONE, INS_FLAGS_None)
INST3(vpdpwsud, "pdpwsud", IUM_WR, BAD_CODE, BAD_CODE, PSSE38(0xf3, 0xD2), INS_TT_FULL, Input_32Bit | REX_W0 | Encoding_EVEX | INS_Flags_EmbeddedBroadcastSupported | INS_Flags_IsDstDstSrcAVXInstruction) // Multiply individual words of first source operand with individual words of second source operand and add the results
When AVX-VNNI-INT8/16 CPUID bits are set, these instructions support VEX encoding. Only if AVX10.2 is also supported would they support EVEX-encoded forms.
There are definitely CPUs in the wild that support the VEX forms only (e.g. Lunar Lake). I think the current modeling of the ISAs will only work if it's guaranteed that anything supporting the EVEX form also supports VEX. In other words, we'd have to say that AVX10.2 implies AVX-VNNI-INT8/16, which the spec doesn't support.
We have the same issue modeling AVX-VNNI, where AVX10.1 says the instructions are available but only guarantees the EVEX forms.
We solved a similar problem with PCLMULQDQ and VPCLMULQDQ, where additional ISA checks are required in order to know whether VEX and EVEX encoding are supported. However, in that case we knew that VPCLMULQDQ strictly implied PCLMULQDQ, and EVEX encoding required AVX-512VL+VPCLMULQDQ, which we could check separately. In other words, you could have VEX-only, but you couldn't have EVEX without VEX.
Added a patch in my latest commit to see if handling vex encodings can help with the test failures.
This too shall be resolved with #113956 (comment)
src/coreclr/jit/hwintrinsic.cpp
Outdated
intrinsic = (op2Type == TYP_UBYTE)
    ? NI_EVEX_MultiplyWideningAndAddByteByte
    : ((op3Type == TYP_UBYTE) ? NI_EVEX_MultiplyWideningAndAddSByteByte
                              : NI_EVEX_MultiplyWideningAndAddSByteSByte);
This doesn't look correct. We shouldn't be selecting some `NI_EVEX_*` intrinsic for the VEX encoding.
We should instead select `NI_AVXVNNIINT8_MultiplyWideningAndAdd` and then that can implicitly upgrade to the EVEX form, much as `NI_SSE_Add` can do.
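A rough sketch of the suggested shape: one intrinsic ID regardless of encoding, with only the instruction chosen from the operand types. The `BaseType` enum, the `Selection` struct, and the function name are hypothetical, not JIT code. The mnemonics `vpdpbuud` and `vpdpbsud` are assumed here to be the AVX-VNNI-INT8 counterparts of the `vpdpbssd` shown in the diff, and, as in the reviewed snippet, an unsigned second operand is taken to mean both operands are unsigned.

```cpp
#include <string>

// Hypothetical operand element types for the byte dot-product intrinsics.
enum class BaseType { SByte, UByte };

// Hypothetical result: one intrinsic ID plus the instruction to emit.
struct Selection
{
    std::string intrinsic;
    std::string instruction;
};

Selection SelectMultiplyWideningAndAdd(BaseType op2Type, BaseType op3Type)
{
    // The intrinsic ID does not depend on VEX vs EVEX. The encoding is
    // decided later, the way other VEX intrinsics upgrade to EVEX.
    Selection result;
    result.intrinsic = "NI_AVXVNNIINT8_MultiplyWideningAndAdd";

    if (op2Type == BaseType::UByte)
    {
        result.instruction = "vpdpbuud"; // unsigned x unsigned
    }
    else if (op3Type == BaseType::UByte)
    {
        result.instruction = "vpdpbsud"; // signed x unsigned
    }
    else
    {
        result.instruction = "vpdpbssd"; // signed x signed
    }

    return result;
}
```

Keeping the encoding out of the intrinsic ID means the per-type table entry is the only place that changes when a new operand combination is added.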
Yes. I have changed the implementation to reflect this and we will use `Vex` encoding where only `Avx-Vnni-Int*` is present.
if (!compiler->compIsaSupportedDebugOnly(isa))
{
    printf("isa: %d\n", isa);
    printf("evex: %d\n", InstructionSet_EVEX);
}
nit: This should probably be using the JITDUMP or alternative API rather than just calling printf
I was not able to reproduce it offline so was trying to debug from CI. I have removed it now.
@@ -1725,6 +1765,18 @@ HARDWARE_INTRINSIC(EVEX, CompareUnorderedMask,
HARDWARE_INTRINSIC(EVEX, ConvertMaskToVector, -1, 1, {INS_vpmovm2b, INS_vpmovm2b, INS_vpmovm2w, INS_vpmovm2w, INS_vpmovm2d, INS_vpmovm2d, INS_vpmovm2q, INS_vpmovm2q, INS_vpmovm2d, INS_vpmovm2q}, HW_Category_SimpleSIMD, HW_Flag_NoContainment|HW_Flag_ReturnsPerElementMask)
HARDWARE_INTRINSIC(EVEX, ConvertVectorToMask, -1, 1, {INS_vpmovb2m, INS_vpmovb2m, INS_vpmovw2m, INS_vpmovw2m, INS_vpmovd2m, INS_vpmovd2m, INS_vpmovq2m, INS_vpmovq2m, INS_vpmovd2m, INS_vpmovq2m}, HW_Category_SimpleSIMD, HW_Flag_NoContainment|HW_Flag_ReturnsPerElementMask)
HARDWARE_INTRINSIC(EVEX, MoveMask, -1, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Special, HW_Flag_NoContainment)
HARDWARE_INTRINSIC(EVEX, MultiplyWideningAndAddSByteSByte, -1, 3, {INS_vpdpbssd, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromSecondArg|HW_Flag_EmbBroadcastCompatible|HW_Flag_EmbMaskingCompatible)
We shouldn't need any of these `EVEX` cases. The `NI_EVEX_*` intrinsics are largely for special intrinsics that deal with masks and which do not map directly to a managed signature.
Yes. I was using them as a fallback for all the `Avx-Vnni-Int*` APIs since the APIs had the same names and the same types for the first 2 args. But that is resolved now with my latest change.
be2836e
to
d0529a8
Compare
<!-- Test infra issue on apple devices: https://github.com/dotnet/runtime/issues/89917 -->
<CLRTestTargetUnsupported Condition="'$(TargetsAppleMobile)' == 'true'">true</CLRTestTargetUnsupported>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<DefineConstants>$(DefineConstants);AVX10v2_INTRINSICS;VECTORT512_INTRINSICS</DefineConstants>
Hi @saucecontrol
Looks like we cannot add the smoke test for Avx10 just yet. We can do it once `IsProcessorFeaturePresent` (link) supports `Avx10`. Do you agree? Until then, the smoke test will keep failing.
This PR adds support for CPUID for the `AVX-VNNI-INT8` & `AVX-VNNI-INT16` ISAs
Design
The changes are made in a way to enable the 2 ISAs when `Avx10.2` is enabled or
This is w.r.t. the discussions in API proposal #112586
Testing
Note1: Emitter unit tests were not run since they are added and verified along with AVX10.2 PR #111209
Note2: Superpmi results are not accurate since we are adding a new CPUID, which leads to a new jiteeversionguid. Even after changing the jiteeversion manually, the superpmi run shows errors and failures based on the old mch files, which can be ignored.
Run JIT subtree with AVXVNNIINT* enabled / disabled
AVXVNNIINT* Enabled

AVXVNNIINT* disabled
