Add CPUID for AvxVnniInt8 and AvxVnniInt16 #113956

khushal1996 · 2025-03-27T01:35:13Z

This PR adds CPUID support for the AVX-VNNI-INT8 and AVX-VNNI-INT16 ISAs.

Design

The changes enable the two ISAs when either:

Avx10.2 is enabled, or
the CPUID bits for both ISAs are set

This follows the discussion in API proposal #112586.
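A minimal sketch of the enablement rule in the Design section, written as a pure function over raw CPUID output so it can be read without running `cpuid`. `DetectVnniInt`, `VnniIntSupport`, and the two constants are hypothetical names, not the runtime's detection code. The bit positions (EDX bits 4 and 10 of CPUID leaf 7, subleaf 1) are assumed from Intel's published CPUID enumeration and should be checked against the current SDM. Treating each ISA independently is also an assumption.

```cpp
#include <cstdint>

// Assumed bit positions in CPUID.(EAX=07H, ECX=1):EDX; verify against the
// Intel SDM before relying on them.
const uint32_t kAvxVnniInt8Bit  = 1u << 4;
const uint32_t kAvxVnniInt16Bit = 1u << 10;

struct VnniIntSupport
{
    bool int8;
    bool int16;
};

// Hypothetical helper, not the runtime's actual detection code.
//   leaf7Sub1Edx: raw EDX value returned by CPUID leaf 7, subleaf 1.
//   avx10Version: AVX10 version number reported by the CPU (0 if AVX10 is absent).
inline VnniIntSupport DetectVnniInt(uint32_t leaf7Sub1Edx, uint32_t avx10Version)
{
    // Either condition from the Design section is sufficient.
    const bool avx10v2 = avx10Version >= 2;

    VnniIntSupport result;
    result.int8  = avx10v2 || ((leaf7Sub1Edx & kAvxVnniInt8Bit) != 0);
    result.int16 = avx10v2 || ((leaf7Sub1Edx & kAvxVnniInt16Bit) != 0);
    return result;
}
```

For example, `DetectVnniInt(0, 2)` reports both ISAs as available through AVX10.2 alone, while `DetectVnniInt(1u << 4, 0)` reports only the INT8 ISA.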

Testing

Note1: Emitter unit tests were not run since they were added and verified along with the AVX10.2 PR #111209.

Note2: SuperPMI results are not accurate because adding a new CPUID leads to a new jiteeversionguid. Even after changing the jiteeversion manually, the SuperPMI run shows errors and failures based on the old mch files; these can be ignored.

Run JIT subtree with AVXVNNIINT* enabled / disabled

AVXVNNIINT* Enabled

AVXVNNIINT* disabled

khushal1996 · 2025-03-27T07:10:00Z

@tannergooding This is the first of the two PRs needed to introduce the AVX-VNNI-INT* APIs (#112586).

tannergooding · 2025-03-27T15:22:43Z

src/coreclr/jit/hwintrinsic.cpp

+    { NI_Illegal, NI_Illegal },                                 // AVXVNNIINT8
+    { NI_Illegal, NI_Illegal },                                 // AVXVNNIINT8_V512
+    { NI_Illegal, NI_Illegal },                                 // AVXVNNIINT16
+    { NI_Illegal, NI_Illegal },                                 // AVXVNNIINT16_V512


Any reason we're not adding the APIs at the same time? They look like they should be generally table driven, so it should be a minimal change on top...

We can do that too. For all other ISAs, we generally did CPUID and API introduction as separate PRs. Also, it becomes easier to run superpmi once the CPUID PR goes in. Let me know what you'd prefer.

For all other ISAs, we generally did CPUID and API introduction as separate PRs.

For some of the others, like AVX10.2, we've done it incrementally because of the number of APIs and total work required.

That is, checking in the CPUID support first allowed a reduction of conflicts and parallelization of adding a large number of intrinsic APIs across several PRs.

In this case, there's only a very small number of APIs that are likely entirely table driven, so there's little to no risk of conflicts or additional churn.

Doing it all at once lets us build confidence that the CPUID checks and the end-to-end story are correct, since the change is self-contained and allows adding the CPUID and other tests at the same time.

Also, it becomes easier to run superpmi once the CPUID PR goes in

There's not much need to run SPMI for net new intrinsics that nothing is using yet; we're going to get zero diffs.

Okay. Thanks for the review. I will switch this PR to add everything together and then update you.

src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt

khushal1996 · 2025-04-14T21:44:30Z

@tannergooding @saucecontrol I have added the CPUID, API surface, JIT handling and template tests here.

saucecontrol · 2025-04-14T22:08:46Z

src/coreclr/tools/Common/InstructionSetHelpers.cs

@@ -233,6 +233,10 @@ public static InstructionSetSupport ConfigureInstructionSetSupport(string instru
                    optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("vpclmul_v512");
                    optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2");
                    optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2_v512");
+                    optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avxvnniint8");


avxvnniint8 and avxvnniint16 should be added with the other VEX instruction sets, where avxvnni is. (you'll have conflicts with #114575 depending which lands first)

Okay. I was thinking that we would add AvxVnniInt* in both places. Does that make sense? We want AvxVnniInt* to be available independently as well as with Avx10.2.

They only need to be added under the AVX check. AVX10v2 implies AVX, so both blocks will execute in that case.

n.b. the _V512 ones would stay where they are.
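The point about the two blocks can be sketched as follows. This is a hypothetical C++ model of the C# builder logic in `InstructionSetHelpers.cs`, not the actual code. `OptimisticSets` and its parameters are invented for illustration, and the `_v512` name strings are assumed by analogy with the names shown in the diff.

```cpp
#include <set>
#include <string>

// Hypothetical model: returns the instruction sets that would be added
// optimistically for a given configuration.
std::set<std::string> OptimisticSets(bool avxRequested, bool avx10v2)
{
    // AVX10v2 implies AVX, so the AVX block below also runs in that case.
    const bool avx = avxRequested || avx10v2;

    std::set<std::string> sets;

    if (avx)
    {
        // VEX-encoded sets live together under the AVX check.
        sets.insert("avxvnni");
        sets.insert("avxvnniint8");
        sets.insert("avxvnniint16");
    }

    if (avx10v2)
    {
        // Only the 512-bit variants stay in the AVX10v2 block (names assumed).
        sets.insert("avx10v2");
        sets.insert("avx10v2_v512");
        sets.insert("avxvnniint8_v512");
        sets.insert("avxvnniint16_v512");
    }

    return sets;
}
```

Because the implication is applied first, listing `avxvnniint8` and `avxvnniint16` a second time under the AVX10v2 block would add nothing.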

You'll need to add NAOT smoke tests for AVX10v2 as well, under https://github.com/dotnet/runtime/tree/main/src/tests/nativeaot/SmokeTests/HardwareIntrinsics

They should check that with Avx2 (today, or Avx after #114575) enabled, AvxVnniInt8+16 are optimistically supported (return null) and that with Avx10v2 enabled, they are enabled too (return true).

I'm not sure why AVX10v1 doesn't have a config too already, actually.

You'll need to add NAOT smoke tests for AVX10v2 as well, under https://github.com/dotnet/runtime/tree/main/src/tests/nativeaot/SmokeTests/HardwareIntrinsics

They should check that with Avx2 (today, or Avx after #114575) enabled, AvxVnniInt8+16 are optimistically supported (return null) and that with Avx10v2 enabled, they are enabled too (return true). I'm not sure why AVX10v1 doesn't have a config too already, actually.

Looking into this.

The spec I have doesn't list VEX encodings for the instructions -- only EVEX.

What I'm saying is that while the instructions are supported, I'm not sure it's safe to assume the VEX encodings of them will be supported unless we explicitly check the AVX-VNNI-INT8/16 flag as well.

True. We cannot assume that the VEX encodings of AvxVnniInt* are supported just because the EVEX encodings are. But where do you want to put the ISA checks? If Avx10.2 is not available and AvxVnniInt* is, that should suffice to assume VEX encoding support, in my view.

If Avx10.2 is not available and AvxVnniInt* is, that should suffice to assume VEX encoding support

That's correct. You can assume that no AVX10.2 means the support must come from AVX-VNNI-INT* and that VEX is supported.

However, Avx10.2 currently implies AvxVnniInt*, which means you have no way to check that the instructions are supported but that VEX encoding is not. If AVX10.2 only guarantees EVEX encoding support, you need an additional ISA you can check for VEX support, which is why I suggested a third ISA for each of these.

where do you want to put the ISA checks?

I believe you'll want to look at emitter::TakesEvexPrefix, where typically we will only choose EVEX encoding when necessary (for kmask, embedded broadcast, or extended registers). There needs to be an additional check for these new instructions that says that when the VEX support ISA is not present, they must be EVEX.

Again, this is only if the spec has not been revised since the last publication to indicate that the VEX form is guaranteed to be valid.
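A rough sketch of the extra check being suggested here, under the assumption that a separate "VEX form supported" ISA flag exists. `EncodingRequest` and `MustUseEvex` are hypothetical; the real `emitter::TakesEvexPrefix` inspects an instruction descriptor rather than a struct of booleans.

```cpp
// Hypothetical inputs standing in for what the emitter knows about one
// instruction at encoding time.
struct EncodingRequest
{
    bool usesKMask;             // needs an opmask register
    bool usesEmbeddedBroadcast; // needs EVEX.b
    bool usesExtendedRegisters; // e.g. xmm16-xmm31
    bool vexFormSupported;      // the proposed extra "VEX support" ISA check
};

// Returns true when the instruction has to be emitted with an EVEX prefix.
bool MustUseEvex(const EncodingRequest& request)
{
    // Existing rule: choose EVEX only when a feature requires it.
    if (request.usesKMask || request.usesEmbeddedBroadcast || request.usesExtendedRegisters)
    {
        return true;
    }

    // Additional rule for the new instructions: without the VEX support ISA,
    // the VEX form cannot be assumed valid, so EVEX is the only option.
    return !request.vexFormSupported;
}
```

With `vexFormSupported` set and no EVEX-only feature in use, the sketch keeps the shorter VEX encoding, which matches the emitter's usual preference described above.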

I think we have a conclusion here.
Avx10.2 will support the EVEX as well as the VEX versions of the Avx-Vnni-Int* instructions. This clears up most of the discussion, since we can now move forward assuming that we will have support for either VEX only or both VEX and EVEX versions of these instructions. The change should be reflected in my latest commit.

@saucecontrol for review.

saucecontrol · 2025-04-18T19:27:02Z

src/coreclr/jit/instrsxarch.h

 INST3(LAST_AVX10v2_INSTRUCTION, "LAST_AVX10v2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_TT_NONE, INS_FLAGS_None)
+
+INST3(FIRST_AVXVNNIINT16_INSTRUCTION, "FIRST_AVXVNNIINT16_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_TT_NONE, INS_FLAGS_None)
+INST3(vpdpwsud,         "pdpwsud",          IUM_WR, BAD_CODE,               BAD_CODE,     PSSE38(0xf3, 0xD2),          INS_TT_FULL,                         Input_32Bit    | REX_W0       | Encoding_EVEX             | INS_Flags_EmbeddedBroadcastSupported | INS_Flags_IsDstDstSrcAVXInstruction)                                         // Multiply individual words of first source operand with individual words of second source operand and add the results


When AVX-VNNI-INT8/16 CPUID bits are set, these instructions support VEX encoding. Only if AVX10.2 is also supported would they support EVEX-encoded forms.

There are definitely CPUs in the wild that support the VEX forms only (e.g. Lunar Lake). I think the current modeling of the ISAs will only work if it's guaranteed that anything supporting the EVEX form also supports VEX. In other words, we'd have to say that AVX10.2 implies AVX-VNNI-INT8/16, which the spec doesn't support.

We have the same issue modeling AVX-VNNI, where AVX10.1 says the instructions are available but only guarantees the EVEX forms.

We solved a similar problem with PCLMULQDQ and VPCLMULQDQ, where additional ISA checks are required in order to know whether VEX and EVEX encoding are supported. However, in that case we knew that VPCLMULQDQ strictly implied PCLMULQDQ, and EVEX encoding required AVX-512VL+VPCLMULQDQ, which we could check separately. In other words, you could have VEX-only, but you couldn't have EVEX without VEX.
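To make the modeling concern concrete, here is a small sketch that classifies a CPU under the reading of the spec given in this comment: the AVX-VNNI-INT* CPUID bit guarantees the VEX forms, AVX10.2 guarantees the EVEX forms, and neither implies the other. `Encoding` and `Classify` are hypothetical names used only for illustration.

```cpp
enum class Encoding
{
    None,     // instructions unavailable
    VexOnly,  // e.g. the Lunar Lake case mentioned above
    EvexOnly, // the state a single "is supported" ISA flag cannot express
    Both
};

// Hypothetical classification under the reading described in this comment.
Encoding Classify(bool avxVnniIntBit, bool avx10v2)
{
    if (avxVnniIntBit && avx10v2)
    {
        return Encoding::Both;
    }
    if (avxVnniIntBit)
    {
        return Encoding::VexOnly;
    }
    if (avx10v2)
    {
        return Encoding::EvexOnly;
    }
    return Encoding::None;
}
```

If `EvexOnly` can never occur in practice, the single-ISA model works; if it can, a separate check is needed before emitting a VEX form.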

Added a patch in my latest commit to see if handling vex encodings can help with the test failures.

This too shall be resolved with #113956 (comment)

tannergooding · 2025-04-28T20:02:26Z

src/coreclr/jit/hwintrinsic.cpp

+                        intrinsic = (op2Type == TYP_UBYTE)
+                                        ? NI_EVEX_MultiplyWideningAndAddByteByte
+                                        : ((op3Type == TYP_UBYTE) ? NI_EVEX_MultiplyWideningAndAddSByteByte
+                                                                  : NI_EVEX_MultiplyWideningAndAddSByteSByte);


This doesn't look correct. We shouldn't be selecting some NI_EVEX_* intrinsic for the VEX encoding.

We should instead select NI_AVXVNNIINT8_MultiplyWideningAndAdd and then that can implicitly upgrade to the EVEX form, much as NI_SSE_Add can do.

Yes. I have changed the implementation to reflect this and we will use Vex encoding where only Avx-Vnni-Int* is present.

tannergooding · 2025-04-28T20:03:09Z

src/coreclr/jit/hwintrinsiccodegenxarch.cpp

+    if (!compiler->compIsaSupportedDebugOnly(isa))
+    {
+        printf("isa: %d\n", isa);
+        printf("evex: %d\n", InstructionSet_EVEX);
+    }


nit: This should probably be using the JITDUMP or alternative API rather than just calling printf

I was not able to reproduce it offline so was trying to debug from CI. I have removed it now.
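For readers unfamiliar with the distinction, the sketch below shows the general shape of gated diagnostic output as opposed to an unconditional `printf`. It is a stand-in, not the JIT's `JITDUMP` macro: `g_verbose` and `SketchDump` are hypothetical, with `g_verbose` playing the role of the JIT's verbose flag.

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical switch standing in for the JIT's verbose setting.
static bool g_verbose = false;

// Prints only when verbose output is enabled. Returns the number of
// characters written, or 0 when dumping is disabled.
int SketchDump(const char* format, ...)
{
    if (!g_verbose)
    {
        return 0;
    }

    va_list args;
    va_start(args, format);
    const int written = vprintf(format, args);
    va_end(args);
    return written;
}
```

A call such as `SketchDump("isa: %d\n", isa)` then produces no output in normal runs and only prints when the switch is turned on.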

tannergooding · 2025-04-28T20:04:06Z

src/coreclr/jit/hwintrinsiclistxarch.h

@@ -1725,6 +1765,18 @@ HARDWARE_INTRINSIC(EVEX,            CompareUnorderedMask,
 HARDWARE_INTRINSIC(EVEX,            ConvertMaskToVector,                        -1,              1,     {INS_vpmovm2b,          INS_vpmovm2b,           INS_vpmovm2w,           INS_vpmovm2w,           INS_vpmovm2d,           INS_vpmovm2d,           INS_vpmovm2q,           INS_vpmovm2q,           INS_vpmovm2d,           INS_vpmovm2q},          HW_Category_SimpleSIMD,             HW_Flag_NoContainment|HW_Flag_ReturnsPerElementMask)
 HARDWARE_INTRINSIC(EVEX,            ConvertVectorToMask,                        -1,              1,     {INS_vpmovb2m,          INS_vpmovb2m,           INS_vpmovw2m,           INS_vpmovw2m,           INS_vpmovd2m,           INS_vpmovd2m,           INS_vpmovq2m,           INS_vpmovq2m,           INS_vpmovd2m,           INS_vpmovq2m},          HW_Category_SimpleSIMD,             HW_Flag_NoContainment|HW_Flag_ReturnsPerElementMask)
 HARDWARE_INTRINSIC(EVEX,            MoveMask,                                   -1,              1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid},           HW_Category_Special,                HW_Flag_NoContainment)
+HARDWARE_INTRINSIC(EVEX,            MultiplyWideningAndAddSByteSByte,           -1,              3,     {INS_vpdpbssd,          INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid},           HW_Category_SimpleSIMD,             HW_Flag_BaseTypeFromSecondArg|HW_Flag_EmbBroadcastCompatible|HW_Flag_EmbMaskingCompatible)


We shouldn't need any of these EVEX cases. The NI_EVEX_* intrinsics are largely for special intrinsics that deal with masks and which do not map directly to a managed signature.

Yes. I was using them as a fallback for all the Avx-Vnni-Int* APIs, since the APIs had the same names and the same types for the first two arguments. That is resolved now with my latest change.

khushal1996 · 2025-04-29T18:52:42Z

src/tests/nativeaot/SmokeTests/HardwareIntrinsics/X64Avx10v2.csproj

+    <!-- Test infra issue on apple devices: https://github.com/dotnet/runtime/issues/89917 -->
+    <CLRTestTargetUnsupported Condition="'$(TargetsAppleMobile)' == 'true'">true</CLRTestTargetUnsupported>
+    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
+    <DefineConstants>$(DefineConstants);AVX10v2_INTRINSICS;VECTORT512_INTRINSICS</DefineConstants>


Hi @saucecontrol
It looks like we cannot add the smoke test for Avx10 just yet. We can do it once IsProcessorFeaturePresent (link) supports Avx10. Do you agree? Until then, the smoke test will keep failing.

dotnet-issue-labeler bot added the needs-area-label label Mar 27, 2025

dotnet-policy-service bot added the community-contribution label Mar 27, 2025

khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from 6714249 to 141d643 Compare March 27, 2025 02:06

khushal1996 marked this pull request as ready for review March 27, 2025 06:07

khushal1996 requested a review from MichalStrehovsky as a code owner March 27, 2025 06:07

tannergooding reviewed Mar 27, 2025

saucecontrol reviewed Mar 27, 2025

khushal1996 added 3 commits April 14, 2025 14:14

Add CPUID for AvxVnniInt8 and AvxVnniInt16

ea608ba

AVXVNNIINT* API surface and template tests

aa489fe

Run formatting

98fc970

khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from 141d643 to 98fc970 Compare April 14, 2025 21:40

tannergooding self-requested a review April 14, 2025 21:44

tannergooding self-assigned this Apr 14, 2025

saucecontrol reviewed Apr 14, 2025

Remove new keyword where not required

aeb858b

This was referenced Apr 16, 2025

[wasm][AOT] emcc : error - received SIGKILL (-9) #89402

Open

[browser][MT] test failure: Tests timed out. Killing driver service #103524

Open

build-analysis bot mentioned this pull request Apr 16, 2025

CI flakiness: mono interpreter build getting killed #114123

Open

khushal1996 added 3 commits April 17, 2025 10:52

Move AvxVnniInt* with other Vex instruction sets

d31e52f

Add smoke test for Avx10.2 and add AvxvnniInt* Isas to those tests

d7d6f51

Correct smoke tests for avx2

09c2032

khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from e6cf454 to 90fa072 Compare April 18, 2025 02:13

This was referenced Apr 18, 2025

Tracking: Failed to find a suitable device for target tvos-device dotnet/dnceng#5179

Open

Exclude System.IO.Compression async tests from wasm #114769

Closed

Add AvxVnniInt* implications

6c9e153

khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from d220726 to 6c9e153 Compare April 18, 2025 18:37

correct smoke test for AVX2

640af72

khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from c373304 to 640af72 Compare April 18, 2025 18:38

saucecontrol reviewed Apr 18, 2025

Enable Vex encoding of AvxVnniInt* instructions when Avx10.2 is not a…

5b3391c

…vailable

This was referenced Apr 19, 2025

Failures in System.Net.Http.Functional.Tests.PlatformHandler_HttpClientHandlerTest.GetAsync_ManyDifferentResponseHeaders_ParsedCorrectly #114790

Closed

[browser] NoFingerprint, LazyLoadingTests.LoadLazyAssemblyBeforeItIsNeeded, timeout #113836

Open

tannergooding reviewed Apr 28, 2025

Avx10.2 will support VEX versions of AvxVnniInt*

d0529a8

khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from be2836e to d0529a8 Compare April 29, 2025 05:16

khushal1996 added 2 commits April 28, 2025 22:23

Merge branch 'main' into kcm-avxvnniint8-cpuid

f213543

Run formatting

f0df13d

This was referenced Apr 29, 2025

tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource failing in CI #90605

Open

[wasm] at System.Net.Http.BrowserHttpInterop.CancellationHelper #112533

Open

khushal1996 commented Apr 29, 2025

Merge branch 'main' into kcm-avxvnniint8-cpuid

8908888

build-analysis bot mentioned this pull request May 2, 2025

Test failure: baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd #110173

Open

Merge branch 'main' into kcm-avxvnniint8-cpuid

413fa39

khushal1996 requested review from saucecontrol and tannergooding May 5, 2025 22:44
