Add CPUID for AvxVnniInt8 and AvxVnniInt16 #113956

Open
wants to merge 15 commits into base: main

@khushal1996 (Member) commented Mar 27, 2025

This PR adds CPUID support for the AVX-VNNI-INT8 and AVX-VNNI-INT16 ISAs.

Design

image
image

The changes enable the two ISAs when either of the following holds:

  1. Avx10.2 is enabled, or
  2. the CPUID bits for both ISAs are enabled
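
A minimal sketch of the rule above, read literally. The `CpuFeatures` struct and `EnableAvxVnniInt` are hypothetical names used only for illustration; the runtime tracks these bits through its own instruction-set flags.

```cpp
#include <cassert>

// Hypothetical snapshot of the relevant CPU feature bits (illustration only).
struct CpuFeatures
{
    bool avx10v2;      // AVX10.2 is enabled
    bool avxVnniInt8;  // AVX-VNNI-INT8 CPUID bit
    bool avxVnniInt16; // AVX-VNNI-INT16 CPUID bit
};

// Literal reading of the two conditions: the ISAs are enabled together when
// AVX10.2 is enabled, or when the CPUID bits for both ISAs are enabled.
bool EnableAvxVnniInt(const CpuFeatures& cpu)
{
    return cpu.avx10v2 || (cpu.avxVnniInt8 && cpu.avxVnniInt16);
}
```

Under this reading, a CPU that reports only one of the two CPUID bits and no AVX10.2 gets neither ISA; whether the PR treats the two bits independently is not stated here.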

This follows the discussion in API proposal #112586.

Testing

Note 1: Emitter unit tests were not run, since they were added and verified along with the AVX10.2 PR #111209.

Note 2: SuperPMI results are not accurate, since adding a new CPUID leads to a new jiteeversionguid. Even after changing the jiteeversion manually, the SuperPMI run shows errors and failures based on the old mch files; these can be ignored.

Run JIT subtree with AVXVNNIINT* enabled / disabled


AVXVNNIINT* Enabled
image

AVXVNNIINT* disabled
image

@khushal1996 (Member Author)

@tannergooding This is the first of the two PRs needed to introduce the AVX VNNI INT* APIs (#112586).

Comment on lines 815 to 818
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT8
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT8_V512
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT16
{ NI_Illegal, NI_Illegal }, // AVXVNNIINT16_V512
Member

Any reason we're not adding the APIs at the same time? They look like they should be generally table driven, so it should be a minimal change on top...

Member Author

We can do that too. For all other ISAs, we generally did CPUID and API introduction as separate PRs. Also, it becomes easier to run superpmi once the CPUID PR goes in. Let me know what you'd prefer.

Member

> For all other ISAs, we generally did CPUID and API introduction as separate PRs.

For some of the others, like AVX10.2, we've done it incrementally because of the number of APIs and total work required.

That is, checking in the CPUID support first allowed a reduction of conflicts and parallelization of adding a large number of intrinsic APIs across several PRs.

In this case, there's only a very small number of APIs that are likely entirely table driven, so there's little to no risk of conflicts or additional churn.

Doing it all at once lets us build confidence that the CPUID checks and the end-to-end story are correct, since the change is self-contained and the CPUID and other tests can be added at the same time.

> Also, it becomes easier to run superpmi once the CPUID PR goes in

There's not much need to run SPMI for net new intrinsics that nothing is using yet; we're going to get zero diffs.

Member Author

Okay. Thanks for the review. I will switch this PR to add everything together and then update you.

@khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from 141d643 to 98fc970 (April 14, 2025 21:40)
@khushal1996 (Member Author)

@tannergooding @saucecontrol I have added the CPUID, API surface, JIT handling and template tests here.

@tannergooding self-requested a review April 14, 2025 21:44
@tannergooding self-assigned this Apr 14, 2025
@@ -233,6 +233,10 @@ public static InstructionSetSupport ConfigureInstructionSetSupport(string instru
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("vpclmul_v512");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2_v512");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avxvnniint8");
Member

avxvnniint8 and avxvnniint16 should be added with the other VEX instruction sets, where avxvnni is. (you'll have conflicts with #114575 depending which lands first)

Member Author

Okay. I was thinking that we would add AvxVnniInt* in both places. Does that make sense? We want AvxVnniInt* to be available independently as well as with Avx10.2.

@saucecontrol (Member), Apr 15, 2025

They only need to be added under the AVX check. AVX10v2 implies AVX, so both blocks will execute in that case.

n.b. the _V512 ones would stay where they are.
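
A rough sketch of the layout described above, written in C++ for illustration even though the builder under review is C#. `BuildOptimisticSet`, its two boolean inputs, the guard on the second block, and the `_v512` string spellings for the new ISAs are all assumptions, not the actual code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the optimistic instruction-set builder.
// The guard on the second block is an assumption; the real condition may differ.
std::set<std::string> BuildOptimisticSet(bool avx, bool avx10v2)
{
    assert(avx || !avx10v2); // AVX10v2 implies AVX

    std::set<std::string> optimistic;
    if (avx)
    {
        // VEX instruction sets are grouped under the AVX check.
        optimistic.insert("avxvnni");
        optimistic.insert("avxvnniint8");
        optimistic.insert("avxvnniint16");
    }
    if (avx10v2)
    {
        // The _v512 variants stay in the later block (names assumed).
        optimistic.insert("avx10v2");
        optimistic.insert("avx10v2_v512");
        optimistic.insert("avxvnniint8_v512");
        optimistic.insert("avxvnniint16_v512");
    }
    return optimistic;
}
```

Because the first block runs whenever AVX is available, adding the VEX sets again in the second block would be redundant.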

Member

You'll need to add NAOT smoke tests for AVX10v2 as well, under https://github.com/dotnet/runtime/tree/main/src/tests/nativeaot/SmokeTests/HardwareIntrinsics

They should check that with Avx2 (today, or Avx after #114575) enabled, AvxVnniInt8+16 are optimistically supported (return null) and that with Avx10v2 enabled, they are enabled too (return true).
 
I'm not sure why AVX10v1 doesn't have a config too already, actually.
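
The three outcomes a smoke test could observe can be sketched as follows. This is an illustrative model only: `Observed`, `Observe`, and the ISA strings are hypothetical, the sets are assumed to be already expanded with implied ISAs, and the constant-false case is an assumption about how unsupported ISAs fold.

```cpp
#include <cassert>
#include <set>
#include <string>

using IsaSet = std::set<std::string>;

// What a smoke test would see for an IsSupported check (hypothetical model):
//   ConstantTrue  -> ISA is in the compile-time baseline ("return true")
//   RuntimeCheck  -> ISA is only optimistically supported ("return null")
//   ConstantFalse -> ISA is in neither set (assumed to fold to false)
enum class Observed { ConstantTrue, RuntimeCheck, ConstantFalse };

Observed Observe(const IsaSet& baseline, const IsaSet& optimistic, const std::string& isa)
{
    if (baseline.count(isa) != 0)
    {
        return Observed::ConstantTrue;
    }
    if (optimistic.count(isa) != 0)
    {
        return Observed::RuntimeCheck;
    }
    return Observed::ConstantFalse;
}
```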

Member Author

> You'll need to add NAOT smoke tests for AVX10v2 as well, under https://github.com/dotnet/runtime/tree/main/src/tests/nativeaot/SmokeTests/HardwareIntrinsics
>
> They should check that with Avx2 (today, or Avx after #114575) enabled, AvxVnniInt8+16 are optimistically supported (return null) and that with Avx10v2 enabled, they are enabled too (return true).
>
> I'm not sure why AVX10v1 doesn't have a config too already, actually.

Looking into this.

Member

The spec I have doesn't list VEX encodings for the instructions -- only EVEX.
image

What I'm saying is that while the instructions are supported, I'm not sure it's safe to assume the VEX encodings of them will be supported unless we explicitly check the AVX-VNNI-INT8/16 flag as well.

Member Author

True. We cannot say that the Vex encodings of AvxVnniInt* will be supported just because the Evex encodings are. But where do you want to put the ISA checks? If Avx10.2 is not available and AvxVnniInt* is, that should suffice to assume Vex encoding support, in my view.

Member

> If Avx10.2 is not available and AvxVnniInt* is, that should suffice to assume Vex encoding support

That's correct. You can assume that no AVX10.2 means the support must come from AVX-VNNI-INT* and that VEX is supported.

However, Avx10.2 currently implies AvxVnniInt*, which means you have no way to check that the instructions are supported but that VEX encoding is not. If AVX10.2 only guarantees EVEX encoding support, you need an additional ISA you can check for VEX support, which is why I suggested a third ISA for each of these.

> where do you want to put the ISA checks?

I believe you'll want to look at emitter::TakesEvexPrefix, where typically we will only choose EVEX encoding when necessary (for kmask, embedded broadcast, or extended registers). There needs to be an additional check for these new instructions that says that when the VEX support ISA is not present, they must be EVEX.

Again, this is only if the spec has not been revised since the last publication to indicate that the VEX form is guaranteed to be valid.
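
A minimal sketch of the encoding decision described above. It is not the real `emitter::TakesEvexPrefix`; `EncodingNeeds`, `IsaForms`, and `ChooseEncoding` are hypothetical names, and the model assumes the caller already knows which encoded forms the target supports.

```cpp
#include <cassert>

enum class Encoding { Vex, Evex };

// Features of an instruction instance that only EVEX can express.
struct EncodingNeeds
{
    bool kmask;
    bool embeddedBroadcast;
    bool extendedRegisters;
};

// Which encoded forms the target is known to support (hypothetical model).
struct IsaForms
{
    bool vex;
    bool evex;
};

Encoding ChooseEncoding(const EncodingNeeds& needs, const IsaForms& forms)
{
    assert(forms.vex || forms.evex); // the instruction must be usable at all

    const bool needsEvex = needs.kmask || needs.embeddedBroadcast || needs.extendedRegisters;
    if (needsEvex)
    {
        assert(forms.evex); // callers must not request EVEX-only features otherwise
        return Encoding::Evex;
    }

    // The additional check: without the VEX-support ISA, EVEX is mandatory.
    if (!forms.vex)
    {
        return Encoding::Evex;
    }

    // Default: prefer VEX and use EVEX only when necessary.
    return Encoding::Vex;
}
```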

Member Author

I think we have a conclusion here.
Avx10.2 will support the Evex as well as the Vex versions of the Avx-Vnni-Int* instructions. This mostly settles our discussion, since we can now move forward assuming that we will have support for either Vex only or both the Vex and Evex versions of these instructions. The change should be reflected in my latest commit.

Member Author

@saucecontrol for review.

@khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from d220726 to 6c9e153 (April 18, 2025 18:37)
@khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from c373304 to 640af72 (April 18, 2025 18:38)
INST3(LAST_AVX10v2_INSTRUCTION, "LAST_AVX10v2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_TT_NONE, INS_FLAGS_None)

INST3(FIRST_AVXVNNIINT16_INSTRUCTION, "FIRST_AVXVNNIINT16_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_TT_NONE, INS_FLAGS_None)
INST3(vpdpwsud, "pdpwsud", IUM_WR, BAD_CODE, BAD_CODE, PSSE38(0xf3, 0xD2), INS_TT_FULL, Input_32Bit | REX_W0 | Encoding_EVEX | INS_Flags_EmbeddedBroadcastSupported | INS_Flags_IsDstDstSrcAVXInstruction) // Multiply individual words of first source operand with individual words of second source operand and add the results
@saucecontrol (Member), Apr 18, 2025

When AVX-VNNI-INT8/16 CPUID bits are set, these instructions support VEX encoding. Only if AVX10.2 is also supported would they support EVEX-encoded forms.

There are definitely CPUs in the wild that support the VEX forms only (e.g. Lunar Lake). I think the current modeling of the ISAs will only work if it's guaranteed that anything supporting the EVEX form also supports VEX. In other words, we'd have to say that AVX10.2 implies AVX-VNNI-INT8/16, which the spec doesn't support.

We have the same issue modeling AVX-VNNI, where AVX10.1 says the instructions are available but only guarantees the EVEX forms.

We solved a similar problem with PCLMULQDQ and VPCLMULQDQ, where additional ISA checks are required in order to know whether VEX and EVEX encoding are supported. However, in that case we knew that VPCLMULQDQ strictly implied PCLMULQDQ, and EVEX encoding required AVX-512VL+VPCLMULQDQ, which we could check separately. In other words, you could have VEX-only, but you couldn't have EVEX without VEX.
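
To make the cases concrete, here is a small sketch of the support matrix under the reading of the spec given in this comment (CPUID bit gives the VEX forms, AVX10.2 gives the EVEX forms). `Forms`, `ClassifyForms`, and `ModelableWithoutExtraIsa` are hypothetical names for illustration.

```cpp
#include <cassert>

enum class Forms { None, VexOnly, EvexOnly, Both };

// Assumed mapping: the AVX-VNNI-INT8/16 CPUID bit gives the VEX forms,
// and AVX10.2 gives the EVEX forms.
Forms ClassifyForms(bool vnniIntCpuidBit, bool avx10v2)
{
    if (vnniIntCpuidBit && avx10v2)
    {
        return Forms::Both;
    }
    if (vnniIntCpuidBit)
    {
        return Forms::VexOnly; // e.g. the Lunar Lake case mentioned above
    }
    if (avx10v2)
    {
        return Forms::EvexOnly; // EVEX without VEX
    }
    return Forms::None;
}

// A model where AVX10.2 implies the VEX ISA cannot represent EvexOnly.
bool ModelableWithoutExtraIsa(Forms forms)
{
    return forms != Forms::EvexOnly;
}
```

The `EvexOnly` row is the one that would need either a spec guarantee or a separate ISA to check for VEX support.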

Member Author

Added a patch in my latest commit to see if handling vex encodings can help with the test failures.

Member Author

This too shall be resolved with #113956 (comment)

Comment on lines 2244 to 2247
intrinsic = (op2Type == TYP_UBYTE)
? NI_EVEX_MultiplyWideningAndAddByteByte
: ((op3Type == TYP_UBYTE) ? NI_EVEX_MultiplyWideningAndAddSByteByte
: NI_EVEX_MultiplyWideningAndAddSByteSByte);
Member

This doesn't look correct. We shouldn't be selecting some NI_EVEX_* intrinsic for the VEX encoding.

We should instead select NI_AVXVNNIINT8_MultiplyWideningAndAdd and then that can implicitly upgrade to the EVEX form, much as NI_SSE_Add can do.
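
One way to picture the suggestion: a single intrinsic keyed to the ISA, with the instruction picked from the operand types and the encoding left to a later stage. The sketch below mirrors the ternary in the snippet under review; `ElemType`, `Instruction`, and `PickInstruction` are hypothetical, and the mapping of the unsigned cases to `vpdpbuud` and `vpdpbsud` is an assumption based on the instruction names.

```cpp
#include <cassert>

enum class ElemType { SByte, UByte };

// The three non-saturating AVX-VNNI-INT8 dot-product instructions.
enum class Instruction { vpdpbssd, vpdpbsud, vpdpbuud };

// Pick the instruction from the element types of the second and third
// operands, independent of whether VEX or EVEX is used to encode it.
Instruction PickInstruction(ElemType op2Type, ElemType op3Type)
{
    if (op2Type == ElemType::UByte)
    {
        return Instruction::vpdpbuud; // unsigned x unsigned
    }
    return (op3Type == ElemType::UByte) ? Instruction::vpdpbsud  // signed x unsigned
                                        : Instruction::vpdpbssd; // signed x signed
}
```

The encoding choice is deliberately not modeled here; it would be made separately when the instruction is emitted.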

Member Author

Yes. I have changed the implementation to reflect this and we will use Vex encoding where only Avx-Vnni-Int* is present.

Comment on lines 407 to 411
if (!compiler->compIsaSupportedDebugOnly(isa))
{
printf("isa: %d\n", isa);
printf("evex: %d\n", InstructionSet_EVEX);
}
Member

nit: This should probably be using the JITDUMP or alternative API rather than just calling printf
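
As a generic illustration of why a gated dump helper is preferable to a bare `printf`, here is a sketch of verbosity-gated logging. `DumpSink` and `DumpF` are hypothetical and are not the JIT's `JITDUMP` macro.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical dump target: output is produced only when verbose is set.
struct DumpSink
{
    bool        verbose;
    std::string text;
};

// printf-style helper that is a no-op unless the sink is verbose.
void DumpF(DumpSink& sink, const char* fmt, ...)
{
    if (!sink.verbose)
    {
        return;
    }

    char    buffer[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buffer, sizeof(buffer), fmt, args);
    va_end(args);

    sink.text += buffer;
}
```

With this shape, diagnostic lines such as `isa: %d` stay silent in normal runs and appear only when dumping is turned on.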

Member Author

I was not able to reproduce it offline so was trying to debug from CI. I have removed it now.

@@ -1725,6 +1765,18 @@ HARDWARE_INTRINSIC(EVEX, CompareUnorderedMask,
HARDWARE_INTRINSIC(EVEX, ConvertMaskToVector, -1, 1, {INS_vpmovm2b, INS_vpmovm2b, INS_vpmovm2w, INS_vpmovm2w, INS_vpmovm2d, INS_vpmovm2d, INS_vpmovm2q, INS_vpmovm2q, INS_vpmovm2d, INS_vpmovm2q}, HW_Category_SimpleSIMD, HW_Flag_NoContainment|HW_Flag_ReturnsPerElementMask)
HARDWARE_INTRINSIC(EVEX, ConvertVectorToMask, -1, 1, {INS_vpmovb2m, INS_vpmovb2m, INS_vpmovw2m, INS_vpmovw2m, INS_vpmovd2m, INS_vpmovd2m, INS_vpmovq2m, INS_vpmovq2m, INS_vpmovd2m, INS_vpmovq2m}, HW_Category_SimpleSIMD, HW_Flag_NoContainment|HW_Flag_ReturnsPerElementMask)
HARDWARE_INTRINSIC(EVEX, MoveMask, -1, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Special, HW_Flag_NoContainment)
HARDWARE_INTRINSIC(EVEX, MultiplyWideningAndAddSByteSByte, -1, 3, {INS_vpdpbssd, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromSecondArg|HW_Flag_EmbBroadcastCompatible|HW_Flag_EmbMaskingCompatible)
Member

We shouldn't need any of these EVEX cases. The NI_EVEX_* intrinsics are largely for special intrinsics that deal with masks and which do not map directly to a managed signature.

Member Author

Yes. I was using them as a fallback for all the Avx-Vnni-Int* APIs, since the APIs had the same names and the same types for the first two args. But that is resolved now with my latest change.

@khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from be2836e to d0529a8 (April 29, 2025 05:16)
<!-- Test infra issue on apple devices: https://github.com/dotnet/runtime/issues/89917 -->
<CLRTestTargetUnsupported Condition="'$(TargetsAppleMobile)' == 'true'">true</CLRTestTargetUnsupported>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<DefineConstants>$(DefineConstants);AVX10v2_INTRINSICS;VECTORT512_INTRINSICS</DefineConstants>
Member Author

Hi @saucecontrol
Looks like we cannot add the smoketest for Avx10 just yet. We can do it once IsProcessorFeaturePresent (link) supports Avx10. Do you agree? Until then, the smoketest will keep failing.

Labels
community-contribution: Indicates that the PR has been added by a community member
needs-area-label: An area label is needed to ensure this gets routed to the appropriate area owners
3 participants