Add pattern matching for SVE intrinsics that operate on mask operands #114438

snickolls-arm · 2025-04-09T13:43:28Z

Introduces fgMorphTryUseAllMaskVariant for ARM64 that looks for various named intrinsics that have operands that look 'mask-like'. E.g. source operands originating from Sve.CreateTrueMask* may be recognized as masks, causing the JIT to prefer to use the predicated version of the instruction as codegen for the intrinsic. It will also inspect ConditionalSelect intrinsic nodes to match instructions with governing predicates. The transform runs during morph.

It's possible to emit the following instructions after this patch:

* ZIP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.ZipLow, Sve.ZipHigh)
* UZP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.UnzipEven, Sve.UnzipOdd)
* TRN{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.TransposeEven, Sve.TransposeOdd)
* REV <Pd>.<T>, <Pn>.<T>                (Sve.ReverseElement)
* AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.And)
* BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.BitwiseClear)
* EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.Xor)
* ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.Or)
* SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B      (Sve.ConditionalSelect)

Contributes towards #101970
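As a rough illustration of the idea in the description, the sketch below models "mask-like" operand detection on a toy IR. The `Op`, `Node`, `looksLikeMask` and `tryUseAllMaskVariant` names are hypothetical stand-ins invented for this example; they are not the JIT's real `GenTree` types or the PR's actual implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-ins for JIT IR (not the real GenTree types).
enum class Op
{
    CreateTrueMask,      // models a value produced by Sve.CreateTrueMask*
    ConvertMaskToVector, // models a predicate that was widened to a vector
    LoadVector,          // an ordinary vector value
    ZipLow,              // models Sve.ZipLow on vector operands
    ZipLowMask           // models the predicate form: ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>
};

struct Node
{
    Op                 op;
    std::vector<Node*> operands;
};

// An operand "looks like a mask" if it originated from a predicate value.
bool looksLikeMask(const Node* n)
{
    return n->op == Op::CreateTrueMask || n->op == Op::ConvertMaskToVector;
}

// Rewrites the node in place to its all-mask variant when every operand is
// mask-like. Returns true if the rewrite happened.
bool tryUseAllMaskVariant(Node* n)
{
    assert(n != nullptr);
    if (n->op != Op::ZipLow)
    {
        return false; // only ZipLow has a mask variant in this toy model
    }
    for (const Node* operand : n->operands)
    {
        if (!looksLikeMask(operand))
        {
            return false; // a genuine vector operand blocks the transform
        }
    }
    n->op = Op::ZipLowMask;
    return true;
}
```

In this model a `ZipLow` whose operands both come from predicates is rewritten, while one with an ordinary vector operand is left untouched.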

snickolls-arm · 2025-04-09T13:43:46Z

@a74nh @kunalspathak

dotnet-policy-service · 2025-04-09T13:44:21Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

a74nh · 2025-04-09T15:10:46Z

It'd be nice to add some disasm tests here. But I don't think we currently can for SVE (we couldn't back in #109286 from what I remember).

AIUI, the problem is that the ARM64-FULL-LINE command has to be valid wherever it's run, and we can't just work around it by putting an if (SVE) check around it.

Has that issue gone away now that we have Cobalt in the CI?

Alternatively, could we add ARM64-SVE-FULL-LINE to the disasmcheck infrastructure?

a74nh · 2025-04-09T15:15:59Z

src/coreclr/jit/hwintrinsicarm64.cpp

+//
+GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
+{
+    return gtNewSimdHWIntrinsicNode(TYP_MASK, NI_Sve_CreateFalseMaskByte, CORINFO_TYPE_BYTE, simdSize);


I'm not sure about this line.

To keep to the hwintrinsiclistarm64sve.h interface, it should be a switch (type) with case byte: NI_Sve_CreateFalseMaskByte; case Int32: NI_Sve_CreateFalseMaskInt32; and so on.

However, regardless of which is used, it'll still produce the same pfalse instruction.

Alternatively, add an NI_Sve_CreateFalseMaskAll, similar to NI_Sve_CreateTrueMaskAll, which can take any type. But that would require adding support in a few additional files.
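The per-type selection suggested above could look roughly like the following. This is a standalone sketch: `ElemType` and `FalseMaskIntrinsic` are hypothetical enums that only mirror the naming pattern of the intrinsic IDs mentioned in the comment (the Int16 and Int64 entries are assumed to follow the same pattern), and the real JIT would switch over its own base-type and `NamedIntrinsic` values.

```cpp
#include <cassert>

// Hypothetical element types; the JIT uses its own base-type representation.
enum class ElemType
{
    Byte,
    Int16,
    Int32,
    Int64
};

// Hypothetical IDs mirroring the NI_Sve_CreateFalseMask<Type> naming pattern.
enum class FalseMaskIntrinsic
{
    CreateFalseMaskByte,
    CreateFalseMaskInt16,
    CreateFalseMaskInt32,
    CreateFalseMaskInt64
};

// Picks the false-mask intrinsic that matches the element type, instead of
// always using the Byte variant.
FalseMaskIntrinsic selectFalseMaskIntrinsic(ElemType type)
{
    switch (type)
    {
        case ElemType::Byte:
            return FalseMaskIntrinsic::CreateFalseMaskByte;
        case ElemType::Int16:
            return FalseMaskIntrinsic::CreateFalseMaskInt16;
        case ElemType::Int32:
            return FalseMaskIntrinsic::CreateFalseMaskInt32;
        case ElemType::Int64:
            return FalseMaskIntrinsic::CreateFalseMaskInt64;
    }
    assert(false && "unexpected element type");
    return FalseMaskIntrinsic::CreateFalseMaskByte;
}
```

As the comment notes, every variant would still produce the same pfalse instruction, so the switch only keeps the IR consistent with the typed intrinsic list.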

kunalspathak · 2025-04-10T15:09:09Z

There seems to be a test failure:

Beginning scenario: ConditionalSelect_FalseOp_all - operation in FalseValue

Assert failure(PID 9260 [0x0000242c], Thread: 8020 [0x1f54]): Assertion failed 'ins != INS_invalid' in 'JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_AbsoluteCompareGreaterThan_float:ConditionalSelect_ZeroOp():this' during 'Generate code' (IL size 231; hash 0xca76bdfb; FullOpts)

    File: D:\a\_work\1\s\src\coreclr\jit\hwintrinsiccodegenarm64.cpp:363
    Image: C:\h\w\A54C0924\p\corerun.exe

snickolls-arm · 2025-04-23T15:17:07Z

@kunalspathak I have fixed the test and some other build issues, this should be ready for review now.

kunalspathak

Added some questions/suggestions.

kunalspathak · 2025-04-23T19:53:34Z

src/coreclr/jit/morpharm64.cpp

+{
+    switch (GetHWIntrinsicId())
+    {
+        // ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>


Wondering if we should add a HW_Flag_AllMaskVariant for this?

snickolls-arm · 2025-04-24

I didn't want to use flag space as this list is unlikely to grow; these were the only instructions I could find that follow this pattern across all versions of SVE. But it might make it easier to apply this transform to other intrinsics in future if we find that other patterns work too.

kunalspathak · 2025-04-23T19:59:36Z

src/coreclr/jit/morpharm64.cpp

+//
+GenTree* Compiler::fgMorphTryUseAllMaskVariant(GenTreeHWIntrinsic* node)
+{
+    if (node->HasAllMaskVariant() && canMorphAllVectorOperandsToMasks(node))


It makes sense to have node->HasAllMaskVariant() inside canMorphAllVectorOperandsToMasks() itself. That way you can (and should) exercise it for the conditional select's left operand too.

snickolls-arm · 2025-04-24

I agree with this change if all of these intrinsics have HasAllMaskVariant() == true, but I don't think this works; see my comment below.

kunalspathak · 2025-04-24

If #114438 (comment) works, then consider tagging them with HW_Flag_AllMaskVariant and moving the HasAllMaskVariant() check inside canMorphAllVectorOperandsToMasks. Having HW_Flag_AllMaskVariant in the table makes the various flags easy to discover in one place.
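A minimal sketch of the flag-based approach discussed in this thread follows. The table, the `Intrinsic` enum and the function signatures are hypothetical simplifications made up for illustration; only the names `HW_Flag_AllMaskVariant`, `HasAllMaskVariant` and `canMorphAllVectorOperandsToMasks` come from the discussion, and the real JIT table and node types differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical intrinsic IDs; Add stands in for an intrinsic without a mask variant.
enum class Intrinsic : std::size_t
{
    ZipLow,
    UnzipEven,
    Add,
    Count
};

// Hypothetical flag bits, modelled on the HW_Flag_* naming in the discussion.
enum Flag : std::uint32_t
{
    Flag_None           = 0,
    Flag_AllMaskVariant = 1u << 0
};

// One flags entry per intrinsic, so the property is discoverable in one table.
constexpr std::uint32_t kFlags[static_cast<std::size_t>(Intrinsic::Count)] = {
    Flag_AllMaskVariant, // ZipLow
    Flag_AllMaskVariant, // UnzipEven
    Flag_None            // Add
};

bool hasAllMaskVariant(Intrinsic id)
{
    return (kFlags[static_cast<std::size_t>(id)] & Flag_AllMaskVariant) != 0;
}

// The flag check lives inside the operand check, as suggested in the review,
// so every caller exercises it.
bool canMorphAllVectorOperandsToMasks(Intrinsic id, const bool* operandIsMaskLike, std::size_t count)
{
    assert(operandIsMaskLike != nullptr);
    if (!hasAllMaskVariant(id))
    {
        return false;
    }
    for (std::size_t i = 0; i < count; i++)
    {
        if (!operandIsMaskLike[i])
        {
            return false;
        }
    }
    return true;
}
```

Folding the flag test into the operand check means a caller such as the ConditionalSelect path cannot forget it, at the cost of one flag bit.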

kunalspathak · 2025-04-23T20:02:37Z

src/coreclr/jit/morpharm64.cpp

+                    // BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
+                    // EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
+                    // ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
+                    case NI_Sve_And:


I think these too should be marked as HW_Flag_AllMaskVariant and looked for in HasAllMaskVariant() itself.

snickolls-arm · 2025-04-24

I tried grouping these intrinsics with the others initially, but it doesn't work because these should only be considered in relation to a ConditionalSelect. Grouping them with the others causes the transformation to run when a ConditionalSelect is not present, which wouldn't be correct for these instructions because they require the mask parameter for the governing predicate.

kunalspathak · 2025-04-24

We wrap the IR nodes that have embedded mask semantics, like And, inside a ConditionalSelect during lowering, which runs well after the morph phase where you are doing this optimization. See if (HWIntrinsicInfo::IsEmbeddedMaskedOperation(intrinsicId)) in LowerHWIntrinsic(). Until then, they continue to hold Vector operands. If you do the transformation here for IR nodes of the form And(mask, mask), it shouldn't prevent us from wrapping them in ConditionalSelect in lowering.

snickolls-arm · 2025-04-25

Currently it's marked HW_Flag_OptionalEmbeddedMaskedOperation, so I think this wrapping isn't occurring for this intrinsic? When I try to implement it like this, it changes all the operands to masks and then tries to emit AND <Zd>.D, <Zn>.D, <Zm>.D and runs into this assert because the register types are wrong:

runtime/src/coreclr/jit/emitarm64sve.cpp

Line 2912 in c5922a1

assert(isVectorRegister(reg3)); // ddddd

The mask variant of this intrinsic has an embedded mask, but it's required for this instruction rather than optional, so there would also need to be some handling of this edge case in codegen to make sure it definitely wraps the mask variant in ConditionalSelect. It feels like there should be a separate set of flags for when the intrinsic is TYP_MASK or TYP_SIMD, e.g. HW_Flag_MaskVariant(Optional)EmbeddedMaskOperation, etc.

kunalspathak · 2025-04-25 (edited)

I think we should have a separate intrinsic, And_Predicates (and likewise for other APIs that have a predicates variant). These are added in the section "Special intrinsics that are generated during importing or lowering". And_Predicates should have HW_Flag_EmbeddedMaskedOperation. We can put the flag HW_Flag_AllMaskVariant on the SVE_And intrinsic, to detect in morph whether it can be transformed into the And_Predicates variant.

We come here in morph and see And(Vector, Vector). If the operands are masks, we can transform the node into And_Predicates(Mask, Mask). During lowering, we can then transform it into CndSel(AllTrue, And_Predicates(Mask, Mask), Zero), and codegen will handle generating the predicated version of And (predicates).

snickolls-arm · 2025-04-30

This sounds much better than what I was thinking. I'll try and implement this.
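The two-phase plan agreed on in this thread can be sketched on a toy IR as below. Everything here is a hypothetical model written for illustration (`Kind`, `Node`, `make`, `morphToPredicateVariant` and `lowerEmbeddedMask` are invented names); it only mirrors the described shape, And(mask, mask) becoming And_Predicates in morph and then CndSel(AllTrue, And_Predicates, Zero) in lowering, and is not the code that ended up in the JIT.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node kinds for a toy IR.
enum class Kind
{
    MaskValue,
    VectorValue,
    AllTrueMask,
    ZeroMask,
    And,              // models Sve.And on vector-typed operands
    AndPredicates,    // models the separate And_Predicates intrinsic
    ConditionalSelect // models CndSel(mask, trueValue, falseValue)
};

struct Node
{
    Kind                               kind;
    std::vector<std::unique_ptr<Node>> ops;
};

std::unique_ptr<Node> make(Kind kind)
{
    auto n  = std::make_unique<Node>();
    n->kind = kind;
    return n;
}

// Morph phase: And(mask, mask) becomes And_Predicates(mask, mask).
bool morphToPredicateVariant(Node& n)
{
    if (n.kind != Kind::And)
    {
        return false;
    }
    for (const auto& op : n.ops)
    {
        if (op->kind != Kind::MaskValue)
        {
            return false; // keep the vector form if any operand is not a mask
        }
    }
    n.kind = Kind::AndPredicates;
    return true;
}

// Lowering phase: wrap the embedded-mask operation so that codegen always
// sees a governing predicate: CndSel(AllTrue, And_Predicates(...), Zero).
std::unique_ptr<Node> lowerEmbeddedMask(std::unique_ptr<Node> n)
{
    assert(n != nullptr);
    if (n->kind != Kind::AndPredicates)
    {
        return n;
    }
    auto sel = make(Kind::ConditionalSelect);
    sel->ops.push_back(make(Kind::AllTrueMask));
    sel->ops.push_back(std::move(n));
    sel->ops.push_back(make(Kind::ZeroMask));
    return sel;
}
```

Keeping the rewrite and the wrapping in separate phases is what lets the predicate form carry a required embedded mask without changing how the vector form of And is handled.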

kunalspathak · 2025-04-23T20:49:38Z

src/coreclr/jit/hwintrinsicarm64.cpp

+// Return Value:
+//    The mask
+//
+GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)


Suggested change

-GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
+GenTree* Compiler::gtNewSimdFalseMaskByteNode(unsigned simdSize)

kunalspathak · 2025-04-23T20:53:52Z

src/coreclr/jit/morph.cpp

@@ -9218,6 +9218,15 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
        }
    }

+#ifdef TARGET_ARM64
+    optimizedTree = fgMorphTryUseAllMaskVariant(node);
+    if (optimizedTree != nullptr)


Having it here might be preventing the node from getting further transformations/optimizations. Should this be done towards the end of this method?

snickolls-arm · 2025-04-24

This seems fine; I've moved it later in the method.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 9, 2025

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 9, 2025

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Apr 9, 2025


snickolls-arm force-pushed the sve-use-all-predicate-variants branch from ddb2472 to c5922a1 Compare April 11, 2025 10:18


snickolls-arm added 2 commits April 14, 2025 13:02

Fix test failure and add FileCheck tests

3ee0230

Don't run tests on OSX

782d7fd

Don't run tests for Mono

8793f72


snickolls-arm added 2 commits April 24, 2025 14:57

Move the transform later in fgOptimizeHWIntrinsic

0b56784

Rename gtNewSimdAllFalseMaskNode

0378754
