Add pattern matching for SVE intrinsics that operate on mask operands #114438

Open: 6 commits to merge into base: main

snickolls-arm
Contributor

Introduces fgMorphTryUseAllMaskVariant for ARM64, which looks for named intrinsics whose operands look 'mask-like'. For example, source operands originating from Sve.CreateTrueMask* may be recognized as masks, causing the JIT to prefer the predicated version of the instruction when generating code for the intrinsic. It also inspects ConditionalSelect intrinsic nodes to match instructions with governing predicates. The transform runs during morph.

It's possible to emit the following instructions after this patch:

* ZIP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.ZipLow, Sve.ZipHigh)
* UZP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.UnzipEven, Sve.UnzipOdd)
* TRN{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.TransposeEven, Sve.TransposeOdd)
* REV <Pd>.<T>, <Pn>.<T>                (Sve.ReverseElement)
* AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.And)
* BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.BitwiseClear)
* EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.Xor)
* ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.Or)
* SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B      (Sve.ConditionalSelect)

Contributes towards #101970
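
As a rough illustration of the check described above, the following is a toy model rather than JIT code. The enums and both functions are hypothetical stand-ins; the real transform works on GenTreeHWIntrinsic nodes, and the way it recognizes a mask-like operand is assumed here to reduce to a simple per-operand tag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for JIT concepts; none of these are real JIT types.
enum class Intrinsic
{
    ZipLow, ZipHigh, UnzipEven, UnzipOdd, TransposeEven, TransposeOdd, ReverseElement,
    Add // an intrinsic assumed to have no all-predicate form
};

enum class OperandKind
{
    PlainVector, // an ordinary vector value
    MaskAsVector // a predicate exposed as a vector, e.g. from Sve.CreateTrueMask*
};

// True for intrinsics whose instruction also has a form taking only predicates.
bool hasAllMaskVariant(Intrinsic id)
{
    switch (id)
    {
        case Intrinsic::ZipLow:
        case Intrinsic::ZipHigh:
        case Intrinsic::UnzipEven:
        case Intrinsic::UnzipOdd:
        case Intrinsic::TransposeEven:
        case Intrinsic::TransposeOdd:
        case Intrinsic::ReverseElement:
            return true;
        default:
            return false;
    }
}

// The predicate form is only usable when every operand is really a mask.
bool canUseAllMaskVariant(Intrinsic id, const std::vector<OperandKind>& operands)
{
    if (!hasAllMaskVariant(id) || operands.empty())
    {
        return false;
    }
    for (OperandKind kind : operands)
    {
        if (kind != OperandKind::MaskAsVector)
        {
            return false;
        }
    }
    return true;
}

// Small usage example for the toy model.
void runAllMaskExample()
{
    assert(canUseAllMaskVariant(Intrinsic::ZipLow, {OperandKind::MaskAsVector, OperandKind::MaskAsVector}));
    assert(!canUseAllMaskVariant(Intrinsic::ZipLow, {OperandKind::MaskAsVector, OperandKind::PlainVector}));
}
```

In this sketch a single non-mask operand is enough to keep the vector form, which mirrors the idea that the predicate instruction can only be chosen when all sources are predicates.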

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 9, 2025
@snickolls-arm
Contributor Author

@a74nh @kunalspathak

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 9, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Apr 9, 2025
@a74nh
Contributor

a74nh commented Apr 9, 2025

It'd be nice to add some disasm tests here, but I don't think we currently can for SVE (we couldn't back in #109286, from what I remember).

AIUI, the problem is that the ARM64-FULL-LINE command has to be valid wherever it's run, and we can't work around that by putting an if(SVE) check around it.

Has that issue gone away now that we have Cobalt in the CI?

Alternatively, could we add ARM64-SVE-FULL-LINE to the disasmcheck infrastructure?

//
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
{
return gtNewSimdHWIntrinsicNode(TYP_MASK, NI_Sve_CreateFalseMaskByte, CORINFO_TYPE_BYTE, simdSize);
Contributor

I'm not sure about this line.

To keep to the hwintrinsiclistarm64sve.h interface, it should be switch(type) case byte: NI_Sve_CreateFalseMaskByte; case Int32: NI_Sve_CreateFalseMaskInt32, and so on.

However, regardless of which is used, it'll still produce the same pfalse instruction.

Alternatively, add an NI_Sve_CreateFalseMaskAll, similar to NI_Sve_CreateTrueMaskAll, which can take any type. But that would require adding support in a few additional files.
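
The per-type switch suggested above could look roughly like the sketch below. It is self-contained and uses hypothetical stand-in enums instead of the JIT's own type and NamedIntrinsic definitions; only the Byte and Int32 names come from the comment, and the Int16 and Int64 entries are assumed by analogy.

```cpp
#include <cassert>

// Hypothetical stand-ins: the real JIT has its own element-type and NamedIntrinsic enums.
enum class ElemType { Byte, Int16, Int32, Int64 };

enum class FalseMaskIntrinsic
{
    CreateFalseMaskByte,
    CreateFalseMaskInt16, // assumed by analogy
    CreateFalseMaskInt32,
    CreateFalseMaskInt64  // assumed by analogy
};

// Pick the false-mask intrinsic that matches the element type, so the node
// stays consistent with the per-type intrinsic table.
FalseMaskIntrinsic falseMaskIntrinsicFor(ElemType type)
{
    switch (type)
    {
        case ElemType::Byte:
            return FalseMaskIntrinsic::CreateFalseMaskByte;
        case ElemType::Int16:
            return FalseMaskIntrinsic::CreateFalseMaskInt16;
        case ElemType::Int32:
            return FalseMaskIntrinsic::CreateFalseMaskInt32;
        case ElemType::Int64:
            return FalseMaskIntrinsic::CreateFalseMaskInt64;
    }
    // Unreachable for valid enum values; the byte form is a safe fallback
    // because every variant is said to emit the same pfalse instruction.
    return FalseMaskIntrinsic::CreateFalseMaskByte;
}

// Small usage example for the sketch.
void runFalseMaskExample()
{
    assert(falseMaskIntrinsicFor(ElemType::Int32) == FalseMaskIntrinsic::CreateFalseMaskInt32);
}
```

The trade-off is the one raised in the comment: the switch keeps the IR faithful to the intrinsic table, at the cost of extra code for an instruction that is the same in every case.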

@kunalspathak
Member

There seems to be a test failure:

Beginning scenario: ConditionalSelect_FalseOp_all - operation in FalseValue

Assert failure(PID 9260 [0x0000242c], Thread: 8020 [0x1f54]): Assertion failed 'ins != INS_invalid' in 'JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_AbsoluteCompareGreaterThan_float:ConditionalSelect_ZeroOp():this' during 'Generate code' (IL size 231; hash 0xca76bdfb; FullOpts)

    File: D:\a\_work\1\s\src\coreclr\jit\hwintrinsiccodegenarm64.cpp:363
    Image: C:\h\w\A54C0924\p\corerun.exe

@snickolls-arm
Contributor Author

@kunalspathak I have fixed the test and some other build issues; this should be ready for review now.

Member

@kunalspathak kunalspathak left a comment

Added some questions/suggestions.

{
switch (GetHWIntrinsicId())
{
// ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>
Member

Wondering if we should add a HW_Flag_AllMaskVariant for this?

Contributor Author

I didn't want to use flag space, as this list is unlikely to grow: these were the only instructions I could find that follow this pattern across all versions of SVE. But a flag might make it easier to apply this transform to other intrinsics in future if we find that other patterns work too.

//
GenTree* Compiler::fgMorphTryUseAllMaskVariant(GenTreeHWIntrinsic* node)
{
if (node->HasAllMaskVariant() && canMorphAllVectorOperandsToMasks(node))
Member

It makes sense to move the node->HasAllMaskVariant() check inside canMorphAllVectorOperandsToMasks() itself. That way it is also exercised (as it should be) for ConditionalSelect's left operand.

Contributor Author

I agree with this change if all of these intrinsics have HasAllMaskVariant() == true, but I don't think this works; see my comment below.

Member

If #114438 (comment) works, then consider tagging them with HW_Flag_AllMaskVariant and moving the HasAllMaskVariant() check inside canMorphAllVectorOperandsToMasks. Having HW_Flag_AllMaskVariant in the table makes the various flags easy to discover in one place.

// BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
// EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
// ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
case NI_Sve_And:
Member

I think these too should be marked as HW_Flag_AllMaskVariant and looked for in HasAllMaskVariant() itself.

Contributor Author

I tried grouping these intrinsics with the others initially but it doesn't work because these ones should only be considered in relation to a ConditionalSelect. Grouping them with the others causes a transformation to run when a ConditionalSelect is not present, which wouldn't be correct for these instructions because they require the mask parameter for the governing predicate.

Member

We wrap IR nodes that have embedded mask semantics, like And, inside a ConditionalSelect during lowering, which runs well after the morph phase where you are doing this optimization. See if (HWIntrinsicInfo::IsEmbeddedMaskedOperation(intrinsicId)) in LowerHWIntrinsic(). Until then, they continue to hold Vector operands. If you do the transformation here for IR nodes of the form And(mask, mask), it shouldn't prevent us from wrapping them in ConditionalSelect during lowering.

Contributor Author

Currently it's marked HW_Flag_OptionalEmbeddedMaskedOperation, so I think this wrapping isn't occurring for this intrinsic? When I try to implement it like this, it changes all the operands to masks and then tries to emit AND <Zd>.D, <Zn>.D, <Zm>.D and runs into this assert because the register types are wrong:

assert(isVectorRegister(reg3)); // ddddd

The mask variant of this intrinsic has an embedded mask, but it's required for this instruction instead of optional, so there would also need to be some handling of this edge case in codegen to make sure it definitely wraps the mask variant in ConditionalSelect. It feels like there should be a separate set of flags for when the intrinsic is TYP_MASK or TYP_SIMD. E.g. HW_Flag_MaskVariant(Optional)EmbeddedMaskOperation, etc.

Member

kunalspathak commented Apr 25, 2025

I think we should have a separate intrinsic, And_Predicates (and likewise for other APIs that have a predicate variant). Such intrinsics are added in the section "Special intrinsics that are generated during importing or lowering". And_Predicates should have HW_Flag_EmbeddedMaskedOperation. We can put the HW_Flag_AllMaskVariant flag on the SVE_And intrinsic, so that morph can detect whether it can be transformed into the And_Predicates variant.

We come here in morph and see And(Vector, Vector). If the operands are masks, we can transform the node into And_Predicates(Mask, Mask). During lowering, we can then transform it into CndSel(AllTrue, And_Predicates(Mask, Mask), Zero), and codegen will handle generating the predicated version of And (predicates).
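
The two-phase flow proposed above can be mimicked with a toy tree. This is only a sketch under stated assumptions: Node, makeNode, morph, lower, and dump are hypothetical helpers, node kinds are plain strings, and a boolean tag stands in for however the JIT decides that an operand is a mask.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR node; the real JIT uses GenTree nodes.
struct Node
{
    std::string                        name;   // node kind, e.g. "And" or "Mask"
    bool                               isMask; // true when the value is a predicate
    std::vector<std::shared_ptr<Node>> ops;    // operands
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeNode(std::string name, bool isMask, std::vector<NodePtr> ops = {})
{
    return std::make_shared<Node>(Node{std::move(name), isMask, std::move(ops)});
}

// Morph: And over two mask operands becomes the predicate-only intrinsic.
NodePtr morph(const NodePtr& node)
{
    if (node->name != "And" || node->ops.size() != 2)
    {
        return node;
    }
    for (const NodePtr& op : node->ops)
    {
        if (!op->isMask)
        {
            return node; // a vector operand keeps the vector form
        }
    }
    return makeNode("And_Predicates", true, node->ops);
}

// Lowering: the embedded-mask intrinsic is wrapped in a conditional select.
NodePtr lower(const NodePtr& node)
{
    if (node->name != "And_Predicates")
    {
        return node;
    }
    return makeNode("CndSel", true, {makeNode("AllTrue", true), node, makeNode("Zero", true)});
}

// Render a tree as Name(op1, op2, ...).
std::string dump(const NodePtr& node)
{
    std::string text = node->name;
    if (!node->ops.empty())
    {
        text += "(";
        for (std::size_t i = 0; i < node->ops.size(); i++)
        {
            text += (i > 0 ? ", " : "") + dump(node->ops[i]);
        }
        text += ")";
    }
    return text;
}

// Small usage example for the toy pipeline.
void runPipelineExample()
{
    NodePtr tree = makeNode("And", false, {makeNode("Mask", true), makeNode("Mask", true)});
    assert(dump(lower(morph(tree))) == "CndSel(AllTrue, And_Predicates(Mask, Mask), Zero)");
}
```

Keeping the rewrite in morph and the wrapping in lowering means each phase only needs one simple rule, which is the separation the comment argues for.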

Contributor Author

This sounds much better than what I was thinking, I'll try and implement this.

// Return Value:
// The mask
//
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
Member

Suggested change
- GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
+ GenTree* Compiler::gtNewSimdFalseMaskByteNode(unsigned simdSize)

@@ -9218,6 +9218,15 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
}
}

#ifdef TARGET_ARM64
optimizedTree = fgMorphTryUseAllMaskVariant(node);
if (optimizedTree != nullptr)
Member

Having it here might be preventing the node from getting further transformations/optimizations. Should this be done towards the end of this method?

Contributor Author

This seems fine, I've moved it later in the method.

Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member