Add pattern matching for SVE intrinsics that operate on mask operands #114438
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
It'd be nice to add some disasm tests here, but I don't think we currently can for SVE (we couldn't back in #109286, from what I remember). AIUI, the problem is the … Has that issue gone away now that we have Cobalt in the CI? Alternatively, could we add …
src/coreclr/jit/hwintrinsicarm64.cpp
Outdated
//
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
{
    return gtNewSimdHWIntrinsicNode(TYP_MASK, NI_Sve_CreateFalseMaskByte, CORINFO_TYPE_BYTE, simdSize);
I'm not sure about this line.
It should be `switch(type) case byte: NI_Sve_CreateFalseMaskByte; case Int32: NI_Sve_CreateFalseMaskInt32`, and so on, to keep to the `hwintrinsiclistarm64sve.h` interface.
However, regardless of which is used, it'll still produce the same `pfalse` instruction.
Alternatively, add a `NI_Sve_CreateFalseMaskAll`, similar to `NI_Sve_CreateTrueMaskAll`, which can take any type. But that would require adding support to a few additional files.
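The per-type dispatch suggested above could look roughly like the sketch below. `BaseType`, `IntrinsicId` and `falseMaskIntrinsicFor` are hypothetical stand-ins written for illustration; they are not the JIT's real types or helpers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the JIT's element types and intrinsic ids.
enum class BaseType { Byte, Int16, Int32, Int64 };
enum class IntrinsicId
{
    CreateFalseMaskByte,
    CreateFalseMaskInt16,
    CreateFalseMaskInt32,
    CreateFalseMaskInt64
};

// Pick the per-type "create false mask" intrinsic for an element type.
// Every choice would end up as the same pfalse instruction, as noted above.
IntrinsicId falseMaskIntrinsicFor(BaseType type)
{
    switch (type)
    {
        case BaseType::Byte:
            return IntrinsicId::CreateFalseMaskByte;
        case BaseType::Int16:
            return IntrinsicId::CreateFalseMaskInt16;
        case BaseType::Int32:
            return IntrinsicId::CreateFalseMaskInt32;
        case BaseType::Int64:
            return IntrinsicId::CreateFalseMaskInt64;
    }
    assert(false && "unexpected base type");
    return IntrinsicId::CreateFalseMaskByte;
}
```

A single type-agnostic id would remove the switch entirely, at the cost of the extra plumbing mentioned in the comment.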
There seem to be some test failures.
Introduces `fgMorphTryUseAllMaskVariant` for ARM64 that looks for various named intrinsics that have operands that look 'mask-like'. E.g. source operands originating from `Sve.CreateTrueMask*` may be recognized as masks, causing the JIT to prefer to use the predicated version of the instruction as codegen for the intrinsic. It will also inspect `ConditionalSelect` intrinsic nodes to match instructions with governing predicates. The transform runs during morph.

It's possible to emit the following instructions after this patch:

* `ZIP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T>` (`Sve.ZipLow`, `Sve.ZipHigh`)
* `UZP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T>` (`Sve.UnzipEven`, `Sve.UnzipOdd`)
* `TRN{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T>` (`Sve.TransposeEven`, `Sve.TransposeOdd`)
* `REV <Pd>.<T>, <Pn>.<T>` (`Sve.ReverseElement`)
* `AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B` (`Sve.And`)
* `BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B` (`Sve.BitwiseClear`)
* `EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B` (`Sve.Xor`)
* `ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B` (`Sve.Or`)
* `SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B` (`Sve.ConditionalSelect`)

Contributes towards dotnet#101970
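To illustrate the shape of the transform described above, here is a minimal toy model. The `Node`, `Op` and `Type` types and all function names are hypothetical, and the sketch assumes that a 'mask-like' operand is a mask-to-vector conversion wrapping a mask; the actual JIT IR and the checks in the patch may differ.

```cpp
#include <cassert>
#include <vector>

// Toy IR, for illustration only.
enum class Op { ZipLow, Add, ConvertMaskToVector, Other };
enum class Type { Simd, Mask };

struct Node
{
    Op                 op;
    Type               type;
    std::vector<Node*> operands;
};

// Assumption: only ZipLow has an all-mask form in this toy model.
bool hasAllMaskVariant(const Node& n)
{
    return n.op == Op::ZipLow;
}

// An operand counts as mask-like here if it is a mask-to-vector conversion.
bool allOperandsAreMaskLike(const Node& n)
{
    if (n.operands.empty())
    {
        return false;
    }
    for (const Node* operand : n.operands)
    {
        if (operand->op != Op::ConvertMaskToVector)
        {
            return false;
        }
    }
    return true;
}

// Rewrite the node to consume the underlying masks directly.
// Returns true if the node was changed.
bool tryUseAllMaskVariant(Node& n)
{
    if (!hasAllMaskVariant(n) || !allOperandsAreMaskLike(n))
    {
        return false;
    }
    for (Node*& operand : n.operands)
    {
        assert(operand->operands.size() == 1);
        operand = operand->operands[0]; // drop the conversion, keep the mask
    }
    n.type = Type::Mask;
    return true;
}
```

The sketch leaves out converting the mask result back to a vector for consumers that still expect one.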
ddb2472 to c5922a1
@kunalspathak I have fixed the test and some other build issues; this should be ready for review now.
Added some questions/suggestions.
{
    switch (GetHWIntrinsicId())
    {
        // ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>
Wondering if we should add a `HW_Flag_AllMaskVariant` for this?
I didn't want to use flag space because this list is unlikely to grow; these were the only instructions I could find that follow this pattern across all versions of SVE. But a flag might make it easier to apply this transform to other intrinsics in future if we find that other patterns work too.
//
GenTree* Compiler::fgMorphTryUseAllMaskVariant(GenTreeHWIntrinsic* node)
{
    if (node->HasAllMaskVariant() && canMorphAllVectorOperandsToMasks(node))
It makes sense to have `node->HasAllMaskVariant()` inside `canMorphAllVectorOperandsToMasks()` itself. That way, for conditional select's left operand too, you can (and should) exercise it.
I agree with this change if all of these intrinsics have `HasAllMaskVariant() == true`, but I don't think this works; see my comment below.
If #114438 (comment) works, then consider tagging them with `HW_Flag_AllMaskVariant` and moving the `HasAllMaskVariant()` check inside `canMorphAllVectorOperandsToMasks`. Having `HW_Flag_AllMaskVariant` in the table makes the various flags easy to discover in one place.
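A table-driven flag lookup of the kind proposed here might be sketched as follows. The flag values, the `IntrinsicId` enum and the `kFlags` table are assumptions made up for illustration and do not reflect the real intrinsic table layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real HW_Flag_* values differ.
enum IntrinsicFlag : std::uint32_t
{
    Flag_None                    = 0,
    Flag_AllMaskVariant          = 1u << 0,
    Flag_EmbeddedMaskedOperation = 1u << 1,
};

// Hypothetical intrinsic ids, used as indices into the table below.
enum class IntrinsicId { ZipLow, ZipHigh, Add, Count };

// One entry per intrinsic, so all flags are visible in a single place.
constexpr std::uint32_t kFlags[static_cast<int>(IntrinsicId::Count)] = {
    Flag_AllMaskVariant, // ZipLow
    Flag_AllMaskVariant, // ZipHigh
    Flag_None,           // Add
};

// Flag query that a check like HasAllMaskVariant() could delegate to.
constexpr bool hasAllMaskVariant(IntrinsicId id)
{
    return (kFlags[static_cast<int>(id)] & Flag_AllMaskVariant) != 0;
}
```

Compared with a `switch` over intrinsic ids, the table costs one flag bit but keeps the property next to the intrinsic's other flags.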
// BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
// EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
// ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
case NI_Sve_And:
I think these too should be marked as `HW_Flag_AllMaskVariant` and looked for in `HasAllMaskVariant()` itself.
I tried grouping these intrinsics with the others initially, but it doesn't work because these should only be considered in relation to a `ConditionalSelect`. Grouping them with the others causes the transformation to run when a `ConditionalSelect` is not present, which wouldn't be correct for these instructions because they require the `mask` parameter for the governing predicate.
We wrap IR nodes that have embedded mask semantics, like `And`, inside a `ConditionalSelect` during lowering, which runs well after the morph phase where you are doing this optimization. See `if (HWIntrinsicInfo::IsEmbeddedMaskedOperation(intrinsicId))` in `LowerHWIntrinsic()`. Until then, they continue to hold `Vector` operands. If you do the transformation here for IR nodes of the form `And(mask, mask)`, it shouldn't prevent us from wrapping them in `ConditionalSelect` in lowering.
Currently it's marked `HW_Flag_OptionalEmbeddedMaskedOperation`, so I think this wrapping isn't occurring for this intrinsic? When I try to implement it like this, it changes all the operands to masks and then tries to emit `AND <Zd>.D, <Zn>.D, <Zm>.D` and runs into this assert because the register types are wrong:

runtime/src/coreclr/jit/emitarm64sve.cpp
Line 2912 in c5922a1
assert(isVectorRegister(reg3)); // ddddd

The mask variant of this intrinsic has an embedded mask, but it's required for this instruction instead of optional, so there would also need to be some handling of this edge case in codegen to make sure it definitely wraps the mask variant in `ConditionalSelect`. It feels like there should be a separate set of flags for when the intrinsic is `TYP_MASK` or `TYP_SIMD`, e.g. `HW_Flag_MaskVariant(Optional)EmbeddedMaskOperation`, etc.
I think we should have a separate intrinsic, `And_Predicates` (and likewise for other APIs that have a predicates variant). Such intrinsics are added in the section "Special intrinsics that are generated during importing or lowering". `And_Predicates` should have `HW_Flag_EmbeddedMaskedOperation`. We can have the flag `HW_Flag_AllMaskVariant` on the `SVE_And` intrinsic, to detect in morph whether it can be transformed into the `And_Predicates` variant.

We come here in morph and see `And(Vector, Vector)`. If the operands are masks, we can transform the node into `And_Predicates(Mask, Mask)`. During lowering, we can then transform it into `CndSel(AllTrue, And_Predicates(Mask, Mask), Zero)`, and codegen will handle generating the predicated version of `And (predicates)`.
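The two-stage flow proposed above (a rewrite in morph, then wrapping in lowering) can be modelled with a small toy tree. Everything in this sketch is hypothetical: the `Node` type, the string-based op names and both function names are made up for illustration, and operands are simply flagged as mask-like rather than analysed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy IR node, for illustration only.
struct Node
{
    std::string                        op;     // e.g. "And", "And_Predicates", "CndSel"
    bool                               isMask; // true if the node is mask-typed or mask-like
    std::vector<std::unique_ptr<Node>> operands;
};

std::unique_ptr<Node> makeNode(std::string op, bool isMask)
{
    auto node    = std::make_unique<Node>();
    node->op     = std::move(op);
    node->isMask = isMask;
    return node;
}

// Morph step: And(mask, mask) becomes And_Predicates(mask, mask).
bool morphToPredicateVariant(Node& node)
{
    if (node.op != "And" || node.operands.size() != 2)
    {
        return false;
    }
    for (const auto& operand : node.operands)
    {
        if (!operand->isMask)
        {
            return false;
        }
    }
    node.op     = "And_Predicates";
    node.isMask = true;
    return true;
}

// Lowering step: wrap the predicate variant as CndSel(AllTrue, node, Zero).
std::unique_ptr<Node> lowerWrapInConditionalSelect(std::unique_ptr<Node> node)
{
    if (node->op != "And_Predicates")
    {
        return node;
    }
    auto select = makeNode("CndSel", true);
    select->operands.push_back(makeNode("AllTrue", true));
    select->operands.push_back(std::move(node));
    select->operands.push_back(makeNode("Zero", true));
    return select;
}
```

Keeping the rewrite and the wrapping in separate phases means the morph step never needs a `ConditionalSelect` to be present, which is the concern raised earlier in this thread.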
This sounds much better than what I was thinking; I'll try to implement this.
src/coreclr/jit/hwintrinsicarm64.cpp
Outdated
// Return Value:
//    The mask
//
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
- GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
+ GenTree* Compiler::gtNewSimdFalseMaskByteNode(unsigned simdSize)
src/coreclr/jit/morph.cpp
Outdated
@@ -9218,6 +9218,15 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
        }
    }

#ifdef TARGET_ARM64
    optimizedTree = fgMorphTryUseAllMaskVariant(node);
    if (optimizedTree != nullptr)
Having it here might be preventing the node from getting further transformations/optimizations. Should this be done towards the end of this method?
This seems fine; I've moved it later in the method.