ARM64: Enable jumptable to BT optimization #91811
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details

The "jump-table to BT" optimization is currently limited to XARCH only; this PR enables it for ARM64.

int Test(int x)
{
switch (x)
{
case 11:
case 13:
case 15:
case 17:
case 18:
case 20:
return 1;
}
return 2;
}

; Method Prog:Test(int):int:this (FullOpts)
stp fp, lr, [sp, #-0x10]!
mov fp, sp
sub w0, w1, #11
cmp w0, #9
bhi G_M9307_IG05
- mov w0, w0
- adr x1, [@RWD00]
- ldr w1, [x1, x0, LSL #2]
- adr x2, [G_M9307_IG02]
- add x1, x1, x2
- br x1
-G_M9307_IG03:
+ mov w1, #725
+ lsr w0, w1, w0
+ mov w1, #1
+ tst w0, w1
+ beq G_M9307_IG05
mov w0, #1
ldp fp, lr, [sp], #0x10
ret lr
G_M9307_IG05:
mov w0, #2
ldp fp, lr, [sp], #0x10
ret lr
-RWD00 dd 00000024h ; case G_M9307_IG03
- dd 00000030h ; case G_M9307_IG05
- dd 00000024h ; case G_M9307_IG03
- dd 00000030h ; case G_M9307_IG05
- dd 00000024h ; case G_M9307_IG03
- dd 00000030h ; case G_M9307_IG05
- dd 00000024h ; case G_M9307_IG03
- dd 00000024h ; case G_M9307_IG03
- dd 00000030h ; case G_M9307_IG05
- dd 00000024h ; case G_M9307_IG03
; Total bytes of code: 68
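
For readers following the diff above, the new sequence can be modeled in plain C++. This is only a sketch of the idea, not the JIT's code: the helper names `buildBitTable`, `testSwitch`, and `checkExample` are hypothetical. It derives the constant 725 from the case values, each rebased by the smallest case (11), and then performs the same range check, shift, and bit test as the new ARM64 instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: set bit (c - minCase) for every case value c.
uint32_t buildBitTable(const std::vector<int>& cases, int minCase)
{
    uint32_t table = 0;
    for (int c : cases)
    {
        table |= 1u << (c - minCase);
    }
    return table;
}

// Model of the new sequence: sub / cmp + bhi / lsr / tst / beq.
int testSwitch(int x)
{
    const uint32_t bitTable = 725u;                   // mov w1, #725
    uint32_t index = static_cast<uint32_t>(x) - 11u;  // sub w0, w1, #11
    if (index > 9u)                                   // cmp w0, #9 ; bhi
    {
        return 2;
    }
    return ((bitTable >> index) & 1u) != 0 ? 1 : 2;   // lsr ; tst ; beq
}

// Cases 11, 13, 15, 17, 18, 20 rebased by 11 give bits 0, 2, 4, 6, 7, 9,
// that is 1 + 4 + 16 + 64 + 128 + 512 = 725.
void checkExample()
{
    assert(buildBitTable({11, 13, 15, 17, 18, 20}, 11) == 725u);
}
```

The unsigned subtraction lets a single comparison reject both values below 11 and values above 20, which is the same trick the `sub`/`cmp`/`bhi` triple uses.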
#ifdef TARGET_XARCH
//
// Append BT(bitTable, switchValue) and JCC(condition) to the switch block.
//

var_types bitTableType = (bitCount <= (genTypeSize(TYP_INT) * 8)) ? TYP_INT : TYP_LONG;
GenTree* bitTableIcon = comp->gtNewIconNode(bitTable, bitTableType);
GenTree* bitTest = comp->gtNewOperNode(GT_BT, TYP_VOID, bitTableIcon, switchValue);
bitTest->gtFlags |= GTF_SET_FLAGS;
GenTreeCC* jcc = comp->gtNewCC(GT_JCC, TYP_VOID, bbSwitchCondition);

LIR::AsRange(bbSwitch).InsertAfter(switchValue, bitTableIcon, bitTest, jcc);

#else // TARGET_XARCH
//
// Fallback to AND(RSZ(bitTable, switchValue), 1)
//
GenTree* tstCns = comp->gtNewIconNode(bbSwitch->bbNext != bbCase0 ? 0 : 1, bitTableType);
GenTree* shift = comp->gtNewOperNode(GT_RSZ, bitTableType, bitTableIcon, switchValue);
GenTree* one = comp->gtNewIconNode(1, bitTableType);
GenTree* andOp = comp->gtNewOperNode(GT_AND, bitTableType, shift, one);
GenTree* cmp = comp->gtNewOperNode(GT_EQ, TYP_INT, andOp, tstCns);
GenTree* jcc = comp->gtNewOperNode(GT_JTRUE, TYP_VOID, cmp);
LIR::AsRange(bbSwitch).InsertAfter(switchValue, bitTableIcon, shift, tstCns, one);
LIR::AsRange(bbSwitch).InsertAfter(one, andOp, cmp, jcc);
#endif // !TARGET_XARCH
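
As a rough model of the non-XARCH branch above, the sketch below evaluates EQ(AND(RSZ(bitTable, switchValue), 1), tstCns) on plain integers. The function name `jumpTaken` and its parameters are hypothetical stand-ins for the IR nodes. The assumption, read off the assembly in this PR, is that the conditional jump fires when the extracted bit equals `tstCns`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the fallback IR shape, not JIT code:
//   shift = RSZ(bitTable, switchValue)
//   andOp = AND(shift, 1)
//   cmp   = EQ(andOp, tstCns)
//   JTRUE(cmp)
bool jumpTaken(uint64_t bitTable, unsigned switchValue, uint64_t tstCns)
{
    assert(switchValue < 64); // shifting by 64 or more is undefined in C++
    uint64_t shift = bitTable >> switchValue;
    uint64_t andOp = shift & 1u;
    return andOp == tstCns;
}
```

With the bit table 725 and `tstCns` equal to 0 (the `beq` in the ARM64 listing), `jumpTaken(725, 1, 0)` is true and control would go to the block that returns 2, while `jumpTaken(725, 0, 0)` is false and execution falls through to the block that returns 1.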
Why do we still need the xarch special case? Does OptimizeConstCompare not handle this case? Ideally we would teach it about the missing opportunity so that everyone benefits, instead of special-casing it here.
I thought about it, but I'd like to leave it for a future follow-up, since adding that peephole involves more work and I was mainly interested in improving arm64.
Ah, you mean that the peephole already exists?
Ok, sounds fine to me. I would personally expect x & (1 << y) to be quite common, and it looks like we are missing this opportunity for arm64 (TEST_EQ/TEST_NE(x, LSH(1, y)) => TEST_EQ/TEST_NE(RSZ(x, y), 1)). Then this transform could produce EQ/NE(AND(x, LSH(1, y)))
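
The rewrite suggested here relies on a simple identity: testing `x & (1 << y)` against zero gives the same answer as testing the low bit of `x >> y`. A minimal sketch on 32-bit unsigned values follows. The function names are hypothetical, and the shift count is restricted to 0..31 because larger counts are undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>

// Mask form: TEST(x, LSH(1, y))
bool testViaMask(uint32_t x, unsigned y)
{
    assert(y < 32);
    return (x & (1u << y)) != 0;
}

// Shift form: TEST(RSZ(x, y), 1)
bool testViaShift(uint32_t x, unsigned y)
{
    assert(y < 32);
    return ((x >> y) & 1u) != 0;
}

// Returns true when both forms agree for every valid bit index of x.
bool formsAgree(uint32_t x)
{
    for (unsigned y = 0; y < 32; y++)
    {
        if (testViaMask(x, y) != testViaShift(x, y))
        {
            return false;
        }
    }
    return true;
}
```

The shift form needs only the constant 1 as its second operand, which is why it may be the friendlier shape for a target without a dedicated bit-test instruction.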
I don't think it exists, but it seems like adding it here would be straightforward:
runtime/src/coreclr/jit/lower.cpp
Lines 3596 to 3621 in e04c75e
#ifdef TARGET_XARCH
if (cmp->OperIs(GT_TEST_EQ, GT_TEST_NE))
{
//
// Transform TEST_EQ|NE(x, LSH(1, y)) into BT(x, y) when possible. Using BT
// results in smaller and faster code. It also doesn't have special register
// requirements, unlike LSH that requires the shift count to be in ECX.
// Note that BT has the same behavior as LSH when the bit index exceeds the
// operand bit size - it uses (bit_index MOD bit_size).
//
GenTree* lsh = cmp->gtGetOp2();
if (lsh->OperIs(GT_LSH) && varTypeIsIntOrI(lsh->TypeGet()) && lsh->gtGetOp1()->IsIntegralConst(1))
{
cmp->SetOper(cmp->OperIs(GT_TEST_EQ) ? GT_BITTEST_EQ : GT_BITTEST_NE);
cmp->AsOp()->gtOp2 = lsh->gtGetOp2();
cmp->gtGetOp2()->ClearContained();
BlockRange().Remove(lsh->gtGetOp1());
BlockRange().Remove(lsh);
return cmp->gtNext;
}
}
#endif // TARGET_XARCH
Of course we could also teach it about TEST_EQ/TEST_NE(RSZ(x, y), 1) => BT(x, y) on x64/x86, but I guess this is a less common pattern.
Anyway I'm ok with keeping this PR as is.
@jakobbotsch OK, let's keep it as is then, and I'll file a "good-first-issue" to recognize the BT pattern (or I'll work on it myself when I have time).
(bigger in fact because we don't take data section into account)
Failures are #91838
Why are the regressions on arm?
arm32 doesn't have
I am asking because the PR title says that this only impacts Arm64.
It used to; Jakob suggested a fix that enabled it for all archs, including arm32, risc-v and la64.