EgorBo (Member) commented Sep 8, 2023

The "jump-table to BT" optimization, which lowers a switch with only two distinct targets to a bit test against a constant, is currently limited to XARCH. This PR enables it for ARM64.

int Test(int x)
{
    switch (x)
    {
        case 11:
        case 13:
        case 15:
        case 17:
        case 18:
        case 20:
            return 1;
    }
    return 2;
}
; Method Prog:Test(int):int:this (FullOpts)	
            stp     fp, lr, [sp, #-0x10]!	
            mov     fp, sp	
            sub     w0, w1, #11	
            cmp     w0, #9	
            bhi     G_M9307_IG05	
-           mov     w0, w0	
-           adr     x1, [@RWD00]	
-           ldr     w1, [x1, x0, LSL #2]	
-           adr     x2, [G_M9307_IG02]	
-           add     x1, x1, x2	
-           br      x1	
-G_M9307_IG03:	
+           mov     w1, #725
+           lsr     w0, w1, w0
+           tbz     w0, #0, G_M9307_IG05
            mov     w0, #1	
            ldp     fp, lr, [sp], #0x10	
            ret     lr	
G_M9307_IG05:	
            mov     w0, #2	
            ldp     fp, lr, [sp], #0x10	
            ret     lr	
-RWD00  dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-; Total bytes of code: 68
+; Total bytes of code: 56
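The transformation in the example can be modeled with a small C++ sketch: each case value is rebased by the smallest case (11) and sets one bit of a table, and the switch becomes an unsigned range check plus a single bit test. The helpers buildBitTable, testWithBitTable and agreesWithSwitch are hypothetical and written only for illustration; as we understand it, the JIT derives the table from the jump-table targets rather than from a list of case values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a JIT API): set bit (c - minCase) for every case value c.
// Assumes every rebased value is in [0, 31] so the table fits in a uint32_t.
uint32_t buildBitTable(const std::vector<int>& cases, int minCase)
{
    uint32_t table = 0;
    for (int c : cases)
    {
        table |= 1u << (c - minCase);
    }
    return table;
}

// Mirrors the new arm64 sequence: rebase (sub), unsigned range check (cmp/bhi),
// then shift the table right and test bit 0 (lsr/tbz).
int testWithBitTable(int x, uint32_t table, int minCase, uint32_t maxIndex)
{
    uint32_t index = static_cast<uint32_t>(x) - static_cast<uint32_t>(minCase);
    if (index > maxIndex)
    {
        return 2; // out of range: default path
    }
    return ((table >> index) & 1u) != 0 ? 1 : 2;
}

// Reference: the switch from the C# example, transcribed to C++.
int testWithSwitch(int x)
{
    switch (x)
    {
        case 11: case 13: case 15: case 17: case 18: case 20:
            return 1;
    }
    return 2;
}

// Returns true if both versions agree for every x in [lo, hi].
bool agreesWithSwitch(int lo, int hi)
{
    const uint32_t table = buildBitTable({11, 13, 15, 17, 18, 20}, 11);
    for (int x = lo; x <= hi; ++x)
    {
        if (testWithBitTable(x, table, 11, 9) != testWithSwitch(x))
        {
            return false;
        }
    }
    return true;
}
```

For these cases the table is 725 (binary 1011010101), the same immediate that the new code loads with mov w1, #725.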

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 8, 2023
ghost assigned EgorBo Sep 8, 2023
ghost commented Sep 8, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

The "jump-table to BT" optimization is currently limited to XARCH; this PR enables it for ARM64.

int Test(int x)
{
    switch (x)
    {
        case 11:
        case 13:
        case 15:
        case 17:
        case 18:
        case 20:
            return 1;
    }
    return 2;
}
; Method Prog:Test(int):int:this (FullOpts)	
            stp     fp, lr, [sp, #-0x10]!	
            mov     fp, sp	
            sub     w0, w1, #11	
            cmp     w0, #9	
            bhi     G_M9307_IG05	
-           mov     w0, w0	
-           adr     x1, [@RWD00]	
-           ldr     w1, [x1, x0, LSL #2]	
-           adr     x2, [G_M9307_IG02]	
-           add     x1, x1, x2	
-           br      x1	
-G_M9307_IG03:	
+           mov     w1, #725
+           lsr     w0, w1, w0
+           mov     w1, #1
+           tst     w0, w1
+           beq     G_M9307_IG05
            mov     w0, #1	
            ldp     fp, lr, [sp], #0x10	
            ret     lr	
G_M9307_IG05:	
            mov     w0, #2	
            ldp     fp, lr, [sp], #0x10	
            ret     lr	
-RWD00  dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000024h ; case G_M9307_IG03	
-       dd 00000030h ; case G_M9307_IG05	
-       dd 00000024h ; case G_M9307_IG03	
-; Total bytes of code: 68
+; Total bytes of code: 64
Author: EgorBo
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

Comment on lines +1272 to +1292
// The bit table constant is shared by both paths; use a 64-bit table when more than 32 bits are needed.
var_types bitTableType = (bitCount <= (genTypeSize(TYP_INT) * 8)) ? TYP_INT : TYP_LONG;
GenTree*  bitTableIcon = comp->gtNewIconNode(bitTable, bitTableType);

#ifdef TARGET_XARCH
//
// Append BT(bitTable, switchValue) and JCC(condition) to the switch block.
//
GenTree* bitTest = comp->gtNewOperNode(GT_BT, TYP_VOID, bitTableIcon, switchValue);
bitTest->gtFlags |= GTF_SET_FLAGS;
GenTreeCC* jcc = comp->gtNewCC(GT_JCC, TYP_VOID, bbSwitchCondition);

LIR::AsRange(bbSwitch).InsertAfter(switchValue, bitTableIcon, bitTest, jcc);

#else // TARGET_XARCH
//
// Fallback to AND(RSZ(bitTable, switchValue), 1)
//
// Compare the tested bit against 1 when bbCase0 is the fall-through block, otherwise against 0.
GenTree* tstCns = comp->gtNewIconNode(bbSwitch->bbNext != bbCase0 ? 0 : 1, bitTableType);
GenTree* shift  = comp->gtNewOperNode(GT_RSZ, bitTableType, bitTableIcon, switchValue);
GenTree* one    = comp->gtNewIconNode(1, bitTableType);
GenTree* andOp  = comp->gtNewOperNode(GT_AND, bitTableType, shift, one);
GenTree* cmp    = comp->gtNewOperNode(GT_EQ, TYP_INT, andOp, tstCns);
GenTree* jcc    = comp->gtNewOperNode(GT_JTRUE, TYP_VOID, cmp);
LIR::AsRange(bbSwitch).InsertAfter(switchValue, bitTableIcon, shift, tstCns, one);
LIR::AsRange(bbSwitch).InsertAfter(one, andOp, cmp, jcc);
#endif // !TARGET_XARCH
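The tstCns operand in the fallback depends on block layout. The sketch below models only that decision, under our reading of the code (an assumption, not a statement about the JIT): set bits in the table select bbCase1, so when bbCase0 is the fall-through block the branch is taken when the bit is 1, and when bbCase1 is the fall-through block the branch is taken when the bit is 0. Block, Lowered, lowerToBitTest and destination are hypothetical names, not JIT types or APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two switch successors (not JIT types).
enum class Block { Case0, Case1 };

// Result of lowering: the constant compared against, and where control goes
// depending on whether EQ(AND(RSZ(bitTable, value), 1), tstCns) holds.
struct Lowered
{
    uint32_t tstCns;
    Block    jumpTarget;  // taken when the comparison is true (JTRUE)
    Block    fallThrough; // the next block in layout order
};

// Assumption: a set bit means "go to Case1".
Lowered lowerToBitTest(Block next)
{
    if (next == Block::Case0)
    {
        // Fall through to Case0, so jump to Case1 when the bit is 1.
        return {1, Block::Case1, Block::Case0};
    }
    // Fall through to Case1, so jump to Case0 when the bit is 0.
    return {0, Block::Case0, Block::Case1};
}

// Evaluates JTRUE(EQ(AND(RSZ(bitTable, value), 1), tstCns)).
// A 64-bit table is used for both widths here; value is assumed range-checked (< 64).
Block destination(const Lowered& l, uint64_t bitTable, unsigned value)
{
    uint64_t bit = (bitTable >> value) & 1u; // RSZ, then AND with 1
    return bit == l.tstCns ? l.jumpTarget : l.fallThrough;
}
```

Whichever block falls through, a set bit ends up in Case1 and a clear bit in Case0; only the branch sense (and hence tstCns) changes.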

Member

Why do we still need the xarch special case? Does OptimizeConstCompare not handle this case? Ideally we would teach it about the missing opportunity so that everyone benefits, instead of special-casing it here.

EgorBo (Member Author) Sep 9, 2023

I thought about it, but I'd like to leave it for a future follow-up: adding that peephole involves more work, and I was mainly interested in improving arm64.

Member Author

Ah, you mean that the peephole already exists?

Member

OK, sounds fine to me. I would personally expect x & (1 << y) to be quite common, and it looks like we are missing this opportunity on arm64 (TEST_EQ/TEST_NE(x, LSH(1, y)) => TEST_EQ/TEST_NE(RSZ(x, y), 1)). This transform could then produce EQ/NE(AND(x, LSH(1, y))).
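The suggested rewrite relies on the identity that testing x & (1 << y) against zero is equivalent to testing bit 0 of x >> y, where RSZ is a logical (unsigned) right shift. A small C++ check over sample values, assuming the shift count is already in [0, 31] (C++ leaves larger counts undefined, so the sketch does not model .NET's masking of shift counts); the function names are illustrative only:

```cpp
#include <cassert>
#include <cstdint>

// Original shape: TEST_NE(x, LSH(1, y)), i.e. (x & (1 << y)) != 0.
bool testViaLsh(uint32_t x, unsigned y)
{
    return (x & (1u << y)) != 0;
}

// Suggested shape: TEST_NE(RSZ(x, y), 1), i.e. ((x >> y) & 1) != 0.
bool testViaRsz(uint32_t x, unsigned y)
{
    return ((x >> y) & 1u) != 0;
}

// Checks both forms on a set of sample values for every shift count 0..31.
bool formsAgree()
{
    const uint32_t samples[] = {0u, 1u, 725u, 0x80000000u, 0xFFFFFFFFu, 0x12345678u};
    for (uint32_t x : samples)
    {
        for (unsigned y = 0; y < 32; ++y)
        {
            if (testViaLsh(x, y) != testViaRsz(x, y))
            {
                return false;
            }
        }
    }
    return true;
}
```

The RSZ form is the one that maps directly onto arm64's lsr followed by tbz/tbnz, since the tested bit is always bit 0.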

Member

I don't think it exists, but it seems like adding it here would be straightforward:

#ifdef TARGET_XARCH
    if (cmp->OperIs(GT_TEST_EQ, GT_TEST_NE))
    {
        //
        // Transform TEST_EQ|NE(x, LSH(1, y)) into BT(x, y) when possible. Using BT
        // results in smaller and faster code. It also doesn't have special register
        // requirements, unlike LSH that requires the shift count to be in ECX.
        // Note that BT has the same behavior as LSH when the bit index exceeds the
        // operand bit size - it uses (bit_index MOD bit_size).
        //
        GenTree* lsh = cmp->gtGetOp2();
        if (lsh->OperIs(GT_LSH) && varTypeIsIntOrI(lsh->TypeGet()) && lsh->gtGetOp1()->IsIntegralConst(1))
        {
            cmp->SetOper(cmp->OperIs(GT_TEST_EQ) ? GT_BITTEST_EQ : GT_BITTEST_NE);
            cmp->AsOp()->gtOp2 = lsh->gtGetOp2();
            cmp->gtGetOp2()->ClearContained();

            BlockRange().Remove(lsh->gtGetOp1());
            BlockRange().Remove(lsh);

            return cmp->gtNext;
        }
    }
#endif // TARGET_XARCH
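The note about out-of-range bit indices can be checked with a small model. Assumptions: the register form of x86 BT on a 32-bit operand selects bit (index mod 32), and C#'s << on a 32-bit int masks the shift count to its low 5 bits. btModel, testLshModel and modelsAgree are hypothetical helpers written for this sketch, not JIT code.

```cpp
#include <cassert>
#include <cstdint>

// Model of BT r32, r32: selects bit (index mod 32); the CPU reports it in CF.
bool btModel(uint32_t value, uint32_t index)
{
    return ((value >> (index % 32)) & 1u) != 0;
}

// Model of the C# expression (x & (1 << y)) != 0 on 32-bit operands:
// the shift count is masked to its low 5 bits (y & 31) before shifting.
bool testLshModel(uint32_t x, uint32_t y)
{
    return (x & (1u << (y & 31u))) != 0;
}

// Compares the two models for shift counts well beyond the operand width.
bool modelsAgree(uint32_t x)
{
    for (uint32_t y = 0; y < 200; ++y)
    {
        if (btModel(x, y) != testLshModel(x, y))
        {
            return false;
        }
    }
    return true;
}
```

Because both sides wrap the index the same way, the BITTEST rewrite needs no extra masking of y.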

Member

Of course, we could also teach it about TEST_EQ/TEST_NE(RSZ(x, y), 1) => BT(x, y) on x64/x86, but I guess that is a less common pattern.

Member

Anyway, I'm OK with keeping this PR as is.

Member Author

@jakobbotsch OK, let's keep it as is then. I'll file a "good-first-issue" to recognize the BT pattern (or work on it myself when I have time).

EgorBo marked this pull request as ready for review September 9, 2023 13:38
EgorBo (Member Author) commented Sep 9, 2023

Diffs: https://dev.azure.com/dnceng-public/public/_build/results?buildId=400689&view=ms.vss-build-web.run-extensions-tab

(The real improvement is bigger, because these diffs don't take the data section, where the jump tables live, into account.)
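For the example at the top of this PR, a rough accounting that also counts the data section (the 10 dd entries of RWD00 at 4 bytes each, ignoring any alignment padding) looks like this:

```math
\begin{aligned}
\text{before} &= \underbrace{68}_{\text{code}} + \underbrace{10 \times 4}_{\text{jump table}} = 108 \text{ bytes} \\
\text{after}  &= 56 + 0 = 56 \text{ bytes}
\end{aligned}
```

So the code-only diff reports a 12-byte saving for this method, while its total footprint shrinks by 52 bytes.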

EgorBo (Member Author) commented Sep 9, 2023

Failures are #91838

kunalspathak (Contributor)

Why are the regressions on arm?

EgorBo (Member Author) commented Sep 9, 2023

> Why are the regressions on arm?

arm32 doesn't have the tbz short path, but overall this should still be a noticeable size improvement for arm: all the regressions come with deleted jump tables, which aren't counted in "Total bytes of code".

EgorBo merged commit fe43240 into dotnet:main Sep 9, 2023
EgorBo deleted the bt-arm64 branch September 9, 2023 21:30
kunalspathak (Contributor)

>> Why are the regressions on arm?
>
> arm32 doesn't have the tbz short path, but overall this should still be a noticeable size improvement for arm: all the regressions come with deleted jump tables, which aren't counted in "Total bytes of code".

I am asking because the PR title says that this only impacts Arm64.

EgorBo (Member Author) commented Sep 10, 2023

>>> Why are the regressions on arm?
>>
>> arm32 doesn't have the tbz short path, but overall this should still be a noticeable size improvement for arm: all the regressions come with deleted jump tables, which aren't counted in "Total bytes of code".
>
> I am asking because the PR title says that this only impacts Arm64.

It used to. Jakob suggested a fix that enabled it for all architectures, including arm32, RISC-V and LoongArch64.

ghost locked as resolved and limited conversation to collaborators Oct 10, 2023
3 participants