JIT: Improve last-use copy omission for implicit byrefs #76069

@jakobbotsch

Consider:

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test(Span<int> s)
{
    return s.Length + Test2(s);
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test2(Span<int> s)
{
    return s.Length;
}

Today we generate the following on win-x64:

; Assembly listing for method Program:Test(System.Span`1[int]):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,  8   )   byref  ->  rcx         ld-addr-op single-def
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )     int  ->  zero-ref    "non-inline candidate call"
;* V03 tmp2         [V03    ] (  0,  0   )   byref  ->  zero-ref    V05._reference(offs=0x00) P-INDEP "field V00._reference (fldOffset=0x0)"
;* V04 tmp3         [V04    ] (  0,  0   )     int  ->  zero-ref    V05._length(offs=0x08) P-INDEP "field V00._length (fldOffset=0x8)"
;* V05 tmp4         [V05    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref"
;  V06 tmp5         [V06    ] (  2,  4   )  struct (16) [rsp+20H]   do-not-enreg[XS] addr-exposed "by-value struct argument"
;
; Lcl frame size = 48

G_M62473_IG01:              ;; offset=0000H
       56                   push     rsi
       4883EC30             sub      rsp, 48
       C5F877               vzeroupper
                                                ;; size=8 bbWeight=1    PerfScore 2.25
G_M62473_IG02:              ;; offset=0008H
       8B7108               mov      esi, dword ptr [rcx+08H]
                                                ;; size=3 bbWeight=1    PerfScore 2.00
G_M62473_IG03:              ;; offset=000BH
       C5FA6F01             vmovdqu  xmm0, xmmword ptr [rcx]             ; unnecessary
       C5FA7F442420         vmovdqu  xmmword ptr [rsp+20H], xmm0         ; unnecessary
                                                ;; size=10 bbWeight=1    PerfScore 5.00
G_M62473_IG04:              ;; offset=0015H
       488D4C2420           lea      rcx, [rsp+20H]                      ; unnecessary
       FF15C86B1C00         call     [Program:Test2(System.Span`1[int]):int]
       03C6                 add      eax, esi
                                                ;; size=13 bbWeight=1    PerfScore 3.75
G_M62473_IG05:              ;; offset=0022H
       4883C430             add      rsp, 48
       5E                   pop      rsi
       C3                   ret
                                                ;; size=6 bbWeight=1    PerfScore 1.75

; Total bytes of code 40, prolog size 8, PerfScore 18.95, instruction count 12, allocated bytes for code 42 (MethodHash=e5250bf6) for method Program:Test(System.Span`1[int]):int
; ============================================================

We would probably need an early pass of (struct) liveness to get rid of some of these copies in a general fashion.
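To illustrate what such a liveness pass would have to compute, here is a minimal sketch in C++ (the JIT's implementation language). It is a toy model and not JIT code: `Use`, `markLastUses`, and the restriction to a single straight-line block with nothing live-out are all assumptions made for illustration. It walks the uses backward and flags the first occurrence of each local seen from the end as its last use.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical model of one read of a local in a straight-line block.
struct Use
{
    unsigned lclNum;    // which local is read
    bool     isLastUse; // filled in by markLastUses
};

// Walk the block backward. The first time a local is seen from the end,
// that use is its last use (assuming no local is live out of the block).
void markLastUses(std::vector<Use>& uses)
{
    std::unordered_set<unsigned> seenLater; // locals that have a later use

    for (std::size_t i = uses.size(); i-- > 0;)
    {
        // insert() returns {iterator, inserted}; inserted is true only
        // when no later use of this local has been seen.
        uses[i].isLastUse = seenLater.insert(uses[i].lclNum).second;
    }
}
```

In the `Test` example, `s` is read twice: once for `s.Length` and once as the argument to `Test2`. Under this model only the second read would be flagged as the last use, which is the one where the copy could be skipped.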

Some simple ad-hoc pattern matching during codegen shows roughly 24k instances of this over libraries.crossgen and 3.2k over benchmarks.run. These are cases where we create a copy from a struct local that is not address-exposed and whose use is marked GTF_VAR_DEATH.
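The condition being counted can be written down as a small predicate. The sketch below is a hypothetical stand-alone model in C++, not the actual pattern-matching code: `LocalDesc`, `ArgUse`, and `canOmitCopy` are invented names, and the `isLastUse` field stands in for the `GTF_VAR_DEATH` flag.

```cpp
#include <vector>

// Hypothetical description of a local; not the JIT's real local descriptor.
struct LocalDesc
{
    bool isStruct;       // local has struct type
    bool addressExposed; // its address may be observed elsewhere
};

// Hypothetical by-value struct argument that reads a local.
struct ArgUse
{
    unsigned lclNum;    // index into the locals table
    bool     isLastUse; // stands in for GTF_VAR_DEATH on the use
};

// The copy is a candidate for omission when the source is a struct local
// that is not address-exposed and this use is its last one.
bool canOmitCopy(const std::vector<LocalDesc>& locals, const ArgUse& use)
{
    const LocalDesc& lcl = locals[use.lclNum];
    return lcl.isStruct && !lcl.addressExposed && use.isLastUse;
}
```

Both conditions matter in this model: if the local is address-exposed, the callee's writes to the shared storage could be observed through another pointer, and if the use is not the last one, the caller itself could observe them.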

We expect that this liveness pass may eventually be useful (necessary, even) for generalized promotion as well, and perhaps for forward substitution too. The pass will therefore need to run early enough to serve these purposes. The likely best fit is right after the local address visitor.

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
