Eliding span Slice bounds checking is not trivial #115154

Open
TonuFish opened this issue Apr 29, 2025 · 3 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)

@TonuFish

Description

Small variations in user code that guarantees a valid upper bound when slicing a span result in significantly different codegen.

The five methods below progress from "first thought" to unsafe implementations of the same guarantee, with only the unsigned predicate and unsafe versions eliding the Slice bounds check.

ReadOnlySpan<char> TestPositiveThenMinI32(int num, ReadOnlySpan<char> chars)
{
    return (num >= 0) ? chars[..int.Min(chars.Length, num)] : chars;
}

ReadOnlySpan<char> TestPredicateI32(int num, ReadOnlySpan<char> chars)
{
    return (num >= 0 && num < chars.Length) ? chars[..num] : chars;
}

ReadOnlySpan<char> TestPredicateU32(int num, ReadOnlySpan<char> chars)
{
    // No bounds check
    return ((uint)num < chars.Length) ? chars[..num] : chars;
}

ReadOnlySpan<char> TestMinU32(int num, ReadOnlySpan<char> chars)
{
    return chars[..(int)uint.Min((uint)chars.Length, (uint)num)];
}

ReadOnlySpan<char> TestCreateRoS(int num, ReadOnlySpan<char> chars)
{
    // No bounds check
    return MemoryMarshal.CreateReadOnlySpan(
        ref MemoryMarshal.GetReference(chars),
        (int)uint.Min((uint)chars.Length, (uint)num));
}
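
To confirm that the five variants express the same guarantee, a minimal sketch can compare the lengths they return over a range of inputs. It restates the methods above as static local functions in a top-level program (this assumes .NET 7 or later for `int.Min` and `uint.Min`); the loop bounds and the `expected` formula are illustrative choices, not part of the original benchmark.

```csharp
using System;
using System.Runtime.InteropServices;

// The five variants from the description, restated as static local functions.
static ReadOnlySpan<char> TestPositiveThenMinI32(int num, ReadOnlySpan<char> chars) =>
    (num >= 0) ? chars[..int.Min(chars.Length, num)] : chars;

static ReadOnlySpan<char> TestPredicateI32(int num, ReadOnlySpan<char> chars) =>
    (num >= 0 && num < chars.Length) ? chars[..num] : chars;

static ReadOnlySpan<char> TestPredicateU32(int num, ReadOnlySpan<char> chars) =>
    ((uint)num < chars.Length) ? chars[..num] : chars;

static ReadOnlySpan<char> TestMinU32(int num, ReadOnlySpan<char> chars) =>
    chars[..(int)uint.Min((uint)chars.Length, (uint)num)];

static ReadOnlySpan<char> TestCreateRoS(int num, ReadOnlySpan<char> chars) =>
    MemoryMarshal.CreateReadOnlySpan(
        ref MemoryMarshal.GetReference(chars),
        (int)uint.Min((uint)chars.Length, (uint)num));

const string Str = "Some Arbitrary String";
var allAgree = true;

// Illustrative range: negative values, in-range values, and values past the end.
for (var num = -5; num < 40; ++num)
{
    // In-range values truncate to num; everything else keeps the full span.
    var expected = (num >= 0 && num < Str.Length) ? num : Str.Length;

    allAgree &= TestPositiveThenMinI32(num, Str).Length == expected
        && TestPredicateI32(num, Str).Length == expected
        && TestPredicateU32(num, Str).Length == expected
        && TestMinU32(num, Str).Length == expected
        && TestCreateRoS(num, Str).Length == expected;
}

Console.WriteLine($"all variants agree: {allAgree}");
```

Since the results match, the differences reported below are purely a matter of what the JIT can prove, not of what the code guarantees.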

Configuration

| Config | Value |
| --- | --- |
| Processor | Intel 265K (Arrow Lake) |
| OS | Windows 11 (10.0.26100.3775) |
| SDK | .NET SDK 10.0.100-preview.3.25201.16 |
| Runtime | .NET SDK 10.0.100-preview.3.25201.16 |
| Disasmo Runtime Build | 8fa9785 |

Regression?

No.

Data

godbolt

BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.3775)
Unknown processor
.NET SDK 10.0.100-preview.3.25201.16
  [Host]     : .NET 10.0.0 (10.0.25.17105), X64 RyuJIT AVX2
  Job-SBNQWE : .NET 10.0.0 (10.0.25.17105), X64 RyuJIT AVX2

Affinity=00000000000000000001  
| Method | Mean | Error | StdDev | Ratio | Code Size |
| --- | --- | --- | --- | --- | --- |
| PositiveThenMinI32 | 4.096 ms | 0.0133 ms | 0.0118 ms | 1.00 | 207 B |
| PredicateI32 | 3.700 ms | 0.0413 ms | 0.0366 ms | 0.90 | 200 B |
| PredicateU32 | 3.408 ms | 0.0181 ms | 0.0160 ms | 0.83 | 176 B |
| MinU32 | 3.619 ms | 0.0169 ms | 0.0149 ms | 0.88 | 190 B |
| CreateRoS | 3.319 ms | 0.0133 ms | 0.0125 ms | 0.81 | 169 B |
Benchmark Code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// dotnet run --project .\FrozenBoundsCheck.Runner\ -c Release --filter '*BoundBench*' --affinity 1
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

[SimpleJob]
[DisassemblyDiagnoser(maxDepth: 10, syntax: DisassemblySyntax.Intel)]
[SkipLocalsInit]
public class BoundBench
{
    private const string Str = "Some Arbitrary String";

    [MethodImpl(MethodImplOptions.NoInlining)]
    public ReadOnlySpan<char> TestPositiveThenMinI32(int num, ReadOnlySpan<char> chars)
    {
        return (num >= 0) ? chars[..int.Min(chars.Length, num)] : chars;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public ReadOnlySpan<char> TestPredicateI32(int num, ReadOnlySpan<char> chars)
    {
        return (num >= 0 && num < chars.Length) ? chars[..num] : chars;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public ReadOnlySpan<char> TestPredicateU32(int num, ReadOnlySpan<char> chars)
    {
        return ((uint)num < chars.Length) ? chars[..num] : chars;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public ReadOnlySpan<char> TestMinU32(int num, ReadOnlySpan<char> chars)
    {
        return chars[..(int)uint.Min((uint)chars.Length, (uint)num)];
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public ReadOnlySpan<char> TestCreateRoS(int num, ReadOnlySpan<char> chars)
    {
        return MemoryMarshal.CreateReadOnlySpan(
            ref MemoryMarshal.GetReference(chars),
            (int)uint.Min((uint)chars.Length, (uint)num));
    }

    [Benchmark(Baseline = true)]
    public int PositiveThenMinI32()
    {
        var ret = 0;
        for (var ii = 0; ii < 4_000_000; ++ii)
        {
            var rv = TestPositiveThenMinI32(ii % 40, Str);
            ret += rv.Length;
        }
        return ret;
    }

    [Benchmark]
    public int PredicateI32()
    {
        var ret = 0;
        for (var ii = 0; ii < 4_000_000; ++ii)
        {
            var rv = TestPredicateI32(ii % 40, Str);
            ret += rv.Length;
        }
        return ret;
    }

    [Benchmark]
    public int PredicateU32()
    {
        var ret = 0;
        for (var ii = 0; ii < 4_000_000; ++ii)
        {
            var rv = TestPredicateU32(ii % 40, Str);
            ret += rv.Length;
        }
        return ret;
    }

    [Benchmark]
    public int MinU32()
    {
        var retVal = 0;
        for (var ii = 0; ii < 4_000_000; ++ii)
        {
            var rv = TestMinU32(ii % 40, Str);
            retVal += rv.Length;
        }
        return retVal;
    }

    [Benchmark]
    public int CreateRoS()
    {
        var retVal = 0;
        for (var ii = 0; ii < 4_000_000; ++ii)
        {
            var rv = TestCreateRoS(ii % 40, Str);
            retVal += rv.Length;
        }
        return retVal;
    }
}
TestPositiveThenMinI32 BDN Disassembly
; BoundBench.TestPositiveThenMinI32(Int32, System.ReadOnlySpan`1<Char>)
       push      rsi
       push      rbx
       sub       rsp,28
       mov       rbx,[r9]
       mov       esi,[r9+8]
       test      r8d,r8d
       jl        short M01_L01
       cmp       esi,r8d
       mov       eax,r8d
       cmovle    eax,esi
       cmp       eax,esi
       ja        short M01_L02
       mov       [rdx],rbx
       mov       [rdx+8],eax
M01_L00:
       mov       rax,rdx
       add       rsp,28
       pop       rbx
       pop       rsi
       ret
M01_L01:
       mov       [rdx],rbx
       mov       [rdx+8],esi
       jmp       short M01_L00
M01_L02:
       call      qword ptr [7FFA83737DC8]
       int3
; Total bytes of code 62
TestPositiveThenMinI32 Disasmo
; Assembly listing for method BoundBench:TestPositiveThenMinI32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 2 inlinees with PGO data; 3 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def <BoundBench>
;  V01 RetBuf       [V01,T01] (  7,  5   )    long  ->  rbx         single-def
;  V02 arg1         [V02,T02] (  5,  4   )     int  ->   r8         single-def
;  V03 arg2         [V03,T00] (  4,  8   )   byref  ->   r9         ld-addr-op single-def
;  V04 OutArgs      [V04    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
;* V05 tmp1         [V05    ] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
;  V06 tmp2         [V06,T04] (  4,  2   )     int  ->  rbp         "Inline return value spill temp"
;* V07 tmp3         [V07    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ushort]>
;* V08 tmp4         [V08    ] (  0,  0   )   byref  ->  zero-ref    single-def "Inlining Arg"
;* V09 tmp5         [V09    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V10 tmp6         [V10    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;  V11 tmp7         [V11,T05] (  3,  2   )   byref  ->  rsi         single-def "field V03._reference (fldOffset=0x0)" P-INDEP
;  V12 tmp8         [V12,T03] (  5,  3   )     int  ->  rdi         "field V03._length (fldOffset=0x8)" P-INDEP
;  V13 tmp9         [V13,T06] (  2,  1   )   byref  ->  rsi         single-def "field V07._reference (fldOffset=0x0)" P-INDEP
;  V14 tmp10        [V14,T07] (  2,  1   )     int  ->  rbp         "field V07._length (fldOffset=0x8)" P-INDEP
;* V15 tmp11        [V15    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref" <System.ReadOnlySpan`1[ushort]>
;
; Lcl frame size = 40

G_M63008_IG01:
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 40
       mov      rbx, rdx
						;; size=11 bbWeight=1 PerfScore 4.50
G_M63008_IG02:
       mov      rsi, bword ptr [r9]
       mov      edi, dword ptr [r9+0x08]
       test     r8d, r8d
       jl       SHORT G_M63008_IG05
						;; size=12 bbWeight=1 PerfScore 5.25
G_M63008_IG03:
       cmp      edi, r8d
       mov      ebp, r8d
       cmovle   ebp, edi
       cmp      ebp, edi
       ja       SHORT G_M63008_IG08
       test     ebp, ebp
       jge      SHORT G_M63008_IG04
       mov      rcx, 0xD1FFAB1E      ; 'length >= 0'
       mov      rdx, 0xD1FFAB1E      ; ''
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=43 bbWeight=0.50 PerfScore 3.38
G_M63008_IG04:
       mov      bword ptr [rbx], rsi
       mov      dword ptr [rbx+0x08], ebp
       jmp      SHORT G_M63008_IG06
						;; size=8 bbWeight=0.50 PerfScore 2.00
G_M63008_IG05:
       mov      bword ptr [rbx], rsi
       mov      dword ptr [rbx+0x08], edi
						;; size=6 bbWeight=0.50 PerfScore 1.00
G_M63008_IG06:
       mov      rax, rbx
						;; size=3 bbWeight=1 PerfScore 0.25
G_M63008_IG07:
       add      rsp, 40
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       ret      
						;; size=9 bbWeight=1 PerfScore 3.25
G_M63008_IG08:
       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       int3     
						;; size=7 bbWeight=0 PerfScore 0.00

; Total bytes of code 99, prolog size 8, PerfScore 19.62, instruction count 34, allocated bytes for code 99 (MethodHash=b48909df) for method BoundBench:TestPositiveThenMinI32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; ============================================================
TestPredicateI32 BDN Disassembly
; BoundBench.TestPredicateI32(Int32, System.ReadOnlySpan`1<Char>)
       push      rsi
       push      rbx
       sub       rsp,28
       mov       rbx,[r9]
       mov       esi,[r9+8]
       cmp       r8d,esi
       jae       short M01_L01
       cmp       r8d,esi
       ja        short M01_L02
       mov       [rdx],rbx
       mov       [rdx+8],r8d
M01_L00:
       mov       rax,rdx
       add       rsp,28
       pop       rbx
       pop       rsi
       ret
M01_L01:
       mov       [rdx],rbx
       mov       [rdx+8],esi
       jmp       short M01_L00
M01_L02:
       call      qword ptr [7FFA83727DC8]
       int3
; Total bytes of code 55
TestPredicateI32 Disasmo
; Assembly listing for method BoundBench:TestPredicateI32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 2 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def <BoundBench>
;  V01 RetBuf       [V01,T01] (  7,  5   )    long  ->  rdx         single-def
;  V02 arg1         [V02,T02] (  5,  4   )     int  ->   r8         single-def
;  V03 arg2         [V03,T00] (  4,  8   )   byref  ->   r9         ld-addr-op single-def
;  V04 OutArgs      [V04    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
;* V05 tmp1         [V05    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ushort]>
;* V06 tmp2         [V06    ] (  0,  0   )   byref  ->  zero-ref    single-def "Inlining Arg"
;* V07 tmp3         [V07    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V08 tmp4         [V08    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;  V09 tmp5         [V09,T04] (  3,  2   )   byref  ->  rax         single-def "field V03._reference (fldOffset=0x0)" P-INDEP
;  V10 tmp6         [V10,T03] (  4,  3   )     int  ->  rcx         "field V03._length (fldOffset=0x8)" P-INDEP
;  V11 tmp7         [V11,T05] (  2,  1   )   byref  ->  rax         single-def "field V05._reference (fldOffset=0x0)" P-INDEP
;  V12 tmp8         [V12,T06] (  2,  1   )     int  ->   r8         "field V05._length (fldOffset=0x8)" P-INDEP
;* V13 tmp9         [V13    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref" <System.ReadOnlySpan`1[ushort]>
;
; Lcl frame size = 40

G_M57615_IG01:
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25
G_M57615_IG02:
       mov      rax, bword ptr [r9]
       mov      ecx, dword ptr [r9+0x08]
       cmp      r8d, ecx
       jb       SHORT G_M57615_IG04
						;; size=12 bbWeight=1 PerfScore 5.25
G_M57615_IG03:
       mov      bword ptr [rdx], rax
       mov      dword ptr [rdx+0x08], ecx
       jmp      SHORT G_M57615_IG05
						;; size=8 bbWeight=0.50 PerfScore 2.00
G_M57615_IG04:
       cmp      r8d, ecx
       ja       SHORT G_M57615_IG07
       mov      bword ptr [rdx], rax
       mov      dword ptr [rdx+0x08], r8d
						;; size=12 bbWeight=0.50 PerfScore 1.62
G_M57615_IG05:
       mov      rax, rdx
						;; size=3 bbWeight=1 PerfScore 0.25
G_M57615_IG06:
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
G_M57615_IG07:
       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       int3     
						;; size=7 bbWeight=0 PerfScore 0.00

; Total bytes of code 51, prolog size 4, PerfScore 10.62, instruction count 17, allocated bytes for code 51 (MethodHash=262d1ef0) for method BoundBench:TestPredicateI32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; ============================================================
TestPredicateU32 BDN Disassembly
; BoundBench.TestPredicateU32(Int32, System.ReadOnlySpan`1<Char>)
       mov       rax,[r9]
       mov       ecx,[r9+8]
       cmp       r8d,ecx
       jae       short M01_L01
       mov       [rdx],rax
       mov       [rdx+8],r8d
M01_L00:
       mov       rax,rdx
       ret
M01_L01:
       mov       [rdx],rax
       mov       [rdx+8],ecx
       jmp       short M01_L00
; Total bytes of code 31
TestPredicateU32 Disasmo
; Assembly listing for method BoundBench:TestPredicateU32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 2 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def <BoundBench>
;  V01 RetBuf       [V01,T01] (  7,  5   )    long  ->  rbx         single-def
;  V02 arg1         [V02,T02] (  5,  4   )     int  ->  rsi         single-def
;  V03 arg2         [V03,T00] (  4,  8   )   byref  ->   r9         ld-addr-op single-def
;  V04 OutArgs      [V04    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
;* V05 tmp1         [V05    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ushort]>
;* V06 tmp2         [V06    ] (  0,  0   )   byref  ->  zero-ref    single-def "Inlining Arg"
;* V07 tmp3         [V07    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V08 tmp4         [V08    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;  V09 tmp5         [V09,T04] (  3,  2   )   byref  ->  rdi         single-def "field V03._reference (fldOffset=0x0)" P-INDEP
;  V10 tmp6         [V10,T03] (  3,  2.50)     int  ->  rbp         "field V03._length (fldOffset=0x8)" P-INDEP
;  V11 tmp7         [V11,T05] (  2,  1   )   byref  ->  rdi         single-def "field V05._reference (fldOffset=0x0)" P-INDEP
;  V12 tmp8         [V12,T06] (  2,  1   )     int  ->  rsi         "field V05._length (fldOffset=0x8)" P-INDEP
;* V13 tmp9         [V13    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref" <System.ReadOnlySpan`1[ushort]>
;
; Lcl frame size = 40

G_M26131_IG01:
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 40
       mov      rbx, rdx
       mov      esi, r8d
						;; size=14 bbWeight=1 PerfScore 4.75
G_M26131_IG02:
       mov      rdi, bword ptr [r9]
       mov      ebp, dword ptr [r9+0x08]
       cmp      esi, ebp
       jae      SHORT G_M26131_IG05
						;; size=11 bbWeight=1 PerfScore 5.25
G_M26131_IG03:
       test     esi, esi
       jge      SHORT G_M26131_IG04
       mov      rcx, 0xD1FFAB1E      ; 'length >= 0'
       mov      rdx, 0xD1FFAB1E      ; ''
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=30 bbWeight=0.50 PerfScore 2.38
G_M26131_IG04:
       mov      bword ptr [rbx], rdi
       mov      dword ptr [rbx+0x08], esi
       jmp      SHORT G_M26131_IG06
						;; size=8 bbWeight=0.50 PerfScore 2.00
G_M26131_IG05:
       mov      bword ptr [rbx], rdi
       mov      dword ptr [rbx+0x08], ebp
						;; size=6 bbWeight=0.50 PerfScore 1.00
G_M26131_IG06:
       mov      rax, rbx
						;; size=3 bbWeight=1 PerfScore 0.25
G_M26131_IG07:
       add      rsp, 40
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       ret      
						;; size=9 bbWeight=1 PerfScore 3.25

; Total bytes of code 81, prolog size 8, PerfScore 18.88, instruction count 28, allocated bytes for code 81 (MethodHash=fc1899ec) for method BoundBench:TestPredicateU32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; ============================================================
TestMinU32 BDN Disassembly
; BoundBench.TestMinU32(Int32, System.ReadOnlySpan`1<Char>)
       sub       rsp,28
       mov       rax,[r9]
       mov       ecx,[r9+8]
       cmp       ecx,r8d
       cmovbe    r8d,ecx
       cmp       r8d,ecx
       ja        short M01_L00
       mov       [rdx],rax
       mov       [rdx+8],r8d
       mov       rax,rdx
       add       rsp,28
       ret
M01_L00:
       call      qword ptr [7FFA836F7DC8]
       int3
; Total bytes of code 45
TestMinU32 Disasmo
; Assembly listing for method BoundBench:TestMinU32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 2 inlinees with PGO data; 3 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def <BoundBench>
;  V01 RetBuf       [V01,T01] (  5,  5   )    long  ->  rbx         single-def
;  V02 arg1         [V02,T02] (  4,  4   )     int  ->   r8         single-def
;  V03 arg2         [V03,T00] (  4,  8   )   byref  ->   r9         ld-addr-op single-def
;  V04 OutArgs      [V04    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
;* V05 tmp1         [V05    ] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
;  V06 tmp2         [V06,T03] (  4,  4   )     int  ->  rdi         "Inline return value spill temp"
;* V07 tmp3         [V07    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ushort]>
;* V08 tmp4         [V08    ] (  0,  0   )   byref  ->  zero-ref    single-def "Inlining Arg"
;* V09 tmp5         [V09    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V10 tmp6         [V10    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;  V11 tmp7         [V11,T05] (  2,  2   )   byref  ->  rsi         single-def "field V03._reference (fldOffset=0x0)" P-INDEP
;  V12 tmp8         [V12,T04] (  4,  4   )     int  ->  rcx         "field V03._length (fldOffset=0x8)" P-INDEP
;  V13 tmp9         [V13,T06] (  2,  2   )   byref  ->  rsi         single-def "field V07._reference (fldOffset=0x0)" P-INDEP
;  V14 tmp10        [V14,T07] (  2,  2   )     int  ->  rdi         "field V07._length (fldOffset=0x8)" P-INDEP
;* V15 tmp11        [V15    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref" <System.ReadOnlySpan`1[ushort]>
;
; Lcl frame size = 32

G_M56160_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 32
       mov      rbx, rdx
						;; size=10 bbWeight=1 PerfScore 3.50
G_M56160_IG02:
       mov      rsi, bword ptr [r9]
       mov      ecx, dword ptr [r9+0x08]
       cmp      ecx, r8d
       mov      edi, r8d
       cmovbe   edi, ecx
       cmp      edi, ecx
       ja       SHORT G_M56160_IG06
       test     edi, edi
       jge      SHORT G_M56160_IG04
						;; size=24 bbWeight=1 PerfScore 7.25
G_M56160_IG03:
       mov      rcx, 0xD1FFAB1E      ; 'length >= 0'
       mov      rdx, 0xD1FFAB1E      ; ''
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=26 bbWeight=0.50 PerfScore 1.75
G_M56160_IG04:
       mov      bword ptr [rbx], rsi
       mov      dword ptr [rbx+0x08], edi
       mov      rax, rbx
						;; size=9 bbWeight=1 PerfScore 2.25
G_M56160_IG05:
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
						;; size=8 bbWeight=1 PerfScore 2.75
G_M56160_IG06:
       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       int3     
						;; size=7 bbWeight=0 PerfScore 0.00

; Total bytes of code 84, prolog size 7, PerfScore 17.50, instruction count 27, allocated bytes for code 84 (MethodHash=ff0d249f) for method BoundBench:TestMinU32(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; ============================================================
TestCreateRoS BDN Disassembly
; BoundBench.TestCreateRoS(Int32, System.ReadOnlySpan`1<Char>)
       mov       rax,[r9]
       mov       ecx,[r9+8]
       cmp       ecx,r8d
       cmova     ecx,r8d
       mov       [rdx],rax
       mov       [rdx+8],ecx
       mov       rax,rdx
       ret
; Total bytes of code 24
TestCreateRoS Disasmo
; Assembly listing for method BoundBench:TestCreateRoS(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 5 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def <BoundBench>
;  V01 RetBuf       [V01,T01] (  5,  5   )    long  ->  rbx         single-def
;  V02 arg1         [V02,T02] (  4,  4   )     int  ->   r8         single-def
;  V03 arg2         [V03,T00] (  4,  8   )   byref  ->   r9         ld-addr-op single-def
;  V04 OutArgs      [V04    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
;* V05 tmp1         [V05    ] (  0,  0   )   byref  ->  zero-ref    single-def "impAppendStmt"
;* V06 tmp2         [V06    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ushort]>
;  V07 tmp3         [V07,T03] (  3,  6   )     int  ->  rdi         "Inlining Arg"
;  V08 tmp4         [V08,T04] (  3,  3   )     int  ->  rdi         "Inline return value spill temp"
;* V09 tmp5         [V09    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ushort]>
;* V10 tmp6         [V10    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V11 tmp7         [V11    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V12 tmp8         [V12    ] (  0,  0   )   byref  ->  zero-ref    "field V03._reference (fldOffset=0x0)" P-INDEP
;* V13 tmp9         [V13    ] (  0,  0   )     int  ->  zero-ref    "field V03._length (fldOffset=0x8)" P-INDEP
;  V14 tmp10        [V14,T05] (  2,  2   )   byref  ->  rsi         single-def "field V06._reference (fldOffset=0x0)" P-INDEP
;* V15 tmp11        [V15    ] (  0,  0   )     int  ->  zero-ref    "field V06._length (fldOffset=0x8)" P-INDEP
;  V16 tmp12        [V16,T06] (  2,  2   )   byref  ->  rsi         single-def "field V09._reference (fldOffset=0x0)" P-INDEP
;  V17 tmp13        [V17,T07] (  2,  2   )     int  ->  rdi         "field V09._length (fldOffset=0x8)" P-INDEP
;* V18 tmp14        [V18    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref" <System.ReadOnlySpan`1[ushort]>
;
; Lcl frame size = 32

G_M2100_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 32
       mov      rbx, rdx
						;; size=10 bbWeight=1 PerfScore 3.50
G_M2100_IG02:
       mov      rsi, bword ptr [r9]
       mov      edi, dword ptr [r9+0x08]
       cmp      edi, r8d
       cmova    edi, r8d
       test     edi, edi
       jge      SHORT G_M2100_IG04
						;; size=18 bbWeight=1 PerfScore 5.75
G_M2100_IG03:
       mov      rcx, 0xD1FFAB1E      ; 'length >= 0'
       mov      rdx, 0xD1FFAB1E      ; ''
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=26 bbWeight=0.50 PerfScore 1.75
G_M2100_IG04:
       mov      bword ptr [rbx], rsi
       mov      dword ptr [rbx+0x08], edi
       mov      rax, rbx
						;; size=9 bbWeight=1 PerfScore 2.25
G_M2100_IG05:
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
						;; size=8 bbWeight=1 PerfScore 2.75

; Total bytes of code 71, prolog size 7, PerfScore 16.00, instruction count 22, allocated bytes for code 71 (MethodHash=38b2f7cb) for method BoundBench:TestCreateRoS(int,System.ReadOnlySpan`1[ushort]):System.ReadOnlySpan`1[ushort]:this (FullOpts)
; ============================================================

Analysis

The only phase responsible for allowing the upper bounds-check block to be removed was 'Redundant branch opts'.

TestPredicateU32

The bounds-check branch is pruned because, after 'Morph - Global', the predicate in user code compares the same operands as the check in Slice(int,int), and < implies <=.

((uint) num < (uint) chars.Length) // user
((ulong) (uint) 0 + (ulong) (uint) length <= (ulong) (uint) chars.Length) // Slice(int,int)
;  V02 arg1              int
;  V10 tmp6              int  "field V03._length (fldOffset=0x8)" P-INDEP

--- Trying RBO in BB04 ---
Relop [000030] BB04 value unknown, trying inference
Can infer LE_UN from [true] dominating LT_UN

Dominator BB01 of BB04 can infer value of dominated relop
N003 (  3,  3) [000006] J---G+-N-U-                         *  LT        int    <l:$2c0, c:$2c1>
N001 (  1,  1) [000000] -----+-----                         +--*  LCL_VAR   int    V02 arg1         u:1 $c0
N002 (  1,  1) [000004] -----+-----                         \--*  LCL_VAR   int    V10 tmp6         u:1 <l:$280, c:$c1>
 Redundant compare; current relop:
N003 (  3,  3) [000030] N----+-N-U-                         *  LE        int    <l:$2c2, c:$2c3>
N001 (  1,  1) [000010] -----+-----                         +--*  LCL_VAR   int    V02 arg1         u:1 $c0
N002 (  1,  1) [000028] -----+-----                         \--*  LCL_VAR   int    V10 tmp6         u:1 (last use) <l:$280, c:$c1>
True successor BB04 of BB01 reaches, relop [000030] must be true
[Image: flowgraph before redundant branch opts]
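
The implication RBO relies on here can be checked at the value level. The sketch below is only an illustration: the helper names `UserPredicate` and `SliceCheckPasses` are hypothetical, and the second one mirrors the comparison quoted above for Slice(int,int) rather than calling into the runtime.

```csharp
using System;

// Hypothetical helpers, named for illustration only.
// Mirrors the user-side guard in TestPredicateU32.
static bool UserPredicate(int num, int spanLength) =>
    (uint)num < (uint)spanLength;

// Mirrors the comparison quoted above for Slice(0, length).
static bool SliceCheckPasses(int length, int spanLength) =>
    (ulong)(uint)0 + (ulong)(uint)length <= (ulong)(uint)spanLength;

// Count inputs where the guard holds but the Slice check would still fail.
var violations = 0;
for (var spanLength = 0; spanLength <= 64; ++spanLength)
{
    for (var num = -64; num <= 128; ++num)
    {
        if (UserPredicate(num, spanLength) && !SliceCheckPasses(num, spanLength))
        {
            ++violations;
        }
    }
}

Console.WriteLine($"violations: {violations}");
```

No input in the range satisfies the guard while failing the Slice check, which is the "LT_UN implies LE_UN" inference shown in the dump.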

TestPredicateI32

The two user clauses that together guarantee the subrange [0..length] are ineffective: RBO cannot infer the result of the Slice check from them.

;  V02 arg1              int 
;  V10 tmp6              int  "field V03._length (fldOffset=0x8)" P-INDEP

--- Trying RBO in BB05 ---
Relop [000032] BB05 value unknown, trying inference
No usable PhiDef VNs
[Image: flowgraph after redundant branch opts]
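
For reference, the signed two-clause guard and the single unsigned comparison accept exactly the same inputs whenever the length is non-negative, which always holds for a span. The following sketch checks that over a small range; `SignedGuard` and `UnsignedGuard` are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical helpers, named for illustration only.
// Mirrors the guard in TestPredicateI32.
static bool SignedGuard(int num, int spanLength) =>
    num >= 0 && num < spanLength;

// Mirrors the guard in TestPredicateU32.
static bool UnsignedGuard(int num, int spanLength) =>
    (uint)num < (uint)spanLength;

// Count inputs where the two guards give different answers.
var mismatches = 0;
for (var spanLength = 0; spanLength <= 64; ++spanLength)
{
    for (var num = -64; num <= 128; ++num)
    {
        if (SignedGuard(num, spanLength) != UnsignedGuard(num, spanLength))
        {
            ++mismatches;
        }
    }
}

Console.WriteLine($"mismatches: {mismatches}");
```

The two forms are interchangeable for the caller, so the remaining bounds check in TestPredicateI32 reflects a missed inference rather than a weaker guarantee.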

TestMinU32

The result of Min is stored in a temp (V06), so the Slice check compares that temp rather than num and is not associated with the Min comparison the same way.

;  V02 arg1              int 
;  V06 tmp2              int  "Inline return value spill temp"
;  V12 tmp8              int  "field V03._length (fldOffset=0x8)" P-INDEP

int tmp2 = ((uint) chars.Length <= (uint) num) ? chars.Length : num;
if (tmp2 > chars.Length) // throw
[Image: flowgraph after redundant branch opts]
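
At the value level the throw is unreachable, which is what makes the surviving check redundant. A small sketch, using hypothetical helper names and treating the upper-bound test as the unsigned comparison seen in the disassembly (`ja`):

```csharp
using System;

// Hypothetical helpers, named for illustration only.
// Mirrors the temp computed from uint.Min in TestMinU32.
static int MinTemp(int num, int spanLength) =>
    ((uint)spanLength <= (uint)num) ? spanLength : num;

// Mirrors the unsigned upper-bound comparison that guards the throw.
static bool WouldThrow(int tmp, int spanLength) =>
    (uint)tmp > (uint)spanLength;

// Count inputs for which the Slice check would actually throw.
var throwing = 0;
for (var spanLength = 0; spanLength <= 64; ++spanLength)
{
    for (var num = -64; num <= 128; ++num)
    {
        if (WouldThrow(MinTemp(num, spanLength), spanLength))
        {
            ++throwing;
        }
    }
}

Console.WriteLine($"inputs that reach the throw: {throwing}");
```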

TonuFish added the tenet-performance label Apr 29, 2025
dotnet-issue-labeler bot added the area-CodeGen-coreclr label Apr 29, 2025
dotnet-policy-service bot added the untriaged label Apr 29, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo self-assigned this Apr 30, 2025
EgorBo removed the untriaged label Apr 30, 2025
EgorBo added this to the Future milestone Apr 30, 2025
@EgorBo
Member

EgorBo commented May 2, 2025

Haven't checked all of them, but TestPredicateU32 seems to be the same underlying issue as #112953

@TonuFish
Author

TonuFish commented May 4, 2025

Yep, both scenarios there drop the bounds check the same way if they're tweaked so the predicate matches the span one.

Not sure which issue this is more appropriate on, but as below.

godbolt

Test method
public static ReadOnlySpan<char> MatchLogic(ReadOnlySpan<char> span, int offset)
{
    if ((uint)(offset + 1) <= (uint)span.Length) // +1<= instead of <
    {
        // No bounds check
        return span.Slice(offset + 1);
    }

    return span;
}
;  V02 arg1              int                                  // offset
;  V04 tmp1              int  "Inlining Arg"                  // offset + 1
;  V11 tmp8              int  "field V01._length (fldOffset=0x8)" P-INDEP

--- Trying RBO in BB02 ---
Relop [000027] BB02 value unknown, trying inference

Dominator BB01 of BB02 has relop with reversed liberal VN
N005 (  5,  5) [000006] N---G+-N-U-                         *  GT        int    <l:$2c1, c:$2c2>
N003 (  3,  3) [000002] -----+-----                         +--*  ADD       int    $2c0
N001 (  1,  1) [000000] -----+-----                         |  +--*  LCL_VAR   int    V02 arg1         u:1 $100
N002 (  1,  1) [000001] -----+-----                         |  \--*  CNS_INT   int    1 $41
N004 (  1,  1) [000005] -----+-----                         \--*  LCL_VAR   int    V11 tmp8         u:1 <l:$280, c:$101>
 Redundant compare; current relop:
N003 (  3,  3) [000027] N----+-N-U-                         *  LE        int    <l:$2c3, c:$2c4>
N001 (  1,  1) [000023] -----+-----                         +--*  LCL_VAR   int    V04 tmp1         u:1 $2c0
N002 (  1,  1) [000026] -----+-----                         \--*  LCL_VAR   int    V11 tmp8         u:1 <l:$280, c:$101>
False successor BB02 of BB01 reaches, relop [000027] must be true

Conditional folded at BB02
BB02 becomes a BBJ_ALWAYS to BB04
Count method
public static int MatchLogic(ReadOnlySpan<char> span, char c)
{
    int count = 0;
    int pos;
    while ((uint)((pos = span.IndexOf(c)) + 1) <= (uint)span.Length) // +1<= instead of <
    {
        count++;
        // No bounds check
        span = span.Slice(pos + 1);
    }
    return count;
}
;  V03 loc1              int                                         // pos 
;  V06 tmp2              int  "Inlining Arg"                         // slice `pos + 1`
;  V18 tmp14             int  "Inline return value spill temp"       // while conditional pos assignment
;  V23 tmp19             int  "field V00._length (fldOffset=0x8)" P-INDEP

--- Trying RBO in BB02 ---
Relop [000035] BB02 value unknown, trying inference

Dominator BB09 of BB02 has relop with same liberal VN
N005 (  5,  5) [000015] N---G+-N-U-                         *  LE        int    $346
N003 (  3,  3) [000011] -----+-----                         +--*  ADD       int    $345
N001 (  1,  1) [000007] -----+-----                         |  +--*  LCL_VAR   int    V18 tmp14        u:3 $2c2
N002 (  1,  1) [000010] -----+-----                         |  \--*  CNS_INT   int    1 $41
N004 (  1,  1) [000014] -----+-----                         \--*  LCL_VAR   int    V23 tmp19        u:2 $2c1
 Redundant compare; current relop:
N003 (  3,  3) [000035] N----+-N-U-                         *  LE        int    $346
N001 (  1,  1) [000031] -----+-----                         +--*  LCL_VAR   int    V06 tmp2         u:1 $345
N002 (  1,  1) [000034] -----+-----                         \--*  LCL_VAR   int    V23 tmp19        u:2 $2c1
True successor BB02 of BB09 reaches, relop [000035] must be true

Conditional folded at BB02
BB02 becomes a BBJ_ALWAYS to BB03
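
One caveat when reusing the `+1 <=` form outside a codegen experiment: it is not value-equivalent to a plain unsigned `<` guard on the offset. The sketch below compares the two over a small range; `LessThanForm` and `PlusOneForm` are hypothetical names, and the length of 21 is just the length of the benchmark string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers, named for illustration only.
static bool LessThanForm(int offset, int spanLength) =>
    (uint)offset < (uint)spanLength;

// Mirrors the "+1 <=" guard used in the two methods above.
static bool PlusOneForm(int offset, int spanLength) =>
    (uint)(offset + 1) <= (uint)spanLength;

const int SpanLength = 21;
var disagreements = new List<int>();

// Collect every offset where the two guards give different answers.
for (var offset = -8; offset <= 40; ++offset)
{
    if (LessThanForm(offset, SpanLength) != PlusOneForm(offset, SpanLength))
    {
        disagreements.Add(offset);
    }
}

Console.WriteLine($"offsets where the guards differ: {string.Join(", ", disagreements)}");
```

The only disagreement is at -1, where `offset + 1` is 0 and the `+1 <=` guard passes. Since IndexOf returns -1 when nothing is found, the counting loop above would keep iterating in that case, so it works for inspecting the elided check but would need a separate not-found exit in real code.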
