Conversation

EgorBo
Member

@EgorBo EgorBo commented May 26, 2024

Fixes the regression reported in #100922, which was introduced by #99982.

bool AllAscii(Vector128<byte> vector) => 
    (vector & Vector128.Create((byte)0x80)).Equals(Vector128<byte>.Zero);

Main:

; Method Proga:AllAscii
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            movi    v16.16b, #0x80
            and     v16.16b, v0.16b, v16.16b
            umaxp   v16.4s, v16.4s, v16.4s
            umov    x0, v16.d[0]
            cmp     x0, #0
            cset    x0, eq
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 40

PR:

; Method Proga:AllAscii
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            umaxp   v16.4s, v0.4s, v0.4s
            umov    x0, v16.d[0]
            tst     x0, #0x8080808080808080
            cset    x0, eq
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 32
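
As a source-level sketch of the idea behind the PR listing (folding the per-byte 0x80 mask into a single 64-bit test), the snippet below compares the vector form with a reduced form. The names `AsciiCheck` and `AllAsciiReduced` are hypothetical and not part of the PR. The reduction here is a bitwise OR of the two 64-bit halves, an assumption chosen for clarity; it is not a model of the `umaxp` instruction in the listings above.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper type, for illustration only.
static class AsciiCheck
{
    // Vector form, as in the PR description: mask every byte with 0x80
    // and compare the whole vector against zero.
    public static bool AllAsciiVector(Vector128<byte> vector) =>
        (vector & Vector128.Create((byte)0x80)) == Vector128<byte>.Zero;

    // Reduced form (assumption: OR-reduction, not umaxp): combine the two
    // 64-bit halves, then test the high bit of each of the eight bytes.
    public static bool AllAsciiReduced(Vector128<byte> vector)
    {
        Vector128<ulong> halves = vector.AsUInt64();
        ulong combined = halves.GetElement(0) | halves.GetElement(1);
        return (combined & 0x8080808080808080UL) == 0;
    }
}
```

Because OR preserves every set bit, the two methods agree for all inputs; the mask only needs to be applied once, after the reduction.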

@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on May 26, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member Author

EgorBo commented May 26, 2024

@EgorBot -arm64 -profiler

using BenchmarkDotNet.Attributes;
using System.Buffers;
using System.Text;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Perf_Ascii>(args: args);

[DisassemblyDiagnoser(maxDepth: 5)]
public class Perf_Ascii
{
    byte[] _bytes = new byte[128];
    char[] _characters = new char[128];

    [Benchmark]
    public OperationStatus ToUtf16() => Ascii.ToUtf16(_bytes, _characters, out _);
}

@EgorBot

EgorBot commented May 27, 2024

Results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-TOHLVW : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-IXPJBK : .NET 9.0.0 (), Arm64 RyuJIT AdvSIMD
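| Method  | Toolchain | Mean     | Error    | Ratio | Code Size |
|-------- |---------- |---------:|---------:|------:|----------:|
| ToUtf16 | Main      | 17.24 ns | 0.004 ns |  1.00 |     516 B |
| ToUtf16 | PR        | 15.67 ns | 0.001 ns |  0.91 |     500 B |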

See BDN_Artifacts.zip for details.

🔥Profiler

Flame graphs: Main vs PR (interactive!)
Hot asm: Main vs PR
Hot functions: Main vs PR

Notes

For clean perf results, make sure you have just one [Benchmark] in your app.

// If op is "vec & cnsVec" where both u64 components in that cnsVec are the same (for both SIMD12 and
// SIMD16) then we'd better do this AND on top of TYP_LONG NI_AdvSimd_Extract in the end - it produces a
// more optimal codegen.
if (op->OperIsHWIntrinsic(NI_AdvSimd_And) && op->AsHWIntrinsic()->Op(2)->OperIs(GT_CNS_VEC))
Member

This doesn't have to be a constant, right?

Any (x & y) == zero or (x & y) != zero can be optimized down to a tst (on both xarch and arm64).
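
For reference, the general pattern the comment describes can be written at the source level as below, with a mask that is an ordinary variable rather than a constant. `MaskTests`, `NoCommonBits`, and `AnyCommonBits` are hypothetical names; the sketch only shows the shape of the expressions and makes no claim about what code the JIT currently emits for them.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper type, for illustration only.
static class MaskTests
{
    // (x & y) == zero: true when x and y share no set bits in any element.
    public static bool NoCommonBits(Vector128<byte> x, Vector128<byte> y) =>
        (x & y) == Vector128<byte>.Zero;

    // (x & y) != zero: true when at least one element of x & y is non-zero.
    public static bool AnyCommonBits(Vector128<byte> x, Vector128<byte> y) =>
        (x & y) != Vector128<byte>.Zero;
}
```

`AllAscii` from the PR description is the special case `NoCommonBits(vector, Vector128.Create((byte)0x80))`.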

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@xtqqczze
Contributor

Blocks #105047.

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 17, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Perf] Linux/arm64: 4 Regressions on 4/8/2024 7:16:22 PM
4 participants