ARM64: Optimize Volatile.Read/Write for floats #101359

EgorBo · 2024-04-22T01:26:40Z

This PR eliminates the explicit full/load memory barriers that ARM64 codegen emitted for volatile loads and stores of floating-point values. Example:

double _location = 0;

double Read() => Volatile.Read(ref _location);
void Write(double val) => Volatile.Write(ref _location, val);

; Method Tests:Read():double:this (FullOpts)
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            add     x0, x0, #8
-           ldr     d0, [x0]
-           dmb     ishld
+           ldar    x0, [x0]
+           fmov    d0, x0
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 28


; Method Tests:Write(double):this (FullOpts)
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            add     x0, x0, #8
-           dmb     ish
-           str     d0, [x0]
+           mov     x1, v0.d[0]
+           stlr    x1, [x0]
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 28

EgorBo · 2024-04-30T16:36:28Z

PTAL @kunalspathak @TIHan @VSadov @dotnet/jit-contrib

The diffs are mostly textual, since the code size stays the same. There are a couple of regressions: ldr + dmb can encode addressing modes, but once we switch to ldapr the address can no longer be contained. Containment is possible only with the Armv8.4 LDAPUR instruction, and our SPMI collections don't include that ISA.

EgorBo · 2024-05-02T16:14:57Z

ping @kunalspathak @TIHan

kunalspathak · 2024-05-02T17:29:26Z

Not sure if I understand this...in the asmdiff above, we are loading/storing int value instead of float value?

-           ldr     d0, [x0]
-           dmb     ishld
+           ldar    x0, [x0]
+           fmov    d0, x0

EgorBo · 2024-05-02T17:46:46Z

@kunalspathak

Not sure if I understand this...in the asmdiff above, we are loading/storing int value instead of float value?
-           ldr     d0, [x0]
-           dmb     ishld
+           ldar    x0, [x0]
+           fmov    d0, x0

So

-           ldr     d0, [x0]
-           dmb     ishld

means "load an 8-byte double from the address in x0 directly into a SIMD (floating-point) register". There is no load-acquire form of ldr for a SIMD destination (you can't write ldar d0, [x0]), so we have to emit an explicit memory barrier.

To work around this, we first do the 8-byte load into a general-purpose (integer) register. ldar already gives us the acquire semantics we need, so no separate memory barrier is required, and then we just move the value into a floating-point register, hence fmov d0, x0.

Effectively, we do the following (shown for float; double is analogous, with long in place of int):

float foo;

float LoadVolatile()
{
    return Unsafe.BitCast<int, float>
        (Volatile.Read(
            ref Unsafe.As<float, int>(ref foo)));
}
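The same idea applies to the store side and to double, which is the type used in the asm diff at the top. The following is only a source-level sketch of what the JIT does internally during codegen; the class name `VolatileDoubleSketch` and its members are hypothetical and exist purely for illustration. User code should keep calling `Volatile.Read(ref double)` and `Volatile.Write(ref double, double)` directly.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical illustration type; not part of the runtime or of this PR.
public static class VolatileDoubleSketch
{
    private static double _location;

    // Load side: reinterpret the field as a long, do the volatile (acquire)
    // load on the integer view, then move the bits back into a double.
    public static double Read()
    {
        long bits = Volatile.Read(ref Unsafe.As<double, long>(ref _location));
        return BitConverter.Int64BitsToDouble(bits);
    }

    // Store side: move the double's bits into a long first, then do the
    // volatile (release) store on the integer view of the field.
    public static void Write(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        Volatile.Write(ref Unsafe.As<double, long>(ref _location), bits);
    }
}
```

Because the value only ever passes through a bit-for-bit reinterpretation, special values such as NaN and negative zero should round-trip unchanged: after `VolatileDoubleSketch.Write(-0.0)`, `VolatileDoubleSketch.Read()` returns a value with the same bit pattern as `-0.0`.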

EgorBo · 2024-05-02T17:52:30Z

Native compilers use the same trick, e.g.: https://godbolt.org/z/KKTh79xbc

kunalspathak

LGTM

VSadov

Interesting! LGTM.

EgorBo · 2024-05-09T16:20:37Z

Improvements:

[Perf] Linux/arm64: 22 Improvements on 5/3/2024 2:25:46 AM perf-autofiling-issues#34053

EgorBo added 2 commits April 22, 2024 03:16

ARM64: Optimize Volatile.Read/Write for floats

73f3b9d

Clean up

12f44de

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 22, 2024

dotnet-policy-service bot assigned EgorBo Apr 22, 2024

EgorBo added 5 commits April 22, 2024 03:46

Clean up

a7ee35e

Update lower.cpp

101c9e8

Update lower.cpp

17c1f6e

Merge branch 'main' of https://github.com/dotnet/runtime into volatil…

f117f3b

…e-stores-loads-floats

Handle LDAPUR

5ddab49

EgorBo marked this pull request as ready for review April 30, 2024 16:34

EgorBo requested review from kunalspathak and TIHan April 30, 2024 16:36

build-analysis bot mentioned this pull request Apr 30, 2024

System.Numerics.Tensors.Tests.SingleGenericTensorPrimitives.SpanScalarDestination_SpecialValues fails #101721

Closed

kunalspathak approved these changes May 2, 2024

VSadov approved these changes May 2, 2024

EgorBo merged commit 5da5873 into dotnet:main May 2, 2024

EgorBo deleted the volatile-stores-loads-floats branch May 2, 2024 19:18

michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024

ARM64: Optimize Volatile.Read/Write for floats (dotnet#101359)

2a05c20

Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024

ARM64: Optimize Volatile.Read/Write for floats (dotnet#101359)

b9e9ea1

github-actions bot locked and limited conversation to collaborators Jun 9, 2024
