ARM64: Optimize Volatile.Read/Write for floats #101359
PTAL @kunalspathak @TIHan @VSadov @dotnet/jit-contrib Mostly text diffs since the codegen size is the same. A couple of regressions because
ping @kunalspathak @TIHan
Not sure if I understand this...in the asmdiff above, we are loading/storing
So
- ldr d0, [x0]
- dmb ishld
means "let's load a 4-byte float from the memory at x0 directly into a SIMD (float) register". Since there is no acquire/release kind of load that targets SIMD registers, the load has to be followed by an explicit barrier. So, to work around it, we do this 4-byte load into a GPR (int) register first, so we can avoid emitting a memory barrier, since the GPR load can carry acquire semantics itself. Effectively, we do:
float foo;
float LoadVolatile()
{
return Unsafe.BitCast<int, float>
(Volatile.Read(
ref Unsafe.As<float, int>(ref foo)));
}
Native compilers use the same trick, e.g.: https://godbolt.org/z/KKTh79xbc
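For readers who want to try the native side locally, here is a minimal C++ sketch of the same idea. It is not the contents of the godbolt link above: the helper names `load_acquire_float` and `store_release_float` are hypothetical, and the choice to keep the float's bits in a `std::atomic<std::uint32_t>` is an assumption made for illustration. The acquire/release access is done on the integer representation, and the bits are reinterpreted as a float afterwards, which lets an ARM64 compiler use an integer load-acquire/store-release rather than a plain FP access plus a separate barrier (the actual instruction selection depends on the compiler and flags).

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: acquire-load the raw bits as an integer,
// then reinterpret them as a float without any further memory access.
float load_acquire_float(const std::atomic<std::uint32_t>& bits) {
    std::uint32_t raw = bits.load(std::memory_order_acquire);
    float value;
    std::memcpy(&value, &raw, sizeof value);  // bit-for-bit copy, no conversion
    return value;
}

// Hypothetical helper: reinterpret the float as integer bits,
// then release-store those bits.
void store_release_float(std::atomic<std::uint32_t>& bits, float value) {
    std::uint32_t raw;
    std::memcpy(&raw, &value, sizeof raw);    // bit-for-bit copy, no conversion
    bits.store(raw, std::memory_order_release);
}
```

The `std::memcpy` calls play the role of `Unsafe.BitCast` in the C# snippet: they change the type, not the bits, so a value written through `store_release_float` reads back unchanged through `load_acquire_float`.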
LGTM
Interesting! LGTM.
Closes #67254
This PR eliminates the explicit full/load memory barriers emitted for volatile loads and stores of floating-point values. Example: