Socket.Unix: reduce locking by using Interlocked operations #36008
This optimizes the common case where there is at most one ongoing receive operation and one ongoing send operation.
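To illustrate the general idea (this is a minimal sketch, not the code in this PR): when at most one operation per direction is usually in flight, a single `Interlocked.CompareExchange` can claim the "in-flight" slot instead of taking a lock. The type names `SingleOperationGate` and `SocketOperationGates` are hypothetical, and the contended slow path (queuing a second operation under a lock) is deliberately left out.

```csharp
using System;
using System.Threading;

// Hypothetical gate: the uncontended case costs one CAS instead of a lock.
sealed class SingleOperationGate
{
    private const int Idle = 0, Busy = 1;
    private int _state = Idle;

    // Returns true if the caller now owns the gate (fast path).
    // Returns false if another operation is already in flight; a real
    // implementation would fall back to a lock-protected queue (not shown).
    public bool TryEnter() =>
        Interlocked.CompareExchange(ref _state, Busy, Idle) == Idle;

    // Releases the gate. Interlocked.Exchange is a full fence, so the next
    // owner observes writes made while the gate was held.
    public void Exit()
    {
        int previous = Interlocked.Exchange(ref _state, Idle);
        if (previous != Busy)
            throw new InvalidOperationException("Exit called without a matching TryEnter.");
    }
}

// Hypothetical per-socket state: one gate per direction, so one receive and
// one send can both take the fast path at the same time.
sealed class SocketOperationGates
{
    public SingleOperationGate Receive { get; } = new SingleOperationGate();
    public SingleOperationGate Send { get; } = new SingleOperationGate();
}
```

The trade-off being benchmarked is that interlocked instructions are not free either: they carry fence semantics, which on a weakly ordered architecture such as arm64 means real barrier instructions.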
Tagging subscribers to this area: @dotnet/ncl
This is meant for benchmarking. No need to review until benchmarks show it is worth it.
@tmds, is there any chance you could share your copy of …
Adam, here it is: System.Net.Sockets.dll.tar.gz
There appear to be slight gains on x64 and mixed gains/losses on arm64.
I've tried running a few of them, and the results seem similar (though still not very stable).
There's no reason these changes should regress performance.
From what I've seen in the profiles so far, all the … A good example is this simple micro-benchmark:

```csharp
using System.Threading;
using BenchmarkDotNet.Attributes;

public class Perf_Volatile
{
    private double _location = 0;
    private double _newValue = 1;

    // Volatile read: a plain load on x64, barrier-carrying code on arm64.
    [Benchmark]
    public double Read_double() => Volatile.Read(ref _location);

    // Volatile write: a plain store on x64, barrier-carrying code on arm64.
    [Benchmark]
    public void Write_double() => Volatile.Write(ref _location, _newValue);
}
```

arm64 environment:

```
BenchmarkDotNet=v0.12.1, OS=ubuntu 18.04
ARMv8 Processor rev 1 (v8l), 4 logical cores
.NET Core SDK=5.0.100-preview.4.20217.5
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.21702, CoreFX 5.0.20.21702), Arm64 RyuJIT
  Job-ULPSAR : .NET Core 5.0.0 (CoreCLR 5.0.20.21702, CoreFX 5.0.20.21702), Arm64 RyuJIT
```

x64 environment:

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.778 (1909/November2018Update/19H2)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.4.20217.2
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.21611, CoreFX 5.0.20.21611), X64 RyuJIT
  Job-OXOUZI : .NET Core 5.0.0 (CoreCLR 5.0.20.21611, CoreFX 5.0.20.21611), X64 RyuJIT
```
@kunalspathak, is this expected? Can we do anything about it?
Sure, I can get them for you tomorrow. BTW, when my VPN is not working, I just close everything on my PC and run the benchmark server and …
On x86/x64, the architecture's memory model is strong enough that volatile operations serve only as a compiler barrier; the emitted instructions don't differ based on whether the read or write is volatile. You can see that here: On ARM, the memory model is weaker, so volatile operations entail actual barrier instructions in the JIT'd code, e.g. `dmb`.
Echoing what @stephentoub said, one recommendation would be to look for opportunities to reduce volatile variable accesses inside loops. See #34225 for an example.
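As a generic illustration of that recommendation (not taken from #34225), the sketch below contrasts re-reading a volatile field on every loop iteration with reading it once into a local. The class `LimitedSum` and its members are hypothetical. Hoisting is only valid when a single snapshot of the field is acceptable; the hoisted loop will not observe a concurrent update made mid-loop.

```csharp
using System.Threading;

// Hypothetical example type; _limit may be updated by another thread.
sealed class LimitedSum
{
    private int _limit;

    public void SetLimit(int value) => Volatile.Write(ref _limit, value);

    // Volatile read on every iteration: on arm64 each read can cost a barrier.
    public long SumRereading(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length && i < Volatile.Read(ref _limit); i++)
            sum += data[i];
        return sum;
    }

    // One volatile read before the loop; the loop body uses a plain local.
    public long SumHoisted(int[] data)
    {
        int limit = Volatile.Read(ref _limit);
        long sum = 0;
        for (int i = 0; i < data.Length && i < limit; i++)
            sum += data[i];
        return sum;
    }
}
```

With no concurrent writer, both methods return the same result, e.g. `6` for `new[] { 1, 2, 3, 4, 5 }` after `SetLimit(3)`.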
@adamsitnik, let me make a few changes before you collect traces.
@adamsitnik, I pushed the change. If you collect perf traces on arm64 for jsonplatform with 128 connections, before and after the change, maybe we'll learn why this is regressing. Here is the compiled System.Net.Sockets: System.Net.Sockets.dll.tar.gz
I got traces from Adam and took a look. This change is trying to replace something with something else, assuming it is cheaper. But I'm not really sure it is cheaper, and I don't have a good way to measure it.
Thanks for trying, @tmds.
cc @stephentoub @adamsitnik @antonfirsov @karelz