
Conversation

tmds
Member

@tmds tmds commented May 7, 2020

This optimizes the common case where there is at most one
ongoing receive and one ongoing send operation.

cc @stephentoub @adamsitnik @antonfirsov @karelz

@ghost

ghost commented May 7, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@tmds
Member Author

tmds commented May 7, 2020

This is meant for benchmarking. No need to review until benchmarks show it is worth it.

@adamsitnik
Member

This is meant for benchmarking.

@tmds is there any chance you could share your copy of System.Net.Sockets.dll with me? I am just lazy and would like to go the easy way without compiling your fork ;)

@tmds
Member Author

tmds commented May 7, 2020

Adam, here it is: System.Net.Sockets.dll.tar.gz

@adamsitnik
Member

[image]

BTW I was looking at the JSON profile today, and we spent something around 1.5% of the total time in this particular lock

[image]

@tmds
Member Author

tmds commented May 8, 2020

It looks like slight gains on x64, and mixed gain/loss on arm64.
The numbers aren't very consistent, and they deviate a lot from 1.5% spent in the lock for JSON.
If you run these benchmarks again, will the results be similar?

@adamsitnik
Member

I've tried to run a few of them and the results seem similar (and still not super stable)

@adamsitnik
Member

I've run the benchmarks one more time, the results look very similar:

[image]

(I've interrupted the ARM run as I am finishing work for today)

@tmds
Member Author

tmds commented May 11, 2020

There's no reason these changes should regress performance.
I don't know what is going on on arm64. Do we care about figuring it out?
Adam, can you collect a perftrace on arm64 for jsonplatform with 128 connections before and after? Maybe it tells us something.

@adamsitnik
Member

I don't know what is going on on arm64.

From what I've seen in the profiles so far, all the Interlocked and Volatile operations, which seem to be almost immediate on x64, are not so cheap on ARM64.

A good example is this simple micro-benchmark:

using System.Threading;
using BenchmarkDotNet.Attributes;

public class Perf_Volatile
{
    // Field that is read and written with volatile semantics.
    private double _location = 0;
    private double _newValue = 1;

    [Benchmark]
    public double Read_double() => Volatile.Read(ref _location);

    [Benchmark]
    public void Write_double() => Volatile.Write(ref _location, _newValue);
}

BenchmarkDotNet=v0.12.1, OS=ubuntu 18.04
ARMv8 Processor rev 1 (v8l), 4 logical cores
.NET Core SDK=5.0.100-preview.4.20217.5
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.21702, CoreFX 5.0.20.21702), Arm64 RyuJIT
  Job-ULPSAR : .NET Core 5.0.0 (CoreCLR 5.0.20.21702, CoreFX 5.0.20.21702), Arm64 RyuJIT

| Method       | Mean     |
|--------------|----------|
| Read_double  | 10.34 ns |
| Write_double | 17.53 ns |

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.778 (1909/November2018Update/19H2)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.4.20217.2
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.21611, CoreFX 5.0.20.21611), X64 RyuJIT
  Job-OXOUZI : .NET Core 5.0.0 (CoreCLR 5.0.20.21611, CoreFX 5.0.20.21611), X64 RyuJIT

| Method       | Mean      |
|--------------|-----------|
| Read_double  | 0.0006 ns |
| Write_double | 0.2595 ns |

@kunalspathak is this expected? Can we do anything about it?

@adamsitnik
Member

Adam, can you collect a perftrace on arm64 for jsonplatform with 128 connections before and after?

Sure, I can get them for you tomorrow. BTW when my VPN is not working I just close everything on my PC and run the benchmark server and wrk myself. I use the wrk arguments from this file: https://github.com/aspnet/Benchmarks/blob/master/src/WrkClient/wrk.yml and it typically can give me an answer whether my change is going to improve the perf or not. I know that it is far from perfect, but it should shorten your perf feedback loop.

@stephentoub
Member

From what I've seen in the profiles so far, all the Interlocked and Volatile operations, which seem to be almost immediate on x64, are not so cheap on ARM64

On x86/64 the architecture's memory model is strong enough that volatile operations end up serving just as a compiler barrier; the actual instruction output doesn't differ based on whether the read or write is volatile. You can see that here:
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAJiBGAOgBUALWbAEwEsA7AcwG4BYAFDEAzJQBM5AMLkA3oPILKozhnIBZKgApYAM3Ir9ASnIBeAHzkAahAA22DGxswaAJRittMPW0P8BipX0OVTUxT29go1MLNj8AXyA=

On ARM, there's a weaker memory model, so volatile operations end up entailing actual barriers in the JIT'd instructions, e.g. dmb.
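
To make the ordering requirement concrete, here is a minimal sketch of the kind of pattern that relies on it. The `Mailbox` type and its members are hypothetical names for illustration and are not taken from the PR. A writer publishes a payload and then sets a flag with `Volatile.Write`; a reader checks the flag with `Volatile.Read` before touching the payload. On x64 the two volatile calls should compile to ordinary moves, while on ARM64 the JIT has to emit barriers (or acquire/release loads and stores) to preserve the same ordering.

```csharp
using System.Threading;

// Hypothetical example type, not taken from the PR.
public sealed class Mailbox
{
    private int _payload;
    private bool _ready;

    public void Publish(int value)
    {
        _payload = value;
        // Release semantics: the payload write above cannot be
        // reordered after this write to the flag.
        Volatile.Write(ref _ready, true);
    }

    public bool TryRead(out int value)
    {
        // Acquire semantics: the payload read below cannot be
        // reordered before this read of the flag.
        if (Volatile.Read(ref _ready))
        {
            value = _payload;
            return true;
        }

        value = 0;
        return false;
    }
}
```

A reader that sees `TryRead` return `true` is guaranteed to see the payload that was stored before the flag was set; that guarantee is what costs extra instructions on the weaker memory model.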

@kunalspathak
Contributor

Echoing what @stephentoub said, one recommendation would be to look for ways to reduce volatile variable accesses inside loops. See #34225 for an example.
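
As a rough sketch of that recommendation (the `Counter` type below is hypothetical and is not the code from #34225 or from this PR), the volatile read can be hoisted out of the loop into a local when the loop does not need to observe concurrent updates to the field:

```csharp
using System.Threading;

// Hypothetical example type, not taken from #34225 or this PR.
public sealed class Counter
{
    private int _limit = 1000;

    // Performs a volatile read of the shared field on every iteration.
    public long SumVolatileEachIteration()
    {
        long sum = 0;
        for (int i = 0; i < Volatile.Read(ref _limit); i++)
        {
            sum += i;
        }
        return sum;
    }

    // Performs a single volatile read, then loops over the local copy.
    public long SumVolatileOnce()
    {
        int limit = Volatile.Read(ref _limit);
        long sum = 0;
        for (int i = 0; i < limit; i++)
        {
            sum += i;
        }
        return sum;
    }
}
```

Both methods return the same result when no other thread changes `_limit`. The second form is only a valid substitute when the loop is allowed to work with a snapshot of the field.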

@tmds
Member Author

tmds commented May 12, 2020

Sure, I can get them for you tomorrow.

@adamsitnik let me make a few changes before you collect traces.

@tmds
Member Author

tmds commented May 12, 2020

@adamsitnik I pushed the change. If you collect perftraces for arm64 for jsonplatform with 128 connections before and after, maybe we'll learn why this is regressing.

This is compiled System.Net.Sockets: System.Net.Sockets.dll.tar.gz

@tmds
Member Author

tmds commented May 12, 2020

I got traces from Adam and took a look.
The variance between benchmark runs makes it impossible to derive anything meaningful.

This change is trying to replace something with something else, assuming it is cheaper. But I'm not really sure it is cheaper, and I don't have a good way to measure.
I'm giving up on this.

@tmds tmds closed this May 12, 2020
@stephentoub
Member

Thanks for trying, @tmds.

@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

6 participants