Significant performance difference between x += y and x = x + y on properties, differing between hardware and runtime version (7 / 8) #108227
Comments
On .NET 8, both forms of the addition compile to the same single instruction:
add dword ptr [rbx+0x08], r15d
On .NET 7, they compile to
mov edi, r14d
add edi, dword ptr [rbx+08H]
mov dword ptr [rbx+08H], edi
and
mov edi, dword ptr [rbx+08H]
add edi, r14d
mov dword ptr [rbx+08H], edi
respectively. https://godbolt.org/z/51ooerfGr I'm unsure why the first one would be slower. |
This could be micro-architecture-specific behavior in how memory operands are handled. A tight loop may also make it more likely that the branch predictor and out-of-order execution skew the results. On my Ice Lake-SP there's almost no difference:
Manually unrolling the loop to manipulate 8 properties in a row may also bring the two results closer together. |
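A minimal sketch of what such manual unrolling might look like, assuming it means touching eight separate int auto-properties per iteration. The class name, the property names P0..P7, and the method names are hypothetical and not from the thread; the BenchmarkDotNet attributes are left out so the class compiles on its own.

```csharp
public class UnrolledProperties
{
    // Eight hypothetical int auto-properties; the names are illustrative only.
    public int P0 { get; set; }
    public int P1 { get; set; }
    public int P2 { get; set; }
    public int P3 { get; set; }
    public int P4 { get; set; }
    public int P5 { get; set; }
    public int P6 { get; set; }
    public int P7 { get; set; }

    public static int N = 1000;

    // Compound-assignment form (x += y), unrolled across the eight properties.
    public int CompoundAdd()
    {
        for (int i = 0; i < N; i++)
        {
            P0 += i; P1 += i; P2 += i; P3 += i;
            P4 += i; P5 += i; P6 += i; P7 += i;
        }
        return P0 + P1 + P2 + P3 + P4 + P5 + P6 + P7;
    }

    // Separate read-then-write form (x = x + y), unrolled the same way.
    public int SeparateAdd()
    {
        for (int i = 0; i < N; i++)
        {
            P0 = P0 + i; P1 = P1 + i; P2 = P2 + i; P3 = P3 + i;
            P4 = P4 + i; P5 = P5 + i; P6 = P6 + i; P7 = P7 + i;
        }
        return P0 + P1 + P2 + P3 + P4 + P5 + P6 + P7;
    }
}
```

Because consecutive statements write to different fields, each iteration carries eight independent read-modify-write chains instead of one, which may be why the gap narrows. That explanation is a guess, not something measured here.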
.NET 7.0:
.NET 8.0 (same on .NET 9.0):
So it looks like everything is okay? |
@EgorBot -intel -arm64 --runtimes net7.0 net8.0 net9.0
using BenchmarkDotNet.Attributes;

public class FieldVsProperty
{
    public int Prop_ReadWrite { get; set; } = Random.Shared.Next();
    public static int N = 1000;

    [Benchmark]
    public int Property_ReadWrite_Write_Add()
    {
        for (int i = 0; i < N; i++)
        {
            Prop_ReadWrite += i;
        }
        return Prop_ReadWrite;
    }

    [Benchmark]
    public int Property_ReadWrite_Write_Add_Separate()
    {
        for (int i = 0; i < N; i++)
        {
            var val = Prop_ReadWrite;
            Prop_ReadWrite = val + i;
        }
        return Prop_ReadWrite;
    }
} |
I can reproduce the same regression on Raptor Lake:
when affinitized to E-Cores:
Apparently the Golden Cove cores are unhappy with something. The E-cores perform much better than the P-cores! |
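For anyone who wants to repeat the E-core run, one possible way to pin a .NET process to a subset of logical processors is Process.ProcessorAffinity, which is supported on Windows and Linux. This is only a sketch: the AffinityHelper class and its method names are hypothetical, and which logical processor indices map to E-cores is machine-specific and has to be looked up for the CPU in question.

```csharp
using System;
using System.Diagnostics;

public static class AffinityHelper
{
    // Builds a bitmask with `count` consecutive bits set, starting at logical processor `first`.
    public static long MaskForRange(int first, int count)
    {
        long mask = 0;
        for (int cpu = first; cpu < first + count; cpu++)
        {
            mask |= 1L << cpu;
        }
        return mask;
    }

    // Restricts the current process to the logical processors selected by `mask`.
    // Assumption: the caller has checked that these processors exist on the machine.
    public static void PinCurrentProcess(long mask)
    {
        Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)mask;
    }
}
```

As an example under an assumed layout where logical processors 16 through 31 are the E-cores, calling AffinityHelper.PinCurrentProcess(AffinityHelper.MaskForRange(16, 16)) before the benchmark starts would keep the run off the P-cores.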
I remember reading something in Intel's optimization guide that newer CPU models will fuse |
Your results exclude .NET 7. I'm wondering whether you see the same bump in execution time for |
I just ran it on what I have on my machine. 6.0 stands in for the pre-8.0 runtimes, which don't include the codegen change.
The behavior seems consistent for each micro-architecture. Ice Lake-SP, Gracemont: everything looks fine. |
@BruceForstall, PTAL when we get Meteor Lake laptops this year. cc @dotnet/jit-contrib. |
What architecture were these run on? Yours are the only results I've seen that mimic my system. |
I remember Egor uses R9-7950X. It's also Zen 4. |
(I originally detailed this issue on Stack Overflow, here.)
Description
The following two snippets produce wildly different benchmark results from each other, as well as across different machines and major runtime versions (where SomeProperty is an int auto-property):
The benchmark (below), when run on my machine, showed poor performance for the former case on .NET 7 but otherwise expected results. Two others ran the benchmarks and saw poor performance for both cases on .NET 8 but not on .NET 7. The host version did not appear to make a difference in these cases. I've included the benchmark results and system configurations below the benchmark code.
Potentially relevant (but not directly related): I've noticed, but not been able to isolate, significant performance issues when setting data through a native memory pointer obtained by mapping a Direct3D sub-resource. These issues weren't present on .NET 8, or on any of my colleagues' machines on .NET 7. They appear to be more strongly linked to the number of assignments through the pointer than to the amount of data assigned.
Benchmark
Data
My machine (also ran this on my Arch Linux install, with no notable difference):
The two other machines:
Analysis
The most notable difference is between the CPU vendors, but the data is pretty limited.