[Perf] Windows/x64: 4 Regressions on 10/27/2024 3:01:56 PM #109347

performanceautofiler bot opened this issue Oct 29, 2024 · 6 comments
Labels: arch-x64, area-System.Collections, os-windows, runtime-coreclr, tenet-performance, tenet-performance-benchmarks

@performanceautofiler

Run Information

| Name | Value |
| --- | --- |
| Architecture | x64 |
| OS | Windows 10.0.22621 |
| Queue | TigerWindows |
| Baseline | 601753aeed1e3454328a23c7e8ab4a3b3277e87b |
| Compare | e4fe27d8be89ac6805d9f2d4e92295a3322364a9 |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Collections.ContainsTrue<String>

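| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 174.53 μs | 284.86 μs | 1.63 | 0.44 | False | | | |
| | 249.64 μs | 284.25 μs | 1.14 | 0.13 | False | | | |
| | 248.22 μs | 283.71 μs | 1.14 | 0.14 | False | | | |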

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.ContainsTrue<String>*'

System.Collections.ContainsTrue<String>.Stack(Size: 512)

System.Collections.ContainsTrue<String>.ICollection(Size: 512)

System.Collections.ContainsTrue<String>.List(Size: 512)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Regressions in System.Collections.IterateForEach<String>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 26.17 μs | 33.99 μs | 1.30 | 0.13 | False | | | |

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<String>*'

System.Collections.IterateForEach<String>.ImmutableDictionary(Size: 512)


performanceautofiler bot added the arch-x64, os-windows, runtime-coreclr, and untriaged labels on Oct 29, 2024
DrewScoggins transferred this issue from dotnet/perf-autofiling-issues on Oct 29, 2024
DrewScoggins added the tenet-performance and tenet-performance-benchmarks labels on Oct 29, 2024
jeffhandley removed the untriaged label on Jan 16, 2025
jeffhandley added this to the 10.0.0 milestone on Jan 16, 2025
@AndyAyersMS
Member

AndyAyersMS commented Feb 15, 2025

Current collation

| Notes | Recent Score | Orig Score | Per-machine values (Tiger Ubuntu, Tiger Windows, Surface Windows, Viper Ubuntu, Viper Windows; empty cells not preserved) | Benchmark |
| --- | --- | --- | --- | --- |
| noise | 1.41 | 1.42 | 1.40 / 1.63 | System.Collections.ContainsTrue(String).Stack(Size: 512) |
| see below | 1.21 | 1.34 | 1.18 / 1.30; 1.01 / 1.30; 1.00 / 1.22; 1.58 / 1.53; 1.42 / 1.40 | System.Collections.IterateForEach(String).ImmutableDictionary(Size: 512) |
| see below | 1.18 | 1.19 | 1.26 / 1.32 | System.Collections.AddGivenSize(String).List(Size: 512) |
| bimodal | 1.15 | 0.81 | 1.15 / 0.81 | System.Collections.ContainsFalse(String).List(Size: 512) |
| fixed | 1.15 | 1.20 | 1.15 / 1.20 | System.Collections.Tests.Add_Remove_SteadyState(String).ConcurrentBag(Count: 512) |
| bimodal | 1.10 | 1.38 | 1.46 / 1.67; 0.82 / 1.14 | System.Collections.ContainsTrue(String).List(Size: 512) |
| | 1.01 | 0.94 | 1.06 / 0.93; 0.92 / 0.94; 1.05 / 0.94 | System.Collections.CreateAddAndRemove(String).LinkedList(Size: 512) |
| | 1.00 | 0.83 | 1.00 / 0.83 | System.Collections.CreateAddAndClear(String).Queue(Size: 512) |
| | 1.00 | 0.92 | 1.00 / 0.92 | System.Collections.ContainsTrueComparer(String).HashSet(Size: 512) |
| | 0.99 | 0.85 | 0.99 / 0.85 | System.Collections.ContainsFalse(String).ImmutableArray(Size: 512) |
| | 0.98 | 0.84 | 0.98 / 0.84 | System.Collections.ContainsTrue(String).ImmutableArray(Size: 512) |
| | 0.97 | 0.91 | 0.97 / 0.91 | System.Collections.ContainsTrueComparer(String).FrozenSet(Size: 512) |
| | 0.95 | 1.04 | 1.47 / 1.66; 0.62 / 0.65 | System.Collections.ContainsTrue(String).Queue(Size: 512) |
| | 0.95 | 1.36 | 1.42 / 1.63; 0.63 / 1.14 | System.Collections.ContainsTrue(String).ICollection(Size: 512) |
| | 0.92 | 1.15 | 0.92 / 1.15 | System.Collections.IterateForEach(String).ImmutableHashSet(Size: 512) |
| | 0.89 | 0.82 | 0.89 / 0.82 | System.Collections.Concurrent.IsEmpty(String).Bag(Size: 512) |
| | 0.88 | 0.74 | 0.67 / 0.67; 1.16 / 0.82 | System.Collections.ContainsFalse(String).ICollection(Size: 512) |
| | 0.88 | 0.81 | 0.88 / 0.81 | System.Collections.Concurrent.IsEmpty(String).Bag(Size: 0) |
| | 0.85 | 0.85 | 0.89 / 0.86; 0.81 / 0.84 | System.Collections.CtorDefaultSize(String).SortedSet |
| | 0.85 | 0.86 | 0.88 / 0.88; 0.84 / 0.88; 0.82 / 0.83 | System.Collections.CtorDefaultSize(String).Queue |
| | 0.83 | 0.88 | 0.83 / 0.88 | System.Collections.CtorDefaultSize(String).HashSet |
| | 0.83 | 0.65 | 1.01 / 0.65; 0.68 / 0.65 | System.Collections.ContainsFalse(String).Queue(Size: 512) |
| | 0.83 | 0.86 | 0.85 / 0.88; 0.85 / 0.87; 0.78 / 0.84 | System.Collections.CtorDefaultSize(String).List |
| | 0.81 | 0.83 | 0.81 / 0.83 | System.Collections.CtorDefaultSize(String).Stack |
| | 0.77 | 0.78 | 0.70 / 0.70; 0.76 / 0.80; 0.86 / 0.84; 0.76 / 0.77 | System.Collections.CtorDefaultSize(String).SortedList |
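
The Recent Score and Orig Score columns look consistent with a geometric mean of the per-machine ratios in the same row. That is an inference from the numbers, not something stated in this thread. A sketch checking it on the IterateForEach(String).ImmutableDictionary row, assuming (unconfirmed) that its ten per-machine values alternate recent, original for each machine:

```python
import math


def geomean(values):
    """Geometric mean of a sequence of positive ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Per-machine values from the ImmutableDictionary row, in the order listed.
# Assumption: they alternate (recent, original) per machine.
cells = [1.18, 1.30, 1.01, 1.30, 1.00, 1.22, 1.58, 1.53, 1.42, 1.40]
recent, orig = cells[0::2], cells[1::2]

print(f"recent ~ {geomean(recent):.3f}, orig ~ {geomean(orig):.3f}")
```

Under that assumption both results land within about 0.01 of the reported 1.21 and 1.34. The Stack and AddGivenSize rows do not fit from the listed values alone, so some per-machine data may be missing from the table.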

@AndyAyersMS
Member

System.Collections.IterateForEach(String).ImmutableDictionary(Size: 512)

This is mostly remedied, except on Zen 4 on Windows.

Image

System.Collections.AddGivenSize(String).List(Size: 512)

Mostly remedied, except on Intel Linux.

Image

@AndyAyersMS
Member

For AddGivenSize, inner loops are similar but .NET 9 has better alignment (despite there being a call in the loop). Wonder if that's it...

Current mainline offsets are higher because we inline the list ctor, which accesses a shared static field (which is what the blamed PR enables).

;; .NET 9

G_M24367_IG05:        ; offs=0x000053, size=0x0023, bbWeight=512.09, gcrefRegs=8008 {rbx r15}, byrefRegs=4000 {r14}, BB05 [0001], BB06 [0010], byref, isz
IN0012: 000053 mov      rdx, gword ptr [r14]
                            ; gcrRegs +[rdx]
IN0013: 000056 inc      dword ptr [r15+0x14]
IN0014: 00005A mov      rdi, gword ptr [r15+0x08]
                            ; gcrRegs +[rdi]
IN0015: 00005E mov      esi, dword ptr [r15+0x10]
IN0016: 000062 cmp      dword ptr [rdi+0x08], esi
IN0017: 000065 jbe      SHORT G_M24367_IG10
IN0018: 000067 lea      eax, [rsi+0x01]
IN0019: 00006A mov      dword ptr [r15+0x10], eax
IN001a: 00006E movsxd   rsi, esi
IN001b: 000071 call     CORINFO_HELP_ARRADDR_ST
                            ; gcrRegs -[rdx rdi]
                            ; gcr arg pop 0
						;; size=35 bbWeight=512.09 PerfScore 8065.38
G_M24367_IG06:        ; offs=0x000076, size=0x0009, bbWeight=512.09, gcrefRegs=8008 {rbx r15}, byrefRegs=4000 {r14}, BB07 [0012], byref, isz
IN001c: 000076 add      r14, 8
IN001d: 00007A dec      r13d
IN001e: 00007D jne      SHORT G_M24367_IG05

versus

;; Current mainline

G_M24367_IG08:        ; offs=0x00008F, size=0x0024, bbWeight=514.51, gcrefRegs=8008 {rbx r15}, byrefRegs=4000 {r14}, BB07 [0001], BB08 [0012], byref
IN0020: 00008F mov      rdx, gword ptr [r14]
                            ; gcrRegs +[rdx]
IN0021: 000092 inc      dword ptr [r15+0x14]
IN0022: 000096 mov      rdi, gword ptr [r15+0x08]
                            ; gcrRegs +[rdi]
IN0023: 00009A mov      esi, dword ptr [r15+0x10]
IN0024: 00009E cmp      dword ptr [rdi+0x08], esi
IN0025: 0000A1 jbe      G_M24367_IG18
IN0026: 0000A7 lea      eax, [rsi+0x01]
IN0027: 0000AA mov      dword ptr [r15+0x10], eax
IN0028: 0000AE call     CORINFO_HELP_ARRADDR_ST
                            ; gcrRegs -[rdx rdi]
                            ; gcr arg pop 0
						;; size=36 bbWeight=514.51 PerfScore 7974.97
G_M24367_IG09:        ; offs=0x0000B3, size=0x0009, bbWeight=514.51, gcrefRegs=8008 {rbx r15}, byrefRegs=4000 {r14}, BB10 [0014], byref, isz
IN0029: 0000B3 add      r14, 8
IN002a: 0000B7 dec      r13d
IN002b: 0000BA jne      SHORT G_M24367_IG08

Let's see if we can easily enable alignment padding for loops with calls...
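
One rough way to compare the two layouts is to count how many aligned code blocks each loop body spans, and how much padding would be needed to start the loop on a block boundary. This sketch assumes the method entry is itself aligned to the block size, so that method-relative offsets reflect real alignment. It uses 16- and 32-byte blocks as illustrative sizes. Neither assumption comes from this thread, and the helper names are hypothetical:

```python
def blocks_spanned(start: int, size: int, block: int) -> int:
    """Number of aligned `block`-byte chunks touched by [start, start + size)."""
    first = start // block
    last = (start + size - 1) // block
    return last - first + 1


def padding_to_align(start: int, block: int) -> int:
    """Bytes of padding needed before `start` to reach the next block boundary."""
    return -start % block


# (loop start offset, total size of the two loop blocks) from the listings above
loops = {
    ".NET 9": (0x53, 0x23 + 0x09),    # IG05 + IG06
    "mainline": (0x8F, 0x24 + 0x09),  # IG08 + IG09
}

for name, (start, size) in loops.items():
    for block in (16, 32):
        print(
            f"{name}: {block}-byte blocks spanned = "
            f"{blocks_spanned(start, size, block)}, "
            f"padding to align = {padding_to_align(start, block)}"
        )
```

Under those assumptions the .NET 9 loop touches three 16-byte blocks versus four for mainline, while both span two 32-byte blocks.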

@AndyAyersMS
Member

Some of these improved with #114191, e.g.:
Image

@AndyAyersMS
Member

Going to close now, given that this is down to one benchmark on only one HW config.
