Expose an internal ISimdVector interface and begin using it to deduplicate some SIMD code #90764

Merged (9 commits) on Oct 3, 2023

tannergooding (Member) commented

This is just a revival of #76423 now that we're early in .NET 9.

The intent is still to keep this interface internal for the time being and give us time to ensure it is working as expected and with the necessary API surface. Once that has happened, we can move towards an API proposal that makes it public and available to all.

I've only done one API for now with the plan that anyone can help with the broader simplification once this initial PR is in.

ghost commented Aug 17, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Author: tannergooding
Assignees: tannergooding
Labels: area-System.Runtime.Intrinsics

Milestone: -

tannergooding marked this pull request as ready for review August 17, 2023 19:37

tannergooding (Member, Author) commented

CC. @SamMonoRT, @vargaz

This would be the start of some more static virtual in interface usage, this time around SIMD/Vectorization, allowing us to deduplicate the 2-4 code paths down to using 1 shared implementation.

We'd like to get this work started here early in .NET 9 and want to help ensure that it isn't causing unexpected regressions for Mono due to the additional generics or code patterns encountered.

What's the ideal way for us to work together on this to help ensure it can successfully land?
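
To make the deduplication idea concrete, here is a minimal sketch rather than the PR's actual code. `IVectorLike`, `V128Int`, `V256Int`, and `SharedSearch` are hypothetical names invented for this illustration; only the `Vector128`/`Vector256` calls are existing APIs. It assumes .NET 7 or later (static abstract interface members), and shows one generic search loop serving two vector widths.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical stand-in for the internal ISimdVector<TSelf, T>; it declares
// only the members this sketch needs and is not the PR's API surface.
interface IVectorLike<TSelf, T>
    where TSelf : IVectorLike<TSelf, T>
{
    static abstract int Count { get; }
    static abstract TSelf Create(T value);
    static abstract TSelf Load(ReadOnlySpan<T> source);
    static abstract bool EqualsAny(TSelf left, TSelf right);
}

// Illustrative wrapper over Vector128<int>.
readonly struct V128Int : IVectorLike<V128Int, int>
{
    private readonly Vector128<int> _value;
    private V128Int(Vector128<int> value) => _value = value;

    public static int Count => Vector128<int>.Count;
    public static V128Int Create(int value) => new(Vector128.Create(value));
    public static V128Int Load(ReadOnlySpan<int> source) => new(Vector128.Create<int>(source));
    public static bool EqualsAny(V128Int left, V128Int right) => Vector128.EqualsAny(left._value, right._value);
}

// Illustrative wrapper over Vector256<int>.
readonly struct V256Int : IVectorLike<V256Int, int>
{
    private readonly Vector256<int> _value;
    private V256Int(Vector256<int> value) => _value = value;

    public static int Count => Vector256<int>.Count;
    public static V256Int Create(int value) => new(Vector256.Create(value));
    public static V256Int Load(ReadOnlySpan<int> source) => new(Vector256.Create<int>(source));
    public static bool EqualsAny(V256Int left, V256Int right) => Vector256.EqualsAny(left._value, right._value);
}

static class SharedSearch
{
    // One implementation; the vector width is chosen by the type argument.
    public static bool Contains<TVector>(ReadOnlySpan<int> span, int value)
        where TVector : IVectorLike<TVector, int>
    {
        TVector target = TVector.Create(value);
        int i = 0;

        // Vectorized portion: compare TVector.Count elements per iteration.
        for (; i + TVector.Count <= span.Length; i += TVector.Count)
        {
            if (TVector.EqualsAny(TVector.Load(span.Slice(i)), target))
            {
                return true;
            }
        }

        // Scalar tail for the remaining elements.
        for (; i < span.Length; i++)
        {
            if (span[i] == value)
            {
                return true;
            }
        }

        return false;
    }
}
```

A caller would pick the width at the call site, for example `SharedSearch.Contains<V256Int>(data, 42)`, while the loop body is written once.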

vargaz (Contributor) commented Aug 17, 2023

Would need some early perf testing to see if mono can see through these added interfaces/generics and generate the same code as before.

tannergooding (Member, Author) commented

For RyuJIT, we're in the general realm of noise, with around 0.5ns difference between the before/after:

// * Summary *

BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.23521.1000)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 8.0.100-preview.7.23376.3
  [Host]     : .NET 8.0.0 (8.0.23.37506), X64 RyuJIT AVX2
  Job-CSGULO : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  Toolchain=CoreRun
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15
WarmupCount=1

### Base

|                   Method | Size |     Mean |     Error |    StdDev |   Median |      Min |      Max | Allocated |
|------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| LastIndexOfValue (byte)  |    4 | 1.066 ns | 0.0171 ns | 0.0143 ns | 1.059 ns | 1.055 ns | 1.103 ns |         - |
| LastIndexOfValue (byte)  |   33 | 1.246 ns | 0.0043 ns | 0.0038 ns | 1.247 ns | 1.240 ns | 1.254 ns |         - |
| LastIndexOfValue (byte)  |  512 | 2.023 ns | 0.0416 ns | 0.0389 ns | 2.004 ns | 1.985 ns | 2.095 ns |         - |
| LastIndexOfValue (char)  |    4 | 1.239 ns | 0.0162 ns | 0.0144 ns | 1.232 ns | 1.226 ns | 1.270 ns |         - |
| LastIndexOfValue (char)  |   33 | 1.636 ns | 0.0268 ns | 0.0251 ns | 1.633 ns | 1.606 ns | 1.689 ns |         - |
| LastIndexOfValue (char)  |  512 | 4.376 ns | 0.0544 ns | 0.0509 ns | 4.355 ns | 4.315 ns | 4.477 ns |         - |
| LastIndexOfValue (int32) |    4 | 1.230 ns | 0.0087 ns | 0.0073 ns | 1.226 ns | 1.222 ns | 1.243 ns |         - |
| LastIndexOfValue (int32) |   33 | 1.769 ns | 0.0284 ns | 0.0252 ns | 1.761 ns | 1.743 ns | 1.820 ns |         - |
| LastIndexOfValue (int32) |  512 | 9.533 ns | 0.0473 ns | 0.0442 ns | 9.530 ns | 9.460 ns | 9.602 ns |         - |

### Diff

|                   Method | Size |       Mean |     Error |    StdDev |     Median |       Min |        Max | Allocated |
|------------------------- |----- |-----------:|----------:|----------:|-----------:|----------:|-----------:|----------:|
| LastIndexOfValue (byte)  |    4 |  0.9506 ns | 0.0255 ns | 0.0238 ns |  0.9418 ns | 0.9258 ns |  0.9812 ns |         - |
| LastIndexOfValue (byte)  |   33 |  1.3180 ns | 0.0178 ns | 0.0167 ns |  1.3130 ns | 1.3018 ns |  1.3483 ns |         - |
| LastIndexOfValue (byte)  |  512 |  2.5872 ns | 0.0213 ns | 0.0189 ns |  2.5877 ns | 2.5574 ns |  2.6173 ns |         - |
| LastIndexOfValue (char)  |    4 |  0.9461 ns | 0.0138 ns | 0.0129 ns |  0.9443 ns | 0.9286 ns |  0.9666 ns |         - |
| LastIndexOfValue (char)  |   33 |  1.4333 ns | 0.0043 ns | 0.0038 ns |  1.4346 ns | 1.4271 ns |  1.4386 ns |         - |
| LastIndexOfValue (char)  |  512 |  4.0203 ns | 0.1462 ns | 0.1684 ns |  4.0211 ns | 3.8272 ns |  4.2897 ns |         - |
| LastIndexOfValue (int32) |    4 |  1.2600 ns | 0.0125 ns | 0.0111 ns |  1.2570 ns | 1.2510 ns |  1.2840 ns |         - |
| LastIndexOfValue (int32) |   33 |  1.6230 ns | 0.0124 ns | 0.0104 ns |  1.6190 ns | 1.6150 ns |  1.6500 ns |         - |
| LastIndexOfValue (int32) |  512 | 10.1680 ns | 0.1225 ns | 0.1023 ns | 10.1990 ns | 9.9440 ns | 10.2850 ns |         - |

tannergooding (Member, Author) commented Aug 18, 2023

For Mono Interpreter we're actually faster, nearly across the board (a minor 1ns diff for int32, size 33). This would be because the impl did change slightly for V128/V256 to return a bool rather than a vector (the pattern that's better across all platforms, when considering things like Arm64 with no efficient move mask and AVX-512 which has explicit mask registers).

// * Summary *

BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.23521.1000)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 8.0.100-rc.2.23417.14
  [Host]     : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2
  Job-BXPLJC : .NET 8.0.0 (42.42.42.42424) using MonoVM, X64 AOT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  Toolchain=CoreRun
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15
WarmupCount=1

### Base

|                   Method | Size |      Mean |     Error |    StdDev |    Median |       Min |       Max | Allocated |
|------------------------- |----- |----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| LastIndexOfValue (byte)  |    4 |  32.02 ns |  2.904 ns |  3.345 ns |  33.00 ns |  26.73 ns |  36.61 ns |         - |
| LastIndexOfValue (byte)  |   33 |  62.24 ns |  2.440 ns |  2.506 ns |  62.33 ns |  56.24 ns |  67.19 ns |         - |
| LastIndexOfValue (byte)  |  512 | 521.76 ns | 32.488 ns | 37.414 ns | 508.13 ns | 467.27 ns | 597.32 ns |         - |
| LastIndexOfValue (char)  |    4 |  30.31 ns |  2.361 ns |  2.719 ns |  30.28 ns |  26.23 ns |  35.48 ns |         - |
| LastIndexOfValue (char)  |   33 |  67.86 ns |  5.012 ns |  5.771 ns |  67.81 ns |  57.76 ns |  78.49 ns |         - |
| LastIndexOfValue (char)  |  512 | 589.03 ns | 82.138 ns | 94.590 ns | 539.64 ns | 488.08 ns | 773.72 ns |         - |
| LastIndexOfValue (int32) |    4 |  32.12 ns |  3.439 ns |  3.960 ns |  30.95 ns |  26.49 ns |  37.78 ns |         - |
| LastIndexOfValue (int32) |   33 |  63.19 ns |  3.679 ns |  4.089 ns |  62.47 ns |  57.04 ns |  71.81 ns |         - |
| LastIndexOfValue (int32) |  512 | 578.06 ns | 46.895 ns | 52.124 ns | 572.31 ns | 519.06 ns | 697.19 ns |         - |

### Diff

|                   Method | Size |      Mean |      Error |     StdDev |    Median |       Min |       Max | Allocated |
|------------------------- |----- |----------:|-----------:|-----------:|----------:|----------:|----------:|----------:|
| LastIndexOfValue (byte)  |    4 |  29.74 ns |   2.382 ns |   2.744 ns |  29.84 ns |  26.24 ns |  35.40 ns |         - |
| LastIndexOfValue (byte)  |   33 |  60.50 ns |   5.339 ns |   5.934 ns |  58.54 ns |  50.87 ns |  72.78 ns |         - |
| LastIndexOfValue (byte)  |  512 | 529.91 ns |  43.949 ns |  48.849 ns | 522.20 ns | 480.33 ns | 645.70 ns |         - |
| LastIndexOfValue (char)  |    4 |  28.31 ns |   2.181 ns |   2.512 ns |  28.32 ns |  24.37 ns |  32.82 ns |         - |
| LastIndexOfValue (char)  |   33 |  63.74 ns |   6.143 ns |   7.074 ns |  66.13 ns |  53.17 ns |  76.21 ns |         - |
| LastIndexOfValue (char)  |  512 | 570.93 ns | 104.852 ns | 120.748 ns | 506.97 ns | 463.47 ns | 808.09 ns |         - |
| LastIndexOfValue (int32) |    4 |  31.15 ns |   2.701 ns |   3.110 ns |  31.41 ns |  25.69 ns |  35.93 ns |         - |
| LastIndexOfValue (int32) |   33 |  65.65 ns |   7.074 ns |   8.147 ns |  64.64 ns |  55.26 ns |  79.10 ns |         - |
| LastIndexOfValue (int32) |  512 | 520.08 ns |  55.150 ns |  61.299 ns | 492.92 ns | 454.80 ns | 643.00 ns |         - |
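
The bool-versus-vector distinction mentioned in this comment can be sketched as follows. This is an illustration only, not the code from the PR; `MatchShapes` and its method names are made up here, while `Vector128.Equals`, `ExtractMostSignificantBits`, and `Vector128.EqualsAny` are existing .NET APIs.

```csharp
using System.Runtime.Intrinsics;

static class MatchShapes
{
    // Vector-returning shape: compare, materialize a per-element bitmask,
    // then test the mask. This relies on a move-mask style operation.
    public static bool AnyEqualViaMask(Vector128<int> left, Vector128<int> right)
        => Vector128.Equals(left, right).ExtractMostSignificantBits() != 0;

    // Bool-returning shape: ask the question directly and let the runtime
    // choose whatever sequence suits the target hardware.
    public static bool AnyEqualDirect(Vector128<int> left, Vector128<int> right)
        => Vector128.EqualsAny(left, right);
}
```

Both methods answer the same question; the second simply avoids committing the caller to a mask representation when only a yes/no result is needed.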

tannergooding (Member, Author) commented

MonoJIT also shows better numbers:

// * Summary *

BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.23521.1000)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 8.0.100-rc.2.23417.14
  [Host]     : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2
  Job-ZRIKCY : .NET 8.0.0 (42.42.42.42424) using MonoVM, X64 VectorSize=128

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  Too
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15
WarmupCount=1

### Base

|                   Method | Size |       Mean |     Error |    StdDev |     Median |        Min |        Max | Allocated |
|------------------------- |----- |-----------:|----------:|----------:|-----------:|-----------:|-----------:|----------:|
| LastIndexOfValue (byte)  |    4 |   8.196 ns | 0.0882 ns | 0.0782 ns |   8.153 ns |   8.140 ns |   8.363 ns |         - |
| LastIndexOfValue (byte)  |   33 |  15.131 ns | 0.1061 ns | 0.0828 ns |  15.111 ns |  15.026 ns |  15.280 ns |         - |
| LastIndexOfValue (byte)  |  512 | 113.619 ns | 0.7712 ns | 0.6021 ns | 113.602 ns | 112.858 ns | 115.000 ns |         - |
| LastIndexOfValue (char)  |    4 |   8.166 ns | 0.0775 ns | 0.0687 ns |   8.133 ns |   8.110 ns |   8.339 ns |         - |
| LastIndexOfValue (char)  |   33 |  16.969 ns | 0.0993 ns | 0.0929 ns |  16.966 ns |  16.818 ns |  17.162 ns |         - |
| LastIndexOfValue (char)  |  512 | 147.536 ns | 0.4763 ns | 0.4455 ns | 147.674 ns | 146.757 ns | 148.080 ns |         - |
| LastIndexOfValue (int32) |    4 |   7.937 ns | 0.1026 ns | 0.0959 ns |   7.897 ns |   7.830 ns |   8.151 ns |         - |
| LastIndexOfValue (int32) |   33 |  17.502 ns | 0.1111 ns | 0.0928 ns |  17.490 ns |  17.361 ns |  17.668 ns |         - |
| LastIndexOfValue (int32) |  512 | 146.433 ns | 0.9018 ns | 0.8435 ns | 146.686 ns | 145.116 ns | 148.024 ns |         - |

### Diff

|                   Method | Size |       Mean |     Error |    StdDev |     Median |        Min |        Max | Allocated |
|------------------------- |----- |-----------:|----------:|----------:|-----------:|-----------:|-----------:|----------:|
| LastIndexOfValue (byte)  |    4 |   7.438 ns | 0.0183 ns | 0.0153 ns |   7.441 ns |   7.399 ns |   7.463 ns |         - |
| LastIndexOfValue (byte)  |   33 |  14.829 ns | 0.1298 ns | 0.1215 ns |  14.802 ns |  14.676 ns |  15.057 ns |         - |
| LastIndexOfValue (byte)  |  512 | 113.664 ns | 0.4293 ns | 0.3805 ns | 113.816 ns | 113.011 ns | 114.119 ns |         - |
| LastIndexOfValue (char)  |    4 |   8.309 ns | 0.0632 ns | 0.0591 ns |   8.307 ns |   8.232 ns |   8.451 ns |         - |
| LastIndexOfValue (char)  |   33 |  16.565 ns | 0.1313 ns | 0.1164 ns |  16.551 ns |  16.331 ns |  16.734 ns |         - |
| LastIndexOfValue (char)  |  512 | 143.178 ns | 1.0673 ns | 0.9983 ns | 142.958 ns | 141.936 ns | 145.325 ns |         - |
| LastIndexOfValue (int32) |    4 |   8.167 ns | 0.0509 ns | 0.0451 ns |   8.181 ns |   8.071 ns |   8.215 ns |         - |
| LastIndexOfValue (int32) |   33 |  17.548 ns | 0.1093 ns | 0.1022 ns |  17.540 ns |  17.405 ns |  17.771 ns |         - |
| LastIndexOfValue (int32) |  512 | 142.449 ns | 0.6319 ns | 0.5602 ns | 142.442 ns | 141.321 ns | 143.266 ns |         - |

tannergooding (Member, Author) commented

@vargaz, any other scenarios that need testing? The numbers above are looking good to me.

vargaz (Contributor) commented Aug 18, 2023

The AOT configurations are usually the problematic ones.

tannergooding (Member, Author) commented

MonoAOT also looks good, with similar results to the above, where it's either marginally faster or the same perf overall:

// * Summary *

BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.23526.1000)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 8.0.100-rc.2.23420.6
  [Host]     : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2
  Job-CMAJUW : .NET 8.0.0 (42.42.42.42424) using MonoVM, X64 VectorSize=128

PowerPlanMode=00000000-0000-0000-0000-000000000000  Runtime=MonoAOTLLVM  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true
Toolchain=MonoAOTLLVM  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

### Base

|                   Method | Size |       Mean |     Error |    StdDev |     Median |        Min |        Max | Allocated |
|------------------------- |----- |-----------:|----------:|----------:|-----------:|-----------:|-----------:|----------:|
| LastIndexOfValue (byte)  |    4 |   7.327 ns | 0.0162 ns | 0.0151 ns |   7.325 ns |   7.302 ns |   7.356 ns |         - |
| LastIndexOfValue (byte)  |   33 |  15.001 ns | 0.0783 ns | 0.0694 ns |  15.002 ns |  14.879 ns |  15.081 ns |         - |
| LastIndexOfValue (byte)  |  512 | 113.344 ns | 0.5977 ns | 0.5591 ns | 113.188 ns | 112.769 ns | 114.428 ns |         - |
| LastIndexOfValue (char)  |    4 |   7.812 ns | 0.0737 ns | 0.0689 ns |   7.786 ns |   7.721 ns |   7.936 ns |         - |
| LastIndexOfValue (char)  |   33 |  16.684 ns | 0.1147 ns | 0.0958 ns |  16.641 ns |  16.615 ns |  16.933 ns |         - |
| LastIndexOfValue (char)  |  512 | 147.363 ns | 1.4799 ns | 1.3843 ns | 146.855 ns | 145.794 ns | 149.970 ns |         - |
| LastIndexOfValue (int32) |    4 |   7.908 ns | 0.0145 ns | 0.0121 ns |   7.910 ns |   7.886 ns |   7.925 ns |         - |
| LastIndexOfValue (int32) |   33 |  16.952 ns | 0.1194 ns | 0.1117 ns |  16.958 ns |  16.747 ns |  17.143 ns |         - |
| LastIndexOfValue (int32) |  512 | 145.627 ns | 1.1446 ns | 1.0707 ns | 145.862 ns | 144.230 ns | 147.550 ns |         - |

### Diff

|                   Method | Size |       Mean |     Error |    StdDev |     Median |        Min |        Max | Allocated |
|------------------------- |----- |-----------:|----------:|----------:|-----------:|-----------:|-----------:|----------:|
| LastIndexOfValue (byte)  |    4 |   7.119 ns | 0.0787 ns | 0.0736 ns |   7.132 ns |   7.011 ns |   7.224 ns |         - |
| LastIndexOfValue (byte)  |   33 |  13.882 ns | 0.1314 ns | 0.1229 ns |  13.857 ns |  13.671 ns |  14.056 ns |         - |
| LastIndexOfValue (byte)  |  512 | 112.549 ns | 1.3383 ns | 1.2518 ns | 113.128 ns | 108.947 ns | 113.848 ns |         - |
| LastIndexOfValue (char)  |    4 |   7.825 ns | 0.0605 ns | 0.0566 ns |   7.793 ns |   7.780 ns |   7.936 ns |         - |
| LastIndexOfValue (char)  |   33 |  16.134 ns | 0.1402 ns | 0.1311 ns |  16.053 ns |  16.021 ns |  16.425 ns |         - |
| LastIndexOfValue (char)  |  512 | 138.821 ns | 2.8544 ns | 3.2871 ns | 137.619 ns | 135.536 ns | 143.869 ns |         - |
| LastIndexOfValue (int32) |    4 |   7.733 ns | 0.0361 ns | 0.0338 ns |   7.749 ns |   7.660 ns |   7.770 ns |         - |
| LastIndexOfValue (int32) |   33 |  16.477 ns | 0.0203 ns | 0.0180 ns |  16.480 ns |  16.436 ns |  16.504 ns |         - |
| LastIndexOfValue (int32) |  512 | 144.310 ns | 0.9878 ns | 0.9240 ns | 144.197 ns | 142.791 ns | 145.726 ns |         - |

tannergooding (Member, Author) commented

@vargaz, MonoJIT, MonoAOT, and MonoInterpreter all look good.

There are some iOS/tvOS failures in System.Runtime, however, and I don't have any hardware on which to test that config.

SamMonoRT (Member) commented Sep 7, 2023

@vargaz @ivanpovazan @fanyang-mono - please can you help analyze the iOS/tvOS failures to unblock the PR (#90764 (comment))

@tannergooding - thanks for the above performance numbers for Mono configs. Did you use @LoopedBard3's new script? (Just looking for more validation of that effort.)

ivanpovazan (Member) commented Sep 8, 2023

Analysis

To verify the correctness, I ran the System.Runtime tests locally on ios-arm64 with:

./dotnet.sh build -c Release src/libraries/System.Runtime/tests/System.Runtime.Tests.csproj -p:TargetOS=ios -p:TargetArchitecture=arm64 -t:Test -p:MonoEnableLLVM=true

and all tests are passing on the device.

All the CI failures on ios-arm64 and tvos-arm64 report:

[195/196] System.Private.Xml.dll -> System.Private.Xml.dll.s, System.Private.Xml.dll-llvm.o, System.Private.Xml.aotdata
ERROR: WORKLOAD TIMED OUT - Killing user command..

This results in the app build process being cancelled and the tests never being executed. As this seems to be a compilation time problem, as a workaround I would suggest bumping the timeout value to verify whether the tests are passing on the CI.


I have further investigated the compilation times.
This change seems to introduce more generics for the AOT compilation, which mostly affects the compilation time of aot-instances.dll, the artificial assembly used for compiling all specific instantiations (we should also keep in mind that for building the test apps we are not trimming, so we have to compile 195 assemblies + aot-instances.dll).

The tables below show the comparison of the build times and AOT stats of the assembly in question, with and without the changes in the PR:

  • build times:

| Main | This PR |
|------|---------|
| 372.65s user 49.12s system 95% cpu 7:20.37 total | 483.17s user 112.56s system 91% cpu 10:54.14 total |

  • AOT compilation stats:

| Stat | Main | This PR |
|------|------|---------|
| Code | 563680 (11%) | 565772 (10%) |
| Info | 1007280 (21%) | 1338499 (24%) |
| Ex Info | 890024 (18%) | 999907 (18%) |
| Unwind Info | 2777 (0%) | 2777 (0%) |
| Class Info | 7 (0%) | 7 (0%) |
| PLT | 12038 (0%) | 12060 (0%) |
| GOT Info | 1816962 (38%) | 1949055 (36%) |
| Offsets | 469614 (9%) | 505469 (9%) |
| GOT | 103712 | 103896 |
| BLOB | 3250257 | 3530628 |
| Compiled | 68049/68061 (99%) | 74316/74328 (99%) |
| LLVM | 67638 (99%) | 73904 (99%) |
| No GOT slots | 25116 (36%) | 27744 (37%) |
| Direct calls | 368 (10%) | 369 (10%) |
| **GOT slot distribution** | | |
| method | 11994 (169567) | 12016 (173453) |
| methodconst | 1834 (30087) | 1834 (30159) |
| jit_icall_id | 63 (97) | 63 (97) |
| switch | 42 (460) | 42 (460) |
| class | 1386 (6122) | 1386 (6222) |
| image | 19 (19) | 19 (19) |
| vtable | 4904 (22699) | 4904 (22995) |
| sflda | 1862 (10734) | 1862 (10952) |
| ldstr | 232 (832) | 232 (832) |
| ldtoken | 3 (18) | 3 (18) |
| type_from_handle | 5977 (60743) | 6165 (62687) |
| iid | 273 (1315) | 273 (1332) |
| rva | 3 (18) | 3 (18) |
| delegate_info | 927 (20782) | 927 (21103) |
| interruption_request_flag | 2 (0) | 2 (0) |
| method_rgctx | 11577 (190140) | 11599 (191959) |
| mscorlib_got_addr | 2 (0) | 2 (0) |
| gc_card_table_addr | 2 (0) | 2 (0) |
| castclass_cache | 808 (3854) | 806 (3897) |
| gc_nursery_start | 2 (0) | 2 (0) |
| gc_safe_point_flag | 2 (0) | 2 (0) |
| aot_module | 2 (0) | 2 (0) |
| gc_nursery_bits | 2 (0) | 2 (0) |
| jit_icall_addr_nocall | 12 (14) | 12 (14) |
| gshared_method_info | 16045 (1241486) | 19230 (1361448) |
| GOT SLOTS | 57975 | 61390 |
| INFO SIZE | 1758987 | 1887665 |
| **Encoding stats** | | |
| Method ref | 177034 (2920k) | 190392 (3194k) |
| Class ref | 72166 (340k) | 73528 (391k) |
| Ginst | 11526 (88k) | 11734 (94k) |
| **Method stats** | | |
| Normal | 43932 | 46918 |
| Instance | 22507 | 25787 |
| GSharedvt | 94 | 95 |
| Wrapper | 1516 | 1516 |
| **Timing** | | |
| JIT time | 18571 ms | 41528 ms |
| Generation time | 226084 ms | 273928 ms |
| Assembly+Link time | 0 ms | 0 ms |

Conclusion

Since it can be seen there is quite a slowdown in the compilation time, we should investigate this further, and additionally, it would be good to measure how this change affects the size of binaries in the full AOT mode, especially due to increased number of generics.

This is the update I have so far, I will continue investigating and follow up on this.
@tannergooding @vargaz please let me know if you have any other comments or concerns.

vargaz (Contributor) commented Sep 8, 2023

Will look into the compilation speed issue.

vargaz (Contributor) commented Sep 8, 2023

Why are the tests building unlinked apps using LLVM? It doesn't match user scenarios, and it will lead to very long build times/slow CI.

vargaz (Contributor) commented Sep 8, 2023

This will hopefully speed up compiling aot-instances.dll:
#91802

tannergooding (Member, Author) commented

> thanks for the above performance numbers for Mono configs. Did you use @LoopedBard3's new script (just looking for more validation of that effort)

@SamMonoRT, I followed the updated instructions on https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md; I wasn't aware there was a script for it now.

-- Updated instructions are much improved and much easier to use on Windows now, many thanks!

> Since it can be seen there is quite a slowdown in the compilation time, we should investigate this further, and additionally, it would be good to measure how this change affects the size of binaries in the full AOT mode, especially due to increased number of generics.

Thanks much for the analysis. Glad to hear this is just a CI timeout issue and not something more serious.

Are there instructions on how to get the compilation time numbers you shared so that it can be included as part of future work in the area? Similarly for getting assembly size numbers to ensure new generics aren't causing negative impact?

kotlarmilos (Member) commented

@ivanpovazan Thank you for the analysis. Based on your suggestion, trimming could help speed up the compilation process. I also recommend disabling the dedup in the inner dev loop to reduce overall CI duration. However, we can keep it enabled for additional platforms.

If you think it is a good idea I will create a PR.

ivanpovazan (Member) commented

re: @tannergooding

> Are there instructions on how to get the compilation time numbers you shared so that it can be included as part of future work in the area? Similarly for getting assembly size numbers to ensure new generics aren't causing negative impact?

Regarding the compilation time, there isn't, unfortunately, any automated way of doing the measurements.
What I do in general is:

  1. Build/run the tests suite via:
    ./dotnet.sh build -c Release src/libraries/System.Runtime/tests/System.Runtime.Tests.csproj -p:TargetOS=ios -p:TargetArchitecture=arm64 -t:Test -p:MonoEnableLLVM=true -bl
    
  2. Inspect msbuild.binlog file to retrieve the exact command for running the AOT compiler
  3. Copy the AOT compiler command to a bash script compile.sh
  4. Measure the compile time by running the script with:
    time ./compile.sh
    

On the other hand, regarding the AOT compilation stats, it is possible to pass the stats command-line option to the AOT compiler, e.g. --aot=stats,..., which will make the compiler print the stats when the compilation ends. We could include this information in the logs on every build to make it available when inspecting CI builds. However, we have to keep in mind that in this case there are 196 assemblies to compile, and the additional information about compiling each assembly will increase the size of the logs.

As for the app size regression tracking, we do track app size in the .NET performance Power BI charts, but those measurements are done for sample Android, iOS, Blazor, and MAUI apps, not for test suite bundles. Additionally, I believe @kotlarmilos is working on enabling similar measurements for build time tracking (within .NET performance tracking).


re: @kotlarmilos

> I also recommend disabling the dedup in the inner dev loop to reduce overall CI duration

I believe that dedup actually speeds up the build time as it would prevent doing code generation for same generic instances across multiple assemblies, so I don't think we should disable it for tests. But yes, I think we should investigate enabling trimming when building tests in various places - please see my answer to Zoltan below.


re: @vargaz

> Why are the tests building unlinked apps using LLVM? It doesn't match user scenarios, and it will lead to very long build times/slow CI.

There are some tests that assume there is no linking involved. If we enable -p:EnableAggressiveTrimming=true, the number of assemblies to compile drops to 79 and consequently the build is much faster (I haven't got the exact build times, though). However, 11 tests start failing in the System.Runtime.Tests group. I have opened #91923 to investigate this and try to root all the necessary assemblies/types to make them pass, as it would speed up CI runs.

{
retNode = impSIMDPopStack();
break;
}

Review comment (Member):

What's the reason for these JIT changes? Was something breaking without it? Are we missing any tests, or it's newly exposed by the interfaces and is thus implicitly tested?

Reply (Member, Author):

Newly "exposed" (it's still internal only) by the interface and so it's just ensuring that it remains efficient/handled.

The path previously assumed floating-point only, so we need to handle integer (which is a no-op) explicitly.

/// <typeparam name="T">The type of the elements in the vector.</typeparam>
internal unsafe interface ISimdVector<TSelf, T>
: IAdditionOperators<TSelf, TSelf, TSelf>,
// IAdditiveIdentity<TSelf, TSelf>,

Review comment (Member):

Why are all of these interfaces commented out? Is that just because they've not been needed yet for the APIs that have implemented these and thus haven't been done yet? Or are they not relevant? A comment would be helpful.

Reply (Member, Author):

Ones that aren't needed or that aren't trivial to support yet.

I still want to cover them as part of a follow-up, but they're not crucial to getting the baseline support in so we can start using everything internally.

I can add a comment.

static abstract int Count { get; }

/// <summary>Gets a value that indicates whether the vector operations are subject to hardware acceleration through JIT intrinsic support.</summary>
/// <value><see langword="true" /> if the vector operations are subject to hardware acceleration; otherwise, <see langword="false" />.</value>

Review comment (Member):

Can we inherit most of these docs rather than duplicating them?

Reply (Member, Author):

Ideally most things would eventually inherit their docs from the interface.

I'd prefer to get this in (so we can get perf numbers) and log an issue tracking changing Vector128<T> and the other types to inherit their docs from here in a follow-up PR.

/// <exception cref="ArgumentOutOfRangeException">The length of <paramref name="values" /> is less than <see cref="Count" />.</exception>
/// <exception cref="NotSupportedException">The type of the elements in the vector (<typeparamref name="T" />) is not supported.</exception>
/// <exception cref="NullReferenceException"><paramref name="values" /> is <c>null</c>.</exception>
static virtual TSelf Create(T[] values) => TSelf.Create(values.AsSpan());

Review comment (Member):

Why are all of these virtual rather than abstract? These are net-new APIs rather than something that each of the VectorXx types already provides?

Reply (Member, Author):

It's the same principle we followed for the generic math interfaces, which is that where a behavior could be trivially provided via a DIM, to do so.

In this case, Create(T[]) and Create(ReadOnlySpan<T>) do the same thing, with the former forwarding to the latter by default.

Best practice is for someone to always override a DIM (which is something we always do), but it's not strictly required.
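
As a minimal sketch of that default-interface-method (DIM) pattern, assuming .NET 7 or later: `ICreatable`, `Pair`, and `Factory` below are hypothetical names for illustration, and only the forwarding line mirrors the snippet under review. The implementing type supplies just the span overload and picks up the array overload from the interface.

```csharp
using System;

// Hypothetical interface illustrating the pattern; not the PR's ISimdVector.
interface ICreatable<TSelf, T>
    where TSelf : ICreatable<TSelf, T>
{
    // Must be provided by every implementer.
    static abstract TSelf Create(ReadOnlySpan<T> values);

    // Trivially derivable, so it gets a default body that forwards
    // to the span overload. Implementers may still override it.
    static virtual TSelf Create(T[] values) => TSelf.Create(values.AsSpan());
}

// Implements only the abstract member; the array overload comes from the DIM.
readonly record struct Pair(int First, int Second) : ICreatable<Pair, int>
{
    public static Pair Create(ReadOnlySpan<int> values) => new(values[0], values[1]);
}

static class Factory
{
    // Static virtual members are reached through the type parameter.
    public static TSelf FromArray<TSelf>(int[] values)
        where TSelf : ICreatable<TSelf, int>
        => TSelf.Create(values);
}
```

Calling `Factory.FromArray<Pair>(new[] { 3, 4 })` goes through the interface's default `Create(T[])`, which forwards to `Pair.Create(ReadOnlySpan<int>)`.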

/// <returns>The value of the element at <paramref name="index" />.</returns>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="index" /> was less than zero or greater than the number of elements.</exception>
/// <exception cref="NotSupportedException">The type of <paramref name="vector" /> (<typeparamref name="T" />) is not supported.</exception>
public static T GetElement<TVector, T>(this TVector vector, int index)

Review comment (Member):

The indexer can't be part of the interface?

Reply (Member, Author):

We have a pattern of making various APIs extensions rather than instance methods for efficiency purposes (instance methods take TVector by ref, which can make the IL and other code larger and more complex). That in turn causes the JIT to do more work and can mess up some optimizations.

Having the extension then allows you to access the API like an instance method, without forcing the operand be address taken, without forcing the introduction of additional locals, and so on.
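
The two shapes being contrasted look roughly like this. It is a sketch only: `Lanes` and `LanesExtensions` are hypothetical types standing in for a vector, and nothing here measures the codegen difference described above.

```csharp
using System;

// Hypothetical two-lane value type standing in for a vector.
readonly struct Lanes
{
    public readonly int Lower;
    public readonly int Upper;

    public Lanes(int lower, int upper)
    {
        Lower = lower;
        Upper = upper;
    }

    // Instance shape: 'this' is passed by reference, so the caller's
    // value has to be addressable.
    public int GetElementInstance(int index) => index switch
    {
        0 => Lower,
        1 => Upper,
        _ => throw new ArgumentOutOfRangeException(nameof(index)),
    };
}

static class LanesExtensions
{
    // Extension shape: 'lanes' is an ordinary by-value parameter, yet the
    // call site still reads like an instance call: lanes.GetElement(1).
    public static int GetElement(this Lanes lanes, int index) => index switch
    {
        0 => lanes.Lower,
        1 => lanes.Upper,
        _ => throw new ArgumentOutOfRangeException(nameof(index)),
    };
}
```

The existing `Vector128.GetElement<T>(this Vector128<T> vector, int index)` API follows the extension shape.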

{
return ComputeLastIndex(offset: 0, TNegator.GetMatchMask(values, current));
}
return SimdImpl<Vector512<TValue>>(ref searchSpace, value, length);

Review comment (Member):

I love how we can now share the same implementation across the three uses.

I do wonder whether it'd be worth implementing a few more such reductions in this PR, just to better help validate before unleashing it on others and the rest of corelib. It's possible all bases are covered, but we'll often find interesting things in the 2nd and 3rd uses of an abstraction that we didn't encounter in the 1st.

Member Author

I think it'd be better to get this in, so we can get the early perf numbers (which will take a week).

Especially since this is staying internal only, we can make adjustments if any issues are found as we update other APIs to utilize it.

Ideally we do this with a couple of people before we try to touch "everything". The ones I expect we're "missing" are net-new APIs covering core concepts over ExtractMostSignificantBits, such as IndexOfFirst/LastMatch.
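To make the idea of helpers layered over ExtractMostSignificantBits concrete, here is a minimal sketch using the public `Vector128` APIs. The `MatchSketch` class and its `IndexOfFirstMatch`/`IndexOfLastMatch` methods are hypothetical names chosen for this example; they are not APIs added by this PR, and a real version would presumably be written once against the shared interface rather than per vector width.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

Vector128<int> data = Vector128.Create(7, 3, 7, 9);

// Each matching lane becomes all-ones, so its most significant bit is set.
Vector128<int> matches = Vector128.Equals(data, Vector128.Create(7));

// Lanes 0 and 2 hold the value 7.
Console.WriteLine(MatchSketch.IndexOfFirstMatch(matches));
Console.WriteLine(MatchSketch.IndexOfLastMatch(matches));

static class MatchSketch
{
    // Index of the lowest matching lane, or -1 if no lane matched.
    public static int IndexOfFirstMatch<T>(Vector128<T> comparison) where T : struct
    {
        uint mask = comparison.ExtractMostSignificantBits();
        return mask == 0 ? -1 : BitOperations.TrailingZeroCount(mask);
    }

    // Index of the highest matching lane, or -1 if no lane matched.
    public static int IndexOfLastMatch<T>(Vector128<T> comparison) where T : struct
    {
        uint mask = comparison.ExtractMostSignificantBits();
        return mask == 0 ? -1 : 31 - BitOperations.LeadingZeroCount(mask);
    }
}
```

The input is expected to be the result of a comparison, where every lane is either all-ones or all-zeros; the helpers only look at the top bit of each lane.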

@tannergooding
Member Author

#92961 was logged to track the general FreeBSD infra issue.

@tannergooding tannergooding merged commit 15ffcff into dotnet:main Oct 3, 2023
@tannergooding tannergooding deleted the isimdvector branch October 3, 2023 19:14
GeroL added a commit to GeroL/custom-runtime that referenced this pull request Oct 3, 2023
* Minor code cleanup in TensorPrimitives tests (#92575)

* Normalize some test naming

* Alphabetize tests

* Improve mismatched length tests with all positions of the shorter tensor

* Alphabetize methods in TensorPrimitives.cs

* Update android-bionic.md (#92632)

* Move TargetsCurrent to net9 and add net8 workload (#91480)

* Move TargetsCurrent to net9 and add net8 workload

* Fix version references

* Update src/mono/nuget/Microsoft.NET.Workload.Mono.Toolchain.Current.Manifest/WorkloadManifest.targets.in

* [wasm] build net8 workload

* Update emsdk

* Update current template to reference net9

* Bump 8.0 version used for workloads

* Fix version for latest emscripten packages

* fix typo in 8.0 version used for the workload

* disambiguate templates

* WBT: explicitly use net8.0 projects for template projects

* Update emsdk dependency to get the workload fix

* fix

* Add some additional workarounds for net8

* Remove extra character

* Fix test

* More wasi fixes

* Add net8 wasi-wasm runtime pack reference

* Add wasi-experimental-net8 workload

* [wasi] Fix use of workload

* [wasm] WBT: Fix test

* wasi: Allow wasi-wasm runtimepacks even when targeting net8

* fix test

---------

Co-authored-by: Ankit Jain

* Improve nullability check for generic `.ctor` parameters (#92514)

* implement absent generic ctor param check

* fix code style

* Improve nullability check for generic parameters in ctor

`NullabilityInfoContext.CheckParameterMetadataType` didn't have
code paths for parameters in constructors, leading to wrong
nullability results.

The PR adds a code path for constructor parameters.

Fix #92487

* add tests on nullability of ctors and methods with generic parameters

* fix test issues with AOT trimming

* [PERF] Add hybrid globalization testing runs (#89825)

Add Blazor hybrid globalization runs. This includes updating the Blazor and iOS test names to take into account hybridGlobalization and setting up a standard for scenario run configs going forward, at least for now. By having hybridGlobalization in both the runconfig and the name when different from the default, the names will only update for non-default settings, auto-updating PowerBI, while the runconfigs will be available whenever necessary.

* JitDump improvements and other minor cleanup (#92510)

* JitDump improvements and other cleanups

* More comment cleanups

* Be consistent in capitalization of `GenTree`

* JIT: Remove CallArgABIInformation::IsStruct (#92635)

Since we store signature types now this bit is no longer necessary.

* [wasi] fixed the order of WASI_AFTER_RUNTIME_LOADED_CALLS (#92552)

* Add DebuggerDisplay to Meter and Instruments (#91496)

* Add net8 wasi workload tests (#92653)

* Add net8 wasi workload tests

* Update eng/testing/tests.wasi.targets

Co-authored-by: Ankit Jain

* [wasm] CI: trigger WBT on changes to eng/testing/tests.{browser,wasm,wasi}.targets

* Update eng/testing/tests.wasi.targets

* Alias the net8 runtime pack correctly

---------

Co-authored-by: Ankit Jain

* Update the Windows ARM64 unwinder (#92604)

* Update the Windows ARM64 unwinder

This change updates the Windows ARM64 unwinder to match the current
state in Windows. It contains a fix for a bug that is needed as a basis
for a .NET issue fix.

* Reflect PR feedback

* [mono][llvm] Remove support for llvm versions before 14.x. (#88346)

* Vectorize TensorPrimitives.Min/Max{Magnitude} (#92618)

* Vectorize TensorPrimitives.Min/Max{Magnitude}

* Use AdvSimd.Max/Min

* Rename some parameters/locals for consistency

* Improve HorizontalAggregate

* Move a few helpers

* Avoid scalar path for returning found NaN

* [main] Update dependencies from dotnet/runtime dotnet/source-build-reference-packages dotnet/emsdk dotnet/hotreload-utils dotnet/sdk (#92584)

[main] Update dependencies from dotnet/runtime dotnet/source-build-reference-packages dotnet/emsdk dotnet/hotreload-utils dotnet/sdk
- Coherency Updates:
  - runtime.linux-arm64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-x64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-arm64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-x64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.win-arm64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.win-x64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-arm64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-x64.Microsoft.NETCore.Runtime.ObjWriter: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-arm64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-x64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-arm64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-x64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.win-arm64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.win-x64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-arm64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-x64.Microsoft.NETCore.Runtime.JIT.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-arm64.Microsoft.NETCore.Runtime.Mono.LLVM.Sdk: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-arm64.Microsoft.NETCore.Runtime.Mono.LLVM.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-arm64.Microsoft.NETCore.Runtime.Mono.LLVM.Sdk: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-arm64.Microsoft.NETCore.Runtime.Mono.LLVM.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Sdk: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Sdk: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.linux-musl-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.win-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Sdk: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.win-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-arm64.Microsoft.NETCore.Runtime.Mono.LLVM.Sdk: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-arm64.Microsoft.NETCore.Runtime.Mono.LLVM.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Sdk: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)
  - runtime.osx-x64.Microsoft.NETCore.Runtime.Mono.LLVM.Tools: from 16.0.5-alpha.1.23452.1 to 16.0.5-alpha.1.23472.1 (parent: Microsoft.NET.Workload.Emscripten.Current.Manifest-9.0.100.Transport)

 - Merge branch 'main' into darc-main-be922536-a638-4652-9241-ddc0996cfe5a

* Avoiding trying to load the native library Microsoft.DiaSymReader.Native.<arch>.dll on Linux (#92492)

* 92278: add DFEATURE_ISYM_READER definition only for windows

---------

Co-authored-by: Andrey.Kudashkin
Co-authored-by: Jan Kotas

* Convert SpinWait to QCall (#92675)

* define bool as Interop.BOOL to prevent upper bytes setting native bool (#92679)

* Remove all PGO assets except for the runtime PGO archive. (#92668)

* Stop setting separate properties for BUNDLE_PROBE, HOSTPOLICY_EMBEDDED, PINVOKE_OVERRIDE (#92448)

* Make config binding gen incremental (#89587)

* Make config binding gen incremental

* Iterate on implementation

* Add incremental tests & driver

* Make incremental tests pass and revert functional regression

* Address failing tests

* Make tests pass

* Suppress diagnostic

* Address feedback on diag info creation

* Refactor member access expr parsing to indicate assumptions

* Address feedback & do misc clean up

* Adjust model to minimize baseline diff / misc clean up

* Extend preinitialization interpreter (#92470)

Things that I added:

* Support for `typeof(T) == typeof(Bar)` (this will be useful later, we'll eventually be able to also freeze these).
* Support static interface method calls
* Constrained method calls on valuetypes
* More `ReadOnlySpan` construction patterns, `.Length`
* More indirect load/store support

Contributes to #78681. To fully resolve this, we need to fix up things so we can answer `Sse2.IsSupported`.

* Make it possible to preinitialize HW intrinsic IsSupported (#92666)

* Move the IL rewriting for HW intrinsics `IsSupported` calls to `ILProvider` from `RyuJitCompilation`
* Also rewrite constant true/false

* [mono][aot] Type load checks do not fail at compile time but produce a runtime exception (#91261)

* Enable tests.

* When AOTing, type checks do not fail compilation but create a runtime exception.

* Cleaned up type load error cleaning. TypeLoadException icall now has a message with type name.

* Removed another instance of indiscriminate exception clearing.

* Fixed build warning.

* Using class const instead of string const. Reverted some compile to runtime errors that were not necessary for the unit tests.

* White space.

* Fixed build warning.

* Trying to fix weird AOT errors, fixed type load throw function.

* Fixed build error.

* Special handling for classes that are NULL.

* Providing for a null klass when generating exception.

* Removed flow control directive from macro.

* Fixed stack corruption.

* Attempt to push the correct type onto the stack.

* Fixing uninitialized ins.

* Fixing ro_type.

* Initializing ins.

* Complex cases with type load failures replace method body with a throw.

* Cleaning up superfluous code changes.

* Restored sizeof constant on failed types.

* [mono] Implement Vector128.Shuffle () for llvm on x64. (#92656)

* JIT: Handle remainder accesses more precisely in physical promotion liveness (#92651)

The liveness pass in physical promotion will currently handle any struct
LCL_FLD access of a physically promoted struct as accessing the
remainder. However, if the LCL_FLD only touches promoted fields then the
remainder is not actually used. There was a TODO around this which this
PR fixes as I stumbled upon a case this would improve.

* Add tests for `UnsafeAccessor` on fields on generic types (#92657)

* Add tests for field access on generics

The tests are currently disabled.

* Fix ILC to compile UnsafeAccessorsTests

UnsafeAccessorsTests passes on NAOT.

* Switch to etw python script (#92508)

* initial work with hacks to switch to EtwProvider python script

* move to generated scripts

* Fixes for some link issues

* fix link issue

* adding private etw callback to enable GC events

* Fix x86 build break

* fixing Linux build break

* fixing gcpriv.h

* making minimal typedefs

* FB

* Fix for posix break

* Fix Excessive Encoding in Test Logs (#92286)

* Removed special encoding that was rendering the test logs near
impossible to read properly.

* Adjusted the offending test to print the invalid character's hex code
instead, and fixed it alongside its sibling test because they didn't
handle all correct/incorrect cases properly.

* Added special handling for illegal XML characters in the test results'
XML logs.

* Simplified the sanitizing algorithm to one pass, as per Dan's feedback.

* Fix LLVMAOT Mono runtime variant official build to produce correctly named runtime packs (#92712)

In https://github.com/dotnet/runtime/commit/75ee623b8f0350a4b4be86fa71745a74beb059d1 the condition in `src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.props` got changed from checking `MonoBundleLLVMOptimizer` to `MonoAOTEnableLLVM` but we weren't setting that property in runtime-official.yml so both jobs produced runtime packs with the same suffix, resulting in the artifact uploads randomly overwriting each other.

* Change order of loads in LowerMemcmp (#92704)

* Fix arm64 fragment unwinding (#92678)

A bug in the Windows arm64 unwinder that existed a long time ago has
caused problems with unwinding in functions split in multiple fragments
in case the location in the function was in a secondary fragment. At
that time, it was not discovered that it was a bug in the unwinder and
it got "fixed" in the runtime by always using the first fragment unwind
info. However, now it turned out that was actually incorrect in some
cases. Checking the current state of the Windows unwinder revealed that
a bug was fixed there that was causing the problem we were seeing.
Effectively ignoring all the shadow prolog unwind info in the secondary
fragments.
This change reverts the old fix after the unwinder was updated.

* Add illink analyzer support for field/property initializers (#92600)

The dataflow analyzer was exiting early for cases where the owning
symbol was not an `IMethodSymbol`. This meant we weren't running
dataflow analysis for field and property initializers.

This fixes it by allowing through cases where the owning symbol is not
an `IMethodSymbol`, and adding testcases to validate that we don't hit
asserts in the code paths that only light up for methods.

* Don't build libraries native packages in the PGO leg (#92729)

* Change how test assemblies opt-in to LibraryImportGenerator usage (#92661)

* Move more sprintf usages to snprintf (#92674)

* [wasm] GetChromeVersions: Fix fetching v8 version given a chrome version (#92667)

* [wasm] GetChromeVersions: Fix fetching v8 version given a chrome version

* Address feedback from Ilona Tomkowicz

* Updated XML documentation for `IConfigurationProvider.GetReloadToken`. (#92720)

* Avoid membarrier on lower Android versions (#92686)

Hopefully fixes #92196.

I don't actually have an ARM64 device with an old Android version, so I can't test that it actually fixes the problem, but it's a plausible fix.

* Revert "Remove Latin1CharSearchValues (#91884)" (#92726)

* Revert "Remove Latin1CharSearchValues (#91884)"

This reverts commit 4a09c82215399c27f52277a8db7178270410c693.

* Keep the projitems formatting

* [main] Update dependencies from dotnet/roslyn-analyzers (#92639)

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Sven Boemer

* [wasm] Fix Wasm.Build.Tests on `main` (#92741)

* CI: Don't include ref pack when the tests need workload

.. as it already includes it.

https://github.com/dotnet/runtime/issues/92732 broke this as a
side-effect which caused the `microsoft.netcore.app.ref` directory to
not be created.

Fixes https://github.com/dotnet/runtime/issues/92732 .

* [wasm] WBT: Update skiasharp reference

`blz_deploy_on_build_Debug_True_npl3f0nk_qee.csproj : error NU1903: Package 'SkiaSharp' 2.88.4-preview.76 has a known high severity vulnerability`

* Make PCP tests conditional by algorithm

The ConditionalFact tests for a functional TPM using P-256. Tests assumed that if the TPM supported P-256, then P-384 and RSA is supported as well. This is not always the case - some TPMs implement 256 without support for 384.

This changes the TPM conditional facts to be per-algorithm.

* Small refactor to BuildElement to address NRT changes (#92742)

* [mono] Enable SIMD intrinsics on winx64. (#92673)

* [mono] Enable SIMD intrinsics on winx64.

* Re-enable decompose on SIMD intrinsics on Windows.

---------

Co-authored-by: lateralusX

* JIT: Promote size-wise improvements in physical promotion (#92717)

I hit the following case:
```
Evaluating access byref @000
  Single write-back cost: 3
  Write backs: 0
  Read backs: 0
  Estimated cycle improvement: 0 cycles per invocation
  Estimated size improvement: 2 bytes
Disqualifying replacement
```

These cases happen when the blocks that have candidates for promotion in
them have bbWeight equal to 0.

If we estimate a size improvement without a cycle improvement it still
makes sense to promote a replacement. More generally, a large size
improvement can make up for a small cycle regression, so add a heuristic
similar to the existing one for this. I've set it to be quite
conservative: we require 100 bytes of size improvement before we allow 1
cycle of regression. This is enough to handle the common case where the
cycle improvement is 0 due to the bbWeight = 0.

* Update TensorPrimitives aggregations to vectorize handling of remaining elements (#92672)

* Update TensorPrimitives.CosineSimilarity to vectorize handling of remaining elements

* Vectorize remainder handling for Aggregate helpers

* JIT: Unify and clean up unspilling (#91663)

* [wasm][debugger] Support passing identifiers to methods (#92758)

* Basic fix.

* More tests.

* Move tests to more suitable place.

* Pause earlier.

* [llvm] Avoid zero extending non-negative constant array indexes, it's not needed and it prevents abcrem from working. (#92760)

* Fix link in ILLink.Tasks README.md (#92769)

* [main] Update dependencies from dotnet/xharness dotnet/cecil dotnet/sdk (#92700)

* Update dependencies from https://github.com/dotnet/xharness build 20230927.1

Microsoft.DotNet.XHarness.CLI , Microsoft.DotNet.XHarness.TestRunners.Common , Microsoft.DotNet.XHarness.TestRunners.Xunit
 From Version 8.0.0-prerelease.23471.1 -> To Version 8.0.0-prerelease.23477.1

* Update dependencies from https://github.com/dotnet/cecil build 20230926.1

Microsoft.DotNet.Cecil
 From Version 0.11.4-alpha.23468.2 -> To Version 0.11.4-alpha.23476.1

* Update dependencies from https://github.com/dotnet/sdk build 20230927.2

Microsoft.DotNet.ApiCompat.Task
 From Version 9.0.100-alpha.1.23476.1 -> To Version 9.0.100-alpha.1.23477.2

* Update dependencies from https://github.com/dotnet/xharness build 20230927.1

Microsoft.DotNet.XHarness.CLI , Microsoft.DotNet.XHarness.TestRunners.Common , Microsoft.DotNet.XHarness.TestRunners.Xunit
 From Version 8.0.0-prerelease.23471.1 -> To Version 8.0.0-prerelease.23477.1

* Update dependencies from https://github.com/dotnet/cecil build 20230926.1

Microsoft.DotNet.Cecil
 From Version 0.11.4-alpha.23468.2 -> To Version 0.11.4-alpha.23476.1

* Update dependencies from https://github.com/dotnet/sdk build 20230927.63

Microsoft.DotNet.ApiCompat.Task
 From Version 9.0.100-alpha.1.23476.1 -> To Version 9.0.100-alpha.1.23477.63

---------

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Update dependencies from https://dev.azure.com/dnceng/internal/_git/dotnet-optimization build 20230927.3 (#92761)

optimization.linux-arm64.MIBC.Runtime , optimization.linux-x64.MIBC.Runtime , optimization.windows_nt-arm64.MIBC.Runtime , optimization.windows_nt-x64.MIBC.Runtime , optimization.windows_nt-x86.MIBC.Runtime , optimization.PGO.CoreCLR
 From Version 1.0.0-prerelease.23471.3 -> To Version 1.0.0-prerelease.23477.3

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Update JIT format job to pass new `--cross` argument on Linux (#92751)

* Update JIT format job to pass new `--cross` argument on Linux

This argument is passed through to the jit-format tool and is used
when invoking build-runtime.sh on Linux when in a cross-build
scenario, such as CI builds in Mariner.

* Fix copy/paste error

* Removes redundant code from `JsonCamelCaseNamingPolicy.cs` (#92738)

* Remove Unwrap flag from UniqueComInterfaceMarshaller (#92599)

The Unwrap flag only has effect when UniqueInstance is not set. To avoid confusion from anyone referencing this code, we should remove it here.

NativeAOT needed to move the Unwrap code to inside the !UniqueInstance block to match behavior of CoreCLR. This should only be noticeable when using ComWrappers to wrap and unwrap the same object in the same NativeAOT instance. In-Proc COM with different servers and clients won't hit this behavior.

* Account port number already included within server string (#92748)

* Account port number already included within server string

* Refactor the test

* Apply feedback

* Delete misc unnecessary code (#92764)

* [main] Update dependencies from dotnet/installer (#92703)

* Update dependencies from https://github.com/dotnet/installer build 20230927.3

Microsoft.Dotnet.Sdk.Internal
 From Version 9.0.100-alpha.1.23474.1 -> To Version 9.0.100-alpha.1.23477.3

* [wasm] WBT: Update skiasharp reference

`blz_deploy_on_build_Debug_True_npl3f0nk_qee.csproj : error NU1903: Package 'SkiaSharp' 2.88.4-preview.76 has a known high severity vulnerability`

* Update dependencies from https://github.com/dotnet/installer build 20230927.26

Microsoft.Dotnet.Sdk.Internal
 From Version 9.0.100-alpha.1.23474.1 -> To Version 9.0.100-alpha.1.23477.26

---------

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Ankit Jain

* Add more static preinitialization support (#92739)

Resolves #78681 (or "there's nothing else we'd be willing to do for it"). This is the rest of what I implemented trying to get `SearchValues.Create(someVeryLongString)` to preinitialize. It doesn't actually enable more `SearchValues` scenarios because I eventually hit codepaths that would require us to implement hardware intrinsics support in the interpreter. The `SearchValues` scenarios that we do support were implemented in #92470 and #92666. But since I already implemented this, here it is; maybe it will be useful for something else:

* Support for modelling `Span<X>`, including creating it from stackalloc
* Support for math/comparisons with native int
* `Unsafe.Add`

* Flesh out TensorPrimitives XML docs (#92749)

* Flesh out TensorPrimitives XML docs

* Address PR feedback

- Remove use of FusedMultiplyAdd from all but CosineSimilarity
- Remove comments about platform/OS-specific behavior from Add/AddMultiply/Subtract/Multiply/MultiplyAdd/Divide/Negate
- Loosen comments about NaN and which exact one is returned

* Address PR feedback

* Converge Representations between NativeAOT and CoreCLR (#91821)

* Update RyuJit overview (#92789)

* Correctly set sendTrustList flag when saving credentials to cache (#92731)

* JIT: Make effect handling in lowering less conservative (#92710)

The interference checking in lowering bases some of its checks on
GenTree::gtFlags. This is conservative since it includes effect flags of
operands. For LIR this does not really make sense and ends up being
conservative.

This PR replaces the relevant uses of gtFlags with a new
GenTree::OperEffects() that computes the relevant effect flags for the
node, excluding operands. We already know how to recompute effect flags
other than GTF_GLOB_REF and GTF_ORDER_SIDEEFF. This PR adds functions
for these as well (the GTF_GLOB_REF version
GenTree::OperRequiresGlobRefFlag is courtesy of @SingleAccretion).

For GTF_ORDER_SIDEEFF we add a GenTree::OperSupportsOrderingSideEffect
which captures explicitly (and conservatively) the current cases where
we are setting the flag, and only allows these cases to support the
flag. Setting the flag for other cases may result in the flag being
removed or ignored. There is a new `GenTree::SetHasOrderingSideEffect` to
add the flag which also asserts that it is only added for trees that are
supported.

Fix #92699

* [wasm] Suppress policheck warning in blazor-sample (#92711)

* [wasm] Suppress policheck warning in blazor-sample

Replace the offending part in the layout name. I think the suit-spade
is a false positive; I used just sp in place of spade to silence it.

* Feedback

* [mono] Cleanup unused runtime functions (#91681)

- Removes unused functions
- Removes cmake configure checks for functions/headers that are no longer needed
- Renames HAVE_UWP_WINAPI_SUPPORT to HAVE_APP_WINAPI_SUPPORT
- Move MSVC warning disables into cmake so it is more visible

Co-authored-by: Johan Lorensson

* Do not nop-out SSA definitions in block morphing (#92786)

SSA definitions cannot be deleted.

* Implement StoreVector64x2 and StoreVector128x2 for Arm64 (#92109)

* Implement StoreVector128x2 for Arm64

* Remove redundant implementations

* Implement StoreVector64x2 for Arm64

* Remove StoreVector64x2 implementation for Arm64

This reverts commit 49ef72e3a3eaa58d3b3338dc5d6d80a7ca0b50b5.

* Fix instruction type for the StoreVector128x2 intrinsic

* Review comments:

* Arrange APIs alphabetically

* Add StoreVector64x2

* fix the invalid instructions

* Add test cases

* Update src/coreclr/jit/hwintrinsicarm64.cpp

Co-authored-by: Bruce Forstall

---------

Co-authored-by: Kunal Pathak
Co-authored-by: Bruce Forstall

* Vectorize TensorPrimitives.ConvertToHalf (#92715)

* Enable TensorPrimitives to perform in-place operations (#92820)

Some operations would produce incorrect results if the same span was passed as both an input and an output.  When vectorization was employed but the span's length wasn't a perfect multiple of a vector, we'd do the standard trick of performing one last operation on the last vector's worth of data; however, that relies on the operation being idempotent, and if a previous operation has overwritten input with a new value due to the same memory being used for input and output, some operations won't be idempotent.  This fixes that by masking off the already processed elements.  It adds tests to validate in-place use works, and it updates the docs to carve out this valid overlapping.

* JIT: Optimize SequenceEqual to use ccmp on ARM64 (#92810)

In the original PR we could not get this working due to some
conservative interference. This now does the right thing with #92710
merged.

Also change LowerCallMemcmp/LowerCallMemmove to return next node to
lower just to align it a bit more with other functions.

* [main] Update dependencies from dnceng/internal/dotnet-optimization (#92813)

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Move variable scope tracking code (#92800)

Move it out of codegencommon.cpp and into scopeinfo.cpp.

This is both to centralize the code but also to reduce the size of
the very large codegencommon.cpp.

* Allow `key#value` for superpmi JIT option specification (#92803)

superpmi.py will pass this through from the `-jitoption` /
`-base_jit_option` / `-diff_jit_option` to superpmi.exe
`-jitoption` and `-jit2option`.

Currently, the format is `key=value`. I wrap invocation of superpmi.py
with Windows batch file scripting, which has an annoying problem of
"eating" the equals sign `=`. This works around that problem. I can't
think of any case where `#` is needed in a key or value, hence that choice
as an additional option.

* Use a different crossgen2 when running crossgen2 during our build than the crossgen2 that we are shipping (#92677)

* Fix Common.Tests.GetPrettyName_CannotRead_ReturnsNull test for root user (#92695)

* fix Common.Tests.GetPrettyName_CannotRead_ReturnsNull test for root user

* remove direct call to libc in Common.Tests.GetPrettyName_CannotRead_ReturnsNull

* Update src/libraries/Common/tests/Tests/Interop/OSReleaseTests.cs

* split Common.Tests.OSReleaseTests.GetPrettyName_CannotRead_ReturnsNull into two test cases

* replace ifs with ConditionalFact in Common.Tests.OSReleaseTests class

---------

Co-authored-by: Dan Moseley

* Update dependencies from https://github.com/dotnet/installer build 20230928.5 (#92817)

Microsoft.Dotnet.Sdk.Internal
 From Version 9.0.100-alpha.1.23477.26 -> To Version 9.0.100-alpha.1.23478.5

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Update dependencies from https://github.com/dotnet/roslyn-analyzers build 20230928.1 (#92814)

Microsoft.CodeAnalysis.Analyzers , Microsoft.CodeAnalysis.NetAnalyzers
 From Version 3.11.0-beta1.23475.2 -> To Version 3.11.0-beta1.23478.1

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Add stress mode for arm/arm64 function fragment splitting (#92802)

Set JitSplitFunctionSize to either 4 or 200 under new STRESS_UNWIND mode.

* Always emit jump from hot finally block to cold target (#92797)

On some platforms, CodeGen::genCallFinally() will remove the jump between a finally block in a call-finally pair to its jump target if the target is its immediate successor in the block list (in other words, we just fall through). However, if we are doing hot/cold splitting, it is possible for the finally block to be the last hot block, and its target the first cold block. Thus, if the two are contiguous in the block list but in separate regions, we must always emit a jump.

* [tests][iOS] Fix artifacts path (#92783)

* Vectorize TensorPrimitives.ConvertToSingle (#92779)

* Vectorize TensorPrimitives.ConvertToSingle

* Address PR feedback

* Remove CompilationProvider dependency from the source generator. (#92833)

* Apply sequence equality comparison to the final Regex incremental value. (#92835)

* Apply sequence equality comparison to the final Regex incremental value.

* Avoid using SequenceEqual

* Throw exception in TensorPrimitives for unsupported span overlaps (#92838)

* Allow multiple post-build steps and allow templated pre and post-build steps (#92375)

* Include info about system call errors in some exceptions from operating on named mutexes (#92603)

* Include info about system call errors in some exceptions from operating on named mutexes

- Added new PAL APIs for creating and opening mutexes that take a string buffer for system call error info. These are called with a stack-allocated buffer and upon error the system call errors are appended to the exception message.
- When there is a system call failure that leads to the PAL API failing, some info is appended to the error string, including the system call, relevant arguments, return value, and `errno`
- `chmod` on OSX seemingly can be interrupted by signals, fixed to retry. Also fixed a couple other small things.

Fixes https://github.com/dotnet/runtime/issues/89090

* Remove fgUpdateFlowGraph from optOptimizeFlow (#92839)

* Split off patched code into separate .S file and disable subsections-via-symbols for it (#92555)

* [amd64/arm64] Split off patched code into separate .S file and disable subsections-via-symbols for it

* [amd64/arm64] Split off patched code into separate .asm file

[arm64] Move JIT_UpdateWriteBarrierState out of the patched region to match implementation in .S file

* Remove NO_SUBSECTIONS_VIA_SYMBOLS

* JIT: fix self-conflicting HFA arg prolog handling for arm64 (#92355)

Fix prolog handling in the case where the in-body destination register
for an HFA overlaps with one of the HFA argument registers. For instance
the HFA is passed in `s0-s3` and needs to end up in `v3`.

This requires special handling because the dependence analysis done in
`genFnPrologCalleeRegArgs` only tracks entire registers, not parts of
registers.

Fixes #83167

* Update targetingpacks.targets (#88991)

* Update targetingpacks.targets

The .NET 8 Preview 6 SDK has the features required to simplify the targetingpacks.targets logic.

* Update targetingpacks.targets

* Update targetingpacks.targets

* Update known items

* Update targetingpacks.targets

* Update targetingpacks.targets

* [wasm] Use specific version of v8 for tests (#91633)

* [wasm] Add support for installing V8

* [wasm] Use provisioned v8 for library tests

* [wasm] WBT: Use provisioned v8

* [wasm] enable use of provisioned v8 for library tests

* [wasm] add MSBUILD_ARGS for build-runtime-tests make target

* update docs

* Don't install v8 for runtime tests

* [wasm] CI: trigger library test jobs when chrome version changes

* Disable provisioning v8 when building runtime tests

* address review feedback

* [wasm] Disable installing v8 for runtime tests

* Address review feedback

* fix stamping for v8

* Automated bump of chrome version (#92854)

* Preinitialize pop/switch/Type.IsValueType (#92841)

These showed up in ASP.NET Stage1 and were low hanging enough.

* Use UnsafeAccessor in JSHostImplementation instead of reflection (#92755)

* [main] Update dependencies from dotnet/emsdk dotnet/sdk (#92815)

[main] Update dependencies from dotnet/emsdk dotnet/sdk

* [main] Update dependencies from dotnet/installer (#92848)

* Update dependencies from https://github.com/dotnet/installer build 20230929.5

Microsoft.Dotnet.Sdk.Internal
 From Version 9.0.100-alpha.1.23478.5 -> To Version 9.0.100-alpha.1.23479.5

---------

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Convert all remaining tests in the Loader subtree to the merged model (#92407)

* Call fgRenumberBlocks in optIfConversion (#92821)

* [llvm] Fix spilling of valuetypes to the stack if they are passed by addr. (#92658)

The previous code would spill the valuetype when it was passed by addr,
and another bblock could try to read from the uninitialized stack location.

* Check DotNetFinalVersionKind when setting WorkloadVersionSuffix (#91792)

* [RISC-V] regArg dependencies unrolling in genFnPrologCalleeRegArgs() (#91904)

* [RISC-V] Fix target type unsignedness detection in genFloatToIntCast() (#92694)

* [RISC-V] Fix target type unsignedness detection in genFloatToIntCast()

treeNode->gtFlags & GTF_UNSIGNED means unsignedness of the source type. Use varTypeIsUnsigned instead which checks for VTF_UNS on target type classification.

Fixes TryConvertToSaturatingUInt64Test and TryConvertToTruncatingUInt64Test from System.Runtime.Numerics.ComplexTests_GenericMath.

* Fix compilation without FEATURE_TIERED_COMPILATION

* [browser] Remove duplicated marshaling of return value for JSExport (#92403)

* Remove duplicated marshaling of return value for JSExport
* Move unmarshal and return value marshal into try block

* Update intellisense.targets (#92868)

* Update intellisense.targets

* Update System.Text.Json.csproj

* Ensure the adapter name 100% matching when parsing proc/net/dev (#92187)

* Ensure the adapter name 100% matching when parsing proc/net/dev

* Update src/libraries/System.Net.NetworkInformation/src/System/Net/NetworkInformation/StringParsingHelpers.Statistics.cs

Co-authored-by: Miha Zupan <[email protected]>

* Move the stackalloc out of the loop

* Update src/libraries/System.Net.NetworkInformation/src/System/Net/NetworkInformation/StringParsingHelpers.Statistics.cs

Co-authored-by: Miha Zupan <[email protected]>

---------

Co-authored-by: Miha Zupan <[email protected]>

* [mono] Only emit pshufb when ssse3 is enabled. (#92842)

Fixes https://github.com/dotnet/runtime/issues/92827.

* [browser][nodejs] keep runtime alive for JSExport calls (#92871)

* [wasm] Disable MetricsSupport feature by default (#92696)

This should improve the startup perf and size.

* Fix build of IJW test after VS upgrade (#92878)

The latest build of VS carries a C/C++ compiler which produces warning C5271:
```
src\native\corehost\test\ijw\ijw.cpp(6): warning C5271: consider replacing #using <System.Console.dll>  with command line argument /FU "F:\dotnet\runtime2\.dotnet\packs\Microsoft.NETCore.App.Ref\8.0.0-rc.1.23414.4\ref\net8.0\System.Console.dll"
src\native\corehost\test\ijw\ijw.cpp(7): warning C5271: consider replacing #using <System.Runtime.Loader.dll>  with command line argument /FU "F:\dotnet\runtime2\.dotnet\packs\Microsoft.NETCore.App.Ref\8.0.0-rc.1.23414.4\ref\net8.0\System.Runtime.Loader.dll"
```

This breaks the build on Windows. For now I'm disabling the warning as the real fix is more complex (we would need to calculate the path to the required assemblies in CMake somehow).

* Add linux-arm64 workload definitions (#92892)

* [wasm] Set InstallV8ForTests=true only for windows/linux (#92896)

.. on CI, or in a container (like codespaces).
Without this it would be `true` on macOS by default, and then fail with:
`error : V8 provisioning only supported on Linux, and windows.`

* [wasm] Perf pipeline - fix blazor_scenarios run for hybrid globalization (#92898)

* [wasm] Perf pipeline - fix blazor_scenarios run for hybrid globalization

Fails with:
```
Traceback (most recent call last):
  File "/mnt/vss/_work/1/s/Payload/performance/scripts/ci_setup.py", line 487, in <module>
    __main(sys.argv[1:])
  File "/mnt/vss/_work/1/s/Payload/performance/scripts/ci_setup.py", line 483, in __main
    main(CiSetupArgs(**vars(args)))
  File "/mnt/vss/_work/1/s/Payload/performance/scripts/ci_setup.py", line 411, in main
    dotnet_version = dotnet.get_dotnet_version(target_framework_moniker, args.cli) if args.dotnet_versions == [] else args.dotnet_versions[0]
  File "/mnt/vss/_work/1/s/Payload/performance/scripts/dotnet.py", line 581, in get_dotnet_version
    raise RuntimeError(
RuntimeError: Unable to determine the .NET SDK used for net8.0
```

This is because the definition didn't copy over the `--dotnet-versions
8.0.0` workaround needed for now. And this runs only once a week, so it
was discovered on Oct 2 (Monday) even though it was merged on Sep
26 (Friday).

* [wasm] Perf: run the hybrid-globalization job on runtime-wasm-perf also, for validation

* Update owner list (#92900)

* [mono][android] Add Android linux-arm64 workload definitions (#92899)

* Add linux-arm64 workload definitions

* Add linux-arm64 for android workloads

* [main] Update dependencies from dotnet/roslyn (#92578)

* Update dependencies from https://github.com/dotnet/roslyn build 20230924.2

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23474.2

* Update dependencies from https://github.com/dotnet/roslyn build 20230924.3

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23474.3

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.1

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.1

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.2

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.2

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.3

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.3

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.4

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.4

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.5

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.5

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.6

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.6

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.7

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.7

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.8

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.8

* Update dependencies from https://github.com/dotnet/roslyn build 20230925.10

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23475.10

* Update dependencies from https://github.com/dotnet/roslyn build 20230926.3

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23476.3

* Update dependencies from https://github.com/dotnet/roslyn build 20230926.6

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23476.6

* Update dependencies from https://github.com/dotnet/roslyn build 20230926.13

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23476.13

* Update dependencies from https://github.com/dotnet/roslyn build 20230926.14

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23476.14

* Update dependencies from https://github.com/dotnet/roslyn build 20230926.15

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23476.15

* Update dependencies from https://github.com/dotnet/roslyn build 20230926.21

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23476.21

* Update dependencies from https://github.com/dotnet/roslyn build 20230926.22

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23476.22

* Update dependencies from https://github.com/dotnet/roslyn build 20230927.1

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23477.1

* Update dependencies from https://github.com/dotnet/roslyn build 20230927.4

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23477.4

* Update dependencies from https://github.com/dotnet/roslyn build 20230928.3

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23478.3

* Update dependencies from https://github.com/dotnet/roslyn build 20230928.4

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23478.4

* Update dependencies from https://github.com/dotnet/roslyn build 20231001.1

Microsoft.CodeAnalysis , Microsoft.CodeAnalysis.CSharp , Microsoft.Net.Compilers.Toolset
 From Version 4.8.0-3.23474.1 -> To Version 4.8.0-3.23501.1

---------

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Fix deadlock in EventPipeEventDispatcher (#92806)

* Remove 'tracing' from Native AOT PR runs (#92825)

Tracing heavily increases the time it takes to run Native AOT tests.
We already run these tests in the outerloop and the probability that
any given PR will break these tests is low. Outerloop coverage should
be good enough right now.

* Clean up the crossgen2_publish project and local/live packs references (#92826)

* Inline some of the options for the new crossgen2_publish project.

* Resolve TODOs in targetingpacks.targets

* Crossgen1 is long gone. Don't try to discover it in our override targets.

* Move ReadyToRun.targets infra into the shared repo infrastructure and have projects automatically opt-in to it instead of the LKG crossgen2 when they are targeting the live build.

* Remove extraneous property set (the same value is calculated automatically already)

* Remove outdated comment.

* Fix NativeAOT and installer legs

* Condition turning off pack downloads based on opt-in to local pack usage.

* Use the LKG host instead of the 7.0 host as the fallback for NativeExports.

* Apply suggestions from code review

Co-authored-by: Viktor Hofer <[email protected]>

* Remove AdditionalProperties as they aren't needed (global properties on the command line are already transitive)

* PR feedback

* Hook into the targets pipeline to avoid overridding targets for R2Ring projects that reference the live framework packs. Move the "target override" logic back to where we build the runtime pack as that's the only place where we need crossgen2 and can't reference the runtime pack (as we're building it).

* Don't set CoreCLRArtifactsPath manually.

* PR feedback

---------

Co-authored-by: Viktor Hofer <[email protected]>

* fix typo (#92893)

* SPMI: Disable CodeQL in superpmi-collect pipeline (#92872)

This weekend's runs hit a bunch of timeouts due to auto-injected CodeQL.

* SPMI: Simplify and improve reporting of context information (#92824)

Currently we have multiple separate mechanisms to report information
back from superpmi.exe:
1. -baseMetricsSummary/metricsSummary, which outputs a .csv file with
   aggregated statistics for all contexts from the perspective of the base
   JIT
2. -diffMetricsSummary, which is the corresponding summary for the diff JIT
   when diffing
3. -diffsInfo, which during diffing will output a .csv with individual
   rows for every context that had diffs in it

This PR replaces these three mechanisms with a -details argument. When
passed, superpmi.exe will write a .csv file to the specified path that
contains a row for every context.

The arg is supported in both replay and diff mode but creates .csv files
with slightly different formats for these. For replays the header output
is:
```
Context,Context size,Result,MinOpts,Size,Instructions
```

For diffs the output is:
```
Context,Context size,Base result,Diff result,MinOpts,Has diff,Base size,Diff size,Base instructions,Diff instructions
```

superpmi.py is changed to utilize this new output instead, which
involves computing some of the same details we were getting from the
metrics summaries before.
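
As a rough illustration of recomputing summary numbers from per-context rows, the sketch below aggregates a diff-mode details file using the header shown above. The cell formats are assumptions (`Has diff` as `True`/`False`, sizes as plain integers), the sample rows are made up, and `summarize_diff_details` is a hypothetical name rather than a function from superpmi.py.

```python
import csv
import io
from typing import Dict


def summarize_diff_details(csv_text: str) -> Dict[str, int]:
    """Aggregate per-context rows of a diff-mode details .csv (assumed formats)."""
    summary = {"contexts": 0, "contexts_with_diffs": 0, "base_size": 0, "diff_size": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        summary["contexts"] += 1
        # Assumption: the 'Has diff' column is written as True/False.
        if row["Has diff"].strip().lower() == "true":
            summary["contexts_with_diffs"] += 1
        summary["base_size"] += int(row["Base size"])
        summary["diff_size"] += int(row["Diff size"])
    return summary


# Made-up sample data that follows the documented diff header.
SAMPLE = (
    "Context,Context size,Base result,Diff result,MinOpts,Has diff,"
    "Base size,Diff size,Base instructions,Diff instructions\n"
    "1,100,Success,Success,False,True,40,36,1000,950\n"
    "2,200,Success,Success,True,False,64,64,2000,2000\n"
)
print(summarize_diff_details(SAMPLE))
```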

Prerequisite for #85755

* [mono][jit] Arm64 SIMD regs are now zeroed with movi instead of eor (#92882)

* SIMD regs are now zeroed with movi instead of eor.

* Simplified vector length selection.

* JIT: Merge consecutive stores (#92852)

Co-authored-by: Egor <[email protected]>
Co-authored-by: Jakob Botsch Nielsen <[email protected]>

* Improve throughput / allocations of JsonNode.GetPath (#92284)

* Improve throughput / allocations of JsonNode.GetPath

The current implementation is creating a `List<string>` and appending each segment to it, which in most of the cases is allocating a `string`. Then it iterates through that list in reverse order appending to a newly-created `StringBuilder`, which it then `ToString`s. In this change, it instead just uses `ValueStringBuilder`, appending to it as it goes.

In doing so, it does reverse the order of enumeration. Previously each node would effectively do:
```C#
void GetPath()
{
    AddNode(this);
    parent?.GetPath();
}
```
and now it's doing:
```C#
void GetPath()
{
    parent?.GetPath();
    AddNode(this);
}
```
While C# doesn't emit tail calls, with optimizations enabled, it's feasible the JIT might emit the recursive call as a jmp rather than a call, in which case it would avoid possible stack dives. However, that's not guaranteed, and doesn't happen today in tier 0 and other unoptimized code. On top of that, to get such a deep nesting in a JsonNode, you need to either go out of your way to create one manually using the JsonNode/Object/Array/Value constructors, or you need to use JsonSerializer.Deserialize, overriding its default MaxDepth, and in the case of a really deep input, it's also recursive and will stack overflow in smaller situations. I have a different version of this change that keeps the same ordering, passing around a span and a length separately, and prepending to the end of the span, but it results in more complicated code, so I'd prefer this variation that just uses ValueStringBuilder unless we have real concerns.
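
The two orderings can be sketched in Python as an analogue, not the System.Text.Json code. `Node`, `segment`, `path_child_first`, and `path_parent_first` are hypothetical names, and `io.StringIO` stands in for `ValueStringBuilder`.

```python
import io
from typing import List, Optional


class Node:
    """Hypothetical tree node carrying one path segment and a parent link."""

    def __init__(self, segment: str, parent: Optional["Node"] = None):
        self.segment = segment
        self.parent = parent


def collect_child_first(node: Node, segments: List[str]) -> None:
    # Old ordering: record this node, then recurse to the parent.
    segments.append(node.segment)
    if node.parent is not None:
        collect_child_first(node.parent, segments)


def path_child_first(node: Node) -> str:
    segments: List[str] = []
    collect_child_first(node, segments)
    # Segments were gathered leaf-to-root, so reverse before joining.
    return "".join(reversed(segments))


def path_parent_first(node: Node, out: io.StringIO) -> None:
    # New ordering: recurse to the parent first, then append in place.
    if node.parent is not None:
        path_parent_first(node.parent, out)
    out.write(node.segment)


root = Node("$")
leaf = Node("[0]", Node(".items", root))
buffer = io.StringIO()
path_parent_first(leaf, buffer)
print(path_child_first(leaf), buffer.getvalue())
```

Both functions produce the same string. The parent-first version needs no intermediate list or reversal, but its recursive call is no longer the last operation in the function.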

* Address PR feedback

* Update dependencies from https://github.com/dotnet/source-build-externals build 20231002.3 (#92936)

Microsoft.SourceBuild.Intermediate.source-build-externals
 From Version 9.0.0-alpha.1.23475.2 -> To Version 9.0.0-alpha.1.23502.3

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* [wasm] Use intended ports when running DevServer (#92913)

* [wasm] Ignore empty `$ASPNETCORE_URLS`

* [wasm] DevServer: honor urls specified in the options

* [wasm] CI: Don't trigger non-wbt jobs on wasm-app-host changes

* CI: don't trigger wasm runtime tests on wasm-app-host changes

* This vectorizes TensorPrimitives.Log2 (#92897)

* Add a way to support operations that can't be vectorized on netstandard

* Updating TensorPrimitives.Log2 to be vectorized on .NET Core

* Update src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/TensorPrimitives.netstandard.cs

Co-authored-by: Stephen Toub <[email protected]>

* Ensure we do an arithmetic right shift in the Log2 vectorization

* Ensure the code can compile on .NET 7

* Ensure that edge cases are properly handled and don't resolve to `x`

* Ensure that Log2 special results are explicitly handled.

---------

Co-authored-by: Stephen Toub <[email protected]>

* Add analyzer support for inline array access (#92736)

This allows analysis of inline array access operations, by
treating them similarly to array access. However, like
ILLink/ILCompiler it doesn't understand inline array creation, so
doesn't track them as arrays. The result is that values read out
of an inline array are unknown, so this produces dataflow
warnings when such a value is passed to a location with dataflow
requirements, matching the ILLink/ILCompiler behavior.

Using `InlineArray` required referencing a more recent version of the
.NET 8 reference assemblies.

Fixes https://github.com/dotnet/runtime/issues/88684

* [main] Update dependencies from dotnet/installer (#92933)

Microsoft.Dotnet.Sdk.Internal
 From Version 9.0.100-alpha.1.23479.5 -> To Version 9.0.100-alpha.1.23502.7

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Add testing for #92539. (#92926)

* Add testing for #92539.

* Remove netfx test skips.

* Fix illink task lock during live build (#92928)

* Fix illink task lock during live build

Fixes https://github.com/dotnet/runtime/discussions/92126

* Update illink.targets

* Throw NotSupportedException when applying JsonObjectHandling.Populate on types with parameterized constructors. (#92937)

* CI: Don't trigger runtime pipelines on perf pipeline only changes (#92903)

* CI: Don't trigger runtime pipelines on perf pipeline only changes
* address review feedback from @cincuranet

* Condition the use of NetCoreAppPrevious TFM (#92941)

* Condition the use of NetCoreAppPrevious TFM

NuGet doesn't support duplicate TFMs in the TargetFrameworks string.
Condition the use of NetCoreAppPrevious TFMs until NuGet supports that
(which is planned afaik).

* Fix ODBC project TFM

* Don't publish crossgen2 as NativeAOT when doing a cross-os build. (#92948)

* Update PGO to use the correct post-build steps model (#92958)

* Adding Log2 tests covering some special values (#92946)

* Expose an internal ISimdVector interface and begin using it to deduplicate some SIMD code (#90764)

* Adding an internal ISimdVector`2 interface

* Move LastIndexOfValueType to use ISimdVector`2

* Fix a couple minor whitespace nits and remove an unnecessary local

* [wasm] Disable `TensorPrimitivesTests.ConvertToHalf_SpecialValues` (#92953)

Failing test: `System.Numerics.Tensors.Tests.TensorPrimitivesTests.ConvertToHalf_SpecialValues`

Issue: https://github.com/dotnet/runtime/issues/92885

* JIT: Expand unaligned address recognition for ARM32 (#92938)

The JIT has some backwards compatibility for accessing unaligned float
fields on ARM32. With physical promotion, we can end up with some new
patterns that we didn't handle. Expand the pattern matching to handle a
constant unaligned address.

Fix #92382

* runtime-wasm-perf: add triggers for PRs (#92799)

* CI: runtime-wasm-perf: add triggers for running on PRs

This is useful to prevent perf pipeline from breaking when changes are
made in `dotnet/runtime`.

* CI: Add run-scenarios-job.yml to list of perf pipeline specific files

* [wasm] wasmbrowser - change the default webserver port to 0, to randomly select a port (#92952)

* Adding a vectorized implementation of TensorPrimitives.Log (#92960)

* Adding a vectorized implementation of TensorPrimitives.Log

* Make sure to hit Ctrl+S

---------

Co-authored-by: Stephen Toub <[email protected]>
Co-authored-by: Michal Strehovský <[email protected]>
Co-authored-by: Larry Ewing <[email protected]>
Co-authored-by: Ankit Jain <[email protected]>
Co-authored-by: karakasa <[email protected]>
Co-authored-by: Parker Bibus <[email protected]>
Co-authored-by: Bruce Forstall <[email protected]>
Co-authored-by: Jakob Botsch Nielsen <[email protected]>
Co-authored-by: Filip W <[email protected]>
Co-authored-by: Badre BSAILA <[email protected]>
Co-authored-by: Jan Vorlicek <[email protected]>
Co-authored-by: Zoltan Varga <[email protected]>
Co-authored-by: dotnet-maestro[bot] <42748379+dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Andrey Kudashkin <[email protected]>
Co-authored-by: Andrey.Kudashkin <[email protected]>
Co-authored-by: Jan Kotas <[email protected]>
Co-authored-by: yowl <[email protected]>
Co-authored-by: Jeremy Koritzinsky <[email protected]>
Co-authored-by: Elinor Fung <[email protected]>
Co-authored-by: Layomi Akinrinade <[email protected]>
Co-authored-by: Jan Dupej <[email protected]>
Co-authored-by: Aaron Robinson <[email protected]>
Co-authored-by: Lakshan Fernando <[email protected]>
Co-authored-by: Ivan Diaz Sanchez <[email protected]>
Co-authored-by: Alexander Köplinger <[email protected]>
Co-authored-by: Egor Bogatov <[email protected]>
Co-authored-by: Sven Boemer <[email protected]>
Co-authored-by: Hazel <[email protected]>
Co-authored-by: Miha Zupan <[email protected]>
Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Kevin Jones <[email protected]>
Co-authored-by: Levi Broderick <[email protected]>
Co-authored-by: lateralusX <[email protected]>
Co-authored-by: Ilona Tomkowicz <[email protected]>
Co-authored-by: Tarcisio <[email protected]>
Co-authored-by: Jackson Schuster <[email protected]>
Co-authored-by: Buyaa Namnan <[email protected]>
Co-authored-by: Andrew Au <[email protected]>
Co-authored-by: SingleAccretion <[email protected]>
Co-authored-by: Radek Zikmund <[email protected]>
Co-authored-by: Radek Doulik <[email protected]>
Co-authored-by: SwapnilGaikwad <[email protected]>
Co-authored-by: Kunal Pathak <[email protected]>
Co-authored-by: Tymoteusz Wenerski <[email protected]>
Co-authored-by: Dan Moseley <[email protected]>
Co-authored-by: Aman Khalid <[email protected]>
Co-authored-by: Mitchell Hwang <[email protected]>
Co-authored-by: Eirik Tsarpalis <[email protected]>
Co-authored-by: Koundinya Veluri <[email protected]>
Co-authored-by: Filip Navara <[email protected]>
Co-authored-by: Andy Ayers <[email protected]>
Co-authored-by: Viktor Hofer <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tomáš Rylek <[email protected]>
Co-authored-by: Djuradj Kurepa <[email protected]>
Co-authored-by: t-mustafin <[email protected]>
Co-authored-by: Tomasz Sowiński <[email protected]>
Co-authored-by: Marek Fišera <[email protected]>
Co-authored-by: skyoxZ <[email protected]>
Co-authored-by: Pavel Savara <[email protected]>
Co-authored-by: Vitek Karas <[email protected]>
Co-authored-by: Eric StJohn <[email protected]>
Co-authored-by: David Mason <[email protected]>
Co-authored-by: Andy Gocke <[email protected]>
Co-authored-by: Milos Kotlar <[email protected]>
Co-authored-by: Egor <[email protected]>
Co-authored-by: Tanner Gooding <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2023