
Massive performance regression .NET 9 vs .NET 10 #115033

Open · tomrus88 opened this issue Apr 25, 2025 · 63 comments

@tomrus88

tomrus88 commented Apr 25, 2025

Description

I'm seeing a massive performance regression when targeting .NET 10 instead of .NET 9 with the repro code below:

using System.Diagnostics;

namespace ConsoleApp17
{
    internal class Program
    {
        struct Foo
        {
            public int Value;
        }


        static long counter;

        static void Main(string[] args)
        {
            var startTime = Stopwatch.StartNew();

            var fakeData = Enumerable.Range(0, 4200000).Select(i => new Foo { Value = i });

            int outputEntriesPerBlock = 157;

            for (var i = 0; i < 26000; i++)
            {
                var blockEntries = fakeData.Skip(i * outputEntriesPerBlock).Take(outputEntriesPerBlock).ToArray();

                for (var j = 0; j < blockEntries.Length; j++)
                {
                    DoSomething(blockEntries[j]);
                }
            }

            Console.WriteLine($"Work took {startTime.Elapsed}, result {counter}");
        }

        static void DoSomething(Foo foo)
        {
            counter += foo.Value;
        }
    }
}

The problem was initially found in this code: https://github.com/wowdev/TACTSharp/blob/ed6fb1b7bd3220afea1c3845c2dbe797f553ac1c/TACTSharp/GroupIndex.cs#L93-L111

Project file

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net9.0;net10.0</TargetFrameworks>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <PublishAot>true</PublishAot>
  </PropertyGroup>

</Project>

Please note that the PublishAot property in the project file is important, because it apparently turns on the UseSizeOptimizedLinq option.

Configuration

Win11 x64, .NET 10 Preview 3

Regression?

Yes.

Data

On my system I'm getting the following results:

// .NET 10 
Work took 00:03:45.9408366, result 8331359959000
// .NET 9 or on .NET 10 with <UseSizeOptimizedLinq>false</UseSizeOptimizedLinq> 
Work took 00:00:00.0924906, result 8331359959000

So it's around 2443 times slower on .NET 10.

Analysis

The UseSizeOptimizedLinq option (enabled implicitly by PublishAot) seems to be what causes the problem.

tomrus88 added the tenet-performance label Apr 25, 2025
dotnet-issue-labeler bot added the needs-area-label label Apr 25, 2025
dotnet-policy-service bot added the untriaged label Apr 25, 2025
huoyaoyuan added the area-System.Linq label and removed the needs-area-label label Apr 25, 2025
@Clockwork-Muse (Contributor)

Clockwork-Muse commented Apr 25, 2025

// .NET 9 or on .NET 10 with <UseSizeOptimizedLinq>false</UseSizeOptimizedLinq>

... because you're trading one optimization for another (binary size vs. the speed it runs at). There isn't necessarily a "fix" here for us, other than recommending you add that flag.

That said...

var blockEntries = fakeData.Skip(i * outputEntriesPerBlock).Take(outputEntriesPerBlock).ToArray();

... save yourself some trouble and just use Chunk():

foreach (var block in fakeData.Chunk(outputEntriesPerBlock))
{
    foreach (var entry in block) 
    {
        DoSomething(entry);
    }
}

This is likely to be performant on both versions, because you're not constantly re-reading the source data. It's also going to be safer if your source ever ends up being some sort of read-once stream.


From a quick look at your code, I would recommend you look at Pipelines and Channels.

Also, BinaryPrimitives.ReverseEndianness() should almost never be called directly - instead, you should use methods that explicitly specify the target endianness. Failure to do so results in files that are only read correctly on hosts of one endianness.
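
A minimal sketch of the difference (hypothetical 4-byte field, not code from TACTSharp):

using System;
using System.Buffers.Binary;

ReadOnlySpan<byte> buffer = new byte[] { 0x00, 0x00, 0x01, 0x2C };

// Fragile: BitConverter reads in the host's endianness, so this pairing is
// only correct on a little-endian host reading a big-endian file.
uint fragile = BinaryPrimitives.ReverseEndianness(BitConverter.ToUInt32(buffer));

// Portable: the on-disk endianness is stated explicitly, correct on any host.
uint portable = BinaryPrimitives.ReadUInt32BigEndian(buffer); // 300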

@tomrus88 (Author)

tomrus88 commented Apr 25, 2025

because you're trading one optimization for another (binary size vs. the speed it runs at). There isn't necessarily a "fix" here for us, other than recommending you add that flag

I'm not trading anything; I haven't even touched any of those settings in the first place, it's all basically defaults. All I did was change the target framework from .NET 9 to .NET 10, and I got this performance regression... And I bet I won't be alone when .NET 10 releases.

I know that code is not perfect (it's not even my code, I'm just a potential user). What I do know is that code should not suddenly become 2000x slower when upgrading to a new version.

@Symbai

Symbai commented Apr 25, 2025

I'm not trading anything; I haven't even touched any of those settings in the first place, it's all basically defaults. All I did was change the target framework from .NET 9 to .NET 10

But you said it yourself in your first post: in .NET 10 with AOT, UseSizeOptimizedLinq is probably enabled, in which case you get size-optimized LINQ instead. So yes, there is a performance regression from .NET 9 to .NET 10 if you build with AOT and upgrade without disabling UseSizeOptimizedLinq. But this seems to be intended, otherwise UseSizeOptimizedLinq wouldn't be enabled by default. So Clockwork-Muse is right when saying "there isn't necessarily a fix here for us".

@bencyoung-Fignum

These kinds of controls are always painful, especially as most people don't know they exist, and especially when the default changes "silently". E.g., choosing between a 10x bigger app (or whatever) and a 2000x slower one isn't really a tradeoff. It seems more to be exposing issues in AOT compilation and trying to work around them in a very blunt way.

@tomrus88 (Author)

tomrus88 commented Apr 25, 2025

But this seems to be intended, otherwise UseSizeOptimizedLinq wouldn't be enabled by default. So Clockwork-Muse is right when saying "there isn't necessarily a fix here for us".

So you are saying it's intended for the user to have 2500x slower code just to save a few bytes in the generated binary?

Published repro nativeaot app sizes:
.NET 9: 1656832 bytes
.NET 10: 1540608 bytes (slow, UseSizeOptimizedLinq=true)
.NET 10: 1776128 bytes (fast, UseSizeOptimizedLinq=false)

So I save a whopping 235520 bytes (~15% of the app size on .NET 10), but the app now runs almost 2500 times slower, and this is OK? Really?

These kinds of controls are always painful, especially as most people don't know they exist, and especially when the default changes "silently". E.g., choosing between a 10x bigger app (or whatever) and a 2000x slower one isn't really a tradeoff. It seems more to be exposing issues in AOT compilation and trying to work around them in a very blunt way.

Exactly. It is probably not even documented anywhere. The only reason I know it exists is that when I mentioned this issue on the C# Discord, someone told me to try it out.

@HighPerfDotNet

So yes, there is a performance regression from .NET 9 to .NET 10

And that's the problem: this should not be happening, especially THAT much slower - even -10% would be unacceptable.

@Fabi

Fabi commented Apr 25, 2025

I'm not trading anything; I haven't even touched any of those settings in the first place, it's all basically defaults. All I did was change the target framework from .NET 9 to .NET 10

But you said it yourself in your first post: in .NET 10 with AOT, UseSizeOptimizedLinq is probably enabled, in which case you get size-optimized LINQ instead. So yes, there is a performance regression from .NET 9 to .NET 10 if you build with AOT and upgrade without disabling UseSizeOptimizedLinq. But this seems to be intended, otherwise UseSizeOptimizedLinq wouldn't be enabled by default. So Clockwork-Muse is right when saying "there isn't necessarily a fix here for us".

Going from under a second to, let's say, 5 minutes is not just a regression; that is a big, unacceptable bug in the size-optimized LINQ, especially when it does not even give you a huge size reduction.
That is a high-priority bug that needs to be fixed.

@GerardSmit

GerardSmit commented Apr 25, 2025

The default value of UseSizeOptimizedLinq is always true for NativeAOT:

<UseSizeOptimizedLinq Condition="'$(UseSizeOptimizedLinq)' == ''">true</UseSizeOptimizedLinq>

Wouldn't it be better to set UseSizeOptimizedLinq based on what's configured in OptimizationPreference?

| OptimizationPreference | UseSizeOptimizedLinq |
| --- | --- |
| Size | true |
| Speed | false |
| Not provided | false |

The OptimizationPreference property is documented at https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/optimizing (maybe add UseSizeOptimizedLinq to this page as well).

When you upgrade the project to .NET 10 with OptimizationPreference set to Size, it makes sense that .NET further reduces the size; the developer chose size.
When you upgrade the project and haven't set anything, you won't get a regression.
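
For reference, a minimal project-file sketch of the two knobs discussed here (illustrative only; whether OptimizationPreference should drive UseSizeOptimizedLinq is exactly the proposal above, not current behavior):

<PropertyGroup>
  <PublishAot>true</PublishAot>
  <!-- Broad intent -->
  <OptimizationPreference>Speed</OptimizationPreference>
  <!-- Targeted opt-out of just the LINQ size optimization -->
  <UseSizeOptimizedLinq>false</UseSizeOptimizedLinq>
</PropertyGroup>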

@elgonzo

elgonzo commented Apr 25, 2025

@Symbai, @Clockwork-Muse

That's such a silly argument to make, entirely based on the preposterous assumption that a majority of .NET developers prefer paying a huge performance penalty for some savings in executable size -- because what else could explain the claim that this performance regression is part of an intentional change of default build settings? Who in the world would believe that...?

Performance regressions of this order of magnitude, especially when submarined into the runtime or build tool-chain as the new default, are not trade-offs -- they are regression bugs. Great that you think "there isn't necessarily a fix here for us" -- what a glorious future awaits .NET (looking forward to the day when I can run a .NET app on a Commodore 64 -- at the same slow speed as on a modern 5...6 GHz CPU... 😂)

@HighPerfDotNet

it makes sense that .NET made improvements to improve the size-reduction. The developer chose for size.

No, it does not make sense that your application gets murdered performance-wise even if the size went down 10 times, which it did not. It's a time bomb, a real killer that can easily be missed, because it looks like just a safe upgrade to the next .NET.

Historically (in most languages), choosing size did very little and very rarely caused significant performance reductions; nobody expects to get totally killed, which is what is happening here.

@Symbai

Symbai commented Apr 25, 2025

So you are saying it's intended for user to have 2500x slower code just to save a few bytes in generated binary?

If it's on by default, yes; otherwise the person who made it the default wouldn't have done so. Am I saying this was a good decision? No. But I'm glad everyone insinuates that.

@elgonzo

elgonzo commented Apr 25, 2025

@Symbai stop with the silly arguments already. By your broken logic, any bug would not be a bug but an intentional change, because guess what: Somebody had to write buggy code for a bug to exist in the first place. And if the person who made buggy code didn't intend to make bugs, they wouldn't have written buggy code, no...?

The situation is simple: either enabling the UseSizeOptimizedLinq feature by default was premature and it is not really ready to be the default setting (in which case the decision has to be reverted), or this ridiculously huge performance regression needs to be fixed before .NET 10 gets its proper release.

@Clockwork-Muse (Contributor)

... I think the crux of your argument here is based on something of a false premise: that AoT will just "be faster" for all situations. Using AoT often means you need to change APIs used, or forgo certain other capabilities (reflection being a big one). In particular, people using AoT for speed reasons are likely to avoid Linq entirely (or at least in large processing loops/hot paths), because Linq itself carries a relatively large performance overhead.

Additionally, as I initially posted, the APIs you're currently using are suboptimal for what you're trying to write (at minimum just the particular Linq method chosen). You're relying on an internal optimization detail which isn't guaranteed to remain, and is also susceptible to disruption depending on how you might tweak your query. Regardless of whether we do change the default here, you will be best served by changing your code; the runtime is limited in its ability to save you from yourself.

@bencyoung-Fignum

This was changing from AOT on 9 to AOT on 10, so there's no presumption that AOT is faster...

@elgonzo

elgonzo commented Apr 25, 2025

@Clockwork-Muse

I think the crux of your argument here is based on something of a false premise:

No, it's not. The baseline here is .NET 9 AoT itself, not something else. Do you want to claim that using .NET 9 AoT as the baseline is itself a false premise?

Using AoT often means you need to change APIs used

Read the report: Using AoT on .NET 9 without needing to change APIs yielded an execution time of a fraction of a second.

Additionally, as I initially posted, the APIs you're currently using are suboptimal

That doesn't really matter here and is merely a deflection, because - again - the "suboptimal" APIs used were good for an execution time of less than a second with AoT on .NET 9.

You're relying on an internal optimization detail

Wait, are you saying that .NET 9 AoT performing competently is/was an optimization detail that nobody should rely upon? In other words, are you expecting developers to now assume by default that AoT performs poorly going forward with .NET 10? So developers should expect AoT to perform enormously worse than the original IL code running in the CLR and being jitted in real time during program execution? Really?

What the heck....?

@rickbrew (Contributor)

The repro code in this issue is not actually representative of what's linked to over at https://github.com/wowdev/TACTSharp/blob/main/TACTSharp/GroupIndex.cs#L93-L111

Here, Skip is called on an IEnumerable. Over in TACTSharp, it's called on a List, which is trivially optimizable for Skip/Take. Perhaps .NET 10 isn't taking advantage of that for some reason?
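
A minimal sketch of that distinction (hypothetical, reusing the repro's Foo):

var lazy = Enumerable.Range(0, 4_200_000).Select(i => new Foo { Value = i });
var list = lazy.ToList();

// Lazy sequence: with the indexed fast paths trimmed away, Skip walks and
// discards 1000 items before anything is yielded.
var a = lazy.Skip(1000).Take(157).ToArray();

// List<T>: with the speed-optimized LINQ, Skip/Take can use Count and the
// indexer to jump straight to the requested range.
var b = list.Skip(1000).Take(157).ToArray();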

@hez2010 (Contributor)

hez2010 commented Apr 25, 2025

Related: #113214

We need to revisit this now that real-world scenarios have regressed.
I believe UseSizeOptimizedLinq should only be enabled if optimizing for size, instead of enabled by default.

/cc @MichalStrehovsky

@Clockwork-Muse (Contributor)

Wait, are you saying that .NET 9 AoT performing competently is/was an optimization detail that nobody should rely upon? In other words, are you expecting developers to now assume by default that AoT performs poorly going forward with .NET 10? So developers should expect AoT to perform enormously worse than the original IL code running in the CLR and being jitted in real time during program execution? Really?

What the heck....?

Not quite.
Strictly speaking, there's no guarantee that these optimizations will remain in Linq at all, regardless of compiling to AoT.

The team may have priorities and goals for optimization that differ from yours, which can change over time. For AoT, this is often binary or running program size. This is not unique to C#/AoT - this is something that happens for all languages and runtimes. As a developer, part of your job of evaluating new releases is becoming aware of any such changes and responding appropriately to them. This may involve tuning flags or tweaking code.

Again, a large part of the problem here is that you're writing suboptimal code in the first place, which is exacerbating the performance difference you're noticing. Fix your code or change your design appropriately, and the issue will disappear. The runtime cannot be counted on to save you from yourself in every situation.

Here, Skip is called on an IEnumerable. Over in TACTSharp, it's called on a List, which is trivially optimizable for Skip/Take. Perhaps .NET 10 isn't taking advantage of that for some reason?

Essentially, Enumerable.Range is optimized internally the same way List is - there are internal-only types that exploit their implementations to avoid enumeration in some scenarios (among other things). It's these types, and the code paths that refer to them, that are trimmed when AoT optimizes LINQ for binary size, because they're relatively large generic types.
Strictly speaking, neither is guaranteed to be optimized the same way, but for a repro it's fine.

@elgonzo

elgonzo commented Apr 25, 2025

Again, a large part of the problem here is that you're writing suboptimal code in the first place, which is exacerbating the performance difference you're noticing. Fix your code or change your design appropriately, and the issue will disappear. The runtime cannot be counted on to save you from yourself in every situation.

No. That would only be a reasonable argument if the observed performance regression were within reasonable limits. But it is not. It's an outrageous performance regression. That's not .NET 10 AoT merely not "saving the coder from themselves"; that's .NET 10 AoT actively pushing one off a cliff and then trampling on the corpse for no good reason. But hey, you're apparently saying a performance regression of 2000x (three orders of magnitude) is somehow reasonable, okay...

I have to admit, a mischievous part of me secretly wishes this to remain in the eventually shipped .NET 10, if only because that part of me would like to see the ensuing bewilderment and expressions of incredulity across the .NET developer community...

@agocke (Member)

agocke commented Apr 25, 2025

Thanks everyone for the feedback. Fundamentally, this is a tradeoff and we need to pick a default.

Our initial expectation was that most apps would see a minor perf regression in favor of a minor size improvement. We were most interested in the change in algorithmic complexity: the speed-optimized LINQ basically grows quadratically in size with the instantiations.

If people are seeing real-world regressions this large, it may be better to keep size optimization opt-in instead of opt-out. However, no matter what, someone will be disappointed with the choice.

Note that apps can set OptimizationPreference=Speed or OptimizationPreference=Size. This is well documented at https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/optimizing. If you strongly favor one over the other, you should set the corresponding value.

Marlamin added a commit to wowdev/TACTSharp that referenced this issue Apr 25, 2025
See dotnet/runtime#115033 and GroupIndexBlockBench for why we're using a span here now. Thanks Warpten for suggesting the span solution!
agocke removed the untriaged label Apr 25, 2025
@rickbrew (Contributor)

People who use LINQ are well aware of drawbacks

Not necessarily. Also, the use of LINQ could be buried in some library that can't be changed.

@HighPerfDotNet

HighPerfDotNet commented Apr 26, 2025

Not necessarily.

They should have been; LINQ was always known to be terrible for performance. That said, I maintain that however terrible it was, it should not get a LOT worse all of a sudden, especially anything like 2000 times slower.

@danmoseley (Member)

danmoseley commented Apr 26, 2025

every blind person can see ... How can someone that is a 'Dev lead' not see such a simple thing

@Fabi please be respectful towards other members of this community.

@agocke (Member)

agocke commented Apr 26, 2025

I thought I was clear, but let me reiterate: the goal of changing the defaults of LINQ in AOT was primarily to eliminate the use of generic virtual methods (GVMs) that caused quadratic growth. This code has already been shipping in mono/mobile scenarios for years.

We knew that there was a potential for regression, but didn't know if people were actually using the code patterns that are a lot slower (read: places where we went from a constant-time algorithm to a linear-time algorithm).

It turns out there are. If we can reintroduce optimizations without GVMs to restore these paths to the same algorithmic complexity, that seems like the best-case scenario. We can take a ~constant size increase while getting a ~linear time speed improvement.

If someone wants to actually try to implement that, it would be helpful. Otherwise, we'll look into it before shipping .NET 10.

@tfenise (Contributor)

tfenise commented Apr 26, 2025

If I understand it correctly, it's the optimizations with Select that actually cause the quadratic growth, so there is #109978, and:

public virtual IEnumerable<TResult> Select<TResult>(Func<TSource, TResult> selector) =>
    !IsSizeOptimized
        ? new IteratorSelectIterator<TSource, TResult>(this, selector)
        : new IEnumerableSelectIterator<TSource, TResult>(this, selector);

If the goal is only to avoid the quadratic growth (and some linear growth is acceptable), then instead of deoptimizing Select<TSource, valuetype> for all Iterator<TSource>, we could deoptimize Select<TSource, valuetype1> just for those Iterator<TSource> with multiple generic parameters, like IListSelectIterator<valuetype2, TSource>, while still retaining IList<>-like behavior when possible. This way, the code size would still be larger than with "SizeOptimizedLinq" enabled, but probably without quadratic growth. "Places where we went from a constant-time algorithm to a linear-time algorithm" would probably be avoided.

@agocke (Member)

agocke commented Apr 26, 2025

Yup, that's my read too. But I haven't actually tried the changes, so I won't promise anything.

@xPaw (Contributor)

xPaw commented Apr 27, 2025

Besides the performance discussion, I'd like to point out that UseSizeOptimizedLinq is not a documented option; if you try to google it, you get 5 links, and the first one is this issue.

System.Linq.Enumerable.IsSizeOptimized gives 2 results. So I don't know how anyone is supposed to know that LINQ even has this option for a sub-par code path for some operations (it seems questionable to me that this option needs to exist, but that's a different discussion).

@HighPerfDotNet

That made me wonder too - is there a general option during compilation to check whether this is a size-optimised build or not?

@agocke (Member)

agocke commented Apr 27, 2025

It’s now part of OptimizationPreference, which is well documented. “Default” is a balanced approach where we use best judgement to pick the right implementation for most users. There are no plans to specifically call out LINQ, as the same tradeoff can be present in numerous APIs.

@HighPerfDotNet

the same tradeoff can be present in numerous APIs.

I hope not!

@tannergooding (Member)

tannergooding commented Apr 27, 2025

One thing that needs to be noted is that this particular example is 2400 times slower, but it is not the case that "all" instances of Skip/Take are 2400 times slower.

This particular example happens to highlight the already existing worst-case potential for Skip/Take on something that isn't an IList<T> or the internal Iterator<T> type (the types that have a known Count, indexer, and/or CopyTo method). This same scenario would occur for many different LINQ patterns regardless of the UseSizeOptimizedLinq change, including if you just inserted a Where or Select clause, which are fairly typical in real-world LINQ.

So the actual cause of the regression here is that certain LINQ operations no longer produce an IEnumerable<T> whose underlying type is IListSkipTakeIterator<T>, which means that later LINQ operations may now hit the already existing worst case.


The reason the regression looks so egregious is that the input in the sample above is 4.2 million elements. That would be roughly a 16 megabyte allocation for real input (assuming just int) and would live on the Large Object Heap (LOH -- anything over roughly 85 kilobytes goes there), which already has many special considerations.

In this worst case, it is then using Skip/Take to chunk the data into x-element sequences. Because the source is no longer countable/indexable, it ends up walking x * i items, then copying out x items, and repeating this process Count / x times, with each new iteration having to re-enumerate all the items the previous iteration walked, resulting in a ton of duplicated work.

For the scenario where UseSizeOptimizedLinq=false, the code is still non-ideal and still does a "lot" of unnecessary work, work which scales with the size of the input enumerable. But it is still able to finish quickly because the "walk x * i items" step becomes a couple of branches with an early exit, so the cost appears negligible.
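
To make the re-enumeration visible, here is a scaled-down instrumented sketch (the Counted helper is hypothetical, not part of the repro):

using System;
using System.Collections.Generic;
using System.Linq;

// Wraps a source so that every element pulled through it is counted.
static IEnumerable<int> Counted(IEnumerable<int> source, Action onItem)
{
    foreach (var item in source) { onItem(); yield return item; }
}

long steps = 0;
var data = Counted(Enumerable.Range(0, 10_000), () => steps++);

for (int i = 0; i < 10; i++)
{
    // Without a countable/indexable source, each pass re-walks the
    // i * 157 skipped items before copying out 157.
    _ = data.Skip(i * 157).Take(157).ToArray();
}

// Prints 8635: 157 * (1 + 2 + ... + 10) elements walked to copy out 1570.
Console.WriteLine(steps);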

LINQ has a dedicated API for this: Chunk. If you switch from Skip(i * x).Take(x) to Chunk(x) like this:

foreach (var blockEntries in fakeData.Chunk(outputEntriesPerBlock))
{
    for (var j = 0; j < blockEntries.Length; j++)
    {
        DoSomething(blockEntries[j]);
    }
}

Then not only does the code get simpler, but it also gets even faster than the version where UseSizeOptimizedLinq=false, and it fixes the bug that exists in the sample code where it's not handling the count % outputEntriesPerBlock remaining items:

| Scenario | UseSizeOptimizedLinq | Result |
| --- | --- | --- |
| Skip(i * x).Take(x) | false | Work took 00:00:00.0831966, result 8331359959000 |
| Skip(i * x).Take(x) | true | Work took 00:05:40.3705856, result 8331359959000 |
| Chunk(x) | true | Work took 00:00:00.0532271, result 8819997900000 |

-- Notably, if you don't want the remaining items handled, for whatever reason, that can still be done trivially and still remains faster than the original Skip(i * x).Take(x) approach, which has the bad worst-case scenarios.
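
For example, a sketch that deliberately drops the trailing partial block (roughly mirroring what the fixed block count in the original loop did):

foreach (var blockEntries in fakeData.Chunk(outputEntriesPerBlock))
{
    // Chunk yields one final shorter array when items remain; skip it here.
    if (blockEntries.Length < outputEntriesPerBlock)
        break;

    for (var j = 0; j < blockEntries.Length; j++)
    {
        DoSomething(blockEntries[j]);
    }
}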


It is never ideal when a scenario regresses and impacts someone negatively, but the circumstances that make this one look so egregious need to be acknowledged: the code uses atypically sized inputs, rolls a naive helper rather than using the dedicated API, and so on. The fact that this pit of failure always existed and could be trivially introduced in other ways also needs to be considered.

There are many possible ways forward, many of which would allow UseSizeOptimizedLinq=true to remain the default and some of which would toggle it back off. At the end of the day, the whole circumstance has to be understood: how it impacts the whole ecosystem, how likely it is to hit the negative scenario, whether there are better alternative fixes, whether we can push users towards using the better alternatives, etc.

@kasperk81 (Contributor)

how it impacts the whole ecosystem, how likely it is to hit the negative scenario, whether there are better alternative fixes, whether we can push users towards using the better alternatives, etc.

that level of evaluation should have been done before merging #113214

@agocke (Member)

agocke commented Apr 27, 2025

Incorrect. .NET 10 is in preview. This is what previews are for: finding potential corner-case, complex issues.

This code has already been shipping in the mobile workloads for years. If this were incredibly common it almost certainly would have come up before.

@kasperk81 (Contributor)

This code has already been shipping in the mobile workloads for years. If this were incredibly common it almost certainly would have come up before.

Mobile workloads use Mono (AOT + interpreter). Does their performance degrade similarly to CoreCLR AOT, with slowdowns up to 2500x?

@tannergooding (Member)

tannergooding commented Apr 28, 2025

Mobile workloads use Mono (AOT + interpreter). Does their performance degrade similarly to CoreCLR AOT, with slowdowns up to 2500x?

This isn't a CoreCLR vs Mono thing; it's not even an AOT vs JIT thing. It's about what happens when the libraries are trimmed with UseSizeOptimizedLinq=true (or when the System.Linq.Enumerable.IsSizeOptimized config knob is set in the runtime host or AppConfig settings, etc.; that knob is what setting UseSizeOptimizedLinq toggles).

As agocke stated above, this has been the historical default for mobile workloads for years. So yes, this same scenario would behave poorly there, because it would hit the same path of having to iterate n items as part of Skip(n): the first outputEntriesPerBlock items will be iterated 26000 times, the next outputEntriesPerBlock items 25999 times, and so on; only the last outputEntriesPerBlock entries will be iterated once.

All that's changed here is that CoreCLR AOT now also defaults to UseSizeOptimizedLinq=true. As per the PR that made the change, this was done because it provides significant size savings without regressing perf in a noticeable way for typical apps. Scenarios like the one here are expected to be atypical, for many reasons: we've had no such issues logged for any mobile workload over the years; the expected size of collections (which has historically been used to decide things like the LOH threshold and the initial capacity of new List<T>()) makes inputs this large unusual; and there is a dedicated LINQ API, enumerable.Chunk(x), that completely avoids the issue and the common bugs of the enumerable.Skip(x * i).Take(x) pattern (and it's expected that code would use the dedicated API when it exists).
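
Back-of-the-envelope, using the repro's numbers (157 items per block, 26000 blocks): pass i walks 157 * i skipped items plus the 157 it copies out, so the total enumerated is about 157 * (1 + 2 + ... + 26000) ≈ 5.3 × 10^10 elements, versus roughly 4.1 × 10^6 elements actually copied. That gap of about four orders of magnitude is consistent with the wall-clock numbers reported above.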

@huoyaoyuan (Member)

Just to bring some context: the original cause of the change is #102131, a real-world usage that hit O(N^2) size growth.

The size-optimized LINQ has been enabled for years for wasm and mobile, and it has always been O(N^2) for some cases. Is there any real-world report that the O(N^2) complexity caused problems?

@MichalStrehovsky (Member)

Yep, like @huoyaoyuan said, this LINQ optimization causes an NxM combinatorial explosion in generics that did cause real-world apps to get not just a couple dozen percent bigger, but so big that they could no longer be compiled ahead of time. The details start at #102131 (comment). Once your app uses enough LINQ, the size increase gets so bad that you can no longer compile your app.

It's not just whether we prefer apps to be smaller and start faster or prefer best peak throughput. It's whether we'd like to compile things at all.

To be clear, this is not a "novel LINQ deoptimization". This setting removes a throughput optimization that was added to LINQ a couple years ago. The behavior of LINQ with the optimization removed is going to match .NET Framework/.NET Native, or .NET 9 on mobile platforms (where we made this the default in .NET 6).

If the choice is between "some users notice throughput that matches .NET Framework" and "some users cannot compile their app at all", our defaults will prefer the former, sorry if you disagree.

MichalStrehovsky modified the milestones: 10.0.0, Future Apr 28, 2025
@bencyoung-Fignum

bencyoung-Fignum commented Apr 28, 2025

@MichalStrehovsky the point is well made, but I guess the issue is that the trade-offs here seem very stark in either direction, and I think most people would prefer "no options, just give me something in the middle that works"! Ideally you wouldn't have to choose between size and speed at all, and you'd get something relatively small that's fast :)

Is this (the NxM issue) something that's being addressed in the compiler in future versions?

I think one of the things .NET has done really well in the past is avoid Java's style of 100 GC config options for performance.

@MichalStrehovsky (Member)

Is this (the NxM issue) something that's being addressed in the compiler in future versions?

The non-linear generic expansion can be addressed by adding a feature called universal shared code. It deoptimizes all generic code in different ways and that avoids the non-linear expansion but brings other performance problems instead. We don't have plans to add that to native AOT. It is a well understood problem but the tradeoffs add yet another dimension. I don't think anyone in this thread would be happy about that as the solution.

We don't have that many things that would behave very differently between AOT and JIT: compiled LINQ expression trees are interpreted with AOT, so AOT users might see a perf difference (and can do nothing about it). Then there are compiled regexes (but: people can use the source generator to restore perf). And same now for LINQ (but: most people can flip the switch discussed above and trade size for perf). There might be an extra gesture required, or it might be something that will never perform the same as JIT with AOT. It might be interesting to build a Roslyn performance analyzer that would flag these things as they are written.
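
For regexes, that extra gesture is the [GeneratedRegex] source generator (available since .NET 7); a minimal sketch with a hypothetical pattern:

using System.Text.RegularExpressions;

public static partial class Patterns
{
    // The generator emits the matcher at compile time, so no runtime
    // Regex compilation is needed under AOT.
    [GeneratedRegex("^[0-9]+$")]
    public static partial Regex Digits();
}

// Usage: Patterns.Digits().IsMatch("123")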

These differences stem from the fundamental design of AOT - there is an upper bound on how much native code can be pregenerated.

@am11 (Member)

am11 commented Apr 28, 2025

people can use the source generator to restore perf

Using a source generator could also help improve LINQ's performance without changing the code style. A project like LinqGen could be a suitable recommendation for this purpose (it may not help with open generics).

@elgonzo

elgonzo commented Apr 28, 2025

And same now for LINQ (but: most people can flip the switch discussed above and trade size for perf)

The pitfalls of Linq (and whatever other common every-day components in the BCL exhibit similar complications) -- or a guide of dos and don'ts for AoT -- need to be documented in a way that is easily discoverable (e.g., by search engines). I found a list of AoT limitations in the documentation here: https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/?tabs=windows%2Cnet8#limitations-of-native-aot-deployment but it does not (yet?) mention anything that could steer developers away from the abyss of performance loss or the compilation issues expected when the code base of an app or library makes heavy use of unfavorable Linq constructs.

If an analyzer could be realized that helps developers by highlighting Linq constructs problematic for AoT, great! (Not sure, though, how complex the problem of creating such an analyzer that doesn't issue too many false alarms would be...)

@HighPerfDotNet

HighPerfDotNet commented Apr 28, 2025

most people can flip the switch discussed above and trade size for perf

People who choose AOT specifically expect stuff to be faster - not just startup, but also execution, even though that does not work out as well due to the continued lack of working static PGO; still, it's generally comparable, and smaller due to trimming, which works reasonably well. Trimming has its own risks, but there are compiler warnings, and the risk shows up as a simple failure when a method is invoked, which is fairly painless.

People who select size over perf don't expect to end up with even half the perf, and the potential for 3 orders of magnitude slower is a time bomb waiting to explode - this level of performance reduction is a security issue that can facilitate DoS attacks in the future.

If it is acceptable to have massive performance deviations when optimizing for size, then what stops us from adding faster paths that use a lot more memory, say 3 orders of magnitude more, for 5% better perf? Clearly that's not OK for widely used base libraries that people trust implicitly.

@tannergooding (Member)

Using a source generator could also help improve LINQ's performance without changing the code style

This scenario doesn't require anything like source generation to fix; just use enumerable.Chunk(x) instead of enumerable.Skip(i * x).Take(x).

The dedicated Chunk API simplifies the code, avoids bugs that are common in the Skip+Take approach, and improves performance even over the UseSizeOptimizedLinq=false approach.

We already have an analyzer that recommends replacing certain LINQ patterns with more optimal ones; recommending a switch to Chunk(x) is likely just one more scenario to support.

-- The same is true for many LINQ patterns that may get pessimized under UseSizeOptimizedLinq=true. There is often a better alternative API that exists for common 2-3 step LINQ sequences that won't hit the regression. If a specialized API doesn't exist, opening an API suggestion for it to be exposed is often a viable option and will resolve things instead.

People who choose AOT specifically expect stuff to be faster - not just startup, but also execution

This is not typically the case and is a bad expectation that's been propagated from years of misinformation.

You can get better performance from AOT if you remove all portability, compile for a specific target machine, and statically link the world. However, typical AOT deployments (even for C/C++) instead target a more portable common baseline (which for Intel/AMD is typically a machine from ~2003/2004) and hit many real-world limitations, due to things like not being able to inline across dynamic dependencies (you cannot statically link the world), lack of tuning based on dynamic inputs (static PGO doesn't cover everything), etc.

Equally, you can get better performance from a JIT because it targets the exact hardware it runs on, statically knows its dependency set, doesn't have to consider dynamic boundaries, can dynamically tune for the payloads it runs on, etc. Where the JIT tends to fall down is startup, because it has to compile more at run time, but even that can be mitigated via things like pre-JIT, a form of partial AOT that improves startup while still allowing the JIT to run.

But, the reality is that many performance improvements that show up in microbenchmarks do not translate equivalently over to real world applications; especially not applications that depend on disk IO, networking, or any kind of "human" latency (like user input). So outside of specific scenarios, typical AOT vs JIT apps tend to be comparable in performance and the decision of which to use should come down to your deployment needs.

For example, considerations should include (but are not limited to) things like (this is a high level overview):

  • Is a JIT allowed?
    • It's disallowed for many embedded platforms, mobile devices, and consoles
  • How much work is the typical app going to run?
    • If it's small, then AOT is often better
    • If it's large or long running then JIT is often better
  • Is startup performance critical?
    • This is typically a question about micro to milliseconds, not seconds to minutes
    • If this is something like small Azure Functions being launched thousands of times a day, then AOT is often better
    • If this is something like a user app, then JIT + ReadyToRun is often a great choice
  • Is deployment size critical?
    • This is typically a question about kilobytes, not megabytes
    • Most size savings come from trimming which works with both JIT and AOT scenarios
    • AOT additionally removes the need to pass around the JIT which saves some additional space on top

@huoyaoyuan (Member)

People who choose AOT specifically expect stuff to be faster - not just startup, but also execution

This is not a correct expectation. The upper bound of AOT performance is better in some cases (singleton devirtualization) and worse in others (adaptive instruction sets, dynamic PGO, absolute addressing of static variables). AOT is NOT naturally faster.

People who select size over perf don't expect to end up with even half the perf, and the potential for 3 orders of magnitude slower is a time bomb waiting to explode

what stops us from adding faster paths that use a lot more memory, say 3 orders of magnitude more, for 5% better perf?

It's meaningless to use a ratio here - what matters is the complexity of the growth function (O(N) vs O(N^2), etc.). The previous behavior has already hit real-world size explosion.

@tfenise (Contributor)

tfenise commented Apr 28, 2025

Just a reminder, there is something to try to avoid both quadratic size growth and "Places where we went from a constant-time algorithm to a linear-time algorithm". #115033 (comment)

@bencyoung-Fignum

Yes, I guess the ideal is a "lesser" set of perf optimizations for SizeOptimized that avoids the NxM behaviour but keeps the algorithmic complexity the same as the optimized version, with perhaps somewhat lower performance...

@HighPerfDotNet

The previous behavior has already hit real-world size explosion.

But the new behavior is a lot worse, isn't it? That's the problem - getting something that maybe was slow before, but is now a LOT slower.

This is not typically the case and is a bad expectation that's been propagated from years of misinformation.

Maybe, but what kind of performance-loss expectations do most people have when choosing the option to optimise for size?

  1. 10-15% (I am at this level)
  2. 50%
  3. 1000%
  4. 200000%

At what point would they NOT choose it if they were informed that their stuff could get a lot slower? They get zero info right now; maybe the compiler should warn about parts that can execute super slowly and are a potential security issue.

It was one thing to try it for mobile workloads - what's the worst that can happen there, the app hangs and the user restarts it? But here we are talking about potential server-side code that can take lots of services down; this is just a DoS waiting to be exploited.

Anyway, I am done here.

@bencyoung-Fignum

Thinking about this some more, I think there are a few points:

  1. Algorithms that change complexity (guarantees) based on platform/environment are really hard to rely on, and end up leading to blanket statements like "don't use Linq".
  2. One way of looking at this is that Linq, in trying to be helpful by optimizing certain use-cases, actually leads to surprising performance cliffs when, for one reason or another, you don't meet the conditions you thought you were meeting to get the good performance.
  3. It's not clear whether the Linq algorithm complexity models are documented anywhere.
  4. It seems like this is just one example (although a common one) of where this NxM issue can show up, and it needs addressing somehow. Whether there's a way of pruning the matrix when you know what the possible arguments are, or something else, it seems like it will become an issue for any large application that wants to be used with AoT, especially if you try to optimize by using GVMs (I know I've heavily used these for the "empty struct to force instantiation" method in perf-sensitive code in the past). People are used to heavy compiles for AoT for things like C++ and Rust, so I'm not sure .NET is doing anything not seen on those platforms.
