Massive performance regression .NET 9 vs .NET 10 #115033
... because you're trading one optimization for another (binary size vs. execution speed). There isn't necessarily a "fix" here for us, other than recommending you add that flag. That said, instead of

var blockEntries = fakeData.Skip(i * outputEntriesPerBlock).Take(outputEntriesPerBlock).ToArray();

save yourself some trouble and just use

foreach (var block in fakeData.Chunk(outputEntriesPerBlock))
{
    foreach (var entry in block)
    {
        DoSomething(entry);
    }
}

This is likely to be performant on both versions, because you're not constantly re-reading the source data. It's also going to be safer if your source ever ends up being some sort of read-once stream. From a quick look at your code, I would also recommend you look at Pipelines and Channels.
I'm not trading anything; I haven't even touched any of those settings in the first place, it's all basically defaults. All I did was change the target framework from .NET 9 to .NET 10 to see this performance regression... And I bet I won't be alone when .NET 10 releases. I know that code is not perfect (it's not even my code, I'm just a potential user). What I do know is that code should not suddenly become 2000x slower when upgrading to a new version.
But you said it yourself in your first post. In .NET 10 with AOT, UseSizeOptimizedLinq is probably enabled, in which case you get size-optimized LINQ instead. So yes, there is a performance regression from .NET 9 to .NET 10 if you build with AOT and upgrade without disabling UseSizeOptimizedLinq. But this seems to be intended, otherwise UseSizeOptimizedLinq wouldn't be enabled by default. So Clockwork-Muse is right when saying "there isn't necessarily a fix here for us".
These kinds of controls are always painful, especially as most people don't know they exist, and especially when the default changes "silently". E.g. choosing between a 10x bigger app (or whatever) and a 2000x slower one isn't really a tradeoff. It seems more like exposing issues in AOT compilation and trying to work around them in a very blunt way.
So you are saying it's intended for users to have 2500x slower code just to save a few bytes in the generated binary? Comparing published repro NativeAOT app sizes, I win a whopping 235520 bytes (~15% of app size on .NET 10), but the app now runs almost 2500 times slower, and this is OK? Really?
Exactly. It is probably not even documented anywhere. The only reason I know it exists is that when I mentioned this issue in the C# Discord, someone told me to try it out.
And that's the problem that should not be happening, especially THAT much slower - even -10% is unacceptable.
Going from under a second to, let's say, up to 5 minutes is not a mere regression. That is a big, unacceptable bug in the size-optimized LINQ, especially when it does not even give you a huge size optimization.
The default value of
Wouldn't it be better to set
The When you upgrade the project to .NET 10 and have
That's such a silly argument to make, and it's entirely based on the preposterous assumption that a majority of .NET developers prefer paying a huge performance penalty for some savings in executable size -- because what else could explain the claim of this performance regression being part of an intentional change of default build settings? Who in the world would believe that? Performance regressions of this order of magnitude, especially when submarined into the runtime or build toolchain as the new default, are not trade-offs -- they are regression bugs. Great that you think "there isn't necessarily a fix here for us" -- what a glorious future awaits .NET. (Looking forward to the day when I can run a .NET app on a Commodore 64 -- at the same slow speed as on a modern 5-6 GHz CPU... 😂)
No, it does not make sense that your application gets murdered performance-wise, even if size went down 10 times, which it did not. It's a time bomb, a real killer that can easily get missed - because it looks like just a safe upgrade to the next .NET. Historically (in most languages), choosing size did very little and very rarely caused significant performance reductions; nobody expects to get totally killed, which is what is happening here.
If it's on by default, yes. Otherwise the person who made it the default wouldn't have done it. Am I saying this was a good decision? No. But I'm glad everyone insinuates that.
@Symbai stop with the silly arguments already. By your broken logic, any bug would not be a bug but an intentional change, because guess what: somebody had to write buggy code for a bug to exist in the first place. And if the person who wrote the buggy code didn't intend to make bugs, they wouldn't have written buggy code, no? The situation is simple: either enabling the UseSizeOptimizedLinq feature by default was premature and it is not really ready to be the default setting (in which case this decision has to be reverted), or this ridiculously huge performance regression needs to be fixed before .NET 10 gets its proper release.
... I think the crux of your argument here is based on something of a false premise: that AoT will just "be faster" for all situations. Using AoT often means you need to change the APIs used, or forgo certain other capabilities (reflection being a big one). In particular, people using AoT for speed reasons are likely to avoid LINQ entirely (or at least in large processing loops/hot paths), because LINQ itself carries a relatively large performance overhead. Additionally, as I initially posted, the APIs you're currently using are suboptimal for what you're trying to write (at minimum just the particular LINQ method chosen). You're relying on an internal optimization detail which isn't guaranteed to remain, and which is also susceptible to disruption depending on how you might tweak your query. Regardless of whether we do change the default here, you will be best served by changing your code; the runtime is limited in its ability to save you from yourself.
This was a change from AOT on 9 to AOT on 10, so there's no presumption that AOT is faster...
No, it's not. The baseline here is .NET 9 AoT itself, not something else. Do you want to claim that using .NET 9 AoT as the baseline is itself a false premise?
Read the report: Using AoT on .NET 9 without needing to change APIs yielded an execution time of a fraction of a second.
It doesn't really matter here and is merely a deflection, because - again - the "suboptimal" APIs used were good for an execution time of less than a second with AoT in .NET 9.
Wait, are you saying that .NET 9 AoT performing competently is/was an optimization detail that nobody should rely upon? In other words, are you expecting developers to assume by default that AoT performs badly going forward with .NET 10? So developers should now expect AoT to perform enormously worse than the original IL code running in the CLR and being jitted in real time during program execution? Really? What the heck?
The repro code in this issue is not actually representative of what's linked to over at https://github.com/wowdev/TACTSharp/blob/main/TACTSharp/GroupIndex.cs#L93-L111 Here,
Related: #113214. We need to revisit this now that real-world scenarios are regressing.
Not quite. The team may have priorities and goals for optimization that differ from yours, and these can change over time. For AoT, this is often binary or running program size. This is not unique to C#/AoT - it happens for all languages and runtimes. As a developer, part of your job in evaluating new releases is becoming aware of any such changes and responding appropriately to them. This may involve tuning flags or tweaking code. Again, a large part of the problem here is that you're writing suboptimal code in the first place, which is exacerbating the performance difference you're noticing. Fix your code or change your design appropriately, and the issue will disappear. The runtime cannot be counted on to save you from yourself in every situation.
No. That would only be a reasonable argument if the performance regression observed were within reasonable limits. But it is not; it's an outrageous performance regression. That's not .NET 10 AoT merely not "saving the coder from themselves", that's .NET 10 AoT actively pushing one off a cliff and then trampling on the corpse for no good reason. But hey, you're apparently saying a performance regression of 2000x (three orders of magnitude) is somehow reasonable, okay... I have to admit, a mischievous part of me secretly wishes this to remain in the eventually shipped .NET 10, if only because that part of me would like to see the ensuing bewilderment and expressions of incredulity across the .NET developer community...
Thanks everyone for the feedback. Fundamentally, this is a tradeoff and we need to pick a default. Our initial expectation was that most apps would see a minor perf regression in favor of a minor size improvement. We were most interested in the change in algorithmic complexity: the speed-optimized LINQ basically grows quadratically in size with the instantiations. If people are seeing real-world regressions this large, it may be better to keep size optimization opt-in instead of opt-out. However, no matter what, someone will be disappointed with the choice. Note that apps can set UseSizeOptimizedLinq to false to opt out.
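For readers looking for the concrete gesture: the opt-out would go in the project file, roughly like this (a sketch based on the property name discussed in this thread; exact placement may vary by project):

```xml
<!-- Keep speed-optimized LINQ even when publishing with Native AOT. -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <UseSizeOptimizedLinq>false</UseSizeOptimizedLinq>
</PropertyGroup>
```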
See dotnet/runtime#115033 and GroupIndexBlockBench for why we're using a span here now. Thanks Warpten for suggesting the span solution!
Not necessarily. Also, the use of LINQ could be buried in some library that can't be changed.
They should have been; LINQ was always known to be terrible for performance. That said, I maintain that however terrible it was, it should not get a LOT worse all of a sudden, and certainly not anything like 2000 times slower.
@Fabi please be respectful towards other members of this community.
I thought I was clear, but let me reiterate: the goal of changing the defaults of LINQ in AOT was primarily to eliminate the use of GVMs that caused quadratic growth. This code has already been shipping in mono/mobile scenarios for years. We knew that there was a potential for regression, but didn't know whether people were actually using the code patterns that are a lot slower (read: places where we went from a constant-time algorithm to a linear-time algorithm). It turns out there are. If we can reintroduce optimizations without GVMs to restore these paths to the same algorithmic complexity, that seems like the best-case scenario: we can take a ~constant size increase in exchange for a ~linear speed improvement. If someone wants to actually try to implement that, it would be helpful. Otherwise, we'll look into it before shipping .NET 10.
If I understand it correctly, it's the optimizations in runtime/src/libraries/System.Linq/src/System/Linq/Iterator.cs (lines 90 to 93 at a37502b).
If the goal is only to avoid the quadratic growth (and some linear growth is acceptable), then instead of deoptimizing
Yup, that’s my read too. But I haven’t actually tried the changes, so I won’t promise anything.
Besides the performance discussion, I'd like to point out that
That made me wonder too - is there a general option during compilation to check whether this is a size-optimised build or not?
It’s now part of OptimizationPreference, which is well documented. “Default” is a balanced approach where we use best judgement to pick the right implementation for most users. There are no plans to specifically call out LINQ, as the same tradeoff can be present in numerous APIs.
I hope not!
One thing that needs to be noted is that this particular example is 2400 times slower, but it is not the case that "all" instances regress like this. This particular example happens to highlight the already existing worst-case potential for Skip/Take on a source that can't be indexed into directly. So the actual cause of the regression here is that certain LINQ operations no longer produce an optimized iterator. The reason the regression looks so egregious is then that the input in the sample above is 4.2 million elements, which would be roughly a 16 megabyte allocation for real input. In this worst case, it is then using Skip/Take to chunk that data, a scenario for which LINQ has a dedicated API:

foreach (var blockEntries in fakeData.Chunk(outputEntriesPerBlock))
{
    for (var j = 0; j < blockEntries.Length; j++)
    {
        DoSomething(blockEntries[j]);
    }
}

Then not only does the code get simpler, but it also gets even faster than the Skip/Take version.
-- Notably, if you don't want the remaining items handled, for whatever reason, then that can still be trivially done while remaining faster than the original. It's never ideal when a scenario regresses and impacts someone negatively, but the circumstances that make it look so egregious need to be acknowledged: the code uses atypically sized inputs, rolls a naive helper rather than using the dedicated API, and more needs to be taken into account. The fact that this pit of failure always existed and could be trivially introduced in other ways needs to be considered. There are many possible ways forward, many of which would allow
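To make the re-reading concrete, here is a small self-contained sketch (names like `Source` are illustrative, not from the issue) that counts how many elements each approach pulls from a plain, non-indexable enumerable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int N = 100;
const int ChunkSize = 10;
var pulls = 0;

// A plain generator: no indexed fast path, so Skip must walk from the start.
IEnumerable<int> Source()
{
    for (int i = 0; i < N; i++)
    {
        pulls++; // count every element handed to a consumer
        yield return i;
    }
}

for (int i = 0; i < N / ChunkSize; i++)
{
    // Each iteration re-enumerates: Skip discards i * ChunkSize elements
    // before Take yields the next ChunkSize of them.
    _ = Source().Skip(i * ChunkSize).Take(ChunkSize).ToArray();
}
int skipTakePulls = pulls;

pulls = 0;
foreach (var block in Source().Chunk(ChunkSize))
{
    _ = block; // a single forward pass over the source
}
int chunkPulls = pulls;

Console.WriteLine($"Skip/Take pulled {skipTakePulls} elements; Chunk pulled {chunkPulls}.");
```

With N = 100 and a chunk size of 10, the Skip/Take loop pulls 550 elements while Chunk pulls 100, and that gap grows quadratically with N, which is why a 4.2-million-element input looks so catastrophic.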
That level of evaluation should have been done before merging #113214.
Incorrect. .NET 10 is in preview, and this is what previews are for: finding potential corner-case, complex issues. This code has already been shipping in the mobile workloads for years; if this were incredibly common, it almost certainly would have come up before.
Mobile workloads use Mono (AOT + interpreter). Does their performance degrade similarly to CoreCLR AOT, with slowdowns up to 2500x?
This isn't a CoreCLR vs Mono thing; it's not even an AOT vs JIT thing. It's about what happens when the libraries are trimmed with size-optimized LINQ enabled. As agocke stated above, this has been the historical default for mobile workloads for years. So yes, this same scenario would behave poorly there, because it would hit the same path of having to iterate the source repeatedly. All that's changed here is that CoreCLR AOT now also defaults to size-optimized LINQ.
Just to bring it into context: the original cause of the change is #102131, a real-world usage that hits O(N^2) size growth. The size-optimized LINQ has been enabled for years for wasm and mobile, and it has always been O(N^2) for some cases. Is there any real-world report of that O(N^2) complexity causing a problem?
Yep, like @huoyaoyuan said, this LINQ optimization causes an NxM combinatorial explosion in generics that did cause real-world apps to not just get a couple dozen percent bigger, but so big that they can no longer compile ahead of time. The details start at #102131 (comment). Once your app uses enough LINQ, the size increase gets so bad you can no longer compile your app. It's not just whether we prefer apps to be smaller and start faster or prefer best peak throughput; it's whether we'd like to compile things at all. To be clear, this is not a "novel LINQ deoptimization". This setting removes a throughput optimization that was added to LINQ a couple of years ago. The behavior of LINQ with the optimization removed is going to match .NET Framework/.NET Native, or .NET 9 on mobile platforms (where we made this the default in .NET 6). If the choice is between "some users notice throughput that matches .NET Framework" and "some users cannot compile their app at all", our defaults will prefer the former; sorry if you disagree.
@MichalStrehovsky the point is well made, but I guess the issue is that the trade-offs here seem very stark in either direction, and I think most people prefer "no options, just give me something in the middle that works"! Ideally you wouldn't have to choose between size or speed at all and you'd get something relatively small that's fast :) Is this (the NxM issue) something that's being addressed in the compiler in future versions? I think one of the things .NET has done really well in the past is avoid Java's style of 100 GC config options for performance.
The non-linear generic expansion can be addressed by adding a feature called universal shared code. It deoptimizes all generic code in different ways; that avoids the non-linear expansion but brings other performance problems instead. We don't have plans to add that to native AOT. It is a well-understood problem, but the tradeoffs add yet another dimension, and I don't think anyone in this thread would be happy with that as the solution. We don't have that many things that behave very differently between AOT and JIT: compiled LINQ expression trees are interpreted with AOT, so AOT users might see a perf difference (and can do nothing about it). Then there are compiled regexes (but people can use the source generator to restore perf). And the same now for LINQ (but most people can flip the switch discussed above and trade size for perf). There might be an extra gesture that is required, or it might be something that will never perform the same as the JIT with AOT. It might be an interesting thing to build a Roslyn performance analyzer that would flag these things as they are written. These differences stem from the fundamental design of AOT - there is an upper bound on how much native code can be pregenerated.
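The "source generator to restore perf" gesture for regexes mentioned above looks roughly like this (a sketch; the class name and pattern are illustrative):

```csharp
using System.Text.RegularExpressions;

public static partial class Patterns
{
    // The source generator emits the matcher at build time, so Native AOT
    // does not have to fall back to interpreting the pattern at run time.
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDate();
}
```

Call sites then use `Patterns.IsoDate().IsMatch(...)` exactly as they would with `new Regex(...)`.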
Using a source generator could also help improve LINQ's performance without changing the code style. A project like LinqGen could be a suitable recommendation for this purpose (though it may not help with open generics).
The pitfalls of LINQ (and of whatever other common everyday components in the BCL exhibit similar complications) -- or a guide of don'ts and dos for AoT -- need to be documented in some way that is discoverable in obvious ways (e.g., by search engines). I found a list of AoT limitations in the documentation here: https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/?tabs=windows%2Cnet8#limitations-of-native-aot-deployment but it does not (yet?) mention anything that could steer developers away from the abyss of performance loss or the compilation issues to expect when the code base of an app or library makes heavy use of unfavorable LINQ constructs. If an analyzer could be realized that helps developers by highlighting LINQ constructs problematic for AoT, great! (Not sure, though, how complex the problem of creating such an analyzer that doesn't issue too many false alarms would be...)
People who choose AOT specifically expect stuff to be faster - not just startup, but also execution. Even though that does not work out as well due to the continued lack of working static PGO, it's still generally comparable, and smaller thanks to trimming, which works reasonably well. Trimming has its own risks, but there are compiler warnings, and the risk expresses itself as a simple failure when a method is invoked - that's fairly painless. People who select size over perf don't expect to end up with even half the perf as a result, and the potential for three orders of magnitude slower is a time bomb waiting to explode - this level of performance reduction is a security issue that can facilitate DoS attacks in the future. If it is acceptable to have massive deviations from perf when optimizing for size, then what stops someone from adding faster paths that use a lot of memory - say, three orders of magnitude more for 5% better perf? Clearly that's not OK for widely used base libraries that people trust implicitly.
This scenario doesn't require anything like source generation to fix it. Instead, just use the dedicated Chunk API. We already have an analyzer that recommends replacing certain LINQ patterns with other, more optimal patterns, and the recommended code switch here is to use Chunk. The same is true for many LINQ patterns that may get pessimized under size-optimized LINQ.
This is not typically the case and is a bad expectation that's been propagated from years of misinformation. You can get better performance from AOT if you remove all portability, compile for a specific target machine, and statically link the world. However, typical AOT deployments (even for C/C++) instead target a more portable common baseline (which for Intel/AMD is typically a machine from ~2003/2004) and hit many real-world limitations due to things like not being able to inline across dynamic dependencies (you cannot statically link the world), lack of tuning based on dynamic inputs (static PGO doesn't cover everything), etc. Equally, you can get better performance from a JIT because it specifically targets the hardware it runs on, statically knows its dependency set, doesn't have to consider dynamic boundaries, can dynamically tune for the payloads it runs on, etc. The place the JIT tends to fall down is startup, because it has to compile more resources, but even that can be mitigated via things like pre-JIT, which is a form of partial AOT that improves startup but still allows the JIT to run. But the reality is that many performance improvements that show up in microbenchmarks do not translate equivalently to real-world applications; especially not applications that depend on disk IO, networking, or any kind of "human" latency (like user input). So outside of specific scenarios, typical AOT vs JIT apps tend to be comparable in performance, and the decision of which to use should come down to your deployment needs.
This is not a correct expectation. The upper bound of AOT performance is better in some cases (singleton devirtualization) and worse in others (adaptive instruction sets, dynamic PGO, absolute addressing of static variables). AOT is NOT naturally faster.
It's meaningless to use a ratio here - it's a change in the complexity of a growing function (a constant-time algorithm becoming linear in the input size), so the measured slowdown scales with the input.
Just a reminder: there is something to try to avoid both the quadratic size growth and the "places where we went from a constant-time algorithm to a linear-time algorithm": #115033 (comment)
Yes, I guess the ideal is to have a "lesser" set of perf optimizations for SizeOptimized that avoids the NxM behaviour but keeps the algorithmic complexity the same as the optimized version, with perhaps somewhat less performance...
But the new behavior is a lot worse, isn't it? That's the problem - getting something that maybe was slow before, but is now a LOT slower.
Maybe, but what kind of performance-loss expectations do most people have when choosing the option to optimise for size?
At what point would they NOT choose it if they were informed that their stuff can run a lot slower? They get zero info right now; maybe the compiler should warn about the parts that can execute super slowly, since it's a potential security issue. It was one thing to try it for mobile workloads - what's the worst that could happen there, the app hangs and the user restarts it? But here we are talking about potential server-side stuff that can take lots of services down; this is just a DoS waiting to be exploited. Anyway, I am done here.
Thinking about this some more, I think there are a few points:
Description
I'm seeing a massive performance regression when targeting .NET 10 instead of .NET 9 with the repro code below.
https://github.com/wowdev/TACTSharp/blob/ed6fb1b7bd3220afea1c3845c2dbe797f553ac1c/TACTSharp/GroupIndex.cs#L93-L111 is the code where the problem was found initially.
Project file
Please note that the PublishAot bit in the project file is important, because it apparently turns on the UseSizeOptimizedLinq option.
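The project file contents aren't reproduced above; a minimal sketch of the relevant shape (exact property values are assumptions) would be:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>
    <!-- Enabling PublishAot is what flips UseSizeOptimizedLinq on by default. -->
    <PublishAot>true</PublishAot>
  </PropertyGroup>
</Project>
```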
Configuration
Win11 x64, .NET 10 Preview 3
Regression?
Yes.
Data
On my system I'm getting the following results:
So it's around 2443 times slower on .NET 10.
Analysis
The UseSizeOptimizedLinq option (triggered by enabling PublishAot) seems to be what contributes to the problem.