Improve performance of Order{By}{Descending}(...).First/Last #97483
When there's no ThenBy, we can take a more optimized path that uses the TKey's comparer directly. We already have a fast path for this case that converts the O(n log n) operation into O(n), but it employs a comparer that's much more complicated, and as that comparer is used in the inner loop, it makes a meaningful difference.
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.InteropServices;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[MemoryDiagnoser(false)]
public class Tests
{
    [Params(8, 8000)]
    public int Count { get; set; }

    private List<double> _doubles;
    private List<int> _ints;
    private string[] _strings;

    [GlobalSetup]
    public void Setup()
    {
        _doubles = new(Enumerable.Range(-Count, Count * 2).Select(x => (double)x));
        new Random(42).Shuffle(CollectionsMarshal.AsSpan(_doubles));
        _ints = new(Enumerable.Range(-Count, Count * 2));
        // Shuffle the ints here; the original shuffled _doubles a second time and left _ints sorted.
        new Random(42).Shuffle(CollectionsMarshal.AsSpan(_ints));
        _strings = Enumerable.Range(-Count, Count * 2).Select(x => x.ToString()).ToArray();
        new Random(42).Shuffle(_strings);
    }

    [Benchmark] public double OrderByFirst_Double() => _doubles.OrderBy(x => x).First();
    [Benchmark] public double OrderLast_Double() => _doubles.Order().Last();
    [Benchmark] public int OrderByFirst_Int32() => _ints.OrderBy(x => x).First();
    [Benchmark] public int OrderLast_Int32() => _ints.Order().Last();
    [Benchmark] public string OrderByFirst_String() => _strings.OrderBy(x => x).First();
    [Benchmark] public string OrderLast_String() => _strings.Order().Last();
}
Fixes #87921
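To make the description above concrete, here is a minimal sketch of a single-pass scan that calls the key comparer directly, as one might write it for the OrderBy(...).First() case. The class and method names (OrderedFirstSketch, TryGetFirstOfOrderBy) are hypothetical and this is not the code in the PR; it only assumes that a stable sort followed by First() yields the earliest element with the smallest key.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class OrderedFirstSketch
{
    // Hypothetical helper, not the System.Linq implementation.
    // Returns false for an empty source instead of throwing, so the same scan
    // could back both First() and FirstOrDefault().
    public static bool TryGetFirstOfOrderBy<TElement, TKey>(
        IEnumerable<TElement> source,
        Func<TElement, TKey> keySelector,
        IComparer<TKey>? comparer,
        out TElement? result)
    {
        comparer ??= Comparer<TKey>.Default;

        using IEnumerator<TElement> e = source.GetEnumerator();
        if (!e.MoveNext())
        {
            result = default;
            return false;
        }

        TElement best = e.Current;
        TKey bestKey = keySelector(best);

        // One O(n) pass; the only work in the inner loop is a direct
        // comparison of keys through the TKey comparer.
        while (e.MoveNext())
        {
            TElement next = e.Current;
            TKey nextKey = keySelector(next);

            // Strict '<' keeps the earliest element among equal keys,
            // which is what a stable sort followed by First() returns.
            if (comparer.Compare(nextKey, bestKey) < 0)
            {
                best = next;
                bestKey = nextKey;
            }
        }

        result = best;
        return true;
    }
}
```

The bool-returning shape is a design choice for the sketch: the caller decides whether an empty source should throw or produce a default value.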
internal sealed partial class OrderedEnumerable<TElement, TKey> : OrderedEnumerable<TElement>
{
// For complicated cases, rely on the base implementation that's more comprehensive.
// For the simple case of OrderBy(...).First() or OrderByDescending(...).First() (i.e. where
Wouldn't it be even faster if MinBy or MaxBy were used in this case?
They're not the same thing. It would work for OrderBy(...).First but not OrderBy(...).Last, because of stability. Consider this:
object[] values = [9, 1, 2, 3, 4, 5, 6, 7, 8, 9];
object result1 = values.OrderBy(x => x).Last();
object result2 = values.MaxBy(x => x)!;
Console.WriteLine(result1);
Console.WriteLine(result2);
Console.WriteLine(ReferenceEquals(result1, result2));
That prints:
9
9
False
because the OrderBy(...).Last will find the last boxed 9 and the MaxBy(...) will find the first boxed 9.
It would also require refactoring MinBy/MaxBy to separate out the workhorse, as they might throw for empty where these implementations might not if they're being used as part of FirstOrDefault/LastOrDefault.
Fair enough. As you're pointing out, though, the above changes if I use values.OrderByDescending(x => x).First() instead. Arguably then it's just an artifact of the particular workaround used to encode MaxBy.
> Arguably then it's just an artifact of the particular workaround used to encode MaxBy.
I don't understand. The issue is very specifically that MaxBy uses >:
if (nextKey != null && comparer.Compare(nextKey, key) > 0)
and the OrderBy.Last semantics need it to be >=.
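The difference between > and >= can be seen in a small sketch of a maximum scan with a switchable tie-break. TieBreakSketch and MaxScan are hypothetical names, and the sketch leaves out the null-key handling that the quoted MaxBy line has; it is meant only to illustrate the tie-breaking rule, not to reproduce either implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class TieBreakSketch
{
    // Hypothetical helper: returns an element whose key is maximal.
    // takeLastOnTies == false replaces only on a strictly greater key ('>'),
    // so the first maximum wins, as in the MaxBy line quoted above.
    // takeLastOnTies == true also replaces on an equal key ('>='),
    // so the last maximum wins, matching OrderBy(...).Last().
    public static TElement MaxScan<TElement, TKey>(
        IReadOnlyList<TElement> source,
        Func<TElement, TKey> keySelector,
        bool takeLastOnTies)
    {
        if (source.Count == 0)
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }

        Comparer<TKey> comparer = Comparer<TKey>.Default;
        TElement best = source[0];
        TKey bestKey = keySelector(best);

        for (int i = 1; i < source.Count; i++)
        {
            TKey nextKey = keySelector(source[i]);
            int c = comparer.Compare(nextKey, bestKey);
            if (c > 0 || (takeLastOnTies && c == 0))
            {
                best = source[i];
                bestKey = nextKey;
            }
        }

        return best;
    }
}
```

With boxed values such as object[] values = { 9, 1, 2, 9 } and a key selector of x => (int)x, the strict variant returns the first boxed 9 and the non-strict variant returns the last one, which is the reference-identity difference discussed earlier in the thread.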
Is using Last common enough in such cases? grep.app seems to suggest that OrderByDescending.First is more common than OrderBy.Last (although the total number of hits doesn't appear to be large enough for this to be conclusive).
My earlier point was mostly an assertion that, in most cases, which chain gets picked is accidental rather than an intentional use of the stability guarantees of ordered enumerables, and that in most cases these patterns survive from when MaxBy and MinBy weren't available, so perhaps it might be best to guide folks to just use those (e.g. via an analyzer). You're right that these are no substitute for chaining with FirstOrDefault, though.
How do the benchmarks compare to using MinBy or MaxBy?
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.InteropServices;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[MemoryDiagnoser(false)]
public class Tests
{
    [Params(1000)]
    public int Count { get; set; }

    private List<double> _doubles;
    private List<int> _ints;
    private string[] _strings;

    [GlobalSetup]
    public void Setup()
    {
        _doubles = new(Enumerable.Range(-Count, Count * 2).Select(x => (double)x));
        new Random(42).Shuffle(CollectionsMarshal.AsSpan(_doubles));
        _ints = new(Enumerable.Range(-Count, Count * 2));
        // Shuffle the ints here; the original shuffled _doubles a second time and left _ints sorted.
        new Random(42).Shuffle(CollectionsMarshal.AsSpan(_ints));
        _strings = Enumerable.Range(-Count, Count * 2).Select(x => x.ToString()).ToArray();
        new Random(42).Shuffle(_strings);
    }

    [Benchmark] public double OrderByLast_Double() => _doubles.OrderBy(x => x).Last();
    [Benchmark] public double OrderLast_Double() => _doubles.Order().Last();
    [Benchmark] public int OrderByLast_Int32() => _ints.OrderBy(x => x).Last();
    [Benchmark] public int OrderLast_Int32() => _ints.Order().Last();
    [Benchmark] public string OrderByLast_String() => _strings.OrderBy(x => x).Last();
    [Benchmark] public string OrderLast_String() => _strings.Order().Last();
    [Benchmark] public double MaxBy_Double() => _doubles.MaxBy(x => x);
    [Benchmark] public string MaxBy_String() => _strings.MaxBy(x => x);
    [Benchmark] public int MaxBy_Int32() => _ints.MaxBy(x => x);
}