Improve vectorization of Enumerable.Min/Max #76144

stephentoub · 2022-09-25T03:11:36Z

- Expand it to all vectorizable comparable types with Enumerable.Min<T>/Max<T> (byte, sbyte, short, ushort, uint, ulong, nuint, nint) beyond the int/long already supported
- Combine Min/Max into a single method using a static abstract interface with generic specialization to differentiate the operations
- Use Vector128<T> and Vector256<T> instead of Vector<T>
- Process any remaining elements as part of a single final vector rather than falling back to a scalar loop
- Improve test coverage
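
The second and fourth points can be illustrated with a minimal sketch. This is not the code from the PR: `IMinMaxOp<T>`, `MinOp<T>`, `MaxOp<T>`, and `MinMaxSketch` are hypothetical names, only `Vector128<T>` is used (no `Vector256<T>` path), and it assumes .NET 7 or later. The operation is selected by a struct type argument, and the tail is handled by one final vector that overlaps elements already processed, which is safe because min and max are idempotent.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

// Hypothetical contract: each operation supplies a scalar and a vector form.
public interface IMinMaxOp<T> where T : struct, IBinaryInteger<T>
{
    static abstract T Combine(T left, T right);
    static abstract Vector128<T> Combine(Vector128<T> left, Vector128<T> right);
}

public readonly struct MinOp<T> : IMinMaxOp<T> where T : struct, IBinaryInteger<T>
{
    public static T Combine(T left, T right) => T.Min(left, right);
    public static Vector128<T> Combine(Vector128<T> left, Vector128<T> right) => Vector128.Min(left, right);
}

public readonly struct MaxOp<T> : IMinMaxOp<T> where T : struct, IBinaryInteger<T>
{
    public static T Combine(T left, T right) => T.Max(left, right);
    public static Vector128<T> Combine(Vector128<T> left, Vector128<T> right) => Vector128.Max(left, right);
}

public static class MinMaxSketch
{
    public static T Min<T>(ReadOnlySpan<T> values) where T : struct, IBinaryInteger<T>
        => Core<T, MinOp<T>>(values);

    public static T Max<T>(ReadOnlySpan<T> values) where T : struct, IBinaryInteger<T>
        => Core<T, MaxOp<T>>(values);

    // One shared implementation; TOp is a struct, so each instantiation is specialized.
    private static T Core<T, TOp>(ReadOnlySpan<T> values)
        where T : struct, IBinaryInteger<T>
        where TOp : IMinMaxOp<T>
    {
        if (values.IsEmpty)
            throw new InvalidOperationException("Sequence contains no elements");

        // Scalar path for unsupported element types and inputs shorter than one vector.
        if (!Vector128<T>.IsSupported || values.Length < Vector128<T>.Count)
        {
            T scalar = values[0];
            for (int i = 1; i < values.Length; i++)
                scalar = TOp.Combine(scalar, values[i]);
            return scalar;
        }

        int width = Vector128<T>.Count;
        Vector128<T> best = Vector128.Create(values); // reads the first `width` elements

        int index = width;
        for (; index <= values.Length - width; index += width)
            best = TOp.Combine(best, Vector128.Create(values.Slice(index)));

        // Remaining elements: one final vector ending at the last element.
        // It overlaps already-processed elements, which does not change a min or max.
        if (index < values.Length)
            best = TOp.Combine(best, Vector128.Create(values.Slice(values.Length - width)));

        // Horizontal reduction of the lanes.
        T result = best.GetElement(0);
        for (int lane = 1; lane < width; lane++)
            result = TOp.Combine(result, best.GetElement(lane));
        return result;
    }
}
```

For example, `MinMaxSketch.Max<int>(new[] { 3, -7, 12, 5, 0, 9, -2 })` processes elements 0..3 as the first vector and elements 3..6 as the overlapping final vector. The sketch favors clarity (span slicing) over the unsafe loads a production implementation would more likely use.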

// Int32Tests (int was previously vectorized)

Method	Toolchain	Count	Mean	Ratio
Max	\main\corerun.exe	1	2.111 ns	1.00
Max	\pr\corerun.exe	1	1.978 ns	0.94

Max	\main\corerun.exe	4	4.719 ns	1.00
Max	\pr\corerun.exe	4	3.186 ns	0.67

Max	\main\corerun.exe	7	7.473 ns	1.00
Max	\pr\corerun.exe	7	3.372 ns	0.45

Max	\main\corerun.exe	23	11.092 ns	1.00
Max	\pr\corerun.exe	23	4.940 ns	0.45

Max	\main\corerun.exe	1024	99.851 ns	1.00
Max	\pr\corerun.exe	1024	43.150 ns	0.43

// ByteTests (bytes weren't previously vectorized)

Method	Toolchain	Count	Mean	Ratio
Max	\main\corerun.exe	1	16.055 ns	1.00
Max	\pr\corerun.exe	1	3.556 ns	0.22

Max	\main\corerun.exe	16	73.951 ns	1.00
Max	\pr\corerun.exe	16	15.251 ns	0.21

Max	\main\corerun.exe	28	126.087 ns	1.00
Max	\pr\corerun.exe	28	15.454 ns	0.12

Max	\main\corerun.exe	92	387.411 ns	1.00
Max	\pr\corerun.exe	92	29.735 ns	0.08

Max	\main\corerun.exe	4096	16,702.786 ns	1.000
Max	\pr\corerun.exe	4096	69.616 ns	0.004

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;

public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}

public class Int32Tests
{
    [Params(1, 4, 7, 23, 1024)]
    public int Count { get; set; }

    private int[] _values;

    [GlobalSetup]
    public void Setup()
    {
        var r = new Random(42);
        _values = Enumerable.Range(0, Count).Select(_ => r.Next()).ToArray();
    }

    [Benchmark]
    public int Max() => _values.Max();
}

public class ByteTests
{
    [Params(1, 16, 28, 92, 4096)]
    public int Count { get; set; }

    private byte[] _values;

    [GlobalSetup]
    public void Setup()
    {
        var r = new Random(42);
        _values = Enumerable.Range(0, Count).Select(_ => (byte)r.Next()).ToArray();
    }

    [Benchmark]
    public byte Max() => _values.Max();
}

ghost · 2022-09-25T03:11:51Z

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	stephentoub
Assignees:	-
Labels:	`area-System.Linq`, `tenet-performance`
Milestone:	8.0.0

EgorBo · 2022-09-25T13:09:31Z

I suspect that LINQ Max/Min are not IEEE754 compatible for floating points due to usage of Vector_.Max/Min which are lowered to single vmaxp/vminp instructions.

So e.g. -0.0f can be considered as Max over +0.0f (on x86/64, arm is fine)

cc @tannergooding

Ah, looks like it's always been like that, reproduces on .net 5.0 and .net472
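
A small probe along these lines can be used to check the signed-zero observation on a given machine. It is only a sketch: `SignedZeroProbe` is a hypothetical name, the SSE path only runs where `Sse.IsSupported` is true, and no particular hardware result is asserted here.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for comparing scalar and SSE max on signed zeros.
public static class SignedZeroProbe
{
    // Scalar reference: MathF.Max follows IEEE 754 and orders +0 above -0.
    public static float ScalarMax(float a, float b) => MathF.Max(a, b);

    // Lane 0 of the SSE max instruction, or null when SSE is unavailable (for example on Arm).
    public static float? SseMax(float a, float b)
        => Sse.IsSupported
            ? Sse.Max(Vector128.Create(a), Vector128.Create(b)).ToScalar()
            : null;

    // Formats a zero with its sign so that -0 and +0 can be told apart.
    public static string Describe(float value)
        => (float.IsNegative(value) ? "-" : "+") + MathF.Abs(value).ToString("0.0");
}
```

Calling `SignedZeroProbe.SseMax(+0.0f, -0.0f)` and `SignedZeroProbe.SseMax(-0.0f, +0.0f)` and passing the results through `Describe` shows whether the outcome depends on operand order, which is the inconsistency described above.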

stephentoub · 2022-09-25T13:20:18Z

> I suspect that LINQ Max/Min are not IEEE754 compatible for floating points due to usage of Vector_.Max/Min

This PR isn't using vector for floating point.

EgorBo · 2022-09-25T13:32:27Z

> > I suspect that LINQ Max/Min are not IEEE754 compatible for floating points due to usage of Vector_.Max/Min
>
> This PR isn't using vector for floating point.

May I ask why then? Since it won't break the current behavior

stephentoub · 2022-09-25T14:59:01Z

> May I ask why then? Since it won't break the current behavior

This pr is focused on the MaxInteger method, which doesn't have to be concerned with nans.

stephentoub · 2022-09-25T20:58:20Z

> > May I ask why then? Since it won't break the current behavior
>
> This pr is focused on the MaxInteger method, which doesn't have to be concerned with nans.

@EgorBo, FYI, I just tried vectorizing Max's use (still handling the search for the first non-NaN sequentially and then vectorizing the rest), and it breaks existing tests around NaN.

EgorBo · 2022-09-25T21:34:25Z

> > > May I ask why then? Since it won't break the current behavior
> >
> > This pr is focused on the MaxInteger method, which doesn't have to be concerned with nans.
>
> @EgorBo, FYI, I just tried vectorizing Max's use (still handling the search for the first non-NaN sequentially and then vectorizing the rest), and it breaks existing tests around NaN.

Ah I see, thanks for trying it! I assume we need to raise an issue for this behavior since it's not consistent. I'll wait for @tannergooding's feedback on whether it's expected or whether there is already an issue (I failed to find one).

I assume the IEEE754 Max simd function might look like this:

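// The two Max calls swap their operands so that the AND clears the sign bit when the inputs are +0 and -0.
static Vector128<float> Max(Vector128<float> v1, Vector128<float> v2) =>
    (Vector128.Max(v1, v2) & Vector128.Max(v2, v1)) | Sse.CompareUnordered(v1, v2);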

and just Vector128.Max on Arm. But it will break the current contract.

tannergooding · 2022-09-26T15:17:00Z

I assume we need to raise an issue for this behavior since it's not consistent, will wait @tannergooding's feedback whether if it's expected or there is already an issue (I failed to find one)

This is a "downside" of the Vector<T> APIs being exposed back in 2014 without much consideration for IEEE 754 at the time. Instead, most of the code was simply written with only performance in mind.

Realistically, we should have the Max/Min APIs on the Vector<T> types match the same APIs on float/double/Half. That will technically be a breaking change (although not one most people would notice) and would also negatively impact perf. -- The same should really also be true for the LINQ APIs, but they also follow a "different" behavior.

In an "ideal" world, I'd take the break here and consider exposing a FastMax/FastMin for the case where you want effectively (a > b) ? a : b and (a < b) ? a : b, as this is what maxps/minps effectively do (they return b if either input is NaN or if both inputs are zero, regardless of sign).
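
As a scalar model of that comparison-and-select behavior (a sketch only: `FastMax`/`FastMin` are the names floated above, not existing APIs, and `FastMinMaxModel` is a hypothetical container), the operand-order dependence shows up directly:

```csharp
using System;

// Hypothetical scalar model of the "fast" semantics: a plain compare-and-select.
public static class FastMinMaxModel
{
    // Returns b whenever the comparison is false: either input is NaN, or the inputs compare equal (including +0 vs -0).
    public static float FastMax(float a, float b) => (a > b) ? a : b;
    public static float FastMin(float a, float b) => (a < b) ? a : b;
}
```

With this model, `FastMax(float.NaN, 1f)` yields `1f` while `FastMax(1f, float.NaN)` yields NaN, and `FastMax(0f, -0f)` yields `-0f` while `FastMax(-0f, 0f)` yields `+0f`. `MathF.Max`, by contrast, returns NaN if either input is NaN and treats `+0f` as greater than `-0f` regardless of argument order.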

dakersnar · 2022-10-04T16:48:22Z

Regressions for Windows x64: dotnet/perf-autofiling-issues#8818 (comment)

stephentoub added area-System.Linq tenet-performance Performance related issue labels Sep 25, 2022

stephentoub added this to the 8.0.0 milestone Sep 25, 2022

stephentoub requested review from eiriktsarpalis and tannergooding September 25, 2022 03:11

ghost assigned stephentoub Sep 25, 2022

stephentoub force-pushed the enumerableminmax branch from 35ec905 to 7106575 Compare September 25, 2022 03:14

gfoidl reviewed Sep 25, 2022

krwq reviewed Sep 27, 2022

krwq reviewed Sep 27, 2022

krwq reviewed Sep 27, 2022

krwq approved these changes Sep 27, 2022

stephentoub merged commit 31e0bd1 into dotnet:main Sep 27, 2022

stephentoub deleted the enumerableminmax branch September 27, 2022 16:02

dakersnar mentioned this pull request Oct 4, 2022

[Perf] Windows/x64: 3 Regressions on 9/27/2022 4:04:11 PM dotnet/perf-autofiling-issues#8818

Closed

ghost locked as resolved and limited conversation to collaborators Nov 3, 2022
