stephentoub
Member

  • Expand vectorization of Enumerable.Min<T>/Max<T> to all vectorizable comparable types (byte, sbyte, short, ushort, uint, ulong, nuint, nint), beyond the int/long already supported
  • Combine Min/Max into a single method using a static abstract interface with generic specialization to differentiate the operations
  • Use Vector128<T> and Vector256<T> instead of Vector<T>
  • Process any remaining elements as part of a single final vector rather than falling back to a scalar loop
  • Improve test coverage
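
A minimal sketch of the second and fourth bullets, not the code merged in this PR. The names `IMinMaxOp<T>`, `MinOp<T>`, `MaxOp<T>`, and `MinMaxSketch` are hypothetical, only `Vector128<T>` is handled for brevity (the PR also uses `Vector256<T>`), and .NET 8 or later is assumed. The operation is passed as a struct type argument implementing a static abstract interface, so the JIT can specialize `Reduce` per operation, and the tail is handled by loading one last vector that overlaps elements already processed instead of running a scalar loop.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical interface: each operation supplies a scalar and a vector form.
interface IMinMaxOp<T> where T : struct, IBinaryInteger<T>
{
    static abstract T Scalar(T left, T right);
    static abstract Vector128<T> Packed(Vector128<T> left, Vector128<T> right);
}

struct MinOp<T> : IMinMaxOp<T> where T : struct, IBinaryInteger<T>
{
    public static T Scalar(T left, T right) => left < right ? left : right;
    public static Vector128<T> Packed(Vector128<T> left, Vector128<T> right) => Vector128.Min(left, right);
}

struct MaxOp<T> : IMinMaxOp<T> where T : struct, IBinaryInteger<T>
{
    public static T Scalar(T left, T right) => left > right ? left : right;
    public static Vector128<T> Packed(Vector128<T> left, Vector128<T> right) => Vector128.Max(left, right);
}

static class MinMaxSketch
{
    public static T Min<T>(ReadOnlySpan<T> values) where T : struct, IBinaryInteger<T> =>
        Reduce<T, MinOp<T>>(values);

    public static T Max<T>(ReadOnlySpan<T> values) where T : struct, IBinaryInteger<T> =>
        Reduce<T, MaxOp<T>>(values);

    // One shared implementation; TOp is a struct, so each instantiation is specialized.
    private static T Reduce<T, TOp>(ReadOnlySpan<T> values)
        where T : struct, IBinaryInteger<T>
        where TOp : struct, IMinMaxOp<T>
    {
        if (values.IsEmpty)
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }

        // Scalar path: no acceleration, unsupported element type, or fewer elements than one vector.
        if (!Vector128.IsHardwareAccelerated || !Vector128<T>.IsSupported || values.Length < Vector128<T>.Count)
        {
            T result = values[0];
            for (int i = 1; i < values.Length; i++)
            {
                result = TOp.Scalar(result, values[i]);
            }
            return result;
        }

        ref T start = ref MemoryMarshal.GetReference(values);
        int lastVectorStart = values.Length - Vector128<T>.Count;

        // Full vectors that end before the final vector begins.
        Vector128<T> best = Vector128.LoadUnsafe(ref start);
        for (int i = Vector128<T>.Count; i < lastVectorStart; i += Vector128<T>.Count)
        {
            best = TOp.Packed(best, Vector128.LoadUnsafe(ref start, (nuint)i));
        }

        // Final vector: covers the last Count elements and may overlap earlier ones,
        // which is harmless because min/max are idempotent.
        best = TOp.Packed(best, Vector128.LoadUnsafe(ref start, (nuint)lastVectorStart));

        // Horizontal reduction across the lanes of the accumulated vector.
        T value = best.GetElement(0);
        for (int i = 1; i < Vector128<T>.Count; i++)
        {
            value = TOp.Scalar(value, best.GetElement(i));
        }
        return value;
    }
}
```

With these definitions, `MinMaxSketch.Max<int>(new[] { 3, 9, -4, 7, 1, 8, 2 })` returns 9 and the `Min<int>` call on the same data returns -4. With seven ints and four lanes per vector, the last load starts at index 3 and re-reads one element that the first load already covered.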

// Int32Tests (int was previously vectorized)

| Method | Toolchain | Count | Mean | Ratio |
|---|---|---:|---:|---:|
| Max | \main\corerun.exe | 1 | 2.111 ns | 1.00 |
| Max | \pr\corerun.exe | 1 | 1.978 ns | 0.94 |
| Max | \main\corerun.exe | 4 | 4.719 ns | 1.00 |
| Max | \pr\corerun.exe | 4 | 3.186 ns | 0.67 |
| Max | \main\corerun.exe | 7 | 7.473 ns | 1.00 |
| Max | \pr\corerun.exe | 7 | 3.372 ns | 0.45 |
| Max | \main\corerun.exe | 23 | 11.092 ns | 1.00 |
| Max | \pr\corerun.exe | 23 | 4.940 ns | 0.45 |
| Max | \main\corerun.exe | 1024 | 99.851 ns | 1.00 |
| Max | \pr\corerun.exe | 1024 | 43.150 ns | 0.43 |

// ByteTests (bytes weren't previously vectorized)

| Method | Toolchain | Count | Mean | Ratio |
|---|---|---:|---:|---:|
| Max | \main\corerun.exe | 1 | 16.055 ns | 1.00 |
| Max | \pr\corerun.exe | 1 | 3.556 ns | 0.22 |
| Max | \main\corerun.exe | 16 | 73.951 ns | 1.00 |
| Max | \pr\corerun.exe | 16 | 15.251 ns | 0.21 |
| Max | \main\corerun.exe | 28 | 126.087 ns | 1.00 |
| Max | \pr\corerun.exe | 28 | 15.454 ns | 0.12 |
| Max | \main\corerun.exe | 92 | 387.411 ns | 1.00 |
| Max | \pr\corerun.exe | 92 | 29.735 ns | 0.08 |
| Max | \main\corerun.exe | 4096 | 16,702.786 ns | 1.000 |
| Max | \pr\corerun.exe | 4096 | 69.616 ns | 0.004 |

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;

public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}

public class Int32Tests
{
    [Params(1, 4, 7, 23, 1024)]
    public int Count { get; set; }

    private int[] _values;

    [GlobalSetup]
    public void Setup()
    {
        var r = new Random(42);
        _values = Enumerable.Range(0, Count).Select(_ => r.Next()).ToArray();
    }

    [Benchmark]
    public int Max() => _values.Max();
}

public class ByteTests
{
    [Params(1, 16, 28, 92, 4096)]
    public int Count { get; set; }

    private byte[] _values;

    [GlobalSetup]
    public void Setup()
    {
        var r = new Random(42);
        _values = Enumerable.Range(0, Count).Select(_ => (byte)r.Next()).ToArray();
    }

    [Benchmark]
    public byte Max() => _values.Max();
}
```

stephentoub added the area-System.Linq and tenet-performance labels on Sep 25, 2022
stephentoub added this to the 8.0.0 milestone on Sep 25, 2022
ghost assigned stephentoub on Sep 25, 2022

ghost commented Sep 25, 2022

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member

EgorBo commented Sep 25, 2022

I suspect that LINQ Max/Min are not IEEE754 compatible for floating points due to usage of Vector_.Max/Min which are lowered to single vmaxp/vminp instructions.

So e.g. -0.0f can be considered as Max over +0.0f (on x86/64, arm is fine)

cc @tannergooding

Ah, looks like it's always been like that, reproduces on .net 5.0 and .net472

@stephentoub
Member Author

stephentoub commented Sep 25, 2022

> I suspect that LINQ Max/Min are not IEEE754 compatible for floating points due to usage of Vector_.Max/Min

This PR isn't using vector for floating point.

@EgorBo
Member

EgorBo commented Sep 25, 2022

> > I suspect that LINQ Max/Min are not IEEE754 compatible for floating points due to usage of Vector_.Max/Min
>
> This PR isn't using vector for floating point.

May I ask why then? Since it won't break the current behavior

@stephentoub
Member Author

> May I ask why then? Since it won't break the current behavior

This PR is focused on the MaxInteger method, which doesn't have to be concerned with NaNs.

@stephentoub
Member Author

> > May I ask why then? Since it won't break the current behavior
>
> This PR is focused on the MaxInteger method, which doesn't have to be concerned with NaNs.

@EgorBo, FYI, I just tried vectorizing Max's use (still handling the search for the first non-NaN sequentially and then vectorizing the rest), and it breaks existing tests around NaN.

@EgorBo
Member

EgorBo commented Sep 25, 2022

> > > May I ask why then? Since it won't break the current behavior
> >
> > This PR is focused on the MaxInteger method, which doesn't have to be concerned with NaNs.
>
> @EgorBo, FYI, I just tried vectorizing Max's use (still handling the search for the first non-NaN sequentially and then vectorizing the rest), and it breaks existing tests around NaN.

Ah I see, thanks for trying it! I assume we need to raise an issue for this behavior since it's not consistent; I'll wait for @tannergooding's feedback on whether it's expected or there is already an issue (I failed to find one).

I assume the IEEE754 Max simd function might look like this:

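```csharp
// ANDing both operand orders yields +0 for mixed-sign zeros; ORing the unordered mask yields NaN if either input is NaN.
static Vector128<float> Max(Vector128<float> v1, Vector128<float> v2) =>
    (Vector128.Max(v1, v2) & Vector128.Max(v2, v1)) | Sse.CompareUnordered(v1, v2);
```

and just Vector128.Max on Arm, but it will break the current contract.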

@tannergooding
Member

> I assume we need to raise an issue for this behavior since it's not consistent; I'll wait for @tannergooding's feedback on whether it's expected or there is already an issue (I failed to find one).

This is a "downside" of the Vector<T> APIs being exposed back in 2014 without much consideration for IEEE 754 at the time. Instead, most of the code was simply written for performance.

Realistically, we should have the Max/Min APIs on the Vector<T> types match the same APIs on float/double/Half. That will technically be a breaking change (although not one most people would notice) and would also negatively impact perf. -- The same should really also be true for the LINQ APIs, but they also follow a "different" behavior.

In an "ideal" world, I'd take the break here and consider exposing a FastMax/FastMin for the case where you want effectively (a > b) ? a : b and (a < b) ? a : b, as this is what maxps/minps effectively do (they return b if either input is NaN or if both inputs are zero, regardless of sign).
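
To make the difference concrete, here is a small scalar sketch. `FastMaxSketch.FastMax` is a hypothetical helper written for illustration, not an existing or proposed API, and it only mirrors the `(a > b) ? a : b` form from the comment above. It is compared against `MathF.Max`, which on .NET Core 3.0 and later propagates NaN and treats +0 as greater than -0.

```csharp
using System;

static class FastMaxSketch
{
    // Hypothetical helper: returns b whenever the comparison is false,
    // which includes the NaN and equal-zero cases.
    public static float FastMax(float a, float b) => a > b ? a : b;

    public static void Demo()
    {
        // NaN handling: MathF.Max propagates NaN; FastMax returns its second argument.
        Console.WriteLine(float.IsNaN(MathF.Max(float.NaN, 1f)));   // True
        Console.WriteLine(FastMax(float.NaN, 1f));                  // 1
        Console.WriteLine(float.IsNaN(FastMax(1f, float.NaN)));     // True

        // Signed zeros: MathF.Max prefers +0; FastMax returns its second argument.
        Console.WriteLine(float.IsNegative(MathF.Max(0f, -0f)));    // False
        Console.WriteLine(float.IsNegative(FastMax(0f, -0f)));      // True
    }
}
```

The result of `FastMax` depends on argument order for these inputs, which is why a vectorized reduction built on it can report -0.0f as the maximum of a sequence that also contains +0.0f.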

@stephentoub stephentoub merged commit 31e0bd1 into dotnet:main Sep 27, 2022
@stephentoub stephentoub deleted the enumerableminmax branch September 27, 2022 16:02
@dakersnar
Contributor

Regressions for Windows x64: dotnet/perf-autofiling-issues#8818 (comment)

@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2022