seiko2plus (Member) commented Sep 6, 2025

numpy-simd-routines is added as a subrepo in the Meson subprojects
directory. The current FP configuration is static: ~1 ULP for double precision and
~4 ULP for single precision, with handling of floating-point errors and
special cases, and extended precision for large arguments.
Subnormals are enabled by default too.

numpy-simd-routines supports all SIMD extensions that are supported
by Google Highway, including non-FMA extensions, and is fully independent
of libm to guarantee unified results across all compilers and
platforms.

Full benchmarks will be provided within the pull request. The following
benchmark was run with clang-19 on an x86 CPU (Ryzen 7 7700X)
with AVX512 enabled.

Note: previously, SIMD optimization for sin/cos was enabled only
for single precision, not for double precision.
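
For readers unfamiliar with the benchmark names, here is a rough sketch of what a strided unary benchmark measures. It assumes the parameters in `UnaryFP(<ufunc>, 1, 2, 'f')` mean (ufunc, input stride, output stride, dtype), which is not stated in this PR; the helper name `strided_unary_time` and the array size are hypothetical and are not the actual benchmark code.

```python
import timeit

import numpy as np


def strided_unary_time(ufunc, stride_in, stride_out, dtype, n=10_000, repeat=5):
    """Time ufunc(x, out=out) on strided views; returns (seconds, x, out)."""
    rng = np.random.default_rng(0)
    # Allocate enough backing storage so that each strided view has n elements.
    src = rng.uniform(-100.0, 100.0, size=n * stride_in).astype(dtype)
    dst = np.empty(n * stride_out, dtype=dtype)
    x = src[::stride_in]      # input view with the requested element stride
    out = dst[::stride_out]   # output view with the requested element stride
    calls = 10
    # Take the best of several repeats to reduce timing noise.
    best = min(timeit.repeat(lambda: ufunc(x, out=out), number=calls, repeat=repeat))
    return best / calls, x, out


seconds, x, out = strided_unary_time(np.sin, 1, 2, "d")
print(f"sin, stride_in=1, stride_out=2, float64: {seconds * 1e6:.1f} μs per call")
```

Absolute timings from a sketch like this will not match the table, since array sizes, input data, and the measurement harness differ.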

| Before        | After       |  Ratio | Benchmark (Parameter)                    |
|---------------|-------------|--------|------------------------------------------|
| 713±6μs       | 633±6μs     |   0.89 | UnaryFP(<ufunc 'cos'>, 1, 2, 'f')        |
| 717±9μs       | 637±6μs     |   0.89 | UnaryFP(<ufunc 'cos'>, 4, 1, 'f')        |
| 705±3μs       | 607±10μs    |   0.86 | UnaryFP(<ufunc 'sin'>, 4, 1, 'f')        |
| 714±10μs      | 595±0.5μs   |   0.83 | UnaryFP(<ufunc 'sin'>, 1, 2, 'f')        |
| 370±0.3μs     | 277±4μs     |   0.75 | UnaryFP(<ufunc 'cos'>, 1, 1, 'f')        |
| 373±2μs       | 236±0.6μs   |   0.63 | UnaryFP(<ufunc 'sin'>, 1, 1, 'f')        |
| 1.06±0.01ms   | 648±3μs     |   0.61 | UnaryFP(<ufunc 'cos'>, 4, 2, 'f')        |
| 1.06±0.01ms   | 617±30μs    |   0.58 | UnaryFP(<ufunc 'sin'>, 4, 2, 'f')        |
| 5.06±0.06ms   | 2.61±0.3ms  |   0.52 | UnaryFPSpecial(<ufunc 'cos'>, 4, 2, 'd') |
| 1.48±0ms      | 715±5μs     |   0.48 | UnaryFPSpecial(<ufunc 'sin'>, 1, 2, 'f') |
| 1.50±0.01ms   | 639±6μs     |   0.43 | UnaryFPSpecial(<ufunc 'cos'>, 1, 2, 'f') |
| 5.15±0.1ms    | 1.96±0.01ms |   0.38 | UnaryFPSpecial(<ufunc 'cos'>, 4, 1, 'd') |
| 5.72±0.02ms   | 2.09±0.1ms  |   0.37 | UnaryFP(<ufunc 'cos'>, 4, 2, 'd')        |
| 5.76±0.01ms   | 2.03±0.08ms |   0.35 | UnaryFP(<ufunc 'sin'>, 4, 2, 'd')        |
| 5.07±0.08ms   | 1.76±0.2ms  |   0.35 | UnaryFPSpecial(<ufunc 'cos'>, 1, 2, 'd') |
| 6.04±0.04ms   | 2.05±0.09ms |   0.34 | UnaryFPSpecial(<ufunc 'sin'>, 4, 2, 'd') |
| 5.79±0.03ms   | 1.90±0.2ms  |   0.33 | UnaryFP(<ufunc 'sin'>, 4, 1, 'd')        |
| 2.29±0.1ms    | 762±40μs    |   0.33 | UnaryFPSpecial(<ufunc 'sin'>, 4, 1, 'f') |
| 5.72±0.1ms    | 1.75±0.07ms |   0.31 | UnaryFP(<ufunc 'cos'>, 4, 1, 'd')        |
| 6.04±0.03ms   | 1.82±0.2ms  |   0.3  | UnaryFPSpecial(<ufunc 'sin'>, 4, 1, 'd') |
| 2.49±0.1ms    | 748±30μs    |   0.3  | UnaryFPSpecial(<ufunc 'sin'>, 4, 2, 'f') |
| 2.23±0.1ms    | 634±6μs     |   0.28 | UnaryFPSpecial(<ufunc 'cos'>, 4, 1, 'f') |
| 1.31±0.03ms   | 367±5μs     |   0.28 | UnaryFPSpecial(<ufunc 'sin'>, 1, 1, 'f') |
| 2.55±0.09ms   | 654±30μs    |   0.26 | UnaryFPSpecial(<ufunc 'cos'>, 4, 2, 'f') |
| 4.97±0.03ms   | 1.14±0ms    |   0.23 | UnaryFPSpecial(<ufunc 'cos'>, 1, 1, 'd') |
| 5.67±0.01ms   | 1.22±0.03ms |   0.22 | UnaryFP(<ufunc 'cos'>, 1, 2, 'd')        |
| 5.76±0.03ms   | 1.28±0.06ms |   0.22 | UnaryFP(<ufunc 'sin'>, 1, 2, 'd')        |
| 1.26±0.01ms   | 272±2μs     |   0.22 | UnaryFPSpecial(<ufunc 'cos'>, 1, 1, 'f') |
| 7.03±0.02ms   | 1.31±0.01ms |   0.19 | UnaryFPSpecial(<ufunc 'sin'>, 1, 2, 'd') |
| 5.67±0.01ms   | 810±9μs     |   0.14 | UnaryFP(<ufunc 'cos'>, 1, 1, 'd')        |
| 5.71±0.01ms   | 817±40μs    |   0.14 | UnaryFP(<ufunc 'sin'>, 1, 1, 'd')        |
| 7.05±0.03ms   | 915±4μs     |   0.13 | UnaryFPSpecial(<ufunc 'sin'>, 1, 1, 'd') |
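
The Ratio column is consistent with After divided by Before, so values below 1 are speedups. The following minimal sketch recomputes it from the timing cells; the helper names `mean_seconds` and `ratio` are hypothetical, and the `mean±spread` cell format is assumed from the table rather than from any tool documentation.

```python
import re

# Unit suffixes seen in the table, plus common neighbours; both spellings of "micro".
_UNITS = {"s": 1.0, "ms": 1e-3, "μs": 1e-6, "µs": 1e-6, "ns": 1e-9}
_TIMING = re.compile(r"^([\d.]+)±([\d.]+)(ns|μs|µs|ms|s)$")


def mean_seconds(cell: str) -> float:
    """Convert a cell such as '5.67±0.01ms' to its mean value in seconds."""
    match = _TIMING.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized timing cell: {cell!r}")
    mean, _spread, unit = match.groups()
    return float(mean) * _UNITS[unit]


def ratio(before: str, after: str) -> float:
    """After/Before; values below 1 mean the new code is faster."""
    return mean_seconds(after) / mean_seconds(before)


r = ratio("5.67±0.01ms", "810±9μs")  # UnaryFP(<ufunc 'cos'>, 1, 1, 'd')
print(f"ratio={r:.2f}, speedup={1 / r:.1f}x")  # ratio=0.14, speedup=7.0x
```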

seiko2plus force-pushed the brings_npsr branch 2 times, most recently from 0ccab74 to 77f1bc9 (September 6, 2025 15:27)
Allow up to 3 ULP error for float32 sin/cos when native
FMA is not available.
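
As an illustration of what an "up to 3 ULP" tolerance means for float32, the sketch below counts how many representable float32 values separate two results. It is not the test code from this PR; the helper names are hypothetical, and NaN and infinity are not treated specially.

```python
import struct


def f32_ordered_int(x: float) -> int:
    """Map a float32 value to an integer that preserves float ordering."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Negative floats have the sign bit set; mirror them below zero so that
    # adjacent floats map to adjacent integers (and -0.0 maps to 0).
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits


def ulp_distance_f32(a: float, b: float) -> int:
    """Number of float32 steps between a and b (0 means bit-identical or +/-0)."""
    return abs(f32_ordered_int(a) - f32_ordered_int(b))


def within_ulp_f32(actual: float, expected: float, max_ulp: int = 3) -> bool:
    """True if actual is at most max_ulp float32 steps away from expected."""
    return ulp_distance_f32(actual, expected) <= max_ulp


step = 2.0 ** -23  # spacing of float32 values in [1, 2)
print(within_ulp_f32(1.0 + 3 * step, 1.0))  # True: exactly 3 ULP apart
print(within_ulp_f32(1.0 + 4 * step, 1.0))  # False: 4 ULP apart
```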
rgommers added the "component: SIMD" label Sep 10, 2025
seiko2plus added this to the 2.4.0 release milestone Sep 13, 2025