
ENH: Add support SLEEF for transcendental functions #23068


Open
yamadafuyuka opened this issue Jan 23, 2023 · 10 comments
Labels
component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@yamadafuyuka

Functions such as sin and log use libm except on AVX512_SKX, and at least in my environment SIMD instructions were not used.
Therefore, I added an implementation that uses the SIMD library SLEEF ( https://sleef.org/ ) and measured the calculation time of some functions.
My branch: ( https://github.com/yamadafuyuka/numpy/tree/add_SLEEF )

I graphed the results. We also confirmed that using SVE intrinsics as in ( PR-22265 ) gives a further speedup (the log10 function is about 4 times faster).
I would like to add SLEEF support, but I am not sure which part of NumPy is the best place to implement it. Could you please advise?
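For context, a minimal timing sketch (illustrative only, not the actual measurement harness from my branch; assumes NumPy is installed) along the lines of the measurements above could look like this:

```python
# Illustrative micro-benchmark sketch: times one call of a transcendental
# ufunc on a large contiguous array. Not the harness used for the graphs.
import timeit

import numpy as np

def bench(func, n=1_000_000, repeat=5):
    """Best wall-clock time in seconds for a single call of `func`."""
    x = np.linspace(0.1, 10.0, n)
    return min(timeit.repeat(lambda: func(x), number=1, repeat=repeat))

for name in ("sin", "log10", "exp"):
    print(f"{name:6s} {bench(getattr(np, name)) * 1e3:.3f} ms")
```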

@yamadafuyuka yamadafuyuka changed the title ENH: Add support SLEEF ENH: Add support SLEEF for transcendental functions Jan 23, 2023
@mattip
Member

mattip commented Jan 23, 2023

We are trying to move towards using universal intrinsics inside NumPy, so I am not sure we would want to mix in a whole new library with a different paradigm.

What version of NumPy did you test? On what platform?

@kawakami-k
Contributor

kawakami-k commented Jan 23, 2023

Hi, @mattip
I'm implementing SVE support for NumPy with @yamadafuyuka .

The motivation of this issue is to improve the calculation speed of transcendental functions such as sin/cos/tan/log2/log10/exp, etc. For x64 on Linux, NumPy can be built with SVML and the calculation is vectorized. In my understanding, for non-x64 CPUs, the compiler links NumPy against libm for the transcendental functions, which provides non-vectorized implementations. Is this right?

SLEEF is a vectorized mathematical library. It supports multiple architectures, as shown in Table 1.1. Because the function names of SLEEF follow its naming convention, it is easy to abstract function names and write source code for multiple architectures/multiple instruction sets. Below is a function name example; u10 means that the function achieves 1.0-ULP calculation accuracy.

| Transcendental function | Data type | ISA | SLEEF function name |
| --- | --- | --- | --- |
| sine | float | Arm NEON | Sleef_sinf4_u10 |
| sine | float | Arm SVE | Sleef_sinfx_u10sve |
| sine | float | x64 AVX512 | Sleef_sinf16_u10 |
| sine | double | Arm NEON | Sleef_sind2_u10 |
| sine | double | Arm SVE | Sleef_sindx_u10sve |
| sine | double | x64 AVX512 | Sleef_sind8_u10 |
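The regularity of this convention is the point: a symbol name can be composed mechanically. A small sketch (the `sleef_name` helper is hypothetical, not part of SLEEF) of how such names could be assembled:

```python
# Hypothetical helper (not part of SLEEF): composes a SLEEF symbol name from
# the pieces in the table above -- base function, element-type suffix, lane
# count (or 'x' for scalable vectors), accuracy tag, and optional ISA tag.
def sleef_name(func, dtype, lanes, ulp="u10", isa=""):
    suffix = {"float": "f", "double": "d"}[dtype]
    return f"Sleef_{func}{suffix}{lanes}_{ulp}{isa}"

print(sleef_name("sin", "float", 4))                # Arm NEON:  Sleef_sinf4_u10
print(sleef_name("sin", "float", "x", isa="sve"))   # Arm SVE:   Sleef_sinfx_u10sve
print(sleef_name("sin", "double", 8))               # x64 AVX512: Sleef_sind8_u10
```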

NumPy has universal intrinsics for multiple ISAs, so I think there is a way to use them to implement transcendental functions in a unified way. However, it would be time-consuming and difficult to implement the various transcendental functions ourselves. I think it would be a good idea to reuse SLEEF.

Since transcendental function processing is vectorized by SLEEF, the expected performance gain will be close to N, where N is the number of SIMD lanes. In practice, the gain will be smaller than N due to Python and other overhead.
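As a back-of-envelope illustration of that bound N (assuming the 512-bit SVE vectors that A64FX implements):

```python
# Back-of-envelope sketch of the ideal speedup bound N mentioned above:
# how many elements one SIMD register holds for a given vector width.
def simd_lanes(vector_bits, element_bits):
    return vector_bits // element_bits

# A64FX implements 512-bit SVE vectors, so the ideal bound is:
print(simd_lanes(512, 64))  # double: N = 8
print(simd_lanes(512, 32))  # float:  N = 16
```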

Thank you.

@mattip
Member

mattip commented Jan 23, 2023

What version of NumPy did you test to get your performance graphs? On what platform? We have already moved some of these functions to universal intrinsics, which is why I ask for exact platform and version information. It would be great if you could report the output of `import sys, numpy; print(numpy.__version__); print(sys.version)`. If you are running NumPy 1.24+, also show `print(numpy.show_runtime())`.

@yamadafuyuka
Author

yamadafuyuka commented Jan 24, 2023

Thank you for your comment. Sorry for the late reply.
The environment is as follows:

  • NumPy version: 1.23.3
  • Platform: AArch64
  • CPU: A64FX (Armv8.2a + SVE)
```
>>> print(numpy.__version__)
0+untagged.28802.g57e71fd   // Edited "1.23.3-release" version  e47cbb69b
>>> print(sys.version)
3.10.7 (main, Oct  4 2022, 00:38:28) [GCC 11.3.0]
```

@yamadafuyuka
Author

yamadafuyuka commented Jan 24, 2023

I am sorry for the insufficient explanation.

For the functions defined in numpy/numpy/core/src/umath/loops_umath_fp.dispatch.c.src, I want to use SLEEF on other architectures the same way AVX512 uses SVML.
In the current implementation, except for AVX512, NumPy uses the functions from #include <math.h>, which are scalar functions, right?

```c
NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(@TYPE@_@func@)
(char **args, npy_intp const *dimensions, npy_intp const *steps, void *NPY_UNUSED(data))
{
#if NPY_SIMD && defined(NPY_HAVE_AVX512_SKX) && defined(NPY_CAN_LINK_SVML)
    const @type@ *src = (@type@*)args[0];
    @type@ *dst = (@type@*)args[1];
    const int lsize = sizeof(src[0]);
    const npy_intp ssrc = steps[0] / lsize;
    const npy_intp sdst = steps[1] / lsize;
    const npy_intp len = dimensions[0];
    assert(len <= 1 || (steps[0] % lsize == 0 && steps[1] % lsize == 0));
    if (!is_mem_overlap(src, steps[0], dst, steps[1], len) &&
            npyv_loadable_stride_@sfx@(ssrc) &&
            npyv_storable_stride_@sfx@(sdst)) {
        simd_@intrin@_@sfx@(src, ssrc, dst, sdst, len);
        return;
    }
#endif
    UNARY_LOOP {
        const @type@ in1 = *(@type@ *)ip1;
        *(@type@ *)op1 = npy_@intrin@@vsub@(in1);
    }
}
```
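In other words, the control flow is: take the vectorized path only when the SIMD kernel is compiled in and the memory layout permits it, otherwise fall back to the scalar libm loop. A rough Python paraphrase (all names here are illustrative, none of them exist in NumPy):

```python
# Rough, illustrative paraphrase of the C dispatch logic above.
# None of these names are NumPy API; this only mirrors the control flow.
import math

def sin_dispatch(src, simd_kernel=None, layout_ok=True):
    if simd_kernel is not None and layout_ok:
        return simd_kernel(src)            # vectorized path (SVML/SLEEF-style)
    return [math.sin(x) for x in src]      # scalar libm fallback

print(sin_dispatch([0.0, math.pi / 2]))
```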

@mattip
Member

mattip commented Jan 24, 2023

We discussed approaches to using SIMD intrinsics in NEP 38. Specifically, we have a section about code enhancements. We did not really apply that section in the discussion to add SVML (PR #19478) other than to note:

> Getting SVML with BSD license is great deal, and it gonna be good base for start replacing them to universal intrinsics. Thank you!

There was a brief mention of SLEEF in that PR, but we did not consider using SLEEF instead of, or in addition to, SVML.

Looking back over the mailing list, there is the discussion in 2015 mentioned in the SVML PR, and a recent mail from Chris Sidebottom about an effort to target aarch64.

I am not sure how I feel about integrating yet another vendored library for accelerated operations. On the one hand, we already have precedent with SVML. Integrating SLEEF would improve performance for other platforms. On the other, SLEEF's sources are twice as large as SVML, and the scope is larger. Would we then declare that we are not going to move these functions to universal intrinsics? What would we do with the code from #17587, #18101, and more? Could we do something more generic so that people who wished to could switch out SVML entirely, or use VOLK (GPL3) or simd or another library?

Maybe I am overthinking this, and we should just move forward since there is a contributor willing to do the work. I do think this should hit the mailing list.

@kawakami-k
Contributor

@mattip
Thank you for letting me know about the previous discussions. I would consider discussing this on the mailing list.

@yamadafuyuka
Author

@mattip
Thank you very much for your comment.
I would consider it with @kawakami-k .

@seiko2plus seiko2plus self-assigned this Jan 25, 2023
@seiko2plus seiko2plus added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jan 25, 2023
@seiko2plus seiko2plus removed their assignment Feb 8, 2023
@Mousius
Member

Mousius commented May 18, 2023

@mattip is it worth revisiting this, as the universal intrinsics work is likely to be fairly long-lived (#23603 has been open for a month now with no activity)? SLEEF could provide a short-term boost, though from my initial look I don't think it handles errors correctly.

@mattip
Member

mattip commented May 19, 2023

I don't think SLEEF is a step in the right direction, I think we should close this PR.
