ENH: Add SLEEF support for transcendental functions #23068
We are trying to move towards using universal intrinsics inside NumPy, so I am not sure we would want to mix in a whole new library with a different paradigm. What version of NumPy did you test? On what platform?
Hi, @mattip. The motivation of this issue is to improve the calculation speed of transcendental functions such as sin/cos/tan/log2/log10/exp. For x64 on Linux, NumPy can be built with SVML so that these calculations are vectorized. In my understanding, for non-x64 CPUs, SLEEF is one of the vectorized math libraries that NumPy could be linked against. It supports multiple architectures, as shown in Table 1.1. Because SLEEF's function names follow a consistent naming convention, it is easy to abstract the function names and write source code for multiple architectures and instruction sets. Below is a function-name example.
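As an illustration of that naming convention (this helper is my own sketch, not part of SLEEF), SLEEF symbols generally follow the pattern `Sleef_<func><lanes>_u<ulp*10><isa>`, so the name for a given function/ISA combination can be generated mechanically:

```python
def sleef_symbol(func: str, lanes: int, ulp10: int, isa: str = "") -> str:
    """Illustrative helper (not a SLEEF API): build a SLEEF symbol name
    from its parts, following the pattern Sleef_<func><lanes>_u<ulp*10><isa>."""
    return f"Sleef_{func}{lanes}_u{ulp10}{isa}"

# double-precision sin, 4 lanes, 1.0 ULP accuracy, AVX2:
print(sleef_symbol("sind", 4, 10, "avx2"))      # Sleef_sind4_u10avx2
# single-precision sin, 4 lanes, 3.5 ULP, AArch64 Advanced SIMD:
print(sleef_symbol("sinf", 4, 35, "advsimd"))   # Sleef_sinf4_u35advsimd
```

Because only the lane count and ISA suffix vary across targets, dispatch code can select the right symbol per architecture with a small amount of templating.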
NumPy has universal intrinsics for multiple ISAs, so in principle transcendental functions could be implemented in a unified way on top of them. However, implementing the various transcendental functions that way would be time-consuming and difficult, so I think it would be a good idea to reuse SLEEF. Since the transcendental-function processing is vectorized by SLEEF, the expected performance gain approaches N, where N is the number of SIMD lanes. In practice, the gain will be smaller than N due to Python and other overhead. Thank you.
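The "gain will be smaller than N" point can be made concrete with a back-of-the-envelope Amdahl's-law estimate. The 0.9 vectorizable fraction below is a made-up assumption for illustration, not a measured value:

```python
# Hypothetical illustration: ideal vs. realistic speedup for an 8-lane SIMD unit.
lanes = 8
vectorized_fraction = 0.9  # assumed share of runtime spent in the vectorizable loop

# Amdahl's law: the serial remainder caps the overall speedup.
speedup = 1.0 / ((1.0 - vectorized_fraction) + vectorized_fraction / lanes)
print(f"{speedup:.2f}x")  # 4.71x, well below the ideal 8x
```

Python-level dispatch, memory bandwidth, and loop setup all contribute to the serial remainder, which is why measured gains for ufuncs usually fall short of the lane count.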
What version of NumPy did you test to get your performance graphs? On what platform? We have already moved some of these functions to universal intrinsics, which is why I ask for exact platform and version information. It would be great if you could report
Thank you for your comment. Sorry for the late reply.
I am sorry for the insufficient explanation. For the functions defined in numpy/numpy/core/src/umath/loops_umath_fp.dispatch.c.src Lines 216 to 238 in 172a194
We discussed approaches to using SIMD intrinsics in NEP 38. Specifically, we have a section about code enhancements. We did not really apply that section in the discussion to add SVML (PR #19478) other than to note
There was a brief mention of SLEEF in that PR, but we did not consider using SLEEF instead of, or in addition to, SVML. Looking back over the mailing list, there is the discussion from 2015 mentioned in the SVML PR, and a recent mail from Chris Sidebottom about an effort to target aarch64. I am not sure how I feel about integrating yet another vendored library for accelerated operations. On the one hand, we already have precedent with SVML, and integrating SLEEF would improve performance on other platforms. On the other hand, SLEEF's sources are twice as large as SVML's, and its scope is larger. Would we then declare that we are not going to move these functions to universal intrinsics? What would we do with the code from #17587, #18101, and more? Could we do something more generic, so that people who wished to could switch out SVML entirely, or use VOLK (GPL3) or simd or another library? Maybe I am overthinking this, and we should just move forward since there is a contributor willing to do the work. I do think this should hit the mailing list.
@mattip |
I don't think SLEEF is a step in the right direction, I think we should close this PR. |
Functions such as sin and log use libm except for AVX512_SKX, and at least in my environment SIMD instructions were not used. Therefore, I added an implementation that uses the SIMD library SLEEF ( https://sleef.org/ ) and measured the calculation time of some functions.
My branch: ( https://github.com/yamadafuyuka/numpy/tree/add_SLEEF )
I graphed the results. We also confirmed that using SVE intrinsics, as in ( PR-22265 ), gives a further speedup (the log10 function is about 4 times faster).
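A minimal way to reproduce this kind of comparison from Python is to time a transcendental ufunc directly. This is my own sketch, assuming NumPy is installed; absolute timings depend on the CPU, the build flags, and whether the build dispatches to libm, SVML, or SLEEF:

```python
import timeit
import numpy as np

# A large input array so the per-call dispatch overhead is amortized.
x = np.random.default_rng(0).uniform(1.0, 100.0, size=1_000_000)

# Time a transcendental ufunc. Whether SIMD code (SVML/SLEEF) or scalar
# libm is used depends on how this NumPy build was compiled and on the CPU.
reps = 20
t = timeit.timeit(lambda: np.log10(x), number=reps) / reps
print(f"np.log10 over 1e6 doubles: {t * 1e3:.2f} ms per call")
```

Running the same script against a baseline build and a SLEEF-enabled build (and normalizing per call) gives the kind of ratio shown in the graphs above.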
I would like to add SLEEF support, but I am not sure which part of NumPy is the best place to implement it. Could you please advise?