SIMD: [Doubt] Doubts on dispatch and usage of npy functions #17925

ganesh-k13 · 2020-12-05T06:39:07Z

I am posting this from NumPy slack:

How and when should a macro be set at dispatch time based on the available intrinsics? Example: libdivide (note that the macro is not only used for including the header; functions further down in the file are also defined based on it):

numpy/numpy/core/include/numpy/libdivide/libdivide.h

Lines 29 to 35 in e538e11

    
    #if defined(LIBDIVIDE_AVX512)
        #include <immintrin.h>
    #elif defined(LIBDIVIDE_AVX2)
        #include <immintrin.h>
    #elif defined(LIBDIVIDE_SSE2)
        #include <emmintrin.h>
    #endif

If I set it in, say, arithmetic.h under sse, it is too late and is not picked up.
To work around this, I am setting it in loops.c.src, though that doesn't look right:

    /* Use Libdivide for faster division */
    #ifdef NPY_HAVE_SSE2_INTRINSICS
        #define LIBDIVIDE_SSE2 1
    #endif
    #include "numpy/libdivide/libdivide.h"
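The ordering problem described above can be shown in a single file. This is only a sketch: `DEMO_USE_SSE2`, `DEMO_BACKEND` and `demo_backend` are hypothetical names, and the block between the marker comments stands in for a header such as libdivide.h. It is not NumPy's dispatch mechanism.

```c
#include <string.h>

/* Hypothetical feature macro. It has to be defined BEFORE the section
 * that tests it, just as LIBDIVIDE_SSE2 has to precede libdivide.h. */
#define DEMO_USE_SSE2 1

/* ---- stand-in for the contents of an included header ---- */
#if defined(DEMO_USE_SSE2)
    #define DEMO_BACKEND "sse2"
#else
    #define DEMO_BACKEND "scalar"
#endif
/* ---- end of stand-in header ---- */

/* Defining DEMO_USE_SSE2 at this point would be too late: the #if
 * above has already been evaluated by the preprocessor. */

static const char *demo_backend(void)
{
    return DEMO_BACKEND;
}
```

Moving the `#define` below the stand-in header makes `demo_backend()` return "scalar", which is the "not picked up" behaviour.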

What is the best way to do the following with NumPy intrinsics?
Say I have an npy_long* array of 11 elements. Each element is 8 bytes.
One __m128i can hold 2 npy_longs. I can use npyv_load_s64 to load into a __m128i (i.e. npyv_s32).
q2.1: How should the last iteration, with one extra element, best be handled? Do we use npyv_load_till_s64?
q2.2: How do I store the result (npyv_s32) back through an npy_long* pointer? ~~npyv_store_s32~~ npyv_store_s64 [EDIT1] is giving me some junk values on write. It's probably just the way I am sliding, but I wanted to know the right way to do it. [EDIT2] I forgot to reduce the loop length :).
Example that gave junk:

    #ifdef NPY_HAVE_SSE2_INTRINSICS // Just some POC code to try stuff
        is1 *= 2; // Slide by 2 elements because of the point above
        os1 *= 2;
        n /= 2; // [EDIT2], kinda works after this is added.
        BINARY_LOOP_SLIDING {
            npyv_s32 num_as_simdvector = npyv_load_s64((npy_long*)ip1);
            npyv_s32 volatile res = npyv_div_s32(num_as_simdvector, &fast_d1); // Till here works
            npyv_store_s64((npy_long*)op1, res); // [EDIT1] from 32->64
        }
    #elif

General question: When is npyv_storen_s32 used with strides?

cc: @seberg @mattip


mattip · 2020-12-05T09:10:30Z

It might be nice to add this as an example (once it gets going) to the empty SIMD examples section of the docs

Qiyu8 · 2020-12-07T03:39:43Z

How and when to set a macro at dispatch based on intrinsic

I suggest replacing the LIBDIVIDE_* macros with the NPY_HAVE_* ones so that this fits into the existing dispatching mechanism. Currently, however, dispatching this way is only recommended in ufunc methods.

q2.1: How to best handle the last case with one extra element?

npyv_load_till_sXX is recommended when you need to handle leftover elements across different instruction sets; please see an example here. But if only one element remains, it is better to handle it directly.
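A minimal sketch of the "handle it directly" option for the 11-element case. It uses raw SSE2 intrinsics because the npyv_* wrappers are internal to NumPy and cannot be compiled standalone; on SSE, `npyv_load_s64` and `npyv_store_s64` correspond roughly to the unaligned load and store used here. The function name `add_const_s64` and the add-a-constant operation are placeholders for illustration (SSE2 has no 64-bit integer divide).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
    #include <emmintrin.h>
#endif

/* Hypothetical helper: adds k to each of the n elements of src and
 * writes the results to dst. */
static void add_const_s64(const int64_t *src, int64_t *dst, size_t n, int64_t k)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i vk = _mm_set1_epi64x(k);
    /* Vector body: two 64-bit lanes per iteration, full vectors only. */
    for (; i + 2 <= n; i += 2) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_add_epi64(v, vk);
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    /* Scalar tail: at most one element is left on the SSE2 path
     * (all n elements when SSE2 is not available). */
    for (; i < n; ++i) {
        dst[i] = src[i] + k;
    }
}
```

With n = 11 the vector loop covers elements 0 to 9 and the scalar loop handles element 10, so nothing is read or written past the end of either array.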

q2.2: How to load result(npyv_s32) into a npy_long* pointer back?

I think npyv_store_s64 is the right way to store an npyv_s64 into an npy_long*.

General question: When is npyv_storen_s32 used with strides?

When you want to store a vector into non-contiguous memory, such as the odd or even positions of an array.
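To make the strided case concrete, here is a sketch of what a strided store does, written with plain SSE2 intrinsics rather than `npyv_storen_s32` itself. The helper name `store_strided_s32` is hypothetical, and spilling the vector to a small temporary array is just one simple way to scatter the lanes, not necessarily how NumPy implements it.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
    #include <emmintrin.h>
#endif

/* Hypothetical helper: writes src[0..n) to dst[0], dst[stride],
 * dst[2*stride], ... (stride counted in elements, not bytes). */
static void store_strided_s32(const int32_t *src, int32_t *dst,
                              ptrdiff_t stride, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        /* Load four contiguous 32-bit lanes. */
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        /* Spill the lanes, then scatter them to strided positions. */
        int32_t lanes[4];
        _mm_storeu_si128((__m128i *)lanes, v);
        for (int k = 0; k < 4; ++k) {
            dst[(ptrdiff_t)(i + k) * stride] = lanes[k];
        }
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is not available). */
    for (; i < n; ++i) {
        dst[(ptrdiff_t)i * stride] = src[i];
    }
}
```

With a stride of 2, the values land in the even positions of `dst` and the odd positions are left untouched.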

mattip · 2022-05-07T17:19:03Z

@ganesh-k13 should we change this to a documentation issue or close it?

ganesh-k13 · 2022-05-07T17:26:51Z

We can close this, pretty specific to a use case.

ganesh-k13 mentioned this issue Dec 24, 2020

ENH: libdivide for unsigned integers #18055

Closed

rgommers added the labels "33 - Question" (Question about NumPy usage or development) and "component: SIMD" (Issues in SIMD (fast instruction sets) code or machinery) on Jan 22, 2022

ganesh-k13 closed this as completed May 7, 2022
