
SIMD: [Doubt] Doubts on dispatch and usage of npy functions #17925

Closed
ganesh-k13 opened this issue Dec 5, 2020 · 4 comments
Labels
33 - Question Question about NumPy usage or development component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@ganesh-k13
Member

ganesh-k13 commented Dec 5, 2020

Hi @seiko2plus and @Qiyu8.

I am posting this from the NumPy Slack:

  1. How and when should a macro be set at dispatch time based on the available intrinsics? For example, libdivide (note that the macro is not only used to include the header; functions further down in the file are also defined based on it):
    #if defined(LIBDIVIDE_AVX512)
    #include <immintrin.h>
    #elif defined(LIBDIVIDE_AVX2)
    #include <immintrin.h>
    #elif defined(LIBDIVIDE_SSE2)
    #include <emmintrin.h>
    #endif
    Now, if I set it in, say, arithmetic.h for SSE, it is defined too late and is not picked up.
    To work around this, I am setting it in loops.c.src, though that doesn't look right:
    /* Use Libdivide for faster division */
    #ifdef NPY_HAVE_SSE2_INTRINSICS
        #define LIBDIVIDE_SSE2 1
    #endif
    #include "numpy/libdivide/libdivide.h"
  2. What is the best way to do the following with NumPy intrinsics?
    Say I have an npy_long* array of 11 elements. Each element is 8 bytes.
    One __m128i can hold 2 npy_longs. I can use npyv_load_s64 to load them into a __m128i (i.e. npyv_s32).
    q2.1: How do we best handle the last case with one extra element? Do we use npyv_load_till_s64?
    q2.2: How do I store the result (npyv_s32) back into an npy_long* pointer? npyv_store_s32 (changed to npyv_store_s64 in [EDIT1]) is giving me some junk values on write. It's probably just the way I am sliding, but I wanted to know the right way to do it. [EDIT2] I forgot to reduce the loop length :).
    Example that gave junk:
    #ifdef NPY_HAVE_SSE2_INTRINSICS // Just some POC code to try stuff
        is1*=2; // Slide by 2 elements cause of above point
        os1*=2;
        n/=2; // [EDIT2], kinda works after this is added.
        BINARY_LOOP_SLIDING {
            npyv_s32 num_as_simdvector = npyv_load_s64((npy_long*)ip1);
            npyv_s32 volatile res = npyv_div_s32(num_as_simdvector, &fast_d1); // Till here works
            npyv_store_s64((npy_long*)op1, res); // [EDIT1] from 32->64
        }
    #elif
  3. General question: When is npyv_storen_s32 used with strides?
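The [EDIT2] fix in the loop above (doubling is1/os1 and also halving n) can be illustrated in portable C. This is only a sketch, not NumPy code: vec2_s64, vload2, vstore2 and divide_loop are hypothetical names, a two-lane struct stands in for a 128-bit register, and plain integer division replaces the libdivide-based npyv_div_s32 from the snippet. It also finishes the 11th element, which the halved loop (5 iterations for n = 11) leaves untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector holding two int64 lanes. */
typedef struct { int64_t lane[2]; } vec2_s64;

/* memcpy keeps loads/stores through char pointers well-defined. */
static vec2_s64 vload2(const char *p)
{
    vec2_s64 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}

static void vstore2(char *p, vec2_s64 v)
{
    memcpy(p, v.lane, sizeof v.lane);
}

/* Shaped like a ufunc inner loop: char pointers, element count n and byte
   strides. Assumes contiguous data, i.e. is1 == os1 == sizeof(int64_t). */
static void divide_loop(char *ip1, char *op1, ptrdiff_t n,
                        ptrdiff_t is1, ptrdiff_t os1, int64_t d)
{
    /* One vector step covers 2 elements: scale the byte strides by 2 AND
       halve the trip count. Scaling only the strides makes the loop read and
       write past the end of both buffers. */
    const ptrdiff_t vis = is1 * 2, vos = os1 * 2;
    for (ptrdiff_t i = 0; i < n / 2; i++, ip1 += vis, op1 += vos) {
        vec2_s64 v = vload2(ip1);
        v.lane[0] /= d;
        v.lane[1] /= d;
        vstore2(op1, v);
    }
    /* An odd n leaves exactly one element; finish it with scalar code. */
    if (n % 2 != 0) {
        int64_t x;
        memcpy(&x, ip1, sizeof x);
        x /= d;
        memcpy(op1, &x, sizeof x);
    }
}
```

Called as divide_loop((char *)in, (char *)out, 11, sizeof(int64_t), sizeof(int64_t), d), the vector loop handles elements 0 to 9 and the scalar tail handles element 10.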

cc: @seberg @mattip

@mattip
Member

mattip commented Dec 5, 2020

It might be nice to add this as an example (once it gets going) to the currently empty SIMD examples section of the docs.

@Qiyu8
Member

Qiyu8 commented Dec 7, 2020

  1. How and when to set a macro at dispatch based on intrinsic

I suggest replacing the LIBDIVIDE_* macros with NPY_HAVE_* so that they are incorporated into the existing dispatching mechanism. However, this dispatching approach is currently only recommended for ufunc methods.
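Following that suggestion, the ordering can be sketched in plain C. Assumption: in a NumPy dispatch source the build defines NPY_HAVE_SSE2, NPY_HAVE_AVX2 and NPY_HAVE_AVX512F for the target being compiled; outside NumPy this sketch approximates them with the compiler's __SSE2__, __AVX2__ and __AVX512F__ macros. What matters is that the LIBDIVIDE_* selectors are derived before libdivide.h is included, so its #if chain sees them. libdivide_branch() is a hypothetical helper that only reports which branch was selected.

```c
/* Assumption: a NumPy dispatch build would already define these per target.
   Outside NumPy, approximate them with the compiler's feature macros. */
#if !defined(NPY_HAVE_SSE2) && defined(__SSE2__)
#define NPY_HAVE_SSE2 1
#endif
#if !defined(NPY_HAVE_AVX2) && defined(__AVX2__)
#define NPY_HAVE_AVX2 1
#endif
#if !defined(NPY_HAVE_AVX512F) && defined(__AVX512F__)
#define NPY_HAVE_AVX512F 1
#endif

/* Translate the dispatch macros into libdivide's selectors BEFORE the
   header is included, widest instruction set first. */
#if defined(NPY_HAVE_AVX512F)
#define LIBDIVIDE_AVX512 1
#elif defined(NPY_HAVE_AVX2)
#define LIBDIVIDE_AVX2 1
#elif defined(NPY_HAVE_SSE2)
#define LIBDIVIDE_SSE2 1
#endif

/* #include "numpy/libdivide/libdivide.h" would go here. */

/* Hypothetical helper: reports which libdivide branch this translation
   unit would select. */
static const char *libdivide_branch(void)
{
#if defined(LIBDIVIDE_AVX512)
    return "AVX512";
#elif defined(LIBDIVIDE_AVX2)
    return "AVX2";
#elif defined(LIBDIVIDE_SSE2)
    return "SSE2";
#else
    return "scalar";
#endif
}
```

Because each dispatch target is compiled as its own translation unit, the same source would select a different libdivide branch per target without touching loops.c.src.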

  1. q2.1: How to best handle the last case with one extra element?

npyv_load_till_sXX is recommended when you need to handle the remaining elements across different instruction sets; please see an example here. But if only one element remains, it is better to handle it directly.
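A portable sketch of the two tail strategies, assuming (as I understand NumPy's universal intrinsics) that npyv_load_till_s64(ptr, nlane, fill) loads the first nlane elements and fills the remaining lanes with fill, and that npyv_store_till_s64(ptr, nlane, vec) writes only the first nlane lanes. A two-lane struct stands in for a 128-bit register, and every name here (vec_s64, load_till_s64, divide_till, and so on) is hypothetical, not NumPy API. divide_till handles the tail with a partial load/store; divide_scalar_tail handles it directly, which is the simpler option when only one element can remain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-lane stand-in for npyv_s64 on a 128-bit target. */
#define NLANES_S64 2
typedef struct { int64_t lane[NLANES_S64]; } vec_s64;

static vec_s64 load_s64(const int64_t *p)
{
    vec_s64 v;
    for (size_t i = 0; i < NLANES_S64; i++) v.lane[i] = p[i];
    return v;
}

static void store_s64(int64_t *p, vec_s64 v)
{
    for (size_t i = 0; i < NLANES_S64; i++) p[i] = v.lane[i];
}

/* Partial load: reads only the first nlane elements (never past the end of
   the array) and fills the remaining lanes with `fill`. */
static vec_s64 load_till_s64(const int64_t *p, size_t nlane, int64_t fill)
{
    vec_s64 v;
    for (size_t i = 0; i < NLANES_S64; i++) v.lane[i] = i < nlane ? p[i] : fill;
    return v;
}

/* Partial store: writes only the first nlane lanes. */
static void store_till_s64(int64_t *p, size_t nlane, vec_s64 v)
{
    for (size_t i = 0; i < NLANES_S64 && i < nlane; i++) p[i] = v.lane[i];
}

static vec_s64 div_s64(vec_s64 v, int64_t d)
{
    for (size_t i = 0; i < NLANES_S64; i++) v.lane[i] /= d;
    return v;
}

/* Tail via partial load/store: works for any lane count. */
static void divide_till(const int64_t *src, int64_t *dst, size_t len, int64_t d)
{
    for (; len >= NLANES_S64; len -= NLANES_S64, src += NLANES_S64, dst += NLANES_S64)
        store_s64(dst, div_s64(load_s64(src), d));
    if (len > 0)
        store_till_s64(dst, len, div_s64(load_till_s64(src, len, 0), d));
}

/* Tail via scalar code: with 2 lanes at most one element remains. */
static void divide_scalar_tail(const int64_t *src, int64_t *dst, size_t len, int64_t d)
{
    for (; len >= NLANES_S64; len -= NLANES_S64, src += NLANES_S64, dst += NLANES_S64)
        store_s64(dst, div_s64(load_s64(src), d));
    for (; len > 0; len--, src++, dst++)
        *dst = *src / d;
}
```

The fill value only affects lanes that are discarded on store; 0 is safe here because it is the numerator. If the partially loaded vector were the divisor, a nonzero fill such as 1 would be needed to avoid a division by zero in the unused lanes.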

  1. q2.2: How do I store the result (npyv_s32) back into an npy_long* pointer?

I think that npyv_store_s64 is the right way to store an npyv_s64 into an npy_long*.

  1. General question: When is npyv_storen_s32 used with strides?

When you want to store a vector into non-contiguous memory, such as the odd or even positions of an array.
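As a rough model of that behaviour, here is a scalar sketch. It assumes, as I understand NumPy's universal intrinsics, that the stride argument of npyv_storen_s32 is counted in elements rather than bytes; vec_s32 and storen_s32 are hypothetical names, not NumPy API. Storing to arr with stride 2 fills the even positions, and storing to arr + 1 with stride 2 fills the odd ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-lane stand-in for npyv_s32 on a 128-bit target. */
#define NLANES_S32 4
typedef struct { int32_t lane[NLANES_S32]; } vec_s32;

/* Strided store: lane i goes to ptr[i * stride], with stride counted in
   elements. Elements in between are left untouched. */
static void storen_s32(int32_t *ptr, ptrdiff_t stride, vec_s32 a)
{
    for (ptrdiff_t i = 0; i < NLANES_S32; i++)
        ptr[i * stride] = a.lane[i];
}
```

With an 8-element array, two such stores (one at offset 0 and one at offset 1) interleave two vectors, which is the odd/even case described above.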

rgommers added the 33 - Question and component: SIMD labels on Jan 22, 2022
@mattip
Member

mattip commented May 7, 2022

@ganesh-k13 should we change this to a documentation issue or close it?

@ganesh-k13
Member Author

We can close this; it is pretty specific to one use case.
