SIMD: [Doubt] Doubts on dispatch and usage of npy functions #17925
Labels
33 - Question
Question about NumPy usage or development
component: SIMD
Issues in SIMD (fast instruction sets) code or machinery
Hi @seiko2plus and @Qiyu8.
I am posting this from NumPy slack:
numpy/numpy/core/include/numpy/libdivide/libdivide.h
Lines 29 to 35 in e538e11
Just to bypass this, I am doing it in loops.c.src, doesn't look right though:
Say I have an array npy_long* of 11 elements. Each element is 8 bytes. One __m128i can hold 2 npy_longs. I can use npyv_load_s64 to load into a __m128i (i.e. npyv_s32).
q2.1: How to best handle the last case with one extra element? Do we use npyv_load_till_s64?
q2.2: How to load the result (npyv_s32) back into an npy_long* pointer?
npyv_store_s32
npyv_store_s64 [EDIT1] is giving me some junk values on write. It's probably just with the way I am sliding but wanted to know what is the right way to do it. [EDIT2] I forgot to reduce the loop length :). Example that gave junk:
npyv_storen_s32 used with strides?
cc: @seberg @mattip
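To make q2.1 and the [EDIT2] note concrete, here is a minimal sketch of the loop shape in question, written in plain C so it runs without NumPy's internal headers. The vec2_* helpers and add_scalar are hypothetical stand-ins that only model a 2-lane 64-bit register. They are not the npyv_* API, although vec2_load_till is meant to mirror what npyv_load_till_s64 is assumed to do (read only the first few lanes and pad the rest).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-lane model of a 128-bit register holding 64-bit integers.
   These helpers are illustrative stand-ins, not NumPy's npyv_* intrinsics. */
enum { VEC_LANES = 2 };
typedef struct { int64_t lane[VEC_LANES]; } vec2_s64;

/* Full-width load: always reads exactly VEC_LANES elements. */
static vec2_s64 vec2_load(const int64_t *p) {
    vec2_s64 v;
    for (size_t i = 0; i < VEC_LANES; i++) v.lane[i] = p[i];
    return v;
}

/* Full-width store: always writes exactly VEC_LANES elements. */
static void vec2_store(int64_t *p, vec2_s64 v) {
    for (size_t i = 0; i < VEC_LANES; i++) p[i] = v.lane[i];
}

/* Partial load: reads only the first nlane elements, pads the rest with fill. */
static vec2_s64 vec2_load_till(const int64_t *p, size_t nlane, int64_t fill) {
    assert(nlane >= 1 && nlane <= VEC_LANES);
    vec2_s64 v;
    for (size_t i = 0; i < VEC_LANES; i++) v.lane[i] = (i < nlane) ? p[i] : fill;
    return v;
}

/* Partial store: writes only the first nlane elements. */
static void vec2_store_till(int64_t *p, size_t nlane, vec2_s64 v) {
    assert(nlane >= 1 && nlane <= VEC_LANES);
    for (size_t i = 0; i < nlane; i++) p[i] = v.lane[i];
}

/* Lane-wise addition of a scalar, standing in for the real computation. */
static vec2_s64 vec2_add_scalar(vec2_s64 v, int64_t c) {
    for (size_t i = 0; i < VEC_LANES; i++) v.lane[i] += c;
    return v;
}

/* dst[i] = src[i] + c for n elements; n may be odd (e.g. 11). */
static void add_scalar(const int64_t *src, int64_t *dst, size_t n, int64_t c) {
    size_t i = 0;
    /* The main loop runs only while a full vector still fits. This is the
       reduced loop length: it keeps full-width stores inside the buffer. */
    for (; i + VEC_LANES <= n; i += VEC_LANES) {
        vec2_store(dst + i, vec2_add_scalar(vec2_load(src + i), c));
    }
    /* Tail: at most VEC_LANES - 1 elements remain, handled with partial
       load and store so nothing past the end is read or written. */
    if (i < n) {
        size_t rest = n - i;
        vec2_s64 v = vec2_load_till(src + i, rest, 0);
        vec2_store_till(dst + i, rest, vec2_add_scalar(v, c));
    }
}
```

With n = 11 the main loop covers elements 0 to 9 in five full-width steps and the tail handles element 10 alone. If the loop bound were simply i < n, the last full-width store would also write one slot past the end, which is one plausible source of junk values like those described above.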