BUG: segfault when array with dtype=np.float32 is sliced then squared #25231

Closed
RoryMB opened this issue Nov 22, 2023 · 4 comments · Fixed by #25243
Labels
00 - Bug component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@RoryMB

RoryMB commented Nov 22, 2023

Describe the issue:

The code example creates a large array, sets the dtype to np.float32 and slices, then segfaults fairly consistently upon squaring the result.

Smaller array size values (e.g. 1024 * 1024 * 2, which produces only a 32MB structure) are less likely to segfault, but still crash often.

Things I tried that DID cause segfaults:
np.zeros((1024*1024*64, 2)).astype(np.float32)[:, 1]**2
np.zeros((1024*1024*2, 2)).astype(np.float32)[:, 1]**2
np.zeros((1024*1024*64, 2), dtype=np.float32)[:, 1]**2
np.ones((1024*1024*64, 2), dtype=np.float32)[:, 1]**2
np.zeros((1024*1024*64*2)).reshape((-1, 2)).astype(np.float32)[:, 1]**2

Things I tried that DID NOT cause segfaults:
np.zeros((1024*1024*64, 2)).astype(np.float32)[:, 0]**2
np.zeros((1024*1024*64, 2)).astype(np.float32)[:, 1]
np.zeros((1024*1024*64, 2)).astype(np.float32)**2
np.zeros((1024*1024*64, 2))[:, 1]**2
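
Because each crashing variant takes the whole interpreter down, it can help to run every candidate expression in its own child process. The following is a minimal sketch of that idea, not something from the original report: the helper names `exit_status`, `describe`, and `report` are hypothetical, and interpreting a negative return code as "killed by signal" assumes a POSIX system such as macOS or Linux.

```python
import subprocess
import sys


def exit_status(snippet, timeout=120):
    """Run `snippet` in a fresh interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode


def describe(status):
    """Turn an exit status into a short label."""
    if status == 0:
        return "ok"
    if status < 0:
        # On POSIX, subprocess reports death by signal N as -N
        # (a segmentation fault is usually signal 11).
        return f"killed by signal {-status}"
    return f"exited with status {status}"


# Two of the expressions from the lists above (one crashing, one not).
CASES = [
    "np.zeros((1024*1024*64, 2)).astype(np.float32)[:, 1]**2",
    "np.zeros((1024*1024*64, 2)).astype(np.float32)[:, 0]**2",
]


def report(cases=CASES):
    """Print one line per expression; needs NumPy in the child interpreter."""
    for expr in cases:
        status = exit_status("import numpy as np\n" + expr)
        print(f"{describe(status)}: {expr}")
```

Calling `report()` prints one line per expression. Since the crash is described as happening "fairly consistently" rather than always, running each case several times is probably needed before drawing conclusions.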

Reproduce the code example:

import numpy as np
np.zeros((1024 * 1024 * 64, 2)).astype(np.float32)[:, 1]**2

Error message:

zsh: segmentation fault  python sf.py

Runtime information:

M1 Max MacBook Pro
NumPy installed through pip install -U numpy

>>> numpy.show_runtime()
[{'numpy_version': '1.26.2',
  'python': '3.10.4 (v3.10.4:9d38120e33, Mar 23 2022, 17:29:05) [Clang 13.0.0 '
            '(clang-1300.0.29.30)]',
  'uname': uname_result(system='Darwin', node='MacBook-Pro.local', release='23.1.0', version='Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000', machine='arm64')},
 {'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
                      'found': ['ASIMDHP'],
                      'not_found': ['ASIMDFHM']}},
 {'architecture': 'armv8',
  'filepath': '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
  'internal_api': 'openblas',
  'num_threads': 10,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'}]

Context for the issue:

No response

@hvsesha

hvsesha commented Nov 23, 2023

@RoryMB I tried this on Windows and did not get a segfault. Any advice?

@seberg seberg added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Nov 23, 2023
@seberg
Member

seberg commented Nov 23, 2023

Thanks for the report! I am not immediately sure what is wrong. @seiko2plus can you have a quick look?
The slicing is important (presumably you need to access the last element to trigger it). The size may be important because small arrays are allocated in arenas, so out-of-bounds access won't trigger errors reliably.

The lldb backtrace seems pretty clear, I would suspect we access one element too many, but at this point it is still a guess.

>>> np.zeros((1024 * 1024 * 64, 2)).astype(np.float32)[:, 1]**2
Process 58393 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2a0000000)
    frame #0: 0x0000000103dd7aa4 _multiarray_umath.cpython-310-darwin.so`FLOAT_square at memory.h:57:16 [opt]
   54  	{
   55  	    switch (stride) {
   56  	    case 2:
-> 57  	        return vld2q_s32((const int32_t*)ptr).val[0];
   58  	    case 3:
   59  	        return vld3q_s32((const int32_t*)ptr).val[0];
   60  	    case 4:
Target 0: (python) stopped.
warning: _multiarray_umath.cpython-310-darwin.so was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2a0000000)
  * frame #0: 0x0000000103dd7aa4 _multiarray_umath.cpython-310-darwin.so`FLOAT_square at memory.h:57:16 [opt]
    frame #1: 0x0000000103dd7aa4 _multiarray_umath.cpython-310-darwin.so`FLOAT_square [inlined] npyv_loadn_f32(ptr=<unavailable>, stride=2) at memory.h:81:9 [opt]
    frame #2: 0x0000000103dd7aa4 _multiarray_umath.cpython-310-darwin.so`FLOAT_square at loops_unary_fp.dispatch.c.src:130:35 [opt]

@seberg seberg added this to the 1.26.3 release milestone Nov 23, 2023
@seberg
Member

seberg commented Nov 23, 2023

Ahh, squinting at it vld2q_s32 loads two vectors and de-interleaves them. So the result is ((x[0], x[2]), (x[1], x[3]))[0] giving x[0], x[2]. That is what we want for the strided access, here.

But, x[3] is potentially out-of-bound (unless we peel the loop). (Well, something like this anyway)
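
To make the suspected over-read concrete, here is a rough pure-Python model of it. This is only a sketch under stated assumptions, not NumPy's actual loop: it assumes each strided load is done with one `vld2q`-style instruction that reads 8 contiguous 32-bit elements (two 4-lane vectors) and keeps every other one, and that the loop advances 4 rows per iteration. The function names are hypothetical.

```python
LANES = 4  # a 128-bit NEON register holds four 32-bit values


def deinterleaving_load_span(start, stride=2, lanes=LANES):
    """Model one vld2q-style load beginning at buffer index `start`.

    Returns (wanted, touched): the indices the strided view needs,
    and every index the instruction actually reads.
    """
    wanted = [start + i * stride for i in range(lanes)]
    touched = list(range(start, start + lanes * stride))
    return wanted, touched


def max_index_touched(rows, column, lanes=LANES):
    """Highest buffer index read when squaring arr[:, column] of a (rows, 2) array."""
    buffer_len = rows * 2
    highest = -1
    # Only full vectors are modelled; a scalar tail loop is ignored.
    for first_row in range(0, rows - lanes + 1, lanes):
        start = first_row * 2 + column
        _, touched = deinterleaving_load_span(start)
        highest = max(highest, touched[-1])
    return highest, buffer_len


for column in (0, 1):
    highest, buffer_len = max_index_touched(rows=8, column=column)
    status = "out of bounds" if highest >= buffer_len else "in bounds"
    print(f"column {column}: highest index read = {highest}, "
          f"buffer length = {buffer_len} ({status})")
```

In this model the last load for column 1 reads one element past the end of the buffer, while column 0 stays in bounds. That is consistent with the report that `[:, 1]**2` crashes and `[:, 0]**2` does not.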

@seiko2plus
Member

Thank you @RoryMB, and @seberg for demonstrating this issue. Assuming the alignment of 32-bit stride over non-contiguous memory access is kind of naive, my bad. #25243 should fix this issue.
