BUG: segmentation fault running nan_to_num on a 3D complex array #25959

Closed

scottstanie opened this issue Mar 7, 2024 · 12 comments
Labels
00 - Bug, component: SIMD (issues in SIMD (fast instruction sets) code or machinery)

Comments

@scottstanie

Describe the issue:

I'm getting a segmentation fault using np.nan_to_num on a certain array. I'd like to attach it here somehow, but GitHub doesn't let me attach binary files (it's about 10 MB).

Reproduce the code example:

import numpy as np
block = np.load("nan_check.npy")
np.nan_to_num(block)

Error message:

>>> np.nan_to_num(block)
Segmentation fault: 11

Python and NumPy Versions:

>>> import sys, numpy; print(numpy.__version__); print(sys.version)
1.26.2
3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:37:07) [Clang 15.0.7 ]

Runtime Environment:

>>> import numpy; print(numpy.show_runtime())
[{'numpy_version': '1.26.2',
  'python': '3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:37:07) '
            '[Clang 15.0.7 ]',
  'uname': uname_result(system='Darwin', node='MT-317120', release='22.6.0', version='Darwin Kernel Version 22.6.0: Sun Dec 17 22:12:45 PST 2023; root:xnu-8796.141.3.703.2~2/RELEASE_ARM64_T6000', machine='arm64')},
 {'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
                      'found': ['ASIMDHP'],
                      'not_found': ['ASIMDFHM']}},
 {'architecture': 'VORTEX',
  'filepath': '/Users/staniewi/miniconda3/envs/mapping-311/lib/libopenblas.0.dylib',
  'internal_api': 'openblas',
  'num_threads': 10,
  'prefix': 'libopenblas',
  'threading_layer': 'openmp',
  'user_api': 'blas',
  'version': '0.3.25'},
 {'filepath': '/Users/staniewi/miniconda3/envs/mapping-311/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 10,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None}]
None

Context for the issue:

I have tried narrowing the array down to something as small as possible, but when I limit it to subsets, the segfault goes away.

I've also tried creating an array with np.full(shape, 1j * np.nan), but that doesn't trigger the error either.

I ran $ xxd nan_check_smaller.bin binary_contents.txt and looked for a malformed number, but all the data in the array appears to have the same '0000 c0ff 0000 c07f 0000 c0ff 0000 c07f' content.

I'm running this on an M1 MacBook with numpy 1.26. If I install another environment with 1.24.4, I don't get the segfault.
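
For what it's worth, a minimal sketch of how one might rebuild an array with the exact bytes from that dump (np.full with np.nan won't reproduce it, since the dump's real parts have the NaN sign bit set); the length 512 is arbitrary, and whether this actually segfaults likely depends on the array's size and layout:

import numpy as np

# Rebuild the repeating 8-byte word from the xxd dump as complex64 data,
# assuming little-endian float32 layout:
# 0xffc00000 is a quiet NaN with the sign bit set (the real part),
# 0x7fc00000 is an ordinary quiet NaN (the imaginary part).
words = np.tile(np.array([0xffc00000, 0x7fc00000], dtype=np.uint32), 512)
block = words.view(np.float32).view(np.complex64)  # bit patterns preserved
np.nan_to_num(block)  # the call that crashed on the original array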

@ngoldbaum (Member)

Can you run python under faulthandler and/or lldb to get a traceback for the segfault?

@scottstanie (Author)

$ cat test_nan.py
import numpy as np
np.nan_to_num(np.load('nan_check_smaller.npy'))
$ python test_nan.py
Segmentation fault: 11
$ lldb python test_nan.py
(lldb) run
Process 23842 launched: '/Users/staniewi/miniconda3/envs/mapping-311/bin/python' (arm64)
Process 23842 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x157f1c000)
    frame #0: 0x0000000106c93458 _multiarray_umath.cpython-311-darwin.so`FLOAT_isnan + 704
_multiarray_umath.cpython-311-darwin.so`FLOAT_isnan:
->  0x106c93458 <+704>: ld2.4s { v24, v25 }, [x21]
    0x106c9345c <+708>: b      0x106c933f0               ; <+600>
    0x106c93460 <+712>: mov    x21, x0
    0x106c93464 <+716>: ld4.4s { v1, v2, v3, v4 }, [x21], x16
Target 0: (python) stopped.

@scottstanie (Author)

$ python -q -X faulthandler test_nan.py
Fatal Python error: Segmentation fault

Current thread 0x0000000201b76100 (most recent call first):
  File "/Users/staniewi/miniconda3/envs/mapping-311/lib/python3.11/site-packages/numpy/lib/type_check.py", line 514 in nan_to_num
  File "/Users/staniewi/Documents/Learning/OPERA/2024-02-gamma-delivery/smaller/test_nan.py", line 2 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator (total: 13)
Segmentation fault: 11

@ngoldbaum (Member)

In lldb, execute bt to get a traceback. But seeing that the crash is in isnan is a good clue.

@scottstanie (Author)

Sorry about that! I haven't used lldb before:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x14ea8c000)
  * frame #0: 0x00000001051c7458 _multiarray_umath.cpython-311-darwin.so`FLOAT_isnan + 704
    frame #1: 0x0000000105174b38 _multiarray_umath.cpython-311-darwin.so`generic_wrapped_legacy_loop + 40
    frame #2: 0x000000010517c908 _multiarray_umath.cpython-311-darwin.so`execute_ufunc_loop + 1240
    frame #3: 0x000000010517a55c _multiarray_umath.cpython-311-darwin.so`PyUFunc_GenericFunctionInternal + 2604
    frame #4: 0x0000000105178b00 _multiarray_umath.cpython-311-darwin.so`ufunc_generic_fastcall + 3056
    frame #5: 0x00000001000600e0 python`PyObject_Vectorcall + 76
    frame #6: 0x000000010015febc python`_PyEval_EvalFrameDefault + 47116
    frame #7: 0x0000000100164078 python`_PyEval_Vector + 184
    frame #8: 0x00000001000600e0 python`PyObject_Vectorcall + 76
    frame #9: 0x00000001050a79d4 _multiarray_umath.cpython-311-darwin.so`dispatcher_vectorcall + 564
    frame #10: 0x00000001000600e0 python`PyObject_Vectorcall + 76
    frame #11: 0x000000010015febc python`_PyEval_EvalFrameDefault + 47116
    frame #12: 0x000000010015372c python`PyEval_EvalCode + 220
    frame #13: 0x00000001001b953c python`run_mod + 144
    frame #14: 0x00000001001b8f9c python`_PyRun_SimpleFileObject + 1264
    frame #15: 0x00000001001b8058 python`_PyRun_AnyFileObject + 240
    frame #16: 0x00000001001de8f4 python`Py_RunMain + 3128
    frame #17: 0x00000001001df788 python`pymain_main + 1312
    frame #18: 0x0000000100003628 python`main + 56
    frame #19: 0x00000001a60cbf28 dyld`start + 2236

@ngoldbaum (Member)

No worries, remote debugging is fun. If you can share the problematic file via some other means, that would help too. npy files that don't contain pickled objects are safe to share.
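
A quick sketch of that check, using the filename from the original report: np.load leaves allow_pickle=False by default, so a plain load already verifies there are no pickled objects.

import numpy as np

# allow_pickle defaults to False (since NumPy 1.16.3), so np.load raises
# a ValueError if the .npy file contains pickled Python objects; a clean
# load means the file is plain array data and safe to share.
arr = np.load("nan_check.npy", allow_pickle=False)
print(arr.dtype, arr.shape)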

@scottstanie (Author)

Ah right, I realized it's not hard to just make a repo for it. Here you go: https://github.com/scottstanie/numpy-nan-to-num-debug/blob/main/test_nan.py

@seberg (Member) commented Mar 8, 2024

Can you share the output of arr.__array_interface__ for a crashing array? Although I think uploading it would be much better. Try compressing it; it might make the file very small (it sounds a bit like it might contain mostly one value).

FWIW, the value you mentioned, np.uint64(0x0000c0ff0000c07f).view(np.float64), is 1.048414460685683e-309, a denormal number, which I guess might be related, but it wasn't enough to reproduce the crash for me (although I didn't try with the conda-forge build).

seberg added the component: SIMD label on Mar 8, 2024
@scottstanie (Author)

> Can you share the output of arr.__array_interface__ for a crashing array? Although I think uploading it would be much better. Try compressing it; it might make the file very small (it sounds a bit like it might contain mostly one value).

Ah, you're right that I should have just zipped it:
nan_check_smaller.npy.zip

> FWIW, the value you mentioned, np.uint64(0x0000c0ff0000c07f).view(np.float64), is 1.048414460685683e-309, a denormal number

Just noting that the numbers are supposed to be nan + nanj as complex64.
Also, I didn't reproduce the crash on Linux, only on the Mac.
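
A small sketch decoding the dumped bytes in file order (assuming a little-endian machine) backs this up:

import numpy as np

# The dump's repeating bytes, in file order: 00 00 c0 ff 00 00 c0 7f.
# As little-endian float32 words these are 0xffc00000 (quiet NaN with the
# sign bit set) and 0x7fc00000 (quiet NaN), i.e. one complex64 nan+nanj.
raw = bytes.fromhex("0000c0ff0000c07f")
print(np.frombuffer(raw, dtype=np.float32))    # [nan nan]
print(np.frombuffer(raw, dtype=np.complex64))  # [nan+nanj]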

@seberg (Member) commented Mar 8, 2024

Thanks, I can reproduce this with 1.26.2, but not with 1.26.4. So I suspect you should simply upgrade; this is probably a SIMD issue that has since been fixed, although, unfortunately, I can't say which PR fixed it.

> supposed to be nan + nanj

Ah, I thought they were complex128. It could be that these are slightly odd NaN values (the real and imaginary parts differ, but I didn't check).

@scottstanie (Author)

Thanks for checking! Yes, I was able to solve this for my own purposes by upgrading. I only reported it to help keep the problem from creeping back into future versions.

@seberg (Member) commented Mar 8, 2024

Aha, nan_to_num operates on the real and imaginary parts separately, so it is the same as slicing, and this looks like #25243. Closing.

Thanks for the report, though.
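
A sketch of what "operates on the real and imaginary parts separately" means in practice: for complex input, nan_to_num fixes up x.real and x.imag, each a strided float32 view into the complex64 buffer, so the isnan loop sees non-contiguous data just as it would for a sliced float array (the zero fill value below is nan_to_num's default for NaN):

import numpy as np

# Equivalent per-part operation: out.real and out.imag are strided
# float32 views (stride 8 bytes) into the complex64 buffer, so isnan
# runs over non-contiguous memory, the same access pattern as slicing.
arr = np.full(8, np.nan + 1j * np.nan, dtype=np.complex64)
out = arr.copy()
for part in (out.real, out.imag):
    np.copyto(part, 0.0, where=np.isnan(part))
assert np.array_equal(out, np.nan_to_num(arr))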

seberg closed this as completed on Mar 8, 2024