BUG: ASAN detects heap-buffer-overflow from numpy.strings.find #28791

Closed
devdanzin opened this issue Apr 21, 2025 · 2 comments · Fixed by #28804
Labels
00 - Bug, component: numpy.strings (String dtypes and functions)

Comments

@devdanzin

Describe the issue:

ASAN reports a heap-buffer-overflow when calling numpy.strings.find on specific strings.

Reproduce the code example:

import numpy.strings

numpy.strings.find("A" * (2 ** 17), r"[\w]+\Z",)

Error message:

=================================================================
==1211586==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000c610 at pc 0x7ffff76594b5 bp 0x7fffffffbec0 sp 0x7fffffffb668
READ of size 4 at 0x60300000c610 thread T0
    #0 0x7ffff76594b4 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861
    #1 0x7ffff7659bc6 in __interceptor_memcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:892
    #2 0x7ffff7659bc6 in __interceptor_memcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:887
    #3 0x7fffb38ff321 in void preprocess<unsigned int>(CheckedIndexer<unsigned int>, long, prework<unsigned int>*) [clone .constprop.0] (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2af321)
    #4 0x7fffb3909043 in long string_find<(ENCODING)1>(Buffer<(ENCODING)1>, Buffer<(ENCODING)1>, long, long) (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2b9043)
    #5 0x7fffb38fd8c3 in int string_findlike_loop<(ENCODING)1>(PyArrayMethod_Context_tag*, char* const*, long const*, long const*, NpyAuxData_tag*) (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2ad8c3)
    #6 0x7fffb38c4905 in ufunc_generic_fastcall (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x274905)
    #7 0x555555977b31 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #8 0x555555977c8c in PyObject_Vectorcall Objects/call.c:327
    #9 0x555555d1e173 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #10 0x555555d514ff in _PyEval_EvalFrame Include/internal/pycore_ceval.h:119
    #11 0x555555d514ff in _PyEval_Vector Python/ceval.c:1816
    #12 0x555555d51726 in PyEval_EvalCode Python/ceval.c:604
    #13 0x555555e908a8 in run_eval_code_obj Python/pythonrun.c:1381
    #14 0x555555e93eda in run_mod Python/pythonrun.c:1466
    #15 0x555555e9421c in pyrun_file Python/pythonrun.c:1295
    #16 0x555555e99bb2 in _PyRun_SimpleFileObject Python/pythonrun.c:517
    #17 0x555555e9a176 in _PyRun_AnyFileObject Python/pythonrun.c:77
    #18 0x555555efde67 in pymain_run_file_obj Modules/main.c:410
    #19 0x555555efe648 in pymain_run_file Modules/main.c:429
    #20 0x555555f022f1 in pymain_run_python Modules/main.c:696
    #21 0x555555f024c5 in Py_RunMain Modules/main.c:775
    #22 0x555555f026ac in pymain_main Modules/main.c:805
    #23 0x555555f02a24 in Py_BytesMain Modules/main.c:829
    #24 0x5555557d6b05 in main Programs/python.c:15
    #25 0x7ffff72ded8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #26 0x7ffff72dee3f in __libc_start_main_impl ../csu/libc-start.c:392
    #27 0x5555557d6a34 in _start (/home/danzin/projects/3.13_upstream_cpython/python+0x282a34)

0x60300000c610 is located 20 bytes to the right of 28-byte region [0x60300000c5e0,0x60300000c5fc)
allocated by thread T0 here:
    #0 0x7ffff7679a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7fffb3778fad in PyDataMem_UserNEW_ZEROED (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x128fad)

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)
Shadow bytes around the buggy address:
  0x0c067fff9870: 00 00 fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff9880: 00 00 00 04 fa fa 00 00 00 00 fa fa 00 00 00 00
  0x0c067fff9890: fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa 00 00
  0x0c067fff98a0: 00 04 fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff98b0: 00 00 00 00 fa fa 00 00 00 00 fa fa 00 00 00 04
=>0x0c067fff98c0: fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff98d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff98e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff98f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1211586==ABORTING
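
The addresses in the report can be checked with a few lines of arithmetic. This is a reading aid rather than part of the original report, and it assumes that the 28-byte region is the needle stored as 4-byte UCS4 code points, which is consistent with the `unsigned int` template argument in frame #3 but is not stated outright by ASAN.

```python
# Values copied from the AddressSanitizer report above.
fault_addr = 0x60300000C610
region_start = 0x60300000C5E0
region_end = 0x60300000C5FC  # exclusive upper bound

needle = r"[\w]+\Z"
UCS4_BYTES = 4  # assumption: the needle is stored as 4-byte UCS4 code points

region_size = region_end - region_start
offset_from_start = fault_addr - region_start
bytes_past_end = fault_addr - region_end

print(region_size)               # 28
print(len(needle) * UCS4_BYTES)  # 28, the same size as the reported region
print(offset_from_start)         # 48
print(bytes_past_end)            # 20
```

The faulting read therefore starts 48 bytes from the beginning of a buffer that is only 28 bytes long.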

Python and NumPy Versions:

2.2.5
3.13.3+ experimental free-threading build (heads/3.13:d8b90117024, Apr 21 2025, 15:20:00) [GCC 11.4.0]

Runtime Environment:

[{'numpy_version': '2.2.5',
  'python': '3.13.3+ experimental free-threading build '
            '(heads/3.13:d8b90117024, Apr 21 2025, 15:20:00) [GCC 11.4.0]',
  'uname': uname_result(system='Linux', node='LAPTOP-CS6PE5KB', release='5.15.167.4-microsoft-standard-WSL2', version='#1 SMP Tue Nov 5 00:21:55 UTC 2024', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libscipy_openblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.28'}]

Context for the issue:

I have been fuzzing NumPy using fusil by @vstinner. I realize these issues are unlikely to be triggered in normal usage and might therefore be of low priority.

@ngoldbaum
Member

I can reproduce this. The overflow is happening here:

    assert(p->period + p->cut <= len_needle);
    // Compare parts of the needle to check for periodicity.
    int cmp;
    if (std::is_same<char_type, npy_ucs4>::value) {
        cmp = memcmp(needle.buffer,
                     needle.buffer + (p->period * sizeof(npy_ucs4)),
                     (size_t) p->cut);
    }

The assert probably needs to be converted to a runtime error. Also it looks like the assert isn't accounting for the fact that UCS4 is four bytes per character.

@ngoldbaum added the "component: numpy.strings" (String dtypes and functions) label on Apr 23, 2025
@ngoldbaum
Member

> Also it looks like the assert isn't accounting for the fact that UCS4 is four bytes per character.

Oh no, it's the other way around. Multiplying by sizeof(npy_ucs4) is nonsense.
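
To illustrate why the extra multiplication overshoots, here is a small Python model of the pointer arithmetic. It assumes `needle.buffer` is a pointer to 4-byte elements (suggested by `CheckedIndexer<unsigned int>` in the stack trace), so adding `n` to it already advances `n * 4` bytes. The helper `compared_offset_bytes` and the value `period = 3` are hypothetical: the period is not stated in this thread and was chosen because it reproduces the offset in the ASAN report.

```python
UCS4_BYTES = 4  # sizeof(npy_ucs4)


def compared_offset_bytes(period, scale_twice):
    """Byte offset of memcmp's second argument relative to needle.buffer.

    Models C++ pointer arithmetic on a pointer to 4-byte elements:
    adding n to the pointer advances it by n * 4 bytes.
    """
    elements = period * UCS4_BYTES if scale_twice else period
    return elements * UCS4_BYTES


needle_bytes = len(r"[\w]+\Z") * UCS4_BYTES  # 7 code points, 28 bytes
period = 3  # assumed for illustration, not taken from the thread

buggy = compared_offset_bytes(period, scale_twice=True)
intended = compared_offset_bytes(period, scale_twice=False)

print(buggy, buggy >= needle_bytes)        # 48 True: starts past the buffer
print(intended, intended < needle_bytes)   # 12 True: starts inside the buffer
```

Under these assumptions the double-scaled offset is 48 bytes, which matches the faulting address in the report (20 bytes past the end of a 28-byte region). Whether the length argument `p->cut` also needs adjusting is not settled in this thread.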
