
BUG: SIGABRT / heap corruption when two threads operate on basic-slice views of a 2-D StringDType array #31415

@jeromekelleher

Description

Describe the issue:

Concurrently calling astype('S<N>') on one thread and a fancy-indexing expression on another, against basic-slice views of the same 2-D StringDType array, reliably aborts the process. glibc prints the first line below and Python's faulthandler prints the second:

The futex facility returned an unexpected error code.
Fatal Python error: Aborted

Neither operation modifies the source array (both return new arrays), and neither thread ever writes to the original array. The expected behavior is that this completes successfully (neither thread releases the GIL explicitly). Instead, the heap appears to be corrupted, and the next mutex operation inside glibc's malloc trips its sanity check.

In testing, the crash only appeared when:

  • the source StringDType array is 2-D, and
  • the data mixes empty and non-empty strings (presumably so that some packed entries are stored inline and others are arena-resident).

1-D arrays and arrays whose strings all fit inline did not crash within the same 20-second window.

Reproduce the code example:

# The script below crashes the interpreter within ~20 seconds (5/5 runs
# on my machine). 

import faulthandler
import threading
import time

import numpy as np

faulthandler.enable(all_threads=True)

n_rows = 163_000
n_cols = 4
data = np.array(
    [
        [f"A_{r}_{c}" if (r + c) % 2 else "" for c in range(n_cols)]
        for r in range(n_rows)
    ],
    dtype=np.dtypes.StringDType(),
)


def reader1(stop):
    chunk_size = 1000
    while not stop.is_set():
        for start in range(0, n_rows, chunk_size):
            view = data[start : start + chunk_size]
            view[:, 0].astype("S16")
            view[:, 1:].astype("S16")
            if stop.is_set():
                return


def reader2(stop):
    chunk_size = 1000
    sel = np.arange(0, chunk_size)
    while not stop.is_set():
        for start in range(0, n_rows, chunk_size):
            view = data[start : start + chunk_size]
            _ = view[sel]
            if stop.is_set():
                return


stop = threading.Event()
t1 = threading.Thread(target=reader1, args=(stop,))
t2 = threading.Thread(target=reader2, args=(stop,))
t1.start()
t2.start()
time.sleep(20)
stop.set()
t1.join()
t2.join()
print("survived")

Error message:

The futex facility returned an unexpected error code.
Fatal Python error: Aborted

Thread 0x00007846c5dc96c0 (most recent call first):
  File "<string>", line 24 in reader2
  File ".../threading.py", line 982 in run
  ...

Current thread 0x... (most recent call first):
  File "<string>", line 20 in reader1
  File ".../threading.py", line 982 in run
  ...

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg
Aborted (core dumped)


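The frame at `line 20` is inside `reader1` (specifically
`view[:, 0].astype("S16")`); the frame at `line 24` is inside `reader2`
(`_ = view[sel]`). The script was run from a string (hence `File "<string>"`), so these line numbers do not match the listing above. Different runs can show either thread as `Current`.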

Python and NumPy Versions:

NumPy: 2.4.4
Python: 3.11.15 (CPython, Clang 21.1.4 build)
Platform: Linux 6.8.0-111-generic, x86_64
glibc: (Ubuntu's; pthread futex abort path)
NumPy BLAS: scipy-openblas 0.3.31 (probably irrelevant here)
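
The version fields above can be gathered with standard-library calls plus `np.__version__`. The sketch below shows one way to do it; `collect_versions` is a hypothetical name. `platform.libc_ver()` may return empty strings when it cannot detect the C library, and `np.show_runtime()` prints further runtime details such as the SIMD extensions in use.

```python
import platform

import numpy as np


def collect_versions() -> dict:
    """Hypothetical helper gathering the fields reported above."""
    libc_name, libc_version = platform.libc_ver()  # ('', '') if undetectable
    return {
        "numpy": np.__version__,
        "python": (
            f"{platform.python_version()} "
            f"({platform.python_implementation()}, {platform.python_compiler()})"
        ),
        "platform": f"{platform.system()} {platform.release()}, {platform.machine()}",
        "libc": f"{libc_name} {libc_version}".strip(),
    }


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```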

Runtime Environment:

single-process, multi-threaded; no multiprocessing, no numpy.threading_local
GIL-enabled CPython; neither thread releases the GIL explicitly (no with nogil)

How does this issue affect you or how did you find it:

This bug led to a reliable crash in my application. I have worked around it by forcing each chunk to own its allocator before any thread touches it:

view = data[start : start + chunk_size].copy()

After the .copy(), each chunk's StringDType array reports flags.owndata == True and base is None, i.e. it has a brand-new allocator instance. With this change the script runs indefinitely without aborting.
