NumPy has some wrappers for data allocation that call into the tracemalloc C API. For example, here's the wrapper around `malloc`:

https://github.com/numpy/numpy/blob/f6440be7b8eec4a6481832f15f6730d984d78ef0/numpy/_core/src/multiarray/alloc.c#L255-L271

Recently a [Stack Overflow question](https://stackoverflow.com/questions/79851420/multithreading-becomes-much-slower-than-multiprocessing-in-free-threaded-python#79851420) led me to report a [NumPy issue](https://github.com/numpy/numpy/issues/30494) about poor multithreaded scaling. I think the bulk of the scaling bottleneck is due to [the global mutex in the tracemalloc implementation](https://github.com/python/cpython/blob/09044dd42b50e628b197afb2979afcbe49d4b83f/Python/tracemalloc.c#L35-L40), as you can see in the flame graph and profile in the linked NumPy issue.

From the NumPy issue:

> On my M3 Macbook Pro, I get the following stdout running the script:
>
> ```
> Inner loops 10, multithreading time: 6.68 sec, result sum: 717434683.1879175
> Inner loops 10, multiprocessing time: 4.86 sec, result sum: 717434683.1879175
> ```

@Yhg1s told me on Discord that he has a patch that adds a fast path to tracemalloc based on an atomic flag, and that seems to help a lot.

<!-- gh-linked-prs -->
### Linked PRs
* gh-143065
* gh-143066
* gh-143071
<!-- /gh-linked-prs -->
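
For anyone skimming, here is a rough sketch of what an atomic-flag fast path in front of a global mutex can look like. This is not the actual patch and not CPython's code: `tracing_enabled`, `tables_lock`, `set_tracing`, `track_alloc`, and the counters are all hypothetical names, and I'm assuming C11 atomics plus a pthread mutex purely for illustration. The idea is that when tracing is off, callers return after a single atomic load and never touch the lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT CPython's tracemalloc code. */
static atomic_int tracing_enabled;   /* 0 = off, 1 = on */
static pthread_mutex_t tables_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t tracked_bytes;         /* protected by tables_lock */
static unsigned long slow_path_hits; /* protected by tables_lock (demo only) */

/* Turn tracing on or off. The flag is flipped while holding the lock so
 * that it cannot change in the middle of a slow-path update. */
static void set_tracing(int enabled)
{
    pthread_mutex_lock(&tables_lock);
    atomic_store_explicit(&tracing_enabled, enabled, memory_order_release);
    pthread_mutex_unlock(&tables_lock);
}

/* Record an allocation. Returns 1 if it was recorded, 0 otherwise. */
static int track_alloc(size_t size)
{
    /* Fast path: when tracing is off, skip the global lock entirely. */
    if (!atomic_load_explicit(&tracing_enabled, memory_order_relaxed)) {
        return 0;
    }

    /* Slow path: take the lock, then re-check the flag, because tracing
     * may have been stopped between the unlocked check and the lock. */
    pthread_mutex_lock(&tables_lock);
    int recorded = 0;
    if (atomic_load_explicit(&tracing_enabled, memory_order_relaxed)) {
        tracked_bytes += size;
        recorded = 1;
    }
    slow_path_hits++;
    pthread_mutex_unlock(&tables_lock);
    return recorded;
}
```

With something like this, threads that allocate while tracing is disabled only contend on a read of a shared flag rather than serializing on the mutex, which is the common case for NumPy users who never start tracemalloc.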