ENH: add locking around initializing the argparse cache #26430
Conversation
I just got feedback that using the private pyatomic API isn't a good idea and that I should rewrite this to use C11 atomics directly. I'll also need to see whether that causes issues on Windows, since MSVC only added C11 atomics in Visual Studio 2022 and several of the Windows builds use Visual Studio 2019.
If we want to upgrade to 2022, starting the process now would be a good idea. @h-vetinari, are there constraints on doing this?
If you need additional atomics beyond this, then I'd consider figuring out how to use VS 2022. Otherwise, I'd rearchitect this a bit as follows: in each _NpyArgParserCache you keep an atomic_int initialized flag, and you also have a single global lock that guards cache initialization. So in pseudo-code it would look something like:

#ifdef __STDC_NO_ATOMICS__
#define atomic_int volatile int
#else
#include <stdatomic.h>
#endif
static PyThread_type_lock global_cache_lock;
typedef struct {
    atomic_int initialized;
    ...
} _NpyArgParserCache;

// cache initialization
if (!cache->initialized) {
    PyThread_acquire_lock(global_cache_lock, 1);
    if (!cache->initialized) {
        initialize_keywords(cache);
        cache->initialized = 1;
    }
    PyThread_release_lock(global_cache_lock);
}

// module initialization
global_cache_lock = PyThread_allocate_lock();

EDIT: fixed bug in pseudo-code
VS2019 is now EOL, so you'd have every right to move on. I'm guessing MSFT will also start removing it from public CI offerings in the not-too-distant future (if they follow a similar timeline as for VS2019), at which point it becomes a moot question anyway.

What it means is that all people wanting to compile against compiled numpy artefacts (in the past that was mostly [...]) would need to follow suit. And in this case, it would probably also force conda-forge to move on, as numpy has ~1000 compiled dependencies, and at that point it becomes pretty much unavoidable.

I can bring this up in the core call, but my personal opinion is that you should not make suboptimal engineering choices just to support EOL toolchains (obviously a question of degree, but certainly if the win is substantial). As VS2022 is ABI-compatible with VS2019 and a drop-in replacement, I really don't see much of an issue arising from the toolchain version virality (I didn't hear of any relevant case in the transition from VS2017->VS2019 either, though I asked around for feedback).

CC @rgommers
Not necessarily, the only large user of the [...]

There is a path and a mostly working prototype to remove [...]
If we have significant needs here for supporting the free-threaded build, then I guess we have to bite the bullet.
@ngoldbaum, since we are not using atomics now, please keep in mind that we will need build config changes before this PR goes in.
I tried using a single global mutex initialized during module setup, but I found that spawned threads saw uninitialized mutexes. Maybe this issue is specific to [...]
force-pushed from f265014 to 9293a4d
force-pushed from 9293a4d to 9a3b8cc
It turns out the issue I was seeing was caused by initializing the lock in the [...]
FWIW, we discussed this in conda-forge/core, and I invited people to comment here if they care - AFAICT there was no opposition or concern about numpy using VS2022. We might even be able to stay on vs2019, because under the hood we're using vs2022 already, but targeting the vc142 toolset. It will depend on how that shakes out, but in any case, don't worry about conda-forge w.r.t. this.
@rgommers I never tried running CI on this PR without the build changes, and I know you were interested in taking a look. Should I try to see if it's actually necessary? I know we do test via QEMU on ARM with gcc, so at least a few of the emulated tests should fail without the libatomic detection if it is necessary.
That may be interesting to do once indeed, yes. Now that I look at this again: the reason for the [...]
@@ -12,6 +12,16 @@

#include "arrayfunction_override.h"

static PyThread_type_lock global_mutex;

int init_argparse_mutex() {
Should this be int init_argparse_mutex(void)?
/* Null terminated list of keyword argument name strings */
PyObject *kw_strings[_NPY_MAX_KWARGS+1];
} _NpyArgParserCache;

NPY_NO_EXPORT int init_argparse_mutex();
init_argparse_mutex(void);
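For background on why the (void) form is being suggested (a general C note, not something stated in the PR itself): before C23, an empty parameter list declares a function with an unspecified number of parameters, so the compiler cannot diagnose calls that pass arguments, whereas (void) declares a true zero-argument prototype. A minimal illustration with hypothetical names:

/* Pre-C23: declares a function with an *unspecified* parameter list;
   a call like init_without_void(42) compiles without complaint. */
int init_without_void();

/* Declares a function taking *no* arguments; init_with_void(42)
   is a compile-time error. */
int init_with_void(void);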
@@ -214,6 +214,43 @@ else
lapack_dep = declare_dependency(dependencies: [lapack, blas_dep])
endif

# Determine whether it is necessary to link libatomic. This could be the case
# e.g. on 32-bit platforms when atomic operations are used on 64-bit types.
RISC-V with GCC is weird in that it supports 64-bit atomic ops without -latomic, but not 8-bit or 16-bit ops. So in CPython we had to adjust the test to check for both 64-bit and 8-bit atomic operations:
https://github.com/python/cpython/blob/406ffb5293a8c9ca315bf63de1ee36a9b33f9aaf/configure.ac#L7417-L7422
If you are only using atomic_int then it might not matter.
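To make that kind of check concrete: the usual approach is to try to compile and link a tiny program that exercises both a 64-bit and an 8-bit atomic without -latomic, and add -latomic only if the link fails. A rough sketch of such a test program (an illustration, not the exact CPython or NumPy check):

#include <stdatomic.h>
#include <stdint.h>

int main(void)
{
    /* Exercise both widths: some targets (e.g. RISC-V with GCC) handle
       one natively but need libatomic for the other. */
    _Atomic uint64_t value64 = 0;
    _Atomic uint8_t value8 = 0;
    atomic_fetch_add(&value64, 1);
    atomic_fetch_add(&value8, 1);
    return (int)(atomic_load(&value64) + atomic_load(&value8));
}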
@@ -4,6 +4,13 @@
#include <Python.h>
#include "numpy/ndarraytypes.h"

#ifdef __STDC_NO_ATOMICS__
#define atomic_int volatile int
Caveat: this is "good enough" for x86 MSVC and also Windows' "ARM64EC" platform target, but not other ARM64 targets on Windows. If we need to support other Windows ARM64 targets, we'll need a bit more work. Hopefully, MSVC will support <stdatomic.h> soon, but I wouldn't hold my breath for it.
https://learn.microsoft.com/en-us/cpp/build/reference/volatile-volatile-keyword-interpretation
Thanks for clarifying! We don't test or build wheels for Windows ARM64 right now, but I think we might need to support _M_ARM64 (see e.g. #22530). According to the MSVC docs, ARM64EC sets _M_X64.
I see pyatomic_msc.h has:
static inline uint64_t
_Py_atomic_load_uint64(const uint64_t *obj)
{
#if defined(_M_X64) || defined(_M_IX86)
    return *(volatile uint64_t *)obj;
#elif defined(_M_ARM64)
    return (uint64_t)__ldar64((unsigned __int64 volatile *)obj);
#else
# error "no implementation of _Py_atomic_load_uint64"
#endif
}
And storing is implemented using e.g. _InterlockedCompareExchange64 from intrin.h.
That all seems reasonable to include in NumPy to support this.
Even with the experimental atomics support in MSVC 2022, they still set __STDC_NO_ATOMICS__, so we'd need to go out of our way to detect and use it.
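For illustration, the store side of such a shim could be built on the same intrinsics; the sketch below uses a hypothetical npy_atomic_store_uint64 name and is not a verbatim copy of pyatomic_msc.h:

#include <stdint.h>
#include <intrin.h>

/* Hypothetical helper: a 64-bit atomic store on MSVC without <stdatomic.h>. */
static inline void
npy_atomic_store_uint64(uint64_t *obj, uint64_t value)
{
#if defined(_M_X64) || defined(_M_IX86)
    /* Retry a compare-exchange until we replace the value we observed;
       _InterlockedCompareExchange64 is available on both x86 and x64. */
    __int64 old = *(volatile __int64 *)obj;
    for (;;) {
        __int64 prev = _InterlockedCompareExchange64(
            (volatile __int64 *)obj, (__int64)value, old);
        if (prev == old) {
            break;  /* store took effect */
        }
        old = prev;  /* another thread wrote in between; retry */
    }
#elif defined(_M_ARM64)
    /* __stlr64 issues a store-release instruction on ARM64. */
    __stlr64((unsigned __int64 volatile *)obj, (unsigned __int64)value);
#else
# error "no implementation of npy_atomic_store_uint64"
#endif
}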
Following up in #26780
The argparse cache relies on a struct that gets statically defined in each function using npy_parse_arguments. The struct caches the interned string names of keyword arguments. Up until now, the GIL prevented data races while initializing the cache. Nothing prevents the same Python function from being simultaneously called in multiple threads in the free-threaded build, so we need something a little more sophisticated there.

It turns out that the symbols in the up-until-now private pyatomic.h header are exported by Python.h starting in Python 3.13. I've used some of the atomic operations to implement an internal-only single-initialization API, npy_call_once, and then used that API in the argument parser. There are some other caches like this, so I've factored things so that I can re-use npy_call_once elsewhere in the codebase. Because the pyatomic symbols are only exposed starting in Python 3.13, I've decided to only make use of npy_call_once in the free-threaded build; everything is guarded by preprocessor checks for Py_GIL_DISABLED, and this should be a no-op change for the regular build.

I re-implemented this using a single atomic flag and a PyThread_type_lock mutex, per Sam's suggestion.
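As a sketch of what that final shape could look like (reusing the global_mutex, _NpyArgParserCache, and initialize_keywords names from the discussion above, and assuming initialize_keywords reports failure with a negative return value; this illustrates the pattern rather than reproducing the exact code in the PR):

#include <Python.h>   /* PyThread_type_lock, WAIT_LOCK, Py_GIL_DISABLED */

/* Single process-wide lock, allocated once in init_argparse_mutex()
   during module initialization. */
static PyThread_type_lock global_mutex;

static int
ensure_cache_initialized(_NpyArgParserCache *cache)
{
#ifdef Py_GIL_DISABLED
    /* Free-threaded build: double-checked locking.  The first check is a
       cheap fast path; the second check under the lock ensures only one
       thread fills in the cache. */
    if (!cache->initialized) {
        PyThread_acquire_lock(global_mutex, WAIT_LOCK);
        if (!cache->initialized) {
            if (initialize_keywords(cache) < 0) {
                PyThread_release_lock(global_mutex);
                return -1;
            }
            /* Set the flag last, after every other cache field is written. */
            cache->initialized = 1;
        }
        PyThread_release_lock(global_mutex);
    }
#else
    /* Regular (GIL) build: no concurrent callers, so a plain
       check-and-initialize is enough; this path is unchanged. */
    if (!cache->initialized) {
        if (initialize_keywords(cache) < 0) {
            return -1;
        }
        cache->initialized = 1;
    }
#endif
    return 0;
}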