ENH: Support free-threaded python build (tracking issue) #26157
Comments
Maybe another interesting resource is the past work done by Sam Gross to patch numpy to make it work well enough with the original Things might have drifted a bit since that time, but those patches are enough to get the scikit-learn test suite pass with |
It looks like numpy's main branch passes the full numpy test suite with the gil disabled. I've been testing using the There are still a ton of threading bugs I'm sure because of limited use of threads in the tests - we need better test coverage for threaded workflows. I'm already aware of a number of internal caches we'll likely need to disable or add locking for. But, it's a good start! |
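One way to get more threaded coverage is a small helper that runs the same callable from many threads at once. The sketch below is only an illustration: `run_threaded`, `increment`, and the counter are hypothetical names, not part of NumPy's test suite, and the barrier is there only to make the calls overlap as much as possible.

```python
import threading


def run_threaded(func, n_threads=8, n_iter=100):
    """Call func(thread_index) n_iter times in each of n_threads threads.

    Hypothetical helper for illustration; returns the exceptions that were raised.
    """
    barrier = threading.Barrier(n_threads)
    errors = []
    errors_lock = threading.Lock()

    def worker(index):
        # Wait until every thread is ready so the calls really overlap.
        barrier.wait()
        try:
            for _ in range(n_iter):
                func(index)
        except Exception as exc:
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


# Usage: a lock-protected counter should see every single increment.
counter = {"value": 0}
counter_lock = threading.Lock()


def increment(_index):
    with counter_lock:
        counter["value"] += 1


errors = run_threaded(increment, n_threads=8, n_iter=100)
```

A test built this way would then assert that `errors` is empty and that the shared state ends up in the expected condition.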
Just curious, are contributors planning to work on #24755 or is there any overlap where work for nogil would also help with allowing subinterpreters? |
While this does have some overlap with work on subinterpreters since the C thread safety issues we're working on apply to subinterpreters as well, the bulk of the work required to support subinterpreters - converting numpy's extension types from single-phase to multi-phase initialization - is not required and we are not currently planning to do that. |
There are now To install the wheel on a free-threaded build of python you'll need the prerelease of pip 24.1, installable with e.g. |
I think we should do that, at least for now. It may not be tenable in the very long term, but for the coming 1-2 years at least, everyone who is building for free-threaded CPython should know how to install a BLAS library. There's no dedicated method for "is this a free-threaded build" yet in Meson. Something like this should do it though I think (untested):
diff --git a/numpy/meson.build b/numpy/meson.build
index 7e9ec5244c..ee9450ddc5 100644
--- a/numpy/meson.build
+++ b/numpy/meson.build
@@ -56,6 +56,11 @@ endif
 blas_name = get_option('blas')
 lapack_name = get_option('lapack')
 allow_noblas = get_option('allow-noblas')
+if py.get_variable('Py_GIL_DISABLED', fallback: '0') == '1'
+  # We don't allow using `lapack_lite` in the free-threaded build, it's unsafe (see gh-26157)
+  allow_noblas = false
+endif
+
 # This is currently injected directly into CFLAGS/CXXFLAGS for wheel builds
 # (see cibuildwheel settings in pyproject.toml), but used by CI jobs already
 blas_symbol_suffix = get_option('blas-symbol-suffix')
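For comparison, the same `Py_GIL_DISABLED` build variable can be queried from Python itself. This is a minimal sketch: the helper names `is_free_threaded_build` and `gil_is_enabled` are made up for illustration, and `sys._is_gil_enabled()` is a private function that only exists on Python 3.13 and later, hence the `getattr` fallback.

```python
import sys
import sysconfig


def is_free_threaded_build():
    """True if this interpreter was compiled with --disable-gil."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; otherwise it is 0 or missing (None).
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_is_enabled():
    """True if the GIL is active in the running process."""
    # Interpreters without sys._is_gil_enabled() (before 3.13) always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


free_threaded = is_free_threaded_build()
gil_enabled = gil_is_enabled()
print(f"free-threaded build: {free_threaded}, GIL enabled: {gil_enabled}")
```

The two answers can differ: a free-threaded interpreter may still run with the GIL enabled, for example after importing an extension module that does not declare itself safe to run without it.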
As for global state, it'd be good to check
@rkern do you happen to know off the top of your head what the state is? |
Not sure. GILlessness didn't exist at the time of the design, so it surely didn't enter our mental model when thinking about it. We did want to support |
And I haven't gotten my head around free-threaded Python yet to have a mental model to reason about it at this time. |
I'll try to take a look with an eye toward thread safety. I'm not very familiar with the RNG internals so it'll be a good excuse to poke around. I might ping the mailing list or open an issue if I have questions to give @rkern a chance to help clarify things :) |
I looked at the All existing usages of the lowlevel bitgen state are guarded by acquiring a lock. We can probably make all of this faster using The |
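The locking pattern described here can be sketched in pure Python. Everything below is a toy and an assumption for illustration: `LockedBitGen` is a hypothetical 64-bit linear congruential generator, not NumPy's bit generator implementation. It only shows that when every read-modify-write of the state happens under one lock, concurrent callers never skip or repeat a state.

```python
import threading


class LockedBitGen:
    """Toy bit generator whose state is only touched while holding a lock.

    Purely illustrative; this is not how NumPy's bit generators are implemented.
    """

    _MULT = 6364136223846793005
    _INC = 1442695040888963407
    _MASK = (1 << 64) - 1

    def __init__(self, seed):
        self._state = seed & self._MASK
        self.lock = threading.Lock()

    def next_uint64(self):
        # Advance and read the state in one critical section.
        with self.lock:
            self._state = (self._state * self._MULT + self._INC) & self._MASK
            return self._state


def draw(bitgen, n, out):
    # Each thread appends to its own list; only the generator is shared.
    for _ in range(n):
        out.append(bitgen.next_uint64())


shared = LockedBitGen(seed=12345)
per_thread = [[] for _ in range(4)]
threads = [threading.Thread(target=draw, args=(shared, 1000, out)) for out in per_thread]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drawing the same number of values sequentially gives the same multiset.
reference = LockedBitGen(seed=12345)
expected = [reference.next_uint64() for _ in range(4000)]
combined = [value for out in per_thread for value in out]
```

The order in which threads receive values is not deterministic, so reproducible per-thread streams would still need one generator per thread.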
Are there plans to release free-threaded builds of numpy 1.x? Or will this effort be strictly 2.x?
No, there is no plan to do that. The first version of NumPy to support Python 3.13 will be either a 2.0.x version or NumPy 2.1, and the first version to support the free-threaded build will be NumPy 2.1.
Creating a free-threaded CI job over at nipy/nibabel#1339. Note that the GIL is being enabled by default:
Disabling it with |
Thanks! That's a known issue. Cython doesn't have a way to add the |
@ngoldbaum it looks like that Cython PR will land in 1-2 weeks. Using that is only a single flag when we're invoking |
Good point, see #26913 |
Since it's the tracking issue, mentioning it here as well: object arrays are presumably not fully thread safe, since there is no locking when they are mutated. So one thread can decref an element before another thread increfs the same element (or am I missing some magic?).
The Cython PR just got merged so I'll send in the necessary Cython build changes as well whenever the Cython wheel gets regenerated. |
In case it helps: you cannot add the flag unconditionally and whether the interpreter is a free-threaded one cannot be (easily) detected in |
The flag is okay to be added even under a normal (non-free-threaded) build, where it'll basically do nothing. So, yes, the way to go is add the flag unconditionally for the Cython version that supports it. |
I chatted with @colesbury about locking for object arrays today. He thinks we will need some sort of locking or synchronization to fix the threaded object array access issues @seberg is worried about. Rather than adding a new lock, he suggested using the critical section API on the array object before and after spots we release or acquire the GIL for dtypes that don't need the GIL. There are quite a few places we use the macros that release and acquire the GIL, but I also think it should be relatively straightforward to grep for those macros and find spots where we have a special check for object arrays so that we don't release the GIL in that case. In those spots we'd need to use the critical section API. We may also be able to expand usage of However all that said, I think given the release timing we might just need to document this limitation for the 2.1 release as a known issue. As far as I know this hasn't caused any crashes with pandas or scikit-learn, and they heavily use object string arrays. |
You would have to walk the base anyway, which isn't super nice, of course you could add an owner field to the additional information that is passed in via the struct, though, if that ends up as the best solution. There are still some issues and I don't know if they even can be solved fully currently:
Unless there is a fallback like |
3.13 rc1 is released! This means the ABI is stable and wheels can be uploaded to PyPI. Tracking for cibuildwheel: pypa/cibuildwheel#1949 |
We're planning to upload Hopefully we'll be able to get Windows wheels up too but that's waiting on resolving some tricky merge conflicts in our meson patches, which will also hopefully help us to upstream the meson patches themselves. |
cibuildwheel now uses the latest Python Free-threaded wheels are still behind a flag (CIBW_FREE_THREADED_SUPPORT), but can also be built and uploaded using the ABI-stable Python 3.13.0rc1.
I may have missed discussion of this elsewhere, but I think there are also thread safety issues with in-place updates to ndarray Setting those attributes is documented as a bad idea, but with the GIL I'm not sure it could cause crashes (I might be wrong on that). As an example, on my machine with recent NumPy nightly and Python 3.13.0rc1 the below script segfaults without the GIL and loops until killed with it.
Example Script
import threading
import sysconfig
import sys
import numpy as np

if sysconfig.get_config_var("Py_GIL_DISABLED"):
    print(f"GIL enabled? {sys._is_gil_enabled()}")

arr = np.ones(1024, np.uint64)

def switch_dtypes(arr):
    # Repeatedly reassign the dtype of the shared array in place.
    dt1 = np.uint64
    dt2 = np.uint8
    while True:
        arr.dtype = dt1
        arr.dtype = dt2

def access_item(arr):
    # Index the same array while the other thread mutates its metadata.
    while True:
        arr[-1]

t1 = threading.Thread(target=switch_dtypes, kwargs={"arr": arr})
t2 = threading.Thread(target=access_item, kwargs={"arr": arr})
t1.start()
t2.start()
Yup, right now the suggestion is: don't do that. NumPy isn't thread safe in Python programs and has never been thread safe. Free-threading makes this more acute but it's a pre-existing problem. My plan for NumPy 2.1 is to document that mutating shared state in multithreaded python code should be done with extreme care and may cause crashes if there are races. This is also true with the GIL. It's certainly easier to cause a crash in the free-threaded build but the same issues about shared mutable state happen if any threads release the GIL and touch shared state. There are also probably some safety issues around object arrays that are only present in the GIL-disabled build. We're going to need to think carefully about adding improved locking around ndarray itself. It'll be a big project. |
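One way to follow the "don't do that" advice is to leave the shared array's metadata alone and have each thread reinterpret the data through its own view. The sketch below is a hedged illustration, not official guidance: `read_last` and the `results` dict are hypothetical names, and the loop is bounded so the script terminates. It only avoids races on dtype and shape; concurrent writes to the underlying data would still need their own synchronization.

```python
import threading

import numpy as np

arr = np.ones(1024, np.uint64)
results = {}


def read_last(name, dtype, n_iter=1000):
    # Each thread creates its own view object; the shared array's dtype and
    # shape are never reassigned, so there is no metadata to race on.
    view = arr.view(dtype)
    last = None
    for _ in range(n_iter):
        last = view[-1]
    results[name] = (view.shape, int(last))


threads = [
    threading.Thread(target=read_last, args=("as_uint64", np.uint64)),
    threading.Thread(target=read_last, args=("as_uint8", np.uint8)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After both threads finish, `arr` still has its original dtype and shape, and each thread saw a consistent interpretation of the same buffer.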
That makes sense, thanks! I figured that was probably already on the radar somewhere. The thing that prompted this was reviewing some interactions I have with the NumPy C API where currently I make sure to load array shapes, strides, etc. before releasing the GIL. I mostly just wanted to be sure there isn't currently anything more to be done to protect those. I'll definitely check any updated guidance in the future. Thanks for the work so far, the experimental build works nicely in the little bit of testing that I've done. |
If anyone was waiting for Windows wheels: there are nightlies at https://anaconda.org/scientific-python-nightly-wheels/numpy/files now. The next release (2.1.3) will include them on PyPI. |
NumPy 2.1 included initial support for free-threading, so I'm closing this issue (a bit belatedly). The main followup is #27199, which I'm hoping to have time to work on soon. |
@lysnikolaou, @rgommers, and I are working on a new project in collaboration with the Python Runtime team at Meta to add support for Python 3.13 nogil builds in the scientific Python software stack.
Our initial minimal goal is a build of NumPy that runs and passes all the tests in a nogil build of Python.
Resources:
- PEP 703
- Fork of numpy 1.24 with early nogil support

- Add changelog entry and docs describing the current state of using NumPy in a multithreaded environment.
- Convert PyDict_GetItem and PyList_GetItem to use variants returning strong references (MNT: Add linter for thread-unsafe C API uses #26159)
- TSK: Make NumPy buildable on the nogil branch #26161
- Set up CPython 3.13 CI --disable-gil build (TST: add basic free-threaded CI testing #26463)
- Make the test suite pass with the gil disabled
- Mark NumPy C extensions with Py_MOD_GIL_NOT_USED.
- Allow building f2py extensions that can be imported without the GIL.
- Use PyMutex instead of PyThread_type_lock
- Nightly wheels for cp313t ABI

Global state
- _multiarray_umath reorganized into structs: MNT: Reorganize non-constant global statics into structs #26607
- wrapping_array_method.c uses a freelist that is certainly not thread safe. It has no internal consumers besides _scaled_float_dtype to test the implementation so ignoring it for now.
- NPY_NUMUSERTYPES, which also unfortunately is part of the public C API. This one may just need to remain unfixed, given it's in a legacy API.
- np.set_string_function and PyArray_SetStringFunction are implemented using static global state. np.set_string_function implementation should be removed #26576, MNT: Remove set_string_function #26611
- NUMPY_WARN_IF_NO_MEMORY_POLICY is implemented using a global static. The only public interface is via an environment variable read at module initialization but it's tested using a thread-unsafe interface.