ENH: Support free-threaded python build (tracking issue) #26157

Closed
16 tasks done
ngoldbaum opened this issue Mar 28, 2024 · 32 comments
Labels
39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703) Tracking / planning

Comments

@ngoldbaum
Member

ngoldbaum commented Mar 28, 2024

Myself, @lysnikolaou, and @rgommers are working on a new project in collaboration with the Python Runtime team at Meta to add support for Python 3.13 nogil builds in the scientific python software stack.

Our initial minimal goal is a build of NumPy that runs and passes all the tests in a nogil build of Python.

Resources:

@ogrisel
Contributor

ogrisel commented Apr 9, 2024

Maybe another interesting resource is the past work done by Sam Gross to patch numpy to make it work well enough with the original nogil fork of CPython 3.9: v1.24.0...colesbury:numpy:v1.24.0-nogil

Things might have drifted a bit since that time, but those patches are enough to get the scikit-learn test suite to pass with nogil-CPython 3.9: see e.g. this run of the Linux_nogil pylatest_pip_nogil job that is scheduled to run on a nightly basis on our Azure Pipelines CI.

@rgommers rgommers added the 39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703) label Apr 10, 2024
@ngoldbaum
Member Author

ngoldbaum commented Apr 16, 2024

It looks like numpy's main branch passes the full numpy test suite with the GIL disabled. I've been testing using the nogil-integration branch in @colesbury's fork of CPython, but I tried just now using the CPython main branch and the test suite passes with the GIL disabled there as well.

There are still a ton of threading bugs I'm sure because of limited use of threads in the tests - we need better test coverage for threaded workflows. I'm already aware of a number of internal caches we'll likely need to disable or add locking for.

But, it's a good start!

@a-reich

a-reich commented Apr 26, 2024

Just curious, are contributors planning to work on #24755 or is there any overlap where work for nogil would also help with allowing subinterpreters?

@ngoldbaum
Member Author

ngoldbaum commented Apr 26, 2024

While this does have some overlap with work on subinterpreters since the C thread safety issues we're working on apply to subinterpreters as well, the bulk of the work required to support subinterpreters - converting numpy's extension types from single-phase to multi-phase initialization - is not required and we are not currently planning to do that.

@ngoldbaum ngoldbaum changed the title from "ENH: Support nogil python build (tracking issue)" to "ENH: Support free-threaded python build (tracking issue)" May 21, 2024
@ngoldbaum
Member Author

ngoldbaum commented May 28, 2024

There are now cp313t nightly wheels up for manylinux and musllinux at our normal spot for nightlies. Windows wheels will need to wait on resolving some Windows-specific build issues, and Mac wheels will need to wait for 3.13.0b2, which will have Mac binaries cibuildwheel can use.

To install the wheel on a free-threaded build of python you'll need the prerelease of pip 24.1, installable with e.g. pip install -U --pre pip. The current stable release doesn't know about the new ABI tag and will install the wheel for the cp313 ABI which will then fail to import at runtime because of the ABI mismatch.
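
As a rough illustration of the ABI mismatch described above, the sketch below derives the ABI tag a matching wheel would be expected to carry for the running interpreter. The helper names are hypothetical, and the tag construction is a simplification that assumes a plain CPython build and ignores other ABI flags such as debug builds.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """Return True if this interpreter was compiled with the GIL disabled."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def expected_abi_tag() -> str:
    """Build the ABI tag a matching wheel should carry, e.g. cp313 or cp313t.

    Simplified on purpose (hypothetical helper): only the free-threading
    suffix is considered.
    """
    major, minor = sys.version_info[:2]
    suffix = "t" if is_free_threaded_build() else ""
    return f"cp{major}{minor}{suffix}"


if __name__ == "__main__":
    print(f"free-threaded build: {is_free_threaded_build()}")
    print(f"expected wheel ABI tag: {expected_abi_tag()}")
```

On a free-threaded 3.13 interpreter this should report a tag ending in `t`; a wheel built for the tag without the suffix is the one that fails to import.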

@rgommers
Member

rgommers commented Jun 5, 2024

lapack_lite uses static globals all over the place and is not thread safe. Maybe we should require a real BLAS on the free-threaded build?

I think we should do that, at least for now. It may not be tenable in the very long term, but for the coming 1-2 years at least, everyone who is building for free-threaded CPython should know how to install a BLAS library.

There's no dedicated method for "is this a free-threaded build" yet in Meson. Something like this should do it though I think (untested):

diff --git a/numpy/meson.build b/numpy/meson.build
index 7e9ec5244c..ee9450ddc5 100644
--- a/numpy/meson.build
+++ b/numpy/meson.build
@@ -56,6 +56,11 @@ endif
 blas_name = get_option('blas')
 lapack_name = get_option('lapack')
 allow_noblas = get_option('allow-noblas')
+if py.get_variable('Py_GIL_DISABLED', fallback: '0') == '1'
+  # We don't allow using `lapack_lite` in the free-threaded build, it's unsafe (see gh-26157)
+  allow_noblas = false
+endif
+
 # This is currently injected directly into CFLAGS/CXXFLAGS for wheel builds
 # (see cibuildwheel settings in pyproject.toml), but used by CI jobs already
 blas_symbol_suffix = get_option('blas-symbol-suffix')

@rgommers
Member

rgommers commented Jun 5, 2024

As for global state, it'd be good to check numpy.random. There are nice parallel features for users that know what they are doing (see docs), but the worry is more about single-threaded-like usage:

  • I'm not sure that it's safe to create a Generator instance, create threads, then call methods from multiple threads when there is no GIL. There is pretty careful lock usage in random/_generator.pyx and calls that advance the bitgenerator state look atomic, but how sure are we that that's actually the case?
  • The legacy API (np.random.* functions and RandomState) is implemented in the same fashion and so uses locks as well, but probably there are some separate concerns (e.g., calling seed() from one thread while generating numbers in another).

@rkern do you happen to know off the top of your head what the state is?

@rkern
Member

rkern commented Jun 5, 2024

Not sure. GILlessness didn't exist at the time of the design, so it surely didn't enter our mental model when thinking about it. We did want to support with nogil: blocks that just call the C functions that draw data from and advance the BitGenerator state, so that is where the careful lock usage comes from. I don't know if that's going to be sufficient for other removals of the GIL where we were expecting there to be a GIL in place in the past.

@rkern
Member

rkern commented Jun 5, 2024

And I haven't gotten my head around free-threaded Python yet to have a mental model to reason about it at this time.

@ngoldbaum
Member Author

ngoldbaum commented Jun 5, 2024

I'll try to take a look with an eye toward thread safety. I'm not very familiar with the RNG internals so it'll be a good excuse to poke around. I might ping the mailing list or open an issue if I have questions to give @rkern a chance to help clarify things :)

@ngoldbaum
Member Author

I looked at the np.random cython code today and I think we're fine without any new major updates.

All existing usages of the low-level bitgen state are guarded by acquiring a lock. We can probably make all of this faster by using PyThread_type_lock directly or, when it's available, PyMutex.

The SeedSequence state is generated inside __init__, so no need to lock there because __init__ will only ever be called on a single thread for a single instance. Any data races are also present with the GIL and require doing silly things like spawning child seed sequences from the same parent sequence in multiple threads.
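
The lock-guarded pattern mentioned above can be sketched in pure Python. This is a toy model only: `LockedToyGenerator` is a hypothetical class built on a simple linear congruential step, not NumPy's BitGenerator code, and it uses `threading.Lock` rather than the C-level locks named in the comment. The point it illustrates is that the read-modify-write of the state happens entirely while the lock is held.

```python
import threading


class LockedToyGenerator:
    """Toy stand-in for a bit generator (hypothetical, not NumPy's code)."""

    MODULUS = 2**64
    MULTIPLIER = 6364136223846793005
    INCREMENT = 1442695040888963407

    def __init__(self, seed: int) -> None:
        self._state = seed % self.MODULUS
        self._lock = threading.Lock()

    def next_uint64(self) -> int:
        # The whole read-modify-write happens under the lock, so two
        # threads can never advance from the same old state.
        with self._lock:
            self._state = (
                self._state * self.MULTIPLIER + self.INCREMENT
            ) % self.MODULUS
            return self._state


def draw_many(gen: LockedToyGenerator, n: int, out: list) -> None:
    """Append n draws from the shared generator to a per-thread list."""
    for _ in range(n):
        out.append(gen.next_uint64())


def run_demo(n_threads: int = 4, n_draws: int = 1000) -> list:
    """Draw from one shared generator in several threads; return all values."""
    gen = LockedToyGenerator(seed=12345)
    chunks = [[] for _ in range(n_threads)]
    threads = [
        threading.Thread(target=draw_many, args=(gen, n_draws, chunk))
        for chunk in chunks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [value for chunk in chunks for value in chunk]


if __name__ == "__main__":
    values = run_demo()
    print(f"{len(values)} draws, {len(set(values))} distinct")
```

Which thread receives which value is not deterministic, but taken together the threads consume exactly the values a single-threaded run would produce, with no state visited twice.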

@aaronzo

aaronzo commented Jun 28, 2024

Are there plans to release free-threaded builds of numpy 1.x? Or will this effort be strictly 2.x?

@ngoldbaum
Member Author

ngoldbaum commented Jun 28, 2024

No, there is no plan to do that. The first version of NumPy to support Python 3.13 will either be a 2.0.x version or NumPy 2.1, and the first version to support the free-threaded build will be NumPy 2.1.

@effigies
Contributor

Creating a free-threaded CI job over at nipy/nibabel#1339.

Note that the GIL is being enabled by default:

<frozen importlib._bootstrap>:488: RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module 'numpy._core._multiarray_umath', which has not declared that it can run safely without the GIL. To override this behavior and keep the GIL disabled (at your own risk), run with PYTHON_GIL=0 or -Xgil=0.

Disabling it with PYTHON_GIL=0 passes our test suite, but figured I should note it here in case this isn't a known issue. We're installing https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/numpy/2.1.0.dev0/numpy-2.1.0.dev0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, uploaded 2024-07-10.

@ngoldbaum
Member Author

Thanks! That's a known issue. Cython doesn't have a way to add the Py_mod_gil slot to the extension. Hopefully cython nightly grows that feature but if worst comes to worst we might end up using a patched cython to build the numpy wheels we upload to pypi. For now you'll need to set the environment variable or pass -Xgil=0.
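
To confirm whether the GIL was re-enabled by an import, a check along these lines may help. It is a minimal sketch: `gil_status` is a hypothetical helper, and `sys._is_gil_enabled()` is a private CPython function available from 3.13, so the code falls back gracefully when it is missing. Run it after importing the extension modules of interest, since the warning above is triggered at import time.

```python
import sys
import sysconfig


def gil_status() -> str:
    """Describe the GIL state of the running interpreter (hypothetical helper)."""
    if not sysconfig.get_config_var("Py_GIL_DISABLED"):
        return "standard build: GIL always enabled"
    # Private API, present on CPython 3.13+; guard against its absence.
    is_enabled = getattr(sys, "_is_gil_enabled", None)
    if is_enabled is None:
        return "free-threaded build: GIL state unknown"
    if is_enabled():
        return "free-threaded build: GIL enabled at runtime"
    return "free-threaded build: GIL disabled at runtime"


if __name__ == "__main__":
    # Import the extension modules you care about before this call.
    print(gil_status())
```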

@rgommers
Member

@ngoldbaum it looks like that Cython PR will land in 1-2 weeks. Using that is only a single flag when we're invoking cython though, while converting the other extension modules outside of numpy/random/ (which don't use Cython) is a larger diff. It'd be great to start using Py_mod_gil in all of those now, so (a) those extension modules get exercised even without -Xgil=0 and (b) when we backport the Cython change to 2.1.x the diff will be small.

@ngoldbaum
Member Author

Good point, see #26913

@seberg
Member

seberg commented Jul 11, 2024

Since it's the tracking issue, mentioning here also: Object arrays should not be fully thread safe since there is no locking when they are mutated, so one thread can decref before another incref's the same element (or am I missing some magic?).
There are approaches to this, may need to discuss in a separate issue eventually (maybe there is also some simple solution I don't know yet).

@ngoldbaum
Member Author

The Cython PR just got merged so I'll send in the necessary Cython build changes as well whenever the Cython wheel gets regenerated.

@rgommers
Member

The Cython PR just got merged so I'll send in the necessary Cython build changes as well whenever the Cython wheel gets regenerated.

In case it helps: you cannot add the flag unconditionally for every Cython version, and whether the interpreter is a free-threaded one cannot be (easily) detected in meson.build, so the way to do this is to add the flag unconditionally for Cython >=3.1.0a0. There's already a Cython version check in the top-level meson.build file.

@lysnikolaou
Member

lysnikolaou commented Jul 13, 2024

The flag is okay to be added even under a normal (non-free-threaded) build, where it'll basically do nothing. So, yes, the way to go is add the flag unconditionally for the Cython version that supports it.

@ngoldbaum
Member Author

I chatted with @colesbury about locking for object arrays today. He thinks we will need some sort of locking or synchronization to fix the threaded object array access issues @seberg is worried about.

Rather than adding a new lock, he suggested using the critical section API on the array object before and after spots we release or acquire the GIL for dtypes that don't need the GIL.

There are quite a few places we use the macros that release and acquire the GIL, but I also think it should be relatively straightforward to grep for those macros and find spots where we have a special check for object arrays so that we don't release the GIL in that case.

In those spots we'd need to use the critical section API. We may also be able to expand usage of NPY_BEGIN_THREADS_DESCR and NPY_END_THREADS_DESCR to handle this without adding a bunch of new boilerplate. The one sticking point that comes to mind is the need to have a reference to the array object, and the ArrayMethod API intentionally does not expose access to the array object. I'm not sure yet if that's a problem.

However all that said, I think given the release timing we might just need to document this limitation for the 2.1 release as a known issue. As far as I know this hasn't caused any crashes with pandas or scikit-learn, and they heavily use object string arrays.

@seberg
Member

seberg commented Jul 19, 2024

The one sticking point that comes to mind is the need to have a reference to the array object, and the ArrayMethod API intentionally does not expose access to the array object

You would have to walk the base anyway, which isn't super nice, of course you could add an owner field to the additional information that is passed in via the struct, though, if that ends up as the best solution.
I still think the StringDType design is sound to do this via the dtype instance, but without a side-car buffer I dunno if an explicit owner somewhere else might end up easier/cleaner.

There are still some issues and I don't know if they even can be solved fully currently:

  • Even within NumPy you would have to walk the base which is a bit annoying currently (the dtype instance would solve this a bit).
  • Memoryviews and other protocols allow sharing of Python objects ("immutable" flag says hello), but they have no mechanism to pass on the right "owner" that a critical section should lock on.
  • It seems the critical section API is limited to two objects, but more objects are possible:
    • Is there a way to fall back to something else?
    • One could potentially do defensive copies. For non-fast paths, I think it would actually make sense to do this in the buffering layer (copying objects is pretty cheap and then you get away with only ever locking a single "owner" at the same time).

Unless there is a fallback like Py_BEGIN_CRITICAL_SECTION(NULL) to say that the owning/critical object is unknown?! (Practically re-enabling the GIL, though, so it seems mainly a good fallback for memory sharing.).
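
The defensive-copy idea from the list above can be sketched in pure Python. This is only an analogy under stated assumptions: `OwnedBuffer` is a hypothetical container, and its `threading.Lock` stands in for a critical section on a single "owner"; it is not NumPy's buffering layer. The shallow copy is cheap because only references are copied, and each copied reference keeps its object alive while the work proceeds without the lock.

```python
import threading


class OwnedBuffer:
    """Hypothetical container whose lock plays the role of the single owner."""

    def __init__(self, items) -> None:
        self._items = list(items)
        self._lock = threading.Lock()

    def snapshot(self) -> list:
        # Shallow copy under the lock: only references are copied, and the
        # copy keeps every element alive even if the owner changes later.
        with self._lock:
            return list(self._items)

    def replace(self, index: int, value) -> None:
        # Mutations take the same lock, so a snapshot never sees a
        # half-finished update.
        with self._lock:
            self._items[index] = value


def total_length(buffer: OwnedBuffer) -> int:
    """Sum element lengths using a private copy; no lock is held while summing."""
    return sum(len(item) for item in buffer.snapshot())


if __name__ == "__main__":
    buf = OwnedBuffer(["a", "bb", "ccc"])
    snap = buf.snapshot()
    buf.replace(0, "zzzz")
    print(snap)               # the earlier snapshot is unaffected
    print(total_length(buf))  # computed from a fresh snapshot
```

Only one owner is ever locked at a time here, which is the property the comment is after when more than two objects would otherwise be involved.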

@EwoutH
Contributor

EwoutH commented Aug 1, 2024

3.13 rc1 is released! This means the ABI is stable and wheels can be uploaded to PyPI.

Tracking for cibuildwheel: pypa/cibuildwheel#1949

@ngoldbaum
Member Author

We're planning to upload cp313t wheels to pypi for Mac and Linux for the NumPy 2.1 release candidate.

Hopefully we'll be able to get Windows wheels up too but that's waiting on resolving some tricky merge conflicts in our meson patches, which will also hopefully help us to upstream the meson patches themselves.

@EwoutH
Contributor

EwoutH commented Aug 3, 2024

cibuildwheel now uses the latest Python 3.13.0rc1 and builds Python 3.13 by default (pypa/cibuildwheel#1950). Next cibuildwheel release should be soon.

Free-threaded wheels are still behind a flag (CIBW_FREE_THREADED_SUPPORT), but can also be built and uploaded using the ABI-stable Python 3.13.0rc1.

@karlotness
Contributor

I may have missed discussion of this elsewhere, but I think there are also thread safety issues with in-place updates to ndarray shape, strides and dtype attributes. Setting these with an array shared between threads can cause inconsistent views of the array layout/size.

Setting those attributes is documented as a bad idea, but with the GIL I'm not sure it could cause crashes (I might be wrong on that).

As an example, on my machine with recent NumPy nightly and Python 3.13.0rc1 the below script segfaults without the GIL and loops until killed with it.

Example Script
import threading
import sysconfig
import sys
import numpy as np

if sysconfig.get_config_var("Py_GIL_DISABLED"):
    print(f"GIL enabled? {sys._is_gil_enabled()}")

arr = np.ones(1024, np.uint64)

def switch_dtypes(arr):
    dt1 = np.uint64
    dt2 = np.uint8
    while True:
        arr.dtype = dt1
        arr.dtype = dt2

def access_item(arr):
    while True:
        arr[-1]

t1 = threading.Thread(target=switch_dtypes, kwargs={"arr": arr})
t2 = threading.Thread(target=access_item, kwargs={"arr": arr})

t1.start()
t2.start()

@ngoldbaum
Member Author

ngoldbaum commented Aug 9, 2024

Yup, right now the suggestion is: don't do that.

NumPy isn't thread safe in Python programs and has never been thread safe. Free-threading makes this more acute but it's a pre-existing problem.

My plan for NumPy 2.1 is to document that mutating shared state in multithreaded python code should be done with extreme care and may cause crashes if there are races. This is also true with the GIL. It's certainly easier to cause a crash in the free-threaded build but the same issues about shared mutable state happen if any threads release the GIL and touch shared state.

There are also probably some safety issues around object arrays that are only present in the GIL-disabled build.

We're going to need to think carefully about adding improved locking around ndarray itself. It'll be a big project.
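
To make the "shared mutable state" concern concrete, here is a toy model of layout metadata that must change together. It assumes nothing about how NumPy will eventually add locking: `SharedLayout` is a hypothetical class, with an item size and a length standing in for an array's dtype and shape over a fixed 8192-byte buffer (1024 items of 8 bytes, as in the script earlier in the thread). Readers and the writer take the same lock, so a reader never sees an item size from one update paired with a length from another.

```python
import threading


class SharedLayout:
    """Hypothetical stand-in for array metadata that must change together."""

    def __init__(self, nbytes: int, itemsize: int) -> None:
        self._nbytes = nbytes
        self._itemsize = itemsize
        self._length = nbytes // itemsize
        self._lock = threading.Lock()

    def set_itemsize(self, itemsize: int) -> None:
        # Both fields are updated under one lock, so they stay consistent.
        with self._lock:
            self._itemsize = itemsize
            self._length = self._nbytes // itemsize

    def read(self) -> tuple:
        # Readers take the same lock to get a matching (itemsize, length) pair.
        with self._lock:
            return self._itemsize, self._length


def writer(layout: SharedLayout, iterations: int) -> None:
    for _ in range(iterations):
        layout.set_itemsize(8)
        layout.set_itemsize(1)


def reader(layout: SharedLayout, iterations: int, bad: list) -> None:
    for _ in range(iterations):
        itemsize, length = layout.read()
        if itemsize * length != 8192:
            bad.append((itemsize, length))


def run_check(iterations: int = 2000) -> list:
    """Return every inconsistent (itemsize, length) pair a reader observed."""
    layout = SharedLayout(nbytes=8192, itemsize=8)
    bad = []
    threads = [
        threading.Thread(target=writer, args=(layout, iterations)),
        threading.Thread(target=reader, args=(layout, iterations, bad)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bad


if __name__ == "__main__":
    print(f"inconsistent reads: {len(run_check())}")
```

Without the lock in `read` and `set_itemsize`, a reader could observe a mismatched pair, which is the pure-Python analogue of the inconsistent layout views reported above.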

@karlotness
Contributor

That makes sense, thanks! I figured that was probably already on the radar somewhere.

The thing that prompted this was reviewing some interactions I have with the NumPy C API where currently I make sure to load array shapes, strides, etc. before releasing the GIL. I mostly just wanted to be sure there isn't currently anything more to be done to protect those. I'll definitely check any updated guidance in the future.

Thanks for the work so far, the experimental build works nicely in the little bit of testing that I've done.

@rgommers
Member

If anyone was waiting for Windows wheels: there are nightlies at https://anaconda.org/scientific-python-nightly-wheels/numpy/files now. The next release (2.1.3) will include them on PyPI.

@ngoldbaum
Member Author

NumPy 2.1 included initial support for free-threading, so I'm closing this issue (a bit belatedly).

The main followup is #27199, which I'm hoping to have time to work on soon.
