Add support for free-threaded build #53
Implement thread-safety using Cython 3.1+ critical sections to protect Pool operations (alloc, free, realloc) from race conditions. Update dependencies and CI to support free-threaded builds.
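As a rough illustration of the idea in the description, here is a pure-Python analogy, not the Cython code from this PR. `ToyPool`, its `threading.Lock`, and the integer "addresses" are all hypothetical stand-ins: the actual change uses Cython critical sections around the `Pool` bookkeeping rather than an explicit lock, and it manages real memory.

```python
import threading


class ToyPool:
    """Hypothetical stand-in for cymem.Pool; tracks sizes, allocates nothing."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the critical section
        self.addresses = {}            # fake address -> size in bytes
        self.size = 0                  # total bytes currently tracked
        self._next = 1                 # next fake address to hand out

    def alloc(self, number, elem_size):
        nbytes = number * elem_size
        with self._lock:
            # Registering the block and updating the total happen together.
            addr = self._next
            self._next += 1
            self.addresses[addr] = nbytes
            self.size += nbytes
        return addr

    def free(self, addr):
        with self._lock:
            nbytes = self.addresses.pop(addr)
            self.size = self.size - nbytes

    def realloc(self, addr, new_size):
        # Simplification: the fake address is reused instead of moving a block.
        with self._lock:
            old_size = self.addresses[addr]
            self.addresses[addr] = new_size
            self.size = self.size - old_size + new_size
        return addr
```

The point of the sketch is only that each operation updates `addresses` and `size` as one guarded step, so no thread can observe one updated without the other.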
Co-authored-by: Guido Imperiale <[email protected]>
While we're at it, can we also add 3.14 and 3.14t to the CI matrix?
Co-authored-by: Matthew Honnibal <[email protected]>
FYI @ngoldbaum discovered there are some intermittent failures when testing with lysnikolaou/test-cymem-threadsafety. I'm having a look at what's going on. I know there are still open discussion items, but even if these are resolved soon, let's not merge this yet.
@lysnikolaou Thanks for the answers on the comments, all makes sense. Please go ahead and merge once the tests are passing, I don't need to re-review.
ad51310 to 53bb400
c34f476 to 05e8576
ngoldbaum left a comment
Just one comment before merging: I'd rather skip building cp313t wheels than worry about fixing 3.13t-specific bugs if that comes up.
Maybe it also makes sense to change the version to 20.1.0.dev0. This seems like a pretty big change for a bugfix release!
Otherwise this looks great. I tested locally and see no failures after running the test from the test-cymem-threadsafety repo in a bash for loop for 10 minutes with both a normal and TSan-instrumented build.
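A minimal sketch of that kind of repeated stress run, assuming nothing about what the test-cymem-threadsafety repo actually contains. `hammer`, `alloc_then_free`, and the lock-guarded `state` dict are hypothetical names for illustration; running under TSan additionally needs an instrumented interpreter build, which is not shown here.

```python
import threading


def hammer(work, n_threads=8, iterations=2000):
    """Run work(thread_index) `iterations` times in each of `n_threads` threads."""
    barrier = threading.Barrier(n_threads)
    errors = []

    def runner(index):
        barrier.wait()  # release all threads together to maximise contention
        try:
            for _ in range(iterations):
                work(index)
        except Exception as exc:  # record failures instead of losing them
            errors.append(exc)

    threads = [threading.Thread(target=runner, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


# Hypothetical shared state under test: a byte counter guarded by a lock.
lock = threading.Lock()
state = {"size": 0}


def alloc_then_free(index):
    nbytes = 8 * (index + 1)
    with lock:
        state["size"] += nbytes
    with lock:
        state["size"] -= nbytes


errors = hammer(alloc_then_free)
```

If the bookkeeping is race-free, `errors` stays empty and `state["size"]` returns to zero; wrapping the script in a shell loop, as described above, just repeats this check many times to catch intermittent failures.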
skip = "pp* cp36* cp37* cp38*"
test-skip = ""
free-threaded-support = false
enable = ["cpython-freethreading"]
Since the thread safety guarantees we're adding depend critically on details of how critical sections work and we haven't and don't really want to do careful thread safety testing on 3.13t, can you delete this? It's not required to build cp314t wheels.
But do keep the deletion of free-threaded-support = false.
cymem/cymem.pyx
# See comment in alloc on why we're acquiring a critical section on
# self.addresses instead of self.
with cython.critical_section(self.addresses):
    self.size -= <size_t>self.addresses.pop(<size_t>p)
It would still be better not to use augmented subtraction here, as I explained on the call, because of the time-of-access vs. time-of-use issue.
That sounds like something I'd like to be aware of going forward. Is there a link I could read? In a perfect world is there something Cython could support better here?
> It would still be better not to use augmented subtraction here, as I explained on the call, because of the time-of-access vs. time-of-use issue.
The cast to size_t eliminated that problem though, no?
> That sounds like something I'd like to be aware of going forward.
The problem mostly relates to Cython calling PyNumber_InPlaceSubtract here, so it does the following:
- Construct a long object out of self.size
- Call pop
- Call in-place subtract
- Save the result back into self.size
So there's a lot happening between self.size being accessed and self.size being used. In general, checking the generated C code and making sure that it avoids C API calls and does the expected thing is the way to go. For this specific example, Cython maybe should have done the subtraction in C directly, since size is defined as a size_t. The problem is that self.addresses.pop returns a PyObject * that can be pretty much anything. I'll open an upstream issue with that question.
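The ordering in the steps above can be mimicked in plain Python, where an augmented assignment also reads the target before evaluating the right-hand side. This is only an analogy, not the C code Cython generates: `ToyPool` and `pop_with_interference` are hypothetical, and the `+= 50` inside the method merely simulates another thread updating `size` while `pop` is running.

```python
class ToyPool:
    """Hypothetical stand-in for Pool; not the cymem implementation."""

    def __init__(self):
        self.size = 100
        self.addresses = {1: 10, 2: 10}

    def pop_with_interference(self, p):
        nbytes = self.addresses.pop(p)
        # Simulate another thread changing self.size while pop is running.
        self.size += 50
        return nbytes


stale = ToyPool()
# self.size (100) is loaded first, then the call runs, then 100 - 10 is stored:
# the simulated concurrent +50 is silently overwritten.
stale.size -= stale.pop_with_interference(1)

fresh = ToyPool()
# Call first, read self.size afterwards: the +50 survives.
nbytes = fresh.pop_with_interference(1)
fresh.size = fresh.size - nbytes

print(stale.size, fresh.size)  # 90 140
```

Splitting the statement so that `size` is read only after `pop` returns is the plain-assignment form suggested in the review; whether a given Cython version emits a direct C subtraction for it is something to confirm in the generated C.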
> The cast to size_t eliminated that problem though, no?
Yes, but it seems like a Cython-specific hack to me. The core issue comes from using augmented assignment; if you avoid that, the size_t cast isn't needed either.
> That sounds like something I'd like to be aware of going forward.
Agreed, there's a hot debate going on internally about the general usability issue(s) with critical sections.
> Is there a link I could read?
Nothing conclusive yet I think. @kumaraditya303 just drafted a summary of what the issue is (see this gist), as well as this gist about races from augmented assignments - but that will need more follow-up and discussion.
> In a perfect world is there something Cython could support better here?
Almost certainly, yes.
I think all the concerns have been addressed with the latest version. I stress-tested the latest version using both vanilla 3.14.0t and a build with TSan instrumentation. I didn't see any races or test crashes using @lysnikolaou's test repo. Merging - thanks @lysnikolaou!
Add support for Python 3.13+ free-threaded builds. The cymem.Pool class is now thread-safe, allowing concurrent memory operations from multiple threads without external locking.
Major changes: