
@andfoy
Contributor

@andfoy andfoy commented Jun 18, 2024

This PR adds a CI job that runs the test suite on Linux against the free-threaded build of Python 3.13. It follows the free-threaded testing and wheel builds recently added to NumPy and SciPy.

@rgommers rgommers added the CI Continuous integration label Jun 19, 2024
Member

@rgommers rgommers left a comment

Thanks @andfoy for getting the ball rolling on this! Since this is a single job, I'd like it to be as clean and simple as possible. We need numpy/scipy/cython, install, and run tests. A single TODO for adding matplotlib to the test dependencies once that is ready.

@andfoy
Contributor Author

andfoy commented Jun 20, 2024

This is passing now on my fork: https://github.com/andfoy/pywt/actions/runs/9590437998

@andfoy andfoy marked this pull request as ready for review June 20, 2024 02:04
Comment on lines 78 to 83
cython_args = ['-3', '--fast-fail', '--output-file', '@OUTPUT@', '--include-dir', '@BUILD_ROOT@', '@INPUT@']

cython_gen = generator(cython,
  arguments : cython_args,
  output : '@BASENAME@.c',
  depends : _cython_tree)
Contributor Author

I copied this pattern over from SciPy. It gives granular control over the flags passed to Cython; in this case it helped me debug the actual error by setting --gdb.

Member

Keeping this unresolved, since it is useful context.

Comment on lines 205 to 208
strides_view = <signed long [:data.ndim]> <signed long *> cA.strides
strides_view = strides_view.copy()
output_info.strides = <pywt_index_t *> &strides_view[0]
output_info.shape = <size_t *> &output_shape[0]
Contributor Author

A copy was required here, since the original cA array is released once cA is redefined in any of the complex cases, which leaves the strides pointer dangling.

Member

Good catch! That would explain a crash indeed. I'm not quite sure whether this isn't happening also further down, since there is another pattern like that where an ndarray or memory view is overwritten. It's probably protected only by this code that keeps a reference:

        if not trim_approx:
            ret.append((cA, cD))

however, the default for trim_approx is False. It all looks a bit fragile.
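
To illustrate why that `ret.append` matters, here is a minimal pure-Python sketch. `Coeffs` and `run_levels` are hypothetical stand-ins, not pywt code, and the sketch assumes CPython's immediate reference-count deallocation. The weak references only report whether each level's object is still alive after its name has been rebound.

```python
import weakref


class Coeffs:
    """Hypothetical stand-in for the cA array produced at one level."""


def run_levels(levels, trim_approx):
    """Rebind cA once per level and report which earlier objects survive."""
    ret = []
    probes = []
    cA = Coeffs()
    for _ in range(levels):
        probes.append(weakref.ref(cA))  # observe the object without owning it
        if not trim_approx:
            ret.append(cA)  # the container holds a strong reference
        cA = Coeffs()  # rebinding drops the name's reference to the old object
    alive = [probe() is not None for probe in probes]
    return alive, ret
```

With `trim_approx=False` every earlier object stays alive because `ret` owns it; with `trim_approx=True` nothing owns the old object once the name is rebound, so any raw pointer taken from it would be left dangling.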

Member

Did you audit the other .pyx files for this pattern?

Member

> Did you audit the other .pyx files for this pattern?

I did this audit, and didn't find a pattern like in this file with an ndarray being overwritten and then being reused in a for-loop.

@rgommers rgommers added this to the v1.7.0 milestone Jun 20, 2024
@rgommers
Member

I pushed a couple of tiny cleanups to make the diff even smaller.

Member

@rgommers rgommers left a comment

The added CI job and the build file changes LGTM now. The issue in the Cython code is actually interesting. It's not clear to me whether:

  1. this code was always buggy, or
  2. it relied on the GIL and hence we only see the crash with free-threaded CPython, or
  3. the code is supposed to work even with free-threaded CPython, and this is a Cython bug at the moment.

I think this is exactly the kind of thing that we may want to use to better test Cython, so rather than fixing up the issue by making a copy (the code for which isn't too pretty), let's make sure to really understand what is going on here.

Could you look at the generated C code with and without the change to _swt.pyx and see what is different? And reduce the issue to the minimal code for a standalone reproducer, if applicable?
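
One way to do that comparison is sketched below with the standard-library `difflib`. The helper name `diff_generated_c` and the default file labels are hypothetical; the inputs are assumed to be the text of the two C files Cython generated with and without the change.

```python
import difflib


def diff_generated_c(before_text, after_text,
                     before_name="_swt_before.c", after_name="_swt_after.c"):
    """Return only the removed ('-') and added ('+') lines between two sources."""
    diff = list(difflib.unified_diff(
        before_text.splitlines(),
        after_text.splitlines(),
        fromfile=before_name,  # hypothetical label, used only in the header
        tofile=after_name,     # hypothetical label, used only in the header
        lineterm="",
        n=0,                   # no context lines, only the changes
    ))
    # The first two entries are the '---'/'+++' file headers; '@@' lines mark hunks.
    return [line for line in diff[2:] if not line.startswith("@@")]
```

Reading each file with `pathlib.Path(...).read_text()` and passing both strings in gives a short list of changed lines to inspect, which is usually enough to find the statements involved in a crash.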

@rgommers rgommers dismissed their stale review June 25, 2024 16:17

Requested changes were made

@andfoy
Contributor Author

andfoy commented Jun 26, 2024

After a more detailed exploration, I compared the C files that Cython generates under Python 3.12 and 3.13. The issue comes from the following sequence:

  1. The strides of an ndarray are obtained by calling PyArray_STRIDES on an array cA:
    output_info.strides = <pywt_index_t *> cA.strides

    For simplicity, let's call this variable raw_strides:
    __pyx_v_raw_strides = ((npy_intp *)__pyx_f_5numpy_7ndarray_7strides_strides(__pyx_v_cA));
  2. Then, cA is redefined by calling np.zeros:
    cA = np.zeros(output_shape, dtype=np.complex128)

This redefinition creates an intermediate variable that holds the new array. It is then assigned to the original cA variable, and the previous array is released through Py_DECREF:

__Pyx_DECREF_SET(__pyx_v_cA, ((PyArrayObject *)__pyx_t_15));

On Python 3.12, calling Py_DECREF, for some reason, does not affect the values pointed to by __pyx_v_raw_strides, which means that the memory allocated to store the strides (or shape) is not being released when an array is redefined.

The bug (or feature) that we are now experiencing on Python 3.13 is that, when the original array object is deleted (as its refcount hits zero), its attributes are deleted as well. Since PyArray_STRIDES yields a pointer to the strides values in memory and not a copy, once the original cA is released, the values pointed to by the strides and shape attributes are released together with the object. This differs from the behaviour observed in 3.12.

The main question here is: should any access to the attributes of an object imply an increase of the refcount, or is this behaviour expected, so that it has to be handled explicitly when writing Cython code?

It is important to mention that this happens regardless of the setting of the PYTHON_GIL environment variable.
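
For anyone reproducing this, a small sketch for confirming which configuration a run actually used. The helper name `gil_status` is hypothetical; `sysconfig.get_config_var("Py_GIL_DISABLED")` and `sys._is_gil_enabled()` are the standard-library hooks, the latter available from Python 3.13 on.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled_now) for this interpreter."""
    # Py_GIL_DISABLED is 1 in the build configuration of free-threaded builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13 on; older versions always
    # run with the GIL, so fall back to True there.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = True if probe is None else bool(probe())
    return free_threaded_build, gil_enabled_now
```

Printing `gil_status()` at the start of a test run records whether the interpreter is a free-threaded build and whether the GIL was re-enabled at runtime.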

cc @lysnikolaou @da-woods

@da-woods

I suspect this change is because the memory allocator got changed in Python 3.13 (if nothing else uses the memory that strides points to then you may get away with accessing it, even if it's technically not allowed).

I think you need to account for the reference counting explicitly. As far as Cython is concerned strides and shape are just arbitrary int pointers and it knows nothing about the underlying ownership.
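
A pure-Python analogy of that ownership problem, as a minimal sketch: `FakeArray` and `rebind_and_report` are hypothetical names, not NumPy or Cython APIs, and the sketch assumes CPython's immediate reference-count deallocation. The borrowed address is never dereferenced; it only shows that nothing ties it to the lifetime of its owner, whereas the copied values remain valid.

```python
import ctypes
import weakref


class FakeArray:
    """Hypothetical stand-in for an ndarray that owns its strides buffer."""

    def __init__(self, strides):
        self._strides = (ctypes.c_ssize_t * len(strides))(*strides)

    def strides_address(self):
        # Like a raw strides pointer: a borrowed address, not a copy.
        return ctypes.addressof(self._strides)

    def strides_copy(self):
        # An owned snapshot that stays valid after the array is gone.
        return tuple(self._strides)


def rebind_and_report():
    cA = FakeArray((16, 8))
    probe = weakref.ref(cA)
    borrowed = cA.strides_address()  # only meaningful while the owner is alive
    owned = cA.strides_copy()        # safe to keep across the rebinding
    cA = FakeArray((32, 16))         # the old owner's refcount drops to zero
    return probe() is None, owned, borrowed
```

The first returned value reports that the original owner is gone after the rebinding, so `borrowed` must not be used; `owned` still holds `(16, 8)`. Either copying the values or keeping a reference to the original array alive avoids the dangling pointer.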

@rgommers
Member

Thanks for digging in @andfoy, and thanks for confirming @da-woods! So this code was always technically buggy; we were just lucky that it worked until now.

I'll have a last look at this, and then we'll get this in.

@rgommers rgommers merged commit b14cc9c into PyWavelets:main Jun 26, 2024
@rgommers
Member

Merged, nice work @andfoy. Looks like the next step is to start uploading Linux and macOS wheels - would you be able to open a new PR for that?

@andfoy andfoy deleted the add_free_threaded_ci branch June 26, 2024 12:41