ENH: stats: add axis tuple support to _axis_nan_policy_factory decorators #15257

mdhaber · 2021-12-21T07:23:28Z

Reference issue

gh-14651 (checks the second box)

What does this implement/fix?

gh-13312 added _axis_nan_policy_factory, which returns a decorator that adds axis and nan_policy arguments to stats functions. With the additions in this PR, the decorators produced now support axis arguments that are tuples of integers.
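As a rough illustration of what tuple-axis support means for a decorated function, here is a minimal sketch. It is not the actual `_axis_nan_policy_factory` code: the names `add_axis_tuple_support` and `last_axis_range` are hypothetical, and the wrapped function is assumed to take a single sample and reduce along the last axis only.

```python
import functools

import numpy as np


def add_axis_tuple_support(func):
    """Toy decorator (hypothetical): `func` only reduces along axis=-1."""
    @functools.wraps(func)
    def wrapper(sample, axis=0):
        sample = np.asarray(sample)
        if axis is None:
            # Reduce over everything: flatten to 1-D.
            sample = sample.ravel()
        else:
            axes = tuple(int(a) for a in np.atleast_1d(axis))
            n_axes = len(axes)
            # Move the requested axes to the end, then merge them into one.
            sample = np.moveaxis(sample, axes, tuple(range(-n_axes, 0)))
            merged = int(np.prod(sample.shape[-n_axes:]))
            sample = sample.reshape(sample.shape[:-n_axes] + (merged,))
        return func(sample, axis=-1)
    return wrapper


@add_axis_tuple_support
def last_axis_range(sample, axis=-1):
    # Stand-in statistic: max minus min along the given axis.
    return sample.max(axis=axis) - sample.min(axis=axis)


x = np.arange(24).reshape(2, 3, 4)
result = last_axis_range(x, axis=(0, 2))
expected = x.max(axis=(0, 2)) - x.min(axis=(0, 2))
```

Here `result` has one entry per index of the axis that was not reduced (axis 1), and it agrees with NumPy's own tuple-axis reduction in `expected`.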

Additional information

This implements the steps discussed in this comment.
To facilitate review, I've left some self-review comments to explain the changes. Feel free to mark them resolved.
I would also suggest viewing the changes in three separate steps.

In 7c3af00, _broadcast_arrays and related functions are moved (verbatim) from _hypotests.py into _axis_nan_policy.py, a better permanent home
In fddaf06, the _broadcast... functions are consolidated, so that the main logic is only in _broadcast_arrays
The remaining commits (select 565d4fd and 4993958) modify _axis_nan_policy_factory to support axis tuples and add tests.

Details

Selecting the last two commits in the last step will also select the first three commits (because of the merge commit).

This should be squash-merged; the commit history is messier than intended.

mdhaber · 2021-12-21T07:42:00Z

scipy/stats/_axis_nan_policy.py

    Broadcast shapes, ignoring incompatibility of specified axes
    """
+    if not shapes:
+        return shapes


If shapes is an empty array, just return it; we don't want an error.

mdhaber · 2021-12-21T07:42:27Z

scipy/stats/_axis_nan_policy.py


    # Remove the shape elements of the axes to be ignored, but remember them.
    if axis is not None:
-        axis = np.atleast_1d(axis)


Now this is already done above.

mdhaber · 2021-12-21T07:44:19Z

scipy/stats/_axis_nan_policy.py

    if axis is not None:
-        axis = np.atleast_1d(axis)
        axis[axis < 0] = n_dims + axis[axis < 0]
+        axis = np.sort(axis)


This was actually needed to fix a bug in the existing _broadcast_shapes, which was supposed to work for axis tuples already but previously worked only if the axes were already sorted. test_other_axis_tuples now checks that everything works even when axes are passed in out of order.

mdhaber · 2021-12-21T07:46:45Z

scipy/stats/_axis_nan_policy.py

-        axis = np.atleast_1d(axis)
        axis[axis < 0] = n_dims + axis[axis < 0]
+        axis = np.sort(axis)
+        if axis[-1] >= n_dims or axis[0] < 0:


Note that axes are converted to be all positive above.
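The normalization steps shown in the diffs above (wrap negative axes, sort, then range-check) can be sketched as a standalone helper. The function name `normalize_axes` and the error message are assumptions made for illustration; they are not taken from the PR.

```python
import numpy as np


def normalize_axes(axis, n_dims):
    """Hypothetical helper: return sorted, non-negative axes as an array."""
    # Copy into a 1-D integer array so the caller's input is not modified.
    axis = np.array(axis, dtype=int, ndmin=1)
    # Convert negative axes to their non-negative equivalents.
    axis[axis < 0] = n_dims + axis[axis < 0]
    axis = np.sort(axis)
    # After sorting, only the extremes need to be checked.
    if axis[-1] >= n_dims or axis[0] < 0:
        raise ValueError(f"`axis` is out of bounds for {n_dims} dimensions")
    return axis


axes = normalize_axes((2, -3), n_dims=3)
```

Because the axes are sorted after the negative ones are wrapped, checking the first and last elements is enough to catch any out-of-range entry.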

mdhaber · 2021-12-21T07:49:25Z

scipy/stats/_axis_nan_policy.py

+            # standardize to always work along last axis
            if axis is None:
                samples = [sample.ravel() for sample in samples]
-                axis = 0


Changed this to axis=-1 below. When axis is None, the samples are raveled to 1-D, so axis=0 and axis=-1 are the same.

mdhaber · 2021-12-21T07:49:53Z

scipy/stats/_axis_nan_policy.py

-            elif axis != int(axis):
-                raise ValueError('`axis` must be an integer')


Input validation is handled in _broadcast_arrays now.

mdhaber · 2021-12-21T07:51:06Z

scipy/stats/_axis_nan_policy.py

+                axis = np.atleast_1d(axis)
+                n_axes = len(axis)
+                # move all axes in `axis` to the end to be raveled
+                samples = [np.moveaxis(sample, axis, range(-len(axis), 0))
+                           for sample in samples]
+                shapes = [sample.shape for sample in samples]
+                # New shape is unchanged for all axes _not_ in `axis`
+                # At the end, we append the product of the shapes of the axes
+                # in `axis`. Appending -1 doesn't work for zero-size arrays!
+                new_shapes = [shape[:-n_axes] + (np.prod(shape[-n_axes:]),)
+                              for shape in shapes]
+                samples = [sample.reshape(new_shape)
+                           for sample, new_shape in zip(samples, new_shapes)]
+            axis = -1  # work over the last axis


This is the heart of the changes. It's a little more complicated than first expected because we have to calculate the size of the last axis (all axes in axis raveled) manually; -1 doesn't work for empty arrays.
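To see why the size of the merged axis has to be computed explicitly, consider a small standalone example with a zero-size array. This is only an illustration of the NumPy behavior, not code from the PR; the variable names are made up.

```python
import numpy as np

# A zero-size sample whose last two axes are to be merged into one.
sample = np.zeros((0, 3, 4))
n_axes = 2

# With -1, NumPy cannot infer the missing dimension because the
# product of the known dimensions is zero.
try:
    sample.reshape(sample.shape[:-n_axes] + (-1,))
    minus_one_worked = True
except ValueError:
    minus_one_worked = False

# Computing the merged size explicitly avoids the ambiguity.
merged = int(np.prod(sample.shape[-n_axes:]))
raveled = sample.reshape(sample.shape[:-n_axes] + (merged,))
```

The explicit product gives the last axis a well-defined length of 12 even though the array holds no elements.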

mdhaber · 2021-12-21T07:52:04Z

scipy/stats/tests/test_axis_nan_policy.py



-@pytest.mark.parametrize(("axis"), range(-2, 2))
+@pytest.mark.parametrize(("axis"), range(-3, 3))


This test (already merged) was supposed to work for this range, but there was a bug before. It's fixed now.

tirthasheshpatel · 2021-12-21T10:52:05Z

scipy/stats/_axis_nan_policy.py

-        raise ValueError("Array shapes are incompatible for broadcasting.")
-    return tuple(new_shape)
+    shapes = _broadcast_shapes(shapes, axis)
+    shape = np.delete(shapes[0], axis)


wouldn't work when axis=None:

    In [1]: from scipy.stats._axis_nan_policy import _broadcast_shapes_remove_axis

    In [2]: _broadcast_shapes_remove_axis([(3, 9, 2, 5), (3, 9, 2, 5)])
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-35-66e4d08b298f> in <module>
    ----> 1 _broadcast_shapes_remove_axis([(3, 9, 2, 5), (3, 9, 2, 5)])

    ~/Desktop/scipy_source/scipy/stats/_axis_nan_policy.py in _broadcast_shapes_remove_axis(shapes, axis)
        122     """
        123     shapes = _broadcast_shapes(shapes, axis)
    --> 124     shape = np.delete(shapes[0], axis)
        125     return tuple(shape)
        126

    <__array_function__ internals> in delete(*args, **kwargs)

    ~/Desktop/scipy_source/scipy-dev/lib/python3.9/site-packages/numpy/lib/function_base.py in delete(arr, obj, axis)
       4550         else:
       4551             keep = ones(N, dtype=bool)
    -> 4552             keep[obj,] = False
       4553
       4554         slobj[axis] = keep

    IndexError: arrays used as indices must be of integer (or boolean) type

I guess we never use this function unless we want to remove some axes. In that case, it's better to not default axis to None so the signature is just _broadcast_shapes_remove_axis(shapes, axis)

The previous version did work with axis=None. I guess we should preserve that behavior:

    def _broadcast_shapes_remove_axis(shapes, axis=None):
        """
        Broadcast shapes, dropping specified axes

        Same as _broadcast_array_shapes, but given a sequence
        of array shapes `shapes` instead of the arrays themselves.
        """
        shapes = _broadcast_shapes(shapes, axis)
        if axis is not None:
            shapes = shapes[0]
            shape = np.delete(shapes, axis)
        return tuple(shape)

Thanks. I'll change these things when we get CI back.

Done! Er, I think you meant:

    shapes = _broadcast_shapes(shapes, axis)
    shape = shapes[0]
    if axis is not None:
        shape = np.delete(shape, axis)
    return tuple(shape)

That's what I did.

Yep, you are right! Although, this would still fail for cases like these:

    In [1]: import numpy as np

    In [2]: from scipy.stats._axis_nan_policy import _check_empty_inputs

    In [3]: _check_empty_inputs([np.array([]), np.array([])], None)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-3-8945ea70ea1a> in <module>
    ----> 1 _check_empty_inputs([np.array([]), np.array([])], None)

    ~/Desktop/scipy_source/scipy/stats/_axis_nan_policy.py in _check_empty_inputs(samples, axis)
        222     # otherwise, the statistic and p-value will be either empty arrays or
        223     # arrays with NaNs. Produce the appropriate array and return it.
    --> 224     output_shape = _broadcast_array_shapes_remove_axis(samples, axis)
        225     output = np.ones(output_shape) * np.nan
        226     return output

    ~/Desktop/scipy_source/scipy/stats/_axis_nan_policy.py in _broadcast_array_shapes_remove_axis(arrays, axis)
        111     # ravel arrays before broadcasting.
        112     shapes = [arr.shape for arr in arrays]
    --> 113     return _broadcast_shapes_remove_axis(shapes, axis)
        114
        115

    ~/Desktop/scipy_source/scipy/stats/_axis_nan_policy.py in _broadcast_shapes_remove_axis(shapes, axis)
        125     if axis is not None:
        126         shape = np.delete(shape, axis)
    --> 127     return tuple(shape)
        128
        129

    TypeError: 'numpy.int64' object is not iterable

But from the _axis_nan_policy_factory code, it looks like the case of axis=None is never hit. So, I think we don't need to worry about this.
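For readers following this thread, the intended contract (broadcast the shapes while ignoring the axes in `axis`, then drop those axes, and fall back to plain broadcasting when `axis` is None) can be sketched independently of SciPy's private helpers. This is an assumption-laden stand-in, not the code in `_axis_nan_policy.py`: the function name is hypothetical and it relies on `np.broadcast_shapes`, which requires NumPy 1.20 or later.

```python
import numpy as np


def broadcast_shapes_remove_axis(shapes, axis=None):
    """Hypothetical sketch: broadcast `shapes`, dropping the axes in `axis`."""
    if axis is None:
        # Nothing to drop: ordinary broadcasting of the full shapes.
        return tuple(np.broadcast_shapes(*shapes))
    n_dims = max(len(shape) for shape in shapes)
    # Normalize negative axes and collect them in a set for fast lookup.
    axes = {int(a) + n_dims if a < 0 else int(a)
            for a in np.atleast_1d(axis)}
    reduced = []
    for shape in shapes:
        # Left-pad with ones so every shape has n_dims entries.
        padded = (1,) * (n_dims - len(shape)) + tuple(shape)
        reduced.append(tuple(dim for i, dim in enumerate(padded)
                             if i not in axes))
    # The dropped axes may differ between shapes; the rest must broadcast.
    return tuple(np.broadcast_shapes(*reduced))


no_axis = broadcast_shapes_remove_axis([(3, 9, 2, 5), (3, 9, 2, 5)])
two_axes = broadcast_shapes_remove_axis([(3, 9, 2, 5), (3, 4, 2, 7)],
                                        axis=(1, -1))
```

Handling `axis=None` as its own branch avoids passing None on to an index-based deletion, which is the failure shown in the first traceback above.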


Kai-Striega · 2021-12-24T00:18:44Z

@mdhaber it's going to take me a while to read back into all this. I'm also going to be busy with family stuff until Christmas is over. I should have enough time to give this a thorough review between Christmas and New Year's. Is that going to be too late for you?


mdhaber · 2021-12-24T07:49:40Z

@Kai-Striega Thanks!

Is that going to be too late for you?

I can't expect other volunteers to work on my timetable! That said, I have some time this week, and I would like to move on to the more exciting part of all this - applying the decorator to more functions and testing those out. Since @tirthasheshpatel reviewed the recent PR and started to review this PR, maybe it's not necessary for you to dig deeply into the guts of the decorator right now, and you could spend your review time in the coming weeks on the next step (applying the decorator to more functions and testing those)?

@tirthasheshpatel Were you pretty happy with this? I made the change that you suggested and fixed a lint error. More of the CI suite is running now than when I submitted, and that's all looking good. What would you prefer Kai work on (reviewing this PR or the application to new functions)?

tirthasheshpatel · 2021-12-24T10:35:53Z

Were you pretty happy with this?

Yeah, this PR looks in a very good shape!

What would you prefer Kai work on (reviewing this PR or the application to new functions)?

I'd agree with you. @Kai-Striega I looked at the internals of the decorator but could use a helping hand in experimenting with the decorator and applying it to stats functions. Feel free to review this PR if you have time, otherwise, it would be great if you could review the PRs that apply the decorator!

mdhaber · 2021-12-24T20:30:57Z

So @tirthasheshpatel is this ready? As soon as this is merged I'll post the gmean PR again.

tirthasheshpatel · 2021-12-25T17:53:28Z

is this ready?

Yes, I was just doing some experiments with tuple axis and it seems to work as expected. I will approve and merge this!

tirthasheshpatel

Test failures are unrelated. Although the decorator has become quite complex, the tests are very strong and seem to validate that it works. So, merging! Thanks @mdhaber!

mdhaber added 7 commits December 17, 2021 19:15

ENH: stats: add masked-array support to _axis_nan_policy decorator

6fd3510

TST: stats: more tests for axis_nan_policy_factory with masked arrays

b21cf45

ENH: stats: add axis tuple support to axis_nan_policy_factory

022f875

MAINT: stats: move _broadcast_arrays etc to appropriate file

7c3af00

MAINT: stats: consolidate _broadcast functions

fddaf06

Merge branch 'tuple_axis' into axis_tuple

565d4fd

MAINT: stats: fix bug in _axis_nan_policy_factory w/ empty arrays

4993958

mdhaber added the scipy.stats label Dec 21, 2021

mdhaber requested review from Kai-Striega and tirthasheshpatel December 21, 2021 07:23

mdhaber changed the base branch from master to maintenance/1.8.x December 21, 2021 07:24

mdhaber requested review from andyfaff, ev-br, larsoner, perimosocordiae, person142, rgommers and tylerjereddy as code owners December 21, 2021 07:24

mdhaber changed the base branch from maintenance/1.8.x to master December 21, 2021 07:25

mdhaber commented Dec 21, 2021


mdhaber added the enhancement A new feature or improvement label Dec 21, 2021

mdhaber removed request for perimosocordiae and rgommers December 21, 2021 07:56

mdhaber removed request for andyfaff, ev-br, larsoner, person142 and tylerjereddy December 21, 2021 07:56

This was referenced Dec 21, 2021

ENH: stats: consistent nan_policy, axis, masked array, and dtype support #14651

Open

ENH: stats: add masked array support to _axis_nan_policy_factory decorators #15239

Merged

tirthasheshpatel reviewed Dec 21, 2021


mdhaber commented Dec 23, 2021


Update scipy/stats/_axis_nan_policy.py

c7f5bf0

mdhaber commented Dec 24, 2021


MAINT: stats: maintain behavior of axis=None w/ different shape inputs

f76ff40

tirthasheshpatel approved these changes Dec 25, 2021


tirthasheshpatel merged commit cadc4b6 into scipy:master Dec 25, 2021
