Conversation

@mdhaber
Contributor

@mdhaber mdhaber commented Dec 31, 2020

Reference issue

gh-9307
Can address brunnermunzel part of gh-9558
Could be used to address gh-9409
gh-9252
supersedes gh-13223
follow-up PR will supersede gh-13215, gh-12916
gh-12143

What does this implement/fix?

This adds a decorator factory _axis_nan_policy_factory that adds axis and nan_policy parameters to functions. Currently, it is applied only to a few closely related hypothesis tests, and some of the code is tailored to hypothesis tests. However, in previous commits, we explored applying it to many other stats functions. Rather than making sweeping changes in this PR, we decided to apply it to other scipy.stats functions in a series of separate PRs. Few modifications to the decorator are expected to be needed, but because of the peculiarities of some stats functions (e.g. inconsistent existing nan_policy behavior, inconsistent default axis), it makes sense for each case to get individual attention.

For functions that already have an axis argument (e.g. mannwhitneyu is natively vectorized), the behavior is not changed when all samples are finite. In other cases, the decorator uses np.apply_along_axis and deals with NaNs for each axis-slice as appropriate. We might want to change this to an np.ndenumerate approach in a separate PR, but for now, correctness is more important than speed. As-is, this approach is typically faster than the masked array approach to dealing with NaNs unless the number of axis-slices is high relative to the length of each axis-slice. This suggests that it might make sense to generalize this decorator (slightly) to also handle masked arrays. This may be a path to unifying stats and stats.mstats.
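To make the np.apply_along_axis idea concrete, here is a minimal sketch, not the code in this PR. It assumes a one-sample function that accepts a 1-D array, whereas the real decorator handles multiple samples and result objects. The names axis_nan_policy_sketch and sample_range are hypothetical and exist only for illustration.

```python
import functools

import numpy as np


def axis_nan_policy_sketch(fun):
    """Hypothetical decorator: add `axis` and `nan_policy` to a 1-D function."""

    @functools.wraps(fun)
    def wrapper(x, axis=0, nan_policy="propagate"):
        if nan_policy not in {"propagate", "omit", "raise"}:
            raise ValueError("nan_policy must be 'propagate', 'omit', or 'raise'")
        x = np.asarray(x, dtype=float)
        if axis is None:  # treat the whole array as a single sample
            x = x.ravel()
            axis = 0

        def on_slice(s):
            # NaNs are searched for once per axis-slice, as described above.
            has_nan = np.isnan(s).any()
            if has_nan and nan_policy == "raise":
                raise ValueError("The input contains nan values")
            if has_nan and nan_policy == "propagate":
                return np.nan
            if has_nan:  # nan_policy == "omit"
                s = s[~np.isnan(s)]
            return fun(s)

        return np.apply_along_axis(on_slice, axis, x)

    return wrapper


@axis_nan_policy_sketch
def sample_range(x):
    # Stand-in statistic that only understands clean 1-D input.
    return x.max() - x.min()


data = np.array([[1.0, 2.0, np.nan], [4.0, 6.0, 9.0]])
omit_result = sample_range(data, axis=1, nan_policy="omit")
propagate_result = sample_range(data, axis=1, nan_policy="propagate")
```

With nan_policy="omit" the first row is reduced over [1.0, 2.0] only. With "propagate" the first row yields NaN and the second row is unaffected.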

@mdhaber mdhaber added scipy.stats enhancement A new feature or improvement labels Dec 31, 2020
@mdhaber mdhaber changed the title ENH: stats: vectorize two-sample hypothesis tests ENH: stats: vectorize and add nan_policy to two-sample hypothesis tests Jan 1, 2021
@mdhaber mdhaber closed this Jan 11, 2021
@mdhaber mdhaber reopened this Jan 11, 2021
@mdhaber
Contributor Author

mdhaber commented Jan 11, 2021

Closed-reopened because there were failures downloading openblas across the board! Seems temporary.

Contributor Author

@mdhaber mdhaber left a comment

Some reminders that were lost due to refactoring:

  • Explore the possibility of finding NaNs once for each array rather than once per axis-slice. This would probably require using np.ndenumerate instead of the np.apply_along_axis approach.
  • I had notes in this PR about how to generalize to functions with different numbers of outputs and generalizing to axis tuples; I've preserved those locally.
  • We had discussed that when this is merged I will create a meta-issue about the axis and nan_policy arguments of all stats/mstats functions to keep track of which functions do/don't have these arguments, which use this wrapper, which are natively vectorized, etc.
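As a rough illustration of the first reminder, the sketch below scans for NaNs once over the whole array and then loops over axis-slices explicitly. It is an assumption about how that could look, not code from this PR. It uses np.ndindex rather than np.ndenumerate to iterate over slice indices, it only covers the "omit" case, and the name apply_with_single_nan_scan is hypothetical.

```python
import numpy as np


def apply_with_single_nan_scan(fun, x, axis=0):
    """Hypothetical helper: apply `fun` to each axis-slice, omitting NaNs."""
    # Move the working axis to the end so each slice is x[idx].
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    nan_mask = np.isnan(x)  # a single NaN scan for the whole array
    out = np.empty(x.shape[:-1])
    for idx in np.ndindex(*x.shape[:-1]):
        # Reuse the precomputed mask instead of calling np.isnan per slice.
        out[idx] = fun(x[idx][~nan_mask[idx]])
    return out


data = np.array([[1.0, 2.0, np.nan], [4.0, 6.0, 9.0]])
ranges = apply_with_single_nan_scan(lambda s: s.max() - s.min(), data, axis=1)
```

Whether this is actually faster than the np.apply_along_axis approach would need to be measured; the sketch only shows where the single scan would sit.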

import inspect


def _broadcast_array_shapes_remove_axis(arrays, axis=None):
Contributor Author

These first three functions were in stats.py before, but this new file is the perfect place for them.

# When the two are combined, it can be tricky to get all the behavior just
# right. This file contains utility functions useful for scipy.stats functions
# that support `axis` and `nan_policy`, including a decorator that
# automatically adds `axis` and `nan_policy` arguments to a function.
Contributor Author

We had discussed adding a _decorators.py file or _util.py file. You had suggested a more specific name than _util.py, and I thought that we needed more than one decorator for a _decorators.py. I think a good compromise is a file just for axis and nan_policy utilities, of which we have several now.

default='propagate')


def _axis_nan_policy_factory(result_object, default_axis=0,
Contributor Author

Appearances of _vectorize_hypotest have been changed to _axis_nan_policy as this also adds nan_policy and it is not specific to hypothesis tests. Other related refactorings have been done in the test suite.

return True
return False

def axis_nan_policy_decorator(hypotest_fun_in):
Contributor Author

Didn't rename all appearances of hypotest yet. Currently, this is only applied to hypothesis tests, so it's not wrong.

return var_win


def _broadcast_concatenate(xs, axis):
Contributor Author

Moved to _axis_nan_policy.py.

@mdhaber mdhaber changed the title ENH: stats: vectorize and add nan_policy to hypothesis tests with decorator ENH: stats: vectorize and add nan_policy to functions with decorator Aug 22, 2021
@mdhaber mdhaber changed the title ENH: stats: vectorize and add nan_policy to functions with decorator ENH: stats: add axis and nan_policy parameters to functions with decorator Aug 22, 2021
@mdhaber
Contributor Author

mdhaber commented Aug 26, 2021

@tupui I think this is ready from my side. I've begun to draft a spreadsheet that will help us keep track of applying it to other stats functions, and can open an issue about this (as we discussed above) when this is merged. Thanks for your help here! It's a long road ahead but I think it will have a big impact in making the behavior of scipy.stats (and possibly mstats) more consistent.

Member

@tupui tupui left a comment

LGTM and CI is green, great PR Matt! Let's see how it goes in practice 😃

@tupui tupui added this to the 1.8.0 milestone Aug 27, 2021
@tupui tupui merged commit e742ae1 into scipy:master Aug 27, 2021
@mdhaber
Contributor Author

mdhaber commented Aug 27, 2021

Thanks for the review @tupui! I'm working on gh-14651, the issue to track progress.

@tupui
Member

tupui commented Aug 27, 2021

Mmmm yes about the issue template... I saw a few problems like this one and also for bugs you cannot add a description, just a code block. I wanted to wait a little bit more for the new contributor who updated this to do a follow up. But maybe it's long enough and I should just do it quick.
