ENH: stats: add axis and nan_policy parameters to functions with decorator
#13312
Closed and reopened because there were failures downloading OpenBLAS across the board! Seems temporary.
Some reminders that were lost due to refactoring:
- Explore the possibility of finding NaNs once for each array rather than once per axis-slice. This would probably require using np.ndenumerate instead of the np.apply_along_axis approach.
- I had notes in this PR about how to generalize to functions with different numbers of outputs and to axis tuples; I've preserved those locally.
- We had discussed that when this is merged, I will create a meta-issue about the axis and nan_policy arguments of all stats/mstats functions to keep track of which functions do/don't have these arguments, which use this wrapper, which are natively vectorized, etc.
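A minimal sketch of the first reminder, assuming a single input array and a 1-D statistic func, with NaNs simply omitted: np.isnan runs once over the whole array, one reduction gives a NaN flag per axis-slice, and np.ndenumerate walks the slices. apply_per_slice is a hypothetical name used only for illustration; it is not part of SciPy or of this PR.

```python
import numpy as np


def apply_per_slice(func, x, axis=-1):
    """Hypothetical helper: apply a 1-D statistic `func` to every
    axis-slice of `x`, omitting NaNs in the slices that contain them."""
    # Put the axis being reduced last so each slice is x[idx] for an index
    # tuple `idx` over the remaining dimensions.
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    # Locate NaNs once for the whole array instead of once per slice.
    nan_mask = np.isnan(x)
    has_nan = nan_mask.any(axis=-1)  # one flag per axis-slice
    out = np.empty(x.shape[:-1])
    for idx, contains_nan in np.ndenumerate(has_nan):
        sample = x[idx]
        if contains_nan:  # reuse the precomputed mask for this slice
            sample = sample[~nan_mask[idx]]
        out[idx] = func(sample)
    return out
```

For example, apply_per_slice(np.mean, [[1., 2., np.nan], [3., 4., 5.]]) gives [1.5, 4.0].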
import inspect


def _broadcast_array_shapes_remove_axis(arrays, axis=None):
These first three functions were in stats.py before, but this new file is the perfect place for them.
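The body of _broadcast_array_shapes_remove_axis is not shown here. Judging only from its name and signature, it returns the broadcast shape of the inputs once the axis being reduced is dropped, i.e. the output shape of a test vectorized along that axis. The sketch below is a hypothetical re-creation under that assumption (integer axis only, no axis=None, inputs at least 1-D); broadcast_shapes_remove_axis is an illustrative name, not SciPy's code. It relies on np.broadcast_shapes, available in NumPy 1.20 and later.

```python
import numpy as np


def broadcast_shapes_remove_axis(arrays, axis=0):
    """Hypothetical sketch: broadcast the shapes of `arrays`, ignoring
    their (possibly different) lengths along `axis`, then drop `axis`."""
    ndim = max(np.ndim(a) for a in arrays)
    shapes = []
    for a in arrays:
        # Left-pad with 1s so every shape has the same number of dimensions.
        shape = [1] * (ndim - np.ndim(a)) + list(np.shape(a))
        # Sample sizes along `axis` need not match, so neutralize that axis.
        shape[axis] = 1
        shapes.append(tuple(shape))
    out = list(np.broadcast_shapes(*shapes))
    del out[axis]  # the reduced axis does not appear in the result
    return tuple(out)
```

Under this reading, samples of shapes (3, 5) and (4, 1, 7) reduced along axis=-1 would give results of shape (4, 3).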
# When the two are combined, it can be tricky to get all the behavior just
# right. This file contains utility functions useful for scipy.stats functions
# that support `axis` and `nan_policy`, including a decorator that
# automatically adds `axis` and `nan_policy` arguments to a function.
We had discussed adding a _decorators.py file or a _util.py file. You had suggested a more specific name than _util.py, and I thought we would need more than one decorator to justify a _decorators.py. I think a good compromise is a file just for axis and nan_policy utilities, of which we now have several.
default='propagate')


def _axis_nan_policy_factory(result_object, default_axis=0,
Appearances of _vectorize_hypotest have been changed to _axis_nan_policy because this also adds nan_policy and is not specific to hypothesis tests. Related refactorings have also been made in the test suite.
return True
return False


def axis_nan_policy_decorator(hypotest_fun_in):
I haven't renamed all appearances of hypotest yet. Currently, the decorator is only applied to hypothesis tests, so the name isn't wrong.
return var_win


def _broadcast_concatenate(xs, axis):
Moved to _axis_nan_policy.py.
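Judging from its name and signature, _broadcast_concatenate joins several samples along axis after broadcasting them against each other in all remaining dimensions, which is what a test that pools its samples (for example, to rank them jointly) would need. The sketch below is a hypothetical illustration of that idea, not the function that was moved; broadcast_concatenate is an illustrative name, and for axis other than -1 the inputs are assumed to have the same number of dimensions.

```python
import numpy as np


def broadcast_concatenate(xs, axis):
    """Hypothetical sketch: concatenate arrays along `axis` after
    broadcasting them against each other in every other dimension."""
    # Move the concatenation axis to the end of each array.
    xs = [np.moveaxis(np.asarray(x), axis, -1) for x in xs]
    # Broadcast shape of all dimensions except the last one.
    batch_shape = np.broadcast_shapes(*[x.shape[:-1] for x in xs])
    # Expand each array to that batch shape, keeping its own sample length.
    xs = [np.broadcast_to(x, batch_shape + x.shape[-1:]) for x in xs]
    res = np.concatenate(xs, axis=-1)
    # Move the concatenated axis back to its original position.
    return np.moveaxis(res, -1, axis)
```

For instance, arrays of shapes (2, 3) and (1, 4) concatenated along axis=1 give an array of shape (2, 7).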
@tupui I think this is ready from my side. I've begun drafting a spreadsheet that will help us keep track of applying it to other stats functions, and I can open an issue about this (as we discussed above) when this is merged. Thanks for your help here! It's a long road ahead, but I think it will have a big impact in making the behavior of scipy.stats (and possibly mstats) more consistent.
tupui left a comment
LGTM and CI is green, great PR Matt! Let's see how it goes in practice 😃
Mmm, yes, about the issue template... I saw a few problems like this one, and also, for bugs you cannot add a description, just a code block. I wanted to wait a little longer for the new contributor who updated it to do a follow-up, but maybe it's been long enough and I should just fix it quickly.
Reference issue
gh-9307
Can address the brunnermunzel part of gh-9558
Could be used to address gh-9409
gh-9252
supersedes gh-13223
follow-up PR will supersede gh-13215, gh-12916
gh-12143
What does this implement/fix?
This adds a decorator factory, _axis_nan_policy_factory, that adds axis and nan_policy parameters to functions. Currently, it is applied only to a few closely related hypothesis tests, and some of the code is tailored to hypothesis tests. However, in previous commits, we explored applying it to many other stats functions. Rather than making sweeping changes in this PR, we decided to apply it to other scipy.stats functions in a series of separate PRs. Few modifications to the decorator are expected to be needed, but because of the peculiarities of some stats functions (e.g. inconsistent existing nan_policy behavior, inconsistent default axis, etc.), it makes sense for each case to get individual attention.

For functions that already have an axis argument (e.g. mannwhitneyu, which is natively vectorized), the behavior is unchanged when all samples are finite. In other cases, the decorator uses np.apply_along_axis and handles NaNs in each axis-slice as appropriate. We might switch to an np.ndenumerate approach in a separate PR, but for now, correctness is more important than speed. As-is, this approach is typically faster than the masked-array approach to handling NaNs unless the number of axis-slices is high relative to the length of each axis-slice. This suggests that it might make sense to generalize this decorator (slightly) to also handle masked arrays. This may be a path to unifying stats and stats.mstats.
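The general idea (a decorator factory that adds axis and nan_policy, then handles NaNs per axis-slice with np.apply_along_axis) can be sketched as below. This is a heavily simplified, hypothetical illustration, not the code in this PR: axis_nan_policy_factory and range_stat are made-up names, and the sketch supports only one-sample functions returning a single scalar, whereas _axis_nan_policy_factory takes a result_object argument and handles multi-sample tests such as mannwhitneyu. Returning NaN for a slice left empty by 'omit' is also an assumption of the sketch.

```python
import functools

import numpy as np


def axis_nan_policy_factory(default_axis=0):
    """Hypothetical factory: add `axis` and `nan_policy` keyword arguments
    to a function `fun(sample, ...)` written for 1-D input."""
    def decorator(fun):
        @functools.wraps(fun)
        def wrapper(x, *args, axis=default_axis, nan_policy='propagate',
                    **kwargs):
            if nan_policy not in {'propagate', 'omit', 'raise'}:
                raise ValueError(f"invalid nan_policy {nan_policy!r}")
            x = np.asarray(x, dtype=float)
            if axis is None:  # axis=None: treat the input as one flat sample
                x, axis = x.ravel(), 0
            if nan_policy == 'raise' and np.isnan(x).any():
                raise ValueError("The input contains nan values")

            def one_slice(sample):
                # Handle NaNs separately within each axis-slice.
                if np.isnan(sample).any():
                    if nan_policy == 'propagate':
                        return np.nan
                    sample = sample[~np.isnan(sample)]  # 'omit'
                    if sample.size == 0:
                        return np.nan
                return fun(sample, *args, **kwargs)

            return np.apply_along_axis(one_slice, axis, x)
        return wrapper
    return decorator


@axis_nan_policy_factory(default_axis=0)
def range_stat(sample):
    """Toy 1-D statistic used only to exercise the decorator."""
    return np.max(sample) - np.min(sample)
```

With x = [[1, 5], [4, nan], [2, 3]], range_stat(x) returns [3, nan] under the default 'propagate' policy, and range_stat(x, nan_policy='omit') returns [3, 2]. np.apply_along_axis keeps the per-slice logic simple at the cost of a Python-level call per slice, which matches the correctness-before-speed trade-off described above.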