
ENH: stats: consistent nan_policy, axis, masked array, and dtype support #14651

@mdhaber

Description


gh-13312 introduced a decorator that makes it easy to add nan_policy and axis parameters to almost all reduction stats functions (those that "consume" samples along an axis), that is, almost all "summary statistics", "statistical tests", and, if we want, "correlation functions". If we are happy with the decorator, with minor modification it could also handle masked arrays by manually eliminating masked elements as each axis-slice is processed, which in many cases would be faster than the existing mstats implementations. (Done.)
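The per-slice masked-array handling can be pictured with a minimal sketch, assuming a reduction func that maps a 1-D array to a scalar. This is not SciPy's implementation, and the helper name reduce_masked is hypothetical: the sketch moves the reduction axis to the end, drops the masked elements of each axis-slice, and calls func on what remains.

```python
import numpy as np

def reduce_masked(func, x, axis=0):
    """Hypothetical helper: apply a 1-D reduction `func` to each axis-slice
    of a masked array after removing that slice's masked elements."""
    x = np.ma.asarray(x)
    # Move the reduction axis last so that each row is one axis-slice.
    data = np.moveaxis(np.ma.getdata(x), axis, -1)
    mask = np.moveaxis(np.ma.getmaskarray(x), axis, -1)
    out_shape = data.shape[:-1]
    n_slices = int(np.prod(out_shape))
    rows = data.reshape(n_slices, data.shape[-1])
    row_masks = mask.reshape(n_slices, mask.shape[-1])
    # Keep only the unmasked elements of each slice before reducing it.
    results = [func(row[~m]) for row, m in zip(rows, row_masks)]
    return np.asarray(results).reshape(out_shape)
```

For example, with the last element of the first row masked, reduce_masked(np.mean, x, axis=1) averages only the two unmasked values in that row, matching np.ma.mean for this case.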

The purpose of this issue is to track progress toward a (much) more consistent axis/nan_policy experience throughout scipy.stats and toward adding masked array support to scipy.stats functions. Consistent handling of zero-length slices and empty input is also in scope.

This issue is geared toward reduction functions. When those are in better shape, perhaps this can be used to track other functions (e.g. see gh-8669).

For now, I'll just link to a spreadsheet I started that summarizes the status of nan_policy and axis support for stats functions.
https://docs.google.com/spreadsheets/d/1yBhu3Ihy9_xhDh5N9GgoNVyFbUHKWvt_lCuHm5BZ06I/edit?usp=sharing

Should this be converted to markdown? Should I allow anyone to edit the spreadsheet? Or can you think of a better way to track this, @tupui? (Agreed to stick with the spreadsheet for now.)

Other improvements we discussed for the _axis_nan_policy decorator:

Related issues/PRs:
gh-2178
gh-2324
gh-4086
gh-5432
gh-5474 (would be irrelevant if all stats functions accept masked arrays)
gh-6416
gh-6551
gh-6654
gh-7178
gh-7342
gh-9307
gh-9558
gh-9409
gh-9252
gh-11790
gh-11409
gh-11355
gh-12143
gh-12241
gh-12548
gh-12916
gh-13223
gh-13215
gh-13844
gh-13900
gh-14421
gh-14651
gh-14725
gh-15375
gh-15630
gh-15660
gh-17154
gh-17288
gh-19039

Other information:
The decorator does some introspection to check whether axis and nan_policy are already parameters. Whatever is already accepted, it continues to accept as before (positionally or by keyword, however it is currently allowed). For whatever is not already accepted, the decorator adds a keyword-only argument and updates the documentation.
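The introspection step can be sketched with the standard-library inspect module. This is a minimal illustration under stated assumptions, not the _axis_nan_policy code itself: the decorator name add_keyword_only_params, the default of None for added parameters, and the one-line docstring note are all hypothetical. Parameters the function already has are left untouched, so they keep working positionally or by keyword; missing ones are appended as keyword-only parameters, placed before any **kwargs as Python requires.

```python
import functools
import inspect

def add_keyword_only_params(names):
    """Hypothetical decorator sketch: expose each parameter in `names` as a
    keyword-only argument (default None) unless `func` already accepts it."""
    def decorator(func):
        sig = inspect.signature(func)
        added = [n for n in names if n not in sig.parameters]
        params = list(sig.parameters.values())
        new = [inspect.Parameter(n, inspect.Parameter.KEYWORD_ONLY, default=None)
               for n in added]
        # Keyword-only parameters must come before a **kwargs parameter, if any.
        if params and params[-1].kind is inspect.Parameter.VAR_KEYWORD:
            params = params[:-1] + new + params[-1:]
        else:
            params = params + new

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Strip the added arguments before delegating; existing parameters
            # pass through unchanged. A full implementation would act on the
            # stripped values (e.g. NaN handling, looping over axis-slices).
            for n in added:
                kwargs.pop(n, None)
            return func(*args, **kwargs)

        # Advertise the extended signature and note the additions in the docs.
        wrapper.__signature__ = sig.replace(parameters=params)
        wrapper.__doc__ = (func.__doc__ or "") + "".join(
            f"\n{n} : keyword-only, added by decorator" for n in added)
        return wrapper
    return decorator
```

Applied to a function that already has axis, only nan_policy is added, and inspect.signature reports it as keyword-only while axis stays positional-or-keyword.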

If axis is already a parameter, the decorator uses the function's existing behavior when there are no NaNs; that is, it takes advantage of the function's native vectorization for efficiency when possible. If there are NaNs, or if axis is not already a parameter, it uses np.apply_along_axis to loop over the axis-slices. It overrides any existing nan_policy behavior, manually removing NaNs from each axis-slice instead of using masked arrays.
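The dispatch described above can be sketched as a plain function. This is a hypothetical illustration, not SciPy's code: the name reduce_with_nan_policy and the native_axis flag (standing in for the introspection result) are assumptions, and func is assumed to reduce a 1-D array to a scalar and, when native_axis is True, to also accept an axis keyword. The fast path uses the function's own vectorization; otherwise np.apply_along_axis visits each axis-slice, and NaNs are handled per slice.

```python
import numpy as np

def reduce_with_nan_policy(func, x, axis=0, nan_policy="propagate",
                           native_axis=True):
    """Hypothetical sketch of the NaN/axis dispatch (not SciPy's code)."""
    if nan_policy not in ("propagate", "omit", "raise"):
        raise ValueError(f"invalid nan_policy: {nan_policy!r}")
    x = np.asarray(x, dtype=float)
    contains_nan = bool(np.isnan(x).any())

    # Fast path: no NaNs and the function is already vectorized over `axis`.
    if native_axis and not contains_nan:
        return func(x, axis=axis)

    if contains_nan and nan_policy == "raise":
        raise ValueError("The input contains nan values")

    def one_slice(s):
        nans = np.isnan(s)
        if nans.any():
            if nan_policy == "propagate":
                return np.nan
            s = s[~nans]  # "omit": drop NaNs from this slice only
        return func(s)

    # Slow path: loop over the axis-slices one at a time.
    return np.apply_along_axis(one_slice, axis, x)
```

Checking for NaNs once up front keeps the common NaN-free case on the vectorized path, so the per-slice Python loop is paid only when it is actually needed.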
