-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: Adding __array_ufunc__ capability to MaskedArrays #21977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls.
This test makes sure that a MaskedArray defers properly to another class if it doesn't know how to handle it. See numpy#15200.
Just to reiterate my subjective summary of the comments from the original PR: In general, we should probably aim to split masked arrays out of numpy into a separate library. That would require a lot of work, so in the mean time this PR moves in the direction of localizing masked array operations into one area, instead of needing to have lots of |
Thanks @greglucas |
SciPy ran into an issue, and I suspect it is due to this: scipy/scipy#16610 The code sample is this failing, which was not failing before:
|
Adapted from the problem reported at numpy#21977 (comment)
Fixes the problem reported at numpy#21977 (comment) The reduce method here effectively calls itself with an unmasked MaskedArray (mask=nomask) and then expects either a MaskedArray or a scalar. This change ensures that an ordinary ndarray is converted to a MaskedArray, following the pattern already used in mean and var in this module.
Unfortunately, I'm on vacation again and won't have much access to a computer this next week. I don't think this will be too hard to fix though, and still think it would be better to improve this than revert... At first glance this might be something as simple as I agree with your assessment in the other thread that a few more eyes would be valuable on this. My guess is there might be some other downstream packages that this hits as well and I may have missed some corner cases. I'm pretty sure Matplotlib is hitting this same issue too in that weekly job testing Numpy main: matplotlib/matplotlib#23431 |
I have proposed a fix for the |
Might have broken |
If this is a problem (and we can confirm its this PR), we could even revert it (again). For this larger change, it really makes sense to require checking against astropy first, since that is one of the largest users to begin with. I like having |
We are using Lines 3152 to 3156 in fe2bb38
|
@rcomer I think we might need the opposite and allow everything. That is I think the design here may be wrong, and what
So in that sense, I am not sure the design is quite right. |
The weird thing about that is, that there should be an inherent order to things. If Quantity also allows everything, then So maybe that is not right, and since @rcomer so I think you may be right, and this should just be a strict type check for (Yes, we could auto-wrap more than just ndarray, but I guess we have to make that a problem of downstream, and I doubt that it matters in practice.) |
outputs = kwargs.pop('out', ()) | ||
if not isinstance(outputs, tuple): | ||
outputs = (outputs,) | ||
outputs += inputs[np_ufunc.nin:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also not necessary, all outputs are normalized into out
by the ufunc machinery here.
Just to quickly note that the failures in astropy are with our On a more general note, since the way That said, in my own |
else: | ||
# The masked power function doesn't support extra args | ||
if np_ufunc.__name__ in ('power'): | ||
kwargs = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be an error instead?
@@ -3292,7 +3384,7 @@ def _scalar_heuristic(arr, elem): | |||
return dout | |||
else: | |||
# Force dout to MA | |||
dout = dout.view(type(self)) | |||
dout = MaskedArray(dout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks getting items on subclasses:
from astropy.table import MaskedColumn
mc = MaskedColumn([1., 2.], mask=[True, False])
mc[:1]
Out[4]:
masked_BaseColumn(data=[--],
mask=[ True],
fill_value=1e+20,
dtype=float64)
This should return a MaskedColumn
...
There are also some real regressions: e.g.,
while in previous versions this correctly gave
|
@@ -4200,7 +4314,7 @@ def __add__(self, other): | |||
""" | |||
if self._delegate_binop(other): | |||
return NotImplemented | |||
return add(self, other) | |||
return np.add(self, other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for all these, one can now safely remove the overrides, since __array_ufunc__
will take care.
# prevent infinite recursion. | ||
|
||
# Make ndarray views of the input arguments | ||
args = [getdata(input) if isinstance(input, MaskedArray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One can prevent infinite recursion more easily by just calling the ndarray.__array_ufunc__
, i.e.,
return super().__array_ufunc__(np_ufunc, method, *inputs, **kwargs)
# Determine what class types we are compatible with and return | ||
# NotImplemented if we don't know how to handle them | ||
for arg in args + outputs: | ||
if not isinstance(arg, (ndarray, np.bool_, Number, list, str)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For outputs
, one should be stricter: they need to be ndarray
- indeed, for most functions, they should be MaskedArray
, otherwise the mask cannot be stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one could be a bit less strict on the inputs - after all, the unwrapped call will do its own checks anyway.
I put some in-line comments; I think this may need a bit more thought to get out the wrinkles. Not sure whether it is best to have a PR with fixes or revert and start with a new PR. |
I am tempted to revert (unfortunately), unless @greglucas has bandwidth to look into it soon. |
I agree. I did think the numpy test suite was more comprehensive, but it seems we have a way to go. |
Unfortunately I'm on vacation away from a computer and won't be able to pick it up for a week or so at least and even then it will take me a bit to get back up to speed since the majority of the work was done a couple years ago. Reverting is probably the right move, and I appreciate that the dependent libraries are testing main and can hopefully help identify some good test cases for these updates when I go digging through those failure logs. Thanks for the ping and helpful comments so far @seberg and @mhvk. Probably good for a two review PR on the next one as well just to ensure we've had enough eyes on it. |
Fixes the problem reported at numpy#21977 (comment) The reduce method here effectively calls itself with an unmasked MaskedArray (mask=nomask) and then expects either a MaskedArray or a scalar. This change ensures that an ordinary ndarray is converted to a MaskedArray, following the pattern already used in mean and var in this module.
Adapted from the problem reported at numpy#21977 (comment)
Redo of: #16022 which was merged and then reverted due to a failing docstring test with flattened masks. Thanks to @rcomer for adding the final commit that fixed this case. That PR has a lot of discussion in it for background too.
This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls directly.
Example
Even though data values in
a
are masked, they are still acted upon and evaluated. Changing that underlying data. However, calling the masked less version doesn't do anything with the data in the array. Note that this is a big change but to a somewhat ambiguous case of what happens to masked values under function evaluation.The real reason I began looking into this was to not evaluate the function in the masked values because I'd already premasked the condition that would throw warnings at me. See #4959 for evaluating log10 at locations less than 0 that were under the mask, so still threw warnings.
Implementation
I haven't added any documentation yet because I wanted to check and see if this was even the right way of approaching this idea. I struggled to figure out how to pass the
out=
object pointers around in a consistent manner, so I am definitely interested in suggestions on improving that. I had to update quite a few seemingly random areas of the Masked codebase to make that work everywhere. There are a few tests that I had to update because values under the mask were being compared, and those have now changed with this implementation applying the masked version of ufuncs everywhere.This is a pretty major (underlying) API change, so I expect there to be quite a bit of discussion on the best approaches here.
Linked Issues
Fixes #4959
Closes #15200