ENH: Adding __array_ufunc__ capability to MaskedArrays (again) #22914
This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls.
Fixes the problem reported at numpy#21977 (comment). The reduce method here effectively calls itself with an unmasked MaskedArray (mask=nomask) and then expects either a MaskedArray or a scalar. This change ensures that an ordinary ndarray is converted to a MaskedArray, following the pattern already used in mean and var in this module.
Adapted from the problem reported at numpy#21977 (comment)
Now we are calling np.power() in std() which goes through the ufunc machinery, so we don't want to pass any additional unsafe casting kwargs that aren't allowed within the masked implementation.
Move the np-ufunc check to the top of the routine so we immediately go to super() when necessary. Before we were returning NotImplemented if an arg wasn't able to be handled. Update the arg instance check to defer for everything but another class that has implemented __array_ufunc__
This allows for subclasses to be handled correctly
This is handled in the C code now within the ufunc machinery.
1bf272f to 366dfc3
@greglucas - the |
Actually, I quickly tried to implement
The reason is that if the function is not defined in |
p.s. Might want to check test completeness by checking all functions in |
    if ma_ufunc is np_ufunc:
        # We didn't have a Masked version of the ufunc, so we need to
        # call the ndarray version to prevent infinite recursion.
        return super().__array_ufunc__(np_ufunc, method, *inputs, **kwargs)
You will need to unwrap the data arrays here and then call things again. On the plus side, it may actually make sense to always just do that.
Also, you should not use super() here, but rather call the original ufunc I think, because after unwrapping it could be that you need to dispatch to a different argument.
There is the small change that the ufuncs behave a bit different from the np.ma.ufunc version (and maybe even better?): see gh-22347.
So there might even be a point of just ignoring what np.ma.func does.
After the fallout from the "simple" gh-22046, I am a bit hesitant about changing these subtleties. We probably need to test very carefully, and I am wondering if we should push it off for a "2.0" (which I hope to bring up now, for a release in about a year). In that case, a possible approach could be to hide these definitions behind an environment variable (feature flag).
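To make the "unwrap and call the original ufunc again" suggestion above concrete, here is a minimal sketch. It assumes a hypothetical `ToyMasked` container (a plain data-plus-mask wrapper invented for illustration, not `np.ma.MaskedArray` and not the code in this PR) and only handles plain calls of single-output ufuncs. Note that, unlike the `np.ma` functions, it still evaluates the ufunc on data under the mask.

```python
import numpy as np


class ToyMasked:
    """Hypothetical data+mask container for illustration; not np.ma.MaskedArray."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.broadcast_to(np.asarray(mask, dtype=bool), self.data.shape)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Keep the sketch small: only plain calls of single-output ufuncs.
        if method != "__call__" or ufunc.nout != 1 or "out" in kwargs:
            return NotImplemented
        # Unwrap every ToyMasked input into its plain data array.
        unwrapped = [x.data if isinstance(x, ToyMasked) else x for x in inputs]
        # Call the ufunc itself again (not super()), so any remaining
        # argument that defines __array_ufunc__ can still take over.
        result = ufunc(*unwrapped, **kwargs)
        if not isinstance(result, np.ndarray):
            return result  # scalar or another type's result: pass it through
        # Combine the input masks and re-wrap the result.
        mask = np.zeros(result.shape, dtype=bool)
        for x in inputs:
            if isinstance(x, ToyMasked):
                mask = mask | x.mask
        return ToyMasked(result, mask)


a = ToyMasked([1.0, 2.0, 3.0], [False, True, False])
r = np.add(a, 1.0)
print(r.data, r.mask)
```

Because the ufunc is called again on the unwrapped inputs, NumPy's normal override lookup runs a second time, which is the behaviour `super().__array_ufunc__` would skip.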
Maybe worth pulling @seberg's comment in the main thread: a question is whether the regular numpy functions should just call the But probably it is easier to just add the few missing ufunc to |
Thanks for pointing me to that function! Somewhat surprisingly, there are actually 30 ufuncs that have no implementation in

I went digging through some of the old mailing list archives that I wasn't aware of and came across this really nice discussion:

My goal for this was to have a bit of a stepping stone before going all-in on a new class. However, I am now possibly seeing that this might just be playing a game of whack-a-mole that I'm sure will change some behavior and upset someone, so maybe we should just do a hard breaking change all at once instead of introducing these small papercuts (i.e. work on the better implementation for a 2.0 release rather than adding these now).

I think one of the things missing from many of the discussions is that many downstream libraries make use of

My desires from a new class:
I haven't seen anyone argue for this yet, and this looks like it goes way back in git blame.
It would be annoying to get divide by zero warnings on your computations when you've already pre-masked the data (see my example at the top of the PR for the motivation here!)
It is quite surprising right now to do
Should fall out of (3) if implemented properly with NotImplemented being returned. I'll try and take a better look at @ahaldane's implementation in nd_duck_array and @mhvk's implementation in Astropy. I think right now we are missing having a clear path between where we are now and where we want to go, so something a little more concrete with actions of what is needed might be nice. I'll try and give it some thought. |
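For anyone who wants to reproduce the "ufuncs with no implementation" count mentioned above, a rough sketch follows. It is a name-based check of my own (not necessarily the function referred to earlier in the thread): it lists every ufunc exposed in the top-level `numpy` namespace that has no attribute of the same name in `np.ma`. The exact number depends on the NumPy version.

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # Some attribute lookups on the numpy namespace may emit deprecation warnings.
    warnings.simplefilter("ignore")
    np_ufuncs = sorted(
        name for name in dir(np)
        if isinstance(getattr(np, name, None), np.ufunc)
    )

# Ufuncs for which np.ma exposes nothing under the same name.
missing = [name for name in np_ufuncs if not hasattr(np.ma, name)]
print(len(missing), missing)
```

A same-named attribute in `np.ma` does not prove that it is a masked-aware implementation, so treat the list as a starting point for test coverage rather than a definitive inventory.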
Thanks, that's a great summary! There is only one item I'd add: think carefully how to deal with underlying Overall, it certainly is not crazy to move in small steps! I think a bigger question here is whether when wrapping regular numpy |
Any news here? |
This is a rebase and update of several previous PRs: #21977 and #16022, which were both reverted due to issues downstream. I've kept a bunch of smaller commits showing the updates to comments to try and make it easier for review.
I've tested with Astropy and there are still two failing tests over there, which I don't quite understand yet, so I'm hoping a review might be able to point out some hints for what I'm missing. It looks like we do enter MaskedColumn.__array_wrap__, but with a context=None, so that wrap isn't undoing anything because it doesn't know which function it came from... I'm not sure if this means the update needs to be handled in numpy or astropy.

cc @rcomer, @seberg, @mhvk since you've all had some great comments and help on the previous PRs!
Summary
This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls directly.
Example
Even though data values in a are masked, they are still acted upon and evaluated, which changes the underlying data. However, calling the masked less version doesn't do anything with the data in the array. Note that this is a big change, but to a somewhat ambiguous case: what happens to masked values under function evaluation.

The real reason I began looking into this was to avoid evaluating the function at the masked values, because I'd already pre-masked the condition that would throw warnings at me. See #4959, where evaluating log10 at locations less than 0 that were under the mask still threw warnings.
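As a small sketch of the log10 motivation (the array values are my own illustration, not taken from the PR): `np.ma.log10` applies its domain check and suppresses floating-point warnings for the pre-masked negative entry, which is the behaviour this PR aims to get from a plain `np.log10` call as well.

```python
import warnings

import numpy as np

# The negative entry is already masked before taking the logarithm.
a = np.ma.array([-1.0, 1.0, 10.0], mask=[True, False, False])

with warnings.catch_warnings():
    # Turn any warning into an exception to show that none is emitted.
    warnings.simplefilter("error")
    result = np.ma.log10(a)

print(result)
print(result.mask)
```

Whether `np.log10(a)` warns for the same input depends on whether the NumPy version in use dispatches to the masked implementation, so that call is deliberately left out of the sketch.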
Implementation
I haven't added any documentation yet because I wanted to check and see if this was even the right way of approaching this idea. I struggled to figure out how to pass the out= object pointers around in a consistent manner, so I am definitely interested in suggestions on improving that. I had to update quite a few seemingly random areas of the Masked codebase to make that work everywhere. There are a few tests that I had to update because values under the mask were being compared, and those have now changed with this implementation applying the masked version of ufuncs everywhere.

This is a pretty major (underlying) API change, so I expect there to be quite a bit of discussion on the best approaches here.
Linked Issues
Fixes #4959
Closes #15200