ENH: Adding __array_ufunc__ capability to MaskedArrays #21977

greglucas · 2022-07-13T17:17:43Z

Redo of: #16022 which was merged and then reverted due to a failing docstring test with flattened masks. Thanks to @rcomer for adding the final commit that fixed this case. That PR has a lot of discussion in it for background too.

This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls directly.

Example

import numpy as np
a = np.ma.array(np.arange(10), [0, 1] * 5)
x = a < 10
# x.data == [True,  True,  True,  True,  True, False, False, False, False, False]
y = np.ma.less(a, 10)
# y.data == [True,  True,  True,  True,  True,  True, False,  True, False, True]

Even though data values in a are masked, they are still acted upon and evaluated. Changing that underlying data. However, calling the masked less version doesn't do anything with the data in the array. Note that this is a big change but to a somewhat ambiguous case of what happens to masked values under function evaluation.

The real reason I began looking into this was to not evaluate the function in the masked values because I'd already premasked the condition that would throw warnings at me. See #4959 for evaluating log10 at locations less than 0 that were under the mask, so still threw warnings.

import numpy as np
A = arange(-2, 5)/2
Am = numpy.ma.masked_less_equal(A, 0)
np.log10(Am) # Previously threw a warning, now works as expected
np.ma.log10(Am) # No warnings before or after

Implementation

I haven't added any documentation yet because I wanted to check and see if this was even the right way of approaching this idea. I struggled to figure out how to pass the out= object pointers around in a consistent manner, so I am definitely interested in suggestions on improving that. I had to update quite a few seemingly random areas of the Masked codebase to make that work everywhere. There are a few tests that I had to update because values under the mask were being compared, and those have now changed with this implementation applying the masked version of ufuncs everywhere.

This is a pretty major (underlying) API change, so I expect there to be quite a bit of discussion on the best approaches here.

Linked Issues

Fixes #4959
Closes #15200

This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls.

This test makes sure that a MaskedArray defers properly to another class if it doesn't know how to handle it. See numpy#15200.

mattip · 2022-07-13T18:02:40Z

Just to reiterate my subjective summary of the comments from the original PR:

In general, we should probably aim to split masked arrays out of numpy into a separate library. That would require a lot of work, so in the mean time this PR moves in the direction of localizing masked array operations into one area, instead of needing to have lots of if is_masked_array(arg) scattered across NumPy, so it is a GoodThing.

mattip · 2022-07-14T12:32:01Z

Thanks @greglucas

seberg · 2022-07-16T04:00:18Z

SciPy ran into an issue, and I suspect it is due to this: scipy/scipy#16610

The code sample is this failing, which was not failing before:

import numpy as np
data = [[0, 1, 2, 3, 4, 9], [5, 5, 0, 9, 3, 3]]
mask = [[0, 0, 0, 0, 0, 1], [0, 0, 1, 1, 0, 0]]
a = np.ma.masked_array(data, mask=mask)
np.ma.minimum.reduce(a, axis=1)

Adapted from the problem reported at numpy#21977 (comment)

Fixes the problem reported at numpy#21977 (comment) The reduce method here effectively calls itself with an unmasked MaskedArray (mask=nomask) and then expects either a MaskedArray or a scalar. This change ensures that an ordinary ndarray is converted to a MaskedArray, following the pattern already used in mean and var in this module.

greglucas · 2022-07-16T14:38:47Z

Unfortunately, I'm on vacation again and won't have much access to a computer this next week. I don't think this will be too hard to fix though, and still think it would be better to improve this than revert... At first glance this might be something as simple as elif m is masked instead of just the elif m check. Someone else should feel free to take this on if they have time!

I agree with your assessment in the other thread that a few more eyes would be valuable on this. My guess is there might be some other downstream packages that this hits as well and I may have missed some corner cases.

I'm pretty sure Matplotlib is hitting this same issue too in that weekly job testing Numpy main: matplotlib/matplotlib#23431

rcomer · 2022-07-16T16:16:09Z

I have proposed a fix for the minimum.reduce issue at #21993.

pllim · 2022-07-18T18:57:25Z

Might have broken astropy as well, see astropy/astropy#13464 (cc @mhvk )

seberg · 2022-07-18T19:51:52Z

If this is a problem (and we can confirm its this PR), we could even revert it (again). For this larger change, it really makes sense to require checking against astropy first, since that is one of the largest users to begin with.

I like having __array_ufunc__, but unfortunately, I am not surprised that it is tricky...

rcomer · 2022-07-18T20:21:46Z

We are using isinstance to check which types the MaskedArray‘s ufunc can process. Do we need a stricter check that would not include subclasses?

numpy/numpy/ma/core.py

Lines 3152 to 3156 in fe2bb38

    
           # Determine what class types we are compatible with and return 
        
           # NotImplemented if we don't know how to handle them 
        
           for arg in args + outputs: 
        
               if not isinstance(arg, (ndarray, np.bool_, Number, list, str)): 
        
                   return NotImplemented

seberg · 2022-07-18T20:24:54Z

@rcomer I think we might need the opposite and allow everything.

That is I think the design here may be wrong, and what MaskedArray should be doing is more simiarl to what astropy.units.Quantity is doing:

Allow everything
Unpack all masked arrays and apply whatever logic is necessary (mutated where or keep track of the new mask)
Call the ufunc again after unpacking all masked arrays, so that the next call will not recruse (and needs no information about masked arrays at all).
Wrap the result into a masked array again and return

So in that sense, I am not sure the design is quite right.

seberg · 2022-07-18T21:23:26Z

The weird thing about that is, that there should be an inherent order to things. If Quantity also allows everything, then quantity + masked could return a masked(quantity) or a quantity(masked) depending on the order of operations.

So maybe that is not right, and since maskedArray is in NumPy it is not allowed to wrap anything.

@rcomer so I think you may be right, and this should just be a strict type check for ndarray. Rather than checking for Number, you should also check for not hasattr("__array_ufunc__"), probably (or something like that).
(EDIT: Do we even forward those who do not implement __array_ufunc__? Because in that case, maybe allowing those ins not necessary to begin with?)

(Yes, we could auto-wrap more than just ndarray, but I guess we have to make that a problem of downstream, and I doubt that it matters in practice.)

seberg · 2022-07-18T21:25:05Z

numpy/ma/core.py

+        outputs = kwargs.pop('out', ())
+        if not isinstance(outputs, tuple):
+            outputs = (outputs,)
+        outputs += inputs[np_ufunc.nin:]


This is also not necessary, all outputs are normalized into out by the ufunc machinery here.

mhvk · 2022-07-19T13:22:39Z

Just to quickly note that the failures in astropy are with our MaskedColumn class, not Quantity; from a first guess, it looks like __array_finalize__ may need to be called.

On a more general note, since the way MaskedArray is designed, it really wraps a subclass, it seems logical to pursue the idea that it can act independent of subclass (in fact, this was argued in the subclassing section of the __array_ufunc__ documentation), i.e., unpack, re-call, and repack the result.

That said, in my own Masked implementation, I ended up going the route of trying to keep the mask as close to the array as possible, since that turned out to be much easier to wrap inside Quantity (leaving the quantity-specific methods exposed, unlike in a MaskedArray or a quantity).

mhvk · 2022-07-19T13:28:38Z

numpy/ma/core.py

+        else:
+            # The masked power function doesn't support extra args
+            if np_ufunc.__name__ in ('power'):
+                kwargs = {}


Shouldn't this be an error instead?

mhvk · 2022-07-19T15:44:16Z

numpy/ma/core.py

@@ -3292,7 +3384,7 @@ def _scalar_heuristic(arr, elem):
                    return dout
        else:
            # Force dout to MA
-            dout = dout.view(type(self))
+            dout = MaskedArray(dout)


This breaks getting items on subclasses:

from astropy.table import MaskedColumn mc = MaskedColumn([1., 2.], mask=[True, False]) mc[:1] Out[4]: masked_BaseColumn(data=[--], mask=[ True], fill_value=1e+20, dtype=float64)

This should return a MaskedColumn...

mhvk · 2022-07-19T15:48:39Z

There are also some real regressions: e.g.,


In [8]: ma = np.ma.MaskedArray([1., 2.], mask=[True, False])

In [9]: np.sign(ma)
Out[9]: array([1., 1.])

while in previous versions this correctly gave

masked_array(data=[--, 1.0],
             mask=[ True, False],
       fill_value=1e+20)

mhvk · 2022-07-19T15:49:44Z

numpy/ma/core.py

@@ -4200,7 +4314,7 @@ def __add__(self, other):
        """
        if self._delegate_binop(other):
            return NotImplemented
-        return add(self, other)
+        return np.add(self, other)


I think that for all these, one can now safely remove the overrides, since __array_ufunc__ will take care.

mhvk · 2022-07-19T15:52:00Z

numpy/ma/core.py

+            # prevent infinite recursion.
+
+            # Make ndarray views of the input arguments
+            args = [getdata(input) if isinstance(input, MaskedArray)


One can prevent infinite recursion more easily by just calling the ndarray.__array_ufunc__, i.e.,

return super().__array_ufunc__(np_ufunc, method, *inputs, **kwargs)

mhvk · 2022-07-19T15:52:46Z

numpy/ma/core.py

+        # Determine what class types we are compatible with and return
+        # NotImplemented if we don't know how to handle them
+        for arg in args + outputs:
+            if not isinstance(arg, (ndarray, np.bool_, Number, list, str)):


For outputs, one should be stricter: they need to be ndarray - indeed, for most functions, they should be MaskedArray, otherwise the mask cannot be stored.

I think one could be a bit less strict on the inputs - after all, the unwrapped call will do its own checks anyway.

mhvk · 2022-07-19T15:54:40Z

I put some in-line comments; I think this may need a bit more thought to get out the wrinkles. Not sure whether it is best to have a PR with fixes or revert and start with a new PR.

seberg · 2022-07-19T19:10:56Z

I am tempted to revert (unfortunately), unless @greglucas has bandwidth to look into it soon.

mattip · 2022-07-19T20:01:49Z

I agree. I did think the numpy test suite was more comprehensive, but it seems we have a way to go.

greglucas · 2022-07-19T22:46:54Z

Unfortunately I'm on vacation away from a computer and won't be able to pick it up for a week or so at least and even then it will take me a bit to get back up to speed since the majority of the work was done a couple years ago. Reverting is probably the right move, and I appreciate that the dependent libraries are testing main and can hopefully help identify some good test cases for these updates when I go digging through those failure logs. Thanks for the ping and helpful comments so far @seberg and @mhvk. Probably good for a two review PR on the next one as well just to ensure we've had enough eyes on it.

Fixes the problem reported at numpy#21977 (comment) The reduce method here effectively calls itself with an unmasked MaskedArray (mask=nomask) and then expects either a MaskedArray or a scalar. This change ensures that an ordinary ndarray is converted to a MaskedArray, following the pattern already used in mean and var in this module.

Adapted from the problem reported at numpy#21977 (comment)

ENH: Adding __array_ufunc__ capability to MaskedArrays.

8cd6f4c

This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls.

github-actions bot added the 01 - Enhancement label Jul 13, 2022

greglucas and others added 3 commits July 13, 2022 11:39

TST: Adding a test that MaskedArrays respect ufunc deferral heirarchy

a8b0893

This test makes sure that a MaskedArray defers properly to another class if it doesn't know how to handle it. See numpy#15200.

DOC: Adding improvement note for MaskedArray ufunc

fe8b171

nomask in nomask out

db1a98b

greglucas force-pushed the masked_ufunc branch from 8b19753 to db1a98b Compare July 13, 2022 17:42

mattip merged commit cb28cd1 into numpy:main Jul 14, 2022

rcomer mentioned this pull request Jul 14, 2022

Cube arithmetic array type handling SciTools/iris#3790

Merged

greglucas deleted the masked_ufunc branch July 15, 2022 15:49

seberg mentioned this pull request Jul 16, 2022

TST: stats.mstats.ttests: fix mistakes in tests scipy/scipy#16610

Closed

rcomer added a commit to rcomer/numpy that referenced this pull request Jul 16, 2022

TST: add a test for ma.minimum.reduce with axis keyword

e2efced

Adapted from the problem reported at numpy#21977 (comment)

rcomer mentioned this pull request Jul 16, 2022

BUG: fix ma.minimum.reduce with axis keyword #21993

Merged

mdhaber mentioned this pull request Jul 17, 2022

BUG: SciPy test failures due to changes in NumPy main or Python 3.11 #22006

Closed

seberg mentioned this pull request Jul 18, 2022

BUG: Fix masked median bug #21999

Merged

pllim mentioned this pull request Jul 18, 2022

TST: 191 failures with numpy-dev astropy/astropy#13464

Closed

seberg reviewed Jul 18, 2022

View reviewed changes

mhvk reviewed Jul 19, 2022

View reviewed changes

mattip mentioned this pull request Jul 20, 2022

BUG: Revert using __array_ufunc__ for MaskedArray #22016

Merged

greglucas pushed a commit to greglucas/numpy that referenced this pull request Jan 3, 2023

TST: add a test for ma.minimum.reduce with axis keyword

997d27d

Adapted from the problem reported at numpy#21977 (comment)

greglucas mentioned this pull request Jan 3, 2023

ENH: Adding __array_ufunc__ capability to MaskedArrays (again) #22914

Open

Uh oh!

ENH: Adding __array_ufunc__ capability to MaskedArrays #21977

ENH: Adding __array_ufunc__ capability to MaskedArrays #21977

Uh oh!

Conversation

greglucas commented Jul 13, 2022

Example

Implementation

Linked Issues

Uh oh!

mattip commented Jul 13, 2022

Uh oh!

mattip commented Jul 14, 2022

Uh oh!

seberg commented Jul 16, 2022

Uh oh!

greglucas commented Jul 16, 2022

Uh oh!

rcomer commented Jul 16, 2022

Uh oh!

pllim commented Jul 18, 2022

Uh oh!

seberg commented Jul 18, 2022

Uh oh!

rcomer commented Jul 18, 2022

Uh oh!

seberg commented Jul 18, 2022

Uh oh!

seberg commented Jul 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg Jul 18, 2022

Choose a reason for hiding this comment

Uh oh!

mhvk commented Jul 19, 2022 • edited by seberg Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhvk Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

mhvk Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

mhvk commented Jul 19, 2022

Uh oh!

mhvk Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

mhvk Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

mhvk Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

mhvk Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

mhvk commented Jul 19, 2022

Uh oh!

seberg commented Jul 19, 2022

Uh oh!

mattip commented Jul 19, 2022

Uh oh!

greglucas commented Jul 19, 2022

Uh oh!

Uh oh!

seberg commented Jul 18, 2022 •

edited

Loading

mhvk commented Jul 19, 2022 •

edited by seberg

Loading