ENH: Adding __array_ufunc__ capability to MaskedArrays #16022

greglucas · 2020-04-20T03:31:34Z

This lets any NumPy ufunc called on a MaskedArray use the masked version of that function automatically, without resorting to explicit np.ma.func() calls.

Example

import numpy as np
# The mask must be passed by keyword; the second positional argument is dtype.
a = np.ma.array(np.arange(10), mask=[0, 1] * 5)
x = a < 5
# x.data == [True,  True,  True,  True,  True, False, False, False, False, False]
y = np.ma.less(a, 5)
# y.data == [True,  True,  True,  True,  True,  True, False,  True, False, True]

Even though some data values in a are masked, the plain comparison still evaluates them, which changes the underlying data. Calling the masked version, np.ma.less, leaves the data under the mask untouched. Note that this is a big change, but it affects a somewhat ambiguous case: what happens to masked values under function evaluation.

The real reason I began looking into this was to avoid evaluating the function on masked values, because I had already masked the entries that would trigger warnings. See #4959, where evaluating log10 at locations less than 0 that were under the mask still threw warnings.

import numpy as np
A = np.arange(-2, 5) / 2
Am = np.ma.masked_less_equal(A, 0)
np.log10(Am) # Previously threw a warning, now works as expected
np.ma.log10(Am) # No warnings before or after
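
A minimal sketch of how the "no warnings" claim for the masked function can be checked. It only exercises np.ma.log10, whose behaviour does not depend on this PR. The variable names are illustrative.

```python
import warnings

import numpy as np

A = np.arange(-2, 5) / 2
Am = np.ma.masked_less_equal(A, 0)

# Record every warning raised while the masked function runs.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.ma.log10(Am)

runtime_warnings = [w for w in caught if issubclass(w.category, RuntimeWarning)]
print(len(runtime_warnings))   # number of RuntimeWarnings seen
print(result.mask.tolist())    # non-positive inputs stay masked
```

The same harness could wrap np.log10(Am) to see whether a given NumPy build still warns for values under the mask.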

Implementation

I haven't added any documentation yet because I wanted to check and see if this was even the right way of approaching this idea. I struggled to figure out how to pass the out= object pointers around in a consistent manner, so I am definitely interested in suggestions on improving that. I had to update quite a few seemingly random areas of the Masked codebase to make that work everywhere. There are a few tests that I had to update because values under the mask were being compared, and those have now changed with this implementation applying the masked version of ufuncs everywhere.

This is a pretty major (underlying) API change, so I expect there to be quite a bit of discussion on the best approaches here.

Linked Issues

Fixes #4959
Closes #15200

seberg · 2020-04-23T19:33:47Z

@ahaldane and @mhvk may be interested in this start. I like this. The main question is whether we should aim for creating a MaskedArray outside of NumPy instead of updating it here.
There was some consensus around that being the ideal scenario: a new masked array project outside of NumPy that we can at some point consider a replacement for the one in NumPy. That said, if these are uncontroversial changes/improvements, nothing should stop it.

If you or anyone else wants to create such a project and it seems helpful, I would be happy for us to make a new repository in the numpy organization.

jthielen · 2020-04-28T20:28:29Z

Glad to see someone taking this on and moving it along! I apologize for not getting any chance to address #15200 since I raised it in December.

Just one early note/request: it would be great to add tests for proper dispatching/deferral to make sure that this truly closes #15200. Right now, as far as I can tell, it doesn't look like the suggested adaptations to the internal arithmetic methods have been made yet, so I'd suspect that the ufunc dispatch hierarchy issue remains.

greglucas · 2020-04-28T22:04:34Z

Great observation, @jthielen! Thanks for catching that. I just changed those over locally to dispatch through the ufunc calls, and your example still fails with this update, so this indeed doesn't close that issue yet. My guess is that I've got some bad re-casting going on somewhere. I agree that this definitely needs some tests and some documentation. If you want to try and add anything or build upon this, please feel free to.

greglucas · 2020-07-08T04:01:22Z

@jthielen I decided to look at your example again to see why the dispatching still wasn't working right with this implementation, and ran into some questions about what should actually be defined to happen in this situation. As I read the spec for __array_ufunc__, __array_priority__ only comes into play if one of the objects does not implement __array_ufunc__; if both objects implement __array_ufunc__, then I think the first one called takes precedence. I'm wondering if the __array_ufunc__ implementation needs to consider __array_priority__ internally? The reason this is an issue here is that your example has MaskedArray and WrappedArray both at the same class level. If we truly inherit from MaskedArray, so that WrappedArray is a level deeper in your example:
class WrappedArray(np.ma.MaskedArray, np.lib.mixins.NDArrayOperatorsMixin):
then all of the dispatching in the PR works as one would expect. I may also be missing something simple that will take care of this too; maybe I need to call __array_wrap__ or __array_finalize__ somewhere to take care of the typecasting properly?

From another perspective, it is a bit ambiguous and I'm not clear if there should be commutativity between say a DaskArray and an Xarray object, or is it whichever library puts in the higher __array_priority__ to their object?

Edit: I was curious to see what that example would produce myself and it is non-commutative as well.

In [1]: import dask.array as da

In [2]: import xarray as xr

In [3]: d = da.array([1, 2])

In [4]: x = xr.DataArray([3, 4])

In [5]: type(d*x)
Out[5]: dask.array.core.Array

In [6]: type(x*d)
Out[6]: xarray.core.dataarray.DataArray

mhvk · 2020-07-08T11:15:32Z

@greglucas - __array_priority__ is indeed ignored when __array_ufunc__ is defined, and overriders get to try in order, but subclasses first. The idea very much is that if a class does not know the other class, it should generally return NotImplemented. Getting things to work automatically is nearly impossible. So, the equivalent of __array_priority__ would be to explicitly decide which of a pair of classes returns NotImplemented and which takes action.

For your example with dask: MaskedArray cannot generally know about dask, but dask relies on numpy and can thus know about MaskedArray, so I think MaskedArray should return NotImplemented.
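
A minimal sketch of the deferral rule described above, using two hypothetical duck-array classes. LowArray and HighArray are illustrative names and are not part of NumPy or of this PR. LowArray returns NotImplemented for operands it does not recognise, and HighArray knows how to unwrap itself. The sketch ignores out= and other keyword handling.

```python
from numbers import Number

import numpy as np


class LowArray(np.lib.mixins.NDArrayOperatorsMixin):
    """Hypothetical duck array that only understands ndarrays and scalars."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Defer as soon as an operand of an unknown type shows up.
        if not all(isinstance(x, (LowArray, np.ndarray, Number)) for x in inputs):
            return NotImplemented
        arrays = [x.data if isinstance(x, LowArray) else x for x in inputs]
        return LowArray(getattr(ufunc, method)(*arrays, **kwargs))


class HighArray(np.lib.mixins.NDArrayOperatorsMixin):
    """Hypothetical wrapper that sits above LowArray in the hierarchy."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap, let the wrapped type do the work, then rewrap the result.
        unwrapped = [x.wrapped if isinstance(x, HighArray) else x for x in inputs]
        return HighArray(getattr(ufunc, method)(*unwrapped, **kwargs))


low = LowArray([1, 2])
high = HighArray(LowArray([3, 4]))

left = low * high    # LowArray is asked first and defers to HighArray
right = high * low   # HighArray is asked first and handles it directly
print(type(left).__name__, type(right).__name__)
```

Because only the lower type defers, both operand orders produce the same result type. This is the commutativity property discussed in this thread.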

jthielen · 2020-07-08T15:41:39Z

@greglucas To follow-up on @mhvk's points, I'd like to emphasize some of the existing documentation that may be helpful (it looks like you've already read through NEP 13/__array_ufunc__ docs, but the others may help round that out):

It is quite a bit to wrap one's head around at first, but what it ends up coming down to is ensuring commutativity through consistently deferring to upcast types (those above the respective type in the hierarchy). The main problem is that there is no "official" or "standard" type casting hierarchy accepted among libraries right now; instead it is an emergent phenomenon from library-by-library ad hoc decisions. The closest thing I know of to a standard is the graph depicted in Pint's documentation (which comes from discussions in xarray, Pint, and Dask issue trackers), but that is by no means official. A continuously updated informational NEP (as I believe suggested by @shoyer) may be of use for this, but I don't know what the process for that would look like.

All that being said, MaskedArray's place in the hierarchy seems fairly clear: right above ndarray. So, I think MaskedArray should handle operations with ndarray (and subclasses), and defer to (that is, return NotImplemented) any other duck array.

For your example @greglucas, xarray and Dask are still not in the best place with respect to fully respecting the type casting hierarchy (see pydata/xarray#3950 and dask/dask#4583). Perhaps a better example of what should happen is with Pint and Sparse:

In [1]: import numpy as np                                                      

In [2]: from pint import UnitRegistry                                           

In [3]: from sparse import COO

In [5]: ureg = UnitRegistry()                                                   

In [6]: q = [[1], [2]] * ureg.m                                                 

In [7]: s = COO(np.array([[0, 1, 0], [0, 25, 50]]), np.array([2, 4, 8]))        

In [8]: q * s                                                                   
Out[8]: <COO: shape=(2, 51), dtype=int64, nnz=3, fill_value=0> <Unit('meter')>

In [9]: s * q                                                                   
Out[9]: <COO: shape=(2, 51), dtype=int64, nnz=3, fill_value=0> <Unit('meter')>

(both these are Pint Quantities wrapping Sparse COOs)

greglucas · 2020-07-08T15:49:22Z

Thanks for the details both of you! I think this all makes more sense for deferrals and I was missing a simple check at the top of the function:

        # Determine what class types we are compatible with and return
        # NotImplemented if we don't know how to handle them
        for arg in args + outputs:
            if not isinstance(arg, (ndarray, np.bool_, Number, str)):
                return NotImplemented

Of course, then some other tests fail due to mask reshaping/resizing, so it isn't that simple. I'll look into this more later this week. Thanks for the links!
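
As a standalone illustration of that check, here is a minimal sketch written as a free function rather than as part of MaskedArray.__array_ufunc__. The helper name can_handle and the Foreign class are hypothetical. The tuple of accepted types is copied from the snippet above.

```python
from numbers import Number

import numpy as np


def can_handle(inputs, outputs=()):
    """Return True if every operand has a type this sketch claims to understand."""
    known_types = (np.ndarray, np.bool_, Number, str)
    return all(isinstance(arg, known_types) for arg in tuple(inputs) + tuple(outputs))


class Foreign:
    """Stand-in for a duck array type the check knows nothing about."""


masked = np.ma.array([1, 2, 3], mask=[0, 1, 0])

print(can_handle((masked, 2.0)))        # MaskedArray is an ndarray subclass
print(can_handle((masked, Foreign())))  # unknown type, so the caller should defer
```

An __array_ufunc__ implementation would return NotImplemented whenever such a helper returns False.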

greglucas · 2020-07-09T15:28:26Z

OK, I think I worked out the different resizing bug I was running into and pushed up some changes that all work locally for me. I also added a test for the WrappedArray example you gave in #15200.

Unfortunately, it looks like the macOS Python 3.6 and PyPy builds don't like the changes, but the majority of systems do. It may be tricky to track down those specifics unless someone sees something obvious right away.

ahaldane · 2020-07-19T15:53:04Z

Just to note for those interested, I have a new MaskedArray implementation using __array_ufunc__ and __array_function__ available here: https://github.com/ahaldane/ndarray_ducktypes

See the documentation at: https://github.com/ahaldane/ndarray_ducktypes/blob/master/doc/MaskedArray.md

I haven't made it more publicly available yet because it is still a work in progress; however, I would say it is 95% functional now (the last 5% is always the longest and hardest to implement). A TODO list is at ahaldane/ndarray_ducktypes#1. If you would like to try it out and comment, feel free to leave issues on GitHub. Also, I welcome PRs, and you should feel welcome to reuse anything there in these PRs here. That repository also has two other ndarray-ducktypes I would like to get closer to a working state.

greglucas

I'm coming back to this again here and cleaning it up a bit and I'm wondering if anyone in the thread had any thoughts on this PR, or anything else I'm overlooking in it.

greglucas · 2021-04-14T02:17:47Z

numpy/ma/core.py

@@ -5785,7 +5911,7 @@ def mini(self, axis=None):
            "`mini` is deprecated; use the `min` method or "
            "`np.ma.minimum.reduce instead.",
            DeprecationWarning, stacklevel=2)
-        return minimum.reduce(self, axis)


Do we want to just remove np.ma.mini()? It has been deprecated since 1.13.0 it looks like.

Yes please. If you are concerned about the size of the PR, it would be fine to do this in a separate PR. Note there is some commented-out deprecated code in numpy/ma/core.pyi as well.
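
For anyone migrating off the deprecated method, the two replacements named in the deprecation message can be sketched as follows. The array values are made up for illustration.

```python
import numpy as np

x = np.ma.array([[3, 1], [2, 5]], mask=[[0, 1], [0, 0]])

# Both replacements skip the masked entry (the 1 in the first row).
via_method = x.min(axis=0)
via_reduce = np.ma.minimum.reduce(x, axis=0)

print(via_method, via_reduce)
```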

greglucas · 2021-04-14T02:22:26Z

numpy/lib/tests/test_function_base.py

@@ -740,7 +740,7 @@ def test_subclass(self):
                     mask=[[False, False], [True, False],
                           [False, True], [True, True], [False, False]])
        out = diff(x)
-        assert_array_equal(out.data, [[1], [1], [1], [1], [1]])
+        assert_array_equal(out.data, [[1], [4], [6], [8], [1]])


It is ambiguous what should happen here. All of these values are under the mask.

numpy/ma/extras.py

greglucas · 2021-04-14T02:28:25Z

numpy/ma/tests/test_core.py

@@ -3168,7 +3168,7 @@ def test_compress(self):
        assert_equal(b.fill_value, 9999)
        assert_equal(b, a[condition])

-        condition = (a < 4.)
+        condition = (a.data < 4.)


This depended on evaluating values under the mask before.

greglucas · 2021-04-14T02:28:59Z

numpy/ma/tests/test_core.py

@@ -5233,7 +5233,7 @@ def test_ufunc_with_out_varied():
    a        = array([ 1,  2,  3], mask=[1, 0, 0])
    b        = array([10, 20, 30], mask=[1, 0, 0])
    out      = array([ 0,  0,  0], mask=[0, 0, 1])
-    expected = array([11, 22, 33], mask=[1, 0, 0])
+    expected = array([1, 22, 33], mask=[1, 0, 0])


Ambiguous data under the mask also changed here.
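
A minimal sketch of one way to compare two masked arrays without depending on whatever sits under the mask. It uses the old and new expected values from the diff above. The variable names are illustrative.

```python
import numpy as np

old_expected = np.ma.array([11, 22, 33], mask=[1, 0, 0])
new_expected = np.ma.array([1, 22, 33], mask=[1, 0, 0])

# Raw buffers differ, because the first element is hidden by the mask.
same_raw_data = np.array_equal(old_expected.data, new_expected.data)

# Masks and visible (unmasked) values agree.
same_masks = np.array_equal(
    np.ma.getmaskarray(old_expected), np.ma.getmaskarray(new_expected)
)
same_visible = np.array_equal(old_expected.compressed(), new_expected.compressed())

print(same_raw_data, same_masks, same_visible)
```

Tests written against the mask and the compressed values are unaffected by changes to data under the mask, whereas tests that read .data directly are not.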

TomNicholas · 2021-07-22T18:11:08Z

Just dropping in to say that it would be great to see this implemented because it would allow us to simplify our handling of masked arrays in xarray, and generalise it to handle masked duck arrays, which would be awesome!

greglucas · 2021-07-26T15:46:00Z

@TomNicholas, I think this is still ready for review and feedback, so thanks for adding in the voice of support of this from Xarray's perspective too :)

rcomer · 2021-07-28T15:09:39Z

If it helps, I'd also love to see this go in. It will make a workaround I've written for SciTools/iris' arithmetic redundant.

seberg · 2021-07-28T17:39:49Z

I have not looked at this in depth. What would help me to get started would be to understand what/if behaviour is changing. I.e. are all/most changes here basically fixing arguably outright bugs, or are there more subtle changes more likely to break workflows? I like the approach of improving the current implementation, rather than aiming for replacement, but with masked-arrays, it feels a bit doomed to be incomplete. (Which does not mean we should not do it, if anything improving things will make the eventually necessary transition easier.)

greglucas · 2021-07-31T21:45:36Z

What would help me to get started would be to understand what/if behaviour is changing

The biggest change here is the deferral mechanism, which hands the operation to whichever object's priority takes precedence. I added a test to try and capture that by creating a small WrappedArray. I tried to keep behavior changes to a minimum where possible, so not many tests were updated. However, I did put some inline review notes about data under masks being different between the versions.

One area that I am still not clear about is using the out keyword during Masked computations. I think I am creating an extra array during the computation and then copying it back over, but I'm not clear if that was the way it was before or not, so I could use some help on that aspect during a review.

Which does not mean we should not do it, if anything improving things will make the eventually necessary transition easier

I agree with this. I'm hoping that this can be a stepping stone to something like the new implementation mentioned in #16022 (comment)

rcomer · 2022-06-25T10:53:52Z

Hi, is this PR still under consideration?

greglucas · 2022-06-30T02:10:19Z

@rcomer, do you have a specific use case that this would help with? I just rebased this and it appears to still be ready to go.

rcomer · 2022-06-30T09:16:02Z

Thanks @greglucas. The problem that ultimately led me here was that when doing arithmetic with dask arrays, any mask the dask array has is lost:

import numpy
import numpy.ma as ma
import dask
import dask.array as da

print("numpy version:", numpy.__version__)
print("dask version:", dask.__version__)

my_numpy_array = ma.masked_array(range(4), mask=[0, 1, 0, 0])
my_dask_array = da.ma.masked_array(range(4), mask=[0, 0, 1, 0])

print(my_numpy_array * my_dask_array)

numpy version: 1.23.0
dask version: 2022.6.1
[0 -- 4 9]

The result should be a dask array which when computed gives

[0 -- -- 9]

It's easy enough to work around once you realise what's happening, and I have an open PR in SciTools/iris with a workaround that should solve it for Iris users. But a low-level fix is always better than a high-level workaround!
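
For reference, a minimal NumPy-only sketch of the mask-combining behaviour that the expected result above relies on. It uses two NumPy masked arrays in place of the dask operand, so it illustrates the desired mask and does not reproduce the dask bug itself.

```python
import numpy.ma as ma

left = ma.masked_array(range(4), mask=[0, 1, 0, 0])
right = ma.masked_array(range(4), mask=[0, 0, 1, 0])

# With two NumPy masked arrays the result mask is the union of both masks.
product = left * right

print(product)
print(product.mask.tolist())
```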

mattip · 2022-07-05T08:44:23Z

numpy/ma/tests/test_subclassing.py

+        # Test domained binary operations
+        assert_(isinstance(np.divide(wm, m), WrappedArray))
+        assert_(isinstance(np.divide(m, wm), WrappedArray))
+        assert_equal(np.divide(wm, m) * m, np.divide(m, m) * wm)


Could you add some test where the operands are not the same shape and broadcasting must be done? That is tricky to get right with masks.

Are reductions tested elsewhere?

There are some broadcasting tests of other functions, but it could likely be made more robust. I just added a quick test for the instance type after broadcasting in the new PR, good suggestion.
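
A minimal sketch of the kind of broadcasting case being asked about. It is not the test that was added in the new PR. The shapes and values are made up for illustration.

```python
import numpy as np

# A (3, 1) column and a (1, 2) row, each with one masked entry.
col = np.ma.array([[1], [2], [3]], mask=[[0], [1], [0]])
row = np.ma.array([[10, 20]], mask=[[0, 1]])

total = col + row

# The result mask is the union of both masks after broadcasting to (3, 2).
print(total.shape)
print(total.mask.tolist())
```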

mattip · 2022-07-05T08:51:01Z

I think we should try to get this in, since it does not look like we will deprecate masked arrays soon. I do like the idea of having a separate repo like https://github.com/ahaldane/ndarray_ducktypes mentioned above for a new implementation of masked arrays. Maybe we could make removing masked arrays part of NumPy 2.0.

mattip · 2022-07-09T17:08:02Z

Thanks @greglucas

mattip · 2022-07-09T18:23:46Z

This changed the result of np.ma.average(x, axis=1, keepdims=True) where x = np.ma.arange(6.).reshape(3, 2). Previously the result had mask = False, now it has mask = [False, False, False]. I submitted #21960 to revert it.
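
A minimal sketch of how downstream code can avoid depending on whether the result carries a scalar or a full-array mask. It assumes a NumPy version in which np.ma.average accepts keepdims. np.ma.getmaskarray always returns a boolean array with the same shape as the data.

```python
import numpy as np

x = np.ma.arange(6.).reshape(3, 2)
avg = np.ma.average(x, axis=1, keepdims=True)

# Normalise the mask so it has the result's shape in either case.
full_mask = np.ma.getmaskarray(avg)

print(avg.shape, full_mask.shape, full_mask.any())
```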

mattip · 2022-07-09T19:40:21Z

Please resubmit this as a new PR since the merge was reverted.

greglucas · 2022-07-11T12:43:06Z

@mattip, my apologies for the silence. I was on vacation the past week, but I should be able to update this and resubmit in the next day or two. Thanks for taking care of the revert.

It may be this change: #16022 (comment)
which was needed when I first submitted this PR a couple years ago, but something else must have been updated during that time to no longer flatten the mask on output. I'm still unclear what a user would expect/want for "keepdims" on the mask...

anirudh2290 added 01 - Enhancement component: numpy.ma masked arrays labels Apr 20, 2020

cosama mentioned this pull request May 2, 2020

Masked fields should not be used in comparison #15978

Open

greglucas force-pushed the masked_ufunc branch 2 times, most recently from ae90141 to facb5ea Compare July 9, 2020 14:45

rossbar mentioned this pull request Jul 11, 2020

BUG: Inconsistent behaviour of masked arrays for equivalent operations #16359

Closed

jthielen mentioned this pull request Sep 30, 2020

BUG: Fix passing masked arrays to CAPE/CIN calcs Unidata/MetPy#1516

Merged

3 tasks

This was referenced Oct 15, 2020

Cube arithmetic array type handling SciTools/iris#3790

Merged

Mixing lazy and non-lazy masked cubes in arithmetic SciTools/iris#2987

Closed

Masked array performance SciTools/iris#3470

Closed

Base automatically changed from master to main March 4, 2021 02:04

greglucas force-pushed the masked_ufunc branch 3 times, most recently from 18f6ead to 437736e Compare April 14, 2021 02:07

greglucas commented Apr 14, 2021

TomNicholas mentioned this pull request Jul 22, 2021

Rely on NEP-18 to dispatch to dask in duck_array_ops pydata/xarray#5571

Merged

4 tasks

TomNicholas mentioned this pull request Jul 29, 2021

Duck array compatibility meeting pydata/xarray#5648

Open

greglucas and others added 3 commits June 29, 2022 19:17

ENH: Adding __array_ufunc__ capability to MaskedArrays.

6d77c59

This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls.

TST: Adding a test that MaskedArrays respect ufunc deferral heirarchy

1557c87

This test makes sure that a MaskedArray defers properly to another class if it doesn't know how to handle it. See numpy#15200.

DOC: Adding improvement note for MaskedArray ufunc

a819f00

greglucas force-pushed the masked_ufunc branch from 437736e to a819f00 Compare June 30, 2022 02:01

mattip reviewed Jul 5, 2022

mattip merged commit ab541f9 into numpy:main Jul 9, 2022

mattip mentioned this pull request Jul 9, 2022

Revert "ENH: Adding __array_ufunc__ capability to MaskedArrays" #21960

Merged

dopplershift mentioned this pull request Jul 11, 2022

BUG: MaskedArray does not seem to respect ufunc dispatch hierarchy #15200

Open

greglucas mentioned this pull request Jul 13, 2022

ENH: Adding __array_ufunc__ capability to MaskedArrays #21977

Merged

bjlittle mentioned this pull request Jul 22, 2022

Fix array_equal behaviour for masked arrays SciTools/iris#4457

Merged

greglucas mentioned this pull request Dec 18, 2022

Allow using masked in set_offsets matplotlib/matplotlib#24757

Merged

1 task

greglucas mentioned this pull request Jan 3, 2023

ENH: Adding __array_ufunc__ capability to MaskedArrays (again) #22914

Open
