`np.asarray(masked_array)` should raise rather than silently dropping the mask #26669
Comments
Note that to do this, you will need to refine the
(strictly speaking the check isn't necessary but an optimization to avoid unnecessary calls.) Ping @mhvk since I think astropy might be the best "test bed" (or its downstream). This would definitely be nice to do and maybe it would be just OK. It is a bit unclear to me how pervasive and "well working" code paths exist where things actually work out just fine. I.e. I don't mind trying it, but I could imagine that we need some preparation to keep some things working (i.e. things that we would keep working in a replacement).
I added a code example. I haven't investigated what dunder method is best used/adapted here, that's an implementation detail at this point. Tests will show what is actually supported and should keep working.
I suspect that ufuncs and all other functions which operate element-wise and correctly return a masked array, because of Ideally, all functions in the main namespace should document whether they do or don't preserve subclasses, and that would be consistent for types of functions (element-wise, reductions, etc.) and tested. That was always the intent, and it mostly works due to Anyway, whether functions work is secondary to whether
If code outside numpy uses
It's a bit of a dead horse I'm beating here, but really numpy should just use But the time to do that would have been numpy 2.0. Equally, though, I think the time for such a major break of what happens to More constructively, implementing This perhaps goes more in the direction of how it would be approached with the array api, with
Now read the last comment: Why would What I do worry about is that raising in
Right, this is what I was worried about, and I am not sure if I am underestimating the challenge when it comes to code like In this case, also note that if you deprecate things you must also reject buffer protocol exports the same way.
Making masked arrays special wouldn't be great, which is why I brought up the logic of guaranteeing a call to The question is really about downstream use of At some point, it might be a similar effort to create the MA alternative that isn't a subclass ;).
Completely agree with the need/desire for better subclass support inside numpy. To me, that really is a much bigger and separate topic though than "make Ideally I'd like this to be a two line patch (modulo tests) that accomplishes this, and changes nothing else. That would solve a host of silent correctness bugs.
It is semantically wrong, because masked values may contain arbitrary values or mean "undefined value" (a la Side note: pandas series/dataframes with NA-aware dtypes backed by numpy arrays are a close analogy here, calling
I don't have much experience with
With Note that at the level of just blacklisting numpy functions, the
Deferring to Though it would probably help to start with stuff from astropy's Masked - see its p.s. I'd happily review PRs to adjust EDIT (seberg): I have doubts super works, but
I'm still not 100% sure I understand the answer. It matters whether the numerical values are correct or not. For If you have an ndarray subclass, it translates to: can function calls with two inputs be separated into "func(ndarray, ndarray) + separate handling of the extra bits". I.e. for
That does sound like a nice and simple implementation.
There are a lot of issues, both in numpy itself and in downstream packages, where masked arrays silently do the wrong thing when they get passed to a function that isn't aware of masked arrays and starts off by calling `np.asarray` on its inputs. gh-26530 is a recent example. Across the main `numpy` namespace, almost all functions are not mask-aware, and for those that do the right thing it's mostly by accident. Masked arrays should only be passed to functions in the `numpy.ma` namespace. In SciPy it's the same: `scipy.stats.mstats` supports masked arrays, nothing else in SciPy does.

Masked arrays should be treated like sparse arrays or other array types with different semantics: conversion should not be done implicitly, because dropping a mask implicitly is almost never the correct thing to do.
gh-18675 basically came to this conclusion. Other related issues:
We should aim to add a `FutureWarning` first I think, since raising immediately may be disruptive.

EDIT: code example:
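A minimal sketch of the failure mode described above. It is not the example originally attached to this issue. `naive_mean` and `asarray_with_warning` are hypothetical names introduced for illustration, and the wrapper only mimics what a `FutureWarning` stage could look like from the caller's side; it is not how NumPy would implement the change.

```python
import warnings

import numpy as np

# A masked array whose masked entry holds an arbitrary placeholder value.
x = np.ma.masked_array([1.0, 2.0, 1000.0], mask=[False, False, True])


def naive_mean(a):
    """Hypothetical mask-unaware function that starts with np.asarray."""
    return np.asarray(a).mean()


masked_mean = x.mean()        # mask-aware: averages only 1.0 and 2.0
dropped_mean = naive_mean(x)  # mask silently dropped: 1000.0 is included


def asarray_with_warning(a):
    """Hypothetical wrapper: warn before the mask is dropped (assumption,
    not NumPy's API)."""
    if isinstance(a, np.ma.MaskedArray):
        warnings.warn(
            "converting a masked array to ndarray drops the mask; "
            "use .data or np.ma.getdata() to do this explicitly",
            FutureWarning,
            stacklevel=2,
        )
    return np.asarray(a)
```

The two means differ because the value stored under the mask takes part in the second computation, which is the kind of silent correctness bug the proposal is meant to surface.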
So if dropping the mask is wrong and `np.asarray(masked_array)` starts raising, what if you do want the underlying `ndarray`/values? https://numpy.org/devdocs/reference/maskedarray.generic.html#accessing-the-data suggests that using either `x.data` or `np.ma.getdata(x)` is the way to do that.
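As a short illustration of the explicit routes mentioned above, the sketch below uses only existing `numpy.ma` calls. The `filled` and `compressed` lines are additions for comparison; they show two other ways to state explicitly what should happen to masked entries.

```python
import numpy as np

x = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# Explicitly take the underlying data; masked slots keep whatever is stored.
raw = x.data
raw_too = np.ma.getdata(x)

# Alternatives that make the treatment of masked entries explicit.
filled = x.filled(np.nan)  # replace masked entries with a chosen fill value
valid = x.compressed()     # keep only the unmasked entries, as a 1-D ndarray
```

In each case the caller states what happens to the masked entries, rather than having the mask dropped implicitly.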