"Controlled" creation of object arrays #15047
Comments
One option would be:

    import warnings
    import numpy as np

    try:
        # get ahead of the game and promote the deprecation to the error that will replace it
        with warnings.catch_warnings():
            # 'error' is the action that turns a matching warning into an exception
            warnings.filterwarnings('error', message="...", category=DeprecationWarning)
            c_arr = np.asarray(c)
    except (DeprecationWarning, ValueError):
        # whatever you currently do for ValueError
        ...
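A self-contained sketch of this pattern, for illustration only. It assumes a hypothetical stand-in, `convert`, in place of `np.asarray` so that it runs without NumPy, and the warning text is made up rather than NumPy's actual message. Note that `warnings.catch_warnings()` changes process-global state, which is why this approach is not thread-safe.

```python
import warnings


def convert(seq):
    """Hypothetical stand-in for np.asarray: warns on ragged nested input."""
    lengths = {len(item) for item in seq if isinstance(item, (list, tuple))}
    has_scalars = any(not isinstance(item, (list, tuple)) for item in seq)
    if len(lengths) > 1 or (lengths and has_scalars):
        # Illustrative message only; not the text NumPy emits.
        warnings.warn("ragged nested sequences are deprecated",
                      DeprecationWarning, stacklevel=2)
    return list(seq)


def convert_or_none(seq):
    """Return the converted value, or None if conversion would warn or fail."""
    try:
        with warnings.catch_warnings():
            # Promote the matching DeprecationWarning to an exception.
            warnings.filterwarnings("error", message="ragged",
                                    category=DeprecationWarning)
            return convert(seq)
    except (DeprecationWarning, ValueError):
        # Fall back to whatever was previously done for ValueError.
        return None
```

With this sketch, `convert_or_none([[1, 2], [3, 4]])` returns the converted list, while `convert_or_none(["red", (0.5, 0.5, 0.5), "blue"])` returns `None` so the caller can convert items one at a time.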
I guess this, and the failing test mentioned in gh-15045 are instances where emitting a |
Note that |
I think that the code-churn is worth the deprecation period. Matplotlib runs its test suite with warnings-as-failures to catch exactly this sort of change early so this seems like the system working to me :). |
But AFAICT there isn't even a reasonably easy fix (as pointed above the proposed fix is not threadsafe) for it :/ |
I think I see @anntzer's point here. We're in a mess where downstream libraries want to fail fast so they can try something else, while users should be shown a gentler message. The problem is that today there is no way for the library author to ask "would this emit a warning" without actually... emitting the warning, and suppressing it isn't threadsafe. Regarding warning thread-safety: https://bugs.python.org/issue37604 |
AFAIK, the deprecation is in the release branch. Do we want to revert it? If not, the fixes will need backports. I'm still not clear why the warnings were not raised in the release branch wheels and didn't show up in the nightly builds until the last two builds. I didn't change anything after the branch and nothing looks very suspicious in the commits since then in the master branch except, possibly, #15040. |
IMHO (and in agreement with @mattip's point above) it's the kind of change that would be much easier to handle downstream if the switch to raising happened without a deprecation period. Not sure that's an option though :/ |
Or possibly multibuild treats branches differently than master. |
FWIW I was always at least -1 on this change, especially as a keen user of ragged data structures, but anyway now I need to figure out what to do about the hundreds of test failures for the SciPy |
An easy option would be:
The whole point in us using |
I think we do, it's raining issues. We now have a list of what's breaking in Pandas, Matplotlib, SciPy, inside |
Can we compromise on a pendingdeprecationwarning? |
That way, downstream projects can add it to their ignore lists, and when we switch back to DeprecationWarning they get to make the decision again. |
We seem to have diverged from the original issue, which seems to be "given a sequence of values, how can matplotlib determine if they are a single color or a list of colors". I think there should be a solution that does not require casting the values to an ndarray, and checking the dtype of that array. Some kind of recursive |
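As a rough sketch of what such a recursive check could look like without creating an ndarray: the helper below is hypothetical (the name `regular_shape` and its behavior are assumptions for illustration, not Matplotlib or NumPy code). It treats strings as scalars and reports whether a nested list/tuple is rectangular.

```python
def regular_shape(obj):
    """Return the shape of a regular nested sequence, or None if it is ragged.

    Strings and bytes are treated as scalar leaves, not character sequences.
    Only lists and tuples are treated as containers in this sketch.
    """
    if not isinstance(obj, (list, tuple)):
        return ()  # scalar leaf (includes str and bytes)
    child_shapes = [regular_shape(item) for item in obj]
    if any(shape is None for shape in child_shapes):
        return None  # some child is itself ragged
    if len(set(child_shapes)) > 1:
        return None  # children disagree on shape, so the input is ragged
    inner = child_shapes[0] if child_shapes else ()
    return (len(obj),) + inner
```

A caller could then branch on the result: a shape of `(3,)` or `(4,)` suggests a single RGB(A) color, `(n, 3)` a list of colors, and `None` a mixed list that has to be converted item by item.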
I've reverted the change for 1.18.x in #15053. |
The sentiment is that breaking scipy and pandas CI is annoying enough to temporarily revert it in master as well. I would like it to go back in basically scheduled (say within a month) though. We may need to find a solution though. Also the fixups pandas are doing are slightly worrisome to me, since they use If there is really no way, and we need thread-safe warning suppression. |
I think the issue @anntzer brings up is more general though. It's about writing a function that takes many types of input, with logic like:
since one can't add
@seberg wasn't |
@rgommers no, |
Not completely sure if the problematic cases run against the original intention (https://numpy.org/neps/nep-0034.html) or if they were just not anticipated. Anyway, a way out would be to explicitly enable the old behavior along the lines of "appreciating your concern, but we explicitly want the context-dependent object dtype and will handle problematic input ourselves". Something like one of
all not very pretty and naming for sure to be improved. But this gives a clean way out for libraries which relied on the behavior and want to keep it (at least for the moment). |
Isn't the matplotlib case actually:
since really you want to catch the error, rather than getting an object dtype? |
There is no reversion of the deprecation currently pending for master. I don't think it should be reverted wholesale as it was in 1.18 because that also removed the fixes, which I think we want to keep. @mattip A more targeted reversion would be appreciated until we decide what to do in the long term. |
FWIW I think most of the places in mpl which hit this can be fixed (with more or less restructuring -- in one case it turns out the code is much faster after...). (Again, I think the change is a good one, it's just a matter of how it's executed.) |
Yeah, we just need to figure out what the API should look like. As pointed out by many, there are currently two main issues:
Finally, we have to figure out how to cram it into our code :). |
I don't see a problem with a full revert and then reintroduce whatever parts make sense now. Again, reverting something is not a value judgement about what is good or bad, it's just a pragmatic way to unbreak a bunch of stuff we just broke by pushing the merge button. There's clearly impact and unsolved issues that were not foreseen in the NEP, so reverting first is the right thing to do. |
An argument for not reverting yet - while the change is in master, we can leverage downstream CI runs to try and work out what their workarounds would look like |
Downstream CI is red, that's very unhelpful. We now have their list of failures, we don't need to keep their CI's red to make our life a little easier here. |
And at least Matplotlib's CI is running against |
That's pulling from the nightly wheels it looks like. The change was already reverted for 1.18.0rc1, so you shouldn't see it if you would be installing with |
Some of the above comments amount to rethinking the proposed changes in NEP 34. I'm not sure if this thread is the appropriate place to continue this discussion, but here goes. (No harm if it should be discussed elsewhere--copying and pasting comments is easy. 😄 Also, some of you have seen a variation of these comments in a discussion on slack.) After thinking about this recently, I ended up with the same idea as @timhoffm's first suggestion (and the idea has probably been proposed at other times in the last few months): define a specific string or singleton object that, when given as the I think it is clear now that using For example (cf. item 1 in @seberg's last comment), suppose Alternatives that add a new keyword argument, such as @timhoffm's second suggestion, seem more complicated than necessary. The problem that we're trying to solve is the "foot gun" where ragged input is automatically converted to a 1-d object array. The problem only arises when |
Sounds reasonable, maybe. It's also good to point out that there is no real "ragged array" concept in NumPy. It's something we basically don't support (search for "ragged" in the docs, on the issue tracker or mailing list to confirm if you want), it's something that DyND and XND support, and we only started talking about it to have a concise phrase to discuss "we want to remove the |
It seems general opinion is coalescing around a solution of extending the deprecation (maybe indefinitely) by allowing Now we need to decide upon the name and where it should be exposed. Perhaps I think the path forward is to amend NEP 34, then expose the discussion on the mailing list. |
Note that there have been a couple of reports also of problems with using operators ( |
In almost all cases it is probably known that one of the operands is a numpy array. So it should probably be possible to get well defined behaviour by manually converting to a numpy array. |
Could someone point to the As for the Many of the scipy failures seem to be of the form |
Saw that in @jbrockmendel's Pandas PR, but I think it has since changed (don't see an explicit
At that point it becomes |
@mattip wrote:
That sounds OK to me. |
My current working name for this is I'm not sure the name should be private. By any practical definition of private and public, this will be a public object. It provides users the means to preserve the legacy behavior of, for example, |
Agreed with @WarrenWeckesser's description of public/private; either it's public, or it shouldn't be used by anyone outside of NumPy. Re name: please pick a name that describes the functionality. Things like "legacy" are almost never a good idea. |
Thinking out loud for a bit... What does this object do? Currently, when NumPy is given a Python object that contains subsequences whose lengths are not consistent with a regular n-d array, NumPy will create an array with What is a good name for that? |
Well, I was paraphrasing. The actual test is more like I will propose an extension to NEP 34 to allow a special value for dtype. |
Note that it seems scipy does not need this sentinel to pass tests with scipy/scipy#11310 and scipy/scipy#11308 |
gh-15119 was merged, which re-implemented the NEP. If it is not reverted, we can close this issue |
I am going to close this, since we did not follow up on it before the 1.19 release. And I at least hope the reason for this was that the discussion has died down since all major projects were able to find reasonable solutions to the problems created by it. |
Auto-creation of object arrays was recently deprecated in numpy. I agree with the change, but it seems a bit hard to write certain kinds of generic code that determine whether a user-provided argument is convertible to a non-object array.
Reproducing code example:
Matplotlib contains the following snippet:
but sometimes the function is called with an array of colors in various formats (e.g. ["red", (0.5, 0.5, 0.5), "blue"]) -- we catch the ValueError and convert each item one at a time instead.
Now the call to np.array(c) will emit a DeprecationWarning. How can we work around that? Even something like np.min_scalar_type(c) emits a warning (which I would guess it shouldn't?), so it's not obvious to me how to check "if we converted this thing to an array, what would the dtype be?"
Numpy/Python version information:
1.19.0.dev0+bd1adc3 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0]