NEP: add default-dtype-object-deprecation nep 34 #14674
A few high level comments:
|
I believe it is a good idea, because it would have a non-optional
Today there is no easy way to construct The name is awful, but I think to be useful at all, a depth argument is needed. |
Try to look at this from an end user perspective. Why is this a problem worth solving? Why would you want such behavior? In every case I can think of, there are much better alternatives. Another way to say this: if you would design such a feature without knowing about the current issue that gave rise to this NEP, would you ever propose |
We're all agreed I think that no one wants the behavior of My claim is that additionally no one wants the behavior of

```
student_lists = np.array([class1.students, class2.students], dtype=object)
assert student_lists.ndim == 1
```

This code works just fine unless by some coincidence the two classes have the same number of students. Because of this, I strongly think What remains is the question "what is the correct way to write the above?". Today, the best I can come up with is the wasteful:

```
def ragged_array_object(seq, depth):
    arr = np.array(seq, dtype=object)  # this relies on the broken behavior!
    assert arr.ndim >= depth
    arr2 = np.empty(arr.shape[:depth], dtype=object)
    arr2[...] = arr
    return arr2
```
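One possible way to write the above without first calling `np.array` on the nested input is sketched here. The helper name `object_array_with_depth` and its `depth` parameter are hypothetical and not part of NumPy, and the sketch assumes the first `depth` nesting levels of `seq` are non-empty and rectangular. It preallocates with `np.empty` and fills one cell at a time, so NumPy never gets a chance to unpack the inner lists.

```python
import numpy as np


def object_array_with_depth(seq, depth):
    """Hypothetical helper: build an object array with exactly `depth` dimensions.

    Assumes the first `depth` nesting levels of `seq` are non-empty and rectangular.
    """
    # Infer the shape by descending through the first element at each level.
    shape = []
    level = seq
    for _ in range(depth):
        shape.append(len(level))
        level = level[0]

    out = np.empty(tuple(shape), dtype=object)
    # Fill one cell at a time so NumPy never inspects the nested items.
    for index in np.ndindex(*shape):
        item = seq
        for i in index:
            item = item[i]
        out[index] = item
    return out


# Both results are 1-D, whether or not the inner lists happen to match in length.
same_size = object_array_with_depth([["ann", "bob"], ["cat", "dan"]], depth=1)
ragged = object_array_with_depth([["ann", "bob"], ["cat"]], depth=1)
```

With `depth=1` both calls return an array of shape `(2,)` whose elements are the original lists, which is the property the `student_lists` assertion relies on.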
Something similar to it, yes - I've repeatedly found myself wanting |
yes, I think so too
Perhaps, but not because of some minor inconsistency. More because object arrays are very hard to work with anyway, and hence not very useful.
That is not what remains, that's what I'm trying to make clear with my questions above. What remains is: are you (the user) really trying to work with ragged arrays here? If so, how good do you want that experience to be? NumPy is limited and is unlikely to ever get nice, well-thought-out support for ragged arrays. NumPy can try to make it less likely that users who don't want ragged arrays shoot themselves in the foot (hence, raising an error by default). Users who do want ragged arrays are probably better off using Arrow (I think, not an expert) or XND (which has proper ragged array support), or staying with list of lists.
I think just documenting some workaround is much better, for the unlikely case where a user really wants this kind of object array (she probably doesn't!). |
My 2 cents about ragged arrays: I am not sure I like a special function for it, but I think usage out there is large enough that we cannot just ignore it. The long way to write it is:
is fairly short, but also not very discoverable, plus you need to know the exact shape instead of just the number of dimensions, which could be enough. The next way would be to add a specific The last way would require an additional hook for dtypes (which we can always add later if we do new dtypes; and which actually would somewhat make sense also to solve the "tuple is scalar" issue for void coercion). And that is to provide a way to ask the DType "Is this a scalar or a sequence" during coercion. We could then write |
We may have to decide here (or at least mention) if we want to do the same thing for the two other cases which create object arrays (possibly surprisingly):
The 2. part is likely a different issue (or at least is much easier addressed on its own). The first one may have overlap here depending on where we want to fix this. Note that if we are not in a hurry, I am hoping to rewrite the whole thing (I have actually done so, but it needs cleaning up) for new dtypes. Although, that does not matter too much w.r.t. starting this now in the old code already. |
Updated to put off discussion of |
doc/neps/nep-0034.rst
Outdated
It was also suggested to add a kwarg `depth` to array creation, or perhaps to
add another array creation API function `ragged_array_object`. The goal was
to eliminate the ambiguity in creating an object array from `array([[1, 2],
[1]], dtype=object)`: should the returned array have a shape of `(1,)`, or
`(2,)`? This NEP does not deal with that issue, and only deprecates the use of
`array` with no `dtype=object` for ragged arrays.
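To make the ambiguity concrete, the sketch below shows what `np.array` returns when `dtype=object` is given explicitly. It is an illustration added here, not part of the NEP text, and the variable names are arbitrary. Ragged input stops at the first dimension, while equal-length rows are unpacked one level further.

```python
import numpy as np

# Ragged rows: NumPy stops at the first dimension and stores each list as one element.
ragged = np.array([[1, 2], [1]], dtype=object)
print(ragged.shape)  # (2,)

# Equal-length rows: the same call descends one more level.
rect = np.array([[1, 2], [3, 4]], dtype=object)
print(rect.shape)  # (2, 2)
```

The number of dimensions therefore depends on the data rather than on anything the caller states, which is what a `depth` argument would pin down.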
Double `` throughout
Perhaps worth adding "as a consequence of choosing not to deal with this issue, users of ragged arrays may be faced with a second deprecation cycle in the future" or something.
I think I agree with declaring this out of scope - there's a lot of value to making noise when ragged arrays weren't intended.
> there's a lot of value to making noise when ragged arrays weren't intended.

I agree

> I think I agree with declaring this out of scope

I don't mind either, but I would suggest to do whatever is easier (I don't know what that is). If just ragged arrays are easier, perhaps give that as a rationale. If we have to jump through extra hoops to do just ragged arrays (IIRC we failed once before?), then why bother?
Decimal(10)])``. This too is out of scope for the current NEP: only if all the
top-level elements are `sequences`_ will we require an explicit
``dtype=object``.
- It was also suggested to deprecate all automatic creation of ``object``-dtype
Perhaps drop "It was also suggested to" and "we could", and phrase these all in the imperative: "Deprecate all", "Continue with", "Add a kwarg".
doc/neps/nep-0034.rst
Outdated
arrays, which would require a dtype for something like ``np.array([Decimal(10),
Decimal(10)])``. This too is out of scope for the current NEP: only if all
the top-level elements are `sequences`_ will we require an explicit
``dtype=object``.
This is only if they are ragged, right? `np.array([[[Decimal(1)]]])` is still fine?
the intention is that should still just work
In that case, I think this sentence needs rewording
removed the confusing last clause of the sentence
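A quick check of the `Decimal` case from this thread, added here as an illustration rather than taken from the PR: non-ragged input whose leaf elements are not sequences still produces an object array without `dtype=object`. The sketch turns warnings into errors so that any deprecation on these calls would surface as an exception.

```python
import warnings
from decimal import Decimal

import numpy as np

with warnings.catch_warnings():
    # Turn any warning into an error so a deprecation would be noticed.
    warnings.simplefilter("error")
    flat = np.array([Decimal(10), Decimal(10)])
    nested = np.array([[[Decimal(1)]]])

print(flat.dtype, flat.shape)      # object (2,)
print(nested.dtype, nested.shape)  # object (1, 1, 1)
```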
Copying some valuable comments in case they get lost in the rewrites
```
outer = np.array([None, None])
outer[0] = outer[1] = np.array([1, 2, 3])
np.array(outer).shape  # today: (2,)
np.array([outer]).shape  # today: (1, 2,)

outer_ragged = np.array([None, None])
outer_ragged[0] = np.array([1, 2, 3])
outer_ragged[1] = np.array([1, 2, 3, 4])
# will both of these emit warnings?
np.array(outer_ragged).shape  # today: (2,)
np.array([outer_ragged]).shape  # today: (1, 2,)
```

Examples of things that need decisions but should probably still work:
|
The previous failure was trying to change the error message of As for the examples, maybe we should get a PR going, mark it WIP and see how painful this all is and what we can and cannot detect. |
xref gh-14794 |
I think this is ready for the mailing list? From NEP 000
|
doc/neps/nep-0034.rst
Outdated
cycle in the future.

- It was also suggested to deprecate all automatic creation of ``object``-dtype
arrays, which would require a dtype for something like ``np.array([Decimal(10),
I wrote a comment "this sentence doesn't make sense to me. Why would that require a new dtype?" and before hitting the "add comment" button I realized you mean "...would require adding `dtype=object` for ...". I suggest that as a rephrase.
Or possibly require adding an explicit ``dtype=object`` for ...
doc/neps/nep-0034.rst
Outdated
Backward compatibility
----------------------

Anyone depending on ragged nested sequences creating object arrays will need to
- Anyone depending on ragged nested sequences creating object arrays will need to
+ Anyone depending on creating object arrays from ragged nested sequences will need to
doc/neps/nep-0034.rst
Outdated
``(1,)``, or ``(2,)``? This NEP does not deal with that issue, and only
deprecates the use of ``array`` with no ``dtype=object`` for ragged nested
sequences. Users of ragged nested sequences may face another deprecation
cycle in the future.
I'd add something like: Rationale: we expect that there are very few users who intend to use ragged arrays like that, this was never intended as a use case of NumPy arrays. Users are likely better off with another library or just using list of lists.
The phrasing I was pushing for earlier was more along the lines of "this isn't a big enough problem to be worth bringing into the scope of this NEP", rather than "this isn't something in scope for numpy". I'd still consider attempting to solve this in some future NEP. The rationale in my mind was simply "this is a different problem that can be solved later, and is lower value than the contents of this NEP".
That phrasing works for me too.
> I'd still consider attempting to solve this in some future NEP
I would suggest that we have many more interesting/important things to do. But yeah, we've never had the discussion so let's just postpone it rather than discussing it now.
doc/neps/nep-0034.rst
Outdated
- It was also suggested to deprecate all automatic creation of ``object``-dtype
arrays, which would require a dtype for something like ``np.array([Decimal(10),
Decimal(10)])``. This too is out of scope for the current NEP.
I'd add something like: Rationale: it's harder to assess the impact of this larger change, we're not sure how many users this may impact.
agreed, it reads well. a few final comments made |
doc/neps/nep-0034.rst
Outdated
behaviour will emit a ``DeprecationWarning``. There is an open question whether
the ``assert_equal`` family of functions should be changed or users be forced
to change code like

```
np.testing.assert_equal(a, [[1, 2], 3])
```

to

```
np.testing.assert_equal(a, np.array([[1, 2], 3], dtype=object))
```
Is this decision any different to deciding whether the following should be allowed?

```
>>> np.add([1, (2, 3)], [4, (5, 6)])
array([5, (2, 3, 5, 6)], dtype=object)
```

vs requiring it be spelt

```
>>> np.add(np.array([1, (2, 3)], dtype=object), np.array([4, (5, 6)], dtype=object))
array([5, (2, 3, 5, 6)], dtype=object)
```

At any rate, it would be worth drawing attention to the fact that since this affects `asarray`, it affects almost every numpy function with list → array semantics that uses `asarray` internally.
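The `asarray` point can be sketched with a stand-in function. `count_items` below is hypothetical and exists only for this illustration; it represents any function that coerces its argument with `np.asarray` before doing work. A caller holding ragged data converts explicitly first, so the internal coercion receives an existing array and has nothing left to infer.

```python
import numpy as np


def count_items(values):
    """Hypothetical library function with list -> array semantics."""
    # Like many NumPy functions, coerce the input with asarray() first.
    arr = np.asarray(values)
    return arr.size


# Callers holding ragged data convert explicitly before the call, so the
# internal asarray() receives an ndarray and passes it through unchanged.
ragged = np.array([1, (2, 3)], dtype=object)
print(count_items(ragged))  # 2
```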
I think you have a point. Special-casing `array_equal` and friends would make it hard to remove the deprecation in the future. We would need to `try: ... except` around internal `asarray` use. I will drop this discussion point, users should modify their code now to avoid the warning.
reformulated and moved to `Usage and Impact` for emphasis.
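For the testing case this thread settled on, a minimal sketch of the user-side change might look like the following. It assumes `a` is already an object array; the name `expected` is illustrative only.

```python
import numpy as np

a = np.array([[1, 2], 3], dtype=object)

# Spell out dtype=object for the expected value instead of passing a ragged list.
expected = np.array([[1, 2], 3], dtype=object)
np.testing.assert_equal(a, expected)
```

Because both arguments are already arrays, the comparison does not depend on how a ragged list would be coerced.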
Note that this comment still needs addressing:
My impression is that the answer is no (at least, based on your PR), and that the wording needs tweaking there. Let's add a test to the PR and see what the behavior is? |
Added here, they all pass with no |
Right - so we need to change either the implementation or the NEP, because currently they disagree. My feeling would be to just remain silent on |
Are you referring to the |
Co-Authored-By: Hameer Abbasi <[email protected]>
There were no responses on the mailing list. I assume that means no one opposes the NEP. Can we merge it with draft status? |
Yep, merged! Thanks @mattip and @eric-wieser |
NEP to deprecate `a = np.array([[1, 2], [1]])` without explicitly stating `dtype=object`.

xref gh-5303, related to gh-13913, gh-14341.