NEP: add default-dtype-object-deprecation nep 34 #14674


Merged · 11 commits into numpy:master · Oct 31, 2019

Conversation

@mattip (Member, Author) commented Oct 10, 2019

NEP to deprecate a = np.array([[1, 2], [1]]) without explicitly stating dtype=object.

xref gh-5303, related to gh-13913, gh-14341.
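For context, a minimal sketch of the pattern being deprecated (the results shown assume the pre-deprecation behavior this NEP starts from):

```
import numpy as np

# Ragged nested sequence: the inner lists have different lengths, so NumPy
# silently falls back to a 1-D array of dtype=object.
a = np.array([[1, 2], [1]])
print(a.dtype, a.shape)   # object (2,)

# Under this NEP, that fallback must be requested explicitly:
a = np.array([[1, 2], [1]], dtype=object)
```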

@rgommers (Member):

A few high level comments:

  • Title and description are unclear, is it about "list of lists" specifically, something a little broader (like tuple of lists/tuples too), or any kind of sequence?
  • Your current text says you want to raise an error for explicit dtype=object as well in some cases. This seems odd; explicit dtype=object should preserve current behavior, I'd think.
  • Related to the previous point: np.ragged_array_object is not a good idea, just make dtype=object not raise.
  • Please add some rationale for why to special-case this type of input, rather than making all object array creation raise unless it's explicitly requested.

@eric-wieser (Member):

np.ragged_array_object is not a good idea, just make dtype=object not raise.

I believe it is a good idea, because it would have a non-optional array_depth argument. This would solve the following problem:

a = np.ragged_array_object([[1, 2], [3, 4]], array_depth=1)
b = np.ragged_array_object([[1, 2], [3, 4, 5]], array_depth=1)
assert a.ndim == b.ndim == 1

Today there is no easy way to construct a.

The name is awful, but I think to be useful at all, a depth argument is needed.
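A sketch of why there is no easy way today (assuming pre-deprecation NumPy behavior): with equal-length inner lists, dtype=object does not stop shape discovery, so a cannot be built directly.

```
import numpy as np

# Equal-length inner lists: shape discovery still goes to depth 2, giving a
# (2, 2) array of scalars instead of a length-2 array holding two lists.
a = np.array([[1, 2], [3, 4]], dtype=object)
print(a.shape)   # (2, 2)

# Only the ragged input happens to stop at depth 1.
b = np.array([[1, 2], [3, 4, 5]], dtype=object)
print(b.shape)   # (2,)
```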

@rgommers (Member):

This would solve the following problem:

Try to look at this from an end user perspective. Why is this a problem worth solving? Why would you want such behavior? In every case I can think of, there are much better alternatives.

Another way to say this: if you would design such a feature without knowing about the current issue that gave rise to this NEP, would you ever propose np.ragged_array_object for inclusion in NumPy?

@eric-wieser (Member) commented Oct 13, 2019

Why would you want such behavior?

We're all agreed I think that no one wants the behavior of np.array([[1, 2], [1]])

My claim is that additionally no one wants the behavior of np.array([[1, 2], [1]], dtype=object). To illustrate that, consider:

student_lists = np.array([class1.students, class2.students], dtype=object)
assert student_lists.ndim == 1

This code works just fine unless by some coincidence the two classes have the same number of students. Because of this, I strongly think np.array([[1, 2], [1]], dtype=object) should also be an error.

What remains is the question "what is the correct way to write the above?". Today, the best I can come up with is the wasteful:

def ragged_array_object(seq, depth):
    arr = np.array(seq, dtype=object)  # this relies on the broken behavior!
    assert arr.ndim >= depth
    arr2 = np.empty(arr.shape[:depth], dtype=object)
    arr2[...] = arr
    return arr2

Perhaps forcing the user to do this workaround is ok (edit: forcing the user into a workaround would also break this particular workaround, since it relies on the behavior being removed), but removing np.array([[1, 2], [1]], dtype=object) would be a lot more palatable if we provided an easy replacement.

would you ever propose np.ragged_array_object for inclusion in NumPy?

Something similar to it, yes - I've repeatedly found myself wanting np.ragged_array_object(some_obj, depth=0).

@rgommers (Member):

We're all agreed I think that no one wants the behavior of np.array([[1, 2], [1]])

yes, I think so too

My claim is that additionally no one wants the behavior of np.array([[1, 2], [1]], dtype=object).

Perhaps, but not because of some minor inconsistency; more because object arrays are very hard to work with anyway, and hence not very useful.

What remains is the question "what is the correct way to write the above?".

That is not what remains; that's what I'm trying to make clear with my questions above. What remains is: are you (the user) really trying to work with ragged arrays here? If so, how good do you want that experience to be? NumPy is limited and is unlikely to ever get nice, well-thought-out support for ragged arrays. NumPy can try to make it less likely that users who don't want ragged arrays shoot themselves in the foot (hence, raising an error by default). Users who do want ragged arrays are probably better off using Arrow (I think, not an expert) or XND (which has proper ragged array support), or staying with lists of lists.

Perhaps forcing the user to do this workaround is ok, but removing np.array([[1, 2], [1]], dtype=object) would be a lot more palatable if we provided an easy replacement.

I think just documenting some workaround is much better, for the unlikely case where a user really wants this kind of object array (she probably doesn't!).

@seberg (Member) commented Oct 14, 2019

My 2 cents about ragged arrays: I am not sure I like a special function for it, but I think usage out there is large enough that we cannot just ignore it. The long way to write it is:

arr = np.empty(correct_shape, dtype=object)
arr[...] = values

This is fairly short, but not very discoverable; plus you need to know the exact shape rather than just the number of dimensions, which could be enough.
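As a concrete sketch of that pattern (the values here are assumed, and per-element assignment is used to avoid shape discovery on the right-hand side entirely):

```
import numpy as np

# The outer shape must be known up front; only the elements are objects.
values = [[1, 2], [3, 4, 5]]
arr = np.empty(len(values), dtype=object)
for i, v in enumerate(values):
    arr[i] = v            # per-element assignment; no shape discovery
print(arr.shape)          # (2,)
print(arr[1])             # [3, 4, 5]
```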

The next way would be to add a specific ndim=3 kwarg, so that we can stop shape discovery when we hit the correct dimension (we already do this internally; the ndim limit on the Python level is just always 32).
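A sketch of how such a kwarg might be used; ndim= here is purely hypothetical and not an existing np.array parameter:

```
# Hypothetical API only -- `ndim=` is the proposed kwarg, not part of NumPy.
# Shape discovery would stop after one dimension, so both inputs would give
# a 1-D object array of lists, regardless of whether they are ragged.
a = np.array([[1, 2], [3, 4]], dtype=object, ndim=1)     # shape (2,) (proposed)
b = np.array([[1, 2], [3, 4, 5]], dtype=object, ndim=1)  # shape (2,) (proposed)
```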

The last way would require an additional hook for dtypes (which we can always add later if we do new dtypes, and which actually would also somewhat make sense to solve the "tuple is scalar" issue for void coercion): provide a way to ask the DType "Is this a scalar or a sequence?" during coercion.

We could then write np.array([...], dtype=PyObject[np.ndarray]) (assuming the inner sequences are arrays in this case). I have not looked into that yet, because it seems to me that it is easier to just special-case tuples+void right now.

@seberg (Member) commented Oct 14, 2019

We may have to decide here (or at least mention) whether we want to do the same thing for the two other cases which create object arrays (possibly surprisingly):

  1. Failure to promote: np.array([np.array((1,2), "i,i"), 1]) (note that the conversion to string or string+floats is not as such a failure to promote right now, because we actually define it specifically)

  2. The dtype of large Python integers can fluctuate randomly (especially on Windows).

Point 2 is likely a different issue (or at least is much easier to address on its own). The first one may overlap with this NEP, depending on where we want to fix it. Note that if we are not in a hurry, I am hoping to rewrite the whole thing for new dtypes (I have actually done so, but it needs cleaning up). Although that does not matter too much w.r.t. starting this now in the old code already.

@mattip (Member, Author) commented Oct 24, 2019

Updated to put off discussion of depth until later, reformatted to use a Usage and Impact section.

Comment on lines 89 to 94
It was also suggested to add a kwarg `depth` to array creation, or perhaps to
add another array creation API function `ragged_array_object`. The goal was
to eliminate the ambiguity in creating an object array from `array([[1, 2],
[1]], dtype=object)`: should the returned array have a shape of `(1,)`, or
`(2,)`? This NEP does not deal with that issue, and only deprecates the use of
`array` with no `dtype=object` for ragged arrays.
Reviewer (Member):

Use double backticks (RST literal markup) throughout.

Reviewer (Member):

Perhaps worth adding "as a consequence of choosing not to deal with this issue, users of ragged arrays may be faced with a second deprecation cycle in the future" or something.

I think I agree with declaring this out of scope - there's a lot of value to making noise when ragged arrays weren't intended.

Reviewer (Member):

there's a lot of value to making noise when ragged arrays weren't intended.

I agree

I think I agree with declaring this out of scope

I don't mind either, but I would suggest to do whatever is easier (I don't know what that is). If just ragged arrays are easier, perhaps give that as a rationale. If we have to jump through extra hoops to do just ragged arrays (IIRC we failed once before?), then why bother?

Decimal(10)])``. This too is out of scope for the current NEP: only if all the
top-level elements are `sequences`_ will we require an explicit
``dtype=object``.
- It was also suggested to deprecate all automatic creation of ``object``-dtype
Reviewer (Member):

Perhaps drop "It was also suggested to" and "we could", and phrase these all in the imperative: "Deprecate all", "Continue with", "Add a kwarg".

arrays, which would require a dtype for something like ``np.array([Decimal(10),
Decimal(10)])``. This too is out of scope for the current NEP: only if all
the top-level elements are `sequences`_ will we require an explicit
``dtype=object``.
Reviewer (Member):

This is only if they are ragged, right? np.array([[[Decimal(1)]]]) is still fine?

@mattip (Member, Author):

the intention is that it should still just work
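(A quick sketch of the expected result, assuming the current fallback behavior: the input is nested but not ragged, so only the dtype falls back to object and no warning is intended.)

```
from decimal import Decimal
import numpy as np

# Not ragged: shape discovery succeeds; only the unknown scalar type falls
# back to dtype=object.
a = np.array([[[Decimal(1)]]])
print(a.shape, a.dtype)   # (1, 1, 1) object
```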

Reviewer (Member):

In that case, I think this sentence needs rewording

@mattip (Member, Author):

removed the confusing last clause of the sentence

@mattip (Member, Author) commented Oct 24, 2019

Copying some valuable comments in case they get lost in the rewrites

... I would suggest to do whatever is easier (I don't know what that is). If just ragged arrays are easier, perhaps give that as a rationale. If we have to jump through extra hoops to do just ragged arrays (IIRC we failed once before?), then why bother (i.e., fail if no dtype is specified, mattip)?

Since you call out ndarray here - will this start failing?

outer = np.array([None, None])
outer[0] = outer[1] = np.array([1, 2, 3])
np.array(outer).shape  # today: (2,)
np.array([outer]).shape  # today: (1, 2,)

outer_ragged = np.array([None, None])
outer_ragged[0] = np.array([1, 2, 3])
outer_ragged[1] = np.array([1, 2, 3, 4])
# will both of these emit warnings?
np.array(outer_ragged).shape  # today: (2,)
np.array([outer_ragged]).shape  # today: (1, 2,)

Examples of things that need decisions but should probably still work:

np.array([[[Decimal(1)]]])


@mattip (Member, Author) commented Oct 24, 2019

The previous failure was trying to change the error message of array(<lists-of-lists>, dtype=int) to something that indicated ragged arrays.

As for the examples, maybe we should get a PR going, mark it WIP and see how painful this all is and what we can and cannot detect.

@mattip (Member, Author) commented Oct 28, 2019

xref gh-14794

@mattip changed the title from "NEP: add default-dtype-object-deprecation nep" to "NEP: add default-dtype-object-deprecation nep 34" on Oct 28, 2019
@mattip (Member, Author) commented Oct 29, 2019

I think this is ready for the mailing list? From NEP 000

Once the PR is in place, the NEP should be announced on the mailing list for discussion. Discussion about implementation details will take place on the pull request, but once editorial issues are solved, the PR should be merged, even if with draft status. The mailing list e-mail will contain the NEP up to the section titled “Backward compatibility”,

cycle in the future.

- It was also suggested to deprecate all automatic creation of ``object``-dtype
arrays, which would require a dtype for something like ``np.array([Decimal(10),
Reviewer (Member):

I wrote a comment "this sentence doesn't make sense to me. Why would that require a new dtype?" and before hitting the "add comment" button I realized you mean "...would require adding `dtype=object` for ...". I suggest that as a rephrase.

Reviewer (Member):

Or possibly require adding an explicit ``dtype=object`` for ...

Backward compatibility
----------------------

Anyone depending on ragged nested sequences creating object arrays will need to
Reviewer (Member):

Suggested change:
- Anyone depending on ragged nested sequences creating object arrays will need to
+ Anyone depending on creating object arrays from ragged nested sequences will need to

``(1,)``, or ``(2,)``? This NEP does not deal with that issue, and only
deprecates the use of ``array`` with no ``dtype=object`` for ragged nested
sequences. Users of ragged nested sequences may face another deprecation
cycle in the future.
Reviewer (Member):

I'd add something like: Rationale: we expect that there are very few users who intend to use ragged arrays like that; this was never intended as a use case of NumPy arrays. Users are likely better off with another library or just using lists of lists.

Reviewer (Member):

The phrasing I was pushing for earlier was more along the lines of "this isn't a big enough problem to be worth bringing into the scope of this NEP", rather than "this isn't something in scope for numpy". I'd still consider attempting to solve this in some future NEP. The rationale in my mind was simply "this is a different problem that can be solved later, and is lower value than the contents of this NEP".

Reviewer (Member):

That phrasing works for me too.

I'd still consider attempting to solve this in some future NEP

I would suggest that we have many more interesting/important things to do. But yeah, we've never had the discussion so let's just postpone it rather than discussing it now.


- It was also suggested to deprecate all automatic creation of ``object``-dtype
arrays, which would require a dtype for something like ``np.array([Decimal(10),
Decimal(10)])``. This too is out of scope for the current NEP.
Reviewer (Member):

I'd add something like: Rationale: it's harder to assess the impact of this larger change; we're not sure how many users it may affect.

@rgommers (Member):

I think this is ready for the mailing list?

Agreed, it reads well. A few final comments made.

Comment on lines 81 to 93
behaviour will emit a ``DeprecationWarning``. There is an open question whether
the ``assert_equal`` family of functions should be changed or users be forced
to change code like

```
np.assert_equal(a, [[1, 2], 3])
```

to

```
np.assert_equal(a, np.array([[1, 2], 3], dtype=object))
```
@eric-wieser (Member) commented Oct 29, 2019

Is this decision any different to deciding whether the following should be allowed?

>>> np.add([1, (2, 3)], [4, (5, 6)])
array([5, (2, 3, 5, 6)], dtype=object)

vs requiring it be spelt

>>> np.add(np.array([1, (2, 3)], dtype=object), np.array([4, (5, 6)], dtype=object))
array([5, (2, 3, 5, 6)], dtype=object)

At any rate, it would be worth drawing attention to the fact that since this affects asarray, it affects almost every numpy function with list → array semantics that uses asarray internally.
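For instance (a sketch, assuming the deprecation applies to the internal coercion path as described): np.sum has to convert its list argument to an array first, so a ragged nested list would hit the same code path as a direct np.array call.

```
import numpy as np

# The implicit list -> array conversion inside np.sum triggers the same
# warning as np.array([[1, 2], [1]]) itself (and an error in later releases).
np.sum([[1, 2], [1]])
```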

@mattip (Member, Author):

I think you have a point. Special-casing array_equal and friends would make it hard to remove the deprecation in the future; we would need a try: ... except around each internal asarray use. I will drop this discussion point; users should modify their code now to avoid the warning.

@mattip (Member, Author):

reformulated and moved to Usage and Impact for emphasis.

@eric-wieser (Member):

Note that this comment still needs addressing:

Since you call out ndarray here - will this start failing?

My impression is that the answer is no (at least, based on your PR), and that the wording needs tweaking there. Let's add a test to the PR and see what the behavior is?

@mattip (Member, Author) commented Oct 29, 2019

Let's add a test to the PR and see what the behavior is?

Added here, they all pass with no DeprecationWarning

@eric-wieser (Member):

they all pass with no DeprecationWarning

Right - so we need to change either the implementation or the NEP, because currently they disagree. My feeling would be to just remain silent on np.ndarrays being considered sequence objects in the NEP, which is the easiest path - if nothing else, the case that we're trying to stop users being bitten by is for nested lists of regular python objects, not so much lists of arrays.

@mattip (Member, Author) commented Oct 29, 2019

Are you referring to the outer_ragged example? I added a footnote hopefully clarifying the algorithm.

Co-Authored-By: Hameer Abbasi <[email protected]>
@mattip (Member, Author) commented Oct 31, 2019

There were no responses on the mailing list. I assume that means no one opposes the NEP. Can we merge it, even in draft status?

@rgommers merged commit ed7a077 into numpy:master on Oct 31, 2019
@rgommers (Member):

Yep, merged! Thanks @mattip and @eric-wieser

@mattip deleted the nep-0034 branch on November 2, 2020.