DISCUSS: Should object array comparisons return objects? (Trac #2117) #577

numpy-gitbot · 2012-10-19T15:08:47Z

Original ticket http://projects.scipy.org/numpy/ticket/2117 on 2012-04-26 by @yarikoptic, assigned to unknown.

This issue became more visible since 1.6.x allowed constructing such heterogeneous arrays without specifying dtype=object:

>>> import numpy as np
>>> print np.__version__
1.7.0.dev-3cb783e
>>> a = np.array([np.array([0, 1]), np.array(1)])
>>> print a.dtype
object
>>> print a == a.copy()
[ True  True]
>>> print a == np.array([np.array([0, 1]), np.array(1)])
False

So -- comparing an object array to its own copy worked just fine, but comparison to another, identically created one failed.

Edit: (mattip, 2019-10-28) formatting


numpy-gitbot · 2012-10-19T15:08:48Z

@charris wrote on 2012-04-27

Strange indeed. This is going to be fun to figure out. My guess is that the first item is the same object in a and a.copy(), whereas it is a different object for the other array created with the same parameters... and it is so.

In [12]: b = np.array([np.array([0, 1]), np.array(1)], dtype=object)

In [13]: type b[1]
-------> type(b[1])
Out[13]: numpy.ndarray

In [14]: c = b.copy()

In [15]: c[0] is b[0]
Out[15]: True

In [16]: d = np.array([0,1])

In [17]: c[0] is d
Out[17]: False

So the comparison is comparing object pointers rather than calling a comparison routine. It's certainly easier to compare the pointers. I'm not sure if this is a bug or a feature ;) Shouldn't be hard to find where it happens.

numpy-gitbot · 2012-10-19T15:08:48Z

@charris wrote on 2012-04-27

OTOH

In [37]: a = np.array([[257]*2, 257], dtype=object)

In [38]: b = np.array([[257]*2, 257], dtype=object)

In [39]: a == b
Out[39]: array([ True,  True], dtype=bool)

In [40]: a[0] is b[0]
Out[40]: False

So it looks specific to arrays.

seberg · 2013-01-04T21:50:35Z

This looks weird at first sight, but it actually comes down to Python's C-API. PyObject_RichCompareBool, which is used here (probably wrongly), always returns True if both operands are the same object. I think it should be replaced here. The correct behaviour would be the second one, because the comparison of two (compatibly shaped) arrays cannot result in a single boolean (unless you want a special case here). Of course, from a speed perspective RichCompareBool is probably good; the biggest disadvantage is the obviously wrong result for np.array(np.nan, dtype=object) == np.array(np.nan, dtype=object).

My guess is that RichCompareBool should be replaced with PyObject_IsTrue(RichCompare(in1, in2, OP)) (probably not just for the ufuncs, but everywhere in the code), which will result in False always being returned here. There is, however, another bug here IMO: I can understand incompatible arrays returning False, but in this case the error leading to the False result is not due to an incompatible shape and should surface.

If that is considered the right approach, I will do a small PR to that effect (for both things).
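To make the two C-level strategies discussed above concrete, here is a rough pure-Python emulation. It is only a sketch: `FakeArray` and `ElementwiseResult` are hypothetical stand-ins for an ndarray held in an object array and for the array its `==` returns, and the two helper functions merely mirror the documented semantics of `PyObject_RichCompareBool` and `PyObject_IsTrue(PyObject_RichCompare(...))` for `Py_EQ`.

```python
class ElementwiseResult:
    """Hypothetical stand-in for the array that ndarray.__eq__ returns."""

    def __bool__(self):
        # Like bool(array) for a multi-element array, this cannot be answered.
        raise ValueError("truth value of an elementwise result is ambiguous")


class FakeArray:
    """Hypothetical stand-in for an ndarray stored inside an object array."""

    def __eq__(self, other):
        return ElementwiseResult()


def compare_with_identity_shortcut(x, y):
    # Mirrors PyObject_RichCompareBool(x, y, Py_EQ): identical objects
    # are reported equal without ever calling __eq__.
    if x is y:
        return True
    return bool(x == y)


def compare_without_shortcut(x, y):
    # Mirrors PyObject_IsTrue(PyObject_RichCompare(x, y, Py_EQ)).
    return bool(x == y)


def outcome(func, x, y):
    """Return the comparison result, or the name of the error it raised."""
    try:
        return func(x, y)
    except ValueError:
        return "ValueError"


item = FakeArray()
shortcut_same = outcome(compare_with_identity_shortcut, item, item)
shortcut_other = outcome(compare_with_identity_shortcut, item, FakeArray())
plain_same = outcome(compare_without_shortcut, item, item)

print(shortcut_same, shortcut_other, plain_same)
```

With the shortcut, the same object compares as True while a distinct but identical-looking object hits the error path, which matches the asymmetry reported in the ticket once that error is swallowed and turned into False.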

seberg · 2013-01-04T22:02:01Z

Actually, maybe the current behaviour should just be kept, since even if it's somewhat wrong it is also what python lists do...
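For reference, the list behaviour mentioned here can be reproduced in plain Python, no NumPy needed. This is a small sketch using NaN, the usual example of a value that is not equal to itself.

```python
nan = float("nan")

# NaN is not equal to itself when compared directly.
direct = nan == nan

# List comparison checks identity first, so the *same* NaN object
# inside two lists makes the lists compare equal.
same_object_in_lists = [nan] == [nan]

# Two distinct NaN objects skip the identity shortcut and compare unequal.
distinct_objects_in_lists = [float("nan")] == [float("nan")]

# Membership tests use the same identity-then-equality rule.
membership = nan in [nan]

print(direct, same_object_in_lists, distinct_objects_in_lists, membership)
```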

njsmith · 2013-01-04T22:15:51Z

In principle I guess == and friends on object arrays should always return object arrays (using PyObject_RichCompare), since there's no rule that says that == has to return a bool. OTOH if we're going to cast the result to bool, then either the PyObject_IsTrue(PyObject_RichCompare(...)) or the PyObject_RichCompareBool behaviour seems credible, though arguably the former is fixing a python bug.
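As a quick illustration of the point that `==` is not required to return a bool, here is a minimal sketch with a hypothetical `Symbol` class (not from NumPy or this thread) whose `__eq__` builds a description of the comparison instead of evaluating it.

```python
class Symbol:
    """Hypothetical class whose == returns a non-boolean object."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Return a tuple describing the comparison rather than True/False.
        other_name = other.name if isinstance(other, Symbol) else other
        return ("eq", self.name, other_name)


result = Symbol("x") == Symbol("y")
print(result)

# Casting to bool loses the information: any non-empty tuple is truthy.
as_bool = bool(result)
print(as_bool)
```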

But! What I want to know is, why does that last comparison return a scalar?!

In [10]: a = np.array([np.array([0, 1]), np.array(1)])

In [11]: b = np.array([np.array([0, 1]), np.array(1)])

In [12]: a == b
Out[12]: False

Surely that should be [False, False] or something?

seberg · 2013-01-04T22:28:55Z

@njsmith agreed, logically it would not cast to bool, I guess. The last thing is a bug in numpy: if you check np.equal(a, b), it actually gives an error. Numpy suppresses errors because it returns False for non-broadcastable arrays (which is debatable), and my guess is that this also catches the errors created inside the ufunc.

seberg · 2017-07-02T20:53:01Z

This should be about as close to being fixed as it can be, so closing (no "is" check anymore, and == raises more errors nowadays).

yarikoptic · 2017-07-03T18:01:53Z

hm @seberg , so now it just consistently provides an incorrect answer?

>>> import numpy as np
>>> print np.__version__
1.14.0.dev0+14cd918
>>> a = np.array([np.array([0, 1]), np.array(1)])
>>> print a.dtype
object
>>> print a == a.copy()
False
>>> print a == np.array([np.array([0, 1]), np.array(1)])
False

Why should that be the correct behavior when the objects are comparable etc?

Edit: (mattip, 2019-10-28) formatting

seberg · 2017-07-03T18:04:32Z

No:

>>> a = np.array([np.array([0, 1]), np.array(1)])
>>> print(a.dtype)
object
>>> print( a == a.copy())
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
False
>>> print (a == np.array([np.array([0, 1]), np.array(1)]))
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
False

seberg · 2017-07-03T18:10:58Z

@yarikoptic hmmm, or maybe I am missing something. What do you expect? Then again, maybe we planned to go via an error to go somewhere else. Apparently we still do the cast to bool (which for arrays will typically create an error), so that we would only go to error for now.

seberg · 2017-07-03T18:13:39Z

Heh, I don't remember, I am sure there are other issues about this transition, but if you feel better, will reopen it :).

yarikoptic · 2017-07-03T18:18:11Z

I am not sure why I do not see those deprecation warnings and you do (I cut pasted exact command and output with the version as currently straight from the master branch).

I expect that a "thing" should be equal to its own copy, either element-wise (custom numpy semantic) or just altogether (result of __eq__ is typically a bool).
Whereas before it provided a correct result, although with mixed semantics (element-wise, but not recursing into the comparison of elements), at least for a copy, and an incorrect one (False) for an identical, separately defined array, now it just always provides an incorrect result. Not sure how that is better really.

seberg · 2017-07-03T18:25:07Z

@yarikoptic unfortunately deprecation warnings are only printed if you enable them (most users don't care much, EDIT: This is a python choice, not ours), so we will likely need to turn these into a VisibleDeprecationWarning before actually changing things :(. Well, the current state fixes the case where you have an object array with e.g. NaN inside (which previously sometimes gave you True for the same reason).

The code currently always casts to bool, and bool(array) -> Error, so the warning is there to say that that error will actually show up in the future. The error is unfortunately also the reason you currently get a False. I closed it since I thought that after the deprecation is done, it will be correct (or as good as possible), I am not sure now. It might be we want to morph the behaviour further (i.e. by removing the cast to bool at some point), I suppose.
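The difference in what the two of us saw comes down to warning filters. Below is a minimal sketch of making a DeprecationWarning visible with the standard `warnings` module; `legacy_compare` is a hypothetical stand-in for the comparison that warns, not a NumPy function.

```python
import warnings


def legacy_compare():
    """Hypothetical stand-in for a call that emits a DeprecationWarning."""
    warnings.warn(
        "elementwise == comparison failed; this will raise an error in the future.",
        DeprecationWarning,
        stacklevel=2,
    )
    return False


# Record warnings and force DeprecationWarning to be reported every time,
# regardless of the interpreter's default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = legacy_compare()

messages = [
    str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
]
print(result, messages)
```

Running the interpreter with `-W always::DeprecationWarning` has a similar effect without touching the code.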

mattip · 2019-09-25T10:21:14Z

Raising this again as those deprecation warnings have expired and it is time to clean up the code. The question is this: given an object array a = np.array([1, np.array([1, 2, 3])], dtype=object), should a == a.copy() produce True, or should it be consistent with the rest of numpy and produce np.array([True, np.array([True, True, True])], dtype=object)? Then a == 0 should produce the same shaped result with False, right?

A trickier one: a = np.array([1, [1, 2, 3]], dtype=object); a == 0 will now produce np.array([False, False], dtype=object) since [1, 2, 3] == 0 produces False, not [False, False, False].
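The list case can be checked directly. This is a sketch that assumes NumPy is installed; the array is filled element by element so that its construction does not depend on how a given NumPy version handles ragged input. The dtype of the result is deliberately left unchecked, since that is what this thread is debating.

```python
import numpy as np

# A plain list compared with a scalar gives a single bool, not a list.
plain_list_result = [1, 2, 3] == 0

# Build the object array explicitly: element 0 is an int, element 1 a list.
a = np.empty(2, dtype=object)
a[0] = 1
a[1] = [1, 2, 3]

# Each element is compared with 0 using its own __eq__.
per_element = [item == 0 for item in a]
array_result = a == 0

print(plain_list_result, per_element, array_result.tolist())
```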

seberg · 2019-09-25T15:28:50Z

I feel the main options are:

1. a == a will raise an Error (for nested arrays)
2. a == a always returns np.equal(a, a, signature="OO->O")

The first one is what we made all the fuss about. The second is probably the easier sell/better. So I am tending towards it. The tricky thing is that our warnings warn about the first one right now (indirectly, but nevertheless)?! You are right, the last example will stump users, but it is the consistent choice and trying to be smart would just get us in trouble.

WarrenWeckesser · 2019-09-25T15:36:27Z

@seberg's option 2 seems like the correct approach. == acting on an object array should act like == on any other array, and do the comparison element-wise (with the usual broadcasting rules). These all look correct to me:

In [25]: a = np.array([1, np.array([1,2,3])], dtype=object)

In [26]: np.equal(a, a, signature='OO->O')
Out[26]: array([True, array([ True,  True,  True])], dtype=object)

In [27]: np.equal.outer(a, a, signature='OO->O')
Out[27]: 
array([[True, array([ True, False, False])],
       [array([ True, False, False]), array([ True,  True,  True])]],
      dtype=object)

In [28]: b = np.array([1, [1, 2, 3]])

In [29]: np.equal(b, b, signature='OO->O')
Out[29]: array([True, True], dtype=object)

seberg · 2019-09-25T16:48:21Z

It is an interesting question whether we should maybe just reorder the whole loop priority to use OO->O before OO->? always.

EDIT: Just to be clear, that should solve the whole == deprecation for the nasty and important object case.

seberg · 2022-12-07T12:25:33Z

So... this issue is now resolved towards option "1." above. We still use the OO->? implementation which means that an object array with nested arrays must fail == and !=.

I honestly think that is OK to enforce a boolean result for array comparison, but of course we could default to the object result. The way it is now, you would have to use np.equal(..., dtype=object) (at the time of writing, maybe that didn't work, it now does).
That does make more sense for use-cases with nested arrays, but getting an object array back may be confusing for other use-cases, e.g. when the array is filled with decimals?
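A short sketch of the explicit object-result comparison mentioned above, assuming a NumPy version recent enough that `np.equal(..., dtype=object)` is accepted. The array is filled element by element to avoid relying on ragged-input handling.

```python
import numpy as np

# Object array holding a scalar and a nested array.
a = np.empty(2, dtype=object)
a[0] = 1
a[1] = np.array([1, 2, 3])

# copy() is shallow for object arrays: both arrays share the nested ndarray.
b = a.copy()

# Request the object-returning loop, so each element keeps whatever
# its own == produced instead of being cast to bool.
elementwise = np.equal(a, b, dtype=object)

print(elementwise.dtype)
print(elementwise[0], elementwise[1])
```

The nested array's comparison result survives as an array in slot 1, which is the behaviour a plain `==` cannot give while it insists on a boolean result.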

seberg · 2022-12-07T12:27:06Z

I.e. the behavior is consistent now, I think. The only remaining issue is to discuss whether we want to modify the behavior of object comparisons in general.

mattip · 2022-12-07T14:15:30Z

Let's close this. If the current behavior does not match someone's expectations in a concrete use case, please open a new issue.

seberg closed this as completed Jul 2, 2017

seberg reopened this Jul 3, 2017

toobaz mentioned this issue Dec 14, 2017

Comparison of arrays with iterable values fails and raises wrong warning #10218

Closed

mattip removed the priority: normal label Oct 21, 2018

mattip mentioned this issue Oct 29, 2019

ENH: change object-array comparisons to prefer OO->O unfuncs #14800

Merged

seberg added 15 - Discussion and removed 00 - Bug labels Dec 7, 2022

seberg changed the title ~~inconsistent comparison of object type arrays (Trac #2117)~~ DISCUSS: Should object array comparisons return objects? (Trac #2117) Dec 7, 2022

mattip closed this as completed Dec 7, 2022
