
DISCUSS: Should object array comparisons return objects? (Trac #2117) #577


Closed
numpy-gitbot opened this issue Oct 19, 2012 · 20 comments

Comments

@numpy-gitbot

numpy-gitbot commented Oct 19, 2012

Original ticket http://projects.scipy.org/numpy/ticket/2117 on 2012-04-26 by @yarikoptic, assigned to unknown.

This issue became more visible since 1.6.x allows constructing such heterogeneous arrays without specifying dtype=object:

>>> import numpy as np
>>> print np.__version__
1.7.0.dev-3cb783e
>>> a = np.array([np.array([0, 1]), np.array(1)])
>>> print a.dtype
object
>>> print a == a.copy()
[ True  True]
>>> print a == np.array([np.array([0, 1]), np.array(1)])
False

So comparing an object array to its own copy worked just fine, but comparing it to another, identically created array failed.

Edit: (mattip, 2019-10-28) formatting

@numpy-gitbot
Author

@charris wrote on 2012-04-27

Strange indeed. This is going to be fun to figure out. My guess is that the first item is the same object in a and a.copy(), whereas it is a different object for the other array created with the same parameters... and it is so.

In [12]: b = np.array([np.array([0, 1]), np.array(1)], dtype=object)

In [13]: type b[1]
-------> type(b[1])
Out[13]: numpy.ndarray

In [14]: c = b.copy()

In [15]: c[0] is b[0]
Out[15]: True

In [16]: d = np.array([0,1])

In [17]: c[0] is d
Out[17]: False

So the comparison is comparing object pointers rather than calling a comparison routine. It's certainly easier to compare the pointers. I'm not sure if this is a bug or a feature ;) Shouldn't be hard to find where it happens.
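
A minimal sketch of the identity observation above, assuming a current NumPy install. The array is filled element by element (a choice made here, not taken from the thread) so that the construction does not depend on how a given NumPy version treats ragged input.

```python
import numpy as np

# Build the object array element by element.
b = np.empty(2, dtype=object)
b[0] = 1
b[1] = np.array([0, 1])

c = b.copy()            # copies the references, not the referenced objects
d = np.array([0, 1])    # equal values, but a distinct object

same_object_in_copy = c[1] is b[1]
same_object_as_fresh = c[1] is d
print(same_object_in_copy, same_object_as_fresh)  # True False
```

An identity-based comparison would therefore succeed against the copy and be unable to shortcut against a freshly built array.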

@numpy-gitbot
Author

@charris wrote on 2012-04-27

OTOH

In [37]: a = np.array([[257]*2, 257], dtype=object)

In [38]: b = np.array([[257]*2, 257], dtype=object)

In [39]: a == b
Out[39]: array([ True,  True], dtype=bool)

In [40]: a[0] is b[0]
Out[40]: False

So it looks specific to arrays.

@seberg
Member

seberg commented Jan 4, 2013

This looks weird at first sight, but it is actually down to Python's C-API. PyObject_RichCompareBool, which is used (probably wrongly), always returns True if both operands are the same object. I think that it should be replaced here. The correct behaviour would be the second one because the comparison of two (compatibly shaped) arrays cannot result in a single boolean (unless you want a special case here). Of course from a speed perspective RichCompareBool is probably good; the biggest disadvantage is the obviously wrong result for np.array(np.nan, dtype=object) == np.array(np.nan, dtype=object).

My guess is that RichCompareBool should be replaced with PyObject_IsTrue(RichCompare(in1, in2, OP)) (probably not just for the ufuncs, but everywhere in the code), which will result in False always being returned here. There is, however, another bug here IMO. That incompatible arrays return False I can understand; in this case, however, the error leading to the False result is not due to incompatible shapes and should surface.

If that is considered the right approach, I will do a small PR to that effect (for both things).
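
The difference between the two C-API strategies can be sketched in pure Python. The two helper functions below are hypothetical stand-ins written for illustration only; they mimic the documented behaviour of the C calls for `==` and are not NumPy or CPython code.

```python
import numpy as np

def rich_compare_bool_eq(x, y):
    """Hypothetical stand-in for PyObject_RichCompareBool(x, y, Py_EQ)."""
    if x is y:              # identity shortcut: the same object counts as equal
        return True
    return bool(x == y)

def is_true_of_rich_compare_eq(x, y):
    """Hypothetical stand-in for PyObject_IsTrue(PyObject_RichCompare(x, y, Py_EQ))."""
    return bool(x == y)     # no shortcut: always evaluates and converts x == y

nan = float("nan")
arr = np.array([0, 1])

nan_with_shortcut = rich_compare_bool_eq(nan, nan)           # True
nan_without_shortcut = is_true_of_rich_compare_eq(nan, nan)  # False
arr_with_shortcut = rich_compare_bool_eq(arr, arr)           # True

try:
    is_true_of_rich_compare_eq(arr, arr)
    arr_raised = False
except ValueError:
    # bool() of a multi-element array is ambiguous, so the error surfaces.
    arr_raised = True

print(nan_with_shortcut, nan_without_shortcut, arr_with_shortcut, arr_raised)
```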

@seberg
Member

seberg commented Jan 4, 2013

Actually, maybe the current behaviour should just be kept, since even if it's somewhat wrong, it is also what Python lists do...
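
For reference, the list behaviour mentioned above can be seen with NaN in plain Python, with no NumPy involved:

```python
nan = float("nan")

scalar_equal = nan == nan                     # False: NaN never equals itself
same_object_lists = [nan] == [nan]            # True: both lists hold the same object
fresh_object_lists = [nan] == [float("nan")]  # False: distinct NaN objects
membership = nan in [nan]                     # True: containment also checks identity first

print(scalar_equal, same_object_lists, fresh_object_lists, membership)
```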

@njsmith
Member

njsmith commented Jan 4, 2013

In principle I guess == and friends on object arrays should always return object arrays (using PyObject_RichCompare), since there's no rule that says that == has to return a bool. OTOH if we're going to cast the result to bool, then either the PyObject_IsTrue(PyObject_RichCompare(...)) or the PyObject_RichCompareBool behaviour seems credible, though arguably the former is fixing a python bug.
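
A small sketch of the "== does not have to return a bool" point. The `Symbol` class is hypothetical and exists only for illustration; the `signature="OO->O"` form is the one used later in this thread to request the object-returning loop.

```python
import numpy as np

class Symbol:
    """Hypothetical class whose == builds a description instead of a bool."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return f"{self.name} == {other.name}"

x, y = Symbol("x"), Symbol("y")

a = np.empty(2, dtype=object)
a[0], a[1] = x, y
b = np.empty(2, dtype=object)
b[0], b[1] = y, x

# The object-returning loop stores whatever __eq__ returned, unconverted.
result = np.equal(a, b, signature="OO->O")
print(result.dtype, list(result))
```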

But! What I want to know is, why does that last comparison return a scalar?!

In [10]: a = np.array([np.array([0, 1]), np.array(1)])

In [11]: b = np.array([np.array([0, 1]), np.array(1)])

In [12]: a == b
Out[12]: False

Surely that should be [False, False] or something?

@seberg
Member

seberg commented Jan 4, 2013

@njsmith agreed, logically it would not cast to bool, I guess. The last thing is a bug in numpy: if you check np.equal(a, b), it actually gives an error. Numpy suppresses errors because it returns False for non-broadcastable arrays (which is debatable), and my guess is that this also catches the errors raised inside the ufunc.
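
A sketch of the error the ufunc raises when called directly, assuming a current NumPy. The nested arrays are built as distinct objects on purpose so that no identity shortcut could apply in any version; what the `==` operator does with this error has changed over time, so only `np.equal` is exercised here.

```python
import numpy as np

a = np.empty(2, dtype=object)
a[0] = 1
a[1] = np.array([1, 2, 3])

b = np.empty(2, dtype=object)
b[0] = 1
b[1] = np.array([1, 2, 3])   # equal values, separate object

try:
    np.equal(a, b)           # default loop converts each element result to bool
    error = None
except ValueError as exc:    # bool() of a 3-element array is ambiguous
    error = exc

print(type(error).__name__)
```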

@seberg
Member

seberg commented Jul 2, 2017

This should be about as close to being fixed as it can be, so closing (no "is" check anymore, and == raises more errors nowadays).

@seberg seberg closed this as completed Jul 2, 2017
@yarikoptic
Contributor

yarikoptic commented Jul 3, 2017

hm @seberg, so now it just consistently provides an incorrect answer?

>>> import numpy as np
>>> print np.__version__
1.14.0.dev0+14cd918
>>> a = np.array([np.array([0, 1]), np.array(1)])
>>> print a.dtype
object
>>> print a == a.copy()
False
>>> print a == np.array([np.array([0, 1]), np.array(1)])
False

Why should that be the correct behavior when the objects are comparable, etc.?

Edit: (mattip, 2019-10-28) formatting

@seberg
Member

seberg commented Jul 3, 2017

No:

>>> a = np.array([np.array([0, 1]), np.array(1)])
>>> print(a.dtype)
object
>>> print( a == a.copy())
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
False
>>> print (a == np.array([np.array([0, 1]), np.array(1)]))
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
runtests.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
False

@seberg
Member

seberg commented Jul 3, 2017

@yarikoptic hmmm, or maybe I am missing something. What do you expect? Then again, maybe we planned to go via an error to go somewhere else. Apparently we still do the cast to bool (which for arrays will typically create an error), so that we would only go to error for now.

@seberg
Member

seberg commented Jul 3, 2017

Heh, I don't remember, I am sure there are other issues about this transition, but if you feel better, will reopen it :).

@seberg seberg reopened this Jul 3, 2017
@yarikoptic
Contributor

I am not sure why I do not see those deprecation warnings and you do (I copy-pasted the exact commands and output, with the version currently straight from the master branch).

I expect that a "thing" should be equal to its own copy, either element-wise (custom numpy semantics) or altogether (the result of __eq__ is typically a bool).
Whereas before it provided a correct result, although with mixed semantics (element-wise, but not recursing into the compared elements), at least for a copy, and an incorrect one (False) for an identical, separately defined array, now it just always provides an incorrect result. Not sure how that is better really.

@seberg
Member

seberg commented Jul 3, 2017

@yarikoptic unfortunately deprecation warnings are only printed if you enable them (most users don't care much; EDIT: this is a Python choice, not ours), so we will likely need to turn these into a VisibleDeprecationWarning before actually changing things :(. Well, the current state fixes the case where you have an object array with e.g. NaN inside (which previously sometimes gave you True for the same reason).

The code currently always casts to bool, and bool(array) -> Error, so the warning is there to say that that error will actually show up in the future. The error is unfortunately also the reason you currently get a False. I closed it since I thought that after the deprecation is done, it will be correct (or as good as possible), I am not sure now. It might be we want to morph the behaviour further (i.e. by removing the cast to bool at some point), I suppose.
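
Both points above can be sketched with the standard library plus NumPy: `bool()` of a multi-element array raises, and whether a `DeprecationWarning` is seen depends on the active warning filters. The warning message below is illustrative and emitted by a hypothetical helper, not by NumPy.

```python
import warnings
import numpy as np

# bool() of an array with more than one element is ambiguous and raises.
try:
    bool(np.array([True, True, True]))
    bool_error = None
except ValueError as exc:
    bool_error = exc

def emit():
    """Hypothetical helper that emits an illustrative DeprecationWarning."""
    warnings.warn("elementwise comparison failed (illustrative)", DeprecationWarning)

with warnings.catch_warnings(record=True) as hidden:
    warnings.simplefilter("ignore", DeprecationWarning)   # filtered out
    emit()

with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always", DeprecationWarning)   # explicitly enabled
    emit()

print(type(bool_error).__name__, len(hidden), len(shown))  # ValueError 0 1
```

Running a script with `python -W always::DeprecationWarning` has a similar effect to the second filter.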

@mattip
Member

mattip commented Sep 25, 2019

Raising this again as those deprecation warnings have expired and it is time to clean up the code. The question is this: given an object array a = np.array([1, np.array([1, 2, 3])], dtype=object), should a == a.copy() produce True, or should it be consistent with the rest of numpy and produce np.array([True, np.array([True, True, True])], dtype=object)? Then a == 0 should produce the same shaped result with False, right?

A trickier one: a = np.array([1, [1, 2, 3]], dtype=object); a == 0 will now produce np.array([False, False], dtype=object) since [1, 2, 3] == 0 produces False, not [False, False, False].
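
A sketch contrasting the two cases under the object-returning loop, assuming a current NumPy. The `signature="OO->O"` form is taken from later in this thread; the arrays are filled element by element so that the list is stored as a list.

```python
import numpy as np

with_list = np.empty(2, dtype=object)
with_list[0] = 1
with_list[1] = [1, 2, 3]              # a plain Python list

with_array = np.empty(2, dtype=object)
with_array[0] = 1
with_array[1] = np.array([1, 2, 3])   # a nested ndarray

# Each element result is whatever `element == 0` evaluates to.
res_list = np.equal(with_list, 0, signature="OO->O")
res_array = np.equal(with_array, 0, signature="OO->O")

print(res_list)    # list == 0 is a single False
print(res_array)   # ndarray == 0 is an element-wise array
```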

@seberg
Member

seberg commented Sep 25, 2019

I feel the main options are:

  1. a == a will raise an Error (for nested arrays)
  2. a == a always returns np.equal(a, a, signature="OO->O")

The first one is what we made all the fuss about. The second is probably the easier sell/better, so I am tending towards it. The tricky thing is that our warnings warn about the first one right now (indirectly, but nevertheless)?! You are right, the last example will stump users, but it is the consistent choice and trying to be smart would just get us in trouble.

@WarrenWeckesser
Member

@seberg's option 2 seems like the correct approach. == acting on an object array should act like == on any other array, and do the comparison element-wise (with the usual broadcasting rules). These all look correct to me:

In [25]: a = np.array([1, np.array([1,2,3])], dtype=object)

In [26]: np.equal(a, a, signature='OO->O')
Out[26]: array([True, array([ True,  True,  True])], dtype=object)

In [27]: np.equal.outer(a, a, signature='OO->O')
Out[27]: 
array([[True, array([ True, False, False])],
       [array([ True, False, False]), array([ True,  True,  True])]],
      dtype=object)

In [28]: b = np.array([1, [1, 2, 3]])

In [29]: np.equal(b, b, signature='OO->O')
Out[29]: array([True, True], dtype=object)

@seberg
Member

seberg commented Sep 25, 2019

It is an interesting question whether we should maybe just reorder the whole loop priority to use OO->O before OO->? always.
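
The two loops under discussion can be inspected through the ufunc's `types` attribute. This sketch only lists the object loops; it assumes both are registered on `np.equal`, as the `signature='OO->O'` calls earlier in the thread suggest, and it makes no claim about how NumPy picks between them.

```python
import numpy as np

# Keep only the loops whose two inputs are both object ("O") typed.
object_loops = [t for t in np.equal.types if t.startswith("OO")]
print(object_loops)
```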

EDIT: Just to be clear, that should solve the whole == deprecation for the nasty and important object case.

@seberg
Member

seberg commented Dec 7, 2022

So... this issue is now resolved towards option "1." above. We still use the OO->? implementation, which means that an object array with nested arrays must fail == and !=.

I honestly think that it is OK to enforce a boolean result for array comparison, but of course we could default to the object result. The way it is now, you would have to use np.equal(..., dtype=object) (at the time of writing, maybe that didn't work, it now does).
That does make more sense for use-cases with nested arrays, but getting an object array back may be confusing for other use-cases, e.g. when the array is filled with decimals?
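
A sketch of the decimal case, assuming a current NumPy. It uses the explicit `signature="OO->O"` spelling that appears earlier in the thread rather than `dtype=object`; the variable names are illustrative.

```python
from decimal import Decimal
import numpy as np

prices = np.array([Decimal("1.0"), Decimal("2.5")], dtype=object)
other = np.array([Decimal("1.0"), Decimal("3.0")], dtype=object)

as_bool = prices == other                                # default: boolean result
as_object = np.equal(prices, other, signature="OO->O")   # opt-in: object result

print(as_bool.dtype, as_bool.tolist())
print(as_object.dtype, as_object.tolist())
```

The boolean result works directly as a mask (for example `prices[as_bool]`), which is one reason an object result could surprise users of flat object arrays.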

@seberg
Member

seberg commented Dec 7, 2022

I.e. the behavior is consistent now, I think. The only remaining issue is to discuss whether we want to modify the behavior of object comparisons in general.

@seberg seberg changed the title from "inconsistent comparison of object type arrays (Trac #2117)" to "DISCUSS: Should object array comparisons return objects? (Trac #2117)" Dec 7, 2022
@mattip
Member

mattip commented Dec 7, 2022

Let's close this. If the current behavior does not match someone's expectations in a concrete use case, please open a new issue.

@mattip mattip closed this as completed Dec 7, 2022