is_string_like returns True for numpy object arrays #7725

Closed
fdtomasi opened this issue Jan 2, 2017 · 3 comments · Fixed by #8011

fdtomasi commented Jan 2, 2017

Bug report

Bug summary

Function is_string_like returns True for numpy object arrays

Code for reproduction

import numpy as np
from matplotlib.cbook import is_string_like
# list(...) is needed on Python 3, where map returns an iterator
print(is_string_like(np.array(list(map(str, [1, 2, 3])), dtype=object)))

Actual outcome

True

Expected outcome

False

The problem is that line 707 in is_string_like, obj + '', does not raise an exception for such a numpy array.
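
To illustrate why the check passes, here is a minimal sketch of a duck-typing test of this kind. The names duck_is_string_like and strict_is_string_like are hypothetical stand-ins, not matplotlib's source; the sketch only assumes that the check boils down to trying obj + '' and catching any exception.

```python
import numpy as np


def duck_is_string_like(obj):
    # Hypothetical stand-in for a duck-typing check: anything that
    # survives `obj + ''` without raising is treated as string-like.
    try:
        obj + ''
    except Exception:
        return False
    return True


def strict_is_string_like(obj):
    # Hypothetical stricter alternative: accept only real str instances.
    return isinstance(obj, str)


arr = np.array(list(map(str, [1, 2, 3])), dtype=object)
print(duck_is_string_like(arr))    # the addition succeeds element-wise
print(strict_is_string_like(arr))  # an ndarray is not a str instance
```

Whether the stricter check is the right fix depends on which inputs callers expect to be accepted; it is shown here only for contrast.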

Matplotlib version

  • Matplotlib version: 1.5.3
  • OS: Ubuntu 14.04
  • Python and Matplotlib Installed with Anaconda
@tacaswell
Member

In what context is this a problem? A numpy object array of str objects is very string like [he says having flash-backs to MATLAB days].

tacaswell added this to the 2.1 (next point release) milestone Jan 2, 2017
@fdtomasi
Author

fdtomasi commented Jan 2, 2017

I use seaborn and a pandas.DataFrame to scatter-plot some points.
To do so, I need to concatenate a label vector (of type object) to my data matrix. This operation forces all columns of the DataFrame to have the object type.

For now, I work around the issue in my project by casting the columns that hold the points back to float, since is_string_like seems to treat my object-typed columns as strings.
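
The dtype promotion described above can be reproduced with numpy alone. This is a rough sketch under assumptions: the arrays points and labels are made-up illustration data, and the original report went through a pandas DataFrame rather than np.column_stack.

```python
import numpy as np

# Hypothetical data: float coordinates plus an object-typed label vector.
points = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
labels = np.array(['a', 'b', 'c'], dtype=object)

# Stacking float and object columns promotes the whole array to object.
combined = np.column_stack([points, labels])
print(combined.dtype)

# Workaround from the comment above: cast the numeric columns back to float.
xy = combined[:, :2].astype(float)
print(xy.dtype)
```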

I understand this may be unexpected behaviour in numpy rather than in matplotlib.
In fact, the operation

np.array(['1','2'], dtype=object) + ''

does not raise any error or warning; it simply has no visible effect.
By contrast,

np.array(['1','2'], dtype=str) + ''

raises an error, so is_string_like would return False in this case.

Still, it seemed reasonable to open an issue here, because relying on such operations to raise an error causes the function to treat as a string something that is not one.

@tacaswell
Member

That sounds like pandas is doing something wrong (one of the big advantages of pandas over numpy is that it can have heterogeneous columns).

That test has no visible effect because you are adding an empty string; if you do

In [5]: np.array(['1','2'], dtype=object) + 'b'
Out[5]: 
array(['1b', '2b'], dtype=object)

what it is doing is clearer (the add ufunc falls back to deferring to Python to deal with the + operation and gives you back a new object array with the concatenation broadcast across all entries).
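
A small sketch of that fallback, assuming only numpy: because each element is handed to Python's own + operator, the result depends entirely on what the elements are. The variable names are illustrative.

```python
import numpy as np

strings = np.array(['1', '2'], dtype=object)
print((strings + 'b').tolist())  # str + str is applied to every element
print((strings + '').tolist())   # a new array whose values are unchanged

# With int elements the same operation fails, because int + str is not
# defined in Python and the element-wise addition raises TypeError.
numbers = np.array([1, 2], dtype=object)
try:
    numbers + ''
    raised = False
except TypeError:
    raised = True
print(raised)
```

So an object array passes an obj + '' test only when every element happens to support addition with a string.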

In the second case, you get

In [4]: np.array(['1','2'], dtype=str) + ''
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-1ccaacc2525d> in <module>()
----> 1 np.array(['1','2'], dtype=str) + ''

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

This is because, instead of holding the values as Python string objects (which know how to add with another string and are just a black box to numpy), it holds the values as fixed-length C strings, and the dispatch mechanism in numpy does not know how to deal with this.
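
The difference between the two storage schemes is visible in the dtype. The sketch below assumes only numpy; holds_python_objects is a hypothetical helper name, and np.char.add is shown as one way to concatenate fixed-width string arrays element-wise.

```python
import numpy as np

as_objects = np.array(['1', '2'], dtype=object)
as_fixed = np.array(['1', '2'], dtype=str)

print(as_objects.dtype.kind)  # 'O': references to Python objects
print(as_fixed.dtype.kind)    # 'U': fixed-width unicode strings


def holds_python_objects(arr):
    # Hypothetical helper: distinguish object arrays by dtype kind
    # instead of probing them with an operation such as `arr + ''`.
    return arr.dtype.kind == 'O'


# Element-wise concatenation for fixed-width string arrays.
joined = np.char.add(as_fixed, 'b')
print(joined.tolist())
```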
