
ValueError when using pandas with a module overloading numpy's __eq__ with xpress #22612

Closed
@merraksh

Description

The following code raises a ValueError at the second print(df.loc['d']) call, i.e. only after xpress has been imported, with pandas >= 0.22 and any version of xpress (available on conda and PyPI):

import pandas as pd

n = 3

str1 = ['a'] * n
str2 = ['b'] * n
str3 = ['c'] * n

str1[0] = 'd'

df = pd.DataFrame({'key':str1, 'val1':str2, 'val2':str3})

df = df.set_index('key')

print (df.loc['d'])

import xpress as xp

print (df.loc['d'])

This happens because xpress overloads NumPy's == operation (the equal ufunc). The overloading is done through NumPy's C-API function PyUFunc_FromFuncAndData().

This overloading works by taking the array of (operand_type, operand_type, result_type) signatures that describe the ufunc's loops, replacing the function pointer associated with each signature and possibly changing its types. For xpress to work, one of the two entries whose operand types are both NPY_OBJECT must be changed so that its result type is NPY_OBJECT as well.
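The loop table described above can be inspected from Python through the types attribute of NumPy's equal ufunc. This is a minimal sketch against stock NumPy (no xpress installed); the exact list of signatures differs between NumPy versions, so no output is claimed for it. It also shows that, without the overloading, == on an object array yields a one-byte-per-item boolean array.

```python
import numpy as np

# Every ufunc loop has a type signature "inputs->output" written with
# one-character codes: 'O' is NPY_OBJECT and '?' is NPY_BOOL.
object_loops = [sig for sig in np.equal.types if sig.startswith("OO")]
print("object-operand loops of np.equal:", object_loops)

# With stock NumPy, == on an object array is dispatched to np.equal and
# produces a boolean array: one byte per item.
keys = np.array(["d", "a", "a"], dtype=object)
result = keys == "d"
print(result, result.dtype, result.dtype.itemsize)  # [ True False False] bool 1
```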

The ValueError is raised in pandas' _maybe_get_bool_indexer(), where indexer is declared in Cython as an ndarray of uint8_t (one byte per item) and is then assigned the result of comparing the index values with the key. That comparison now runs xpress' code, which detects that the operands are not xpress objects and falls back to the original comparison, but returns an array of objects (pointer-sized, 8 bytes per item on this 64-bit platform) rather than an array of bytes. Assigning that array to indexer therefore fails the buffer item-size check and raises the ValueError.
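The size mismatch can be imitated without xpress or Cython. In this sketch, overloaded_equal is a hypothetical stand-in for the overloaded comparison (it only mimics the object-typed result described above), and assign_to_uint8_buffer is a rough, hypothetical approximation of the item-size check Cython performs when an array is assigned to an ndarray[uint8_t] variable. Neither is real xpress or pandas code.

```python
import numpy as np


def overloaded_equal(values, key):
    """Hypothetical stand-in for the overloaded comparison: it falls back to
    the ordinary element-wise ==, but returns an object array (one pointer
    per item) instead of a bool array (one byte per item)."""
    return np.array([v == key for v in values], dtype=object)


def assign_to_uint8_buffer(arr):
    """Hypothetical approximation of the item-size check Cython performs
    when an array is assigned to an ndarray[uint8_t] variable."""
    if arr.dtype.itemsize != np.dtype(np.uint8).itemsize:
        raise ValueError(
            f"Item size of buffer ({arr.dtype.itemsize} bytes) does not "
            "match size of 'uint8_t' (1 byte)"
        )
    # A bool array has the same 1-byte layout, so it can be viewed as uint8.
    return arr.view(np.uint8)


values = np.array(["d", "a", "a"], dtype=object)
stock = values == "d"                    # bool result, 1 byte per item
patched = overloaded_equal(values, "d")  # object result, pointer-sized items

print(assign_to_uint8_buffer(stock))     # [1 0 0]
try:
    assign_to_uint8_buffer(patched)
except ValueError as exc:
    print("ValueError:", exc)

# Casting the object result back to bool restores the 1-byte layout.
print(assign_to_uint8_buffer(patched.astype(bool)))  # [1 0 0]
```

The final cast only illustrates why the two layouts clash; it is not necessarily how pandas resolved the issue.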

The issue does not occur with pandas < 0.22.

The actual output is:

val1    b
val2    c
Name: d, dtype: object
Traceback (most recent call last):
  File "bug2.py", line 19, in <module>
    print (df.loc['d'])
  File "/home/pietro/.local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/pietro/.local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1912, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/pietro/.local/lib/python3.5/site-packages/pandas/core/indexing.py", line 140, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/home/pietro/.local/lib/python3.5/site-packages/pandas/core/generic.py", line 2987, in xs
    loc = self.index.get_loc(key)
  File "/home/pietro/.local/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 157, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 183, in pandas._libs.index.IndexEngine._get_loc_duplicates
  File "pandas/_libs/index.pyx", line 191, in pandas._libs.index.IndexEngine._maybe_get_bool_indexer
ValueError: Item size of buffer (8 bytes) does not match size of 'uint8_t' (1 byte)

The expected output is as follows:

val1    b
val2    c
Name: d, dtype: object
val1    b
val2    c
Name: d, dtype: object

Output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-8-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_GB.utf8
LOCALE: en_GB.UTF-8

pandas: 0.23.4
pytest: None
pip: 9.0.1
setuptools: 40.0.0
Cython: None
numpy: 1.15.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.4.8
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug, Compat (pandas objects compatibility with NumPy or Python functions)