avoid ValueError when overriding eq #22611
Overloading "eq" in NumPy for use by other modules (e.g. creating equations) requires replacing a NPY_OBJECT,NPY_OBJECT,NPY_BYTE operation with one that returns a NPY_OBJECT. This triggers a ValueError in _maybe_get_bool_indexer, where a NumPy array of bytes is cdef'd but then assigned an ndarray of objects. This change adds an ndarray of objects that receives the comparison result in that case.
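The dtype mismatch itself can be seen in plain NumPy, without Cython. The sketch below is only an illustration under stated assumptions: `as_byte_mask` is a hypothetical helper, not part of pandas or of this PR, and the `.astype(object)` call merely stands in for a comparison whose loop has been replaced to return objects.

```python
import numpy as np

values = np.array(["foo", "bar", "foo"], dtype=object)

# Stock NumPy: comparing an object array with a scalar yields a boolean array.
stock = values == "foo"

# Stand-in (assumption) for an overridden loop that returns objects instead.
overridden = stock.astype(object)


def as_byte_mask(result):
    """Hypothetical helper: coerce a comparison result to a uint8 mask."""
    # astype(bool) converts each element by truthiness; view() then
    # reinterprets the one-byte booleans as uint8.
    return np.asarray(result).astype(bool).view(np.uint8)


print(stock.dtype, overridden.dtype)  # bool object
print(as_byte_mask(overridden))       # [1 0 1]
```

A buffer typed as bytes can accept the output of `as_byte_mask` regardless of which loop produced the comparison, which is the property the typed `indexer` needs.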
@alimcmaster1 I thought so too, but indexer is pre-declared with Cython as a byte array, while indexerObj is an object array. They are both used when creating found and they are returned in case count > 1. One would need a single object to receive the comparison, but both are declared in Cython and I don't think they can be declared dynamically. I don't use Cython, though, so I don't know.
@merraksh not really sure what you are trying to do, this needs a lot more information, e.g. pls show the traceback. nor is this solution going to be acceptable. e.g. why is it raising? |
@jreback traceback posted on #22612, copied below for clarity: For the short script
I get the following with pandas >= 0.22:
why would you expect this to work? what exactly is this package doing? |
@jreback xpress and pandas are unrelated, but when imported together they trigger the ValueError when running code on a pandas index. The xpress module is for modeling optimization problems. It allows for creating NumPy arrays of optimization variables and constraints. Constraints use leq/geq/eq signs so xpress overloads Python's and NumPy's ==/<=/>= operators, using PyUFunc_FromFuncAndData() for NumPy. The resulting functions revert to previously defined ==/<=/>= functions when the members of the (in)equality are found not to be xpress objects. The fix allows |
@merraksh my point is we need a test that does not use an external package. I suspect you are actually doing something very odd here. Further where the ValueError is caught is very very odd. It need to be much higher in the stack. Please construct a dummy class that shows the same behavior. |
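A pure-Python stand-in for the behavior under discussion might look like the sketch below. It is not the xpress implementation: `Var` and `Constraint` are hypothetical classes, and instead of replacing a ufunc loop through the C API it simply asks NumPy for the object-returning loop with `dtype=object`, which is enough to show the two possible result dtypes.

```python
import numpy as np


class Constraint:
    """Hypothetical result of comparing a modeling variable with a value."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def __repr__(self):
        return f"Constraint({self.lhs!r} == {self.rhs!r})"


class Var:
    """Hypothetical modeling variable whose == builds a Constraint."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        return Constraint(self, other)

    __hash__ = object.__hash__  # keep instances hashable despite __eq__


variables = np.array([Var("x"), Var("y")], dtype=object)

# Default loop (object, object -> bool): each Constraint collapses to True.
as_bool = np.equal(variables, 1)

# Requesting the object-returning loop keeps the Constraint instances.
as_object = np.equal(variables, 1, dtype=object)

print(as_bool.dtype, as_object.dtype)  # bool object
print(as_object[0])
```

A module that wants `==` on arrays to build constraints needs the second behavior by default, which is why it swaps the loop; any caller that assumes a boolean or byte result then sees an object array instead.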
@jreback I see the point of replicating the issue without external packages, and I'm trying to write an example that overrides The C extension uses As for where to catch ValueError, it is in the assignment of |
@jreback I can't find a way to overload In the meantime, I reverted the try/except code change as there seems to be a much less invasive workaround: replace
with
I'm not sure whether this has an impact on performance (the commit affecting
Can you check if we have a benchmark covering this and run asv over it? http://pandas-docs.github.io/pandas-docs-travis/contributing.html#running-the-performance-test-suite |
Thanks. I tried a few weeks ago and today and I get the output below. Note that the 3.6 environment does have cython installed. Any suggestions?
this needs a test. closing, but if you can provide the additional details we can reopen.
Hi @merraksh , I discovered that I am also running into this issue. Did you ever find a solution that doesn't involve changes to pandas? |
Hi @mhulko, sorry for the late reply. We are finally working on a fix for the issue because I couldn't run a successful test of pandas, both master and with the fix. I haven't tried this, but the free_module() function of the xpress module has a call to try and restore the old numpy loop functions. Can you try "del xpress" and see if this has any effect?
@merraksh - the solution we discovered was to add the line |
@mhulko that's actually better than removing the xpress module. Restoring all operations would require to do something similar for |
Using pandas >= 0.22 together with the xpress module, which I maintain, I get a `ValueError` when calling `a.loc['foo']` on an index `a`. The reason has to do with xpress' overloading of a NumPy `eq` operation (and also `leq`, `geq`). This is done through NumPy's PyUFunc_FromFuncAndData() function, whose result is then passed as a value to a dictionary for key `'equal'`; the same happens with `'less_equal'` and `'greater_equal'`.

This overloading works by replacing function pointers for an array of (operand_type, operand_type, result_type) tuples and possibly changing those types. For xpress to work, one of the two elements of the array having `NPY_OBJECT` as operand types should be changed so that the result is also `NPY_OBJECT`. The ValueError is triggered in _maybe_get_bool_indexer(), where `indexer`, an ndarray of bytes, is cython-defined and then assigned the result of the comparison. The comparison runs xpress' code, which realizes it's a comparison of non-xpress objects and just reverts to the original comparison operation, but returns an array of objects rather than of bytes. Assigning it to `indexer` thus raises a ValueError.

My change is to wrap the assignment in a try/except block and use a cython-defined array of objects to do the same task if a ValueError exception is raised.
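The (operand, operand, result) combinations registered for a ufunc can be inspected from Python through its `types` attribute. The snippet below is only a read-only look at the two object loops mentioned above; it does not replace anything. Note that NumPy prints the result type of the first loop as `?` (boolean).

```python
import numpy as np

# Each entry reads "<operand><operand>-><result>" in NumPy type characters;
# 'O' is object and '?' is bool.
object_loops = [sig for sig in np.equal.types if sig.startswith("OO")]
print(object_loops)
```

The same check works for `np.less_equal` and `np.greater_equal`.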
I realize this is not a fix for any bug in pandas, but I believe this should make pandas compatible again with some modules that do the same sort of overloading, such as modeling modules.
All tests passed.
[Edit] fixes #22612