[Bug]: plt.pcolormesh crashes when called with Int64 nullable dtype #23991
Comments
At our root we are a numpy-based library. If you can't cleanly convert your data to a numpy numerical array with
dd = df.astype('Int64')
print(np.asanyarray(dd).dtype)
returns an
I would say this is an upstream issue, and pandas and numpy need to work out what they want to do here, versus us adding a conversion shim. |
I think the root issue here is that float(pd.NA)
I do find this a little surprising; you might expect |
This would also require changing matplotlib/lib/matplotlib/axes/_axes.py, line 5622 in 3feaa5d, to
c = np.asanyarray(args[0], dtype=float)
Together with
In [1]: float(pd.NA)
Out[1]: nan
it looks like this solves the issue. Would that be acceptable? The reason this is necessary is that a Series of type |
I guess this is OK? It will mean that an integer array Someone will have to go through and do this for all the methods that take a 2-D array ( I think it's a bit stubborn to not just convert the array to float, but I can see the idea that you want to preserve the ints as ints. |
I don't think so?
df = pd.DataFrame([[1, 2], [3, pd.NA]]).astype("Int64")
np.asanyarray(df, float)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [72], line 2
1 df = pd.DataFrame([[1, 2], [3, pd.NA]]).astype("Int64")
----> 2 np.asanyarray(df, float)
File ~/miniconda/envs/py310/lib/python3.10/site-packages/pandas/core/generic.py:2069, in NDFrame.__array__(self, dtype)
2068 def __array__(self, dtype: npt.DTypeLike | None = None) -> np.ndarray:
-> 2069 return np.asarray(self._values, dtype=dtype)
TypeError: float() argument must be a string or a real number, not 'NAType' |
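For readers who want to check this on their own installation, here is a minimal sketch of the two conversions discussed so far. It assumes only that numpy and pandas are installed. The variable names `as_object` and `cast_error` are illustrative, and the outcome of the float cast may differ between pandas versions, so the sketch records the error instead of assuming it.

```python
import numpy as np
import pandas as pd

# Same frame as in the traceback above: nullable Int64 with one missing value.
df = pd.DataFrame([[1, 2], [3, pd.NA]]).astype("Int64")

# Without a target dtype, the nullable integers come back as Python objects.
as_object = np.asanyarray(df)
print(as_object.dtype)

# Asking for floats directly is the step that fails in the traceback above.
# Other pandas versions may behave differently, so capture the error
# rather than rely on it being raised.
cast_error = None
try:
    np.asanyarray(df, dtype=float)
except (TypeError, ValueError) as exc:
    cast_error = exc
print(repr(cast_error))
```

If `cast_error` is a `TypeError` mentioning `NAType`, the installation behaves like the one in the traceback.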
@mwaskom sorry I meant with the
Yeah TBH I'm not keen on the current solution of |
Oh, sorry, I somehow thought it worked straight as
|
I meant changing If I make that change in pandas, and also change c = np.asanyarray(args[0], dtype=float) in matplotlib/lib/matplotlib/axes/_axes.py Line 5622 in 3feaa5d
|
Shouldn't this be handled via the unit code? |
I don't think we pass any mappable data through a units machinery, despite some attempts/desire to do so. I guess if this was x or y data it could be handled via units, but I'm not aware of special units for pandas objects. |
oh, fair point, I missed that this was the color channel, not the x/y 🐑 There are handlers registered someplace (I think by pandas) to correctly handle their datetime types. I think it would make sense to also handle the nullable integers the same way (we treat them as a "unit" that needs casting to floats somehow). |
Agreed. OTOH asanyarray should work and all that is being suggested here is that we further cast to float, which makes sense to me. It may actually catch user type errors sooner in the pipeline. |
fair, but explicit type casts are a bit of a code smell to me (which may just be due to the persistent issues with the unit code, which is not currently used here). |
Hi @tacaswell - would love your thoughts on pandas-dev/pandas#48891 if possible. In the case of e.g.
Could that be acceptable? |
Well, I'd argue the code smell here (had to look that phrase up) is that pandas didn't push their NA concept upstream to numpy. From a practical point of view I was going to claim that we implicitly cast to float pretty early here. However, that is not true: we have BoundaryNorm, which maps to integers. So we have another example where things have developed too many paths to easily infer behaviour. In this case I think our best bet is, short term, to tell folks to explicitly cast their data to something that works. Long term, I think the idea of unit conversion for mappables may be OK, but maybe via passing a converter in explicitly and/or allowing norms to handle the unit conversion. The problem, as always, is whether the units are triggered via the container object or the type of elements in an array. |
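As an illustration of the short-term advice to cast explicitly, the following is a minimal sketch of what a user could do on their side before plotting. The helper name `to_float_array` is hypothetical and is not part of Matplotlib or pandas. It relies only on the `to_numpy(dtype=..., na_value=...)` method of pandas objects, and it assumes that turning missing values into NaN is acceptable for the plot.

```python
import numpy as np
import pandas as pd


def to_float_array(data):
    """Hypothetical user-side helper: return a float ndarray, mapping pd.NA to NaN."""
    if isinstance(data, (pd.DataFrame, pd.Series)):
        # na_value tells pandas what to put where the nullable dtype has pd.NA.
        return data.to_numpy(dtype=float, na_value=np.nan)
    # Anything else goes through the plain NumPy conversion.
    return np.asanyarray(data, dtype=float)


df = pd.DataFrame([[1, 2], [3, pd.NA]]).astype("Int64")
c = to_float_array(df)
print(c.dtype)
print(c)
```

The resulting array can then be handed to `plt.pcolormesh(c)` in place of the DataFrame. Note that the cast turns integers into floats, which is the trade-off raised earlier in the thread for integer-based norms such as BoundaryNorm.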
Do we have any other methods that can successfully plot this data type? (e.g. plot, scatter) |
Bug summary
When calling plt.pcolormesh with a dataframe containing either of the pandas nullable dtypes (Int64 or Float64) it crashes.
Code for reproduction
Actual outcome
Expected outcome
Additional information
Noticed via seaborn mwaskom/seaborn#3042.
Operating system
lubuntu
Matplotlib Version
3.5.3
Matplotlib Backend
module://matplotlib_inline.backend_inline
Python version
3.10.4
Jupyter version
6.4.12
Installation
pip