FIX: make _reshape_2D accept pandas df with string indices #18374

Merged (1 commit) on Aug 30, 2020

Member

@jklymak jklymak commented Aug 29, 2020

Closes #18371

In #17289 we changed _reshape_2D (used by violinplot and boxplot). It used to basically just call np.asanyarray(X), which works fine with pandas. However, because of the ragged-array deprecation, since #17289 it iterates over X using for x in X to get each column of the matrix individually.

That is incompatible with a pandas DataFrame like df = pd.DataFrame(np.random.randn(100, 3), columns=["a", "b", "c"]): iterating over a DataFrame yields its column labels rather than the data, which produces the error in #18371. This is a regression.
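To illustrate the failure mode, here is a minimal sketch, assuming pandas and NumPy are installed. The variable names are illustrative only and this is not the code in _reshape_2D. It shows what a for x in X loop sees when X is a DataFrame with string column names, compared with the array obtained by converting first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 3), columns=["a", "b", "c"])

# Iterating a DataFrame walks over the column labels, not the data,
# so a loop body expecting numeric arrays receives strings instead.
items = [x for x in df]

# Converting first gives a plain 2D ndarray; iterating it yields
# 1D float arrays (one per row of the array).
arr = df.to_numpy()
first_row = next(iter(arr))
```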

Here we try to extract the matrix from X using to_numpy() or values before doing the rest of the manipulations.
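A rough sketch of that idea follows. The helper name extract_matrix is hypothetical and the body is an assumption about the general approach, not the code merged in this PR: prefer to_numpy() when the object has it, fall back to values, and otherwise pass the input through untouched.

```python
import numpy as np
import pandas as pd


def extract_matrix(X):
    """Hypothetical helper: return the array behind X if it exposes one."""
    if hasattr(X, "to_numpy"):
        # pandas DataFrame and Series provide to_numpy()
        return X.to_numpy()
    if hasattr(X, "values"):
        # fallback for objects that expose the data as a .values attribute
        return X.values
    # plain lists, tuples and ndarrays are left for later processing
    return X


df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])
matrix = extract_matrix(df)
unchanged = extract_matrix([1, 2, 3])
```

Note that this naive version trusts whatever values happens to be, without checking that it is an array.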

Note this still doesn't do the "right thing" for the column names, at least for boxplot, but it fixes the regression. If the folks who use boxplot or violinplot want to do something fancier with the column names they can try to make that work in some reasonable way, but I think such fancy pandas handling really belongs in pandas, or perhaps in the structured data refactor.

@jklymak
Member Author

jklymak commented Aug 29, 2020

ping @mwaskom for your opinion.... Thanks!

@mwaskom

mwaskom commented Aug 29, 2020

Seems reasonable for handling pandas objects, but if someone passes in a dictionary (or anything else with a .values method), they might get a pretty confusing error.

I guess matplotlib doesn't in general support dict inputs so that's maybe not a risk, although boxplots are an obvious place to do so (each entry in the dict becomes a box over its values vector at the location of its key).

@jklymak
Member Author

jklymak commented Aug 29, 2020

Ok it's pretty easy to also check if values returns an array. I guess I'm just used to values from xarray.
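A minimal sketch of such a check, assuming only NumPy. The names extract_matrix_checked and ArrayLike are hypothetical and only serve the illustration. The point is that dict.values is a bound method rather than an array, so an isinstance guard lets a dict pass through unchanged instead of being replaced by a method object.

```python
import numpy as np


def extract_matrix_checked(X):
    """Hypothetical helper: only use .values when it really is an ndarray."""
    if hasattr(X, "to_numpy"):
        return X.to_numpy()
    values = getattr(X, "values", None)
    if isinstance(values, np.ndarray):
        return values
    # e.g. a dict, whose .values is a method: leave the input as it is
    return X


class ArrayLike:
    """Stand-in for an object that exposes an ndarray through .values."""

    def __init__(self, data):
        self.values = np.asarray(data)


d = {"a": [1, 2], "b": [3, 4]}
same = extract_matrix_checked(d)
arr = extract_matrix_checked(ArrayLike([[1, 2], [3, 4]]))
```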

@jklymak jklymak force-pushed the fix-reshape-pandas branch from 27a81eb to cf9bb88 Compare August 29, 2020 22:12
@timhoffm
Member

Anybody can merge after CI passes.

@jklymak jklymak removed the request for review from dstansby August 30, 2020 02:57
@dopplershift dopplershift merged commit d3eaffd into matplotlib:master Aug 30, 2020
meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Aug 30, 2020
dopplershift added a commit that referenced this pull request Aug 30, 2020
…374-on-v3.3.x

Backport PR #18374 on branch v3.3.x (FIX: make _reshape_2D accept pandas df with string indices)
@jklymak jklymak deleted the fix-reshape-pandas branch January 29, 2024 00:33
Successfully merging this pull request may close these issues.

Plotting a pandas DataFrame with string MultiIndex