FIX: make _reshape_2D accept pandas df with string indices #18374
Closes #18371
In #17289 we changed `_reshape_2D` (used by `violinplot` and `boxplot`). It used to basically just call `np.asanyarray(X)`, which works fine with pandas. However, there is a ragged array deprecation, so #17289 now iterates over the columns using `for x in X` to get each column of the matrix individually.

That is incompatible with a pandas DataFrame like `df = pd.DataFrame(np.random.randn(100, 3), columns=["a", "b", "c"])`, which raises the error reported in #18371. This is a regression.

Here we try to extract the matrix from `X` using `to_numpy()` or `values` before doing the rest of the manipulations.

Note this still doesn't do the "right thing" for the column names, at least with `boxplot`, but it fixes the regression. If the folks who use `boxplot` or `violinplot` want to do something fancier with the column names, they can try to make that work in some reasonable way, but I think such fancy pandas handling really belongs in pandas, or perhaps in the structured data refactor.
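
For illustration, here is a minimal sketch of the problem and of the extraction step. It is not the actual matplotlib code: `extract_matrix` is a hypothetical helper name used only for this example, and the real change in `_reshape_2D` may differ in detail.

```python
import numpy as np
import pandas as pd


def extract_matrix(X):
    """Hypothetical helper: unwrap pandas-like objects into a plain ndarray."""
    # Prefer to_numpy() when the object provides it (DataFrame, Series).
    if hasattr(X, "to_numpy"):
        return X.to_numpy()
    # Fall back to a .values attribute, but only if it really is an array
    # (dict.values, for example, is a method and must be left alone).
    if hasattr(X, "values") and isinstance(X.values, np.ndarray):
        return X.values
    return X


df = pd.DataFrame(np.random.randn(100, 3), columns=["a", "b", "c"])

# Iterating over a DataFrame yields the column labels, not the column data,
# so `for x in X` sees the strings "a", "b", "c".
labels = [x for x in df]

# After extraction the input is a plain 2D array, and each column can be
# pulled out individually without touching the string labels.
data = extract_matrix(df)
columns = [col for col in data.T]
```

With the extraction in place, the column labels are simply dropped, which matches the note above that the column names are not handled in any special way.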