cbook._reshape_2D flattens ndarray with 2 dims (rectangular ndarray) #8092
Comments
This is due to the fundamentally broken (IMO) semantics of boxplot, as given in the docstring:
The ragged input is a 1D object array where each entry is an array, so each row becomes a box plot. The rectangular input is a 2D float array, so it is each column that becomes a box plot. The options I can think of are:
|
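To make the two input kinds concrete, here is a minimal NumPy-only sketch. The variable names are illustrative, not taken from the issue. It builds a ragged 1D object array and a rectangular 2D float array, then splits each the way the docstring semantics described above would.

```python
import numpy as np

# Ragged input: a 1D object array whose entries are arrays of different lengths.
ragged = np.empty(2, dtype=object)
ragged[0] = np.array([1.0, 2.0, 3.0])
ragged[1] = np.array([4.0, 5.0])

# Rectangular input: an ordinary 2D float array (4 rows, 3 columns).
rect = np.arange(12, dtype=float).reshape(4, 3)

print(ragged.ndim, ragged.dtype, len(ragged))  # 1 object 2
print(rect.ndim, rect.dtype, rect.shape)       # 2 float64 (4, 3)

# Docstring semantics: one box per entry for the ragged case,
# one box per column for the rectangular case.
row_datasets = [np.asarray(entry) for entry in ragged]
col_datasets = [rect[:, j] for j in range(rect.shape[1])]
```

The asymmetry is that the outer axis indexes datasets in the ragged case, while the inner axis does in the rectangular case.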
attn @phobson |
I'm not sure that I'd go so far as to say that the semantics are fundamentally broken, but I can see what @anntzer is saying. With the third option above, I'm not sure what the point would be in raising an error and telling the user to convert it to a list, since that's what we're already doing. Seems like a warning would suffice. While we could add yet another boxplot option (e.g., My vote is that we keep the current behavior, improve the documentation around this, and maybe raise a warning. |
I probably don't use
in which case The third option could be implemented as a warning too, I don't have a strong opinion there. It makes sense for MATLAB's |
FYI, I consider the semantics fundamentally broken. |
After discussion with @efiring, we see two possibilities to help fix the semantics.
|
If we were designing the system from scratch (without regard for backcompat), I think the expected semantics would be that 2D arrays would have each individual row plotted as a boxplot (similarly to lists of lists). |
I understand that (though my opinion differs). My question is: once 2D-arrays are fully deprecated in boxplots, should they raise an error or get reshaped? |
I would raise an error. See e.g. #7785 for issues with implicit reshaping. |
An edge-ier case: list of 2D arrays also raises? |
Yes. What's the alternative? |
I can imagine a world where flattening each array individually makes sense, but I don't think we should cover that. |
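A rough sketch of the strict behavior discussed in the last few comments, assuming the documented semantics stay in place (2D array: one dataset per column; sequence: one dataset per entry) and that any entry with more than one dimension raises instead of being flattened. `to_datasets` is a hypothetical helper written for illustration; it is not Matplotlib's `_reshape_2D`.

```python
import numpy as np

def to_datasets(x):
    """Hypothetical helper (not Matplotlib code): split x into 1D datasets."""
    # Plain numeric arrays: 1D is a single dataset, 2D is one dataset per column.
    if isinstance(x, np.ndarray) and x.dtype != object:
        if x.ndim == 1:
            return [x]
        if x.ndim == 2:
            return [x[:, j] for j in range(x.shape[1])]
        raise ValueError("x must have 2 or fewer dimensions")

    # Sequences (lists, tuples, 1D object arrays): one dataset per entry.
    entries = list(x)
    if all(np.ndim(e) == 0 for e in entries):
        # A flat sequence of scalars is a single dataset.
        return [np.asarray(entries).reshape(-1)]
    datasets = []
    for entry in entries:
        arr = np.asarray(entry)
        if arr.ndim > 1:
            # Refuse to flatten implicitly, e.g. for a list of 2D arrays.
            raise ValueError("each dataset must be at most 1D")
        datasets.append(arr.reshape(-1))
    return datasets
```

With this sketch, `to_datasets(np.zeros((4, 3)))` yields three datasets, `to_datasets([[1, 2, 3], [4, 5]])` yields two, and a list of 2D arrays raises `ValueError`.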
This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help! |
Repinging @timhoffm on the above, though feel free to just say if you don't really care either way :) |
IMHO the documented behavior is the reasonable one.
In general, if you have a sequence of datasets [ds1, ..., dsN], you should get a box for each dataset in the sequence. There is one exception, though, for 2D arrays. Commonly, these hold "tabulated data" where each dataset is a column (in ML/data-science terminology, columns are features and rows are instances). This is also consistent with, e.g., pandas and what we use in So the distinguishing criteria here should be:
Edit: The correct way forward is to adapt the implementation. This will only affect the array-of-arraylike case where the container is a 1D-object array. There are two possible ways to do this:
We may have similar issues in other functions that accept multiple datasets. Off the top of my head, we should check |
Bug report
boxplot throws an error (below) when x is an ndarray with len(x.shape) == 2 (i.e., when x is rectangular).
ValueError: List of boxplot statistics and 'positions' values must have same the length
Code for reproduction
Matplotlib version
Possible cause
I believe the issue is in cbook._reshape_2D in the line below. I am not sure why that logic is in place but assume it is for good reason.
Got there by looking in:
reference _reshape_2D:
Current Workaround
Converting the ndarray to a list of lists. Needing this workaround doesn't really make sense, especially since plotting works fine with a ragged (non-rectangular) ndarray but fails with a rectangular one.
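A minimal sketch of the workaround. This is not the original reproduction code, and it assumes a Matplotlib release that follows the documented semantics. Note that a plain `tolist()` gives one box per row, so the array has to be transposed first if the goal is one box per column, as for the 2D array itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Illustrative rectangular data: 4 rows, 3 columns.
x = np.random.default_rng(0).normal(size=(4, 3))

fig, (ax0, ax1, ax2) = plt.subplots(1, 3)
by_column = ax0.boxplot(x)               # 2D array: one box per column
by_row = ax1.boxplot(x.tolist())         # list of lists: one box per row
by_col_list = ax2.boxplot(x.T.tolist())  # transposed list: one box per column

# Count the boxes drawn in each case.
n_boxes = (
    len(by_column["boxes"]),
    len(by_row["boxes"]),
    len(by_col_list["boxes"]),
)
plt.close(fig)
```

Under those assumptions, `n_boxes` is `(3, 4, 3)`.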