FIX: Fix shape of hist output when input is multidimensional empty list #13368

Merged
timhoffm merged 1 commit into matplotlib:master from empty_hist_with_colors
Feb 15, 2019

Conversation

hershen
Contributor

@hershen hershen commented Feb 5, 2019

PR Summary

Fixes #13002.

Currently plt.hist([np.array([])]) returns a single array of zeroes for the histogram values (n in the documentation).
There is some pre-processing that converts any input for which np.size(input) == 0 into [np.array([])].

In #13002 the code plt.hist([[], []], color=["k", "r"]) produces an error because the input of [[],[]] is pre-processed into [np.array([])] and its length is no longer equal to the length of color.

The fact that an input of [[],[]] is pre-processed in this way means that its output is a single array of bin values. This seems to contradict the documentation for n:

If input is a sequence of arrays [data1, data2,..], then this is a list of arrays with the values of the histograms for each of the arrays in the same order.

This PR modifies the treatment of multiple empty lists as input to follow the documentation. If the input contains multiple sets of data (even if they're empty), the output will contain the same number of histogram value sets. This also solves #13002.
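
The difference between the old and new pre-processing can be sketched in pure Python. This is only an illustrative model, not the matplotlib source: `count_elements`, `split_datasets`, `preprocess_old`, and `preprocess_new` are hypothetical names, with `count_elements` standing in for `np.size` and `split_datasets` standing in for the reshaping step.

```python
def count_elements(x):
    """Count scalar values in a possibly nested list (stand-in for np.size)."""
    if isinstance(x, (list, tuple)):
        return sum(count_elements(item) for item in x)
    return 1


def split_datasets(x):
    """Hypothetical stand-in for the reshaping step: one list per dataset."""
    if all(isinstance(item, (list, tuple)) for item in x):
        return [list(item) for item in x]  # sequence of datasets
    return [list(x)]  # flat sequence of values is a single dataset


def preprocess_old(x):
    # Old check: any input with zero total elements collapses to ONE empty dataset.
    if count_elements(x) == 0:
        return [[]]
    return split_datasets(x)


def preprocess_new(x):
    # New check: only a completely empty top-level input collapses.
    if count_elements(x) == 0 and len(x) == 0:
        return [[]]
    return split_datasets(x)


colors = ["k", "r"]
for data in ([], [[]], [[], []]):
    old_n, new_n = len(preprocess_old(data)), len(preprocess_new(data))
    print(data, "old datasets:", old_n, "new datasets:", new_n)

# With two colors, only the new pre-processing yields a matching dataset count.
print(len(preprocess_old([[], []])) == len(colors))  # False
print(len(preprocess_new([[], []])) == len(colors))  # True
```

Under this model, `[[], []]` keeps its two datasets after the change, so a two-element `color` list no longer mismatches.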

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak
Member

jklymak commented Feb 6, 2019

I'm not 100% following this PR. Why is the new behaviour better than the old? Does it really fix #13002? The test doesn't directly test plt.hist([[],[]], color=['k', 'r']). But more to the point, why are we supporting empty lists to hist?

@@ -6573,7 +6573,7 @@ def hist(self, x, bins=None, range=None, density=None, weights=None,
         # basic input validation
         input_empty = np.size(x) == 0
         # Massage 'x' for processing.
-        if input_empty:
+        if input_empty and len(x) == 0:
Contributor

Or more simply, this entire if...else block can be deleted and replaced by x = cbook._reshape_2D(x, 'x') which handles empty inputs just fine.

Contributor Author

That'll be much cleaner.
But it seems _reshape_2D currently doesn't work with an empty list []:

>>> mpl.cbook._reshape_2D([], 'x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hershen/matplotlib/lib/matplotlib/cbook/__init__.py", line 1418, in _reshape_2D
    if X.ndim == 1 and not isinstance(X[0], collections.abc.Iterable):
IndexError: index 0 is out of bounds for axis 0 with size 0

Is it reasonable to modify it so that it returns [[]] in such a case?

Contributor

Thanks for noticing that, that's actually a regression due to #11921 that I've re-reported in #13392, which will need to get fixed. I think best would be for this PR to wait for #13392, but I'm not going to hold it up on that.

Contributor Author

Well, I noticed it only because of your suggestion ;)
I don't mind waiting for #13392.

Member

It's fixed now.

@anntzer
Contributor

anntzer commented Feb 7, 2019

I think the test is actually fine (it checks that we return a number of bar collections matching the number of inputs).

Supporting empty inputs to hist() is like supporting empty inputs to plot(): "why wouldn't we?"

@hershen
Contributor Author

hershen commented Feb 7, 2019

@jklymak, I added context to the PR description.
The new behavior more closely follows the documentation for the output n in hist.
It fixes the issue exposed by #13002 and the code in #13002 does not produce an error message anymore.

It's true that the test doesn't directly test that code. Should I add a test with that exact code?

Currently empty list(s) to hist are supported and produce output (except in cases like #13002). My expectation would be that if output is produced, its shape will be the same shape as the input (2 sets of input data produce 2 sets of output histogram values, 3 produce 3, etc.). In your opinion, what should happen for inputs of [], [[]], [[],[]]?

@jklymak
Member

jklymak commented Feb 7, 2019

Currently empty list(s) to hist are supported and produce output (except in cases like #13002). My expectation would be that if output is produced, its shape will be the same shape as the input (2 sets of input data produce 2 sets of output histogram values, 3 produce 3, etc.). In your opinion, what should happen for inputs of [], [[]], [[],[]]?

What happens now if you do plt.hist([[], []])? Well, OK, I checked, and it's not what this PR proposes:

a, _, _ = plt.hist([[], []]) 
print(a)

yields

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

so this is an API change. Is it a good one? I guess so? At the very least, it needs an API note.

@hershen force-pushed the empty_hist_with_colors branch from 7b81ca5 to 5c9f024 on February 7, 2019 21:50
@hershen
Contributor Author

hershen commented Feb 7, 2019

Right. With this PR, the output is:

[array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])]

Added an API change entry.
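
A minimal check of the behaviour described above, assuming a matplotlib version that includes this change (the `Agg` backend is chosen here only so the sketch runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt
import numpy as np

# Two empty datasets with two colors: the case reported in #13002.
n, bins, patches = plt.hist([[], []], color=["k", "r"])

# n may be a list of arrays or a 2-D array depending on the version,
# so normalise it before inspecting the number of histograms.
n = np.asarray(n)
print("histogram value sets:", n.shape[0])
print("bar containers:", len(patches))
```

The number of histogram value sets and bar containers should both equal the number of input datasets.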

Member

@jklymak jklymak left a comment

Thanks for humouring me @hershen

@hershen
Contributor Author

hershen commented Feb 8, 2019

No worries @jklymak!
Sorry for the initial lack of explanations.

Contributor

@anntzer anntzer left a comment

Anyone can merge post-CI.

@hershen force-pushed the empty_hist_with_colors branch from d99ed09 to 45f29b7 on February 13, 2019 22:39
@timhoffm timhoffm merged commit 09b2b0d into matplotlib:master Feb 15, 2019
Successfully merging this pull request may close these issues.

Hist color kwarg broken for multiple empty datasets
5 participants