Behavior of hist() with normed=True changes from v2.0 to v2.1 #9557
Comments
That seems very very bad. If you do it 'by hand' which one is correct? |
I think |
The 2.0.2 result matches what I would compute by hand:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# assumes t is the data array from the issue's test script (not shown here)
# compute normalized heights by hand
heights, bins = np.histogram(t, bins=[-5, -3, -2, -1, -0.5, 0, 2, 4, 5, 10])
bin_widths = bins[1:] - bins[:-1]
normed_heights = heights / bin_widths / heights.sum()
bin_centers = 0.5 * (bins[1:] + bins[:-1])
# compare to normed hist output
plt.hist(t, bins=[-5, -3, -2, -1, -0.5, 0, 2, 4, 5, 10], normed=True)
plt.plot(bin_centers, normed_heights, 'ok')
plt.title(f'matplotlib v{matplotlib.__version__}')
|
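The by-hand formula above can be checked without any plotting. The following is a minimal sketch, assuming made-up sample data (the thread's original array `t` is not available) and the same irregular bin edges; it compares the by-hand heights with `np.histogram(..., density=True)` and confirms that the bars integrate to one.

```python
import numpy as np

# Hypothetical sample data: the array used in the thread is not reproduced here.
rng = np.random.RandomState(0)
t = rng.randn(300)

# Irregular bin edges taken from the comment above.
bins = np.array([-5, -3, -2, -1, -0.5, 0, 2, 4, 5, 10])

counts, edges = np.histogram(t, bins=bins)
widths = np.diff(edges)

# Density by hand: counts per unit width, divided by the total count.
by_hand = counts / widths / counts.sum()

# Density as computed by numpy.
from_numpy, _ = np.histogram(t, bins=bins, density=True)

# Total area under the bars (height times width, summed over bins).
area = (by_hand * widths).sum()

print(np.allclose(by_hand, from_numpy))
print(area)
```

Under this definition each height depends on its own bin width, which is why irregular bins are the case that exposes any difference between normalizations.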
Note that the numpy |
This seems to be the critical difference:
>>> print(np.__version__)
1.13.1
>>> print(np.histogram(t, bins, normed=True)[0])
[ 0. 0.00482315 0.03215434 0.0192926 0.03215434 0.1607717
0.1318328 0.02250804 0.0659164 ]
>>> print(np.histogram(t, bins, density=True)[0])
[ 0. 0.01027397 0.06849315 0.08219178 0.1369863 0.17123288
0.14041096 0.04794521 0.02808219]
In mpl 2.1, the I'm honestly not certain what the numpy |
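The two printed arrays can be related to each other with a little arithmetic. The sketch below is an inference from the numbers in this thread, not a statement about numpy's source: it assumes the `normed=True` heights are the raw counts divided by `(counts * widths).sum()`, and that `bins` is the same irregular list used in the earlier comment. Under those assumptions the printed `normed=True` values can be rebuilt from the printed `density=True` values.

```python
import numpy as np

# Bin edges from the earlier comment (assumed to be the `bins` used here).
bins = np.array([-5, -3, -2, -1, -0.5, 0, 2, 4, 5, 10])
widths = np.diff(bins)

# Values printed in the thread for numpy 1.13.1.
density_printed = np.array([0., 0.01027397, 0.06849315, 0.08219178, 0.1369863,
                            0.17123288, 0.14041096, 0.04794521, 0.02808219])
normed_printed = np.array([0., 0.00482315, 0.03215434, 0.0192926, 0.03215434,
                           0.1607717, 0.1318328, 0.02250804, 0.0659164])

# Fraction of samples in each bin, recovered from the density heights.
fractions = density_printed * widths

# Assumed formula for the old normed=True output: heights proportional to
# the counts themselves, rescaled so the total bar area is one.
normed_guess = fractions / (fractions * widths).sum()

print(np.allclose(normed_guess, normed_printed, rtol=1e-5))
```

If the assumption holds, both outputs have unit area, but the `normed=True` heights follow the counts rather than the counts per unit width, so wide bins look too tall and narrow bins too short.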
https://github.com/numpy/numpy/blob/v1.13.0/numpy/lib/function_base.py#L432-L826 It seems to do what you are doing.... |
@jklymak – thanks, you're right. It looks like matplotlib 2.0 |
My error, it is Ours, you are supposed to use |
Here is a concise test case that passes in matplotlib 2.0, but not in 2.1 (tested with numpy 1.13.1):
import numpy as np
import matplotlib.pyplot as plt
from numpy.testing import assert_allclose

def test_hist_normed():
    rng = np.random.RandomState(57483)
    t = rng.randn(100)
    bins = [-3, -1, -0.5, 0, 1, 5]
    mpl_heights, _, _ = plt.hist(t, bins=bins, normed=True)
    np_heights, _ = np.histogram(t, bins=bins, density=True)
    assert_allclose(mpl_heights, np_heights)
|
What a mess:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# assumed: the same generator as in the test case above
rng = np.random.RandomState(57483)
t = np.concatenate([rng.randn(100),
                    2 + 0.1 * rng.randn(100),
                    5 + 3 * rng.randn(100)])
# compute normalized heights by hand
bins0 = [-5, -3, -2.2, -1, -0.5, 0, 2, 4.2, 5.6, 10]
heights, bins = np.histogram(t, bins=bins0)
bin_widths = bins[1:] - bins[:-1]
normed_heights = heights / bin_widths / heights.sum()
bin_centers = 0.5 * (bins[1:] + bins[:-1])
# compare to density hist output
hn, hbins = np.histogram(t, bins=bins0, density=True)
# compare to normed hist output
hn0, hbins0 = np.histogram(t, bins=bins0, normed=True)
# compare to mpl....
hn2, hbins2, patches = plt.hist(t, bins=bins0, density=True, label='MPL plot')
plt.plot(bin_centers, normed_heights, 'ok', label='by hand')
plt.plot(0.5 * (hbins[1:] + hbins[:-1]), hn, 'or', ms=3., label='np: density=True')
plt.plot(0.5 * (hbins0[1:] + hbins0[:-1]), hn0, 'oc', ms=6., label='np: normed=True')
plt.title(f'matplotlib v{matplotlib.__version__}')
plt.legend()
|
Here's the issue – in 2.1, the normalization is applied twice: once by matplotlib, and once by numpy: https://github.com/matplotlib/matplotlib/blob/v2.1.x/lib/matplotlib/axes/_axes.py#L6201-L6224 |
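The effect of normalizing twice can be illustrated with numpy alone. This is a sketch under stated assumptions: the sample data are made up, `normalize` and `once_and_twice` are hypothetical helpers, and the second pass is assumed to use the same `heights / widths / heights.sum()` formula as the first. It shows that the second pass is a no-op for evenly spaced bins but changes the heights for uneven ones.

```python
import numpy as np

def normalize(heights, widths):
    """Density normalization: heights per unit width, divided by the total."""
    return heights / widths / heights.sum()

def once_and_twice(data, bins):
    """Return the heights after one and after two normalization passes."""
    counts, edges = np.histogram(data, bins=bins)
    widths = np.diff(edges)
    once = normalize(counts, widths)
    twice = normalize(once, widths)  # the accidental second pass
    return once, twice

# Hypothetical sample data, not the array used in the thread.
rng = np.random.RandomState(0)
t = rng.randn(300)

uneven_bins = np.array([-5, -3, -1, -0.5, 0, 1, 5.0])
even_bins = np.linspace(-5, 5, 11)

once_u, twice_u = once_and_twice(t, uneven_bins)
once_e, twice_e = once_and_twice(t, even_bins)

print(np.allclose(once_u, twice_u))
print(np.allclose(once_e, twice_e))
```

With equal widths the first-pass heights sum to one over the width, so dividing by width and by that sum cancels out; with unequal widths no such cancellation happens.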
The problematic lines seem to be removed already from master, though they are still in the 2.1.x branch. |
Ha ha. Embarrassingly, I even commented on it: #9121. Even more embarrassingly, I hadn't updated that branch of master yet. Someone should still probably talk to numpy about their "normed" kw! |
Ah sorry. I thought they were deprecating density. |
I could open a PR with that test case – I suspect it will pass on master, and we could then back-port appropriate changes to 2.1.x |
It’d be great to have a test with unequal bins. |
Did we backport whatever fixed this on master to 2.1.x? Re-opening to make sure that does not get lost (sorry if I am stepping on anyone's toes!). |
#9121. Not sure if it was backported! |
Huh, just thought I had removed old code, didn't realise that I'd accidentally fixed anything with that PR! |
Actually, #9586 hasn't been merged yet but the button is green if someone wants to do it 😄 |
Merged. 🎉 |
@mshonichev – this bug would only change things if you were using unevenly-spaced bins; I don't think it's related to the problem you're having. |
Hmm, that might be exactly my case: the source data has only 144 points and they are not evenly distributed. Is there any workaround other than downgrading? |
Can you open a new issue with a minimal (no extra calls) self contained example (no csv file)? But the normed kwarg is deprecated and I don’t know what passing zero in does. |
The relevant piece is the spacing of the bins, not the spacing of the data points. Since you use I would open a new matplotlib issue to ask about this bug, but try to put together an example that others can run: the one above relies on reading a data file that is unavailable to anyone else – see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports |
Found in the context of astropy/astropy#6786
When hist() is passed irregular bins with normed=True, the output is different between matplotlib 2.0 and 2.1. Here is a test script to reproduce the issue:
[Figure: hist-2.0.2.png]
[Figure: hist-2.1.0.png]