[Bug]: Gaps and overlapping areas between bins when using float16 #22622
Comments
To be checked: Can the same effect occur when using (numpy) int arrays?
Just a note that you may want to try using "stairs" here instead, which won't draw the bars all the way down to zero and helps avoid those artifacts.
I am not sure, but it seems like possibly a problem in NumPy.
It looks like the diff is not really what is expected.
It is possible to trigger it with quite high probability using three bins, so that may be an easier case to debug (the second and third bars overlap). The bin edges and the diff seem to be the same whether or not the overlap occurs.
There is an overlap in the plot data itself (so it is not caused by the actual plotting; possibly something rounds the wrong way):
The second bar ends at 6.66992188e-01 and the third bar starts at 0.66601562, so the two bars overlap.
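To put the size of that overlap in perspective, here is a minimal sketch that expresses it in units of float16 spacing. It assumes the two numbers quoted above are rounded printouts of float16 values, so converting them back to `np.float16` recovers what was stored; the variable names are illustrative only.

```python
import numpy as np

# Values copied from the comment above. Printed output is rounded, so
# converting back to float16 is assumed to recover the stored values.
end_of_second = np.float16(6.66992188e-01)
start_of_third = np.float16(0.66601562)

# Distance to the next representable float16 value in this range (one ulp).
ulp = np.spacing(start_of_third)

# Do the subtraction in float64 so that it is exact.
overlap = float(end_of_second) - float(start_of_third)
overlap_in_ulps = overlap / float(ulp)

print(f"float16 spacing near 0.666: {float(ulp)}")
print(f"overlap: {overlap} ({overlap_in_ulps} ulps)")
```

The overlap is only a couple of float16 steps wide, but one float16 step in this range is already about 0.0005 in data units, which is large enough to be visible when bars are narrow.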
A possibly easy way to solve this is to provide a keyword argument to matplotlib/lib/matplotlib/axes/_axes.py Line 2382 in 8b1881f
Something like np.diff(np.cumsum(x) - width/2) may work, but should then only be conditionally executed if the keyword argument is set.
(Then again, I am not sure to what extent np.diff and np.cumsum are 100% numerically invariant; that is not trivial under floating-point arithmetic. But this will probably reduce the probability of errors anyway.)
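The rounding effect discussed here can be looked at without any plotting. The following is a minimal sketch, assuming that each bar's left edge is derived as center minus half the width; `bar_edges` is a hypothetical helper written for illustration and is not matplotlib's code. It compares the shared edges of neighboring bars when the arithmetic is done in float16 versus float64.

```python
import numpy as np


def bar_edges(bin_edges, dtype):
    """Return (left, right) bar edges computed entirely in `dtype`.

    Hypothetical helper: it mimics a center/width bar layout for
    illustration and is not taken from matplotlib.
    """
    edges = np.asarray(bin_edges, dtype=dtype)
    width = np.diff(edges)
    center = edges[:-1] + width / 2
    left = center - width / 2   # may round differently from edges[:-1]
    right = left + width
    return left, right


# 100 bins on [0, 1], with the edges stored in float16.
edges16 = np.linspace(0, 1, 101).astype(np.float16)

for dtype in (np.float16, np.float64):
    left, right = bar_edges(edges16, dtype)
    # A shared edge "disagrees" when one bar's right edge is not exactly
    # the next bar's left edge, i.e. there is a gap or an overlap.
    mismatched = np.count_nonzero(right[:-1] != left[1:])
    print(f"{np.dtype(dtype).name}: {mismatched} of {len(left) - 1} shared edges disagree")
```

With the same float16 edges as input, doing the intermediate arithmetic in float64 leaves every shared edge consistent, which is in line with the suggestion to use wider floats for the bin geometry.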
Yes and no. As the int array will become a float64 after multiplying with a float (dr in the code), it is quite unlikely to happen. However, it is not theoretically impossible to obtain the same effect with float64, although it is not very likely that it will actually be seen in a plot (the accumulated numerical error would have to correspond to something close to half(?) a pixel). But I am quite sure that one can trigger this by trying.
If you force the bins to be float64, then you won't have this problem:
import numpy as np
import matplotlib.pyplot as plt
values = np.clip(np.random.normal(0.5, 0.3, size=1000), 0, 1).astype(np.float16)
n, bins = np.histogram(values, bins=100)
n, bins, patches = plt.hist(values, bins=np.array(bins, dtype='float64'), alpha=0.5)
plt.show()
so I think the reasonable fix here is simply for matplotlib to coerce the output from
Bug summary
When creating a histogram out of float16 data, the bins are also calculated in float16. The lower precision can cause two errors: gaps between bins and overlapping areas between bins.
Code for reproduction
Actual outcome
Expected outcome
Created by
plt.hist(values.astype(np.float32), bins=100, alpha=0.5)
plt.show()
Additional information
Possible solution
Calculate the bins in float32:
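A minimal sketch of what this could look like on the user side, assuming 100 equal-width bins spanning the data range. The names `edges32`, `lo`, and `hi` are illustrative, and this is one possible approach rather than the change proposed for matplotlib itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# float16 data, similar to the report; the stored array stays float16.
values = np.clip(rng.normal(0.5, 0.3, size=1000), 0, 1).astype(np.float16)

# Build the bin edges in float32. Converting the endpoints to Python
# floats first keeps np.linspace from doing its arithmetic in float16.
lo, hi = float(values.min()), float(values.max())
edges32 = np.linspace(lo, hi, 101, dtype=np.float32)

# Passing explicit edges means the histogram uses them as given.
counts, edges = np.histogram(values, bins=edges32)
print(edges.dtype, counts.sum())
```

The resulting `edges` array can then be passed as `bins` to `plt.hist`, so that only the 101 edges are held in the wider type.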
Theoretically possible, but unwanted solution
Convert the data into float32 before calculating the histogram. This approach does not make a lot of sense, as float16 is mostly used because of memory limitations (arrays with billions of values can easily take several gigabytes).
Operating system
Windows 10
Matplotlib Version
3.4.3
Matplotlib Backend
TkAgg
Python version
3.7.1
Jupyter version
No response
Installation
pip