[Bug]: Gaps and overlapping areas between bins when using float16 #22622

G-nn-r · 2022-03-08T10:34:29Z

Bug summary

When creating a histogram out of float16 data, the bins are also calculated in float16. The lower precision can cause two errors:

Gaps between certain bins.
Two neighboring bins overlap each other (only visible when alpha < 1)

Code for reproduction

import numpy as np
import matplotlib.pyplot as plt
values = np.clip(np.random.normal(0.5, 0.3, size=1000), 0, 1).astype(np.float16)
plt.hist(values, bins=100, alpha=0.5)
plt.show()

Actual outcome

Expected outcome

Created by:

plt.hist(values.astype(np.float32), bins=100, alpha=0.5)
plt.show()

Additional information

Possible solution
Calculate the bins in float32:

Determine minimal and maximal value in float16.
Convert min and max to float32.
Calculate the bin edges.
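The steps above can be sketched as follows. This is only an illustration of the proposal, not Matplotlib's implementation; the names `lo`, `hi` and `edges` are made up for the example, and the seeded generator is used only to make the run repeatable.

```python
import numpy as np

# Reproducible float16 sample, similar to the reproduction code above.
rng = np.random.default_rng(0)
values = np.clip(rng.normal(0.5, 0.3, size=1000), 0, 1).astype(np.float16)

# Step 1: min and max are found on the float16 data (no copy of the data).
# Step 2: only these two scalars are widened to float32.
lo = np.float32(values.min())
hi = np.float32(values.max())

# Step 3: the bin edges are computed in float32.
edges = np.linspace(lo, hi, 101, dtype=np.float32)

# The data stays float16; only the 101 edges use the wider type.
counts, edges_out = np.histogram(values, bins=edges)
print(edges_out.dtype, counts.sum())
```

Only the edge array is widened, so the extra memory is independent of the size of `values`.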

Theoretically possible, but unwanted solution
Convert data into float32 before calculating the histogram. This behavior does not make a lot of sense, as float16 is mostly used because of memory limitations (arrays with billions of values can easily take several gigabytes).

Operating system

Windows 10

Matplotlib Version

3.4.3

Matplotlib Backend

TkAgg

Python version

3.7.1

Jupyter version

No response

Installation

pip


timhoffm · 2022-03-08T15:47:38Z

To be checked: Can the same effect occur when using (numpy) int arrays?

greglucas · 2022-03-08T15:48:03Z

Just a note that np.histogram(float16) returns float16 edges.

You may want to try using "stairs" here instead, which won't draw the bars all the way down to zero and should help avoid those artifacts.
plt.stairs(*np.histogram(values, bins=100), fill=True, alpha=0.5)

oscargus · 2022-03-08T16:13:46Z

I am not sure, but it seems like possibly a problem in NumPy.

In [9]: cnt, bins = np.histogram(values, 100)

In [10]: bins
Out[10]: 
array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
       0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
       0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
       0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
       0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
       0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
       0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
       0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
       0.99, 1.  ], dtype=float16)

In [11]: np.diff(bins)
Out[11]: 
array([0.01    , 0.01    , 0.009995, 0.01001 , 0.00998 , 0.01001 ,
       0.01001 , 0.01001 , 0.01001 , 0.00995 , 0.01001 , 0.01001 ,
       0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 ,
       0.00989 , 0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 ,
       0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 ,
       0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 ,
       0.01001 , 0.009766, 0.01001 , 0.01001 , 0.01001 , 0.01001 ,
       0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 , 0.01001 ,
       0.01001 , 0.01001 , 0.009766, 0.010254, 0.009766, 0.010254,
       0.009766, 0.010254, 0.009766, 0.010254, 0.009766, 0.010254,
       0.009766, 0.010254, 0.009766, 0.010254, 0.009766, 0.010254,
       0.009766, 0.010254, 0.009766, 0.010254, 0.009766, 0.010254,
       0.009766, 0.010254, 0.009766, 0.009766, 0.010254, 0.009766,
       0.010254, 0.009766, 0.010254, 0.009766, 0.010254, 0.009766,
       0.010254, 0.009766, 0.010254, 0.009766, 0.010254, 0.009766,
       0.010254, 0.009766, 0.010254, 0.009766, 0.010254, 0.009766,
       0.010254, 0.009766, 0.010254, 0.009766], dtype=float16)

It looks like the diff is not really what is expected.
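A short sketch of why the widths alternate, assuming the edges behave like 101 evenly spaced values rounded to float16 (the arrays below are built by hand for illustration, not taken from `np.histogram`). In the range [0.5, 1) adjacent float16 values are 2**-11 apart, and 0.01 is about 20.48 of those steps, so each rounded bin has to be either 20 or 21 steps wide.

```python
import numpy as np

# Ideal edges in float64, then rounded to float16.
edges64 = np.linspace(0.0, 1.0, 101)
edges16 = edges64.astype(np.float16)

# Spacing of float16 values in [0.5, 1): half of the machine epsilon at 1.0.
ulp_upper = float(np.finfo(np.float16).eps) / 2

# Bin widths in the upper half, measured exactly in float64.
widths = np.diff(edges16[50:].astype(np.float64))

# Express each width as a whole number of float16 steps.
steps = np.unique(np.round(widths / ulp_upper).astype(int))
print(ulp_upper, steps, steps * ulp_upper)
```

The two step counts correspond to the two widths 0.009766 and 0.010254 seen in the second half of the diff output.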

oscargus · 2022-03-08T16:22:22Z

~~I am actually a bit doubtful if the bins are really float16 here though.~~ I guess they are, since it is float16, not bfloat16.

oscargus · 2022-03-08T16:30:38Z

It is possible to trigger it with quite high probability using three bins, so that may be an easier case to debug (second and third bar overlap). Bin edges and diff seem to be the same independent of overlap or not.

In [44]: bins
Out[44]: array([0.    , 0.3333, 0.6665, 1.    ], dtype=float16)

In [45]: np.diff(bins)
Out[45]: array([0.3333, 0.3333, 0.3335], dtype=float16)

oscargus · 2022-03-08T16:52:55Z

There is an overlap in the plot data (so it is not caused by the actual plotting, possibly rounding the wrong way):

In [98]: bc.patches[1].get_corners()
Out[98]: 
array([[3.33251953e-01, 0.00000000e+00],
       [6.66992188e-01, 0.00000000e+00],
       [6.66992188e-01, 4.05000000e+02],
       [3.33251953e-01, 4.05000000e+02]])

In [99]: bc.patches[2].get_corners()
Out[99]: 
array([[  0.66601562,   0.        ],
       [  0.99951172,   0.        ],
       [  0.99951172, 314.        ],
       [  0.66601562, 314.        ]])

As the second bar ends at 6.66992188e-01 and the third bar starts at 0.66601562, this will happen.

oscargus · 2022-04-03T10:06:00Z

A possibly easy way to solve this is to provide a keyword argument to bar/barh that makes sure that the bars are always adjacent, i.e., let bar/barh know that the next bar should have the same starting point as the previous bar's end point. That keyword argument can then be set by hist in case of an rwidth of 1.
This is probably the line causing the error:

matplotlib/lib/matplotlib/axes/_axes.py

Line 2382 in 8b1881f

left = x - width / 2

Something like np.diff(np.cumsum(x) - width/2) may work, but should then only be conditionally executed if the keyword argument is set.

(Then, I am not sure to what extent np.diff and np.cumsum are 100% numerically invariant, it is not trivial under floating-point arithmetic. But probably this will reduce the probability of errors anyway.)
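A minimal sketch of the suspected round trip, using the three-bin edges shown earlier in this thread. It assumes that hist hands bar the bin centers (left edge plus half the width) and that bar then recovers the left side with `left = x - width / 2`, all in float16; the variable names are illustrative and this is not a copy of the Matplotlib code.

```python
import numpy as np

# Three-bin edges as printed in the thread, stored as float16.
edges = np.array([0.0, 0.3333, 0.6665, 1.0], dtype=np.float16)

# What hist is assumed to pass on: widths and bar centers, still float16.
width = np.diff(edges)
centers = edges[:-1] + 0.5 * width

# What bar is assumed to do with them: go back from center to left edge.
left = centers - width / 2
right = left + width

# Positive entries mean a bar extends past the start of its neighbor,
# negative entries mean a gap between neighbors.
overlap = right[:-1].astype(np.float64) - left[1:].astype(np.float64)
print(left, right, overlap)
```

Each addition and subtraction is rounded to float16 separately, so the recovered `left` does not have to match the original edge, and the end of one bar does not have to match the start of the next.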

oscargus · 2022-04-03T14:46:15Z

> To be checked: Can the same effect occur when using (numpy) int arrays?

Yes and no. As the int array will become a float64 after multiplying with a float (dr in the code), it is quite unlikely to happen. However, it is not theoretically impossible to obtain the same effect with float64, although not very likely that it will actually be seen in a plot (the accumulated numerical error should correspond to something close to half(?) a pixel). But I am quite sure that one can trigger this by trying.

jklymak · 2022-04-04T07:07:24Z

If you force the bins to be float64, then you won't have this problem:

import numpy as np
import matplotlib.pyplot as plt
values = np.clip(np.random.normal(0.5, 0.3, size=1000), 0, 1).astype(np.float16)
n, bins = np.histogram(values, bins=100)
n, bins, patches = plt.hist(values, bins=np.array(bins, dtype='float64'), alpha=0.5)

plt.show()

so I think the reasonable fix here is simply for matplotlib to coerce the output from np.histogram to be floats - the output is turned to float64 when rendered anyways, and the extra memory for any visible number of bins is not going to matter.
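A NumPy-only sketch of why the coercion helps, under the same assumption as elsewhere in this thread that the bars are built from centers and then shifted back by half a width. `bar_extents` is a hypothetical helper written for this example and is not a Matplotlib function.

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.clip(rng.normal(0.5, 0.3, size=1000), 0, 1).astype(np.float16)

# astype is a no-op if NumPy already returns float16 edges, as reported above.
counts, edges = np.histogram(values, bins=100)
edges16 = edges.astype(np.float16)
# The proposed coercion: same edge values, stored as float64.
edges64 = np.asarray(edges16, dtype=np.float64)

def bar_extents(e):
    """Hypothetical helper: left/right sides after a center round trip."""
    width = np.diff(e)
    centers = e[:-1] + 0.5 * width
    left = centers - width / 2
    return left, left + width

left16, right16 = bar_extents(edges16)
left64, right64 = bar_extents(edges64)

# Count neighboring bars whose shared side no longer matches.
mismatch16 = int(np.count_nonzero(right16[:-1] != left16[1:]))
mismatch64 = int(np.count_nonzero(right64[:-1] != left64[1:]))
print(mismatch16, mismatch64)
```

With float64 edges the round trip is exact for these values, because float16 numbers and their half-width sums fit comfortably in a float64 significand, so neighboring bars share their sides again.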

oscargus added the status: confirmed bug label Mar 8, 2022

oscargus mentioned this issue Apr 3, 2022

Refactor hist for less numerical errors #22773

Closed

6 tasks

oscargus mentioned this issue May 14, 2022

Fix issue with hist and float16 data #23047

Merged

2 tasks

jklymak closed this as completed in #23047 May 16, 2022

QuLogic added this to the v3.6.0 milestone May 16, 2022
