
BUG: zero-width histogram bins if the data values are in a small range close to numeric precision #27142


Closed
timhoffm opened this issue Aug 8, 2024 · 3 comments · Fixed by #27148

@timhoffm
Contributor

timhoffm commented Aug 8, 2024

Describe the issue:

np.histogram can produce zero-width bins if the data values are in a small range close to numeric precision. Noted in matplotlib/matplotlib#28685

I'm aware that there is no reasonable representation here, and that computing one this close to numeric precision is not possible. But would it be an option to check the bin widths and raise an error instead of returning a nonsensical histogram binning?

Reproduce the code example:

In [1]: import numpy as np

In [2]: a = np.array([1, 1+2e-16] * 10)

In [3]: counts, lims = np.histogram(a, bins=10)

In [4]: counts
Out[4]: array([ 0, 10,  0,  0,  0,  0,  0,  0,  0, 10])

In [5]: lims
Out[5]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [6]: np.diff(lims)
Out[6]: 
array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 2.22044605e-16, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00])
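
To see why ten distinct edges cannot exist for this input, it may help to count how many representable float64 values lie between the smallest and largest data value. The following is a minimal sketch, not part of the original report; the variable names are illustrative, and the edges are built with `np.linspace` rather than `np.histogram` so the snippet runs the same way regardless of NumPy version.

```python
import numpy as np

lo = 1.0
hi = 1 + 2e-16  # rounds to the next float64 after 1.0

# Distance from 1.0 to the next representable float64.
gap = np.spacing(lo)

# Number of representable steps between the data minimum and maximum.
steps_available = (hi - lo) / gap

# Eleven evenly spaced edges are requested, but only two distinct
# float64 values exist in the closed interval [lo, hi].
distinct_edges = np.unique(np.linspace(lo, hi, 11))

print("spacing at 1.0:", gap)
print("representable steps in the data range:", steps_available)
print("distinct edge values:", distinct_edges.size)
```

With only one representable step in the range, at most one of the ten requested bins can have a nonzero width, which matches the single nonzero entry in `np.diff(lims)` above.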

Error message:

No response

Python and NumPy Versions:

numpy 1.26.4
python 3.12.2

Runtime Environment:

No response

Context for the issue:

No response

@timhoffm
Contributor Author

timhoffm commented Aug 8, 2024

I saw there is already a check for monotonicity:

if np.any(bin_edges[:-1] > bin_edges[1:]):

Is there a reason equality (=) is accepted, or could this check be changed to >=?
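
The difference between the two comparisons can be illustrated on edges shaped like the ones in the report. This is a hedged sketch: the `bin_edges` array is constructed by hand to mirror the reported `lims` output (six edges at 1.0, five at the next float64), and the two variable names are hypothetical, not taken from the NumPy source.

```python
import numpy as np

eps = np.finfo(np.float64).eps

# Hand-built edges mirroring the reported output: repeated values,
# with a single nonzero step in the middle.
bin_edges = np.array([1.0] * 6 + [1.0 + eps] * 5)

# The quoted check only rejects edges that decrease.
rejects_with_gt = bool(np.any(bin_edges[:-1] > bin_edges[1:]))

# A strict variant would also reject repeated (zero-width) edges.
rejects_with_ge = bool(np.any(bin_edges[:-1] >= bin_edges[1:]))

print("rejected by '>' check: ", rejects_with_gt)
print("rejected by '>=' check:", rejects_with_ge)
```

The `>` comparison lets these edges through because they never decrease, while `>=` flags them because consecutive edges are equal.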

@mattip
Member

mattip commented Aug 8, 2024

A PR with a test for this edge case might expose where other tests could fail.

@ngoldbaum
Member

This vaguely sounds like #23110, which was supposed to be solved by #24161.

timhoffm added a commit to timhoffm/numpy that referenced this issue Aug 8, 2024
When many bins are requested in a small value region,
it may not be possible to create enough distinct bin
edges due to limited numeric precision. Up to now,
`histogram` then returned identical subsequent bin
edges, which would mean a bin width of 0. These bins
could also have counts associated with them.

Instead of returning such illogical bin distributions,
this PR raises a `ValueError` if the calculated bins
do not all have a finite size.

Closes numpy#27142.
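
For code that has to run on NumPy versions both with and without this change, a user-side guard along the following lines is one possible approach. It is a sketch under stated assumptions: `checked_histogram` is a hypothetical helper, not a NumPy function, and its error message is illustrative rather than the one NumPy emits.

```python
import numpy as np


def checked_histogram(a, bins=10):
    """Hypothetical wrapper: compute a histogram, rejecting zero-width bins."""
    # NumPy versions that include the fix may raise ValueError here already.
    counts, edges = np.histogram(a, bins=bins)
    # Versions without the fix can return repeated edges; detect them here.
    if np.any(np.diff(edges) <= 0):
        raise ValueError(
            "cannot create bins of nonzero width for this data range"
        )
    return counts, edges


# Well-separated data passes through unchanged.
counts, edges = checked_histogram([0.0, 1.0, 2.0, 3.0], bins=4)

# The input from the report is rejected either by NumPy or by the wrapper.
try:
    checked_histogram(np.array([1, 1 + 2e-16] * 10), bins=10)
    degenerate_rejected = False
except ValueError:
    degenerate_rejected = True
```

Either way the caller sees a `ValueError`, so downstream handling does not need to depend on the installed NumPy version.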
ArvidJB pushed a commit to ArvidJB/numpy that referenced this issue Nov 1, 2024