BUG: IndexError when using np.histogram with small values #23110

Closed
astrofrog opened this issue Jan 27, 2023 · 1 comment · Fixed by #24161

astrofrog (Contributor) commented Jan 27, 2023

Describe the issue:

When np.histogram is given very small values together with a very narrow range, it can raise an IndexError.

Reproduce the code example:

import numpy as np

np.histogram(np.array([-0.9e-308], dtype='>f8'), bins=2, range=(-1e-308, -2e-313))

Error message:

/home/tom/python/dev/lib/python3.11/site-packages/numpy/lib/histograms.py:810: RuntimeWarning: overflow encountered in scalar divide
  norm = n_equal_bins / _unsigned_subtract(last_edge, first_edge)
/home/tom/python/dev/lib/python3.11/site-packages/numpy/lib/histograms.py:839: RuntimeWarning: invalid value encountered in cast
  indices = f_indices.astype(np.intp)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 np.histogram(np.array([-0.9e-308], dtype='>f8'), bins=2, range=(-1e-308, -2e-313))

File <__array_function__ internals>:200, in histogram(*args, **kwargs)

File ~/python/dev/lib/python3.11/site-packages/numpy/lib/histograms.py:844, in histogram(a, bins, range, density, weights)
    840 indices[indices == n_equal_bins] -= 1
    842 # The index computation is not guaranteed to give exactly
    843 # consistent results within ~1 ULP of the bin edges.
--> 844 decrement = tmp_a < bin_edges[indices]
    845 indices[decrement] -= 1
    846 # The last bin includes the right edge. The other bins do not.

IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 3
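
To check whether a particular NumPy installation shows the failure above, the reproducer can be wrapped in a try/except. This is only a sketch: the helper name `histogram_small_range_status` and its return strings are hypothetical, chosen for illustration, and it only distinguishes "raises IndexError" from "does not".

```python
import warnings

import numpy as np


def histogram_small_range_status():
    """Run the reproducer from this issue and report what happened.

    Hypothetical helper for illustration; not part of NumPy.
    """
    data = np.array([-0.9e-308], dtype=">f8")
    with warnings.catch_warnings(), np.errstate(all="ignore"):
        # Silence the overflow / invalid-cast RuntimeWarnings from the report.
        warnings.simplefilter("ignore", RuntimeWarning)
        try:
            np.histogram(data, bins=2, range=(-1e-308, -2e-313))
        except IndexError:
            return "affected"
    return "not affected"


print(np.__version__, histogram_small_range_status())
```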

Runtime information:

1.24.1

3.11.1 (main, Dec  7 2022, 01:11:34) [GCC 11.3.0]

[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/tom/python/dev/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so',
  'internal_api': 'openblas',
  'num_threads': 16,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.21'}]

Context for the issue:

No response

ngoldbaum (Member) commented:

This is happening because norm is ending up inf after this line:

https://github.com/numpy/numpy/blob/main/numpy/lib/histograms.py#L810

ipdb> p _unsigned_subtract(last_edge, first_edge)
9.9998e-309
ipdb> p n_equal_bins
2
ipdb> p 2 / 9.9998e-309
inf

The large negative index comes from casting inf to np.intp.
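
The chain described above can be replayed outside of `np.histogram` with plain scalar arithmetic. This is a minimal sketch that mirrors the variable names from the traceback and debugger session; it assumes the float index is computed as `(value - first_edge) * norm`, and it does not call NumPy's internal helpers. The integer produced by the final cast is platform-dependent, so no specific value is claimed here (the report above shows the minimum `np.intp` value).

```python
import numpy as np

first_edge, last_edge = np.float64(-1e-308), np.float64(-2e-313)
n_equal_bins = 2
value = np.float64(-0.9e-308)

width = last_edge - first_edge  # roughly 9.9998e-309
with np.errstate(over="ignore"):
    # 2 / 9.9998e-309 is larger than the biggest finite float64, so it overflows.
    norm = n_equal_bins / width

# A small positive offset multiplied by inf is still inf.
f_index = (value - first_edge) * norm

with np.errstate(invalid="ignore"):
    # inf has no integer representation; the result of this cast is
    # platform-dependent and not a usable bin index.
    index = np.array([f_index]).astype(np.intp)[0]

print(width, norm, f_index, index)
```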

tylerjereddy added a commit to tylerjereddy/numpy that referenced this issue Jul 10, 2023
* Fixes numpy#23110

* the histogram `norm` variable is used to determine the bin
index of input values, and `norm` is calculated in some cases
by dividing `n_equal_bins` by the range of the data; when the
range of the data is extraordinarily small, the `norm` can become
floating point infinity

* in this patch, we delay calculating `norm` to increase resistance
to the generation of infinite values--for example, a really small
input value divided by a really small range is more resistant
to generating infinity, so we effectively just change the order
of operations a bit

* however, I haven't considered whether this is broadly superior
for resisting floating point non-finite values for other `histogram`
input/extreme value permutations--one might speculate that this just
patches one extreme case that happened to show up in the wild, but
may increase the likelihood of some other extreme case that isn't in our
test suite yet

* the main logic for this patch is that it fixes an issue that
occurred in the wild and adds a test for it--if another extreme value
case eventually pops up, at least this case will have a regression
guard to keep guiding us in the right direction
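
The reordering idea in the commit message can be illustrated with a small standalone function. This is a sketch under assumptions, not the code from the actual patch: `bin_index_reordered` is a hypothetical helper, it assumes every value lies inside `[first_edge, last_edge]`, and it omits the edge corrections that `np.histogram` performs. The point is only that dividing the offset by the range first keeps the intermediate result finite, whereas `n_equal_bins / range` on its own overflows for this input.

```python
import numpy as np


def bin_index_reordered(values, first_edge, last_edge, n_equal_bins):
    """Hypothetical helper: map values to equal-width bin indices.

    Divides by the range first, then scales by the bin count, so no
    intermediate value has to hold n_equal_bins / range.
    """
    values = np.asarray(values, dtype=np.float64)
    width = last_edge - first_edge
    f_indices = ((values - first_edge) / width) * n_equal_bins
    indices = f_indices.astype(np.intp)
    # Values equal to last_edge belong to the last bin.
    indices[indices == n_equal_bins] -= 1
    return indices


# Inputs from the reproducer in this issue: the value lands in bin 0.
idx = bin_index_reordered([-0.9e-308], -1e-308, -2e-313, 2)
print(idx)
```

For the reported input the ratio is about 0.1, so scaling by 2 gives about 0.2 and the value is assigned to the first bin.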
charris pushed a commit to charris/numpy that referenced this issue Jul 14, 2023
* Fixes numpy#23110