BUG: IndexError when using np.histogram with small values #23110

astrofrog · 2023-01-27T11:53:25Z

Describe the issue:

When np.histogram is called with very small values (here in the subnormal range, around 1e-308), it can raise an IndexError under some circumstances.

Reproduce the code example:

import numpy as np
np.histogram(np.array([-0.9e-308], dtype='>f8'), bins=2, range=(-1e-308, -2e-313))

Error message:

/home/tom/python/dev/lib/python3.11/site-packages/numpy/lib/histograms.py:810: RuntimeWarning: overflow encountered in scalar divide
  norm = n_equal_bins / _unsigned_subtract(last_edge, first_edge)
/home/tom/python/dev/lib/python3.11/site-packages/numpy/lib/histograms.py:839: RuntimeWarning: invalid value encountered in cast
  indices = f_indices.astype(np.intp)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 np.histogram(np.array([-0.9e-308], dtype='>f8'), bins=2, range=(-1e-308, -2e-313))

File <__array_function__ internals>:200, in histogram(*args, **kwargs)

File ~/python/dev/lib/python3.11/site-packages/numpy/lib/histograms.py:844, in histogram(a, bins, range, density, weights)
    840 indices[indices == n_equal_bins] -= 1
    842 # The index computation is not guaranteed to give exactly
    843 # consistent results within ~1 ULP of the bin edges.
--> 844 decrement = tmp_a < bin_edges[indices]
    845 indices[decrement] -= 1
    846 # The last bin includes the right edge. The other bins do not.

IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 3

Runtime information:

1.24.1

3.11.1 (main, Dec  7 2022, 01:11:34) [GCC 11.3.0]

[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/tom/python/dev/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so',
  'internal_api': 'openblas',
  'num_threads': 16,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.21'}]

Context for the issue:

No response


ngoldbaum · 2023-01-27T17:15:11Z

This happens because `norm` ends up as `inf` after this line:

https://github.com/numpy/numpy/blob/main/numpy/lib/histograms.py#L810

ipdb> p _unsigned_subtract(last_edge, first_edge)
9.9998e-309
ipdb> p n_equal_bins
2
ipdb> p 2 / 9.9998e-309
inf

The large negative index comes from casting inf to np.intp.
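The overflow can be reproduced with plain Python floats, outside of NumPy. The following is a minimal sketch that reuses the values from the report. The variable names mirror those in `histograms.py`, but the arithmetic is a simplified stand-in for the library code, not a copy of it. Pure Python refuses to convert `inf` to an integer, whereas NumPy's cast to `np.intp` emits the "invalid value encountered in cast" warning and yields the out-of-range index shown in the traceback.

```python
import math

# Values taken from the report above.
first_edge, last_edge = -1e-308, -2e-313
n_equal_bins = 2
x = -0.9e-308

width = last_edge - first_edge      # about 9.9998e-309, a subnormal number
norm = n_equal_bins / width         # exceeds the largest float64, so it becomes inf
f_index = (x - first_edge) * norm   # a finite value times inf is still inf

print(width, norm, f_index)
print("norm overflowed:", math.isinf(norm))

# Python raises instead of silently producing a meaningless integer.
try:
    int(f_index)
except OverflowError as exc:
    print("OverflowError:", exc)
```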

* Fixes numpy#23110
* the histogram `norm` variable is used to determine the bin index of input values, and `norm` is calculated in some cases by dividing `n_equal_bins` by the range of the data; when the range of the data is extraordinarily small, the `norm` can become floating point infinity
* in this patch, we delay calculating `norm` to increase resistance to the generation of infinite values--for example, a really small input value divided by a really small range is more resistant to generating infinity, so we effectively just change the order of operations a bit
* however, I haven't considered whether this is broadly superior for resisting floating point non-finite values for other `histogram` input/extreme value permutations--one might speculate that this is just patching one extreme case that happened to show up in the wild, but may increase likelihood of some other extreme case that isn't in our testsuite yet
* the main logic for this patch is that it fixes an issue that occurred in the wild and adds a test for it--if another extreme value case eventually pops up, at least this case will have a regression guard to keep guiding us in the right direction
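The change in the order of operations described in the patch notes can be illustrated with scalar floats. This is a sketch under stated assumptions, not the code from the patch: `bin_position_eager` and `bin_position_delayed` are hypothetical helper names, and plain subtraction stands in for NumPy's internal `_unsigned_subtract`.

```python
import math

def bin_position_eager(x, first_edge, last_edge, n_bins):
    """Compute norm first, then scale the offset (hypothetical helper)."""
    norm = n_bins / (last_edge - first_edge)  # can overflow to inf for tiny ranges
    return (x - first_edge) * norm

def bin_position_delayed(x, first_edge, last_edge, n_bins):
    """Divide the offset by the range first, then scale by n_bins (hypothetical helper)."""
    # offset / range stays within [0, 1] for in-range x, so it cannot overflow here.
    return (x - first_edge) / (last_edge - first_edge) * n_bins

# Values taken from the report.
args = (-0.9e-308, -1e-308, -2e-313, 2)

eager = bin_position_eager(*args)
delayed = bin_position_delayed(*args)

print("eager is inf:", math.isinf(eager))
print("delayed bin index:", int(delayed))
```

With the reported inputs the eager form overflows, while the delayed form gives a position between 0 and 1, which truncates to bin 0. As the patch notes caution, this ordering is not shown to be more robust for every extreme input.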

astrofrog added the 00 - Bug label Jan 27, 2023

astrofrog mentioned this issue Jan 27, 2023

Fix segmentation fault astrofrog/fast-histogram#62

Merged

tylerjereddy mentioned this issue Jul 10, 2023

BUG: histogram small range robust #24161

Merged

seberg closed this as completed in #24161 Jul 11, 2023

charris mentioned this issue Jul 14, 2023

BUG: histogram small range robust #24185

Merged

ngoldbaum mentioned this issue Aug 8, 2024

BUG: zero-width histogram bins if the data values are in a small range close to numeric precision #27142

Closed
