Commit 78137ba

tylerjereddy authored and charris committed
BUG: histogram small range robust
* Fixes #23110
* The histogram `norm` variable is used to determine the bin index of input values, and in some cases `norm` is calculated by dividing `n_equal_bins` by the range of the data; when that range is extraordinarily small, `norm` can become floating point infinity.
* In this patch we delay calculating `norm` to increase resistance to generating infinite values: a really small input value divided by a really small range is less prone to overflowing to infinity, so we effectively just change the order of operations a bit.
* However, I haven't considered whether this ordering is broadly superior for resisting non-finite floating point values across other `histogram` input/extreme value permutations; one might speculate that this just patches one extreme case that happened to show up in the wild while increasing the likelihood of some other extreme case that isn't in our test suite yet.
* The main logic for this patch is that it fixes an issue that occurred in the wild and adds a test for it; if another extreme value case eventually pops up, at least this case will have a regression guard to keep guiding us in the right direction.
1 parent b92248a commit 78137ba
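
To make the failure mode concrete, here is a minimal sketch using the values from gh-23110. This is illustrative Python outside NumPy's internals: plain float subtraction stands in for the library's `_unsigned_subtract` helper, and the variable names are purely illustrative.

import numpy as np

# Values from gh-23110: a subnormal input and a subnormal-width range.
first_edge = np.float64(-1e-308)
last_edge = np.float64(-2e-313)
n_equal_bins = 2
value = np.float64(-0.9e-308)

span = last_edge - first_edge                 # ~1e-308, a subnormal double

# Old order of operations: the pre-computed scale factor overflows to inf,
# so any bin index derived from it is non-finite.
norm = n_equal_bins / span                    # 2 / ~1e-308 rounds to inf
old_f_index = (value - first_edge) * norm     # ~1e-309 * inf -> inf

# Patched order of operations: the tiny offset divided by the tiny span
# stays finite, and only then is it scaled by the bin count.
new_f_index = ((value - first_edge) / span) * n_equal_bins   # ~0.2 -> bin 0

print(old_f_index, new_f_index)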

File tree

2 files changed: 11 additions and 2 deletions


numpy/lib/histograms.py

Lines changed: 4 additions & 2 deletions

@@ -807,7 +807,8 @@ def histogram(a, bins=10, range=None, density=None, weights=None):
         n = np.zeros(n_equal_bins, ntype)
 
         # Pre-compute histogram scaling factor
-        norm = n_equal_bins / _unsigned_subtract(last_edge, first_edge)
+        norm_numerator = n_equal_bins
+        norm_denom = _unsigned_subtract(last_edge, first_edge)
 
         # We iterate over blocks here for two reasons: the first is that for
         # large arrays, it is actually faster (for example for a 10^8 array it
@@ -835,7 +836,8 @@ def histogram(a, bins=10, range=None, density=None, weights=None):
 
             # Compute the bin indices, and for values that lie exactly on
             # last_edge we need to subtract one
-            f_indices = _unsigned_subtract(tmp_a, first_edge) * norm
+            f_indices = ((_unsigned_subtract(tmp_a, first_edge) / norm_denom)
+                         * norm_numerator)
             indices = f_indices.astype(np.intp)
             indices[indices == n_equal_bins] -= 1
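
Read on its own, the patched fast path computes bin indices roughly as in the sketch below. `bin_indices` is a hypothetical standalone helper written here for illustration; the real code sits inside `histogram`, uses `_unsigned_subtract`, and walks the input in blocks.

import numpy as np

def bin_indices(a, first_edge, last_edge, n_equal_bins):
    # Illustrative stand-in for NumPy's fast path; assumes float input, so
    # plain subtraction replaces _unsigned_subtract.
    norm_numerator = n_equal_bins
    norm_denom = last_edge - first_edge
    # Divide each offset by the (possibly tiny) range first, then scale by
    # the bin count, instead of pre-dividing the bin count by the range.
    f_indices = ((a - first_edge) / norm_denom) * norm_numerator
    indices = f_indices.astype(np.intp)
    # Values lying exactly on last_edge belong in the final bin.
    indices[indices == n_equal_bins] -= 1
    return indices

print(bin_indices(np.array([-0.9e-308]), -1e-308, -2e-313, 2))   # -> [0]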

numpy/lib/tests/test_histograms.py

Lines changed: 7 additions & 0 deletions

@@ -408,6 +408,13 @@ def test_big_arrays(self):
         hist = np.histogramdd(sample=sample, bins=(xbins, ybins, zbins))
         assert_equal(type(hist), type((1, 2)))
 
+    def test_gh_23110(self):
+        hist, e = np.histogram(np.array([-0.9e-308], dtype='>f8'),
+                               bins=2,
+                               range=(-1e-308, -2e-313))
+        expected_hist = np.array([1, 0])
+        assert_array_equal(hist, expected_hist)
+
 
 class TestHistogramOptimBinNums:
     """
