np.histogram_bin_edges not returning expected bin width for argument bin = 'fd' #18319
Comments
The code for the Freedman Diaconis binwidth estimator is here: Line 199 in f36e940
Aside from cruft, the function is two lines long:
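The snippet that originally followed did not survive on this page. As a stand-in, here is a minimal sketch of what the Freedman Diaconis estimator computes, written from the published formula (width = 2 * IQR / n^(1/3)) rather than copied from the NumPy source. The function name `freedman_diaconis_width` is hypothetical.

```python
import numpy as np


def freedman_diaconis_width(x):
    """Freedman Diaconis bin width estimate: 2 * IQR * n ** (-1/3).

    Hypothetical helper for illustration only; it is not NumPy's
    private implementation.
    """
    x = np.asarray(x, dtype=float)
    # Interquartile range: 75th percentile minus 25th percentile.
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)


# Small example: eight evenly spaced values.
width = freedman_diaconis_width(np.arange(8.0))
```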
This is pretty much identical to your method, and yields the same result. However, some additional magic happens under the hood in
As you can see, it is the rounding step that is responsible for the difference in results. So your observation of the difference is correct, but the code starts from the same estimate as yours and then applies some additional transformations. We could argue ad nauseam about rounding vs rounding up or rounding down, but the result would never match the optimal bin width in any case. One other alternative I can think of is keeping the optimal width, and setting the start/end points to fully contain the range. This would bias the edge bins, however, which is generally undesirable.
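To make the effect of the rounding step concrete, here is a small sketch that compares the raw Freedman Diaconis estimate with the width of the bins that `np.histogram_bin_edges` actually returns. It assumes floating-point input and that the bin count is the data range divided by the estimate, rounded up to an integer, as described above. The helper names `fd_width` and `compare_widths` are hypothetical, and the random data is only for illustration.

```python
import numpy as np


def fd_width(x):
    """Raw Freedman Diaconis estimate: 2 * IQR * n ** (-1/3)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)


def compare_widths(x):
    """Return (estimated width, actual width, number of bins)."""
    x = np.asarray(x, dtype=float)
    estimate = fd_width(x)
    edges = np.histogram_bin_edges(x, bins="fd")
    # The edges are equally spaced, so the first gap is the bin width.
    actual = edges[1] - edges[0]
    n_bins = len(edges) - 1
    return estimate, actual, n_bins


rng = np.random.default_rng(0)
data = rng.normal(size=1000)
estimate, actual, n_bins = compare_widths(data)
print(f"estimate={estimate:.6f} actual={actual:.6f} bins={n_bins}")
```

Because the range is split into a whole number of bins, the actual width should come out no larger than the estimate, and equal to it only when the range happens to be an exact multiple of the estimate.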
Hey, thanks for getting back to me! Your explanation makes a lot of sense, and I see what the code is doing now. I guess I just find it surprising that if you specify the bin width method, the resulting bin width is not what the formula in the documentation gives. However, if the code is behaving as expected I guess it's best to close the issue.
It may be worth documenting this somewhere. A sentence like "The actual number of bins is always chosen to divide the range into an integer number of bins that is at least as large as the estimate.", or something to that effect in
I can give that a go - thanks!
You can now close this issue. I suspect it's going to be a handy source/reference for places like Stack Overflow. Nice work!
Cool, thanks for the help!
Hey,
I was playing about with histogram bin widths recently and was trying to test my implementation of the Freedman Diaconis Estimator by checking it against "np.histogram_bin_edges". I spotted that I get different values when I calculate the bin width with my method, and I suspect there is some rounding going on. I just want to check whether it is intentional or not.
This is because when I choose the length of my data to be a cube number my results match. However, in other cases my answers differ (see example below).
Apologies if this is expected behaviour or there is a bug in my method.
Reproducing code example:
Output:
NumPy/Python version information:
1.19.5 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0]