bpo-37000: Remove obsolete comment in _randbelow_with_getrandbits #95775
In `random._randbelow_with_getrandbits` we are generating a random integer in [0,n). We do that by repeatedly drawing a random number, r, in [0,2**k) until we hit an r < n.

That's a great strategy. It's not only very simple, but even with worst case input, we only need to draw 2 random numbers on average before we hit the jackpot.

However, the code is a bit overly cautious. When n is a power of 2, we ask `getrandbits` to sample from [0,2*n). The code justifies that strange behaviour with a comment expressing fear of the special case n==1. That fear is unfounded: the obvious code works just fine for n==1.

Fixing this not only makes the code simpler to understand, it also guarantees that the important special case of n being a power of two now has a guaranteed worst case runtime of only a single call to `getrandbits`, instead of an expected runtime of two calls with no worst case bound.

Existing tests already cover this code and the special cases of interest, so we don't need any new tests.

This minor performance bug was introduced in 0515661 as far as I can tell. But I can't really tell what the original author was thinking.
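For readers following along, the draw-and-reject strategy described above can be sketched roughly as follows. This is an illustrative stand-alone function, not the code in the `random` module; the name `randbelow_sketch` and the draw counter are made up for this example.

```python
import random

def randbelow_sketch(n, getrandbits=random.getrandbits):
    """Return (r, draws): r is in [0, n); draws counts the getrandbits calls.

    Hypothetical helper for illustration only.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()      # 2**k > n, so [0, 2**k) always covers [0, n)
    draws = 1
    r = getrandbits(k)      # 0 <= r < 2**k
    while r >= n:           # reject draws outside [0, n) and try again
        r = getrandbits(k)
        draws += 1
    return r, draws
```

With `k = n.bit_length()`, a power of two such as n = 8 draws from [0, 16), so roughly half of the draws are rejected and the loop has no fixed upper bound on its number of iterations.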
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool.
Before Python 3.9, But there's a different issue now: as the failing tests show, while |
Thanks for the quick review, @tim-one! You make a very good point. If you are interested, I have restored the original behaviour and included your reasoning to explain this seemingly unusual behaviour to the next person reading this code. If you think that might be useful, I can polish the description etc. to make this mergeable?
See also previous discussion at #81181.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
For efficiency, we'd like to use k = (n-1).bit_length() here, but before Python 3.9, `getrandbits()` did not accept 0 as an argument. That's why using `n-1` was a problem when `n` was 1. Now we are stuck with this version, because we want to reproduce results after explicitly setting a seed even between releases.
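To make the trade-off in this comment concrete, here is a small sketch comparing the two choices of k. The helper names `acceptance_probability` and `compare_k` are hypothetical, and the code never calls `getrandbits()`; it only computes the chance that a single draw from [0, 2**k) is accepted.

```python
def acceptance_probability(n, k):
    """Chance that one draw from [0, 2**k) lands in [0, n)."""
    return n / 2**k

def compare_k(n):
    """Return (k_current, p_current, k_tight, p_tight) for a given n > 0."""
    k_current = n.bit_length()        # what the code uses
    k_tight = (n - 1).bit_length()    # what the comment says would be more efficient
    return (k_current, acceptance_probability(n, k_current),
            k_tight, acceptance_probability(n, k_tight))

for n in (1, 8, 9):
    print(n, compare_k(n))
```

The two choices differ only when n is a power of two. For n == 1 the tight choice gives k == 0, which is the argument `getrandbits()` did not accept before Python 3.9.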
Comment changes don't warrant a comment.
@matthiasgoergens: Status check is done, and it's a failure ❌.
Since this is a fairly trivial change, I did not create an issue. (But I am happy to create one, if you think it's better this way.)
I discovered this problem while working on https://discuss.python.org/t/random-choice-from-a-dict/17834
Automerge-Triggered-By: GH:rhettinger