DOC: better document the spawn interface, compare and contrast it to Jax's "split" #15656

Closed
NeilGirdhar opened this issue Feb 27, 2020 · 9 comments

Comments

@NeilGirdhar
Contributor

NeilGirdhar commented Feb 27, 2020

Jax, a new machine learning library, has copied numpy's excellent interface. It adds one innovation to random number generation: splitting.

This is useful when parallel processes that should not be serialized need access to random numbers from a reproducible stream.

I suggest adding a method to BitGenerator:

b: BitGenerator
b.split(n)  # returns a list of n BitGenerators, each differently initialized from b in a unique, reproducible way

and a similar method on Generator that splits the underlying bit-generator and returns

[type(self)(bit_generators[i]) for i in range(n)]
@mattip
Member

mattip commented Feb 27, 2020

We already have such an interface, see Parallel Random Number Generation. If you couldn't find it that means we need to somehow make it more discoverable, so I am pivoting this to a documentation issue.
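For readers landing here from a search for "split", the following is a minimal sketch of how the existing interface covers the requested behaviour. It uses `SeedSequence.spawn()` and `default_rng()`; the seed value and the variable names are arbitrary choices for illustration, not anything prescribed by the documentation.

```python
import numpy as np

# Root seed; the literal value is arbitrary and used only for illustration.
root = np.random.SeedSequence(12345)

# spawn(n) returns n child SeedSequences, each derived from root
# in a reproducible way.
children = root.spawn(4)

# One Generator per parallel worker, roughly what the proposed
# split(n) would have returned.
streams = [np.random.default_rng(child) for child in children]

# Each worker draws from its own stream.
draws = [rng.random(3) for rng in streams]
```

Re-running this with the same root seed should reproduce the same four streams, which is the property the original request asks for.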

@mattip mattip changed the title Feature request: Add split method to Generator and BitGenerator. DOC: better document the spawn interface, compare and contrast it to Jax's "split" Feb 27, 2020
@NeilGirdhar
Contributor Author

@mattip Please don't be so fast to blame your documentation. I didn't look hard enough. I'm going to read it now.

@NeilGirdhar
Contributor Author

Looks like an excellent design. I guess you could use the words "splittable" and "fork" somewhere on the page so that Google finds it. This is the terminology used by Jax.

@charris
Member

charris commented Feb 27, 2020

Maybe we should have keyword lists in the documentation.

@NeilGirdhar
Contributor Author

Good idea. Anyway, feel free to close this. I've tried to convince the Jax people to use your interface: jax-ml/jax#2294 If they agree, the ambiguity will disappear.

@rkern
Member

rkern commented Feb 27, 2020

While the current API forces you to work with SeedSequences directly to access its spawn() method, we did make sure that it would be possible to expose spawn() methods on BitGenerator and Generator, which will be more convenient. We wanted to wait until we got a little more experience with the concept before exposing it so prominently.

Just to provide some mathematical background, Jax's PRNG is in the same weak-crypto family as our Philox BitGenerator. The method by which it splits only works well for that family of weak-crypto PRNGs because that family keeps its initial seed around as the key value and only evolves a counter as one draws numbers from it. The other PRNGs iterate the state.

SeedSequence implements a similar scheme, but separates that scheme out from the PRNG implementation. By keeping the original seed state around as SeedSequence and by implementing good integer hashing techniques, we can get the same benefits without needing the full weak-crypto functionality of the PRNG. This may be of particular interest to the Jax developers since the ThreeFry algorithm that they use is not the fastest, nor necessarily the most GPU-friendly (once the massive-parallelization playing field is leveled by SeedSequence). They might be able to use a faster PRNG like SFC64 (our default of PCG64 may not be appropriate on a GPU because it requires 128-bit multiplication).
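As a rough illustration of "keeping the original seed state around", the sketch below shows that a spawned child carries the root entropy plus a `spawn_key` recording its position, so it can be rebuilt from those two values alone. The entropy value is arbitrary, and SFC64 is used only because it is mentioned in this comment.

```python
import numpy as np

entropy = 2020  # arbitrary value, for illustration only
root = np.random.SeedSequence(entropy)
first, second = root.spawn(2)

# Each child keeps the root entropy and records its index in spawn_key,
# so the second child can be reconstructed without the parent object.
rebuilt = np.random.SeedSequence(entropy, spawn_key=(1,))
same = np.array_equal(second.generate_state(4), rebuilt.generate_state(4))

# The scheme is independent of the PRNG: a child can seed any
# BitGenerator, including one that is not counter-based.
rng = np.random.Generator(np.random.SFC64(second))
sample = rng.integers(0, 10, size=5)
```

Here `same` should be true, since the child's state is a hash of the entropy and the spawn key rather than of anything the parent has drawn.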

@NeilGirdhar
Contributor Author

@rkern I wish there was a fascinating reaction, because that is fascinating. I'll point them to your comment.

@peteroupc

peteroupc commented Feb 27, 2020

Note that Philox, and the Threefry PRNG, and SFC are merely examples of a general construction called the "counter-based PRNG" (as used in the "Random123" paper). In general, counter-based PRNGs use an underlying hash function or block cipher to hash a seed and an incrementing counter. And any other hash function can substitute for Threefry (or whatever underlying function the counter-based PRNG uses) as long as the resulting PRNG provides adequate randomness.
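To make the construction concrete, here is a toy sketch of a counter-based generator in pure Python. It is not Threefry or Philox: keyed BLAKE2b from `hashlib` stands in for the underlying function, the class and method names are hypothetical, and no claim is made about statistical quality. The `split` method is one simple way to derive child keys, assumed here for illustration.

```python
import hashlib
import struct


class ToyCounterPRNG:
    """Toy counter-based generator: output i is hash(key, i). Illustration only."""

    def __init__(self, key: bytes):
        self.key = key     # fixed for the lifetime of the generator
        self.counter = 0   # the only state that changes between draws

    def next_u64(self) -> int:
        # Hash the current counter under the fixed key, then advance the counter.
        block = hashlib.blake2b(
            struct.pack("<Q", self.counter), key=self.key, digest_size=8
        ).digest()
        self.counter += 1
        return int.from_bytes(block, "little")

    def split(self, n: int):
        # Derive n child keys by hashing the child index under the parent key.
        # The "person" tag keeps these hashes separate from output blocks.
        return [
            ToyCounterPRNG(
                hashlib.blake2b(
                    struct.pack("<Q", i),
                    key=self.key,
                    person=b"split",
                    digest_size=32,
                ).digest()
            )
            for i in range(n)
        ]


parent = ToyCounterPRNG(b"example key")
a, b = parent.split(2)
value = a.next_u64()
```

Because the key never changes and only the counter advances, the children depend solely on the parent key and their index, which is what makes splitting cheap for this family.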

Also, splittable PRNGs are far from being an "innovation" of JAX (see also the JAX PRNG design notes); they have existed, for example, in Haskell and Java for years. Some of the known constructions for splittable PRNGs are surveyed in "Evaluation of Splittable Pseudo-Random Generators" by H. G. Schaathun, 2015. Some of them are general enough to be used by any PRNG (including Mersenne Twister and PCG), but do not necessarily lead to high-quality splittable PRNGs. Another example of a splittable PRNG is found in JuliaLang/julia#34852. See also idontgetoutmuch/random#7.

@rkern
Member

rkern commented Feb 27, 2020

Just for the record, note that SFC64 is not actually counter-based. It incorporates a counter into its state, but the counter isn't the only part of the state that evolves a la ThreeFry and Philox. The counter in SFC64 is there to ensure a minimum size of the cycles, which are of variable size.
