Closed
Description
targets has two major challenges with pseudo-random number generation:
1. Statistical reproducibility. Different runs of the same pipeline must give the exact same results.
2. Statistical independence. Pseudo-random numbers from different targets must be independent.
For years, targets has handled (1) in a simple way: it sets a deterministic seed for each target, where each target's seed depends on the target name and an overarching global seed:
Lines 35 to 42 in 116f1ce:

```r
produce_seed <- function(scalar) {
  seed <- tar_options$get_seed()
  if_any(
    anyNA(seed),
    NA_integer_,
    digest::digest2int(as.character(scalar), seed = seed)
  )
}
```
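As a rough, self-contained sketch of this per-target seeding idea, the base R code below avoids the digest package. The function `name_to_seed()` is hypothetical: a toy polynomial string hash that stands in for `digest::digest2int()` and is not the hash targets uses. `run_seeded_target()` is likewise a hypothetical helper. The point is only that the seed is a deterministic function of the target name and the global seed, which is what delivers reproducibility (1).

```r
# Hypothetical stand-in for digest::digest2int(): a toy polynomial string
# hash mixed with the global seed. Not the hash that targets actually uses.
name_to_seed <- function(name, global_seed = 0L) {
  modulus <- 2147483647  # keep the result inside R's integer range
  h <- global_seed %% modulus
  for (code in utf8ToInt(name)) {
    h <- (h * 31 + code) %% modulus
  }
  as.integer(h)
}

# Hypothetical helper: seed the PRNG from the target name, then draw numbers.
run_seeded_target <- function(name, global_seed = 0L, n = 3L) {
  set.seed(name_to_seed(name, global_seed))
  runif(n)
}

run1 <- run_seeded_target("model_fit")
run2 <- run_seeded_target("model_fit")
identical(run1, run2)  # TRUE: same name and global seed, same draws
```

Rerunning a target reproduces its draws because the seed depends only on the name and the global seed. Nothing in this construction, however, keeps the sequences of two targets apart.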
But this approach may violate (2). Each target starts from its own pseudo-random number generator (PRNG) state, yet nothing prevents the PRNG sequence of one target from overlapping with that of another target.
This problem is not unique to targets; it is a general issue in parallel computing.
Cf. wlandau/crew#113.
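One common way to get statistical independence (2) in R is the L'Ecuyer-CMRG generator from the base parallel package. There, `nextRNGStream()` jumps 2^127 steps ahead to start a new stream, so streams do not overlap unless a single stream draws more than 2^127 numbers. The sketch below assigns one stream per target from a single global seed. It rests on some assumptions: the helpers `target_streams()` and `run_target_stream()` are hypothetical and not part of the targets API, and this is not a claim about how targets or crew resolved the issue.

```r
library(parallel)

# Hypothetical helper (not part of targets): derive one L'Ecuyer-CMRG
# stream per target from a single global seed.
target_streams <- function(target_names, global_seed = 0L) {
  old_kind <- RNGkind()[1]
  RNGkind("L'Ecuyer-CMRG")
  set.seed(global_seed)
  stream <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", length(target_names))
  names(streams) <- target_names
  for (name in target_names) {
    streams[[name]] <- stream
    # Jump 2^127 steps ahead to the start of the next, non-overlapping stream.
    stream <- nextRNGStream(stream)
  }
  RNGkind(old_kind)  # restore the caller's generator kind
  streams
}

# Hypothetical helper: install a target's stream as the global RNG state,
# then draw numbers from it. Leaves the global RNG set to L'Ecuyer-CMRG.
run_target_stream <- function(stream, n = 3L) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}

streams <- target_streams(c("data", "model", "report"), global_seed = 1L)
x1 <- run_target_stream(streams[["model"]])
x2 <- run_target_stream(streams[["model"]])
identical(x1, x2)  # TRUE: same stream, same draws
```

One design caveat of this sketch: streams are handed out in the order of `target_names`, so inserting a new target shifts the stream of every target after it. That trades away some of the reproducibility in (1).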