Statistical independence of pseudo-random numbers #1139

@wlandau

targets has two major challenges with pseudo-random number generation:

  1. Statistical reproducibility. Different runs of the same pipeline must give the exact same results.
  2. Statistical independence. Pseudo-random numbers from different targets must be independent.

For years, targets has controlled (1) in a simplistic way: set a deterministic seed for each target, where each target seed depends on the target name and an overarching global seed:

targets/R/utils_digest.R

Lines 35 to 42 in 116f1ce

produce_seed <- function(scalar) {
  seed <- tar_options$get_seed()
  if_any(
    anyNA(seed),
    NA_integer_,
    digest::digest2int(as.character(scalar), seed = seed)
  )
}
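As a minimal sketch of why this scheme gives reproducibility, the standalone snippet below mimics the idea outside of targets. The helpers `name_seed()` and `run_target()` and the target name `"model"` are hypothetical and exist only for illustration; the only real API assumed is `digest::digest2int()` plus base R.

```r
library(digest)

# Hypothetical stand-in for produce_seed(): hash a target name
# together with a global seed to obtain a per-target integer seed.
name_seed <- function(name, global_seed = 0L) {
  digest::digest2int(as.character(name), seed = global_seed)
}

# Hypothetical "target": set its own seed, then draw random numbers.
run_target <- function(name, global_seed = 0L) {
  set.seed(name_seed(name, global_seed))
  rnorm(3)
}

# Same name and same global seed give identical draws on every run.
first_run <- run_target("model", global_seed = 1L)
second_run <- run_target("model", global_seed = 1L)
identical(first_run, second_run)
```

Note that this only demonstrates (1). Nothing in the hashing step says anything about how the resulting PRNG sequences relate to one another across targets.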

But this approach may violate (2). Different targets have different pseudo-random number generator states, and there is nothing to prevent the PRNG sequence of one target from overlapping with that of a different target.

This problem is not unique to targets; it is a general issue with parallel computing.
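For context, one common way parallel R code addresses overlap is the L'Ecuyer-CMRG generator with streams from the base `parallel` package. The sketch below is illustrative only and is not a description of what targets does or will do. The helper `draw_from()` is hypothetical, and the code switches the session's RNG kind as a side effect.

```r
library(parallel)

# Switch to L'Ecuyer-CMRG, a generator that supports multiple streams.
RNGkind("L'Ecuyer-CMRG")
set.seed(1L)

# The current state is the start of the first stream.
stream_a <- get(".Random.seed", envir = globalenv())

# nextRNGStream() returns the start of the next stream,
# which the parallel package documents as 2^127 steps ahead.
stream_b <- parallel::nextRNGStream(stream_a)

# Hypothetical helper: install a stream state, then draw from it.
draw_from <- function(stream, n = 3) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}

draws_a <- draw_from(stream_a)
draws_b <- draw_from(stream_b)
```

Each worker or task would receive its own stream state, so reproducibility comes from the initial seed and separation comes from the spacing between streams. The trade-off is that stream assignment depends on the order in which streams are handed out, unlike a seed derived from a target name.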

Cf. wlandau/crew#113.
