Generating random numbers
Generating uniform(0,1) deviates
Books: DE Knuth (1998) The art of computer programming, vol 2, 3rd ed, ch 3
Numerical recipes in C, ch 7
Linear congruential generator
X_{n+1} = a X_n + c (mod m)
m = modulus = 2^32 - 1
a = multiplier = choose carefully!
c = increment = (maybe) 0
X_0 = seed
• X_n ∈ { 0, 1, 2, …, m - 1 }; U_n = X_n / m
• Use odd number as seed
• Algebra/group theory helps with choice of a
• Want cycle of generator (number of steps
before it begins repeating) to be large
• Don't generate more than m/1000 numbers
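The recurrence above can be sketched in R as follows. This is only an illustration under stated assumptions: the function name lcg is hypothetical, and the parameters a = 16807, c = 0, m = 2^31 - 1 (the Park-Miller "minimal standard" choice) are used instead of m = 2^32 - 1 so that a * X stays exactly representable in double precision.

```r
# Minimal LCG sketch: X_{n+1} = (a X_n + c) mod m, U_n = X_n / m.
# Parameter defaults are assumptions chosen for illustration only.
lcg <- function(n, seed = 12345, a = 16807, c = 0, m = 2^31 - 1) {
  u <- numeric(n)
  x <- seed
  for (i in seq_len(n)) {
    x <- (a * x + c) %% m   # exact in doubles because a * x < 2^53
    u[i] <- x / m           # scale to the unit interval
  }
  u
}

u <- lcg(5)
```

With c = 0 the seed must be nonzero, otherwise the sequence is stuck at 0.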
Composite generator
X_{n+1} = a_1 X_n + c_1 (mod m)
Y_{n+1} = a_2 Y_n + c_2 (mod m)
W_{n+1} = X_n + Y_n (mod m)
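A rough R sketch of the composite scheme, adding two congruential streams modulo m. The function name composite_lcg and every parameter value (multipliers 16807 and 48271, m = 2^31 - 1, both increments 0, the two seeds) are assumptions made for illustration, not values given in these notes.

```r
# Sketch: combine two LCG streams by addition modulo m.
composite_lcg <- function(n, seed1 = 12345, seed2 = 67890,
                          a1 = 16807, a2 = 48271, c1 = 0, c2 = 0,
                          m = 2^31 - 1) {
  w <- numeric(n)
  x <- seed1
  y <- seed2
  for (i in seq_len(n)) {
    w[i] <- (x + y) %% m      # W_{n+1} = X_n + Y_n (mod m)
    x <- (a1 * x + c1) %% m   # X_{n+1}
    y <- (a2 * y + c2) %% m   # Y_{n+1}
  }
  w / m                       # scale to [0, 1)
}

w <- composite_lcg(5)
```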
Shuffling a random number generator
• Initialization
– Generate an array R of n (= 100) random
numbers from sequence Xk
– Generate an additional number X to start
the process
• Each time generator is called
– Use X (scaled to [0,1)) to find an index into the array R
j ← ⌊X*n⌋ (so 0 ≤ j ≤ n – 1)
– X ← R[ j ]
– R[ j ] ← a new random number
– Return X as random number for call
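The shuffling steps above might look like this in R. The name make_shuffled is hypothetical, and runif stands in for the underlying sequence purely as an assumption; any function returning k uniforms in [0, 1) could be passed as gen.

```r
# Sketch of a shuffled generator built on an underlying source `gen`.
make_shuffled <- function(gen = runif, n = 100) {
  R <- gen(n)   # initialization: table of n random numbers
  X <- gen(1)   # extra number to start the process
  function() {
    j <- floor(X * n) + 1   # +1 because R vectors are 1-indexed
    X <<- R[j]              # take the stored value ...
    R[j] <<- gen(1)         # ... and refill its slot
    X
  }
}

set.seed(1)
next_u <- make_shuffled()
u <- replicate(1000, next_u())
```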
Shuffling with two generators
• Initialization
– Generate an array R of n (= 100) random
numbers from sequence Xk
• Each time generator is called
– Generate X from Xk and Y from Yk
– Use Y (scaled to [0,1)) to find an index into the array R
j ← ⌊Y*n⌋ (so 0 ≤ j ≤ n – 1)
– Z ← R[ j ]
– R[ j ] ← X
– Return Z as random number for call
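A corresponding sketch with two sources, one supplying values and one supplying indices. The name make_shuffled2 is hypothetical; both sources default to runif only as a stand-in (so they share one stream here), whereas in practice they would be two different generators.

```r
# Sketch: gen_x fills the table, gen_y picks which slot to return.
make_shuffled2 <- function(gen_x = runif, gen_y = runif, n = 100) {
  R <- gen_x(n)             # initialization: table from the X sequence
  function() {
    X <- gen_x(1)
    Y <- gen_y(1)
    j <- floor(Y * n) + 1   # +1 because R vectors are 1-indexed
    Z <- R[j]
    R[j] <<- X              # replace the value just used
    Z
  }
}

set.seed(2)
next_z <- make_shuffled2()
z <- replicate(1000, next_z())
```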
Generating random numbers in R
.Random.seed
integer vector holding the generator state; created or
updated by each call to any of the random number generators
runif(n, min=0, max=1)
generates n uniform(min, max) random numbers; uniform(0,1) by default
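A small illustration of how the stored state makes results reproducible. The seed value 2024 is arbitrary, and set.seed() is used here as the usual way to initialize .Random.seed.

```r
set.seed(2024)          # fix the generator state (arbitrary seed value)
a <- runif(3)
set.seed(2024)          # reset to the same state
b <- runif(3)
identical(a, b)         # TRUE: same state gives the same draws
```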
Access from C
#include <R.h>
GetRNGstate();
PutRNGstate();
double unif_rand();
double norm_rand();
double exp_rand();
double r****();
/usr/local/lib/R/include/R.h
/usr/local/lib/R/include/R_ext/Mathlib.h
RHOME/src/main/RNG.c
RHOME/src/nmath/snorm.c
RHOME/src/nmath/sexp.c
Random draw from { 1, 2, 3, …, n } with
probabilities p1, p2, p3, …, pn
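rdiscrete <- function(p) {    # (hypothetical name)
n <- length(p)
r <- runif(1)
for(i in 1:n) {
if(r < p[i]) return(i)
else r <- r - p[i]
}
n    # reached only through rounding error in sum(p)
}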
sample(1:n, 1, prob=p)
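For many draws at once, the same idea can be vectorized by comparing uniforms against the cumulative probabilities. This is a sketch only; the probability vector p below is made up for illustration.

```r
p <- c(0.2, 0.5, 0.3)    # example probabilities (assumed values)
set.seed(1)
cp <- cumsum(p)
# findInterval() counts how many cumulative sums are <= each uniform;
# adding 1 gives the index, and pmin() guards against rounding in cp.
draws <- pmin(findInterval(runif(10000), cp) + 1, length(p))
freq <- as.vector(table(factor(draws, levels = seq_along(p)))) / length(draws)
```

The observed frequencies in freq should be close to p.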
Random permutation of {1, 2, …, n }
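x <- 1:n
for(i in 1:n) {
# pick r uniformly from i, i+1, ..., n; sample(i:n, 1) would
# sample from 1:n when i == n, so use an offset instead
r <- i - 1 + sample.int(n - i + 1, 1)
# exchange x[i] and x[r]
x[c(i,r)] <- x[c(r,i)]
}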
sample(1:n)
Simulate r.v. X with cdf F
U ~ uniform(0,1)
X ~ cdf F, so Pr(X ≤ x) = F(x); let G = F^{-1}
G(U) has the same distribution as X
Example: exponential(λ)
F(x) = 1 - exp(-λx)
F^{-1}(u) = -log(1 - u) / λ
U ~ uniform(0,1), so 1 - U ~ uniform(0,1) as well
X = -log(U) / λ ~ exponential(λ)
Geometric(p)
E ~ exponential[ λ = -log(1 - p) ]
X = ⌊E⌋ (values 0, 1, 2, …) or ⌊E⌋ + 1 (values 1, 2, 3, …)
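The inversion recipes can be tried out in R as below. The function names rexp_inv and rgeom_inv are hypothetical; the geometric version takes the integer part of an exponential with rate -log(1 - p), which gives values 0, 1, 2, … (the same convention as R's rgeom).

```r
# Inverse-cdf sketch: exponential with rate lambda.
rexp_inv <- function(n, lambda = 1) -log(runif(n)) / lambda

# Geometric(p) on 0, 1, 2, ... as the floor of an exponential.
rgeom_inv <- function(n, p) floor(rexp_inv(n, lambda = -log(1 - p)))

set.seed(1)
x <- rexp_inv(1e5, lambda = 2)   # mean should be near 1/2
g <- rgeom_inv(1e5, p = 0.3)     # mean should be near (1 - p)/p = 2.33
```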
Generating Gaussian deviates
CLT method
Generate U_1, U_2, …, U_n ~ uniform(0,1)
X = (∑U_i - n/2) / √(n/12) is approximately normal(0, 1)
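A sketch of the CLT method in R. The function name rnorm_clt is hypothetical, and n = 12 is an assumed default, chosen because the denominator √(n/12) then equals 1.

```r
# Each row of u holds the n uniforms for one approximate normal deviate.
rnorm_clt <- function(k, n = 12) {
  u <- matrix(runif(k * n), nrow = k)
  (rowSums(u) - n / 2) / sqrt(n / 12)
}

set.seed(1)
x <- rnorm_clt(20000)
c(mean(x), sd(x))   # should be close to 0 and 1
```

The result can never fall outside ±√(3n) (±6 for n = 12), so the extreme tails are not represented.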
Polar method (Marsaglia 1962)
1. Generate U_1, U_2 ~ uniform(0,1)
2. Calculate V_i = 2U_i - 1 [ V_i ~ uniform(-1, 1) ]
3. Calculate S = V_1^2 + V_2^2
4. If S ≥ 1, return to step 1
5. Let Z = √[ -2 log(S) / S ]
6. X_1 = V_1 Z, X_2 = V_2 Z
Multiple calls: use a static variable
1. Generate X_1 and X_2; return X_1 and save X_2
2. Return the saved X_2
3. Generate new X_1 and X_2; return X_1 and save X_2
4. Return the saved X_2
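In R, a closure can play the role of the static variable. The sketch below follows the polar steps and keeps the second deviate for the next call; the name make_polar is hypothetical, and the extra check s > 0 is an added safeguard against log(0).

```r
# Polar method with one deviate cached between calls.
make_polar <- function() {
  spare <- NULL                      # plays the role of the static variable
  function() {
    if (!is.null(spare)) {           # even-numbered call: use the saved value
      x <- spare
      spare <<- NULL
      return(x)
    }
    repeat {                         # steps 1-4: sample inside the unit disc
      v <- 2 * runif(2) - 1
      s <- sum(v^2)
      if (s > 0 && s < 1) break
    }
    z <- sqrt(-2 * log(s) / s)       # step 5
    spare <<- v[2] * z               # save the second deviate
    v[1] * z                         # return the first deviate
  }
}

set.seed(1)
rn <- make_polar()
x <- replicate(20000, rn())
```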
Acceptance-rejection technique (Von Neumann 1951)
We wish to simulate from the density f(x)
Suppose f(x) is majorized by a multiple of a density g(x):
there exists c > 1 such that f(x) ≤ c g(x) = h(x) for all x
1. Sample X from g(x)
2. Sample U ~ uniform(0,1)
3. Accept X if U ≤ f(X) / h(X)
[otherwise, return to 1.]
It’s often convenient to use exponentials for this.
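As one worked case (chosen here for illustration, not taken from these notes), the half-normal density f(x) = √(2/π) exp(-x²/2), x ≥ 0, is majorized by c g(x) with g the exponential(1) density and c = √(2e/π) ≈ 1.32, so that f(x) / h(x) = exp(-(x - 1)²/2). The function name rhalfnorm is hypothetical.

```r
# Acceptance-rejection sketch: half-normal target, exponential(1) proposal.
rhalfnorm <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      x <- rexp(1)                              # 1. sample X from g
      u <- runif(1)                             # 2. sample U ~ uniform(0,1)
      if (u <= exp(-(x - 1)^2 / 2)) break       # 3. accept if U <= f(X)/h(X)
    }
    out[i] <- x
  }
  out
}

set.seed(1)
x <- rhalfnorm(20000)
mean(x)   # should be near sqrt(2/pi), about 0.80
```

On average c proposals are needed per accepted value, so a smaller c means a more efficient sampler.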
Multivariate normal
Suppose we wish to generate
X = ( x1, …, xp )' ~ MVN( µ, Σ )
Use Z = ( z1, …, zp )' ~ iid normal(0,1)
var(D'Z) = D' var(Z) D = D'D, since var(Z) = I
1. Cholesky decomposition Σ = D' D
2. Z = ( z1, …, zp )' ~ iid normal(0,1)
3. X = D'Z + µ
rmvn <-
function(n, mu=0, V = matrix(1))
{
p <- length(mu)
if(any(is.na(match(dim(V),p))))
stop("Dimension problem!")
D <- chol(V)
matrix(rnorm(n*p), ncol=p) %*% D +
rep(mu,rep(n,p))
}
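The three steps can be checked numerically with a short, self-contained sketch. The values of mu and Sigma below are made up for illustration; chol() returns the upper-triangular factor D with D'D = Sigma, and each row of Z %*% D is one draw written as a row vector.

```r
set.seed(1)
n <- 50000
mu <- c(1, -1)                                  # assumed mean vector
Sigma <- matrix(c(2, 0.8, 0.8, 1), nrow = 2)    # assumed covariance matrix

D <- chol(Sigma)                                # step 1: Sigma = D'D
Z <- matrix(rnorm(n * 2), ncol = 2)             # step 2: iid normal(0,1)
X <- sweep(Z %*% D, 2, mu, "+")                 # step 3: add mu to every row

colMeans(X)   # should be close to mu
cov(X)        # should be close to Sigma
```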
Order statistics (K Lange’s example 20.7.5)
Exponential distribution
Consider X_1, X_2, …, X_n ~ iid exponential(λ = 1)
Order statistics X_(1), X_(2), …, X_(n)
Define Z_1 = X_(1); Z_{i+1} = X_(i+1) - X_(i)
Then the Z_i are independent; Z_i ~ exponential(λ = n - i + 1)
X_(k) = Z_1 + Z_2 + … + Z_k
Uniform distribution
U_(k) = exp( -X_(n-k+1) )
or U_(k) = 1 - exp( -X_(k) )
Empirical distribution { Y_1, Y_2, …, Y_n }
Let j = ⌈n U_(k)⌉
Y*_(k) = Y_(j)
NOTE: This last bit is a Sunday night calculation,
and so should be treated with caution!
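The exponential part of this can be checked by simulation: the k-th order statistic obtained by sorting should have the same distribution as a sum of k independent exponentials with rates n, n - 1, …, n - k + 1. The sketch below compares the two means; n = 10, k = 3 and the number of replicates are arbitrary choices.

```r
set.seed(1)
n <- 10
k <- 3
rates <- n - seq_len(k) + 1                            # n, n-1, ..., n-k+1

xk_direct <- replicate(5000, sort(rexp(n))[k])         # sort a full sample
xk_spacing <- replicate(5000, sum(rexp(k, rate = rates)))  # sum of spacings

expected <- sum(1 / rates)                             # E[X_(k)]
c(mean(xk_direct), mean(xk_spacing), expected)
```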
Order statistics from empirical distribution
ordstat1 <-
function(x, k=floor(length(x)*0.95))
sort(sample(x, replace=TRUE))[k]
ordstat2 <-
function(x, k=floor(length(x)*0.95))
{
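# x must already be sorted in increasing order (sort it once,
# outside any loop); otherwise uncomment: x <- sort(x)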
n <- length(x)
x[ceiling(n*exp(-sum(
rexp(n-k+1, n-(1:(n-k+1))+1)
)))]
}
> x <- sort(rgamma(1000, 5, 5))
> u1 <- u2 <- 1:10000
> unix.time( for(i in 1:10000)
+ u1[i] <- ordstat1(x,950) )
[1] 170.34 2.59 172.98 0.00 0.00
> unix.time( for(i in 1:10000)
+ u2[i] <- ordstat2(x,950) )
[1] 8.32 0.50 8.82 0.00 0.00