CH 2
Gerald Trutnau
Seoul National University
Fall Term 2024
Non-Corrected version
Contents
1 Basic Notions 5
1 Probability spaces . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Discrete models . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Transformations of probability spaces . . . . . . . . . . . . . . 18
4 Random variables . . . . . . . . . . . . . . . . . . . . . . . . 21
5 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6 Variance and Covariance . . . . . . . . . . . . . . . . . . . . . 30
7 The (strong and the weak) law of large numbers . . . . . . . . 33
8 Convergence and uniform integrability . . . . . . . . . . . . . . 39
9 Distribution of random variables . . . . . . . . . . . . . . . . . 47
10 Weak convergence of probability measures . . . . . . . . . . . 51
11 Dynkin-systems and Uniqueness of probability measures . . . . 57
2 Independence 63
1 Independent events . . . . . . . . . . . . . . . . . . . . . . . . 63
2 Independent random variables . . . . . . . . . . . . . . . . . . 69
3 Kolmogorov’s law of large numbers . . . . . . . . . . . . . . . 71
4 Joint distribution and convolution . . . . . . . . . . . . . . . . 78
5 Characteristic functions . . . . . . . . . . . . . . . . . . . . . 86
6 Central limit theorem . . . . . . . . . . . . . . . . . . . . . . 88
1 Basic Notions
1 Probability spaces
Probability theory is the mathematical theory of randomness. The basic notion is
that of a random experiment: an experiment whose outcome is not predictable and can only be determined by performing it and observing the result.
Probability theory tries to quantify the possible outcomes by attaching a probability to every event. This is important, for example, for an insurance company asking what a fair price is for insuring against events such as fire or death, which may but need not occur.
The set of all possible outcomes of a random experiment is denoted by Ω.
The set Ω may be finite, denumerable or even uncountable.
Example 1.1. Examples of random experiments and corresponding Ω:
(i) Coin tossing: The possible outcomes of tossing a coin are either “head”
or “tail”. Denoting one outcome by “0” and the other one by “1”, the set
of all possible outcomes is given by Ω = {0, 1}.
(ii) Tossing a coin n times: In this case any sequence of zeros and ones (alias heads or tails) of length n is considered as one possible outcome; hence
Ω = {(x_1, x_2, …, x_n) | x_i ∈ {0, 1}} =: {0, 1}^n.
(iv) A random number between 0 and 1: Ω = [0, 1].
[Figure: a random path t ↦ ω(t).]
Events:
Reasonable subsets A ⊂ Ω for which it makes sense to calculate the probabil-
ity are called events (a precise definition will be given in Definition 1.3 below).
If we consider an event A and observe ω ∈ A in a random experiment, we say that A has occurred.
Combination of events:
A_1 ∪ A_2,   ⋃_i A_i : "at least one of the events A_i occurs",
A_1 ∩ A_2,   ⋂_i A_i : "all of the events A_i occur",
lim sup_{n→∞} A_n := ⋂_n ⋃_{m≥n} A_m : "infinitely many of the A_m occur".

(ii) Tossing a coin n times: "exactly k ones in n tosses",
A = {(x_1, …, x_n) ∈ {0, 1}^n | Σ_{i=1}^n x_i = k}.

(iv) A random number between 0 and 1: "number ∈ [a, b]", A = [a, b] ⊂ Ω = [0, 1].

A = {ω ∈ C([0, 1]) | max_{0≤t≤1} ω(t) > c}.
Let Ω be countable. A probability distribution function p on Ω is a function
p : Ω → [0, 1]   with   Σ_{ω∈Ω} p(ω) = 1.
Given any subset A ⊂ Ω, its probability P(A) can then be defined by simply adding up:
P(A) = Σ_{ω∈A} p(ω).

(i) Ω ∈ A,
(ii) A ∈ A implies A^c ∈ A,
(iii) A_i ∈ A, i ∈ N, implies ⋃_{i∈N} A_i ∈ A.
• A_1, …, A_n ∈ A implies
⋃_{i=1}^n A_i ∈ A   and   ⋂_{i=1}^n A_i ∈ A.
• A_i ∈ A, i ∈ N, implies
⋂_n ⋃_{m≥n} A_m ∈ A   and   ⋃_n ⋂_{m≥n} A_m ∈ A.
(iii) Let I be an index set (not necessarily countable) and for any i ∈ I, let A_i be a σ-algebra. Then ⋂_{i∈I} A_i := {A ⊂ Ω | A ∈ A_i for any i ∈ I} is again a σ-algebra.

• P(Ω) = 1
• P(⋃_{i∈N} A_i) = Σ_{i=1}^∞ P(A_i) for pairwise disjoint A_i ∈ A   ("σ-additivity")
Example 1.7. (i) Coin tossing: Let A := P(Ω) = {∅, {0}, {1}, {0, 1}}. Tossing a fair coin means "head" and "tail" have equal probability 1/2, hence:
P({0}) := P({1}) := 1/2,   P(∅) := 0,   P({0, 1}) := 1   ({0, 1} = Ω).

P({(x_1, x_2, …) ∈ {0, 1}^N | x_1 = x̄_1, …, x_n = x̄_n}) := 2^{−n}   (the set on the left belongs to A_0).
(Proof: Later!)

[Figure: a path t ↦ ω(t) taking values between α and β near t_0.]

Remark 1.8. Let (Ω, A, P) be a probability space, and let A_1, …, A_n ∈ A be pairwise disjoint. Then
P(⋃_{i≤n} A_i) = Σ_{i=1}^n P(A_i)   (P is additive).
Proposition 1.9. Let A be a σ-algebra, and P : A → R+ := [0, ∞) be a
mapping with P(Ω) = 1. Then the following are equivalent:
Proof.
P(⋃_{i=1}^∞ A_i) = lim_{n→∞} P(⋃_{i=1}^n A_i) ≤ lim_{n→∞} Σ_{i=1}^n P(A_i) = Σ_{i=1}^∞ P(A_i)   (using 1.9 and (1.1)).

Proof. Since
⋃_{m≥n} A_m ↓ ⋂_{n∈N} ⋃_{m≥n} A_m   as n → ∞,
the continuity from above of P implies that
P(lim sup_{n→∞} A_n) = lim_{n→∞} P(⋃_{m≥n} A_m) ≤ lim_{n→∞} Σ_{m=n}^∞ P(A_m) = 0   (using 1.10 and 1.9),
since Σ_{m=1}^∞ P(A_m) < ∞.

Example 1.12. (i) Uniform distribution on [0, 1]: Let Ω = [0, 1] and A be the Borel σ-algebra on Ω (= σ({[a, b] | 0 ≤ a ≤ b ≤ 1})). Let P be the restriction of the Lebesgue measure to the Borel subset [0, 1] of R. Then (Ω, A, P) is a probability space. The probability measure P is called the uniform distribution on [0, 1], since P([a, b]) = b − a for any 0 ≤ a ≤ b ≤ 1 (translation invariance).
if ωi ∈ Ω, i ∈ I.
2 Discrete models
Throughout the whole section
• Ω ≠ ∅ countable (i.e. finite or denumerable)
• A = P(Ω) and
(ii) Every probability measure P on (Ω, A) is of this form, with p(ω) := P({ω})
for all ω ∈ Ω.
Proof. (i)
P = Σ_{ω∈Ω} p(ω)·δ_ω.
(ii) Exercise.

p(ω) = 1/|Ω|   ∀ω ∈ Ω.
Then
Example 2.3. (i) random permutations: Let M := {1, . . . , n} and Ω :=
all permutations of M . Then |Ω| = n! Let P be the uniform distribution
on Ω.
Problem: What is the probability P(“at least one fixed point”)?
Consider the event Ai := {ω | ω(i) = i} (fixed point at position i). Then
Sylvester’s formula (cf. (1.2)) implies that
P("at least one fixed point") = P(⋃_{i=1}^n A_i)
= Σ_{k=1}^n (−1)^{k+1} · Σ_{1≤i_1<⋯<i_k≤n} P(A_{i_1} ∩ ⋯ ∩ A_{i_k})   (by (1.2))
= Σ_{k=1}^n (−1)^{k+1} · C(n, k) · (n − k)!/n!   (P(A_{i_1} ∩ ⋯ ∩ A_{i_k}) = (n − k)!/n!, since k positions are fixed)
= −Σ_{k=1}^n (−1)^k/k!.
Consequently,
P("no fixed point") = 1 + Σ_{k=1}^n (−1)^k/k! = Σ_{k=0}^n (−1)^k/k! → e^{−1}   as n → ∞.
Asymptotics as n → ∞:
P("exactly k fixed points") = (1/k!)·Σ_{j=0}^{n−k} (−1)^j/j! → (1/k!)·e^{−1}   as n → ∞.
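For illustration, the limit e^{−1} ≈ 0.3679 can be checked by simulation. The following Python sketch (not part of the original notes; the trial count is an arbitrary choice) samples uniformly random permutations and estimates the probability of having no fixed point:

```python
import random
from math import exp

def prob_no_fixed_point(n, trials=100_000):
    # Estimate P("no fixed point") for a uniformly random permutation of {0,...,n-1}.
    count = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        if all(perm[i] != i for i in range(n)):
            count += 1
    return count / trials

for n in (3, 5, 10):
    print(n, prob_no_fixed_point(n), "limit e^-1 =", exp(-1))
```

Already for moderate n the estimates are close to e^{−1}, in line with the asymptotics above.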
The Poisson distribution with parameter λ > 0 on N ∪ {0} is given by
π_λ := e^{−λ}·Σ_{j=0}^∞ (λ^j/j!)·δ_j.
Ω := {ω = (x_1, …, x_n) | x_i ∈ S},   |Ω| = |S|^n,

→ (λ^k/k!)·e^{−λ}   as n → ∞   (k = 0, 1, 2, …)
(for n big and p small, the Poisson distribution with parameter λ = p · n
is a good approximation for B(n, p)).
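For illustration, the quality of this approximation can be checked numerically; the sketch below (added here, with n = 1000 and p = 0.003 as arbitrary example values) compares the binomial weights with the Poisson weights for λ = p·n:

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003          # large n, small p
lam = n * p                 # λ = p·n
for k in range(6):
    print(k, round(binomial_pmf(n, p, k), 6), round(poisson_pmf(lam, k), 6))
```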
(iii) Urn model (for example: opinion polls, samples, poker, lottery...)
We consider an urn containing N balls, K red and N − K black (N ≥ 2, 0 ≠ K ≠ N). Suppose that n ≤ N balls are sampled without replacement. What is the probability that exactly k balls in the sample are red?
Typical application: suppose that a small lake contains an (unknown) number N of fish. To estimate N one can do the following: K fish are marked red, and after that n (n ≤ N) fish are "sampled" from the lake. If k is the number of marked fish in the sample, N̂ := K·(n/k) is an estimate of the unknown number N. In this case the probability below with N replaced by N̂ is also an estimate.
Model:
Let Ω be all subsets of {1, …, N} having cardinality n, hence
Ω := {ω ∈ P({1, …, N}) | |ω| = n},   |Ω| = C(N, n),
so that
P(A_k) = C(K, k)·C(N − K, n − k) / C(N, n)   (k = 0, …, n)   (hypergeometric distribution).
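A small Python sketch (added for illustration; the urn sizes and the fish-marking numbers below are hypothetical) evaluates these hypergeometric probabilities and the capture–recapture estimate N̂ = K·n/k:

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    # P(exactly k red balls in a sample of n drawn without replacement)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 10, 5   # hypothetical urn: 50 balls, 10 red, sample of 5
print([round(hypergeom_pmf(N, K, n, k), 4) for k in range(n + 1)])

# capture-recapture: K marked fish, n sampled, k marked fish seen in the sample
K_marked, n_sample, k_seen = 100, 60, 12
print("estimated population:", K_marked * n_sample / k_seen)
```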
3 Transformations of probability spaces
Throughout this section let (Ω, A) and (Ω̃, Ã) be measurable spaces.
Definition 3.1. A mapping T : Ω → Ω̃ is called A/Ã-measurable (or simply
measurable), if T −1 (Ã) ∈ A for all à ∈ Ã.
Notation:
(ii) Sufficient criterion for measurability: suppose that à := σ(Ã0 ) for some
collection of subsets Ã0 ⊂ P(Ω̃). Then T is A/Ã-measurable, if T −1 (Ã) ∈
A for all à ∈ Ã0 .
(iii) Let Ω, Ω̃ be topological spaces, and A, Ã be the associated Borel σ-
algebras. Then:
T : Ω → Ω̃ is continuous ⇒ T is A/Ã-measurable.
T2 ◦ T1 is A1 /A3 -measurable.
(iv) Exercise.
Definition 3.3. Let T : Ω̄ → Ω be a mapping and let A be a σ-algebra of subsets of Ω. The system
σ(T) := {T^{−1}(A) | A ∈ A}
Proposition 3.4. Let T : Ω → Ω̃ be A/Ã-measurable and P be a probability
measure on (Ω, A). Then P̃ := P ∘ T^{−1} is a probability measure on (Ω̃, Ã) (the image measure of P under T).
Proof. Clearly, P̃(Ã) ≥ 0 for all Ã ∈ Ã, P̃(∅) = 0 and P̃(Ω̃) = 1. For pairwise disjoint Ã_i ∈ Ã, i ∈ N, the sets T^{−1}(Ã_i) are pairwise disjoint too, hence
P̃(⋃_{i∈N} Ã_i) = P(T^{−1}(⋃_{i∈N} Ã_i)) = P(⋃_{i∈N} T^{−1}(Ã_i)) = Σ_{i=1}^∞ P(T^{−1}(Ã_i)) = Σ_{i=1}^∞ P̃(Ã_i),
using that P is σ-additive.

so that
P̃(Ã) = P(T ∈ Ã) = Σ_{i∈I} P(T = ω̃_i)·1_Ã(ω̃_i) = Σ_{i∈I} P(T = ω̃_i)·δ_{ω̃_i}(Ã).

Define X̃_i : Ω̃ → {0, 1} by
X̃_i((x_n)_{n∈N}) := x_i,   i ∈ N,
and let
Ã := σ({X̃_i = 1} | i ∈ N).

[Figure: graphs of two simple maps T_1(ω), T_2(ω) on Ω = [0, 1].]
4 Random variables
Let (Ω, A) be a measurable space and
R̄ := R ∪ {−∞, +∞},   B(R̄) := {B ⊂ R̄ | B ∩ R ∈ B(R)}.
(iii) Let X be a random variable on (Ω, A) with values in R (resp. R̄) and
h : R → R (resp. h : R̄ → R̄) be B(R)/B(R)-measurable (resp.
B(R̄)/B(R̄)-measurable). Then h(X) is a random variable too.
Examples: |X|, X 2 , |X|p , eX , . . .
(iv) The class of random variables on (Ω, A) is closed under the following
countable operations.
If X1 , X2 , . . . are random variables, then
• Σ_{i=1}^n α_i·X_i (α_i ∈ R, n ∈ N), provided the sum of R̄-valued r.v.'s makes sense (∞ − ∞ is not defined),
• sup_{i∈N} X_i, inf_{i∈N} X_i, in particular
X_1 ∧ X_2 := min(X_1, X_2),   X_1 ∨ X_2 := max(X_1, X_2)
are r.v.'s,
• lim sup_{i→∞} X_i, lim inf_{i→∞} X_i (hence also lim_{i→∞} X_i, if it exists),
are random variables too,
since for a real number x it holds: x > c ⇔ x ≥ q for some q ∈ Q, q > c.
Important examples
(ii) simple random variables:
X = Σ_{i=1}^n c_i·1_{A_i},   c_i ∈ R, A_i ∈ A,

(i) X = X⁺ − X⁻, with X⁺ := X ∨ 0 and X⁻ := (−X) ∨ 0.
(ii) Let X ≥ 0. Then there exists a sequence of simple random variables X_n, n ∈ N, with 0 ≤ X_n ≤ X_{n+1} and X = lim_{n→∞} X_n (in short: 0 ≤ X_n ↗ X).
Then
E[X] := ∫ X dP = ∫_Ω X dP
Definition/Construction of the integral w.r.t. P:
Let X be a random variable.
1. If X = 1_A, A ∈ A, define
∫ X dP := P(A).
2. If X = Σ_{i=1}^n c_i·1_{A_i}, c_i ∈ R, A_i ∈ A, define
∫ X dP := Σ_{i=1}^n c_i·P(A_i).
3. If X ≥ 0, choose simple random variables 0 ≤ X_n ↗ X (see (ii) above) and define
∫ X dP := lim_{n→∞} ∫ X_n dP ∈ [0, ∞].
In general,
E[X] = ∫ X dP := ∫ X⁺ dP − ∫ X⁻ dP.
Definition 4.6. (i) The set of all P-integrable random variables is defined by
𝓛¹ := 𝓛¹(Ω, A, P) := {X r.v. | E[|X|] < ∞}.
If
N := {X r.v. | X = 0 P-a.s.},
then
L¹ := L¹(Ω, A, P) := 𝓛¹/N   (X ∼ Y :⇔ X − Y ∈ N :⇔ X = Y P-a.s.)
is a Banach space w.r.t. the norm E[|X|].

E[X] = E[Σ_{x∈X(Ω)} x·1_{X=x}] = Σ_{x∈X(Ω)} x·P(X = x)   (1.4)   (by "2." and "3." above),
E[X] = Σ_{ω∈Ω} X(ω)·E[1_{{ω}}] = Σ_{ω∈Ω} X(ω)·P({ω}) = Σ_{ω∈Ω} p(ω)·X(ω),   where p(ω) := P({ω}).

Example 4.8. Infinitely many coin tosses with a fair coin: Let Ω = {0, 1}^N, A and P as in 3.6. Then
E[X_i] = 1·P(X_i = 1) + 0·P(X_i = 0) = 1/2   (by (1.4)).
(ii) Expectation of number of “successes”:
Sn := X1 + · · · + Xn = number of “successes”(= ones) in n tosses
Then for k = 0, 1, . . . , n
P(S_n = k) = Σ_{(x_1,…,x_n)∈{0,1}^n, x_1+⋯+x_n=k} P(X_1 = x_1, …, X_n = x_n) = C(n, k)·2^{−n}.
Hence
E[S_n] = Σ_{k=0}^n k·P(S_n = k) = Σ_{k=1}^n k·C(n, k)·2^{−n} = n/2   (by (1.4)).
(Recall: 1/(1 − q)² = d/dq (q/(1 − q)) = d/dq Σ_{k=1}^∞ q^k = Σ_{k=1}^∞ k·q^{k−1}.)

Proposition 4.10. Let X, Y be r.v. satisfying (1.3). Then
(i) 0 ≤ X ≤ Y P-a.s. ⟹ 0 ≤ E[X] ≤ E[Y].
(ii) X_n ≤ Y P-a.s. for all n ∈ N ⟹ E[lim sup_{n→∞} X_n] ≥ lim sup_{n→∞} E[X_n].

= lim_{n→∞} E[inf_{k≥n} X_k − Y] + E[Y]   (B. Levi)
≤ lim_{n→∞} inf_{k≥n} E[X_k − Y] + E[Y]   (4.10, 4.10(ii))
= lim inf_{n→∞} E[X_n].
Proposition 4.14 (Lebesgue's dominated convergence theorem, DCT). Let X_n, n ∈ N, be random variables and Y ∈ L¹ with |X_n| ≤ Y P-a.s. Suppose that the pointwise limit lim_{n→∞} X_n exists P-a.s. Then
E[lim_{n→∞} X_n] = lim_{n→∞} E[X_n].
it follows
E[lim_{n→∞} X_n] = E[lim inf_{n→∞} X_n] ≤ lim inf_{n→∞} E[X_n] ≤ lim sup_{n→∞} E[X_n] ≤ E[lim sup_{n→∞} X_n] = E[lim_{n→∞} X_n]   (using 4.9 and Fatou).
Example 4.15. Tossing a fair coin. Consider the following simple game: a fair coin is thrown and the player can invest an arbitrary amount of KRW on either "head" or "tail". If the correct side shows up, the player gets twice his investment back, otherwise nothing.
Suppose now a player plays the following bold strategy: he doubles his investment until his first success. Assuming the initial investment was 1000 KRW, the investment in the nth round is given by
I_n = 1000·2^{n−1} if the first n − 1 rounds were lost, and I_n = 0 otherwise,
so that E[I_n] = 1000·2^{n−1}·2^{−(n−1)} = 1000 for every n, whereas on the other hand lim_{n→∞} I_n = 0 P-a.s. (more precisely: for all ω ≠ (0, 0, 0, …)).
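The gap between the constant expectations and the almost sure limit 0 can be seen in a short simulation (an illustrative Python sketch, assuming the investment rule just described with initial stake 1000 KRW):

```python
import random

def investment(n, loss_prob=0.5):
    # I_n = 1000 * 2^(n-1) if the first n-1 rounds were all lost, else 0
    lost_all = all(random.random() < loss_prob for _ in range(n - 1))
    return 1000 * 2 ** (n - 1) if lost_all else 0

trials = 200_000
for n in (1, 5, 10, 15):
    mean = sum(investment(n) for _ in range(trials)) / trials
    print(f"n={n:2d}  E[I_n] ~ {mean:8.1f}   (exact value 1000)")
```

Along each fixed sequence of tosses the values I_n(ω) are eventually 0, so no integrable dominating random variable can exist.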
5 Inequalities
Let (Ω, A, P) be a probability space.
Proposition 5.1 (Jensen’s inequality). Let h be a convex function defined on
some interval I ⊆ R, X in L1 with X(Ω) ⊂ I. Then E[X] ∈ I and
h(E[X]) ≤ E[h(X)].

E[X]² ≤ E[X²].
More generally, for 0 < p ≤ q:
(E[|X|^p])^{1/p} ≤ (E[|X|^q])^{1/q}   (=: ‖X‖_p ≤ ‖X‖_q).
Proof. h(x) := |x|^{q/p} is convex. Since |X|^p ∧ n ∈ L¹ for n ∈ N, we obtain that
(E[|X|^p ∧ n])^{q/p} ≤ E[(|X|^p ∧ n)^{q/p}],
Proposition 5.5. Let X be a random variable and h : R̄ → [0, ∞] be increasing. Then
h(c)·P(X ≥ c) ≤ E[h(X)]   ∀ c > 0.
Proof.
h(c)·P(X ≥ c) ≤ h(c)·P(h(X) ≥ h(c)) = E[h(c)·1_{h(X)≥h(c)}] ≤ E[h(X)].

Corollary 5.6. (i) Markov inequality: Choose h(x) = x·1_{[0,∞)}(x) and replace X by |X| in 5.5. Then
P(|X| ≥ c) ≤ (1/c)·E[|X|]   ∀c > 0.
In particular,
E[|X|] = 0 ⇒ |X| = 0 P-a.s.,
E[|X|] < ∞ ⇒ |X| < ∞ P-a.s.
(ii) Chebychev's inequality: Choose h(x) = x²·1_{[0,∞)}(x) and replace X by |X − E[X]| in 5.5. Then X ∈ L² implies
P(|X − E[X]| ≥ c) ≤ (1/c²)·E[(X − E[X])²] = var(X)/c².
6 Variance and Covariance

Let X ∈ L². Then
var(X) := E[(X − E[X])²]
is called the variance of X (mean square prediction error).
The variance is a measure for the fluctuations of X around E[X]. It indicates the risk that one takes when a prognosis is based on the expectation. σ(X) := √(var(X)) is called the standard deviation.
(ii) var(X) = 0 ⇔ P(X = E[X]) = 1, i.e. X behaves deterministically.

var(aX + b) = a²·var(X).

cov(X, Y) = 0 ⇔ var(X + Y) = var(X) + var(Y).   (1.6)

Proposition 6.7 (Cauchy–Schwarz). Let X and Y ∈ L². Then
2·XY = (X + Y)² − X² − Y² ∈ L¹,
and for i ≠ j
cov(X_i, X_j) = E[X_i X_j] − p² = 0,
so that X1 , X2 , . . . are pairwise uncorrelated (in fact even independent, see
below).
Let S_n := X_1 + ⋯ + X_n be the number of successes. Then
and using Levi and the fact that X_1, X_2, … are pairwise uncorrelated, we conclude that
var(X) = Σ_{n=1}^∞ 2^{−2n}·p(1 − p) = (1/3)·p(1 − p).
Finally, let T be the waiting time until the first "success". Then
P_p(T = n) = P_p(X_1 = ⋯ = X_{n−1} = 0, X_n = 1) = (1 − p)^{n−1}·p   (geometric distribution),
then
E[T] = E[Σ_{n=1}^∞ n·1_{T=n}] = Σ_{n=1}^∞ n·P_p(T = n) = Σ_{n=1}^∞ n·(1 − p)^{n−1}·p = 1/p   ("derivative of the geometric series"),
and analogously
var(T) = ⋯ = (1 − p)/p².
7 The (strong and the weak) law of large numbers

Throughout this section let
• (Ω, A, P) be a probability space,
• X_1, X_2, … ∈ L² r.v. with
  – X_i uncorrelated, i.e. cov(X_i, X_j) = 0 for i ≠ j,
  – uniformly bounded variances, i.e. sup_{i∈N} var(X_i) < ∞   (var(X_i) = σ²(X_i) =: σ_i²).
Let
S_n := X_1 + ⋯ + X_n,
so that S_n(ω)/n is the arithmetic mean of the first n observations X_1(ω), …, X_n(ω) ("empirical mean").
Our aim in this section is to show that the randomness in the empirical mean vanishes for increasing n, i.e.
S_n(ω)/n ≈ m for n large,   if E[X_i] ≡ m.

Remark 7.1. W.l.o.g. we may assume that E[X_i] = 0 for all i, because otherwise we consider X̃_i := X_i − E[X_i] ("centered"), which satisfies:
• X̃_i ∈ L²,
• var(X̃_i) = var(X_i),
• S̃_n/n − E[S̃_n]/n = S_n/n − E[S_n]/n   (S̃_n := Σ_{i=1}^n X̃_i, the "centered sum").

Proposition 7.2.
lim_{n→∞} E[(S_n/n − E[S_n]/n)²] = 0
(resp. lim_{n→∞} E[(S_n/n − m)²] = 0 if E[X_i] ≡ m).
Proof.
E[(S_n/n − E[S_n]/n)²] = var(S_n/n) = (1/n²)·var(S_n) = (1/n²)·Σ_{i=1}^n σ_i²   (Bienaymé)
≤ (1/n)·const → 0   as n → ∞.
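The 1/n decay in Proposition 7.2 is easy to observe numerically; the following Python sketch (an illustration added here, with uniform [0, 1] observations and m = 1/2 as an arbitrary choice) estimates the mean square deviation of the empirical mean:

```python
import random

def mean_square_error(n, trials=2000, m=0.5):
    # Estimate E[(S_n/n - m)^2] for i.i.d. uniform(0,1) observations
    total = 0.0
    for _ in range(trials):
        s = sum(random.random() for _ in range(n))
        total += (s / n - m) ** 2
    return total / trials

for n in (10, 100, 1000):
    print(n, mean_square_error(n))   # decays roughly like var(X_1)/n = 1/(12 n)
```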
Example 7.6. Application: uniform approximation of f ∈ C [0, 1] with
Bernstein polynomials
Let p ∈ [0, 1]. Then by the transformation theorem (see assignments)
B_n(p) := Σ_{k=0}^n f(k/n)·C(n, k)·p^k·(1 − p)^{n−k} = Σ_{k=0}^n f(k/n)·P_p[S_n = k] = E_p[f(S_n/n)].
Now
|B_n(p) − f(p)| = |E_p[f(S_n/n) − f(p)]| ≤ E_p[|f(S_n/n) − f(p)|]
= E_p[|f(S_n/n) − f(p)|·1_{|S_n/n − p| ≤ δ}] + E_p[|f(S_n/n) − f(p)|·1_{|S_n/n − p| > δ}]
≤ ε·P_p[|S_n/n − p| ≤ δ] + 2‖f‖_∞·P_p[|S_n/n − p| > δ],
where P_p[|S_n/n − p| > δ] ≤ (1/(δ²n))·p(1 − p) ≤ 1/(4δ²n).
Consequently,
sup_{p∈[0,1]} |B_n(p) − f(p)| ≤ ε + 2‖f‖_∞/(4δ²n) → ε   as n → ∞;
since ε > 0 was arbitrary (with δ = δ(ε) from the uniform continuity of f), B_n → f uniformly on [0, 1].
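For illustration, the uniform convergence of the Bernstein polynomials can be checked numerically; the Python sketch below (added here, with the hypothetical test function f(x) = |x − 1/2|) evaluates B_n on a grid and reports the maximal error:

```python
from math import comb

def bernstein(f, n, p):
    # B_n(p) = sum_k f(k/n) * C(n,k) * p^k * (1-p)^(n-k)
    return sum(f(k / n) * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

f = lambda x: abs(x - 0.5)
for n in (10, 50, 200):
    grid = [i / 100 for i in range(101)]
    err = max(abs(bernstein(f, n, p) - f(p)) for p in grid)
    print(n, round(err, 4))   # maximal error shrinks as n grows
```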
implies
Z_n(ω) < 1/k for all but finitely many n ∈ N,

lim_{k→∞} S_{k²}(ω)/k² = 0   ∀ω ∉ N_1 with P(N_1) = 0.

2. Step: Let D_k := max_{k² ≤ l < (k+1)²} |S_l − S_{k²}|. We show fast convergence in probability of D_k/k² to 0. For all ε > 0:
P(D_k/k² > ε) = P(⋃_{l=k²+1}^{k²+2k} {|S_l − S_{k²}| > εk²})
≤ Σ_{l=k²+1}^{k²+2k} P(|S_l − S_{k²}| > εk²)   (Chebychev: P(|S_l − S_{k²}| > εk²) ≤ (l − k²)·c/(ε²k⁴) ≤ 2k·c/(ε²k⁴))
≤ (2k)·(2k)·c/(ε²k⁴) = 4c/(ε²k²).
Lemma 7.7 now implies that
lim_{k→∞} D_k(ω)/k² = 0   ∀ω ∉ N_2 with P(N_2) = 0.
S_n = Y_1 + ⋯ + Y_n = position of a particle undergoing a "random walk" on Z.
Increasing refinement (+ rescaling and linear interpolation) of the random walk yields the Brownian motion.
The strong law of large numbers implies that S_n(ω)/n → 0 P-a.s.
In particular, the fluctuations grow slower than linearly.
lim sup_{n→∞} S_n(ω)/√(2n·log log n) = +1 P-a.s.,
lim inf_{n→∞} S_n(ω)/√(2n·log log n) = −1 P-a.s.
8 Convergence and uniform integrability

(iii) P-a.s. convergence:
P(lim_{n→∞} X_n = X) = 1.

The following implications hold:
• (i) ⟹ (ii),
• (iii) ⟹ (ii),
• (ii) ⟹ (iii) along some subsequence,
• (iii) ⟹ (i), if sup_{n∈N} |X_n| ∈ L^p (resp. if |X_n|^p is uniformly integrable).
(iii)⇒(ii):
{lim_{n→∞} X_n = X} = ⋂_{k=1}^∞ A_k,   where A_k := ⋃_{m=1}^∞ ⋂_{n≥m} {|X_n − X| ≤ 1/k}.
1 = P(A_k) = lim_{m→∞} P(⋂_{n≥m} {|X_n − X| ≤ 1/k})   (by 1.9)
≤ lim inf_{m→∞} P(|X_m − X| ≤ 1/k) ≤ lim sup_{m→∞} P(|X_m − X| ≤ 1/k) ≤ 1.
Consequently,
lim_{m→∞} P(|X_m − X| > 1/k) = 0.

(iii)⇒(i): Y := sup_{n∈N} |X_n| ∈ L^p and lim_{n→∞} X_n = X P-a.s. imply |X| ≤ Y.
In particular, |X_n − X|^p ≤ 2^p·Y^p ∈ L¹.
lim_{n→∞} |X_n − X|^p = 0 P-a.s. together with Lebesgue's dominated convergence now implies
lim_{n→∞} E[|X_n − X|^p] = 0.
• In general (i) ⇏ (iii) and (iii) ⇏ (i) (hence (ii) ⇏ (i) too). For examples, see Exercises.
Definition 8.4. Let I be an index set. A family (X_i)_{i∈I} ⊂ L¹ of r.v. is called uniformly integrable if
lim_{c→∞} sup_{i∈I} ∫_{{|X_i|>c}} |X_i| dP = 0.
Note that by Lebesgue's theorem ∫_{{|X_i|>c}} |X_i| dP = E[1_{|X_i|>c}·|X_i|] → 0 as c → ∞ for each fixed i ∈ I.

The next Proposition is the definitive version of Lebesgue's theorem on dominated convergence.
Proposition 8.5 (Vitali convergence theorem). Let X_n ∈ L¹, n ≥ 1, and X be r.v. Then the following statements are equivalent:
(i) lim_{n→∞} X_n = X in L¹.
Lemma 8.7 (ε-δ criterion). Let (Xi )i∈I ⊂ L1 . Then the following statements
are equivalent:
(i) (Xi )i∈I is uniformly integrable.
≤ c + 1 < ∞.
For δ := ε/(2c) and A ∈ A with P(A) < δ we now conclude
∫_A |X_i| dP = ∫_{A∩{|X_i|<c}} |X_i| dP + ∫_{A∩{|X_i|≥c}} |X_i| dP ≤ c·∫_A dP + ∫_{{|X_i|≥c}} |X_i| dP < c·P(A) + ε/2 < ε.

(ii)⇒(i): Let ε > 0 and δ be as in (ii). Using Markov's inequality (and the two properties in (ii)), we get for any i ∈ I
P(|X_i| > c) ≤ (1/c)·E[|X_i|] < δ,   if c > (sup_{i∈I} E[|X_i|] + 1)/δ,
hence ∫_{{|X_i|>c}} |X_i| dP < ε   ∀ i ∈ I.

Remark 8.8. (i) Existence of a dominating integrable r.v. implies uniform integrability: |X_i| ≤ Y ∈ L¹ ∀i ∈ I
⇒ ∫_{{|X_i|>c}} |X_i| dP ≤ ∫_{{Y>c}} Y dP = E[1_{Y>c}·Y] → 0 as c ↗ ∞ (DCT),
since 1_{Y>c}·Y → 0 P-a.s. as c → ∞ (Markov's inequality).
In particular, I finite ⇒ (X_i)_{i∈I} ⊂ L¹ uniformly integrable.
(see Exercises).
Proof of Proposition 8.5. (i)⇒(ii): see Exercises. (Hint: Use Lemma 8.7).
E[|X|] = E[lim inf_{k→∞} |X_{n_k}|] ≤ lim inf_{k→∞} E[|X_{n_k}|]   (Fatou)
≤ sup_{n∈N} E[|X_n|] < ∞.

Let ε > 0. Then there exists δ > 0 such that for all A ∈ A with P(A) < δ it follows that ∫_A |X_n| dP < ε/2 for any n ∈ N.
E[|X_n|] = ∫_{{|X_n|<ε/2}} |X_n| dP + ∫_{{|X_n|≥ε/2}} |X_n| dP < ε   (the first integral is ≤ ε/2, the second < ε/2),

⇒ lim_{n→∞} X_n = X in L^p.
Proposition 8.10. Let g : [0, ∞) → [0, ∞) be measurable with lim_{x→∞} g(x)/x = ∞. Then sup_{i∈I} E[g(|X_i|)] < ∞ implies that (X_i)_{i∈I} is uniformly integrable.
Proof. Let ε > 0. Choose c > 0 such that g(x)/x > (1/ε)·(sup_{i∈I} E[g(|X_i|)] + 1) for x ≥ c. Then for all i ∈ I
∫_{{|X_i|>c}} |X_i| dP = ∫_{{|X_i|>c}} g(|X_i|)·(|X_i|/g(|X_i|)) dP
≤ (ε/(sup_{j∈I} E[g(|X_j|)] + 1))·∫_{{|X_i|>c}} g(|X_i|) dP ≤ ε.
Example 8.11. (i) p > 1, supi E[|Xi |p ] < ∞ ⇒ (Xi )i∈I uniformly inte-
grable
Proof: g(x) := x·log⁺(x) is monotone increasing and convex. Consequently,
E[g(|S_n|/n)] ≤ E[g((1/n)·Σ_{i=1}^n |X_i|)]   (monotonicity)
≤ (1/n)·Σ_{i=1}^n E[g(|X_i|)]   (convexity)
≤ sup_{1≤i≤n} E[g(|X_i|)]   ∀n,
and so
sup_{n∈N} E[g(|S_n|/n)] ≤ sup_{i∈N} E[g(|X_i|)] < ∞.
Consequently, (S_n/n)_{n∈N} is uniformly integrable and (1.9) holds. Thus by Proposition 8.5
lim_{n→∞} S_n/n = m in L¹.
One complementary remark concerning Lebesgue’s dominated convergence
theorem.
lim_{n→∞} X_n = X in L¹
"⇐":
X + X_n = X ∨ X_n + X ∧ X_n   (X ∨ X_n := sup{X, X_n}, X ∧ X_n := inf{X, X_n}).
Then
lim_{n→∞} E[X ∧ X_n] = E[X]   (Lebesgue)
and thus

L^p-completeness
Proposition 8.14 (L^p-completeness, Riesz–Fischer). Let 1 ≤ p < ∞ and X_n ∈ L^p with
lim_{n,m→∞} ∫ |X_n − X_m|^p dP = 0.

(ii) lim_{n→∞} X_n = X in L^p.
9 Distribution of random variables
Let (Ω, A, P) be a probability space, and X : Ω → R̄ be a r.v.
Let µ be the distribution of X (under P), i.e., µ(A) = P(X ∈ A) for all
A ∈ B(R̄).
Assume that P(X ∈ R) = 1 (in particular, X is P-a.s. finite), so that µ is a probability measure on (R, B(R)).
F(b) := P(X ≤ b) = µ((−∞, b]),   b ∈ R,   (1.10)
(ii) Existence: Let λ be the Lebesgue measure on (0, 1). Define the "inverse function" G of F : R → [0, 1] by
G : (0, 1) → R,   G(y) := inf{x ∈ R | F(x) ≥ y}.
Since 0 < y < F(x) ⇒ G(y) ≤ x, we have
(0, F(x)) ⊂ {G ≤ x},
so that G is measurable.
Let µ := G(λ) = λ ∘ G^{−1} (a probability measure on (R, B(R))). Then
µ((−∞, x]) = λ({G ≤ x}) = λ((0, F(x)]) = F(x)   ∀x ∈ R.
Uniqueness: later.
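This construction is exactly inverse-transform sampling: if U is uniform on (0, 1), then G(U) has distribution function F. A small Python sketch (added for illustration; the exponential distribution with α = 2 is an arbitrary example where G can be written down in closed form):

```python
import random
from math import log

# For F(x) = 1 - exp(-alpha*x), the "inverse" is G(y) = inf{x : F(x) >= y} = -log(1-y)/alpha.
def G(y, alpha=2.0):
    return -log(1.0 - y) / alpha

samples = [G(random.random()) for _ in range(100_000)]
print("empirical mean:", sum(samples) / len(samples), " (theoretical 1/alpha = 0.5)")
```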
Definition 9.5. (i) F (resp. µ) is called discrete, if there exists a countable set S ⊂ R with µ(S) = 1. In this case, µ is uniquely determined by the weights µ({x}), x ∈ S, and F is a step function of the following type:
F(x) = Σ_{y∈S, y≤x} µ({y}).
(ii) (Continuous) exponential distribution with parameter α > 0:
f(x) := αe^{−αx} if x ≥ 0,   and f(x) := 0 if x < 0,
F(x) = ∫_{−∞}^x f(t) dt = 1 − e^{−αx} if x ≥ 0,   and F(x) = 0 if x < 0.

= (1 − p)^k·p = P(X = k).)
Proof. See Assignments.
In particular:
p = 1: E[|X − m|] = σ·√(2/π),
p = 2: E[|X − m|²] = σ²,
p = 3: E[|X − m|³] = 2^{3/2}·σ³/√π,
p = 4: E[|X − m|⁴] = 3σ⁴.
10 Weak convergence of probability measures

Let (S, d) be a metric space and S := B(S).

Definition 10.1. Let µ and µ_n, n ∈ N, be probability measures on (S, S). The sequence (µ_n) converges to µ weakly if for all f ∈ C_b(S) (= the space of bounded continuous functions on S) it follows that
∫ f dµ_n → ∫ f dµ   as n → ∞.

(i) µ_n → µ weakly
(ii) ∫ f dµ_n → ∫ f dµ for all f bounded and uniformly continuous (w.r.t. d)
(iii) lim sup_{n→∞} µ_n(F) ≤ µ(F) for all F ⊂ S closed
(iv) lim inf_{n→∞} µ_n(G) ≥ µ(G) for all G ⊂ S open
(v) lim_{n→∞} µ_n(A) = µ(A) for all µ-continuity sets A, i.e. ∀ A ∈ S with µ(∂A) = 0
(vi) ∫ f dµ_n → ∫ f dµ for all f bounded, measurable and µ-a.s. continuous.
(ii)⇒(iii): Let F ⊂ S be closed and define d(x, F) := inf_{y∈F} d(x, y), x ∈ S. The sets
G_m := {x ∈ S | d(x, F) < 1/m},   m ∈ N,   are open.
Define
φ(x) := 1 if x ≤ 0,   φ(x) := 1 − x if x ∈ [0, 1],   φ(x) := 0 if x ≥ 1.
(iii)⇒(v): For a subset A ⊂ S we denote the closure by Ā, the interior by Å,
and the boundary by ∂A. Let A be such that µ(Ā \ Å) = µ(∂A) = 0.
Then
µ(A) = µ(Å) ≤ lim inf_{n→∞} µ_n(Å) ≤ lim inf_{n→∞} µ_n(A) ≤ lim sup_{n→∞} µ_n(A) ≤ lim sup_{n→∞} µ_n(Ā) ≤ µ(Ā) = µ(A)   (using (iv) and (iii)).
(v)⇒(vi): Let f be as in (vi). The distribution function F(x) = µ({f ≤ x}) has at most countably many jumps. Thus D := {x ∈ R | µ({f = x}) ≠ 0} is at most countable, and so R \ D ⊂ R is dense. By denseness and since f is bounded: for any ε > 0 we find c_0 < ⋯ < c_m ∈ R \ D with
D_f := {x ∈ R | f is not continuous at x}
and thus by (1.13)
B(f(ω), ε) ∩ [c_k, c_{k+1}) ≠ ∅ ≠ B(f(ω), ε) ∩ (R \ [c_k, c_{k+1})).

Let g := Σ_{k=0}^{m−1} c_k·1_{A_k}. Then ‖f − g‖_∞ ≤ ε and
|∫ f dµ − ∫ f dµ_n| ≤ ∫ |f − g| dµ + |∫ g dµ − ∫ g dµ_n| + ∫ |g − f| dµ_n
≤ 2ε + Σ_{k=0}^{m−1} |c_k|·|µ(A_k) − µ_n(A_k)| → 2ε   as n → ∞, by (v).
Proof. Let f ∈ C_b(S) be uniformly continuous and ε > 0. Then there exists a δ = δ(ε) > 0 such that x, y ∈ S with d(x, y) ≤ δ implies |f(x) − f(y)| ≤ ε. Hence
|∫ f dµ − ∫ f dµ_n| = |E[f(X)] − E[f(X_n)]|
≤ ∫_{{d(X,X_n)≤δ}} |f(X) − f(X_n)| dP + ∫_{{d(X,X_n)>δ}} |f(X) − f(X_n)| dP
≤ ε + 2‖f‖_∞·P(d(X_n, X) > δ),
and P(d(X_n, X) > δ) → 0 as n → ∞.
(ii) µ_n → µ weakly as n → ∞
(iii) F_n(x) → F(x) as n → ∞ for all x where F is continuous
(iv) µ_n((a, b]) → µ((a, b]) as n → ∞ for all (a, b] with µ({a}) = µ({b}) = 0.

F_n(x) = µ_n((−∞, x]) → µ((−∞, x]) = F(x).

µ((a, b]) = F(b) − F(a) = lim_{n→∞} F_n(b) − lim_{n→∞} F_n(a) = lim_{n→∞} µ_n((a, b])   (by (iii)).

‖f − Σ_{k=1}^m f(c_{k−1})·1_{(c_{k−1},c_k]}‖_∞ ≤ sup_{1≤k≤m} sup_{x∈[c_{k−1},c_k]} |f(x) − f(c_{k−1})| ≤ ε   (set g := Σ_{k=1}^m f(c_{k−1})·1_{(c_{k−1},c_k]}),
and so
|∫ f dµ − ∫ f dµ_n| ≤ ∫ |f − g| dµ + |∫ g dµ − ∫ g dµ_n| + ∫ |f − g| dµ_n
≤ 2ε + Σ_{k=1}^m |f(c_{k−1})|·|µ((c_{k−1}, c_k]) − µ_n((c_{k−1}, c_k])| → 2ε   as n → ∞, by (iv).
11 Dynkin-systems and Uniqueness of probability measures

A system D of subsets of Ω is called a Dynkin-system if
(i) Ω ∈ D,
(ii) A ∈ D ⇒ A^c ∈ D,
(iii) A_i ∈ D, i ∈ N, pairwise disjoint ⇒ ⋃_{i∈N} A_i ∈ D.

D := {A ∈ A | P_1(A) = P_2(A)}
is a Dynkin-system.
Remark 11.3. (i) Let D be a Dynkin-system. Then
A, B ∈ D, A ⊂ B ⇒ B \ A = (B^c ∪ A)^c ∈ D.
(ii) Every Dynkin-system which is closed under finite intersections (short no-
tation: ∩-stable), is a σ-algebra, because:
(a) A, B ∈ D ⇒ A ∪ B = A ∪ (B \ (A ∩ B)) ∈ D, since A ∩ B ∈ D by assumption and B \ (A ∩ B) ∈ D by (i), and the union is disjoint.
(b) A_i ∈ D, i ∈ N ⇒ ⋃_{i∈N} A_i = ⋃_{i∈N} (A_i \ ⋃_{n=1}^{i−1} A_n) = ⋃_{i∈N} (A_i ∩ (⋃_{n=1}^{i−1} A_n)^c) ∈ D, since the sets A_i \ ⋃_{n=1}^{i−1} A_n are pairwise disjoint and belong to D by (a) and the assumption.
σ(B) = D(B) ,
where
D(B) := ⋂_{D Dynkin-system, B ⊂ D} D.

⟹ E ∩ D = D ∩ E ∈ D(B).
The latter implies B ⊂ D_D, hence D(B) ⊂ D_D.

D := {A ∈ A | P_1(A) = P_2(A)}

{∅, {X_1 = x_1, …, X_n = x_n} | n ∈ N, x_1, …, x_n ∈ {0, 1}}
Definition 11.7. Let H be a vector space of real-valued bounded functions
on Ω. H is called a monotone vector space (MVS), if:
Then
⇒ lim_{k→∞} g_k = f + 2a·1 ∈ H ⇒ f ∈ H.
σ(M) := “smallest σ-algebra for which all f ∈ M are measurable”
The next theorem plays the same role in measure theory and probability theory
as the Stone-Weierstrass Theorem in analysis.
Claim: f ∈ M0 ⇒ f ∧ α ∈ M0 , ∀α ∈ R.
σ(M0 ) = {A ∈ σ(M0 ) | 1A ∈ H} =: S.
“⊃”: Clear
“⊂”: S is a Dynkin system. Put
E := {A ∈ σ(M0 ) | ∃fn ∈ M0 , fn ≥ 0, n ≥ 1 : fn % 1A }.
For f ∈ M_0, α ∈ R, we have:
(n·(f − α)⁺) ∧ 1 ↗ 1_{{f>α}} as n → ∞ (each term lies in M_0) ⇒ {f > α} ∈ E,
and such sets generate σ(M_0).
How big is σ(Cb (S)) ? Clearly σ(Cb (S)) ⊂ B(S) since every continuous func-
tion on S is measurable w.r.t. B(S). Let F ⊂ S be closed and d(x, F ) :=
inf y∈F d(x, y), x ∈ S. Then d(·, F ) is Lipschitz continuous and so f :=
d(·, F ) ∧ 1 ∈ Cb (S). Moreover
F = {f = 0} ∈ σ(Cb (S)).
Hence B(S) ⊂ σ(Cb (S)). In particular 1A ∈ σ(Cb (S))b for all A ∈ B(S),
hence µ1 = µ2 .
2 Independence
1 Independent events
Let (Ω, A, P ) be a probability space.
are independent.
To this end suppose first that A_{j_2} ∈ B_{j_2}, …, A_{j_n} ∈ B_{j_n}, and define
Then D_{j_1} is a Dynkin system (!) containing B_{j_1}. Proposition 1.11.4 now implies
hence σ(B_{j_1}) = D_{j_1}. Iterating the above argument for D_{j_2}, D_{j_3}, … implies (2.1).
Remark 1.4. Pairwise independence does not imply independence in general.
Example: Consider two tosses with a fair coin, i.e.
Ω := {(i, k) | i, k ∈ {0, 1}},   P := uniform distribution.
P(A ∩ B) = P(B ∩ C) = P(C ∩ A) = 1/4.
But on the other hand
P(A ∩ B ∩ C) = 1/4 ≠ P(A)·P(B)·P(C).
Example 1.5. Independent 0-1-experiments with success probability
p ∈ [0, 1]. Let Ω := {0, 1}N , Xi (ω) := xi and ω := (xi )i∈N . Let Pp be a
probability measure on A := σ {Xi = 1}, i = 1, 2, . . . , with
Let B_∞ := ⋂_{n∈N} σ(⋃_{m≥n} B_m) be the tail σ-algebra (resp. the σ-algebra of terminal events). Then
P(A) ∈ {0, 1}   ∀ A ∈ B_∞,
i.e., P is deterministic on B_∞.
Proof of the Zero-One Law. Proposition 1.2 implies that for all n
B_1, B_2, …, B_{n−1}, σ(⋃_{m=n}^∞ B_m)
are independent. Since B_∞ ⊂ σ(⋃_{m≥n} B_m), this implies that for all n
B_1, B_2, …, B_{n−1}, B_∞
are independent, hence
B_∞, B_n, n ∈ N, are independent
Lemma 1.7. Let B ⊂ A be a σ-algebra such that B is independent from B.
Then
P (A) ∈ {0, 1} ∀A ∈ B.
Σ_{i=1}^∞ P(A_i) < ∞ ⇒ P(lim sup_{i→∞} A_i) = 0.

Σ_{i=1}^∞ P(A_i) = ∞ ⇒ P(lim sup_{i→∞} A_i) = 1.

P(⋃_{m=n}^∞ A_m) = 1,   resp.   P(⋂_{m=n}^∞ A_m^c) = 0,   ∀n.
The last equality follows from the fact that
P(⋂_{m=n}^∞ A_m^c) = lim_{k→∞} P(⋂_{m=n}^{n+k} A_m^c) = lim_{k→∞} ∏_{m=n}^{n+k} P(A_m^c)   (independence)
= lim_{k→∞} ∏_{m=n}^{n+k} (1 − P(A_m)) ≤ lim_{k→∞} exp(−Σ_{m=n}^{n+k} P(A_m)) = 0.
and consider the events Ai = "text occurs in the ith block". Clearly, Ai , i ∈ N,
are independent events (!) by Proposition 1.2(ii) with equal probability
P_p(A_i) = p^K·(1 − p)^{N−K} =: α > 0,
where K := Σ_{i=1}^N x_i is the total number of ones in the text. In particular,
Σ_{i=1}^∞ P_p(A_i) = Σ_{i=1}^∞ α = ∞,
and now Borel–Cantelli implies P_p(A_∞) = 1, where
A_∞ = lim sup_{i→∞} A_i := "the text occurs infinitely many times".
Moreover: since the indicator functions 1_{A_1}, 1_{A_2}, … are uncorrelated (since A_1, A_2, … are independent) with uniformly bounded variances, the strong law of large numbers implies that
(1/n)·Σ_{i=1}^n 1_{A_i} → E[1_{A_1}] = α   P_p-a.s.,
i.e. the relative frequency of the given text among the blocks of the infinite sequence is strictly positive.
2 Independent random variables
Let (Ω, A, P ) be a probability space.
are independent, i.e. for all finite subsets J ⊂ I and any Borel subsets A_j ∈ B(R̄)
P(⋂_{j∈J} {X_j ∈ A_j}) = ∏_{j∈J} P[X_j ∈ A_j].
Proof. W.l.o.g. n = 2. (Proof of the general case by induction, using the fact that X_1·…·X_{n−1} and X_n are independent, since X_1·…·X_{n−1} is measurable w.r.t. σ(σ(X_1) ∪ ⋯ ∪ σ(X_{n−1})), and σ(σ(X_1) ∪ ⋯ ∪ σ(X_{n−1})) and σ(X_n) are independent by Proposition 1.2.)
It therefore suffices to consider two independent r.v. X, Y ≥ 0, and we have to show that

with α_i, β_j ≥ 0 and A_i ∈ σ(X) resp. B_j ∈ σ(Y) it follows that
E[XY] = Σ_{i,j} α_i β_j·P(A_i ∩ B_j) = Σ_{i,j} α_i β_j·P(A_i)·P(B_j) = E[X]·E[Y].

Proof. Let ε_1, ε_2 ∈ {+, −}. Then X^{ε_1} and Y^{ε_2} are independent by Remark 2.2 and nonnegative. Proposition 2.3 implies
X·Y = X⁺·Y⁺ + X⁻·Y⁻ − (X⁺·Y⁻ + X⁻·Y⁺) ∈ L¹,

Remark 2.5. (i) In general the converse to the above corollary does not hold: for example let X be N(0, 1)-distributed and Y = X². Then X and Y are not independent, but
(ii) X, Y ∈ L² independent ⇒ X, Y uncorrelated,
because

If E[X_i] ≡ m, then lim_{n→∞} (1/n)·Σ_{i=1}^n X_i = m P-a.s.
3 Kolmogorov’s law of large numbers
Proposition 3.1 (Kolmogorov, 1930). Let X_1, X_2, ⋯ ∈ L¹ be independent, identically distributed (i.i.d.), m = E[X_i]. Then
(1/n)·Σ_{i=1}^n X_i → m P-a.s.   as n → ∞   (the "empirical mean").

(1/n)·Σ_{i=1}^n X_i → m P-a.s.   as n → ∞.
Clearly,
X̃_i = h_i(X_i) with h_i(x) := x if x < i, and h_i(x) := 0 if x ≥ i.
Then X̃_1, X̃_2, … are pairwise independent by Remark 2.2. For the proof it is now sufficient to show that for S̃_n := Σ_{i=1}^n X̃_i we have that
S̃_n/n → m P-a.s.   as n → ∞.
Indeed,
Σ_{n=1}^∞ P[X_n ≠ X̃_n] = Σ_{n=1}^∞ P[X_n ≥ n] = Σ_{n=1}^∞ P[X_1 ≥ n]
= Σ_{n=1}^∞ Σ_{k=n}^∞ P(X_1 ∈ [k, k+1)) = Σ_{k=1}^∞ k·P(X_1 ∈ [k, k+1))
= E[Σ_{k=1}^∞ k·1_{X_1∈[k,k+1)}] ≤ E[X_1] < ∞,
since k·1_{X_1∈[k,k+1)} ≤ X_1·1_{X_1∈[k,k+1)}.

Hence for all ω ∉ N_α we get
lim_{l→∞} S̃_l(ω)/l = m.
3. Due to Lemma 1.7.7 it suffices for the proof of (2.3) to show that
∀ε > 0 :  Σ_{n=1}^∞ P[|S̃_{k_n} − E[S̃_{k_n}]|/k_n > ε] < ∞.
We will show in the following that there exists a constant c such that
Σ_{n : k_n ≥ i} 1/k_n² ≤ c/i².   (2.4)
⌊α^n⌋ = k_n ≤ α^n < k_n + 1
⇒ k_n > α^n − 1 ≥ α^n − α^{n−1} = ((α − 1)/α)·α^n =: c_α·α^n   (α > 1).
Let n_i be the smallest natural number satisfying k_{n_i} = ⌊α^{n_i}⌋ ≥ i, hence α^{n_i} ≥ i; then
Σ_{n : k_n ≥ i} 1/k_n² ≤ c_α^{−2}·Σ_{n ≥ n_i} 1/α^{2n} = c_α^{−2}·(1/(1 − α^{−2}))·α^{−2n_i} ≤ (c_α^{−2}/(1 − α^{−2}))·(1/i²).
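For illustration, the almost sure convergence in Kolmogorov's law of large numbers can be watched along a single simulated path; the Python sketch below (added here, with exponentially distributed X_i of mean 1 as an arbitrary choice) records the empirical means along one fixed ω:

```python
import random

random.seed(1)                    # one fixed path ("one ω")
running_sum, checkpoints = 0.0, []
for n in range(1, 100_001):
    running_sum += random.expovariate(1.0)   # X_n ~ Exp(1), E[X_n] = 1
    if n in (10, 100, 1000, 10_000, 100_000):
        checkpoints.append((n, running_sum / n))
print(checkpoints)   # S_n/n approaches m = 1 along this path
```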
Corollary 3.3. Let X_1, X_2, … be pairwise independent, identically distributed with X_i ≥ 0. Then
lim_{n→∞} (1/n)·Σ_{i=1}^n X_i = E[X_1] ∈ [0, ∞]   P-a.s.
Proof. W.l.o.g. E[X_1] = ∞. Then (1/n)·Σ_{i=1}^n X_i(ω) ∧ N → E[X_1 ∧ N] as n → ∞, P-a.s. for all N, hence
(1/n)·Σ_{i=1}^n X_i ≥ (1/n)·Σ_{i=1}^n X_i ∧ N → E[X_1 ∧ N] ↗ E[X_1]   (n → ∞, then N → ∞)   P-a.s.

and
α < 0: ∃ ε > 0 with α + ε < 0, so that X_n(ω) ≤ e^{n(α+ε)} ∀ n ≥ n_0(ω), hence P-a.s. exponential decay;
α > 0: ∃ ε > 0 with α − ε > 0, so that X_n(ω) ≥ e^{n(α−ε)} ∀ n ≥ n_0(ω), hence P-a.s. exponential growth.

Note that by Jensen's inequality
α = E[log Y_1] ≤ log E[Y_1] = log m,
and typically the inequality is strict, i.e. α < log m, so that it might happen that α < 0 although m > 1 (!)
Illustration. As a particular example consider the following model:
Let X_0 := 1 be the capital at time 0. At time n − 1 invest (1/2)·X_{n−1} and win c·(1/2)·X_{n−1} or 0, both with probability 1/2. Then
X_n = (1/2)·X_{n−1} (not invested) + gain/loss = X_{n−1}·Y_n,
with
Y_n := (1/2)·(1 + c) with prob. 1/2,   Y_n := 1/2 with prob. 1/2,
so that E[Y_i] = (1/4)·(1 + c) + 1/4 = (c + 2)/4   (supercritical if c > 2).
On the other hand
E[log Y_1] = (1/2)·log((1/2)·(1 + c)) + (1/2)·log(1/2) = (1/2)·log((1 + c)/4) < 0   for c < 3.
Hence X_n → 0 P-a.s. with exponential rate for c < 3, whereas at the same time for c > 2, E[X_n] = m^n ↗ ∞ with exponential rate.
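A quick simulation makes this contrast visible (an illustrative Python sketch; c = 2.5 is a hypothetical choice between the two thresholds 2 and 3):

```python
import random
from math import log

c = 2.5                      # 2 < c < 3: E[Y] > 1 but E[log Y] < 0

def simulate(n):
    x = 1.0
    for _ in range(n):
        x *= 0.5 * (1 + c) if random.random() < 0.5 else 0.5
    return x

n = 200
print("typical X_n values:", [simulate(n) for _ in range(10)])   # almost surely tiny
print("E[X_n] = m^n =", ((c + 2) / 4) ** n)                       # exponentially large
print("E[log Y_1] =", 0.5 * log((1 + c) / 4))                     # negative
```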
"random measure"
n
1X
%n (ω, A) := 1A Xi (ω)
n
i=1
is a P -null set too, and for all x ∈ R and all s, r ∈ Q with s < x < r and
ω∈ / N:
F (s) = lim Fn (ω, s) 6 lim inf Fn (ω, x)
n→∞ n→∞
77
4 Joint distribution and convolution
Let X_i ∈ L¹ be i.i.d. Kolmogorov's law of large numbers implies that
(1/n)·Σ_{i=1}^n X_i(ω) = S_n(ω)/n → E[X_1]   P-a.s.   (S_n := Σ_{i=1}^n X_i),
hence
∫ f(x) d(P ∘ (S_n/n)^{−1})(x) = E[f(S_n/n)] → f(E[X_1]) = ∫ f(x) dδ_{E[X_1]}(x)   ∀f ∈ C_b(R)   (Lebesgue),
i.e., the distribution of S_n/n converges weakly to δ_{E[X_1]}. This is not surprising, because at least for X_i ∈ L²
var(S_n/n) = (1/n²)·Σ_{i=1}^n var(X_i) = var(X_1)/n → 0.

X̄ : Ω → R^n,   ω ↦ X̄(ω) := (X_1(ω), …, X_n(ω)).
Remark 4.2. (i) µ̄ is well-defined, because X̄ : Ω → Rn is A/B(Rn )-
measurable.
Proof:
B(R^n) = σ({A_1 × ⋯ × A_n | A_i ∈ B(R)})
(= σ({A_1 × ⋯ × A_n | A_i = (−∞, x_i], x_i ∈ R})).
Example 4.3. (i) Let X, Y be r.v., uniformly distributed on [0, 1]. Then
• X, Y independent ⇒ joint distribution = uniform distribution on
[0, 1]2
• X = Y ⇒ joint distribution = uniform distribution on the diagonal
are independent and
0 if r < 0.
A := σ({A1 × . . . × An | Ai ∈ Ai , 1 ≤ i ≤ n})
X_1, …, X_n independent ⇔ µ̄ = ⊗_{i=1}^n µ_i,

(ii)
∫ φ(x_1, …, x_n) dµ̄(x_1, …, x_n) = ∫ ⋯ (∫ φ(x_1, …, x_n) µ_{i_1}(dx_{i_1})) ⋯ µ_{i_n}(dx_{i_n}).
Hence,
µ̌(A) := ∫_A f̄(x̄) dx̄,   A ∈ B(R^n),
Hence µ̄ = µ̌ by 1.11.5.
Let X1 , . . . , Xn be independent, Sn := X1 + · · · + Xn
How to calculate the distribution of Sn with the help of the distribution of
Xi ?
In the following denote by Tx : R1 → R1 , y 7→ x + y, the translation by
x ∈ R.
Proposition 4.6. Let X1 , X2 be independent r.v. with distributions µ1 , µ2 .
Then:
(i) The distribution of X1 + X2 is given by the convolution
µ_1 ∗ µ_2 := ∫ µ_1(dx_1) (µ_2 ∘ T_{x_1}^{−1}),   i.e.
(µ_1 ∗ µ_2)(A) = ∫∫ 1_A(x_1 + x_2) µ_1(dx_1) µ_2(dx_2) = ∫ µ_1(dx_1) µ_2(A − x_1)   ∀ A ∈ B(R¹).

(ii)
(µ_1 ∗ µ_2)(A) = ∫ µ_1(dx_1) µ_2(A − x_1) = ∫ µ_1(dx_1) ∫_{A−x_1} f_2(x_2) dx_2
= ∫ µ_1(dx_1) ∫_A f_2(x − x_1) dx   (change of variable x = x_1 + x_2)
= ∫_A (∫ µ_1(dx_1) f_2(x − x_1)) dx   (by 4.5).
Example 4.7.
(iii) The Gamma distribution Γα,p is defined through its density γα,p given by
γ_{α,p}(x) = (1/Γ(p))·α^p·x^{p−1}·e^{−αx} if x > 0,   and γ_{α,p}(x) = 0 if x ≤ 0.
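Numerically, the convolution formula can be checked by adding independent samples: the sum of n independent Exp(α) waiting times should follow Γ_{α,n}. The Python sketch below (added for illustration; it only compares means and variances, with α = 2 and n = 5 as arbitrary values):

```python
import random

alpha, n, trials = 2.0, 5, 100_000
sums = [sum(random.expovariate(alpha) for _ in range(n)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((s - mean) ** 2 for s in sums) / trials
# Gamma(alpha, p=n): mean = n/alpha, variance = n/alpha^2
print(mean, n / alpha, var, n / alpha**2)
```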
Example 4.8 (The waiting time paradox). Let T1 , T2 , . . . be independent,
exponentially distributed waiting times (e.g. time between reception of two
phone calls in a call center) with parameter α > 0, so that in particular
Z ∞
1
E[Ti ] = x · αe−αx dx = · · · = .
0 α
Fix some time t > 0. Let X denote the time-interval from the preceding event
to t, and Y denote the time-interval from t to the next event.
[Sketch: the inter-arrival times T_1, T_2, … along the time axis; X is the part of the current inter-arrival interval before t, Y the part after t.]
Question: How long on average is the waiting time from t until the next event,
i.e., how big is E[Y ] ?
E[X] = (1/α)·(1 − e^{−αt}) ≈ 1/α   for large t.
More precisely:
(ii) X has exponential distribution with parameter α, "compressed to" [0, t],
i.e.:
P[X > s] = e^{−αs}   ∀ 0 ≤ s ≤ t,
P[X = t] = e^{−αt};
In particular,
E[X] = ∫_0^t s·αe^{−αs} ds + t·e^{−αt} = ⋯ = (1/α)·(1 − e^{−αt}).
(iii) X, Y are independent.
P[X > x, Y > y] = P(⋃_{n≥0} {t − S_n > x, S_{n+1} − t > y})
= P[T_1 > y + t] + Σ_{n=1}^∞ P[S_n ≤ t − x, T_{n+1} > y + t − S_n]
= e^{−α(t+y)} + Σ_{n=1}^∞ ∬ 1_{[0,t−x]×[y+t−s,∞)}(s, r)·γ_{α,n}(s)·αe^{−αr} ds dr
= e^{−α(t+y)} + Σ_{n=1}^∞ ∫_0^{t−x} γ_{α,n}(s)·e^{−α(y+t−s)} ds
= e^{−α(t+y)}·(1 + ∫_0^{t−x} e^{αs}·Σ_{n=1}^∞ γ_{α,n}(s) ds)   (Σ_{n=1}^∞ γ_{α,n}(s) = α)
= e^{−α(t+y)}·(1 + ∫_0^{t−x} αe^{αs} ds).
Consequently:
P[X > x, Y > y] = e^{−α(t+y)}·e^{α(t−x)} = e^{−αx}·e^{−αy}   (0 ≤ x ≤ t, y ≥ 0),
which yields (ii) and (iii), and shows that Y is again exponentially distributed with parameter α, so E[Y] = 1/α.
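The "paradox" (the residual waiting time Y has the same mean 1/α as a full inter-arrival time) is easy to verify by simulation. The Python sketch below is an added illustration with the hypothetical values α = 1 and t = 50:

```python
import random

alpha, t, trials = 1.0, 50.0, 20_000
xs, ys = [], []
for _ in range(trials):
    s = 0.0
    while True:
        prev = s
        s += random.expovariate(alpha)   # next arrival time
        if s > t:
            xs.append(t - prev)   # X: time since the preceding event (or t if none)
            ys.append(s - t)      # Y: time until the next event
            break
print("E[X] ~", sum(xs) / trials, " E[Y] ~", sum(ys) / trials, " 1/alpha =", 1 / alpha)
```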
5 Characteristic functions
Let M1+ (Rn ) be the set of all probability measures on (Rn , B(Rn )).
For given µ ∈ M1+ (Rn ) define its characteristic function as the complex-
valued function µ̂ : Rn → C defined by
µ̂(u) := ∫ e^{i⟨u,y⟩} µ(dy) := ∫ cos(⟨u, y⟩) µ(dy) + i·∫ sin(⟨u, y⟩) µ(dy).
(i) µ̂(0) = 1.
(ii) |µ̂| ≤ 1.
Proof. Exercise.
Proposition 5.2 (Uniqueness theorem). Let µ1 , µ2 ∈ M1+ (Rn ) with µ̂1 = µ̂2 .
Then µ1 = µ2 .
Proof. For the proofs of the three previous propositions, or references to where they can be found, see Klenke, Theorems 15.9, 15.23 and 15.29.

P̂_{(X_1,…,X_n)}(u_1, …, u_n) = (⊗_{j=1}^n P_{X_j})^∧(u_1, …, u_n)   ( = ∏_{j=1}^n P̂_{X_j}(u_j) ),
where P̂_{(X_1,…,X_n)} = φ_{(X_1,…,X_n)} and P̂_{X_j}(u_j) = φ_{X_j}(u_j);
i.e.: P̂_{(X_1,…,X_n)} = ∏_{j=1}^n P̂_{X_j} ∘ Pr_j,   where Pr_j(u) = u_j.
Proof.
φ_S(u) = ∫ e^{iuS} dP = ∫ ∏_{k=1}^n e^{iαuX_k} dP = ∏_{k=1}^n ∫ e^{iαuX_k} dP   (independence)
= ∏_{k=1}^n φ_{X_k}(αu).
(ii) Let µ := Σ_{i=0}^∞ α_i·δ_{a_i} (α_i ≥ 0, Σ_{i=0}^∞ α_i = 1). Then
µ̂(u) = Σ_{i=0}^∞ α_i·e^{iua_i},   u ∈ R.
Special cases:
a) Binomial distribution β_{n,p} = Σ_{k=0}^n C(n, k)·p^k·q^{n−k}·δ_k. Then for all u ∈ R:
β̂_{n,p}(u) = Σ_{k=0}^n C(n, k)·p^k·q^{n−k}·e^{iuk} = (q + p·e^{iu})^n.
b) Poisson distribution π_α = Σ_{n=0}^∞ e^{−α}·(α^n/n!)·δ_n. Then for all u ∈ R:
π̂_α(u) = e^{−α}·Σ_{n=0}^∞ (α^n/n!)·e^{iun} = e^{−α}·Σ_{n=0}^∞ (α·e^{iu})^n/n! = e^{α(e^{iu}−1)}.
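Both formulas can be checked numerically by comparing the analytic expression with the empirical characteristic function of simulated data; the Python sketch below (added for illustration, with α = 3 and u = 0.7 as arbitrary values) does this for the Poisson case:

```python
import cmath
import random

def empirical_cf(samples, u):
    return sum(cmath.exp(1j * u * x) for x in samples) / len(samples)

alpha, u = 3.0, 0.7
samples = []
for _ in range(50_000):
    # Poisson(alpha) sample: count rate-1 exponential interarrival times within [0, alpha]
    k, s = 0, random.expovariate(1.0)
    while s < alpha:
        k += 1
        s += random.expovariate(1.0)
    samples.append(k)

print(empirical_cf(samples, u))
print(cmath.exp(alpha * (cmath.exp(1j * u) - 1)))   # analytic value exp(alpha*(e^{iu}-1))
```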
6 Central limit theorem

S_n* := (S_n − E[S_n])/√(var(S_n))   ("standardized sum")
or equivalently
lim_{n→∞} P[S_n* ≤ b] = (1/√(2π))·∫_{−∞}^b e^{−x²/2} dx = Φ(b),   ∀b ∈ R.
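For illustration, this convergence can be observed in a small simulation; the Python sketch below (added here, with Bernoulli(1/2) summands and b = 1 as arbitrary choices) compares the empirical probability with Φ(b):

```python
import random
from math import erf, sqrt

def Phi(b):
    return 0.5 * (1 + erf(b / sqrt(2)))

n, trials, b = 400, 20_000, 1.0
count = 0
for _ in range(trials):
    s = sum(random.random() < 0.5 for _ in range(n))   # S_n with Bernoulli(1/2) summands
    s_star = (s - n * 0.5) / sqrt(n * 0.25)            # standardized sum
    if s_star <= b:
        count += 1
print(count / trials, Phi(b))
```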
Proposition 6.2. (Central limit theorem) Let X1 , X2 , . . . ∈ L2 be independent
r.v., σn2 := var(Xn ) > 0 and
s_n := (Σ_{k=1}^n σ_k²)^{1/2}.
(iii) Let (Xn ) be bounded and suppose that sn → ∞. Then (Xn ) satisfies
Lyapunov’s condition for any δ > 0, because
|X_k| ≤ α/2 ⇒ |X_k − E[X_k]| ≤ α
⇒ (Σ_{k=1}^n E[|X_k − E[X_k]|^{2+δ}]) / s_n^{2+δ} ≤ (Σ_{k=1}^n E[|X_k − E[X_k]|²]·α^δ) / (s_n²·s_n^δ)
= (α/s_n)^δ·(1/s_n²)·Σ_{k=1}^n E[|X_k − E[X_k]|²] = (α/s_n)^δ   (the last sum equals s_n²).

≤ L_n(ε) + ε².
The proof of Proposition 6.2 requires some further preparations.
Lemma 6.5. For all t ∈ R and n ∈ N:
|e^{it} − 1 − it/1! − (it)²/2! − ⋯ − (it)^{n−1}/(n−1)!| ≤ |t|^n/n!.
Proof. Define f(t) := e^{it}, then f^{(k)}(t) = i^k·e^{it}. Then Taylor series expansion around t = 0, applied to real and imaginary part, implies that
e^{it} − 1 − ⋯ − (it)^{n−1}/(n−1)! = R_n(t)
with
|R_n(t)| = |(1/(n−1)!)·∫_0^t (t − s)^{n−1}·i^n·e^{is} ds| ≤ (1/(n−1)!)·∫_0^{|t|} s^{n−1} ds = |t|^n/n!.
Proposition 6.6. Let X ∈ L². Then φ_X(u) = ∫ e^{iuX} dP is twice continuously differentiable with
φ'_X(u) = i·∫ X·e^{iuX} dP,   φ''_X(u) = −∫ X²·e^{iuX} dP.
In particular
φ_X(u) = 1 + iu·E[X] + (1/2)·θ(u)·u²·E[X²].
Proof. Clearly,
|e^{iuX} − 1 − iuX| ≤ (1/2)·u²·X².
Hence
|φ_X(u) − 1 − iu·E[X]| = |∫ (e^{iuX} − 1 − iuX) dP| ≤ (1/2)·u²·E[X²].
Now define θ(u) := 0 if u²·E[X²] = 0, and θ(u) := (φ_X(u) − 1 − iu·E[X]) / ((1/2)·u²·E[X²]) otherwise.
Proposition 6.7. Suppose that
(F) lim_{n→∞} max_{1≤k≤n} σ_k/s_n = 0   and
(b) lim_{n→∞} Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1) = −(1/2)·u²   ∀u ∈ R.
Then (X_n) has the CLP.
Proof. It is sufficient to show that
lim_{n→∞} ∏_{k=1}^n φ_{X_k}(u/s_n) = e^{−u²/2},   (2.6)
because for S_n* = (1/s_n)·Σ_{k=1}^n X_k we have that
φ_{S_n*}(u) = ∏_{k=1}^n φ_{X_k}(u/s_n),
and φ_{S_n*}(u) → e^{−u²/2} = N̂(0, 1)(u) pointwise as n → ∞, together with N̂(0, 1)(u) continuous at u = 0, implies by Lévy's continuity theorem and the uniqueness theorem that lim_{n→∞} P_{S_n*} = N(0, 1) weakly.
For the proof of (2.6) we need to show that for all u ∈ R
lim_{n→∞} |∏_{k=1}^n φ_{X_k}(u/s_n) − exp(Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1))| = 0,
since exp(Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1)) → exp(−u²/2).

|∏_{k=1}^n a_k − ∏_{k=1}^n b_k| = |(a_1 − b_1)·a_2⋯a_n + b_1·(a_2 − b_2)·a_3⋯a_n + … + b_1⋯b_{n−1}·(a_n − b_n)|
≤ Σ_{k=1}^n |a_k − b_k|.
Consequently,
|∏_{k=1}^n φ_{X_k}(u/s_n) − exp(Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1))| ≤ Σ_{k=1}^n |φ_{X_k}(u/s_n) − exp(φ_{X_k}(u/s_n) − 1)| =: D_n.
Note that E[X_k] = 0 and E[X_k²] = σ_k². The previous proposition now implies that for all k
|z_k| = |φ_{X_k}(u/s_n) − 1| = |i·(u/s_n)·E[X_k] + (1/2)·θ(u/s_n)·(u/s_n)²·E[X_k²]| ≤ (1/2)·(u/s_n)²·σ_k²,
and moreover by (F) we can find n_0 ∈ N such that for all n ≥ n_0 and 1 ≤ k ≤ n
(1/2)·(u/s_n)²·σ_k² < ε.
Hence for all n ≥ n_0
D_n ≤ ε·Σ_{k=1}^n |z_k| ≤ ε·(u²/2)·Σ_{k=1}^n σ_k²/s_n² = ε·u²/2.
Consequently, lim_{n→∞} D_n = 0.
Proof of Proposition 6.2. W.l.o.g. assume that E[Xn ] = 0 for all n ∈ N. We
will use Proposition 6.7. Since (L) ⇒ (F) by Lemma 6.4 it remains to show (b)
of Proposition 6.7. We will show that Lindeberg’s condition implies (b), i.e. we
show that (L) implies
lim_{n→∞} Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1) = −(1/2)·u².
Let u ∈ R, n ∈ N, 1 ≤ k ≤ n. By Lemma 6.5, we get
Y_k := |exp(i·(u/s_n)·X_k) − 1 − i·(u/s_n)·X_k + (1/2)·(u²/s_n²)·X_k²| ≤ (1/6)·|(u/s_n)·X_k|³
(the linear term i·(u/s_n)·X_k has E[⋯] = 0),

≤ Σ_{k=1}^n ∫ |exp(i·(u/s_n)·X_k) − 1 − i·(u/s_n)·X_k + (1/2)·(u²/s_n²)·X_k²| dP ≤ Σ_{k=1}^n E[Y_k],
and for any ε > 0
E[Y_k] = ∫_{{|X_k|≥εs_n}} Y_k dP + ∫_{{|X_k|<εs_n}} Y_k dP
≤ (u²/s_n²)·∫_{{|X_k|≥εs_n}} X_k² dP + (|u|³/(6s_n³))·∫_{{|X_k|<εs_n}} |X_k|³ dP.
Note that
(1/s_n³)·∫_{{|X_k|<εs_n}} |X_k|³ dP ≤ (ε/s_n²)·∫ X_k² dP = ε·σ_k²/s_n²,
so that we obtain
Σ_{k=1}^n E[Y_k] ≤ u²·Σ_{k=1}^n ∫_{{|X_k/s_n|≥ε}} (X_k/s_n)² dP + (|u|³/6)·ε·Σ_{k=1}^n σ_k²/s_n²   (the last sum equals 1)
= u²·L_n(ε) + (|u|³/6)·ε.
Consequently
lim_{n→∞} Σ_{k=1}^n E[Y_k] = 0,
and thus
lim_{n→∞} |Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1) + (1/2)·u²| = 0.
Π := m + λσ² = average claim size + safety loading.
Income: nΠ.
Expenditures: S_n = Σ_{i=1}^n X_i.
Let
S_n* := (S_n − nm)/(√n·σ).
The central limit theorem implies for large n that S_n* is approximately N(0, 1)-distributed, so that
P(R) = P(S_n* > (K + nΠ − nm)/(√n·σ)) = P(S_n* > (K + nλσ²)/(√n·σ)) ≈ 1 − Φ((K + nλσ²)/(√n·σ)),
where (K + nλσ²)/(√n·σ) → ∞ as n → ∞ and Φ denotes the distribution function of the standard normal distribution. Note that the ruin probability decreases with an increasing number of contracts.
Example: Assume that n = 2000, σ = 60, λ = 0.5‰.
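With the CLT approximation above, the ruin probability can be evaluated directly. The Python sketch below (added for illustration) uses the stated n, σ, λ and treats the initial capital K as a free parameter, since K is not specified in this excerpt:

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def ruin_probability(K, n=2000, sigma=60.0, lam=0.0005):
    # P(R) ~ 1 - Phi((K + n*lam*sigma^2) / (sqrt(n)*sigma))
    return 1 - Phi((K + n * lam * sigma**2) / (sqrt(n) * sigma))

for K in (0, 2000, 5000):      # hypothetical values of the initial capital
    print(K, ruin_probability(K))
```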
Then Sn := X1 + · · · + Xn has Poisson distribution πn , i.e.,
P_{S_n} = e^{−n}·Σ_{k=0}^∞ (n^k/k!)·δ_k,

S_n* = (S_n − n)/√n.
In particular, for
f_∞(x) := x⁻ = (−x) ∨ 0
it follows that
∫ f_∞ dP_{S_n*} = ∫ f_∞((x − n)/√n) π_n(dx) = e^{−n}·Σ_{k=0}^n (n^k/k!)·(n − k)/√n
(here f_∞((x − n)/√n) = 0 for x > n and = (n − x)/√n for x ≤ n)
= (e^{−n}/√n)·(n + Σ_{k=1}^n n^k·(n − k)/k!)
= (e^{−n}/√n)·(n + Σ_{k=1}^n (n^{k+1}/k! − n^k/(k−1)!))   (telescoping sum = n^{n+1}/n! − n)
= e^{−n}·n^{n+1/2}/n!.
Moreover,
∫ f_∞ dN(0, 1) = (1/√(2π))·∫_{−∞}^0 (−x)·e^{−x²/2} dx = (1/√(2π))·[e^{−x²/2}]_{−∞}^0 = 1/√(2π).
Hence, Stirling's formula (2.7) would follow, once we have shown that
∫ f_∞ dP_{S_n*} → ∫ f_∞ dN(0, 1)   as n → ∞.   (2.8)
Note that this is not implied by the weak convergence in the CLT since f_∞ is continuous but unbounded. Hence, we consider for given m ∈ N
f_m := f_∞ ∧ m ∈ C_b(R).