CH 2
Gerald Trutnau
Seoul National University
Fall Term 2024
Non-Corrected version
Contents
1 Basic Notions 5
1 Probability spaces . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Discrete models . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Transformations of probability spaces . . . . . . . . . . . . . . 18
4 Random variables . . . . . . . . . . . . . . . . . . . . . . . . 21
5 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6 Variance and Covariance . . . . . . . . . . . . . . . . . . . . . 30
7 The (strong and the weak) law of large numbers . . . . . . . . 33
8 Convergence and uniform integrability . . . . . . . . . . . . . . 39
9 Distribution of random variables . . . . . . . . . . . . . . . . . 47
10 Weak convergence of probability measures . . . . . . . . . . . 51
11 Dynkin-systems and Uniqueness of probability measures . . . . 57
2 Independence 63
1 Independent events . . . . . . . . . . . . . . . . . . . . . . . . 63
2 Independent random variables . . . . . . . . . . . . . . . . . . 69
3 Kolmogorov’s law of large numbers . . . . . . . . . . . . . . . 71
4 Joint distribution and convolution . . . . . . . . . . . . . . . . 78
5 Characteristic functions . . . . . . . . . . . . . . . . . . . . . 86
6 Central limit theorem . . . . . . . . . . . . . . . . . . . . . . 88
1 Basic Notions
1 Probability spaces
Probability theory is the mathematical theory of randomness. The basic notion is
that of a random experiment: an experiment whose outcome is not predictable and can only be determined by performing it and observing the result.
Probability theory tries to quantify the possible outcomes by attaching a probability to every event. This is important, for example, for an insurance company asking what a fair price is for insuring against events such as fire or death, which may but need not occur.
The set of all possible outcomes of a random experiment is denoted by Ω.
The set Ω may be finite, denumerable or even uncountable.
Example 1.1. Examples of random experiments and corresponding Ω:
(i) Coin tossing: The possible outcomes of tossing a coin are either “head”
or “tail”. Denoting one outcome by “0” and the other one by “1”, the set
of all possible outcomes is given by Ω = {0, 1}.
(ii) Tossing a coin n times: In this case any sequence of zeros and ones (alias heads or tails) of length n is considered as one possible outcome; hence
Ω = {(x_1, x_2, …, x_n) | x_i ∈ {0, 1}} =: {0, 1}^n.
(iv) A random number between 0 and 1: Ω = [0, 1].
[Figure: a random path t ↦ ω(t).]
Events:
Reasonable subsets A ⊂ Ω for which it makes sense to calculate the probabil-
ity are called events (a precise definition will be given in Definition 1.3 below).
If we consider an event A and observe ω ∈ A in a random experiment, we say that A has occurred.
Combination of events:
A_1 ∪ A_2,   ⋃_i A_i : "at least one of the events A_i occurs",
A_1 ∩ A_2,   ⋂_i A_i : "all of the events A_i occur",
lim sup_{n→∞} A_n := ⋂_n ⋃_{m≥n} A_m : "infinitely many of the A_m occur".

(ii) Tossing a coin n times: "exactly k ones in n tosses",
A = {(x_1, …, x_n) ∈ {0, 1}^n | Σ_{i=1}^n x_i = k}.

(iv) A random number between 0 and 1: "number ∈ [a, b]", A = [a, b] ⊂ Ω = [0, 1].

A = {ω ∈ C([0, 1]) | max_{0≤t≤1} ω(t) > c}.
Let Ω be countable. A probability distribution function p on Ω is a function
p : Ω → [0, 1]   with   Σ_{ω∈Ω} p(ω) = 1.
Given any subset A ⊂ Ω, its probability P(A) can then be defined by simply adding up:
P(A) = Σ_{ω∈A} p(ω).

(i) Ω ∈ A,
(ii) A ∈ A implies A^c ∈ A,
(iii) A_i ∈ A, i ∈ N, implies ⋃_{i∈N} A_i ∈ A.
• A_1, …, A_n ∈ A implies
⋃_{i=1}^n A_i ∈ A   and   ⋂_{i=1}^n A_i ∈ A.
• A_i ∈ A, i ∈ N, implies
⋂_n ⋃_{m≥n} A_m ∈ A   and   ⋃_n ⋂_{m≥n} A_m ∈ A.
(iii) Let I be an index set (not necessarily countable) and for any i ∈ I, let A_i be a σ-algebra. Then ⋂_{i∈I} A_i := {A ⊂ Ω | A ∈ A_i for any i ∈ I} is again a σ-algebra.

• P(Ω) = 1
• P(⋃_{i∈N} A_i) = Σ_{i=1}^∞ P(A_i) for pairwise disjoint A_i ∈ A   ("σ-additivity")
Example 1.7. (i) Coin tossing: Let A := P(Ω) = {∅, {0}, {1}, {0, 1}}. Tossing a fair coin means "head" and "tail" have equal probability 1/2, hence:
P({0}) := P({1}) := 1/2,   P(∅) := 0,   P({0, 1}) := 1   ({0, 1} = Ω).

P({(x_1, x_2, …) ∈ {0, 1}^N | x_1 = x̄_1, …, x_n = x̄_n}) := 2^{−n}   (the set on the left belongs to A_0).
(Proof: Later!)

[Figure: a path t ↦ ω(t) taking values between α and β near t_0.]

Remark 1.8. Let (Ω, A, P) be a probability space, and let A_1, …, A_n ∈ A be pairwise disjoint. Then
P(⋃_{i≤n} A_i) = Σ_{i=1}^n P(A_i)   (P is additive).
Proposition 1.9. Let A be a σ-algebra, and P : A → R+ := [0, ∞) be a
mapping with P(Ω) = 1. Then the following are equivalent:
Proof.
P(⋃_{i=1}^∞ A_i) = lim_{n→∞} P(⋃_{i=1}^n A_i) ≤ lim_{n→∞} Σ_{i=1}^n P(A_i) = Σ_{i=1}^∞ P(A_i)   (using 1.9 and (1.1)).

Proof. Since
⋃_{m≥n} A_m ↓ ⋂_{n∈N} ⋃_{m≥n} A_m   as n → ∞,
the continuity from above of P implies that
P(lim sup_{n→∞} A_n) = lim_{n→∞} P(⋃_{m≥n} A_m) ≤ lim_{n→∞} Σ_{m=n}^∞ P(A_m) = 0   (using 1.10 and 1.9),
since Σ_{m=1}^∞ P(A_m) < ∞.

Example 1.12. (i) Uniform distribution on [0, 1]: Let Ω = [0, 1] and A be the Borel σ-algebra on Ω (= σ({[a, b] | 0 ≤ a ≤ b ≤ 1})). Let P be the restriction of the Lebesgue measure to the Borel subset [0, 1] of R. Then (Ω, A, P) is a probability space. The probability measure P is called the uniform distribution on [0, 1], since P([a, b]) = b − a for any 0 ≤ a ≤ b ≤ 1 (translation invariance).
if ωi ∈ Ω, i ∈ I.
2 Discrete models
Throughout the whole section
• Ω ≠ ∅ countable (i.e. finite or denumerable)
• A = P(Ω) and
(ii) Every probability measure P on (Ω, A) is of this form, with p(ω) := P({ω})
for all ω ∈ Ω.
Proof. (i)
P = Σ_{ω∈Ω} p(ω)·δ_ω.
(ii) Exercise.

p(ω) = 1/|Ω|   ∀ω ∈ Ω.
Then
Example 2.3. (i) random permutations: Let M := {1, . . . , n} and Ω :=
all permutations of M . Then |Ω| = n! Let P be the uniform distribution
on Ω.
Problem: What is the probability P(“at least one fixed point”)?
Consider the event Ai := {ω | ω(i) = i} (fixed point at position i). Then
Sylvester’s formula (cf. (1.2)) implies that
P("at least one fixed point") = P(⋃_{i=1}^n A_i)
= Σ_{k=1}^n (−1)^{k+1} · Σ_{1≤i_1<⋯<i_k≤n} P(A_{i_1} ∩ ⋯ ∩ A_{i_k})   (by (1.2))
= Σ_{k=1}^n (−1)^{k+1} · C(n, k) · (n − k)!/n!   (P(A_{i_1} ∩ ⋯ ∩ A_{i_k}) = (n − k)!/n!, since k positions are fixed)
= −Σ_{k=1}^n (−1)^k/k!.
Consequently,
P("no fixed point") = 1 + Σ_{k=1}^n (−1)^k/k! = Σ_{k=0}^n (−1)^k/k! → e^{−1}   as n → ∞.
Asymptotics as n → ∞:
P("exactly k fixed points") = (1/k!)·Σ_{j=0}^{n−k} (−1)^j/j! → (1/k!)·e^{−1}   as n → ∞.
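For illustration, the limit e^{−1} ≈ 0.3679 can be checked by simulation. The following Python sketch (not part of the original notes; the trial count is an arbitrary choice) samples uniformly random permutations and estimates the probability of having no fixed point:

```python
import random
from math import exp

def prob_no_fixed_point(n, trials=100_000):
    # Estimate P("no fixed point") for a uniformly random permutation of {0,...,n-1}.
    count = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        if all(perm[i] != i for i in range(n)):
            count += 1
    return count / trials

for n in (3, 5, 10):
    print(n, prob_no_fixed_point(n), "limit e^-1 =", exp(-1))
```

Already for moderate n the estimates are close to e^{−1}, in line with the asymptotics above.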
The Poisson distribution with parameter λ > 0 on N ∪ {0} is given by
π_λ := e^{−λ}·Σ_{j=0}^∞ (λ^j/j!)·δ_j.
Ω := {ω = (x_1, …, x_n) | x_i ∈ S},   |Ω| = |S|^n,

→ (λ^k/k!)·e^{−λ}   as n → ∞   (k = 0, 1, 2, …)
(for n big and p small, the Poisson distribution with parameter λ = p · n
is a good approximation for B(n, p)).
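For illustration, the quality of this approximation can be checked numerically; the sketch below (added here, with n = 1000 and p = 0.003 as arbitrary example values) compares the binomial weights with the Poisson weights for λ = p·n:

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003          # large n, small p
lam = n * p                 # λ = p·n
for k in range(6):
    print(k, round(binomial_pmf(n, p, k), 6), round(poisson_pmf(lam, k), 6))
```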
(iii) Urn model (for example: opinion polls, samples, poker, lottery...)
We consider an urn containing N balls, K red and N − K black (N ≥ 2, 0 ≠ K ≠ N). Suppose that n ≤ N balls are sampled without replacement. What is the probability that exactly k balls in the sample are red?
Typical application: suppose that a small lake contains an (unknown) number N of fish. To estimate N one can do the following: K fish are marked red, and after that n (n ≤ N) fish are "sampled" from the lake. If k is the number of marked fish in the sample, N̂ := K·(n/k) is an estimate of the unknown number N. In this case the probability below with N replaced by N̂ is also an estimate.
Model:
Let Ω be all subsets of {1, …, N} having cardinality n, hence
Ω := {ω ∈ P({1, …, N}) | |ω| = n},   |Ω| = C(N, n),
so that
P(A_k) = C(K, k)·C(N − K, n − k) / C(N, n)   (k = 0, …, n)   (hypergeometric distribution).
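A small Python sketch (added for illustration; the urn sizes and the fish-marking numbers below are hypothetical) evaluates these hypergeometric probabilities and the capture–recapture estimate N̂ = K·n/k:

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    # P(exactly k red balls in a sample of n drawn without replacement)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 10, 5   # hypothetical urn: 50 balls, 10 red, sample of 5
print([round(hypergeom_pmf(N, K, n, k), 4) for k in range(n + 1)])

# capture-recapture: K marked fish, n sampled, k marked fish seen in the sample
K_marked, n_sample, k_seen = 100, 60, 12
print("estimated population:", K_marked * n_sample / k_seen)
```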
3 Transformations of probability spaces
Throughout this section let (Ω, A) and (Ω̃, Ã) be measurable spaces.
Definition 3.1. A mapping T : Ω → Ω̃ is called A/Ã-measurable (or simply
measurable), if T −1 (Ã) ∈ A for all à ∈ Ã.
Notation:
(ii) Sufficient criterion for measurability: suppose that à := σ(Ã0 ) for some
collection of subsets Ã0 ⊂ P(Ω̃). Then T is A/Ã-measurable, if T −1 (Ã) ∈
A for all à ∈ Ã0 .
(iii) Let Ω, Ω̃ be topological spaces, and A, Ã be the associated Borel σ-
algebras. Then:
T : Ω → Ω̃ is continuous ⇒ T is A/Ã-measurable.
T2 ◦ T1 is A1 /A3 -measurable.
(iv) Exercise.
Definition 3.3. Let T : Ω̄ → Ω be a mapping and let A be a σ-algebra of subsets of Ω. The system
σ(T) := {T^{−1}(A) | A ∈ A}
Proposition 3.4. Let T : Ω → Ω̃ be A/Ã-measurable and P be a probability
measure on (Ω, A). Then P̃ := P ∘ T^{−1} is a probability measure on (Ω̃, Ã) (the image measure of P under T).
Proof. Clearly, P̃(Ã) ≥ 0 for all Ã ∈ Ã, P̃(∅) = 0 and P̃(Ω̃) = 1. For pairwise disjoint Ã_i ∈ Ã, i ∈ N, the sets T^{−1}(Ã_i) are pairwise disjoint too, hence
P̃(⋃_{i∈N} Ã_i) = P(T^{−1}(⋃_{i∈N} Ã_i)) = P(⋃_{i∈N} T^{−1}(Ã_i)) = Σ_{i=1}^∞ P(T^{−1}(Ã_i)) = Σ_{i=1}^∞ P̃(Ã_i),
using that P is σ-additive.

so that
P̃(Ã) = P(T ∈ Ã) = Σ_{i∈I} P(T = ω̃_i)·1_Ã(ω̃_i) = Σ_{i∈I} P(T = ω̃_i)·δ_{ω̃_i}(Ã).

Define X̃_i : Ω̃ → {0, 1} by
X̃_i((x_n)_{n∈N}) := x_i,   i ∈ N,
and let
Ã := σ({X̃_i = 1} | i ∈ N).

[Figure: graphs of two simple maps T_1(ω), T_2(ω) on Ω = [0, 1].]
4 Random variables
Let (Ω, A) be a measurable space and
R̄ := R ∪ {−∞, +∞},   B(R̄) := {B ⊂ R̄ | B ∩ R ∈ B(R)}.
(iii) Let X be a random variable on (Ω, A) with values in R (resp. R̄) and
h : R → R (resp. h : R̄ → R̄) be B(R)/B(R)-measurable (resp.
B(R̄)/B(R̄)-measurable). Then h(X) is a random variable too.
Examples: |X|, X 2 , |X|p , eX , . . .
(iv) The class of random variables on (Ω, A) is closed under the following
countable operations.
If X1 , X2 , . . . are random variables, then
• Σ_{i=1}^n α_i·X_i (α_i ∈ R, n ∈ N), provided the sum of R̄-valued r.v.'s makes sense (∞ − ∞ is not defined),
• sup_{i∈N} X_i, inf_{i∈N} X_i, in particular
X_1 ∧ X_2 := min(X_1, X_2),   X_1 ∨ X_2 := max(X_1, X_2)
are r.v.'s,
• lim sup_{i→∞} X_i, lim inf_{i→∞} X_i (hence also lim_{i→∞} X_i, if it exists),
are random variables too,
since for a real number x it holds: x > c ⇔ x ≥ q for some q ∈ Q, q > c.
Important examples
(ii) simple random variables:
X = Σ_{i=1}^n c_i·1_{A_i},   c_i ∈ R, A_i ∈ A,

(i) X = X⁺ − X⁻, with X⁺ := X ∨ 0 and X⁻ := (−X) ∨ 0.
(ii) Let X ≥ 0. Then there exists a sequence of simple random variables X_n, n ∈ N, with 0 ≤ X_n ≤ X_{n+1} and X = lim_{n→∞} X_n (in short: 0 ≤ X_n ↗ X).
Then
E[X] := ∫ X dP = ∫_Ω X dP
Definition/Construction of the integral w.r.t. P:
Let X be a random variable.
1. If X = 1_A, A ∈ A, define
∫ X dP := P(A).
2. If X = Σ_{i=1}^n c_i·1_{A_i}, c_i ∈ R, A_i ∈ A, define
∫ X dP := Σ_{i=1}^n c_i·P(A_i).
3. If X ≥ 0, choose simple random variables 0 ≤ X_n ↗ X (see (ii) above) and define
∫ X dP := lim_{n→∞} ∫ X_n dP ∈ [0, ∞].
In general,
E[X] = ∫ X dP := ∫ X⁺ dP − ∫ X⁻ dP.
Definition 4.6. (i) The set of all P-integrable random variables is defined by
𝓛¹ := 𝓛¹(Ω, A, P) := {X r.v. | E[|X|] < ∞}.
If
N := {X r.v. | X = 0 P-a.s.},
then
L¹ := L¹(Ω, A, P) := 𝓛¹/N   (X ∼ Y :⇔ X − Y ∈ N :⇔ X = Y P-a.s.)
is a Banach space w.r.t. the norm E[|X|].

E[X] = E[Σ_{x∈X(Ω)} x·1_{X=x}] = Σ_{x∈X(Ω)} x·P(X = x)   (1.4)   (by "2." and "3." above),
E[X] = Σ_{ω∈Ω} X(ω)·E[1_{{ω}}] = Σ_{ω∈Ω} X(ω)·P({ω}) = Σ_{ω∈Ω} p(ω)·X(ω),   where p(ω) := P({ω}).

Example 4.8. Infinitely many coin tosses with a fair coin: Let Ω = {0, 1}^N, A and P as in 3.6. Then
E[X_i] = 1·P(X_i = 1) + 0·P(X_i = 0) = 1/2   (by (1.4)).
(ii) Expectation of number of “successes”:
Sn := X1 + · · · + Xn = number of “successes”(= ones) in n tosses
Then for k = 0, 1, . . . , n
P(S_n = k) = Σ_{(x_1,…,x_n)∈{0,1}^n, x_1+⋯+x_n=k} P(X_1 = x_1, …, X_n = x_n) = C(n, k)·2^{−n}.
Hence
E[S_n] = Σ_{k=0}^n k·P(S_n = k) = Σ_{k=1}^n k·C(n, k)·2^{−n} = n/2   (by (1.4)).
(Recall: 1/(1 − q)² = d/dq (q/(1 − q)) = d/dq Σ_{k=1}^∞ q^k = Σ_{k=1}^∞ k·q^{k−1}.)

Proposition 4.10. Let X, Y be r.v. satisfying (1.3). Then
(i) 0 ≤ X ≤ Y P-a.s. ⟹ 0 ≤ E[X] ≤ E[Y].
(ii) X_n ≤ Y P-a.s. for all n ∈ N ⟹ E[lim sup_{n→∞} X_n] ≥ lim sup_{n→∞} E[X_n].

= lim_{n→∞} E[inf_{k≥n} X_k − Y] + E[Y]   (B. Levi)
≤ lim_{n→∞} inf_{k≥n} E[X_k − Y] + E[Y]   (4.10, 4.10(ii))
= lim inf_{n→∞} E[X_n].
Proposition 4.14 (Lebesgue's dominated convergence theorem, DCT). Let X_n, n ∈ N, be random variables and Y ∈ L¹ with |X_n| ≤ Y P-a.s. Suppose that the pointwise limit lim_{n→∞} X_n exists P-a.s. Then
E[lim_{n→∞} X_n] = lim_{n→∞} E[X_n].
it follows
E[lim_{n→∞} X_n] = E[lim inf_{n→∞} X_n] ≤ lim inf_{n→∞} E[X_n] ≤ lim sup_{n→∞} E[X_n] ≤ E[lim sup_{n→∞} X_n] = E[lim_{n→∞} X_n]   (using 4.9 and Fatou).
Example 4.15. Tossing a fair coin. Consider the following simple game: a fair coin is thrown and the player can invest an arbitrary amount of KRW on either "head" or "tail". If the correct side shows up, the player gets twice his investment back, otherwise nothing.
Suppose now a player plays the following bold strategy: he doubles his investment until his first success. Assuming the initial investment was 1000 KRW, the investment in the nth round is given by
I_n = 1000·2^{n−1} if the first n − 1 rounds were lost, and I_n = 0 otherwise,
so that E[I_n] = 1000·2^{n−1}·2^{−(n−1)} = 1000 for every n, whereas on the other hand lim_{n→∞} I_n = 0 P-a.s. (more precisely: for all ω ≠ (0, 0, 0, …)).
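The gap between the constant expectations and the almost sure limit 0 can be seen in a short simulation (an illustrative Python sketch, assuming the investment rule just described with initial stake 1000 KRW):

```python
import random

def investment(n, loss_prob=0.5):
    # I_n = 1000 * 2^(n-1) if the first n-1 rounds were all lost, else 0
    lost_all = all(random.random() < loss_prob for _ in range(n - 1))
    return 1000 * 2 ** (n - 1) if lost_all else 0

trials = 200_000
for n in (1, 5, 10, 15):
    mean = sum(investment(n) for _ in range(trials)) / trials
    print(f"n={n:2d}  E[I_n] ~ {mean:8.1f}   (exact value 1000)")
```

Along each fixed sequence of tosses the values I_n(ω) are eventually 0, so no integrable dominating random variable can exist.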
5 Inequalities
Let (Ω, A, P) be a probability space.
Proposition 5.1 (Jensen’s inequality). Let h be a convex function defined on
some interval I ⊆ R, X in L1 with X(Ω) ⊂ I. Then E[X] ∈ I and
h(E[X]) ≤ E[h(X)].

E[X]² ≤ E[X²].
More generally, for 0 < p ≤ q:
(E[|X|^p])^{1/p} ≤ (E[|X|^q])^{1/q}   (=: ‖X‖_p ≤ ‖X‖_q).
Proof. h(x) := |x|^{q/p} is convex. Since |X|^p ∧ n ∈ L¹ for n ∈ N, we obtain that
(E[|X|^p ∧ n])^{q/p} ≤ E[(|X|^p ∧ n)^{q/p}],
Proposition 5.5. Let X be a random variable and h : R̄ → [0, ∞] be increasing. Then
h(c)·P(X ≥ c) ≤ E[h(X)]   ∀ c > 0.
Proof.
h(c)·P(X ≥ c) ≤ h(c)·P(h(X) ≥ h(c)) = E[h(c)·1_{h(X)≥h(c)}] ≤ E[h(X)].

Corollary 5.6. (i) Markov inequality: Choose h(x) = x·1_{[0,∞)}(x) and replace X by |X| in 5.5. Then
P(|X| ≥ c) ≤ (1/c)·E[|X|]   ∀c > 0.
In particular,
E[|X|] = 0 ⇒ |X| = 0 P-a.s.,
E[|X|] < ∞ ⇒ |X| < ∞ P-a.s.
(ii) Chebychev's inequality: Choose h(x) = x²·1_{[0,∞)}(x) and replace X by |X − E[X]| in 5.5. Then X ∈ L² implies
P(|X − E[X]| ≥ c) ≤ (1/c²)·E[(X − E[X])²] = var(X)/c².
6 Variance and Covariance

Let X ∈ L². Then
var(X) := E[(X − E[X])²]
is called the variance of X (mean square prediction error).
The variance is a measure for the fluctuations of X around E[X]. It indicates the risk that one takes when a prognosis is based on the expectation. σ(X) := √(var(X)) is called the standard deviation.
(ii) var(X) = 0 ⇔ P(X = E[X]) = 1, i.e. X behaves deterministically.

var(aX + b) = a²·var(X).

cov(X, Y) = 0 ⇔ var(X + Y) = var(X) + var(Y).   (1.6)

Proposition 6.7 (Cauchy–Schwarz). Let X and Y ∈ L². Then
2·XY = (X + Y)² − X² − Y² ∈ L¹,
and for i ≠ j
cov(X_i, X_j) = E[X_i X_j] − p² = 0,
so that X1 , X2 , . . . are pairwise uncorrelated (in fact even independent, see
below).
Let S_n := X_1 + ⋯ + X_n be the number of successes. Then
and using Levi and the fact that X_1, X_2, … are pairwise uncorrelated, we conclude that
var(X) = Σ_{n=1}^∞ 2^{−2n}·p(1 − p) = (1/3)·p(1 − p).
Finally, let T be the waiting time until the first "success". Then
P_p(T = n) = P_p(X_1 = ⋯ = X_{n−1} = 0, X_n = 1) = (1 − p)^{n−1}·p   (geometric distribution),
then
E[T] = E[Σ_{n=1}^∞ n·1_{T=n}] = Σ_{n=1}^∞ n·P_p(T = n) = Σ_{n=1}^∞ n·(1 − p)^{n−1}·p = 1/p   ("derivative of the geometric series"),
and analogously
var(T) = ⋯ = (1 − p)/p².
7 The (strong and the weak) law of large numbers

Throughout this section let
• (Ω, A, P) be a probability space,
• X_1, X_2, … ∈ L² r.v. with
  – X_i uncorrelated, i.e. cov(X_i, X_j) = 0 for i ≠ j,
  – uniformly bounded variances, i.e. sup_{i∈N} var(X_i) < ∞   (var(X_i) = σ²(X_i) =: σ_i²).
Let
S_n := X_1 + ⋯ + X_n,
so that S_n(ω)/n is the arithmetic mean of the first n observations X_1(ω), …, X_n(ω) ("empirical mean").
Our aim in this section is to show that the randomness in the empirical mean vanishes for increasing n, i.e.
S_n(ω)/n ≈ m for n large,   if E[X_i] ≡ m.

Remark 7.1. W.l.o.g. we may assume that E[X_i] = 0 for all i, because otherwise we consider X̃_i := X_i − E[X_i] ("centered"), which satisfies:
• X̃_i ∈ L²,
• var(X̃_i) = var(X_i),
• S̃_n/n − E[S̃_n]/n = S_n/n − E[S_n]/n   (S̃_n := Σ_{i=1}^n X̃_i, the "centered sum").

Proposition 7.2.
lim_{n→∞} E[(S_n/n − E[S_n]/n)²] = 0
(resp. lim_{n→∞} E[(S_n/n − m)²] = 0 if E[X_i] ≡ m).
Proof.
E[(S_n/n − E[S_n]/n)²] = var(S_n/n) = (1/n²)·var(S_n) = (1/n²)·Σ_{i=1}^n σ_i²   (Bienaymé)
≤ (1/n)·const → 0   as n → ∞.
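The 1/n decay in Proposition 7.2 is easy to observe numerically; the following Python sketch (an illustration added here, with uniform [0, 1] observations and m = 1/2 as an arbitrary choice) estimates the mean square deviation of the empirical mean:

```python
import random

def mean_square_error(n, trials=2000, m=0.5):
    # Estimate E[(S_n/n - m)^2] for i.i.d. uniform(0,1) observations
    total = 0.0
    for _ in range(trials):
        s = sum(random.random() for _ in range(n))
        total += (s / n - m) ** 2
    return total / trials

for n in (10, 100, 1000):
    print(n, mean_square_error(n))   # decays roughly like var(X_1)/n = 1/(12 n)
```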
Example 7.6. Application: uniform approximation of f ∈ C [0, 1] with
Bernstein polynomials
Let p ∈ [0, 1]. Then by the transformation theorem (see assignments)
B_n(p) := Σ_{k=0}^n f(k/n)·C(n, k)·p^k·(1 − p)^{n−k} = Σ_{k=0}^n f(k/n)·P_p[S_n = k] = E_p[f(S_n/n)].
Now
|B_n(p) − f(p)| = |E_p[f(S_n/n) − f(p)]| ≤ E_p[|f(S_n/n) − f(p)|]
= E_p[|f(S_n/n) − f(p)|·1_{|S_n/n − p| ≤ δ}] + E_p[|f(S_n/n) − f(p)|·1_{|S_n/n − p| > δ}]
≤ ε·P_p[|S_n/n − p| ≤ δ] + 2‖f‖_∞·P_p[|S_n/n − p| > δ],
where P_p[|S_n/n − p| > δ] ≤ (1/(δ²n))·p(1 − p) ≤ 1/(4δ²n).
Consequently,
sup_{p∈[0,1]} |B_n(p) − f(p)| ≤ ε + 2‖f‖_∞/(4δ²n) → ε   as n → ∞;
since ε > 0 was arbitrary (with δ = δ(ε) from the uniform continuity of f), B_n → f uniformly on [0, 1].
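For illustration, the uniform convergence of the Bernstein polynomials can be checked numerically; the Python sketch below (added here, with the hypothetical test function f(x) = |x − 1/2|) evaluates B_n on a grid and reports the maximal error:

```python
from math import comb

def bernstein(f, n, p):
    # B_n(p) = sum_k f(k/n) * C(n,k) * p^k * (1-p)^(n-k)
    return sum(f(k / n) * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

f = lambda x: abs(x - 0.5)
for n in (10, 50, 200):
    grid = [i / 100 for i in range(101)]
    err = max(abs(bernstein(f, n, p) - f(p)) for p in grid)
    print(n, round(err, 4))   # maximal error shrinks as n grows
```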
implies
Z_n(ω) < 1/k for all but finitely many n ∈ N,

lim_{k→∞} S_{k²}(ω)/k² = 0   ∀ω ∉ N_1 with P(N_1) = 0.

2. Step: Let D_k := max_{k² ≤ l < (k+1)²} |S_l − S_{k²}|. We show fast convergence in probability of D_k/k² to 0. For all ε > 0:
P(D_k/k² > ε) = P(⋃_{l=k²+1}^{k²+2k} {|S_l − S_{k²}| > εk²})
≤ Σ_{l=k²+1}^{k²+2k} P(|S_l − S_{k²}| > εk²)   (Chebychev: P(|S_l − S_{k²}| > εk²) ≤ (l − k²)·c/(ε²k⁴) ≤ 2k·c/(ε²k⁴))
≤ (2k)·(2k)·c/(ε²k⁴) = 4c/(ε²k²).
Lemma 7.7 now implies that
lim_{k→∞} D_k(ω)/k² = 0   ∀ω ∉ N_2 with P(N_2) = 0.
S_n = Y_1 + ⋯ + Y_n = position of a particle undergoing a "random walk" on Z.
Increasing refinement (+ rescaling and linear interpolation) of the random walk yields the Brownian motion.
The strong law of large numbers implies that S_n(ω)/n → 0 P-a.s.
In particular, the fluctuations grow slower than linearly.
lim sup_{n→∞} S_n(ω)/√(2n·log log n) = +1 P-a.s.,
lim inf_{n→∞} S_n(ω)/√(2n·log log n) = −1 P-a.s.
8 Convergence and uniform integrability

(iii) P-a.s. convergence:
P(lim_{n→∞} X_n = X) = 1.

The following implications hold:
• (i) ⟹ (ii),
• (iii) ⟹ (ii),
• (ii) ⟹ (iii) along some subsequence,
• (iii) ⟹ (i), if sup_{n∈N} |X_n| ∈ L^p (resp. if |X_n|^p is uniformly integrable).
(iii)⇒(ii):
{lim_{n→∞} X_n = X} = ⋂_{k=1}^∞ A_k,   where A_k := ⋃_{m=1}^∞ ⋂_{n≥m} {|X_n − X| ≤ 1/k}.
1 = P(A_k) = lim_{m→∞} P(⋂_{n≥m} {|X_n − X| ≤ 1/k})   (by 1.9)
≤ lim inf_{m→∞} P(|X_m − X| ≤ 1/k) ≤ lim sup_{m→∞} P(|X_m − X| ≤ 1/k) ≤ 1.
Consequently,
lim_{m→∞} P(|X_m − X| > 1/k) = 0.

(iii)⇒(i): Y := sup_{n∈N} |X_n| ∈ L^p and lim_{n→∞} X_n = X P-a.s. imply |X| ≤ Y.
In particular, |X_n − X|^p ≤ 2^p·Y^p ∈ L¹.
lim_{n→∞} |X_n − X|^p = 0 P-a.s. together with Lebesgue's dominated convergence now implies
lim_{n→∞} E[|X_n − X|^p] = 0.
• In general (i) ⇏ (iii) and (iii) ⇏ (i) (hence (ii) ⇏ (i) too). For examples, see Exercises.
Definition 8.4. Let I be an index set. A family (X_i)_{i∈I} ⊂ L¹ of r.v. is called uniformly integrable if
lim_{c→∞} sup_{i∈I} ∫_{{|X_i|>c}} |X_i| dP = 0.
Note that by Lebesgue's theorem ∫_{{|X_i|>c}} |X_i| dP = E[1_{|X_i|>c}·|X_i|] → 0 as c → ∞ for each fixed i ∈ I.

The next Proposition is the definitive version of Lebesgue's theorem on dominated convergence.
Proposition 8.5 (Vitali convergence theorem). Let X_n ∈ L¹, n ≥ 1, and X be r.v. Then the following statements are equivalent:
(i) lim_{n→∞} X_n = X in L¹.
Lemma 8.7 (ε-δ criterion). Let (Xi )i∈I ⊂ L1 . Then the following statements
are equivalent:
(i) (Xi )i∈I is uniformly integrable.
≤ c + 1 < ∞.
For δ := ε/(2c) and A ∈ A with P(A) < δ we now conclude
∫_A |X_i| dP = ∫_{A∩{|X_i|<c}} |X_i| dP + ∫_{A∩{|X_i|≥c}} |X_i| dP ≤ c·∫_A dP + ∫_{{|X_i|≥c}} |X_i| dP < c·P(A) + ε/2 < ε.

(ii)⇒(i): Let ε > 0 and δ be as in (ii). Using Markov's inequality (and the two properties in (ii)), we get for any i ∈ I
P(|X_i| > c) ≤ (1/c)·E[|X_i|] < δ,   if c > (sup_{i∈I} E[|X_i|] + 1)/δ,
hence ∫_{{|X_i|>c}} |X_i| dP < ε   ∀ i ∈ I.

Remark 8.8. (i) Existence of a dominating integrable r.v. implies uniform integrability: |X_i| ≤ Y ∈ L¹ ∀i ∈ I
⇒ ∫_{{|X_i|>c}} |X_i| dP ≤ ∫_{{Y>c}} Y dP = E[1_{Y>c}·Y] → 0 as c ↗ ∞ (DCT),
since 1_{Y>c}·Y → 0 P-a.s. as c → ∞ (Markov's inequality).
In particular, I finite ⇒ (X_i)_{i∈I} ⊂ L¹ uniformly integrable.
(see Exercises).
Proof of Proposition 8.5. (i)⇒(ii): see Exercises. (Hint: Use Lemma 8.7).
E[|X|] = E[lim inf_{k→∞} |X_{n_k}|] ≤ lim inf_{k→∞} E[|X_{n_k}|]   (Fatou)
≤ sup_{n∈N} E[|X_n|] < ∞.

Let ε > 0. Then there exists δ > 0 such that for all A ∈ A with P(A) < δ it follows that ∫_A |X_n| dP < ε/2 for any n ∈ N.
E[|X_n|] = ∫_{{|X_n|<ε/2}} |X_n| dP + ∫_{{|X_n|≥ε/2}} |X_n| dP < ε   (the first integral is ≤ ε/2, the second < ε/2),

⇒ lim_{n→∞} X_n = X in L^p.
Proposition 8.10. Let g : [0, ∞) → [0, ∞) be measurable with lim_{x→∞} g(x)/x = ∞. Then sup_{i∈I} E[g(|X_i|)] < ∞ implies that (X_i)_{i∈I} is uniformly integrable.
Proof. Let ε > 0. Choose c > 0 such that g(x)/x > (1/ε)·(sup_{i∈I} E[g(|X_i|)] + 1) for x ≥ c. Then for all i ∈ I
∫_{{|X_i|>c}} |X_i| dP = ∫_{{|X_i|>c}} g(|X_i|)·(|X_i|/g(|X_i|)) dP
≤ (ε/(sup_{j∈I} E[g(|X_j|)] + 1))·∫_{{|X_i|>c}} g(|X_i|) dP ≤ ε.
Example 8.11. (i) p > 1, supi E[|Xi |p ] < ∞ ⇒ (Xi )i∈I uniformly inte-
grable
Proof: g(x) := x·log⁺(x) is monotone increasing and convex. Consequently,
E[g(|S_n|/n)] ≤ E[g((1/n)·Σ_{i=1}^n |X_i|)]   (monotonicity)
≤ (1/n)·Σ_{i=1}^n E[g(|X_i|)]   (convexity)
≤ sup_{1≤i≤n} E[g(|X_i|)]   ∀n,
and so
sup_{n∈N} E[g(|S_n|/n)] ≤ sup_{i∈N} E[g(|X_i|)] < ∞.
Consequently, (S_n/n)_{n∈N} is uniformly integrable and (1.9) holds. Thus by Proposition 8.5
lim_{n→∞} S_n/n = m in L¹.
One complementary remark concerning Lebesgue’s dominated convergence
theorem.
lim_{n→∞} X_n = X in L¹
"⇐":
X + X_n = X ∨ X_n + X ∧ X_n   (X ∨ X_n := sup{X, X_n}, X ∧ X_n := inf{X, X_n}).
Then
lim_{n→∞} E[X ∧ X_n] = E[X]   (Lebesgue)
and thus

L^p-completeness
Proposition 8.14 (L^p-completeness, Riesz–Fischer). Let 1 ≤ p < ∞ and X_n ∈ L^p with
lim_{n,m→∞} ∫ |X_n − X_m|^p dP = 0.

(ii) lim_{n→∞} X_n = X in L^p.
9 Distribution of random variables
Let (Ω, A, P) be a probability space, and X : Ω → R̄ be a r.v.
Let µ be the distribution of X (under P), i.e., µ(A) = P(X ∈ A) for all
A ∈ B(R̄).
Assume that P(X ∈ R) = 1 (in particular, X is P-a.s. finite), so that µ is a probability measure on (R, B(R)).
F(b) := P(X ≤ b) = µ((−∞, b]),   b ∈ R,   (1.10)
(ii) Existence: Let λ be the Lebesgue measure on (0, 1). Define the "inverse function" G of F : R → [0, 1] by
G : (0, 1) → R,   G(y) := inf{x ∈ R | F(x) ≥ y}.
Since 0 < y < F(x) ⇒ G(y) ≤ x, we have
(0, F(x)) ⊂ {G ≤ x},
so that G is measurable.
Let µ := G(λ) = λ ∘ G^{−1} (a probability measure on (R, B(R))). Then
µ((−∞, x]) = λ({G ≤ x}) = λ((0, F(x)]) = F(x)   ∀x ∈ R.
Uniqueness: later.
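This construction is exactly inverse-transform sampling: if U is uniform on (0, 1), then G(U) has distribution function F. A small Python sketch (added for illustration; the exponential distribution with α = 2 is an arbitrary example where G can be written down in closed form):

```python
import random
from math import log

# For F(x) = 1 - exp(-alpha*x), the "inverse" is G(y) = inf{x : F(x) >= y} = -log(1-y)/alpha.
def G(y, alpha=2.0):
    return -log(1.0 - y) / alpha

samples = [G(random.random()) for _ in range(100_000)]
print("empirical mean:", sum(samples) / len(samples), " (theoretical 1/alpha = 0.5)")
```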
Definition 9.5. (i) F (resp. µ) is called discrete, if there exists a countable set S ⊂ R with µ(S) = 1. In this case, µ is uniquely determined by the weights µ({x}), x ∈ S, and F is a step function of the following type:
F(x) = Σ_{y∈S, y≤x} µ({y}).
(ii) (Continuous) exponential distribution with parameter α > 0:
f(x) := αe^{−αx} if x ≥ 0,   and f(x) := 0 if x < 0,
F(x) = ∫_{−∞}^x f(t) dt = 1 − e^{−αx} if x ≥ 0,   and F(x) = 0 if x < 0.

= (1 − p)^k·p = P(X = k).)
Proof. See Assignments.
In particular:
p = 1: E[|X − m|] = σ·√(2/π),
p = 2: E[|X − m|²] = σ²,
p = 3: E[|X − m|³] = 2^{3/2}·σ³/√π,
p = 4: E[|X − m|⁴] = 3σ⁴.
10 Weak convergence of probability measures

Let (S, d) be a metric space and S := B(S).

Definition 10.1. Let µ and µ_n, n ∈ N, be probability measures on (S, S). The sequence (µ_n) converges to µ weakly if for all f ∈ C_b(S) (= the space of bounded continuous functions on S) it follows that
∫ f dµ_n → ∫ f dµ   as n → ∞.

(i) µ_n → µ weakly
(ii) ∫ f dµ_n → ∫ f dµ for all f bounded and uniformly continuous (w.r.t. d)
(iii) lim sup_{n→∞} µ_n(F) ≤ µ(F) for all F ⊂ S closed
(iv) lim inf_{n→∞} µ_n(G) ≥ µ(G) for all G ⊂ S open
(v) lim_{n→∞} µ_n(A) = µ(A) for all µ-continuity sets A, i.e. ∀ A ∈ S with µ(∂A) = 0
(vi) ∫ f dµ_n → ∫ f dµ for all f bounded, measurable and µ-a.s. continuous.
(ii)⇒(iii): Let F ⊂ S be closed and define d(x, F) := inf_{y∈F} d(x, y), x ∈ S. The sets
G_m := {x ∈ S | d(x, F) < 1/m},   m ∈ N,   are open.
Define
φ(x) := 1 if x ≤ 0,   φ(x) := 1 − x if x ∈ [0, 1],   φ(x) := 0 if x ≥ 1.
(iii)⇒(v): For a subset A ⊂ S we denote the closure by Ā, the interior by Å,
and the boundary by ∂A. Let A be such that µ(Ā \ Å) = µ(∂A) = 0.
Then
µ(A) = µ(Å) ≤ lim inf_{n→∞} µ_n(Å) ≤ lim inf_{n→∞} µ_n(A) ≤ lim sup_{n→∞} µ_n(A) ≤ lim sup_{n→∞} µ_n(Ā) ≤ µ(Ā) = µ(A)   (using (iv) and (iii)).
(v)⇒(vi): Let f be as in (vi). The distribution function F(x) = µ({f ≤ x}) has at most countably many jumps. Thus D := {x ∈ R | µ({f = x}) ≠ 0} is at most countable, and so R \ D ⊂ R is dense. By denseness and since f is bounded: for any ε > 0 we find c_0 < ⋯ < c_m ∈ R \ D with
D_f := {x ∈ R | f is not continuous at x}
and thus by (1.13)
B(f(ω), ε) ∩ [c_k, c_{k+1}) ≠ ∅ ≠ B(f(ω), ε) ∩ (R \ [c_k, c_{k+1})).

Let g := Σ_{k=0}^{m−1} c_k·1_{A_k}. Then ‖f − g‖_∞ ≤ ε and
|∫ f dµ − ∫ f dµ_n| ≤ ∫ |f − g| dµ + |∫ g dµ − ∫ g dµ_n| + ∫ |g − f| dµ_n
≤ 2ε + Σ_{k=0}^{m−1} |c_k|·|µ(A_k) − µ_n(A_k)| → 2ε   as n → ∞, by (v).
Proof. Let f ∈ C_b(S) be uniformly continuous and ε > 0. Then there exists a δ = δ(ε) > 0 such that x, y ∈ S with d(x, y) ≤ δ implies |f(x) − f(y)| ≤ ε. Hence
|∫ f dµ − ∫ f dµ_n| = |E[f(X)] − E[f(X_n)]|
≤ ∫_{{d(X,X_n)≤δ}} |f(X) − f(X_n)| dP + ∫_{{d(X,X_n)>δ}} |f(X) − f(X_n)| dP
≤ ε + 2‖f‖_∞·P(d(X_n, X) > δ),
and P(d(X_n, X) > δ) → 0 as n → ∞.
(ii) µ_n → µ weakly as n → ∞
(iii) F_n(x) → F(x) as n → ∞ for all x where F is continuous
(iv) µ_n((a, b]) → µ((a, b]) as n → ∞ for all (a, b] with µ({a}) = µ({b}) = 0.

F_n(x) = µ_n((−∞, x]) → µ((−∞, x]) = F(x).

µ((a, b]) = F(b) − F(a) = lim_{n→∞} F_n(b) − lim_{n→∞} F_n(a) = lim_{n→∞} µ_n((a, b])   (by (iii)).

‖f − Σ_{k=1}^m f(c_{k−1})·1_{(c_{k−1},c_k]}‖_∞ ≤ sup_{1≤k≤m} sup_{x∈[c_{k−1},c_k]} |f(x) − f(c_{k−1})| ≤ ε   (set g := Σ_{k=1}^m f(c_{k−1})·1_{(c_{k−1},c_k]}),
and so
|∫ f dµ − ∫ f dµ_n| ≤ ∫ |f − g| dµ + |∫ g dµ − ∫ g dµ_n| + ∫ |f − g| dµ_n
≤ 2ε + Σ_{k=1}^m |f(c_{k−1})|·|µ((c_{k−1}, c_k]) − µ_n((c_{k−1}, c_k])| → 2ε   as n → ∞, by (iv).
11 Dynkin-systems and Uniqueness of probability measures

A system D of subsets of Ω is called a Dynkin-system if
(i) Ω ∈ D,
(ii) A ∈ D ⇒ A^c ∈ D,
(iii) A_i ∈ D, i ∈ N, pairwise disjoint ⇒ ⋃_{i∈N} A_i ∈ D.

D := {A ∈ A | P_1(A) = P_2(A)}
is a Dynkin-system.
Remark 11.3. (i) Let D be a Dynkin-system. Then
A, B ∈ D, A ⊂ B ⇒ B \ A = (B^c ∪ A)^c ∈ D.
(ii) Every Dynkin-system which is closed under finite intersections (short no-
tation: ∩-stable), is a σ-algebra, because:
(a) A, B ∈ D ⇒ A ∪ B = A ∪ (B \ (A ∩ B)) ∈ D, since A ∩ B ∈ D by assumption and B \ (A ∩ B) ∈ D by (i), and the union is disjoint.
(b) A_i ∈ D, i ∈ N ⇒ ⋃_{i∈N} A_i = ⋃_{i∈N} (A_i \ ⋃_{n=1}^{i−1} A_n) = ⋃_{i∈N} (A_i ∩ (⋃_{n=1}^{i−1} A_n)^c) ∈ D, since the sets A_i \ ⋃_{n=1}^{i−1} A_n are pairwise disjoint and belong to D by (a) and the assumption.
σ(B) = D(B) ,
where
D(B) := ⋂_{D Dynkin-system, B ⊂ D} D.

⟹ E ∩ D = D ∩ E ∈ D(B).
The latter implies B ⊂ D_D, hence D(B) ⊂ D_D.

D := {A ∈ A | P_1(A) = P_2(A)}

{∅, {X_1 = x_1, …, X_n = x_n} | n ∈ N, x_1, …, x_n ∈ {0, 1}}
Definition 11.7. Let H be a vector space of real-valued bounded functions
on Ω. H is called a monotone vector space (MVS), if:
Then
⇒ lim_{k→∞} g_k = f + 2a·1 ∈ H ⇒ f ∈ H.
σ(M) := “smallest σ-algebra for which all f ∈ M are measurable”
The next theorem plays the same role in measure theory and probability theory
as the Stone-Weierstrass Theorem in analysis.
Claim: f ∈ M0 ⇒ f ∧ α ∈ M0 , ∀α ∈ R.
σ(M0 ) = {A ∈ σ(M0 ) | 1A ∈ H} =: S.
“⊃”: Clear
“⊂”: S is a Dynkin system. Put
E := {A ∈ σ(M0 ) | ∃fn ∈ M0 , fn ≥ 0, n ≥ 1 : fn % 1A }.
For f ∈ M_0, α ∈ R, we have:
(n·(f − α)⁺) ∧ 1 ↗ 1_{{f>α}} as n → ∞ (each term lies in M_0) ⇒ {f > α} ∈ E,
and such sets generate σ(M_0).
How big is σ(Cb (S)) ? Clearly σ(Cb (S)) ⊂ B(S) since every continuous func-
tion on S is measurable w.r.t. B(S). Let F ⊂ S be closed and d(x, F ) :=
inf y∈F d(x, y), x ∈ S. Then d(·, F ) is Lipschitz continuous and so f :=
d(·, F ) ∧ 1 ∈ Cb (S). Moreover
F = {f = 0} ∈ σ(Cb (S)).
Hence B(S) ⊂ σ(Cb (S)). In particular 1A ∈ σ(Cb (S))b for all A ∈ B(S),
hence µ1 = µ2 .
2 Independence
1 Independent events
Let (Ω, A, P ) be a probability space.
are independent.
To this end suppose first that A_{j_2} ∈ B_{j_2}, …, A_{j_n} ∈ B_{j_n}, and define
Then D_{j_1} is a Dynkin system (!) containing B_{j_1}. Proposition 1.11.4 now implies
hence σ(B_{j_1}) = D_{j_1}. Iterating the above argument for D_{j_2}, D_{j_3}, … implies (2.1).
Remark 1.4. Pairwise independence does not imply independence in general.
Example: Consider two tosses with a fair coin, i.e.
Ω := {(i, k) | i, k ∈ {0, 1}},   P := uniform distribution.
P(A ∩ B) = P(B ∩ C) = P(C ∩ A) = 1/4.
But on the other hand
P(A ∩ B ∩ C) = 1/4 ≠ P(A)·P(B)·P(C).
Example 1.5. Independent 0-1-experiments with success probability
p ∈ [0, 1]. Let Ω := {0, 1}N , Xi (ω) := xi and ω := (xi )i∈N . Let Pp be a
probability measure on A := σ {Xi = 1}, i = 1, 2, . . . , with
Let B_∞ := ⋂_{n∈N} σ(⋃_{m≥n} B_m) be the tail σ-algebra (resp. the σ-algebra of terminal events). Then
P(A) ∈ {0, 1}   ∀ A ∈ B_∞,
i.e., P is deterministic on B_∞.
Proof of the Zero-One Law. Proposition 1.2 implies that for all n
B_1, B_2, …, B_{n−1}, σ(⋃_{m=n}^∞ B_m)
are independent. Since B_∞ ⊂ σ(⋃_{m≥n} B_m), this implies that for all n
B_1, B_2, …, B_{n−1}, B_∞
are independent, hence
B_∞, B_n, n ∈ N, are independent
Lemma 1.7. Let B ⊂ A be a σ-algebra such that B is independent from B.
Then
P (A) ∈ {0, 1} ∀A ∈ B.
Σ_{i=1}^∞ P(A_i) < ∞ ⇒ P(lim sup_{i→∞} A_i) = 0.

Σ_{i=1}^∞ P(A_i) = ∞ ⇒ P(lim sup_{i→∞} A_i) = 1.

P(⋃_{m=n}^∞ A_m) = 1,   resp.   P(⋂_{m=n}^∞ A_m^c) = 0,   ∀n.
The last equality follows from the fact that
P(⋂_{m=n}^∞ A_m^c) = lim_{k→∞} P(⋂_{m=n}^{n+k} A_m^c) = lim_{k→∞} ∏_{m=n}^{n+k} P(A_m^c)   (independence)
= lim_{k→∞} ∏_{m=n}^{n+k} (1 − P(A_m)) ≤ lim_{k→∞} exp(−Σ_{m=n}^{n+k} P(A_m)) = 0.
and consider the events Ai = "text occurs in the ith block". Clearly, Ai , i ∈ N,
are independent events (!) by Proposition 1.2(ii) with equal probability
P_p(A_i) = p^K·(1 − p)^{N−K} =: α > 0,
where K := Σ_{i=1}^N x_i is the total number of ones in the text. In particular,
Σ_{i=1}^∞ P_p(A_i) = Σ_{i=1}^∞ α = ∞,
and now Borel–Cantelli implies P_p(A_∞) = 1, where
A_∞ = lim sup_{i→∞} A_i := "the text occurs infinitely many times".
Moreover: since the indicator functions 1_{A_1}, 1_{A_2}, … are uncorrelated (since A_1, A_2, … are independent) with uniformly bounded variances, the strong law of large numbers implies that
(1/n)·Σ_{i=1}^n 1_{A_i} → E[1_{A_1}] = α   P_p-a.s.,
i.e. the relative frequency of the given text among the blocks of the infinite sequence is strictly positive.
2 Independent random variables
Let (Ω, A, P ) be a probability space.
are independent, i.e. for all finite subsets J ⊂ I and any Borel subsets A_j ∈ B(R̄)
P(⋂_{j∈J} {X_j ∈ A_j}) = ∏_{j∈J} P[X_j ∈ A_j].
Proof. W.l.o.g. n = 2. (Proof of the general case by induction, using the fact that X_1·…·X_{n−1} and X_n are independent, since X_1·…·X_{n−1} is measurable w.r.t. σ(σ(X_1) ∪ ⋯ ∪ σ(X_{n−1})), and σ(σ(X_1) ∪ ⋯ ∪ σ(X_{n−1})) and σ(X_n) are independent by Proposition 1.2.)
It therefore suffices to consider two independent r.v. X, Y ≥ 0, and we have to show that

with α_i, β_j ≥ 0 and A_i ∈ σ(X) resp. B_j ∈ σ(Y) it follows that
E[XY] = Σ_{i,j} α_i β_j·P(A_i ∩ B_j) = Σ_{i,j} α_i β_j·P(A_i)·P(B_j) = E[X]·E[Y].

Proof. Let ε_1, ε_2 ∈ {+, −}. Then X^{ε_1} and Y^{ε_2} are independent by Remark 2.2 and nonnegative. Proposition 2.3 implies
X·Y = X⁺·Y⁺ + X⁻·Y⁻ − (X⁺·Y⁻ + X⁻·Y⁺) ∈ L¹,

Remark 2.5. (i) In general the converse to the above corollary does not hold: for example let X be N(0, 1)-distributed and Y = X². Then X and Y are not independent, but
(ii) X, Y ∈ L² independent ⇒ X, Y uncorrelated,
because

If E[X_i] ≡ m, then lim_{n→∞} (1/n)·Σ_{i=1}^n X_i = m P-a.s.
3 Kolmogorov’s law of large numbers
Proposition 3.1 (Kolmogorov, 1930). Let X_1, X_2, ⋯ ∈ L¹ be independent, identically distributed (i.i.d.), m = E[X_i]. Then
(1/n)·Σ_{i=1}^n X_i → m P-a.s.   as n → ∞   (the "empirical mean").

(1/n)·Σ_{i=1}^n X_i → m P-a.s.   as n → ∞.
Clearly,
X̃_i = h_i(X_i) with h_i(x) := x if x < i, and h_i(x) := 0 if x ≥ i.
Then X̃_1, X̃_2, … are pairwise independent by Remark 2.2. For the proof it is now sufficient to show that for S̃_n := Σ_{i=1}^n X̃_i we have that
S̃_n/n → m P-a.s.   as n → ∞.
Indeed,
Σ_{n=1}^∞ P[X_n ≠ X̃_n] = Σ_{n=1}^∞ P[X_n ≥ n] = Σ_{n=1}^∞ P[X_1 ≥ n]
= Σ_{n=1}^∞ Σ_{k=n}^∞ P(X_1 ∈ [k, k+1)) = Σ_{k=1}^∞ k·P(X_1 ∈ [k, k+1))
= E[Σ_{k=1}^∞ k·1_{X_1∈[k,k+1)}] ≤ E[X_1] < ∞,
since k·1_{X_1∈[k,k+1)} ≤ X_1·1_{X_1∈[k,k+1)}.

Hence for all ω ∉ N_α we get
lim_{l→∞} S̃_l(ω)/l = m.
3. Due to Lemma 1.7.7 it suffices for the proof of (2.3) to show that
∀ε > 0 :  Σ_{n=1}^∞ P[|S̃_{k_n} − E[S̃_{k_n}]|/k_n > ε] < ∞.
We will show in the following that there exists a constant c such that
Σ_{n : k_n ≥ i} 1/k_n² ≤ c/i².   (2.4)
⌊α^n⌋ = k_n ≤ α^n < k_n + 1
⇒ k_n > α^n − 1 ≥ α^n − α^{n−1} = ((α − 1)/α)·α^n =: c_α·α^n   (α > 1).
Let n_i be the smallest natural number satisfying k_{n_i} = ⌊α^{n_i}⌋ ≥ i, hence α^{n_i} ≥ i; then
Σ_{n : k_n ≥ i} 1/k_n² ≤ c_α^{−2}·Σ_{n ≥ n_i} 1/α^{2n} = c_α^{−2}·(1/(1 − α^{−2}))·α^{−2n_i} ≤ (c_α^{−2}/(1 − α^{−2}))·(1/i²).
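For illustration, the almost sure convergence in Kolmogorov's law of large numbers can be watched along a single simulated path; the Python sketch below (added here, with exponentially distributed X_i of mean 1 as an arbitrary choice) records the empirical means along one fixed ω:

```python
import random

random.seed(1)                    # one fixed path ("one ω")
running_sum, checkpoints = 0.0, []
for n in range(1, 100_001):
    running_sum += random.expovariate(1.0)   # X_n ~ Exp(1), E[X_n] = 1
    if n in (10, 100, 1000, 10_000, 100_000):
        checkpoints.append((n, running_sum / n))
print(checkpoints)   # S_n/n approaches m = 1 along this path
```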
Corollary 3.3. Let X_1, X_2, … be pairwise independent, identically distributed with X_i ≥ 0. Then
lim_{n→∞} (1/n)·Σ_{i=1}^n X_i = E[X_1] ∈ [0, ∞]   P-a.s.
Proof. W.l.o.g. E[X_1] = ∞. Then (1/n)·Σ_{i=1}^n X_i(ω) ∧ N → E[X_1 ∧ N] as n → ∞, P-a.s. for all N, hence
(1/n)·Σ_{i=1}^n X_i ≥ (1/n)·Σ_{i=1}^n X_i ∧ N → E[X_1 ∧ N] ↗ E[X_1]   (n → ∞, then N → ∞)   P-a.s.

and
α < 0: ∃ ε > 0 with α + ε < 0, so that X_n(ω) ≤ e^{n(α+ε)} ∀ n ≥ n_0(ω), hence P-a.s. exponential decay;
α > 0: ∃ ε > 0 with α − ε > 0, so that X_n(ω) ≥ e^{n(α−ε)} ∀ n ≥ n_0(ω), hence P-a.s. exponential growth.

Note that by Jensen's inequality
α = E[log Y_1] ≤ log E[Y_1] = log m,
and typically the inequality is strict, i.e. α < log m, so that it might happen that α < 0 although m > 1 (!)
Illustration. As a particular example consider the following model:
Let X_0 := 1 be the capital at time 0. At time n − 1 invest (1/2)·X_{n−1} and win c·(1/2)·X_{n−1} or 0, both with probability 1/2. Then
X_n = (1/2)·X_{n−1} (not invested) + gain/loss = X_{n−1}·Y_n,
with
Y_n := (1/2)·(1 + c) with prob. 1/2,   Y_n := 1/2 with prob. 1/2,
so that E[Y_i] = (1/4)·(1 + c) + 1/4 = (c + 2)/4   (supercritical if c > 2).
On the other hand
E[log Y_1] = (1/2)·log((1/2)·(1 + c)) + (1/2)·log(1/2) = (1/2)·log((1 + c)/4) < 0   for c < 3.
Hence X_n → 0 P-a.s. with exponential rate for c < 3, whereas at the same time for c > 2, E[X_n] = m^n ↗ ∞ with exponential rate.
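A quick simulation makes this contrast visible (an illustrative Python sketch; c = 2.5 is a hypothetical choice between the two thresholds 2 and 3):

```python
import random
from math import log

c = 2.5                      # 2 < c < 3: E[Y] > 1 but E[log Y] < 0

def simulate(n):
    x = 1.0
    for _ in range(n):
        x *= 0.5 * (1 + c) if random.random() < 0.5 else 0.5
    return x

n = 200
print("typical X_n values:", [simulate(n) for _ in range(10)])   # almost surely tiny
print("E[X_n] = m^n =", ((c + 2) / 4) ** n)                       # exponentially large
print("E[log Y_1] =", 0.5 * log((1 + c) / 4))                     # negative
```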
"random measure"
n
1X
%n (ω, A) := 1A Xi (ω)
n
i=1
is a P -null set too, and for all x ∈ R and all s, r ∈ Q with s < x < r and
ω∈ / N:
F (s) = lim Fn (ω, s) 6 lim inf Fn (ω, x)
n→∞ n→∞
77
4 Joint distribution and convolution
Let X_i ∈ L¹ be i.i.d. Kolmogorov's law of large numbers implies that
(1/n)·Σ_{i=1}^n X_i(ω) = S_n(ω)/n → E[X_1]   P-a.s.   (S_n := Σ_{i=1}^n X_i),
hence
∫ f(x) d(P ∘ (S_n/n)^{−1})(x) = E[f(S_n/n)] → f(E[X_1]) = ∫ f(x) dδ_{E[X_1]}(x)   ∀f ∈ C_b(R)   (Lebesgue),
i.e., the distribution of S_n/n converges weakly to δ_{E[X_1]}. This is not surprising, because at least for X_i ∈ L²
var(S_n/n) = (1/n²)·Σ_{i=1}^n var(X_i) = var(X_1)/n → 0.

X̄ : Ω → R^n,   ω ↦ X̄(ω) := (X_1(ω), …, X_n(ω)).
Remark 4.2. (i) µ̄ is well-defined, because X̄ : Ω → Rn is A/B(Rn )-
measurable.
Proof:
B(R^n) = σ({A_1 × ⋯ × A_n | A_i ∈ B(R)})
(= σ({A_1 × ⋯ × A_n | A_i = (−∞, x_i], x_i ∈ R})).
Example 4.3. (i) Let X, Y be r.v., uniformly distributed on [0, 1]. Then
• X, Y independent ⇒ joint distribution = uniform distribution on
[0, 1]2
• X = Y ⇒ joint distribution = uniform distribution on the diagonal
are independent and
0 if r < 0.
A := σ({A1 × . . . × An | Ai ∈ Ai , 1 ≤ i ≤ n})
X_1, …, X_n independent ⇔ µ̄ = ⊗_{i=1}^n µ_i,

(ii)
∫ φ(x_1, …, x_n) dµ̄(x_1, …, x_n) = ∫ ⋯ (∫ φ(x_1, …, x_n) µ_{i_1}(dx_{i_1})) ⋯ µ_{i_n}(dx_{i_n}).
Hence,
µ̌(A) := ∫_A f̄(x̄) dx̄,   A ∈ B(R^n),
Hence µ̄ = µ̌ by 1.11.5.
Let X1 , . . . , Xn be independent, Sn := X1 + · · · + Xn
How to calculate the distribution of Sn with the help of the distribution of
Xi ?
In the following denote by Tx : R1 → R1 , y 7→ x + y, the translation by
x ∈ R.
Proposition 4.6. Let X1 , X2 be independent r.v. with distributions µ1 , µ2 .
Then:
(i) The distribution of X1 + X2 is given by the convolution
µ_1 ∗ µ_2 := ∫ µ_1(dx_1) (µ_2 ∘ T_{x_1}^{−1}),   i.e.
(µ_1 ∗ µ_2)(A) = ∫∫ 1_A(x_1 + x_2) µ_1(dx_1) µ_2(dx_2) = ∫ µ_1(dx_1) µ_2(A − x_1)   ∀ A ∈ B(R¹).

(ii)
(µ_1 ∗ µ_2)(A) = ∫ µ_1(dx_1) µ_2(A − x_1) = ∫ µ_1(dx_1) ∫_{A−x_1} f_2(x_2) dx_2
= ∫ µ_1(dx_1) ∫_A f_2(x − x_1) dx   (change of variable x = x_1 + x_2)
= ∫_A (∫ µ_1(dx_1) f_2(x − x_1)) dx   (by 4.5).
Example 4.7.
(iii) The Gamma distribution Γα,p is defined through its density γα,p given by
γ_{α,p}(x) = (1/Γ(p))·α^p·x^{p−1}·e^{−αx} if x > 0,   and γ_{α,p}(x) = 0 if x ≤ 0.
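Numerically, the convolution formula can be checked by adding independent samples: the sum of n independent Exp(α) waiting times should follow Γ_{α,n}. The Python sketch below (added for illustration; it only compares means and variances, with α = 2 and n = 5 as arbitrary values):

```python
import random

alpha, n, trials = 2.0, 5, 100_000
sums = [sum(random.expovariate(alpha) for _ in range(n)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((s - mean) ** 2 for s in sums) / trials
# Gamma(alpha, p=n): mean = n/alpha, variance = n/alpha^2
print(mean, n / alpha, var, n / alpha**2)
```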
Example 4.8 (The waiting time paradox). Let T1 , T2 , . . . be independent,
exponentially distributed waiting times (e.g. time between reception of two
phone calls in a call center) with parameter α > 0, so that in particular
Z ∞
1
E[Ti ] = x · αe−αx dx = · · · = .
0 α
Fix some time t > 0. Let X denote the time-interval from the preceding event
to t, and Y denote the time-interval from t to the next event.
[Sketch: the inter-arrival times T_1, T_2, … along the time axis; X is the part of the current inter-arrival interval before t, Y the part after t.]
Question: How long on average is the waiting time from t until the next event,
i.e., how big is E[Y ] ?
E[X] = (1/α)·(1 − e^{−αt}) ≈ 1/α   for large t.
More precisely:
(ii) X has exponential distribution with parameter α, "compressed to" [0, t],
i.e.:
P[X > s] = e^{−αs}   ∀ 0 ≤ s ≤ t,
P[X = t] = e^{−αt};
In particular,
E[X] = ∫_0^t s·αe^{−αs} ds + t·e^{−αt} = ⋯ = (1/α)·(1 − e^{−αt}).
(iii) X, Y are independent.
P[X > x, Y > y] = P(⋃_{n≥0} {t − S_n > x, S_{n+1} − t > y})
= P[T_1 > y + t] + Σ_{n=1}^∞ P[S_n ≤ t − x, T_{n+1} > y + t − S_n]
= e^{−α(t+y)} + Σ_{n=1}^∞ ∬ 1_{[0,t−x]×[y+t−s,∞)}(s, r)·γ_{α,n}(s)·αe^{−αr} ds dr
= e^{−α(t+y)} + Σ_{n=1}^∞ ∫_0^{t−x} γ_{α,n}(s)·e^{−α(y+t−s)} ds
= e^{−α(t+y)}·(1 + ∫_0^{t−x} e^{αs}·Σ_{n=1}^∞ γ_{α,n}(s) ds)   (Σ_{n=1}^∞ γ_{α,n}(s) = α)
= e^{−α(t+y)}·(1 + ∫_0^{t−x} αe^{αs} ds).
Consequently:
P[X > x, Y > y] = e^{−α(t+y)}·e^{α(t−x)} = e^{−αx}·e^{−αy}   (0 ≤ x ≤ t, y ≥ 0),
which yields (ii) and (iii), and shows that Y is again exponentially distributed with parameter α, so E[Y] = 1/α.
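The "paradox" (the residual waiting time Y has the same mean 1/α as a full inter-arrival time) is easy to verify by simulation. The Python sketch below is an added illustration with the hypothetical values α = 1 and t = 50:

```python
import random

alpha, t, trials = 1.0, 50.0, 20_000
xs, ys = [], []
for _ in range(trials):
    s = 0.0
    while True:
        prev = s
        s += random.expovariate(alpha)   # next arrival time
        if s > t:
            xs.append(t - prev)   # X: time since the preceding event (or t if none)
            ys.append(s - t)      # Y: time until the next event
            break
print("E[X] ~", sum(xs) / trials, " E[Y] ~", sum(ys) / trials, " 1/alpha =", 1 / alpha)
```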
5 Characteristic functions
Let M1+ (Rn ) be the set of all probability measures on (Rn , B(Rn )).
For given µ ∈ M1+ (Rn ) define its characteristic function as the complex-
valued function µ̂ : Rn → C defined by
µ̂(u) := ∫ e^{i⟨u,y⟩} µ(dy) := ∫ cos(⟨u, y⟩) µ(dy) + i·∫ sin(⟨u, y⟩) µ(dy).
(i) µ̂(0) = 1.
(ii) |µ̂| ≤ 1.
Proof. Exercise.
Proposition 5.2 (Uniqueness theorem). Let µ1 , µ2 ∈ M1+ (Rn ) with µ̂1 = µ̂2 .
Then µ1 = µ2 .
Proof. For the proofs of the three previous propositions, or references to where they can be found, see Klenke, Theorems 15.9, 15.23 and 15.29.

P̂_{(X_1,…,X_n)}(u_1, …, u_n) = (⊗_{j=1}^n P_{X_j})^∧(u_1, …, u_n)   ( = ∏_{j=1}^n P̂_{X_j}(u_j) ),
where P̂_{(X_1,…,X_n)} = φ_{(X_1,…,X_n)} and P̂_{X_j}(u_j) = φ_{X_j}(u_j);
i.e.: P̂_{(X_1,…,X_n)} = ∏_{j=1}^n P̂_{X_j} ∘ Pr_j,   where Pr_j(u) = u_j.
Proof.
φ_S(u) = ∫ e^{iuS} dP = ∫ ∏_{k=1}^n e^{iαuX_k} dP = ∏_{k=1}^n ∫ e^{iαuX_k} dP   (independence)
= ∏_{k=1}^n φ_{X_k}(αu).
(ii) Let µ := Σ_{i=0}^∞ α_i·δ_{a_i} (α_i ≥ 0, Σ_{i=0}^∞ α_i = 1). Then
µ̂(u) = Σ_{i=0}^∞ α_i·e^{iua_i},   u ∈ R.
Special cases:
a) Binomial distribution β_{n,p} = Σ_{k=0}^n C(n, k)·p^k·q^{n−k}·δ_k. Then for all u ∈ R:
β̂_{n,p}(u) = Σ_{k=0}^n C(n, k)·p^k·q^{n−k}·e^{iuk} = (q + p·e^{iu})^n.
b) Poisson distribution π_α = Σ_{n=0}^∞ e^{−α}·(α^n/n!)·δ_n. Then for all u ∈ R:
π̂_α(u) = e^{−α}·Σ_{n=0}^∞ (α^n/n!)·e^{iun} = e^{−α}·Σ_{n=0}^∞ (α·e^{iu})^n/n! = e^{α(e^{iu}−1)}.
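Both formulas can be checked numerically by comparing the analytic expression with the empirical characteristic function of simulated data; the Python sketch below (added for illustration, with α = 3 and u = 0.7 as arbitrary values) does this for the Poisson case:

```python
import cmath
import random

def empirical_cf(samples, u):
    return sum(cmath.exp(1j * u * x) for x in samples) / len(samples)

alpha, u = 3.0, 0.7
samples = []
for _ in range(50_000):
    # Poisson(alpha) sample: count rate-1 exponential interarrival times within [0, alpha]
    k, s = 0, random.expovariate(1.0)
    while s < alpha:
        k += 1
        s += random.expovariate(1.0)
    samples.append(k)

print(empirical_cf(samples, u))
print(cmath.exp(alpha * (cmath.exp(1j * u) - 1)))   # analytic value exp(alpha*(e^{iu}-1))
```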
6 Central limit theorem

S_n* := (S_n − E[S_n])/√(var(S_n))   ("standardized sum")
or equivalently
lim_{n→∞} P[S_n* ≤ b] = (1/√(2π))·∫_{−∞}^b e^{−x²/2} dx = Φ(b),   ∀b ∈ R.
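For illustration, this convergence can be observed in a small simulation; the Python sketch below (added here, with Bernoulli(1/2) summands and b = 1 as arbitrary choices) compares the empirical probability with Φ(b):

```python
import random
from math import erf, sqrt

def Phi(b):
    return 0.5 * (1 + erf(b / sqrt(2)))

n, trials, b = 400, 20_000, 1.0
count = 0
for _ in range(trials):
    s = sum(random.random() < 0.5 for _ in range(n))   # S_n with Bernoulli(1/2) summands
    s_star = (s - n * 0.5) / sqrt(n * 0.25)            # standardized sum
    if s_star <= b:
        count += 1
print(count / trials, Phi(b))
```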
Proposition 6.2. (Central limit theorem) Let X1 , X2 , . . . ∈ L2 be independent
r.v., σn2 := var(Xn ) > 0 and
s_n := (Σ_{k=1}^n σ_k²)^{1/2}.
(iii) Let (Xn ) be bounded and suppose that sn → ∞. Then (Xn ) satisfies
Lyapunov’s condition for any δ > 0, because
|X_k| ≤ α/2 ⇒ |X_k − E[X_k]| ≤ α
⇒ (Σ_{k=1}^n E[|X_k − E[X_k]|^{2+δ}]) / s_n^{2+δ} ≤ (Σ_{k=1}^n E[|X_k − E[X_k]|²]·α^δ) / (s_n²·s_n^δ)
= (α/s_n)^δ·(1/s_n²)·Σ_{k=1}^n E[|X_k − E[X_k]|²] = (α/s_n)^δ   (the last sum equals s_n²).

≤ L_n(ε) + ε².
The proof of Proposition 6.2 requires some further preparations.
Lemma 6.5. For all t ∈ R and n ∈ N:
|e^{it} − 1 − it/1! − (it)²/2! − ⋯ − (it)^{n−1}/(n−1)!| ≤ |t|^n/n!.
Proof. Define f(t) := e^{it}, then f^{(k)}(t) = i^k·e^{it}. Then Taylor series expansion around t = 0, applied to real and imaginary part, implies that
e^{it} − 1 − ⋯ − (it)^{n−1}/(n−1)! = R_n(t)
with
|R_n(t)| = |(1/(n−1)!)·∫_0^t (t − s)^{n−1}·i^n·e^{is} ds| ≤ (1/(n−1)!)·∫_0^{|t|} s^{n−1} ds = |t|^n/n!.
Proposition 6.6. Let X ∈ L². Then φ_X(u) = ∫ e^{iuX} dP is twice continuously differentiable with
φ'_X(u) = i·∫ X·e^{iuX} dP,   φ''_X(u) = −∫ X²·e^{iuX} dP.
In particular
φ_X(u) = 1 + iu·E[X] + (1/2)·θ(u)·u²·E[X²].
Proof. Clearly,
|e^{iuX} − 1 − iuX| ≤ (1/2)·u²·X².
Hence
|φ_X(u) − 1 − iu·E[X]| = |∫ (e^{iuX} − 1 − iuX) dP| ≤ (1/2)·u²·E[X²].
Now define θ(u) := 0 if u²·E[X²] = 0, and θ(u) := (φ_X(u) − 1 − iu·E[X]) / ((1/2)·u²·E[X²]) otherwise.
Proposition 6.7. Suppose that
(F) lim_{n→∞} max_{1≤k≤n} σ_k/s_n = 0   and
(b) lim_{n→∞} Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1) = −(1/2)·u²   ∀u ∈ R.
Then (X_n) has the CLP.
Proof. It is sufficient to show that
lim_{n→∞} ∏_{k=1}^n φ_{X_k}(u/s_n) = e^{−u²/2},   (2.6)
because for S_n* = (1/s_n)·Σ_{k=1}^n X_k we have that
φ_{S_n*}(u) = ∏_{k=1}^n φ_{X_k}(u/s_n),
and φ_{S_n*}(u) → e^{−u²/2} = N̂(0, 1)(u) pointwise as n → ∞, together with N̂(0, 1)(u) continuous at u = 0, implies by Lévy's continuity theorem and the uniqueness theorem that lim_{n→∞} P_{S_n*} = N(0, 1) weakly.
For the proof of (2.6) we need to show that for all u ∈ R
lim_{n→∞} |∏_{k=1}^n φ_{X_k}(u/s_n) − exp(Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1))| = 0,
since exp(Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1)) → exp(−u²/2).

|∏_{k=1}^n a_k − ∏_{k=1}^n b_k| = |(a_1 − b_1)·a_2⋯a_n + b_1·(a_2 − b_2)·a_3⋯a_n + … + b_1⋯b_{n−1}·(a_n − b_n)|
≤ Σ_{k=1}^n |a_k − b_k|.
Consequently,
|∏_{k=1}^n φ_{X_k}(u/s_n) − exp(Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1))| ≤ Σ_{k=1}^n |φ_{X_k}(u/s_n) − exp(φ_{X_k}(u/s_n) − 1)| =: D_n.
Note that E[X_k] = 0 and E[X_k²] = σ_k². The previous proposition now implies that for all k
|z_k| = |φ_{X_k}(u/s_n) − 1| = |i·(u/s_n)·E[X_k] + (1/2)·θ(u/s_n)·(u/s_n)²·E[X_k²]| ≤ (1/2)·(u/s_n)²·σ_k²,
and moreover by (F) we can find n_0 ∈ N such that for all n ≥ n_0 and 1 ≤ k ≤ n
(1/2)·(u/s_n)²·σ_k² < ε.
Hence for all n ≥ n_0
D_n ≤ ε·Σ_{k=1}^n |z_k| ≤ ε·(u²/2)·Σ_{k=1}^n σ_k²/s_n² = ε·u²/2.
Consequently, lim_{n→∞} D_n = 0.
Proof of Proposition 6.2. W.l.o.g. assume that E[Xn ] = 0 for all n ∈ N. We
will use Proposition 6.7. Since (L) ⇒ (F) by Lemma 6.4 it remains to show (b)
of Proposition 6.7. We will show that Lindeberg’s condition implies (b), i.e. we
show that (L) implies
lim_{n→∞} Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1) = −(1/2)·u².
Let u ∈ R, n ∈ N, 1 ≤ k ≤ n. By Lemma 6.5, we get
Y_k := |exp(i·(u/s_n)·X_k) − 1 − i·(u/s_n)·X_k + (1/2)·(u²/s_n²)·X_k²| ≤ (1/6)·|(u/s_n)·X_k|³
(the linear term i·(u/s_n)·X_k has E[⋯] = 0),

≤ Σ_{k=1}^n ∫ |exp(i·(u/s_n)·X_k) − 1 − i·(u/s_n)·X_k + (1/2)·(u²/s_n²)·X_k²| dP ≤ Σ_{k=1}^n E[Y_k],
and for any ε > 0
E[Y_k] = ∫_{{|X_k|≥εs_n}} Y_k dP + ∫_{{|X_k|<εs_n}} Y_k dP
≤ (u²/s_n²)·∫_{{|X_k|≥εs_n}} X_k² dP + (|u|³/(6s_n³))·∫_{{|X_k|<εs_n}} |X_k|³ dP.
Note that
(1/s_n³)·∫_{{|X_k|<εs_n}} |X_k|³ dP ≤ (ε/s_n²)·∫ X_k² dP = ε·σ_k²/s_n²,
so that we obtain
Σ_{k=1}^n E[Y_k] ≤ u²·Σ_{k=1}^n ∫_{{|X_k/s_n|≥ε}} (X_k/s_n)² dP + (|u|³/6)·ε·Σ_{k=1}^n σ_k²/s_n²   (the last sum equals 1)
= u²·L_n(ε) + (|u|³/6)·ε.
Consequently
lim_{n→∞} Σ_{k=1}^n E[Y_k] = 0,
and thus
lim_{n→∞} |Σ_{k=1}^n (φ_{X_k}(u/s_n) − 1) + (1/2)·u²| = 0.
Π := m + λσ² = average claim size + safety loading.
Income: nΠ.
Expenditures: S_n = Σ_{i=1}^n X_i.
Let
S_n* := (S_n − nm)/(√n·σ).
The central limit theorem implies for large n that S_n* is approximately N(0, 1)-distributed, so that
P(R) = P(S_n* > (K + nΠ − nm)/(√n·σ)) = P(S_n* > (K + nλσ²)/(√n·σ)) ≈ 1 − Φ((K + nλσ²)/(√n·σ)),
where (K + nλσ²)/(√n·σ) → ∞ as n → ∞ and Φ denotes the distribution function of the standard normal distribution. Note that the ruin probability decreases with an increasing number of contracts.
Example: Assume that n = 2000, σ = 60, λ = 0.5‰.
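With the CLT approximation above, the ruin probability can be evaluated directly. The Python sketch below (added for illustration) uses the stated n, σ, λ and treats the initial capital K as a free parameter, since K is not specified in this excerpt:

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def ruin_probability(K, n=2000, sigma=60.0, lam=0.0005):
    # P(R) ~ 1 - Phi((K + n*lam*sigma^2) / (sqrt(n)*sigma))
    return 1 - Phi((K + n * lam * sigma**2) / (sqrt(n) * sigma))

for K in (0, 2000, 5000):      # hypothetical values of the initial capital
    print(K, ruin_probability(K))
```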
Then Sn := X1 + · · · + Xn has Poisson distribution πn , i.e.,
P_{S_n} = e^{−n}·Σ_{k=0}^∞ (n^k/k!)·δ_k,

S_n* = (S_n − n)/√n.
In particular, for
f_∞(x) := x⁻ = (−x) ∨ 0
it follows that
∫ f_∞ dP_{S_n*} = ∫ f_∞((x − n)/√n) π_n(dx) = e^{−n}·Σ_{k=0}^n (n^k/k!)·(n − k)/√n
(here f_∞((x − n)/√n) = 0 for x > n and = (n − x)/√n for x ≤ n)
= (e^{−n}/√n)·(n + Σ_{k=1}^n n^k·(n − k)/k!)
= (e^{−n}/√n)·(n + Σ_{k=1}^n (n^{k+1}/k! − n^k/(k−1)!))   (telescoping sum = n^{n+1}/n! − n)
= e^{−n}·n^{n+1/2}/n!.
Moreover,
∫ f_∞ dN(0, 1) = (1/√(2π))·∫_{−∞}^0 (−x)·e^{−x²/2} dx = (1/√(2π))·[e^{−x²/2}]_{−∞}^0 = 1/√(2π).
Hence, Stirling's formula (2.7) would follow, once we have shown that
∫ f_∞ dP_{S_n*} → ∫ f_∞ dN(0, 1)   as n → ∞.   (2.8)
Note that this is not implied by the weak convergence in the CLT since f_∞ is continuous but unbounded. Hence, we consider for given m ∈ N
f_m := f_∞ ∧ m ∈ C_b(R).