Chapter 01: Probability Theory
Jing Xu (RUC)
Renmin University of China
Fall
Modern Language about Probability
What is probability?
Classic interpretation: the frequency with which certain events
occur
I it is a posteriori (it comes from observed outcomes)
I every event is associated with a number in [0, 1]
Modern perspective: Probability is a mapping from the set of
“events” to the interval [0,1], satisfying some basic properties
(Kolmogorov)
I from now on, you may think of probability as some kind of
function, which maps events to numbers. The number is
defined as “the probability that a particular event will occur”
I Q: how to build a model for “event”?
σ-Algebra: A Model of “Event”
Suppose we are doing a random experiment. Let a set Ω
include all possible outcomes of the experiment. Ω is called
the sample space, and every element in it is called a sample
point
Definition (σ-algebra): Given a set Ω, let F be a collection of
subsets of Ω. F is called a σ-algebra, if it satisfies the
following properties
I (i) ∅ ∈ F;
I (ii) if A ∈ F, then Ac ∈ F;
I (iii) if A_i ∈ F for i = 1, 2, . . ., then ∪_{i=1}^∞ A_i ∈ F.
Basic property: a σ-algebra is closed under common set operations
such as set intersection, set difference, and so on.
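For instance, closure under intersection follows from axioms (ii) and (iii) together with De Morgan's law; a short derivation:

```latex
A, B \in \mathcal{F}
  \;\Rightarrow\; A^{c}, B^{c} \in \mathcal{F}
     % axiom (ii)
  \;\Rightarrow\; A^{c} \cup B^{c} \in \mathcal{F}
     % axiom (iii)
  \;\Rightarrow\; (A^{c} \cup B^{c})^{c} = A \cap B \in \mathcal{F}
     % axiom (ii) again, plus De Morgan's law
```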
σ-Algebra: A Model of “Event”
Usually, given a collection of subsets of Ω, we can
construct a smallest σ-algebra that includes this collection.
This σ-algebra is said to be generated by the collection
Consider a random experiment of tossing a fair coin twice
I what is the sample space?
I what different σ-algebras can you construct, given different
collections of subsets, like A1 = {{HH}}, A2 = {{HH, HT }},
and A3 = {{HH}, {HT }, {TH}, {TT }}?
I note that we have a clear meaning for every set in the above
σ-algebras
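These questions can be explored mechanically on a finite sample space. The sketch below (the helper `generated_sigma_algebra` is hypothetical, not from the slides) closes a generating collection under complements and finite unions, which suffices when Ω is finite:

```python
from itertools import combinations

def generated_sigma_algebra(omega, collection):
    """Smallest sigma-algebra on a finite omega containing the collection."""
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(s) for s in collection}
    changed = True
    while changed:  # close under complements and pairwise unions
        changed = False
        for A in list(F):
            if omega - A not in F:
                F.add(omega - A)
                changed = True
        for A, B in combinations(list(F), 2):
            if A | B not in F:
                F.add(A | B)
                changed = True
    return F

omega = {"HH", "HT", "TH", "TT"}  # tossing a fair coin twice
F1 = generated_sigma_algebra(omega, [{"HH"}])
F2 = generated_sigma_algebra(omega, [{"HH", "HT"}])
F3 = generated_sigma_algebra(omega, [{"HH"}, {"HT"}, {"TH"}, {"TT"}])
print(len(F1), len(F2), len(F3))  # → 4 4 16
```

σ(A3) is the full power set (16 sets), reflecting that the singletons generate every subset of a four-point space, while A1 and A2 each generate only a four-set σ-algebra.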
Probability Measure
Definition (Probability Measure): Given a set Ω, let F be a
σ-algebra of the subsets of Ω. A Probability Measure P is a
mapping from F to [0,1], satisfying the following properties
I (i) P(Ω) = 1;
I (ii) for a countable collection of disjoint sets A_i ∈ F, we have
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) (this implies P(∅) = 0).
Definition (Probability Space): If the above requirements are
satisfied, the triplet (Ω, F, P) is called a Probability Space
I notes: all three elements are necessary
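As a minimal sketch (assuming the uniform measure on the fair two-toss experiment, a choice not made on this slide), the axioms can be checked exhaustively on a finite sample space:

```python
from itertools import chain, combinations

omega = frozenset({"HH", "HT", "TH", "TT"})  # two tosses of a fair coin

def P(A):
    # Uniform probability measure: each outcome has probability 1/4
    return len(A) / len(omega)

# F = the power set of omega (the largest sigma-algebra on a finite space)
events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

assert P(omega) == 1.0                 # axiom (i)
for A in events:                       # additivity over disjoint sets
    for B in events:
        if not (A & B):
            assert P(A | B) == P(A) + P(B)
print("P is a probability measure on", len(events), "events")
```

On a finite space countable additivity reduces to finite additivity, which is why checking disjoint pairs is enough here.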
Random Variable and Distribution
Definition (Borel set): Let Ω = R. The Borel σ-algebra is the
σ-algebra generated by all closed intervals contained in R.
Every set A in the Borel σ-algebra is called a Borel set
I almost every point set on R you can imagine is a Borel set (an
open interval, a single point, some isolated points, and so on)
Definition (Random Variable): Given a probability space
(Ω, F, P), a random variable X (defined on this space) is a
mapping from Ω to the set of real numbers R: X : ω → X (ω),
satisfying: For each Borel set B ∈ B(R), X −1 (B) ∈ F.
I comparison: a probability measure maps sets in a σ-algebra to
numbers in [0, 1], while a random variable maps points in a
sample space to numbers in R
Distribution
Definition (Distribution): A random variable X induces a
measure on the real line R, denoted by µ_X(·), satisfying
µ_X(B) = P(ω : X(ω) ∈ B) for every Borel set B of R
I remark: what is the density function of a random variable? If
there exists a non-negative function f (x), such that for every
Borel set B ∈ B(R), we have
µ_X(B) = ∫_B f(x)dx
then f (x) is called the density function of random variable X
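For instance (a numerical sketch, assuming X is standard normal; the slide leaves the distribution unspecified), µ_X(B) for an interval B = [a, b] can be checked against the normal CDF:

```python
import math

def phi(x):
    # Standard normal density f(x)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    # Standard normal CDF, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# mu_X(B) for B = [a, b]: integrate the density over B (trapezoidal rule)
a, b, n = -1.0, 2.0, 100_000
h = (b - a) / n
integral = sum(phi(a + i * h) for i in range(1, n)) * h + (phi(a) + phi(b)) * h / 2
print(integral, Phi(b) - Phi(a))  # the two numbers agree
```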
Expectation of Random Variables
Definition (Expectation): Given a probability space (Ω, F, P)
and a random variable X, the expectation of X is defined as the
Lebesgue integral ∫_Ω X(ω)dP(ω)
Construction of the Lebesgue integral ∫_Ω X(ω)dP(ω) (assume
that X(ω) ≥ 0 for all ω for the moment):
I let 0 = y_0 < y_1 < . . . < y_k < y_{k+1} < . . . be a partition of R^+
I let A_k = {ω : y_k ≤ X(ω) < y_{k+1}}; then A_k ∈ F (why?).
Therefore, P(A_k) is defined. We construct the lower Lebesgue
sum
LS_Π^- = Σ_{k=0}^∞ y_k P(A_k)
I the limit of LS_Π^- as the mesh of the partition Π approaches 0
is defined as the value of the Lebesgue integral
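A numerical sketch of this construction (assuming X is exponential with rate 1, a choice not made on the slide, so the true expectation is 1):

```python
import math

# Lower Lebesgue sum for X ~ Exponential(1), whose true mean is 1.
# P(A_k) = P(y_k <= X < y_{k+1}) = F(y_{k+1}) - F(y_k), with F(x) = 1 - e^{-x}.
def F(x):
    return 1 - math.exp(-x)

def lower_lebesgue_sum(mesh, cutoff=40.0):
    ys = [k * mesh for k in range(int(cutoff / mesh) + 1)]  # partition of R^+
    return sum(y0 * (F(y1) - F(y0)) for y0, y1 in zip(ys, ys[1:]))

for mesh in (1.0, 0.1, 0.001):
    print(mesh, lower_lebesgue_sum(mesh))  # approaches 1 as the mesh shrinks
```

The truncation at `cutoff` is a numerical convenience; the exponential tail beyond 40 is negligible.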
More on Lebesgue Integral
Remarks:
I 1: What if X(ω) can assume either positive or negative values?
We can separate its positive and negative parts:
X = X^+ - X^-
where X^+ = max{X, 0}, X^- = max{-X, 0}. Then we can define
∫_Ω X(ω)dP(ω) = ∫_Ω X^+(ω)dP(ω) - ∫_Ω X^-(ω)dP(ω)
provided that at least one of the two integrals is finite
I 2: We don’t need to consider the upper Lebesgue sum
LS_Π^+ = Σ_{k=0}^∞ y_{k+1} P(A_k)
because it converges to the same limit as the mesh of the
partition approaches 0
I 3: Use the definition to calculate the expectation of a random
variable on the coin-tossing space
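A sketch of this exercise (taking X to be the number of heads in two fair tosses, an assumed choice of random variable):

```python
from fractions import Fraction

# E[X] from the definition, on the two-coin-toss space:
# X(omega) = number of heads; each of the 4 outcomes has probability 1/4.
omega = ["HH", "HT", "TH", "TT"]
X = {w: w.count("H") for w in omega}
P = {w: Fraction(1, 4) for w in omega}

# Partition by the (finitely many) values of X: A_y = {omega : X(omega) = y}
values = sorted(set(X.values()))
EX = sum(y * sum(P[w] for w in omega if X[w] == y) for y in values)
print(EX)  # → 1
```

With finitely many values the lower Lebesgue sum is exact once the partition separates the values, so no limit is needed.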
Properties of Expectations
The usual properties of expectation continue to hold with the
new definition
I Linearity: E [aX + bY ] = aE [X ] + bE [Y ]
I Comparability: if X ≤ Y a.s., then E [X ] ≤ E [Y ]
I Jensen’s inequality: if ϕ(x) is a convex function, and
|E[X]| < ∞, then ϕ(E[X]) ≤ E[ϕ(X)]. The inequality is
reversed if ϕ(x) is concave (pay attention to the implication of
this inequality for risk aversion and the acceptability of gambles)
I Cauchy-Schwarz inequality: (E[XY])² ≤ E[X²]E[Y²]
I If X has a density function f(x), then the expectation can be
computed by evaluating the Riemann integral ∫_R x f(x)dx
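A quick numerical sanity check of Jensen's inequality (with the convex choice ϕ(x) = x², an illustration only):

```python
import random

# Empirical check of phi(E[X]) <= E[phi(X)] for the convex phi(x) = x^2.
random.seed(0)
xs = [random.uniform(-1, 3) for _ in range(100_000)]
EX = sum(xs) / len(xs)
EphiX = sum(x * x for x in xs) / len(xs)
print(EX * EX, "<=", EphiX)
```

For ϕ(x) = x² the gap E[X²] - (E[X])² is exactly the variance, which is why the inequality can never fail for this choice.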
Convergence of Functions
Let {fn (x) : n ≥ 0} be a sequence of functions, and f (x) be a
function, defined on a common domain X
Given a point y ∈ X , if limn→∞ fn (y ) = f (y ), then we say the
sequence {fn : n ≥ 0} converges to f at the point y
If the sequence {fn (x) : n ≥ 0} converges to f (x) at every
point x ∈ X , then we say {fn : n ≥ 0} converges to f
everywhere on X
Almost everywhere convergence: {fn} converges to f at every
point of X except on a set of measure zero
Convergence of Integrals
When a sequence of functions f_n(x) converges to a limit
function f(x) almost everywhere, the integral ∫_Ω f_n(x)dx does
not necessarily converge to ∫_Ω f(x)dx
I An example: let f_n(x) be the density function of the normal
distribution N(0, 1/n). Then lim_{n→∞} f_n(x) = 0 almost
everywhere. Does the integral converge as well?
Two convergence theorems
I Monotone Convergence Theorem: if f_n(x) converges to f(x) in
a monotonic way, then ∫_Ω f_n(x)dx converges to ∫_Ω f(x)dx
I Dominated Convergence Theorem: if there is an integrable
function g(x) such that |f_n(x)| ≤ g(x) almost everywhere,
then f_n(x) → f(x) implies ∫_Ω f_n(x)dx → ∫_Ω f(x)dx
I Remark: these two theorems give sufficient conditions, not
necessary ones. The above example satisfies the conditions of
neither theorem
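The N(0, 1/n) example can be checked numerically (a sketch; the trapezoidal rule and the truncation to [-10, 10] are numerical conveniences):

```python
import math

def f(n, x):
    # Density of N(0, 1/n): spikes at 0, vanishes pointwise elsewhere
    return math.sqrt(n / (2 * math.pi)) * math.exp(-n * x * x / 2)

def trapz(g, a, b, m):
    # Composite trapezoidal rule on [a, b] with m subintervals
    h = (b - a) / m
    return (sum(g(a + i * h) for i in range(1, m)) + (g(a) + g(b)) / 2) * h

for n in (1, 100, 10_000):
    pointwise = f(n, 0.5)                             # shrinks toward 0
    integral = trapz(lambda x: f(n, x), -10, 10, 200_000)
    print(n, pointwise, integral)                     # the integral stays ~1
```

No single integrable g dominates every f_n (the peaks at 0 grow without bound), and the sequence is not monotone, so neither theorem applies.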
Convergence of Expectations
When a sequence of random variables Xn converges to a limit
random variable X almost surely, the expectation E [Xn ] does
not necessarily converge to E [X ]
Two convergence theorems
I Monotone Convergence Theorem: if Xn converges to X in a
monotone way, then E [Xn ] → E [X ]
I Dominated Convergence Theorem: if there is an integrable
random variable Y such that |Xn | ≤ Y almost surely, then
Xn → X a.s. implies E [Xn ] → E [X ]
I Again, these two theorems are about sufficient conditions, not
necessary conditions
Change of Measure
Since a probability measure is no more than a rule of assigning
numbers, we can assign new numbers to the same events as
we wish, as long as the new assignment satisfies proper
conditions
One of the most efficient ways: Changing measure through a
non-negative random variable Z with E [Z ] = 1, according to
the following “algorithm”
P̃(A) = ∫_A Z(ω)dP(ω) = ∫_Ω Z(ω)1_A(ω)dP(ω)
I remark 1: P̃(A) can be interpreted as the average value of Z
on the set A, under probability measure P
I remark 2: If Z > 0 almost surely, then P̃ and P are equivalent
probability measures in the sense that P̃(A) = 0 if and only if
P(A) = 0 (if Z = 0 on a set of positive P-measure, this relation
is no longer true)
Change of Measure
Task: verify P̃ is indeed a probability measure
Assume we define a new measure P̃ by
P̃(A) = ∫_A Z(ω)dP(ω) = ∫_Ω Z(ω)1_A(ω)dP(ω)
then how can we calculate expectations under the new
measure P̃?
I Note that P̃(A) = Ẽ[1_A] = ∫_Ω 1_A(ω)dP̃(ω). Writing the above
formula in differential form:
dP̃(ω) = Z(ω)dP(ω)
I According to the definition of expectation, for a random
variable X, we have
Ẽ[X] = ∫_Ω X(ω)dP̃(ω) = ∫_Ω X(ω)Z(ω)dP(ω) = E[XZ]
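On a finite space the whole computation is elementary; a sketch (the particular Z below is an arbitrary choice satisfying E[Z] = 1, not from the slides):

```python
from fractions import Fraction

# Change of measure on the two-coin-toss space: P is uniform, Z >= 0 with
# E[Z] = 1, P~(A) = E[Z 1_A], and expectations satisfy E~[X] = E[XZ].
omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}
Z = {"HH": Fraction(2), "HT": Fraction(1), "TH": Fraction(1), "TT": Fraction(0)}
assert sum(Z[w] * P[w] for w in omega) == 1          # E[Z] = 1

def P_tilde(A):                                      # P~(A) = E[Z 1_A]
    return sum(Z[w] * P[w] for w in A)

assert P_tilde(omega) == 1                           # P~ is a probability measure

X = {w: w.count("H") for w in omega}                 # X = number of heads
E_tilde_X = sum(X[w] * Z[w] * P[w] for w in omega)   # E~[X] = E[XZ]
print(E_tilde_X)  # → 3/2
```

Note that this Z vanishes on {TT}, which has positive P-probability, so here P̃ and P are not equivalent: P̃({TT}) = 0 while P({TT}) = 1/4.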
Radon-Nikodym Theorem
Definition (Radon-Nikodym Derivative): If we define a new
probability measure P̃ by P̃(A) = ∫_A Z(ω)dP(ω), then Z is
called the Radon-Nikodym derivative of P̃ with respect to P,
and we formally write
Z = dP̃/dP
Theorem (Radon-Nikodym): Let P and P̃ be equivalent
probability measures defined on a measurable space (Ω, F).
Then there exists an almost surely positive random variable Z
such that E[Z] = 1 and
P̃(A) = ∫_A Z(ω)dP(ω)
for every A ∈ F.
An Example of Change-of-Measure
Suppose we have a probability space (Ω, F, P), and X is a
standard normal random variable on this space. Fix a constant
θ > 0; clearly, Y = X + θ is not a standard normal random
variable. However, we can construct a new measure P̃ such
that Y is a standard normal random variable under P̃
I we define Z = e^{-θX - θ²/2} and use Z as the Radon-Nikodym
derivative to induce a new measure P̃
I it can be shown that
P̃(Y ≤ y) = ∫_{-∞}^{y} (1/√(2π)) e^{-t²/2} dt
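This claim can be checked numerically; a sketch (the grid endpoints and step count are numerical conveniences, and θ, y are arbitrary illustrative values):

```python
import math

# Check that Y = X + theta is standard normal under P~, where
# dP~/dP = Z = exp(-theta*X - theta^2/2) and X ~ N(0,1) under P.
theta, y = 0.7, 0.3

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# P~(Y <= y) = E[ 1{X + theta <= y} * Z ]: integrate Z*phi over x <= y - theta
a, b, m = -12.0, y - theta, 400_000
h = (b - a) / m
g = lambda x: math.exp(-theta * x - theta ** 2 / 2) * phi(x)
p_tilde = (sum(g(a + i * h) for i in range(1, m)) + (g(a) + g(b)) / 2) * h
print(p_tilde, Phi(y))  # the two agree: Y ~ N(0,1) under P~
```

The integrand simplifies to φ(x + θ), which is exactly why the answer is the standard normal CDF evaluated at y.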
After-Class Work
Read Chapter 1 of the textbook