ST206 or 202 Lecture Notes (MT)
1: Probability and
Distribution Theory
Tay Meshkinyar
Dr. Milt Mavrakakis
2021-2022
Contents
1 Introduction
2 Probability
  2.1 Week 1: Lecture 1
    2.1.1 A Pair of Dice
    2.1.2 [a bit of] Measure Theory
  2.2 Week 1: Lecture 2
    2.2.1 The Probability Measure
    2.2.2 More Properties of Probability Measures
    2.2.3 Sample Problems
  2.3 Week 2: Lecture 1
    2.3.1 Discrete Tools
    2.3.2 Conditional Probability
    2.3.3 Bayes’ Rule
    2.3.4 The Law of Total Probability
  2.4 Week 2: Lecture 2
    2.4.1 Independence
4 Multivariate Distributions
  4.1 Week 8: Lecture 1
    4.1.1 Joint CDFs and PDFs
  4.2 Week 8: Lecture 2
    4.2.1 Bivariate Density
    4.2.2 Multiple Random Variables
    4.2.3 Covariance and Correlation
  4.3 Week 9: Lecture 1
    4.3.1 Joint Moments
    4.3.2 Joint MGFs
    4.3.3 Joint CGFs
  4.4 Week 9: Lecture 2
    4.4.1 Independent Random Variables
    4.4.2 Random Vectors & Random Matrices
    4.4.3 Transformations of Random Variables
  4.5 Week 10: Lecture 1
    4.5.1 Sums of Random Variables
    4.5.2 Multivariate Normal Distributions
5 Conditional Distributions
  5.1 Week 10: Lecture 2
    5.1.1 Another Deck of Cards
    5.1.2 Conditional Mass and Density
  5.2 Week 11: Lecture 1
    5.2.1 Conditional Expectation
Conclusion
Chapter 1
Introduction
Chapter 2
Probability
Example 2.1.1. Roll two dice. What is the probability that the sum is greater than 10? There are three favourable outcomes:
(5, 6), (6, 5), (6, 6),
so the probability is 3/36 = 1/12.
ii. m(∅) = 0,
iii. if A1, A2, . . . ∈ G are disjoint, then m(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ m(Ai).
i. ∅ ∈ G,
ii. if A ∈ G then Ac ∈ G,
iii. if A1, A2, A3, . . . ∈ G then
⋃_{i=1}^∞ Ai = A1 ∪ A2 ∪ A3 ∪ · · · ∈ G.
Example 2.1.7. Let ψ be a set. The set {∅, ψ} is the smallest σ-algebra on ψ.
Suppose that |ψ| > 1 and let A ⊂ ψ with A ≠ ∅. Then {∅, A, Ac, ψ} is the smallest non-trivial σ-algebra on ψ.
⋄
Example 2.1.8. The σ-algebra G = {A : A ⊆ ψ} = P(ψ) is the power set of ψ. Hence, if
ψ = {ω1, ω2, . . . , ωk},
then |G| = 2^k.
i.) m(A) ≥ 0? ✓
ii.) m(∅) = 0? ✓
Definition 2.2.1. Consider the measurable space (Ω, F). Define (Ω, F, P )
as a probability space. The function P is a probability measure that
satisfies P (A) ∈ [0, 1] for all A ∈ F and P (Ω) = 1.
Since P is a measure,
• P (∅) = 0,
(Do we also know that A ∩ B ∈ F whenever A, B ∈ F? By De Morgan, (A ∩ B)c = Ac ∪ Bc ∈ F, so A ∩ B = ((A ∩ B)c)c ∈ F. So yes, we do.)
• if A1, A2, . . . ∈ F are disjoint, then
P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).
i. P (Ac ) = 1 − P (A)
P(⋃_{i=1}^n Ai) = Σ_{i=1}^n P(Ai) − Σ_{i<j} P(Ai ∩ Aj)
  + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak)
  − . . .
  + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An).
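Not part of the original notes: a brute-force numerical check of the inclusion–exclusion formula for three events on a small finite sample space with equally likely outcomes (the sample space and events below are arbitrary choices for illustration).

```python
from itertools import combinations

omega = set(range(12))                                 # 12 equally likely outcomes
A = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 7, 8}]   # three arbitrary events
lhs = len(set.union(*A)) / len(omega)                  # P(A1 ∪ A2 ∪ A3) directly
rhs = 0.0
for k in range(1, len(A) + 1):                         # alternating sums over k-wise intersections
    for sub in combinations(A, k):
        rhs += (-1) ** (k + 1) * len(set.intersection(*sub)) / len(omega)
print(lhs, rhs)   # both 0.75: the two sides agree
```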
Proof. Define
B1 = A1,
B2 = A2 \ B1,
B3 = A3 \ (B1 ∪ B2),
. . .
Bi = Ai \ (B1 ∪ · · · ∪ B_{i−1}).
Then B1, B2, B3, · · · ∈ F (confirm this!). They are disjoint, and ⋃_{i=1}^∞ Bi = ⋃_{i=1}^∞ Ai. So,
P(⋃_{i=1}^∞ Ai) = P(⋃_{i=1}^∞ Bi) = Σ_{i=1}^∞ P(Bi) ≤ Σ_{i=1}^∞ P(Ai).
[Figure: an increasing sequence of nested events A1 ⊆ A2 ⊆ A3 ⊆ A4.]
Proof. Define
B1 = A1,
B2 = A2 \ A1,
. . .
Bi = Ai \ A_{i−1},
. . .
Note that these events are mutually exclusive, and so An = ⋃_{i=1}^n Bi. Moreover, ⋃_{i=1}^∞ Bi = ⋃_{i=1}^∞ Ai. Hence,
lim_{n→∞} P(An) = lim_{n→∞} P(⋃_{i=1}^n Bi)
= lim_{n→∞} Σ_{i=1}^n P(Bi)
= P(⋃_{i=1}^∞ Bi)
= P(⋃_{i=1}^∞ Ai).
We will use P(A) = |A|/|Ω|, for A ∈ F, under the assumptions that each outcome is equally likely and that the sample space is finite.
2) Birthdays: 100 people in this lecture. What is the probability that at least
two share a birthday?
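Not part of the original notes: a minimal numerical sketch of the birthday calculation, assuming 365 equally likely birthdays and ignoring leap years.

```python
def p_shared_birthday(n: int, days: int = 365) -> float:
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days   # person i+1 avoids the i birthdays already taken
    return 1 - p_all_distinct

print(p_shared_birthday(100))   # ≈ 0.9999997: a shared birthday is essentially certain
```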
Note. Read how the multiplication rule applies to permutations and combina-
tions.
Permutations
We can find the number of possible permutations of size k using the multiplication rule (one factor per choice: 1st, 2nd, . . . , kth):
n × (n − 1) × · · · × (n − k + 1) = n(n − 1) · · · 1 / [(n − k)(n − k − 1) · · · 1]
= n!/(n − k)!
= nPk.
Combinations
Hence,
nPk = nCk × k! ⇒ nCk = nPk/k! = n!/[(n − k)! k!].
Again, why? Think of it this way. Suppose there are n slots. We can put an object of type I or II in each slot. The order doesn't matter, so there are nCk ways to choose the k slots holding type I objects. This is why nCj appears as the coefficient of a^j b^{n−j} in the binomial expansion of (a + b)^n.
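Not part of the original notes: a quick check of these counting identities using Python's standard library (the values of n, k, a, b are arbitrary).

```python
from math import comb, factorial, perm

n, k = 10, 4
assert perm(n, k) == factorial(n) // factorial(n - k)    # nPk = n!/(n-k)!
assert comb(n, k) == perm(n, k) // factorial(k)          # nCk = nPk/k!

# Binomial theorem: (a + b)^n = sum_j nCj * a^j * b^(n-j)
a, b = 2, 3
assert (a + b) ** n == sum(comb(n, j) * a**j * b**(n - j) for j in range(n + 1))
print("identities verified")
```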
PB(A) = P(A | B) = P(A ∩ B)/P(B).
Note that if P(A) = |A|/|Ω|, then
P(A | B) = (|A ∩ B|/|Ω|)/(|B|/|Ω|) = |A ∩ B|/|B|.
So,
P (Ac | B) = 1 − P (A | B).
P(A | B) = P(A ∩ B)/P(B)
= P(B | A)P(A)/P(B)
= P(A) · P(B | A)/P(B).
With probability 2/3 the twins are fraternal, in which case the four sex combinations (BB, GG, BG, GB) each have probability 1/4.
[Tree diagram: P(T) = 0.012 and P(Tc) = 0.988; given T, P(F | T) = 2/3 and P(I | T) = 1/3; given F, P(M | F) = 1/4 and P(Mc | F) = 3/4; given I, P(M | I) = 1/2 and P(Mc | I) = 1/2.]
By multiplying along the paths of the tree, we can obtain the probabilities of the events I, F and M, and hence P(F | M).
P(I) = P(I | T)P(T) = (1/3) × 0.012 = 0.004
P(F) = P(F | T)P(T) = (2/3) × 0.012 = 0.008
P(M) = (1/4) × (2/3) × 0.012 + (1/2) × (1/3) × 0.012 = 0.004
P(F | M) = P(M | F)P(F)/P(M) = [(1/4) × 0.008]/0.004 = 1/2.
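Not part of the original notes: the same tree calculation done numerically, using the probabilities assumed in the diagram above.

```python
p_T = 0.012                           # a birth produces twins
p_I = (1 / 3) * p_T                   # identical twins
p_F = (2 / 3) * p_T                   # fraternal twins
p_M = (1 / 4) * p_F + (1 / 2) * p_I   # both twins are male
p_F_given_M = (1 / 4) * p_F / p_M     # Bayes' rule
print(p_I, p_F, p_M, p_F_given_M)     # 0.004 0.008 0.004 0.5
```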
2.4.1 Independence
Let A, B ∈ F. If A and B are independent, then
P(A | B) = P(A) ⇒ P(A ∩ B)/P(B) = P(A),
which in turn gives our definition of independence: A ⊥⊥ B iff P(A ∩ B) = P(A)P(B). Note that:
(i.) if A ⊥⊥ B and P(B) > 0, then P(A | B) = P(A);
(ii.) if A ⊥⊥ B, then Ac ⊥⊥ Bc, Ac ⊥⊥ B, and A ⊥⊥ Bc.
Chapter 3
Random Variables & Univariate Distributions
[Figure: the two child sample space Ω = {BB, BG, GB, GG} and the map X = number of girls, with X(BB) = 0, X(BG) = X(GB) = 1, X(GG) = 2.]
P(X = 1) = P({ω ∈ Ω : X(ω) = 1})
= P({BG, GB}) = 2/4 = 1/2
P(X > 0) = P({ω ∈ Ω : X(ω) > 0})
= P({BG, GB, GG}) = 3/4.
⋄
P (A1 ) = P (X ≤ 1) = FX (1).
⋄
Example 3.2.6. In our two child example,
FX(0) = 1/4, FX(1) = 3/4, FX(2) = 1, FX(−1) = 0, FX(3/2) = 3/4, . . .
Moreover, note that the CDF in this case is a step function, as seen in the figure below.
[Figure 3.1: The Cumulative Distribution Function of the two child example — a step function with jumps at x = 0, 1, 2.]
A_{x_1} ⊇ A_{x_2} ⊇ · · ·
and A_{x_n} ⊇ A_x for every n. So
A_x = ⋂_{n=1}^∞ A_{x_n}.
Then, by continuity of probability,
lim_{n→∞} FX(x_n) = lim_{n→∞} P(A_{x_n}) = P(A_x) = FX(x)
⇒ lim_{h↓0} FX(x + h) = FX(x).
Observe that
• P (X > x) = 1 − P (X ≤ x) = 1 − FX (x)
• P (X < x) = FX (x−)
• P (X = x) = FX (x) − FX (x−).
Notation.
(ii) X ∼ FX (a CDF)
Example 3.3.2. Recall the prior two child example. The CDF of our discrete random variable took the form of a step function. ⋄
In our example: fX(0) = 1/4, fX(1) = 1/2, fX(2) = 1/4, and fX(x) = 0 for all other x. Hence, {0, 1, 2} is the support.
Proposition 3.3.5. For valid PMF fX (x) and valid CDF FX (x),
P (X = x) = P (X ≤ x) − P (X < x).
Example 3.3.6. For our previous example, where X is the number of girls, X ∼ Bin(2, 1/2).
⋄
Bernoulli Distribution
Geometric Distribution
Same setup as Geometric, but we stop when we obtain the rth success for some
given r ∈ Z+ .
Let X : number of trials required to obtain r successes. Then X ∼ NegBin(r, p),
and
fX(x) = C(x − 1, r − 1) p^r (1 − p)^{x−r} for x = r, r + 1, r + 2, . . . ,
where C(x − 1, r − 1) = (x − 1)!/[(r − 1)!(x − r)!] counts the arrangements of successes among the first x − 1 trials.
• X∗ = X − r
fX(x) = lim_{n→∞, p→0, np=λ} [n!/(x!(n − x)!)] p^x (1 − p)^{n−x}
= lim_{n→∞} [n!/(x!(n − x)!)] (λ^x/n^x) (1 − λ/n)^{n−x}
= lim_{n→∞} [n(n − 1) · · · (n − x + 1)/(n × n × · · · × n)] (1 − λ/n)^n (1 − λ/n)^{−x} (λ^x/x!)
= 1 × e^{−λ} × 1 × λ^x/x!
= e^{−λ} λ^x/x!, x = 0, 1, 2, . . . ,
which is the PMF of the Poisson distribution. Hence, X ∼ Poisson(λ).
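Not part of the original notes: a small numerical illustration of this limit, assuming λ = 3 and x = 2, with p = λ/n as n grows.

```python
from math import comb, exp, factorial

lam, x = 3.0, 2
poisson_pmf = exp(-lam) * lam**x / factorial(x)
for n in (10, 100, 10_000):
    p = lam / n
    binom_pmf = comb(n, x) * p**x * (1 - p) ** (n - x)
    print(n, round(binom_pmf, 6), round(poisson_pmf, 6))   # binomial pmf approaches the Poisson pmf
```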
[Figure: for a discrete random variable X, P(X = x∗) > 0 at each support point x∗; for a continuous random variable Y, P(Y = y) = 0 for all y ∈ R.]
i. fX(x) = d/dx FX(x) = F′X(x)
iii. ∫_R fX(x) dx = 1.
v. For any B ⊆ R,
P(X ∈ B) = ∫_B fX(x) dx.
P(c ≤ X ≤ c + h) = h/(b − a).
[Figure: the Uniform(a, b) density fY(y), with the interval [c, c + h] shaded.]
⋄
Example 3.4.7. Let X be the number of email arrivals in an hour, and suppose X ∼ Poisson(λ). Note that we can scale this, with X(t) ∼ Poisson(tλ). Let Y be the time of the first arrival. Note that
FY(y) = P(Y ≤ y)
= 1 − P(Y > y)
= 1 − P(X(y) = 0)
= 1 − e^{−λy}(λy)^0/0!
= 1 − e^{−λy}, y ≥ 0.
Then
fY(y) = d/dy FY(y) = d/dy (1 − e^{−λy}) = λe^{−λy}, y ≥ 0;
[Figure: the density fY(y) = λe^{−λy}, y ≥ 0.]
Exponential Distribution
Note that θ = λ1 .
Normal Distributions
Remark 3.5.4. The normal CDF has no closed form. It can be written as an
infinite sum, but it cannot be written in a finite number of operations.
Remark 3.5.5. If Z ∼ N (0, 1) we write Φ(z) for FZ (z). The Φ function is the
CDF for the standard normal.
Gamma Distribution
[Figure: Gamma densities fX(x) for α = 1 and α = 3.]
as long as Σ_x |g(x)| fX(x) < ∞, and similarly when X is continuous.
Example 3.5.12. For a random variable X and a0, a1, a2, · · · ∈ R,
Observe that
σ² = Σ_x (x − µ)² fX(x)   (discrete)
σ² = ∫_R (x − µ)² fX(x) dx   (continuous).
i. Var(X) ≥ 0,
[Figure 3.6: The Markov inequality shows that the shaded area (the survival function of Y evaluated at a) is always less than or equal to E(Y)/a.]
P(|X − E(X)| ≥ a) ≤ Var(X)/a²,
for any a > 0.
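Not part of the original notes: a Monte Carlo sanity check of Chebyshev's inequality, assuming X ∼ Exp(1) (so E(X) = Var(X) = 1) and a = 2.

```python
import random

random.seed(0)
n, a = 100_000, 2.0
xs = [random.expovariate(1.0) for _ in range(n)]
tail_freq = sum(abs(x - 1.0) >= a for x in xs) / n
print(tail_freq, "<=", 1.0 / a**2)   # empirical tail ≈ e^{-3} ≈ 0.05, Chebyshev bound = 0.25
```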
E(g(X)) ≥ g(E(X)) (for convex g; this is Jensen's inequality).
E(aX + b) = aE(X) + b.
⋄
Example 3.6.8. Note that
E(X 2 ) ≥ (E(X))2 .
⋄
Example 3.6.9. If Y > 0,
E(1/Y) ≥ 1/E(Y).
⋄
3.6.3 Moments
Definition 3.6.10. The rth moment of a random variable X is µ′r = E(X^r); the rth central moment is µr = E[(X − E(X))^r]. For example,
µ′1 = E(X),
µ1 = 0,
µ2 = Var(X) = E(X 2 ) − E(X)2
⇒ µ2 = µ′2 − (µ′1 )2 .
⋄
Example 3.6.13. Let X ∼ Exp(λ). Then
µ′r = E(X^r)
= ∫_R x^r fX(x) dx
= ∫_0^∞ x^r λe^{−λx} dx
= ∫_0^∞ x^r (d/dx)(−e^{−λx}) dx
= [x^r(−e^{−λx})]_0^∞ − ∫_0^∞ r x^{r−1}(−e^{−λx}) dx
= (r/λ) ∫_0^∞ x^{r−1} λe^{−λx} dx
= (r/λ) µ′_{r−1}.
Observe that
µ′r = (r/λ) µ′_{r−1}
= (r/λ)((r − 1)/λ) µ′_{r−2}
= . . .
= (r/λ)((r − 1)/λ) · · · (1/λ) µ′0
= r!/λ^r.
So E(X) = 1/λ, E(X²) = 2/λ², and so on. Further note that
Var(X) = 2/λ² − (1/λ)² = 1/λ².
⋄
where we require that MX (t) < ∞ for all t ∈ [−h, h] for some h > 0 (a
neighborhood of 0).
And so
MX(t) = E(e^{tX})
= E(1 + tX + (tX)²/2! + · · ·)
= Σ_{j=0}^∞ E[(tX)^j/j!]
= Σ_{j=0}^∞ (t^j/j!) E(X^j)
= 1 + tµ′1 + (t²/2!)µ′2 + (t³/3!)µ′3 + · · · .
The coefficient of t^r is µ′r/r! = E(X^r)/r!.
Proof.
MX^{(r)}(t) = (d^r/dt^r) MX(t)
= (d^r/dt^r) [1 + tµ′1 + (t²/2!)µ′2 + (t³/3!)µ′3 + · · ·]
= µ′r + tµ′_{r+1} + (t²/2!)µ′_{r+2} + · · · .
This implies
MX^{(r)}(0) = µ′r = E(X^r).
Proof. Omitted.
MX(t) = E(e^{tX})
= Σ_x e^{tx} fX(x)
= Σ_{x=0}^∞ e^{tx} e^{−λ}λ^x/x!
= Σ_{x=0}^∞ e^{−λ}(λe^t)^x/x!
= e^{λe^t} e^{−λ} Σ_{x=0}^∞ e^{−λe^t}(λe^t)^x/x!
= e^{λ(e^t − 1)},   for t ∈ R.
Then
MX(t) = exp(λ(e^t − 1))
= exp(λ(1 + t + t²/2 + t³/6 + · · · − 1))
= 1 + λ(t + t²/2 + · · ·) + λ²(t + t²/2 + · · ·)²/2 + · · ·
= 1 + λt + λt²/2 + λ²t²/2 + · · ·
= 1 + λt + (t²/2)(λ + λ²) + · · · .
From this, E(X) = λ, and E(X²) = λ + λ². Moreover,
Var(X) = λ + λ² − λ² = λ.
Or,
M′X(t) = exp(λ(e^t − 1)) λe^t ⇒ µ′1 = M′X(0) = λ.
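Not part of the original notes: a numerical check that M′X(0) = λ and M″X(0) = λ + λ², using finite differences of the Poisson MGF (λ = 2.5 is an arbitrary choice).

```python
from math import exp

lam, h = 2.5, 1e-4
M = lambda t: exp(lam * (exp(t) - 1))        # Poisson MGF
m1 = (M(h) - M(-h)) / (2 * h)                # ≈ E(X) = 2.5
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2        # ≈ E(X^2) = lam + lam^2 = 8.75
print(m1, m2, m2 - m1**2)                    # variance ≈ 2.5
```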
MY(t) = E(e^{tY})
= ∫_0^∞ e^{ty} [λ^α/Γ(α)] y^{α−1} e^{−λy} dy
= [λ^α/(λ − t)^α] ∫_0^∞ [(λ − t)^α/Γ(α)] y^{α−1} e^{−(λ−t)y} dy
= [λ/(λ − t)]^α
= (1 − t/λ)^{−α}, for |t| < λ.
⋄
MY(t) = (1 − t/λ)^{−α}
= Σ_{j=0}^∞ C(j + α − 1, α − 1) (t/λ)^j.
So, for example, the coefficient of t^j/j! is [(j + α − 1)!/(α − 1)!] λ^{−j}. Then
E(Y) = [(1 + α − 1)!/(α − 1)!] λ^{−1}
= [α!/(α − 1)!] (1/λ)
= α/λ.
We can write
KX(t) = κ1 t + (κ2/2!) t² + (κ3/3!) t³ + · · · .
The rth cumulant, κr, is the coefficient of t^r/r! in the power series expansion of KX(t) about 0.
Example 3.7.8. Let X ∼ Poisson(λ). Then
KX(t) = ln MX(t)
= ln(exp(λ(e^t − 1)))
= λ(e^t − 1)
= λt + λt²/2! + λt³/3! + · · · .
So, κ1 = κ2 = κ3 = · · · = λ. ⋄
i. κ1 = µ′1 = E(X)
Proof.
i. Observe that
K′X(t) = M′X(t)/MX(t) ⇒ κ1 = K′X(0) = µ′1.
ii.
K″X(t) = [M″X(t)MX(t) − (M′X(t))²]/(MX(t))²
⇒ κ2 = K″X(0) = µ′2 − (µ′1)².
Consider Y = g(X). We know how to find E(g(X)), but what is the distribution of Y itself? In general,
FY(y) = P(Y ≤ y) = P(g(X) ≤ y) ≠ P(X ≤ g^{−1}(y)).
For example, if g(x) = x², then
g^{−1}({4}) = {−2, 2},  g^{−1}([0, 1]) = [−1, 1].
P (Y ∈ B) = P (g(X) ∈ B)
= P ({ω ∈ Ω : g(X(ω)) ∈ B})
= P ({ω ∈ Ω : X(ω) ∈ g −1 (B)})
= P (X ∈ g −1 (B))
⋄
Remark 3.9.4. Note that
FY(y) = P(Y ≤ y)
= P(Y ∈ (−∞, y])
= P(X ∈ g^{−1}((−∞, y]))
= Σ_{x : g(x) ≤ y} fX(x)   (discrete)
= ∫_{x : g(x) ≤ y} fX(x) dx   (continuous).
Further,
fY(y) = . . . = Σ_{x : g(x) = y} fX(x).
FY(y) = P(Y ≤ y)
= P(X² ≤ y)
= P(−√y ≤ X ≤ √y)
= FX(√y) − FX(−√y)
⇒ fY(y) = d/dy FY(y)
= (1/(2√y))[fX(√y) + fX(−√y)], y ≥ 0,
= 0, y < 0.
For X ∼ N(0, 1),
fY(y) = (1/(2√y)) · (1/√(2π)) e^{−(√y)²/2} + . . .
= (1/√(2π)) y^{−1/2} e^{−y/2}, y ≥ 0
= [(1/2)^{1/2}/√π] y^{−1/2} e^{−y/2}.
Note that √π = Γ(1/2). Hence, Y ∼ Gamma(1/2, 1/2).
⋄
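Not part of the original notes: a simulation sketch consistent with this result — for Z ∼ N(0, 1), Z² should have mean α/λ = 1 and variance α/λ² = 2.

```python
import random

random.seed(4)
n = 200_000
ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
print(mean, var)   # ≈ 1 and ≈ 2, matching Gamma(1/2, 1/2)
```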
Monotonicity
[Figures: a monotone increasing and a monotone decreasing function y = g(x) on [a, b], with a level y0 and its preimage g^{−1}(y0) marked.]
so
fY(y) = fX(g^{−1}(y)) |d/dy g^{−1}(y)|
= fX(log y) · (1/y), y ≥ 0.
If we define y = g(x), x = g^{−1}(y), we can write
fY(y) = fX(x) |dx/dy|.
fY(y) = fX(x) |dx/dy|.
Note that
y = µ + σx ⇐⇒ x = (y − µ)/σ,
so
fY(y) = fX((y − µ)/σ) · (1/σ).
What about the MGF/CGF?
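A quick answer (not written out in the original notes): if Y = µ + σX, then
MY(t) = E(e^{t(µ + σX)}) = e^{µt} E(e^{σtX}) = e^{µt} MX(σt),
and taking logs,
KY(t) = µt + KX(σt).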
Definition 3.10.1. A sequence (xn) converges to x, written (xn) → x, if for all ε > 0 there exists some N ∈ N such that |xn − x| < ε for all n ≥ N.
Say we have a sequence of random variables (Xn). What does it mean to say that (Xn) "converges"?
Convergence in...
If A = {ω ∈ Ω : Xn(ω) → X(ω)}, then we want P(A) = 1. Now consider Ac: for ω ∈ Ac there exists ε > 0 such that for every n we can find m ≥ n with |Xm(ω) − X(ω)| > ε. Equivalently: there are infinitely many m with |Xm(ω) − X(ω)| > ε.
If
An = {|Xn − 0| > ε},
Note that ⋃_{m=n}^∞ Em occurs when at least one Em (m ≥ n) occurs. Writing Bn = ⋃_{m=n}^∞ Em,
P(Em occurs for infinitely many m) = lim_{n→∞} P(Bn)
= lim_{n→∞} P(⋃_{m=n}^∞ Em)
≤ lim_{n→∞} Σ_{m=n}^∞ P(Em)
= 0,
as long as Σ_{m=1}^∞ P(Em) < ∞ (the tail of a convergent series tends to 0).
For the coin, the probability of tails is P(Em) = 1/2^m. Then
Σ_{m=1}^∞ P(Em) = Σ_{m=1}^∞ 1/2^m = 1 < ∞,
so the sequence converges almost surely.
Chapter 4
Multivariate Distributions
Bivariate CDFs
Note that FX,Y(x, y) = P(X ≤ x, Y ≤ y). Moreover,
FX,Y(−∞, y) = 0 = FX,Y(x, −∞).
Similarly,
FX,Y(∞, ∞) = 1.
Lastly,
FX,Y(x, ∞) = P(X ≤ x, Y ≤ ∞)
= P(X ≤ x)
= FX(x),
x↓ y→    0       1       2       fX(x)
0        0.713   0.133   0.004   0.850
1        0.133   0.012   0       0.145
2        0.004   0       0       0.004
fY(y)    0.850   0.145   0.004   1
It follows that
Σ_x Σ_y fX,Y(x, y) = 1,
and that
fX(x) = Σ_y fX,Y(x, y).
⋄
for all x, y ∈ R.
So
fX,Y(x, y) = ∂²/∂x∂y FX,Y(x, y).
Now, we have
∫∫_{R²} fX,Y(x, y) dx dy = 1,
and
fX(x) = ∫_R fX,Y(x, y) dy,  fY(y) = ∫_R fX,Y(x, y) dx,
and
P((X, Y) ∈ B) = ∫∫_B fX,Y(x, y) dx dy.
so
fX (0) = fX,Y (0, 0) + fX,Y (0, 1) + fX,Y (0, 2) + · · · .
[Figure 4.2: The support of Example 4.2.1 — the triangle 0 < x < y < 1. The support of a bivariate PDF is a region in R²; finding the limits of integration for each axis can be tricky.]
∫_{−∞}^∞ ∫_{−∞}^∞ fX,Y(x, y) dx dy = ∫_0^1 ∫_0^y 8xy dx dy = ∫_0^1 ∫_x^1 8xy dy dx.
Note that
fX(x) = ∫_R fX,Y(x, y) dy = ∫_x^1 8xy dy.
⋄
E(g(X)) = Σ_x g(x) fX(x)   (discrete)
E(g(X)) = ∫_{−∞}^∞ g(x) fX(x) dx   (continuous).
Consider X1, X2, . . . , Xn — for example, 365 daily observations and their average (X1 + · · · + X365)/365 — or the maximum, or the median, etc. These are all functions g : Rⁿ → R.
If X1, X2, . . . , Xn are random variables, and g : Rⁿ → R is a well-behaved
function.
E(g(X1, X2, . . . , Xn)) = Σ_{x1} · · · Σ_{xn} g(x1, . . . , xn) f_{X1,...,Xn}(x1, . . . , xn)   (discrete)
E(g(X1, X2, . . . , Xn)) = ∫_R · · · ∫_R g(x1, . . . , xn) f_{X1,...,Xn}(x1, . . . , xn) dx1 · · · dxn   (continuous).
⋄
In the previous example:
E(X + 2Y) = ∫_{R²} (x + 2y) fX,Y(x, y) dx dy
= ∫_0^1 ∫_0^y (x + 2y) 8xy dx dy.
Alternatively, by linearity,
E(X + 2Y) = ∫_{R²} x fX,Y(x, y) dx dy + 2 ∫_{R²} y fX,Y(x, y) dx dy = E(X) + 2E(Y).
• Cov(aX, bY) = ab Cov(X, Y)
• Cov(X + c, Y + d) = Cov(X, Y)
• Cov(X, X) = Var(X)
Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)) = ρ,
with −1 ≤ ρ ≤ 1.
−1 ≤ Corr(X, Y) ≤ 1.
Proof. For r ∈ R, let Z = Y − rX and h(r) = Var(Z). Then
0 ≤ Var(Z)
= Var(Y − rX)
= Var(Y) + Var(−rX) + 2 Cov(Y, −rX)
= Var(Y) + r² Var(X) − 2r Cov(X, Y).
Viewed as a quadratic in r, h(r) = ar² + br + c with a = Var(X), b = −2 Cov(X, Y), c = Var(Y), so
∆ = b² − 4ac
= (−2 Cov(X, Y))² − 4 Var(X) Var(Y)
= 4(Cov(X, Y)² − Var(X) Var(Y)).
Since 0 ≤ h(r), h(r) has at most one root. Then ∆ ≤ 0, and hence Cov(X, Y)² ≤ Var(X) Var(Y). Thus,
[Cov(X, Y)/√(Var(X) Var(Y))]² ≤ 1,
which implies that −1 ≤ Corr(X, Y) ≤ 1. If ∆ = 0, or Corr(X, Y)² = 1, then h(r) has a double root, i.e., h(r∗) = 0 for some r∗ ∈ R. Moreover,
h(r∗) = 0 ⇐⇒ Var(Y − r∗X) = 0,
so
Y − r∗X = k ⇐⇒ Y = r∗X + k.
We can show that r∗ = −b/(2a) = Cov(X, Y)/Var(X).
In that case,
Cov(X, Y) = Cov(X, rX + k)
= r Cov(X, X)
= r Var(X),
and
Var(Y) = Var(rX + k) = r² Var(X).
So
Corr(X, Y) = r Var(X)/√(Var(X) · r² Var(X))
= r/√(r²)
= r/|r|
= 1, if r > 0
= −1, if r < 0.
µ′1,0 = E(X)
µ′r,0 = E(X r )
µ′0,3 = E(Y 3 ).
⋄
Example 4.3.5. Note that
Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)) = µ1,1/√(µ2,0 µ0,2).
⋄
Example 4.3.6. Let
fX,Y(x, y) = x + y for 0 ≤ x, y ≤ 1, and 0 otherwise.
Then
µ′_{r,s} = E(X^r Y^s)
= ∫_{R²} x^r y^s fX,Y(x, y) dx dy
= ∫_0^1 ∫_0^1 x^r y^s (x + y) dx dy
= ∫_0^1 ∫_0^1 (x^{r+1} y^s + x^r y^{s+1}) dx dy
= . . .
M_{X,Y}^{(r,s)}(0, 0) = [∂^{r+s}/∂t^r ∂u^s M_{X,Y}(t, u)]_{t=0, u=0}
= µ′_{r,s}
= E(X^r Y^s).
∂/∂t K_{X,Y}(t, u) = [∂/∂t M_{X,Y}(t, u)]/M_{X,Y}(t, u) = (µ′_{1,0} + µ′_{1,1}u + · · ·)/M_{X,Y}(t, u)
⇒ ∂²/∂u∂t K_{X,Y}(t, u) = (µ′_{1,1} + · · ·)/M_{X,Y}(t, u) − (µ′_{1,0} + µ′_{1,1}u + · · ·)(µ′_{0,1} + · · ·)/(M_{X,Y}(t, u))².
Thus,
κ_{1,1} = K_{X,Y}^{(1,1)}(0, 0) = µ′_{1,1} − µ′_{1,0} µ′_{0,1}
= E(XY) − E(X)E(Y)
= Cov(X, Y).
fX,Y (x, y) = P (X = x, Y = y)
= P (X = x)P (Y = y)
= fX (x)fY (y).
It follows that
E(XY) = E(X)E(Y).
Hence, X ⊥⊥ Y ⇒ X, Y are uncorrelated, i.e., Cov(X, Y) = 0.
Proposition 4.4.2. If X ⊥⊥ Y and g, h : R → R are well-behaved functions, then g(X) ⊥⊥ h(Y) and E(g(X)h(Y)) = E(g(X))E(h(Y)).
and thus,
fX,Y(x, y) ≠ fX(x)fY(y),
so X ̸⊥⊥ Y. ⋄
Example 4.4.5. Let
fX,Y(x, y) = kxy for 0 < x < y < 1, and 0 otherwise.
Two functions that don't have the same support cannot be the same function. Hence, X ̸⊥⊥ Y because of the support. ⋄
Notation. We write that X1, X2, . . . , Xn are independent iff {X1 ≤ x1}, . . . , {Xn ≤ xn} are mutually independent. Hence,
F_{X1,...,Xn}(x1, . . . , xn) = ∏_{i=1}^n F_{Xi}(xi).
Also,
E(X1 X2 · · · Xn) = E(X1) · · · E(Xn).
X = (X1, X2, . . . , Xn)ᵀ.
And similarly for fX(x) and MX(t). The expectation of a random vector X is given by
E(X) = (E(X1), . . . , E(Xn))ᵀ,
and the expectation of a random matrix W is the m × n matrix whose (i, j) entry is E(W_{i,j}).
What is the variance of a random vector? It is the covariance matrix A = Var(X), with entries A_{i,j} = Cov(Xi, Xj). For any b ∈ Rⁿ,
bᵀAb = Var(bᵀX) ≥ 0,
so A is positive semi-definite.
fY(y) = fX(x) |dx/dy|,
where x = g^{−1}(y).
X = g1 (U, V )
Y = g2 (U, V ),
fZ(z) = P(Z = z)
= P(X + Y = z)
= Σ_u P(X = u, Y = z − u)
= Σ_u fX,Y(u, z − u).
Z = X + Y, U = X ⇐⇒ X = U, Y = Z − U.
Let
(U, Z) = g(X, Y),  (X, Y) = h(U, Z).
Then
Jh(u, z) = [∂x/∂u  ∂x/∂z ; ∂y/∂u  ∂y/∂z] = [1  0 ; −1  1].
Since |Jh| = 1, it follows that
fU,Z(u, z) = fX,Y(u, z − u),
Remark 4.5.3. If X ⊥⊥ Y, then
fZ(z) = Σ_u fX(u) fY(z − u)   (discrete),
fZ(z) = ∫_R fX(u) fY(z − u) du   (continuous).
Hence,
fZ = fX ∗ fY = fY ∗ fX .
Example 4.5.4.
⋄
Example 4.5.5. Let
X ∼ Exp(λ), Y ∼ Exp(θ), X ⊥⊥ Y, Z = X + Y.
Observe that
fZ(z) = ∫_R fX(u) fY(z − u) du
= ∫_0^z λe^{−λu} θe^{−θ(z−u)} du
= λθe^{−θz} [−(1/(λ − θ)) e^{−(λ−θ)u}]_0^z
= [λθ/(λ − θ)] e^{−θz}(1 − e^{−(λ−θ)z})
= λθ(e^{−θz} − e^{−λz})/(λ − θ), for z > 0, λ ≠ θ.
⋄
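Not part of the original notes: a Monte Carlo check of this convolution density, assuming λ = 2 and θ = 1 and evaluating near z = 1.5.

```python
import random
from math import exp

random.seed(1)
lam, theta, z, eps = 2.0, 1.0, 1.5, 0.05
f_z = lam * theta * (exp(-theta * z) - exp(-lam * z)) / (lam - theta)
zs = [random.expovariate(lam) + random.expovariate(theta) for _ in range(200_000)]
density_est = sum(z - eps / 2 < v < z + eps / 2 for v in zs) / (len(zs) * eps)
print(f_z, density_est)   # both should be close to 0.35
```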
X1 , X2 , . . . , Xn .
Let S = Σ_{i=1}^n Xi. Suppose that (Xi) are mutually independent. Then
fS = fX1 ∗ fX2 ∗ · · · ∗ fXn,  MS(t) = ∏_{i=1}^n MXi(t).
⋄
If X1, . . . , Xn are IID (independent and identically distributed), then
MS(t) = ∏_{i=1}^n MXi(t) = (MX1(t))^n,
With U, V independent N(0, 1), set X = U and Y = ρU + √(1 − ρ²) V. Then
Var(Y) = ρ² Var(U) + (√(1 − ρ²))² Var(V)
= ρ² + 1 − ρ² = 1,
Thus,
Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)) = ρ,
and Y is Normal, since it is a linear combination of the independent normals U ⊥⊥ V.
We have M_{X,Y}(s, t) = exp[(1/2)(s² + 2ρst + t²)]. Then
K_{X,Y}(s, t) = (1/2)(s² + 2ρst + t²).
⋄
X ∗ = µX + σX X, Y ∗ = µY + σY Y.
Chapter 5
Conditional Distributions
fY|X(y | x) = fX,Y(x, y)/fX(x).
Recall that
fX (x) = 4x(1 − x2 ), 0 < x < 1.
Then
fY|X(y | x) = 8xy/[4x(1 − x²)] = 2y/(1 − x²), x < y < 1.
Furthermore,
FY|X(y | x) = ∫_{−∞}^y fY|X(u | x) du
= ∫_x^y 2u/(1 − x²) du
= (y² − x²)/(1 − x²), x < y < 1.
Plug in y = x to check if this is plausible.
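Not part of the original notes: a crude numerical check that the conditional density above integrates to 1 over x < y < 1, using an arbitrary x = 0.3.

```python
x, m = 0.3, 100_000
h = (1 - x) / m
total = sum(2 * (x + (i + 0.5) * h) / (1 - x**2) * h for i in range(m))   # midpoint rule
print(total)   # ≈ 1, as a conditional density must integrate to 1
```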
Recall. P (A ∩ B ∩ C) = P (A | B ∩ C)P (B | C)P (C). Similarly,
and
fX,Y,Z (x, y, z) = fZ|X,Y (z | x, y)fY |X (y | x)fX (x).
A Simple Model
fX,Y(x, y) = fY|X(y | x) fX(x) = [x!/(y!(x − y)!)] p^y (1 − p)^{x−y} · e^{−λ}λ^x/x!,
supported on x, y = 0, 1, 2, . . . , y ≤ x.
Then
fY(y) = Σ_x fX,Y(x, y)
= Σ_{x=y}^∞ [x!/(y!(x − y)!)] p^y (1 − p)^{x−y} e^{−λ}λ^x/x!
= (e^{−λ} p^y/y!) Σ_{x=y}^∞ (1 − p)^{x−y} λ^x/(x − y)!.
Let z = x − y. Then
fY(y) = (e^{−λ} p^y λ^y/y!) Σ_{z=0}^∞ (1 − p)^z λ^z/z!
= [e^{−λ}(λp)^y/y!] e^{λ(1−p)}
= e^{−λp}(λp)^y/y!, y = 0, 1, 2, . . .
Moreover,
∫_R Σ_x fX,Y(x, y) dy = 1.
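Not part of the original notes: a simulation sketch of this thinning result, assuming X ∼ Poisson(λ) with λ = 6 and p = 0.3, so that Y should behave like Poisson(λp) with mean 1.8.

```python
import random

random.seed(2)
lam, p, n = 6.0, 0.3, 100_000

def poisson(mu):
    # count Exponential(mu) inter-arrival times falling before time 1: a Poisson(mu) count
    k, t = 0, random.expovariate(mu)
    while t < 1.0:
        k += 1
        t += random.expovariate(mu)
    return k

ys = []
for _ in range(n):
    x = poisson(lam)                                       # X for this year
    ys.append(sum(random.random() < p for _ in range(x)))  # Y | X = x ~ Bin(x, p)
print(sum(ys) / n, lam * p)   # sample mean of Y ≈ 1.8 = lambda * p
```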
Insurance Example
Then
(Z | Y = y) ∼ some continuous model.
(Y | X = x) ∼ Bin(x, p),
so E(Y | X = x) = xp ⇒ E(Y | X) = Xp.
Proof. In the discrete case,
E[E(Y | X)] = Σ_x E(Y | X = x) fX(x)
= Σ_x Σ_y y fY|X(y | x) fX(x)
= Σ_y y Σ_x fX,Y(x, y)
= Σ_y y fY(y)
= E(Y).
Example 5.2.4 (More Hurricanes). With X ∼ Poisson(λ) and (Y | X = x) ∼ Bin(x, p), we get E(Y) = E[E(Y | X)] = E(Xp) = λp. ⋄
Find E(Y | X):
fX(x) = ∫_R fX,Y(x, y) dy = ∫_0^∞ x e^{−xy} e^{−x} dy = e^{−x}, x > 0.
• E((aX + b) | Y ) = aE(X | Y ) + b,
Proof. We have
Hurricanes Again
Definition 5.3.1. The conditional MGF of Y given X is
MY|X(u | X) = E[e^{uY} | X].
For the hurricanes model, (Y | X = x) ∼ Bin(x, p), so
MY|X(u | x) = (1 − p + pe^u)^x ⇒ MY|X(u | X) = (1 − p + pe^u)^X.
So
MY(u) = E[MY|X(u | X)]
= E[(1 − p + pe^u)^X]
= MX(ln(1 − p + pe^u))
= exp(λ(e^{ln(1−p+pe^u)} − 1))
= e^{λp(e^u − 1)},
so Y ∼ Poisson(λp).
Remark 5.3.3. Aside: MX(t) = e^{λ(e^t − 1)}.
Example. Consider
X : height of a student,
W : male or female (male = 0, female = 1).
Then
(X | W = 1) ∼ Normal(µ_W, σ²_W),
(X | W = 0) ∼ Normal(µ_M, σ²_M),
W ∼ Bernoulli(p).
Moreover,
fX(x) = Σ_w fX|W(x | w) fW(w),
where each summand fX|W(x | w) fW(w) = fX,W(x, w).
Note that
fX|W(x | 1) = [1/√(2πσ²_W)] e^{−(x−µ_W)²/(2σ²_W)}.
[Figure: the conditional densities of X given W = 1 and W = 0, centred at µ_W and µ_M, together with the mixture density of X.]
⋄
Example 5.3.5 (Household Insurance).
Then
Y = Σ_{i=1}^N Xi,  a random sum.
E(Y | N = n) = E(Σ_{i=1}^n Xi)
= nE(X1).
So
E(Y ) = E[E(Y | N )]
= E[N E(X1 )]
= E(X1 )E(N ).
Moreover,
Var(Y | N = n) = Var(Σ_{i=1}^n Xi)
= n Var(X1).
Now, how do we iterate variances? We use the Law of Iterated Variance: Var(Y) = E[Var(Y | N)] + Var[E(Y | N)].
For the MGF, conditionally on N = n,
MY|N(u | n) = E[e^{u(X1 + ··· + Xn)}] = (MX1(u))^n.
Finally,
MY(u) = E[MY|N(u | N)]
= E[(MX1(u))^N]
= E[exp(N ln MX1(u))]
= MN(ln MX1(u)).
This implies
KY(u) = KN(KX1(u)).
Back to the insurance example. Note that X1, X2, . . . ∼ Exp(λ), and N ∼ Geo(p). We have
E(Y) = E(N)E(X1) = (1/p)(1/λ) = 1/(λp).
Then
MY(u) = MN(ln MX1(u))
= MN(ln[(1 − u/λ)^{−1}])
= [1 − 1/p + (1/p)(1 − u/λ)]^{−1}
= [1 − 1/p + 1/p − u/(λp)]^{−1}
= (1 − u/(λp))^{−1},
so Y ∼ Exp(λp). ⋄
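Not part of the original notes: a Monte Carlo check that a Geometric(p) number of independent Exp(λ) claims sums to an Exp(λp) total, assuming λ = 2 and p = 0.25.

```python
import random
from math import exp

random.seed(3)
lam, p, n = 2.0, 0.25, 100_000

def geometric(p):
    # number of Bernoulli(p) trials up to and including the first success
    k = 1
    while random.random() >= p:
        k += 1
    return k

ys = [sum(random.expovariate(lam) for _ in range(geometric(p))) for _ in range(n)]
print(sum(ys) / n, 1 / (lam * p))                          # sample mean ≈ 2.0 = 1/(lam*p)
print(sum(y > 2.0 for y in ys) / n, exp(-lam * p * 2.0))   # tail ≈ e^{-1} ≈ 0.368
```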
Conclusion
Any issues with the lecture notes can be reported on the git repository, by either submitting a pull request or an issue. I am happy to fix any typos or inaccuracies in the content. In addition, feel free to edit my work; just keep my name on it if you're going to publish it somewhere else. The figures can be edited with Inkscape, the software I used to create them. When editing the figures, make sure to save to PDF, and choose the option that exports the text directly to LaTeX. I hope these notes helped!