Review Probability
1 / 43
Fundamental Assumption In this Course
Data Comes From a Random Sampling Process
Randomness is a modeling assumption for something we don't
understand (for example, errors in measurements)
2 / 43
Statistical Inference vs Probability
Statistical Inference = Probability⁻¹

In Probability: for a specified probability distribution, what are
the properties of data from this distribution?

Example: X1, X2, . . . iid ∼ N(0, 1).
What are P(X1 > 0.5), P((X1 + X2 + X3)/3 > 2)?

In Statistical Inference: for a specified set of data, what are the
properties of the distribution(s)?

Example: X1, X2, . . . iid ∼ N(θ, 1) for some θ.
Observe that X1 = 0.134, X2 = −1, . . . . What is θ?
3 / 43
In this session
▶ review probability distributions and their means and variances
▶ study joint probability distributions to explore simultaneous
  outcomes of two (or more) quantities, such as the amount of
  precipitate P and volume V of gas released from a
  controlled chemical experiment, or the hardness H and tensile
  strength T of cold-drawn copper
▶ explore the concept of statistical independence between
  random variables, which is an important property of
  random sampling
▶ study the moment generating function, a useful tool to
  discover the distribution of some important statistics such
  as the sample mean
4 / 43
Random Variables
Joint Distribution
Joint Distribution
Expectation and Covariance
Two important multivariate distributions
Moment Generating Function
5 / 43
Discrete Random Variable
A discrete random variable X can take a finite or countably
infinite number of possible values. The distribution of X is
specified by its probability mass function (p.m.f):
fX(x) = P(X = x)

For any set A of values that X can take,

P(X ∈ A) = Σ_{x∈A} fX(x)
6 / 43
Continuous Random Variable
A continuous random variable X takes values in R and models
continuous real-valued data. The distribution of X is specified
by its probability density function (p.d.f.) fX(x), which
satisfies, for any set A ⊂ R,

P(X ∈ A) = ∫_{x∈A} fX(x) dx
7 / 43
Cumulative Distribution Function (c.d.f)
The distribution of X can also be specified by its cumulative
distribution function
FX (x) = P (X ≤ x)
Discrete case:
  FX(x) = Σ_{u ≤ x} P(X = u)
  Inversely, fX(x) = FX(x) − FX(x⁻)

Continuous case:
  FX(x) = ∫_{−∞}^{x} fX(u) du
  Inversely, fX(x) = (d/dx) FX(x)
8 / 43
Expectation
Expectation or mean or average value of g(X) for some function
g and random variable X is
E[g(X)] =
  Σ_x g(x) fX(x)       if X is a discrete random variable
  ∫_R g(x) fX(x) dx    if X is a continuous random variable

Mean of X

µX = E(X) =
  Σ_x x fX(x)          if X is a discrete random variable
  ∫_R x fX(x) dx       if X is a continuous random variable

Variance of X

Var(X) = E[(X − E(X))²] =
  Σ_x (x − E(X))² fX(x)        in the discrete case
  ∫_R (x − E(X))² fX(x) dx     in the continuous case
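As a small illustrative aside (not part of the original slides), the discrete formulas translate directly into code; the p.m.f. below is an arbitrary example chosen only for the demonstration.

# Sketch: mean and variance computed directly from a discrete p.m.f.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}                        # example values x and probabilities fX(x)
mean = sum(x * p for x, p in pmf.items())                # E(X) = sum_x x * fX(x)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())   # Var(X) = E[(X - E(X))^2]
print(mean, var)                                         # 1.0 0.5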
9 / 43
Properties of Mean and Variance
For any real numbers a, b and random variables X, X1, ..., Xn:
1. Linearity of the mean
   E(aX + b) = aE(X) + b
   E(X1 + · · · + Xn) = E(X1) + · · · + E(Xn)
2. Var(aX + b) = a² Var(X)
3. √Var(X) is called the standard deviation of X
10 / 43
Random Variables
Joint Distribution
Joint Distribution
Expectation and Covariance
Two important multivariate distributions
Moment Generating Function
11 / 43
Example
Toss a fair coin three times. Let X be the number of heads in
the first two tosses and Y be the total number of heads in three
tosses.
Outcome       HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
Value of X     2    2    1    1    1    1    0    0
Value of Y     3    2    2    1    2    1    1    0
Probability   1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8
▶ The pair value of X and Y with respect to the outcome HHH is
2 and 3. We denote by
(X, Y )(HHH) = (2, 3)
▶ (X, Y) = (1, 2), that is X = 1 and Y = 2, if and only if the
  outcome is HTH or THH. So (X, Y) = (1, 2) is considered as
  the event {HTH, THH}. Then
  P((X, Y) = (1, 2)) = P({HTH, THH}) = P(HTH) + P(THH) = . . .
12 / 43
Summarize all possible pair values of (X, Y) and the
corresponding probabilities:

            y
            0     1     2     3
     0     1/8   1/8    0     0
x    1      0    1/4   1/4    0
     2      0     0    1/8   1/8
     3      0     0     0     0

From the above table, we can evaluate any probability
concerning X and Y, such as

P(X ≤ 2, Y = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 2) + P(X = 2, Y = 2) = . . .
P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) = . . .   (the other terms are zero)
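A quick way to double-check such a table is to enumerate the sample space directly. The following Python sketch (an illustration, not from the slides) rebuilds the joint p.m.f. of (X, Y) for the three coin tosses and evaluates the probabilities above.

# Sketch: enumerate the 8 equally likely outcomes of three fair coin tosses
from itertools import product
from collections import defaultdict

joint = defaultdict(float)                       # joint p.m.f. f(x, y)
for outcome in product("HT", repeat=3):          # e.g. ('H', 'H', 'T')
    x = outcome[:2].count("H")                   # X = number of heads in the first two tosses
    y = outcome.count("H")                       # Y = number of heads in all three tosses
    joint[(x, y)] += 1 / 8                       # each outcome has probability 1/8

print(joint[(1, 2)])                                                  # P(X = 1, Y = 2) = 0.25
print(sum(p for (x, y), p in joint.items() if x <= 2 and y == 2))     # P(X <= 2, Y = 2) = 0.375
print(sum(p for (x, y), p in joint.items() if x == 1))                # P(X = 1) = 0.5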
13 / 43
Joint Probability Mass Function (join p.m.f)
The function f (x, y) is a joint probability mass function of the
discrete random variables X and Y if
▶ f(x, y) ≥ 0
▶ Σ_x Σ_y f(x, y) = 1
▶ f(x, y) = P(X = x, Y = y)

For any A ⊂ R², we have

P((X, Y) ∈ A) = Σ_{(x,y)∈A} P(X = x, Y = y) = Σ_{(x,y)∈A} f(x, y)
14 / 43
Marginal Probability Mass Function
▶ The marginal probability mass function of X is
  fX(x) = Σ_y f(x, y)
▶ The marginal probability mass function of Y is
  fY(y) = Σ_x f(x, y)
15 / 43
Example
            y                            p.m.f. of X
            0     1     2     3
     0     1/8   1/8    0     0     P(X = 0) = 1/8 + 1/8 + 0 + 0 = 1/4
x    1      0    1/4   1/4    0     P(X = 1) = 0 + 1/4 + 1/4 + 0 = 1/2
     2      0     0    1/8   1/8    P(X = 2) = 0 + 0 + 1/8 + 1/8 = 1/4
     3      0     0     0     0     P(X = 3) = 0 + 0 + 0 + 0 = 0

p.m.f. of Y
P(Y = y)   1/8   3/8   3/8   1/8    total = 1
16 / 43
Joint Probability Density Function (joint p.d.f)
The function f (x, y) is joint probability density function of the
continuous random variables X and Y if
▶ f(x, y) ≥ 0
▶ ∫∫_{R²} f(x, y) dx dy = 1
▶ For any A ⊂ R², we have
  P((X, Y) ∈ A) = ∫∫_{(x,y)∈A} f(x, y) dx dy
17 / 43
Marginal Probability Density Function
▶ The marginal probability density function of X is
  fX(x) = ∫_{−∞}^{∞} f(x, y) dy
▶ The marginal probability density function of Y is
  fY(y) = ∫_{−∞}^{∞} f(x, y) dx
18 / 43
Example
The joint p.d.f. of two continuous random variables X and Y is

f(x, y) = (2/5)(2x + 3y)   if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
        = 0                otherwise
1. Compute P (0 < X < 0.25, 0.5 < Y < 0.75)
2. Find the marginal probability density function of X
19 / 43
Solution
1.
P(0 < X < 0.25, 0.5 < Y < 0.75) = ∫₀^{0.25} ∫_{0.5}^{0.75} f(x, y) dy dx
  = ∫₀^{0.25} ∫_{0.5}^{0.75} (2/5)(2x + 3y) dy dx
  = (4/5) ∫₀^{0.25} [ ∫_{0.5}^{0.75} x dy ] dx + (6/5) ∫₀^{0.25} [ ∫_{0.5}^{0.75} y dy ] dx

where ∫_{0.5}^{0.75} x dy = xy |_{y=0.5}^{0.75} = 0.25x and ∫_{0.5}^{0.75} y dy = y²/2 |_{y=0.5}^{0.75} = 5/32, so

  = (4/5) ∫₀^{0.25} 0.25x dx + (6/5) ∫₀^{0.25} (5/32) dx
  = . . .
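For a quick numerical sanity check of this double integral (an optional aside, not from the slides), the joint density can be integrated over the rectangle with scipy:

# Sketch: numerically verify P(0 < X < 0.25, 0.5 < Y < 0.75) for the joint p.d.f. above
from scipy import integrate

f = lambda y, x: 0.4 * (2 * x + 3 * y)    # (2/5)(2x + 3y); dblquad expects f(y, x)
prob, _ = integrate.dblquad(f, 0, 0.25,   # x ranges over (0, 0.25)
                            0.5, 0.75)    # y ranges over (0.5, 0.75)
print(prob)                               # about 0.053125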
20 / 43
2. If x < 0 or x > 1 then f(x, y) = 0 for all y. Hence

   fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{−∞}^{∞} 0 dy = 0

If 0 ≤ x ≤ 1 then f(x, y) = (2/5)(2x + 3y) for 0 ≤ y ≤ 1 and f(x, y) = 0 otherwise. So

   fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫₀^{1} (2/5)(2x + 3y) dy = (2/5)[2xy + (3/2)y²]_{y=0}^{1} = (4x + 3)/5

Thus the marginal p.d.f. of X is

   fX(x) = (4x + 3)/5   if 0 ≤ x ≤ 1
         = 0            otherwise
21 / 43
Exercise
Roll a fair die twice. Let X and Y be the face numbers of the
first and second rolls.
1. Find the joint probability mass function of X and Y
2. Determine the marginal probability mass functions of X
   and of Y
3. Explore the relationship between (see the sketch after this list)
   ▶ P(X = x | Y = y) and P(X = x)
   ▶ P(X = x, Y = y) and P(X = x)P(Y = y)
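For item 3, a few lines of Python (an illustrative sketch, not part of the exercise statement) tabulate the exact joint and marginal p.m.f.s and compare the two quantities:

# Sketch: joint p.m.f. of two fair die rolls versus the product of the marginals
from fractions import Fraction
from itertools import product

joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}
px = {x: sum(p for (a, b), p in joint.items() if a == x) for x in range(1, 7)}
py = {y: sum(p for (a, b), p in joint.items() if b == y) for y in range(1, 7)}

# Prints True: P(X = x, Y = y) = P(X = x) P(Y = y) for every pair (x, y)
print(all(joint[(x, y)] == px[x] * py[y] for x, y in joint))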
21 / 43
Statistically Independent
Let X and Y be two random variables, discrete or continuous,
with joint probability distribution f (x, y) and marginal
distributions fX(x) and fY(y), respectively. The random variables
X and Y are said to be statistically independent if and only if
f (x, y) = fX (x)fY (y)
for all (x, y) within their range.
22 / 43
Expectation of g(X, Y )
E[g(X, Y)] =
  Σ_{x,y} g(x, y) fX,Y(x, y)            in the discrete case
  ∫∫_{R²} g(x, y) fX,Y(x, y) dx dy      in the continuous case
Covariance
Cov(X, Y ) = E [(X − E(X))(Y − E(Y ))] = E(XY ) − E(X)E(Y )
Correlation
The correlation between X and Y is their covariance
normalized by the product of their standard deviations:
corr(X, Y) = Cov(X, Y) / ( √Var(X) · √Var(Y) )
23 / 43
Example
Compute the covariance and correlation of two discrete random
variables X and Y with joint p.m.f.

            y
            0     1     2     3
     0     1/8   1/8    0     0
x    1      0    1/4   1/4    0
     2      0     0    1/8   1/8
     3      0     0     0     0
Solution
We have
E(XY ) = E[g(X, Y )] where g(x, y) = xy
So
E(XY) = Σ_{x,y} x y f(x, y) = (0)(0)·(1/8) + (0)(1)·(1/8) + · · · + (3)(3)·(0) = 2
24 / 43
The marginal p.m.f. of X:
  x          0    1    2
  P(X = x)  1/4  1/2  1/4

So
  E(X) = Σ_x x P(X = x) = 1
  E(X²) = 3/2
  Var(X) = E(X²) − [E(X)]² = 1/2

The marginal p.m.f. of Y:
  y          0    1    2    3
  P(Y = y)  1/8  3/8  3/8  1/8

So
  E(Y) = Σ_y y P(Y = y) = 3/2
  E(Y²) = 3
  Var(Y) = E(Y²) − [E(Y)]² = 3/4
Covariance
Cov(X, Y ) = E(XY ) − E(X)E(Y ) = 2 − (1)(3/2) = 1/2
Correlation
corr(X, Y) = Cov(X, Y) / ( √Var(X) · √Var(Y) ) = (1/2) / ( √(1/2) · √(3/4) ) ≈ 0.8165
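As an optional cross-check (not part of the slides), the same covariance and correlation can be computed from the joint table in a few lines of Python:

# Sketch: covariance and correlation computed from the joint p.m.f. table above
import math

joint = {(0, 0): 1/8, (0, 1): 1/8, (1, 1): 1/4, (1, 2): 1/4, (2, 2): 1/8, (2, 3): 1/8}

ex  = sum(x * p for (x, y), p in joint.items())                  # E(X)
ey  = sum(y * p for (x, y), p in joint.items())                  # E(Y)
exy = sum(x * y * p for (x, y), p in joint.items())              # E(XY)
vx  = sum(x ** 2 * p for (x, y), p in joint.items()) - ex ** 2   # Var(X)
vy  = sum(y ** 2 * p for (x, y), p in joint.items()) - ey ** 2   # Var(Y)

cov = exy - ex * ey
print(cov, cov / math.sqrt(vx * vy))                             # 0.5 and about 0.8165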
25 / 43
Properties of Expectation and Covariance
1. Symmetry
Cov(X, Y ) = Cov(Y, X)
2. Bilinear property
   Cov(a1X1 + · · · + anXn, b1Y1 + · · · + bmYm) = Σ_{i=1}^{n} Σ_{j=1}^{m} ai bj Cov(Xi, Yj)
3. Cov(X, X) = Var(X)
4. Variance of a sum
   Var(X1 + X2 + · · · + Xn) = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj)
5. If X and Y are independent then
   5.1 E[f(X)g(Y)] = E[f(X)]E[g(Y)]
   5.2 Cov(X, Y) = 0
6. If X1, . . . , Xn are independent then
   Var(X1 + X2 + · · · + Xn) = Σ_{i=1}^{n} Var(Xi)
26 / 43
Multinomial Distribution
The multivariate version of a Binomial is called a Multinomial.
Consider drawing a ball from an urn which has balls with k
different colors labeled "color 1, color 2, ..., color k." Let
p = (p1, ..., pk) where pj ≥ 0 and Σ_j pj = 1. Suppose pj is the
probability of drawing a ball of color j. Draw n times (independent
draws with replacement) and let X = (X1, ..., Xk), where Xj
is the number of times that color j appears.
Hence n = Σ_j Xj. We say that X has a Multinomial(n, p)
distribution, written X ∼ Multinomial(n, p). The joint
probability mass function is

f(x1, . . . , xk) = P(X1 = x1, . . . , Xk = xk) = ( n choose x1, . . . , xk ) p1^{x1} · · · pk^{xk}

where ( n choose x1, . . . , xk ) = n! / (x1! · · · xk!)

Property
If X ∼ Multinomial(n, p) then Xj ∼ Binomial(n, pj)
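For intuition (an illustrative aside, not from the slides), numpy can draw from a multinomial, and the marginal counts behave like the binomials stated in the property; the parameters below are arbitrary example values.

# Sketch: sample from Multinomial(n, p) and inspect one marginal count
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, [0.2, 0.3, 0.5]                        # k = 3 colors (example values)
draws = rng.multinomial(n, p, size=100_000)       # each row is (X1, X2, X3)

print(draws.sum(axis=1)[:5])                      # every row sums to n = 10
print(draws[:, 0].mean(), n * p[0])               # mean of X1 is close to n*p1 = 2, as for Binomial(n, p1)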
27 / 43
Standard Multivariate Normal Distribution
Let Z1, . . . , Zk iid ∼ N(0, 1). The joint p.d.f. of Z = (Z1, . . . , Zk)ᵀ is

fZ1,...,Zk(x1, . . . , xk) = Π_{i=1}^{k} (1/√(2π)) e^{−xi²/2} = (2π)^{−k/2} e^{−xᵀx/2}

where x = (x1, . . . , xk)ᵀ. We say that Z has a standard multivariate
Normal distribution, written Z ∼ N(0, I), where 0 is a zero
column vector and I is the k × k identity matrix.
28 / 43
Multivariate Normal Distribution
Let µ be a k × 1 column vector and Σ be a k × k symmetric,
positive definite matrix. A random vector X = (X1, . . . , Xk)ᵀ ∼ N(µ, Σ)
if the joint p.d.f. of X is

fX(x) = 1 / ( (2π)^{k/2} (det Σ)^{1/2} ) · e^{ −(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ) }

Property
1. µi = E(Xi) and Σij = Cov(Xi, Xj). So we call µ the mean
   vector and Σ the covariance matrix
2. Xi ∼ N(µi, Σii)
3. a1X1 + · · · + akXk has a normal distribution for all numbers
   a1, . . . , ak
4. (X − µ)ᵀ Σ⁻¹ (X − µ) ∼ χ²_k
29 / 43
5. The conditional distribution of a component given any
   information about the other components also has a normal
   distribution
6. If Σ is diagonal then the components of X are statistically
   independent
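To see properties 1 and 2 numerically (an illustrative sketch, not from the slides), one can sample from a bivariate normal with numpy and compare the empirical moments with µ and Σ; the values of µ and Σ below are arbitrary examples.

# Sketch: sample from N(mu, Sigma) and check the empirical mean and covariance
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                    # symmetric, positive definite

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=200_000)

print(X.mean(axis=0))                             # close to mu          (property 1)
print(np.cov(X, rowvar=False))                    # close to Sigma       (property 1)
print(X[:, 0].var())                              # close to Sigma[0, 0] (property 2: X1 ~ N(mu1, Sigma11))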
30 / 43
Random Variables
Joint Distribution
Joint Distribution
Expectation and Covariance
Two important multivariate distributions
Moment Generating Function
31 / 43
Moment Generating Function
Definition (Moment Generating Function (MGF))
The moment generating function of a random variable X is a
function of a single argument t ∈ R defined by

MX(t) = E(e^{tX})
Theorem
Let X and Y be two random variables such that, for some h > 0
and every t ∈ (−h, h), both MX (t) and MY (t) are finite and
MX (t) = MY (t). Then X and Y have the same distribution.
The MGF will be useful for us because, if X1, ..., Xn
are independent, then the MGF of their sum satisfies

M_{X1+···+Xn}(t) = E[e^{t(X1+···+Xn)}] = E[e^{tX1}] × · · · × E[e^{tXn}] = MX1(t) · · · MXn(t)
This gives us a very simple tool to understand the distributions of
sums of independent random variables.
32 / 43
Example
Consider a standard normal random variable Z ∼ N (0, 1). The
MGF of Z is
MZ(t) = E[e^{tZ}] = ∫_{−∞}^{∞} e^{tx} fZ(x) dx
      = ∫_{−∞}^{∞} e^{tx} (1/√(2π)) e^{−x²/2} dx
      = ∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2 + tx} dx
      = ∫_{−∞}^{∞} (1/√(2π)) e^{−(x² − 2tx)/2} dx
      = ∫_{−∞}^{∞} (1/√(2π)) e^{−(x² − 2tx + t² − t²)/2} dx
      = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(x − t)²/2} dx = e^{t²/2}

since the last integrand is the p.d.f. of N(t, 1), so the integral equals 1.
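A quick Monte Carlo sanity check of this formula (an aside, not part of the slides): estimate E[e^{tZ}] by simulation and compare it with e^{t²/2}.

# Sketch: estimate M_Z(t) = E[exp(t Z)] by simulation and compare with exp(t^2 / 2)
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)

for t in (0.5, 1.0):
    print(np.exp(t * z).mean(), np.exp(t ** 2 / 2))   # the two numbers should be close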
33 / 43
Example
Let X ∼ N(µ, σ²). Then X = µ + σZ and the MGF of X is

MX(t) = E[e^{t(µ+σZ)}] = e^{µt} E[e^{σtZ}] = e^{µt} MZ(σt) = e^{µt} e^{(σt)²/2} = e^{µt + σ²t²/2}
34 / 43
Example
Let X1, . . . , Xn iid ∼ N(µ, σ²). Then the MGF of X1 + · · · + Xn is

M_{X1+···+Xn}(t) = MX1(t) · · · MXn(t) = e^{µt + σ²t²/2} · · · e^{µt + σ²t²/2} = e^{nµt + nσ²t²/2}

which is the MGF of the normal distribution with mean nµ and
variance nσ². So

X1 + · · · + Xn ∼ N(nµ, nσ²)
35 / 43
Example
Let X ∼ Ber(p) with p.m.f
P(X = x) = p^x (1 − p)^{1−x},   x ∈ {0, 1}

The MGF of X is

MX(t) = E[e^{tX}] = e^{t·0} P(X = 0) + e^{t·1} P(X = 1) = 1 − p + p e^t

or

MX(t) = p e^t + q
where q = 1 − p
36 / 43
Example
Let X ∼ Bin(n, p) with p.m.f
P(X = k) = (n choose k) p^k (1 − p)^{n−k},   for k = 0, 1, . . . , n
The MGF of X is

MX(t) = E[e^{tX}] = Σ_{k=0}^{n} e^{tk} P(X = k) = Σ_{k=0}^{n} e^{tk} (n choose k) p^k (1 − p)^{n−k}
      = Σ_{k=0}^{n} (n choose k) (p e^t)^k q^{n−k}     with q = 1 − p
      = (p e^t + q)^n
37 / 43
Exercise
Let X1, . . . , Xn iid ∼ Ber(p).
1. Find the MGF of X1 + · · · + Xn
2. What is the distribution of X1 + · · · + Xn ?
38 / 43
Exercise
1. A random variable X has Poisson distribution with
parameter λ. Its p.m.f. is given by

P(X = k) = e^{−λ} λ^k / k!,   k = 0, 1, 2, 3, . . .
Find the MGF of X
2. Let X1, . . . , Xn iid ∼ Poisson(λ). Find the MGF of
X1 + · · · + Xn. What is the distribution of X1 + · · · + Xn?
39 / 43
Exercise
Random variable X has exponential distribution E(λ). Its p.d.f
is

f(x) = λ e^{−λx}   if x > 0
     = 0           otherwise
Find the MGF of X
40 / 43
Exercise
X has a gamma distribution with parameters α and β, denoted
by X ∼ Gamma(α, β), if the p.d.f. of X is given by

fX(x) = 1/(β^α Γ(α)) · x^{α−1} e^{−x/β},   x > 0

where Γ(α) = ∫₀^{∞} y^{α−1} e^{−y} dy.

The MGF of X is

MX(t) = ( 1/(1 − βt) )^α,   for t < 1/β
Let X1, . . . , Xn iid ∼ E(λ). Find the MGF and distribution of
X1 + · · · + Xn.
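One way to check your answer numerically (an optional sketch, not from the slides) is to compare the empirical MGF of the sum with the n-th power of the empirical MGF of a single term, as the product rule for independent variables predicts; λ, n and t below are arbitrary example values.

# Sketch: for independent Xi, M_{X1+...+Xn}(t) should equal M_{X1}(t)^n
import numpy as np

rng = np.random.default_rng(0)
lam, n, t = 2.0, 5, 0.5                          # example values (t must be < lam)
X = rng.exponential(scale=1 / lam, size=(1_000_000, n))

lhs = np.exp(t * X.sum(axis=1)).mean()           # empirical MGF of X1 + ... + Xn at t
rhs = np.exp(t * X[:, 0]).mean() ** n            # (empirical MGF of one Xi at t)^n
print(lhs, rhs)                                  # the two estimates should be close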
41 / 43
Statistics
For data X1 , ..., Xn , a statistic T (X1 , ..., Xn ) is any real-valued
function of the data. In other words, it is any number that you
can compute from the data.
Example
Sample mean

X̄ = (X1 + · · · + Xn)/n

If X1, . . . , Xn iid ∼ N(µ, σ²) then

X̄ ∼ N(µ, σ²/n)

and sample variance

S² = (1/(n − 1)) [ (X1 − X̄)² + · · · + (Xn − X̄)² ],   with (n − 1)S²/σ² ∼ χ²_{n−1}
42 / 43
Exercise
Rice Exercise 79 page 173
Rice Exercise 83 page 174
Rice Exercise 89 page 174
43 / 43