TAMS65 - Lecture 2
Point estimation:
Methods for finding point estimates/estimators
Zhenxia Liu
Matematisk statistik
Matematiska institutionen
Content
▶ Review of Lecture 1
▶ Introduction
▶ Method of Moments - MM
▶ Least Square Method - LSM
▶ Maximum Likelihood Method - ML
▶ Appendix
Review of Lecture 1
I Population(population) X with unknown parameter θ.
Goal: To estimate θ.
I Random sample(slumpmässigt stickprov) X1 , . . . , Xn ,
observations(observationer) x1 , . . . , xn
I Point estimator(stickprovsvariabeln/skattningsvariabel)
Θ
b = f (X1 , . . . , Xn )
I Point estimate(punktskattning) θ̂ = f (x1 , . . . , xn )
I Unbiased(Väntevärdesriktighet)
I More effective(effektivare)
I Consistent(konsistent)
Review of Lecture 1
Population mean: µ ≈ µ̂ = x̄ = (1/n) ∑_{i=1}^n xi (sample mean)

Population variance: σ² ≈ σ̂², where

σ̂² = (1/n) ∑_{i=1}^n (xi − µ)² if µ is known (känt),
σ̂² = s² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)² if µ is unknown (okänt).

s² is called the sample variance.
s = √(s²) is called the sample standard deviation.

Population standard deviation: σ ≈ σ̂ = √(σ̂²)
Introduction
There are several methods for finding a point estimate/estimator.
▶ The Method of Moments (momentmetoden)
▶ Least Square Method (minsta-kvadrat-metoden)
▶ Maximum Likelihood Method (maximum-likelihood-metoden)
Skill:
Applying these methods = applying the pdf/pmf, expectation, variance, or related properties.
The Method of Moments
The population X has an unknown parameter θ that we want to estimate.
Random sample X1, ..., Xn, observations x1, ..., xn.
The Method of Moments (momentmetoden) - MM:
Population moment = Sample moment
1st moment: E(X) = x̄ = (1/n) ∑_{i=1}^n xi
2nd moment: E(X²) = (1/n) ∑_{i=1}^n xi²
...
kth moment: E(X^k) = (1/n) ∑_{i=1}^n xi^k
Solve for θ, then get θ̂MM or θ̂.
Note: The number of equations depends on the number of
unknown parameters.
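To make the recipe concrete, here is a minimal sketch in code (not from the slides), assuming hypothetically that X ∼ U(0, θ): the first moment equation E(X) = θ/2 = x̄ solves to θ̂MM = 2x̄.

```python
import numpy as np

# Hypothetical observations, assumed to come from U(0, theta) with theta unknown.
x = np.array([1.2, 3.4, 0.7, 2.9, 4.1, 2.2])

# Method of Moments: set the first population moment equal to the sample moment.
# For U(0, theta): E(X) = theta/2, so theta/2 = x_bar  =>  theta_hat = 2 * x_bar.
theta_hat_mm = 2 * x.mean()
print(theta_hat_mm)
```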
Example 1
Example 1: Assume that x1, ..., xm are observations of independent random variables X1, ..., Xm, where Xi ∼ Bin(n, p).
Find the point estimate (punktskattning) p̂MM of p by using the Method of Moments.
Binomial distribution (binomialfördelning) X ∼ Bin(n, p):
▶ p = population proportion
▶ E(X) = np

p̂MM = x̄/n, where x̄ = (1/m) ∑_{i=1}^m xi.

(Exercise: Prove that p̂MM is unbiased.)
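A quick numeric sketch of this formula; the data values and the trial count n = 10 are made up for illustration:

```python
import numpy as np

n = 10                                   # known number of trials per Bin(n, p) variable
x = np.array([3, 5, 4, 6, 2, 4, 5, 3])   # hypothetical Bin(10, p) observations

# Method of Moments: E(X) = n*p = x_bar  =>  p_hat = x_bar / n.
p_hat_mm = x.mean() / n
print(p_hat_mm)
```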
Least Square Method
The population X has an unknown parameter θ that we want to estimate.
Random sample X1, ..., Xn, observations x1, ..., xn.
Least Square Method (minsta-kvadrat-metoden) - LSM:
Find the θ which minimizes

Q(θ) = ∑_{i=1}^n [xi − E(Xi)]².
Then get θ̂LSM or θ̂.
Note: θ may be multidimensional.
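As a sketch of the idea in code (assuming SciPy is available, and using a hypothetical model where E(Xi) = θ, so that the minimizer should equal x̄):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.1, 0.4, 2.3, 0.9, 1.7])   # hypothetical observations

# Least squares: Q(theta) = sum_i (x_i - E(X_i))^2, with the assumed model E(X_i) = theta.
Q = lambda theta: np.sum((x - theta) ** 2)

res = minimize_scalar(Q)
print(res.x, x.mean())   # the numeric minimizer agrees with the closed form x_bar
```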
Example 2
Example 2: Suppose that the distribution of a population X has the probability density function (täthetsfunktionen)

fX(x) = (1/(2a)) e^{−x/(2a)} if x ≥ 0, and fX(x) = 0 otherwise,

where a > 0 is an unknown parameter.
Observations x1, x2, ..., xn are given.
Find a point estimate (punktskattning) of a using the Least Square Method.
Exponential distribution (exponentialfördelning):
▶ X ∼ Exp(1/µ) with pdf fX(x) = (1/µ) e^{−x/µ} for x ≥ 0, and E(X) = µ.

âLSM = x̄/2 (Exercise: Prove that âLSM is unbiased.)
Preparations for the Maximum Likelihood Method
Summation

∑_{i=1}^n xi = x1 + x2 + ... + xn,   ∑_{i=1}^n c = nc

∑_{i=1}^n c·xi = c·x1 + c·x2 + ... + c·xn = c(x1 + x2 + ... + xn) = c ∑_{i=1}^n xi

Product

∏_{i=1}^n xi = x1 · x2 · ... · xn,   ∏_{i=1}^n c = c^n

∏_{i=1}^n (c·xi) = (c·x1) · (c·x2) · ... · (c·xn) = c^n · x1 · x2 · ... · xn = c^n ∏_{i=1}^n xi
Preparations for the Maximum Likelihood Method
ln(a · b) = ln a + ln b
ln(a/b) = ln a − ln b
ln(a^c) = c ln a
ln(e^c) = c
e^{ln c} = c
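A quick numeric check of these rules (math.log is the natural logarithm), including the product-to-sum identity that makes ln L(θ) convenient to work with:

```python
import math

a, b, c = 2.0, 3.0, 4.0
assert math.isclose(math.log(a * b), math.log(a) + math.log(b))
assert math.isclose(math.log(a / b), math.log(a) - math.log(b))
assert math.isclose(math.log(a ** c), c * math.log(a))
assert math.isclose(math.log(math.exp(c)), c)
assert math.isclose(math.exp(math.log(c)), c)

# The same idea turns a likelihood product into a log-likelihood sum:
xs = [1.5, 2.5, 0.5]
assert math.isclose(math.log(math.prod(xs)), sum(math.log(v) for v in xs))
print("all identities hold")
```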
Maximum Likelihood Method
Maximum Likelihood Method (maximum-likelihood-metoden) - ML

Let x1, ..., xn be observations of independent r.v.s X1, ..., Xn with
p(x; θ) := pX(x), the pmf of a discrete r.v., or
f(x; θ) := fX(x), the pdf of a continuous r.v.

The likelihood function (likelihoodfunktionen) L(θ) is defined as follows:

L(θ) = ∏_{i=1}^n p(xi; θ) = p(x1; θ) · ... · p(xn; θ) for discrete r.v.s,
L(θ) = ∏_{i=1}^n f(xi; θ) = f(x1; θ) · ... · f(xn; θ) for continuous r.v.s.
Find the θ which maximizes L(θ), then get θ̂ML or θ̂.
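In practice, the maximization can also be done numerically. A hedged sketch (assuming SciPy is available, and assuming Xi ∼ Exp(1/µ) as in Example 3 below) minimizes the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([5.0, 12.0, 3.0, 8.0])   # hypothetical Exp(1/mu) observations

# For Exp(1/mu): ln L(mu) = -n*ln(mu) - sum(x)/mu; minimize its negative.
neg_log_lik = lambda mu: len(x) * np.log(mu) + x.sum() / mu

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1e3), method="bounded")
print(res.x, x.mean())   # numeric maximizer agrees with the closed form x_bar
```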
Maximum Likelihood Method
Observations: (−0.5, 0, 0.3, 0.5, 0.7, 0.8, 0.95, 1.15, 1.25, 1.30, 1.6, 1.9, 2.7, 3.5).
When θ changes from θ1 to θ2, we get a "new" probability density function. ML chooses the pdf which makes L(θ) as large as possible.
Maximum Likelihood Method
Note:
▶ In general, it is easier to maximize ln L(θ).
▶ If there are observations x1, ..., xn and y1, ..., ym from independent r.v.s Xi, i = 1, ..., n and Yj, j = 1, ..., m, respectively, where Xi and Yj have different distributions but both distributions contain the same unknown parameter θ, then L(θ) = L1(θ) · L2(θ).
▶ The parameter θ can be multidimensional.
Example 3
Example 3: During a "short" geological period, it may be reasonable to assume that the times between successive eruptions of a volcano are independent and exponentially distributed with an expected value µ that is characteristic of the individual volcano.
The table below shows the times in months between 36 successive eruptions of the volcano Mauna Loa in Hawaii, 1832-1950.
126 73 3 6 37 23
73 23 2 65 94 51
26 21 6 68 16 20
6 18 6 41 40 18
41 11 12 38 77 61
26 3 38 50 91 12
Using the data, estimate µ by the Maximum Likelihood Method.
Example 3
Using the data, estimate µ by the Maximum Likelihood Method.

Model: Let X be the time between two successive eruptions; then X ∼ Exp(1/µ) with pdf fX(x) = (1/µ) e^{−x/µ} for x ≥ 0.

µ̂ML = x̄ = (x1 + x2 + · · · + x36)/36 ≈ 36.72
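Checking this estimate in code, with the 36 inter-eruption times from the table:

```python
import numpy as np

# Times in months between 36 successive Mauna Loa eruptions (from the table above).
times = np.array([126, 73, 3, 6, 37, 23,
                  73, 23, 2, 65, 94, 51,
                  26, 21, 6, 68, 16, 20,
                  6, 18, 6, 41, 40, 18,
                  41, 11, 12, 38, 77, 61,
                  26, 3, 38, 50, 91, 12])

# For Exp(1/mu), the ML estimate of the mean mu is the sample mean.
mu_hat_ml = times.mean()
print(round(mu_hat_ml, 2))   # 36.72
```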
Example 4
Example 4: The following data are 40 observations from a Poisson
distribution with a parameter λ (which is the mean):
Value 0 1 2 3 4
Frequency 21 0 11 6 2
Find the Maximum Likelihood estimate for λ from the given information.
Poisson distribution (Poissonfördelning) X ∼ Po(λ):
▶ pX(k) = (λ^k / k!) e^{−λ} for k = 0, 1, ...

Note that there are 21 + 0 + 11 + 6 + 2 = 40 observations in total.

λ̂ML = x̄ = 1.2.
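The same computation from the frequency table, as a sketch in code:

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4])
freqs  = np.array([21, 0, 11, 6, 2])

# lambda_hat_ML = x_bar, computed directly from the frequency table.
lam_hat_ml = (values * freqs).sum() / freqs.sum()
print(lam_hat_ml)   # 1.2
```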
Maximum Likelihood Method - Normal distribution
We have observations x1, ..., xn from independent r.v.s X1, ..., Xn, where Xi ∼ N(µ, σ).

Normal distribution (normalfördelning) X ∼ N(µ, σ):
▶ fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}, x ∈ ℝ.
Case 1: σ is known and µ is unknown. Then µ̂ML = x̄.
The proof is given in the Appendix.
Case 2: µ is known and σ is unknown. Then

σ̂²_ML = (1/n) ∑_{i=1}^n (xi − µ)²   (Exercise)
Maximum Likelihood Method - Normal distribution
Case 3: Both µ and σ are unknown.

µ̂ML = (1/n) ∑_{i=1}^n xi = x̄   (unbiased);
σ̂²_ML = (1/n) ∑_{i=1}^n (xi − x̄)²   (biased).

The proof is given in the Appendix.

Note that E(Σ̂²_ML) = E[(1/n) ∑_{i=1}^n (Xi − X̄)²] = ((n−1)/n) σ² ≠ σ², so Σ̂²_ML is NOT unbiased.

So we make an adjustment by choosing (n/(n−1)) σ̂²_ML, since it is unbiased. That is,

σ̂² = (n/(n−1)) σ̂²_ML = (n/(n−1)) · (1/n) ∑_{i=1}^n (xi − x̄)² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)² = s².
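In NumPy this distinction is exactly the ddof argument of np.var: ddof=0 gives the biased ML estimate σ̂²_ML and ddof=1 the corrected sample variance s²:

```python
import numpy as np

x = np.array([2.1, 3.5, 1.8, 4.0, 2.9])   # hypothetical normal observations

var_ml = np.var(x, ddof=0)   # (1/n)     * sum((x - x_bar)^2), biased ML estimate
var_s2 = np.var(x, ddof=1)   # (1/(n-1)) * sum((x - x_bar)^2), unbiased s^2

n = len(x)
assert np.isclose(var_s2, var_ml * n / (n - 1))   # the n/(n-1) adjustment
print(var_ml, var_s2)
```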
Maximum Likelihood Method - Corrected/Adjusted estimate
The corrected/adjusted (korrigerade) point estimate of σ² is the sample variance:

s² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)².

For a sample from a Normal distribution where both µ and σ² are unknown, we use the following point estimates:

µ̂ = x̄,
σ̂² = s² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)².
Maximum Likelihood Method - More samples from Normal
distributions
Now suppose we have two samples from independent Normal distributions:

x1, ..., x_{n1}, where X1, ..., X_{n1} are independent and N(µ1, σ);
y1, ..., y_{n2}, where Y1, ..., Y_{n2} are independent and N(µ2, σ).

The estimates of the three parameters can be deduced by applying the following likelihood function:

L(µ1, µ2, σ²) = L1(µ1, σ²) · L2(µ2, σ²) = ∏_{i=1}^{n1} (1/(σ√(2π))) e^{−(xi−µ1)²/(2σ²)} · ∏_{i=1}^{n2} (1/(σ√(2π))) e^{−(yi−µ2)²/(2σ²)}
Maximum Likelihood Method - More samples from Normal
distributions
Then µ̂1 = x̄ and µ̂2 = ȳ.

The corrected/adjusted point estimate of σ² is

σ̂² = s² = [(n1 − 1) s1² + (n2 − 1) s2²] / [(n1 − 1) + (n2 − 1)],

where

s1² = (1/(n1 − 1)) ∑_{i=1}^{n1} (xi − x̄)² and s2² = (1/(n2 − 1)) ∑_{i=1}^{n2} (yi − ȳ)²

are the sample variances of the respective samples.

Here s² is called the combined/pooled sample variance.

Note: This result can also be generalized to more than two samples.
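A small sketch of the pooled estimate in code (the two samples are made up for illustration):

```python
import numpy as np

x = np.array([5.1, 4.8, 5.5, 5.0])        # hypothetical sample from N(mu1, sigma)
y = np.array([6.2, 5.9, 6.5, 6.1, 6.0])   # hypothetical sample from N(mu2, sigma)

n1, n2 = len(x), len(y)
s1_sq = np.var(x, ddof=1)   # sample variance of the first sample
s2_sq = np.var(y, ddof=1)   # sample variance of the second sample

# Pooled sample variance: weight each s^2 by its degrees of freedom.
s_sq_pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))
print(s_sq_pooled)
```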
MM, LSM, ML
MM versus LSM versus ML

▶ MM - The Method of Moments
  ▶ Simple
  ▶ Consistent point estimate/estimator
  ▶ Usually biased
▶ LSM - Least Square Method
  ▶ The idea is good
  ▶ Used in linear regression
▶ ML - Maximum Likelihood Method
  ▶ Can take more samples into account
  ▶ Usually lower variance than the other methods
Practice after the lecture:
Exercises - Lesson 2:
(I) 11.23, PS-1, 11.10, 11.14, 11.12, 11.15.
(II) 11.13(a), 11.11, 11.16, 11.28, 11.22, 11.25.
Prepare your questions for the lessons.
Thank you!
Appendix
Solution to Example 1
By the Method of Moments, we have E(X) = x̄, which gives

np = x̄ = (1/m) ∑_{i=1}^m xi,  that is,  p = x̄/n.

So

p̂MM = x̄/n,

where x̄ = (1/m) ∑_{i=1}^m xi.
Appendix
Solution to Example 2
Note that E(Xi) = E(X) = ∫_0^∞ x · (1/(2a)) e^{−x/(2a)} dx = 2a.

Then Q(a) = ∑_{i=1}^n [xi − E(Xi)]² = ∑_{i=1}^n (xi − 2a)².

Setting Q′(a) = ∑_{i=1}^n 2(xi − 2a)(−2) = 0 gives a = x̄/2.

Q″(a) = 8n > 0, so Q(a) attains its minimum there.

That is, âLSM = x̄/2.
Appendix
Solution to Example 3
Model: Let X be the time between two successive eruptions; then X ∼ Exp(1/µ) with pdf f(x) = (1/µ) e^{−x/µ} for x ≥ 0.

We have 36 observations x1, ..., x36, so n = 36.

By the Maximum Likelihood Method,

L(µ) = ∏_{i=1}^n f(xi; µ) = ∏_{i=1}^n (1/µ) e^{−xi/µ} = (1/µ^n) e^{−(1/µ) ∑_{i=1}^n xi}

ln L(µ) = −n ln µ − (1/µ) ∑_{i=1}^n xi

d(ln L(µ))/dµ = −n/µ + (1/µ²) ∑_{i=1}^n xi = 0 gives µ = x̄. Is it a maximum?

d²(ln L(µ))/dµ² = n/µ² − (2/µ³) ∑_{i=1}^n xi, which at µ = x̄ equals n/x̄² − (2/x̄³) · n x̄ = −n/x̄² < 0, i.e. a maximum.

⇒ µ̂ML = x̄ = (x1 + x2 + · · · + x36)/36 ≈ 36.72.
Appendix
Solution to Example 4
The likelihood function is

L(λ) = ∏_{i=1}^n (λ^{xi}/xi!) e^{−λ} = (λ^{∑_{i=1}^n xi} / ∏_{i=1}^n xi!) e^{−nλ}

ln L(λ) = (∑_{i=1}^n xi) ln λ − ln(∏_{i=1}^n xi!) − nλ

d(ln L(λ))/dλ = (∑_{i=1}^n xi)/λ − n

Setting d(ln L(λ))/dλ = 0 gives λ = x̄.

d²(ln L(λ))/dλ² = −(∑_{i=1}^n xi)/λ² < 0, so it is a maximum.

Then the Maximum Likelihood estimate for λ is

λ̂ML = x̄ = (21 · 0 + 0 · 1 + 11 · 2 + 6 · 3 + 2 · 4)/40 = 48/40 = 1.2.
Appendix
We have observations x1, ..., xn from independent r.v.s X1, ..., Xn, where Xi ∼ N(µ, σ).

Case 1: σ is known and µ is unknown. Then µ̂ML = x̄.

f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}

L(µ) = ∏_{i=1}^n (1/(σ√(2π))) e^{−(xi−µ)²/(2σ²)} = (1/(σ√(2π)))^n e^{−(1/(2σ²)) ∑_{i=1}^n (xi−µ)²}

L(µ) attains its maximum when ∑_{i=1}^n (xi − µ)² attains its minimum, i.e. µ̂ML = x̄ (same as MM and LSM).
Appendix
Case 3: Both µ and σ are unknown.

The likelihood function is

L(µ, σ) = [(1/(σ√(2π))) e^{−(x1−µ)²/(2σ²)}] · ... · [(1/(σ√(2π))) e^{−(xn−µ)²/(2σ²)}]
        = (1/√(2π))^n σ^{−n} e^{−(1/(2σ²)) ∑_{i=1}^n (xi−µ)²}.

Then we get

ln L(µ, σ) = constant − n ln σ − (1/(2σ²)) ∑_{i=1}^n (xi − µ)².
Appendix
∂(ln L(µ, σ))/∂µ = −(1/(2σ²)) ∑_{i=1}^n 2(xi − µ)(−1) = (1/σ²) (∑_{i=1}^n xi − nµ)

∂(ln L(µ, σ))/∂σ = −n/σ + (1/σ³) ∑_{i=1}^n (xi − µ)²

Setting ∂(ln L)/∂µ = 0 and ∂(ln L)/∂σ = 0 gives

µ̂ML = (1/n) ∑_{i=1}^n xi = x̄   (unbiased)
σ̂²_ML = (1/n) ∑_{i=1}^n (xi − x̄)²   (biased)
Thank you!