TAMS65 - Lecture 2
Point estimation:
Methods to get a point estimate/estimator
Zhenxia Liu
Matematisk statistik
Matematiska institutionen
Content
▶ Review of Lecture 1
▶ Introduction to Lecture 2
▶ Method of Moments -MM
▶ Least Square Method - LSM
▶ Maximum Likelihood Method - ML
▶ Appendix
Review of Lecture 1
▶ Population (population) X with unknown parameter θ.
Goal: to estimate θ.
▶ Random sample (slumpmässigt stickprov) X1, ..., Xn,
observations (observationer) x1, ..., xn
▶ Point estimate (punktskattning) θ̂ = f(x1, ..., xn)
▶ Point estimator (stickprovsvariabeln/skattningsvariabel) Θ̂ = f(X1, ..., Xn)
▶ Unbiased (väntevärdesriktighet)
▶ More effective (effektivare)
▶ Consistent (konsistent)
Review of Lecture 1

Population mean µ ≈ µ̂ = x̄ = (1/n) ∑_{i=1}^{n} xi (sample mean). Unbiased and consistent.

Population variance σ² ≈ σ̂². Unbiased:
σ̂² = (1/n) ∑_{i=1}^{n} (xi − µ)²  if µ is known (känt),
σ̂² = s² = (1/(n−1)) ∑_{i=1}^{n} (xi − x̄)²  if µ is unknown (okänt).

s² is called the sample variance and s = √(s²) is called the sample standard deviation.

Population standard deviation σ ≈ σ̂ = √(σ̂²).
Introduction to Lecture 2
Question: What if the unknown parameter is NOT the population mean/variance/standard deviation, e.g. the population proportion p?
⇒ We need to learn several methods for finding point estimates/estimators.
▶ The Method of Moments (momentmetoden)
▶ Least Square Method (minsta-kvadrat-metoden)
▶ Maximum Likelihood Method
(maximum-likelihood-metoden)
Skill: Applying these methods = applying the pdf/pmf and the expectation of the given distribution.
The Method of Moments
Population X has an unknown parameter θ; the goal is to estimate θ.
Random sample X1, ..., Xn, observations x1, ..., xn.
The Method of Moments (momentmetoden) - MM:
Population moment = Sample moment
1st moment: E(X) = (1/n) ∑_{i=1}^{n} xi = x̄
2nd moment: E(X²) = (1/n) ∑_{i=1}^{n} xi²
...
kth moment: E(X^k) = (1/n) ∑_{i=1}^{n} xi^k
Solve these equations for θ, then get θ̂MM or θ̂.
Note: The number of equations depends on the number of
unknown parameters.
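To make the recipe concrete, here is a minimal Python sketch (not from the slides) for a case with two unknown parameters, assuming a N(µ, σ) population so that the first two moment equations can be solved explicitly; the helper name mm_normal is made up for illustration.

```python
import numpy as np

def mm_normal(x):
    """Method of Moments for N(mu, sigma): solve the first two moment
    equations E(X) = x_bar and E(X^2) = (1/n) sum x_i^2.
    Since E(X^2) = sigma^2 + mu^2, the solution is explicit."""
    x = np.asarray(x, dtype=float)
    m1 = x.mean()              # 1st sample moment
    m2 = np.mean(x**2)         # 2nd sample moment
    mu_hat = m1
    sigma2_hat = m2 - m1**2    # = (1/n) sum (x_i - x_bar)^2
    return mu_hat, sigma2_hat

# Example usage with simulated data:
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=1000)
print(mm_normal(x))   # approximately (5, 4)
```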
Example 1
Example 1: Assume that x1, ..., xm are observations of independent r.v.s X1, ..., Xm, where Xi ∼ Bin(n, p). Find the point estimate (punktskattning) p̂MM of p by using the Method of Moments and prove that p̂MM is unbiased.
Binomial distribution (Binomialfördelning) X ∼ Bin(n, p):
▶ p = population proportion
▶ E(X) = np
Setting E(Xi) = np equal to x̄ gives
p̂MM = x̄/n, where x̄ = (1/m) ∑_{i=1}^{m} xi.
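As a sketch of the requested proof (filled in here, not shown on the slide), unbiasedness follows from linearity of expectation and E(Xi) = np:

```latex
E\left(\widehat{P}_{MM}\right) = E\!\left(\frac{\bar{X}}{n}\right)
  = \frac{1}{n}\,E(\bar{X})
  = \frac{1}{n}\cdot\frac{1}{m}\sum_{i=1}^{m} E(X_i)
  = \frac{1}{n}\cdot\frac{1}{m}\cdot m\,np
  = p .
```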
Example 2
Example 2: Suppose that the distribution of a population has the probability density function (täthetsfunktionen)
fX(x) = 3θ x^(3θ−1) for 0 ≤ x ≤ 1, and fX(x) = 0 otherwise,
where θ > 0 is an unknown parameter. A sample {x1, x2, ..., xn} from this population is now given.
Find a point estimate (punktskattning) of θ using the Method of Moments.
θ̂MM = x̄ / (3 − 3x̄)
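The intermediate step (not shown on the slide) is the first moment of the given pdf:

```latex
E(X) = \int_0^1 x\cdot 3\theta x^{3\theta-1}\,dx
     = 3\theta\int_0^1 x^{3\theta}\,dx
     = \frac{3\theta}{3\theta+1},
\qquad
\frac{3\theta}{3\theta+1} = \bar{x}
\;\Longrightarrow\;
\hat{\theta}_{MM} = \frac{\bar{x}}{3(1-\bar{x})} = \frac{\bar{x}}{3-3\bar{x}} .
```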
Least Square Method
Population X has an unknown parameter θ; the goal is to estimate θ.
Random sample X1, ..., Xn, observations x1, ..., xn.
Least Square Method (minsta-kvadrat-skattningen) - LSM:
Find the θ which minimizes
Q(θ) = ∑_{i=1}^{n} (xi − E(Xi))².
Then get θ̂LSM or θ̂.
Note: θ may be multidimensional, e.g. in linear regression.
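A minimal numerical sketch (not from the slides) of this minimization, assuming E(Xi) = np for a count x out of n draws, as in Example 3 below; lsm_estimate is a hypothetical helper and scipy performs the one-dimensional search.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lsm_estimate(x_obs, n):
    """Least Square Method: minimize Q(p) = sum_i (x_i - E(X_i))^2
    numerically, assuming E(X_i) = n*p (a count out of n draws)."""
    x_obs = np.asarray(x_obs, dtype=float)
    Q = lambda p: np.sum((x_obs - n * p) ** 2)
    res = minimize_scalar(Q, bounds=(0.0, 1.0), method="bounded")
    return res.x

# Single observation x = 3 out of n = 25, as in Example 3 below:
print(lsm_estimate([3], n=25))   # approximately 3/25 = 0.12
```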
Example 3
Example 3: Among 200 financial transactions in a company, 25 are selected and 3 are found to be incorrect. Estimate p = the proportion of incorrect transactions.
Model: Let X be the number of incorrect transactions among the 25 selected transactions. Then
X ∼ Hyp(N, n, p), where N = 200, n = 25.
Note: x = 3 is an observation.
Hypergeometric distribution (Hypergeometrisk fördelning) X ∼ Hyp(N, n, p):
▶ E(X) = np
Method I: By the Method of Moments, p̂MM = x/n = 3/25.
Example 3
Method II: By the Least Square Method,
Q(p) = ∑_{i=1}^{n} (xi − E(Xi))² = (x − np)²
dQ/dp = −2n(x − np); setting dQ/dp = 0 gives p = x/n.
d²Q/dp² = 2n² > 0 (min)
So p̂LSM = x/n = 3/25.
Exercise: Prove that p̂MM and p̂LSM are unbiased!
Preparations to Maximum Likelihood Method
Summation
∑_{i=1}^{n} xi = x1 + x2 + ... + xn,  ∑_{i=1}^{n} c = nc
∑_{i=1}^{n} c·xi = c·x1 + c·x2 + ... + c·xn = c(x1 + x2 + ... + xn) = c ∑_{i=1}^{n} xi
Product
∏_{i=1}^{n} xi = x1 · x2 · ... · xn,  ∏_{i=1}^{n} c = c^n
∏_{i=1}^{n} (c·xi) = (c·x1) · (c·x2) · ... · (c·xn) = c^n · x1 · x2 · ... · xn = c^n ∏_{i=1}^{n} xi
Preparations to Maximum Likelihood Method
ln(a · b) = ln a + ln b
ln(a/b) = ln a − ln b
ln(a^c) = c ln a
ln(e^c) = c
e^(ln c) = c
Maximum Likelihood Method
Maximum Likelihood
Method(Maximum-Likelihood-Metoden) - ML
Let x1, ..., xn be observations of independent r.v.s X1, ..., Xn with
p(x; θ) := pX(x), the pmf of a discrete r.v.,
f(x; θ) := fX(x), the pdf of a continuous r.v.
The likelihood function (likelihoodfunktionen) L(θ) is defined as follows:
L(θ) = ∏_{i=1}^{n} p(xi; θ) = p(x1; θ) · ... · p(xn; θ) for a discrete r.v.,
L(θ) = ∏_{i=1}^{n} f(xi; θ) = f(x1; θ) · ... · f(xn; θ) for a continuous r.v.
Find the θ which maximizes L(θ), then get θ̂ML or θ̂.
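A minimal numerical sketch (not from the slides) of the same idea: maximize ln L(θ) by minimizing its negative with scipy. The pdf is assumed exponential, f(x; µ) = (1/µ) e^(−x/µ), anticipating Example 4; ml_estimate is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ml_estimate(x, log_pdf, bounds):
    """Maximum Likelihood: maximize ln L(theta) = sum_i ln f(x_i; theta)
    by minimizing its negative over a bracketing interval."""
    x = np.asarray(x, dtype=float)
    neg_log_L = lambda theta: -np.sum(log_pdf(x, theta))
    res = minimize_scalar(neg_log_L, bounds=bounds, method="bounded")
    return res.x

# Exp(1/mu) with pdf f(x; mu) = (1/mu) exp(-x/mu):
log_pdf = lambda x, mu: -np.log(mu) - x / mu
x = np.array([0.8, 1.9, 1.1, 2.7, 0.4])
print(ml_estimate(x, log_pdf, bounds=(1e-6, 1e3)))  # = x_bar = 1.38
```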
Maximum Likelihood Method
Sample x = (−0.5, 0, 0.3, 0.5, 0.7, 0.8, 0.95, 1.15, 1.25, 1.30, 1.6, 1.9, 2.7, 3.5).
When θ changes from θ1 to θ2, we get a "new" probability density function. ML chooses the pdf which makes L(θ) as large as possible.
Maximum Likelihood Method
Note:
▶ In general, it is easier to maximize ln L(θ).
▶ If there are observations x1, ..., xn and y1, ..., ym from independent r.v.s Xi, i = 1, ..., n, and Yj, j = 1, ..., m, respectively, where Xi and Yj have different distributions but both distributions contain the same unknown parameter θ, then L(θ) = L1(θ) · L2(θ).
▶ The parameter θ can be multidimensional.
Example 4
Example 4: During a "short" geological period, it may be reasonable to assume that the times between successive eruptions of a volcano are independent and exponentially distributed with an expected value µ that is characteristic of the individual volcano. The table below shows the times in months between 36 successive eruptions of the volcano Mauna Loa in Hawaii, 1832-1950.
126 73 3 6 37 23
73 23 2 65 94 51
26 21 6 68 16 20
6 18 6 41 40 18
41 11 12 38 77 61
26 3 38 50 91 12
Example 4
According to the data, answer the following:
(a) Estimate µ by the Maximum Likelihood Method.
Model: Let X be the time between two successive eruptions; then X ∼ Exp(1/µ) with pdf f(x) = (1/µ) e^(−x/µ) for x ≥ 0.
Exponential distribution (Exponentialfördelning)
▶ X ∼ Exp(1/µ): fX(x) = (1/µ) e^(−x/µ) for x ≥ 0, with E(X) = µ, V(X) = µ²
▶ We also use X ∼ Exp(µ): fX(x) = µ e^(−µx) for x ≥ 0, with E(X) = 1/µ, V(X) = 1/µ²
Example 4
(a) We have 36 observations x1, ..., x36, where n = 36. By the Maximum Likelihood Method,
L(µ) = ∏_{i=1}^{n} f(xi; µ) = ∏_{i=1}^{n} (1/µ) e^(−xi/µ) = (1/µ^n) e^(−(1/µ) ∑_{i=1}^{n} xi)
ln L(µ) = −n ln µ − (1/µ) ∑_{i=1}^{n} xi
d(ln L(µ))/dµ = −n/µ + (1/µ²) ∑_{i=1}^{n} xi = 0 gives µ = x̄. Max?
d²(ln L(µ))/dµ² = n/µ² − (2/µ³) ∑_{i=1}^{n} xi, which at µ = x̄ equals n/x̄² − (2/x̄³)·n·x̄ = −n/x̄² < 0, i.e. max.
⇒ µ̂ML = x̄ = (x1 + x2 + ... + x36)/36 ≈ 36.72
Example 4
(b) Find the standard error of estimation of µ̂ML.
Standard error (medelfelet) of a point estimate θ̂: d = d(θ̂) := an estimate of D(Θ̂) = an estimate of √(V(Θ̂)).
The standard error (medelfelet) of µ̂ML is
d(µ̂ML) = an estimate of D(M̂ML) = an estimate of √(V(M̂ML)),
where V(M̂ML) = V(X̄) = (1/n²) ∑_{i=1}^{n} V(Xi) = (1/n²)·n·µ² = µ²/n.
So d(µ̂ML) = √((µ̂ML)²/n) = √(36.72²/36) ≈ 6.12.
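A quick numerical check of (a) and (b) in numpy (a sketch, using the 36 times from the table):

```python
import numpy as np

# All 36 inter-eruption times (months) from the table in Example 4:
x = np.array([126, 73, 3, 6, 37, 23,   73, 23, 2, 65, 94, 51,
              26, 21, 6, 68, 16, 20,    6, 18, 6, 41, 40, 18,
              41, 11, 12, 38, 77, 61,  26, 3, 38, 50, 91, 12], float)
n = len(x)                  # 36
mu_hat = x.mean()           # ML estimate: mu_hat = x_bar, about 36.72
d = mu_hat / np.sqrt(n)     # standard error sqrt(mu_hat^2 / n), about 6.12
print(n, mu_hat, d)
```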
Example 5
Example 5: The following data are 40 observations from a Poisson
distribution with a parameter λ (which is the mean):
Value 0 1 2 3 4
Frequency 21 0 11 6 2
Find the Maximum-Likelihood estimate for λ given this information.
Poisson distribution (Poissonfördelning) X ∼ Po(λ):
▶ pX(k) = (λ^k / k!) e^(−λ) for k = 0, 1, ...
Note that there are 21 + 0 + 11 + 6 + 2 = 40 observations in total.
Example 5
The likelihood function is
L(λ) = ((λ^0/0!) e^(−λ))^21 · ((λ^1/1!) e^(−λ))^0 · ((λ^2/2!) e^(−λ))^11 · ((λ^3/3!) e^(−λ))^6 · ((λ^4/4!) e^(−λ))^2
= e^(−40λ) · λ^48 · 1/(2^11 · 6^6 · 24^2)
ln L(λ) = −40λ + 48 ln λ + ln(1/(2^11 · 6^6 · 24^2))
Setting d(ln L(λ))/dλ = −40 + 48/λ = 0 gives λ = 1.2.
d²(ln L(λ))/dλ² = −48/λ² < 0, so maximum.
Then the Maximum Likelihood estimate for λ is
λ̂ML = 1.2.
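A quick numerical check in numpy (a sketch): since λ̂ML equals the sample mean, it is just the frequency-weighted average of the observed values.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4])
freq   = np.array([21, 0, 11, 6, 2])
n = freq.sum()                       # 40 observations
lam_hat = (values * freq).sum() / n  # 48/40 = 1.2, the sample mean
print(n, lam_hat)
```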
Maximum Likelihood Method - Normal distribution
We have observations x1 , . . . , xn from independent r.v.s
X1 , . . . , Xn , where Xi ∼ N(µ, σ).
Normal distribution (normalfördelning) X ∼ N(µ, σ):
▶ fX(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)), x ∈ R.
Case 1: σ is known and µ is unknown. Then µ̂ML = x̄.
The proof is given in Appendix.
Case 2: σ is unknown and µ is known. Then
σ̂²ML = (1/n) ∑_{i=1}^{n} (xi − µ)²   (Exercises)
Maximum Likelihood Method - Normal distribution
Case 3: Both µ and σ are unknown.
µ̂ML = (1/n) ∑_{i=1}^{n} xi = x̄ (unbiased);
σ̂²ML = (1/n) ∑_{i=1}^{n} (xi − x̄)² (biased).
The proof is given in the Appendix.
Note that E(Σ̂²ML) = E((1/n) ∑_{i=1}^{n} (Xi − X̄)²) = ((n−1)/n) σ² ≠ σ²,
which is NOT unbiased.
So we make an adjustment by choosing (n/(n−1)) σ̂²ML, since it is unbiased. That is,
σ̂² = (n/(n−1)) σ̂²ML = ... = (1/(n−1)) ∑_{i=1}^{n} (xi − x̄)² = s².
Maximum Likelihood Method - Corrected/Adjusted
The corrected/adjusted (korrigerade) point estimate of σ² is the sample variance:
s² = (1/(n−1)) ∑_{i=1}^{n} (xi − x̄)².
For a sample from a Normal distribution where both µ and σ² are unknown, we use the following point estimates:
µ̂ = x̄,
σ̂² = s² = (1/(n−1)) ∑_{i=1}^{n} (xi − x̄)².
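A minimal numpy sketch (not from the slides) contrasting the biased ML variance (divisor n) with the corrected estimate s² (divisor n − 1); normal_estimates is a hypothetical helper and the data are made up.

```python
import numpy as np

def normal_estimates(x):
    """Point estimates for a N(mu, sigma) sample with both parameters
    unknown: mu_hat = x_bar, plus the ML and the corrected variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = x.mean()
    sigma2_ml = np.sum((x - mu_hat) ** 2) / n   # ML estimate, biased
    s2 = np.sum((x - mu_hat) ** 2) / (n - 1)    # corrected, = np.var(x, ddof=1)
    return mu_hat, sigma2_ml, s2

x = [4.1, 5.3, 4.7, 5.0, 4.9]
print(normal_estimates(x))
```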
Maximum Likelihood Method - More samples from Normal
distributions
Now suppose we have two samples from independent Normal
distributions
x1 , . . . , xn1 , where X1 , . . . , Xn1 are independent and N(µ1 , σ)
y1 , . . . , yn2 , where Y1 , . . . , Yn2 are independent and N(µ2 , σ)
The estimates of the three parameters can be deduced by applying the following likelihood function:
L(µ1, µ2, σ²) = L(µ1, σ²) · L(µ2, σ²)
= ∏_{i=1}^{n1} (1/(σ√(2π))) e^(−(xi−µ1)²/(2σ²)) · ∏_{i=1}^{n2} (1/(σ√(2π))) e^(−(yi−µ2)²/(2σ²))
Maximum Likelihood Method - More samples from Normal
distributions
Then µ̂1 = x̄, µ̂2 = ȳ.
The corrected/adjusted point estimate of σ² is
σ̂² = s² = ((n1 − 1) s1² + (n2 − 1) s2²) / ((n1 − 1) + (n2 − 1)),
where
s1² = (1/(n1−1)) ∑_{i=1}^{n1} (xi − x̄)² and s2² = (1/(n2−1)) ∑_{i=1}^{n2} (yi − ȳ)²,
which are the sample variances from the respective samples.
Here s² is called the combined/pooled sample variance.
Note: This result can also be generalized to more samples.
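A minimal numpy sketch (not from the slides) of the pooled sample variance; pooled_variance is a hypothetical helper and the two samples are made-up illustration data.

```python
import numpy as np

def pooled_variance(x, y):
    """Combined/pooled sample variance for two independent normal
    samples with a common sigma:
    s^2 = ((n1-1)s1^2 + (n2-1)s2^2) / ((n1-1) + (n2-1))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    s1_sq = np.var(x, ddof=1)   # sample variance of the x-sample
    s2_sq = np.var(y, ddof=1)   # sample variance of the y-sample
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

x = [4.1, 5.3, 4.7, 5.0, 4.9]
y = [6.2, 5.8, 6.5, 6.0]
print(pooled_variance(x, y))
```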
MM, LSM, ML
MM versus LSM versus ML
▶ MM - The Method of Moments
▶ Simple
▶ Consistent point estimates/estimators
▶ Usually biased
▶ LSM - Least Square Method
▶ The idea is good
▶ Used in linear regression
▶ ML - Maximum Likelihood Method
▶ Takes more samples into account
▶ Usually lower variance than the other methods
Practice after the lecture:
Exercises - Lesson 2:
(I) 11.23, PS-1, 11.10, 11.14, 11.12, 11.15.
(II)11.13(a), 11.11, 11.16, 11.28, 11.22, 11.25.
Prepare your questions for the lessons.
Thank you!
Appendix
We have observations x1 , . . . , xn from independent r.v.s
X1 , . . . , Xn , where Xi ∼ N(µ, σ).
Case 1: σ is known and µ is unknown. Then µ̂ML = x̄.
f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²))
L(µ) = ∏_{i=1}^{n} (1/(σ√(2π))) e^(−(xi−µ)²/(2σ²)) = (1/(σ√(2π)))^n e^(−(1/(2σ²)) ∑_{i=1}^{n} (xi−µ)²)
L(µ) attains its maximum when ∑_{i=1}^{n} (xi − µ)² attains its minimum, i.e. µ̂ML = x̄ (same as MM, LSM).
Appendix
Case 3: Both µ and σ are unknown. The likelihood function is
L(µ, σ) = [(1/(σ√(2π))) e^(−(x1−µ)²/(2σ²))] · ... · [(1/(σ√(2π))) e^(−(xn−µ)²/(2σ²))]
= (1/√(2π))^n σ^(−n) e^(−(1/(2σ²)) ∑_{i=1}^{n} (xi−µ)²).
Then we get
ln L(µ, σ) = constant − n ln σ − (1/(2σ²)) ∑_{i=1}^{n} (xi − µ)².
Appendix
∂(ln L(µ, σ))/∂µ = −(1/(2σ²)) ∑_{i=1}^{n} 2(xi − µ)(−1) = (1/σ²) (∑_{i=1}^{n} xi − nµ)
∂(ln L(µ, σ))/∂σ = −n/σ + (1/σ³) ∑_{i=1}^{n} (xi − µ)²
Setting ∂(ln L)/∂µ = 0 and ∂(ln L)/∂σ = 0 gives
µ̂ML = (1/n) ∑_{i=1}^{n} xi = x̄ (unbiased),
σ̂²ML = (1/n) ∑_{i=1}^{n} (xi − x̄)² (biased).
Thank you!