
TAMS65 - Lecture 2

Point estimation:
methods for obtaining a point estimate/estimator

Zhenxia Liu
Matematisk statistik
Matematiska institutionen
Content

▶ Review of Lecture 1

▶ Introduction to Lecture 2

▶ Method of Moments - MM

▶ Least Squares Method - LSM

▶ Maximum Likelihood Method - ML

▶ Appendix



Review of Lecture 1

▶ Population (population) X with unknown parameter θ.
  Goal: to estimate θ.

▶ Random sample (slumpmässigt stickprov) X1, ..., Xn, with
  observations (observationer) x1, ..., xn.

▶ Point estimate (punktskattning): $\hat{\theta} = f(x_1, \dots, x_n)$

▶ Point estimator (stickprovsvariabeln/skattningsvariabel):
  $\hat{\Theta} = f(X_1, \dots, X_n)$

▶ Unbiased (väntevärdesriktighet)
▶ More efficient (effektivare)
▶ Consistent (konsistent)



Review of Lecture 1

Population mean: $\mu \approx \hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sample mean), which is unbiased and consistent.

Population variance: $\sigma^2 \approx \hat{\sigma}^2$ (unbiased), where

$$\hat{\sigma}^2 = \begin{cases} \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 & \text{if } \mu \text{ is known (känt)}, \\[6pt] s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 & \text{if } \mu \text{ is unknown (okänt)}. \end{cases}$$

$s^2$ is called the sample variance and $s = \sqrt{s^2}$ is called the sample standard deviation.

Population standard deviation: $\sigma \approx \hat{\sigma} = \sqrt{\hat{\sigma}^2}$.
Introduction to Lecture 2

Question: What if the unknown parameter is NOT the population
mean/variance/standard deviation, e.g. the population proportion p?
⟹ We need to learn several methods for finding point
estimates/estimators:

▶ Method of Moments (momentmetoden)

▶ Least Squares Method (minsta-kvadrat-metoden)

▶ Maximum Likelihood Method (maximum-likelihood-metoden)

Skill: applying these methods amounts to working with the pdf/pmf
and the expectation of the given distribution.
The Method of Moments

The population X has an unknown parameter θ; we want to estimate θ.
Random sample X1, ..., Xn, observations x1, ..., xn.

The Method of Moments (momentmetoden) - MM:

Population moment = Sample moment

$$\text{1st moment: } E(X) = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\text{2nd moment: } E(X^2) = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$

$$\vdots$$

$$k\text{th moment: } E(X^k) = \frac{1}{n}\sum_{i=1}^{n} x_i^k$$

Solve for θ to get $\hat{\theta}_{MM}$ (or simply $\hat{\theta}$).

Note: the number of equations depends on the number of unknown
parameters.
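To make the recipe concrete, here is a minimal Python sketch (NumPy assumed; the Normal example and all numbers are illustrative, not from the slides) where matching the first two moments gives closed-form estimates:

import numpy as np

def mm_normal(x):
    # Method of Moments for N(mu, sigma): match the first two moments.
    # E(X) = mu and E(X^2) = sigma^2 + mu^2, so mu_hat is the sample mean
    # and sigma2_hat is the mean of squares minus mu_hat^2.
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()                           # matches the 1st moment
    sigma2_hat = (x ** 2).mean() - mu_hat ** 2  # matches the 2nd moment
    return mu_hat, sigma2_hat

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=1000)
print(mm_normal(sample))  # approximately (2.0, 9.0)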
Example 1
Example 1: Assume that x1, ..., xm are observations of independent
r.v.s X1, ..., Xm, where Xi ∼ Bin(n, p). Find the point estimate
(punktskattning) p̂MM of p using the Method of Moments, and prove
that p̂MM is unbiased.

Binomial distribution (Binomialfördelning) X ∼ Bin(n, p):

▶ p = population proportion
▶ E(X) = np

Matching the first moment, E(X) = np = x̄, gives

$$\hat{p}_{MM} = \frac{\bar{x}}{n}, \quad \text{where } \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i.$$

Unbiasedness: $E(\hat{P}_{MM}) = E(\bar{X})/n = np/n = p$.

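A quick simulation check of Example 1 (a sketch; n, p, m and the number of repetitions are made up): averaging p̂MM over many samples should land near p if the estimator is unbiased.

import numpy as np

# Draw `reps` samples of m observations from Bin(n, p), form
# p_hat = x_bar / n for each sample, and average the estimates.
rng = np.random.default_rng(1)
n, p, m, reps = 20, 0.3, 50, 10_000

p_hats = rng.binomial(n, p, size=(reps, m)).mean(axis=1) / n
print(p_hats.mean())  # close to p = 0.3, as unbiasedness predicts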


Example 2

Example 2: Suppose that the distribution of a population has the
probability density function (täthetsfunktionen)

$$f_X(x) = \begin{cases} 3\theta x^{3\theta - 1} & \text{if } 0 \le x \le 1; \\ 0 & \text{otherwise}, \end{cases}$$

where θ > 0 is an unknown parameter. A sample {x1, x2, ..., xn}
from this population is now given. Find a point estimate
(punktskattning) of θ using the Method of Moments.

Matching the first moment, $E(X) = \int_0^1 x \cdot 3\theta x^{3\theta-1}\,dx = \frac{3\theta}{3\theta+1} = \bar{x}$, and solving for θ gives

$$\hat{\theta}_{MM} = \frac{\bar{x}}{3 - 3\bar{x}}.$$

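A simulation sanity check of this estimate (a sketch; θ and the sample size are made up). Since the CDF is F(x) = x^{3θ} on [0, 1], we can sample via the inverse CDF:

import numpy as np

rng = np.random.default_rng(2)
theta = 0.8
u = rng.uniform(size=100_000)
x = u ** (1.0 / (3.0 * theta))  # inverse-CDF sampling from f_X

x_bar = x.mean()
theta_mm = x_bar / (3.0 - 3.0 * x_bar)  # the MM estimate above
print(theta_mm)  # close to theta = 0.8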


Least Squares Method

The population X has an unknown parameter θ; we want to estimate θ.
Random sample X1, ..., Xn, observations x1, ..., xn.

Least Squares Method (minsta-kvadrat-metoden) - LSM:

Find the θ which minimizes

$$Q(\theta) = \sum_{i=1}^{n} (x_i - E(X_i))^2.$$

Then get $\hat{\theta}_{LSM}$ (or simply $\hat{\theta}$).

Note: θ may be multidimensional, e.g. in linear regression.

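A minimal numerical sketch of LSM (assumed setup, not from the slides: an exponential population with mean θ, so E(X_i) = θ and Q(θ) = Σ(x_i − θ)², whose minimizer is x̄):

import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.2, 0.7, 2.5, 1.9, 0.4])  # made-up observations

def Q(theta):
    # Least-squares criterion: squared deviations from E(X_i) = theta
    return np.sum((x - theta) ** 2)

res = minimize_scalar(Q, bounds=(0.01, 10.0), method="bounded")
print(res.x, x.mean())  # the numerical minimizer agrees with x_bar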


Example 3

Example 3: Among 200 financial transactions in a company, 25 are
selected and 3 of them are found to be incorrect. Estimate p = the
proportion of incorrect transactions.

Model: Let X be the number of incorrect transactions among the 25
selected transactions. Then
X ∼ Hyp(N, n, p), where N = 200, n = 25.
Note: x = 3 is an observation.

Hypergeometric distribution (Hypergeometrisk fördelning)
X ∼ Hyp(N, n, p):
▶ E(X) = np

Method I: By the Method of Moments, $\hat{p}_{MM} = \frac{x}{n} = \frac{3}{25}$.
Example 3

Method II: By the Least Squares Method,

$$Q(p) = \sum_{i=1}^{n}(x_i - E(X_i))^2 = (x - np)^2$$

$$\frac{dQ}{dp} = -2n(x - np); \quad \text{then } \frac{dQ}{dp} = 0 \text{ gives } p = \frac{x}{n}.$$

$$\frac{d^2 Q}{dp^2} = 2n^2 > 0 \quad \text{(minimum)}$$

So $\hat{p}_{LSM} = \frac{x}{n} = \frac{3}{25}$.

Exercise: Prove that $\hat{p}_{MM}$ and $\hat{p}_{LSM}$ are unbiased!



Preparations for the Maximum Likelihood Method

Summation

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \ldots + x_n, \qquad \sum_{i=1}^{n} c = nc$$

$$\sum_{i=1}^{n} c x_i = c x_1 + c x_2 + \ldots + c x_n = c(x_1 + x_2 + \ldots + x_n) = c \sum_{i=1}^{n} x_i$$

Product

$$\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdot \ldots \cdot x_n, \qquad \prod_{i=1}^{n} c = c^n$$

$$\prod_{i=1}^{n} (c x_i) = (c x_1) \cdot (c x_2) \cdot \ldots \cdot (c x_n) = c^n \cdot x_1 \cdot x_2 \cdot \ldots \cdot x_n = c^n \prod_{i=1}^{n} x_i$$



Preparations for the Maximum Likelihood Method

$$\ln(a \cdot b) = \ln a + \ln b$$

$$\ln \frac{a}{b} = \ln a - \ln b$$

$$\ln a^c = c \ln a$$

$$\ln e^c = c$$

$$e^{\ln c} = c$$



Maximum Likelihood Method

Maximum Likelihood Method (maximum-likelihood-metoden) - ML:

Let x1, ..., xn be observations of independent r.v.s X1, ..., Xn with

$$\begin{cases} p(x; \theta) := p_X(x), & \text{pmf - discrete r.v.} \\ f(x; \theta) := f_X(x), & \text{pdf - continuous r.v.} \end{cases}$$

The likelihood function (likelihoodfunktionen) L(θ) is defined as
follows:

$$L(\theta) = \begin{cases} \prod_{i=1}^{n} p(x_i; \theta) = p(x_1; \theta) \cdot \ldots \cdot p(x_n; \theta), & \text{discrete r.v.} \\[4pt] \prod_{i=1}^{n} f(x_i; \theta) = f(x_1; \theta) \cdot \ldots \cdot f(x_n; \theta), & \text{continuous r.v.} \end{cases}$$

Find the θ which maximizes L(θ); this gives $\hat{\theta}_{ML}$ (or simply $\hat{\theta}$).
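When no closed form is available, ln L(θ) can be maximized numerically. A sketch (assuming SciPy is available; the data are made up) for an exponential sample with mean µ, anticipating Example 4:

import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([12.0, 3.5, 8.1, 20.4, 6.6])  # made-up observations

def neg_log_likelihood(mu):
    # -ln L(mu) for X_i ~ Exp with mean mu: n*ln(mu) + sum(x_i)/mu
    return len(x) * np.log(mu) + x.sum() / mu

res = minimize_scalar(neg_log_likelihood, bounds=(0.01, 100.0), method="bounded")
print(res.x, x.mean())  # the numerical MLE agrees with the closed form x_bar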
Maximum Likelihood Method
Sample x = (−0.5, 0, 0.3, 0.5, 0.7, 0.8, 0.95, 1.15, 1.25, 1.30, 1.6, 1.9, 2.7, 3.5).
[Figure: pdfs for two parameter values. When θ changes from θ1 to θ2, we get a "new" probability density function; ML chooses the pdf that makes L(θ) as large as possible.]



Maximum Likelihood Method

Note:

▶ In general, it is easier to maximize ln L(θ).

▶ If there are observations x1, ..., xn and y1, ..., ym from
  independent r.v.s Xi, i = 1, ..., n, and Yj, j = 1, ..., m,
  respectively, where Xi and Yj have different distributions but
  both distributions contain the same unknown parameter θ, then

  $$L(\theta) = L_1(\theta) \cdot L_2(\theta).$$

▶ The parameter θ can be multidimensional.



Example 4

Example 4: During a "short" geological period, it may be reasonable
to assume that the times between successive eruptions of a volcano
are independent and exponentially distributed with an expected value
µ that is characteristic of the individual volcano. The table below
shows the times in months between 36 successive eruptions of the
volcano Mauna Loa in Hawaii, 1832-1950.

126 73 3 6 37 23
73 23 2 65 94 51
26 21 6 68 16 20
6 18 6 41 40 18
41 11 12 38 77 61
26 3 38 50 91 12



Example 4
According to the data, answer the following:

(a) Estimate µ by the Maximum Likelihood Method.

Model: Let X be the time between two successive eruptions; then
X ∼ Exp(1/µ) with pdf $f(x) = \frac{1}{\mu} e^{-x/\mu}$ for x ≥ 0.

Exponential distribution (Exponentialfördelning):

▶ X ∼ Exp(1/µ): $f_X(x) = \frac{1}{\mu} e^{-x/\mu}$ for x ≥ 0, with $E(X) = \mu$, $V(X) = \mu^2$.
▶ We also write X ∼ Exp(µ): $f_X(x) = \mu e^{-\mu x}$ for x ≥ 0, with $E(X) = \frac{1}{\mu}$, $V(X) = \frac{1}{\mu^2}$.



Example 4
(a) We have 36 observations x1, ..., x36, so n = 36. By the
Maximum Likelihood Method,

$$L(\mu) = \prod_{i=1}^{n} f(x_i; \mu) = \prod_{i=1}^{n} \frac{1}{\mu} e^{-x_i/\mu} = \frac{1}{\mu^n}\, e^{-\frac{1}{\mu}\sum_{i=1}^{n} x_i}$$

$$\ln L(\mu) = -n \ln \mu - \frac{1}{\mu}\sum_{i=1}^{n} x_i$$

$$\frac{d(\ln L(\mu))}{d\mu} = -\frac{n}{\mu} + \frac{1}{\mu^2}\sum_{i=1}^{n} x_i = 0 \quad \text{gives } \mu = \bar{x}. \quad \text{Max?}$$

$$\frac{d^2(\ln L(\mu))}{d\mu^2}\bigg|_{\mu=\bar{x}} = \left(\frac{n}{\mu^2} - \frac{2}{\mu^3}\sum_{i=1}^{n} x_i\right)\bigg|_{\mu=\bar{x}} = \frac{n}{\bar{x}^2} - \frac{2}{\bar{x}^3}\, n\bar{x} = -\frac{n}{\bar{x}^2} < 0,$$

i.e. a maximum. Hence

$$\hat{\mu}_{ML} = \bar{x} = \frac{x_1 + x_2 + \cdots + x_{36}}{36} = 36.72.$$

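Recomputing µ̂ML from the Mauna Loa table (a NumPy sketch):

import numpy as np

# The 36 inter-eruption times (months) from the table in Example 4.
x = np.array([126, 73, 3, 6, 37, 23,
              73, 23, 2, 65, 94, 51,
              26, 21, 6, 68, 16, 20,
              6, 18, 6, 41, 40, 18,
              41, 11, 12, 38, 77, 61,
              26, 3, 38, 50, 91, 12])

mu_hat = x.mean()  # ML estimate of the exponential mean
print(round(mu_hat, 2))  # 36.72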


Example 4

(b) Find the standard error of the estimate $\hat{\mu}_{ML}$.

Standard error (medelfelet) of a point estimate $\hat{\theta}$:

$$d = d(\hat{\theta}) := \text{an estimate of } D(\hat{\Theta}) = \text{an estimate of } \sqrt{V(\hat{\Theta})}$$

(b) The standard error (medelfelet) of $\hat{\mu}_{ML}$ is

$$d(\hat{\mu}_{ML}) = \text{an estimate of } D(\hat{M}_{ML}) = \text{an estimate of } \sqrt{V(\hat{M}_{ML})},$$

where

$$V(\hat{M}_{ML}) = V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2}\, n\mu^2 = \frac{\mu^2}{n}.$$

So

$$d(\hat{\mu}_{ML}) = \sqrt{\frac{(\hat{\mu}_{ML})^2}{n}} = \sqrt{\frac{36.72^2}{36}} \approx 6.12.$$

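The standard error in code (a sketch continuing the snippet above):

import numpy as np

mu_hat, n = 36.72, 36
d = np.sqrt(mu_hat ** 2 / n)  # d(mu_hat) = mu_hat / sqrt(n)
print(round(d, 2))  # 6.12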


Example 5

Example 5: The following data are 40 observations from a Poisson


distribution with a parameter λ (which is the mean):

Value 0 1 2 3 4
Frequency 21 0 11 6 2

Find the Maximum-Likelihood estimate for λ given this information.

Poisson distribution (Poissonfördelning) X ∼ Po(λ):

▶ $p_X(k) = \frac{\lambda^k}{k!}\, e^{-\lambda}$ for k = 0, 1, ...

Note: there are 21 + 0 + 11 + 6 + 2 = 40 observations in total.



Example 5
The likelihood function is

$$L(\lambda) = \left(\frac{\lambda^0}{0!}e^{-\lambda}\right)^{\!21} \cdot \left(\frac{\lambda^1}{1!}e^{-\lambda}\right)^{\!0} \cdot \left(\frac{\lambda^2}{2!}e^{-\lambda}\right)^{\!11} \cdot \left(\frac{\lambda^3}{3!}e^{-\lambda}\right)^{\!6} \cdot \left(\frac{\lambda^4}{4!}e^{-\lambda}\right)^{\!2} = e^{-40\lambda} \cdot \lambda^{48} \cdot \frac{1}{2^{11} \cdot 6^{6} \cdot 24^{2}}$$

$$\ln L(\lambda) = -40\lambda + 48 \ln\lambda + \ln\frac{1}{2^{11} \cdot 6^{6} \cdot 24^{2}}$$

Setting $\frac{d(\ln L(\lambda))}{d\lambda} = -40 + 48/\lambda = 0$ gives λ = 1.2.

$\frac{d^2(\ln L(\lambda))}{d\lambda^2} = -48/\lambda^2 < 0$, so this is a maximum.

Then the Maximum Likelihood estimate for λ is

$$\hat{\lambda}_{ML} = 1.2.$$

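The same estimate recomputed from the frequency table (a sketch); the algebra above reduces the MLE to the sample mean Σ k · frequency / N:

import numpy as np

values = np.array([0, 1, 2, 3, 4])
freq = np.array([21, 0, 11, 6, 2])

lam_hat = (values * freq).sum() / freq.sum()  # 48 / 40
print(lam_hat)  # 1.2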


Maximum Likelihood Method - Normal distribution
We have observations x1, ..., xn from independent r.v.s
X1, ..., Xn, where Xi ∼ N(µ, σ).

Normal distribution (normalfördelning) X ∼ N(µ, σ):

▶ $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}.$

Case 1: σ is known and µ is unknown. Then $\hat{\mu}_{ML} = \bar{x}$.

The proof is given in the Appendix.

Case 2: σ is unknown and µ is known. Then

$$\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 \qquad \text{(Exercise)}$$


Maximum Likelihood Method - Normal distribution
Case 3: Both µ and σ are unknown.

$$\begin{cases} \hat{\mu}_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} & \text{(unbiased)}; \\[4pt] \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 & \text{(biased)}. \end{cases}$$

The proof is given in the Appendix.

Note that

$$E(\hat{\Sigma}^2_{ML}) = E\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right) = \frac{n-1}{n}\,\sigma^2 \ne \sigma^2,$$

so $\hat{\sigma}^2_{ML}$ is NOT unbiased. We therefore make an adjustment by choosing $\frac{n}{n-1}\hat{\sigma}^2_{ML}$, since it is unbiased. That is,

$$\hat{\sigma}^2 = \frac{n}{n-1}\,\hat{\sigma}^2_{ML} = \ldots = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = s^2.$$


Maximum Likelihood Method - Corrected/Adjusted

The corrected/adjusted (korrigerade) point estimate of σ² is the sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

For a sample from a Normal distribution where both µ and σ² are unknown, we use the point estimates

$$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

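In NumPy, the biased ML variance and the corrected sample variance s² are both available through the ddof argument of np.var (a sketch with made-up data):

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=10)
n = len(x)

sigma2_ml = np.var(x, ddof=0)  # (1/n) * sum((x_i - x_bar)^2), biased
s2 = np.var(x, ddof=1)         # (1/(n-1)) * sum((x_i - x_bar)^2), unbiased
print(sigma2_ml, s2 * (n - 1) / n)  # identical: rescaling s2 recovers the ML variance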


Maximum Likelihood Method - More samples from Normal distributions

Now suppose we have two samples from independent Normal distributions:

$x_1, \dots, x_{n_1}$, where $X_1, \dots, X_{n_1}$ are independent and $N(\mu_1, \sigma)$;
$y_1, \dots, y_{n_2}$, where $Y_1, \dots, Y_{n_2}$ are independent and $N(\mu_2, \sigma)$.

The estimates of the three parameters can be deduced by applying the following likelihood function:

$$L(\mu_1, \mu_2, \sigma^2) = L_1(\mu_1, \sigma^2) \cdot L_2(\mu_2, \sigma^2) = \prod_{i=1}^{n_1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i - \mu_1)^2}{2\sigma^2}} \cdot \prod_{i=1}^{n_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y_i - \mu_2)^2}{2\sigma^2}}$$



Maximum Likelihood Method - More samples from Normal distributions

Then $\hat{\mu}_1 = \bar{x}$ and $\hat{\mu}_2 = \bar{y}$.

The corrected/adjusted point estimate of σ² is

$$\hat{\sigma}^2 = s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)},$$

where

$$s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_i - \bar{x})^2 \quad \text{and} \quad s_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(y_i - \bar{y})^2$$

are the sample variances of the respective samples.

Here s² is called the combined/pooled sample variance.

Note: this result can also be generalized to more samples.

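A sketch of the pooled sample variance with made-up data:

import numpy as np

x = np.array([4.8, 5.1, 5.6, 4.9])  # sample 1
y = np.array([6.2, 5.9, 6.5])       # sample 2

n1, n2 = len(x), len(y)
s1_sq = np.var(x, ddof=1)  # sample variance of x
s2_sq = np.var(y, ddof=1)  # sample variance of y

# Combined/pooled sample variance from the slide's formula
pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))
print(pooled)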


MM, LSM, ML

MM versus LSM versus ML

▶ MM - The Method of Moments
  ▶ Simple
  ▶ Consistent point estimate/estimator
  ▶ Usually biased
▶ LSM - Least Squares Method
  ▶ Good underlying idea
  ▶ Used in linear regression
▶ ML - Maximum Likelihood Method
  ▶ Can take more samples into account
  ▶ Usually has lower variance than the other methods



Practice after the lecture:

Exercises - Lesson 2:

(I) 11.23, PS-1, 11.10, 11.14, 11.12, 11.15.

(II) 11.13(a), 11.11, 11.16, 11.28, 11.22, 11.25.

Prepare your questions for the Lessons.

Thank you!



Appendix
We have observations x1, ..., xn from independent r.v.s
X1, ..., Xn, where Xi ∼ N(µ, σ).

Case 1: σ is known and µ is unknown. Then $\hat{\mu}_{ML} = \bar{x}$.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \frac{1}{(\sigma\sqrt{2\pi})^{n}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}$$

L(µ) attains its maximum when $\sum_{i=1}^{n}(x_i - \mu)^2$ attains its
minimum, i.e. $\hat{\mu}_{ML} = \bar{x}$ (the same as for MM and LSM).



Appendix

Case 3: Both µ and σ are unknown. The likelihood function is

$$L(\mu, \sigma) = \left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_1-\mu)^2/2\sigma^2}\right] \cdot \ldots \cdot \left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_n-\mu)^2/2\sigma^2}\right] = \left(\frac{1}{\sqrt{2\pi}}\right)^{\!n} \sigma^{-n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}.$$

Then we get

$$\ln L(\mu, \sigma) = \text{constant} - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$



Appendix

$$\frac{\partial(\ln L(\mu, \sigma))}{\partial\mu} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(x_i - \mu)(-1) = \frac{1}{\sigma^2}\left(\sum_{i=1}^{n} x_i - n\mu\right)$$

$$\frac{\partial(\ln L(\mu, \sigma))}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2$$

$$\begin{cases} \frac{\partial \ln L}{\partial\mu} = 0 \\[4pt] \frac{\partial \ln L}{\partial\sigma} = 0 \end{cases} \quad \text{gives} \quad \begin{cases} \hat{\mu}_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} & \text{(unbiased)} \\[4pt] \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 & \text{(biased)} \end{cases}$$

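A symbolic check of this derivation with SymPy (a sketch using a toy sample):

import sympy as sp

mu = sp.Symbol("mu", real=True)
sigma = sp.Symbol("sigma", positive=True)
xs = [1, 2, 4]  # toy observations
n = len(xs)

# ln L(mu, sigma) up to an additive constant, as on the previous slide
log_L = -n * sp.log(sigma) - sp.Rational(1, 2) / sigma**2 * sum((x - mu) ** 2 for x in xs)

sol = sp.solve([sp.diff(log_L, mu), sp.diff(log_L, sigma)], [mu, sigma], dict=True)
print(sol)  # mu = 7/3 (= x_bar) and sigma = sqrt(14)/3, i.e. sigma^2 = (1/n)*sum((x_i - x_bar)^2)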


Thank you!
