TAMS65 - Lecture 1
Introduction and Point estimation
Zhenxia Liu
Matematisk statistik
Matematiska institutionen
Content
- Syllabus, Course Plan, Course evaluation
- Introduction
- Repetition
- Definitions in Statistics
- Point estimation
  - Point estimate, point estimator
  - Unbiased, more effective
  - Consistent
  - Commonly used point estimates/estimators
- Appendix
TAMS65 - Lecture1 1/27
Introduction
- Probability Theory - TAMS79/TAMS80: construct models that describe how common different events are and that explain the variation in measurement data.
- Statistical Theory - TAMS65: provide basic knowledge of statistical methods, i.e. how to draw conclusions about phenomena affected by chance, based on observed data.
Big picture of TAMS65

Y = β0 + β1·x + ε,

where Y = heights, x = weights, ε = error.

- Typical linear regression model (Lectures 10-12)
- β0, β1: point/interval estimation, hypothesis testing (Lectures 1-7)
- ε: χ²-test, random vectors (Lectures 8-9)
Repetition
General normal distribution (normalfördelning):

X ∼ N(µ, σ) if its pdf is f_X(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)), x ∈ R.

Theorem 1: If X ∼ N(µ, σ), then (X − µ)/σ ∼ N(0, 1).

Standard normal distribution:

Z ∼ N(0, 1) if its pdf is φ(z) = (1/√(2π)) e^(−z²/2), z ∈ R.
Repetition
Standard normal distribution:

Z ∼ N(0, 1) if its pdf is φ(z) = (1/√(2π)) e^(−z²/2), z ∈ R.

Its cdf is Φ(z) = P(Z ≤ z).
- P(Z ≤ a) = Φ(a)
- P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
- Φ(−a) = 1 − Φ(a)
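These identities can be checked numerically. A minimal sketch, using the standard-library error function (the relation Φ(z) = (1 + erf(z/√2))/2 holds for the standard normal cdf):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(a <= Z <= b) = Phi(b) - Phi(a); here the familiar "68% within one sigma"
p = Phi(1.0) - Phi(-1.0)

# Symmetry: Phi(-a) = 1 - Phi(a), so Phi(-a) + Phi(a) = 1
check = Phi(-1.96) + Phi(1.96)
```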
Definitions in Statistics
Population (population) := the entire collection of objects that we are interested in. Denoted by X, Y, ….

Assumption: the population X can be modelled by a certain kind of distribution with unknown parameter(s) θ.
Goal: estimate the unknown parameter(s) θ.

- Example: What is the 'true' average height of all adults in Sweden?
- Let X = {heights of all adults in Sweden}; then X is the population.
- It is reasonable to assume X ∼ N(µ, σ). (Why?)
- µ = population mean = true average height
- σ = population standard deviation = true standard deviation of heights
- For example, we want to estimate µ.
Definitions in Statistics
A sample (stickprov), denoted {X1, …, Xn}, is a subset of the population.

A random sample (slumpmässigt stickprov), denoted {X1, …, Xn}, is a sample such that X1, …, Xn are independent and have the same distribution as the population.

Note: n = sample size (stickprovsstorlek).

- Population X = {heights of all adults in Sweden}; we assume X ∼ N(µ, σ).
- e.g. plan to choose n adults (from the population) who are not genetically related.
- Let Xi be the height of adult i, i = 1, …, n.
- Then X1, …, Xn are independent and Xi ∼ N(µ, σ), i = 1, …, n.
- So {X1, …, Xn} is a random sample.
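A random sample of this kind is easy to simulate. A minimal sketch, with hypothetical values of µ, σ, and n chosen only for illustration:

```python
import random

random.seed(1)

# Hypothetical population parameters (illustrative values, not course data).
mu, sigma, n = 170.0, 7.0, 5

# A random sample: n independent draws from the same N(mu, sigma) population.
sample = [random.gauss(mu, sigma) for _ in range(n)]
```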
Definitions in Statistics
- Before we measure/observe: X1, …, Xn are random variables.
- After we measure/observe: the observations (observationer) x1, …, xn are numbers.

For example:
- Population X = {heights of all adults in Sweden}
- Plan to choose n adults who are not genetically related.
- Before measuring: X1, X2, …, Xn.
- After measuring: x1 = 180 cm, x2 = 175 cm, …, xn = 182 cm.
Point estimation - point estimator, point estimate
A point estimator (stickprovsvariabel/skattningsvariabel) of an unknown parameter θ, denoted Θ̂, is a function of the random sample {X1, …, Xn}. That is, Θ̂ = f(X1, …, Xn).

- A point estimator is a random variable.
- e.g. the sample mean (stickprovsmedelvärde) X̄ = (1/n) Σ_{i=1}^n Xi = (X1 + … + Xn)/n is a point estimator.

A point estimate (punktskattning) of an unknown parameter θ, denoted θ̂, is an observed value of the point estimator, that is, θ̂ = f(x1, …, xn).

- A point estimate is a number.
- e.g. the sample mean x̄ = (1/n) Σ_{i=1}^n xi is a point estimate.
Point estimation
True value θ vs. point estimate θ̂ vs. point estimator Θ̂

- θ = the true/real/theoretical value.
- θ̂ = a point estimate: an estimated value of θ, calculated from the observations.
- Θ̂ = a point estimator: describes the variation of θ̂ over different sets of observations.

Standard error (medelfelet) of a point estimate θ̂:
d = d(θ̂) := an estimate of D(Θ̂) = an estimate of √(V(Θ̂)).

In the textbook you may see: point estimator Θ̂ = θ*(X) and point estimate θ̂ = θ*(x).
Point estimation - point estimate, point estimator
Example 1: Let X = {heights of all adults in Sweden}, and assume that X ∼ N(µ, σ).
To estimate µ, we chose three (genetically unrelated) adults in Sweden and got x1 = 178 cm, x2 = 182 cm, and x3 = 186 cm.
We try different estimates, for example:

µ̂1 = (1/3)(x1 + x2 + x3), µ̂2 = (1/2)(x1 + x3), µ̂3 = x3 − 2.

Are µ̂1, µ̂2 and µ̂3 point estimates of µ? We get

µ̂1 = (1/3)(178 + 182 + 186) = 182,
µ̂2 = (1/2)(178 + 186) = 182, µ̂3 = 186 − 2 = 184.

Are they equally good point estimates? If not, which is best?
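The three competing estimates are one-line calculations from the observed heights:

```python
# Observed heights from Example 1 (in cm).
x1, x2, x3 = 178, 182, 186

mu1 = (x1 + x2 + x3) / 3   # sample mean of all three observations
mu2 = (x1 + x3) / 2        # mean of the first and third observations only
mu3 = x3 - 2               # ad hoc shift of the largest observation
```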
Point estimation - unbiased
Rules to compare different point estimates/estimators

Rule 1: A point estimator Θ̂ of θ is unbiased (väntevärdesriktig) if E(Θ̂) = θ.

Example 1 continued: three point estimates of µ are given:

µ̂1 = (1/3)(x1 + x2 + x3), µ̂2 = (1/2)(x1 + x3), µ̂3 = x3 − 2.

The corresponding point estimators of µ are

M̂1 = (1/3)(X1 + X2 + X3), M̂2 = (1/2)(X1 + X3), M̂3 = X3 − 2.

(Lowercase µ ↔ uppercase M.) Unbiased?
Point estimation - unbiased
Note that {X1, X2, X3} is a random sample since X1, X2, X3 are independent and Xi ∼ N(µ, σ), i = 1, 2, 3.

E(M̂1) = E((1/3)(X1 + X2 + X3)) = (1/3)E(X1) + (1/3)E(X2) + (1/3)E(X3) = µ,
E(M̂2) = E((1/2)(X1 + X3)) = (1/2)E(X1) + (1/2)E(X3) = µ,
E(M̂3) = E(X3 − 2) = E(X3) − 2 = µ − 2 ≠ µ.

The point estimator M̂3 / point estimate µ̂3 is NOT unbiased.
Both point estimators M̂1 and M̂2 are unbiased, i.e. the point estimates µ̂1 and µ̂2 are unbiased. Which is better?
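The bias calculations can be illustrated by simulation. A minimal sketch with hypothetical µ and σ (illustrative values only): averaging many realizations of M̂1 lands near µ, while M̂3 lands near µ − 2.

```python
import random

random.seed(2)

# Hypothetical true parameters for the simulation (not course data).
mu, sigma, reps = 170.0, 7.0, 100_000

m1 = m3 = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(3)]
    m1 += sum(xs) / 3   # realization of M-hat-1 (unbiased)
    m3 += xs[2] - 2     # realization of M-hat-3 (biased, E = mu - 2)

bias1 = m1 / reps - mu   # close to 0
bias3 = m3 / reps - mu   # close to -2
```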
Point estimation - more effective
Rule 2: If Θ̂1 and Θ̂2 are unbiased point estimators of θ, then Θ̂1 is more effective (effektivare) than Θ̂2 if

V(Θ̂1) < V(Θ̂2).

Example 1 continued: Both point estimators M̂1 and M̂2 of µ are unbiased, but which is more effective?

V(M̂1) = V((1/3)(X1 + X2 + X3)) = (1/3²)V(X1) + (1/3²)V(X2) + (1/3²)V(X3) = σ²/3,
V(M̂2) = V((1/2)(X1 + X3)) = (1/2²)V(X1) + (1/2²)V(X3) = σ²/2.

Then V(M̂1) < V(M̂2), so M̂1 / µ̂1 is more effective. Thus we choose µ̂1 = (1/3)(x1 + x2 + x3) among these three point estimates.
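The variance comparison σ²/3 < σ²/2 can also be seen by simulation. A sketch with hypothetical µ and σ (for σ = 7, the two theoretical variances are 49/3 ≈ 16.3 and 49/2 = 24.5):

```python
import random

random.seed(3)

# Hypothetical parameters for the simulation (illustrative values only).
mu, sigma, reps = 170.0, 7.0, 100_000

est1, est2 = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(3)]
    est1.append(sum(xs) / 3)          # M-hat-1, theoretical variance sigma^2/3
    est2.append((xs[0] + xs[2]) / 2)  # M-hat-2, theoretical variance sigma^2/2

def var(v):
    """Sample variance with the 1/(n-1) factor."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

v1, v2 = var(est1), var(est2)   # v1 < v2: M-hat-1 is more effective
```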
Point estimation - consistent
Rule 3: A point estimator Θ̂n of θ is said to be consistent (konsistent) if

P(|Θ̂n − θ| > ε) → 0 as n → ∞, for every ε > 0.

Theorem 2: If E(Θ̂n) = θ and V(Θ̂n) → 0 as n → ∞, then Θ̂n is a consistent point estimator of θ.

Note: The proof of Theorem 2 is given in the Appendix.

Remark: Bias (systematiskt fel): a bias is a systematic error that leads to an incorrect estimate of an effect or association; bias = E(Θ̂) − θ.
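Consistency of the sample mean can be illustrated by estimating P(|X̄n − µ| > ε) for growing n. A sketch with hypothetical µ, σ, and ε (illustrative values only):

```python
import random

random.seed(4)

# Hypothetical parameters for the simulation (illustrative values only).
mu, sigma, eps, reps = 0.0, 5.0, 1.0, 2_000

def exceed_prob(n):
    """Fraction of simulated samples whose mean misses mu by more than eps."""
    hits = 0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            hits += 1
    return hits / reps

# The exceedance probability shrinks as n grows: consistency in action.
p_small, p_large = exceed_prob(10), exceed_prob(400)
```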
Point estimation - Example 2
Example 2: Let x1, …, x7 be independent observations of a random variable X with E(X) = µ and V(X) = σ². Is the following point estimate of σ² unbiased?

σ̂² = (x2² + x6² − (x2 + x6)/2) / 2.

Note: lowercase σ² ↔ uppercase Σ².
σ̂² is not unbiased; the calculation is given in the Appendix.
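The lack of unbiasedness is easy to see numerically: averaging many realizations of Σ̂² lands near σ² + µ² − µ/2 rather than σ². A sketch with hypothetical µ = 10 and σ = 2 (so the theoretical mean is 4 + 100 − 5 = 99, far from σ² = 4):

```python
import random

random.seed(5)

# Hypothetical parameters for the simulation (illustrative values only).
mu, sigma, reps = 10.0, 2.0, 100_000

total = 0.0
for _ in range(reps):
    x2, x6 = random.gauss(mu, sigma), random.gauss(mu, sigma)
    total += (x2**2 + x6**2 - (x2 + x6) / 2) / 2   # one realization of Sigma-hat^2

mean_est = total / reps   # near sigma^2 + mu^2 - mu/2 = 99, not sigma^2 = 4
```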
Point estimation - Theorems
Let x1, …, xn be observations of independent r.v.s X1, …, Xn with E(Xi) = µ and V(Xi) = σ².

Theorem 3: The sample mean (stickprovsmedelvärdet)

M̂ = X̄ = (1/n) Σ_{i=1}^n Xi,  µ̂ = x̄ = (1/n) Σ_{i=1}^n xi,

is an unbiased and consistent point estimator of µ.

Commonly used point estimates/estimators:
- Population mean µ ≈ µ̂ = x̄ (sample mean)
  - µ̂ = x̄ = (1/n) Σ_{i=1}^n xi
  - M̂ = X̄ = (1/n) Σ_{i=1}^n Xi

The proof is given in the Appendix.
Point estimation - Theorems
Let x1, …, xn be observations of independent r.v.s X1, …, Xn with E(Xi) = µ and V(Xi) = σ².

Theorem 4: The sample variance (stickprovsvariansen)

S² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)²,  s² = (1/(n−1)) Σ_{i=1}^n (xi − x̄)²,

is an unbiased and consistent point estimator of σ².

Commonly used point estimates/estimators:
- Population variance σ² ≈ σ̂², if µ is known (känt):
  - σ̂² = (1/n) Σ_{i=1}^n (xi − µ)²
  - Σ̂² = (1/n) Σ_{i=1}^n (Xi − µ)²

The proof of unbiasedness is given in the Appendix.
Point estimation
Commonly used point estimates/estimators:
Population variance σ² ≈ σ̂², if µ is unknown (okänt):
- σ̂² = s² = (1/(n−1)) Σ_{i=1}^n (xi − x̄)² (sample variance)
- Σ̂² = S² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)²
- Sample standard deviation: s = √(s²) and S = √(S²).

Note: s² = (1/(n−1)) Σ_{i=1}^n (xi − x̄)² = (Σ_{i=1}^n xi² − n·x̄²)/(n−1).

- Population standard deviation σ ≈ σ̂.
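The shortcut formula in the note can be verified on a small made-up data set (the heights below are illustrative values, not course data); Python's `statistics.variance` uses the same 1/(n−1) definition:

```python
import statistics

# Small made-up data set (illustrative values only).
xs = [178.0, 182.0, 186.0, 175.0, 180.0]
n = len(xs)
xbar = sum(xs) / n

# Definition: s^2 = (1/(n-1)) * sum((xi - xbar)^2)
s2_def = sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Shortcut: s^2 = (sum(xi^2) - n * xbar^2) / (n - 1)
s2_short = (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

# Library implementation of the same 1/(n-1) sample variance.
s2_lib = statistics.variance(xs)
```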
Point estimation
Sample standard deviation (stickprovsstandardavvikelse) S or s:

S = √(S²) = √( (1/(n−1)) Σ_{i=1}^n (Xi − X̄)² ),  s = √(s²) = √( (1/(n−1)) Σ_{i=1}^n (xi − x̄)² ).

Note: S is NOT an unbiased point estimator of σ, since

0 < V(S) = E(S²) − [E(S)]² = σ² − [E(S)]².

That is, [E(S)]² < σ² and E(S) < σ.
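The underestimation E(S) < σ is visible in a simulation. A sketch with hypothetical σ = 2 and a small n, where the effect is most pronounced (for normal data with n = 5, E(S) ≈ 0.94σ):

```python
import random

random.seed(6)

# Hypothetical parameters for the simulation (illustrative values only).
mu, sigma, n, reps = 0.0, 2.0, 5, 50_000

total_s = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    total_s += s2 ** 0.5   # one realization of S

mean_S = total_s / reps   # noticeably below sigma = 2, even though E(S^2) = sigma^2
```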
Commonly used point estimates/estimators
Population mean µ ≈ µ̂ = x̄ (sample mean):
- µ̂ = x̄ = (1/n) Σ_{i=1}^n xi
- M̂ = X̄ = (1/n) Σ_{i=1}^n Xi

Population variance σ² ≈ σ̂², population standard deviation σ ≈ σ̂:
- If µ is known (känt):
  - σ̂² = (1/n) Σ_{i=1}^n (xi − µ)²
  - Σ̂² = (1/n) Σ_{i=1}^n (Xi − µ)²
- If µ is unknown (okänt):
  - σ̂² = s² = (1/(n−1)) Σ_{i=1}^n (xi − x̄)² (sample variance)
  - Σ̂² = S² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)²
  - Sample standard deviation: s = √(s²) and S = √(S²).
10.1: Coefficient of variation (variationskoefficient) σ/µ, estimated by

σ/µ ≈ s/x̄.
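As a quick illustration, the estimated coefficient of variation for the three heights from Example 1:

```python
# Heights from Example 1 (in cm).
xs = [178.0, 182.0, 186.0]
n = len(xs)
xbar = sum(xs) / n
s = (sum((x - xbar) ** 2 for x in xs) / (n - 1)) ** 0.5

cv = s / xbar   # unit-free measure of relative spread
```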
Practice after the lecture:
Exercises - Lesson 1:
(I) 11.6, 11.8, 11.9, 10.1, 10.4.
(II) 5.7, 5.12, 5.22, 6.1, 5.13, 6.9.
Prepare your questions for the lessons.
Thank you!
Appendix
Proof of Theorem 2.
Apply Chebyshev's inequality:

P(|X − E(X)| ≥ k) ≤ V(X)/k².

For any given ε > 0,

P(|Θ̂n − θ| > ε) ≤ P(|Θ̂n − θ| ≥ ε) ≤ V(Θ̂n)/ε² → 0 as n → ∞,

since V(Θ̂n) → 0 as n → ∞. So Θ̂n is consistent.
Appendix
Solution to Example 2:

E(Σ̂²) = E( X2²/2 + X6²/2 − X2/4 − X6/4 )
      = E(X2²)/2 + E(X6²)/2 − E(X2)/4 − E(X6)/4
      [ using E(Z²) = V(Z) + (E(Z))² ]
      = (V(X2) + (E(X2))²)/2 + (V(X6) + (E(X6))²)/2 − µ/4 − µ/4
      = (σ² + µ²)/2 + (σ² + µ²)/2 − µ/2 = σ² + µ² − µ/2 ≠ σ².

No, σ̂² is not unbiased. (Lowercase σ² ↔ uppercase Σ².)
Appendix
Proof of Theorem 3.

E(M̂) = E(X̄) = (1/n) Σ_{i=1}^n E(Xi) = (1/n)·nµ = µ,

so M̂ is unbiased.

V(M̂) = V(X̄) = (1/n²) Σ_{i=1}^n V(Xi) = (1/n²)·nσ² = σ²/n,

so V(M̂) → 0 as n → ∞. By Theorem 2, M̂ is consistent.
Therefore, M̂ is an unbiased and consistent point estimator of µ.
Appendix
Proof of Theorem 4. We have

Σ_{i=1}^n (Xi − X̄)² = Σ_{i=1}^n (Xi² − 2Xi X̄ + X̄²) = Σ_{i=1}^n Xi² − 2X̄ Σ_{i=1}^n Xi + nX̄² = Σ_{i=1}^n Xi² − nX̄²,

since Σ_{i=1}^n Xi = nX̄, and

E(Xi²) = V(Xi) + (E(Xi))² = σ² + µ²,
E(X̄²) = V(X̄) + (E(X̄))² = σ²/n + µ²,

which gives
Appendix

E(S²) = E( (1/(n−1)) Σ_{i=1}^n (Xi − X̄)² ) = (1/(n−1)) E( Σ_{i=1}^n (Xi − X̄)² )
      = (1/(n−1)) E( Σ_{i=1}^n Xi² − nX̄² )
      = (1/(n−1)) ( Σ_{i=1}^n E(Xi²) − n·E(X̄²) )
      = (1/(n−1)) ( n(σ² + µ²) − σ² − nµ² ) = (1/(n−1))·(n − 1)σ² = σ².

So S² is an unbiased point estimator of σ².
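The 1/(n−1) factor is exactly what makes S² unbiased; dividing by n instead gives expected value (n−1)σ²/n. A simulation sketch with hypothetical µ, σ, and n (for σ = 3 and n = 4, the two expected values are 9 and 6.75):

```python
import random

random.seed(7)

# Hypothetical parameters for the simulation (illustrative values only).
mu, sigma, n, reps = 0.0, 3.0, 4, 100_000

sum_s2 = sum_biased = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_s2 += ss / (n - 1)   # unbiased: E = sigma^2 = 9
    sum_biased += ss / n     # biased:   E = (n-1)/n * sigma^2 = 6.75

mean_s2 = sum_s2 / reps
mean_biased = sum_biased / reps
```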
Thank you!