ECON3209: LECTURE 6
ESTIMATION THEORY
Professor Alan Woodland
School of Economics
UNSW
- We take a random sample of observations on a random variable that has a p.d.f., which depends on a parameter vector θ.
- The aim is to use this random sample to estimate the parameters.
- How can this be done?
- What are the properties of the resulting estimators?

- Concepts of random variables and estimators.
- Method of moments.
- Method of maximum likelihood.
- Bayesian estimator method.
The random variables X_1, X_2, ..., X_n constitute a random sample of size n from the population f(x|θ) if X_1, X_2, ..., X_n are mutually independent random variables and the marginal p.d.f. of each X_i is the same function f(x|θ). Equivalently, X_1, X_2, ..., X_n are called independent and identically distributed random variables with p.d.f. given by f(x|θ). Because of independence, the joint p.d.f. of X_1, X_2, ..., X_n is given by the product of densities as
  f(x_1, x_2, ..., x_n|θ) = ∏_{i=1}^n f(x_i|θ).
EXAMPLE (EXPONENTIAL DISTRIBUTION)
Example: Let X_1, X_2, ..., X_n be a random sample from an exponential population with parameter θ. Specifically, X_1, X_2, ..., X_n might correspond to the time until failure of n identical light bulbs. What is the joint p.d.f. for the sample?
Answer: The p.d.f. for an exponential random variable is f(x_i|θ) = (1/θ) e^{−x_i/θ}. Thus, the joint p.d.f. is the product of these and so is
  f(x_1, x_2, ..., x_n|θ) = ∏_{i=1}^n f(x_i|θ) = (1/θ^n) e^{−∑_{i=1}^n x_i/θ}.
Exercise: Show that the probability that all n light bulbs last at least 2 years is e^{−2n/θ}.
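The probability in the exercise can also be checked numerically. The following is a minimal Python sketch (not part of the lecture), using hypothetical values for n and θ, that compares a Monte Carlo estimate with the closed form e^{−2n/θ}.

    # Monte Carlo check of P(all n bulbs last at least 2 years) = exp(-2n/theta),
    # for hypothetical values of n and theta.
    import numpy as np

    rng = np.random.default_rng(0)
    n, theta = 5, 3.0                                # hypothetical sample size and mean lifetime
    sims = rng.exponential(scale=theta, size=(200_000, n))

    mc_prob = np.mean(sims.min(axis=1) >= 2.0)       # fraction of samples in which every bulb lasts >= 2
    exact = np.exp(-2 * n / theta)
    print(mc_prob, exact)                            # the two values should be close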
RANDOM SAMPLES AND ESTIMATION
Let X_1, X_2, ..., X_n be a random sample of size n from a population and let T(X_1, X_2, ..., X_n) be a real-valued or vector-valued function whose domain includes the sample space of (X_1, X_2, ..., X_n). Then the random variable or random vector Y = T(X_1, X_2, ..., X_n) is called a statistic. The probability distribution of a statistic Y is called the sampling distribution of Y.
The definition of a statistic is broad, the only restriction being that a statistic cannot be a function of
a parameter.
- The sample mean is the arithmetic average of the values in a random sample. It is usually denoted by X̄ = (1/n) ∑_{i=1}^n X_i.
- The sample variance is the statistic defined by S² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄)², and the sample standard deviation is S = √S².

THE ESTIMATION PROBLEM
- We have a model for how a random variable X is determined. That is, we have a population described by a p.d.f. f(x|θ), where θ is a population parameter contained in a set Θ. Sometimes this parameter has a meaningful interpretation; other times it may be the case that some function of θ is of primary interest.
- We observe data in the form of a random sample X_1, ..., X_n of size n drawn iid (independently and identically distributed) from the population.
- The value of the population parameter θ is unknown to the researcher.
- Accordingly, the estimation problem is to use the data in the random sample to obtain an estimate for the population parameter.
A point estimator is any function W(X_1, ..., X_n) of a sample. An estimator refers to the statistic, which is a random variable, while an estimate refers to a realized value of the estimator and is a constant. Therefore, an estimator is a function of the R.V.s X_1, ..., X_n, while the estimate is a function of the realized values of these random variables, x_1, ..., x_n.
Examples of statistics used as estimators:
- The sample mean, X̄, is used as an estimator for the population mean, μ.
- The sample variance, S², is used as an estimator for the population variance, σ².
- The sample standard deviation, S, is used as an estimator for the population standard deviation, σ.
We now consider different methods of estimation.
METHOD OF MOMENTS ESTIMATORS
The method of moments consists of equating the first few moments of a population to the
corresponding moments of a sample, getting as many equations as are needed to solve for the
unknown parameters of the population.
Let X_1, ..., X_n be a sample from a population with p.d.f. given by f(x|θ_1, ..., θ_k). The r-th SAMPLE MOMENT ABOUT THE ORIGIN is defined as
  M_r = (1/n) ∑_{i=1}^n X_i^r.
DEFINITION (METHOD OF MOMENTS ESTIMATORS)
Let X_1, ..., X_n be a random sample from a population with p.d.f. given by f(x|θ_1, ..., θ_k). Then the method of moments equates the first k moments of the sample to the corresponding k population moments. More precisely, the method of moments estimators are obtained by solving the system of equations
  M_r = μ'_r(θ_1, ..., θ_k),   r = 1, ..., k,
for (θ_1, ..., θ_k), where μ'_r(θ_1, ..., θ_k) = E(X^r|θ_1, ..., θ_k) is the r-th population moment.

EXAMPLE (NORMAL DISTRIBUTION)
Suppose X_1, ..., X_n are iid N(μ, σ²). We want to find method of moments (MM) estimators for μ and σ².
- The first two sample moments are M_1 = X̄ and M_2 = (1/n) ∑_{i=1}^n X_i².
- The corresponding population moments about the origin are the first moment μ'_1 = E(X) = μ and the second moment μ'_2 = E(X²) = σ² + μ².
- To find the MM estimators we solve the equations
  M_1 = X̄ = μ
  M_2 = (1/n) ∑_{i=1}^n X_i² = σ² + μ².
- Solving for μ and σ² yields the MM estimators
  μ̂ = M_1 = X̄
  σ̂² = M_2 − M_1² = (1/n) ∑_{i=1}^n X_i² − X̄² = (1/n) ∑_{i=1}^n (X_i − X̄)².
- Here the MM estimation coincides with intuition.
- Suppose that the realization of a random sample is x = (5, 6, 4, 1), so the realized moments are m_1 = 4 and m_2 = 19.5. Then, the MM estimates for μ and σ² are μ̂ = m_1 = 4 and σ̂² = m_2 − m_1² = 19.5 − 16 = 3.5.
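As a quick illustration, the MM calculation for this realized sample can be reproduced in a few lines of Python (a sketch, not part of the lecture):

    # Method of moments estimates for the normal example with x = (5, 6, 4, 1).
    import numpy as np

    x = np.array([5.0, 6.0, 4.0, 1.0])
    m1 = x.mean()                  # first sample moment
    m2 = np.mean(x**2)             # second sample moment about the origin

    mu_hat = m1                    # MM estimate of mu
    sigma2_hat = m2 - m1**2        # MM estimate of sigma^2
    print(m1, m2, mu_hat, sigma2_hat)   # 4.0, 19.5, 4.0, 3.5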
Let X_1, ..., X_n be iid B(k, p), that is,
  P(X = x|k, p) = [k!/(x!(k − x)!)] p^x (1 − p)^{k−x} for x = 0, 1, ..., k.
Both k and p are unknown and need to be estimated.
Application: Some crimes have many unreported occurrences. For these crimes, the true reporting rate, p, and the total number of occurrences, k, are unknown.
- The population moments are μ'_1 = kp and μ'_2 = kp(1 − p) + k²p².
- The equations to solve are
  M_1 = X̄ = kp
  M_2 = (1/n) ∑_{i=1}^n X_i² = kp(1 − p) + k²p².
- Solving for k and p, we obtain the MM estimators as
  k̂ = M_1²/(M_1 − (M_2 − M_1²)) = X̄²/(X̄ − (1/n) ∑_{i=1}^n (X_i − X̄)²)  and  p̂ = M_1/k̂ = X̄/k̂.
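As an illustration, the following sketch (with hypothetical counts, not from the lecture) applies these formulas:

    # Binomial method of moments estimates k_hat and p_hat from a hypothetical sample.
    import numpy as np

    x = np.array([12, 15, 10, 14, 11, 13, 16, 12])   # hypothetical counts
    m1 = x.mean()
    s2 = np.mean((x - m1)**2)                          # equals M2 - M1^2

    k_hat = m1**2 / (m1 - s2)                          # xbar^2 / (xbar - (1/n) sum (x_i - xbar)^2)
    p_hat = m1 / k_hat
    print(k_hat, p_hat)

Note that k̂ need not be an integer, and it can even be negative when the sample variance exceeds the sample mean, a well-known weakness of the MM approach in this model.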
EXAMPLE (UNIFORM DISTRIBUTION)
Let X_1, ..., X_n be iid U(α, 1). Find the MM estimator for α.
- The first moment for the uniform distribution f(x|α) = 1/(1 − α) for α ≤ x ≤ 1 is μ'_1 = E(X) = (1 + α)/2.
- Equating M_1 = X̄ = (1 + α)/2 and solving gives the MM estimator α̂ = 2X̄ − 1.
METHOD OF MAXIMUM LIKELIHOOD ESTIMATORS
Let X_1, ..., X_n be a random sample from a population with p.d.f. f(x|θ). Then, the likelihood function is
  L(θ|x) = ∏_{i=1}^n f(x_i|θ),
and the maximum likelihood estimator (MLE) θ̂ is the value of θ that maximizes L(θ|x).

EXAMPLE (GAMMA DISTRIBUTION)
Let X_1, ..., X_n be iid Gamma(α, β = 1) with p.d.f.
f(x|α) = κ(α) x^{α−1} e^{−x} for x > 0, where κ(α) = 1/Γ(α). Note that ∂² ln κ(α)/∂α² < 0 for all α > 0.
- The likelihood function is
  L(α|x) = ∏_{i=1}^n f(x_i|α) = ∏_{i=1}^n κ(α) x_i^{α−1} e^{−x_i} = κ(α)^n (∏_{i=1}^n x_i)^{α−1} e^{−n x̄}
and the log-likelihood function, defining y = (1/n) ∑_{i=1}^n ln x_i, is
  ln L(α|x) = n ln κ(α) − n x̄ + (α − 1) n y.
- The first order condition for the MLE is
  ∂ ln L(α|x)/∂α = n ∂ ln κ(α)/∂α + n y = 0,
so the MLE for α has to be obtained numerically by solving a nonlinear equation.
- The second order condition is satisfied, since
  ∂² ln L(α|x)/∂α² = n ∂² ln κ(α)/∂α² < 0
for all α > 0.
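Since ln κ(α) = −ln Γ(α), the first order condition reduces to ψ(α) = y, where ψ is the digamma function. A minimal sketch (with hypothetical data, not from the lecture) of solving it with a one-dimensional root finder:

    # Numerical MLE for the Gamma(alpha, beta = 1) shape parameter.
    import numpy as np
    from scipy.special import digamma
    from scipy.optimize import brentq

    x = np.array([0.8, 1.7, 2.3, 0.5, 1.1])       # hypothetical positive observations
    y = np.mean(np.log(x))

    # First order condition: digamma(alpha) = y; digamma is increasing, so bracket a root.
    alpha_hat = brentq(lambda a: digamma(a) - y, 1e-6, 100.0)
    print(alpha_hat)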
EXAMPLE (BERNOULLI DISTRIBUTION)
Let X_1, ..., X_n be a random sample from a Bernoulli distribution with p.d.f. given by
f(x|p) = p^x (1 − p)^{1−x} for x = 0, 1 and 0 < p < 1.
- The likelihood function is L(p|x) = ∏_{i=1}^n f(x_i|p) = p^{n x̄} (1 − p)^{n(1−x̄)}, and maximizing the log-likelihood gives the MLE p̂ = x̄.
EXAMPLE (POISSON DISTRIBUTION)
Let X_1, ..., X_n be a random sample from a Poisson distribution with p.d.f. given by
f(x|λ) = λ^x e^{−λ}/x! for x = 0, 1, 2, ... and λ > 0.
- The likelihood function is
  L(λ|x) = ∏_{i=1}^n f(x_i|λ) = ∏_{i=1}^n λ^{x_i} e^{−λ}/x_i! = λ^{n x̄} e^{−nλ}/∏_{i=1}^n x_i!
and the log-likelihood function is
  ln L(λ|x) = −nλ + n x̄ ln λ − ∑_{i=1}^n ln(x_i!).
- The first order condition for the MLE is
  ∂ ln L(λ|x)/∂λ = −n + n x̄/λ = 0.
Assuming x̄ > 0, the unique MLE is λ̂ = x̄ if the second order condition is satisfied. It is, since
  ∂² ln L(λ|x)/∂λ² = −n x̄/λ² < 0.
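A quick numerical sanity check (hypothetical counts, not from the lecture) that λ̂ = x̄ maximizes the Poisson log-likelihood:

    # Compare the Poisson log-likelihood at lambda = xbar with nearby values.
    import numpy as np
    from scipy.stats import poisson

    x = np.array([2, 0, 3, 1, 4, 2, 1])               # hypothetical Poisson counts
    loglik = lambda lam: poisson.logpmf(x, lam).sum()

    lam_hat = x.mean()
    for lam in (0.5 * lam_hat, lam_hat, 1.5 * lam_hat):
        print(lam, loglik(lam))                        # the middle value gives the largest log-likelihood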
INVARIANCE PROPERTY OF MLE
If θ̂ is the MLE of θ, then for any monotone (increasing or decreasing) function g(θ), the MLE of g(θ) is g(θ̂).
Let M(γ) = L(θ(γ)) be the likelihood function expressed in terms of γ, where θ(γ) is the inverse of the function γ = g(θ) with g' ≠ 0. Then M'(γ) = L'(θ(γ)) θ'(γ) = 0 if and only if L'(θ) = 0. Thus, the first order conditions coincide. Also,
  M''(γ) = L''(θ(γ)) θ'(γ) θ'(γ) + L'(θ(γ)) θ''(γ) = L''(θ(γ)) (θ'(γ))²,
since L' = 0 by the first order condition. Thus, M''(γ) < 0 if and only if L''(θ) < 0, so the second order conditions also coincide. Thus, maximizing M with respect to γ is equivalent to maximizing L with respect to θ.
INVARIANCE PROPERTY OF MLE - EXAMPLE
EXAMPLE (NORMAL DISTRIBUTION - VARIANCE)
Let X_1, ..., X_n be iid N(0, σ²). Find MLEs for σ² and σ, and so verify the invariance theorem.
MLE for α = σ²:
- ln L(α|x) = −(n/2) log(2π) − (n/2) ln α − (1/(2α)) ∑_{i=1}^n x_i².
- Set the partial derivative w.r.t. α equal to zero (first order necessary condition):
  ∂ ln L(α|x)/∂α = −n/(2α) + (1/(2α²)) ∑_{i=1}^n x_i² = 0.
- The ML estimate is α̂ = (1/n) ∑_{i=1}^n x_i².
- Check the second order sufficiency condition:
  ∂² ln L(α|x)/∂α² |_{α=α̂} = −n/(2α̂²) < 0.
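The closed-form result can be verified by maximizing the log-likelihood numerically. A sketch with hypothetical data (not from the lecture):

    # Numerical check that the MLE of alpha = sigma^2 equals (1/n) * sum(x_i^2).
    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([1.2, -0.7, 0.4, -1.9, 0.8, 0.1])     # hypothetical N(0, sigma^2) draws
    n = len(x)

    def neg_loglik(alpha):
        # negative of ln L(alpha|x) = -(n/2) log(2 pi) - (n/2) ln(alpha) - sum(x^2)/(2 alpha)
        return 0.5 * n * np.log(2 * np.pi) + 0.5 * n * np.log(alpha) + np.sum(x**2) / (2 * alpha)

    res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
    print(res.x, np.mean(x**2))                         # the two values agree up to optimizer tolerance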
INVARIANCE PROPERTY OF MLE - EXAMPLE
EXAMPLE (NORMAL DISTRIBUTION - STANDARD DEVIATION)
Let X_1, ..., X_n be iid N(0, σ²). Find MLEs for σ² and σ.
MLE for σ:
- ln L(σ|x) = −(n/2) log(2π) − (n/2) ln(σ²) − (1/(2σ²)) ∑_{i=1}^n x_i².
- Set the partial derivative w.r.t. σ equal to zero (first order necessary condition):
  ∂ ln L(σ|x)/∂σ = −n/σ + (1/σ³) ∑_{i=1}^n x_i² = 0.
- The ML estimate is σ̂ = ((1/n) ∑_{i=1}^n x_i²)^{1/2}.
- Check the second order sufficiency condition:
  ∂² ln L(σ|x)/∂σ² |_{σ=σ̂} = −2n/σ̂² < 0.
EXAMPLE (NORMAL DISTRIBUTION - COMPARISON)
These two derivations confirm the invariance theorem.
- The ML estimate for σ is σ̂ = ((1/n) ∑_{i=1}^n x_i²)^{1/2}.
- The ML estimate for α = σ² is α̂ = σ̂² = (1/n) ∑_{i=1}^n x_i².
- To get the MLE for α = σ², we could get the MLE for σ and square it.
- To get the MLE for σ, we could get the MLE for α = σ² and take the square root of it.
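The invariance property can also be seen numerically: maximizing over σ directly and squaring gives the same answer as the closed-form MLE of σ². A sketch using the same hypothetical data as above:

    # Maximize the N(0, sigma^2) log-likelihood over sigma and compare sigma_hat^2
    # with the closed-form MLE of sigma^2.
    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([1.2, -0.7, 0.4, -1.9, 0.8, 0.1])     # hypothetical data
    n = len(x)

    def neg_loglik_sigma(sigma):
        return 0.5 * n * np.log(2 * np.pi) + n * np.log(sigma) + np.sum(x**2) / (2 * sigma**2)

    sigma_hat = minimize_scalar(neg_loglik_sigma, bounds=(1e-6, 50.0), method="bounded").x
    alpha_hat = np.mean(x**2)                           # closed-form MLE of sigma^2
    print(sigma_hat**2, alpha_hat)                      # these agree, illustrating invariance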
There is a fundamental difference between the Bayesian and classical approaches to estimation.
CLASSICAL APPROACH:
- In the classical approach (what we have been doing), the parameter θ is thought to be unknown but fixed.
- A random sample is drawn from a population characterized by θ and, based on this sample information, an estimate of θ is obtained.
BAYESIAN APPROACH:
- In the Bayesian approach, θ is considered to be random and described by a probability distribution (called the prior distribution).
- A random sample is taken from a population characterized by θ and the prior distribution of the parameter θ is updated (to form the posterior distribution) with the sample information.
- The posterior distribution is then used to find an estimate for θ or to make probability statements about θ.
- One can make use of loss functions that are meaningful to the problem at hand in order to find an estimate for θ.
BAYES ESTIMATORS
Bayes estimators are obtained from the following approach.
- Let X_1, ..., X_n be a random sample with a sampling distribution given by the joint p.d.f. f(x|θ), where x = (x_1, ..., x_n) and θ ∈ Θ is a parameter (vector) to be estimated.
- The prior distribution for parameter θ is given by π(θ). This contains the prior information about the parameter, perhaps based on theoretical or past empirical information.
- Therefore, the joint distribution for (X, θ) is
  g(x, θ) = f(x|θ) π(θ)
and the marginal distribution of X is
  m(x) = ∫_Θ g(x, θ) dθ = ∫_Θ f(x|θ) π(θ) dθ.
- Then the posterior distribution, which is the conditional distribution of θ given the sample x, is
  h(θ|x) = g(x, θ)/m(x) = f(x|θ) π(θ)/m(x).
- A NATURAL BAYES ESTIMATOR IS THE MEAN OF THE POSTERIOR DISTRIBUTION. This estimator is δ = E(θ|x).

EXAMPLE (BERNOULLI DISTRIBUTION)
Let X = (X_1, ..., X_n) be a random sample from a Bernoulli distribution with p.d.f. given by
f(x|p) = p^x (1 − p)^{1−x} for x = 0, 1 and 0 < p < 1. Assume that the prior distribution for p is a Beta distribution with parameters α > 0 and β > 0:
  π(p|α, β) = [Γ(α + β)/(Γ(α)Γ(β))] p^{α−1} (1 − p)^{β−1} for 0 < p < 1.
This ensures that the probability parameter, p, is between 0 and 1. The parameters α and β determine the shape of the prior p.d.f. for p and hence the nature of the prior information.
- The objective is to determine the Bayes estimator for p.
- Using the joint density for X|p and the prior density, the joint density for X and p is
  g(x, p) = f(x|p) π(p|α, β)
          = p^{n x̄} (1 − p)^{n(1−x̄)} · [Γ(α + β)/(Γ(α)Γ(β))] p^{α−1} (1 − p)^{β−1}
          = [Γ(α + β)/(Γ(α)Γ(β))] p^{α'−1} (1 − p)^{β'−1}
          = [Γ(α + β)/(Γ(α)Γ(β))] [Γ(α')Γ(β')/Γ(α' + β')] · [Γ(α' + β')/(Γ(α')Γ(β'))] p^{α'−1} (1 − p)^{β'−1}
          = κ π(p|α', β'),
where κ = [Γ(α + β) Γ(α') Γ(β')]/[Γ(α) Γ(β) Γ(α' + β')], α' = n x̄ + α and β' = n(1 − x̄) + β. Thus, the joint density is proportional to a Beta p.d.f. with parameters α' and β'.
- The marginal density for X is
  m(x) = ∫₀¹ g(x, p) dp = κ ∫₀¹ π(p|α', β') dp = κ. (Why?)

EXAMPLE (BERNOULLI SOLUTION CONTINUED)
- Thus, the posterior density for p is
  h(p|x) = g(x, p)/m(x) = κ π(p|α', β')/κ = π(p|α', β'),
which is the density for a Beta distribution with parameters (α', β') and so p|x ~ Beta(α', β').
- Taking the expectation, and using the moment properties of the Beta distribution, a Bayes estimator is
  p̂ = E(p|x) = ∫₀¹ p π(p|α', β') dp = α'/(α' + β') = (α + n x̄)/(α + β + n).
- The Bayes estimator updates the prior estimate, α/(α + β), using the sample information n and x̄.
- This estimator is a weighted average of the prior estimate and the MLE (sample information):
  p̂ = [(α + β)/(α + β + n)] · [α/(α + β)] + [n/(α + β + n)] · x̄.
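A minimal sketch of this Beta-Bernoulli update in Python (the sample and the prior parameters are hypothetical, not from the lecture):

    # Posterior Beta(alpha', beta') and the Bayes estimator (posterior mean).
    import numpy as np

    x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # hypothetical Bernoulli sample
    alpha, beta = 2.0, 2.0                          # hypothetical Beta(2, 2) prior

    n, xbar = len(x), x.mean()
    alpha_post = n * xbar + alpha                   # alpha' = n*xbar + alpha
    beta_post = n * (1 - xbar) + beta               # beta'  = n*(1 - xbar) + beta

    p_bayes = alpha_post / (alpha_post + beta_post) # posterior mean
    print(p_bayes, (alpha + n * xbar) / (alpha + beta + n))   # the same number two ways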
EXAMPLE (BINOMIAL DISTRIBUTION)
Let X_1, ..., X_n be iid Bernoulli(θ). Then Y = ∑_{i=1}^n X_i is Binomial(n, θ). Assume that the prior distribution for θ is Beta(θ|α, β).
- The joint distribution of Y and θ is
  g(y, θ) = [n!/(y!(n − y)!)] θ^y (1 − θ)^{n−y} · [Γ(α + β)/(Γ(α)Γ(β))] θ^{α−1} (1 − θ)^{β−1}
          = κ Beta(θ|α', β'),
where α' = y + α and β' = n − y + β, with
  κ = [n!/(y!(n − y)!)] · [Γ(α + β)/(Γ(α)Γ(β))] · [Γ(y + α) Γ(n − y + β)/Γ(n + α + β)]
and
  Beta(θ|α', β') = [Γ(α' + β')/(Γ(α')Γ(β'))] θ^{α'−1} (1 − θ)^{β'−1}.
This Bayes estimator for θ in the binomial distribution can be expressed as
  θ̂ = E(θ|y) = (y + α)/(α + β + n) = [(α + β)/(α + β + n)] · [α/(α + β)] + [n/(α + β + n)] · (y/n),
where y = ∑_{i=1}^n x_i (so y/n = ∑_{i=1}^n x_i/n = x̄).
- This Bayes estimator is a weighted average of the prior expectation and the data mean, which is the maximum likelihood estimator.
- This weighted average property is not a general property of Bayes estimators.
- The prior expectation is updated using observed data (y, which comes from x).
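The weighted-average form is easy to inspect directly; in the following sketch (hypothetical y, n, and prior parameters) the weight on the prior shrinks as n grows:

    # Weighted-average form of the binomial Bayes estimator.
    alpha, beta = 2.0, 2.0        # hypothetical Beta prior
    n, y = 10, 7                  # hypothetical number of trials and successes

    prior_mean = alpha / (alpha + beta)
    mle = y / n
    w_prior = (alpha + beta) / (alpha + beta + n)   # weight on the prior expectation
    w_data = n / (alpha + beta + n)                 # weight on the data mean (MLE)

    theta_bayes = w_prior * prior_mean + w_data * mle
    print(theta_bayes, (y + alpha) / (alpha + beta + n))   # identical values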
READING
A. Miller and Miller:
   Chapter 10 (except sections 10.2-10.6).
B. DeGroot:
   Chapter 7 (except sections 7.7-7.9, and 7.6 components: EM Algorithm and Sampling Plans).
- For special note:
  - Example on the Lifetimes of Electronic Components - various places, starting p.385.
  - Example on the Lifetimes of Fluorescent Lamps - various places, starting p.386.