Maximum Likelihood Estimation
Statement of the Problem
Suppose we have a random sample $X_1, X_2, \ldots, X_n$ whose assumed probability distribution depends on some unknown parameter $\theta$. Our primary goal here will be to find a point estimator $u(X_1, X_2, \ldots, X_n)$, such that $u(x_1, x_2, \ldots, x_n)$ is a "good" point estimate of $\theta$, where $x_1, x_2, \ldots, x_n$ are the observed values of the random sample. For example, if we plan to take a random sample $X_1, X_2, \ldots, X_n$ for which the $X_i$ are assumed to be normally distributed with mean $\mu$ and variance $\sigma^2$, then our goal will be to find a good estimate of $\mu$, say, using the data $x_1, x_2, \ldots, x_n$ that we obtained from our specific random sample.
The Basic Idea
It seems reasonable that a good estimate of the unknown parameter $\theta$ would be the value of $\theta$ that maximizes the probability, errrr... that is, the likelihood... of getting the data we observed. (So, do you see from where the name "maximum likelihood" comes?) So, that is, in a nutshell, the idea behind the method of maximum likelihood estimation. But how would we implement the method in practice? Well, suppose we have a random sample $X_1, X_2, \ldots, X_n$ for which the probability density (or mass) function of each $X_i$ is $f(x_i; \theta)$. Then, the joint probability mass (or density) function of $X_1, X_2, \ldots, X_n$, which we'll (not so arbitrarily) call $L(\theta)$, is:
$$L(\theta) = P(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_n = x_n) = f(x_1; \theta) \cdot f(x_2; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$$
The first equality is of course just the definition of the joint probability mass function. The second equality comes from the fact that we have a random sample, which implies by definition that the $X_i$ are independent. And, the last equality just uses the shorthand mathematical notation of a product of indexed terms. Now, in light of the basic idea of maximum likelihood estimation, one reasonable way to proceed is to treat the "likelihood function" $L(\theta)$ as a function of $\theta$, and find the value of $\theta$ that maximizes it.
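To make the product form concrete in code, here is a minimal Python sketch (the helper names and the exponential density used as a stand-in are purely illustrative, not anything defined in this lesson):

```python
import numpy as np

def likelihood(theta, data, pdf):
    """L(theta): the product of f(x_i; theta) over the observed sample."""
    return np.prod([pdf(x, theta) for x in data])

def log_likelihood(theta, data, pdf):
    """log L(theta): the sum of log f(x_i; theta), which is numerically more stable."""
    return np.sum([np.log(pdf(x, theta)) for x in data])

# A stand-in density: exponential with rate theta, f(x; theta) = theta * exp(-theta * x)
exp_pdf = lambda x, theta: theta * np.exp(-theta * x)

data = [0.8, 1.4, 0.3, 2.1, 0.9]  # made-up observations
print(likelihood(1.0, data, exp_pdf))      # L(1.0)
print(log_likelihood(1.0, data, exp_pdf))  # log L(1.0)
```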
Is this still sounding like too much abstract gibberish? Let's take a look at an example to see if we
can make it a bit more concrete.
Example
Suppose we have a random sample $X_1, X_2, \ldots, X_n$ where:

$X_i = 0$ if a randomly selected student does not own a sports car, and
$X_i = 1$ if a randomly selected student does own a sports car.

Assuming that the $X_i$ are independent Bernoulli random variables with unknown parameter $p$, find the maximum likelihood estimator of $p$, the proportion of students who own a sports car.
Solution. If the $X_i$ are independent Bernoulli random variables with unknown parameter $p$, then the probability mass function of each $X_i$ is:

$$f(x_i; p) = p^{x_i}(1-p)^{1-x_i}$$

for $x_i = 0$ or 1 and $0 < p < 1$. Therefore, the likelihood function $L(p)$ is, by definition:

$$L(p) = \prod_{i=1}^{n} f(x_i; p) = p^{x_1}(1-p)^{1-x_1} \cdot p^{x_2}(1-p)^{1-x_2} \cdots p^{x_n}(1-p)^{1-x_n}$$

for $0 < p < 1$. Simplifying, by summing up the exponents, we get:

$$L(p) = p^{\sum x_i}(1-p)^{n - \sum x_i}$$
Now, in order to implement the method of maximum likelihood, we need to find the $p$ that maximizes the likelihood $L(p)$. We need to put on our calculus hats now, since, in order to maximize the function, we are going to need to differentiate the likelihood function with respect to $p$. In doing so, we'll use a "trick" that often makes the differentiation a bit easier. Note that the natural logarithm is an increasing function of $x$: that is, if $x_1 < x_2$, then $\ln(x_1) < \ln(x_2)$. That means that the value of $p$ that maximizes the natural logarithm of the likelihood function, $\ln L(p)$, is also the value of $p$ that maximizes the likelihood function $L(p)$. So, the "trick" is to take the derivative of $\ln L(p)$ (with respect to $p$) rather than taking the derivative of $L(p)$. Again, doing so often makes the differentiation much easier. (By the way, throughout the remainder of this course, I will use either $\ln L(p)$ or $\log L(p)$ to denote the natural logarithm of the likelihood function.)
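As a quick numerical sanity check of that claim (a sketch only; the ten 0/1 responses below are made up), maximizing $L(p)$ or $\ln L(p)$ over a grid picks out the same value of $p$:

```python
import numpy as np

# A made-up sample: 6 of 10 students own a sports car
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
n, s = len(x), x.sum()

p_grid = np.linspace(0.01, 0.99, 99)                      # candidate values of p
L = p_grid ** s * (1 - p_grid) ** (n - s)                 # likelihood L(p)
logL = s * np.log(p_grid) + (n - s) * np.log(1 - p_grid)  # log likelihood

# Because ln is increasing, the same grid point maximizes both
print(p_grid[np.argmax(L)], p_grid[np.argmax(logL)])      # both 0.6 = s/n, the sample proportion
```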
In this case, the natural logarithm of the likelihood function is:
$$\log L(p) = \left(\sum x_i\right)\log(p) + \left(n - \sum x_i\right)\log(1-p)$$
Now, taking the derivative of the log likelihood, and setting it to 0, we get:

$$\frac{\partial \log L(p)}{\partial p} = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} = 0$$
Now, multiplying through by $p(1-p)$, we get:

$$\left(\sum x_i\right)(1-p) - \left(n - \sum x_i\right)p = 0$$
Upon distributing, we see that two of the resulting terms cancel each other out:

$$\sum x_i - p\sum x_i - np + p\sum x_i = 0$$

leaving us with:

$$\sum x_i - np = 0$$
Now, all we have to do is solve for p. In doing so, you'll want to make sure that you
always put a hat ("^") on the parameter, in this case p, to indicate it is an estimate:
$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$$
or, alternatively, an estimator:
$$\hat{p} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Oh, and we should technically verify that we indeed did obtain a maximum. We can do
that by verifying that the second derivative of the log likelihood with respect to p is
negative. It is, but you might want to do the work to convince yourself!
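If you would rather let the computer do the maximizing, here is a hedged sketch that checks the closed-form answer against a numerical optimizer (it assumes NumPy and SciPy are available; the simulated sample and the seed are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(414)
x = rng.binomial(1, 0.3, size=200)  # simulated Bernoulli(p = 0.3) sample
n, s = len(x), x.sum()

# Negative log likelihood of p (SciPy minimizes, so we negate)
neg_log_L = lambda p: -(s * np.log(p) + (n - s) * np.log(1 - p))

result = minimize_scalar(neg_log_L, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)  # numerical maximizer of L(p)
print(s / n)     # closed-form MLE: the sample proportion -- the two should agree
```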
Now, with that example behind us, let us take a look at formal definitions of the terms (1) likelihood function, (2) maximum likelihood estimators, and (3) maximum likelihood estimates.
Definition. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that depends on one or more unknown parameters $\theta_1, \theta_2, \ldots, \theta_m$ with probability density (or mass) function $f(x_i; \theta_1, \theta_2, \ldots, \theta_m)$. Suppose that $(\theta_1, \theta_2, \ldots, \theta_m)$ is restricted to a given parameter space $\Omega$. Then:

(1) When regarded as a function of $\theta_1, \theta_2, \ldots, \theta_m$, the joint probability density (or mass) function of $X_1, X_2, \ldots, X_n$:

$$L(\theta_1, \theta_2, \ldots, \theta_m) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \ldots, \theta_m)$$
($(\theta_1, \theta_2, \ldots, \theta_m)$ in $\Omega$) is called the likelihood function.
(2) If:

$$\left[u_1(x_1, x_2, \ldots, x_n),\; u_2(x_1, x_2, \ldots, x_n),\; \ldots,\; u_m(x_1, x_2, \ldots, x_n)\right]$$

is the m-tuple that maximizes the likelihood function, then:

$$\hat{\theta}_i = u_i(X_1, X_2, \ldots, X_n)$$

is the maximum likelihood estimator of $\theta_i$, for $i = 1, 2, \ldots, m$.

(3) The corresponding observed values of the statistics in (2), namely:

$$\left[u_1(x_1, x_2, \ldots, x_n),\; u_2(x_1, x_2, \ldots, x_n),\; \ldots,\; u_m(x_1, x_2, \ldots, x_n)\right]$$

are called the maximum likelihood estimates of $\theta_i$, for $i = 1, 2, \ldots, m$.
Example
Suppose the weights of randomly selected American female college students are normally distributed with unknown mean $\mu$ and standard deviation $\sigma$. A random sample of 10 American female college students yielded the following weights (in pounds):

115  122  130  127  149  160  152  138  149  180

Based on the definitions given above, identify the likelihood function and the maximum likelihood estimator of $\mu$, the mean weight of all American female college students. Using the given sample, find a maximum likelihood estimate of $\mu$ as well.
Solution. The probability density function of $X_i$ is:

$$f(x_i; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]$$

for $-\infty < x_i < \infty$. The parameter space is $\Omega = \{(\mu, \sigma) : -\infty < \mu < \infty,\ 0 < \sigma < \infty\}$. Therefore, (you might want to convince yourself that) the likelihood function is:
$$L(\mu, \sigma) = \sigma^{-n}(2\pi)^{-n/2} \exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right]$$
for $-\infty < \mu < \infty$ and $0 < \sigma < \infty$. It can be shown (we'll do so in the next example!), upon maximizing the likelihood function with respect to $\mu$, that the maximum likelihood estimator of $\mu$ is:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$
Based on the given sample, a maximum likelihood estimate of $\mu$ is:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{10}(115 + \cdots + 180) = 142.2$$
pounds. Note that the only difference between the formulas for the maximum likelihood estimator and the maximum likelihood estimate is that:

the estimator is defined using capital letters (to denote that its value is random), and
the estimate is defined using lowercase letters (to denote that its value is fixed and based on an obtained sample).
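As a quick arithmetic check of the estimate above (a minimal sketch using the ten listed weights):

```python
import numpy as np

# The ten observed weights (in pounds) from the sample above
weights = np.array([115, 122, 130, 127, 149, 160, 152, 138, 149, 180])

# The maximum likelihood estimate of mu for the normal model is the sample mean
mu_hat = weights.mean()
print(mu_hat)  # 142.2
```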
Okay, so now we have the formal definitions out of the way. The first example on this page involved a joint probability mass function that depends on only one parameter, namely $p$, the proportion of successes. Now, let's take a look at an example that involves a joint probability density function that depends on two parameters.
Example
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with unknown mean $\mu$ and variance $\sigma^2$. Find maximum likelihood estimators of the mean $\mu$ and the variance $\sigma^2$.
Solution. In finding the estimators, the first thing we'll do is write the probability density function as a function of $\theta_1 = \mu$ and $\theta_2 = \sigma^2$:

$$f(x_i; \theta_1, \theta_2) = \frac{1}{\sqrt{2\pi\theta_2}} \exp\!\left[-\frac{(x_i - \theta_1)^2}{2\theta_2}\right]$$
for $-\infty < \theta_1 < \infty$ and $0 < \theta_2 < \infty$. We do this so as not to cause confusion when taking the derivative of the likelihood with respect to $\theta_2$. Now, that makes the likelihood function:
$$L(\theta_1, \theta_2) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2) = \theta_2^{-n/2}\,(2\pi)^{-n/2} \exp\!\left[-\frac{1}{2\theta_2}\sum_{i=1}^{n}(x_i - \theta_1)^2\right]$$
and therefore the log of the likelihood function:

$$\log L(\theta_1, \theta_2) = -\frac{n}{2}\log\theta_2 - \frac{n}{2}\log(2\pi) - \frac{\sum(x_i - \theta_1)^2}{2\theta_2}$$
Now, upon taking the partial derivative of the log likelihood with respect to $\theta_1$, and setting it to 0, we see that a few things cancel each other out, leaving us with:

$$\frac{\partial \log L(\theta_1, \theta_2)}{\partial \theta_1} = \frac{\sum(x_i - \theta_1)}{\theta_2} = 0$$
Now, multiplying through by $\theta_2$, and distributing the summation, we get:

$$\sum x_i - n\theta_1 = 0$$
Now, solving for $\theta_1$, and putting on its hat, we have shown that the maximum likelihood estimate of $\theta_1$ is:

$$\hat{\theta}_1 = \hat{\mu} = \frac{\sum x_i}{n} = \bar{x}$$
Now for $\theta_2$. Taking the partial derivative of the log likelihood with respect to $\theta_2$, and setting it to 0, we get:

$$\frac{\partial \log L(\theta_1, \theta_2)}{\partial \theta_2} = -\frac{n}{2\theta_2} + \frac{\sum(x_i - \theta_1)^2}{2\theta_2^2} = 0$$
Multiplying through by $2\theta_2^2$, we get:

$$-n\theta_2 + \sum(x_i - \theta_1)^2 = 0$$
And, solving for $\theta_2$, and putting on its hat, we have shown that the maximum likelihood estimate of $\theta_2$ is:

$$\hat{\theta}_2 = \hat{\sigma}^2 = \frac{\sum(x_i - \bar{x})^2}{n}$$
(I'll again leave it to you to verify, in each case, that the second partial derivative of the log likelihood is negative, and therefore that we did indeed find maxima.) In summary, we have shown that the maximum likelihood estimators of the mean $\mu$ and the variance $\sigma^2$ for the normal model are:

$$\hat{\mu} = \frac{\sum X_i}{n} = \bar{X} \qquad \text{and} \qquad \hat{\sigma}^2 = \frac{\sum(X_i - \bar{X})^2}{n}$$
respectively.
Note that the maximum likelihood estimator of $\sigma^2$ for the normal model is not the sample variance $S^2$. They are, in fact, competing estimators. So how do we know which estimator we should use for $\sigma^2$? Well, one way is to choose the estimator that is "unbiased." Let's go learn about unbiased estimators now.
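To see the difference in practice, here is a small sketch (reusing the ten weights from the earlier example purely for illustration) that computes the maximum likelihood estimate of $\sigma^2$, which divides by $n$, alongside the sample variance $S^2$, which divides by $n - 1$:

```python
import numpy as np

weights = np.array([115, 122, 130, 127, 149, 160, 152, 138, 149, 180])
n = len(weights)
ss = ((weights - weights.mean()) ** 2).sum()  # sum of squared deviations from the mean

sigma2_mle = ss / n   # maximum likelihood estimate: divide by n
s2 = ss / (n - 1)     # sample variance S^2: divide by n - 1

print(sigma2_mle)               # 347.96
print(s2)                       # ~386.62
print(np.var(weights))          # NumPy's default (ddof=0) matches the MLE
print(np.var(weights, ddof=1))  # ddof=1 matches S^2
```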