
Estimators, Mean Square Error, and Consistency

January 20, 2006

1 Statistics and Mean Square Error


Let x = (x1 , . . . , xn ) be a random sample from a distribution f (x|θ), with
θ unknown. For example, X1 , . . . , Xn ∼ N (θ, 1). Our goal is to use the
information available in the data to make guesses about θ. Ideally, we would
like these to be educated guesses that are likely to be close to the true value
of θ. If the data is our only available source of information, we must estimate
θ by a function of the data, δ(x). One such function is δ(x) = x̄; others are
δ(x) = median(x), δ(x) = max(x), or δ(x) = 3x1 /(x2 x3 ).
Any such function of the data is called a statistic. One of the main goals
of this course is to figure out how to choose the right statistic to estimate
θ. Of course, we need some definition of being “right”. Our vague notion of
wanting something that is “likely to be close to θ” could be interpreted in
many ways, and we have to pick one. One of the most common measures is
Mean Square Error, or MSE, which is defined as

MSE(θ) = Eθ[(δ(X) − θ)2]


Estimators δ(X) that have small MSE are considered good because their
expected squared distance from θ is small (and if the expected squared error
is small, the estimator will typically be close to θ). Note that the MSE is a
function of θ, which means some estimators might work well for some values
of θ and not for others.
Computing MSE requires the sampling distribution of δ(x), which is where
the prerequisite of probability appears in this course. The calculation is
somewhat simplified by noting that MSE can be divided into two parts. Let
µδ = Eθ[δ(X)] (note µδ is a constant, not a random variable).

Eθ[(δ(X) − θ)2] = Eθ[(δ(X) − µδ + µδ − θ)2]
                = Eθ[(δ(X) − µδ)2 + 2(δ(X) − µδ)(µδ − θ) + (µδ − θ)2]
                = Eθ[(δ(X) − µδ)2] + Eθ[2(δ(X) − µδ)(µδ − θ)] + Eθ[(µδ − θ)2]
                = Vθ[δ(X)] + 2(µδ − θ)Eθ[δ(X) − µδ] + (µδ − θ)2
                = Vθ[δ(X)] + (µδ − θ)2

The cross term drops out in the last step because Eθ[δ(X) − µδ] = µδ − µδ = 0.

Thus, the mean square error can be decomposed into a variance term and a
bias term. The bias is defined as (µδ − θ), the difference between the estimator's
mean and the parameter θ. An estimator is called unbiased if the bias is 0
(which occurs if E[δ(X)] = µδ = θ), in which case the MSE is just the
variance of the estimator.
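This decomposition is easy to check numerically. Below is a minimal Python
sketch (using numpy; the discrete distribution chosen for δ(X) and the value
of θ are made up purely for illustration) that computes Eθ[(δ(X) − θ)2]
directly and compares it with Vθ[δ(X)] + (µδ − θ)2.

    import numpy as np

    # A made-up discrete sampling distribution for an estimator delta(X)
    # (values and probabilities are purely illustrative).
    values = np.array([1.0, 2.0, 3.0, 4.0])
    probs  = np.array([0.1, 0.4, 0.3, 0.2])
    theta  = 2.5                                      # true parameter value

    mse      = np.sum(probs * (values - theta) ** 2)  # E[(delta(X) - theta)^2]
    mu_delta = np.sum(probs * values)                 # E[delta(X)]
    var      = np.sum(probs * (values - mu_delta) ** 2)
    bias     = mu_delta - theta

    print(mse, var + bias ** 2)                       # both are about 0.85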
For example, suppose X1 , . . . , Xn ∼ N (θ, 1) and δ(X) = X̄, the sample
mean. The distribution of X̄ is N (θ, 1/n) (1/n is the variance). In this case
µδ = E[δ(X)] = θ and Vθ [δ(X)] = 1/n, so the MSE is 1/n + 02 = 1/n. Note
in this example the MSE does not depend on the parameter θ. The sample
mean performs equally well for all values of θ.
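As a quick check on this example, the following minimal Python sketch (numpy
assumed; the sample size, replication count, and seed are arbitrary choices)
approximates the MSE of x̄ by simulation for a few values of θ and compares
it with 1/n.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 10, 100_000

    for theta in [-3.0, 0.0, 5.0]:
        x = rng.normal(loc=theta, scale=1.0, size=(reps, n))  # reps datasets of size n
        xbar = x.mean(axis=1)                                  # delta(x) = sample mean
        mse = np.mean((xbar - theta) ** 2)                     # Monte Carlo estimate of MSE
        print(theta, round(mse, 4), 1 / n)                     # MSE is about 1/n for every theta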
Choosing an estimator depends strongly on the likelihood. It turns out
δ(x) = x̄ is one of the best estimators for the normal mean in the previous
example. If X1 , . . . , Xn ∼ Uni(0, θ), x̄ doesn't perform nearly as well. To
find the MSE, we need the mean and variance of x̄. Note that E[Xi ] = θ/2
and V [Xi ] = θ2 /12. The sample mean therefore has mean θ/2 and variance
θ2 /(12n). The MSE is therefore
θ2/(12n) + (θ/2 − θ)2 = (3n + 1)θ2/(12n)
In this example the MSE depends on θ. It turns out this MSE is much larger
than that of other available estimators. One quick improvement, for example, is to
remove the bias. Suppose instead of δ(x) = x̄ we use δ(x) = 2x̄. Then
E[δ(x)] = 2(θ/2) = θ and V [δ(x)] = 4(θ2 /(12n)) = θ2 /(3n). The MSE is

θ2/(3n) + 0 = 4θ2/(12n)
For n > 1, 2x̄ has a smaller MSE than x̄ for all θ.
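A short simulation makes the comparison concrete. The sketch below (again
Python/numpy, with θ, n, and the replication count chosen arbitrarily)
approximates the MSE of both x̄ and 2x̄ and compares them with the formulas
above.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, reps = 5.0, 10, 100_000

    x = rng.uniform(0, theta, size=(reps, n))
    xbar = x.mean(axis=1)

    mse_xbar  = np.mean((xbar - theta) ** 2)        # delta(x) = xbar
    mse_2xbar = np.mean((2 * xbar - theta) ** 2)    # delta(x) = 2*xbar

    print(mse_xbar,  (3 * n + 1) * theta**2 / (12 * n))  # compare with (3n+1)*theta^2/(12n)
    print(mse_2xbar, theta**2 / (3 * n))                  # compare with theta^2/(3n)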

2 Consistency
One desirable property of estimators is consistency. If we collect a large
number of observations, we hope we have a lot of information about any
unknown parameter θ, and thus we hope we can construct an estimator with
a very small MSE. We call an estimator consistent if

lim_{n→∞} MSE(θ) = 0

which means that as the number of observations increases, the MSE converges
to 0. In our first example, we found that if X1 , . . . , Xn ∼ N (θ, 1), then the MSE
of x̄ is 1/n. Since lim_{n→∞}(1/n) = 0, x̄ is a consistent estimator of θ.
Remark: To be specific we may call this "MSE-consistent". There are
other types of consistency definitions that, say, look at the probability of
large errors. Those work better when the estimator does not have a finite variance.
If X1 , . . . , Xn ∼ Uni(0, θ), then δ(x) = x̄ is not a consistent estimator of
θ. The MSE is (3n + 1)θ2 /(12n) and

lim_{n→∞} (3n + 1)θ2/(12n) = θ2/4 ≠ 0
so even if we had an extremely large number of observations, x̄ would prob-
ably not be close to θ. Our adjusted estimator δ(x) = 2x̄ is consistent,
however. We found the MSE to be θ2 /3n, which tends to 0 as n tends to
infinity. This doesn’t necessarily mean it is the optimal estimator (in fact,
there are other consistent estimators with MUCH smaller MSE), but at least
with large samples it will get us close to θ.
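The contrast is easy to see by letting n grow in a simulation. In the minimal
Python sketch below (θ and the replication count are arbitrary), the simulated
MSE of x̄ levels off near θ2/4 = 6.25 while the simulated MSE of 2x̄ shrinks
toward 0.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, reps = 5.0, 20_000

    for n in [10, 100, 1000]:
        x = rng.uniform(0, theta, size=(reps, n))
        xbar = x.mean(axis=1)
        mse_xbar  = np.mean((xbar - theta) ** 2)      # levels off near theta^2/4 = 6.25
        mse_2xbar = np.mean((2 * xbar - theta) ** 2)  # shrinks toward 0 as n grows
        print(n, round(mse_xbar, 3), round(mse_2xbar, 4))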

3 The Uniform Distribution in More Detail


We said there were a number of possible functions we could use for δ(x).
Suppose that X1 , . . . , Xn ∼ Uni(0, θ). We have already discussed two es-
timators, x̄ and 2x̄, and found their MSE. There are a variety of others.
Instead of the mean x̄ we could look at the median of the x values. Anal-
ogously to the mean, 2median(x) is an improvement. Another estimator is
the maximum of the x values. This can be made unbiased by multiplying
by (n + 1)/n. Finally, there is the possibility of more complicated functions.

Clearly θ must be bigger than max(x), otherwise max(x) couldn’t be in the
sample. If 2x̄ < max(x), then max(x) must be closer to θ than 2x̄, so we
can use the estimator max(2x̄, max(x)). This results in 6 estimators shown
in the table.
We have already derived the MSEs for x̄ and 2x̄. It is also fairly easy
to derive the MSEs for 2median(x), max(x), and (n + 1) max(x)/n. The
MSE for max(2x̄, max(x)) is more difficult to derive. To demonstrate a little
more explicitly what is going on than theoretical calculations allow, we turn
to simulations. I generated a dataset X1 , . . . , X11 ∼ Uni(0, θ = 5). For
this dataset I computed each of the 6 estimators. Ideally, we would like
these estimators to be close to 5, the correct answer. I then tossed away
the 11 observations, generated another 11, and computed the 6 estimators
for this second set of observations. I then generated a third set, a fourth set,
and so on to a total of 100000 sets of 11 observations. The figure shows
histograms of the 6 estimators. These histograms approximate the sampling
distributions of the estimators for n = 11. I then approximated the MSE for
each estimator. This was done by looking at the values of these estimators
for each of the 100000 datasets and using the sample mean and variance
as guesses of µδ and V [δ(x)]. Note that for x̄ and 2x̄, the estimators whose
MSEs are known from the previous section, the values in the table are almost
identical to the theoretical values. For n = 11, we find the theoretical MSE
for x̄ is (3n + 1)θ2 /(12n) = (34)(25)/(132) = 6.44 and the theoretical MSE
for 2x̄ is θ2 /3n = (25/33) = 0.76.

Estimator                   Mean    Bias   Variance   MSE (simulated)
d1 = x̄                      2.50   -2.50     0.19          6.43
d2 = 2x̄                     5.00    0.00     0.76          0.76
d3 = 2median(x)             4.99   -0.01     1.92          1.92
d4 = max(x)                 4.58   -0.42     0.15          0.32
d5 = (n + 1) max(x)/n       5.00    0.00     0.18          0.18
d6 = max(2x̄, max(x))        5.14    0.14     0.52          0.54

Figure 1: Simulated Results for 6 Estimators
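The simulation study can be reproduced with a short script. Below is a minimal
Python/numpy sketch (the seed and the vectorized layout are my own choices);
it generates 100000 datasets of size n = 11 from Uni(0, 5), computes the 6
estimators for each, and prints approximate means, biases, variances, and MSEs
comparable to the table above.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, reps = 5.0, 11, 100_000

    x = rng.uniform(0, theta, size=(reps, n))     # reps datasets of n observations each
    xbar, med, mx = x.mean(axis=1), np.median(x, axis=1), x.max(axis=1)

    estimators = {
        "d1 = xbar":                xbar,
        "d2 = 2*xbar":              2 * xbar,
        "d3 = 2*median(x)":         2 * med,
        "d4 = max(x)":              mx,
        "d5 = (n+1)*max(x)/n":      (n + 1) * mx / n,
        "d6 = max(2*xbar, max(x))": np.maximum(2 * xbar, mx),
    }

    for name, d in estimators.items():
        bias = d.mean() - theta
        mse = np.mean((d - theta) ** 2)
        print(f"{name:26s} mean={d.mean():5.2f}  bias={bias:+5.2f}  "
              f"var={d.var():4.2f}  mse={mse:4.2f}")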
