Parameter Estimation Techniques

The document discusses parameter estimation and summarization techniques, including: 1) Point estimation and standard error calculation using sample data to estimate population parameters. 2) Interval estimation for means and proportions using confidence intervals, which provide a range of plausible values for the true population parameter with a specific level of confidence. 3) Determining minimum sample sizes needed to estimate parameters within a given margin of error and confidence level. 4) Validating an "oracle" statement that the probability a random sample's minimum and maximum values will contain the population median is 75% by repeated simulation.

Uploaded by

Rajiv Kumar

PARAMETER ESTIMATION

Kaustav Banerjee
Decision Sciences Area, IIM Lucknow

1 Point estimation
Consider the following research question: we want to estimate the proportion of PGP-I students
using an iPhone. A random sample of 10 students from Section D found 4 students using an iPhone.
A random sample of 20 students from Section E found 9 students using an iPhone. Based on this
information, answer the following questions:
(a) What is the estimate of the proportion of PGP-I students using an iPhone?
(b) What is the standard error of the estimator?
(c) Which of these two random samples makes you more confident? Do you notice that
the estimator is a function of the sample observations? Do you realize that its numerical
value changes as you get a fresh sample?
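As a rough illustration of questions (a) and (b), the sketch below computes the sample proportion and its estimated standard error, sqrt(p(1 − p)/n) with p replaced by the sample proportion, for each section and for the two samples pooled. Pooling assumes that both sections are random samples from the same PGP-I population, which the question does not state outright; the function name `proportion_and_se` is ours, not the handout's.

```python
import math

def proportion_and_se(successes, n):
    """Return the sample proportion and its estimated standard error."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

# Counts taken from the question: Section D, 4 of 10; Section E, 9 of 20
p_d, se_d = proportion_and_se(4, 10)
p_e, se_e = proportion_and_se(9, 20)
# Pooled estimate: an assumption that both sections represent one population
p_pool, se_pool = proportion_and_se(4 + 9, 10 + 20)

for label, p, se in [("Section D", p_d, se_d),
                     ("Section E", p_e, se_e),
                     ("Pooled", p_pool, se_pool)]:
    print(f"{label}: estimate = {p:.3f}, standard error = {se:.3f}")
```

The standard error shrinks as the sample size grows, which bears on question (c).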
The problem of estimation is probably better perceived with a dart board.
Consider the ‘bull’s eye’ to be the true value of the population parameter.
We seek to estimate this true value of the parameter, just like throwing darts
at the bull’s eye. Now, it is easy to see why a ‘good’ estimator is the one
which remains ‘close’ to the bull’s eye. One way of measuring this ‘closeness’
is to take the following approach.

Mean Squared Error: To assess how an estimator T is spread around the
population parameter θ, the mean squared error (MSE) is computed as follows:

$$\mathrm{MSE}(T) = E(T - \theta)^2 = \mathrm{Variance}(T) + [E(T) - \theta]^2$$

The quantity E(T) − θ is the bias of an estimator, assessing whether on
average the estimator T is around the parameter θ.
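The decomposition of the MSE into variance plus squared bias follows from adding and subtracting E(T) inside the square. A short derivation, using only the symbols defined above:

```latex
\begin{align*}
\mathrm{MSE}(T) &= E\big[(T - \theta)^2\big] \\
  &= E\Big[\big((T - E(T)) + (E(T) - \theta)\big)^2\Big] \\
  &= E\big[(T - E(T))^2\big] + 2\,[E(T) - \theta]\,E\big[T - E(T)\big] + [E(T) - \theta]^2 \\
  &= \mathrm{Variance}(T) + [E(T) - \theta]^2
\end{align*}
```

The cross term vanishes because E[T − E(T)] = 0 and E(T) − θ is a constant.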

With such measures assessing the performance of an estimator, consider these questions:
(a) Do you think an estimator with zero bias is a good estimator?
(b) Will you prefer an estimator with high MSE?
(c) Is sample proportion an unbiased estimator of the population proportion? Looking at
the standard error of this estimator, can you justify the use of a larger sample?

Consistency: An estimator T (based on a sample of size n) of a parameter θ
is consistent if

$$E(T) \to \theta \quad \text{and} \quad V(T) \to 0$$

as the sample size n gets large.

Question: Is consistency a desirable property for an estimator? Check if the sample proportion
is a consistent estimator of the population proportion.
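One way to approach this check, writing p̄ for the sample proportion and p for the population proportion, and assuming the sample consists of n independent draws that are each a "success" with probability p (so that n p̄ follows a Binomial(n, p) distribution):

```latex
\[
E(\bar{p}) = \frac{1}{n}\,E(n\bar{p}) = \frac{np}{n} = p,
\qquad
V(\bar{p}) = \frac{1}{n^{2}}\,V(n\bar{p}) = \frac{np(1-p)}{n^{2}} = \frac{p(1-p)}{n} \to 0
\quad \text{as } n \to \infty .
\]
```

Under that assumption both conditions of the definition hold, so the sample proportion is unbiased and consistent.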

2 Interval estimation for mean


Suppose we are interested in estimating the average number of hours in a day a college student
in Lucknow spends in browsing social network sites or the average number of social network
accounts a college student has. Suppose there are N = 1 lakh college students in Lucknow.
If we could collect data from each of these 1 lakh students, the population average µ and the
population standard deviation σ of number of hours (number of accounts) could be determined.
However, we observe the values of the variable, say, X (number of hours or number of accounts)
for the selected sample individuals only. Let n be the sample size; X̄ the sample mean; and
S, the sample standard deviation. Of course X̄ and S are the point estimators of µ and σ.
Question 1: Could we make the following statement?

$$P\left(|\bar{X} - \mu| < 1.96\,\sigma/\sqrt{n}\right) = 0.95$$

Question 2: Is the above statement equivalent to the following statement?

$$P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$$

Question 3: How should we interpret the above statement?


If we plug in the observed value of X̄ computed from the sample (assuming σ to be known,
though that is unusual), we get a constant interval. This constant interval is a confidence
interval of µ with confidence coefficient 0.95. It is the realized value, for the given sample,
of the following random interval:

$$\left(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\right)$$
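To get a feel for what the coefficient 0.95 refers to, the following sketch draws many samples from a normal population with known σ, builds the interval X̄ ± 1.96σ/√n each time, and records how often it contains µ. The population values (µ = 3 hours, σ = 1.5 hours), the sample size and the function name are hypothetical choices for illustration only.

```python
import math
import random

def coverage_known_sigma(mu, sigma, n, reps, seed=0):
    """Share of repeated samples whose interval X-bar +/- 1.96*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

# Hypothetical population: mean 3 hours, standard deviation 1.5 hours
coverage = coverage_known_sigma(mu=3.0, sigma=1.5, n=25, reps=20000)
print(f"Share of intervals containing mu: {coverage:.3f}")
```

The share should land close to 0.95. Note that it describes the procedure over repeated samples; any single realized interval either contains µ or does not.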

Question 4: How should we interpret the confidence coefficient associated with the interval?
Question 5: What are the conditions to make a probability statement as above?
(a) Sample is randomly selected.
(b) For small sample sizes (n < 30), normality of the population distribution needs to be
assumed. For large samples (rule of thumb: n ≥ 30), one invokes the
central limit theorem to justify the above probability statement.
Usually σ is unknown, so it needs to be replaced by S, in the above interval.
Question 6: Do you think that in this case 1.96 is to be replaced by a different number for
small sample sizes?
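Question 6 can be explored by simulation. The sketch below, for a small sample from a normal population, compares the coverage of X̄ ± c·S/√n when c = 1.96 with the coverage when c is the t critical value. The value 2.776 is the tabulated upper 2.5% point of the t distribution with 4 degrees of freedom; n = 5, the standard normal population and the function name are hypothetical choices.

```python
import math
import random

def coverage_with_s(critical_value, n, reps, seed=1, mu=0.0, sigma=1.0):
    """Coverage of X-bar +/- critical_value * S / sqrt(n) over repeated normal samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with divisor n - 1
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        if abs(xbar - mu) < critical_value * s / math.sqrt(n):
            hits += 1
    return hits / reps

n = 5
z_coverage = coverage_with_s(1.96, n, reps=20000)   # normal critical value
t_coverage = coverage_with_s(2.776, n, reps=20000)  # t critical value, 4 degrees of freedom
print(f"n = {n}: coverage with 1.96 = {z_coverage:.3f}, with 2.776 = {t_coverage:.3f}")
```

With S in place of σ, the multiplier 1.96 tends to give intervals that cover µ less often than 95% of the time for small n, while the t multiplier brings the coverage back to roughly 0.95.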

For X_1, X_2, ..., X_n, a random sample from N(µ, σ), with $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and
$S^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, the sampling distribution of the statistic T is given by

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

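In general, a confidence interval of µ with confidence coefficient (1 − α) is defined as the realized
value of the following random interval for a given sample:

$$\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) \quad \text{if } \sigma \text{ is known, which is unlikely}$$

$$\left(\bar{X} - t_{\alpha/2;\,n-1}\,S/\sqrt{n},\ \bar{X} + t_{\alpha/2;\,n-1}\,S/\sqrt{n}\right) \quad \text{if } \sigma \text{ is unknown}$$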

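The length of the above confidence interval is

$$2Z_{\alpha/2}\,\sigma/\sqrt{n} \quad \text{if } \sigma \text{ is known}$$

$$2t_{\alpha/2;\,n-1}\,S/\sqrt{n} \quad \text{if } \sigma \text{ is unknown}$$

In this handout this length is called the margin of error (many textbooks instead use the term
for half this length). Notice that the margin of error decreases as the sample size (n) increases
and as the population heterogeneity (σ) decreases.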
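A small sketch of the known-σ case shows how the interval and its length respond to the sample size. The sample mean (3.2 hours), the value of σ (1.5 hours) and the function name `z_interval` are hypothetical; `statistics.NormalDist` from the Python standard library supplies Z_{α/2}.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Realized interval X-bar +/- Z_{alpha/2} * sigma / sqrt(n) for known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical values: sample mean 3.2 hours, known sigma 1.5 hours
lengths = {}
for n in (25, 100, 400):
    lower, upper = z_interval(xbar=3.2, sigma=1.5, n=n)
    lengths[n] = upper - lower
    print(f"n = {n:3d}: ({lower:.3f}, {upper:.3f}), length = {lengths[n]:.3f}")
```

Because the length is proportional to 1/√n, quadrupling the sample size halves the length of the interval.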

3 Interval estimation for proportion


Consider the problem of estimating the proportion of college students in Lucknow (i) having
a laptop for personal use, (ii) having two cell phones etc.
Question 1: Suppose p̄ is the sample proportion based on a random sample of size n from
the relevant population. By analogy with the previous discussion, what could we say about
the following interval?

$$\left(\bar{p} - Z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})/n},\ \bar{p} + Z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})/n}\right)$$

Question 2: What assumptions are necessary to make the above probability statement?
Question 3: Notice the margin of error is now $2Z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})/n}$. For which value of p̄ is the
margin of error maximum?
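The interval for a proportion can be computed in the same way as the interval for a mean. The sketch below assumes a large sample, so that the normal approximation is reasonable, and then scans p̄ over a grid to see where the interval is longest. The sample size n = 100 and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(p_bar, n, alpha=0.05):
    """Interval p-bar +/- Z_{alpha/2} * sqrt(p-bar * (1 - p-bar) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width

# Hypothetical sample size; scan p-bar over 0.00, 0.01, ..., 1.00
n = 100
grid = [k / 100 for k in range(101)]
lengths = []
for p_bar in grid:
    lower, upper = proportion_interval(p_bar, n)
    lengths.append(upper - lower)

widest = grid[lengths.index(max(lengths))]
print(f"Interval is longest at p-bar = {widest}")
```

The scan agrees with the algebra: p̄(1 − p̄) is largest at p̄ = 0.5, so that value gives the longest interval for a fixed n.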

4 Sample size determination


Question 1: Suppose the client wants you to provide her an interval estimate of µ with, say,
95% confidence level such that the margin of error should not exceed 0.5. What minimum
sample size would you recommend? (both when σ is known and it is unknown)
Question 2: Suppose the client wants you to provide her an interval estimate of p with, say,
95% confidence level such that the margin of error should not exceed 0.5. What minimum
sample size would you recommend?
Note that sample size determination takes place before the data are collected, when quantities
such as S and p̄ are not yet available. Do you think this makes the determination of sample size infeasible?
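A minimal sketch of the calculation, following the handout's convention that the margin of error is the full length of the interval. For the mean, a planning value for σ is needed before any data exist; σ = 2 below is purely hypothetical (it might come from a pilot study or past experience). For the proportion, p = 0.5 is used as the worst case. When σ is unknown, the t critical value itself depends on n, so the answer would have to be found iteratively; that case is not covered here.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, max_length, alpha=0.05):
    """Smallest n with 2 * Z_{alpha/2} * sigma / sqrt(n) <= max_length."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z * sigma / max_length) ** 2)

def n_for_proportion(max_length, alpha=0.05, p_guess=0.5):
    """Smallest n with 2 * Z_{alpha/2} * sqrt(p(1-p)/n) <= max_length.

    p_guess = 0.5 maximizes p(1-p), so it gives a conservative sample size.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z) ** 2 * p_guess * (1 - p_guess) / max_length ** 2)

# sigma = 2 is a hypothetical planning value, not a figure from the handout
n_mean = n_for_mean(sigma=2.0, max_length=0.5)
n_prop = n_for_proportion(max_length=0.5)
print(f"Mean (sigma = 2 assumed): n >= {n_mean}")
print(f"Proportion (worst case p = 0.5): n >= {n_prop}")
```

Both functions solve the length inequality for n and round up, since a sample size must be a whole number.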

5 An oracle
Suppose we have a population of 10 digits, 0, 1, 2, ..., 9, and we are to select a random
sample of 3 digits from this population. Before carrying out the actual exercise, I received the
following oracle: the probability that the minimum and maximum of these 3 digits
would contain the median of those 10 digits is 75%.
[Figure 1: The intervals (minimum, maximum) for 10 and 50 samples of size 3. Two panels; x-axis: (Minimum, Maximum); y-axis: Repetition No.]

Let us check whether this holds true by carrying out an exercise. We draw 10 and 50 samples
of size 3 and for each sample compute the respective minimum and maximum. In Figure 1
we draw these minimums and maximums on the x-axis and connect them by a line, while the
number of the sample they come from is shown on the y-axis. The vertical line refers to the
median of these 10 digits, 4.5. How many of these intervals contain 4.5, and so intersect the
vertical line? It's 70% in the first case and 74% in the second case. To understand the process,
let's repeat this exercise a number of times and note down the percentage of cases where the
said interval includes the median. Following is a summary of our findings.

Repetitions    Inclusion percentage
100            71%
1000           72.7%
10000          74.99%

What do you make of these findings with reference to our discussion? Do you see any connection
between these inclusion percentages and the confidence coefficient, i.e. 75%?
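The exercise can be reproduced with a short script. The sketch below assumes the 3 digits are drawn with replacement, which the handout does not state explicitly but which matches the 75% figure: the interval misses 4.5 only when all three digits fall in 0 to 4 or all three fall in 5 to 9, giving 1 − 2(1/2)^3 = 0.75. The function names and the random seed are ours.

```python
import itertools
import random

DIGITS = range(10)
MEDIAN = 4.5  # median of 0, 1, ..., 9

def exact_inclusion_probability(sample_size=3):
    """Enumerate all equally likely ordered samples drawn with replacement."""
    samples = list(itertools.product(DIGITS, repeat=sample_size))
    hits = sum(1 for s in samples if min(s) < MEDIAN < max(s))
    return hits / len(samples)

def simulated_inclusion_proportion(repetitions, sample_size=3, seed=0):
    """Share of random samples whose (minimum, maximum) interval contains the median."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.randrange(10) for _ in range(sample_size)]
        if min(sample) < MEDIAN < max(sample):
            hits += 1
    return hits / repetitions

exact = exact_inclusion_probability()
print(f"Exact probability: {exact}")
for reps in (100, 1000, 10000):
    print(f"{reps} repetitions: {simulated_inclusion_proportion(reps):.4f}")
```

The simulated proportions vary from run to run (and with the seed), but they settle near the exact value as the number of repetitions grows, much as the table above suggests.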
