Lecture 7
BUSINESS STATISTICS ESTIMATION
Advanced Educational Program
Reading materials:
Chap 10 (Keller)
Outline
• Concepts of estimation – point and interval estimators; unbiasedness and consistency
• Estimating the population mean when the population variance is known
• Estimating the population mean when the population variance is unknown
• Selecting the sample size

Recap: The Central Limit Theorem
• As n→∞, the distribution of the sample mean becomes Normal, with centre µ and standard deviation σ/√n.
• This happens regardless of the shape of the original population.
• i.e. $\bar{X}$ follows a Normal distribution with
$$E(\bar{X}) = \mu \quad \text{and} \quad \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}$$
Recap: What size n?
• If the distribution of X is normal, then for all n the sample mean will follow a normal distribution.
• If the distribution of X is very far from normal, then we will need a large n to see the normality of the distribution of the sample mean.
• In all cases, as n gets larger, the distribution of the mean gets closer to normal.

How does this help?
• This means that if we have a large enough sample, we can always find probabilities involving the mean, since it will have a normal distribution no matter what the original distribution is.
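The recap can be checked with a small simulation. The sketch below is only an illustration: the Exponential(1) population (for which µ = 1 and σ = 1), the sample sizes, and the function name `sample_means` are assumptions chosen for the example, not part of the lecture.

```python
import random
import statistics

def sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a skewed Exponential(1)
    population and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

# For Exponential(1), mu = 1 and sigma = 1, so the sample mean should
# have centre close to 1 and standard deviation close to 1/sqrt(n).
for n in (2, 30, 200):
    means = sample_means(n)
    print(n,
          round(statistics.fmean(means), 3),   # simulated centre
          round(statistics.stdev(means), 3),   # simulated spread
          round(1 / n ** 0.5, 3))              # sigma / sqrt(n)
```

The printed centre and spread should track µ and σ/√n; plotting a histogram of `means` for each n would show the shape becoming more symmetric as n grows.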
Estimation
• The aim of estimation is to determine the approximate value of a population parameter using statistics calculated from a sample drawn from that population.
• As an example, we estimate the mean of a population using the mean of a sample drawn from that population. That is, the sample mean is an estimator of the population mean.
• The actual statistic we calculate from the sample is called an estimate of the population parameter. For example, a calculated sample mean is an estimate of the population mean.

Estimators
• There are two types of estimators:
– Point estimator: a single value or point, e.g. sample mean = 67 is a point estimate of the population mean, µ.
– Interval estimator: draws inferences about a population by estimating a parameter using an interval (range).
• E.g. we are 95% confident that the unknown mean score lies between 56 and 78.
What does the 95% confidence mean? [figures]
Desirable qualities of estimators
• We want our estimators to be accurate and precise:
– Accurate: on average, our estimator hits the true value
– Precise: our estimates are close together
• The sample mean is an accurate and precise estimator of the population mean. (An estimator that is accurate in this sense is called unbiased; one whose estimates concentrate ever closer to the true value as n grows is called consistent.)
Desirable qualities of estimators
Imagine that you throw 10 arrows at a target. The results are illustrated below.
[Figure: three targets, labelled "Accurate but not precise", "Precise but not accurate", and "Accurate and precise"]

Point and interval estimators
• A point estimate is just that, a single point; an interval gives some idea of how sure we are.
• Interval estimator:
– Gives an interval (range) based on a sample statistic
– This interval corresponds to a probability, and this probability is never equal to 100%
Interval estimators for µ, σ is known
• We know that
$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
So,
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

Interval estimators (cont.)
• We also know that, for a standard normal distribution, 95% of the area is contained between -1.96 and +1.96:
$$P(-1.96 < Z < 1.96) = 0.95$$
Put these things together… and rearranging…
$$P(-1.96 < Z < 1.96) = 0.95$$
$$P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
$$P\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

• This is called a 95% confidence interval for µ.
• What this means:
– In repeated sampling, 95% of the intervals created this way would contain µ and 5% would not.
• We can change how confident we are by changing the 1.96:
– Use 1.645 to get a 90% confidence interval
– Use 2.575 to get a 99% confidence interval
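The repeated-sampling interpretation can be illustrated by simulation. This is a minimal sketch under assumed values: the population N(50, 10²), the sample size 25, and the function name `coverage` are hypothetical choices made only for the illustration.

```python
import random
import statistics

def coverage(mu=50.0, sigma=10.0, n=25, z=1.96, reps=10000, seed=1):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n)
    that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # mu is fixed; the interval moves from sample to sample.
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())        # should be close to 0.95
print(coverage(z=1.645)) # should be close to 0.90
```

Each simulated sample gives a different interval, while µ never changes; the printed fractions are the share of intervals that happened to capture it.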
Example 1
• Suppose we know from experience that a random variable X ~ N(µ, 1.66), i.e. σ² = 1.66, and for a sample of size 10 from this population, the sample mean is 1.58.
• Now,
$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$P\left(1.58 - 1.96\,\frac{\sqrt{1.66}}{\sqrt{10}} < \mu < 1.58 + 1.96\,\frac{\sqrt{1.66}}{\sqrt{10}}\right) = 0.95$$
$$P(0.78 < \mu < 2.38) = 0.95$$
• Interpretation: If the experiment were carried out multiple times, 95% of the intervals created in this way would contain µ.
• Lower Confidence Limit: 0.78, Upper Confidence Limit: 2.38
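The arithmetic in Example 1 can be reproduced in a few lines. The function name `z_interval` is a hypothetical helper introduced for this sketch; it assumes the 1.66 in N(µ, 1.66) is the variance, so σ = √1.66.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """Confidence limits x_bar +/- z * sigma / sqrt(n) for known sigma."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 1: variance 1.66 (so sigma = sqrt(1.66)), n = 10, x_bar = 1.58
lcl, ucl = z_interval(xbar=1.58, sigma=math.sqrt(1.66), n=10)
print(round(lcl, 2), round(ucl, 2))  # 0.78 2.38
```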
General notation
• In general, a 100(1-α)% confidence interval estimator for µ is given by
$$P\left(\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
• Notation:
– Confidence level: 100(1-α)%, the probability that the interval contains the parameter
– CI: $\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
– LCL: $\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$; UCL: $\bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

What does 100(1-α)% mean?
• If we want 95% confidence, α=0.05 (or 5%).
• If we want 90% confidence, α=0.10 (or 10%).
• If we want 99% confidence, α=0.01 (or 1%).
What does Zα/2 mean?
• We want to find the middle 100(1-α)% area of the standard normal curve:
– So the area left in each tail will be α/2.
– Zα/2 is the point which marks off an area of α/2 in the upper tail
– Need to look up normal tables to find this!

Factors influencing the width of the interval
• σ is fixed; it can't be changed.
• Vary the sample size: as n gets bigger, the interval gets narrower.
• Vary the confidence level: if we want to be more confident, then we simply change the 1.96 to another number from the standard normal: 2.33 will give 98% confidence, 2.575 will give 99% confidence. Increasing confidence will make the interval wider.
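Instead of a table lookup, Zα/2 can be computed from the inverse of the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_critical` is an assumption made for the example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Z_{alpha/2}: the point leaving area alpha/2 in the upper tail
    of the standard normal, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Higher confidence needs a larger critical value, hence a wider interval.
for conf in (0.90, 0.95, 0.98, 0.99):
    print(conf, round(z_critical(conf), 3))
```

The computed 99% value is slightly above the 2.575 read from printed tables, which interpolate between 2.57 and 2.58; the difference is negligible for the examples here.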
Increase confidence [figures]
IMPORTANT!
• Remember that it is the INTERVAL that changes from sample to sample.
• µ is a fixed and constant value. It is either within the interval or not.
• You should interpret a 95% confidence interval as saying “In repeated sampling, 95% of such intervals created would contain the true population mean”.

Example 2
• Average height of a sample of 25 men is found to be 178cm. Assume that the standard deviation of male heights is known to be 10cm, and that heights follow a normal distribution. Find
1. A 95% confidence interval for the population mean height.
2. A 90% confidence interval for the population mean height.
1. A 95% confidence interval for the population mean height.
$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$P\left(178 - 1.96\,\frac{10}{\sqrt{25}} < \mu < 178 + 1.96\,\frac{10}{\sqrt{25}}\right) = 0.95$$
$$P(174.08 < \mu < 181.92) = 0.95$$
• So, in repeated sampling, we would expect 95% of the intervals created this way to contain µ.

2. A 90% confidence interval for the population mean height.
$$P(-1.645 < Z < 1.645) = 0.90, \quad \text{that is } Z_{\alpha/2} = 1.645$$
$$P\left(\bar{x} - 1.645\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.645\,\frac{\sigma}{\sqrt{n}}\right) = 0.90$$
$$P\left(178 - 1.645\,\frac{10}{\sqrt{25}} < \mu < 178 + 1.645\,\frac{10}{\sqrt{25}}\right) = 0.90$$
$$P(174.71 < \mu < 181.29) = 0.90$$
• So, in repeated sampling, we would expect 90% of the intervals created this way to contain µ.
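Both parts of Example 2 follow the same recipe, with only the confidence level changing. The sketch below derives the critical value from the confidence level rather than hard-coding it; `mean_ci_known_sigma` is a hypothetical name chosen for the illustration.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, confidence):
    """100*confidence% interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 2: n = 25 men, x_bar = 178 cm, sigma = 10 cm
for conf in (0.95, 0.90):
    lcl, ucl = mean_ci_known_sigma(178, 10, 25, conf)
    print(conf, round(lcl, 2), round(ucl, 2))
```

The 90% interval comes out narrower than the 95% interval, which matches the trade-off between confidence and width described earlier.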
Interval estimators for µ, σ is unknown
• We can't simply substitute s in for σ, since $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ does not have a standard normal distribution.
• However, it does follow a known distribution: it follows a t-distribution with n-1 degrees of freedom. The statistic is called the t-statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

About the t-distribution (1)
• Discovered by Gosset, published under the pseudonym “Student”.
• Called “Student's t-distribution”.
• It is symmetric around 0 and mound shaped (like a normal), but has a higher variance than a standard normal distribution.
• The higher the degrees of freedom, the more normal the curve looks.
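The extra spread of the t-statistic can be seen by simulation. The sketch below is an assumption-laden illustration: it draws small samples from a standard normal population (so µ = 0), computes the t-statistic for each, and counts how often it falls outside ±1.96. The function name `tail_fraction` is hypothetical.

```python
import math
import random

def tail_fraction(n, cutoff=1.96, reps=10000, seed=2):
    """Share of simulated t-statistics with |t| > cutoff when samples
    of size n are drawn from N(0, 1)."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        t = (xbar - 0.0) / (s / math.sqrt(n))
        if abs(t) > cutoff:
            extreme += 1
    return extreme / reps

print(tail_fraction(5))   # df = 4: well above 0.05
print(tail_fraction(30))  # df = 29: much closer to 0.05
```

If the statistic were standard normal, about 5% of values would fall outside ±1.96; with few degrees of freedom the share is clearly larger, and it shrinks towards 5% as n grows.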
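About the t-distribution (2)
[Figure: standard normal (Z) and t curves centred at 0; the t curve is bell-shaped, symmetric, and more spread out]

About the t-distribution (3)
[Figure: normal distribution compared with t (df = 13) and t (df = 5)]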
Degrees of freedom (df)
• Number of observations whose values are free to vary after calculating the sample mean
• E.g.
– $\bar{X} = 2$
– X1 = 1 (or another value)
– X2 = 2 (or another value)
– X3 = 3 (can't be changed)
df = n - 1 = 3 - 1 = 2

Hints for Using the t-tables
• Bottom row has df=∞; these are the standard normal probabilities.
– If df is very large, use Z tables even if σ is unknown
• If the exact df is not in the tables, use whatever df is closest
– Difference between values for large df is small
– E.g. df=74; would use values for df=70 as this is closest. Then say:
$$t_{0.05,74} \approx t_{0.05,70} = 1.667$$
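The degrees-of-freedom example can be made concrete: once the sample mean is fixed, the last observation is determined by the others. `last_value` below is a hypothetical helper written only for this illustration.

```python
def last_value(mean, free_values):
    """Given the sample mean and the n - 1 freely chosen observations,
    return the one remaining observation, which is forced."""
    n = len(free_values) + 1
    return n * mean - sum(free_values)

# Mean 2 with X1 = 1 and X2 = 2 forces X3 = 3, so df = 3 - 1 = 2.
print(last_value(2, [1, 2]))  # 3
# Choosing different free values changes X3, but it is still forced.
print(last_value(2, [0, 5]))  # 1
```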
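Confidence Interval for µ, σ is unknown
$$P\left(\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$
$$\therefore \text{CI: } \bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
Note: (i) The population must follow a normal distribution for the t-statistic to apply
(ii) Use the t-table to find the t-value

Example 3
• A random sample of size n = 25 has $\bar{x} = 50$ and s = 8. Use a 95% confidence level to estimate µ.
• With df = 24, the t-table gives $t_{0.025,24} = 2.0639$.
$$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
$$50 - 2.0639\,\frac{8}{\sqrt{25}} < \mu < 50 + 2.0639\,\frac{8}{\sqrt{25}}$$
$$46.70 < \mu < 53.30$$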
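Example 3 can be checked numerically. In this sketch the critical value is passed in as a table lookup, since the Python standard library has no t-distribution quantile function; `t_interval` and its parameter names are assumptions made for the illustration. (With SciPy installed, `scipy.stats.t.ppf(0.975, 24)` should return the same critical value.)

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence limits x_bar +/- t_crit * s / sqrt(n), where t_crit
    is read from the t-table with n - 1 degrees of freedom."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 3: n = 25, x_bar = 50, s = 8, t_{0.025,24} = 2.0639
lcl, ucl = t_interval(xbar=50, s=8, n=25, t_crit=2.0639)
print(f"{lcl:.2f} < mu < {ucl:.2f}")
```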
Determine the sample size
• Suppose that before we gather data, we know that we want the sample average to fall within a certain range of the true population value.
• We can use the CLT to find the minimum sample size required to meet this condition, if the standard deviation of the population is known.

Remember in the previous lecture [figure]
Sample size required
• Example 4: Assume that the standard deviation of a population is 5. I want the sample mean to lie within 3 of the true population mean, with 99% certainty.
• Step 1: set up the equation needed.
$$P\left(|\bar{X} - \mu| < 3\right) = 0.99$$

Sample size continued
• Step 2: standardise.
$$P\left(\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} < \frac{3}{\sigma/\sqrt{n}}\right) = 0.99$$
$$P\left(|Z| < \frac{3}{5/\sqrt{n}}\right) = 0.99$$
$$P\left(|Z| < \frac{3\sqrt{n}}{5}\right) = 0.99$$
Sample size continued
• Step 3: solve for n.
$$P(|Z| < 2.575) = 0.99$$
$$\frac{3\sqrt{n}}{5} = 2.575$$
$$\sqrt{n} = \frac{2.575 \times 5}{3}$$
$$n = 18.42$$
• Therefore, I need a minimum sample size of 19 to be able to estimate the true population mean to within 3, with 99% certainty.

Activity 1
• Suppose that we know the standard deviation of men's heights is 10cm. How many men should we measure to ensure that the sample mean we obtain is no more than 2cm from the population mean with 99% confidence?
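The three steps of Example 4 reduce to n = (z·σ/E)², rounded up, where E is the allowed distance from the population mean. The sketch below applies that to Example 4 only; `min_sample_size` and the parameter name `margin` are hypothetical names chosen for the illustration.

```python
import math

def min_sample_size(sigma, margin, z):
    """Smallest n with z * sigma / sqrt(n) <= margin,
    i.e. n >= (z * sigma / margin) ** 2, rounded up."""
    return math.ceil((z * sigma / margin) ** 2)

# Example 4: sigma = 5, within 3 of mu, 99% certainty (z = 2.575)
print(min_sample_size(sigma=5, margin=3, z=2.575))  # 19
```

Rounding up rather than to the nearest integer matters: n = 18 would leave the margin slightly larger than 3.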