Statistics
OSTA – WS 2024
Dr. Omer CAYIRLI
Lecture 10
Outline
❖Interval Estimation
➢ Population Mean
➢ Determining the Sample Size
➢ Population Proportion
➢ Big Data and Confidence Intervals
Interval Estimation
❖ Margin of error and the Interval Estimate
➢ A point estimator cannot be expected to provide the exact value of the
population parameter.
➢ An interval estimate can be computed by adding and subtracting a margin
of error to the point estimate.
➢ The purpose of an interval estimate is to provide information about how
close the point estimate is to the value of the population parameter.
Point Estimate ± Margin of Error
Interval Estimation
❖ Interval Estimate of a Population Mean: σ Known
➢ The general form of an interval estimate of a population mean is:
\[ \bar{x} \pm \text{Margin of Error} \]
➢ To develop an interval estimate of a population mean, the margin of error
must be computed using either:
✓ the population standard deviation σ, or
✓ the sample standard deviation s
➢ σ is rarely known exactly. However, often a good estimate can be obtained
based on historical data or other information.
➢ Such cases are referred to as the σ known case.
Interval Estimation
❖ Interval Estimate of a Population Mean: σ Known
➢ There is a (1 − α) probability that the value of a sample mean will provide a margin
of error of \( z_{\alpha/2}\,\sigma_{\bar{x}} \) or less.
➢ Interval Estimate of μ:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
where (1 − α) is the confidence coefficient, and
\( z_{\alpha/2} \) is the z value providing an area of α/2 in the upper tail of the standard normal
probability distribution.
Values of \( z_{\alpha/2} \) for the Most Commonly Used Confidence Levels
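The values of \( z_{\alpha/2} \) for the usual confidence levels can be reproduced numerically. The sketch below is one way to do so, assuming Python's standard `statistics.NormalDist`; the helper name `z_upper_tail` is introduced here for illustration only.

```python
from statistics import NormalDist


def z_upper_tail(confidence: float) -> float:
    """Return z_{alpha/2} for a confidence coefficient (1 - alpha)."""
    alpha = 1.0 - confidence
    # inv_cdf(p) gives the value with cumulative probability p to its left,
    # so an upper-tail area of alpha/2 corresponds to p = 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_upper_tail(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```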
Interval Estimate of a Population Mean: σ Known
❖ Meaning of C% Confidence
➢ 95% of the values of any normally distributed random variable are
within ±1.96 standard deviations of the mean.
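✓ When the sampling distribution of \( \bar{x} \) is normally distributed, 95% of the
\( \bar{x} \) values are within \( \pm 1.96\,\sigma_{\bar{x}} \) of the mean μ.
✓ Equivalently, 95% of the intervals \( \bar{x} \pm 1.96\,\sigma_{\bar{x}} \) contain μ, so we say
that such an interval has been established at the 95% confidence level.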
✓ The value 0.95 is referred to as the confidence coefficient.
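The meaning of 95% confidence can be checked with a small simulation: draw many samples from a normal population with known μ and σ, build \( \bar{x} \pm 1.96\,\sigma/\sqrt{n} \) for each, and count how often the interval contains μ. This is a minimal sketch; the population values (μ = 100, σ = 15), the number of trials, and the function name `coverage` are assumptions chosen for illustration.

```python
import random
from math import sqrt


def coverage(mu=100.0, sigma=15.0, n=36, trials=4000, z=1.96, seed=0):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = z * sigma / sqrt(n)       # sigma-known margin of error
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


print(f"Share of intervals containing mu: {coverage():.3f}")
```

The printed share should be close to 0.95. Any single interval either contains μ or it does not, so the 95% describes the procedure over repeated sampling.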
Interval Estimate of a Population Mean: σ Known
❖ Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating
a potential location for a new outlet, based in part, on the mean annual income of the
individuals in the marketing area of the new location.
❖ A sample of size n = 36 was taken; the sample mean income is $41,100. The population is
not believed to be highly skewed. The population standard deviation is estimated to be
$4,500, and the confidence coefficient to be used in the interval estimate is 0.95.
➢ 95% of the sample means that can be observed are within \( \pm 1.96\,\sigma_{\bar{x}} \) of the population
mean μ.
➢ At the 95% confidence level:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96 \cdot \frac{4{,}500}{\sqrt{36}} = \bar{x} \pm 1{,}470 \]
➢ The interval estimate of μ is
$41,100 ± $1,470
or
$39,630 to $42,570
➢ We are 95% confident that the interval contains
the population mean.
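The Discount Sounds calculation can be reproduced in a few lines. The following is a sketch only; the function name `mean_interval_sigma_known` is hypothetical, and the z value 1.96 is passed in directly rather than computed.

```python
from math import sqrt


def mean_interval_sigma_known(x_bar, sigma, n, z):
    """Return (lower, upper, margin) for x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Values from the example: n = 36, sample mean $41,100, sigma = $4,500, z = 1.96.
low, high, margin = mean_interval_sigma_known(41_100, 4_500, 36, 1.96)
print(f"margin = {margin:,.0f}; interval = {low:,.0f} to {high:,.0f}")
# margin = 1,470; interval = 39,630 to 42,570
```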
Interval Estimate of a Population Mean: σ Known
❖ Adequate Sample Size
➢ In most applications, a sample size of n ≥ 30 is adequate.
✓ If the population distribution is highly skewed or contains outliers, a
sample size of 50 or more is recommended.
✓ If the population is not normally distributed but is roughly symmetric, a
sample size as small as 15 will suffice.
✓ If the population is believed to be at least approximately normal, a
sample size of less than 15 can be used.
Interval Estimate of a Population Mean: σ Unknown
❖ When developing an interval estimate of a population mean we usually do not have a good estimate of
the population standard deviation either.
❖ We must use the same sample to estimate both μ and σ, which represents the σ unknown case.
❖ When s is used to estimate σ, the margin of error and the interval estimate for the population mean are
based on a probability distribution known as the t distribution.
➢ The mathematical development of the t distribution is based on the assumption of a normal distribution for the
population we are sampling from.
➢ However, research shows that the t distribution can be successfully applied in many situations where the population
deviates significantly from normal.
➢ A specific t distribution depends on a parameter known as the degrees of freedom.
➢ Degrees of freedom refers to the number of independent pieces of information that go into the computation of s.
➢ A t distribution with more degrees of freedom has less dispersion.
➢ As the degrees of freedom increase, the difference between the t distribution and the standard normal probability
distribution becomes smaller and smaller.
Interval Estimate of a Population Mean: σ Unknown
❖ For more than 100 degrees of freedom, the standard normal z value provides a good
approximation to the t value.
❖ The standard normal z values can be found in the infinite degrees of freedom (∞) row of the t
distribution table.
❖ Interval Estimate of μ:
\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]
where (1 − α) is the confidence coefficient, and
\( t_{\alpha/2} \) is the t value providing an area of α/2 in the upper tail of the t
distribution with n − 1 degrees of freedom.
Interval Estimate of a Population Mean: σ Unknown
❖ A reporter for a student newspaper is writing an article on the cost of off-campus housing.
➢ A sample of 16 one-bedroom apartments within a half-mile of campus resulted in a sample mean of $750 per
month and a sample standard deviation of $55.
➢ Assuming that this population is normally distributed, provide a 95% confidence interval estimate of the
mean rent per month for the population of one-bedroom apartments within a half-mile of campus.
✓ At 95% confidence, α = 0.05 and α/2 = 0.025.
✓ \( t_{0.025} \) is based on n − 1 = 16 − 1 = 15 degrees of freedom.
\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = 750 \pm 2.131 \cdot \frac{55}{\sqrt{16}} = 750 \pm 29.30 \]
We are 95% confident that the mean rent per
month for the population of one-bedroom
apartments within a half-mile of campus is
between $720.70 and $779.30.
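The rent example can be checked the same way. In this sketch the t value 2.131 (15 degrees of freedom, upper-tail area 0.025) is taken from the example and passed in as an argument, since the Python standard library has no t distribution; the function name `mean_interval_sigma_unknown` is hypothetical.

```python
from math import sqrt


def mean_interval_sigma_unknown(x_bar, s, n, t_value):
    """Return (lower, upper) for x_bar +/- t * s / sqrt(n).

    t_value must be looked up for n - 1 degrees of freedom.
    """
    margin = t_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the example: n = 16, sample mean $750, s = $55, t_0.025 = 2.131.
low, high = mean_interval_sigma_unknown(750, 55, 16, 2.131)
print(f"{low:.2f} to {high:.2f}")
# 720.70 to 779.30
```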
Interval Estimate
❖ Summary of Interval Estimation Procedures for a Population Mean
Interval Estimation
❖ Sample Size for an Interval Estimate of a Population Mean
➢ Let 𝐸 = the desired margin of error and define it as the amount added to and subtracted
from the point estimate to obtain an interval estimate.
➢ If a desired margin of error is selected prior to sampling, the sample size necessary to
satisfy the margin of error can be determined.
Margin of error:
\[ E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
Necessary sample size:
\[ n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} \]
➢ The Necessary Sample Size equation requires a value for the population standard
deviation, 𝜎.
✓ If 𝜎 is unknown, a preliminary or planning value for 𝜎 can be used in the equation.
❑ Use the estimate of the population standard deviation computed in a previous study.
❑ Use a pilot study to select a preliminary sample and use the sample standard deviation from the
study.
❑ Use judgment or a “best guess” for the value of 𝜎.
Sample Size for an Interval Estimate of a Population Mean
❖ Discount Sounds is evaluating a potential location for a new retail outlet, based in part, on
the mean annual income of the individuals in the marketing area of the new location.
❖ Suppose that Discount Sounds’ management team wants an estimate of the population
mean such that there is a 0.95 probability that the sampling error is $500 or less.
❖ How large a sample size is needed to meet the required precision?
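With E = 500, σ = 4,500, and \( z_{0.025} = 1.96 \) at 95% confidence:
\[ E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} = \frac{(1.96)^2 (4{,}500)^2}{(500)^2} = 311.17 \]
➢ Because the sample size must be a whole number, round up: a sample of size 312 is needed.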
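A short sketch of the sample size calculation, with the round-up step made explicit. The function name `sample_size_for_mean` is hypothetical; σ = 4,500 is the planning value used in the example.

```python
from math import ceil


def sample_size_for_mean(sigma, margin, z):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    n_exact = (z ** 2) * (sigma ** 2) / (margin ** 2)
    # Always round up: rounding down would give a margin of error above E.
    return ceil(n_exact)


print(sample_size_for_mean(sigma=4_500, margin=500, z=1.96))
# 312
```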
Interval Estimate of a Population Proportion
❖ The general form of an interval estimate of a population proportion:
\[ \bar{p} \pm \text{Margin of Error} \]
❖ The sampling distribution of \( \bar{p} \) plays a key role in computing the margin of error.
❖ The sampling distribution of \( \bar{p} \) can be approximated by a normal distribution whenever
\[ np \ge 5 \quad \text{and} \quad n(1 - p) \ge 5 \]
❖ Interval Estimate of p:
\[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]
where (1 − α) is the confidence coefficient, and
\( z_{\alpha/2} \) is the z value providing an area of α/2 in
the upper tail of the standard normal
probability distribution.
Interval Estimate of a Population Proportion
❖ Political Science Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers
informed of their position in a race.
❖ Using telephone surveys, PSI interviewers ask registered voters who they would vote for if the election were held
that day.
❖ In a current election campaign, PSI has just found that 220 registered voters, out of 500 contacted, favor a
particular candidate. PSI wants to develop a 95% confidence interval estimate for the proportion of the
population of registered voters that favor the candidate.
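With n = 500, \( \bar{p} = 220/500 = 0.44 \), and \( z_{0.025} = 1.96 \) at 95% confidence:
\[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.44 \pm 1.96 \sqrt{\frac{0.44(1 - 0.44)}{500}} = 0.44 \pm 0.0435 \]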
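The PSI calculation, including the resulting interval endpoints, can be sketched as follows. The function name `proportion_interval` is hypothetical. The normal-approximation check uses the sample proportion in place of the unknown p, which is an assumption of this sketch rather than something stated on the slide.

```python
from math import sqrt


def proportion_interval(successes, n, z):
    """Return (lower, upper, margin) for p_bar +/- z * sqrt(p_bar(1 - p_bar)/n)."""
    p_bar = successes / n
    # Assumption: check np >= 5 and n(1 - p) >= 5 with p_bar standing in for p.
    if n * p_bar < 5 or n * (1 - p_bar) < 5:
        raise ValueError("normal approximation may not be adequate")
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin, margin


# Values from the example: 220 of 500 voters favor the candidate, z = 1.96.
low, high, margin = proportion_interval(220, 500, 1.96)
print(f"margin = {margin:.4f}; interval = {low:.4f} to {high:.4f}")
# margin = 0.0435; interval = 0.3965 to 0.4835
```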
Sample Size for an Interval Estimate of a Population Proportion
❖ Let 𝐸 = the desired margin of error and define it as the amount added to and
subtracted from the point estimate to obtain an interval estimate.
❖ If a desired margin of error is selected prior to sampling, the sample size necessary
to satisfy the margin of error can be determined.
Margin of error:
\[ E = z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]
Necessary sample size:
\[ n = \frac{(z_{\alpha/2})^2\, \bar{p}(1 - \bar{p})}{E^2} \]
➢ \( \bar{p} \) will not be known until after the sample has been selected. A planning value \( p^* \) can be used in place of \( \bar{p} \).
✓ The planning value \( p^* \) can be chosen by:
❑ Using the sample proportion from a previous sample of the same or similar units.
❑ Selecting a preliminary sample and using the sample proportion from that sample.
❑ Using judgment or a “best guess” for the value of \( p^* \).
❑ Otherwise, using \( p^* = 0.5 \).
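One way to see why 0.5 serves as the fallback planning value: the product p*(1 − p*) in the numerator of the sample size formula is largest at p* = 0.5, so that choice gives the largest (most conservative) sample size. The sketch below tabulates the product for a few illustrative planning values; the grid and the function name `planning_product` are assumptions.

```python
def planning_product(p_star):
    """The factor p*(1 - p*) that scales the required sample size."""
    return p_star * (1 - p_star)


grid = (0.1, 0.3, 0.44, 0.5, 0.7, 0.9)
for p_star in grid:
    print(f"p* = {p_star:.2f}  ->  p*(1 - p*) = {planning_product(p_star):.4f}")

# The largest product on the grid occurs at p* = 0.5.
print("largest at p* =", max(grid, key=planning_product))
```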
Sample Size for an Interval Estimate of a Population Proportion
❖ Suppose that PSI would like a 0.99 probability that the sample proportion is
within ± 0.03 of the population proportion.
❖ How large a sample size is needed to meet the required precision? (A
previous sample of similar units yielded 0.44 for the sample proportion.)
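With E = 0.03, \( p^* = 0.44 \), and \( z_{0.005} = 2.576 \) at 99% confidence:
\[ E = z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \quad\Rightarrow\quad n = \frac{(z_{\alpha/2})^2\, \bar{p}(1 - \bar{p})}{E^2} \]
Substituting the planning value \( p^* \) for \( \bar{p} \):
\[ n = \frac{(2.576)^2 (0.44)(1 - 0.44)}{(0.03)^2} = 1{,}816.73 \]
➢ Rounding up, a sample of size 1,817 is needed.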
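A sketch of the same calculation, with the conservative p* = 0.5 case added for comparison. The function name `sample_size_for_proportion` is hypothetical; the z value 2.576 is the one used in the example.

```python
from math import ceil


def sample_size_for_proportion(p_star, margin, z):
    """Smallest n with z * sqrt(p*(1 - p*)/n) <= margin."""
    n_exact = (z ** 2) * p_star * (1 - p_star) / (margin ** 2)
    return ceil(n_exact)  # round up so the margin of error is not exceeded


# Planning value from the previous sample (0.44), E = 0.03, 99% confidence.
print(sample_size_for_proportion(0.44, 0.03, 2.576))
# 1817

# Conservative planning value when nothing is known about p.
print(sample_size_for_proportion(0.50, 0.03, 2.576))
# 1844
```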
What is next?
❖Hypothesis Tests I
➢ Developing Null and Alternative Hypotheses
➢ Type I and Type II Errors
➢ Population Mean: σ Known & Unknown
➢ Reading(s):
✓ SBE 9.1 → 9.4