Chapter 7 Statistical Intervals


ES – 71

ENGINEERING
DATA ANALYSIS
ENGR. MARY CRIS L. AYING-TAMPOS
FACULTY, CET
CHAPTER 7

STATISTICAL
INTERVALS
7. Statistical Intervals
7.1 Single Sample: Estimating the Mean
7.2 Confidence Interval on the Mean of a Normal
Distribution, Variance Unknown
7.3 Confidence Interval on the Variance and Standard
Deviation of a Normal Distribution
7.4 Two Samples: Estimating the Difference between
Two Means
7.5 Large-Sample Confidence Interval for a Population
Proportion
7.6 Prediction Interval for Future Observation
7.7 Tolerance Interval
Course References

(1) Walpole, Ronald E., et al., 2016, “Probability and Statistics
for Engineers and Scientists”, 9th Ed., Pearson Education
Inc.
(2) Montgomery, Douglas C., et al., 2018, “Applied Statistics
and Probability for Engineers”, 7th Ed., John Wiley & Sons
(Asia) Pte Ltd.
(3) Spiegel, Murray R., et al., 2013, “Probability and
Statistics”, 4th Ed., McGraw Hill Companies Inc.
Grading System
Attendance : 5%
Quizzes/Participation : 15%
Prelim Exam : 20%
Midterm Exam : 20%
Prefinal Exam : 20%
Final Exam : 20%
100%

Passing Rate : 50%


Chapter 7: Intended Learning Outcomes

01 Construct confidence intervals on the mean of a normal distribution, using either
the normal distribution or the t distribution method
02 Construct confidence intervals on the variance and standard deviation of a
normal distribution and on a population proportion
03 Construct prediction intervals for a future observation
04 Construct a tolerance interval for a normal distribution
05 Explain the three types of interval estimates: confidence intervals,
prediction intervals, and tolerance intervals
Introduction

Statistical intervals represent the uncertainty that exists in the data
because we work with samples that are obtained from a larger
population or process. Statistical intervals are staples of the quality
and validation practitioner’s statistical toolbox. Statistical intervals
can manifest as plus-or-minus limits on test data, represent a margin
of error in a scientific poll, or indicate the level of confidence
associated with a predicted value.

Engineers are often involved in estimating parameters. For example, ASTM
Standard E23 defines a technique called the Charpy V-notch method for notched bar impact
testing of metallic materials. The impact energy is often used to determine whether the material
experiences a ductile-to-brittle transition as the temperature decreases. Suppose that we have
tested a sample of 10 specimens of a particular material with this procedure. We know that we
can use the sample average X̄ to estimate the true mean impact energy μ. However, we also
know that the true mean impact energy is unlikely to be exactly equal to our estimate.
Reporting the results of the test as a single number is unappealing because nothing inherent in
X̄ provides any information about how close it is to μ. Our estimate could be very close, or it
could be considerably far from the true mean. A way to avoid this is to report the estimate in
terms of a range of plausible values called a confidence interval.
7.1 Single Sample: Estimating the Mean
A confidence interval always specifies a confidence level, usually 90%,
95%, or 99%, which is a measure of the reliability of the procedure. So if a
95% confidence interval on the impact energy based on the data from our 10
specimens has a lower limit of 63.84 J and an upper limit of 65.08 J, then
we can say that at the 95% level of confidence any value of mean impact
energy between 63.84 J and 65.08 J is a plausible value. By reliability, we
mean that if we repeated this experiment over and over again, 95% of all
samples would produce a confidence interval that contains the true mean
impact energy, and only 5% of the time would the interval be in error.

An interval estimate for a population parameter is called a confidence
interval. Information about the precision of estimation is conveyed by the
length of the interval. A short interval implies precise estimation. We cannot
be certain that the interval contains the true, unknown population parameter
– we use only a sample from the full population to compute the point
estimate and the interval. However, the confidence interval is constructed so
that we have high confidence that it does contain the unknown population
parameter. Confidence intervals are widely used in engineering and the
sciences.

A tolerance interval is another important type of interval estimate. For example, the
chemical product viscosity data might be assumed to be normally distributed. We might like to
calculate limits that bound 95% of the viscosity values. For a normal distribution, we know
that 95% of the distribution is in the interval
μ − 1.96σ, μ + 1.96σ
However, this is not a useful tolerance interval because the parameters μ and σ are unknown.
Point estimates such as x̄ and s can be used in the preceding equation for μ and σ. However, we
need to account for the potential error in each point estimate to form a tolerance interval for the
distribution. The result is an interval of the form
x̄ − ks, x̄ + ks
where k is an appropriate constant (that is larger than 1.96 to account for the estimation error).
As in the case of a confidence interval, it is not certain that the tolerance interval
bounds 95% of the distribution, but the interval is constructed so that we have high
confidence that it does. Tolerance intervals are widely used and, as we will subsequently
see, they are easy to calculate for normal distributions.
Confidence and tolerance intervals bound unknown elements of a distribution. In this
chapter, you will learn to appreciate the value of these intervals. A prediction interval
provides bounds on one (or more) future observations from the population. For example, a
prediction interval could be used to bound a single, new measurement of viscosity—another
useful interval. With a large sample size, the prediction interval for normally distributed
data tends to the tolerance interval, but for more modest sample sizes, the prediction and
tolerance intervals are different.
Keep the purpose of the three types of interval estimates clear:
• A confidence interval bounds population or distribution parameters (such as the
mean viscosity).
• A tolerance interval bounds a selected proportion of a distribution.
• A prediction interval bounds future observations from the population or
distribution.

Our experience has been that it is easy to confuse the three types of intervals. For
example, a confidence interval is often reported when the problem situation calls for
a prediction interval.
❖ Confidence Interval on the Mean of a Normal
Distribution, Variance Known

If x̄ is the sample mean of a random sample of size n from a normal
population with known variance σ², a 100(1 − α)% CI (confidence interval)
on μ is given by

x̄ − z_{α/2} σ/√n ≤ μ ≤ x̄ + z_{α/2} σ/√n

where z_{α/2} is the upper 100α/2 percentage point of the standard normal
distribution.
For small samples selected from non-normal populations, we cannot
expect our degree of confidence to be accurate. However, for samples of size
n ≥ 30, with the shape of the distributions not too skewed, sampling theory
guarantees good results.

A confidence interval, constructed from sample data, is a range of values
that is likely to include the population parameter, at some specified
confidence level. The confidence interval for a population mean is
determined by taking the sample mean (the point estimate) and subtracting
and adding a margin of error to it:

X̄ ± E, where E = z_{α/2} σ/√n

z_{α/2}: a single value, called the critical value; it can be found in the
normal tables or by using software
EXCEL: CONFIDENCE(α, σ, n)
Significance level: α = 1 − CL

Example 1: Scores on an exam are normally distributed with a population
standard deviation of 5.6. A random sample of 40 scores on the
exam has a mean of 32.

Estimate the population mean with
a) 80% confidence level
b) 90% confidence level
c) 98% confidence level

Solution: a) population mean with 80% confidence level

Given: σ = 5.6, n = 40, X̄ = 32

α = 1 − 0.80 = 0.20
α/2 = 0.20/2 = 0.10
z_{α/2} = z_{0.10} = 1.28

E = z_{α/2} σ/√n = 1.28 (5.6/√40) = 1.13
EXCEL: CONFIDENCE(0.2, 5.6, 40) = 1.13

Lower limit: X̄ − E = 32 − 1.13 = 30.87
Upper limit: X̄ + E = 32 + 1.13 = 33.13
30.87 ≤ μ ≤ 33.13
Interpretation: We are 80% confident that the population mean score is
between 30.87 and 33.13.

Solution: b) population mean with 90% confidence level

Given: σ = 5.6, n = 40, X̄ = 32

α = 1 − 0.90 = 0.10
α/2 = 0.10/2 = 0.05
z_{α/2} = z_{0.05} = 1.645

E = z_{α/2} σ/√n = 1.645 (5.6/√40) = 1.46
EXCEL: CONFIDENCE(0.1, 5.6, 40) = 1.46

Lower limit: X̄ − E = 32 − 1.46 = 30.54
Upper limit: X̄ + E = 32 + 1.46 = 33.46
30.54 ≤ μ ≤ 33.46
Interpretation: We are 90% confident that the population mean score is
between 30.54 and 33.46.

Solution: c) population mean with 98% confidence level

Given: σ = 5.6, n = 40, X̄ = 32

α = 1 − 0.98 = 0.02
α/2 = 0.02/2 = 0.01
z_{α/2} = z_{0.01} = 2.33

E = z_{α/2} σ/√n = 2.33 (5.6/√40) = 2.06
EXCEL: CONFIDENCE(0.02, 5.6, 40) = 2.06

Lower limit: X̄ − E = 32 − 2.06 = 29.94
Upper limit: X̄ + E = 32 + 2.06 = 34.06
29.94 ≤ μ ≤ 34.06
Interpretation: We are 98% confident that the population mean score is
between 29.94 and 34.06.
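
The three parts of Example 1 can be checked with a short script. This is only a sketch: the function name z_interval is made up for illustration, and the critical value comes from Python's statistics.NormalDist instead of a printed normal table, so it carries more digits than 1.28, 1.645, or 2.33.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Two-sided CI on the mean with sigma known: xbar +/- z_{alpha/2} * sigma / sqrt(n).

    z_interval is a hypothetical helper name, not something from the slides.
    """
    alpha = 1.0 - confidence
    # Upper alpha/2 percentage point of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(n)  # margin of error E
    return xbar - margin, xbar + margin


# Example 1: sigma = 5.6, n = 40, sample mean = 32.
for level in (0.80, 0.90, 0.98):
    lower, upper = z_interval(32, 5.6, 40, level)
    print(f"{level:.0%} CI: {lower:.2f} <= mu <= {upper:.2f}")
```

Rounded to two decimals, the limits agree with the hand calculations for parts a), b), and c).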

Example 2. ASTM Standard E23 defines standard test methods for notched
bar impact testing of metallic materials. The Charpy V-notch (CVN)
technique measures impact energy and is often used to determine whether or
not a material experiences a ductile-to-brittle transition with decreasing
temperature. Ten measurements of impact energy (J) on specimens of A238
steel cut at 60℃ are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8,
64.2, and 64.3. Assume that impact energy is normally distributed with σ = 1
J. Find a 95% CI for μ, the mean impact energy.
z_{0.025} = 1.96

Solution:

The required quantities are z_{α/2} = z_{0.025} = 1.96, n = 10, σ = 1, and x̄ = 64.46.

Using the equation above, the resulting 95% CI is as follows:

64.46 − 1.96(1/√10) ≤ μ ≤ 64.46 + 1.96(1/√10)
63.84 ≤ μ ≤ 65.08

Based on the sample data, a range of highly plausible values for mean impact energy for
A238 steel at 60℃ is 63.84 J ≤ μ ≤ 65.08 J.
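
As a rough check of Example 2 starting from the raw measurements, the sketch below computes the sample mean and the 95% limits. The variable names are illustrative only, and the critical value is taken from statistics.NormalDist in place of the table value 1.96.

```python
from math import sqrt
from statistics import NormalDist, mean

# Impact energy measurements (J) listed in Example 2.
energies = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0        # population standard deviation, assumed known (J)
confidence = 0.95

xbar = mean(energies)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin = z * sigma / sqrt(len(energies))
lower, upper = xbar - margin, xbar + margin

print(f"x-bar = {xbar:.2f} J")
print(f"95% CI: {lower:.2f} J <= mu <= {upper:.2f} J")
```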

Practice Problem:
The average zinc concentration from a sample of measurements taken in 36
different locations in a river is found to be 2.6 grams per milliliter. Find the
95% and 99% confidence intervals for the mean zinc concentration in the
river. Assume that the population standard deviation is 0.3 gram per milliliter.
z_{0.025} = 1.96
z_{0.005} = 2.58

Solution:

For 95% CI:
2.6 − 1.96(0.3/√36) ≤ μ ≤ 2.6 + 1.96(0.3/√36)
2.50 ≤ μ ≤ 2.70

For 99% CI:
2.6 − 2.58(0.3/√36) ≤ μ ≤ 2.6 + 2.58(0.3/√36)
2.47 ≤ μ ≤ 2.73

When the value of σ is unknown and the sample is large, σ can be replaced with the
sample standard deviation s to obtain approximate intervals.
In particular,
❑ x̄ ± s/√n is a 68% confidence interval for μ
❑ x̄ ± 1.645 s/√n is a 90% confidence interval for μ
❑ x̄ ± 1.96 s/√n is a 95% confidence interval for μ
❑ x̄ ± 2.58 s/√n is a 99% confidence interval for μ
❑ x̄ ± 3 s/√n is a 99.7% confidence interval for μ
▪ Interpreting a Confidence Interval
How does one interpret a confidence interval? In the impact energy
estimation problem in Example 2, the 95% CI is 63.84 ≤ μ ≤ 65.08, so it is
tempting to conclude that μ is within this interval with probability 0.95.
However, with a little reflection, it is easy to see that this cannot be correct;
the true value of μ is unknown, and the statement 63.84 ≤ μ ≤ 65.08 is either
correct (true with probability 1) or incorrect (false with probability 1). The
correct interpretation lies in the realization that a CI is a random interval.
Consequently, the correct interpretation of a 100(1 − 𝛼)% CI depends on
the relative frequency view of probability. Specifically, if an infinite number
of random samples are collected and a 100(1 − 𝛼)% confidence interval for
μ is computed from each sample, 100(1 − 𝛼)% of these intervals will
contain the true value of μ.

The situation is illustrated in the figure below, which shows several
100(1 − α)% confidence intervals for the mean μ of a normal distribution.
The dots at the center of the intervals indicate the point estimate of μ (that is,
x̄). Notice that one of the intervals fails to contain the true value of μ. If this
were a 95% confidence interval, in the long run only 5% of the intervals
would fail to contain μ.

[Figure: Repeated construction of a confidence interval for μ]

In practice, we obtain only one random sample and calculate one
confidence interval. Because this interval either will or will not contain the
true value of μ, it is not reasonable to attach a probability level to this
specific event. The appropriate statement is that the observed interval
brackets the true value of μ with confidence 100(1 − α)%. This statement has
a frequency interpretation; that is, we do not know whether the statement is
true for this specific sample, but the method used to obtain the interval
yields correct statements 100(1 − α)% of the time.
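
The frequency interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the true mean 64.5 is a made-up value chosen only so that the simulation has something to cover, the function name coverage is hypothetical, and the seed is fixed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


# mu = 64.5 is an assumed "true" mean used only for this illustration.
rate = coverage(mu=64.5, sigma=1.0, n=10, confidence=0.95, trials=5000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains μ or does not; the 95% describes the procedure over many repetitions.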
▪ Confidence Level and Precision of Estimation
Notice that in Example 2, our choice of the 95% level of confidence was
essentially arbitrary. What would have happened if we had chosen a higher
level of confidence, say, 99%? In fact, is it not reasonable that we would
want the higher level of confidence? At α = 0.01, we find z_{α/2} = z_{0.01/2} =
z_{0.005} = 2.58, while for α = 0.05, z_{0.025} = 1.96. Thus, the length of the
95% confidence interval is
2(1.96σ/√n) = 3.92σ/√n
whereas the length of the 99% CI is
2(2.58σ/√n) = 5.16σ/√n
Thus, the 99% CI is longer than the 95% CI. This is why we have a higher
level of confidence in the 99% confidence interval.

Generally, for a fixed sample size n and standard deviation σ, the higher the
confidence level, the longer the resulting CI. The length of a confidence
interval is a measure of the precision of estimation. It is desirable to obtain a
confidence interval that is short enough for decision-making purposes and
that also has adequate confidence. One way to achieve this is by choosing
the sample size n to be large enough to give a CI of specified length or
precision with prescribed confidence.
❖ Choice of Sample Size

The precision of the confidence interval in the equation above is 2z_{α/2} σ/√n.

This means that in using x̄ to estimate μ, the error E = |x̄ − μ| is less than or equal to
z_{α/2} σ/√n with confidence 100(1 − α)%. This is shown graphically in the figure below.

[Figure: Error in estimating μ with x̄]



In situations where the sample size can be controlled, we can choose n so that we are
100(1 − α)% confident that the error in estimating μ is less than a specified bound on the
error E. The appropriate sample size is found by choosing n such that z_{α/2} σ/√n = E.
Solving this equation gives the following formula for n.

If x̄ is used as an estimate of μ, we can be 100(1 − α)% confident that the error |x̄ − μ| will
not exceed a specified amount E when the sample size is

n = (z_{α/2} σ / E)²

If the right-hand side of the equation above is not an integer, it must be rounded up. This
will ensure that the level of confidence does not fall below 100(1 − α)%. Notice that 2E
is the length of the resulting confidence interval.

Example 1. Consider the CVN test described in Example 2 and suppose
that we want to determine how many specimens must be tested to ensure
that the 95% CI on μ for A238 steel cut at 60℃ has a length of at most 1.0 J.
The bound on error in estimation E is one-half of the length of the
CI, so E = 0.5 J.
Solution:
E = 0.5, σ = 1, and z_{α/2} = 1.96.

The required sample size is

n = [(1.96)(1)/0.5]² = 15.37

and because n must be an integer, the required sample size is n = 16.
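
The sample-size rule n = (z_{α/2} σ/E)², rounded up, can be sketched as follows. The function name sample_size_for_mean is illustrative, and the critical value is computed with statistics.NormalDist instead of being read from a table.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, error_bound, confidence):
    """Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= error_bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up so the confidence level does not fall below the target.
    return ceil((z * sigma / error_bound) ** 2)


# CVN example: sigma = 1 J, CI length at most 1.0 J, so E = 0.5 J.
n = sample_size_for_mean(sigma=1.0, error_bound=0.5, confidence=0.95)
print(n)
```

Halving E to 0.25 J would roughly quadruple the required sample size, because n grows with 1/E².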

Notice the general relationship between sample size, desired length of the
confidence interval 2E, confidence level 100(1 − α)%, and standard deviation σ:

• As the desired length of the interval 2E decreases, the required sample
size n increases for a fixed value of σ and specified confidence.
• As σ increases, the required sample size n increases for a fixed desired
length 2E and specified confidence.
• As the level of confidence increases, the required sample size n increases
for fixed desired length 2E and standard deviation σ.
❖ One-sided Confidence Bounds on Mean of a Normal
Distribution, Variance Known

The two-sided confidence interval equation given earlier provides both
a lower confidence bound and an upper confidence bound for μ. Thus, it
provides a two-sided CI. It is also possible to obtain one-sided confidence
bounds for μ by setting either the lower bound l = −∞ or the upper bound
u = ∞ and replacing z_{α/2} by z_α.

A 100(1 − α)% upper-confidence bound for μ is

μ ≤ x̄ + z_α σ/√n

and a 100(1 − α)% lower-confidence bound for μ is

x̄ − z_α σ/√n ≤ μ


Example 1. The same data for impact testing from Example 2 are used to
construct a lower, one-sided 95% confidence interval for the mean impact
energy. Recall that x̄ = 64.46, σ = 1 J, and n = 10. What is the interval?

Solution:
z_{0.05} = 1.645

64.46 − 1.645(1/√10) ≤ μ
63.94 ≤ μ

Practical Interpretation: The lower limit for the two-sided interval in Example
2 was 63.84. Because z_α < z_{α/2}, the lower limit of a one-sided interval is always
greater than the lower limit of a two-sided interval of equal confidence. The
one-sided interval does not bound μ from above so that it still achieves 95% confidence
with a slightly larger lower limit. If our interest is only in the lower limit for μ, then
the one-sided interval is preferred because it provides equal confidence with a
greater limit. Similarly, a one-sided upper limit is always less than a two-sided upper
limit of equal confidence.
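
To see the difference between z_α and z_{α/2} numerically, the sketch below computes the one-sided lower bound and the two-sided lower limit for the same impact-energy data. The helper name lower_confidence_bound is an illustration, not notation from the slides.

```python
from math import sqrt
from statistics import NormalDist


def lower_confidence_bound(xbar, sigma, n, confidence):
    """One-sided lower bound: xbar - z_alpha * sigma / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(confidence)  # all of alpha in one tail
    return xbar - z_alpha * sigma / sqrt(n)


xbar, sigma, n = 64.46, 1.0, 10
one_sided = lower_confidence_bound(xbar, sigma, n, 0.95)

# Two-sided lower limit for comparison: alpha is split between both tails.
z_half = NormalDist().inv_cdf(1 - 0.05 / 2)
two_sided = xbar - z_half * sigma / sqrt(n)

print(f"one-sided lower bound: {one_sided:.2f} J")
print(f"two-sided lower limit: {two_sided:.2f} J")
```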
❖ Large-Sample Confidence Interval on the Mean

When n is large, the quantity

(X̄ − μ)/(S/√n)

has an approximate standard normal distribution. Consequently,

x̄ − z_{α/2} s/√n ≤ μ ≤ x̄ + z_{α/2} s/√n

is a large-sample confidence interval for μ, with confidence level of
approximately 100(1 − α)%.

Example 1. An article in the 1993 volume of the Transactions of the
American Fisheries Society reports the results of a study to investigate the
mercury contamination in largemouth bass.
A sample of fish was selected from 53 Florida lakes, and mercury
concentration in the muscle tissue was measured (ppm). Find an approximate
95% CI on μ.

[Table: mercury concentration values (ppm); not reproduced]

Solution:

[Table: summary statistics for these data; not reproduced]

The required quantities are n = 53, x̄ = 0.5250, s = 0.3486, and z_{0.025} = 1.96. The
approximate 95% CI on μ is

0.5250 − 1.96(0.3486/√53) ≤ μ ≤ 0.5250 + 1.96(0.3486/√53)
0.4311 ≤ μ ≤ 0.6189

This interval is fairly wide because
there is substantial variability in the
mercury concentration measurements. A
larger sample size would have produced
a shorter interval.
7.2 Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
• t distribution

[Figure: Percentage points of the t distribution]


If x̄ and s are the sample mean and standard deviation of a random sample
from a normal distribution with unknown variance σ², a 100(1 − α)%
confidence interval on μ is given by

x̄ − t_{α/2,n−1} s/√n ≤ μ ≤ x̄ + t_{α/2,n−1} s/√n

where t_{α/2,n−1} is the upper 100α/2 percentage point of the t distribution with
n − 1 degrees of freedom.

Example 1: An article in the Journal of Materials Engineering
[“Instrumented Tensile Adhesion Tests on Plasma Sprayed Thermal Barrier
Coatings” (1989, Vol. 11(4), pp. 275–282)] describes the results of tensile
adhesion tests on 22 U-700 alloy specimens. Find the 95% confidence
interval (CI).
The load at specimen failure is as follows (in megapascals):

[Table: load at failure for the 22 specimens; not reproduced]

t_{0.025,21} = 2.080

Solution:

With n = 22, x̄ = 13.71, and s = 3.55:

13.71 − 2.080(3.55/√22) ≤ μ ≤ 13.71 + 2.080(3.55/√22)
13.71 − 1.57 ≤ μ ≤ 13.71 + 1.57
12.14 ≤ μ ≤ 15.28

The CI is fairly wide because there is a lot of variability in the tensile
adhesion test measurements. A larger sample size would have led to a shorter
interval.
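
The t interval above can be reproduced with a few lines of code. Python's standard library has no t quantile function, so this sketch takes the critical value 2.080 from the example as a table lookup; the function and constant names are illustrative only.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Two-sided CI on the mean with sigma unknown: xbar +/- t * s / sqrt(n).

    t_crit is t_{alpha/2, n-1}, supplied by the caller from a t table.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


T_0025_21 = 2.080  # t_{0.025,21}, the table value quoted in the example
lower, upper = t_interval(xbar=13.71, s=3.55, n=22, t_crit=T_0025_21)
print(f"95% CI: {lower:.2f} <= mu <= {upper:.2f}")
```

Using 1.96 in place of 2.080 would give a narrower interval; the larger t value accounts for the extra uncertainty from estimating σ with s.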
7.3 Confidence Interval on the Variance and Standard Deviation of a Normal Distribution

When the population is modelled by a normal distribution, the tests and
intervals described in this section are applicable.

The following result provides the basis of constructing these confidence
intervals:

• χ² Distribution

Let X_1, X_2, …, X_n be a random sample from a normal distribution with
mean μ and variance σ², and let S² be the sample variance. Then the random
variable

X² = (n − 1)S²/σ²

has a chi-square (χ²) distribution with n − 1 degrees of freedom.
• Confidence Interval on the Variance

If s² is the sample variance of a random sample of n observations from
a normal distribution with unknown variance σ², then a 100(1 − α)%
confidence interval on σ² is given by

(n − 1)s²/χ²_{α/2,n−1} ≤ σ² ≤ (n − 1)s²/χ²_{1−α/2,n−1}

where χ²_{α/2,n−1} and χ²_{1−α/2,n−1} are the upper and lower 100α/2 percentage
points of the chi-square distribution with n − 1 degrees of freedom,
respectively.
• Confidence Interval on the Standard Deviation

√[(n − 1)s²/χ²_{α/2,n−1}] ≤ σ ≤ √[(n − 1)s²/χ²_{1−α/2,n−1}]

Sample Problem 1:
A large candy manufacturer produces, packages and sells packs of
candy targeted to weigh 52 grams. A quality control manager working
for the company was concerned that the variation in the actual weights
of the targeted 52-gram packs was larger than acceptable. That is, he
was concerned that some packs weighed significantly less than 52
grams and some weighed significantly more than 52 grams. In an
attempt to estimate σ, the standard deviation of the weights of all of
the 52-gram packs the manufacturer makes, he took a random sample
of n = 10 packs off the factory line. The random sample yielded a
sample variance of 4.2 grams². Use the random sample to derive a 95%
confidence interval for σ.

Solution:

With n = 10, s² = 4.2, χ²_{0.025,9} = 19.02, and χ²_{0.975,9} = 2.70:

(9)(4.2)/19.02 ≤ σ² ≤ (9)(4.2)/2.70
1.99 ≤ σ² ≤ 14.0
1.41 ≤ σ ≤ 3.74

Therefore, we can be 95% confident that the standard deviation of the
weights of all of the packs of candy coming off the factory line is
between 1.41 and 3.74 grams.
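
A sketch of the same calculation in code follows. The chi-square percentage points are entered as table constants (19.02 and 2.70 for 9 degrees of freedom), because the Python standard library does not provide chi-square quantiles; the function and constant names are made up for illustration.

```python
from math import sqrt


def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Two-sided CI on sigma^2 from a sample variance s2.

    chi2_upper is chi^2_{alpha/2, n-1} and chi2_lower is chi^2_{1-alpha/2, n-1};
    both are table lookups supplied by the caller.
    """
    dof = n - 1
    return dof * s2 / chi2_upper, dof * s2 / chi2_lower


CHI2_0025_9 = 19.02  # upper 2.5% point, 9 degrees of freedom (table value)
CHI2_0975_9 = 2.70   # lower 2.5% point, 9 degrees of freedom (table value)

var_lo, var_hi = variance_interval(4.2, 10, CHI2_0025_9, CHI2_0975_9)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)
print(f"{var_lo:.2f} <= sigma^2 <= {var_hi:.2f}")
print(f"{sd_lo:.2f} <= sigma <= {sd_hi:.2f}")
```

Note that the larger chi-square value produces the lower limit, and the interval is not symmetric about s².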

Sample Problem 2:
A sample of 7 boxes of a certain type of cereal with a
nominal weight of 750 g had the following weights: 775, 780,
781, 795, 803, 810, 823. Given: n = 7, x̄ = 795.3, s² = 315.574.
Find a 95% confidence interval for σ².

Solution:

(6)(315.574)/χ²_{0.025,6} ≤ σ² ≤ (6)(315.574)/χ²_{0.975,6}

Therefore, we can be 95% confident that σ², the true variance of weights of
cereal in boxes of this type, lies between 131.04 and 1530.24 g².
• One-Sided Confidence Bounds on the Variance

The 100(1 − α)% lower and upper confidence bounds on σ² are respectively

(n − 1)s²/χ²_{α,n−1} ≤ σ²   and   σ² ≤ (n − 1)s²/χ²_{1−α,n−1}

Sample Problem:
An automatic filling machine is used to fill bottles with liquid detergent.
A random sample of 20 bottles results in a sample variance of fill volume of
s² = 0.0153 (fluid ounce)². If the variance of fill volume is too large, an
unacceptable proportion of bottles will be under- or overfilled. We will
assume that the fill volume is approximately normally distributed. Find the
95% upper confidence bound on the variance and on the standard deviation.

Solution:

With n = 20, s² = 0.0153, and χ²_{0.95,19} = 10.117:

σ² ≤ (19)(0.0153)/10.117 = 0.0287 (fluid ounce)²

Taking the square root of both sides: σ ≤ 0.17

Therefore, at the 95% level of confidence, the data indicate that the process
standard deviation could be as large as 0.17 fluid ounce. The process engineer
or manager now needs to determine whether a standard deviation this large
could lead to an operational problem with under- or over-filled bottles.
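
The upper bound can be checked with the short sketch below. The value 10.117 is the table entry for χ²_{0.95,19}, entered here as a constant; the names are illustrative and not from the slides.

```python
from math import sqrt


def variance_upper_bound(s2, n, chi2_lower):
    """One-sided upper bound on sigma^2: (n - 1) * s2 / chi^2_{1-alpha, n-1}."""
    return (n - 1) * s2 / chi2_lower


CHI2_095_19 = 10.117  # chi^2_{0.95,19}, taken from a chi-square table

var_upper = variance_upper_bound(s2=0.0153, n=20, chi2_lower=CHI2_095_19)
sd_upper = sqrt(var_upper)
print(f"sigma^2 <= {var_upper:.4f} (fluid ounce)^2")
print(f"sigma   <= {sd_upper:.2f} fluid ounce")
```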
7.4 Two Samples: Estimating the Difference between Two Means

Formula: x̄₁ − x̄₂
where x̄₁ is the mean of the first sample and x̄₂ is the mean of the second
sample. This formula represents the difference in the sample means, and it
provides an estimate of the difference between the population means. However,
this estimate may be affected by sampling variability, and it is necessary to test
whether the observed difference is statistically significant before drawing
conclusions about the population means. This is typically done using a t-test or
a z-test, depending on the sample size and the assumptions made about the
population variances.
• Confidence Interval for Difference between Two Means,
Variances Known

If x̄₁ and x̄₂ are means of independent random samples of sizes n₁ and n₂
from populations with known variances σ₁² and σ₂², respectively, a
100(1 − α)% confidence interval for μ₁ − μ₂ is given by

(x̄₁ − x̄₂) − z_{α/2}√(σ₁²/n₁ + σ₂²/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + z_{α/2}√(σ₁²/n₁ + σ₂²/n₂)

Where:

• CI : is the confidence interval for the difference between the means
• x̄₁ : is the sample mean for the first group
• x̄₂ : is the sample mean for the second group
• z_{α/2} : is the z-value leaving an area of α/2 to the right
• σ₁² and σ₂² : are the known population variances of the two groups
• n₁ and n₂ : are the sample sizes for the two groups

Example 1:

Suppose we want to estimate the difference in the mean height between
male and female students in a certain university. We collect a random sample
of 50 male students and find their mean height to be 175 cm with a standard
deviation of 5 cm, and we collect a random sample of 60 female students and
find their mean height to be 165 cm with a standard deviation of 6 cm. We
assume that the variances of the height measurements in the male and female
populations are known to be 25 and 36, respectively. Calculate a 95%
confidence interval for the difference in the mean height between male and
female students.

Solution:

Given:
Male students: n₁ = 50, x̄₁ = 175, σ₁ = 5, σ₁² = 25
Female students: n₂ = 60, x̄₂ = 165, σ₂ = 6, σ₂² = 36
For 95% CL: z_{α/2} = z_{0.025} = 1.96 (from a standard normal distribution table)

(175 − 165) − 1.96√(25/50 + 36/60) < μ₁ − μ₂ < (175 − 165) + 1.96√(25/50 + 36/60)

7.94 < μ₁ − μ₂ < 12.06

Therefore, the 95% confidence interval for the difference in the mean height between male and female students is (7.94, 12.06) cm.
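
The height example can be verified with the sketch below. The function name two_sample_z_interval is hypothetical, and the critical value is computed with statistics.NormalDist instead of being read from the normal table.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(xbar1, xbar2, var1, var2, n1, n2, confidence):
    """CI on mu1 - mu2 for independent samples with known variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the difference between the two sample means.
    std_err = sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - z * std_err, diff + z * std_err


lower, upper = two_sample_z_interval(175, 165, 25, 36, 50, 60, 0.95)
print(f"{lower:.2f} < mu1 - mu2 < {upper:.2f}")
```

Because the whole interval lies above zero, the data are consistent with male students being taller on average by roughly 8 to 12 cm.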
Assumptions made when calculating confidence intervals:

❖ The sample is a random sample from the population of interest.
❖ The population has a normal distribution, or the sample size is large
enough for the central limit theorem to apply.
❖ The population standard deviation is known, or the sample size is large
enough to use the sample standard deviation as an estimate of the
population standard deviation.
❖ The samples are independent of each other.

Limitations of using known variances:

In practice, it is often the case that the variances of the two populations are
unknown. When this is the case, we need to estimate the variances from the
sample data. Treating estimated variances as if they were known can
lead to incorrect results and confidence intervals that are too narrow,
especially for small samples. Note that the known-variance formula does not
require the two populations to have equal variances; each variance enters
the formula separately.
7.5 Large-Sample Confidence Interval for a Population Proportion
• Normal Approximation for a Binomial Proportion

If n is large, the distribution of

Z = (X − np)/√(np(1 − p)) = (P̂ − p)/√(p(1 − p)/n)

is approximately standard normal.

• Approximate Confidence Interval on a Binomial Proportion

If p̂ is the proportion of observations in a random sample of size n that
belongs to a class of interest, an approximate 100(1 − α)% confidence interval
on the proportion p of the population that belongs to this class is

p̂ − z_{α/2}√(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z_{α/2}√(p̂(1 − p̂)/n)

where z_{α/2} is the upper 100α/2 percentage point of the standard normal distribution.

Example 1:

To estimate the proportion of students at a large college who are female, a
random sample of 120 students is selected. There are 69 female students in the
sample. Construct a 90% confidence interval for the proportion of all students
at the college who are female.

Solution:
The proportion of students in the sample who are female is
p̂ = 69/120 = 0.575, so 1 − p̂ = 1 − 0.575 = 0.425

Confidence level 90% means that α = 1 − 0.90 = 0.10 so α/2 = 0.05. We
obtain z_{0.05} = 1.645.

0.575 − 1.645√((0.575)(0.425)/120) ≤ p ≤ 0.575 + 1.645√((0.575)(0.425)/120)
0.501 ≤ p ≤ 0.649

One may be 90% confident that the true proportion of all students at the college
who are female is between 0.501 and 0.649.
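
The proportion interval can be sketched in code as follows. The function name proportion_interval is illustrative, and the critical value comes from statistics.NormalDist, so it is slightly more precise than the table value 1.645.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Approximate large-sample CI on a binomial proportion p."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 69 female students in a random sample of 120, 90% confidence.
lower, upper = proportion_interval(69, 120, 0.90)
print(f"{lower:.3f} <= p <= {upper:.3f}")
```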
• Sample Size for a Specified Error on a Binomial Proportion

In situations when the sample size can be selected, we may choose n so that we are
100(1 − α)% confident that the error is less than some specified value E. If we
set E = z_{α/2}√(p̂(1 − p̂)/n) and solve for n, the appropriate sample size is

n = (z_{α/2}/E)² p̂(1 − p̂)

The confidence interval formula for estimating a population proportion p is p̂ ± E.


There is a dilemma here: the formula for estimating how large a sample to
take contains the number p̂, which we know only after we have taken the
sample. There are two ways out of this dilemma. Typically the researcher will
have some idea as to the value of the population proportion p, hence of what
the sample proportion p̂ is likely to be. For example, if last month 37% of all
voters thought that state taxes are too high, then it is likely that the proportion
with that opinion this month will not be dramatically different, and we would
use the value 0.37 for p̂ in the formula.

The second approach to resolving the dilemma is simply to replace $\hat{p}$ in the
formula by 0.5. This works because if $\hat{p}$ is large then $1 - \hat{p}$ is small, and
vice versa, which limits their product to a maximum value of 0.25, attained when
$\hat{p} = 0.5$. This is called the most conservative estimate, since it gives the
largest possible estimate of n.
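
The bound of 0.25 follows from completing the square. The short derivation below is added as a reasoning step and uses the sample-size formula applied in the worked examples of this section:

$$\hat{p}(1-\hat{p}) = \frac{1}{4} - \left(\hat{p} - \frac{1}{2}\right)^2 \le \frac{1}{4} \quad\Longrightarrow\quad n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) \le \frac{z_{\alpha/2}^2}{4E^2}$$

Equality holds only at $\hat{p} = 0.5$, so a sample of size $z_{\alpha/2}^2/(4E^2)$, rounded up, is large enough whatever the sample proportion turns out to be.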

Example 1:

Find the necessary minimum sample size to construct a 98% confidence
interval for $p$ with a margin of error $E = 0.05$,

a) assuming that no prior knowledge about $p$ is available; and
b) assuming that prior studies suggest that $p$ is about 0.1.

Solution:
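A confidence level of 98% means that $\alpha = 1 - 0.98 = 0.02$, so $\alpha/2 = 0.01$. We obtain
$z_{0.01} = 2.33$, and $E = 0.05$.

a) Since there is no prior knowledge of $p$, we make the most conservative estimate
that $\hat{p} = 0.5$. Then

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) = \frac{(2.33)^2(0.5)(1-0.5)}{(0.05)^2} = 542.89$$

which rounds up to $n = 543$.

b) Since $p \approx 0.1$, we estimate $\hat{p}$ by 0.1 and obtain

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) = \frac{(2.33)^2(0.1)(1-0.1)}{(0.05)^2} = 195.44$$

which rounds up to $n = 196$.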

Example 2:

A dermatologist wishes to estimate the proportion of young adults who
apply sunscreen regularly before going out in the sun in the summer. Find
the minimum sample size required to estimate the proportion to within three
percentage points, at 90% confidence.

Solution:

A confidence level of 90% means that $\alpha = 1 - 0.90 = 0.10$, so $\alpha/2 = 0.05$. We
obtain $z_{0.05} = 1.645$.
Since there is no prior knowledge of $p$, we make the most conservative
estimate that $\hat{p} = 0.5$. To estimate “to within three percentage points” means
that $E = 0.03$. Then

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) = \frac{(1.645)^2(0.5)(1-0.5)}{(0.03)^2} = 751.6736$$

Rounding up, $\mathbf{n = 752}$.
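
The sample-size calculations in both examples can be reproduced with a small Python sketch. The function name `sample_size_for_proportion` and its parameters are illustrative, not from the slides. The z-values are passed in as the rounded table values used above, so that the results match the hand calculations.

```python
from math import ceil


def sample_size_for_proportion(z, margin, p_guess=0.5):
    """Smallest whole n with z * sqrt(p(1-p)/n) <= margin.

    p_guess defaults to 0.5, the most conservative choice.
    """
    n_exact = (z / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n_exact)  # always round up, never to the nearest integer


# Example 1: 98% confidence (table value z = 2.33), E = 0.05
n_conservative = sample_size_for_proportion(2.33, 0.05)   # p unknown
n_prior = sample_size_for_proportion(2.33, 0.05, 0.1)     # prior guess p = 0.1
# Example 2: 90% confidence (table value z = 1.645), E = 0.03
n_sunscreen = sample_size_for_proportion(1.645, 0.03)

print(n_conservative, n_prior, n_sunscreen)  # 543 196 752
```

Results near a rounding boundary depend on how many decimals of z are kept. With a more precise quantile of about 2.326 instead of 2.33, part (a) of Example 1 would come out as 542 rather than 543.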
• Approximate One-Sided Confidence Bounds on a Binomial Proportion

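The approximate 100(1 − α)% lower and upper confidence bounds are

$$\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \qquad \text{and} \qquad p \le \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

respectively.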
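
As a rough illustration, the sketch below computes one-sided bounds for the student data of the earlier example (69 of 120). It assumes the usual large-sample form, $\hat{p} \mp z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$, in which the whole of α sits in one tail. The function name `proportion_bounds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_bounds(successes, n, confidence):
    """Approximate one-sided lower and upper confidence bounds for p."""
    p_hat = successes / n
    # One-sided bounds use z_alpha (all of alpha in one tail), not z_{alpha/2}.
    z = NormalDist().inv_cdf(confidence)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower_bound, upper_bound = proportion_bounds(69, 120, 0.90)
print(f"90% lower bound: p >= {lower_bound:.3f}")
print(f"90% upper bound: p <= {upper_bound:.3f}")
```

Each bound is a separate 90% statement. Taken together, the pair does not form a 90% two-sided interval.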
7.6 Prediction Interval for Future Observation
Prediction Interval
• A prediction interval provides bounds on one (or more) future observations
from the population.
• It represents the uncertainty of predicting the value of a single future
observation, or of a fixed number of future observations, from a
population, based on the distribution or scatter of a number of previous
observations.
• Every prediction interval comes with a level of confidence, a percentage
that expresses how likely it is that the next observation will actually
fall within the predicted range.
• These are particularly associated with regression analysis, which involves
modeling the relationship between two different variables.

PREDICTION INTERVAL VS CONFIDENCE INTERVAL

The basic distinction between the two is that the prediction interval
predicts in what range a future individual observation will fall, while a
confidence interval shows the likely range of values associated with some
statistical parameter of the data, such as the population mean.
Confidence intervals refer to the mean (or other parameters) of populations
as a whole, while prediction intervals only ever refer to a single data point.
• Prediction Interval of a Future Observation, Variance Known

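For a normal distribution of measurements with unknown mean μ and known
variance σ², a 100(1 − α)% prediction interval of a future observation $X_0$ is

$$\bar{x} - z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}} < X_0 < \bar{x} + z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}}$$

where $z_{\alpha/2}$ is the z-value leaving an area of α/2 to the right.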
• Prediction Interval of a Future Observation, Variance Unknown

For a normal distribution of measurements with unknown mean μ and unknown
variance σ², a 100(1 − α)% prediction interval of a future observation $X_0$ is

$$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}} < X_0 < \bar{x} + t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}}$$

where $t_{\alpha/2}$ is the t-value with v = n − 1 degrees of freedom, leaving an area of
α/2 to the right.

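One-sided prediction intervals can also be constructed. Upper prediction
bounds apply in cases where focus must be placed on future large observations.
Concern over future small observations calls for the use of lower prediction
bounds. The upper bound is given by

$$X_0 < \bar{x} + t_{\alpha}\,s\sqrt{1 + \frac{1}{n}}$$

where $t_{\alpha}$ is the t-value with v = n − 1 degrees of freedom, leaving an area of
α to the right.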

Example 1.
A meat inspector has randomly selected 30 packs of 95% lean beef. The
sample resulted in a mean of 96.2% with a sample standard deviation of
0.8%. Find a 99% prediction interval for the leanness of a new pack.
Assume normality.

Solution:
For v = 29 degrees of freedom, $t_{0.005} = 2.756$. Hence, a 99% prediction
interval for a new observation $X_0$ is

$$96.2 - (2.756)(0.8)\sqrt{1 + \frac{1}{30}} < X_0 < 96.2 + (2.756)(0.8)\sqrt{1 + \frac{1}{30}}$$

which reduces to (93.96, 98.44).

Notice that the prediction interval is considerably wider than the corresponding
CI on the mean. This is because the CI estimates a parameter, whereas the PI
must also allow for the variability of a single future observation.
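
To see how much wider the prediction interval is, the sketch below computes both half-widths for the lean beef sample. It assumes the usual t interval on the mean, $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$ (Section 7.2), for the comparison. The function names are illustrative, and the t-value is the table value quoted above rather than one computed in code.

```python
from math import sqrt


def ci_half_width(s, n, t_crit):
    """Half-width of the CI on the mean: t * s / sqrt(n)."""
    return t_crit * s / sqrt(n)


def pi_half_width(s, n, t_crit):
    """Half-width of the PI for one future observation: t * s * sqrt(1 + 1/n)."""
    return t_crit * s * sqrt(1 + 1 / n)


x_bar, s, n = 96.2, 0.8, 30
t_crit = 2.756  # table value t_{0.005} with 29 degrees of freedom

ci = ci_half_width(s, n, t_crit)
pi = pi_half_width(s, n, t_crit)
print(f"99% CI on the mean: ({x_bar - ci:.2f}, {x_bar + ci:.2f})")
print(f"99% PI for X0:      ({x_bar - pi:.2f}, {x_bar + pi:.2f})")
```

The CI half-width shrinks toward zero as n grows, while the PI half-width approaches $t_{\alpha/2}\,s$ and never falls below the spread of a single observation.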
• Prediction Interval of a Future Observation, Variance Known

Example 2.
Due to the decrease in interest rates, the First Citizens Bank received a
lot of mortgage applications. A recent sample of 50 mortgage loans resulted
in an average loan amount of $257,300. Assume a population standard
deviation of $25,000. For the next customer who fills out a mortgage
application, find a 95% prediction interval for the loan amount.

Solution:
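Here $\bar{x} = 257{,}300$ and $\sigma = 25{,}000$ (both in dollars), $n = 50$, and $z_{0.025} = 1.96$.

Hence, a 95% prediction interval for the future loan amount is

$$257{,}300 - (1.96)(25{,}000)\sqrt{1 + \frac{1}{50}} < X_0 < 257{,}300 + (1.96)(25{,}000)\sqrt{1 + \frac{1}{50}}$$

$$\$207{,}812 < X_0 < \$306{,}788$$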
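
A minimal Python sketch of the known-variance prediction interval follows. It assumes the form $\bar{x} \pm z_{\alpha/2}\,\sigma\sqrt{1 + 1/n}$ used in this example. The function name `prediction_interval_known_sigma` is hypothetical, and the z-value comes from the standard library instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval_known_sigma(x_bar, sigma, n, confidence):
    """PI for one future observation from a normal population, sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma * sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width


# Example 2: mortgage loans, n = 50, sigma = $25,000, 95% prediction interval.
low, high = prediction_interval_known_sigma(257_300, 25_000, 50, 0.95)
print(f"${low:,.0f} < X0 < ${high:,.0f}")
```

The limits agree with the hand calculation to within about a dollar. The small difference arises because the code uses an unrounded z-value rather than 1.96.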


7.7 Tolerance Interval
• Tolerance Interval

A tolerance interval is another important type of interval estimate. For
example, the chemical product viscosity data might be assumed to be
normally distributed. We might like to calculate limits that bound 95% of the
viscosity values.
Tolerance intervals (also called enclosure intervals) are similar to
prediction intervals, but they cover a fixed proportion of the population
rather than a fixed number of future observations. They give the range in
which we expect a specified proportion of the population to lie.
Tolerance intervals can be two-sided (a range with a specified
minimum and maximum) or one-sided (a range where one limit is either
negative infinity or positive infinity).

A tolerance interval for capturing at least $y$% of the values in a normal
distribution with confidence level 100(1 − α)% is

$$\bar{x} - ks, \qquad \bar{x} + ks$$

where $k$ is a tabulated tolerance interval factor. Values are given for
$y$ = 90%, 95%, and 99%, and for 90%, 95%, and 99% confidence.
This interval is very sensitive to the normality assumption. One-sided
tolerance bounds can also be computed.

Example 1.
Consider Example 7.6. With the information given, find a tolerance
interval that gives two-sided 95% bounds on 90% of the distribution of
packages of 95% lean beef. Assume the data came from an approximately
normal distribution.
Recall from Example 7.6 that $n = 30$, the sample mean is 96.2%, and the
sample standard deviation is 0.8%. From Table I, $k = 2.14$.

Solution:

$$\bar{x} \pm ks = 96.2 \pm (2.14)(0.8)$$

From this we find that the lower and upper bounds are 94.5 and 97.9. We are 95%
confident that this range covers the central 90% of the distribution of
95% lean beef packages.
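
The same limits can be obtained with a few lines of Python. This is only a sketch: the function name `tolerance_interval` is illustrative, and the factor k must still be read from a tolerance-factor table, since it is not computed here.

```python
def tolerance_interval(x_bar, s, k):
    """Two-sided normal tolerance limits x_bar - k*s and x_bar + k*s."""
    return x_bar - k * s, x_bar + k * s


# k = 2.14 is the tabulated factor quoted in the example
# (n = 30, 90% coverage, 95% confidence).
low, high = tolerance_interval(96.2, 0.8, 2.14)
print(f"({low:.1f}, {high:.1f})")  # (94.5, 97.9)
```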
Tolerance intervals are often confused with confidence intervals and prediction
intervals. They are not the same thing:

• A confidence interval contains a parameter (like a population mean) with a certain
confidence level. In other words, it tells you about the likely location of a population
parameter. For example, you might have 95% confidence that mean battery life is from
100 to 110 hours. That means that if you repeat your experiment over and over again,
about 95% of the intervals constructed this way will contain the true mean battery life.

• A prediction interval tells you where a future value will probably fall. For
example, a 95% prediction interval of 90 to 120 hours for the life of a battery tells
you that the life of the next battery produced is expected to fall in that range with
95% confidence. Prediction intervals are usually wider than confidence intervals.

• A tolerance interval covers a specified proportion of the population with a given
confidence level. For example, you might be 95% confident that at least 75% of
batteries have lives in the interval 90 to 120 hours.
