Chapter 7 Statistical Intervals
ENGINEERING
DATA ANALYSIS
ENGR. MARY CRIS L. AYING-TAMPOS
FACULTY, CET
7. Statistical Intervals
7.1 Single Sample: Estimating the Mean
7.2 Confidence Interval on the Mean of a Normal
Distribution, Variance Unknown
7.3 Confidence Interval on the Variance and Standard
Deviation of a Normal Distribution
7.4 Two Samples: Estimating the Difference between
Two Means
7.5 Large-Sample Confidence Interval for a Population
Proportion
7.6 Prediction Interval for Future Observation
7.7 Tolerance Interval
Course References
(1) Walpole, Ronald E., et al., 2016, “Probability and Statistics for Engineers and Scientists”, 9th Ed., Pearson Education Inc.
(2) Montgomery, Douglas C., et al., 2018, “Applied Statistics and Probability for Engineers”, 7th Ed., John Wiley & Sons (Asia) Pte Ltd.
(3) Spiegel, Murray R., et al., 2013, “Probability and Statistics”, 4th Ed., McGraw Hill Companies Inc.
Grading System
Attendance : 5%
Quizzes/Participation : 15%
Prelim Exam : 20%
Midterm Exam : 20%
Prefinal Exam : 20%
Final Exam : 20%
100%
Our experience has been that it is easy to confuse the three types of intervals. For
example, a confidence interval is often reported when the problem situation calls for
a prediction interval.
❖ Confidence Interval on the Mean of a Normal
Distribution, Variance Known
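If $\bar{x}$ is the sample mean of a random sample of size n from a normal population with known variance $\sigma^2$, a 100(1 − α)% confidence interval on μ is
$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution.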
For small samples selected from non-normal populations, we cannot
expect our degree of confidence to be accurate. However, for samples of size
n ≥ 30, with the shape of the distributions not too skewed, sampling theory
guarantees good results.
In compact form the interval is $\bar{X} \pm E$, with margin of error $E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ and significance level $\alpha = 1 - CL$.
Given: σ = 5.6, n = 40, $\bar{X}$ = 32
Solution: a) population mean with 80% confidence level
$\alpha = 1 - 0.80 = 0.20$, $\alpha/2 = 0.10$, $z_{0.10} = 1.28$
$$E = 1.28\,\frac{5.6}{\sqrt{40}} = 1.13$$
so the 80% interval is 32 ± 1.13, or 30.87 ≤ μ ≤ 33.13.
EXCEL: CONFIDENCE(0.2, 5.6, 40) = 1.13
Solution: b) population mean with 90% confidence level
$\alpha = 1 - 0.90 = 0.10$, $\alpha/2 = 0.05$, $z_{0.05} = 1.645$
$$E = 1.645\,\frac{5.6}{\sqrt{40}} = 1.46$$
so the 90% interval is 32 ± 1.46, or 30.54 ≤ μ ≤ 33.46.
EXCEL: CONFIDENCE(0.1, 5.6, 40) = 1.46
Solution: c) population mean with 98% confidence level
$\alpha = 1 - 0.98 = 0.02$, $\alpha/2 = 0.01$, $z_{0.01} = 2.33$
$$E = 2.33\,\frac{5.6}{\sqrt{40}} = 2.06$$
so the 98% interval is 32 ± 2.06, or 29.94 ≤ μ ≤ 34.06.
EXCEL: CONFIDENCE(0.02, 5.6, 40) = 2.06
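The three margins of error above can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the function names `z_margin` and `z_interval` are illustrative choices, not part of the slides, and the z-values are computed exactly rather than read from a table, so they differ from 1.28, 1.645, and 2.33 in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def z_margin(sigma, n, confidence):
    """Margin of error E = z_{alpha/2} * sigma / sqrt(n), sigma known."""
    alpha = 1.0 - confidence
    # inv_cdf(1 - alpha/2) is the upper alpha/2 point of the standard normal
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sigma / sqrt(n)


def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval xbar +/- E for the mean."""
    e = z_margin(sigma, n, confidence)
    return xbar - e, xbar + e


# Values from the worked example: sigma = 5.6, n = 40, xbar = 32
for cl in (0.80, 0.90, 0.98):
    lo, hi = z_interval(32, 5.6, 40, cl)
    print(f"CL={cl:.2f}  E={z_margin(5.6, 40, cl):.2f}  CI=({lo:.2f}, {hi:.2f})")
```

`z_margin` plays the same role as the Excel `CONFIDENCE(alpha, sigma, n)` call shown on the slides, except that it takes the confidence level rather than α.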
Example 2. ASTM Standard E23 defines standard test methods for notched
bar impact testing of metallic materials. The Charpy V-notch (CVN)
technique measures impact energy and is often used to determine whether or
not a material experiences a ductile-to-brittle transition with decreasing
temperature. Ten measurements of impact energy (J) on specimens of A238
steel cut at 60℃ are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8,
64.2, and 64.3. Assume that impact energy is normally distributed with σ = 1
J. Find a 95% CI for μ, the mean impact energy.
Solution:
The required quantities are $z_{\alpha/2} = z_{0.025} = 1.96$, n = 10, σ = 1, and $\bar{x}$ = 64.46. The resulting 95% CI is
$$64.46 - 1.96\,\frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96\,\frac{1}{\sqrt{10}}$$
$$63.84 \le \mu \le 65.08$$
Based on the sample data, a range of highly plausible values for mean impact energy for A238 steel at 60℃ is 63.84 J ≤ μ ≤ 65.08 J.
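As a check on Example 2, the interval can be recomputed directly from the ten impact-energy measurements. This is only a sketch: the variable names are illustrative, and σ = 1 J is taken as known, as the example assumes.

```python
from math import sqrt
from statistics import NormalDist, mean

# Impact energy (J) for the ten A238 steel specimens in Example 2
energies = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0  # population standard deviation, assumed known (J)

n = len(energies)
xbar = mean(energies)
z = NormalDist().inv_cdf(0.975)  # upper 2.5% point for a 95% two-sided CI
half_width = z * sigma / sqrt(n)

lower, upper = xbar - half_width, xbar + half_width
print(f"xbar = {xbar:.2f} J, 95% CI: {lower:.2f} J <= mu <= {upper:.2f} J")
```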
❖ Confidence Interval on the Mean of a Normal
Distribution, Variance Known
Practice Problem:
The average zinc concentration from a sample of measurements taken in 36
different locations in a river is found to be 2.6 grams per milliliter. Find the
95% and 99% confidence intervals for the mean zinc concentration in the
river. Assume that the population standard deviation is 0.3 gram per milliliter.
𝑧0.025 = 1.96
𝑧0.005 = 2.58
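Solution:
With $\bar{x}$ = 2.6, σ = 0.3, and n = 36, the standard error is $\sigma/\sqrt{n} = 0.3/6 = 0.05$.
95% CI: 2.6 ± (1.96)(0.05), or 2.50 < μ < 2.70
99% CI: 2.6 ± (2.58)(0.05), or 2.47 < μ < 2.73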
▪ Interpreting a Confidence Interval
[Figure: Repeated construction of a confidence interval for μ]
Generally, for a fixed sample size n and standard deviation σ, the higher the
confidence level, the longer the resulting CI. The length of a confidence
interval is a measure of the precision of estimation. It is desirable to obtain a
confidence interval that is short enough for decision-making purposes and
that also has adequate confidence. One way to achieve this is by choosing
the sample size n to be large enough to give a CI of specified length or
precision with prescribed confidence.
❖ Choice of Sample Size
The precision of the confidence interval in the equation above is $2z_{\alpha/2}\,\sigma/\sqrt{n}$. This means that in using $\bar{x}$ to estimate μ, the error $E = |\bar{x} - \mu|$ is less than or equal to $z_{\alpha/2}\,\sigma/\sqrt{n}$ with confidence 100(1 − α)%. This is shown graphically in the figure below.
If $\bar{x}$ is used as an estimate of μ, we can be 100(1 − α)% confident that the error $|\bar{x} - \mu|$ will not exceed a specified amount E when the sample size is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
If the right-hand side of the equation above is not an integer, it must be rounded up. This will ensure that the level of confidence does not fall below 100(1 − α)%. Notice that 2E is the length of the resulting confidence interval.
❖ Choice of Sample Size
Example 1. Consider the CVN test described in Example 2 and suppose
that we want to determine how many specimens must be tested to ensure
that the 95% CI on μ for A238 steel cut at 60℃ has a length of at most 1.0 J.
Because the bound on error in estimation E is one-half of the length of the CI, we use E = 0.5.
Solution:
E = 0.5, σ = 1, and $z_{\alpha/2} = 1.96$.
$$n = \left[\frac{(1.96)(1)}{0.5}\right]^2 = 15.37$$
and because n must be an integer, the required sample size is n = 16.
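The rounding-up rule is easy to get wrong by hand, so a small helper may be useful. This is a sketch under the assumptions of this section (σ known, z-value supplied from the table); `sample_size_for_mean` is a hypothetical name.

```python
from math import ceil


def sample_size_for_mean(sigma, E, z):
    """Smallest integer n with z * sigma / sqrt(n) <= E.

    sigma: known population standard deviation
    E:     largest acceptable error (half the CI length)
    z:     z_{alpha/2} for the desired confidence level
    """
    return ceil((z * sigma / E) ** 2)


# CVN example: CI length at most 1.0 J means E = 0.5, with sigma = 1, z = 1.96
n_required = sample_size_for_mean(sigma=1.0, E=0.5, z=1.96)
print(n_required)
```

Halving E to 0.25 would roughly quadruple the required sample size, since n grows with 1/E².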
❖ Choice of Sample Size
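Notice the general relationship between sample size, desired length of the confidence interval 2E, confidence level 100(1 − α)%, and standard deviation σ:
▪ As the desired length of the interval 2E decreases, the required sample size n increases for a fixed value of σ and specified confidence.
▪ As σ increases, the required sample size n increases for a fixed desired length 2E and specified confidence.
▪ As the level of confidence increases, the required sample size n increases for a fixed desired length 2E and standard deviation σ.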
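❖ One-sided Confidence Bounds on Mean of a Normal Distribution, Variance Known
A 100(1 − α)% upper-confidence bound for μ is $\mu \le \bar{x} + z_{\alpha}\,\sigma/\sqrt{n}$, and a 100(1 − α)% lower-confidence bound for μ is $\bar{x} - z_{\alpha}\,\sigma/\sqrt{n} \le \mu$.
Example 1. The same data for impact testing from Example 2 are used to construct a lower, one-sided 95% confidence interval for the mean impact energy. Recall that $\bar{x}$ = 64.46, σ = 1 J, and n = 10. What is the interval?
Solution:
$z_{0.05} = 1.645$
$$64.46 - 1.645\,\frac{1}{\sqrt{10}} \le \mu$$
$$63.94 \le \mu$$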
Practical Interpretation: The lower limit for the two-sided interval in Example
2 was 63.84. Because $z_{\alpha} < z_{\alpha/2}$, the lower limit of a one-sided interval is always
greater than the lower limit of a two-sided interval of equal confidence. The one-
sided interval does not bound μ from above so that it still achieves 95% confidence
with a slightly larger lower limit. If our interest is only in the lower limit for μ, then
the one-sided interval is preferred because it provides equal confidence with a
greater limit. Similarly, a one-sided upper limit is always less than a two-sided upper
limit of equal confidence.
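The comparison in the practical interpretation can be reproduced numerically. The sketch below uses the CVN values ($\bar{x}$ = 64.46, σ = 1, n = 10) and the table values $z_{0.05}$ = 1.645 and $z_{0.025}$ = 1.96; the helper name `lower_bound` is an illustrative choice.

```python
from math import sqrt


def lower_bound(xbar, sigma, n, z):
    """Lower confidence limit xbar - z * sigma / sqrt(n), sigma known."""
    return xbar - z * sigma / sqrt(n)


xbar, sigma, n = 64.46, 1.0, 10

one_sided = lower_bound(xbar, sigma, n, z=1.645)  # uses z_alpha
two_sided = lower_bound(xbar, sigma, n, z=1.96)   # uses z_{alpha/2}

print(f"one-sided 95% lower bound: {one_sided:.2f} J")
print(f"two-sided 95% lower limit: {two_sided:.2f} J")
```

Because the one-sided bound spends all of α in one tail, it uses the smaller multiplier and sits closer to $\bar{x}$.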
❖ Large-Sample Confidence Interval on the Mean
Solution:
The summary statistics for these
data are as follows:
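The required quantities are n = 53, $\bar{x}$ = 0.5250, s = 0.3486, and $z_{0.025} = 1.96$. The approximate 95% CI on μ is
$$\bar{x} - z_{0.025}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{0.025}\,\frac{s}{\sqrt{n}}$$
$$0.5250 - 1.96\,\frac{0.3486}{\sqrt{53}} \le \mu \le 0.5250 + 1.96\,\frac{0.3486}{\sqrt{53}}$$
$$0.4311 \le \mu \le 0.6189$$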
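For large n the sample standard deviation s simply replaces σ in the z interval. A minimal sketch using the summary statistics quoted above (the function name `large_sample_interval` is hypothetical):

```python
from math import sqrt


def large_sample_interval(xbar, s, n, z):
    """Approximate CI xbar +/- z * s / sqrt(n), intended for large n."""
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Summary statistics from the example: n = 53, xbar = 0.5250, s = 0.3486
ci_low, ci_high = large_sample_interval(0.5250, 0.3486, 53, z=1.96)
print(f"{ci_low:.4f} <= mu <= {ci_high:.4f}")
```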
7.2 Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
If $\bar{x}$ and s are the sample mean and standard deviation of a random sample from a normal distribution with unknown variance $\sigma^2$, a 100(1 − α)% confidence interval on μ is given by
$$\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ is the upper 100α/2 percentage point of the t distribution with n − 1 degrees of freedom.
Example 1: An article in the Journal of Materials Engineering
[“Instrumented Tensile Adhesion Tests on Plasma Sprayed Thermal Barrier
Coatings” (1989, Vol. 11(4), pp. 275–282)] describes the results of tensile
adhesion tests on 22 U-700 alloy specimens. Find the 95% confidence
interval (CI).
The load at specimen failure is as follows (in megapascals):
𝑡0.025,21 = 2.080
Solution:
With n = 22, $\bar{x}$ = 13.71, s = 3.55, and $t_{0.025,21} = 2.080$:
$$13.71 - 2.080\left(\frac{3.55}{\sqrt{22}}\right) \le \mu \le 13.71 + 2.080\left(\frac{3.55}{\sqrt{22}}\right)$$
$$13.71 - 1.57 \le \mu \le 13.71 + 1.57$$
$$12.14 \le \mu \le 15.28$$
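The same arithmetic can be scripted. The Python standard library has no t distribution, so this sketch takes the critical value from the t table as an argument (2.080 for 21 degrees of freedom, as above); `t_interval` is an illustrative name.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """CI xbar +/- t_crit * s / sqrt(n); t_crit = t_{alpha/2, n-1} from a table."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Tensile adhesion example: n = 22, xbar = 13.71, s = 3.55, t_{0.025,21} = 2.080
t_low, t_high = t_interval(13.71, 3.55, 22, t_crit=2.080)
print(f"{t_low:.2f} <= mu <= {t_high:.2f}")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 21)` should return the same critical value, but that dependency is not assumed here.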
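7.3 Confidence Interval on the Variance and Standard Deviation of a Normal Distribution
• $\chi^2$ Distribution
If $S^2$ is the sample variance of a random sample of n observations from a normal distribution with variance $\sigma^2$, the statistic
$$X^2 = \frac{(n-1)S^2}{\sigma^2}$$
has a chi-square distribution with n − 1 degrees of freedom.
• Confidence Interval on the Variance
A 100(1 − α)% confidence interval on $\sigma^2$ is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the upper and lower 100α/2 percentage points of the chi-square distribution with n − 1 degrees of freedom, respectively.
• Confidence Interval on the Standard Deviation
A confidence interval for σ has lower and upper limits that are the square roots of the corresponding limits for $\sigma^2$:
$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}} \le \sigma \le \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}}$$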
Sample Problem 1:
A large candy manufacturer produces, packages and sells packs of
candy targeted to weigh 52 grams. A quality control manager working
for the company was concerned that the variation in the actual weights
of the targeted 52-gram packs was larger than acceptable. That is, he
was concerned that some packs weighed significantly less than 52 grams
and some weighed significantly more than 52 grams. In an
attempt to estimate σ, the standard deviation of the weights of all of
the 52-gram packs the manufacturer makes, he took a random sample
of n = 10 packs off the factory line. The random sample yielded a
sample variance of 4.2 grams². Use the random sample to derive a 95%
confidence interval for σ.
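Solution:
With n = 10, $s^2$ = 4.2, $\chi^2_{0.025,9} = 19.02$, and $\chi^2_{0.975,9} = 2.70$:
$$\sqrt{\frac{(9)(4.2)}{19.02}} \le \sigma \le \sqrt{\frac{(9)(4.2)}{2.70}}$$
$$1.41 \le \sigma \le 3.74 \text{ (grams)}$$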
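A sketch of the candy-pack calculation in code is given below. The chi-square percentage points (19.023 and 2.700 for 9 degrees of freedom) are assumed to be read from a chi-square table and passed in by hand, because the Python standard library does not provide them; `variance_interval` is a hypothetical helper name.

```python
from math import sqrt


def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Two-sided CI on sigma^2 for a normal population.

    chi2_upper: chi^2_{alpha/2, n-1}      (upper percentage point)
    chi2_lower: chi^2_{1 - alpha/2, n-1}  (lower percentage point)
    """
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower


# Sample Problem 1: n = 10 packs, s^2 = 4.2 grams^2, 95% confidence
var_low, var_high = variance_interval(4.2, 10, chi2_upper=19.023, chi2_lower=2.700)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(f"{var_low:.2f} <= sigma^2 <= {var_high:.2f}  (grams^2)")
print(f"{sd_low:.2f} <= sigma <= {sd_high:.2f}  (grams)")
```

Note that the larger chi-square value produces the lower limit, which is why the arguments are named by percentage point rather than by which limit they give.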
Sample Problem 2:
A sample of 7 boxes of a certain type of cereal with a nominal weight of 750 g had the following weights: 775, 780, 781, 795, 803, 810, 823. Given: n = 7, $\bar{x}$ = 795.3, $s^2$ = 315.574.
Find a 95% confidence interval for $\sigma^2$.
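Solution:
Applying the confidence interval on the variance with n − 1 = 6 degrees of freedom and α = 0.05 gives 131.04 ≤ $\sigma^2$ ≤ 1530.24.
Therefore, we can be 95% confident that $\sigma^2$, the true variance of weights of cereal in boxes of this type, lies between 131.04 and 1530.24 g².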
• One-Sided Confidence Bounds on the Variance
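The 100(1 − α)% lower and upper confidence bounds on $\sigma^2$ are, respectively,
$$\frac{(n-1)s^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2 \qquad \text{and} \qquad \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}}$$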
Sample Problem:
An automatic filling machine is used to fill bottles with liquid detergent.
A random sample of 20 bottles results in a sample variance of fill volume of
$s^2$ = 0.0153 (fluid ounce)². If the variance of fill volume is too large, an
unacceptable proportion of bottles will be under- or overfilled. We will
assume that the fill volume is approximately normally distributed. Find a
95% upper confidence bound on the variance.
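Solution:
With n = 20, $s^2$ = 0.0153, and $\chi^2_{0.95,19} = 10.117$:
$$\sigma^2 \le \frac{(19)(0.0153)}{10.117} = 0.0287 \text{ (fluid ounce)}^2$$
which corresponds to σ ≤ 0.17 fluid ounce.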
Therefore, at the 95% level of confidence, the data indicate that the process
standard deviation could be as large as 0.17 fluid ounce. The process engineer
or manager now needs to determine whether a standard deviation this large
could lead to an operational problem with under- or over-filled bottles.
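The upper bound quoted above can be reproduced as follows. This sketch assumes $s^2$ = 0.0153 (fluid ounce)², which is the value consistent with the 0.17 fluid ounce result, and a table value $\chi^2_{0.95,19}$ = 10.117; both are passed in explicitly, and `upper_variance_bound` is an illustrative name.

```python
from math import sqrt


def upper_variance_bound(s2, n, chi2_lower):
    """One-sided upper bound on sigma^2: (n - 1) * s^2 / chi^2_{1-alpha, n-1}."""
    return (n - 1) * s2 / chi2_lower


# Detergent filling: n = 20 bottles, s^2 assumed 0.0153, chi^2_{0.95,19} = 10.117
var_upper = upper_variance_bound(0.0153, 20, chi2_lower=10.117)
sd_upper = sqrt(var_upper)

print(f"sigma^2 <= {var_upper:.4f} (fluid ounce)^2")
print(f"sigma   <= {sd_upper:.2f} fluid ounce")
```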
7.4 Two Samples: Estimating the Difference between Two Means
Formula: the point estimate of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$,
where $\bar{x}_1$ is the mean of the first sample and $\bar{x}_2$ is the mean of the second
sample. This difference in the sample means provides an estimate of the
difference between the population means. However, this estimate is subject to
sampling variability, so it should be accompanied by a confidence interval (or a
test of statistical significance) before drawing conclusions about the population
means. This is typically done using a t or z procedure, depending on the sample
sizes and the assumptions made about the population variances.
• Confidence Interval for Difference between Two Means,
Variances Known
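If $\bar{x}_1$ and $\bar{x}_2$ are means of independent random samples of sizes $n_1$ and $n_2$ from populations with known variances $\sigma_1^2$ and $\sigma_2^2$, respectively, a 100(1 − α)% confidence interval for $\mu_1 - \mu_2$ is given by
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution.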
Example 1:
$$(175 - 165) - 1.96\sqrt{\frac{25}{50} + \frac{36}{60}} < \mu_1 - \mu_2 < (175 - 165) + 1.96\sqrt{\frac{25}{50} + \frac{36}{60}}$$
$$7.94 < \mu_1 - \mu_2 < 12.06$$
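The substitution in Example 1 can be checked with a short function. This is a sketch only: the inputs (means 175 and 165, variances 25 and 36, sample sizes 50 and 60, z = 1.96) are read off the expression above, and `two_sample_z_interval` is a hypothetical name.

```python
from math import sqrt


def two_sample_z_interval(xbar1, xbar2, var1, var2, n1, n2, z):
    """CI for mu1 - mu2 with known population variances var1 and var2."""
    diff = xbar1 - xbar2
    half_width = z * sqrt(var1 / n1 + var2 / n2)
    return diff - half_width, diff + half_width


d_low, d_high = two_sample_z_interval(175, 165, 25, 36, 50, 60, z=1.96)
print(f"{d_low:.2f} < mu1 - mu2 < {d_high:.2f}")
```

Each population contributes its own variance term, so the function does not assume the two variances are equal.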
In practice, it is often the case that the variances of the two populations are
unknown. When this is the case, we need to estimate the variances from the
sample data. Treating sample estimates as if they were known variances ignores
the extra uncertainty in those estimates and can lead to confidence intervals that
are too narrow, especially for small samples. Note that the known-variance interval
does not require the two population variances to be equal; an equal-variance
assumption arises only when the variances are unknown and a pooled estimate is used.
7.5 Large-Sample Confidence Interval for a Population Proportion
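• Normal Approximation for a Binomial Proportion
If n is large, the distribution of
$$Z = \frac{X - np}{\sqrt{np(1-p)}} = \frac{\hat{P} - p}{\sqrt{p(1-p)/n}}$$
is approximately standard normal.
• Approximate Confidence Interval on a Binomial Proportion
If $\hat{p}$ is the proportion of observations in a random sample of size n that belong to a class of interest, an approximate 100(1 − α)% confidence interval on the proportion p of the population that belongs to this class is
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution.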
Example 1:
Solution:
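The proportion of students in the sample who are female is
$$\hat{p} = \frac{69}{120} = 0.575, \qquad 1 - \hat{p} = 1 - 0.575 = 0.425$$
With $z_{0.05} = 1.645$, the 90% confidence interval is
$$0.575 \pm 1.645\sqrt{\frac{(0.575)(0.425)}{120}} = 0.575 \pm 0.074$$
One may be 90% confident that the true proportion of all students at the college
who are female is between 0.501 and 0.649.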
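The proportion interval is straightforward to compute from the raw counts. A minimal sketch, using the 69 out of 120 count from the example and the table value $z_{0.05}$ = 1.645 (the function name `proportion_interval` is illustrative):

```python
from math import sqrt


def proportion_interval(successes, n, z):
    """Approximate (normal-based) CI on a binomial proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 69 female students in a sample of 120, 90% confidence
p_low, p_high = proportion_interval(69, 120, z=1.645)
print(f"{p_low:.3f} <= p <= {p_high:.3f}")
```

This normal approximation is intended for large samples; it can behave poorly when $\hat{p}$ is close to 0 or 1.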
• Sample Size for a Specified Error on a Binomial Proportion
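The sample size required to be at least 100(1 − α)% confident that the error $|\hat{p} - p|$ does not exceed E is
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p})$$
There is a dilemma here: the formula for estimating how large a sample to
take contains the number $\hat{p}$, which we know only after we have taken the
sample. There are two ways out of this dilemma. First, the researcher will
typically have some idea as to the value of the population proportion p, hence of what
the sample proportion $\hat{p}$ is likely to be. For example, if last month 37% of all
voters thought that state taxes are too high, then it is likely that the proportion
with that opinion this month will not be dramatically different, and we would
use the value 0.37 for $\hat{p}$ in the formula. Second, if no prior estimate is available,
we can use $\hat{p}$ = 0.5, which maximizes $\hat{p}(1-\hat{p})$ and therefore gives a
conservative (largest) sample size.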
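To see how much the planning value matters, the sketch below evaluates $n = (z_{\alpha/2}/E)^2\,\hat{p}(1-\hat{p})$, rounded up, for the planning value 0.37 mentioned above and for the worst case 0.5. The error bound E = 0.03 and the 95% confidence level (z = 1.96) are hypothetical values chosen only for illustration, and `sample_size_for_proportion` is not a name from the slides.

```python
from math import ceil


def sample_size_for_proportion(E, z, p_plan=0.5):
    """Sample size so that the error in p_hat is at most E.

    p_plan is the planning value for the proportion; 0.5 is the
    conservative default because it maximizes p * (1 - p).
    """
    return ceil((z / E) ** 2 * p_plan * (1.0 - p_plan))


# Hypothetical target: error at most 0.03 with 95% confidence (z = 1.96)
n_with_prior = sample_size_for_proportion(E=0.03, z=1.96, p_plan=0.37)
n_conservative = sample_size_for_proportion(E=0.03, z=1.96)

print(n_with_prior, n_conservative)
```

The conservative choice never gives a smaller n than any other planning value, so it is the safer option when nothing is known about p.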
Example 1:
Solution:
Confidence level 98% means that α = 1 − 0.98 = 0.02, so α/2 = 0.01. We obtain $z_{0.01} = 2.33$, and E = 0.05.
Example 2:
Solution:
𝒏 = 𝟕𝟓𝟐
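• Approximate One-Sided Confidence Bounds on a Binomial Proportion
The approximate 100(1 − α)% lower and upper confidence bounds are, respectively,
$$\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \qquad \text{and} \qquad p \le \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$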
7.6 Prediction Interval for Future Observation
The basic distinction between a prediction interval and a confidence interval is
that the prediction interval predicts in what range a future individual observation
will fall, while a confidence interval shows the likely range of values associated
with some statistical parameter of the data, such as the population mean.
Confidence intervals refer to the mean (or other parameters) of populations
as a whole, while prediction intervals refer to a single future data point.
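• Prediction Interval of a Future Observation, Variance Known
For a normal distribution of measurements with unknown mean μ and known variance $\sigma^2$, a 100(1 − α)% prediction interval of a future observation $x_0$ is
$$\bar{x} - z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}} < x_0 < \bar{x} + z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}}$$
• Prediction Interval of a Future Observation, Variance Unknown
For a normal distribution of measurements with unknown mean μ and unknown variance $\sigma^2$, a 100(1 − α)% prediction interval of a future observation $x_0$ is
$$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}} < x_0 < \bar{x} + t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}}$$
where $t_{\alpha/2}$ is the t-value with v = n − 1 degrees of freedom.
Example 1.
A meat inspector has randomly selected 30 packs of 95% lean beef. The
sample resulted in a mean of 96.2% with a sample standard deviation of
0.8%. Find a 99% prediction interval for the leanness of a new pack.
Assume normality.
Solution:
For v = 29 degrees of freedom, $t_{0.005} = 2.756$. Hence, a 99% prediction
interval for a new observation $x_0$ is
$$96.2 - (2.756)(0.8)\sqrt{1 + \frac{1}{30}} < x_0 < 96.2 + (2.756)(0.8)\sqrt{1 + \frac{1}{30}}$$
$$93.96 < x_0 < 98.44$$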
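The lean-beef prediction interval can be verified with a few lines of code. In this sketch the critical value $t_{0.005}$ = 2.756 for 29 degrees of freedom is taken from the t table and passed in, and `prediction_interval` is a hypothetical helper name.

```python
from math import sqrt


def prediction_interval(xbar, spread, n, crit):
    """Prediction interval xbar +/- crit * spread * sqrt(1 + 1/n).

    spread: s with a t critical value (variance unknown),
            or sigma with a z critical value (variance known).
    """
    half_width = crit * spread * sqrt(1.0 + 1.0 / n)
    return xbar - half_width, xbar + half_width


# Meat inspector example: n = 30, xbar = 96.2, s = 0.8, t_{0.005,29} = 2.756
pi_low, pi_high = prediction_interval(96.2, 0.8, 30, crit=2.756)
print(f"{pi_low:.2f} < x0 < {pi_high:.2f}")
```

The extra 1 under the square root accounts for the variability of the new observation itself, which is why the interval does not shrink to zero as n grows.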
Example 2.
Due to the decrease in interest rates, the First Citizens Bank received a
lot of mortgage applications. A recent sample of 50 mortgage loans resulted
in an average loan amount of $257,300. Assume a population standard
deviation of $25,000. For the next customer who fills out a mortgage
application, find a 95% prediction interval for the loan amount.
Solution:
Sample mean = $257,300, σ = $25,000, n = 50, z(0.025) = 1.96
$$257{,}300 - (1.96)(25{,}000)\sqrt{1 + \frac{1}{50}} < x_0 < 257{,}300 + (1.96)(25{,}000)\sqrt{1 + \frac{1}{50}}$$
$$207{,}812.43 < x_0 < 306{,}787.57$$
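For the mortgage example it is instructive to compute the prediction interval next to the confidence interval for the mean from the same data. The sketch below uses the values given in Example 2; the variable names are illustrative.

```python
from math import sqrt

xbar, sigma, n, z = 257_300.0, 25_000.0, 50, 1.96

# Half-width of the 95% prediction interval for one new loan amount
pi_half = z * sigma * sqrt(1.0 + 1.0 / n)
# Half-width of the 95% confidence interval for the mean loan amount
ci_half = z * sigma / sqrt(n)

loan_low, loan_high = xbar - pi_half, xbar + pi_half
print(f"prediction interval: {loan_low:,.2f} < x0 < {loan_high:,.2f}")
print(f"CI half-width {ci_half:,.2f} vs PI half-width {pi_half:,.2f}")
```

The prediction interval is several times wider because it must cover a single loan, not the average of many loans.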
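7.7 Tolerance Interval
• Tolerance Interval
A tolerance interval for capturing at least γ% of the values in a normal distribution with confidence level 100(1 − α)% is
$$\bar{x} - ks, \quad \bar{x} + ks$$
where k is a tolerance interval factor found in a table of tolerance factors. Values are given for
γ = 90%, 95%, and 99%, and for 90%, 95%, and 99% confidence.
This interval is very sensitive to the normality assumption. One-sided
tolerance bounds can also be computed.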
Example 1.
Consider Example 7.6. With the information given, find a tolerance
interval that gives two-sided 95% bounds on 90% of the distribution of
packages of 95% lean beef. Assume the data came from an approximately
normal distribution.
Solution:
Recall from Example 7.6 that n = 30, the sample mean is 96.2%, and the
sample standard deviation is 0.8%. From Table I, k = 2.14. Using
$$\bar{x} \pm ks = 96.2 \pm (2.14)(0.8)$$
we find that the lower and upper bounds are 94.5 and 97.9. We are 95%
confident that the above range covers the central 90% of the distribution of
95% lean beef packages.
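Once the factor k has been looked up, the tolerance limits are a one-line calculation. This sketch takes k = 2.14 from the example as a given input (it is not computed here), and `tolerance_interval` is a hypothetical name.

```python
def tolerance_interval(xbar, s, k):
    """Two-sided normal tolerance limits xbar +/- k * s.

    k is the tolerance factor read from a table for the chosen
    coverage, confidence level, and sample size.
    """
    return xbar - k * s, xbar + k * s


# Lean beef example: xbar = 96.2, s = 0.8, k = 2.14 (95% confidence, 90% coverage, n = 30)
tol_low, tol_high = tolerance_interval(96.2, 0.8, k=2.14)
print(f"{tol_low:.1f} to {tol_high:.1f}")
```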
Tolerance intervals are often confused with confidence intervals and prediction
intervals. They are not the same thing:
•A prediction interval tells you where a single future value will probably fall. For
example, a 95% prediction interval of 90 to 120 hours for the life of a battery tells
you that, with 95% confidence, the next battery produced will have a life in that range.
Prediction intervals are usually wider than confidence intervals.