Lin01803_ch09_297-332.
qxd 10/27/10 12:55 PM Page 306
306 Chapter 9
followed the normal distribution with a population standard deviation of $5. A sample of
49 steady smokers revealed that X̄ = $20.
a. What is the point estimate of the population mean? Explain what it indicates.
b. Using the 95 percent level of confidence, determine the confidence interval for μ.
Explain what it indicates.
6. Refer to the previous exercise. Suppose that 64 smokers (instead of 49) were sampled.
Assume the sample mean remained the same.
a. What is the 95 percent confidence interval estimate of μ?
b. Explain why this confidence interval is narrower than the one determined in the pre-
vious exercise.
7. Bob Nale is the owner of Nale’s Quick Fill. Bob would like to estimate the mean number of
gallons of gasoline sold to his customers. Assume the number of gallons sold follows the
normal distribution with a population standard deviation of 2.30 gallons. From his records,
he selects a random sample of 60 sales and finds the mean number of gallons sold is 8.60.
a. What is the point estimate of the population mean?
b. Develop a 99 percent confidence interval for the population mean.
c. Interpret the meaning of part (b).
8. Dr. Patton is a professor of English. Recently she counted the number of misspelled
words in a group of student essays. She noted the distribution of misspelled words per
essay followed the normal distribution with a population standard deviation of 2.44 words
per essay. For her 10 A.M. section of 40 students, the mean number of misspelled words
was 6.05. Construct a 95 percent confidence interval for the mean number of misspelled
words in the population of student essays.
Population Standard Deviation Unknown
LO4 Compute a confidence interval for the population mean when the population
standard deviation is unknown.

In the previous section, we assumed the population standard deviation was known.
In the case involving Del Monte 4-ounce cups of peaches, there would likely be a
long history of measurements in the filling process. Therefore, it is reasonable to
assume the standard deviation of the population is available. However, in most
sampling situations the population standard deviation (σ) is not known. Here are some
examples where we wish to estimate the population means and it is unlikely we
would know the population standard deviations. Suppose each of these studies
involves students at West Virginia University.
• The Dean of the Business College wants to estimate the mean number of hours
full-time students work at paying jobs each week. He selects a sample of 30 stu-
dents, contacts each student and asks them how many hours they worked last
week. From the sample information, he can calculate the sample mean, but it is
not likely he would know or be able to find the population standard deviation (σ)
required in formula (9–1). He could calculate the standard deviation of the
sample and use that as an estimate, but he would not likely know the population
standard deviation.
• The Dean of Students wants to estimate the distance the typical commuter stu-
dent travels to class. She selects a sample of 40 commuter students, contacts
each, and determines the one-way distance from each student’s home to the
center of campus. From the sample data, she calculates the mean travel
distance, that is, X̄. It is unlikely the standard deviation of the population would be
known or available, again making formula (9–1) unusable.
• The Director of Student Loans wants to know the mean amount students owe on
student loans at the time of graduation. The director selects a sample of
20 graduating students and contacts each to find the information. From the
sample information, the director can estimate the mean amount. However, to develop a
confidence interval using formula (9–1), the population standard deviation is
necessary. It is not likely this information is available.
Fortunately we can use the sample standard deviation to estimate the population
standard deviation. That is, we use s, the sample standard deviation, to estimate σ,
the population standard deviation. But in doing so, we cannot use formula (9–1).
Because we do not know σ, we cannot use the z distribution. However, there is a
remedy. We use the sample standard deviation and replace the z distribution with
the t distribution.
The t distribution is a continuous probability distribution, with many characteristics
similar to those of the z distribution. William Gosset, an English brewmaster, was the
first to study the t distribution. He was particularly concerned with the exact behavior
of the distribution of the following statistic:

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]

where s is an estimate of σ. He was especially worried about the discrepancy
between s and σ when s was calculated from a very small sample. The t distribution
and the standard normal distribution are shown graphically in Chart 9–1. Note
particularly that the t distribution is flatter, more spread out, than the standard
normal distribution. This is because the standard deviation of the t distribution is larger
than that of the standard normal distribution.

Statistics in Action
William Gosset was born in England in 1876 and died there in 1937. He worked
for many years at Arthur Guinness, Sons and Company. In fact, in his later
years he was in charge of the Guinness Brewery in London. Guinness preferred
its employees to use pen names when publishing papers, so in 1908, when Gosset
wrote “The Probable Error of a Mean,” he used the name “Student.” In this
paper, he first described the properties of the t distribution.

[Chart: the z distribution curve and the flatter t distribution curve, both centered at 0]
CHART 9–1 The Standard Normal Distribution and Student’s t Distribution
The following characteristics of the t distribution are based on the assumption
that the population of interest is normal, or nearly normal.
• It is, like the z distribution, a continuous distribution.
• It is, like the z distribution, bell-shaped and symmetrical.
• There is not one t distribution, but rather a family of t distributions. All t distri-
butions have a mean of 0, but their standard deviations differ according to the
sample size, n. There is a t distribution for a sample size of 20, another for
a sample size of 22, and so on. The standard deviation for a t distribution with
5 observations is larger than for a t distribution with 20 observations.
• The t distribution is more spread out and flatter at the center than the standard
normal distribution (see Chart 9–1). As the sample size increases, however, the
t distribution approaches the standard normal distribution, because the errors
in using s to estimate σ decrease with larger samples.
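The claim that the spread of the t distribution shrinks toward that of z as the sample grows can be checked numerically. The sketch below relies on a standard result that is not stated in this chapter: a t distribution with df degrees of freedom (df greater than 2) has a standard deviation of the square root of df/(df − 2). The function name is hypothetical, and df = n − 1 is taken from the discussion of degrees of freedom later in this section.

```python
import math


def t_standard_deviation(n):
    """Standard deviation of the t distribution for a sample of size n.

    Assumes df = n - 1 and uses the standard result sd = sqrt(df / (df - 2)),
    which is only defined when df > 2.
    """
    df = n - 1
    if df <= 2:
        raise ValueError("the standard deviation is defined only for df > 2")
    return math.sqrt(df / (df - 2))


# The standard normal distribution has a standard deviation of exactly 1.
for n in (5, 20, 100):
    print(f"n = {n:3d}: sd of t = {t_standard_deviation(n):.3f}")
```

For 5 observations the value is noticeably above 1, and for 20 observations it is already much closer to 1, in line with the third and fourth characteristics above.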
Because Student’s t distribution has a greater spread than the z distribution, the
value of t for a given level of confidence is larger in magnitude than the
corresponding z value. Chart 9–2 shows the values of z for a 95 percent level of
confidence and of t for the same level of confidence when the sample size is n = 5.
How we obtained the actual value of t will be explained shortly. For now, observe
that for the same level of confidence the t distribution is flatter or more spread out
than the standard normal distribution.
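To see how the t value compares with the z value at the same level of confidence, the following sketch computes both numerically. It is not the method used in this text, which reads t from a table. Here the t value is found by integrating the t density with Simpson's rule and then bisecting on the result. All function names are hypothetical, and the approach is an illustration using only the Python standard library.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: one half plus Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value of t, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_value = NormalDist().inv_cdf(0.975)  # two-sided 95 percent z value
print(f"z: {z_value:.3f}")
for n in (5, 10, 20, 30, 100):
    print(f"n = {n:3d}: t = {t_critical(0.95, n - 1):.3f}")
```

For n = 5 the computed t should agree with the 2.776 shown in Chart 9–2, and the values should fall toward the z value of 1.96 as n increases.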
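[Chart: the distribution of z, with .95 of the area between −1.96 and 1.96 and .025 in
each tail, shown above the distribution of t for n = 5, with .95 of the area between
−2.776 and 2.776 and .025 in each tail]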
CHART 9–2 Values of z and t for the 95 Percent Level of Confidence
To develop a confidence interval for the population mean using the t distribution,
we adjust formula (9–1) as follows.
CONFIDENCE INTERVAL FOR THE POPULATION MEAN, σ UNKNOWN [9–2]

\[ \bar{X} \pm t\,\frac{s}{\sqrt{n}} \]
To determine a confidence interval for the population mean with an unknown
standard deviation, we:
1. Assume the sampled population is either normal or approximately normal. This
assumption matters most for small samples. From the central limit theorem, we know
that the distribution of the sample mean moves closer to normal as the sample size
grows, so departures from normality matter less with larger samples.
2. Estimate the population standard deviation (σ) with the sample standard
deviation (s).
3. Use the t distribution rather than the z distribution.
We should be clear at this point. We base the decision on whether to use the t or
the z on whether or not we know σ, the population standard deviation. If we know
the population standard deviation, then we use z. If we do not know the population
standard deviation, then we must use t. Chart 9–3 summarizes the decision-making
process.
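The decision rule just described can be written as a small routine. This is a minimal sketch: the function name and its parameters are hypothetical, and the numbers in the usage lines are made up for illustration. When σ is known, the z value is computed from the standard normal distribution. When only s is available, the caller supplies the t value for n − 1 degrees of freedom, for example from Appendix B.2.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x_bar, n, confidence, sigma=None, s=None, t_value=None):
    """Confidence interval for a population mean.

    sigma given    -> use z and the known population standard deviation.
    sigma missing  -> use the caller's t value with the sample standard
                      deviation s. The t value must match both the chosen
                      confidence level and n - 1 degrees of freedom.
    """
    if sigma is not None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        margin = z * sigma / math.sqrt(n)
    else:
        if s is None or t_value is None:
            raise ValueError("without sigma, both s and t_value are required")
        margin = t_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical data: mean 50 from 25 observations, standard deviation 10.
print(mean_confidence_interval(50, 25, 0.95, sigma=10))            # sigma known
print(mean_confidence_interval(50, 25, 0.95, s=10, t_value=2.064))  # sigma unknown
```

With the same mean, sample size, and standard deviation, the interval based on t is the wider of the two, because the t value exceeds the z value.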
The following example will illustrate a confidence interval for a population mean
when the population standard deviation is unknown and how to find the appropriate
value of t in a table.
[Chart: flowchart. Assume the population is normal. Is the population standard
deviation known? If no, use the t distribution. If yes, use the z distribution.]
CHART 9–3 Determining When to Use the z Distribution or the t Distribution
Example A tire manufacturer wishes to investigate the tread life of its tires. A sample of 10 tires
driven 50,000 miles revealed a sample mean of 0.32 inches of tread remaining with
a standard deviation of 0.09 inches. Construct a 95 percent confidence interval for
the population mean. Would it be reasonable for the manufacturer to conclude that
after 50,000 miles the population mean amount of tread remaining is 0.30 inches?
Solution To begin, we assume the population distribution is normal. In this case, we don’t
have a lot of evidence, but the assumption is probably reasonable. We know the
sample standard deviation is .09 inches. We use formula (9–2):
\[ \bar{X} \pm t\,\frac{s}{\sqrt{n}} \]
From the information given, X̄ = 0.32, s = 0.09, and n = 10. To find the value of t,
we use Appendix B.2, a portion of which is reproduced in Table 9–1. Appendix B.2
is also reproduced on the inside back cover of the text.

TABLE 9–1 A Portion of the t Distribution

                   Confidence Intervals
         80%      90%      95%      98%      99%
         Level of Significance for One-Tailed Test
df      0.10     0.05    0.025    0.010    0.005
         Level of Significance for Two-Tailed Test
        0.20     0.10     0.05     0.02     0.01
 1     3.078    6.314   12.706   31.821   63.657
 2     1.886    2.920    4.303    6.965    9.925
 3     1.638    2.353    3.182    4.541    5.841
 4     1.533    2.132    2.776    3.747    4.604
 5     1.476    2.015    2.571    3.365    4.032
 6     1.440    1.943    2.447    3.143    3.707
 7     1.415    1.895    2.365    2.998    3.499
 8     1.397    1.860    2.306    2.896    3.355
 9     1.383    1.833    2.262    2.821    3.250
10     1.372    1.812    2.228    2.764    3.169

The first step for locating t is to move across the columns identified for
“Confidence Intervals” to the level of confidence requested. In this case, we want
the 95 percent level of confidence, so we move to the column headed “95%.” The
column on the left margin is identified as “df.” This refers to the number of degrees
of freedom. The number of degrees of freedom is the number of observations in the
sample minus the number of samples, written n − 1. In this case, it is 10 − 1 = 9.
Why did we decide there were 9 degrees of freedom? When sample statistics are being
used, it is necessary to determine the number of values that are free to vary.
To illustrate the meaning of degrees of freedom: Assume that the mean of four
numbers is known to be 5. The four numbers are 7, 4, 1, and 8. The deviations of
these numbers from the mean must total 0. The deviations of +2, −1, −4, and +3
do total 0. If the deviations of +2, −1, and −4 are known, then the value of +3
is fixed (restricted) in order to satisfy the condition that the sum of the deviations
must equal 0. Thus, 1 degree of freedom is lost in a sampling problem involving
the standard deviation of the sample because one number (the arithmetic mean) is
known. For a 95 percent level of confidence and 9 degrees of freedom, we select
the row with 9 degrees of freedom. The value of t is 2.262.
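The table lookup and the degrees-of-freedom argument above can be sketched in a few lines. The dictionary below holds only the rows of Table 9–1 reproduced in this section, so it covers samples of 2 to 11 observations. The names `T_TABLE`, `lookup_t`, and `last_deviation` are hypothetical helpers introduced for illustration.

```python
# Rows of Table 9-1: degrees of freedom -> t values for the
# 80, 90, 95, 98, and 99 percent confidence levels.
CONFIDENCE_LEVELS = (80, 90, 95, 98, 99)
T_TABLE = {
    1: (3.078, 6.314, 12.706, 31.821, 63.657),
    2: (1.886, 2.920, 4.303, 6.965, 9.925),
    3: (1.638, 2.353, 3.182, 4.541, 5.841),
    4: (1.533, 2.132, 2.776, 3.747, 4.604),
    5: (1.476, 2.015, 2.571, 3.365, 4.032),
    6: (1.440, 1.943, 2.447, 3.143, 3.707),
    7: (1.415, 1.895, 2.365, 2.998, 3.499),
    8: (1.397, 1.860, 2.306, 2.896, 3.355),
    9: (1.383, 1.833, 2.262, 2.821, 3.250),
    10: (1.372, 1.812, 2.228, 2.764, 3.169),
}


def lookup_t(n, confidence_percent):
    """Return t for a sample of size n, using df = n - 1."""
    df = n - 1
    column = CONFIDENCE_LEVELS.index(confidence_percent)
    return T_TABLE[df][column]


def last_deviation(known_deviations):
    """The deviations from the mean must total 0, so the last one is fixed."""
    return -sum(known_deviations)


print(lookup_t(10, 95))              # tire example: 9 degrees of freedom
print(last_deviation([2, -1, -4]))   # the fourth deviation is not free to vary
```

The second call mirrors the four-number illustration: once three deviations are known, the fourth is determined, which is why one degree of freedom is lost.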
To determine the confidence interval, we substitute the values in formula (9–2).
\[ \bar{X} \pm t\,\frac{s}{\sqrt{n}} = 0.32 \pm 2.262\,\frac{0.09}{\sqrt{10}} = 0.32 \pm 0.064 \]
The endpoints of the confidence interval are 0.256 and 0.384. How do we interpret
this result? If we repeated this study 200 times, calculating the 95 percent
confidence interval with each sample’s mean and standard deviation, we would expect
about 190 of the intervals to include the population mean and about 10 not to
include it. This is the effect of sampling error. A further interpretation is to
conclude that the population mean is in this interval. The manufacturer can be
reasonably sure (95 percent confident) that the mean remaining tread depth is between
0.256 and 0.384 inches. Because the value of 0.30 is in this interval, it is possible
that the mean of the population is 0.30.
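The repeated-sampling interpretation above can be illustrated with a small simulation. The sketch below draws 200 samples of 10 observations from a normal population and counts how many of the resulting 95 percent intervals contain the population mean. The population mean of 0.32 and standard deviation of 0.09 are assumptions made only so the simulation has something to sample from, since the true values are unknown in the example. The function name and the seed are likewise hypothetical.

```python
import math
import random
import statistics


def coverage_count(trials=200, n=10, t_value=2.262, mu=0.32, sigma=0.09, seed=1):
    """Count how many of `trials` intervals contain the assumed mean mu.

    t_value is the 95 percent value for n - 1 = 9 degrees of freedom,
    taken from Table 9-1. mu and sigma are assumed population values.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        margin = t_value * s / math.sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits


print(coverage_count(), "of 200 intervals include the population mean")
```

The count varies from one seed to another, but it should stay near 190, which is 95 percent of 200.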
Here is another example to clarify the use of confidence intervals. Suppose an ar-
ticle in your local newspaper reported that the mean time to sell a residential property
in the area is 60 days. You select a random sample of 20 homes sold in the last year
and find the mean selling time is 65 days. Based on the sample data, you develop a
95 percent confidence interval for the population mean. You find that the endpoints of
the confidence interval are 62 days and 68 days. How do you interpret this result? You
can be reasonably confident the population mean is within this range. The value pro-
posed for the population mean, that is, 60 days, is not included in the interval. It is not
likely that the population mean is 60 days. The evidence indicates the statement by the
local newspaper may not be correct. To put it another way, it seems unreasonable to
obtain the sample you did from a population that had a mean selling time of 60 days.
The following example will show additional details for determining and inter-
preting a confidence interval. We used Minitab to perform the calculations.
Example The manager of the Inlet Square Mall, near Ft. Myers, Florida, wants to estimate the
mean amount spent per shopping visit by customers. A sample of 20 customers
reveals the following amounts spent.
$48.16 $42.22 $46.82 $51.45 $23.78 $41.86 $54.86
37.92 52.64 48.59 50.82 46.94 61.83 61.69
49.17 61.46 51.35 52.68 58.84 43.88
What is the best estimate of the population mean? Determine a 95 percent confidence
interval. Interpret the result. Would it be reasonable to conclude that the population
mean is $50? What about $60?
Solution The mall manager assumes that the population of the amounts spent follows the
normal distribution. This is a reasonable assumption in this case. Additionally, the confidence
interval technique is quite powerful and tends to commit any
errors on the conservative side if the population is not nor-
mal. We should not make the normality assumption when
the population is severely skewed or when the distribution
has “thick tails.” In Chapter 18, we present methods for han-
dling this problem if we cannot make the normality assump-
tion. In this case, the normality assumption is reasonable.
The population standard deviation is not known. Hence,
it is appropriate to use the t distribution and formula (9–2)
to find the confidence interval. We use the Minitab system
to find the mean and standard deviation of this sample. The
results are shown below.
The mall manager does not know the population mean. The sample mean is the
best estimate of that value. From the pictured Minitab output, the mean is $49.35,
which is the best estimate, the point estimate, of the unknown population mean.
We use formula (9–2) to find the confidence interval. The value of t is available from
Appendix B.2. There are n − 1 = 20 − 1 = 19 degrees of freedom. We move across
the row with 19 degrees of freedom to the column for the 95 percent confidence level.
The value at this intersection is 2.093. We substitute these values into formula (9–2)
to find the confidence interval.
\[ \bar{X} \pm t\,\frac{s}{\sqrt{n}} = \$49.35 \pm 2.093\,\frac{\$9.01}{\sqrt{20}} = \$49.35 \pm \$4.22 \]
The endpoints of the confidence interval are $45.13 and $53.57. It is reasonable to
conclude that the population mean is in that interval.
The manager of Inlet Square wondered whether the population mean could have
been $50 or $60. The value of $50 is within the confidence interval. It is reasonable
that the population mean could be $50. The value of $60 is not in the confidence
interval. Hence, we conclude that the population mean is unlikely to be $60.
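The Inlet Square calculation can be reproduced from the raw data without Minitab or Excel. The sketch below uses Python's standard `statistics` module in place of those packages. The t value of 2.093 is taken from the text (19 degrees of freedom, 95 percent), and the variable and function names are hypothetical.

```python
import math
import statistics

# Amounts spent by the 20 sampled customers.
amounts = [
    48.16, 42.22, 46.82, 51.45, 23.78, 41.86, 54.86,
    37.92, 52.64, 48.59, 50.82, 46.94, 61.83, 61.69,
    49.17, 61.46, 51.35, 52.68, 58.84, 43.88,
]

T_95_DF19 = 2.093  # from Appendix B.2: 95 percent, 19 degrees of freedom

x_bar = statistics.mean(amounts)   # point estimate of the population mean
s = statistics.stdev(amounts)      # sample standard deviation
margin = T_95_DF19 * s / math.sqrt(len(amounts))  # margin of error
lower, upper = x_bar - margin, x_bar + margin


def is_plausible(value):
    """True when a proposed population mean lies inside the interval."""
    return lower <= value <= upper


print(f"mean = {x_bar:.2f}, s = {s:.2f}, margin = {margin:.2f}")
print(f"interval: {lower:.2f} to {upper:.2f}")
print("$50 plausible:", is_plausible(50), " $60 plausible:", is_plausible(60))
```

The printed values should match the mean, standard deviation, margin of error, and endpoints reported in the solution, to the cent.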
The calculations to construct a confidence interval are also available in Excel.
The output follows. Note that the sample mean ($49.35) and the sample standard
deviation ($9.01) are the same as those in the Minitab calculations. In the Excel
information, the last line of the output also includes the margin of error, which is the
amount that is added and subtracted from the sample mean to form the endpoints
of the confidence interval. This value is found from
\[ t\,\frac{s}{\sqrt{n}} = 2.093\,\frac{\$9.01}{\sqrt{20}} = \$4.22 \]
Self-Review 9–2 Dottie Kleman is the “Cookie Lady.” She bakes and sells cookies at 50 different locations in
the Philadelphia area. Ms. Kleman is concerned about absenteeism among her workers. The
information below reports the number of days absent for a sample of 10 workers during the
last two-week pay period.
4 1 2 2 1 2 2 1 0 3
(a) Determine the mean and the standard deviation of the sample.
(b) What is the population mean? What is the best estimate of that value?
(c) Develop a 95 percent confidence interval for the population mean.
(d) Explain why the t distribution is used as a part of the confidence interval.
(e) Is it reasonable to conclude that the typical worker does not miss any
days during a pay period?
Exercises
9. Use Appendix B.2 to locate the value of t under the following conditions.
a. The sample size is 12 and the level of confidence is 95 percent.
b. The sample size is 20 and the level of confidence is 90 percent.
c. The sample size is 8 and the level of confidence is 99 percent.
10. Use Appendix B.2 to locate the value of t under the following conditions.
a. The sample size is 15 and the level of confidence is 95 percent.
b. The sample size is 24 and the level of confidence is 98 percent.
c. The sample size is 12 and the level of confidence is 90 percent.
11. The owner of Britten’s Egg Farm wants to estimate the mean number of eggs laid per
chicken. A sample of 20 chickens shows they laid an average of 20 eggs per month with
a standard deviation of 2 eggs per month.
a. What is the value of the population mean? What is the best estimate of this value?
b. Explain why we need to use the t distribution. What assumption do you need to
make?
c. For a 95 percent confidence interval, what is the value of t?
d. Develop the 95 percent confidence interval for the population mean.
e. Would it be reasonable to conclude that the population mean is 21 eggs? What about
25 eggs?