Tryfos-Interval Estimation
Interval estimation and testing
5.1 INTRODUCTION
As stated earlier, the reason for taking a sample is to obtain information
about the unknown characteristics of a population or process. Two types
of population characteristics are of special interest in business: the mean of
a variable, and the proportion of elements belonging to a category. After
the sample has been taken, these are estimated by the corresponding sample
mean and proportion. In the majority of studies in business, these estimates
(there could be many of them in a typical marketing survey) are all that is
required of the sample.
The key to good estimates lies in the design of the sample, and this
takes place before the sample is actually selected. A well-designed sample,
we have argued, should be: (a) randomly selected from the population or
process of interest; and (b) large enough so that the estimates will have the
desired degree of accuracy.
Interval estimates may be used in place of, or as a supplement to, the
“point” estimates we have encountered up to now. Rather than state, after
a sample is taken, that a population characteristic is estimated to be such-
and-such a number, it may on occasion be more informative to state that
the characteristic is estimated to be in such-and-such an interval. Not any
arbitrary interval, obviously, will do. If we are to make interval estimates, we
want assurance that our statement will be correct (that is, that the interval
will contain the population characteristic) with a given probability. Such
intervals are known as confidence intervals and are described in the following
two sections.
The remainder of the chapter deals with statistical tests. A statistical
test is a rule (a prescription, if you like) for deciding which of two
statements concerning an unknown population characteristic is true. No decision
rule (statistical or other) is infallible. The attractive feature of statistical
tests is that they allow the decision-maker to control the probability of
making an error judged (by the decision-maker) to be the more serious.
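If the population size (N) and the sample size (n) are large, the probability
is approximately 1 − α that the interval
\[ R \pm U_{\alpha/2} S_R \tag{5.1} \]
contains the population proportion (π), and that the interval
\[ \bar{X} \pm U_{\alpha/2} S_{\bar{X}} \tag{5.2} \]
contains the population mean (µ). These intervals are known as 100(1 − α)%
confidence intervals for π and µ respectively, and (1 − α) is known
as the confidence level. Similar confidence intervals for Nπ and Nµ are
\[ N(R \pm U_{\alpha/2} S_R) \quad \text{and} \quad N(\bar{X} \pm U_{\alpha/2} S_{\bar{X}}), \]
where
\[ S_{\bar{X}} = \sqrt{\frac{S^2}{n} \cdot \frac{N - n}{N - 1}}, \qquad S_R = \sqrt{\frac{R(1 - R)}{n} \cdot \frac{N - n}{N - 1}}. \tag{5.5} \]
Uα/2 for selected (1 − α) are given in Table 5.1.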
Table 5.1
Uα/2 for selected 1 − α
1−α Uα/2 1−α Uα/2
0.99 2.576 0.80 1.282
0.95 1.960 0.60 0.842
0.90 1.645 0.50 0.674
A random sample of n = 500 households from a population
of N = 100,000 was taken. 42% of the sampled households said they would
buy an experimental product. We calculate
\[ S_R = \sqrt{\frac{(0.42)(1 - 0.42)}{500} \cdot \frac{100000 - 500}{100000 - 1}} = 0.022. \]
Suppose that a 90% confidence interval is desired for the proportion of
households in the population who intend to buy. Then, 1 − α = 0.90,
and Uα/2 = 1.645. Thus, Uα/2 SR = (1.645)(0.022) = 0.036. The desired
interval is from (0.420 − 0.036) to (0.420 + 0.036), or from 38.4% to 45.6%.
A 90% confidence interval for the number of households who intend to
buy is (100,000)(0.420 ± 0.036), or from about 38,400 to 45,600.
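The arithmetic of this example can be checked with a short script. The following is a minimal sketch, not part of the original text; the function name `proportion_ci` is our own, and Uα/2 is obtained from the standard normal distribution rather than read from Table 5.1.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(r, n, N, conf):
    """Large-sample confidence interval for a population proportion.

    Assumes sampling without replacement, so the finite population
    correction (N - n)/(N - 1) is included in the standard error.
    """
    s_r = sqrt(r * (1 - r) / n * (N - n) / (N - 1))
    u = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # U_{alpha/2}
    return r - u * s_r, r + u * s_r, s_r


# Figures from the household example: R = 0.42, n = 500, N = 100,000.
low, high, s_r = proportion_ci(r=0.42, n=500, N=100_000, conf=0.90)
print(f"S_R = {s_r:.3f}, interval for pi: {low:.3f} to {high:.3f}")
print(f"interval for N*pi: {100_000 * low:,.0f} to {100_000 * high:,.0f}")
```

Because the script carries more decimal places than the hand calculation, the bounds for the number of households may differ slightly from the rounded figures quoted above.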
Figure 5.1
Interval estimates
the five confidence intervals shown, the exception being that from Sample
No. 4, contain µ.) The statement that the population mean lies in the
interval X̄ ± Uα/2 SX̄ will therefore be correct in 100(1 − α)% of the samples
in the long run.
The calculated confidence interval, however, conveys some information
about the reliability of the sample estimate. The intervals (5.1) and (5.2),
it will be noted, are formed around the estimate of the population characteristic,
and their width depends on the size of the sample, the variability
of the observations, and the desired confidence level.
As an illustration, consider the width of the confidence interval for a
population mean: it is approximately twice the amount
\[ c = U_{\alpha/2} \sqrt{S^2 \left( \frac{1}{n} - \frac{1}{N} \right)}. \]
The smaller the confidence level, 1 − α, the smaller is Uα/2. Other things
being equal, therefore, the larger the sample size, the smaller the variability
of the sample observations (as measured by S²), and the lower the confidence
level, the narrower the interval, which is as one intuitively expects.
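To see these three effects numerically, one can evaluate the amount c for a baseline case and then change one input at a time. The sketch below is illustrative only: the function name `half_width` is ours, and the baseline figures (S² = 280, n = 200, N = 1,000) are borrowed from Problem 5.1 purely for convenience.

```python
from math import sqrt
from statistics import NormalDist


def half_width(s2, n, N, conf):
    """c = U_{alpha/2} * sqrt(S^2 * (1/n - 1/N)), the approximate half-width."""
    u = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return u * sqrt(s2 * (1 / n - 1 / N))


base = half_width(s2=280.0, n=200, N=1000, conf=0.95)
larger_n = half_width(s2=280.0, n=400, N=1000, conf=0.95)    # bigger sample
smaller_s2 = half_width(s2=140.0, n=200, N=1000, conf=0.95)  # less variability
lower_conf = half_width(s2=280.0, n=200, N=1000, conf=0.90)  # lower confidence

for label, c in [("baseline", base), ("larger n", larger_n),
                 ("smaller S^2", smaller_s2), ("lower confidence", lower_conf)]:
    print(f"{label:>17}: c = {c:.3f}")
```

Each of the three changes produces a smaller c than the baseline, in line with the statement above.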
The reader should always bear in mind that a confidence interval is not
a necessary appendage to the sample estimate. In most business samples
designed to obtain estimates of a large number of population characteristics,
a routine reporting of all possible confidence intervals would most certainly
confuse rather than enlighten.*
If the population (N) and the sample (n) are large, the probability
distributions of the random variables
\[ U_1 = \frac{\bar{X} - \mu}{S_{\bar{X}}}, \qquad U_2 = \frac{R - \pi}{S_R}, \tag{5.6} \]
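\[ S^2 = \frac{1}{n} \sum (X_i - \bar{X})^2 = \frac{1}{3} \left[ (0 - 0.667)^2 + (0 - 0.667)^2 + (2 - 0.667)^2 \right] = 0.889. \]
Therefore,
\[ S_{\bar{X}} = \sqrt{\frac{0.889}{3} \cdot \frac{10 - 3}{10 - 1}} = 0.480, \qquad S_R = \sqrt{\frac{(0.333)(1 - 0.333)}{3} \cdot \frac{10 - 3}{10 - 1}} = 0.240, \]
\[ U_1 = \frac{0.667 - 0.9}{0.480} = -0.485, \]
and
\[ U_2 = \frac{0.333 - 0.7}{0.240} = -1.529. \]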
For example, U1 = −0.485 indicates that the observed value of the sample
mean lies 0.485 estimated standard deviations to the left of its expected
value, and U2 = −1.529 shows that the observed value of R lies 1.529 esti-
mated standard deviations to the left of the mean of its distribution.
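The same quantities can be computed directly from the sample. This is a minimal sketch under the figures quoted in the calculation above (observations 0, 0, 2; R = 1/3; N = 10; µ = 0.9; π = 0.7); the variable names are ours.

```python
from math import sqrt

x = [0, 0, 2]        # sample observations used in the calculation above
r = 1 / 3            # sample proportion (0.333 in the text)
N = 10               # population size
mu, pi = 0.9, 0.7    # population mean and proportion used in the text

n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / n   # S^2 with divisor n, as in the text
fpc = (N - n) / (N - 1)                      # finite population correction

s_xbar = sqrt(s2 / n * fpc)
s_r = sqrt(r * (1 - r) / n * fpc)
u1 = (xbar - mu) / s_xbar
u2 = (r - pi) / s_r

print(f"S^2 = {s2:.3f}, S_xbar = {s_xbar:.3f}, S_R = {s_r:.3f}")
print(f"U1 = {u1:.3f}, U2 = {u2:.3f}")
```

The script works with unrounded intermediate values, so U1 and U2 may differ from the hand-calculated figures in the third decimal place.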
For each sample outcome listed in Table 4.1 of Chapter 4, there cor-
responds a value of U1 and U2 . The U ’s are random variables, and their
probability distributions can be constructed in the usual way. We shall not
do so, however, because both N and n are small, and no useful purpose will
be served by continuing the calculations.
Consider now Figure 5.2. The histograms depict the probability distri-
butions of U1 for samples of size 3, 10, 20, and 30 with replacement drawn
from the population of Example 4.1 of Chapter 4. Each of these probabil-
ity distributions was routinely constructed from a list of all possible sample
outcomes, their probabilities, and the associated values of U1 ; a computer
program was used to perform the long and tedious numerical calculations.
When S² = 0, which occurs when all the sample observations are identical,
U1 has an indeterminate value; the probability of an indeterminate value
is indicated by a separate bar in the histograms of Figure 5.2. Note that
the probability of an indeterminate value decreases as the sample size in-
creases. Compare the histograms with the superimposed standard normal
distribution. As the sample size increases, the standard normal distribution
provides an increasingly better approximation to the actual distribution of
U1 .
What is illustrated for U1 and sampling with replacement can be shown
to hold for U2 and for sampling without replacement: for large N and n
(n < N ), the actual probability distribution of any Ui may be approximated
by the standard normal distribution.
8 Chapter 5: Interval estimation and testing
Figure 5.2
Distribution of U1
5.3 Further properties of large samples 9
It has not been made clear so far just how large N and n must be
for the normal approximations to be satisfactory. Unfortunately, no exact
guidance can be given. In statistical theory, the large-sample properties
are proven for infinitely large N and n. Empirical investigations, however,
show the approximation to be surprisingly good in some cases for samples
as low as 30 or smaller. On the other hand, there are other cases where the
approximation is poor even for very large samples.
Roughly speaking, the more symmetric the population distribution of
the variable X (in the case of U1 ) or the closer π is to 0.5 (in the case of
U2 ), the smaller is the sample size required.
As a rule of thumb, the reader would probably be safe in assuming that
for n ≥ 100 and N ≥ 200 (conditions easily satisfied in most samples used
in business) the normal approximation is satisfactory, and that the results
based on this approximation (to be described in the following sections) are
applicable.
We would now like to show that the probability that the interval (5.2) will
contain (“bracket”) µ is approximately 1 − α. The proof concerning the in-
terval (5.1) is very similar, and is left as an exercise for the reader.
¶ For large N and n, the distribution of the ratio U1 = (X̄ − µ)/SX̄ is approximately
the standard normal. Let Uα/2 be a number such that the probability
that the standard normal variable will exceed that number is α/2. By the symmetry
of the normal distribution, Pr(−Uα/2 ≤ U1 ≤ Uα/2) = 1 − α. Substituting
(X̄ − µ)/SX̄ for U1, we have
\[ \Pr\left( -U_{\alpha/2} \le \frac{\bar{X} - \mu}{S_{\bar{X}}} \le U_{\alpha/2} \right) = 1 - \alpha. \tag{5.9} \]
Consider the expression within the parentheses. We apply the same two rules
concerning inequalities stated in Chapter 2.* Multiplying all three terms by (the
positive) SX̄, we get −Uα/2 SX̄ ≤ X̄ − µ ≤ Uα/2 SX̄. Multiplying these terms by
−1, we reverse the inequalities and get Uα/2 SX̄ ≥ µ − X̄ ≥ −Uα/2 SX̄. Adding
X̄ to all three terms gives X̄ − Uα/2 SX̄ ≤ µ ≤ X̄ + Uα/2 SX̄. The first expression
implies the last. All we have done is write the original inequalities in a different
form. Therefore, Equation (5.9) implies that
\[ \Pr\left( \bar{X} - U_{\alpha/2} S_{\bar{X}} \le \mu \le \bar{X} + U_{\alpha/2} S_{\bar{X}} \right) = 1 - \alpha. \]
In words, the probability is 1 − α that the interval from (X̄ − Uα/2 SX̄) to (X̄ +
Uα/2 SX̄) contains the population mean. This interval is therefore a 100(1 − α)%
confidence interval for µ, and the proof is complete. ¶
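The long-run interpretation of this result can also be checked by simulation. The sketch below is an illustration under our own assumptions, not a calculation from the text: the population is a hypothetical, artificially generated set of 5,000 skewed values, samples of size 200 are drawn without replacement, and the function name `coverage` is ours.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(population, n, conf, reps, seed=0):
    """Fraction of repeated samples whose interval X-bar +/- U*S_xbar contains mu."""
    rng = random.Random(seed)
    N = len(population)
    mu = sum(population) / N
    u = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # U_{alpha/2}
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, n)          # sampling without replacement
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / n
        s_xbar = sqrt(s2 / n * (N - n) / (N - 1))
        if xbar - u * s_xbar <= mu <= xbar + u * s_xbar:
            hits += 1
    return hits / reps


# Hypothetical skewed population (exponential values), for illustration only.
gen = random.Random(1)
population = [gen.expovariate(1.0) for _ in range(5000)]

rate = coverage(population, n=200, conf=0.90, reps=2000)
print(f"share of 90% intervals containing mu: {rate:.3f}")
```

With large N and n the printed share should be close to 0.90; it will not equal 0.90 exactly, both because the result is approximate and because only a finite number of samples is simulated.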
Substituting (X̄ − µ)/SX̄ for U and following the same approach as above,
we find that:
Example 5.3 A certain brand of light bulbs has a “rated life of 1,000
hours.” This rating is assigned by the manufacturer. The small print on the
5.4 Understanding statistical tests 11
package explains that the average life of this brand of bulbs is warranted
to be more than 1,000 hours. (It is understood that the life of individual
light bulbs varies; some of this variability could perhaps be removed by
better quality control, but some is inherent in the manufacturing process
and cannot be eliminated.)
A batch of 10,000 bulbs has been produced. Before it is shipped out,
a test must be made to determine if the quality of the batch is consistent
with the rating. At issue, therefore, is whether or not the average life of the
bulbs in the batch is more than 1,000 hours. Measuring the life of all 10,000
bulbs is obviously out of the question since the measurement is destructive:
life is measured by letting the bulb burn until it burns out. A sample must
be used.
In one sense, the problem is simple. The manufacturer could select a
random sample of light bulbs, measure their life duration, and calculate the
average life of the bulbs in the sample. If this sample average is greater
than 1,000 hours, the conclusion could be that the batch is Good (that
is, the quality rating is justified); the batch would then be released. On
the other hand, if the sample average is less than or equal to 1,000 hours,
the conclusion could be that the batch is Bad (does not meet the quality
standard) and its release would be withheld.
Let µ be the (unknown) average life of the bulbs in the batch. The issue
then is whether or not µ is more than 1,000 hours. A statistician would say
that the problem involves two hypotheses concerning µ:
where X̄ is the average life of n randomly selected light bulbs. (The terms
“accept” and “reject” are part of the established terminology but need not
be taken literally. By “accept H1 ” we mean “decide in favor of H1 ,” and by
“reject H1 ” “decide in favor of H2 .”)
This decision rule may be perfectly sensible, but it is not at all flexible.
Let us therefore make it a little more general, as follows:
    Accept H1 if X̄ ≤ c,
    Reject H1 if X̄ > c,        (5.12)
Table 5.2
Hypotheses, actions, and consequences
                 Acts
Events           Accept H1        Reject H1
H1 is true       No error         Type I error
H1 is false      Type II error    No error
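When N and n are large, the approximate decision rule for testing
    H1 : µ ≤ µ0,
    H2 : µ > µ0,        (5.13)
(µ0 a given number) so that the probability of Type I error does not
exceed α and that of Type II error does not exceed 1 − α is to
    Accept H1 if X̄ ≤ µ0 + Uα SX̄,
    Reject H1 if X̄ > µ0 + Uα SX̄.        (5.14)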
Calculate
\[ S_{\bar{X}} = \sqrt{\frac{(80)^2}{100} \cdot \frac{10000 - 100}{10000 - 1}} = 7.96. \]
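The decision rule for a test of H1: µ ≤ µ0 against H2: µ > µ0 can be sketched in a few lines. The sample standard deviation (80), n = 100, and N = 10,000 are taken from the calculation above. We assume, without confirmation from the surviving text, that this calculation belongs to the light bulb example with µ0 = 1,000 hours. The sample mean of 1,015 hours and α = 0.05 are hypothetical values chosen only to make the sketch runnable, and the function name `mean_test` is ours.

```python
from math import sqrt
from statistics import NormalDist


def mean_test(xbar, s, n, N, mu0, alpha):
    """Large-sample test of H1: mu <= mu0 against H2: mu > mu0."""
    s_xbar = sqrt(s ** 2 / n * (N - n) / (N - 1))
    u_alpha = NormalDist().inv_cdf(1 - alpha)      # upper-tail point U_alpha
    critical = mu0 + u_alpha * s_xbar
    decision = "reject H1" if xbar > critical else "accept H1"
    return s_xbar, critical, decision


# xbar = 1015 and alpha = 0.05 are hypothetical, for illustration only.
s_xbar, critical, decision = mean_test(
    xbar=1015.0, s=80.0, n=100, N=10_000, mu0=1000.0, alpha=0.05
)
print(f"S_xbar = {s_xbar:.2f}, critical value = {critical:.1f}, {decision}")
```

With a different α or sample mean the decision may of course change; only the value of S_xbar is tied to the figures given in the text.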
¶ We would now like to show that the decision rule (5.14) is indeed such that
the probability of Type I error does not exceed α, and that of Type II error does
not exceed 1 − α.
By definition, a Type I error occurs when µ ≤ µ0 , and a Type II error when
µ > µ0 . The proof, then, is in two steps, as follows.
First, suppose that the true value of µ is less than or equal to µ0 , i.e., H1 is
Figure 5.3
Standard normal distribution
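For the second step of the proof, suppose that µ is greater than µ0, i.e., H1
is false. The probability of Type II error is the probability of accepting H1, or
\[
\begin{aligned}
\Pr(\bar{X} \le \mu_0 + U_\alpha S_{\bar{X}})
&= \Pr\left( \frac{\bar{X} - \mu_0}{S_{\bar{X}}} \le U_\alpha \right) \\
&= \Pr\left( \frac{\bar{X} - \mu + \mu - \mu_0}{S_{\bar{X}}} \le U_\alpha \right) \\
&= \Pr\left( \frac{\bar{X} - \mu}{S_{\bar{X}}} \le U_\alpha - \frac{\mu - \mu_0}{S_{\bar{X}}} \right) \\
&= \Pr(U_1 \le U_\alpha - e) \\
&= \Pr(U_1 \le U'),
\end{aligned}
\]
where e = (µ − µ0)/SX̄ is positive and U′ = Uα − e < Uα. As illustrated in Figure 5.3,
Pr(U1 ≤ U′) < 1 − α, that is, the probability of Type II error cannot exceed 1 − α,
and the proof is complete. ¶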
    H1 : π ≤ π0,
    H2 : π > π0,        (5.15)
(π0 a given number) so that the probability of Type I error does not
exceed α and that of Type II error does not exceed 1 − α is to
    Accept H1 if R ≤ π0 + Uα SR,
    Reject H1 if R > π0 + Uα SR.        (5.16)
Example 5.4 Suppose that by a “rating of 1,000 hours,” the light bulb
manufacturer means “relatively few bulbs last less than 1,000 hours.” (Many
quality standards in manufacturing have this type of interpretation.) More
specifically, suppose that “relatively few” means “5% or less.” Let π represent
the proportion of light bulbs in the batch which last less than 1,000
In this case, a Type I error occurs when a Good lot is declared Bad, and
a Type II error when a Bad lot is declared Good. Suppose it is the latter
error which is considered more serious, and that the probability of Type II
error should not exceed 10%. We set α = 1 − 0.10 = 0.90, which implies
Uα = −1.282.
A random sample without replacement of size n = 100 is taken from
the batch of N = 10,000 light bulbs. Four of the 100 bulbs (R = 0.04) are
found to last less than 1,000 hours. We calculate
\[ S_R = \sqrt{\frac{(0.04)(1 - 0.04)}{100} \cdot \frac{10000 - 100}{10000 - 1}} = 0.019, \]
and then π0 + Uα SR = (0.050) − (1.282)(0.019) = 0.026. Since R > 0.026,
H1 is rejected and the batch is declared Bad.
This somewhat surprising conclusion is dictated by the choice of α
which guards against the occurrence of a Type II error. If the two errors
were considered equally serious, or if the Type I error were thought to be
more serious, the opposite conclusion would have been reached.
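The calculations of this example can be reproduced as follows. This is a minimal sketch using only the figures quoted in the example; the variable names are ours, and Uα is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

n, N = 100, 10_000      # sample and batch sizes
r = 0.04                # sample proportion lasting less than 1,000 hours
pi0 = 0.05              # hypothesized upper limit under H1
alpha = 0.90            # chosen so that P(Type II error) <= 1 - alpha = 0.10

s_r = sqrt(r * (1 - r) / n * (N - n) / (N - 1))
u_alpha = NormalDist().inv_cdf(1 - alpha)   # negative, since alpha > 0.5
critical = pi0 + u_alpha * s_r
reject_h1 = r > critical

print(f"S_R = {s_r:.4f}, U_alpha = {u_alpha:.3f}, critical value = {critical:.3f}")
print("reject H1" if reject_h1 else "accept H1")
```

The critical value printed here uses the unrounded S_R and so may differ in the third decimal from the hand calculation; the decision is unaffected.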
Table 5.4
Events, acts, and consequences
                 Acts
Events           a1 : Accept H1    a2 : Reject H1
H1 : θ in S1     No error          Type I error
H2 : θ in S2     Type II error     No error
wish to test the hypothesis that the population relative frequencies are equal
to specified numbers π1o, π2o, . . . , πso, against the alternative hypothesis that
at least one of the πi is not equal to the specified number.
Example 5.5 A casino is testing a die for fairness. The die is fair if the
six faces show up with equal relative frequencies (πi ) in the long run.
The die will be rolled a number of times, and the relative frequencies with
which the six faces show up observed. How should one decide whether the
die is fair or not?
    Accept H1 if V ≤ Vα;s−1,
    Reject H1 if V > Vα;s−1,        (5.18)
where
\[ V = n \sum_{i=1}^{s} \frac{(R_i - \pi_{io})^2}{\pi_{io}}. \tag{5.19} \]
to the πio , and the value of V will tend to be close to zero. If, on the other
hand, the true πi deviate from the hypothesized values under H1 , πio , the
Ri will tend to deviate from the πio , and V will tend to be greater than
zero. We want to accept H1 when V is small, and to reject it when V is
large:
Accept H1 if V ≤ c,
Reject H1 if V > c.
c, the “critical value” of this test, distinguishes “small” from “large” V
values. As always, we would like to determine c so that the probability of a
Type I error does not exceed α, and that of a Type II error 1 − α.
In mathematical statistics, it is shown that if H1 is true and the sample
is large and with replacement, the probability distribution of V is approximately
chi-square with parameter λ = s − 1. (The definition of this distribution
can be found in Appendix 2.) Denote by Vα;s−1 the number such that
the probability of a chi-square random variable with parameter λ = s − 1
exceeding that number is α. These numbers are tabulated in Appendix 4H.
It follows that if c is made equal to this number, the probability of a Type I
error will not exceed α (in fact, will equal α). Intuitively, it should be clear
that the probability of a Type II error is greatest when the πi are very close
to the πio , at which point the probability of a Type II error is 1 − α.
For calculations by hand, any one of the following versions of Equation
(5.19) may be used:
\[ V = \sum_{i=1}^{s} \frac{(F_i - n\pi_{io})^2}{n\pi_{io}} = n \left( \sum_{i=1}^{s} \frac{R_i^2}{\pi_{io}} - 1 \right) = \sum_{i=1}^{s} \frac{F_i^2}{n\pi_{io}} - n. \tag{5.20} \]
Table 5.5
Observations, Example 5.5
Face (Ci ) Frequency (Fi ) Relative frequency (Ri )
1 11 0.183
2 9 0.150
3 12 0.200
4 8 0.133
5 9 0.150
6 11 0.183
Total 60 1.000
\[ V = \frac{1}{(60)(1/6)} \left[ (11)^2 + (9)^2 + \cdots + (11)^2 \right] - 60 = 1.20. \]
Since V > 0.554, the hypothesis that the die is fair is rejected.
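As a check, the statistic can be computed from the frequencies of Table 5.5 in each of the three equivalent ways given in Equation (5.20). This is a minimal sketch; the variable names are ours, and the critical value 0.554 is simply the figure quoted in the text.

```python
freqs = [11, 9, 12, 8, 9, 11]     # observed frequencies F_i, faces 1 to 6
n = sum(freqs)                    # number of rolls (60)
s = len(freqs)                    # number of categories
pi0 = [1 / s] * s                 # hypothesized relative frequencies under H1
rel = [f / n for f in freqs]      # observed relative frequencies R_i

# Three algebraically equivalent forms of the V statistic.
v_def = n * sum((r - p) ** 2 / p for r, p in zip(rel, pi0))
v_freq = sum((f - n * p) ** 2 / (n * p) for f, p in zip(freqs, pi0))
v_short = sum(f ** 2 / (n * p) for f, p in zip(freqs, pi0)) - n

critical = 0.554                  # critical value quoted in the text
print(f"V = {v_def:.2f} (also {v_freq:.2f} and {v_short:.2f})")
print("reject H1" if v_def > critical else "accept H1")
```

All three forms give the same value, which is why the last, shortest form is convenient for hand calculation.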
Related to the above is the test of the hypothesis that the population
distribution has a given mathematical form with certain (unspecified) parameter
values. An example would be the test of the hypothesis that the
distribution of the measurement of an independent process is normal with
some values of the parameters µ and σ. (If the parameter values as well
as the form of the distribution are specified, for example by the hypothesis
that the population distribution is normal with parameters µ = 2.0 and
σ = 0.3, the appropriate test is the earlier one in this section.)
In advanced mathematical statistics texts, it is shown that, when the
random sample is large and with replacement, or large and from an independent
process, the decision rule for this test is given by (5.18), except that the
critical value is Vα;s−k−1, where k is the number of estimated parameters of
the hypothesized distribution. In calculating the V statistic, πio are probabilities
determined under the assumption that the form of the distribution
is that specified by the hypothesis, with appropriately estimated parameter
values.
Table 5.6
Distribution of number of defects, Example 5.6
Number of Number of Proportion of Proportion
defects items, Fi items, Ri under H1 , πio
(1) (2) (3) (4)
0 463 0.926 0.9213
1 34 0.068 0.0755
2 2 0.004 0.0031
3 1 0.002 0.0001
500 1.000 1.0000
Table 5.7
Population joint distribution
First Second attribute
attribute · · · Bj ··· Total
··· ··· ··· ··· ···
Ai · · · πij ··· πi.
··· ··· ··· ··· ···
Total · · · π.j ··· 1.0
Table 5.8
Sample joint distribution
First Second attribute
attribute · · · Bj ··· Total
··· ··· ··· ··· ···
Ai · · · Rij ··· Ri.
··· ··· ··· ··· ···
Total · · · R.j ··· 1.0
Before stating the decision rule for this case, let us illustrate the nota-
tion with an example.
Example 5.7 It has been argued that the main difference between younger
and older drivers is the tendency of the former to have relatively more acci-
dents and claims. However, again according to this argument, the amount
of the claim is determined largely by the circumstances of the accident and
should be unrelated to the age of the driver. On the other hand, if younger
drivers tend to drive larger and more expensive cars faster, the severity of
any accident in which they are involved will tend to be greater, and the
claim amount should be related to age.
The most recent 500 claims received by an automobile insurance com-
pany were analyzed, and the joint relative frequency distribution of claim
amount and age of the insured was obtained, as shown in Table 5.9.
For example, in 1.4% of the 500 selected files the insured was under 30
and the amount of the claim was over $10,000.
The question is: Is the amount of the claim independent of the age of
the insured?
Table 5.9
Distribution of age and claim amount,
Example 5.7
Claim amount ($000)
Age Under 1 1 to 10 Over 10 Total
Under 30 0.362 0.016 0.014 0.392
30 to 50 0.318 0.024 0.012 0.354
Over 50 0.240 0.008 0.006 0.254
Total 0.920 0.048 0.032 1.000
Note that the hypotheses refer to the process from which the files are
selected. A simple calculation will show that the definition of independence
is not satisfied for the sample.
\[ V = n \sum_{i=1}^{s} \sum_{j=1}^{t} \frac{(R_{ij} - R_{i.} R_{.j})^2}{R_{i.} R_{.j}} \tag{5.22} \]
will tend to be close to zero. If, on the other hand, the attributes are not
independent, the Rij will tend to deviate from the Ri. R.j , and the value of
V will tend to be large.
A reasonable decision rule, then, is to accept the hypothesis of inde-
pendence when V is small, and to reject it when V is large. This decision
rule can be written as
Accept H1 if V ≤ c,
Reject H1 if V > c,
    Accept H1 if V ≤ Vα;(s−1)(t−1),
    Reject H1 if V > Vα;(s−1)(t−1),        (5.24)
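For the data of Example 5.7, the statistic of Equation (5.22) and the parameter (s − 1)(t − 1) of the chi-square distribution can be computed as sketched below. The relative frequencies are those of Table 5.9 and n = 500; the variable names are ours. The critical value depends on the chosen α and must still be looked up in the chi-square table, so the sketch stops short of a decision.

```python
# Joint relative frequencies R_ij from Table 5.9
# (rows: age group; columns: claim amount under 1, 1 to 10, over 10).
rel = [
    [0.362, 0.016, 0.014],   # under 30
    [0.318, 0.024, 0.012],   # 30 to 50
    [0.240, 0.008, 0.006],   # over 50
]
n = 500                      # number of claims analyzed

row_tot = [sum(row) for row in rel]           # marginal R_i.
col_tot = [sum(col) for col in zip(*rel)]     # marginal R_.j

v = n * sum(
    (rel[i][j] - row_tot[i] * col_tot[j]) ** 2 / (row_tot[i] * col_tot[j])
    for i in range(len(row_tot))
    for j in range(len(col_tot))
)
df = (len(row_tot) - 1) * (len(col_tot) - 1)  # (s - 1)(t - 1)

print(f"V = {v:.2f}, chi-square parameter = {df}")
```

H1 (independence) is rejected if the printed V exceeds the tabulated value Vα;(s−1)(t−1) for the chosen α.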
Table 5.10
Frequency distributions of t samples
Aggregate
Category, Sample Sample Sample relative
Ci 1 2 ··· t Total frequencies, Ri.
C1 F11 F12 ··· F1t F1. R1. = F1. /n
C2 F21 F22 ··· F2t F2. R2. = F2. /n
··· ··· ··· ··· ··· ··· ···
Cs Fs1 Fs2 ··· Fst Fs. Rs. = Fs. /n
Total n1 n2 ··· nt n 1
of elements of the jth sample that fall into category Ci . The notation is
illustrated in Table 5.10.
The problem is: Are the populations from which the samples were
drawn identical with respect to the distribution of the variable or attribute?
Table 5.11
Four brands of batteries compared
Life (hours) Brand A Brand B Brand C Brand D Total Ri.
Under 19 hr 11 10 8 21 50 0.100
19 to 20 hr 29 35 25 63 152 0.304
20 to 21 hr 42 50 30 77 199 0.398
Over 21 hr 18 25 17 39 99 0.198
Total 100 120 80 200 500 1.000
\[ V = \sum_{i=1}^{s} \sum_{j=1}^{t} \frac{(F_{ij} - E_{ij})^2}{E_{ij}} \tag{5.26} \]
against H2 that at least one of the above equalities does not hold, so
that the probability of a Type I error equals α and that of a Type II
error does not exceed 1 − α, is to
    Accept H1 if V ≤ Vα;(s−1)(t−1),
    Reject H1 if V > Vα;(s−1)(t−1).        (5.28)
Example 5.8 (Continued) If the four samples come from populations hav-
ing identical distributions, the estimates of the common population relative
frequencies of the four categories are shown in the last column of Table 5.11:
R1. = 0.100, R2. = 0.304, R3. = 0.398, and R4. = 0.198. The estimated
expected frequencies under H1 , Eij = nj Ri. , are shown in Table 5.12.
Table 5.12
Estimated expected frequencies, Eij , Example 5.8
Category Brand A Brand B Brand C Brand D
Under 19 hr 10.0 12.00 8.00 20.0
19 to 20 hr 30.4 36.48 24.32 60.8
20 to 21 hr 39.8 47.76 31.84 79.6
Over 21 hr 19.8 23.76 15.84 39.6
Total 100.0 120.00 80.00 200.0
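\[
\begin{aligned}
V &= \sum_{i=1}^{4} \sum_{j=1}^{4} \frac{(F_{ij} - E_{ij})^2}{E_{ij}} \\
  &= \underbrace{\frac{(11 - 10)^2}{10} + \frac{(29 - 30.4)^2}{30.4} + \cdots + \frac{(39 - 39.6)^2}{39.6}}_{16\ \text{terms}} \\
  &= 1.447.
\end{aligned}
\]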
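The sixteen terms are tedious to add by hand, so a short script is a useful check. The sketch below takes the observed frequencies from Table 5.11, forms the estimated expected frequencies Eij = nj Ri., and sums the terms of V; the variable names are ours.

```python
# Observed frequencies F_ij from Table 5.11
# (rows: life category; columns: Brands A, B, C, D).
observed = [
    [11, 10, 8, 21],    # under 19 hr
    [29, 35, 25, 63],   # 19 to 20 hr
    [42, 50, 30, 77],   # 20 to 21 hr
    [18, 25, 17, 39],   # over 21 hr
]

col_totals = [sum(col) for col in zip(*observed)]    # sample sizes n_j
n = sum(col_totals)                                  # total number of batteries
row_share = [sum(row) / n for row in observed]       # aggregate R_i.

# Estimated expected frequencies under H1: E_ij = n_j * R_i.
expected = [[nj * ri for nj in col_totals] for ri in row_share]

v = sum(
    (f - e) ** 2 / e
    for f_row, e_row in zip(observed, expected)
    for f, e in zip(f_row, e_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)     # (s - 1)(t - 1)

print(f"V = {v:.3f}, chi-square parameter = {df}")
```

The `expected` table produced along the way can be compared entry by entry with Table 5.12.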
Once again note that H1 is a very precise statement. To assert that the
population distributions are identical means, strictly speaking, that there is
absolutely no difference among the relative frequencies of any category. In
this example, the acceptance of H1 is best interpreted as non-rejection: the
samples are not large enough to determine conclusively that the populations
are not identical.
distributions are different. On the other hand, if two distributions are iden-
tical, their means, variances, and all other characteristics are the same.
We shall describe here a test of the hypothesis that two population
means are equal. The first population consists of N1, and the second of N2
elements. Let µ1 and µ2 be the means of a certain variable X in the two
populations. At issue is whether or not µ1 = µ2 (alternatively, whether or
not µ1 − µ2 = 0).
A random sample is drawn from each of the two populations. Let ni ,
X̄i , and Si2 be the size of the sample, and the mean and variance of variable
X in the sample drawn from population i (i = 1, 2).
A statistic on which the test can be based is the difference between the
two sample means, X̄1 − X̄2 . If the two population means are equal, the
difference (X̄1 − X̄2 ) will tend to be close to 0; if not, (X̄1 − X̄2 ) will tend to
deviate from 0. We shall want to accept H1 when (X̄1 − X̄2 ) is close to 0,
and to reject it when (X̄1 − X̄2 ) is not close, in either direction, to 0. This
decision rule can be written as
Brand A Brand B
Sample size n1 = 100 n2 = 120
Average life (hours) X̄1 = 14.25 X̄2 = 10.63
Standard deviation S1 = 1.92 S2 = 1.81
    H1 : µ1 = µ2,
    H2 : µ1 ≠ µ2,        (5.29)
where
\[ c = U_{\alpha/2} \sqrt{\frac{S_1^2}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{S_2^2}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1}}. \tag{5.31} \]
If, as we assumed in Example 5.8, the two types of error are thought to
be equally serious and α = 0.50, then Uα/2 = 0.674. Next, we calculate c.
Since the Ni are very large, we may set (Ni − ni )/(Ni − 1) ≈ 1, in which
case
\[ c = 0.674 \sqrt{\frac{(1.92)^2}{100} + \frac{(1.81)^2}{120}} = 0.674 \sqrt{0.064} = (0.674)(0.253) = 0.171. \]
Since 3.62 > 0.171, we reject the hypothesis that the average life of the two
brands is the same.
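The calculation can be reproduced as follows. This is a minimal sketch using the sample figures for Brands A and B given above; the finite population corrections are set to 1, as in the text, and the hypothesis of equal means is rejected when the absolute difference between the sample means exceeds c. The variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

# Sample results for Brand A (sample 1) and Brand B (sample 2).
n1, xbar1, s1 = 100, 14.25, 1.92
n2, xbar2, s2 = 120, 10.63, 1.81
alpha = 0.50

u = NormalDist().inv_cdf(1 - alpha / 2)       # U_{alpha/2}, about 0.674
# Populations assumed very large, so (N_i - n_i)/(N_i - 1) is taken as 1.
se_diff = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
c = u * se_diff

diff = xbar1 - xbar2
reject_h1 = abs(diff) > c                     # two-sided: either direction counts

print(f"difference = {diff:.2f}, c = {c:.3f}")
print("reject H1" if reject_h1 else "accept H1")
```

The same standard error, multiplied by a different Uα/2, gives a confidence interval for µ1 − µ2 around the observed difference.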
(3.62) ± (1.96)(0.253),
Table 5.13
Samples of two brands of batteries
Life (hours) Ranks
Brand A Brand B Brand A Brand B
(1) (2) (3) (4)
8.9 1
9.9 2
10.1 3
10.2 4
10.5 5
12.4 6
12.7 7
13.6 8
14.2 9
16.8 10
17.9 11
n1 =5 n2 = 6 R1 = 39 R2 = 27
\[ E(R_1) = \frac{n_1 (n_1 + n_2 + 1)}{2}, \]
\[ Var(R_1) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}. \]
It can also be shown, again if the two populations are identical, that the
probability distribution of R1 is approximately normal for large n1 and
n2. (It does not matter which population and sample is called “the first”
and which “the second.” R1 could refer to either, but n1 should be the
corresponding sample size.)
If the populations are identical, the observed R1 should tend to be
close to E(R1 ); if not, R1 should tend to deviate from E(R1 ). Therefore, we
should want to reject the hypothesis that the two populations are identical
when R1 deviates substantially from E(R1 ). The terms “close” and “sub-
stantially” are given precise meaning in the decision rule shown in the box
that follows.
To illustrate this test, let us return to our example and assume, as
we did in Example 5.8, that α = 0.50. Type I and II errors are assumed
equally serious, and Uα/2 = 0.674. Calculate
\[ E(R_1) = \frac{(5)(5 + 6 + 1)}{2} = 30, \qquad Var(R_1) = \frac{(5)(6)(5 + 6 + 1)}{12} = 30. \]
The interval E(R1) ± Uα/2 √Var(R1) is (30) ± (0.674)√30, or from about 26.31 to 33.69.
Since the observed R1 = 39 lies outside this interval, H1 is rejected.
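The large-sample version of this rank-sum calculation is easy to script. The sketch below uses only the summary figures of the example (n1 = 5, n2 = 6, R1 = 39, α = 0.50); the function name `rank_sum_interval` is ours. With samples this small the normal approximation is used here only to mirror the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_interval(n1, n2, alpha):
    """Interval E(R1) +/- U_{alpha/2} * sqrt(Var(R1)) under identical populations."""
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    u = NormalDist().inv_cdf(1 - alpha / 2)
    half = u * sqrt(var)
    return mean, var, mean - half, mean + half


mean, var, low, high = rank_sum_interval(n1=5, n2=6, alpha=0.50)
r1 = 39                                   # observed rank sum of the first sample
reject_h1 = not (low <= r1 <= high)

print(f"E(R1) = {mean:.0f}, Var(R1) = {var:.0f}")
print(f"interval: {low:.2f} to {high:.2f}")
print("reject H1" if reject_h1 else "accept H1")
```

The function returns the interval rather than the decision so that the same code can be reused for any observed rank sum.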
This decision rule is known as the Mann-Whitney or Wilcoxon (MWW)
test. Three points are worth noting.
First, the MWW test is an alternative to the chi-square test of Section
5.9. As we noted in Section 5.5, there may be more than one test of given
hypotheses. This raises the question of how to determine which of the alternative
tests is better, a question that has received considerable attention
in the statistical literature, but which we shall not pursue here.
The second point is that there is a MWW test applicable for any (not
only large) n1 and n2; the critical values for this test are obtained from
special tables, but the test itself is based on R1 . The noteworthy feature
of this test is that it does not require the two population distributions to
have a particular form. As we remarked earlier, decision rules applicable to
samples of any size usually make this requirement; they are sometimes called
parametric tests, in the sense that the hypotheses deal with parameters of
population distributions of a given type. By contrast, the MWW is one of
many non-parametric tests.
The final point worth noting is that H1, once again, is a precise statement.
What is being tested is the hypothesis that the two population distributions
are identical. This non-parametric test, therefore, shares the main
shortcoming of the chi-square test of Section 5.9. Since it is very rare that
two population distributions in the business world are exactly alike, care
should be taken that rejection of H1 be followed by an appraisal of the
magnitude of the differences between the two populations.
PROBLEMS
5.1 A random sample of size n = 200 was drawn without replacement from a
population of size N = 1,000. The sample mean of a variable is X̄ = 150, and the
sample variance is S² = 280.
(a) Calculate the 99%, 95%, 90%, and 50% symmetric confidence intervals
for the population mean of the variable, µ. Briefly interpret these intervals.
(b) Calculate the 99%, 95%, 90%, and 50% symmetric confidence intervals for the
total value of the variable in the population. Briefly interpret these intervals.
(c) Do (a) and (b) under the assumption that the sample is with replacement.
5.2 A random sample of size n = 200 was drawn without replacement from a
population of size N = 1,000. The proportion of elements in the sample falling
into a certain category is R = 0.43.
(a) Calculate the 99%, 95%, 90%, and 50% symmetric confidence intervals for
the proportion of elements in the population that fall into this category. Briefly
interpret these intervals.
(b) Calculate the 99%, 95%, 90%, and 50% symmetric confidence intervals
for the number of elements in the population that belong to this category. Briefly
interpret these intervals.
(c) Do (a) and (b) under the assumption that the sample is with replacement.
5.3 In the manner described in Section 5.3, construct (a) a two-sided symmetric,
(b) a two-sided asymmetric, and (c) two one-sided 90% confidence intervals for
(i) the population mean of a variable, and (ii) a population proportion. Briefly
discuss the differences among these intervals. Which type of interval is preferable?
5.4 Following complaints that parking meters were malfunctioning, the Depart-
ment of Consumer Affairs selected at random and without replacement 10% of the
1,850 parking meters in a metropolitan area. Of the meters tested, 105 gave the
correct reading, 75 gave more time than was paid for, and 5 gave less time than
was paid for. “Parking meters,” the Department’s news release concluded, “may
be one of the few bargains left.”
What is your estimate of the proportion of all parking meters giving the cor-
rect time? Construct an interval estimate of this proportion; the interval estimate
should contain the true proportion with probability 95%. Do you need any addi-
tional information in order to determine if parking meters are indeed a bargain?
If so, what?
5.5 The management of a supermarket is concerned about the average waiting
time of its customers at the checkout counters on Saturdays. Fifty customers
“were randomly selected” one Saturday, and the time they spent waiting in line
before being served was recorded. Their average waiting time was 5.2 minutes and
the standard deviation of waiting times was 1.7 minutes.
(a) Assuming that the selected customers constitute a random sample with
replacement from the population of all customers, test the hypothesis that the
mean waiting time of all Saturday customers is less than or equal to 5 minutes,
against the alternative hypothesis that it is greater than 5 minutes. Assume that
the probability of a Type I error should not exceed 10%.
(b) Explain the exact meaning of Type I and II errors in this case, and their
likely consequences. Which error is the more serious in this case?
(c) Redo the test in (a) under the assumption that the probability of a Type
II error should not exceed 10%.
(d) How would you select a random sample of Saturday customers? Would
any n customers, no matter how selected, constitute a sample to which the test in
(a) could be applied?
5.6 The manager of a department store wished to estimate the proportion of time
that the sales clerks are idle. She divided the week into three periods of about
equal business volume (weekdays to 5 p.m., weekdays after 5 p.m., Saturdays).
Over a period of one month, a number of checks were made in each period at
Table 5.14
Data, Problem 5.9
Number     Frequency
0               8
1              11
2               9
3              10
4              10
5              12
6               9
7              11
8              12
9               8
Total         100
(a) Assuming that the above numbers can be treated as a random sample
from an infinite population of numbers that could be generated by this program,
test the equal frequency hypothesis. The probability of a Type I error should not
exceed 1%.
(b) Which other requirement must these numbers satisfy to qualify as random
numbers? How could this be tested? Describe only, do not calculate.
(c) Determine the exact meaning of Type I and II errors in this case, and
their likely consequences. Which error is more serious?
(d) Redo the test in (a) under the condition that the probability of a Type
II error should not exceed 1%.
(e) Would any n numbers produced by the program qualify as a sample to
which the tests in (a) or (d) apply?
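As a rough sketch of the arithmetic in part (a), the statistic V for the equal frequency hypothesis can be computed from Table 5.14 as shown below. The sketch assumes that the ten numbers 0 to 9 are each expected one-tenth of the time; the function name is hypothetical.

```python
# Observed frequencies of the numbers 0, 1, ..., 9 (Table 5.14).
observed = [8, 11, 9, 10, 10, 12, 9, 11, 12, 8]


def equal_frequency_statistic(observed):
    """Return V = sum((O - E)^2 / E) and its degrees of freedom."""
    n = sum(observed)
    expected = n / len(observed)  # same expected count in every class
    v = sum((o - expected) ** 2 / expected for o in observed)
    return v, len(observed) - 1


v_uniform, df_uniform = equal_frequency_statistic(observed)
print(f"V = {v_uniform:.3f} with {df_uniform} degrees of freedom")
```

V is then compared with the tabulated critical value of the chi-square distribution for the stated Type I error probability.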
5.10 A study was made of the time that elapsed between 150 successive telephone
calls to a certain exchange. The results were as shown in Table 5.15.
(a) Assuming that the observations constitute a random sample from an infinite
population of calls, test the hypothesis that the distribution of time between
calls is exponential. The probability of a Type I error should not exceed 5%.
Table 5.15
Data, Problem 5.10
Time from previous call    Number of
(minutes)                  calls
0.0 to 0.5                    93
0.5 to 1.0                    36
1.0 to 1.5                    12
1.5 to 2.0                     6
2.0 to 2.5                     3
Total                        150
Table 5.16
Wage distribution, Problem 5.11
Wage interval
(dollars)      Number of workers
120 to 125            2
125 to 130            7
130 to 135           10
135 to 140           15
140 to 145           20
145 to 150           25
150 to 155           19
155 to 160           17
160 to 165           11
165 to 170            8
170 to 175            3
175 to 180            2
180 to 185            1
Total               140
Hints for Problem 5.10: Estimate the parameter λ of this distribution by the
inverse of the average observed time between calls, using the midpoints of the
time intervals in Table 5.15; that is, set λ = 1/X̄ (why?). The probability that a
variable X, having an exponential distribution with parameter λ, will be in the
interval from a to b (a < b) can be shown to be equal to
$e^{-\lambda a} - e^{-\lambda b}$.
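The hints for Problem 5.10 can be turned into a short calculation along the following lines. This is a sketch under stated assumptions: the interval probability is taken to be exp(-λa) - exp(-λb), the standard result for the exponential distribution, and the last class is treated as open-ended ("2.0 and over") so that the class probabilities add up to one. All variable names are illustrative.

```python
from math import exp

# Table 5.15: class limits (minutes) and observed number of calls.
limits = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5), (1.5, 2.0), (2.0, 2.5)]
observed_calls = [93, 36, 12, 6, 3]
n_calls = sum(observed_calls)

# Estimate lambda as the inverse of the mean computed from class midpoints.
midpoints = [(a + b) / 2 for a, b in limits]
mean_time = sum(m * f for m, f in zip(midpoints, observed_calls)) / n_calls
lam = 1 / mean_time

# P(a <= X <= b) = exp(-lam * a) - exp(-lam * b) for the closed classes.
probs = [exp(-lam * a) - exp(-lam * b) for a, b in limits[:-1]]
# Assumption: the last class absorbs the whole upper tail, P(X >= 2.0).
probs.append(exp(-lam * limits[-1][0]))

expected_calls = [n_calls * p for p in probs]
v_exp = sum((o - e) ** 2 / e for o, e in zip(observed_calls, expected_calls))
# One degree of freedom is given up for the estimated parameter lambda.
df_exp = len(observed_calls) - 1 - 1

print(f"lambda = {lam:.4f}, V = {v_exp:.3f}, degrees of freedom = {df_exp}")
```

The expected frequency of the last class is small, so it may be preferable to merge it with the preceding class before computing V; the sketch does not do this.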
(a) Assuming that the sample can be treated as essentially one with replace-
ment (because of the large population size), test the hypothesis that the distribu-
tion of wages in the factory is normal. The probability of a Type I error should not
exceed 10%. Hint: Estimate the parameters µ and σ of the normal distribution by
the sample mean and standard deviation of wages calculated using the midpoints
of the intervals above.
(b) Determine the exact meaning of Type I and II errors in this case, and
their likely consequences. Which error is more serious?
(c) Redo the test in (a) under the condition that the probability of a Type II
error should not exceed 10%.
5.12 A study was made of the feasibility of constructing short-term storage
facilities for liquid products (animal, vegetable, and marine oils, chemicals, etc.,
but excluding petroleum products) exported from or imported to a Great Lakes
port. Table 5.17 shows the frequency distribution of the time between successive
arrivals of all liquid-carrying vessels at the port in the most recent summer season.
Table 5.17
Great Lakes study, Problem 5.12
Interarrival time    Number of
(days)               vessels
0 to 5                  23
5 to 11                 10
11 to 15                 6
Over 15                  1*
Total                   40
* 24 days
ADM532
ADM531 A B Total
A 18 16 34
B 31 14 45
Total 49 30 79
B
A Yes No Total
Yes 15 35 50
No 25 25 50
Total 40 60 100
(a) Assuming that the population of tenants is very large so that the sample
is essentially one with replacement, test the hypothesis that a tenant’s preference
is independent of the presence of children. The probability of a Type I error should
not exceed 10%.
(b) Determine the meaning and likely consequences of the two types of error
in this case. Which error is the more serious?
(c) Redo the test in (a) under the assumption that the two errors are equally
serious.
5.15 A sample of 280 households in a large metropolitan area was selected in
order to investigate the usage of XL White, a brand of laundry bleach. The results
of the study are in part as shown in Table 5.18.
Table 5.18
Results of bleach study, Problem 5.15
Household income     Non-users   Light users   Heavy users   Total
Under 15,000             28          16            10          54
15,001 to 20,000         32          15             8          55
20,001 to 30,000         27          14            12          53
30,001 to 40,000         31          17            11          59
Over 40,000              32          18             9          59
Total                   150          80            50         280
V = 1.826

Family size          Non-users   Light users   Heavy users   Total
Under 3                  60          26            10          96
3 to 4                   52          33            15         100
Over 4                   38          21            25          84
Total                   150          80            50         280
V = 13.799
(a) Test the two hypotheses that usage is independent of (i) income, and (ii)
family size. The probability of a Type I error should not exceed 10%.
(b) Examine the meaning and likely consequences of the two types of error
in this case. Which error is the more serious? Redo the tests under the condition
that the probability of a Type II error should not exceed 10%.
(c) How would you select a simple random sample of households? Would any sample
do for the purpose of applying the above tests?
(d) What are the implications of the test results?
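The V statistics reported in Table 5.18 can be checked with a short calculation such as the one below. It is a sketch that assumes V is the usual sum of (observed - expected)^2 / expected over the cells of the table, with expected frequencies computed from the row and column totals; the function name is hypothetical.

```python
def independence_statistic(table):
    """Return V and its degrees of freedom for a two-way frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    v = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            v += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return v, df


# Table 5.18, columns: non-users, light users, heavy users.
income = [[28, 16, 10], [32, 15, 8], [27, 14, 12], [31, 17, 11], [32, 18, 9]]
family_size = [[60, 26, 10], [52, 33, 15], [38, 21, 25]]

v_income, df_income = independence_statistic(income)
v_family, df_family = independence_statistic(family_size)
print(f"Income:      V = {v_income:.3f}, degrees of freedom = {df_income}")
print(f"Family size: V = {v_family:.3f}, degrees of freedom = {df_family}")
```

Each V is compared with the critical value of the chi-square distribution having the corresponding degrees of freedom.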
Table 5.19
Weight and handling time of
metal plates, Problem 5.16
Plate No.: Weight Handling time
1 Medium Long
2 Light Short
3 Medium Short
4 Heavy Short
5 Light Short
6 Heavy Long
7 Medium Short
8 Heavy Long
9 Light Short
10 Medium Long
11 Light Long
12 Medium Short
13 Heavy Long
14 Medium Long
15 Heavy Long
Does the weight of a metal piece influence the handling time? If yes, what is
the nature of the relationship? In answering these questions treat the sample as
if it were large. Why is this assumption necessary? Explain and justify any other
assumptions you are forced to make.
5.17 A product testing laboratory was asked to evaluate the durability of four
brands of tires. The durability tests were made under normal city driving
conditions with the assistance of a firm operating a fleet of taxicabs. In all, 140
taxis were employed and each was fitted with four new tires, one from each brand. These
tires were randomly selected from retail outlets. Each day, the tires were rotated
in a predetermined manner to ensure uniform exposure to wear, unrelated to their
original location on the car. Also daily, the tires were inspected to determine if
they had reached the end of their useful life. The test results are summarized in
Table 5.20.
(a) Assuming that the observations can be considered random samples from
infinitely large populations, do the four samples of tires come from populations
having identical distributions of life? The probability of a Type I error should not
exceed 5%.
(b) Interpret the meaning of the two types of error in this case, and their
likely consequences. Which error is more serious?
(c) Redo the test in (a) under the condition that the probability of a Type II
error should not exceed 5%.
Table 5.20
Tire test results, Problem 5.17
Life                    Brands
(000 miles)      A      B      C      D
Under 30        25     30     25     26
30 to 31        40     45     45     43
31 to 32        60     55     50     55
Over 32         15     10     20     16
Total          140    140    140    140
(d) Comment on the method used by the laboratory to measure the durability
of tires. Does the method produce samples to which the tests in (a) and (c) may
be applied?
5.18 In a sample of 200 beer consumers in city A, the average monthly beer
consumption was 26.5 oz, and the sample standard deviation was 2.14. In a
sample of 150 beer consumers in city B, the average monthly beer consumption
was 29.8 oz, and the sample standard deviation was 3.87.
(a) Assuming that the samples are random and without replacement from
very large populations, test the hypothesis that the average beer consumption of
all beer drinkers in city A is the same as that in city B. The probability of a Type
I error should not exceed 20%.
(b) Interpret the meaning and the likely consequences of the two types of
error in this case. Which error is the more serious?
(c) Redo the test in (a) assuming the probability of a Type II error should
not exceed 20%.
(d) How would you select a random sample of beer consumers in a city?
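A sketch of the calculation for part (a) is given below. It assumes that the two samples are independent and large enough for the difference of the sample means to be treated as approximately normal, and it uses a two-sided rejection rule; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(n1, mean1, sd1, n2, mean2, sd2, alpha):
    """Large-sample two-sided test of H0: mu1 = mu2."""
    # Standard error of the difference between two independent sample means.
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Figures from Problem 5.18: city A first, city B second.
z_beer, critical_beer, reject_beer = two_sample_z_test(
    n1=200, mean1=26.5, sd1=2.14, n2=150, mean2=29.8, sd2=3.87, alpha=0.20
)
print(f"z = {z_beer:.3f}, critical value = {critical_beer:.3f}, "
      f"reject H0: {reject_beer}")
```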
5.19 Surveys of potential buyers can provide useful information about the likeli-
hood of success of planned (that is, as yet not manufactured or offered) products
and services. In the case described here, the product in question was an electric
car which, at the time the survey was conducted, was still at the prototype stage.*
Data were collected through personal interviews with 1,229 randomly selected
shoppers at designated shopping centers in three cities. The respondents were
given a detailed description of the planned electric car, which included its price,
passenger and luggage capacity, size, speed, cost of operation, and safety features.
The respondents were asked to rate their intention to buy the electric car when
it became available on an 11-point scale, ranging from 0 (absolutely no chance
of buying) to 10 (almost certain of buying). They were also asked to rate the
importance they attributed to each of a number of factors influencing their choice
of car on a 10-point scale, ranging from 0 (low) to 9 (high importance). The results
of the study are shown in part in Tables 5.21, 5.22, and 5.23.
In Table 5.23, the importance groups in column (2) are formed as follows: I
= low importance (0 to 3 on the 10-point scale); II = medium importance (4 to 6);
and III = high importance (7 to 9). The intention groups are as follows: None =
0 or 1, Low = 2 to 4, Medium = 5 to 7, and High = 8 to 10 on the 11-point scale.
Table 5.21
Distribution of buying intention, Problem 5.19
                  Frequency of response
Rating        City A    City B    City C
0                94        53        33
1                35        12        16
2                32        12        31
3                43        12        38
4                28        16        37
5               101        65        63
6                37        30        29
7                58        36        35
8                53        32        26
9                23        31        12
10               52        33        21
Total           556       332       341
Mean           4.72      5.30      4.69
Std. dev.      3.22      3.21      2.78
Table 5.22
Mean and standard deviation
of ratings for factors, Problem 5.19
Ratings
Factor Mean Std. deviation
Cost of operation 7.00 2.30
Ease of maintenance 6.24 2.49
Cost of maintenance 6.81 2.36
Luggage capacity 4.56 2.51
Passenger capacity 4.92 2.41
Size 5.18 2.58
Mileage 6.85 2.34
Price 6.91 2.33
Speed 4.34 2.43
Acceleration 4.69 2.58
Safety 7.24 2.36
Pollution 6.28 2.80
Column (7) shows the mean intention-to-buy rating for each importance group.
Column (8) is the V statistic for testing the independence of factor and intention
to buy.
Assume you are the marketing manager of the company intending to produce
this electric car. Interpret these findings.
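As one possible starting point for the interpretation, the means in Table 5.21 can be recomputed from the frequencies and supplemented with interval estimates. The sketch below assumes large-sample 95% intervals based on the normal approximation and a standard deviation computed with divisor n - 1; both choices, and all names, are illustrative rather than prescribed by the problem.

```python
from math import sqrt
from statistics import NormalDist

ratings = list(range(11))  # 0 (no chance of buying) to 10 (almost certain)
frequencies = {
    "City A": [94, 35, 32, 43, 28, 101, 37, 58, 53, 23, 52],
    "City B": [53, 12, 12, 12, 16, 65, 30, 36, 32, 31, 33],
    "City C": [33, 16, 31, 38, 37, 63, 29, 35, 26, 12, 21],
}


def summarize(freqs, confidence=0.95):
    """Return n, mean, standard deviation and an interval for the mean."""
    n = sum(freqs)
    mean = sum(r * f for r, f in zip(ratings, freqs)) / n
    variance = sum(f * (r - mean) ** 2 for r, f in zip(ratings, freqs)) / (n - 1)
    sd = sqrt(variance)
    half_width = NormalDist().inv_cdf(0.5 + confidence / 2) * sd / sqrt(n)
    return n, mean, sd, (mean - half_width, mean + half_width)


summary = {city: summarize(freqs) for city, freqs in frequencies.items()}
for city, (n, mean, sd, (low, high)) in summary.items():
    print(f"{city}: n = {n}, mean = {mean:.2f}, sd = {sd:.2f}, "
          f"95% interval = ({low:.2f}, {high:.2f})")
```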
Table 5.23
Association between importance of factor and intention to buy, Problem 5.19
                       Imp.         Intention group          Mean
Factor                 group   None   Low   Medium   High    intention    V
(1)                    (2)     (3)    (4)    (5)     (6)       (7)       (8)
Cost of operation      I        43     18     36      25       3.73
                       II       49     83     92      50       4.18     67.19
                       III      87    210    326     208       5.14
Ease of maintenance    I        47     40     78      38       4.18
                       II       47    114    120      65       4.45     32.82
                       III      85    157    256     180       5.13
Cost of maintenance    I        39     20     46      30       4.12
                       II       49     85    101      56       4.29     37.12
                       III      91    206    307     197       4.98
Luggage capacity       I        62     91    168     104       4.91
                       II       51    148    186     114       4.96     26.60
                       III      66     72    100      65       4.33
Passenger capacity     I        54     73    130      87       4.92
                       II       56    154    213     127       5.02     23.28
                       III      69     84    111      69       4.27
Mileage                I        50     29     36      17       3.03
                       II       41     86    102      49       4.41     79.70
                       III      88    196    316     217       5.19
Price                  I        35     30     37      20       3.59
                       II       49     96     95      59       4.40     29.99
                       III      95    185    322     204       5.11
Speed                  I        78     93    148     133       4.93
                       II       59    144    218     103       4.68     31.76
                       III      42     74     88      47       4.41
Pollution              I        53     60     72      34       3.80
                       II       36     98    115      50       4.60     47.36
                       III      90    153    267     199       5.18
5.20 An issue of some importance in marketing is the extent to which consumers
are conscious of the price of an article at the time of its purchase. One theory
is that consumers watch prices and carefully adjust their purchases from different
outlets so as always to minimize the total cost of the articles they buy. Opponents
of this theory argue that consumers cannot remember accurately the prices of the
hundreds of commodities they normally buy; they are often concerned more with
whether or not they can afford to buy an article rather than with its exact price.
To investigate this issue, 640 housewives were approached at randomly se-
lected addresses in a city.* “Housewife” was interpreted broadly to mean the
person, male or female, responsible for current purchases of provisions for the
household, but will be referred to as “she” in this case. The questionnaire
contained questions about recent purchases of fifteen selected commodities and
about certain aspects of the household. For each commodity, the housewife was
asked when she bought it last. The interviewers were instructed not to ask for
further information if the last purchase was more than a week ago. However, if
the housewife had purchased the commodity within the last week, she was asked
to state the brand or type of the commodity, whether or not she recalled the price
paid for it, and, if so, how much she paid. The interviewers were asked to supply
their personal estimate of the social group to which the housewife belonged. Five
social groups were distinguished: A (the well-to-do), B (the professional middle
class), C (the lower middle class), D (the working class), and E (the poor). The
recognizable characteristics of these groups were described in detail in the written
instructions to the interviewers.
Of the fifteen commodities listed in the questionnaire, eight were sold at such
a variety of prices that it was not practicable to check the answers. The remaining
seven, however, could be checked and the price named by the housewife could be
compared with the list price obtained from industry sources. If there was any
departure from the list price, the answer was classified as incorrect. The Don’t
Know category includes both those who, for some reason, refused to name the
price of the purchase, and those who admitted that the price named was more or
less a guess. Altogether, 422 housewives reported 1,888 recent purchases of one or
more of the seven commodities the price of which could be checked. The results
by commodity are shown in Table 5.24.
Table 5.24
Results by commodity, Problem 5.20
Stated                                  Commodity
price was:       Tea   Coffee   Sugar   Jam   Margarine   Flour   Cereal     All
Correct          283     109     266    117      116       100       85    1,076
Wrong             56      35      53     46      110        82       98      480
Don’t know        18      16      78     34       26        99       61      332
All purchases    357     160     397    197      252       281      244    1,888
The relationship between the percentage of correct answers and social group
is shown in Table 5.25.
Table 5.25
Results by social group, Problem 5.20
Social        Number of     Number of    Number of prices    Percentage
group         housewives    purchases    named correctly     correct
A                   9            42              19             45.2
B                  28           116              54             46.6
C                 118           544             309             56.8
D                 229         1,052             616             58.6
E                  38           134              78             58.2
All groups        422         1,888           1,076             57.0
The percentage correct is the ratio of the number of correct prices named to
the number of items bought.
What are the implications of this study for the price-awareness theory?