Chapter 10
Two-Sample Tests and
ANOVA
Learning Objectives
In this chapter, you learn hypothesis testing
procedures to test:
The means of two independent populations
The means of two related populations
The proportions of two independent
populations
The variances of two independent populations
The means of more than two populations
Chapter Overview
Two-Sample Tests
Population Means,
Independent Samples
Means,
Related Samples
Population
Proportions
Population
Variances
One-Way Analysis
of Variance (ANOVA)
F-test
Tukey-Kramer
test
Two-Sample Tests
Population means, independent samples. Example: Mean 1 vs. independent Mean 2
Means, related samples. Example: Same population before vs. after treatment
Population proportions. Example: Proportion 1 vs. Proportion 2
Population variances. Example: Variance 1 vs. Variance 2
Difference Between Two Means
Population means, independent samples:
$\sigma_1$ and $\sigma_2$ known
$\sigma_1$ and $\sigma_2$ unknown, assumed equal
$\sigma_1$ and $\sigma_2$ unknown, not assumed equal
Goal: Test a hypothesis or form a confidence interval for the difference between two population means, $\mu_1 - \mu_2$
The point estimate for the difference is $\bar{X}_1 - \bar{X}_2$
Independent Samples
Different data sources
Unrelated
Independent
Sample selected from one
population has no effect
on the sample selected
from the other population
Use the difference between
2 sample means
Use a Z test, a pooled-variance t test, or a
separate-variance t test
Difference Between Two Means (continued)
Population means, independent samples:
$\sigma_1$ and $\sigma_2$ known: use a Z test statistic
$\sigma_1$ and $\sigma_2$ unknown, assumed equal: use $S_p$ to estimate the unknown $\sigma$; use a t test statistic and the pooled standard deviation
$\sigma_1$ and $\sigma_2$ unknown, not assumed equal: use $S_1$ and $S_2$ to estimate the unknown $\sigma_1$ and $\sigma_2$; use a separate-variance t test
$\sigma_1$ and $\sigma_2$ Known
Assumptions:
Samples are randomly and independently drawn
Population distributions are normal or both sample sizes are at least 30
Population standard deviations are known
$\sigma_1$ and $\sigma_2$ Known (continued)
When $\sigma_1$ and $\sigma_2$ are known and both populations are normal or both sample sizes are at least 30, the test statistic is a Z-value and the standard error of $\bar{X}_1 - \bar{X}_2$ is
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$\sigma_1$ and $\sigma_2$ Known (continued)
The test statistic for $\mu_1 - \mu_2$ is:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
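As a minimal sketch of the Z statistic for two independent samples with known population standard deviations, the calculation can be written in Python as follows. The function name `two_sample_z` and the summary numbers are hypothetical and are not taken from the chapter.

```python
import math

def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for mu1 - mu2 when sigma1 and sigma2 are known."""
    # Standard error of the difference between the two sample means
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1_bar - x2_bar) - hypothesized_diff) / standard_error

# Hypothetical summary data (illustration only, not from the chapter)
z = two_sample_z(x1_bar=50.0, x2_bar=48.0, sigma1=4.0, sigma2=3.0, n1=40, n2=36)
print(round(z, 4))
```

The default `hypothesized_diff=0.0` corresponds to the usual null hypothesis that the two population means are equal.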
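Hypothesis Tests for Two Population Means
Two Population Means, Independent Samples
Lower-tail test: H0: $\mu_1 \ge \mu_2$, H1: $\mu_1 < \mu_2$, i.e., H0: $\mu_1 - \mu_2 \ge 0$, H1: $\mu_1 - \mu_2 < 0$
Upper-tail test: H0: $\mu_1 \le \mu_2$, H1: $\mu_1 > \mu_2$, i.e., H0: $\mu_1 - \mu_2 \le 0$, H1: $\mu_1 - \mu_2 > 0$
Two-tail test: H0: $\mu_1 = \mu_2$, H1: $\mu_1 \ne \mu_2$, i.e., H0: $\mu_1 - \mu_2 = 0$, H1: $\mu_1 - \mu_2 \ne 0$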
Hypothesis Tests for $\mu_1 - \mu_2$
Two Population Means, Independent Samples
Lower-tail test: H0: $\mu_1 - \mu_2 \ge 0$, H1: $\mu_1 - \mu_2 < 0$. Reject H0 if $Z < -Z_{\alpha}$
Upper-tail test: H0: $\mu_1 - \mu_2 \le 0$, H1: $\mu_1 - \mu_2 > 0$. Reject H0 if $Z > Z_{\alpha}$
Two-tail test: H0: $\mu_1 - \mu_2 = 0$, H1: $\mu_1 - \mu_2 \ne 0$. Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
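The three rejection rules can be sketched as a small Python helper. The function name `reject_h0` and the `tail` labels are assumptions made for this illustration; critical values come from the standard normal distribution in Python's `statistics` module.

```python
from statistics import NormalDist

def reject_h0(z, alpha, tail):
    """Apply the Z-test rejection rule for a lower-, upper-, or two-tail test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        # -Z_alpha is the alpha quantile of the standard normal
        return z < std_normal.inv_cdf(alpha)
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Hypothetical Z values (illustration only)
print(reject_h0(2.1, 0.05, "two"))
print(reject_h0(1.7, 0.05, "two"))
print(reject_h0(1.7, 0.05, "upper"))
```

Note that the same Z value can lead to different decisions under a one-tail and a two-tail test, because the two-tail test splits $\alpha$ across both tails.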
Confidence Interval, $\sigma_1$ and $\sigma_2$ Known
The confidence interval for $\mu_1 - \mu_2$ is:
$$(\bar{X}_1 - \bar{X}_2) \pm Z \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
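A minimal sketch of this interval in Python, assuming a two-sided interval whose critical Z value is taken from the standard normal distribution. The function name `z_confidence_interval` and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x1_bar, x2_bar, sigma1, sigma2, n1, n2, confidence=0.95):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    # Critical value that leaves (1 - confidence) / 2 in each tail
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin

# Hypothetical summary data (illustration only, not from the chapter)
lower, upper = z_confidence_interval(50.0, 48.0, 4.0, 3.0, 40, 36)
print(round(lower, 4), round(upper, 4))
```

If the interval excludes 0, the corresponding two-tail test at the matching significance level would reject H0.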
$\sigma_1$ and $\sigma_2$ Unknown, Assumed Equal
Assumptions:
Samples are randomly and independently drawn
Populations are normally distributed or both sample sizes are at least 30
Population variances are unknown but assumed equal
$\sigma_1$ and $\sigma_2$ Unknown, Assumed Equal (continued)
Forming interval estimates:
The population variances are assumed equal, so use the two sample variances and pool them to estimate the common $\sigma^2$
The test statistic is a t value with $(n_1 + n_2 - 2)$ degrees of freedom
$\sigma_1$ and $\sigma_2$ Unknown, Assumed Equal (continued)
The pooled variance is
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
$\sigma_1$ and $\sigma_2$ Unknown, Assumed Equal (continued)
The test statistic for $\mu_1 - \mu_2$ is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
Where t has $(n_1 + n_2 - 2)$ d.f., and
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
Confidence Interval, $\sigma_1$ and $\sigma_2$ Unknown
The confidence interval for $\mu_1 - \mu_2$ is:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2} \sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
Where
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
Pooled-Variance t Test: Example
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:
                  NYSE    NASDAQ
Number            21      25
Sample mean       3.27    2.53
Sample std dev    1.30    1.16
Assuming both populations are approximately normal with equal variances, is there a difference in average yield ($\alpha = 0.05$)?
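Calculating the Test Statistic
The pooled variance is:
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)\,1.30^2 + (25 - 1)\,1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$$
The test statistic is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021 \left( \frac{1}{21} + \frac{1}{25} \right)}} = 2.040$$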
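Solution
H0: $\mu_1 - \mu_2 = 0$, i.e., ($\mu_1 = \mu_2$)
H1: $\mu_1 - \mu_2 \ne 0$, i.e., ($\mu_1 \ne \mu_2$)
$\alpha = 0.05$
df = 21 + 25 - 2 = 44
Critical Values: t = ± 2.0154 (rejection regions of area .025 in each tail, below -2.0154 and above 2.0154)
Test Statistic:
$$t = \frac{3.27 - 2.53}{\sqrt{1.5021 \left( \frac{1}{21} + \frac{1}{25} \right)}} = 2.040$$
Decision: Reject H0 at $\alpha = 0.05$, since 2.040 > 2.0154
Conclusion: There is evidence of a difference in means.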
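The dividend-yield calculation can be checked with a short Python sketch. The summary statistics and the critical value 2.0154 are taken from the example; the variable names are chosen for this illustration only.

```python
import math

# Summary data from the NYSE vs. NASDAQ example
n1, mean1, sd1 = 21, 3.27, 1.30   # NYSE
n2, mean2, sd2 = 25, 2.53, 1.16   # NASDAQ

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / ((n1 - 1) + (n2 - 1))

# Pooled-variance t statistic under H0: mu1 - mu2 = 0
t_stat = ((mean1 - mean2) - 0) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

# Two-tail critical value for alpha = 0.05 and 44 d.f., as given in the example
t_crit = 2.0154
reject = abs(t_stat) > t_crit

print(round(pooled_var, 4), round(t_stat, 3), df, reject)
```

The critical value is hard-coded because the Python standard library has no t distribution; a statistics package would normally supply it.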
$\sigma_1$ and $\sigma_2$ Unknown, Not Assumed Equal
Assumptions:
Samples are randomly and independently drawn
Populations are normally distributed or both sample sizes are at least 30
Population variances are unknown but cannot be assumed to be equal
$\sigma_1$ and $\sigma_2$ Unknown, Not Assumed Equal (continued)
Forming the test statistic:
The population variances are not assumed equal, so include the two sample variances in the computation of the t-test statistic
The test statistic is a t value (statistical software is generally used to do the necessary computations)
$\sigma_1$ and $\sigma_2$ Unknown, Not Assumed Equal (continued)
The test statistic for $\mu_1 - \mu_2$ is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
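A minimal sketch of the separate-variance t statistic in Python follows. The slides leave the degrees of freedom to statistical software; the sketch uses the Welch-Satterthwaite approximation, which is the usual choice but is an assumption here and is not stated in the chapter. The function name is hypothetical, and the dividend-yield summary data from the pooled-variance example are reused purely for illustration.

```python
import math

def separate_variance_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Separate-variance t statistic and approximate degrees of freedom."""
    v1 = sd1**2 / n1  # estimated variance of the first sample mean
    v2 = sd2**2 / n2  # estimated variance of the second sample mean
    t_stat = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed; not given in the slides)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df

# Dividend-yield summary data reused from the pooled-variance example
t_stat, df = separate_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(round(t_stat, 3), round(df, 2))
```

The approximate degrees of freedom are generally not an integer and never exceed $n_1 + n_2 - 2$.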
Related Populations
Tests Means of 2 Related Populations
Related
samples
Paired or matched samples
Repeated measures (before/after)
Use difference between paired values:
$D_i = X_{1i} - X_{2i}$
Eliminates Variation Among Subjects
Assumptions:
Both Populations Are Normally Distributed
Or, if not Normal, use large samples
Mean Difference, $\sigma_D$ Known
The ith paired difference is $D_i$, where $D_i = X_{1i} - X_{2i}$
The point estimate for the population mean paired difference is $\bar{D}$:
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
Suppose the population standard deviation of the difference scores, $\sigma_D$, is known
n is the number of pairs in the paired sample
Mean Difference, $\sigma_D$ Known (continued)
The test statistic for the mean difference is a Z value:
$$Z = \frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}}$$
Where
$\mu_D$ = hypothesized mean difference
$\sigma_D$ = population standard dev. of differences
n = the sample size (number of pairs)
Confidence Interval, $\sigma_D$ Known
The confidence interval for $\mu_D$ is
$$\bar{D} \pm Z \frac{\sigma_D}{\sqrt{n}}$$
Where
n = the sample size
(number of pairs in the paired sample)
Mean Difference, $\sigma_D$ Unknown
If $\sigma_D$ is unknown, we can estimate the unknown population standard deviation with a sample standard deviation.
The sample standard deviation is
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Mean Difference, $\sigma_D$ Unknown (continued)
Use a paired t test; the test statistic for $\mu_D$ is now a t statistic, with n - 1 d.f.:
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$
Where t has n - 1 d.f. and $S_D$ is:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Confidence Interval, $\sigma_D$ Unknown
The confidence interval for $\mu_D$ is
$$\bar{D} \pm t_{n-1} \frac{S_D}{\sqrt{n}}$$
where
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Hypothesis Testing for Mean Difference, $\sigma_D$ Unknown
Paired Samples
Lower-tail test: H0: $\mu_D \ge 0$, H1: $\mu_D < 0$. Reject H0 if $t < -t_{\alpha}$
Upper-tail test: H0: $\mu_D \le 0$, H1: $\mu_D > 0$. Reject H0 if $t > t_{\alpha}$
Two-tail test: H0: $\mu_D = 0$, H1: $\mu_D \ne 0$. Reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
Where t has n - 1 d.f.
Paired t Test Example
Assume you send your salespeople to a
customer service training workshop. Has the
training made a difference in the number of
complaints? You collect the following data:
Number of Complaints:
Salesperson    Before (1)    After (2)    Difference, Di = (2) - (1)
C.B.           6             4            -2
T.F.           20            6            -14
M.H.           3             2            -1
R.K.           0             0            0
M.O.           4             0            -4
Total                                     -21
$$\bar{D} = \frac{\sum D_i}{n} = \frac{-21}{5} = -4.2$$
$$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$
Paired t Test: Solution
Has the training made a difference in the number of
complaints (at the 0.01 level)?
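H0: $\mu_D = 0$
H1: $\mu_D \ne 0$
$\alpha = .01$, $\bar{D} = -4.2$
d.f. = n - 1 = 4
Critical Value = ± 4.604 (rejection regions of area $\alpha/2$ in each tail, below -4.604 and above 4.604)
Test Statistic:
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$$
Decision: Do not reject H0
(the t statistic, -1.66, is not in the rejection region)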
Conclusion: There is not a
significant change in the
number of complaints.
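The paired t test for the complaints data can be reproduced with a short Python sketch. The data and the critical value 4.604 come from the example; the variable names are chosen for this illustration.

```python
import math
import statistics

# Complaints per salesperson, in the order C.B., T.F., M.H., R.K., M.O.
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

# Differences defined as After (2) minus Before (1), as in the example table
differences = [a - b for b, a in zip(before, after)]

n = len(differences)
d_bar = statistics.mean(differences)   # sample mean difference
s_d = statistics.stdev(differences)    # sample std dev (n - 1 denominator)

# Paired t statistic under H0: mu_D = 0
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

# Two-tail critical value for alpha = .01 and 4 d.f., as given in the example
t_crit = 4.604
reject = abs(t_stat) > t_crit

print(differences, d_bar, round(s_d, 2), round(t_stat, 2), reject)
```

`statistics.stdev` divides by n - 1, which matches the definition of $S_D$ used in this chapter.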
Two Population Proportions
Goal: test a hypothesis or form a confidence interval for the difference between two population proportions, $\pi_1 - \pi_2$
Assumptions:
$n_1 \pi_1 \ge 5$, $n_1 (1 - \pi_1) \ge 5$
$n_2 \pi_2 \ge 5$, $n_2 (1 - \pi_2) \ge 5$
The point estimate for the difference is $p_1 - p_2$
Two Population Proportions (continued)
Since we begin by assuming the null hypothesis is true, we assume $\pi_1 = \pi_2$ and pool the two sample estimates
The pooled estimate for the overall proportion is:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
where $X_1$ and $X_2$ are the numbers from samples 1 and 2 with the characteristic of interest
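Two Population Proportions (continued)
The test statistic for $p_1 - p_2$ is a Z statistic:
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p} (1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
where
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad p_1 = \frac{X_1}{n_1}, \quad p_2 = \frac{X_2}{n_2}$$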
Confidence Interval for Two Population Proportions
The confidence interval for $\pi_1 - \pi_2$ is:
$$(p_1 - p_2) \pm Z \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$
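As a rough sketch, the interval for the difference between two proportions can be computed in Python as below. Unlike the test statistic, the interval uses the two separate sample proportions and not the pooled estimate. The function name `proportion_diff_ci` and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for pi1 - pi2 using unpooled sample proportions."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Critical value that leaves (1 - confidence) / 2 in each tail
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin

# Hypothetical counts (illustration only): 45 of 100 vs. 30 of 100
lower, upper = proportion_diff_ci(45, 100, 30, 100)
print(round(lower, 4), round(upper, 4))
```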
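Hypothesis Tests for Two Population Proportions
Population proportions
Lower-tail test: H0: $\pi_1 \ge \pi_2$, H1: $\pi_1 < \pi_2$, i.e., H0: $\pi_1 - \pi_2 \ge 0$, H1: $\pi_1 - \pi_2 < 0$
Upper-tail test: H0: $\pi_1 \le \pi_2$, H1: $\pi_1 > \pi_2$, i.e., H0: $\pi_1 - \pi_2 \le 0$, H1: $\pi_1 - \pi_2 > 0$
Two-tail test: H0: $\pi_1 = \pi_2$, H1: $\pi_1 \ne \pi_2$, i.e., H0: $\pi_1 - \pi_2 = 0$, H1: $\pi_1 - \pi_2 \ne 0$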
Hypothesis Tests for Two Population Proportions (continued)
Population proportions
Lower-tail test: H0: $\pi_1 - \pi_2 \ge 0$, H1: $\pi_1 - \pi_2 < 0$. Reject H0 if $Z < -Z_{\alpha}$
Upper-tail test: H0: $\pi_1 - \pi_2 \le 0$, H1: $\pi_1 - \pi_2 > 0$. Reject H0 if $Z > Z_{\alpha}$
Two-tail test: H0: $\pi_1 - \pi_2 = 0$, H1: $\pi_1 - \pi_2 \ne 0$. Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
Example:
Two population Proportions
Is there a significant difference between
the proportion of men and the proportion
of women who will vote Yes on
Proposition A?
In a random sample, 36 of 72 men and
31 of 50 women indicated they would
vote Yes
Test at the .05 level of significance
Example: Two Population Proportions (continued)
The hypothesis test is:
H0: $\pi_1 - \pi_2 = 0$ (the two proportions are equal)
H1: $\pi_1 - \pi_2 \ne 0$ (there is a significant difference between proportions)
The sample proportions are:
Men: $p_1$ = 36/72 = .50
Women: $p_2$ = 31/50 = .62
The pooled estimate for the overall proportion is:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$
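Example: Two Population Proportions (continued)
The test statistic for $\pi_1 - \pi_2$ is:
$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p} (1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549 (1 - .549) \left( \frac{1}{72} + \frac{1}{50} \right)}} = -1.31$$
Critical Values = ± 1.96 for $\alpha$ = .05 (rejection regions of area .025 in each tail, below -1.96 and above 1.96)
Decision: Do not reject H0, since -1.96 < -1.31 < 1.96
Conclusion: There is not significant evidence of a difference between men and women in the proportion who will vote Yes.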
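The Proposition A calculation can be verified with a brief Python sketch. The counts and significance level are taken from the example; the variable names are chosen for this illustration, and the critical value is computed from the standard normal distribution.

```python
import math
from statistics import NormalDist

# Sample counts from the Proposition A example
x1, n1 = 36, 72   # men voting Yes, men sampled
x2, n2 = 31, 50   # women voting Yes, women sampled

p1 = x1 / n1
p2 = x2 / n2

# Pooled estimate of the overall proportion under H0: pi1 = pi2
p_pooled = (x1 + x2) / (n1 + n2)

# Z statistic for the difference between the two proportions
z = ((p1 - p2) - 0) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Two-tail critical value for alpha = .05
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)
reject = abs(z) > z_crit

print(round(p_pooled, 3), round(z, 2), round(z_crit, 2), reject)
```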