Biometry242 MultipleComparisons
Background
• The Type I error rate increases with multiple testing: it becomes the probability of incorrectly rejecting at least one of the 𝐻0's (assuming that all 𝐻0's are true). For 𝐶 = 3 tests at 𝛼 = 0.05:
$\text{Type I error} = 1 - (1 - \alpha)^C = 1 - 0.95^3 = 0.14$
• This is called the familywise error rate, because the error rate belongs to a family of hypothesis
tests (or a family of comparisons).
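The familywise error rate formula can be checked numerically. The sketch below is illustrative only: the function name `familywise_error_rate` is hypothetical, and the formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false rejection among n_tests
    independent tests, each carried out at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The slide's case: C = 3 tests at alpha = 0.05.
fwer_three = familywise_error_rate(0.05, 3)
print(round(fwer_three, 2))  # 0.14

# The rate keeps growing as more comparisons are added.
for c in (1, 3, 6, 10):
    print(c, round(familywise_error_rate(0.05, c), 3))
```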
Background
Number of pairwise comparisons among 𝑘 means:
$\frac{k(k-1)}{2} = {}_kC_2$
Suppose that 𝑘 = 4:
$\frac{4(3)}{2} = 6$
Each test will compare two unique means at a time, resulting in 6 sets of hypotheses (𝐻0 : 𝜇𝐴 = 𝜇𝐵 and
𝐻1 : 𝜇𝐴 ≠ 𝜇𝐵 ):
𝐻0 : 𝜇1 = 𝜇2 ; 𝐻0 : 𝜇1 = 𝜇3 ; 𝐻0 : 𝜇1 = 𝜇4 ;
𝐻0 : 𝜇2 = 𝜇3 ; 𝐻0 : 𝜇2 = 𝜇4 ; 𝐻0 : 𝜇3 = 𝜇4
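As a quick check of the count, the pairs can be enumerated directly. This is a minimal sketch; the helper name `pairwise_hypotheses` is an assumption made for illustration.

```python
from itertools import combinations
from math import comb


def pairwise_hypotheses(k):
    """Return every unordered pair of group indices 1..k."""
    return list(combinations(range(1, k + 1), 2))


pairs = pairwise_hypotheses(4)
for a, b in pairs:
    print(f"H0: mu{a} = mu{b}")

# k(k - 1)/2 and the binomial coefficient agree with the enumeration.
print(len(pairs), 4 * 3 // 2, comb(4, 2))
```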
A and B are arbitrary labels for any two of the groups.
Tukey test
• Assumptions
• The 𝑘 samples are drawn from normally distributed populations
• However, the Tukey test (like ANOVA) is robust against deviations from this assumption.
• Severe deviations from normality can be addressed by data transformations or by using non-parametric alternative tests.
• Homoscedasticity (equal population variances)
• The Tukey test is more sensitive to deviations from homoscedasticity than ANOVA is.
Tukey test
1. Calculate the mean of each of the 𝑘 samples
2. Arrange and number the means in increasing order (small to large)
3. Tabulate the pairwise mean differences
4. Calculate the pairwise difference in the following manner (Zar: Page 243):
• largest vs the smallest, largest vs the second smallest,…,largest vs the second largest
• Second largest vs the smallest, second largest vs the second smallest,…,second largest vs the third
largest, etc.
5. Calculate an appropriate standard error for the Tukey test:
$SE = \sqrt{\frac{s^2}{n}}$, where $s^2$ is the pooled variance (MSE) and $n$ is the number of replications per group
• Using symbols
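The steps above can be sketched in code for the equal-sample-size case. The function name `tukey_q_table` is hypothetical. The group means, MSE = 1.116, n = 6 and critical value q = 3.674 are taken from the variety example later in these notes. The test statistic q = difference / SE and the rule "reject when q is at least the critical value" are the usual Tukey conventions, assumed here rather than stated on the slide.

```python
from math import sqrt


def tukey_q_table(means, mse, n, q_crit):
    """means maps group name -> sample mean (equal n per group).
    Returns rows of (larger, smaller, difference, q, reject_H0)."""
    se = sqrt(mse / n)                                   # step 5
    ranked = sorted(means.items(), key=lambda kv: kv[1])  # step 2
    rows = []
    # Step 4: largest vs smallest, largest vs second smallest, ...,
    # then second largest vs smallest, and so on.
    for hi in range(len(ranked) - 1, 0, -1):
        for lo in range(hi):
            name_hi, mean_hi = ranked[hi]
            name_lo, mean_lo = ranked[lo]
            diff = mean_hi - mean_lo
            q = diff / se
            rows.append((name_hi, name_lo, round(diff, 3),
                         round(q, 3), q >= q_crit))
    return rows


means = {"A": 25.667, "G": 26.983, "L": 29.550}
rows = tukey_q_table(means, mse=1.116, n=6, q_crit=3.674)
for row in rows:
    print(row)
```

With these inputs only the G vs A comparison falls below the critical value, which agrees with the conclusion drawn for that example.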
Conclusion: From the MCP we see that Rock River has a different mean concentration of strontium
compared to all the other lakes. Angler, Appletree and Beaver lakes have the same mean concentration of
strontium. Lake Grayson has a different mean concentration of strontium compared to all the other lakes.
Example 3, Zar 215-216
(Chapter 10)
24.748 ; 26.586
Variety G vs A:
$(26.983 - 25.667) \pm 3.674\sqrt{\frac{1.116}{6}}$
$-0.269 \; ; \; 2.901$
Conclusion: From the MCP we see that Variety L has a different mean yield compared
to varieties A and G (the confidence intervals do not contain zero). Varieties A and G
do not differ in mean yield (the confidence interval contains zero).
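The interval for Variety G vs A can be reproduced with a few lines of code. This is a sketch under the equal-sample-size assumption. The function name `tukey_ci` is hypothetical, and the critical value 3.674 is copied from the calculation above rather than computed.

```python
from math import sqrt


def tukey_ci(mean_b, mean_a, mse, n, q_crit):
    """Tukey confidence interval for mean_b - mean_a with equal n."""
    diff = mean_b - mean_a
    margin = q_crit * sqrt(mse / n)
    return diff - margin, diff + margin


lower, upper = tukey_ci(26.983, 25.667, mse=1.116, n=6, q_crit=3.674)
print(round(lower, 3), round(upper, 3))

# The means are declared different only if zero lies outside the interval.
contains_zero = lower <= 0.0 <= upper
print(contains_zero)
```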
Tukey-Kramer and Games-Howell
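Tukey-Kramer (unequal sample sizes):
$SE = \sqrt{\frac{s^2}{2}\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}$
Games-Howell (unequal variances):
$SE = \sqrt{\frac{1}{2}\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)}$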
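The two standard errors differ only in which variance enters the formula. The sketch below uses hypothetical function names. The check values (MSE = 9.383 with 5 observations per group) come from the feed example in these notes.

```python
from math import sqrt


def se_tukey_kramer(mse, n_a, n_b):
    """Pooled variance (MSE), possibly unequal sample sizes."""
    return sqrt(mse / 2 * (1 / n_a + 1 / n_b))


def se_games_howell(var_a, n_a, var_b, n_b):
    """Separate sample variances for the two groups being compared."""
    return sqrt(0.5 * (var_a / n_a + var_b / n_b))


se_feed = se_tukey_kramer(9.383, 5, 5)
print(round(se_feed, 3))  # 1.37

# With equal n, Tukey-Kramer reduces to the plain Tukey SE, sqrt(MSE / n).
print(abs(se_tukey_kramer(1.116, 6, 6) - sqrt(1.116 / 6)) < 1e-12)
```

When both groups share the same variance and sample size, the two functions return the same value.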
Comparison   Difference   SE      𝒒                      Decision
3 vs 2       6.68         1.370   6.68/1.37 = 4.876      Reject 𝐻0
2 vs 1       1.38         1.370   1.38/1.37 = 1.007      Do not reject 𝐻0

Conclusion: From the MCP we see that Feed A has the same mean weight as D, but a different mean weight from B and C. Feed B and C have the same mean weight.
Confidence intervals for group means
61.701 ; 67.539
[Figure: confidence intervals for the group means of Feeds A, B, C and D]
Report the differences
using a Tukey confidence interval
Level of factor   Ranked   Group mean
Feed A            2        64.62
Feed B            3        71.3
Feed C            4        73.35
Feed D            1        63.24
Feed D vs Feed A:
$-1.38 \pm 4.076\sqrt{\frac{9.383}{2}\left(\frac{1}{5} + \frac{1}{5}\right)}$
$-6.964 \; ; \; 4.204$
Conclusion: From the MCP we see that Feed A has the same mean weight as D (confidence interval contains zero), but a different mean weight from B and C (confidence intervals do not contain zero). Feed B and C have the same mean weight (confidence interval contains zero). D and B, and D and C, have different mean weights (confidence intervals do not contain zero).
Treatments vs a control
• Comparing a pre-determined treatment mean with all the other treatment means
• Comparing a standard treatment with new treatments
• Comparing a placebo (medicine without an active ingredient) with a treatment (medicine with
active ingredients)
Dunnett test for comparison with a control
A Tukey test could be performed, but it would not be as powerful as the Dunnett test in this particular design. The Dunnett test can be one- or two-sided, but the Tukey test is always two-sided.
𝐻0 : 𝜇2 = 𝜇𝐴 vs. 𝐻1 : 𝜇2 ≠ 𝜇𝐴
$p = \frac{75 - 60}{120 - 60} = 0.25$
$q'_{0.05(2),5,75} = 2.47 + (1 - 0.25)(2.51 - 2.47) = 2.5$
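The interpolation above can be written as a small helper. This sketch assumes the tabled critical values are 2.51 at 60 degrees of freedom and 2.47 at 120. That pairing is inferred from the calculation and is not stated explicitly. The function name is hypothetical.

```python
def interpolate_critical_value(df, df_low, value_low, df_high, value_high):
    """Linear interpolation between two tabled critical values.
    p is the fraction of the way from df_low to df_high."""
    p = (df - df_low) / (df_high - df_low)
    return value_high + (1 - p) * (value_low - value_high)


q_crit_75 = interpolate_critical_value(75, 60, 2.51, 120, 2.47)
print(round(q_crit_75, 2))  # 2.5
```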
Level of factor   Ranked   Group mean
1                 1        17.3
2 (control)       2        21.7
3                 3        22.1
4                 4        23.6
5                 5        27.8

Comparison   Difference   SE      𝒒                         Decision
2 vs. 5      −6.1         1.086   −6.1/1.086 = −5.617       Reject 𝐻0
2 vs. 4      −1.9         1.086   −1.9/1.086 = −1.750       Do not reject 𝐻0
Level of factor   Ranked   Group mean
Variety A         1        25.667
Variety G         2        26.983
Variety L         3        29.550

$q'_{0.05(1),3,15} = 2.07 \rightarrow -2.07$

$SE = \sqrt{\frac{2(1.116)}{6}} = 0.610$

Comparison   Difference   SE      𝒒                           Decision
1 vs. 3      −3.883       0.610   −3.883/0.610 = −6.366       Reject 𝐻0
1 vs. 2      −1.316       0.610   −2.157                      Reject 𝐻0
Conclusion: From the MCP we see that both varieties G and L have significantly
higher mean potassium content than the control, Variety A.
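The Dunnett calculations for the variety example can be sketched as follows. The function name `dunnett_q` is hypothetical. The critical value −2.07 is copied from the notes, not computed. The sign convention (control minus treatment, reject in the lower tail) is an assumption chosen to match the negative differences shown. Results differ slightly from the hand calculation because the standard error is not rounded to 0.610 first.

```python
from math import sqrt


def dunnett_q(control_mean, treatment_mean, mse, n):
    """Dunnett statistic for control vs one treatment, equal n per group."""
    se = sqrt(2 * mse / n)
    return (control_mean - treatment_mean) / se


q_crit = -2.07                     # one-sided critical value from the notes
control_mean = 25.667              # Variety A (control)
treatments = {"G": 26.983, "L": 29.550}

q_values = {name: dunnett_q(control_mean, mean, mse=1.116, n=6)
            for name, mean in treatments.items()}
reject = {name: q <= q_crit for name, q in q_values.items()}

for name in treatments:
    print(name, round(q_values[name], 2), reject[name])
```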
Fisher’s LSD
To test for the difference in means:
𝐻0 : 𝜇𝐴 − 𝜇𝐵 = 0 vs. 𝐻1 : 𝜇𝐴 − 𝜇𝐵 ≠ 0
We would start by calculating the following:
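$\bar{X}_A - \bar{X}_B$, $\quad s_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{s^2}{n_A} + \frac{s^2}{n_B}}$, $\quad t = \frac{\bar{X}_A - \bar{X}_B - 0}{s_{\bar{X}_A - \bar{X}_B}}$
then $H_0$ will be rejected if: $|t| \ge t_{\nu;\alpha(2)}$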
$LSD_{5\%} = t_{15;0.05(2)} \times \sqrt{\frac{1.116}{6} + \frac{1.116}{6}}$
$= 2.131 \times 0.610$
$= 1.3$
Any treatment difference greater than or equal to 1.3 is regarded as a significant difference.
Level of factor   Group mean   Variety G   Variety A
Variety L         29.550       2.567       3.883
Variety G         26.983                   1.316
Variety A         25.667
Conclusion: From the MCP we see that all varieties have different mean potassium
contents, because all the differences are greater than 1.3.
The Tukey test did not report a significant difference between Variety G and A. This
shows that the Tukey test is more conservative than Fisher’s LSD.
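The LSD comparison can be reproduced in a few lines. This is a sketch with a hypothetical function name. The critical value t = 2.131 is taken from the calculation above rather than computed.

```python
from itertools import combinations
from math import sqrt


def fisher_lsd(t_crit, mse, n_a, n_b):
    """Least significant difference for two group means."""
    return t_crit * sqrt(mse / n_a + mse / n_b)


means = {"L": 29.550, "G": 26.983, "A": 25.667}
lsd = fisher_lsd(t_crit=2.131, mse=1.116, n_a=6, n_b=6)
print(round(lsd, 1))  # 1.3

# A pair differs significantly when |difference| >= LSD.
significant = {(a, b): abs(means[a] - means[b]) >= lsd
               for a, b in combinations(means, 2)}
for pair, flag in significant.items():
    print(pair, flag)
```

All three pairs exceed the LSD here, including G vs A, which the Tukey test did not flag.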
The Bonferroni correction
Compare the P values with $\frac{0.05}{3} = 0.017$.
Conclusion: From the MCP we see that Varieties G and L are different (0.001 <
0.017) and Varieties A and L are different (0.0002 < 0.017). Varieties G and A are
not different (0.036 > 0.017).
The Tukey test also did not report a significant difference between Variety G and A.
This shows that the Bonferroni correction produced similar results.
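The Bonferroni decision rule is simple enough to sketch directly. The function name is hypothetical. The P values are the ones quoted in the conclusion above.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Compare each P value with alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    decisions = {label: p < threshold for label, p in p_values.items()}
    return threshold, decisions


p_values = {"G vs L": 0.001, "A vs L": 0.0002, "G vs A": 0.036}
threshold, decisions = bonferroni_decisions(p_values)

print(round(threshold, 3))  # 0.017
for label, is_different in decisions.items():
    print(label, "different" if is_different else "not different")
```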