
Multiple Comparisons Testing

Introduction
In the previous section we discussed the steps to perform an ANOVA and the procedures to test the assumptions of an ANOVA in R. Suppose now that we reject H0 and conclude that at least one of the population group means differs from the others. Today we consider the second step, which is finding the specific groups that differ from the rest in terms of their population means.

Multiple comparisons testing


Before the multiple comparisons procedure is discussed, we consider a problem that arises when multiple
independent tests are performed. It will be shown that as the number of independent hypothesis tests
increases, the probability of a Type I error increases. We will consider one method to limit the risk of
rejecting a true null hypothesis. Thereafter, the procedure to perform multiple comparisons testing is
discussed.

Inflation of the type I error


Generally, the t-test is used to test the hypothesis that two populations’ means are equal. If an ANOVA
indicates that at least one group’s population mean differs from the others, we would like to be able to
determine which groups differ significantly from each other. It may seem reasonable to perform all possible
t-tests to achieve this. However, in the section on Multiple Regression we outlined a problem called alpha
spending. This is an inflation of the Type I error rate, thereby making it more likely to falsely reject the
null hypothesis. If we would like the overall error rate to remain fixed at α, then we have to make some kind
of adjustment to the significance level we use for each pairwise comparison.
Consider testing, for i, j = 1, 2, ..., k with i ≠ j, the following hypothesis:

H0 : µi = µj
H1 : µi ≠ µj.

If there are k groups, there are m = k(k − 1)/2 (i.e., k choose 2) pairwise tests. Suppose each of these tests is performed at a level of significance α, and denote the null hypothesis of test i by H0^(i), i = 1, 2, ..., m. Furthermore, assume that none of the groups differ, i.e., H0^(i) is true for all i = 1, 2, ..., m. We might expect the probability of falsely concluding that at least one group differs from the rest to be α. However, assuming the tests are independent,
$$
\begin{aligned}
P\left(\text{Reject at least one } H_0^{(i)} \,\middle|\, \text{all } H_0^{(i)} \text{ true}\right)
&= 1 - P\left(\text{Do not reject any } H_0^{(i)} \,\middle|\, \text{all } H_0^{(i)} \text{ true}\right) \\
&= 1 - P\left(\bigcap_{i=1}^{m} \text{Do not reject } H_0^{(i)} \,\middle|\, H_0^{(i)} \text{ is true}\right) \\
&= 1 - \prod_{i=1}^{m} P\left(\text{Do not reject } H_0^{(i)} \,\middle|\, H_0^{(i)} \text{ is true}\right) \\
&= 1 - (1 - \alpha)^m,
\end{aligned}
$$

where the product in the third line follows from the assumed independence of the tests. Thus, performing m independent hypothesis tests leads to an overall level of significance of 1 − (1 − α)^m. The figure below shows this function for a varying number of tests, m.

library(ggplot2)
library(ggpubr)

M = 50
overall_significance = function(m, alpha){
  adjusted_alpha = 1 - (1 - alpha)^m
  return(adjusted_alpha)
}
alpha = 0.05
m = 1:M
alpha_adj = sapply(m, overall_significance, alpha = alpha)
df_plot = data.frame('m' = m, 'AdjustedAlpha' = alpha_adj)

continuous_m = seq(1, M, by = 0.1)
adj_alpha_continuous = sapply(continuous_m, overall_significance, alpha = alpha)
df_plot_c = data.frame('m' = continuous_m, 'AdjustedAlpha' = adj_alpha_continuous)

ggplot(data = df_plot, aes(x = m, y = AdjustedAlpha)) +
  geom_point(size = 1, alpha = 0.6) +
  geom_line(data = df_plot_c, aes(x = m, y = AdjustedAlpha)) +
  xlab('Number of tests') +
  ylab('Adjusted level of significance') +
  theme_pubr() +
  scale_x_continuous(breaks = c(1, seq(5, M, by = 5))) +
  scale_y_continuous(breaks = seq(0, 1, by = 0.1))

[Figure: the overall level of significance, 1 − (1 − α)^m, plotted against the number of tests m = 1, ..., 50 for α = 0.05; the curve rises steeply from 0.05 towards 1.]

It is clear that some form of adjustment must be made to maintain the required level of significance. One such method is the Bonferroni method, which is discussed next.

The Bonferroni adjustment

If we perform each of m tests at a level of significance α*, then the overall probability of a type I error is α = 1 − (1 − α*)^m. It can be shown that (1 − α*)^m ≈ 1 − mα* for small α*, and therefore α ≈ mα*. Hence, if we want an overall level of significance of α_specified, we should perform each individual test at

$$
\alpha_{\text{adjusted}} = \frac{\alpha_{\text{specified}}}{m}.
$$

Each test is then performed using the decision rule: reject H0 if p-value < α_adjusted. Equivalently, we can keep α_specified and adjust the p-values instead, i.e., reject H0 if m × p-value < α_specified. This adjustment is known as the Bonferroni adjustment.
Recall that, keeping all other parameters equal, decreasing the level of significance decreases the power of a test. These adjustments therefore keep the overall significance level fixed, but at the cost of reduced power for each individual test. Consequently, since the power is decreased, the probability of a type II error increases.

Recap: Two sample tests

Notice that if we have two groups, the hypothesis test of an ANOVA is

H0 : µ1 = µ2
H1 : µ1 ≠ µ2.

Furthermore, we assume that the two samples are independent and that the population variances are equal. For such a test, the test statistic is

$$
t_{\text{calc}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t(n_1 + n_2 - 2).
$$

Notice that

$$
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{\sum_{j=1}^{2}(n_j - 1)s_j^2}{n - k} = MSE.
$$
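As a quick sanity check of the pooled-variance formula, one can compare a manual computation against R's t.test() with var.equal = TRUE. The data below are simulated and purely illustrative.

```r
# Simulated data for two groups (illustrative only)
set.seed(1)
x1 = rnorm(20, mean = 600, sd = 90)
x2 = rnorm(20, mean = 650, sd = 90)

# Manual pooled variance and test statistic
n1 = length(x1); n2 = length(x2)
sp2 = ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t_manual = (mean(x1) - mean(x2)) / sqrt(sp2 * (1/n1 + 1/n2))

# Built-in equal-variance t-test gives the same statistic and degrees of freedom
fit = t.test(x1, x2, var.equal = TRUE)
all.equal(unname(fit$statistic), t_manual)   # TRUE
unname(fit$parameter) == n1 + n2 - 2         # TRUE
```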

Pairwise two-sample tests

Consider k groups for which we perform the pairwise tests

H0 : µi = µj
H1 : µi ≠ µj

for i, j = 1, 2, ..., k with i ≠ j. Assume each test must be performed at a level of significance α. Note that there are m = k(k − 1)/2 such tests.

Step 1: Adjust the level of significance with α_new = α/m.

Step 2: Calculate the MSE with

$$
MSE = \frac{\sum_{j=1}^{k}(n_j - 1)s_j^2}{n - k}.
$$

Step 3: Compute the test statistic as

$$
t_{\text{calc}} = \frac{(\bar{x}_i - \bar{x}_j) - (\mu_i - \mu_j)}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}.
$$

Step 4: Compute a critical value as t_{α_new/2}(n − k), or a p-value as 2 × P(T_{n−k} > |t_calc|).
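The four steps above can be sketched as a small helper that works directly from group summary statistics. The function name and the summary values passed to it are made up for illustration.

```r
# Bonferroni-adjusted pairwise test from group summaries (illustrative sketch)
pairwise_from_summaries = function(n_vec, mean_vec, var_vec, i, j, alpha = 0.05) {
  k = length(n_vec)
  n = sum(n_vec)
  m = choose(k, 2)                              # number of pairwise tests
  alpha_new = alpha / m                         # Step 1: adjusted level
  mse = sum((n_vec - 1) * var_vec) / (n - k)    # Step 2: pooled MSE
  t_calc = (mean_vec[i] - mean_vec[j]) /        # Step 3: test statistic
    sqrt(mse * (1/n_vec[i] + 1/n_vec[j]))
  p_value = 2 * pt(abs(t_calc), df = n - k, lower.tail = FALSE)  # Step 4
  list(t_calc = t_calc, p_value = p_value, reject = p_value < alpha_new)
}

# Example call with made-up summaries for k = 3 groups of size 20
res = pairwise_from_summaries(n_vec = c(20, 20, 20),
                              mean_vec = c(577.55, 608.65, 653),
                              var_vec = c(10000, 8000, 8700),
                              i = 1, j = 3)
```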

Confidence interval for the difference between two means

Note that since

$$
t = \frac{(\bar{x}_i - \bar{x}_j) - (\mu_i - \mu_j)}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}} \sim t(n - k),
$$

it follows that

$$
P\left(-t_{\alpha/2,\,n-k} < \frac{(\bar{x}_i - \bar{x}_j) - (\mu_i - \mu_j)}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}} < t_{\alpha/2,\,n-k}\right) = 1 - \alpha.
$$

Hence, a 100(1 − α)% confidence interval for (µi − µj), with the Bonferroni adjustment applied across the m pairwise comparisons, is

$$
\text{conf}_{1-\alpha}(\mu_i - \mu_j) = (\bar{x}_i - \bar{x}_j) \pm t_{\frac{\alpha}{2m},\,n-k}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}.
$$

If this confidence interval includes zero, we conclude at the α level of significance that there is insufficient evidence that µi and µj differ. This is the relationship between two-sided hypothesis tests and confidence intervals.
Consider the data of the advertisements and the number of juices sold. Suppose we want to compare the
average number of juices sold for the convenience and quality groups. Let C denote the convenience group
and let Q denote the quality group. We have

nC = nQ = 20 ; x̄C = 577.55 ; x̄Q = 653 ; MSE = 8894.447

Suppose we test the following hypothesis at a 5% level of significance:

H0 : µC − µQ = 0
H1 : µC − µQ ≠ 0.

The test statistic is given by

$$
\begin{aligned}
t_{\text{calc}} &= \frac{(\bar{x}_C - \bar{x}_Q) - (\mu_C - \mu_Q)}{\sqrt{MSE\left(\frac{1}{n_C} + \frac{1}{n_Q}\right)}} \\
&= \frac{(577.55 - 653) - 0}{\sqrt{8894.447\left(\frac{1}{20} + \frac{1}{20}\right)}} \\
&= -2.5299.
\end{aligned}
$$

Since there are in fact three tests, we must use the Bonferroni-adjusted level of significance, αadj = 0.05/3 = 0.0167. Therefore, the critical value is ±tcrit = ±2.1808 such that P(T57 > tcrit) = 0.0167. Using the critical value approach, we reject H0 at the 5% significance level since |tcalc| = 2.5299 > 2.1808 = |tcrit|.

The p-value of the test is 2 × P(T57 > |tcalc|) = 2 × P(T57 > 2.5299) = 0.0142. The p-value is less than αadj = 0.0167, and therefore we reject the null hypothesis at the 5% level of significance. Note that we can also compute the Bonferroni-adjusted p-value, 3 × 0.0142 = 0.043, which can be compared to the original level of significance, α = 0.05.
Finally, the 95% confidence interval for the difference between the population means is given by

$$
\begin{aligned}
\text{conf}_{0.95}(\mu_C - \mu_Q) &= (\bar{x}_C - \bar{x}_Q) \pm t_{\text{crit}}\sqrt{MSE\left(\frac{1}{n_C} + \frac{1}{n_Q}\right)} \\
&= (577.55 - 653) \pm 2.1808\sqrt{8894.447\left(\frac{1}{20} + \frac{1}{20}\right)} \\
&= [-140.4917;\ -10.4083].
\end{aligned}
$$

Since the confidence interval does not include zero, we conclude again that there is a significant difference between the population average number of juices sold for the convenience and quality advertisements.
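The arithmetic in this example is easy to reproduce in R from the summary values quoted above; the critical value 2.1808 is taken from the text.

```r
# Summary values from the example
n_C = 20; n_Q = 20
xbar_C = 577.55; xbar_Q = 653
mse = 8894.447
se = sqrt(mse * (1/n_C + 1/n_Q))

# Test statistic (about -2.5299)
t_calc = (xbar_C - xbar_Q - 0) / se

# Two-sided p-value on n - k = 57 degrees of freedom
p_value = 2 * pt(abs(t_calc), df = 57, lower.tail = FALSE)

# Confidence interval using the critical value quoted in the text
# (about (-140.49, -10.41))
t_crit = 2.1808
ci = (xbar_C - xbar_Q) + c(-1, 1) * t_crit * se
```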

Performing multiple comparisons in R


We consider the same dataset as in the previous examples: ExampleDataNarrow.txt.
Import the data and print the structure.

# Read in the data (the arguments may need adjusting for your copy of the file)
dat = read.table('ExampleDataNarrow.txt', header = TRUE, stringsAsFactors = TRUE)
str(dat)

## ’data.frame’: 60 obs. of 2 variables:


## $ Population: Factor w/ 3 levels "Convenience",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Sales : int 529 658 793 514 663 719 711 606 461 529 ...

Pairwise hypothesis tests

# No adjustment to p-values
pairwise.t.test(x = dat$Sales, g = dat$Population,
p.adjust.method = 'none')

##
## Pairwise comparisons using t tests with pooled SD
##
## data: dat$Sales and dat$Population
##
## Convenience Price
## Price 0.301 -
## Quality 0.014 0.143
##
## P value adjustment method: none

# Bonferroni adjustment
pairwise.t.test(x = dat$Sales, g = dat$Population,
p.adjust.method = 'bonferroni')

##
## Pairwise comparisons using t tests with pooled SD
##
## data: dat$Sales and dat$Population
##
## Convenience Price
## Price 0.904 -
## Quality 0.043 0.428
##
## P value adjustment method: bonferroni

Note that one way to visualise the different groups would be a side-by-side box plot. Below is an example
of another plot where we plot the confidence intervals of each group.

# Get means of each group
convMean = mean(dat$Sales[dat$Population == 'Convenience'])
qualMean = mean(dat$Sales[dat$Population == 'Quality'])
priceMean = mean(dat$Sales[dat$Population == 'Price'])
means = c(convMean, qualMean, priceMean)

# Get MSE
convVar = var(dat$Sales[dat$Population == 'Convenience'])
qualVar = var(dat$Sales[dat$Population == 'Quality'])
priceVar = var(dat$Sales[dat$Population == 'Price'])
vars = c(convVar, qualVar, priceVar)
n_vec = rep(20, 3)
mse = sum((n_vec - 1)*vars)/(sum(n_vec) - length(n_vec))

# Get CI for each group (no adjustment)
alpha = 0.05
t_crit = qt(1 - alpha/2, df = 57)
error = t_crit*sqrt(mse*(1/20 + 1/20))

# Make plot
df_plot = data.frame('Population' = as.factor(c('Convenience', 'Quality', 'Price')))
df_plot$Mean = means
df_plot$error = rep(error, length.out = nrow(df_plot))

ggplot(data = df_plot, aes(x = Population, y = Mean)) +
  geom_errorbar(aes(ymin = Mean - error,
                    ymax = Mean + error),
                width = 0.1) +
  geom_point(shape = 1, size = 3, fill = 'white') +
  xlab('Population') +
  ylab('Sales') +
  theme_pubr()

[Figure: mean Sales for each Population group (Convenience, Price, Quality), with error bars showing the confidence interval around each mean.]