Statistics Notes
5 Introduction to ANOVA
5.1 A model for treatment variation
5.1.1 Model fitting
5.1.2 Testing hypotheses with MSE and MST
5.2 Partitioning sums of squares
5.2.1 The ANOVA table
5.2.2 Understanding degrees of freedom
5.2.3 More sums of squares geometry
5.3 Unbalanced designs
5.3.1 Sums of squares and degrees of freedom
5.3.2 ANOVA table for unbalanced data
5.4 Normal sampling theory for ANOVA
5.4.1 Sampling distribution of the F-statistic
5.4.2 Comparing group means
5.4.3 Power calculations for the F-test
5.5 Model diagnostics
5.5.1 Detecting violations with residuals
5.5.2 Checking normality assumptions
5.5.3 Checking variance assumptions
5.5.4 Variance stabilizing transformations
5.6 Treatment comparisons
5.6.1 Contrasts
5.6.2 Orthogonal contrasts
5.6.3 Multiple comparisons
5.6.4 False discovery rate procedures
5.6.5 Nonparametric tests
Chapter 1

Principles of experimental design
1.1 Induction
Much of our scientific knowledge about processes and systems is based on
induction: reasoning from the specific to the general.
Example (survey): Do you favor increasing the gas tax for public transportation?
[Diagram: input variables x1 and x2 enter a Process, which produces an output y]
• Input variables:
• Output variables:
Observational Study:
1. Experimental population:
16,608 women randomized to either x = 1 (estrogen treatment) or x = 0 (no estrogen treatment)
                     age group
            1 (50-59)   2 (60-69)   3 (70-79)
clinic 1       n11         n12         n13
       2       n21         n22         n23
       ...     ...         ...         ...
2. Results: JAMA, July 17 2002. Also see the NHLBI press release. Women
on treatment had a higher incidence rate of
• CHD
• breast cancer
• stroke
• pulmonary embolism
• colorectal cancer
• hip fracture
suggests
Question: Why the different conclusions between the two studies? Consider the following possible explanation: Let
x = estrogen treatment
y = health outcomes
Without randomization, a correlation between x and y need not reflect causation.
With randomization, a correlation between x and y does reflect causation.
2. Choose a set of experimental units, which are the units to which treatments will be randomized.
The order of these steps may vary due to constraints such as budgets, ethics, time, etc.
4. Results:
B A B A B B
26.9 11.4 26.6 23.7 25.3 28.5
B A A A B A
14.2 17.9 16.5 21.1 24.3 19.6
How much evidence is there that fertilizer type is a source of yield variation?
Evidence about differences between two populations is generally measured by
comparing summary statistics across the two sample populations. (Recall, a
statistic is any computable function of known, observed data).
• Histograms
• Kernel density estimates
Note that these summaries more or less retain all the information in
the data except the unit labels.
Location:
• sample mean or average: ȳ = (1/n) Σ_{i=1}^n y_i
• sample median: the value y_{.5} such that half the data are at or below it and half are at or above it.
To find the median, sort the data in increasing order, and call these values y_(1), ..., y_(n). If there are no ties, then
if n is odd, then y_((n+1)/2) is the median;
if n is even, then all numbers between y_(n/2) and y_(n/2 + 1) are medians.
Scale:
• interquantile range:
[y.25 , y.75 ] (interquartile range)
[y.025 , y.975 ] (95% interval)
[Figure: histograms, a kernel density estimate, and empirical CDFs F(y) of the two samples yA and yB]
> sd(yA)
[1] 4.234934
> sd(yB)
[1] 5.151699
Hypothesis tests: We compare the observed value of |ȳB − ȳA| to values of |ȳB − ȳA| that could have been observed if H0 were true.
Hypothetical values of |ȳB − ȳA| that could have been observed under H0 are referred to as samples from the null distribution.
Observed treatment assignment and yields:
  B     A     B     A     B     B
 26.9  11.4  26.6  23.7  25.3  28.5
  B     A     A     A     B     A
 14.2  17.9  16.5  21.1  24.3  19.6
One hypothetical assignment under H0 (same yields, relabeled treatments):
  B     A     B     B     A     A
 26.9  11.4  26.6  23.7  25.3  28.5
  A     B     B     A     A     B
 14.2  17.9  16.5  21.1  24.3  19.6
If H0 is true, then there are (12 choose 6) = 924 equally likely ways the treatments could have been assigned. For each one of these, we can calculate the value of the test statistic that would've been observed under H0:
{g1, g2, . . . , g924}
This enumerates all potential pre-randomization outcomes of our test statistic, assuming no treatment effect. Along with the fact that each treatment
[Figure: randomization distributions of ȲB − ȲA and |ȲB − ȲA|]
F(x | H0) = Pr( g(YA, YB) ≤ x | H0 ) = #{gk ≤ x} / 924
This distribution is sometimes called the randomization distribution, be-
cause it is obtained by the randomization scheme of the experiment.
(b) compute the value of the test statistic, given the simulated treatment
assignment and under H0 .
#( gs ≥ gobs ) / S ≈ Pr( g(YA, YB) ≥ gobs | H0 )
The approximation improves if S is increased.
Here is some R-code:
y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
x <- c("B", "A", "B", "A", "B", "B", "B", "A", "A", "A", "B", "A")
g.null <- NULL
for (s in 1:10000)
{
  xsim <- sample(x)                    # permute the treatment labels
  g.null[s] <- abs(mean(y[xsim == "B"]) - mean(y[xsim == "A"]))
}
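The Monte Carlo p-value described above can then be computed from g.null; the following two lines are a usage sketch added for illustration (not part of the original code):

g.obs <- abs(mean(y[x == "B"]) - mean(y[x == "A"]))   # observed test statistic
mean(g.null >= g.obs)     # proportion of simulated values at least as extreme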
1. From the data, compute a relevant test statistic g(y): The test statistic g(y) should be chosen so that it can differentiate between H0 and H1 in ways that are scientifically relevant. Typically, g(y) is chosen so that
g(y) is probably small under H0 and probably large under H1.
Questions:
• Is a small p-value evidence in favor of H1 ?
• What does the p-value say about the probability that the null hypoth-
esis is true? Try using Bayes’ rule to figure this out.
t-statistic:
g_t(yA, yB) = |ȳB − ȳA| / ( sp √(1/nA + 1/nB) ),   where
sp² = [(nA − 1) / ((nA − 1) + (nB − 1))] sA² + [(nB − 1) / ((nA − 1) + (nB − 1))] sB²
This is a scaled version of our previous test statistic, in which we compare the difference in sample means to a pooled version of the sample standard deviation and the sample size. Note that this statistic is
• increasing in |ȳB − ȳA|;
• increasing in nA and nB;
• decreasing in sp.
A more complete motivation for using this statistic will be given in the
next chapter.
Kolmogorov-Smirnov statistic:
g_KS(yA, yB) = max_{y ∈ ℝ} | F̂B(y) − F̂A(y) |
This is just the size of the largest gap between the two sample CDFs.
Figure 2.3: Histograms and empirical CDFs of the first two hypothetical
samples.
Gsim <- NULL
for (s in 1:5000)
{
  xsim <- sample(x)
  yAsim <- y[xsim == "A"]; yBsim <- y[xsim == "B"]
  g1 <- g.tstat(yAsim, yBsim)     # t-statistic for the permuted data
  g2 <- g.ks(yAsim, yBsim)        # KS-statistic for the permuted data
  Gsim <- rbind(Gsim, c(g1, g2))
}
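The helper functions g.tstat and g.ks are not shown in this excerpt. A minimal sketch of what they might look like, based on the formulas above (the function names and details are assumptions):

g.tstat <- function(yA, yB) {
  # pooled two-sample t-statistic, as defined above
  nA <- length(yA); nB <- length(yB)
  sp <- sqrt(((nA - 1) * var(yA) + (nB - 1) * var(yB)) / (nA + nB - 2))
  abs(mean(yB) - mean(yA)) / (sp * sqrt(1/nA + 1/nB))
}
g.ks <- function(yA, yB) {
  # Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs
  as.numeric(ks.test(yA, yB)$statistic)
}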
The hypothesis test based on the t-statistic does not indicate strong evidence against H0, whereas the test based on the KS-statistic does. The reason is that the t-statistic is only sensitive to differences in means. In particular, if ȳA = ȳB then the t-statistic is zero, its minimum value. In contrast, the KS-statistic is sensitive to any differences in the sample distributions.
Now let’s consider a second dataset, shown in Figure 2.5, for which
• nA = nB = 40
Figure 2.4: Randomization distributions for the t and KS statistics for the
first example.
• sA = 1.75, sB = 1.85
The difference in sample means is about twice as large as in the previous example, and the sample standard deviations are pretty similar. The B-samples are slightly larger than the A-samples on average. Is there evidence that this is caused by treatment? Again, we evaluate H0 using the randomization distributions of our two test statistics.
This time the two test statistics indicate similar evidence against H0. This is because the difference in the two sample distributions can primarily be summarized as the difference between the sample means, which the t-statistic can identify.
Figure 2.5: Histograms and empirical CDFs of the second two hypothetical
samples.
Figure 2.6: Randomization distributions for the t and KS statistics for the
second example.
In this case H0 and H1 are not complementary, and we are only interested
in evidence against H0 of a certain type, i.e. evidence that is consistent with
H1 . In this situation we may want to use a statistic like gt .
truth
action H0 true H0 false
accept H0 correct decision type II error
reject H0 type I error correct decision
As we discussed,
• the p-value can be used as a measure of evidence against H0;
Decision procedure:
1. Compute the p-value by comparing observed test statistic to the null
distribution.
Single Experiment Interpretation: If you use a level-α test for your experiment where H0 is true, then before you run the experiment there is probability α that you will erroneously reject H0.
• complicated,
Var[ȲA] = E[(ȲA − µA)²] → 0
ȲA → µA
[Figure: the population of 'all possible' A and B wheat yields (with means µA, µB) and the experimental samples obtained by random sampling, with ȳA = 18.37, sA = 4.23, ȳB = 24.30, sB = 5.15]
and we say that ȲA is a consistent estimator for µA . Several of our other
sample characteristics are also consistent for the corresponding population
characteristics. As n → ∞,
ȲA → µA
sA² → σA²
F̂A(x) = #{Y_{i,A} ≤ x} / nA → FA(x) = ∫_{−∞}^{x} pA(y) dy
These are both due to the central limit theorem. Letting P(µ, σ²) denote a population with mean µ and variance σ², then
X1 ∼ P1(µ1, σ1²), X2 ∼ P2(µ2, σ2²), . . . , Xm ∼ Pm(µm, σm²), independent
⟹ Σ_{j=1}^m Xj is approximately normal( Σ_j µj, Σ_j σj² ).
Experiment m: sample y1^(m), . . . , yn^(m) ∼ i.i.d. p and compute ȳ^(m).
where σ²_{AB} = σA²/nA + σB²/nB. So if we knew the variances, we'd have a null
distribution.
H0: µ = µ0
H1: µ ≠ µ0
Examples:
Physical therapy
Physics
E[Ȳ] = µ
Var[Ȳ] = σ²/n
Ȳ is approximately normal.
Under H0,
(Ȳ − µ0) ∼ normal(0, σ²/n),
but we can't use this as a null distribution because σ² is unknown. What if we scale (Ȳ − µ0)? Then
f(Y) = (Ȳ − µ0) / (σ/√n)
is approximately standard normal and we write f(Y) ∼ normal(0, 1). Since this distribution contains no unknown parameters we could potentially use it as a null distribution. However, having observed the data y, is f(y) a statistic?
One-sample t-statistic:
t(Y) = (Ȳ − µ0) / (s/√n)
For a given value of µ0 this is a statistic. What is the null distribution of
t(Y)?
s ≈ σ,  so  (Ȳ − µ0)/(s/√n) ≈ (Ȳ − µ0)/(σ/√n)
If Y1, . . . , Yn ∼ i.i.d. normal(µ0, σ²) then (Ȳ − µ0)/(σ/√n) is normal(0, 1), and so it would seem that t(Y) is approximately distributed as a standard normal distribution under H0: µ = µ0. However, if the approximation s ≈ σ is poor, as when n is small, we need to take account of our uncertainty in the estimate of σ.
The χ² distribution
Z1, . . . , Zn ∼ i.i.d. normal(0, 1)  ⟹  Σ Zi² ∼ χ²_n, the chi-squared distribution with n degrees of freedom
Σ (Zi − Z̄)² ∼ χ²_{n−1}
(1/σ²) Σ (Yi − Ȳ)² ∼ χ²_{n−1}
Figure 3.2: χ² densities for n = 9, 10, 11 degrees of freedom
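As an aside (a simulation sketch, not from the original notes), the χ² facts above are easy to check numerically; here n and sigma are arbitrary illustrative values:

set.seed(1)
n <- 10; sigma <- 2
x2 <- replicate(5000, { y <- rnorm(n, mean = 5, sd = sigma)
                        sum((y - mean(y))^2) / sigma^2 })
mean(x2)    # should be close to n - 1, the mean of a chi-squared on n - 1 df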
The t-distribution
If
• Z ∼ normal(0, 1);
• X ∼ χ²_m;
• Z, X statistically independent,
then
Z / √(X/m) ∼ t_m, the t-distribution with m degrees of freedom
How does this help us? Recall that if Y1, . . . , Yn ∼ i.i.d. normal(µ, σ²),
[Figure: t-distribution densities for n = 3, 6, 12, ∞, compared with the standard normal]
• √n (Ȳ − µ)/σ ∼ normal(0, 1)
• (n − 1) s²/σ² ∼ χ²_{n−1}
• Ȳ and s² are independent.
Let Z = √n (Ȳ − µ)/σ and X = (n − 1) s²/σ². Then
Z / √(X/(n − 1)) = [√n (Ȳ − µ)/σ] / √{ [(n − 1) s²/σ²] / (n − 1) }
                 = (Ȳ − µ) / (s/√n) ∼ t_{n−1}
This is still not a statistic because µ is unknown. However, under a specific
hypothesis like H0 : µ = µ0 , it is a statistic:
t(Y) = (Ȳ − µ0) / (s/√n) ∼ t_{n−1}   if E[Y] = µ0
It is called the t-statistic.
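A small simulation sketch (assumed code, not from the notes) showing how the null distribution of the t-statistic can be checked when the data really are normal and H0 is true:

t.stat <- function(y, mu0) sqrt(length(y)) * (mean(y) - mu0) / sd(y)
set.seed(1)
tsim <- replicate(5000, t.stat(rnorm(8, mean = 10, sd = 3), mu0 = 10))  # H0 true
quantile(tsim, .975)   # compare with qt(.975, 7), the t quantile on n - 1 = 7 df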
Some questions for discussion:
• Consider the situation where the data are not normally distributed.
– What is the distribution of √n (Ȳ − µ)/σ for large and small n?
– What is the distribution of (n − 1) s²/σ² for large and small n?
– Are √n (Ȳ − µ) and s² independent for small n? What about for large n?
2. Null hypothesis: H0 : µ = µ0
3. Alternative hypothesis: H1: µ ≠ µ0
4. Test statistic: t(Y) = √n (Ȳ − µ0)/s
5. Null distribution: Under the normal sampling model and H0, the sampling distribution of t(Y) is the t-distribution with n − 1 degrees of freedom:
t(Y) ∼ t_{n−1}
If H0 is not true, then t(Y) does not have a t-distribution. If the data
are normal but the mean is not µ0 , then t(Y) has a non-central t-
distribution, which we will use later to calculate power, or the type II
error rate. If the data are not normal then the distribution of t(Y) is
not a t-distribution.
• p-value ≤ α, or equivalently
• |t(y)| ≥ t_{(n−1), 1−α/2} (for α = .05, t_{(n−1), 1−α/2} ≈ 2).
The value t_{(n−1), 1−α/2} ≈ 2 is called the critical value for this test. In general, the critical value is the value of the test statistic above which we would reject H0.
Question: Suppose our procedure is to reject H0 only when t(y) ≥ t_{(n−1), 1−α}. Is this a level-α test?
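For example (an illustrative sketch; alpha and n here are made-up values), the critical value can be obtained in R with qt:

alpha <- 0.05; n <- 11
qt(1 - alpha/2, n - 1)    # roughly 2.23 for n - 1 = 10 degrees of freedom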
Sampling model:
Y_{1A}, . . . , Y_{nA A} ∼ i.i.d. normal(µA, σ²)
Y_{1B}, . . . , Y_{nB B} ∼ i.i.d. normal(µB, σ²).
In addition to normality we assume for now that both variances are equal.
Hypotheses: H0: µA = µB;  HA: µA ≠ µB
Recall that
ȲB − ȲA ∼ N( µB − µA, σ²(1/nA + 1/nB) ).
Hence if H0 is true then
ȲB − ȲA ∼ N( 0, σ²(1/nA + 1/nB) ).
t(YA, YB) = (ȲB − ȲA) / ( sp √(1/nA + 1/nB) ) ∼ t_{nA + nB − 2}
Self-check exercises:
Decision procedure:
Data:
t-statistic:
Inference:
Two Sample t-test
Always keep in mind where the p-value comes from: See Figure 3.4.
t^(1), . . . , t^(S)
p-value = #( |t^(s)| ≥ |t_obs| ) / S
Assumptions: Under H0 ,
• Randomization Test:
1. Treatments are randomly assigned
• t-test:
1. Data are independent samples
2. Each population is normally distributed
3. The two populations have the same variance
Imagined Universes:
• Randomization Test: Numerical responses remain fixed, we
imagine only alternative treatment assignments.
Some history:
de Moivre (1733): Approximating binomial distributions T = Σ_{i=1}^n Yi, with Yi ∈ {0, 1}.
we showed that if Y1,A , . . . , YnA ,A and Y1,B , . . . , YnB ,B are independent samples
from pA and pB respectively, and
(a) µA = µB
(b) σA² = σB²
So our null distribution really assumes conditions (a), (b) and (c). Thus if
we perform a level-α test and reject H0, we are really just rejecting that (a),
(b), (c) are all true.
For this reason, we will often want to check if conditions (b) and (c) are
plausibly met. If
(b) is met
(c) is met
H0 is rejected, then
here Pr(Z ≤ z_{(k − 1/2)/nA}) = (k − 1/2)/nA. The 1/2 is a continuity correction.
If 1/4 < sA²/sB² < 4, we won't worry too much about unequal variances.
[Figure: normal quantile-quantile plots (sample quantiles versus theoretical quantiles) for several small samples simulated from normal populations]
This may not sound very convincing. In later sections, we will show how
to perform formal hypothesis tests for equal variances. However, this won’t
completely solve the problem. If variances do seem unequal we have a variety
of options available:
This statistic looks pretty reasonable, and for large nA and nB its null dis-
tribution will indeed be a normal(0, 1) distribution. However, the exact null
distribution is only approximately a t-distribution, even if the data are
actually normally distributed. The t-distribution we compare tw to is a t_{νw}-distribution, where the degrees of freedom νw are given by
νw = ( sA²/nA + sB²/nB )² / [ (1/(nA − 1)) (sA²/nA)² + (1/(nB − 1)) (sB²/nB)² ].
This is known as Welch’s approximation; it may not give an integer as the
degrees of freedom.
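A sketch of Welch's statistic and degrees of freedom in R (yA and yB are assumed sample vectors); note that t.test uses this approximation by default:

vA <- var(yA) / length(yA); vB <- var(yB) / length(yB)
tw  <- (mean(yB) - mean(yA)) / sqrt(vA + vB)
nuw <- (vA + vB)^2 / (vA^2 / (length(yA) - 1) + vB^2 / (length(yB) - 1))
2 * (1 - pt(abs(tw), nuw))     # two-sided p-value
t.test(yB, yA)                 # var.equal = FALSE is the default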
This t-distribution is not, in fact, the exact sampling distribution of t_diff(yA, yB) under the null hypothesis that µA = µB and σA² ≠ σB². This is because the null distribution depends on the ratio of the unknown variances, σA² and σB². This difficulty is known as the Behrens-Fisher problem.
• If the sample sizes are the same (nA = nB) then the test statistics tw(yA, yB) and t(yA, yB) are the same; however, the degrees of freedom used in the null distribution will be different unless the sample standard deviations are the same.
• If nA > nB, but σA² < σB², and µA = µB, then the two-sample test based on comparing t(yA, yB) to a t-distribution on nA + nB − 2 d.f. will reject more than 5% of the time.
– If the null hypothesis that both the means and variances are equal, i.e.
H0: µA = µB and σA² = σB²,
is scientifically relevant, then we are computing a valid p-value, and this higher rejection rate is a good thing, since when the variances are unequal the null hypothesis is false.
– If, however, the hypothesis that is most scientifically relevant is
H0: µA = µB
without placing any restrictions on the variances, then the higher rejection rate in the test that assumes the variances are the same could be very misleading, since p-values may be smaller than they are under the correct null distribution (in which σA² ≠ σB²). Likewise we will underestimate the probability of type I error.
• If nA > nB and σA² > σB², then the p-values obtained from the test using t(yA, yB) will tend to be conservative (= larger) than those obtained with tw(yA, yB).
In short: one should be careful about applying the test based on t(yA, yB) if the sample standard deviations appear very different and it is not reasonable to assume equal means and variances under the null hypothesis.
Chapter 4

Confidence intervals and power
• H0: E[Y] = µ0 is rejected if
√n |(ȳ − µ0)/s| ≥ t_{1−α/2}
1. gather data;
Recall that we may construct a 95% confidence interval by finding those null
hypotheses that would not be rejected at the 0.05 level.
Sampling model:
Y_{1,A}, . . . , Y_{nA,A} ∼ i.i.d. normal(µA, σ²)
Y_{1,B}, . . . , Y_{nB,B} ∼ i.i.d. normal(µB, σ²).
Consider evaluating whether δ is a reasonable value for the difference in means:
H0: µB − µA = δ
H1: µB − µA ≠ δ
(ȲB − ȲA − δ) / ( sp √(1/nA + 1/nB) ) ∼ t_{nA + nB − 2}
Thus a given difference δ is accepted at level α if
| ȳB − ȳA − δ | / ( sp √(1/nA + 1/nB) ) ≤ tc,
i.e. if
(ȳB − ȳA) − sp √(1/nA + 1/nB) tc  ≤  δ  ≤  (ȳB − ȳA) + sp √(1/nA + 1/nB) tc,
where tc = t_{1−α/2, nA+nB−2} is the critical value.
Wheat example:
• ȳB − ȳA = 5.93
• sp = 4.72, sp √(1/nA + 1/nB) = 2.72
• t_{.975,10} = 2.23
A 95% C.I. for µB − µA is therefore 5.93 ± 2.23 × 2.72 ≈ (−0.1, 12.0).
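In R this interval can be computed directly from the quantities above (a one-line sketch using the reported values):

5.93 + c(-1, 1) * qt(.975, 10) * 2.72    # roughly (-0.1, 12.0)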
Questions:
• What does the fact that 0 is in the interval say about H0 : µA = µB ?
• H0: µA = µB,  H1: µA ≠ µB
• Gather data.
What about the power, Pr(reject H0 | H0 false)?
This is not yet a well-defined problem: there are many different ways in which the null hypothesis may be false, e.g. µB − µA = 0.0001 and µB − µA = 10,000 are both instances of the alternative hypothesis. However, for a particular value δ we have
Power(δ) = Pr(reject H0 | µB − µA = δ)
         = Pr( |t(YA, YB)| ≥ t_{1−α/2, nA+nB−2} | µB − µA = δ ).
Remember, the "critical" value t_{1−α/2, nA+nB−2} above which we reject the null hypothesis was computed from the null distribution.
However, now we want to work out the probability of getting a value of
the t-statistic greater than this critical value, when a specific alternative
hypothesis is true. Thus we need to compute the distribution of our t-
statistic under the specific alternative hypothesis.
If we suppose Y_{1,A}, . . . , Y_{nA,A} ∼ i.i.d. normal(µA, σ²) and Y_{1,B}, . . . , Y_{nB,B} ∼ i.i.d. normal(µB, σ²), where µB − µA = δ, then to calculate the power we need to know the distribution of
t(YA, YB) = (ȲB − ȲA) / ( sp √(1/nA + 1/nB) ),
but unfortunately this is no longer a (central) t-distribution. We can write
t(YA, YB) = (ȲB − ȲA − δ) / ( sp √(1/nA + 1/nB) ) + δ / ( sp √(1/nA + 1/nB) ).   (4.1)
The first part of the above equation has a t-distribution, which is centered around zero. The second part moves the t-statistic away from zero by an amount that depends on the pooled sample variance. For this reason, we call the distribution of the t-statistic under µB − µA = δ the non-central t-distribution. In this case, we write
t(YA, YB) ∼ (Z + γ) / √(X/ν),
[Figure: central (γ = 0) and non-central (γ = 1, 2) t densities]
where
• γ is a constant;
• Z is standard normal;
• X is χ² with ν degrees of freedom, independent of Z.
Recall some facts about the gamma function Γ:
• Γ(n + 1) = n! if n is an integer
• Γ(r + 1) = r Γ(r)
• Γ(1) = 1, Γ(1/2) = √π
Hence
(ȲB − ȲA) / ( σ √(1/nA + 1/nB) ) ∼ normal( δ / ( σ √(1/nA + 1/nB) ), 1 ),
a normal distribution with
• mean γ = δ / ( σ √(1/nA + 1/nB) );
• standard deviation 1.
Another way to get the same result is to refer back to the expression for the t-statistic given in 4.1:
t(YA, YB) = (ȲB − ȲA − δ) / ( sp √(1/nA + 1/nB) ) + δ / ( sp √(1/nA + 1/nB) ) = a_{nA,nB} + b_{nA,nB}.
The first term a_{nA,nB} has a t-distribution, and becomes standard normal as nA, nB → ∞. As for b_{nA,nB}, since sp² → σ² as nA or nB → ∞, we have
b_{nA,nB} = δ / ( sp √(1/nA + 1/nB) ) → ∞ as nA, nB → ∞.
[Figure: null (γ = 0) and alternative (γ = 1) distributions of the t-statistic]
γ = δ / ( σ √(1/nA + 1/nB) ).
We will want to make this calculation in order to see if our sample size is
sufficient to have a reasonable chance of rejecting the null hypothesis. If we
have a rough idea of δ and σ², we can evaluate the power using this formula.
t.crit <- qt(1 - alpha/2, nA + nB - 2)
When you do these calculations you should think of Figure 4.2. Letting T* and T be non-central and central t-distributed random variables respectively, make sure you can relate the following probabilities to the figure:
• Pr(T* > tc)
• Pr(T* < −tc)
• Pr(T > tc)
• Pr(T < −tc)
Note that if the power Pr(|T*| > tc) is large, then one of Pr(T* > tc) or Pr(T* < −tc) will be very close to zero.
µB − µA = 5
σ² is unknown: We'll assume the pooled sample variance from the first experiment is a good approximation: σ² = 22.24.
Figure 4.3: γ and power versus sample size, and the normal approximation to the power.
What is the probability we'll reject H0 at level α = 0.05 for a given sample
size?
delta <- 5; s2 <- ((nA - 1) * var(yA) + (nB - 1) * var(yB)) / (nA - 1 + nB - 1)
t.crit <- qt(1 - alpha/2, 2*n - 2)
t.gamma <- delta / sqrt(s2 * (1/n + 1/n))
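Continuing the sketch, the power can then be evaluated from the noncentral t-distribution using the t.crit and t.gamma computed above:

1 - pt(t.crit, 2*n - 2, ncp = t.gamma) + pt(-t.crit, 2*n - 2, ncp = t.gamma)  # power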
So we see that if the true mean difference were µB − µA = 5, then the original study only had about a 40% chance of rejecting H0. To have an 80% chance or greater, the researchers would need a sample size of 15 for each group.
Note that the true power depends on the unknown true mean difference and true variance (assuming these are equal in the two groups). Even though our power calculations were done under potentially inaccurate values of µB − µA and σ², they still give us a sense of the power under various parameter values:
• How is the power affected if the mean difference is bigger? smaller?
• How is the power affected if the variance is bigger? smaller?
Figure 4.4: Null and alternative distributions for another wheat example,
and power versus sample size.
Increasing power
As we've seen from the normal approximation to the power, for a fixed type I error rate the power is a function of the noncentrality parameter
γ = (µB − µA) / ( σ √(1/nA + 1/nB) ),
so clearly power is
• increasing in |µB − µA|;
• increasing in nA and nB;
• decreasing in σ².
The first of these we do not generally control with our experiment (indeed, it
is the unknown quantity we are trying to learn about). The second of these,
sample size, we clearly do control. The last of these, the variance, seems
like something that might be beyond our control. However, the experimental
variance can often be reduced by dividing up the experimental material into
more homogeneous subgroups of experimental units. This design technique,
known as blocking, will be discussed in an upcoming chapter.
Chapter 5
Introduction to ANOVA
Results:
[Figure: response time (seconds) by treatment (A, B, C, D, E)]
If α = 0.05, then
Pr(reject one or more H0_{i1 i2} | all H0_{i1 i2} true) ≈ 1 − .95^10 = 0.40
So, even though the pairwise error rate is 0.05, the experiment-wise error rate is about 0.40. This issue is called the problem of multiple comparisons and will be discussed further in Chapter 6. For now, we will discuss a method of testing the global hypothesis of no variation due to treatment:
H0: µ_{i1} = µ_{i2} for all i1, i2   versus   H1: µ_{i1} ≠ µ_{i2} for some i1, i2
yij = measurement from the jth replicate under the ith treatment.
i = 1, . . . , m indexes treatments
j = 1, . . . , n indexes observations or replicates.
yij = µi + ϵij,   E[ϵij] = 0,   Var[ϵij] = σ²
yij = µ + τi + ϵij,   E[ϵij] = 0,   Var[ϵij] = σ²,   where µi = µ + τi and τi = µi − µ
yij = µ + ϵij,   E[ϵij] = 0,   Var[ϵij] = σ²
• µ = µ1 = · · · = µm, or equivalently
• τi = 0 for all i.
s² = [ (n − 1) s1² + · · · + (n − 1) sm² ] / [ (n − 1) + · · · + (n − 1) ]
   = [ Σ_j (y1j − ȳ1·)² + · · · + Σ_j (ymj − ȳm·)² ] / [ m(n − 1) ]
   = Σ_i Σ_j (yij − µ̂i)² / [ m(n − 1) ]
   = SSE(µ̂) / [ m(n − 1) ]  ≡  MSE
Then
A0 ⟹ µ̂ is an unbiased estimator of µ
A0 + A1 + A2 ⟹
(µ̂, s²) are the minimum variance unbiased estimators of (µ, σ²)
(µ̂, (n − 1) s²/n) are the maximum likelihood estimators of (µ, σ²)
Within-treatment variability:
Between-treatment variability:
where
ȳ·· = (1/(mn)) Σ_i Σ_j yij = (1/m)(ȳ1· + · · · + ȳm·)
is the grand mean of the sample. We call SST the treatment sum of squares. We also define MST = SST/(m − 1) as the treatment mean squares or mean squares (due to) treatment. Notice that MST is simply n times the sample variance of the sample means:
MST = n × [ (1/(m − 1)) Σ_{i=1}^m (ȳi· − ȳ··)² ]
Note that
{µ1, . . . , µm} not all equal  ⟺  Σ_{i=1}^m (µi − µ̄)² > 0
Probabilistically,
Σ_{i=1}^m (µi − µ̄)² > 0  ⟹  a large Σ_{i=1}^m (ȳi· − ȳ··)² will probably be observed.
Inductively,
a large Σ_{i=1}^m (ȳi· − ȳ··)² observed  ⟹  Σ_{i=1}^m (µi − µ̄)² > 0 is plausible.
So a large value of SST or MST gives evidence that there are differences between the true treatment means. But how large is large? We need to know what values of MST to expect under H0.
Notice that
Σ ( √n Ȳi· − √n Ȳ·· )² / (m − 1) = n Σ ( Ȳi· − Ȳ·· )² / (m − 1) = SST / (m − 1) = MST,
so E[MST | H0] = σ².
Expected value of MSE:
MSE = (1/m) Σ_{i=1}^m si², so
E[MSE] = (1/m) Σ E[si²] = (1/m) Σ σ² = σ².
Let's summarize our potential estimators of σ²:
If H0 is true:
• E[MSE | H0] = σ²
• E[MST | H0] = σ²
If H0 is false:
• E[MSE | H1] = σ²
• E[MST | H1] = σ² + n v_τ²
If H0 is true:
• MSE ≈ σ²
• MST ≈ σ²
If H0 is false:
• MSE ≈ σ²
• MST ≈ σ² + n v_τ² > σ²
So
> SSE
[1] 12.0379
> SST
[1] 7.55032
> MSE
[1] 0.8025267
> MST
[1] 1.88758
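These quantities might be computed along the following lines (a sketch; y is the response vector and x the treatment labels, as in the earlier code):

ybar <- tapply(y, x, mean); n.i <- table(x)
SSE <- sum((y - ybar[x])^2)             # within-treatment sum of squares
SST <- sum(n.i * (ybar - mean(y))^2)    # between-treatment sum of squares
MSE <- SSE / (length(y) - length(ybar))
MST <- SST / (length(ybar) - 1)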
Randomization test:
F . obs< anova ( lm ( y˜ a s . f a c t o r ( x ) ) ) $F [ 1 ]
> F . obs
[ 1 ] 2.352046
$
set . seed (1)
F . n u l l < NULL
f o r ( nsim i n 1 : 1 0 0 0 )
{
x . sim< sample ( x )
F . n u l l < c (F . n u l l , anova ( lm ( y˜ a s . f a c t o r ( x . sim ) ) ) $F [ 1 ] )
}
$
> mean (F . n u l l >=F . obs )
[ 1 ] 0.102
Proof:
Σ_{i=1}^m Σ_{j=1}^n (yij − ȳ··)² = Σ_i Σ_j [ (yij − ȳi·) + (ȳi· − ȳ··) ]²
= Σ_i Σ_j [ (yij − ȳi·)² + 2(yij − ȳi·)(ȳi· − ȳ··) + (ȳi· − ȳ··)² ]
= Σ_i Σ_j (yij − ȳi·)² + Σ_i Σ_j 2(yij − ȳi·)(ȳi· − ȳ··) + Σ_i Σ_j (ȳi· − ȳ··)²
= (1) + (2) + (3)
(1) = Σ_i Σ_j (yij − ȳi·)² = SSE
(3) = Σ_i Σ_j (ȳi· − ȳ··)² = n Σ_i (ȳi· − ȳ··)² = SST
(2) = 2 Σ_i Σ_j (yij − ȳi·)(ȳi· − ȳ··) = 2 Σ_i (ȳi· − ȳ··) Σ_j (yij − ȳi·) = 0
• A residual ϵ̂ij is the observed value minus the fitted value, ϵ̂ij = yij − ŷij.
If we believe H1,
• our estimate of µi is µ̂i = ȳi·
If we believe H0,
• our estimate of µi is µ̂ = ȳ··
This leads to
(yij − ȳ··) = (ȳi· − ȳ··) + (yij − ȳi·)
total variation = between-group variation + within-group variation
All data can be decomposed this way, leading to the decomposition of the data vector of length m × n into two parts, as shown in Table 5.1. How do we interpret the degrees of freedom? We've heard of degrees of freedom before, in the definition of a χ² random variable:
c1 + c2 + c3 = (x1 − x̄) + (x2 − x̄) + (x3 − x̄)
             = (x1 + x2 + x3) − 3x̄
             = 3x̄ − 3x̄
             = 0
Thus we must have c1 + c2 = −c3, and so c1, c2, c3 can't all be independently varied; only two at a time can be arbitrarily changed. This vector thus lies in a two-dimensional subspace of R³, and has 2 degrees of freedom. In general,
(x1 − x̄, . . . , xm − x̄)ᵀ
is an m-dimensional vector in an (m − 1)-dimensional subspace, having m − 1 degrees of freedom.
Exercise: Return to the vector decomposition of the data and obtain the
degrees of freedom of each component. Note that
• dof = dimension of the space the vector lies in
• SS = squared length of the vector
a = (y − ȳ··)
c = (y − ȳtrt)
b = (ȳtrt − ȳ··)
Now recall
• yij, i = 1, . . . , m, j = 1, . . . , ni
• let N = Σ_{i=1}^m ni be the total sample size.
How does the ANOVA decomposition work in this case? What are the parameter estimates for the full and reduced model?
Null model:
yij = µ + ϵij,   Var[ϵij] = σ²
You should be able to show that the least-squares estimators are
• µ̂ = ȳ·· = (1/N) Σ yij
• s² = (1/(N − 1)) Σ (yij − ȳ··)²
Full model:
yij = µi + ϵij
    = µ + τi + ϵij,
Var[ϵij] = σ²
which implied that s² = (1/m) Σ si². However, if n_{i1} ≠ n_{i2}, then in general
ȳ·· = (1/N) Σ_i Σ_j yij ≠ (1/m) Σ_i ȳi·,   and   (1/m) Σ_i si² ≠ Σ_i Σ_j (yij − ȳi·)² / Σ_i (ni − 1)
What should the parameter estimates be? With a bit of calculus you can show that the least squares estimates of µi or (µ, τi) are
• µ̂i = ȳi·
So µ̂ is a weighted average of the µ̂i ’s, and a weighted average of the ⌧ˆi ’s is
zero. Similarly,
s² = Σ_{i=1}^m Σ_{j=1}^{ni} (yij − ȳi·)² / Σ_{i=1}^m (ni − 1) = Σ_{i=1}^m (ni − 1) si² / Σ_{i=1}^m (ni − 1)
Let's see if things add in a nice way. First, let's check orthogonality:
b · c = Σ_{i=1}^m Σ_{j=1}^{ni} (ȳi· − ȳ··)(yij − ȳi·)
      = Σ_{i=1}^m (ȳi· − ȳ··) Σ_{j=1}^{ni} (yij − ȳi·)
      = Σ_{i=1}^m (ȳi· − ȳ··) × 0 = 0
• dof(b) = ?
and so the vector b does sum to zero. Another way of looking at it is that the vector b is made up of m distinct numbers which don't sum to zero, but whose weighted average is zero, and so the degrees of freedom are m − 1.
Total   N − 1   SSTotal
Now suppose the following model is correct:
yij = µi + ϵij,   Var[ϵij] = σ²
Does MSE still estimate σ²?
MSE = SSE / (N − m)
    = [ Σ_{j=1}^{n1} (y1j − ȳ1·)² + · · · + Σ_{j=1}^{nm} (ymj − ȳm·)² ] / [ (n1 − 1) + · · · + (nm − 1) ]
    = [ (n1 − 1) s1² + · · · + (nm − 1) sm² ] / [ (n1 − 1) + · · · + (nm − 1) ]
So MSE is a weighted average of a collection of unbiased estimates of σ², so it is still unbiased.
Is the F-statistic still sensitive to deviations from H0? Note that a group with more observations contributes more to the grand mean, but it also contributes more terms to the SST. One can show
• E[MSE] = σ²
• E[MST] = σ² + (N / (m − 1)) v_τ², where
  – v_τ² = Σ_{i=1}^m ni τi² / N
  – τi = µi − µ̄
  – µ̄ = Σ ni µi / Σ ni.
So yes, MST/MSE will still be sensitive to deviations from the null, but the groups with larger sample sizes have a bigger impact on the power.
[Figure: coagulation time by diet (A, B, C, D)]
Questions:
• Does diet have an effect on coagulation time?
• If a given diet were assigned to all the animals in the population, what would the distribution of coagulation times be?
• If there is a diet effect, how do the mean coagulation times differ?
The first question we can address with a randomization test. For the second and third we need a sampling model:
yij = µi + ϵij
ϵ11, . . . , ϵ_{m nm} ∼ i.i.d. normal(0, σ²)
This model implies
• independence of errors
• constant variance
Response: ctime
          Df Sum Sq Mean Sq F value
diet       3  228.0    76.0  13.571
Residuals 20  112.0     5.6
Y1, . . . , Yn ∼ i.i.d. normal(µ, σ²)  ⟹  (1/σ²) Σ (Yi − Ȳ)² ∼ χ²_{n−1}
Also,
X1 ∼ χ²_{k1}, X2 ∼ χ²_{k2}, X1 and X2 independent  ⟹  X1 + X2 ∼ χ²_{k1 + k2}
Distribution of SSE:
Σ_i Σ_j (Yij − Ȳi·)² / σ² = (1/σ²) Σ_j (Y1j − Ȳ1·)² + · · · + (1/σ²) Σ_j (Ymj − Ȳm·)²
  ∼ χ²_{n1 − 1} + · · · + χ²_{nm − 1}
  ∼ χ²_{N − m}
So SSE/σ² ∼ χ²_{N−m}.
Results so far:
• SSE/σ² ∼ χ²_{N−m}
• SST/σ² ∼ χ²_{m−1}
Application: Under H0,
[ (SST/σ²)/(m − 1) ] / [ (SSE/σ²)/(N − m) ] = MST/MSE ∼ F_{m−1, N−m}
1. gather data
Response: ctime
          Df Sum Sq Mean Sq F value    Pr(>F)
diet       3  228.0    76.0  13.571 4.658e-05 ***
Residuals 20  112.0     5.6
[Figure: F(3, ν2) densities and CDFs for ν2 = 20, 10, 5, 2]
Fobs <- anova(lm(ctime ~ diet))$F[1]
Fsim <- NULL
for (nsim in 1:1000) {
  diet.sim <- sample(diet)
  Fsim <- c(Fsim, anova(lm(ctime ~ diet.sim))$F[1])
}
> 1 - pf(Fobs, 3, 20)
[1] 4.658471e-05
θ̂ = θ̂(Y)
Var[θ̂] = σ²_{θ̂}
V̂ar[θ̂] = σ̂²_{θ̂}
SE[θ̂] = σ̂_{θ̂}
where σ̂² is an estimate of σ². For example,
µ̂i = Ȳi·
Var[µ̂i] = σ²/ni
V̂ar[µ̂i] = σ̂²/ni = s²/ni
SE[µ̂i] = s/√ni
Note that the degrees of freedom are those associated with MSE, NOT ni − 1. As a result,
Ȳi· ± SE[Ȳi·] × t_{1−α/2, N−m}
is a 100 × (1 − α)% confidence interval for µi.
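A sketch of this interval in R for one group, using the coagulation example's ctime and diet objects (and group "A" purely as an illustration):

fit <- lm(ctime ~ diet)
MSE <- anova(fit)["Residuals", "Mean Sq"]
n.A <- sum(diet == "A")
se  <- sqrt(MSE / n.A)                       # standard error of the group mean
mean(ctime[diet == "A"]) + c(-1, 1) * qt(.975, df.residual(fit)) * se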
θ̂ ± 2 × SE[θ̂]
Coagulation Example:
Power(µ, σ², n) = Pr(reject H0 | µ, σ², n)
                = Pr(Fobs > F_{1−α, m−1, N−m} | µ, σ², n)
Under the alternative, Fobs has a noncentral F distribution with
• degrees of freedom m − 1, N − m
• noncentrality parameter
  λ = n Σ τi² / σ²
    = treatment variation / experimental uncertainty
    = treatment variation × experimental precision
[Figure: λ, the critical value F.crit, and power versus sample size n, for m = 4, α = 0.05, and between-to-within variance ratios of 1 and 2]
then
F = MST/MSE ∼ F_{m−1, N−m},
H0: µi = µ for all i = 1, . . . , m.
A0: {ϵij} ∼ i.i.d. normal(0, σ²)
to describe all of A1–A3. We don't do this because some violations of assumptions are more serious than others. Statistical folklore says the order of importance is A1, A2, then A3. We will discuss A1 when we talk about blocking. For now we will talk about A2 and A3.
Parameter estimates:
Our fitted value for any observation in group i is ŷij = µ̂ + τ̂i = ȳi·.
Our estimate of the error is ϵ̂ij = yij − ȳi·.
ϵ̂ij is called the residual for observation i, j.
Assumptions about ϵij can be checked by examining the values of the ϵ̂ij's:
• Histogram:
Make a histogram of the ϵ̂ij's. This should look approximately bell-shaped if the (super)population is really normal and there are enough observations. If there are enough observations, graphically compare the histogram to a N(0, s²) distribution.
In small samples, the histograms need not look particularly bell-shaped. How non-normal can a sample from a normal population look? You can always check yourself by simulating data in R. See Figure 5.8.
Note that the data are counts so they cannot be exactly normally distributed.
Figure 5.8: Normal scores plots of normal samples, with n ∈ {20, 50, 100}
> anova(lm(crab[, 2] ~ as.factor(crab[, 1])))
Analysis of Variance Table

Response: crab[, 2]
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab[, 1])   5  76695   15339  2.9669 0.01401 *
Residuals            144 744493    5170
Residuals:
ϵ̂ij = yij − µ̂i = yij − ȳi·
Residual diagnostic plots are in Figure 5.10. The data are clearly not
normally distributed.
[Figure: histograms of crab population counts at sites 1–6, and a normal quantile plot of the residuals]
Rule of thumb:
To check this, plot ϵ̂ij (residual) vs. ŷij = ȳi· (fitted value).
[Figure: residuals versus fitted values]
which is the ratio of the between group variability of the dij to the
within group variability of the dij .
Reject H0: Var[ϵij] = σ² for all i, j if F0 > F_{t−1, t(n−1), 1−α}
Crab data:
F0 = 14,229 / 4,860 = 2.93 > F_{5, 144, 0.95} = 2.28
hence we reject the null hypothesis of equal variances at the 0.05 level.
See also
Crab data: So the assumptions that validate the use of the F-test are violated. Now what?
Recall that one justification of the model
Yij = µi + ϵij
was that if the noise ϵij = Xij1 + Xij2 + · · · was the result of the addition of unobserved additive, independent effects, then by the central limit theorem ϵij will be approximately normal.
However, suppose the effects are multiplicative, so that in fact:
In this case, the Yij will not be normal, and the variances will not be constant:
Log transformation:
So the variance of the log-data does not depend on the mean µi. Also note that by the central limit theorem the errors should be approximately normally distributed.
[Figure: crab population and log crab population by site (1–6)]
> anova(lm(log(crab[, 2] + 1/6) ~ as.factor(crab[, 1])))
Analysis of Variance Table

Response: log(crab[, 2] + 1/6)
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab[, 1])   5  54.73   10.95  2.3226 0.04604 *
Residuals            144 678.60    4.71
[Figure: residuals versus fitted values and a normal quantile plot for the log-transformed crab data]
So taking the log stabilized the variances. In general, we may observe:
σi ∝ µi^α, i.e. the standard deviation of a group depends on the group mean.
Consider the transformed data Y*ij = g_λ(Yij) = Yij^λ. A first-order Taylor expansion about µi gives
Y*ij ≈ µi^λ + λ (Yij − µi) µi^(λ−1),
so that
E[Y*ij] ≈ µi^λ
Var[Y*ij] ≈ E[(Yij − µi)²] (λ µi^(λ−1))²
SD[Y*ij] ∝ µi^α λ µi^(λ−1) = λ µi^(α+λ−1)
So if we observe σi ∝ µi^α, then σ*i ∝ µi^(α+λ−1). So if we take λ = 1 − α then we will have stabilized the variances to some extent. Of course, we typically don't know α, but we could try to estimate it from the data.
Estimation of α:
σi ∝ µi^α, i.e. σi = c µi^α
log σi = log c + α × log µi,
so log si ≈ log c + α log ȳi·
Note that
y*(λ) = (y^λ − 1)/λ ∝ y^λ + c.
y*(0) = lim_{λ→0} y*(λ) = lim_{λ→0} (y^λ − 1)/λ = y^λ ln y |_{λ=0} = ln y
Note that for a given λ ≠ 0 it will not change the results of the ANOVA on the transformed data if we transform using
y* = y^λ   or   y*(λ) = (y^λ − 1)/λ = a y^λ + b.
(1) Plot log si vs. log ȳi·. If the relationship looks linear, then estimate α by the least-squares slope α̂ and consider transforming with λ = 1 − α̂.
• Remember the rule of thumb which says not to worry if the ratio of the largest to smallest variance is less than 4, i.e. don't use a transform unless there are drastic differences in variances.
• Remember to make sure that you describe the units of the transformed
data, and make sure that readers of your analysis will be able to un-
derstand that the model is additive in the transformed data, but not
in the original data. Also always include a descriptive analysis of the
untransformed data, along with the p-value for the transformed data.
• Try to think about whether the associated non-linear model for yij
makes sense.
• It can be argued that if the scientist who collected the data had a good
reason for using certain units, then one should not just transform the
data in order to bang it into an ANOVA-shaped hole. (Given enough
time and thought we could instead build a non-linear model for the
original data.)
• The sad truth: as always you will need to exercise judgment while
performing your analysis.
These warnings apply whenever you might reach for a transform, whether in
an ANOVA context, or a linear regression context.
Example (Crab data): Looking at the plot of means vs. sds suggests α ≈ 1, implying a log-transformation. However, the zeros in our data lead to problems, since log(0) = −∞.
Instead we can use y*ij = log(yij + 1/6). For the transformed data this gives us a ratio of the largest to smallest standard deviation of approximately 2, which is acceptable based on the rule of 4. Additionally, the residual diagnostic plots (Figure 5.13) are much improved.
This table needs to be fixed: Third column needs to be sd(log(y)).
lm(formula = logsd ~ logmean)
Coefficients:
(Intercept)     logmean
     0.6652      0.9839
[Figure: log(sd) versus log(mean) for the six sites]
Response: ctime
          Df Sum Sq Mean Sq F value    Pr(>F)
diet       3  228.0    76.0  13.571 4.658e-05 ***
Residuals 20  112.0     5.6
We conclude from the F-test that there are substantial differences between the population treatment means. How do we decide what those differences are?
5.6.1 Contrasts
Differences between sets of means can be evaluated by estimating contrasts. A contrast is a linear function of the means such that the coefficients sum to zero:
C = C(µ, k) = Σ_{i=1}^m ki µi,   where Σ_{i=1}^m ki = 0
Examples:
• diet 1 vs diet 2: C = µ1 − µ2
Standard errors:
Var[Ĉ] = Var[ Σ_{i=1}^m ki ȳi· ]
       = Σ_{i=1}^m ki² σ²/ni
       = σ² Σ_{i=1}^m ki²/ni
So an estimate of Var[Ĉ] is
sC² = s² Σ_{i=1}^m ki²/ni
Ĉ / SE[Ĉ] ∼ t_{N−m}
Hypothesis test:
• H0: C = 0 versus H1: C ≠ 0.
Example: Recall in the coagulation example µ̂1 = 61, µ̂2 = 66, and their 95% confidence intervals were (58.5, 63.5) and (63.9, 68.0). Let C = µA − µB.
Hypothesis test: H0: C = 0.
Ĉ / SE[Ĉ] = (ȳ1· − ȳ2·) / ( s √(1/6 + 1/4) ) = −5 / 1.53 = −3.27
Ĉ ± t_{1−α/2, N−m} × SE[Ĉ]
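A sketch of these calculations in R for the coagulation data, using the contrast C = µA − µB with coefficients k = (1, −1, 0, 0) (this assumes diet has levels A, B, C, D in that order):

k <- c(1, -1, 0, 0)
ybar <- tapply(ctime, diet, mean); n.i <- table(diet)
MSE  <- anova(lm(ctime ~ diet))["Residuals", "Mean Sq"]
C.hat <- sum(k * ybar)
se.C  <- sqrt(MSE * sum(k^2 / n.i))
C.hat / se.C                              # t-statistic on N - m df
C.hat + c(-1, 1) * qt(.975, 20) * se.C    # 95% confidence interval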
[Figure: grain yield versus plant density (10, 20, 30, 40, 50)]
> anova(lm(y ~ as.factor(x)))
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(x) 4 87.600 21.900 29.278 1.690e-05 ***
Residuals 10 7.480 0.748
What are these contrasts representing? What would make them large?
• If all µi ’s are the same, then they will all be close to zero. This is the
“sum to zero” part, i.e. Ci · 1 = 0 for each contrast.
• Similarly, C3 and C4 are measuring the cubic and quartic parts of the
relationship between density and yield.
> 3 * c.hat^2
       .L   .Q  .C  ^4
[1,] 43.2 42.0 0.3 2.1
• H0 : µi = µj for all i, j
• H0ij : µi = µj
We can associate error rates to both of these types of hypotheses
• Experiment-wise type I error rate: Pr(reject H0 |H0 is true ).
with equality only if there are two treatments total. The fact that the
experiment-wise error rate is larger than the comparison-wise rate is called
the issue of multiple comparisons. What is the experiment-wise rate in this
analysis procedure?
So
Pr(one or more H0ij rejected | H0) ≲ (m choose 2) αC
If we are worried about possible dependence among the tests, perhaps a better way to derive this bound is to recall that
Pr(A1 ∪ A2 ∪ · · ·) ≤ Σ Pr(Ai)   (subadditivity). Therefore
Pr(one or more H0ij rejected | H0) ≤ Σ_{i,j} Pr(H0ij rejected | H0) = (m choose 2) αC,
where αC = αE / (m choose 2).
3. If F(y) > F_{1−αE, m−1, N−m} then reject H0, and reject all H0ij for which |Ĉij / SE[Ĉij]| > t_{1−αC/2, N−m}.
Pr(reject H0ij | H0ij) = Pr(F > Fcrit and |tij| > tcrit | H0ij)
                       = Pr(F > Fcrit | H0ij) × Pr(|tij| > tcrit | F > Fcrit, H0ij)
Chapter 6

Factorial Designs
• Treatments:
which Delivery to use for the experiment testing for Type effects?
which Type to use for the experiment testing for Delivery effects?
It might be helpful to visualize the design as follows:
                 Delivery
Type       A         B         C         D
1        y_I,A     y_I,B     y_I,C     y_I,D
2        y_II,A    y_II,B    y_II,C    y_II,D
3        y_III,A   y_III,B   y_III,C   y_III,D
So in this case, Type and Delivery are both factors. There are 3 levels of
Type and 4 levels of Delivery.
Marginal Plots: Based on these marginal plots, it looks like (III, A) would be the most effective combination. But are the effects of Type consistent across levels of Delivery?
Conditional Plots: Type III looks best across delivery types. But the difference between types I and II seems to depend on delivery. For example, for delivery methods B or D there doesn't seem to be much of a difference between I and II.
Cell Plots: Another way of looking at the data is to just view it as a CRD with 3 × 4 = 12 different groups. Sometimes each group is called a cell.
[Figure: marginal plots of the response by Type (I, II, III) and by Delivery (A, B, C, D)]
> lm(log(sds) ~ log(means))
Coefficients:
(Intercept)  log(means)
      3.203       1.977
Possible analysis methods: Let's first try to analyze these data using our existing tools:
• Two one-factor ANOVAs: Just looking at Type, for example, the experiment is a one-factor ANOVA with 3 treatment levels and 16 reps per treatment. Conversely, looking at Delivery, the experiment is a one-factor ANOVA with 4 treatment levels and 12 reps per treatment.
> dat$y <- 1/dat$y
> anova(lm(dat$y ~ dat$type))
[Figure: conditional plots of the transformed response by Type within each Delivery (A, B, C, D)]
[Figure: group means versus log standard deviations for the original and for the transformed data]
Residuals   45 0.30628 0.00681
• What are the SSE, MSE representing in the first two ANOVAs? Why are they bigger than the value in the third ANOVA?
[Figures: marginal, conditional, and cell plots of the transformed response by Type and Delivery]
• In the third ANOVA, can we assess the effects of Type and Delivery separately?
• Can you think of a situation where the F-stats in the first and second ANOVAs would be "small", but the F-stat in the third ANOVA "big"?
Basically, the first and second ANOVAs may mischaracterize the data and sources of variation. The third ANOVA is "valid," but we'd like a more specific result: we'd like to know which factors are sources of variation, and the relative magnitude of their effects. Also, if the effects of one factor are consistent across levels of the other, maybe we don't need a separate parameter for each of the 12 treatment combinations, i.e. a simpler model may suffice.
To obtain the set-to-zero side conditions, add â1 and b̂1 to µ̂, subtract â1
from the âi ’s, and subtract b̂1 from the b̂j ’s. Note that this does not change
the fitted value in each group:
y − ȳ··· = â + b̂ + ϵ̂
vT = v1 + v2 + ve
The columns represent
vT variation of the data around the grand mean;
v1 variation of factor 1 means around the grand mean;
v2 variation of factor 2 means around the grand mean;
ve variation of the data around the fitted values.
You should be able to show that these vectors are orthogonal, and so
Σ_i Σ_j Σ_k (yijk − ȳ···)² = Σ_i Σ_j Σ_k âi² + Σ_i Σ_j Σ_k b̂j² + Σ_i Σ_j Σ_k ϵ̂ijk²
SSTotal = SSA + SSB + SSE
Degrees of Freedom:
• â contains m1 different numbers but sums to zero → m1 − 1 dof
• b̂ contains m2 different numbers but sums to zero → m2 − 1 dof
ANOVA table
Source   SS        df                                  MS         F
A        SSA       m1 − 1                              SSA/dfA    MSA/MSE
B        SSB       m2 − 1                              SSB/dfB    MSB/MSE
Error    SSE       (m1 − 1)(m2 − 1) + m1 m2 (n − 1)    SSE/dfE
Total    SSTotal   m1 m2 n − 1
> anova(lm(dat$y ~ dat$type + dat$delivery))
Analysis of Variance Table

Response: dat$y
             Df  Sum Sq Mean Sq F value
dat$type      2 0.34877 0.17439  71.708
dat$delivery  3 0.20414 0.06805  27.982
Residuals    42 0.10214 0.00243
This ANOVA has decomposed the variance in the data into the variance of additive Type effects, additive Delivery effects, and residuals. Does this adequately represent what is going on in the data? What do we mean by additive? Assuming the model is correct, we have µij = µ + ai + bj.
This says that the difference between Type I and Type II is a1 − a2 regardless of Delivery. Does this look right based on the plots? Consider the following table:
Effect of Type I vs II, given Delivery
Delivery   full model           additive model
A          µ_IA − µ_IIA         (µ + a1 + b1) − (µ + a2 + b1) = a1 − a2
B          µ_IB − µ_IIB         (µ + a1 + b2) − (µ + a2 + b2) = a1 − a2
C          µ_IC − µ_IIC         (µ + a1 + b3) − (µ + a2 + b3) = a1 − a2
D          µ_ID − µ_IID         (µ + a1 + b4) − (µ + a2 + b4) = a1 − a2
• The full model allows differences between Types to vary across levels of Delivery
How can we test for this? Consider the following parameterization of the
full model:
Interaction model:
µ = overall mean;
yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (ȳij· − ȳi·· − ȳ·j· + ȳ···) + (yijk − ȳij·)
     = µ̂ + âi + b̂j + (ab)̂ij + ϵ̂ijk
Note that the interaction term is equal to the fitted value under the full model (ȳij·) minus the fitted value under the additive model (ȳi·· + ȳ·j· − ȳ···). Deciding between the additive/reduced model and the interaction/full model is tantamount to deciding if the variance explained by the (ab)̂ij's is large or not, i.e. whether or not the full model is close to the additive model.
yijk = µ + ai + bj + ϵijk
Note that in this model the fitted value in one cell depends on data
from the others.
Residual:
ϵ̂ijk = yijk − ŷijk
     = (yijk − ȳi·· − ȳ·j· + ȳ···)
The term (ab)̂ij measures the deviation of the cell means from the estimated additive model. It is called an interaction. It does not measure how factor 1 "interacts" with factor 2; it measures how much the data deviate from the additive effects model.
Interactions and the full model: The interaction terms can also be derived by taking the additive decomposition above one step further: the residual in the additive model can be written
ϵ̂^A_ijk = yijk − ȳi·· − ȳ·j· + ȳ···
        = (yijk − ȳij·) + (ȳij· − ȳi·· − ȳ·j· + ȳ···)
        = ϵ̂^I_ijk + (ab)̂ij
yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (ȳij· − ȳi·· − ȳ·j· + ȳ···) + (yijk − ȳij·)
     = µ̂ + âi + b̂j + (ab)̂ij + ϵ̂ijk
Fitted value:
ŷijk = µ̂ + âi + b̂j + (ab)̂ij = ȳij· = µ̂ij
This is a full model for the treatment means: the estimate of the mean in each cell depends only on data from that cell. Contrast this to the additive model.
Residual:
ϵ̂ijk = yijk − ŷijk = yijk − ȳij·
Thus the full model ANOVA decomposition partitions the variability among the cell means ȳ11·, ȳ12·, . . . , ȳ_{m1 m2}· into main effects and interactions. As you might expect, these different parts are orthogonal, resulting in the following orthogonal decomposition of the variance:
Total SS = (var explained by additive model) + (error in additive model)
         = (var explained by full model) + (error in full model)
         = SSA + SSB + SSAB + SSE
Example: (Poison) 3 × 4 two-factor CRD with 4 reps per treatment combination.
             Df  Sum Sq Mean Sq F value
pois$deliv    3 0.20414 0.06805  27.982
pois$type     2 0.34877 0.17439  71.708
Residuals    42 0.10214 0.00243

                      Df  Sum Sq Mean Sq F value
pois$deliv             3 0.20414 0.06805 28.3431
pois$type              2 0.34877 0.17439 72.6347
pois$deliv:pois$type   6 0.01571 0.00262  1.0904
Residuals             36 0.08643 0.00240
So notice
– E[MSAB] = σ² + r τ²_AB > σ².
This suggests
• An evaluation of the adequacy of the additive model can be assessed by comparing MSAB to MSE. Under H0: (ab)ij = 0, MSAB/MSE has an F distribution.
• If the additive model is adequate then MSEint and MSAB are two independent estimates of roughly the same thing (why independent?). We may then want to combine them to improve our estimate of σ².
For these data, there is strong evidence of both treatment effects, and little evidence of non-additivity. We may want to use the additive model.
Figure 6.7: Comparison between types I and II, without respect to delivery.
Testing additive effects: Let µij be the population mean in cell ij. The relationship between the cell means model and the parameters in the interaction model is as follows:
µij = µ·· + (µi· − µ··) + (µ·j − µ··) + (µij − µi· − µ·j + µ··)
    = µ + ai + bj + (ab)ij
and so
Figure 6.8: Comparison between types I and II, with delivery in color.
• Σ_i ai = 0
• Σ_j bj = 0
• Σ_i (ab)ij = 0 for each j,   Σ_j (ab)ij = 0 for each i
(population) means:
F2 = 1 F2 = 2 F2 = 3 F2 = 4
F1 = 1 µ11 µ12 µ13 µ14 4µ̄1·
F1 = 2 µ21 µ22 µ23 µ24 4µ̄2·
F1 = 3 µ31 µ32 µ33 µ34 4µ̄3·
3µ̄·1 3µ̄·2 3µ̄·3 3µ̄·4 12µ̄··
So
a1 − a2 = µ̄1· − µ̄2·
        = (µ11 + µ12 + µ13 + µ14)/4 − (µ21 + µ22 + µ23 + µ24)/4
Like any contrast, we can estimate/make inference for it using contrasts of
sample means:
â1 − â2 = ȳ1·· − ȳ2·· is an unbiased estimate of a1 − a2.
Note that this estimate is the corresponding contrast among the m1 × m2 sample means:
F2 = 1 F2 = 2 F2 = 3 F2 = 4
F1 = 1 ȳ11· ȳ12· ȳ13· ȳ14· 4ȳ1··
F1 = 2 ȳ21· ȳ22· ȳ23· ȳ24· 4ȳ2··
F1 = 3 ȳ31· ȳ32· ȳ33· ȳ34· 4ȳ3··
3ȳ·1· 3ȳ·2· 3ȳ·3· 3ȳ·4· 12ȳ···
So
â1 − â2 = ȳ1·· − ȳ2··
        = (ȳ11· + ȳ12· + ȳ13· + ȳ14·)/4 − (ȳ21· + ȳ22· + ȳ23· + ȳ24·)/4
Hypothesis tests and confidence intervals can be made using the standard
assumptions:
• E[â1 − â2] = a1 − a2
• Under the assumption of constant variance:
Var[â1 − â2] = Var[ȳ1·· − ȳ2··]
             = Var[ȳ1··] + Var[ȳ2··]
             = σ²/(n × m2) + σ²/(n × m2)
             = 2σ²/(n × m2)
where ν is the degrees of freedom associated with our estimate of σ², i.e. the residual degrees of freedom in our model.
– ν = m1 m2 (n − 1) under the full/interaction model
– ν = m1 m2 (n − 1) + (m1 − 1)(m2 − 1) under the reduced/additive model
Review: Explain the degrees of freedom for the two models.
t-test: Reject H0: a1 = a2 if
|â1 − â2| / √( MSE × 2/(n × m2) ) > t_{1−αC/2, ν},
i.e. if
|â1 − â2| > √( MSE × 2/(n × m2) ) × t_{1−αC/2, ν}
So the quantity
LSD1 = t_{1−αC/2, ν} × SE(â1 − â2)
     = t_{1−αC/2, ν} × √( MSE × 2/(n × m2) )
is a "yardstick" for comparing levels of factor 1. It is sometimes called the least significant difference for comparing levels of Factor 1. It is analogous to the LSD we used in the 1-factor ANOVA.
Important note: The LSD depends on which factor you are looking at:
The comparison of levels of Factor 2 depends on
    Var[ȳ·1· − ȳ·2·] = Var[ȳ·1·] + Var[ȳ·2·]
                     = σ²/(n × m1) + σ²/(n × m1)
                     = 2σ²/(n × m1)
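For instance, using the poison data summaries above (additive-model MSE =
0.00243 on ν = 42 df; 3 types, 4 delivery methods, n = 4 reps) and taking
αC = 0.05, a minimal sketch of the two yardsticks is:

MSE <- 0.00243; nu <- 42            # additive-model error MS and df
aC  <- 0.05                         # per-comparison error rate (assumed)

# comparing two type means: each averages over 4 deliveries x 4 reps = 16 obs
LSD.type  <- qt(1 - aC/2, nu) * sqrt(2 * MSE / 16)
# comparing two delivery means: each averages over 3 types x 4 reps = 12 obs
LSD.deliv <- qt(1 - aC/2, nu) * sqrt(2 * MSE / 12)
c(LSD.type, LSD.deliv)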
where
• µ = (1/(m1m2)) Σi Σj µij = µ̄··
• ai = (1/m2) Σj (µij − µ̄··) = µ̄i· − µ̄··
• bj = (1/m1) Σi (µij − µ̄··) = µ̄·j − µ̄··
The terms {a1, . . . , am1}, {b1, . . . , bm2} are sometimes called “main effects”.
The additive model is

    yijk = µ + ai + bj + εijk

Sometimes this is called the “main effects” model. If this model is correct,
it implies that

    (ab)ij = 0 for all i, j,
    µi1j − µi2j = ai1 − ai2 for all i1, i2, j,
    µij1 − µij2 = bj1 − bj2 for all i, j1, j2
Now

    E[ (1/m2) Σ_{j=1}^{m2} (ȳ1j· − ȳ2j·) ] = (1/m2) Σ_{j=1}^{m2} (µ1j − µ2j) = a1 − a2,
1. affects response
then it will increase the variance in response and also the experimental error
variance/MSE if unaccounted for. If F2 is a known, potentially large source
of variation, we can control for it pre-experimentally with a block design.
Blocking: The stratification of experimental units into groups that are more
homogeneous than the whole.
Objective: To have less variation among units within blocks than between
blocks.
[Diagram: a field running from dry to wet along the direction of irrigation.]
• location
• physical characteristics
• time
        column:  1  2  3  4  5  6
row 1            2  5  4  1  6  3
row 2            1  3  4  6  5  2
row 3            6  3  5  1  2  4
row 4            2  4  6  5  3  1
Design:
1. Within each row, or block, each of the 6 treatments is randomly allocated
   (see the sketch below).
2. The blocks are complete, in that each treatment appears in each block.
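A minimal sketch of this randomization in R (the layout above could have
been produced this way; the object names are hypothetical):

set.seed(1)
design <- t(sapply(1:4, function(row) sample(1:6)))  # one random ordering per row
dimnames(design) <- list(paste("row", 1:4), paste("col", 1:6))
design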
[Marginal plots: Ni versus treatment (1–6) and versus row (1–4); treatment
and residual versus location.]
Figure 6.13: Marginal plots, and residuals without controlling for row.
Analysis of the RCB design with one rep: Analysis proceeds just as
in the two-factor ANOVA:
    yij − ȳ·· = (ȳi· − ȳ··) + (ȳ·j − ȳ··) + (yij − ȳi· − ȳ·j + ȳ··)
SSTotal = SSTrt + SSB + SSE
ANOVA table
#######
> anova(lm(c(y) ~ as.factor(c(trt)) + as.factor(c(rw))))
                  Df  Sum Sq Mean Sq F value   Pr(>F)
as.factor(c(trt))  5 201.316  40.263  5.5917 0.004191 **
as.factor(c(rw))   3 197.004  65.668  9.1198 0.001116 **
Residuals         15 108.008   7.201
#######
#######
> anova(lm(c(y) ~ as.factor(c(trt)):as.factor(c(rw))))
                                   Df Sum Sq Mean Sq F value Pr(>F)
as.factor(c(trt)):as.factor(c(rw)) 23 506.33   22.01
Residuals                           0   0.00
#######
Can we test for interaction? Do we care about interaction in this case, or just
main effects? Suppose it were true that “in row 2, timing 6 is significantly
better than timing 4, but in row 3, treatment 3 is better.” Is this relevant
for recommending a timing treatment for other fields?
Consider comparing the F -statistic from a CRD with that from an RCB:
According to Cochran and Cox (1957)
    MSEcrd = [ SSB + n(m − 1)MSErcbd ] / (nm − 1)
           = ( (n − 1)/(nm − 1) ) MSB + ( n(m − 1)/(nm − 1) ) MSErcbd

In general, the effectiveness of blocking is a function of MSEcrd/MSErcb. If
this is large, it is worthwhile to block. For the nitrogen example, this ratio
is about 2.
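A sketch of that calculation using the RCB ANOVA table above (n = 4
rows/blocks, m = 6 treatments):

MSB <- 65.668; MSErcb <- 7.201      # row and residual mean squares from above
n <- 4; m <- 6                      # blocks and treatments
MSEcrd <- ((n - 1) * MSB + n * (m - 1) * MSErcb) / (n * m - 1)
MSEcrd                              # about 14.8
MSEcrd / MSErcb                     # about 2: blocking roughly halved the MSE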
    SSF = Σi Σj Σk (ȳij· − ȳ···)² = Σi Σj nij (ȳij· − ȳ···)²
• Cell means show that, for both road types, speeds were higher on clear
  days by 5 mph on average.
The idea:
Accident example:
Standard errors: µ̂i· = (1/m2) Σj ȳij·, so

    Var[µ̂i·] = (1/m2²) Σj σ²/nij

    SE[µ̂i·] = (1/m2) sqrt( Σj MSE/nij ), and similarly

    SE[µ̂·j] = (1/m1) sqrt( Σi MSE/nij )
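In R these quantities can be sketched from a matrix of cell means and a
matrix of cell counts (cellmeans, nij and MSE are assumed objects):

mu.i <- apply(cellmeans, 1, mean)                   # estimates of the mu_i.
se.i <- sqrt(apply(MSE / nij, 1, sum)) / ncol(nij)  # SE[muhat_i.]
mu.j <- apply(cellmeans, 2, mean)                   # estimates of the mu_.j
se.j <- sqrt(apply(MSE / nij, 2, sum)) / nrow(nij)  # SE[muhat_.j]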
Allowing for interaction improves the fit, and reduces error variance. SSI
measures the improvement in fit. If SSI is large, i.e. SSEA is much bigger than
SSEF , this suggests the additive model does not fit well and the interaction
term should be included in the model.
Testing:

    F = MSI/MSE = [ SSI/((m1 − 1)(m2 − 1)) ] / [ SSEF/(N − m1m2) ]

Under H0, F ~ F(m1−1)(m2−1), N−m1m2, so a level-α test of H0 is to reject
if F > F1−α, (m1−1)(m2−1), N−m1m2.
Note:
• SSI is the change in fit in going from the additive to the full model;
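In R this comparison can be obtained by passing the two nested fits to
anova(); a minimal sketch, with hypothetical variable names y, F1, F2:

m.add  <- lm(y ~ as.factor(F1) + as.factor(F2))   # additive model
m.full <- lm(y ~ as.factor(F1) * as.factor(F2))   # full/interaction model
anova(m.add, m.full)   # the change in RSS is SSI; the F test is the one above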
A painful example: A small scale clinical trial was done to evaluate the
effect of painkiller dosage on pain reduction for cancer patients in a variety
of age groups.
• Factors of interest:
• Design: CRD, each treatment level was randomly assigned to ten pa-
tients, not blocked by age.
[Marginal boxplots of the response by treatment (1, 2, 3) and by age group
(50, 60, 70, 80).]
> table(trt, ageg)
   ageg
trt 50 60 70 80
  1  1  2  3  4
  2  1  3  3  3
  3  2  1  4  3
> tapply(y, trt, mean)
     1      2      3
 0.381 -0.950 -2.131
>
> tapply(y, ageg, mean)
    50     60     70     80
-2.200 -1.265 -0.922 -0.139
Do these marginal plots and means misrepresent the data? To evaluate this
possibility,
> cellmeans
     50        60        70        80
1 -4.90  1.400000  1.066667  0.677500
2  2.40 -2.996667 -1.270000  0.300000
3 -3.15 -1.400000 -2.152500 -1.666667
• The oldest patients (ageg=80) were imbalanced towards the lower dose,
so we might expect their marginal mean to be too high. Observe the
change from the marginal mean = -.14 to the LS mean = -.23 .
What linear modeling commands in R will get you the same thing?
> options(contrasts=c("contr.sum", "contr.poly"))
> fit.full <- lm(y ~ as.factor(ageg) * as.factor(trt))
Note that the coefficients in the reduced/additive model are not the same:
> fit.add <- lm(y ~ as.factor(ageg) + as.factor(trt))
Where do these sums of squares come from? What do the F -tests represent?
By typing “?anova.lm” in R we see that anova() computes
“a sequential analysis of variance table for that fit. That is, the
reductions in the residual sum of squares as each term of the formula
is added in turn are given in as the rows of a table, plus the residual
sum of squares.”
• C be their interaction.
> ss0 - ss1
[1] 13.3554
>
> ss1 - ss2
[1] 28.25390
>
> ss2 - ss3
[1] 53.75015
> ss3
[1] 57.47955
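These numbers appear to be residual sums of squares from a nested sequence
of fits; a sketch of how such objects could be produced (names and fitting
order assumed):

ss0 <- sum(resid(lm(y ~ 1))^2)                                 # intercept only
ss1 <- sum(resid(lm(y ~ as.factor(ageg)))^2)                   # + age group
ss2 <- sum(resid(lm(y ~ as.factor(ageg) + as.factor(trt)))^2)  # + treatment
ss3 <- sum(resid(lm(y ~ as.factor(ageg) * as.factor(trt)))^2)  # + interaction
# the successive differences match the sequential (Type I) sums of squares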
Initial analysis: CRD with one two-level factor. The first thing to do is
plot the data. The first panel of Figure 6.16 indicates a moderately large
difference in the two sample populations. The second thing to do is a two-
sample t-test:
[Figure 6.16: Left panel: o2_change by grp (A, B). Right panel: residuals
versus age.]
A linear model for ANCOVA: Let yi,j be the response of the jth subject
in treatment i:
    yi,j = µ + ai + b × xi,j + εi,j
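A minimal sketch of the corresponding fits in R, using the variable names
from the plots (grp, age, o2_change):

fit.anova  <- lm(o2_change ~ grp)         # one-factor ANOVA, ignoring age
fit.ancova <- lm(o2_change ~ grp + age)   # ANCOVA: common slope in age
anova(fit.ancova)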
Figure 6.17: ANOVA and ANCOVA fits to the oxygen uptake data
This model gives a linear relationship between age and response for each
group:
The second one decomposes the variation in the data that is orthogonal to
treatment (SSE from the first ANOVA) into a part that can be ascribed to
age (SS age in the second ANOVA) and everything else (SSE from the second
ANOVA). I will try to draw some triangles that describe this situation.
Now consider two other ANOVAs:
> anova(lm(o2_change ~ age))
          Df Sum Sq Mean Sq F value    Pr(>F)
age        1 576.09  576.09  40.519 8.187e-05 ***
Residuals 10 142.18   14.22
[Three panels: y versus f2 (levels 1 and 2), with observations labeled A and
B by level of F1.]
The ANOVA quantifies this: There is variability in the data that can be
explained by either F1 or F2. In this case,
• SSA > SSA|B
[A second set of three panels: y versus f2, with observations labeled A and B.]
Which ANOVA table to use? Some software packages combine these anova
tables to form ANOVAs based on “alternative” types of sums of squares.
Consider a two-factor ANOVA in which we plan on decomposing variance
into additive effects of F1, additive effects of F2, and their interaction.
Type II SS: Sum of squares for a factor is the improvement in fit from
adding that factor, given inclusion of all other terms at that level or
below.
Type III SS: Sum of squares for a factor is the improvement in fit from
adding that factor, given inclusion of all other terms.
So for example:
Type II sums of squares are very popular, and are to some extent the “default”
for linear regression analysis. It seems natural to talk about the “variability
due to a treatment, after controlling for other sources of variation.” However,
there are situations in which you might not want to “control” for other
variables. Consider the following (real-life) scenario:
Clinical trial:
Nested Designs
Example(Potato): Sulfur added to soil kills bacteria, but too much sulfur
can damage crops. Researchers are interested in comparing two levels of
sulfur additive (low, high) on the damage to two types of potatoes.
Factors of interest:
Design constraints:
[Field layout: four fields assigned sulfur levels L, H (top) and H, L (bottom);
within each field, four plots are planted with potato types A and B.]
Randomization:
> anova(fit.full)
            Df  Sum Sq Mean Sq F value   Pr(>F)
type         1 1.48840 1.48840 13.4459 0.003225 **
sulfur       1 0.54022 0.54022  4.8803 0.047354 *
type:sulfur  1 0.00360 0.00360  0.0325 0.859897
Residuals   12 1.32835 0.11070
[Marginal boxplots of the response by sulfur (high, low) and by type (A, B);
residuals from the additive fit (fit.add$res) and their normal quantile plot.]
• no treatment effects.
[Histograms of the randomization null distributions of the F statistics,
F.t.null and F.s.null.]
• For each sulfur assignment there are (4 choose 2)⁴ = 6⁴ = 1296 type
  assignments
So we have 7776 possible treatment assignments. It probably wouldn’t be too
hard to write code to go through all possible treatment assignments, but it’s
very easy to obtain a Monte Carlo approximation to the null distribution:
F.t.null <- F.s.null <- NULL
for(ns in 1:1000) {
  s.sim <- rep(sample(c("low", "low", "high", "high")), rep(4, 4))
  t.sim <- c(sample(c("A", "A", "B", "B")), sample(c("A", "A", "B", "B")),
             sample(c("A", "A", "B", "B")), sample(c("A", "A", "B", "B")))
  # (assumed step) recompute the ANOVA table under the simulated assignment
  fit.sim <- anova(lm(y ~ as.factor(t.sim) * as.factor(s.sim)))
  F.t.null <- c(F.t.null, fit.sim[1, 4])
  F.s.null <- c(F.s.null, fit.sim[2, 4])
}
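Monte Carlo approximations to the randomization p-values then compare the
observed F statistics to these null draws (the name F.type for the observed
type statistic is assumed; F.sulfur appears below):

mean(F.t.null >= F.type)     # randomization p-value for type
mean(F.s.null >= F.sulfur)   # randomization p-value for sulfur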
What happened?

    F^rand_type ≈ F_{1,13}    ⇒  p^rand_type ≈ p^anova1_type
    F^rand_sulfur ≉ F_{1,13}  ⇒  p^rand_sulfur ≉ p^anova1_sulfur
Thus there is strong evidence for type effects, and little evidence that the
effects of type vary among levels of sulfur.
> F.sulfur
[1] 1.022911
> 1 - pf(F.sulfur, 1, 2)
[1] 0.4182903
This is more in line with the analysis using the randomization test.
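To sketch where that statistic comes from: the sulfur mean square is compared
to the field-to-field (whole-plot) mean square, which has only 2 degrees of
freedom, rather than to the sub-plot MSE (both values appear in the aov
output below):

MS.sulfur <- 0.54022
MS.field  <- 0.52813          # between-field residual mean square, 2 df
MS.sulfur / MS.field          # = 1.0229, compared to an F(1, 2) distribution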
The above calculations are somewhat tedious. In R there are several au-
tomagic ways of obtaining the correct F -test for this type of design. One
way is with the aov command:
> fit1 <- aov(y ~ type * sulfur + Error(factor(field)))
> summary(fit1)
Error: factor(field)
          Df  Sum Sq Mean Sq F value Pr(>F)
sulfur     1 0.54022 0.54022  1.0229 0.4183
Residuals  2 1.05625 0.52813

Error: Within
            Df  Sum Sq Mean Sq F value    Pr(>F)
type         1 1.48840 1.48840 54.7005 2.326e-05 ***
type:sulfur  1 0.00360 0.00360  0.1323    0.7236
Residuals   10 0.27210 0.02721
###
Error: factor(field)
          Df  Sum Sq Mean Sq F value Pr(>F)
sulfur     1 0.54022 0.54022  1.0229 0.4183
Residuals  2 1.05625 0.52813

Error: Within
          Df  Sum Sq Mean Sq F value    Pr(>F)
type       1 1.48840 1.48840  59.385 9.307e-06 ***
Residuals 11 0.27570 0.02506
[Residuals plotted against field (1–4), with points labeled by sulfur level
(l, h).]
• εijkl represents error variance at the sub-plot level, i.e. variance among
  sub-plot experimental units. The index l represents sub-plot reps,
  l = 1, . . . , r2, and

    {εijkl} ~ normal(0, σ²s)
Now every subplot within the same wholeplot has something in common, namely
the whole-plot random effect γik. This models the positive correlation within
whole plots:

  Cov(yi,j1,k,l1, yi,j2,k,l2) = E[(yi,j1,k,l1 − E[yi,j1,k,l1]) × (yi,j2,k,l2 − E[yi,j2,k,l2])]
                             = E[(γi,k + εi,j1,k,l1) × (γi,k + εi,j2,k,l2)]
                             = E[γ²i,k + γi,k × (εi,j1,k,l1 + εi,j2,k,l2) + εi,j1,k,l1 εi,j2,k,l2]
                             = E[γ²i,k] + 0 + 0 = σ²w
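In particular, under this model Var(yijkl) = σ²w + σ²s, so the correlation
between two sub-plot observations from the same whole plot is
σ²w / (σ²w + σ²s).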
This and more complicated random-effects models can be fit using the lme
command in R. To use this command, you need the nlme package:
library(nlme)
fit.me <- lme(fixed = y ~ type + sulfur, random = ~ 1 | as.factor(field))
> summary(fit.me)

Fixed effects: y ~ type + sulfur
              Value Std.Error DF   t-value p-value
(Intercept)  4.8850 0.2599650 11 18.790991  0.0000
typeB        0.6100 0.0791575 11  7.706153  0.0000
sulfurlow    0.3675 0.3633602  2  1.011393  0.4183
But the lme command allows for much more complex models to be fit.
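For example, the estimated whole-plot and sub-plot variance components can
be extracted from the fitted object (a sketch; VarCorr is part of nlme):

VarCorr(fit.me)    # field-level (whole-plot) and residual (sub-plot) variances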
The size of each tree (roughly the volume) was measured at five time points:
152, 174, 201, 227 and 258 days after the beginning of the experiment.
[Growth curves: height (size) of each tree plotted over the five measurement
times.]
> fit <- lm(Sitka$size ~ Sitka$treat)
> anova(fit)
             Df  Sum Sq Mean Sq F value  Pr(>F)
Sitka$treat   1   3.810   3.810  6.0561 0.01429 *
Residuals   393 247.222   0.629
[Diagnostic plots for the naive fit: residuals (fit$res) by treatment (control,
ozone), a histogram and normal quantile plot of fit$res, and residuals grouped
by tree (as.factor(Sitka$tree)).]
Naive approach II: Clearly there is some effect of time. Let’s now “ac-
count” for growth over time, using a simple ANCOVA:

    yi,j,t = µ0 + ai + b × t + ci × t + εi,j,t
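A minimal sketch of this fit, assuming the Sitka data from the MASS package
with columns size, Time, tree and treat:

library(MASS)
fit <- lm(size ~ treat * Time, data = Sitka)  # group-specific intercept and slope
anova(fit)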
[Fitted lines overlaid on the tree growth curves (height versus time), followed
by diagnostic plots: a histogram and normal quantile plot of fit$res, and
residuals grouped by tree (as.factor(Sitka$tree)).]
[Boxplots by treatment of the per-tree average, intercept and slope.]
> anova(lm(y.int ~ treat))
          Df Sum Sq Mean Sq F value Pr(>F)
treat      1  0.840   0.840  1.0431 0.3103
Residuals 77 61.989   0.805

> anova(lm(y.slope ~ treat))
          Df     Sum Sq    Mean Sq F value   Pr(>F)
treat      1 0.00007815 0.00007815  7.6628 0.007058 **
Residuals 77 0.00078529 0.00001020
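One way the per-tree summaries could have been computed (a sketch; the names
y.int, y.slope and treat match the output above, everything else is assumed):

library(MASS)                                   # for the Sitka data (assumed)
bytree  <- split(Sitka, Sitka$tree)
y.int   <- sapply(bytree, function(d) coef(lm(size ~ Time, data = d))[1])
y.slope <- sapply(bytree, function(d) coef(lm(size ~ Time, data = d))[2])
treat   <- factor(sapply(bytree, function(d) as.character(d$treat[1])))
anova(lm(y.int ~ treat)); anova(lm(y.slope ~ treat))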