Statistics Notes
5 Introduction to ANOVA
5.1 A model for treatment variation
5.1.1 Model fitting
5.1.2 Testing hypotheses with MSE and MST
5.2 Partitioning sums of squares
5.2.1 The ANOVA table
5.2.2 Understanding degrees of freedom
5.2.3 More sums of squares geometry
5.3 Unbalanced designs
5.3.1 Sums of squares and degrees of freedom
5.3.2 ANOVA table for unbalanced data
5.4 Normal sampling theory for ANOVA
5.4.1 Sampling distribution of the F-statistic
5.4.2 Comparing group means
5.4.3 Power calculations for the F-test
5.5 Model diagnostics
5.5.1 Detecting violations with residuals
5.5.2 Checking normality assumptions
5.5.3 Checking variance assumptions
5.5.4 Variance stabilizing transformations
5.6 Treatment comparisons
5.6.1 Contrasts
5.6.2 Orthogonal contrasts
5.6.3 Multiple comparisons
5.6.4 False discovery rate procedures
5.6.5 Nonparametric tests
Chapter 1

Principles of experimental design
1.1 Induction
Much of our scientific knowledge about processes and systems is based on
induction: reasoning from the specific to the general.
Example (survey): Do you favor increasing the gas tax for public transportation?
[Diagram: input variables x1 and x2 enter a Process, which produces an output y]
• Input variables:
• Output variables:
Observational Study:
1. Experimental population:
16,608 women randomized to either x = 1 (estrogen treatment) or x = 0 (no estrogen treatment)
                     age group
            1 (50-59)   2 (60-69)   3 (70-79)
clinic 1       n11         n12         n13
       2       n21         n22         n23
       ...     ...         ...         ...
2. Results: JAMA, July 17 2002. Also see the NHLBI press release. Women
on treatment had a higher incidence rate of
• CHD
• breast cancer
• stroke
• pulmonary embolism
• colorectal cancer
• hip fracture
suggests
Question: Why the different conclusions between the two studies? Consider the following possible explanation: Let
x = estrogen treatment
y = health outcomes
Without randomization, a correlation between x and y need not reflect causation.
With randomization, a correlation between x and y does reflect causation.
2. Choose a set of experimental units, which are the units to which treatments will be randomized.
The order of these steps may vary due to constraints such as budgets, ethics, time, etc.
4. Results:
B A B A B B
26.9 11.4 26.6 23.7 25.3 28.5
B A A A B A
14.2 17.9 16.5 21.1 24.3 19.6
How much evidence is there that fertilizer type is a source of yield variation?
Evidence about differences between two populations is generally measured by
comparing summary statistics across the two sample populations. (Recall, a
statistic is any computable function of known, observed data).
• Histograms
• Kernel density estimates
Note that these summaries more or less retain all the information in
the data except the unit labels.
Location:
• sample mean or average: ȳ = (1/n) Σ_{i=1}^n y_i
• sample median: the value y_{.5} such that half the data are at or below it and half are at or above it.
To find the median, sort the data in increasing order, and call these values y_(1), ..., y_(n). If there are no ties, then
if n is odd, then y_((n+1)/2) is the median;
if n is even, then all numbers between y_(n/2) and y_(n/2 + 1) are medians.
Scale:
• interquantile range:
[y.25 , y.75 ] (interquartile range)
[y.025 , y.975 ] (95% interval)
[Figure: histograms, a kernel density estimate, and empirical CDFs F(y) of the two samples yA and yB]
> sd(yA)
[1] 4.234934
> sd(yB)
[1] 5.151699
Hypothesis tests: We compare the observed value of |ȳB − ȳA| to values of |ȳB − ȳA| that could have been observed if H0 were true.
Hypothetical values of |ȳB − ȳA| that could have been observed under H0 are referred to as samples from the null distribution.
Observed treatment assignment and yields:
  B     A     B     A     B     B
 26.9  11.4  26.6  23.7  25.3  28.5
  B     A     A     A     B     A
 14.2  17.9  16.5  21.1  24.3  19.6
One hypothetical assignment under H0 (same yields, relabeled treatments):
  B     A     B     B     A     A
 26.9  11.4  26.6  23.7  25.3  28.5
  A     B     B     A     A     B
 14.2  17.9  16.5  21.1  24.3  19.6
If H0 is true, then there are (12 choose 6) = 924 equally likely ways the treatments could have been assigned. For each one of these, we can calculate the value of the test statistic that would've been observed under H0:
{g1, g2, . . . , g924}
This enumerates all potential pre-randomization outcomes of our test statistic, assuming no treatment effect. Along with the fact that each treatment
[Figure: randomization distributions of ȲB − ȲA and |ȲB − ȲA|]
F(x | H0) = Pr( g(YA, YB) ≤ x | H0 ) = #{gk ≤ x} / 924
This distribution is sometimes called the randomization distribution, be-
cause it is obtained by the randomization scheme of the experiment.
(b) compute the value of the test statistic, given the simulated treatment
assignment and under H0 .
#( gs ≥ gobs ) / S ≈ Pr( g(YA, YB) ≥ gobs | H0 )
The approximation improves if S is increased.
Here is some R-code:
y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
x <- c("B", "A", "B", "A", "B", "B", "B", "A", "A", "A", "B", "A")
g.null <- NULL
for (s in 1:10000)
{
  xsim <- sample(x)                    # permute the treatment labels
  g.null[s] <- abs(mean(y[xsim == "B"]) - mean(y[xsim == "A"]))
}
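The Monte Carlo p-value described above can then be computed from g.null; the following two lines are a usage sketch added for illustration (not part of the original code):

g.obs <- abs(mean(y[x == "B"]) - mean(y[x == "A"]))   # observed test statistic
mean(g.null >= g.obs)     # proportion of simulated values at least as extreme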
1. From the data, compute a relevant test statistic g(y): The test statistic g(y) should be chosen so that it can differentiate between H0 and H1 in ways that are scientifically relevant. Typically, g(y) is chosen so that
g(y) is probably small under H0 and probably large under H1.
Questions:
• Is a small p-value evidence in favor of H1 ?
• What does the p-value say about the probability that the null hypoth-
esis is true? Try using Bayes’ rule to figure this out.
t-statistic:
g_t(yA, yB) = |ȳB − ȳA| / ( sp √(1/nA + 1/nB) ),   where
sp² = [(nA − 1) / ((nA − 1) + (nB − 1))] sA² + [(nB − 1) / ((nA − 1) + (nB − 1))] sB²
This is a scaled version of our previous test statistic, in which we compare the difference in sample means to a pooled version of the sample standard deviation and the sample size. Note that this statistic is
• increasing in |ȳB − ȳA|;
• increasing in nA and nB;
• decreasing in sp.
A more complete motivation for using this statistic will be given in the
next chapter.
Kolmogorov-Smirnov statistic:
g_KS(yA, yB) = max_{y ∈ ℝ} | F̂B(y) − F̂A(y) |
This is just the size of the largest gap between the two sample CDFs.
Figure 2.3: Histograms and empirical CDFs of the first two hypothetical
samples.
Gsim <- NULL
for (s in 1:5000)
{
  xsim <- sample(x)
  yAsim <- y[xsim == "A"]; yBsim <- y[xsim == "B"]
  g1 <- g.tstat(yAsim, yBsim)     # t-statistic for the permuted data
  g2 <- g.ks(yAsim, yBsim)        # KS-statistic for the permuted data
  Gsim <- rbind(Gsim, c(g1, g2))
}
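The helper functions g.tstat and g.ks are not shown in this excerpt. A minimal sketch of what they might look like, based on the formulas above (the function names and details are assumptions):

g.tstat <- function(yA, yB) {
  # pooled two-sample t-statistic, as defined above
  nA <- length(yA); nB <- length(yB)
  sp <- sqrt(((nA - 1) * var(yA) + (nB - 1) * var(yB)) / (nA + nB - 2))
  abs(mean(yB) - mean(yA)) / (sp * sqrt(1/nA + 1/nB))
}
g.ks <- function(yA, yB) {
  # Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs
  as.numeric(ks.test(yA, yB)$statistic)
}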
The hypothesis test based on the t-statistic does not indicate strong evidence against H0, whereas the test based on the KS-statistic does. The reason is that the t-statistic is only sensitive to differences in means. In particular, if ȳA = ȳB then the t-statistic is zero, its minimum value. In contrast, the KS-statistic is sensitive to any differences in the sample distributions.
Now let’s consider a second dataset, shown in Figure 2.5, for which
• nA = nB = 40
Figure 2.4: Randomization distributions for the t and KS statistics for the
first example.
• sA = 1.75, sB = 1.85
The difference in sample means is about twice as large as in the previous example, and the sample standard deviations are pretty similar. The B-samples are slightly larger than the A-samples on average. Is there evidence that this is caused by treatment? Again, we evaluate H0 using the randomization distributions of our two test statistics.
This time the two test statistics indicate similar evidence against H0. This is because the difference in the two sample distributions can primarily be summarized as the difference between the sample means, which the t-statistic can identify.
Figure 2.5: Histograms and empirical CDFs of the second two hypothetical
samples.
Figure 2.6: Randomization distributions for the t and KS statistics for the
second example.
In this case H0 and H1 are not complementary, and we are only interested
in evidence against H0 of a certain type, i.e. evidence that is consistent with
H1 . In this situation we may want to use a statistic like gt .
truth
action H0 true H0 false
accept H0 correct decision type II error
reject H0 type I error correct decision
As we discussed,
• the p-value can be used as a measure of evidence against H0;
Decision procedure:
1. Compute the p-value by comparing observed test statistic to the null
distribution.
Single Experiment Interpretation: If you use a level-α test for your experiment where H0 is true, then before you run the experiment there is probability α that you will erroneously reject H0.
• complicated,
Var[ȲA] = E[(ȲA − µA)²] → 0
ȲA → µA
[Figure: the population of 'all possible' A and B wheat yields (with means µA, µB) and the experimental samples obtained by random sampling, with ȳA = 18.37, sA = 4.23, ȳB = 24.30, sB = 5.15]
and we say that ȲA is a consistent estimator for µA . Several of our other
sample characteristics are also consistent for the corresponding population
characteristics. As n → ∞,
ȲA → µA
sA² → σA²
F̂A(x) = #{Y_{i,A} ≤ x} / nA → FA(x) = ∫_{−∞}^{x} pA(y) dy
These are both due to the central limit theorem. Letting P(µ, σ²) denote a population with mean µ and variance σ², then
X1 ∼ P1(µ1, σ1²), X2 ∼ P2(µ2, σ2²), . . . , Xm ∼ Pm(µm, σm²), independent
⟹ Σ_{j=1}^m Xj is approximately normal( Σ_j µj, Σ_j σj² ).
Experiment m: sample y1^(m), . . . , yn^(m) ∼ i.i.d. p and compute ȳ^(m).
where σ²_{AB} = σA²/nA + σB²/nB. So if we knew the variances, we'd have a null
distribution.
H0: µ = µ0
H1: µ ≠ µ0
Examples:
Physical therapy
Physics
E[Ȳ] = µ
Var[Ȳ] = σ²/n
Ȳ is approximately normal.
Under H0,
(Ȳ − µ0) ∼ normal(0, σ²/n),
but we can't use this as a null distribution because σ² is unknown. What if we scale (Ȳ − µ0)? Then
f(Y) = (Ȳ − µ0) / (σ/√n)
is approximately standard normal and we write f(Y) ∼ normal(0, 1). Since this distribution contains no unknown parameters we could potentially use it as a null distribution. However, having observed the data y, is f(y) a statistic?
One-sample t-statistic:
t(Y) = (Ȳ − µ0) / (s/√n)
For a given value of µ0 this is a statistic. What is the null distribution of
t(Y)?
s ≈ σ,  so  (Ȳ − µ0)/(s/√n) ≈ (Ȳ − µ0)/(σ/√n)
If Y1, . . . , Yn ∼ i.i.d. normal(µ0, σ²) then (Ȳ − µ0)/(σ/√n) is normal(0, 1), and so it would seem that t(Y) is approximately distributed as a standard normal distribution under H0: µ = µ0. However, if the approximation s ≈ σ is poor, as when n is small, we need to take account of our uncertainty in the estimate of σ.
The χ² distribution
Z1, . . . , Zn ∼ i.i.d. normal(0, 1)  ⟹  Σ Zi² ∼ χ²_n, the chi-squared distribution with n degrees of freedom
Σ (Zi − Z̄)² ∼ χ²_{n−1}
(1/σ²) Σ (Yi − Ȳ)² ∼ χ²_{n−1}
Figure 3.2: χ² densities for n = 9, 10, 11 degrees of freedom
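As an aside (a simulation sketch, not from the original notes), the χ² facts above are easy to check numerically; here n and sigma are arbitrary illustrative values:

set.seed(1)
n <- 10; sigma <- 2
x2 <- replicate(5000, { y <- rnorm(n, mean = 5, sd = sigma)
                        sum((y - mean(y))^2) / sigma^2 })
mean(x2)    # should be close to n - 1, the mean of a chi-squared on n - 1 df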
The t-distribution
If
• Z ∼ normal(0, 1);
• X ∼ χ²_m;
• Z, X statistically independent,
then
Z / √(X/m) ∼ t_m, the t-distribution with m degrees of freedom
How does this help us? Recall that if Y1, . . . , Yn ∼ i.i.d. normal(µ, σ²),
[Figure: t-distribution densities for n = 3, 6, 12, ∞, compared with the standard normal]
• √n (Ȳ − µ)/σ ∼ normal(0, 1)
• (n − 1) s²/σ² ∼ χ²_{n−1}
• Ȳ and s² are independent.
Let Z = √n (Ȳ − µ)/σ and X = (n − 1) s²/σ². Then
Z / √(X/(n − 1)) = [√n (Ȳ − µ)/σ] / √{ [(n − 1) s²/σ²] / (n − 1) }
                 = (Ȳ − µ) / (s/√n) ∼ t_{n−1}
This is still not a statistic because µ is unknown. However, under a specific
hypothesis like H0 : µ = µ0 , it is a statistic:
t(Y) = (Ȳ − µ0) / (s/√n) ∼ t_{n−1}   if E[Y] = µ0
It is called the t-statistic.
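A small simulation sketch (assumed code, not from the notes) showing how the null distribution of the t-statistic can be checked when the data really are normal and H0 is true:

t.stat <- function(y, mu0) sqrt(length(y)) * (mean(y) - mu0) / sd(y)
set.seed(1)
tsim <- replicate(5000, t.stat(rnorm(8, mean = 10, sd = 3), mu0 = 10))  # H0 true
quantile(tsim, .975)   # compare with qt(.975, 7), the t quantile on n - 1 = 7 df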
Some questions for discussion:
• Consider the situation where the data are not normally distributed.
– What is the distribution of √n (Ȳ − µ)/σ for large and small n?
– What is the distribution of (n − 1) s²/σ² for large and small n?
– Are √n (Ȳ − µ) and s² independent for small n? What about for large n?
2. Null hypothesis: H0 : µ = µ0
3. Alternative hypothesis: H1: µ ≠ µ0
4. Test statistic: t(Y) = √n (Ȳ − µ0)/s
5. Null distribution: Under the normal sampling model and H0, the sampling distribution of t(Y) is the t-distribution with n − 1 degrees of freedom:
t(Y) ∼ t_{n−1}
If H0 is not true, then t(Y) does not have a t-distribution. If the data
are normal but the mean is not µ0 , then t(Y) has a non-central t-
distribution, which we will use later to calculate power, or the type II
error rate. If the data are not normal then the distribution of t(Y) is
not a t-distribution.
• p-value ≤ α, or equivalently
• |t(y)| ≥ t_{(n−1), 1−α/2} (for α = .05, t_{(n−1), 1−α/2} ≈ 2).
The value t_{(n−1), 1−α/2} ≈ 2 is called the critical value for this test. In general, the critical value is the value of the test statistic above which we would reject H0.
Question: Suppose our procedure is to reject H0 only when t(y) ≥ t_{(n−1), 1−α}. Is this a level-α test?
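For example (an illustrative sketch; alpha and n here are made-up values), the critical value can be obtained in R with qt:

alpha <- 0.05; n <- 11
qt(1 - alpha/2, n - 1)    # roughly 2.23 for n - 1 = 10 degrees of freedom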
Sampling model:
Y_{1A}, . . . , Y_{nA A} ∼ i.i.d. normal(µA, σ²)
Y_{1B}, . . . , Y_{nB B} ∼ i.i.d. normal(µB, σ²).
In addition to normality we assume for now that both variances are equal.
Hypotheses: H0: µA = µB;  HA: µA ≠ µB
Recall that
ȲB − ȲA ∼ N( µB − µA, σ²(1/nA + 1/nB) ).
Hence if H0 is true then
ȲB − ȲA ∼ N( 0, σ²(1/nA + 1/nB) ).
t(YA, YB) = (ȲB − ȲA) / ( sp √(1/nA + 1/nB) ) ∼ t_{nA + nB − 2}
Self-check exercises:
Decision procedure:
Data:
t-statistic:
Inference:
Two Sample t-test
Always keep in mind where the p-value comes from: See Figure 3.4.
t^(1), . . . , t^(S)
p-value = #( |t^(s)| ≥ |t_obs| ) / S
Assumptions: Under H0 ,
• Randomization Test:
1. Treatments are randomly assigned
• t-test:
1. Data are independent samples
2. Each population is normally distributed
3. The two populations have the same variance
Imagined Universes:
• Randomization Test: Numerical responses remain fixed, we
imagine only alternative treatment assignments.
Some history:
de Moivre (1733): Approximating binomial distributions T = Σ_{i=1}^n Yi, with Yi ∈ {0, 1}.
we showed that if Y1,A , . . . , YnA ,A and Y1,B , . . . , YnB ,B are independent samples
from pA and pB respectively, and
(a) µA = µB
(b) σA² = σB²
So our null distribution really assumes conditions (a), (b) and (c). Thus if
we perform a level-α test and reject H0, we are really just rejecting that (a),
(b), (c) are all true.
For this reason, we will often want to check if conditions (b) and (c) are
plausibly met. If
(b) is met
(c) is met
H0 is rejected, then
here Pr(Z ≤ z_{(k − 1/2)/nA}) = (k − 1/2)/nA. The 1/2 is a continuity correction.
If 1/4 < sA²/sB² < 4, we won't worry too much about unequal variances.
[Figure: normal quantile-quantile plots (sample quantiles versus theoretical quantiles) for several small samples simulated from normal populations]
This may not sound very convincing. In later sections, we will show how
to perform formal hypothesis tests for equal variances. However, this won’t
completely solve the problem. If variances do seem unequal we have a variety
of options available:
This statistic looks pretty reasonable, and for large nA and nB its null dis-
tribution will indeed be a normal(0, 1) distribution. However, the exact null
distribution is only approximately a t-distribution, even if the data are
actually normally distributed. The t-distribution we compare tw to is a t_{νw}-distribution, where the degrees of freedom νw are given by
νw = ( sA²/nA + sB²/nB )² / [ (1/(nA − 1)) (sA²/nA)² + (1/(nB − 1)) (sB²/nB)² ].
This is known as Welch’s approximation; it may not give an integer as the
degrees of freedom.
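A sketch of Welch's statistic and degrees of freedom in R (yA and yB are assumed sample vectors); note that t.test uses this approximation by default:

vA <- var(yA) / length(yA); vB <- var(yB) / length(yB)
tw  <- (mean(yB) - mean(yA)) / sqrt(vA + vB)
nuw <- (vA + vB)^2 / (vA^2 / (length(yA) - 1) + vB^2 / (length(yB) - 1))
2 * (1 - pt(abs(tw), nuw))     # two-sided p-value
t.test(yB, yA)                 # var.equal = FALSE is the default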
This t-distribution is not, in fact, the exact sampling distribution of t_diff(yA, yB) under the null hypothesis that µA = µB and σA² ≠ σB². This is because the null distribution depends on the ratio of the unknown variances, σA² and σB². This difficulty is known as the Behrens-Fisher problem.
• If the sample sizes are the same (nA = nB) then the test statistics tw(yA, yB) and t(yA, yB) are the same; however, the degrees of freedom used in the null distribution will be different unless the sample standard deviations are the same.
• If nA > nB, but σA² < σB², and µA = µB, then the two-sample test based on comparing t(yA, yB) to a t-distribution on nA + nB − 2 d.f. will reject more than 5% of the time.
– If the null hypothesis that both the means and variances are equal, i.e.
H0: µA = µB and σA² = σB²,
is scientifically relevant, then we are computing a valid p-value, and this higher rejection rate is a good thing, since when the variances are unequal the null hypothesis is false.
– If, however, the hypothesis that is most scientifically relevant is
H0: µA = µB
without placing any restrictions on the variances, then the higher rejection rate in the test that assumes the variances are the same could be very misleading, since p-values may be smaller than they are under the correct null distribution (in which σA² ≠ σB²). Likewise we will underestimate the probability of type I error.
• If nA > nB and σA² > σB², then the p-values obtained from the test using t(yA, yB) will tend to be conservative (= larger) than those obtained with tw(yA, yB).
In short: one should be careful about applying the test based on t(yA, yB) if the sample standard deviations appear very different and it is not reasonable to assume equal means and variances under the null hypothesis.
Chapter 4

Confidence intervals and power
• H0: E[Y] = µ0 is rejected if
√n |(ȳ − µ0)/s| ≥ t_{1−α/2}
1. gather data;
Recall that we may construct a 95% confidence interval by finding those null
hypotheses that would not be rejected at the 0.05 level.
Sampling model:
Y_{1,A}, . . . , Y_{nA,A} ∼ i.i.d. normal(µA, σ²)
Y_{1,B}, . . . , Y_{nB,B} ∼ i.i.d. normal(µB, σ²).
Consider evaluating whether δ is a reasonable value for the difference in means:
H0: µB − µA = δ
H1: µB − µA ≠ δ
(ȲB − ȲA − δ) / ( sp √(1/nA + 1/nB) ) ∼ t_{nA + nB − 2}
Thus a given difference δ is accepted at level α if
| ȳB − ȳA − δ | / ( sp √(1/nA + 1/nB) ) ≤ tc,
i.e. if
(ȳB − ȳA) − sp √(1/nA + 1/nB) tc  ≤  δ  ≤  (ȳB − ȳA) + sp √(1/nA + 1/nB) tc,
where tc = t_{1−α/2, nA+nB−2} is the critical value.
Wheat example:
• ȳB − ȳA = 5.93
• sp = 4.72, sp √(1/nA + 1/nB) = 2.72
• t_{.975,10} = 2.23
A 95% C.I. for µB − µA is therefore 5.93 ± 2.23 × 2.72 ≈ (−0.1, 12.0).
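In R this interval can be computed directly from the quantities above (a one-line sketch using the reported values):

5.93 + c(-1, 1) * qt(.975, 10) * 2.72    # roughly (-0.1, 12.0)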
Questions:
• What does the fact that 0 is in the interval say about H0 : µA = µB ?
• H0: µA = µB,  H1: µA ≠ µB
• Gather data.
What about the power, Pr(reject H0 | H0 false)?
This is not yet a well-defined problem: there are many different ways in which the null hypothesis may be false, e.g. µB − µA = 0.0001 and µB − µA = 10,000 are both instances of the alternative hypothesis. However, for a particular value δ we have
Power(δ) = Pr(reject H0 | µB − µA = δ)
         = Pr( |t(YA, YB)| ≥ t_{1−α/2, nA+nB−2} | µB − µA = δ ).
Remember, the "critical" value t_{1−α/2, nA+nB−2} above which we reject the null hypothesis was computed from the null distribution.
However, now we want to work out the probability of getting a value of
the t-statistic greater than this critical value, when a specific alternative
hypothesis is true. Thus we need to compute the distribution of our t-
statistic under the specific alternative hypothesis.
If we suppose Y_{1,A}, . . . , Y_{nA,A} ∼ i.i.d. normal(µA, σ²) and Y_{1,B}, . . . , Y_{nB,B} ∼ i.i.d. normal(µB, σ²), where µB − µA = δ, then to calculate the power we need to know the distribution of
t(YA, YB) = (ȲB − ȲA) / ( sp √(1/nA + 1/nB) ),
but unfortunately this is no longer a (central) t-distribution. We can write
t(YA, YB) = (ȲB − ȲA − δ) / ( sp √(1/nA + 1/nB) ) + δ / ( sp √(1/nA + 1/nB) ).   (4.1)
The first part of the above equation has a t-distribution, which is centered around zero. The second part moves the t-statistic away from zero by an amount that depends on the pooled sample variance. For this reason, we call the distribution of the t-statistic under µB − µA = δ the non-central t-distribution. In this case, we write
t(YA, YB) ∼ (Z + γ) / √(X/ν),
[Figure: central (γ = 0) and non-central (γ = 1, 2) t densities]
where
• γ is a constant;
• Z is standard normal;
• X is χ² with ν degrees of freedom, independent of Z.
Recall some facts about the gamma function Γ:
• Γ(n + 1) = n! if n is an integer
• Γ(r + 1) = r Γ(r)
• Γ(1) = 1, Γ(1/2) = √π
Hence
(ȲB − ȲA) / ( σ √(1/nA + 1/nB) ) ∼ normal( δ / ( σ √(1/nA + 1/nB) ), 1 ),
a normal distribution with
• mean γ = δ / ( σ √(1/nA + 1/nB) );
• standard deviation 1.
Another way to get the same result is to refer back to the expression for the t-statistic given in 4.1:
t(YA, YB) = (ȲB − ȲA − δ) / ( sp √(1/nA + 1/nB) ) + δ / ( sp √(1/nA + 1/nB) ) = a_{nA,nB} + b_{nA,nB}.
The first term a_{nA,nB} has a t-distribution, and becomes standard normal as nA, nB → ∞. As for b_{nA,nB}, since sp² → σ² as nA or nB → ∞, we have
b_{nA,nB} = δ / ( sp √(1/nA + 1/nB) ) → ∞ as nA, nB → ∞.
[Figure: null (γ = 0) and alternative (γ = 1) distributions of the t-statistic]
γ = δ / ( σ √(1/nA + 1/nB) ).
We will want to make this calculation in order to see if our sample size is
sufficient to have a reasonable chance of rejecting the null hypothesis. If we
have a rough idea of δ and σ², we can evaluate the power using this formula.
t.crit <- qt(1 - alpha/2, nA + nB - 2)
When you do these calculations you should think of Figure 4.2. Letting T* and T be non-central and central t-distributed random variables respectively, make sure you can relate the following probabilities to the figure:
• Pr(T* > tc)
• Pr(T* < −tc)
• Pr(T > tc)
• Pr(T < −tc)
Note that if the power Pr(|T*| > tc) is large, then one of Pr(T* > tc) or Pr(T* < −tc) will be very close to zero.
µB − µA = 5
σ² is unknown: We'll assume the pooled sample variance from the first experiment is a good approximation: σ² = 22.24.
Figure 4.3: γ and power versus sample size, and the normal approximation to the power.
What is the probability we'll reject H0 at level α = 0.05 for a given sample
size?
delta <- 5; s2 <- ((nA - 1) * var(yA) + (nB - 1) * var(yB)) / (nA - 1 + nB - 1)
t.crit <- qt(1 - alpha/2, 2*n - 2)
t.gamma <- delta / sqrt(s2 * (1/n + 1/n))
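Continuing the sketch, the power can then be evaluated from the noncentral t-distribution using the t.crit and t.gamma computed above:

1 - pt(t.crit, 2*n - 2, ncp = t.gamma) + pt(-t.crit, 2*n - 2, ncp = t.gamma)  # power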
So we see that if the true mean difference were µB − µA = 5, then the original study only had about a 40% chance of rejecting H0. To have an 80% chance or greater, the researchers would need a sample size of 15 for each group.
Note that the true power depends on the unknown true mean difference and true variance (assuming these are equal in the two groups). Even though our power calculations were done under potentially inaccurate values of µB − µA and σ², they still give us a sense of the power under various parameter values:
• How is the power affected if the mean difference is bigger? smaller?
• How is the power affected if the variance is bigger? smaller?
Figure 4.4: Null and alternative distributions for another wheat example,
and power versus sample size.
Increasing power
As we've seen from the normal approximation to the power, for a fixed type I error rate the power is a function of the noncentrality parameter
γ = (µB − µA) / ( σ √(1/nA + 1/nB) ),
so clearly power is
• increasing in |µB − µA|;
• increasing in nA and nB;
• decreasing in σ².
The first of these we do not generally control with our experiment (indeed, it
is the unknown quantity we are trying to learn about). The second of these,
sample size, we clearly do control. The last of these, the variance, seems
like something that might be beyond our control. However, the experimental
variance can often be reduced by dividing up the experimental material into
more homogeneous subgroups of experimental units. This design technique,
known as blocking, will be discussed in an upcoming chapter.
Chapter 5
Introduction to ANOVA
Results:
[Figure: response time (seconds) by treatment (A, B, C, D, E)]
If α = 0.05, then
Pr(reject one or more H0_{i1 i2} | all H0_{i1 i2} true) ≈ 1 − .95^10 = 0.40
So, even though the pairwise error rate is 0.05, the experiment-wise error rate is about 0.40. This issue is called the problem of multiple comparisons and will be discussed further in Chapter 6. For now, we will discuss a method of testing the global hypothesis of no variation due to treatment:
H0: µ_{i1} = µ_{i2} for all i1, i2   versus   H1: µ_{i1} ≠ µ_{i2} for some i1, i2
yij = measurement from the jth replicate under the ith treatment.
i = 1, . . . , m indexes treatments
j = 1, . . . , n indexes observations or replicates.
yij = µi + ϵij,   E[ϵij] = 0,   Var[ϵij] = σ²
yij = µ + τi + ϵij,   E[ϵij] = 0,   Var[ϵij] = σ²,   where µi = µ + τi and τi = µi − µ
yij = µ + ϵij,   E[ϵij] = 0,   Var[ϵij] = σ²
• µ = µ1 = · · · = µm, or equivalently
• τi = 0 for all i.
s² = [ (n − 1) s1² + · · · + (n − 1) sm² ] / [ (n − 1) + · · · + (n − 1) ]
   = [ Σ_j (y1j − ȳ1·)² + · · · + Σ_j (ymj − ȳm·)² ] / [ m(n − 1) ]
   = Σ_i Σ_j (yij − µ̂i)² / [ m(n − 1) ]
   = SSE(µ̂) / [ m(n − 1) ]  ≡  MSE
Then
A0 ⟹ µ̂ is an unbiased estimator of µ
A0 + A1 + A2 ⟹
(µ̂, s²) are the minimum variance unbiased estimators of (µ, σ²)
(µ̂, (n − 1) s²/n) are the maximum likelihood estimators of (µ, σ²)
Within-treatment variability:
Between-treatment variability:
where
ȳ·· = (1/(mn)) Σ_i Σ_j yij = (1/m)(ȳ1· + · · · + ȳm·)
is the grand mean of the sample. We call SST the treatment sum of squares. We also define MST = SST/(m − 1) as the treatment mean squares or mean squares (due to) treatment. Notice that MST is simply n times the sample variance of the sample means:
MST = n × [ (1/(m − 1)) Σ_{i=1}^m (ȳi· − ȳ··)² ]
Note that
{µ1, . . . , µm} not all equal  ⟺  Σ_{i=1}^m (µi − µ̄)² > 0
Probabilistically,
Σ_{i=1}^m (µi − µ̄)² > 0  ⟹  a large Σ_{i=1}^m (ȳi· − ȳ··)² will probably be observed.
Inductively,
a large Σ_{i=1}^m (ȳi· − ȳ··)² observed  ⟹  Σ_{i=1}^m (µi − µ̄)² > 0 is plausible.
So a large value of SST or MST gives evidence that there are differences between the true treatment means. But how large is large? We need to know what values of MST to expect under H0.
Notice that
Σ ( √n Ȳi· − √n Ȳ·· )² / (m − 1) = n Σ ( Ȳi· − Ȳ·· )² / (m − 1) = SST / (m − 1) = MST,
so E[MST | H0] = σ².
Expected value of MSE:
MSE = (1/m) Σ_{i=1}^m si², so
E[MSE] = (1/m) Σ E[si²] = (1/m) Σ σ² = σ².
Let's summarize our potential estimators of σ²:
If H0 is true:
• E[MSE | H0] = σ²
• E[MST | H0] = σ²
If H0 is false:
• E[MSE | H1] = σ²
• E[MST | H1] = σ² + n v_τ²
If H0 is true:
• MSE ≈ σ²
• MST ≈ σ²
If H0 is false:
• MSE ≈ σ²
• MST ≈ σ² + n v_τ² > σ²
So
> SSE
[1] 12.0379
> SST
[1] 7.55032
> MSE
[1] 0.8025267
> MST
[1] 1.88758
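These quantities might be computed along the following lines (a sketch; y is the response vector and x the treatment labels, as in the earlier code):

ybar <- tapply(y, x, mean); n.i <- table(x)
SSE <- sum((y - ybar[x])^2)             # within-treatment sum of squares
SST <- sum(n.i * (ybar - mean(y))^2)    # between-treatment sum of squares
MSE <- SSE / (length(y) - length(ybar))
MST <- SST / (length(ybar) - 1)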
Randomization test:
F . obs< anova ( lm ( y˜ a s . f a c t o r ( x ) ) ) $F [ 1 ]
> F . obs
[ 1 ] 2.352046
$
set . seed (1)
F . n u l l < NULL
f o r ( nsim i n 1 : 1 0 0 0 )
{
x . sim< sample ( x )
F . n u l l < c (F . n u l l , anova ( lm ( y˜ a s . f a c t o r ( x . sim ) ) ) $F [ 1 ] )
}
$
> mean (F . n u l l >=F . obs )
[ 1 ] 0.102
Proof:
Σ_{i=1}^m Σ_{j=1}^n (yij − ȳ··)² = Σ_i Σ_j [ (yij − ȳi·) + (ȳi· − ȳ··) ]²
= Σ_i Σ_j [ (yij − ȳi·)² + 2(yij − ȳi·)(ȳi· − ȳ··) + (ȳi· − ȳ··)² ]
= Σ_i Σ_j (yij − ȳi·)² + Σ_i Σ_j 2(yij − ȳi·)(ȳi· − ȳ··) + Σ_i Σ_j (ȳi· − ȳ··)²
= (1) + (2) + (3)
(1) = Σ_i Σ_j (yij − ȳi·)² = SSE
(3) = Σ_i Σ_j (ȳi· − ȳ··)² = n Σ_i (ȳi· − ȳ··)² = SST
(2) = 2 Σ_i Σ_j (yij − ȳi·)(ȳi· − ȳ··) = 2 Σ_i (ȳi· − ȳ··) Σ_j (yij − ȳi·) = 0
• A residual ϵ̂ij is the observed value minus the fitted value, ϵ̂ij = yij − ŷij.
If we believe H1,
• our estimate of µi is µ̂i = ȳi·
If we believe H0,
• our estimate of µi is µ̂ = ȳ··
This leads to
(yij − ȳ··) = (ȳi· − ȳ··) + (yij − ȳi·)
total variation = between-group variation + within-group variation
All data can be decomposed this way, leading to the decomposition of the data vector of length m × n into two parts, as shown in Table 5.1. How do we interpret the degrees of freedom? We've heard of degrees of freedom before, in the definition of a χ² random variable:
c1 + c2 + c3 = (x1 − x̄) + (x2 − x̄) + (x3 − x̄)
             = (x1 + x2 + x3) − 3x̄
             = 3x̄ − 3x̄
             = 0
Thus we must have c1 + c2 = −c3, and so c1, c2, c3 can't all be independently varied; only two at a time can be arbitrarily changed. This vector thus lies in a two-dimensional subspace of R³, and has 2 degrees of freedom. In general,
(x1 − x̄, . . . , xm − x̄)ᵀ
is an m-dimensional vector in an (m − 1)-dimensional subspace, having m − 1 degrees of freedom.
Exercise: Return to the vector decomposition of the data and obtain the
degrees of freedom of each component. Note that
• dof = dimension of the space the vector lies in
• SS = squared length of the vector
a = (y − ȳ··)
c = (y − ȳtrt)
b = (ȳtrt − ȳ··)
Now recall
• yij, i = 1, . . . , m, j = 1, . . . , ni
• let N = Σ_{i=1}^m ni be the total sample size.
How does the ANOVA decomposition work in this case? What are the parameter estimates for the full and reduced model?
Null model:
yij = µ + ϵij,   Var[ϵij] = σ²
You should be able to show that the least-squares estimators are
• µ̂ = ȳ·· = (1/N) Σ yij
• s² = (1/(N − 1)) Σ (yij − ȳ··)²
Full model:
yij = µi + ϵij
    = µ + τi + ϵij,
Var[ϵij] = σ²
which implied that s² = (1/m) Σ si². However, if n_{i1} ≠ n_{i2}, then in general
ȳ·· = (1/N) Σ_i Σ_j yij ≠ (1/m) Σ_i ȳi·,   and   (1/m) Σ_i si² ≠ Σ_i Σ_j (yij − ȳi·)² / Σ_i (ni − 1)
What should the parameter estimates be? With a bit of calculus you can show that the least squares estimates of µi or (µ, τi) are
• µ̂i = ȳi·
So µ̂ is a weighted average of the µ̂i ’s, and a weighted average of the ⌧ˆi ’s is
zero. Similarly,
s² = Σ_{i=1}^m Σ_{j=1}^{ni} (yij − ȳi·)² / Σ_{i=1}^m (ni − 1) = Σ_{i=1}^m (ni − 1) si² / Σ_{i=1}^m (ni − 1)
Let's see if things add in a nice way. First, let's check orthogonality:
b · c = Σ_{i=1}^m Σ_{j=1}^{ni} (ȳi· − ȳ··)(yij − ȳi·)
      = Σ_{i=1}^m (ȳi· − ȳ··) Σ_{j=1}^{ni} (yij − ȳi·)
      = Σ_{i=1}^m (ȳi· − ȳ··) × 0 = 0
• dof(b) = ?
and so the vector b does sum to zero. Another way of looking at it is that the vector b is made up of m distinct numbers which don't sum to zero, but whose weighted average is zero, and so the degrees of freedom are m − 1.
Total   N − 1   SSTotal
Now suppose the following model is correct:
yij = µi + ϵij,   Var[ϵij] = σ²
Does MSE still estimate σ²?
MSE = SSE / (N − m)
    = [ Σ_{j=1}^{n1} (y1j − ȳ1·)² + · · · + Σ_{j=1}^{nm} (ymj − ȳm·)² ] / [ (n1 − 1) + · · · + (nm − 1) ]
    = [ (n1 − 1) s1² + · · · + (nm − 1) sm² ] / [ (n1 − 1) + · · · + (nm − 1) ]
So MSE is a weighted average of a collection of unbiased estimates of σ², so it is still unbiased.
Is the F-statistic still sensitive to deviations from H0? Note that a group with more observations contributes more to the grand mean, but it also contributes more terms to the SST. One can show
• E[MSE] = σ²
• E[MST] = σ² + (N / (m − 1)) v_τ², where
  – v_τ² = Σ_{i=1}^m ni τi² / N
  – τi = µi − µ̄
  – µ̄ = Σ ni µi / Σ ni.
So yes, MST/MSE will still be sensitive to deviations from the null, but the groups with larger sample sizes have a bigger impact on the power.
[Figure: coagulation time by diet (A, B, C, D)]
Questions:
• Does diet have an effect on coagulation time?
• If a given diet were assigned to all the animals in the population, what would the distribution of coagulation times be?
• If there is a diet effect, how do the mean coagulation times differ?
The first question we can address with a randomization test. For the second and third we need a sampling model:
yij = µi + ϵij
ϵ11, . . . , ϵ_{m nm} ∼ i.i.d. normal(0, σ²)
This model implies
• independence of errors
• constant variance
Response: ctime
          Df Sum Sq Mean Sq F value
diet       3  228.0    76.0  13.571
Residuals 20  112.0     5.6
Y1, . . . , Yn ∼ i.i.d. normal(µ, σ²)  ⟹  (1/σ²) Σ (Yi − Ȳ)² ∼ χ²_{n−1}
Also,
X1 ∼ χ²_{k1}, X2 ∼ χ²_{k2}, X1 and X2 independent  ⟹  X1 + X2 ∼ χ²_{k1 + k2}
Distribution of SSE:
Σ_i Σ_j (Yij − Ȳi·)² / σ² = (1/σ²) Σ_j (Y1j − Ȳ1·)² + · · · + (1/σ²) Σ_j (Ymj − Ȳm·)²
  ∼ χ²_{n1 − 1} + · · · + χ²_{nm − 1}
  ∼ χ²_{N − m}
So SSE/σ² ∼ χ²_{N−m}.
Results so far:
• SSE/σ² ∼ χ²_{N−m}
• SST/σ² ∼ χ²_{m−1}
Application: Under H0,
[ (SST/σ²)/(m − 1) ] / [ (SSE/σ²)/(N − m) ] = MST/MSE ∼ F_{m−1, N−m}
1. gather data
Response: ctime
          Df Sum Sq Mean Sq F value    Pr(>F)
diet       3  228.0    76.0  13.571 4.658e-05 ***
Residuals 20  112.0     5.6
[Figure: F(3, ν2) densities and CDFs for ν2 = 20, 10, 5, 2]
Fobs <- anova(lm(ctime ~ diet))$F[1]
Fsim <- NULL
for (nsim in 1:1000) {
  diet.sim <- sample(diet)
  Fsim <- c(Fsim, anova(lm(ctime ~ diet.sim))$F[1])
}
> 1 - pf(Fobs, 3, 20)
[1] 4.658471e-05
θ̂ = θ̂(Y)
Var[θ̂] = σ²_{θ̂}
V̂ar[θ̂] = σ̂²_{θ̂}
SE[θ̂] = σ̂_{θ̂}
where σ̂² is an estimate of σ². For example,
µ̂i = Ȳi·
Var[µ̂i] = σ²/ni
V̂ar[µ̂i] = σ̂²/ni = s²/ni
SE[µ̂i] = s/√ni
Note that the degrees of freedom are those associated with MSE, NOT ni − 1. As a result,
Ȳi· ± SE[Ȳi·] × t_{1−α/2, N−m}
is a 100 × (1 − α)% confidence interval for µi.
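A sketch of this interval in R for one group, using the coagulation example's ctime and diet objects (and group "A" purely as an illustration):

fit <- lm(ctime ~ diet)
MSE <- anova(fit)["Residuals", "Mean Sq"]
n.A <- sum(diet == "A")
se  <- sqrt(MSE / n.A)                       # standard error of the group mean
mean(ctime[diet == "A"]) + c(-1, 1) * qt(.975, df.residual(fit)) * se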
θ̂ ± 2 × SE[θ̂]
Coagulation Example:
Power(µ, σ², n) = Pr(reject H0 | µ, σ², n)
                = Pr(Fobs > F_{1−α, m−1, N−m} | µ, σ², n)
Under the alternative, Fobs has a noncentral F distribution with
• degrees of freedom m − 1, N − m
• noncentrality parameter
  λ = n Σ τi² / σ²
    = treatment variation / experimental uncertainty
    = treatment variation × experimental precision
[Figure: λ, the critical value F.crit, and power versus sample size n, for m = 4, α = 0.05, and between-to-within variance ratios of 1 and 2]
then
F = MST/MSE ∼ F_{m−1, N−m},
H0: µi = µ for all i = 1, . . . , m.
A0: {ϵij} ∼ i.i.d. normal(0, σ²)
to describe all of A1–A3. We don't do this because some violations of assumptions are more serious than others. Statistical folklore says the order of importance is A1, A2, then A3. We will discuss A1 when we talk about blocking. For now we will talk about A2 and A3.
Parameter estimates:
Our fitted value for any observation in group i is ŷij = µ̂ + τ̂i = ȳi·.
Our estimate of the error is ϵ̂ij = yij − ȳi·.
ϵ̂ij is called the residual for observation i, j.
Assumptions about ϵij can be checked by examining the values of the ϵ̂ij's:
• Histogram:
Make a histogram of the ϵ̂ij's. This should look approximately bell-shaped if the (super)population is really normal and there are enough observations. If there are enough observations, graphically compare the histogram to a N(0, s²) distribution.
In small samples, the histograms need not look particularly bell-shaped. How non-normal can a sample from a normal population look? You can always check yourself by simulating data in R. See Figure 5.8.
Note that the data are counts so they cannot be exactly normally distributed.
Figure 5.8: Normal scores plots of normal samples, with n ∈ {20, 50, 100}
> anova(lm(crab[, 2] ~ as.factor(crab[, 1])))
Analysis of Variance Table

Response: crab[, 2]
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab[, 1])   5  76695   15339  2.9669 0.01401 *
Residuals            144 744493    5170
Residuals:
ϵ̂ij = yij − µ̂i = yij − ȳi·
Residual diagnostic plots are in Figure 5.10. The data are clearly not
normally distributed.
[Figure: histograms of crab population counts at sites 1–6, and a normal quantile plot of the residuals]
Rule of thumb:
To check this, plot ϵ̂ij (residual) vs. ŷij = ȳi· (fitted value).
[Figure: residuals versus fitted values]
which is the ratio of the between group variability of the dij to the
within group variability of the dij .
Reject H0: Var[ϵij] = σ² for all i, j if F0 > F_{t−1, t(n−1), 1−α}
Crab data:
F0 = 14,229 / 4,860 = 2.93 > F_{5, 144, 0.95} = 2.28
hence we reject the null hypothesis of equal variances at the 0.05 level.
See also
Crab data: So the assumptions that validate the use of the F-test are violated. Now what?
Recall that one justification of the model
Yij = µi + ϵij
was that if the noise ϵij = Xij1 + Xij2 + · · · was the result of the addition of unobserved additive, independent effects, then by the central limit theorem ϵij will be approximately normal.
However, suppose the effects are multiplicative, so that in fact:
In this case, the Yij will not be normal, and the variances will not be constant:
Log transformation:
So the variance of the log-data does not depend on the mean µi. Also note that by the central limit theorem the errors should be approximately normally distributed.
[Figure: crab population and log crab population by site (1–6)]
> anova(lm(log(crab[, 2] + 1/6) ~ as.factor(crab[, 1])))
Analysis of Variance Table

Response: log(crab[, 2] + 1/6)
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab[, 1])   5  54.73   10.95  2.3226 0.04604 *
Residuals            144 678.60    4.71
[Figure: residuals versus fitted values and a normal quantile plot for the log-transformed crab data]
So taking the log stabilized the variances. In general, we may observe:
σi ∝ µi^α, i.e. the standard deviation of a group depends on the group mean.
Consider the transformed data Y*ij = g_λ(Yij) = Yij^λ. A first-order Taylor expansion about µi gives
Y*ij ≈ µi^λ + λ (Yij − µi) µi^(λ−1),
so that
E[Y*ij] ≈ µi^λ
Var[Y*ij] ≈ E[(Yij − µi)²] (λ µi^(λ−1))²
SD[Y*ij] ∝ µi^α λ µi^(λ−1) = λ µi^(α+λ−1)
So if we observe σi ∝ µi^α, then σ*i ∝ µi^(α+λ−1). So if we take λ = 1 − α then we will have stabilized the variances to some extent. Of course, we typically don't know α, but we could try to estimate it from the data.
Estimation of α:
σi ∝ µi^α, i.e. σi = c µi^α
log σi = log c + α × log µi,
so log si ≈ log c + α log ȳi·
Note that
y*(λ) = (y^λ − 1)/λ ∝ y^λ + c.
y*(0) = lim_{λ→0} y*(λ) = lim_{λ→0} (y^λ − 1)/λ = y^λ ln y |_{λ=0} = ln y
Note that for a given λ ≠ 0 it will not change the results of the ANOVA on the transformed data if we transform using
y* = y^λ   or   y*(λ) = (y^λ − 1)/λ = a y^λ + b.
(1) Plot log si vs. log ȳi·. If the relationship looks linear, then estimate α by the least-squares slope α̂ and consider transforming with λ = 1 − α̂.
• Remember the rule of thumb which says not to worry if the ratio of the largest to smallest variance is less than 4, i.e. don't use a transform unless there are drastic differences in variances.
• Remember to make sure that you describe the units of the transformed
data, and make sure that readers of your analysis will be able to un-
derstand that the model is additive in the transformed data, but not
in the original data. Also always include a descriptive analysis of the
untransformed data, along with the p-value for the transformed data.
• Try to think about whether the associated non-linear model for yij
makes sense.
• It can be argued that if the scientist who collected the data had a good
reason for using certain units, then one should not just transform the
data in order to bang it into an ANOVA-shaped hole. (Given enough
time and thought we could instead build a non-linear model for the
original data.)
• The sad truth: as always you will need to exercise judgment while
performing your analysis.
These warnings apply whenever you might reach for a transform, whether in
an ANOVA context, or a linear regression context.
Example (Crab data): Looking at the plot of means vs. sds suggests α ≈ 1, implying a log-transformation. However, the zeros in our data lead to problems, since log(0) = −∞.
Instead we can use y*ij = log(yij + 1/6). For the transformed data this gives us a ratio of the largest to smallest standard deviation of approximately 2, which is acceptable based on the rule of 4. Additionally, the residual diagnostic plots (Figure 5.13) are much improved.
This table needs to be fixed: Third column needs to be sd(log(y)).
lm(formula = logsd ~ logmean)
Coefficients:
(Intercept)     logmean
     0.6652      0.9839
[Figure: log(sd) versus log(mean) for the six sites]
Response: ctime
          Df Sum Sq Mean Sq F value    Pr(>F)
diet       3  228.0    76.0  13.571 4.658e-05 ***
Residuals 20  112.0     5.6
We conclude from the F-test that there are substantial differences between the population treatment means. How do we decide what those differences are?
5.6.1 Contrasts
Differences between sets of means can be evaluated by estimating contrasts. A contrast is a linear function of the means such that the coefficients sum to zero:
C = C(µ, k) = Σ_{i=1}^m ki µi,   where Σ_{i=1}^m ki = 0
Examples:
• diet 1 vs diet 2: C = µ1 − µ2
Standard errors:
Var[Ĉ] = Var[ Σ_{i=1}^m ki ȳi· ]
       = Σ_{i=1}^m ki² σ²/ni
       = σ² Σ_{i=1}^m ki²/ni
So an estimate of Var[Ĉ] is
sC² = s² Σ_{i=1}^m ki²/ni
Ĉ / SE[Ĉ] ∼ t_{N−m}
Hypothesis test:
• H0: C = 0 versus H1: C ≠ 0.
Example: Recall in the coagulation example µ̂1 = 61, µ̂2 = 66, and their 95% confidence intervals were (58.5, 63.5) and (63.9, 68.0). Let C = µA − µB.
Hypothesis test: H0: C = 0.
Ĉ / SE[Ĉ] = (ȳ1· − ȳ2·) / ( s √(1/6 + 1/4) ) = −5 / 1.53 = −3.27
Ĉ ± t_{1−α/2, N−m} × SE[Ĉ]
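A sketch of these calculations in R for the coagulation data, using the contrast C = µA − µB with coefficients k = (1, −1, 0, 0) (this assumes diet has levels A, B, C, D in that order):

k <- c(1, -1, 0, 0)
ybar <- tapply(ctime, diet, mean); n.i <- table(diet)
MSE  <- anova(lm(ctime ~ diet))["Residuals", "Mean Sq"]
C.hat <- sum(k * ybar)
se.C  <- sqrt(MSE * sum(k^2 / n.i))
C.hat / se.C                              # t-statistic on N - m df
C.hat + c(-1, 1) * qt(.975, 20) * se.C    # 95% confidence interval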
[Figure: grain yield versus plant density (10, 20, 30, 40, 50)]
> anova(lm(y ~ as.factor(x)))
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(x) 4 87.600 21.900 29.278 1.690e-05 ***
Residuals 10 7.480 0.748
What are these contrasts representing? What would make them large?
• If all µi ’s are the same, then they will all be close to zero. This is the
“sum to zero” part, i.e. Ci · 1 = 0 for each contrast.
• Similarly, C3 and C4 are measuring the cubic and quartic parts of the
relationship between density and yield.
> 3 * c.hat^2
       .L   .Q  .C  ^4
[1,] 43.2 42.0 0.3 2.1
• H0 : µi = µj for all i, j
• H0ij : µi = µj
We can associate error rates to both of these types of hypotheses
• Experiment-wise type I error rate: Pr(reject H0 |H0 is true ).
with equality only if there are two treatments total. The fact that the
experiment-wise error rate is larger than the comparison-wise rate is called
the issue of multiple comparisons. What is the experiment-wise rate in this
analysis procedure?
So
Pr(one or more H0ij rejected | H0) ≲ (m choose 2) αC
If we are worried about possible dependence among the tests, perhaps a better way to derive this bound is to recall that
Pr(A1 ∪ A2 ∪ · · ·) ≤ Σ Pr(Ai)   (subadditivity). Therefore
Pr(one or more H0ij rejected | H0) ≤ Σ_{i,j} Pr(H0ij rejected | H0) = (m choose 2) αC,
where αC = αE / (m choose 2).
3. If F(y) > F_{1−αE, m−1, N−m} then reject H0, and reject all H0ij for which |Ĉij / SE[Ĉij]| > t_{1−αC/2, N−m}.
Pr(reject H0ij | H0ij) = Pr(F > Fcrit and |tij| > tcrit | H0ij)
                       = Pr(F > Fcrit | H0ij) × Pr(|tij| > tcrit | F > Fcrit, H0ij)
Chapter 6

Factorial Designs
• Treatments:
which Delivery to use for the experiment testing for Type effects?
which Type to use for the experiment testing for Delivery effects?
It might be helpful to visualize the design as follows:
                 Delivery
Type       A         B         C         D
1        y_I,A     y_I,B     y_I,C     y_I,D
2        y_II,A    y_II,B    y_II,C    y_II,D
3        y_III,A   y_III,B   y_III,C   y_III,D
So in this case, Type and Delivery are both factors. There are 3 levels of
Type and 4 levels of Delivery.
Marginal Plots: Based on these marginal plots, it looks like (III, A) would be the most effective combination. But are the effects of Type consistent across levels of Delivery?
Conditional Plots: Type III looks best across delivery types. But the difference between types I and II seems to depend on delivery. For example, for delivery methods B or D there doesn't seem to be much of a difference between I and II.
Cell Plots: Another way of looking at the data is to just view it as a CRD with 3 × 4 = 12 different groups. Sometimes each group is called a cell.
[Figure: marginal plots of the response by Type (I, II, III) and by Delivery (A, B, C, D)]
> lm(log(sds) ~ log(means))
Coefficients:
(Intercept)  log(means)
      3.203       1.977
Possible analysis methods: Let's first try to analyze these data using our existing tools:
• Two one-factor ANOVAs: Just looking at Type, for example, the experiment is a one-factor ANOVA with 3 treatment levels and 16 reps per treatment. Conversely, looking at Delivery, the experiment is a one-factor ANOVA with 4 treatment levels and 12 reps per treatment.
> dat$y <- 1/dat$y
> anova(lm(dat$y ~ dat$type))
[Figure: conditional plots of the transformed response by Type within each Delivery (A, B, C, D)]
[Figure: group means versus log standard deviations for the original and for the transformed data]
Residuals   45 0.30628 0.00681
• What are the SSE, MSE representing in the first two ANOVAs? Why are they bigger than the value in the third ANOVA?
[Figures: marginal, conditional, and cell plots of the transformed response by Type and Delivery]
• In the third ANOVA, can we assess the effects of Type and Delivery separately?
• Can you think of a situation where the F-stats in the first and second ANOVAs would be "small", but the F-stat in the third ANOVA "big"?
Basically, the first and second ANOVAs may mischaracterize the data and sources of variation. The third ANOVA is "valid," but we'd like a more specific result: we'd like to know which factors are sources of variation, and the relative magnitude of their effects. Also, if the effects of one factor are consistent across levels of the other, maybe we don't need a separate parameter for each of the 12 treatment combinations, i.e. a simpler model may suffice.
To obtain the set-to-zero side conditions, add â1 and b̂1 to µ̂, subtract â1
from the âi ’s, and subtract b̂1 from the b̂j ’s. Note that this does not change
the fitted value in each group:
y − ȳ··· = â + b̂ + ϵ̂
vT = v1 + v2 + ve
The columns represent
vT variation of the data around the grand mean;
v1 variation of factor 1 means around the grand mean;
v2 variation of factor 2 means around the grand mean;
ve variation of the data around the fitted values.
You should be able to show that these vectors are orthogonal, and so
Σ_i Σ_j Σ_k (yijk − ȳ···)² = Σ_i Σ_j Σ_k âi² + Σ_i Σ_j Σ_k b̂j² + Σ_i Σ_j Σ_k ϵ̂ijk²
SSTotal = SSA + SSB + SSE
Degrees of Freedom:
• â contains m1 different numbers but sums to zero → m1 − 1 dof
• b̂ contains m2 different numbers but sums to zero → m2 − 1 dof
ANOVA table
Source   SS        df                                  MS         F
A        SSA       m1 − 1                              SSA/dfA    MSA/MSE
B        SSB       m2 − 1                              SSB/dfB    MSB/MSE
Error    SSE       (m1 − 1)(m2 − 1) + m1 m2 (n − 1)    SSE/dfE
Total    SSTotal   m1 m2 n − 1
> anova(lm(dat$y ~ dat$type + dat$delivery))
Analysis of Variance Table

Response: dat$y
             Df  Sum Sq Mean Sq F value
dat$type      2 0.34877 0.17439  71.708
dat$delivery  3 0.20414 0.06805  27.982
Residuals    42 0.10214 0.00243
This ANOVA has decomposed the variance in the data into the variance of additive Type effects, additive Delivery effects, and residuals. Does this adequately represent what is going on in the data? What do we mean by additive? Assuming the model is correct, we have µij = µ + ai + bj.
This says that the difference between Type I and Type II is a1 − a2 regardless of Delivery. Does this look right based on the plots? Consider the following table:
Effect of Type I vs II, given Delivery
Delivery   full model           additive model
A          µ_IA − µ_IIA         (µ + a1 + b1) − (µ + a2 + b1) = a1 − a2
B          µ_IB − µ_IIB         (µ + a1 + b2) − (µ + a2 + b2) = a1 − a2
C          µ_IC − µ_IIC         (µ + a1 + b3) − (µ + a2 + b3) = a1 − a2
D          µ_ID − µ_IID         (µ + a1 + b4) − (µ + a2 + b4) = a1 − a2
• The full model allows differences between Types to vary across levels of Delivery
How can we test for this? Consider the following parameterization of the
full model:
Interaction model:
µ = overall mean;
yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (ȳij· − ȳi·· − ȳ·j· + ȳ···) + (yijk − ȳij·)
     = µ̂ + âi + b̂j + (ab)̂ij + ϵ̂ijk
Note that the interaction term is equal to the fitted value under the full model (ȳij·) minus the fitted value under the additive model (ȳi·· + ȳ·j· − ȳ···). Deciding between the additive/reduced model and the interaction/full model is tantamount to deciding if the variance explained by the (ab)̂ij's is large or not, i.e. whether or not the full model is close to the additive model.
yijk = µ + ai + bj + ϵijk
Note that in this model the fitted value in one cell depends on data
from the others.
Residual:
ϵ̂ijk = yijk − ŷijk
     = (yijk − ȳi·· − ȳ·j· + ȳ···)
The term (ab)̂ij measures the deviation of the cell means from the estimated additive model. It is called an interaction. It does not measure how factor 1 "interacts" with factor 2; it measures how much the data deviate from the additive effects model.
Interactions and the full model: The interaction terms can also be derived by taking the additive decomposition above one step further: the residual in the additive model can be written
ϵ̂^A_ijk = yijk − ȳi·· − ȳ·j· + ȳ···
        = (yijk − ȳij·) + (ȳij· − ȳi·· − ȳ·j· + ȳ···)
        = ϵ̂^I_ijk + (ab)̂ij
yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (ȳij· − ȳi·· − ȳ·j· + ȳ···) + (yijk − ȳij·)
     = µ̂ + âi + b̂j + (ab)̂ij + ϵ̂ijk
Fitted value:
ŷijk = µ̂ + âi + b̂j + (ab)̂ij = ȳij· = µ̂ij
This is a full model for the treatment means: the estimate of the mean in each cell depends only on data from that cell. Contrast this to the additive model.
Residual:
ϵ̂ijk = yijk − ŷijk = yijk − ȳij·
Thus the full model ANOVA decomposition partitions the variability among the cell means ȳ11·, ȳ12·, . . . , ȳ_{m1 m2}· into main effects and interactions. As you might expect, these different parts are orthogonal, resulting in the following orthogonal decomposition of the variance:
Total SS = (var explained by additive model) + (error in additive model)
         = (var explained by full model) + (error in full model)
         = SSA + SSB + SSAB + SSE
Example: (Poison) 3 × 4 two-factor CRD with 4 reps per treatment combination.
             Df  Sum Sq Mean Sq F value
pois$deliv    3 0.20414 0.06805  27.982
pois$type     2 0.34877 0.17439  71.708
Residuals    42 0.10214 0.00243

                      Df  Sum Sq Mean Sq F value
pois$deliv             3 0.20414 0.06805 28.3431
pois$type              2 0.34877 0.17439 72.6347
pois$deliv:pois$type   6 0.01571 0.00262  1.0904
Residuals             36 0.08643 0.00240
So notice
– E[MSAB] = σ² + r τ²_AB > σ².
This suggests
• An evaluation of the adequacy of the additive model can be assessed by comparing MSAB to MSE. Under H0: (ab)ij = 0, MSAB/MSE has an F distribution.
• If the additive model is adequate then MSEint and MSAB are two independent estimates of roughly the same thing (why independent?). We may then want to combine them to improve our estimate of σ².
For these data, there is strong evidence of both treatment effects, and little evidence of non-additivity. We may want to use the additive model.
Figure 6.7: Comparison between types I and II, without respect to delivery.
Testing additive effects: Let µij be the population mean in cell ij. The relationship between the cell means model and the parameters in the interaction model is as follows:
µij = µ·· + (µi· − µ··) + (µ·j − µ··) + (µij − µi· − µ·j + µ··)
    = µ + ai + bj + (ab)ij
and so
Figure 6.8: Comparison between types I and II, with delivery in color.
• Σ_i ai = 0
• Σ_j bj = 0
• Σ_i (ab)ij = 0 for each j,   Σ_j (ab)ij = 0 for each i
(population) means:
F2 = 1 F2 = 2 F2 = 3 F2 = 4
F1 = 1 µ11 µ12 µ13 µ14 4µ̄1·
F1 = 2 µ21 µ22 µ23 µ24 4µ̄2·
F1 = 3 µ31 µ32 µ33 µ34 4µ̄3·
3µ̄·1 3µ̄·2 3µ̄·3 3µ̄·4 12µ̄··
So
a1 − a2 = µ̄1· − µ̄2·
        = (µ11 + µ12 + µ13 + µ14)/4 − (µ21 + µ22 + µ23 + µ24)/4
Like any contrast, we can estimate/make inference for it using contrasts of
sample means:
â1 − â2 = ȳ1·· − ȳ2·· is an unbiased estimate of a1 − a2.
Note that this estimate is the corresponding contrast among the m1 × m2 sample means:
F2 = 1 F2 = 2 F2 = 3 F2 = 4
F1 = 1 ȳ11· ȳ12· ȳ13· ȳ14· 4ȳ1··
F1 = 2 ȳ21· ȳ22· ȳ23· ȳ24· 4ȳ2··
F1 = 3 ȳ31· ȳ32· ȳ33· ȳ34· 4ȳ3··
3ȳ·1· 3ȳ·2· 3ȳ·3· 3ȳ·4· 12ȳ···
So
â1 − â2 = ȳ1·· − ȳ2··
        = (ȳ11· + ȳ12· + ȳ13· + ȳ14·)/4 − (ȳ21· + ȳ22· + ȳ23· + ȳ24·)/4
Hypothesis tests and confidence intervals can be made using the standard
assumptions:
• E[â1 − â2] = a1 − a2
• Under the assumption of constant variance:
Var[â1 − â2] = Var[ȳ1·· − ȳ2··]
             = Var[ȳ1··] + Var[ȳ2··]
             = σ²/(n × m2) + σ²/(n × m2)
             = 2σ²/(n × m2)
where ν is the degrees of freedom associated with our estimate of σ², i.e. the residual degrees of freedom in our model.
– ν = m1 m2 (n − 1) under the full/interaction model
– ν = m1 m2 (n − 1) + (m1 − 1)(m2 − 1) under the reduced/additive model
Review: Explain the degrees of freedom for the two models.
t-test: Reject H0: a1 = a2 if
|â1 − â2| / √( MSE × 2/(n × m2) ) > t_{1−αC/2, ν},
i.e. if
|â1 − â2| > √( MSE × 2/(n × m2) ) × t_{1−αC/2, ν}
So the quantity
LSD1 = t_{1−αC/2, ν} × SE(â1 − â2)
     = t_{1−αC/2, ν} × √( MSE × 2/(n × m2) )
is a "yardstick" for comparing levels of factor 1. It is sometimes called the least significant difference for comparing levels of Factor 1. It is analogous to the LSD we used in the 1-factor ANOVA.
Important note: The LSD depends on which factor you are looking at:
The comparison of levels of Factor 2 depends on
    Var[ȳ·1· − ȳ·2·] = Var[ȳ·1·] + Var[ȳ·2·]
                     = σ²/(n × m1) + σ²/(n × m1)
                     = 2σ²/(n × m1)
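For instance, using the poison data summaries above (additive-model MSE =
0.00243 on ν = 42 df; 3 types, 4 delivery methods, n = 4 reps) and taking
αC = 0.05, a minimal sketch of the two yardsticks is:

MSE <- 0.00243; nu <- 42            # additive-model error MS and df
aC  <- 0.05                         # per-comparison error rate (assumed)

# comparing two type means: each averages over 4 deliveries x 4 reps = 16 obs
LSD.type  <- qt(1 - aC/2, nu) * sqrt(2 * MSE / 16)
# comparing two delivery means: each averages over 3 types x 4 reps = 12 obs
LSD.deliv <- qt(1 - aC/2, nu) * sqrt(2 * MSE / 12)
c(LSD.type, LSD.deliv)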
where
• µ = (1/(m1m2)) Σi Σj µij = µ̄··
• ai = (1/m2) Σj (µij − µ̄··) = µ̄i· − µ̄··
• bj = (1/m1) Σi (µij − µ̄··) = µ̄·j − µ̄··
The terms {a1, . . . , am1}, {b1, . . . , bm2} are sometimes called “main effects”.
The additive model is

    yijk = µ + ai + bj + εijk

Sometimes this is called the “main effects” model. If this model is correct,
it implies that

    (ab)ij = 0 for all i, j,
    µi1j − µi2j = ai1 − ai2 for all i1, i2, j,
    µij1 − µij2 = bj1 − bj2 for all i, j1, j2
Now

    E[ (1/m2) Σ_{j=1}^{m2} (ȳ1j· − ȳ2j·) ] = (1/m2) Σ_{j=1}^{m2} (µ1j − µ2j) = a1 − a2,
1. affects response
then it will increase the variance in response and also the experimental error
variance/MSE if unaccounted for. If F2 is a known, potentially large source
of variation, we can control for it pre-experimentally with a block design.
Blocking: The stratification of experimental units into groups that are more
homogeneous than the whole.
Objective: To have less variation among units within blocks than between
blocks.
[Diagram: a field running from dry to wet along the direction of irrigation.]
• location
• physical characteristics
• time
        column:  1  2  3  4  5  6
row 1            2  5  4  1  6  3
row 2            1  3  4  6  5  2
row 3            6  3  5  1  2  4
row 4            2  4  6  5  3  1
Design:
1. Within each row, or block, each of the 6 treatments is randomly allocated
   (see the sketch below).
2. The blocks are complete, in that each treatment appears in each block.
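A minimal sketch of this randomization in R (the layout above could have
been produced this way; the object names are hypothetical):

set.seed(1)
design <- t(sapply(1:4, function(row) sample(1:6)))  # one random ordering per row
dimnames(design) <- list(paste("row", 1:4), paste("col", 1:6))
design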
[Marginal plots: Ni versus treatment (1–6) and versus row (1–4); treatment
and residual versus location.]
Figure 6.13: Marginal plots, and residuals without controlling for row.
Analysis of the RCB design with one rep: Analysis proceeds just as
in the two-factor ANOVA:
    yij − ȳ·· = (ȳi· − ȳ··) + (ȳ·j − ȳ··) + (yij − ȳi· − ȳ·j + ȳ··)
SSTotal = SSTrt + SSB + SSE
ANOVA table
#######
> anova(lm(c(y) ~ as.factor(c(trt)) + as.factor(c(rw))))
                  Df  Sum Sq Mean Sq F value   Pr(>F)
as.factor(c(trt))  5 201.316  40.263  5.5917 0.004191 **
as.factor(c(rw))   3 197.004  65.668  9.1198 0.001116 **
Residuals         15 108.008   7.201
#######
#######
> anova(lm(c(y) ~ as.factor(c(trt)):as.factor(c(rw))))
                                   Df Sum Sq Mean Sq F value Pr(>F)
as.factor(c(trt)):as.factor(c(rw)) 23 506.33   22.01
Residuals                           0   0.00
#######
Can we test for interaction? Do we care about interaction in this case, or just
main effects? Suppose it were true that “in row 2, timing 6 is significantly
better than timing 4, but in row 3, treatment 3 is better.” Is this relevant
for recommending a timing treatment for other fields?
Consider comparing the F -statistic from a CRD with that from an RCB:
According to Cochran and Cox (1957)
    MSEcrd = [ SSB + n(m − 1)MSErcbd ] / (nm − 1)
           = ( (n − 1)/(nm − 1) ) MSB + ( n(m − 1)/(nm − 1) ) MSErcbd

In general, the effectiveness of blocking is a function of MSEcrd/MSErcb. If
this is large, it is worthwhile to block. For the nitrogen example, this ratio
is about 2.
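A sketch of that calculation using the RCB ANOVA table above (n = 4
rows/blocks, m = 6 treatments):

MSB <- 65.668; MSErcb <- 7.201      # row and residual mean squares from above
n <- 4; m <- 6                      # blocks and treatments
MSEcrd <- ((n - 1) * MSB + n * (m - 1) * MSErcb) / (n * m - 1)
MSEcrd                              # about 14.8
MSEcrd / MSErcb                     # about 2: blocking roughly halved the MSE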
    SSF = Σi Σj Σk (ȳij· − ȳ···)² = Σi Σj nij (ȳij· − ȳ···)²
• Cell means show that, for both road types, speeds were higher on clear
  days by 5 mph on average.
The idea:
Accident example:
Standard errors: µ̂i· = (1/m2) Σj ȳij·, so

    Var[µ̂i·] = (1/m2²) Σj σ²/nij

    SE[µ̂i·] = (1/m2) sqrt( Σj MSE/nij ), and similarly

    SE[µ̂·j] = (1/m1) sqrt( Σi MSE/nij )
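In R these quantities can be sketched from a matrix of cell means and a
matrix of cell counts (cellmeans, nij and MSE are assumed objects):

mu.i <- apply(cellmeans, 1, mean)                   # estimates of the mu_i.
se.i <- sqrt(apply(MSE / nij, 1, sum)) / ncol(nij)  # SE[muhat_i.]
mu.j <- apply(cellmeans, 2, mean)                   # estimates of the mu_.j
se.j <- sqrt(apply(MSE / nij, 2, sum)) / nrow(nij)  # SE[muhat_.j]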
Allowing for interaction improves the fit, and reduces error variance. SSI
measures the improvement in fit. If SSI is large, i.e. SSEA is much bigger than
SSEF , this suggests the additive model does not fit well and the interaction
term should be included in the model.
Testing:

    F = MSI/MSE = [ SSI/((m1 − 1)(m2 − 1)) ] / [ SSEF/(N − m1m2) ]

Under H0, F ~ F(m1−1)(m2−1), N−m1m2, so a level-α test of H0 is to reject
if F > F1−α, (m1−1)(m2−1), N−m1m2.
Note:
• SSI is the change in fit in going from the additive to the full model;
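In R this comparison can be obtained by passing the two nested fits to
anova(); a minimal sketch, with hypothetical variable names y, F1, F2:

m.add  <- lm(y ~ as.factor(F1) + as.factor(F2))   # additive model
m.full <- lm(y ~ as.factor(F1) * as.factor(F2))   # full/interaction model
anova(m.add, m.full)   # the change in RSS is SSI; the F test is the one above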
A painful example: A small scale clinical trial was done to evaluate the
effect of painkiller dosage on pain reduction for cancer patients in a variety
of age groups.
• Factors of interest:
• Design: CRD, each treatment level was randomly assigned to ten pa-
tients, not blocked by age.
[Marginal boxplots of the response by treatment (1, 2, 3) and by age group
(50, 60, 70, 80).]
> table(trt, ageg)
   ageg
trt 50 60 70 80
  1  1  2  3  4
  2  1  3  3  3
  3  2  1  4  3
> tapply(y, trt, mean)
     1      2      3
 0.381 -0.950 -2.131
>
> tapply(y, ageg, mean)
    50     60     70     80
-2.200 -1.265 -0.922 -0.139
Do these marginal plots and means misrepresent the data? To evaluate this
possibility,
> cellmeans
     50        60        70        80
1 -4.90  1.400000  1.066667  0.677500
2  2.40 -2.996667 -1.270000  0.300000
3 -3.15 -1.400000 -2.152500 -1.666667
• The oldest patients (ageg=80) were imbalanced towards the lower dose,
so we might expect their marginal mean to be too high. Observe the
change from the marginal mean = -.14 to the LS mean = -.23 .
What linear modeling commands in R will get you the same thing?
> options(contrasts=c("contr.sum", "contr.poly"))
> fit.full <- lm(y ~ as.factor(ageg) * as.factor(trt))
Note that the coefficients in the reduced/additive model are not the same:
> fit.add <- lm(y ~ as.factor(ageg) + as.factor(trt))
Where do these sums of squares come from? What do the F -tests represent?
By typing “?anova.lm” in R we see that anova() computes
“a sequential analysis of variance table for that fit. That is, the
reductions in the residual sum of squares as each term of the formula
is added in turn are given in as the rows of a table, plus the residual
sum of squares.”
• C be their interaction.
> ss0 - ss1
[1] 13.3554
>
> ss1 - ss2
[1] 28.25390
>
> ss2 - ss3
[1] 53.75015
> ss3
[1] 57.47955
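These numbers appear to be residual sums of squares from a nested sequence
of fits; a sketch of how such objects could be produced (names and fitting
order assumed):

ss0 <- sum(resid(lm(y ~ 1))^2)                                 # intercept only
ss1 <- sum(resid(lm(y ~ as.factor(ageg)))^2)                   # + age group
ss2 <- sum(resid(lm(y ~ as.factor(ageg) + as.factor(trt)))^2)  # + treatment
ss3 <- sum(resid(lm(y ~ as.factor(ageg) * as.factor(trt)))^2)  # + interaction
# the successive differences match the sequential (Type I) sums of squares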
Initial analysis: CRD with one two-level factor. The first thing to do is
plot the data. The first panel of Figure 6.16 indicates a moderately large
difference in the two sample populations. The second thing to do is a two-
sample t-test:
[Figure 6.16: Left panel: o2_change by grp (A, B). Right panel: residuals
versus age.]
A linear model for ANCOVA: Let yi,j be the response of the jth subject
in treatment i:
    yi,j = µ + ai + b × xi,j + εi,j
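A minimal sketch of the corresponding fits in R, using the variable names
from the plots (grp, age, o2_change):

fit.anova  <- lm(o2_change ~ grp)         # one-factor ANOVA, ignoring age
fit.ancova <- lm(o2_change ~ grp + age)   # ANCOVA: common slope in age
anova(fit.ancova)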
Figure 6.17: ANOVA and ANCOVA fits to the oxygen uptake data
This model gives a linear relationship between age and response for each
group:
The second one decomposes the variation in the data that is orthogonal to
treatment (SSE from the first ANOVA) into a part that can be ascribed to
age (SS age in the second ANOVA) and everything else (SSE from the second
ANOVA). I will try to draw some triangles that describe this situation.
Now consider two other ANOVAs:
> anova(lm(o2_change ~ age))
          Df Sum Sq Mean Sq F value    Pr(>F)
age        1 576.09  576.09  40.519 8.187e-05 ***
Residuals 10 142.18   14.22
[Three panels: y versus f2 (levels 1 and 2), with observations labeled A and
B by level of F1.]
The ANOVA quantifies this: There is variability in the data that can be
explained by either F1 or F2. In this case,
• SSA > SSA|B
[A second set of three panels: y versus f2, with observations labeled A and B.]
Which ANOVA table to use? Some software packages combine these anova
tables to form ANOVAs based on “alternative” types of sums of squares.
Consider a two-factor ANOVA in which we plan on decomposing variance
into additive effects of F1, additive effects of F2, and their interaction.
Type II SS: Sum of squares for a factor is the improvement in fit from
adding that factor, given inclusion of all other terms at that level or
below.
Type III SS: Sum of squares for a factor is the improvement in fit from
adding that factor, given inclusion of all other terms.
So for example:
Type II sums of squares are very popular, and are to some extent the “default”
for linear regression analysis. It seems natural to talk about the “variability
due to a treatment, after controlling for other sources of variation.” However,
there are situations in which you might not want to “control” for other
variables. Consider the following (real-life) scenario:
Clinical trial:
Nested Designs
Example(Potato): Sulfur added to soil kills bacteria, but too much sulfur
can damage crops. Researchers are interested in comparing two levels of
sulfur additive (low, high) on the damage to two types of potatoes.
Factors of interest:
Design constraints:
[Field layout: four fields assigned sulfur levels L, H (top) and H, L (bottom);
within each field, four plots are planted with potato types A and B.]
Randomization:
> anova(fit.full)
            Df  Sum Sq Mean Sq F value   Pr(>F)
type         1 1.48840 1.48840 13.4459 0.003225 **
sulfur       1 0.54022 0.54022  4.8803 0.047354 *
type:sulfur  1 0.00360 0.00360  0.0325 0.859897
Residuals   12 1.32835 0.11070
[Marginal boxplots of the response by sulfur (high, low) and by type (A, B);
residuals from the additive fit (fit.add$res) and their normal quantile plot.]
• no treatment effects.
[Histograms of the randomization null distributions of the F statistics,
F.t.null and F.s.null.]
• For each sulfur assignment there are (4 choose 2)⁴ = 6⁴ = 1296 type
  assignments
So we have 7776 possible treatment assignments. It probably wouldn’t be too
hard to write code to go through all possible treatment assignments, but it’s
very easy to obtain a Monte Carlo approximation to the null distribution:
F.t.null <- F.s.null <- NULL
for(ns in 1:1000) {
  s.sim <- rep(sample(c("low", "low", "high", "high")), rep(4, 4))
  t.sim <- c(sample(c("A", "A", "B", "B")), sample(c("A", "A", "B", "B")),
             sample(c("A", "A", "B", "B")), sample(c("A", "A", "B", "B")))
  # (assumed step) recompute the ANOVA table under the simulated assignment
  fit.sim <- anova(lm(y ~ as.factor(t.sim) * as.factor(s.sim)))
  F.t.null <- c(F.t.null, fit.sim[1, 4])
  F.s.null <- c(F.s.null, fit.sim[2, 4])
}
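Monte Carlo approximations to the randomization p-values then compare the
observed F statistics to these null draws (the name F.type for the observed
type statistic is assumed; F.sulfur appears below):

mean(F.t.null >= F.type)     # randomization p-value for type
mean(F.s.null >= F.sulfur)   # randomization p-value for sulfur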
What happened?

    F^rand_type ≈ F_{1,13}    ⇒  p^rand_type ≈ p^anova1_type
    F^rand_sulfur ≉ F_{1,13}  ⇒  p^rand_sulfur ≉ p^anova1_sulfur
Thus there is strong evidence for type effects, and little evidence that the
effects of type vary among levels of sulfur.
> F.sulfur
[1] 1.022911
> 1 - pf(F.sulfur, 1, 2)
[1] 0.4182903
This is more in line with the analysis using the randomization test.
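To sketch where that statistic comes from: the sulfur mean square is compared
to the field-to-field (whole-plot) mean square, which has only 2 degrees of
freedom, rather than to the sub-plot MSE (both values appear in the aov
output below):

MS.sulfur <- 0.54022
MS.field  <- 0.52813          # between-field residual mean square, 2 df
MS.sulfur / MS.field          # = 1.0229, compared to an F(1, 2) distribution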
The above calculations are somewhat tedious. In R there are several au-
tomagic ways of obtaining the correct F -test for this type of design. One
way is with the aov command:
> fit1 <- aov(y ~ type * sulfur + Error(factor(field)))
> summary(fit1)
Error: factor(field)
          Df  Sum Sq Mean Sq F value Pr(>F)
sulfur     1 0.54022 0.54022  1.0229 0.4183
Residuals  2 1.05625 0.52813

Error: Within
            Df  Sum Sq Mean Sq F value    Pr(>F)
type         1 1.48840 1.48840 54.7005 2.326e-05 ***
type:sulfur  1 0.00360 0.00360  0.1323    0.7236
Residuals   10 0.27210 0.02721
###
Error: factor(field)
          Df  Sum Sq Mean Sq F value Pr(>F)
sulfur     1 0.54022 0.54022  1.0229 0.4183
Residuals  2 1.05625 0.52813

Error: Within
          Df  Sum Sq Mean Sq F value    Pr(>F)
type       1 1.48840 1.48840  59.385 9.307e-06 ***
Residuals 11 0.27570 0.02506
[Residuals plotted against field (1–4), with points labeled by sulfur level
(l, h).]
• εijkl represents error variance at the sub-plot level, i.e. variance among
  sub-plot experimental units. The index l represents sub-plot reps,
  l = 1, . . . , r2, and

    {εijkl} ~ normal(0, σ²s)
Now every subplot within the same wholeplot has something in common, namely
the whole-plot random effect γik. This models the positive correlation within
whole plots:

  Cov(yi,j1,k,l1, yi,j2,k,l2) = E[(yi,j1,k,l1 − E[yi,j1,k,l1]) × (yi,j2,k,l2 − E[yi,j2,k,l2])]
                             = E[(γi,k + εi,j1,k,l1) × (γi,k + εi,j2,k,l2)]
                             = E[γ²i,k + γi,k × (εi,j1,k,l1 + εi,j2,k,l2) + εi,j1,k,l1 εi,j2,k,l2]
                             = E[γ²i,k] + 0 + 0 = σ²w
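In particular, under this model Var(yijkl) = σ²w + σ²s, so the correlation
between two sub-plot observations from the same whole plot is
σ²w / (σ²w + σ²s).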
This and more complicated random-effects models can be fit using the lme
command in R. To use this command, you need the nlme package:
library(nlme)
fit.me <- lme(fixed = y ~ type + sulfur, random = ~ 1 | as.factor(field))
> summary(fit.me)

Fixed effects: y ~ type + sulfur
              Value Std.Error DF   t-value p-value
(Intercept)  4.8850 0.2599650 11 18.790991  0.0000
typeB        0.6100 0.0791575 11  7.706153  0.0000
sulfurlow    0.3675 0.3633602  2  1.011393  0.4183
But the lme command allows for much more complex models to be fit.
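For example, the estimated whole-plot and sub-plot variance components can
be extracted from the fitted object (a sketch; VarCorr is part of nlme):

VarCorr(fit.me)    # field-level (whole-plot) and residual (sub-plot) variances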
The size of each tree (roughly the volume) was measured at five time points:
152, 174, 201, 227 and 258 days after the beginning of the experiment.
[Growth curves: height (size) of each tree plotted over the five measurement
times.]
> fit <- lm(Sitka$size ~ Sitka$treat)
> anova(fit)
             Df  Sum Sq Mean Sq F value  Pr(>F)
Sitka$treat   1   3.810   3.810  6.0561 0.01429 *
Residuals   393 247.222   0.629
[Diagnostic plots for the naive fit: residuals (fit$res) by treatment (control,
ozone), a histogram and normal quantile plot of fit$res, and residuals grouped
by tree (as.factor(Sitka$tree)).]
Naive approach II: Clearly there is some effect of time. Let’s now “ac-
count” for growth over time, using a simple ANCOVA:

    yi,j,t = µ0 + ai + b × t + ci × t + εi,j,t
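A minimal sketch of this fit, assuming the Sitka data from the MASS package
with columns size, Time, tree and treat:

library(MASS)
fit <- lm(size ~ treat * Time, data = Sitka)  # group-specific intercept and slope
anova(fit)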
[Fitted lines overlaid on the tree growth curves (height versus time), followed
by diagnostic plots: a histogram and normal quantile plot of fit$res, and
residuals grouped by tree (as.factor(Sitka$tree)).]
[Boxplots by treatment of the per-tree average, intercept and slope.]
> anova(lm(y.int ~ treat))
          Df Sum Sq Mean Sq F value Pr(>F)
treat      1  0.840   0.840  1.0431 0.3103
Residuals 77 61.989   0.805

> anova(lm(y.slope ~ treat))
          Df     Sum Sq    Mean Sq F value   Pr(>F)
treat      1 0.00007815 0.00007815  7.6628 0.007058 **
Residuals 77 0.00078529 0.00001020
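One way the per-tree summaries could have been computed (a sketch; the names
y.int, y.slope and treat match the output above, everything else is assumed):

library(MASS)                                   # for the Sitka data (assumed)
bytree  <- split(Sitka, Sitka$tree)
y.int   <- sapply(bytree, function(d) coef(lm(size ~ Time, data = d))[1])
y.slope <- sapply(bytree, function(d) coef(lm(size ~ Time, data = d))[2])
treat   <- factor(sapply(bytree, function(d) as.character(d$treat[1])))
anova(lm(y.int ~ treat)); anova(lm(y.slope ~ treat))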