
Linear Mixed Models

Chapter 3: The General Linear Mixed Model

Craig Anderson
The general linear mixed model

So far we have seen specific examples of mixed models.


We can generalize mixed models to arbitrary design
matrices:
y = Xβ + Zu + e
where X and Z are given matrices and

E\begin{pmatrix} u \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\qquad
\mathrm{Var}\begin{pmatrix} u \\ e \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix}.

We assume that u ∼ N(0, G) and e ∼ N(0, R)


independently of each other.
The models studied previously are all special cases of this
general model.

2/158
Example

Two-way mixed model without interaction

Consider the model

yij = αi + bj + eij

with factor A fixed and factor B random.


Since there is no µ in this parameterisation, there is no
need for an identifiability constraint on the αi .
Assume as usual that the bj are i.i.d. N(0, σB²), independently of
the eij , which are i.i.d. N(0, σE²).
For illustration purposes let i = 1, 2, 3 and j = 1, 2.

3/158
Example

Model components

 
y = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ y_{31} \\ y_{32} \end{pmatrix};
\qquad
\beta = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix};
\qquad
u = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.

4/158
Example

Model components

   
X = \begin{pmatrix}
1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1
\end{pmatrix};
\qquad
Z = \begin{pmatrix}
1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1
\end{pmatrix}.

5/158
Normal model

In general we can write

y ∼ N(Xβ, ZGZ| + R) = N(Xβ, V)

or

y = Xβ + e∗

where e∗ = Zu + e.

This is a linear model with correlated errors since

Var (e∗ ) = V = ZGZ| + R.

6/158
Variance terms in the example

 
G = \sigma_B^2 I_2 = \begin{pmatrix} \sigma_B^2 & 0 \\ 0 & \sigma_B^2 \end{pmatrix};
\qquad
R = \sigma_E^2 I_6 = \begin{pmatrix}
\sigma_E^2 & 0 & \cdots & 0 \\
0 & \sigma_E^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & \sigma_E^2
\end{pmatrix}

V = \begin{pmatrix}
\sigma_B^2+\sigma_E^2 & 0 & \sigma_B^2 & 0 & \sigma_B^2 & 0 \\
0 & \sigma_B^2+\sigma_E^2 & 0 & \sigma_B^2 & 0 & \sigma_B^2 \\
\sigma_B^2 & 0 & \sigma_B^2+\sigma_E^2 & 0 & \sigma_B^2 & 0 \\
0 & \sigma_B^2 & 0 & \sigma_B^2+\sigma_E^2 & 0 & \sigma_B^2 \\
\sigma_B^2 & 0 & \sigma_B^2 & 0 & \sigma_B^2+\sigma_E^2 & 0 \\
0 & \sigma_B^2 & 0 & \sigma_B^2 & 0 & \sigma_B^2+\sigma_E^2
\end{pmatrix}
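A quick numerical check of V = ZGZ| + R for this example (a sketch in R, with
arbitrary values chosen for the two variance components):

# Sketch: rebuild V from Z, G and R and compare with the matrix above
sB2 <- 1; sE2 <- 0.5                        # arbitrary illustrative values
Z <- matrix(c(1,0, 0,1, 1,0, 0,1, 1,0, 0,1), ncol = 2, byrow = TRUE)
G <- sB2 * diag(2)
R <- sE2 * diag(6)
V <- Z %*% G %*% t(Z) + R                   # reproduces the pattern shown above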

7/158
Maximum likelihood for fixed effects

For a given V, the generalized least squares (GLS)


estimator of β is

β̃ = (X| V−1 X)−1 X| V−1 y

The estimator β̃ is the maximum likelihood estimator


(MLE) and the uniformly minimum variance unbiased
estimator (UMVUE).
It maximises the log-likelihood
-\tfrac{1}{2}\log|V| - \tfrac{1}{2}(y - X\beta)^\top V^{-1}(y - X\beta) + \text{const}
for the normal model.
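A sketch of the GLS computation in R (assuming X, V and y are already
available as matrices/vectors):

Vi <- solve(V)
beta_tilde <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y)   # GLS / ML estimate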

8/158
Variance component estimation

Maximum likelihood can also be used to estimate variance


components.

Other options were popular before the computing advances


of recent decades.

These include minimum norm quadratic unbiased


estimation (MINQUE) and minimum variance quadratic
unbiased estimation (MIVQUE)

9/158
Maximum likelihood for variance matrices
The maximum likelihood estimate of

V = Var (y) = ZGZ| + R

is based on the model

y ∼ N(Xβ, V).

The log-likelihood of y under this model is


l(\beta, V) = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}(y - X\beta)^\top V^{-1}(y - X\beta) + \text{const}
and the MLE of (β, V) is the one that maximises this
expression.

10/158
Maximum likelihood for variance matrices
For any fixed V, l(β, V) is maximised over β by
β̃ = (X| V−1 X)−1 X| V−1 y.

Substituting back into the log-likelihood expression, we


obtain the profile log-likelihood for V:
l_P(V) = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}(y - X\tilde\beta)^\top V^{-1}(y - X\tilde\beta) + \text{const}
       = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\, y^\top V^{-1}\left[I - X(X^\top V^{-1}X)^{-1}X^\top V^{-1}\right]y + \text{const}

This can be maximised for the parameters in V.


11/158
Variance component estimation

However, there is one big problem with the maximum


likelihood approach.

Even with a very simple model, the variance component


estimates do not match those obtained by the ANOVA
approach.

This is because the ANOVA approach adjusts for the


degrees of freedom lost for estimation.

12/158
Variance component estimation

Simple example
Consider X1 , . . . , Xn i.i.d. N(µ, σ²).

The variance estimate using the ANOVA approach is


s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2.

The variance estimate using the maximum likelihood


approach is
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2.
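A one-line illustration of the difference in R (the two estimators differ only
in the divisor, n − 1 versus n):

x <- rnorm(20, mean = 5, sd = 2)       # simulated sample
s2_anova <- var(x)                     # divides by n - 1
s2_ml    <- mean((x - mean(x))^2)      # divides by n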

13/158
Restricted Maximum Likelihood (REML)

To adjust for degrees of freedom we use restricted


maximum likelihood.

This maximises the likelihood of linear combinations of


the elements of y that do not depend on β.

The resulting criterion function is the restricted


log-likelihood
l_R(V) = l_P(V) - \tfrac{1}{2}\log\left|X^\top V^{-1}X\right|

14/158
Restricted Maximum Likelihood (REML)
REML

Starting with
y = Xβ + Zu + e

find all independent linear combinations of the response, k


such that k| · X = 0.

Then
k| · y = k| · Xβ + k| · Zu + k| · e

and taking K to be the matrix with rows k| we have

Ky = (KZ)u + Ke.
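One way to construct such a K in R (a sketch, assuming the design matrix X is
available) is via the QR decomposition of X:

qrX <- qr(X)
Q <- qr.Q(qrX, complete = TRUE)
K <- t(Q[, -(1:qrX$rank)])   # rows span the contrasts with K %*% X = 0
max(abs(K %*% X))            # numerically zero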

15/158
Restricted Maximum Likelihood (REML)

REML

Thus

Ky ∼ N(0, KVK| ) = N(0, KZGZ| K| + KRK| )

and the maximum likelihood can be used to estimate


variance components based on the likelihood of Ky.

Since there are no longer fixed effects to estimate, we do


not ‘lose’ degrees of freedom.

For simple and balanced designs, REML gives the same


variance component estimates as ANOVA.

16/158
Residual Maximum Likelihood

REML could also stand for Residual Maximum


Likelihood.

REML is equivalent to:


finding the least squares estimates of β from regressing y
on X (ignoring random effects).
taking the residuals.
using maximum likelihood on the residuals.

17/158
Examples

Balanced design - Heart X-ray data


Recall the two-factor random effects model with interaction and
the ANOVA estimates for the variance components:

Variance component Estimate


Observer 0.000070
Case 0.002718
Case:Observer 0.000112
Error 0.000114

18/158
Examples

Balanced design - Heart X-ray data


Recall the two-factor random effects model:

Variance component ANOVA est. REML ML


Observer 0.000070 0.000070 0.000058
Case 0.002718 0.002718 0.002556
Case:Observer 0.000112 0.000112 0.000112
Error 0.000114 0.000114 0.000114

19/158
Examples

Unbalanced data - Nitrogen concentrations


Recall the example of nitrogen concentrations in the
Mississippi river, with influent as the random factor:

Variance component ANOVA est. REML ML


Influent 56.17 63.32 51.26
Error 42.57 42.66 42.70

20/158
Prediction

Mixed models contain fixed effects, random effects and


variance-covariance matrix parameters.

The model parameters are the parameters in β and those in


V. These can be estimated using maximum likelihood as
outlined above.

Maximum likelihood does not apply to the random effects


u.

Instead, we use the term prediction.

21/158
Example

Multicentre clinical trial

Three drugs are compared in a multicentre clinical trial for


their effects on diastolic blood pressure.
Patients are given one of the three drugs, at random, at 1 of
26 clinics.
Measurements are taken on the patients during five visits at
the clinics.
Can you identify the fixed effects and random effects?

22/158
Example

Multicentre clinical trial

The clinics were randomly selected from a large


population of clinics; therefore, clinic is a random effect.

The variable drug is a fixed effect since we only have one


of three predetermined choices.

The drug by clinic interaction corresponds to the patient


effect, and is a random effect.

23/158
Prediction

In conventional fixed-effects analysis of multicentre trials,


inference focuses on the average drug effect throughout the
target population.
In many practical situations, we may be interested in the
performance of treatments at a specific clinic.
Suppose we suspect that different treatments perform
better under different environmental conditions.
These conditions could be represented by the various
locations in the trial.
We would therefore be interested in predicting the value
of the response at a randomly selected clinic.

24/158
Simple prediction example

Consider two random variables, Y and U, where

Y = U + e, U ∼ N(0, 1), e ∼ N(0, 4)


[Figure: density curves of Y and U; the density of U is more concentrated
around zero than the density of Y.]
25/158
Simple example of prediction

We observe only y. Based on this observation, what is our


prediction for the value of U?

The best predictor (BP) is defined to be the Ũ which


minimises the mean squared prediction error:

E[(Ũ − U)2 ]

For general Y and U, the solution is

ũ = BP(U) = E(U|Y = y).

In the example, it can be shown that ũ = y/5.
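A quick check of this value, using standard bivariate normal results:

\mathrm{Cov}(U, Y) = \mathrm{Var}(U) = 1, \quad \mathrm{Var}(Y) = 1 + 4 = 5,
\qquad\Rightarrow\qquad
\tilde u = E(U \mid Y = y) = \frac{\mathrm{Cov}(U, Y)}{\mathrm{Var}(Y)}\, y = \frac{y}{5}.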

26/158
Best predictor

In general, if y is the vector of observed data and u is a


random vector to be predicted, then best prediction
corresponds to minimisation of

E\left[\,\|\tilde{u} - u\|^2\,\right].

The solution is

ũ = BP(u) = E(u|y).

27/158
Best linear prediction

As a simplification, we usually restrict attention to


predictors that are linear in y, i.e. of the form

ũ = Ay + b

for some matrix A and vector b.

The solution is called the best linear predictor (BLP):

ũ = BLP(u) = E(u) + CV−1 [y − E(y)],

where

C = E {[u − E(u)][y − E(y)]| } and V = Var(y).

28/158
Best linear prediction

 
If the joint distribution of (u, y) is multivariate normal, then best
prediction and best linear prediction coincide.

In particular,

BP(u) = BLP(u) = E(u|y) = E(u) + CV−1 [y − E(y)].

29/158
BLP and the mixed model
In the mixed model
y = Xβ + Zu + e
we have
E(y) = Xβ
V = Var (y) = ZGZ| + R
C = E {[u − E(u)][y − E(y)]| }
= E[u(Zu + e)| ]
= E(uu| Z| ) + E(ue| )
= Var (u) Z| + 0 = GZ|
Therefore
ũ = BLP(u) = GZ| V−1 (y − Xβ).

30/158
BLP and the mixed model

The expression

ũ = BLP(u) = GZ| V−1 [y − Xβ]

includes the terms G, V and β, all of which need to be


estimated.

For instance β would be replaced by an estimator such as


β̃ = (X| V−1 X)−1 X| V−1 y.

31/158
Best linear unbiased prediction

Best linear unbiased prediction (BLUP) allows us to view


estimation of β and prediction of u in a more unified way.

This involves finding β̃ and ũ to minimise the prediction


error
E\left[\left\{(s^\top X\tilde\beta + t^\top Z\tilde u) - (s^\top X\beta + t^\top Zu)\right\}^2\right]

subject to the unbiasedness condition

E(s| Xβ̃ + t| Zũ) = E(s| Xβ + t| Zu)

where s and t are arbitrary n × 1 vectors.

32/158
Best linear unbiased prediction

It can be shown that the solutions are

BLUE(β) = β̃ = (X| V−1 X)−1 X| V−1 y

BLUP(u) = ũ = GZ| V−1 (y − Xβ̃).

The best linear unbiased estimate (BLUE) for β is the


same as the GLS estimate.
The best linear unbiased predictor (BLUP) for u is the
BLP with β replaced by BLUE(β) = β̃.

33/158
Henderson’s justification

One derivation of BLUPs is by solving Henderson’s


equations:

\begin{pmatrix} X^\top R^{-1}X & X^\top R^{-1}Z \\ Z^\top R^{-1}X & Z^\top R^{-1}Z + G^{-1} \end{pmatrix}
\begin{pmatrix} \beta \\ u \end{pmatrix}
=
\begin{pmatrix} X^\top R^{-1}y \\ Z^\top R^{-1}y \end{pmatrix}

These assume

y|u ∼ N(Xβ + Zu, R), u ∼ N(0, G)

and maximise the likelihood of (y, u) over β and u, using


f (y, u) = f (y|u)f (u).
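A sketch of solving Henderson's equations directly in R (assuming X, Z, y and
the variance matrices G and R are available):

Ri <- solve(R); Gi <- solve(G)
lhs <- rbind(cbind(t(X) %*% Ri %*% X, t(X) %*% Ri %*% Z),
             cbind(t(Z) %*% Ri %*% X, t(Z) %*% Ri %*% Z + Gi))
rhs <- rbind(t(X) %*% Ri %*% y, t(Z) %*% Ri %*% y)
sol <- solve(lhs, rhs)   # first entries: beta-tilde; remaining entries: u-tilde (BLUP)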

34/158
Henderson’s justification

The criterion to be optimised becomes

(y − Xβ − Zu)| R−1 (y − Xβ − Zu) + u| G−1 u.

This is essentially generalized least squares with a penalty


term.

35/158
Best linear unbiased prediction

It can be shown that the BLUP of (β, u) can be written as


 
\begin{pmatrix} \tilde\beta \\ \tilde u \end{pmatrix} = (D^\top R^{-1}D + B)^{-1}D^\top R^{-1}y,

where

D = \begin{pmatrix} X & Z \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 0 & 0 \\ 0 & G^{-1} \end{pmatrix}.

The fitted values are then

BLUP(y) = Xβ̃ + Zũ = D(D| R−1 D + B)−1 D| R−1 y.

36/158
Estimated or empirical BLUP

We showed that the BLUPs for a mixed model are

BLUE(β) = β̃ = (X| V−1 X)−1 X| V−1 y

BLUP(u) = ũ = GZ| V−1 (y − Xβ̃)

These depend on G = Var(u) and R = Var(e) through


V = Var(y) = ZGZ| + R.

37/158
Estimated or empirical BLUP

In practice, the BLUE and BLUP are replaced by the


estimated or empirical BLUE/BLUP.

The EBLUE/EBLUP take the form:

β̂ = (X| V̂−1 X)−1 X| V̂−1 y


û = ĜZ| V̂−1 (y − Xβ̂)

where Ĝ or V̂ are obtained by plugging in the ML or


REML estimates of their parameters.

38/158
Estimated or empirical BLUP

Consider the mixed model as a whole:

BLUP[E(y|u)] = Xβ̃ + Zũ

EBLUP[E(y|u)] = ŷ = Xβ̂ + Zû.

Estimated BLUPs have two sources of variability: that


from estimation of β and u and that from estimation of G
and V.
Both sources should be taken into account when making
inference about the quantity of interest.
This can be tricky.

39/158
Standard error estimation

The variance of

BLUE(β) = β̃ = (X| V−1 X)−1 X| V−1 y

is

Var(β̃) = (X| V−1 X)−1 .

A natural estimate of the standard error of the ith entry of


the EBLUE β̂i is the square root of the ith diagonal entry
of (X| V̂−1 X)−1 .
This ignores the variability due to estimation of V.
For large samples this variability can be ignored, but for
small samples it makes a difference.

40/158
Precision of BLUPs involving u

To estimate the precision of BLUPs involving u we need


   
\mathrm{Var}\begin{pmatrix} \tilde\beta - \beta \\ \tilde u - u \end{pmatrix}
= \mathrm{Var}\begin{pmatrix} \tilde\beta \\ \tilde u - u \end{pmatrix}
= (D^\top R^{-1}D + B)^{-1}

where, as before

D = \begin{pmatrix} X & Z \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 0 & 0 \\ 0 & G^{-1} \end{pmatrix}.

Therefore we could use the approximation

\mathrm{Var}\begin{pmatrix} \hat\beta \\ \hat u - u \end{pmatrix}
\approx (D^\top \hat R^{-1}D + \hat B)^{-1}.

41/158
Precision of BLUPs involving u

Sometimes we may also need the conditional variance


     
\mathrm{Var}\left[\begin{pmatrix} \tilde\beta - \beta \\ \tilde u \end{pmatrix}\,\middle|\,u\right]
= \mathrm{Var}\left[\begin{pmatrix} \tilde\beta \\ \tilde u \end{pmatrix}\,\middle|\,u\right]
= (D^\top R^{-1}D + B)^{-1}D^\top R^{-1}D\,(D^\top R^{-1}D + B)^{-1}.

This suggests the approximation

\mathrm{Var}\left[\begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix}\,\middle|\,u\right]
\approx (D^\top \hat R^{-1}D + \hat B)^{-1}D^\top \hat R^{-1}D\,(D^\top \hat R^{-1}D + \hat B)^{-1}.

42/158
Summary

The BLUP of the random effect is the expected value of


the random variable(s) given the observed data.

The solutions for the fixed effect yield best linear unbiased
estimators (BLUEs).

We solve the mixed model equations using the estimated


covariance matrices, Ĝ and R̂.

This yields the estimated or empirical best linear unbiased


predictor (EBLUP) for the random effect u and the
estimated or empirical best linear unbiased estimator
(EBLUE) for the fixed effect β.

43/158
Summary

Properties of a BLUP

it is unbiased; that is, E(û) = E(u) = 0.

it is a linear estimator; that is, it is a linear combination of


y: û = Ay + b, where A and b are free of the fixed effect
parameters.

it is best because it minimises the residual error. If u were


a fixed effect, this criterion is equivalent to minimum
variance.

44/158
Summary

Properties of a BLUP

BLUPs have a so-called shrinkage property.

They shrink toward the overall average - in other words,


they are less extreme than the observed counterparts.

They can be interpreted as the weighted average of the


grand mean and the observed value.

45/158
Toy Example

Recall the model

yij = µ + αi + bj + eij

where
yij is the breaking strength for the ith adhesive and jth toy,
i = 1, . . . , I (I = 3) and j = 1, . . . , J (J = 7).
µ is the overall mean.
αi is the fixed effect associated with the ith adhesive.
bj is the random effect associated with the jth toy (block).
eij is the experimental error associated with samples within
blocks.

46/158
BLUP for toy effect

It can be shown that the BLUP for the toy effect bj is

\tilde b_j = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_E^2/3}\,(\bar y_{\cdot j} - \bar y)

where ȳ·j is the average pressure value for the jth toy and ȳ
is the grand mean.

Because the factor σB²/(σB² + σE²/3) is never greater than 1,
the BLUP can be thought of as a shrinkage estimator.
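A sketch of this formula as a small R function (the input values shown are
hypothetical and only for illustration):

# Shrinkage BLUP for the jth toy effect
blup_toy <- function(ybar_j, ybar, sigmaB2, sigmaE2, I = 3) {
  sigmaB2 / (sigmaB2 + sigmaE2 / I) * (ybar_j - ybar)
}
blup_toy(ybar_j = 10.2, ybar = 9.8, sigmaB2 = 0.5, sigmaE2 = 0.3)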

47/158
Hypothesis tests

Suppose we wish to test H0 : βi = 0 where βi is the ith


entry of β.

A Wald test of this hypothesis would depend on the


asymptotic normality of the MLE through a result such as

\frac{\hat\beta_i - \beta_i}{\mathrm{ese}(\hat\beta_i)} \overset{\text{approx}}{\sim} N(0, 1)

where the estimated standard error (ese) could be obtained


as the square root of the ith diagonal entry of (X| V̂−1 X)−1 .

48/158
Hypothesis tests

Wald test

- However, this result does not hold for general mixed


models because the elements of y are dependent due to the
random effects.

+ It is still applicable in special cases such as longitudinal


data analysis.

49/158
Hypothesis tests

We can use the sums of squares in the ANOVA


decompositions to construct F-tests for hypotheses such as
H0 : βi = 0.

Expected mean squares

+ For balanced data these tests are more powerful than


alternatives.

- The appropriate F-test has to be derived for each particular


example.
- For unbalanced experiments these are not exact and they
rely on various complex adjustments and approximations.

50/158
Likelihood ratio test for fixed effects

Let L(θ; y) be the likelihood of the parameter vector θ


based on the data y.

The likelihood ratio statistic for testing the restricted


model under the null hypothesis against an alternative
unrestricted model is

LR(y) = \frac{L(\hat\theta_0;\, y)}{L(\hat\theta;\, y)}

where θ̂ 0 is the MLE under the null model and θ̂ is the


MLE under the unrestricted model.

51/158
Likelihood ratio test for fixed effects

Let l(θ; y) = log L(θ; y) be the log-likelihood.

Usually we work with


-2\left[\, l(\hat\theta_0;\, y) - l(\hat\theta;\, y) \,\right]

which, under H0 , is approximately distributed as χ2ν .

Here

ν = number of parameters in the unrestricted model


− number of parameters in the null model.

52/158
Hypothesis tests

Likelihood ratio test for fixed effects

+ The test statistic depends on y and hence on the type of


correlation structure in matrices G and R.

- Even when the conditions for the asymptotic result hold,


the approximation could be poor.

53/158
Hypothesis tests for random effects

Suppose we wish to test H0 : σ 2 = 0 against H1 : σ 2 > 0


for some variance parameter.

The test based on comparing


-2\left[\, l(\hat\theta_0;\, y) - l(\hat\theta;\, y) \,\right]

with the percentiles of a χ2 (1) distribution does not apply


because its theoretical justification assumes that the
parameter of interest is not on the boundary of its
parameter space.

54/158
Likelihood ratio tests for variance

Since the parameter space for σ 2 is [0, ∞), this assumption


is violated.

If you do use the test with the usual degrees of freedom,


the test will be very conservative (the p-values will be
larger than they should be).

This means that if something appears to be significant


using the χ2 approximation, then we can be confident that
it is actually significant.

55/158
Special case

There is a special case where the asymptotic distribution is


available.

For hypothesis tests involving one variance parameter and


s regression coefficients,
-2\left[\, l(\hat\theta_0;\, y) - l(\hat\theta;\, y) \,\right] \overset{\text{approx}}{\sim} \tfrac{1}{2}\chi^2(s) + \tfrac{1}{2}\chi^2(s+1)

This means that the term −2[ l(θ̂0 ; y) − l(θ̂; y) ] has an
approximate density function equal to a 50:50 mixture of
the χ²(s) and χ²(s + 1) densities.
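A sketch of the corresponding p-value calculation in R (lr is the observed
test statistic and s the number of regression coefficients under test; both
are placeholders here):

p_mix <- 0.5 * pchisq(lr, df = s,     lower.tail = FALSE) +
         0.5 * pchisq(lr, df = s + 1, lower.tail = FALSE)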

56/158
Special case

For example, a test of H0 : σ² = 0, β1 = 0 against

H1 : σ² > 0 or β1 ≠ 0 would involve comparing

−2[ l(θ̂0 ; y) − l(θ̂; y) ]

with the percentiles of the ½χ²(1) + ½χ²(2) mixture


distribution.

Distribution theory is much more complex if the null


hypothesis involves more than one variance component.

57/158
Tests using bootstrap

All the asymptotic tests discussed above involve (often


poor) approximations.

A more precise way to obtain critical values for tests such


as the likelihood ratio test is via simulation.

This can be done via a method known as the parametric


bootstrap technique.

58/158
Tests using bootstrap

Parametric Bootstrap

1 Generate data under the null model using the fitted


parameter estimates.

2 Compute the likelihood ratio statistic for these generated


data.

3 Repeat steps 1 and 2 many times, and use the resulting distribution of the
statistic to obtain a critical value (or p-value) for the observed test
statistic.

59/158
Example

Paper brightness

The pulp data frame, available in R from the faraway


package has 20 rows and 2 columns.

Data comes from an experiment to test the paper


brightness depending on a shift operator.

This data frame contains the following columns:


bright Brightness of the pulp as measured by a
reflectance meter
operator Shift operator a-d

Data source: Statistical techniques applied to production situations, F.


Sheldon (1960), Industrial and Engineering Chemistry, 52, 507-509.

60/158
Pulp example

Model

yij = µ + ai + eij

where
yij is the paper brightness measured by the ith operator,
i = 1, . . . , 4 with j = 1, . . . , 5 replicates per operator.
µ is the overall mean
ai is the random effect associated with the ith operator
eij is the experimental error.

61/158
Data

pulp

bright operator
1 59.8 a
2 60.0 a
3 60.8 a
4 60.8 a
5 59.8 a
6 59.8 b
...
18 60.6 d
19 60.5 d
20 60.5 d

62/158
Inference using ANOVA decomposition

# Change the identifiability constraint


# to sum to zero:
op <- options(contrasts = c("contr.sum", "contr.poly"))

# Obtain the ANOVA decomposition for the


# one-way layout:
lmod <- aov(bright ~ operator, data=pulp)
summary(lmod)
Df Sum Sq Mean Sq F value Pr(>F)
operator 3 1.34 0.4467 4.204 0.0226 *
Residuals 16 1.70 0.1062

Operator effect significant with p-value of 0.023.


63/158
Fitting a mixed model using ML
smod <- lmer(bright ~ 1 + (1|operator), data=pulp,
             REML=FALSE)
summary(smod)

Linear mixed model fit by maximum likelihood


Formula: bright ˜ 1 + (1 | operator)
Data: pulp
AIC BIC logLik deviance REMLdev
22.51 25.5 -8.256 16.51 18.74
Random effects:
Groups Name Variance Std.Dev.
operator (Intercept) 0.04575 0.21389
Residual 0.10625 0.32596
Number of obs: 20, groups: operator, 4

Fixed effects:
Estimate Std. Error t value
(Intercept) 60.4000 0.1294 466.7
64/158
Likelihood ratio test

nullmod <- lm(bright ~ 1, data=pulp)

as.numeric(2*(logLik(smod)-logLik(nullmod)))
[1] 2.568371

pchisq(2.5684,1, lower=FALSE)
[1] 0.1090179

Can we trust the χ2 approximation?

65/158
Fitting a mixed model using REML
library(lme4)
mmod <- lmer(bright ~ 1 + (1|operator), data=pulp)
summary(mmod)

Linear mixed model fit by REML


Formula: bright ˜ 1 + (1 | operator)
Data: pulp
AIC BIC logLik deviance REMLdev
24.63 27.61 -9.313 16.64 18.63
Random effects:
Groups Name Variance Std.Dev.
operator (Intercept) 0.068082 0.26093
Residual 0.106250 0.32596
Number of obs: 20, groups: operator, 4

Fixed effects:
Estimate Std. Error t value
(Intercept) 60.4000 0.1494 404.2
66/158
Parametric bootstrap

lrstat <- numeric(1000)


for (i in 1:1000) {
  y <- unlist(simulate(nullmod))
  bnull <- lm(y ~ 1)
  balt <- lmer(y ~ 1 + (1|operator),
               data=pulp, REML=FALSE)
  lrstat[i] <- as.numeric(2*(logLik(balt)-logLik(bnull)))
}

# p-value:
mean(lrstat >2.5684)
[1] 0.02

The effect is significant at 5% level.

67/158
Pulp example

In this example, the p-value obtained from the parametric


bootstrap approach is similar to that from the ANOVA
table (fixed effects model).
The hypotheses for fixed and random effects are different.
It is easier to conclude that there is an effect in a fixed
effects model where the conclusion only applies to the
levels of the factor used in the experiment.
The conclusion about random effects generalizes to a
larger population, hence stronger evidence is required to
obtain significance.

68/158
Prediction

Suppose we want to predict a new value.


If this prediction is for a new operator or an unknown
operator, our best guess will be µ̂ = 60.4.
If we know the operator, we can combine µ̂ with the
estimate of the random effect for that operator to obtain the
empirical best linear unbiased predictor (EBLUP).

69/158
Prediction of the random effects

# Prediction of random effects:


ranef(mmod)$operator
(Intercept)
a -0.1219414
b -0.2591256
c 0.1676695
d 0.2133975

#EBLUPs:
fixef(mmod)+ranef(mmod)$operator
(Intercept)
a 60.27806
b 60.14087
c 60.56767
d 60.61340

70/158
Residuals
Because we can have different fitted values we end up with
more than one type of residual. In the example resid(mmod)
gives residuals as follows:
round(resid(mmod),5)
[1] -0.47806 -0.27806 0.52194 0.52194 -0.47806
[6] 0.34088 0.05912 0.25912 -0.24088 -0.14088
[11] 0.13233 0.13233 -0.06767 0.33233 -0.26767
[16] 0.38660 0.18660 -0.01340 -0.11340 -0.11340

pulp$bright-resid(mmod)
[1] 60.27806 60.27806 60.27806 60.27806 60.27806
[6] 60.14087 60.14087 60.14087 60.14087 60.14087
[11] 60.56767 60.56767 60.56767 60.56767 60.56767
[16] 60.61340 60.61340 60.61340 60.61340 60.61340

We can use these residuals in diagnostic plots.


71/158
Diagnostic plots for pulp data

[Figure: normal QQ-plot of the residuals (Theoretical Quantiles versus Sample
Quantiles, left) and residuals against fitted values (right).]

72/158
Diagnostic plots for pulp data

We can check normality and pick outliers from the


QQ-plot.
We can check the assumption of constant variance from
the residuals versus fitted plot.
If we had more operators, we could also check the
normality and constant variance assumption for the group
level effect too.
In this example the plots indicate no particular problems.

73/158
Mixed models for split-plot designs

Split-plot design

Two factors: A and B.


Factor A is applied to the large experimental units (whole
unit).
The large experimental unit is divided into smaller
experimental units (sub-units).
Factor B is applied to the sub-units.
Each whole unit is a complete replicate of all the levels of
factor B.

74/158
Mixed models for split-plot designs

Why use a split-plot design?

Split-plot experiments are often used out of necessity.


Sometimes a factor, or factorial combination, must be
applied to relatively large experimental units, whereas
other factors are more appropriately applied to sub-units.
Split-plot experiments are also used for convenience. It is
often easier to apply different factors to different sized
units.
They may also be used to increase the precision of the
estimated effect of the factor applied to the sub-units.

75/158
Advantages of split-plot designs

+ They provide greater power for testing the sub-unit


treatment factor and interaction.
+ They allow for different-sized experimental units in the
same experiment.
+ They allow for including a second factor at very little cost.
+ They can be used for experiments involving repeated
measures (from the sub-units) on the same experimental
unit (whole unit).

76/158
Disadvantages of split-plot designs

- Analysis is complicated by the presence of two


experimental error variances and the necessity for several
different standard errors for comparisons.
- High variance and few replications of whole units often
lead to poor sensitivity on the whole-unit factor.

77/158
Example

Water resistance

An experiment was conducted to investigate the effects of


different types of pretreatments and stains on the water
resistance property of wood.
Two types of pretreatments (A and B) and four types of
stains (1, 2, 3 and 4) were included in the study.
Fourteen wood panels were randomly selected and
pretreatment A was applied to seven of them, pretreatment
B was applied to the other seven wood panels.
Each wood panel was divided into four pieces and each of
the four stains was applied to one of the smaller pieces of wood.

78/158
Example
Water resistance data

The water resistance property was characterised by


measuring how long it takes for three drops of water to
pass through the treated materials.
The dataset wood, available from SAS, contains the
following variables:
wood: the identification number of each wood panel in the
study;
pretrt: pretreatment (A or B) applied to the wood panel;
stain: types of stains (1, 2 ,3, or 4) applied to the smaller
piece of wood;
resistance: the time it takes for three drops of water to
pass through the treated materials.

79/158
Example

A split-plot design

This experiment applies each of the pretreatment types (A


and B) to an entire wood panel. Then each panel is cut into
four pieces and the four stain types are applied to the
smaller pieces.
This is a split-plot design. For the pretreatment factor, the
experimental unit is the entire panel, but for the stain
factor, the experimental unit is one of the small pieces cut
from the large panel.

80/158
Example

Quiz
Which of the following factors in the model are fixed and which
random?
wood: the identification number of each wood panel in the
study;
pretrt: pretreatment (A or B) applied to the wood panel;
stain: types of stains (1, 2 ,3, or 4) applied to the smaller
piece of wood.

81/158
Model

yijk = µ + αi + βj + (αβ)ij + wk + eijk


where
yijk is the resistance measurement for the ith pretreatment
(i = 1, 2), jth stain (j = 1, 2, 3, 4) and kth wood panel,
k = 1, . . . , 14;
µ is the overall mean;
αi is the fixed effect associated with the ith pretreatment;
βj is the fixed effect associated with the jth stain;
(αβ)ij is the fixed effect for the pretreatment*stain
interaction;
wk is the random effect associated with the kth wood panel,
assumed i.i.d. N(0, σW²);
eijk is the residual effect, assumed i.i.d. N(0, σE2 ).
82/158
Fitting the model in R

Model without interaction term:


woodres <- read.table("wood.dat", header=TRUE)
woodres$wood <- as.factor(woodres$wood)
woodres$stain <- as.factor(woodres$stain)

# Fit a mixed model to these split-plot type data:

library(lme4)
m2 <- lmer(resistance ~ pretrt + stain
           + (1|wood), data=woodres)
summary(m2)

83/158
R output

Linear mixed model fit by REML


Formula: resistance ˜ pretrt + stain + (1 | wood)

Random effects:
Groups Name Variance Std.Dev.
wood (Intercept) 0.81245 0.90136
Residual 0.81566 0.90314
Number of obs: 56, groups: wood, 14

Fixed effects:
Estimate Std. Error t value
(Intercept) 5.9646 0.4346 13.724
pretrtB 1.3050 0.5389 2.422
stain2 -0.3807 0.3414 -1.115
stain3 -0.9064 0.3414 -2.655
stain4 -1.9714 0.3414 -5.775

84/158
Fixed effects estimates

Intercept corresponds to pretreatment A and stain 1.


The estimated means of all other combinations of
pretreatment and stain level can be worked out from the
output.
A t-value larger than 2 in absolute value roughly
corresponds to a significant effect.
Only certain contrasts can be tested directly from the
output in this way, and there is no multiple comparison
adjustment.

85/158
Types of explanatory variables

In the examples so far, we have focused on categorical


explanatory variables (factors) to answer questions such
as:

Are there differences in strength between adhesives?

Which pretreatment/stain combination is the best for


waterproofing wood?

It is often the case that there are also continuous variables


which must be taken into consideration in order to answer
such questions.

86/158
Analysis of covariance

Analysis of covariance is a (slightly dated) name for a


form of analysis which combines aspects of ANOVA and
regression.

It allows us to evaluate the effect of our categorical variable
of interest on our response variable, while accounting for
additional continuous variables.

The continuous variables are essentially nuisance
variables; we are not interested in their effects, but still
have to account for them in our model.

87/158
Example
Clinical trial for blood pressure drugs

Suppose two drugs were evaluated for the effect of


reducing blood pressure.
For each subject we measure their baseline blood pressure
and the changes in blood pressure after administering one
of the two drugs.
Continuous response: blood pressure (BP).
Explanatory variables: drug (categorical); baseline blood
pressure (continuous).
Is there a difference in mean change in BP between the two
treatment groups, when we compare individuals having the
same baseline BP?

88/158
Possible scenarios

Possible relationships between the response variable and the


treatment and covariate:

the slopes and intercepts for the treatments are the same
the slopes are different, but the intercepts are the same
the slopes and intercepts are different
the intercepts are different, but the slopes are the same
the intercepts are different, but all slopes are zero (a
special case of the previous scenario)

89/158
Possible scenarios

[Figure: five panels of BP Change against Baseline BP for Drug 1 and Drug 2,
one panel per scenario listed above.]

90/158
Example

Silicon wafers

The dataset wafer4 was obtained from the


semiconductor industry.
The experiment was designed to study the effect of
temperature on the deposition rate of a layer of polysilicon
in the fabrication of wafers.
It was thought that the wafer thickness before the
deposition process was applied might have an effect on the
deposition rate.
Therefore, the average thickness of each wafer (thick) was
determined and used as a possible covariate.

91/158
Silicon Wafer Example

Data

A random sample of 24 wafers was collected and used in


the experiment.
Wafers were randomly assigned to one of the three levels
of temperature (900◦ F, 1000◦ F, and 1100◦ F).
As a result, each level of temperature had eight wafers
assigned.
The amount of deposited material at three randomly
chosen sites from each wafer was measured.

92/158
Silicon Wafer Example

Variables in wafer4

temp: temperature (900◦ F, 1000◦ F, and 1100◦ F);


wafer: wafers randomly selected and assigned to one of
the three temperatures;
site: sites on each wafer where the response
measurements were taken (1, 2, and 3);
deposit: the amount of deposited material at each site;
thick: the average thickness of each wafer before the
deposition process.

93/158
Model

yijk = β0 + αi + β1 xij + δi xij + wj(i) + eijk

where
yijk is the deposition rate for the kth site from the jth wafer
assigned to the ith temperature, i = 1, 2, 3, j = 1, . . . , 8
and k = 1, 2, 3;
β0 is the overall intercept;
αi is the coefficient for the ith temperature effect on the
intercept;
β1 is the overall slope;
δi is the coefficient for the ith temperature effect on the
slope;

94/158
Model

Also
xij is the thickness measured on the jth wafer assigned to
the ith temperature.
wj(i) is the random effect for wafer, assumed i.i.d. N(0, σW²)
(wafer effect nested within temperature);
eijk is the site effect, assumed i.i.d. N(0, σE2 ).

95/158
Data

temp wafer site deposit thick


900 1 1 291 1919
900 1 2 295 1919
900 1 3 294 1919
900 2 1 318 2113
900 2 2 315 2113
900 2 3 315 2113
...
1100 8 1 271 2036
1100 8 2 271 2036
1100 8 3 270 2036

96/158
Exploratory Plot

97/158
Initial Impressions

It appears that there is a positive relationship between the


deposition rate and the thickness of the wafers across all
three temperatures.

On average, the deposition rate at 900◦ F seems to be


higher than the deposition rate at other temperatures within
the range of the data.

The slopes do not seem to be the same across three


temperatures.

98/158
Fitting the mixed model in R

We fit a mixed model

mod1 <- lmer(deposit ~ temp + thick + thick:temp
             + (1|wafer), data=data)

Random effects:
Groups Name Variance Std.Dev.
wafer (Intercept) 132.536 11.512
Residual 4.194 2.048

99/158
Fitting the mixed model in R

Fixed effects:
Estimate Std. Error t value
(Intercept) 114.40145 63.83150 1.792
temp1000 -141.12757 89.05137 -1.585
temp1100 84.67028 114.31343 0.741
thick 0.09970 0.03196 3.120
temp1000:thick 0.06371 0.04529 1.407
temp1100:thick -0.05879 0.05774 -1.018

100/158
Comments on output

The summary() command allows us to obtain estimates


for the fixed effect parameters.

This is an over-parameterised model, since more


parameters need to be estimated (8) than there are
independent pieces of information for (6).

To account for this, lme4 uses the first level of factors


temp and thick*temp as a baseline (equal to zero).

The estimates for the other parameters are computed


relative to this fixed estimate.

101/158
Output interpretation

The Intercept term corresponds to the intercept for the


first level of the group variable, in this case temp 900◦ F.

The estimated intercept coefficient for temp 900◦ F is


114.40.

The coefficient for temp 1000◦ F intercept is


114.40 + (−141.13) = −26.73.

The coefficient for temp 1100◦ F intercept is


114.40 + 84.67 = 199.07.

102/158
Output interpretation

The estimate for thick corresponds to the slope for the


first level of the group variable, in this case temp 900◦ F.

The estimated slope for temp 900◦ F is 0.0997.

The slope for temp 1000◦ F is computed as


0.0997 + 0.0637 = 0.1634.

The slope for temp 1100◦ F is computed as


0.0997 + (−0.05879) = 0.0409.

103/158
Fitted regression lines

For temp 900◦ F

deposit = 114.40 + 0.0997 ∗ thick

For temp 1000◦ F

deposit = (114.40 − 141.13) + (0.0997 + 0.0637) ∗ thick


= −26.73 + 0.1634 ∗ thick.

For temp 1100◦ F

deposit = (114.40 + 84.67) + (0.0997 − 0.05879) ∗ thick


= 199.07 + 0.0409 ∗ thick.

104/158
Test of the interaction term

Fixed effects:
Estimate Std. Error t value
(Intercept) 114.40145 63.83150 1.792
temp1000 -141.12757 89.05137 -1.585
temp1100 84.67028 114.31343 0.741
thick 0.09970 0.03196 3.120
temp1000:thick 0.06371 0.04529 1.407
temp1100:thick -0.05879 0.05774 -1.018

The |t-value | < 2 for the interaction terms indicates that


the slopes may not be significantly different.
There may not be enough evidence to warrant a model
with different slopes.

105/158
Model with different intercepts, same slope

We now fit a mixed model with different intercepts, but the


same slope.

mod2 <- lmer(deposit ~ temp + thick +
             (1|wafer), data=data)

Random effects:
Groups Name Variance Std.Dev.
wafer (Intercept) 151.817 12.321
Residual 4.194 2.048

Note that the variance component for wafer has changed


because we have changed the mean model.

106/158
Fixed effects parameter estimates

Fixed effects:
Estimate Std. Error t value
(Intercept) 83.89769 43.89139 1.911
temp1000 -17.14673 6.33695 -2.706
temp1100 -30.79875 6.20972 -4.960
thick 0.11501 0.02191 5.249

107/158
Fitted regression lines

For temp 900◦ F

deposit = 83.8977 + 0.1150 ∗ thick

For temp 1000◦ F

deposit = (83.8977 − 17.1467) + 0.1150 ∗ thick


= 66.7510 + 0.1150 ∗ thick.

For temp 1100◦ F

deposit = (83.8977 − 30.7988) + 0.1150 ∗ thick


= 53.0989 + 0.1150 ∗ thick.

108/158
Adjusting for a covariate

In the presence of a covariate (thickness), we can no longer


simply look at average deposition rates for the three
temperatures.

Instead, we can obtain these at a given covariate value,


e.g. at thickness=2000.

Or we could obtain the means for each temperature


evaluated at the average thickness, x̄·· , by taking
β0 + αi + β1 x̄·· .

These are called adjusted treatment means.

109/158
Adjusted treatment means in R

ls_means(mod2, test.effs=NULL, method.grad='simple')

Least Squares Means table:

Estimate Std. Error df t value lower upper Pr(>|t|)


temp900 309.8568 4.4204 20 70.097 300.6361 319.0776 < 2.2e-16
temp1000 292.7101 4.4382 20 65.953 283.4522 301.9680 < 2.2e-16
temp1100 279.0581 4.3778 20 63.743 269.9261 288.1901 < 2.2e-16

At the mean value of thick, 1964.7, the average amounts


of deposit for temp 900◦ F, 1000◦ F, and 1100◦ F are
309.86, 292.71, and 279.06, respectively.

110/158
Pairwise differences in means

ls_means(mod2, test.effs=NULL, method.grad='simple', pairwise = TRUE)

Least Squares Means table:

Estimate Std. Error df t value


temp900 - temp1000 17.14673 6.33695 20 2.7058
temp900 - temp1100 30.79875 6.20972 20 4.9598
temp1000 - temp1100 13.65202 6.24773 20 2.1851

lower upper Pr(>|t|)


temp900 - temp1000 3.92809 30.36538 0.01360
temp900 - temp1100 17.84550 43.75200 7.539e-05
temp1000 - temp1100 0.61948 26.68456 0.04095

Due to the common slope model, the differences between


deposits at different temperatures will be the same at any
thickness level.

111/158
Comments on the output

At any thickness level, the average amount of deposit for


temp 900◦ F is 17.15 larger than that for temp 1000◦ F.
The average amount of deposit for temp 900◦ F is 30.80
larger than that for temp 1100◦ F.
The average amount of deposit for temp 1000◦ F is 13.65
larger than that for temp 1100◦ F
All pairwise differences are significantly different from
zero.

112/158
Random coefficient models

In the ANCOVA model, the regression coefficients for one


or more continuous explanatory variables are assumed to
be fixed effects.
In a random coefficient model, the regression coefficients
for one or more continuous explanatory variables are
assumed to be random effects.
Data arise from independent subjects or clusters from a
larger population of interest.
The regression model for each subject or cluster can be
assumed to be a random deviation from some population
regression model.

113/158
ANCOVA

[Figure: fitted ANCOVA lines µ(y|x)i = αi + βi x for several treatment groups,
each with its own fixed intercept and slope.]

114/158
Random coefficient model

[Figure: subject-specific regression lines ai + bi x (random coefficients)
scattered around a population regression line.]

115/158
Fixed vs random regression coefficients

ANCOVA graph

The categorical variable (e.g. temperature in the silicon


wafer example) represents all levels of interest; therefore,
it is a fixed effect.

The regression coefficients for each level of the


temperature variable represent unknown fixed parameters
that are estimated from the data.

116/158
Fixed vs random regression coefficients

Random coefficient model graph

The random regression lines for each subject deviate about


the overall population regression line.

The goals of fitting a random coefficient model are


1 to estimate the variances of the intercept and the slope and
any covariance between the two; and
2 to obtain the best linear unbiased predictors (BLUPs) of the
intercept and slope for each subject or cluster.

117/158
Example
Wheat

Ten varieties of wheat were randomly selected from the


population of varieties of hard red winter wheat adapted to
dry climate conditions.
Each variety was randomly assigned to six one-acre plots
of land; thus the experimental units are one-acre plots of
land in a 60-acre field.
It was thought that the pre-plant moisture content of the
plots could have an influence on the germination rate and
hence the eventual yield of the plots.
Therefore, the amount of pre-planting moisture in the top
36 inches of the soil was determined for each plot.

118/158
Wheat Example

Data
The wheat dataset contains the following variables:
id: the identification number for the plots;
variety: ten randomly selected varieties of winter
wheat;
moist: the amount of moisture measured before planting
the varieties on the plots;
yield: the yield of the plot in bushels per acre.

119/158
Yield vs moisture
[Figure: scatter plot of yield against moisture, with points coloured by
variety (1-10).]
120/158
Wheat Example

The response variable is the yield in bushels per acre


(yield), and the continuous explanatory variable is the
measured amount of moisture (moist).

Varieties were randomly selected from the population of


wheat varieties and should be represented by a random
effect.

The resulting regression model for each variety therefore


represents a random sample from the model for the
population of varieties.

Each regression model can be expressed as deviations from


the population model.

121/158
Model

yij = ai + bi xij + eij

where
yij is the yield for the ith variety in the jth plot,
i = 1, . . . , 10 and j = 1, . . . , 6;
xij is the moisture of the ith variety in the jth plot;
ai is the intercept for the ith variety. This is a random effect
because variety is a random effect.
bi is the slope for the ith variety. This is also a random
effect because variety is a random effect.
eij is the random error, assumed i.i.d. N(0, σE2 ).

122/158
Model

For the random intercept and random slope we assume


\begin{pmatrix} a_i \\ b_i \end{pmatrix} \overset{\text{iid}}{\sim}
N\left(\begin{pmatrix} \alpha \\ \beta \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}\right)

The fixed effects of the model are the intercept α and the
slope β.

These are the expected values of the intercepts and slopes


for the population of varieties.

123/158
Mixed model parameterisation
We have shown that

ai = α + a∗i
bi = β + b∗i

We can therefore rewrite the model as:

yij = α + βxij + a∗i + b∗i xij + eij

where
a∗i i.i.d. N(0, σA²),
b∗i i.i.d. N(0, σB²) and
Cov(a∗i , b∗i ) = σAB .


124/158
Mixed model parameterisation

Fixed effects part of the model

α + βxij

Random effects part of the model

a∗i + b∗i xij + eij

125/158
Random Slope, Random Intercept

Variance
  
\mathrm{Var}(y_{ij}) = \begin{pmatrix} 1 & x_{ij} \end{pmatrix}
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\begin{pmatrix} 1 \\ x_{ij} \end{pmatrix} + \sigma_E^2
= \sigma_A^2 + 2\sigma_{AB}x_{ij} + \sigma_B^2 x_{ij}^2 + \sigma_E^2

Covariance

\mathrm{Cov}(y_{ij}, y_{ik}) = \mathrm{Cov}(a_i^* + b_i^* x_{ij} + e_{ij},\; a_i^* + b_i^* x_{ik} + e_{ik})
= \sigma_A^2 + \sigma_{AB}x_{ij} + \sigma_{AB}x_{ik} + \sigma_B^2 x_{ij}x_{ik}
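These formulas as small R helper functions (a sketch; in practice the variance
components would be replaced by their estimates from the fitted model):

var_y <- function(x, sA2, sAB, sB2, sE2) sA2 + 2 * sAB * x + sB2 * x^2 + sE2
cov_y <- function(xj, xk, sA2, sAB, sB2) sA2 + sAB * (xj + xk) + sB2 * xj * xk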

126/158
Wheat Data
id variety yield moist
1 1 41 10
2 1 69 57
3 1 53 32
4 1 66 52
5 1 64 47
6 1 64 48
7 2 49 30
8 2 44 21
...
59 10 67 48
60 10 74 59

Note: to ensure numerical stability, it is a good idea to scale our


covariate (moist) to take values between 1 and 10.

127/158
Fitting the random coefficient model in R

m1 <- lmer(yield ~ moist + (moist|variety))

Random effects:
Groups Name Variance Std.Dev. Corr
variety (Intercept) 18.8947 4.3468
moist 0.2394 0.4893 -0.34
Residual 0.3521 0.5933

Fixed effects:
Estimate Std. Error t value
(Intercept) 33.4339 1.3985 23.91
moist 6.6166 0.1678 39.42

128/158
Fixed effects in lmer

The term moist in the formula generates a model matrix


X with two columns: the intercept column (all 1s) and the
numeric moist column.

Note that the intercept column is included by default.

If we want to fit the model without an intercept, we must


specify that using 0 + moist or -1 + moist.

129/158
Random effects in lmer

Our random effect terms are generated by


(moist|variety).

The second part of this term ( |variety) tells R to


generate random effect(s) for each of the 10 unique levels
of the variety parameter.

The first part (moist| ) determines the structure of


these random effect terms.

130/158
Random effects in lmer
Until now, we have only used random effects of the form
(1|variety).

The 1 tells R to generate a set of univariate random effects


at the intercept level.

This time, our random effects take the form


(moist|variety).

Again, this includes an intercept by default, and could be


rewritten as (1 + moist|variety).

This therefore tells R to generate a pair of random effects


for each vector - one for the slope (moist) and one for the
intercept.
131/158
Random effects in lmer

The pair of random effects generated by


(moist|variety) are correlated, i.e. there is a
correlation between the slope and intercept effects.

If we want uncorrelated effects, then we must instead


include them as two separate terms:
(1|variety) +(0 + moist|variety).

Similar to the fixed effects, we tell R not to include an


intercept effect using 0 or -1.

132/158
Output: fixed effects

The summary() function provides the following:


Fixed effects:
Estimate Std. Error t value
(Intercept) 33.4339 1.3985 23.91
moist 6.6166 0.1678 39.42

We obtain estimates for our intercept and slope


parameters: α̂ = 33.43 and β̂ = 6.62.

From the t-value, it is clear that there is a significant


relationship between moisture and yield.

133/158
Output: random effects

The summary() function also provides the following:


Random effects:
Groups Name Variance Std.Dev. Corr
variety (Intercept) 18.8947 4.3468
moist 0.2394 0.4893 -0.34
Residual 0.3521 0.5933

We obtain estimates for our variance parameters:


σ̂A2 = 18.8947, σ̂B2 = 0.2394 and σ̂E2 = 0.3521.

We can also compute the covariance σ̂AB as:


σ̂A σ̂B ρ̂AB = √18.8947 × √0.2394 × (−0.34) = −0.727.

134/158
Output: random effects
The ranef() function provides estimates for each of our
random effect terms.
(Intercept) moist
1 0.9577955 -0.4921125
2 -2.2842770 -0.6669726
3 -0.4081197 0.6722278
4 0.6960210 -0.2330618
5 1.1159079 -0.1990372
6 4.6391469 0.2388880
7 -10.7300464 0.5642359
8 2.4011660 0.2243375
9 -0.1762124 0.2335679
10 3.7886182 -0.3420729

The first and second columns contain our intercept effects


â∗i and slope effects b̂∗i respectively.

135/158
Output: random effects

We can use our estimates α̂, β̂, â∗i and b̂∗i to construct a
unique fitted line for each variety i.

For example, we can construct the line for variety 1 as


follows:

EBLUP1 = α̂ + β̂x + â∗1 + b̂∗1 x


= 33.43 + 6.62x + 0.96 + (−0.49)x
= 34.39 + 6.13x

Each variety has a unique intercept and slope, each of


which vary around a common mean.
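One convenient shortcut is lme4's coef() method, which adds the predicted
random effects to the fixed effects and so returns each variety's fitted
intercept and slope directly:

coef(m1)$variety
# e.g. variety 1: intercept 33.43 + 0.96 = 34.39, slope 6.62 - 0.49 = 6.13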

136/158
Fitted lines for each variety
[Figure: fitted EBLUP lines for each of the ten varieties, overlaid on the
yield versus moisture data.]

137/158
Likelihood ratio test

The correlation between the intercept and slope random


effects in our model was -0.34.

We may wish to consider a model with uncorrelated


random effects.

We can use a likelihood ratio test to compare these two


models and decide whether the correlation is necessary.

Note that this test is OK since we are testing ρ = 0, which


is not on the boundary of the parameter space for a
correlation parameter [−1, 1].

138/158
Likelihood ratio test

We can carry out the likelihood ratio test in R using the


anova() function.
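The uncorrelated model m2 is not fitted explicitly in these slides; a possible
call, matching the formulas shown in the anova() output below, would be:

m2 <- lmer(yield ~ moist + (1 | variety) + (0 + moist | variety))
m3 <- lmer(yield ~ moist + (1 | variety))   # used in the later comparison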
anova(m2,m1)

Models:
m2: yield ˜ moist + (1 | variety) + (0 + moist | variety)
m1: yield ˜ moist + (moist | variety)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m2 5 193.10 203.57 -91.548 183.10
m1 6 194.06 206.62 -91.028 182.06 1.0411 1 0.3076

We obtain a p-value of 0.31, which suggests that the


simpler uncorrelated model may be used.

139/158
Likelihood ratio test
We can carry out a similar test to see whether we can
remove the random slope from our model.

anova(m3,m2)

Models:
m3: yield ˜ moist + (1 | variety)
m2: yield ˜ moist + (1 | variety) + (0 + moist | variety)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m3 4 208.26 216.64 -100.129 200.26
m2 5 193.10 203.57 -91.548 183.10 17.162 1 3.432e-05 ***

We obtain a p-value << 0.05, which suggests that the


random slope is necessary.

We can do a similar test to see whether the random


intercept is necessary (p << 0.05 again).

140/158
Hierarchical linear models

In hierarchical or multilevel models, we have a nested


data structure.

We therefore have a model design which accounts for this


nesting by considering each level of the data in turn.

Level 1 of the model corresponds to the smallest sized


units.

Level 2 of the model corresponds to the first grouping


factor, such that Level 1 is nested within Level 2.

If we have more than two levels, then Level 2 is nested


within Level 3, and so on.

141/158
Hierarchical linear models

In the analysis of such data we fit a random coefficient


model at Level 1.

The coefficients for this Level 1 model are modelled as a


function of the Level 2 variables.

We continue this pattern if there are more levels and finally


combine models from all the levels.

142/158
Example

Test score gains

Data was collected for 3,111 eighth-grade students in the


US.
The students’ test score gains (Gain) on one of the
mathematics achievement tests were recorded.
In addition, the sum of some pretest core items
(PreTotal) on the same students was also recorded.
These students were grouped into 159 classes.
A variable measuring the percent of the class with a
sufficient degree of mastery of previous curricula
(Tmastry) was recorded for each class.

143/158
Test Score Example

Data
The mathscore dataset contains the following variables:
Gain: the test score gains on a mathematics achievement
test for each student;
PreTotal: the sum of some pretest core items for each
student;
Class: the class each student belongs to;
Tmastry: the percent of class mastering previous
curricula.

144/158
Test Score Example

Nested structure

Students are nested within classes.

Students and classes are often considered as random


effects.

In this example, the student effect is modelled by the


residuals.

Measurements were taken at both student and class levels.

145/158
Model at the student level

The response variable, the test score gains (Gain), was


measured at the student level.
The explanatory variable, the sum of some pretest scores
(PreTotal), was also measured at the student level.
A linear regression model might be appropriate to fit the
data.
Because classes are considered as random effects, a
random coefficient model may be reasonable.
In this random coefficient model, the coefficients for each
class (intercept and slope) represent a deviation from some
population regression model.

146/158
Model at the student level

The model at student level is therefore:

yij = aj + bj xij + eij

where
yij is the gain for the ith student in the jth class,
i = 1, . . . , nj and j = 1, . . . , 159;
xij is the sum of pretest scores of the ith student in the jth
class;
aj is the intercept for the jth class. This is a random effect
because class is a random effect.
bj is the slope for the jth class. This is also a random effect.
eij is the random error, assumed i.i.d. N(0, σE2 ).

147/158
Model at the student level

For the random intercept and random slope we assume


\begin{pmatrix} a_j \\ b_j \end{pmatrix} \overset{\text{iid}}{\sim}
N\left(\begin{pmatrix} \alpha_0 \\ \beta_0 \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}\right)

The fixed effects of the model are the intercept α0 and the
slope β0 .

These are the expected values of the intercepts and slopes


for the population of classes.

148/158
Model at the student level
We can rewrite this model as:

yij = α0 + β0 xij + a∗j + b∗j xij + eij

where

aj = α0 + a∗j and
bj = β0 + b∗j

with
a∗j i.i.d. N(0, σA²),
b∗j i.i.d. N(0, σB²) and
Cov(a∗j , b∗j ) = σAB .


149/158
Model at the class level

The percentage of the class mastering previous curricula


(Tmastry) is measured at the class level.

This effect can be incorporated into the model for the


intercept and slope for each class:

aj = α0 + α1 zj + a∗j
bj = β0 + β1 zj + b∗j

where zj is the Tmastry for class j, α0 and β0 are fixed


intercepts and α1 and β1 fixed slope parameters.

150/158
Model at the class level

Here we are rewriting the random coefficients for the


intercept and slope to incorporate the Tmastry effect
measured at class level.

The distributional assumptions on a∗j and b∗j are

a∗j i.i.d. N(0, σA²),
b∗j i.i.d. N(0, σB²) and
Cov(a∗j , b∗j ) = σAB .


151/158
Multilevel model

We can combine the student-level model and the


class-level equations

yij = aj + bj xij + eij


aj = α0 + α1 zj + a∗j
bj = β0 + β1 zj + b∗j

to produce a single equation involving effects at two levels


(student and class levels):

yij = α0 + α1 zj + β0 xij + β1 zj xij + a∗j + b∗j xij + eij

152/158
Mathscore data

Class Tmastry Gain PreTotal


1 50 2 20
1 50 -3 18
1 50 5 12
1 50 1 9
1 50 -3 11
1 50 3 12
...
159 95 5 12
159 95 3 16
159 95 12 12

Note: to ensure numerical stability, it is a good idea to scale our


covariates (Tmastry and PreTotal) to take values between 1 and
10.

153/158
Fitting the multilevel model in R

m1 <- lmer(Gain ~ PreTotal + Tmastry +
           PreTotal:Tmastry + (PreTotal|Class))

Random effects:
Groups Name Variance Std.Dev. Corr
Class (Intercept) 9.0284 3.005
PreTotal 0.7796 0.883 -0.82
Residual 21.6545 4.653

Fixed effects:
Estimate Std. Error t value
(Intercept) -1.494221 1.341034 -1.114
PreTotal -1.602810 0.652153 -2.458
Tmastry 1.131062 0.176961 6.392
PreTotal:Tmastry -0.006142 0.084758 -0.072

154/158
Output: fixed effects

Fixed effects:
Estimate Std. Error t value
(Intercept) -1.494221 1.341034 -1.114
PreTotal -1.602810 0.652153 -2.458
Tmastry 1.131062 0.176961 6.392
PreTotal:Tmastry -0.006142 0.084758 -0.072

Our parameter estimates are α̂0 = −1.49, β̂0 = −1.60,


α̂1 = 1.13 and β̂1 = −0.0062.

The interaction term PreTotal:Tmastry has a very


small t-value. This term is not significant and can be
dropped from the model.

155/158
Output: random effects

Random effects:
Groups Name Variance Std.Dev. Corr
Class (Intercept) 9.0284 3.005
PreTotal 0.7796 0.883 -0.82
Residual 21.6545 4.653

The estimated variance components are


σ̂A2 = 9.0284
σ̂B2 = 0.7795
σ̂AB = σ̂A σ̂B ρ̂AB = −2.1861
σ̂E2 = 21.6545
Note the negative correlation between intercept and slope: larger intercepts
tend to have smaller slopes.

156/158
Likelihood ratio test

We can carry out a likelihood ratio test to see if we need


correlation between our slope and intercept random
effects.

Models:
m3: Gain ˜ PreTotal + Tmastry + (1 | Class) +
m3: (0 + PreTotal | Class)
m2: Gain ˜ PreTotal + Tmastry + (PreTotal | Class)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m3 6 18681 18717 -9334.6 18669
m2 7 18673 18716 -9329.6 18659 9.8694 1 0.001681 **

We obtain a p-value << 0.05, which suggests that the


correlation is necessary.

157/158
EBLUPs for (some) random effects

Partial output showing some of the estimates â∗j and b̂∗j :

(Intercept) PreTotal
1 -1.92675077 0.0119876249
2 -5.39003631 0.0915662399
3 -1.00595155 0.0094430992
4 -0.38172198 0.0321817612
5 1.58481753 -0.0650445905
6 0.13879856 -0.0187367181
7 -0.87685526 0.0031362848
8 -2.18841662 0.0299758090

158/158
