2 - An Introduction To Multilevel Data Structure
24 Multilevel Modeling Using R
only one level of a higher-level variable such as school. Thus, students are
nested within school. Such designs can be contrasted with a crossed data
structure whereby individuals at the first level appear in multiple levels of
the second variable. In our example, students might be crossed with after-
school organizations if they are allowed to participate in more than one.
For example, a given student might be on the basketball team as well as in
the band. The focus of this book is almost exclusively on nested designs,
which give rise to multilevel data. Other examples of nested designs might
include a survey of job satisfaction for employees from multiple depart-
ments within a large business organization. In this case, each employee
works within only a single division in the company, which leads to a
nested design. Furthermore, it seems reasonable to assume that employ-
ees working within the same division will have correlated responses on
the satisfaction survey, as much of their view regarding the job would
be based exclusively upon experiences within their division. For a third
such example, consider the situation in which clients of several psycho-
therapists working in a clinic are asked to rate the quality of each of their
therapy sessions. In this instance, there exist three levels in the data: time,
in the form of individual therapy session, client, and therapist. Thus, session
is nested in client, who in turn is nested within therapist. Nesting at each
of these levels would be expected to lead to correlated scores on a
therapy-rating instrument.
Intraclass Correlation
In cases where individuals are clustered or nested within a higher-level unit
(e.g. classrooms, schools, school districts), it is possible to estimate the cor-
relation among individuals’ scores within the cluster/nested structure using
the intraclass correlation (denoted ρI in the population). The ρI is a measure
of the proportion of variation in the outcome variable that occurs between
groups relative to the total variation present and ranges from 0 (no variance
between clusters) to 1 (variance between clusters but none within clusters).
ρI can also be conceptualized as the correlation on the dependent measure
for two individuals randomly selected from the same cluster. It can be
expressed as
$$\rho_I = \frac{\tau^2}{\tau^2 + \sigma^2} \tag{2.1}$$
where
τ² = Population variance between clusters
σ² = Population variance within clusters
$$\hat{\sigma}^2 = \frac{\sum_{j=1}^{C} (n_j - 1) S_j^2}{N - C} \tag{2.2}$$
where
$$S_j^2 = \text{variance within cluster } j = \frac{\sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2}{n_j - 1}$$
nj = sample size for cluster j
N = total sample size
C = total number of clusters
$$\hat{S}_B^2 = \frac{\sum_{j=1}^{C} n_j (\bar{y}_j - \bar{y})^2}{n(C - 1)} \tag{2.3}$$
where
ȳj = mean on response variable for cluster j
ȳ = overall mean on response variable
$$n = \frac{1}{C - 1}\left[N - \frac{\sum_{j=1}^{C} n_j^2}{N}\right]$$
$$\hat{\tau}^2 = \hat{S}_B^2 - \frac{\hat{\sigma}^2}{n}. \tag{2.4}$$
Using these variance estimates, we can in turn calculate the sample estimate
of ρI:
$$\hat{\rho}_I = \frac{\hat{\tau}^2}{\hat{\tau}^2 + \hat{\sigma}^2}. \tag{2.5}$$
Note that Equation (2.5) assumes that the clusters are of equal size. This
will not always be the case, and when it is not, the equation will not hold.
However, the purpose of including it here is to demonstrate the principle
underlying the estimation of ρI, which holds even as the equation might
change.
In order to illustrate estimation of ρI, let us consider the following dataset.
Achievement test data were collected from 10,903 third-grade examinees
nested within 160 schools. School sizes range from 11 to 143, with a mean
size of 68.14. In this case, we will focus on the reading achievement test
score, and will use data from only five of the schools, in order to make the
calculations by hand easy to follow. First, we will estimate σ̂². To do so,
we must estimate the variance in scores within each school. These values
appear in Table 2.1.
TABLE 2.1
School Size, Mean, and Variance of Reading Achievement Test
School    N      Mean     Variance
767       58     3.952    5.298
785       29     3.331    1.524
789       64     4.363    2.957
815       39     4.500    6.088
981       88     4.236    3.362
Total     278    4.149    3.916
$$\hat{\sigma}^2 = \frac{\sum_{j=1}^{C} (n_j - 1) S_j^2}{N - C} = \frac{(58 - 1)5.3 + (29 - 1)1.5 + (64 - 1)2.9 + (39 - 1)6.1 + (88 - 1)3.4}{278 - 5}$$
$$= \frac{302.1 + 42 + 182.7 + 231.8 + 295.8}{273} = \frac{1054.4}{273} = 3.9$$
The school means, which are needed in order to calculate $\hat{S}_B^2$, appear in
Table 2.1 as well. First, we must calculate n:
$$n = \frac{1}{C - 1}\left[N - \frac{\sum_{j=1}^{C} n_j^2}{N}\right] = \frac{1}{5 - 1}\left[278 - \frac{58^2 + 29^2 + 64^2 + 39^2 + 88^2}{278}\right] = \frac{1}{4}(278 - 63.2) = 53.7$$
TABLE 2.2
Between Subjects Intercept and Slope, and within Subjects Variation on
These Parameters by School
School     Intercept    U0j       Slope    U1j
1          1.230        −1.129    0.552    0.177
2          2.673        0.314     0.199    −0.176
3          2.707        0.348     0.376    0.001
4          2.867        0.508     0.336    −0.039
5          2.319        −0.040    0.411    0.036
Overall    2.359                  0.375
Using this value, we can then calculate $\hat{S}_B^2$ for the five schools in our small
sample using Equation (2.3):
$$\hat{S}_B^2 = \frac{58(3.952 - 4.149)^2 + 29(3.331 - 4.149)^2 + 64(4.363 - 4.149)^2 + 39(4.500 - 4.149)^2 + 88(4.236 - 4.149)^2}{53.7(5 - 1)}$$
$$= \frac{2.251 + 19.405 + 2.931 + 4.805 + 0.666}{214.8} = \frac{30.057}{214.8} = 0.140$$
Then, from Equation (2.4):
$$\hat{\tau}^2 = 0.140 - \frac{3.9}{53.7} = 0.140 - 0.073 = 0.067$$
We have now calculated all of the parts that we need to estimate ρI for the
population,
$$\hat{\rho}_I = \frac{0.067}{0.067 + 3.9} = 0.017$$
This result indicates that there is very little correlation of examinees’ test
scores within the schools. We can also interpret this value as the proportion
of variation in the test scores that is accounted for by the schools.
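The hand calculation above can be checked with a short script. The following is a minimal sketch in Python, offered only as an illustration: the function name `icc_from_summaries` and the variable `table_2_1` are hypothetical, and the grand mean is assumed to be the size-weighted mean of the school means. It applies Equations (2.2) through (2.5) to the unrounded values in Table 2.1.

```python
# Per-school summaries from Table 2.1: (cluster size, mean, variance).
table_2_1 = [
    (58, 3.952, 5.298),  # school 767
    (29, 3.331, 1.524),  # school 785
    (64, 4.363, 2.957),  # school 789
    (39, 4.500, 6.088),  # school 815
    (88, 4.236, 3.362),  # school 981
]


def icc_from_summaries(rows):
    """Estimate the ICC from (size, mean, variance) summaries per cluster.

    Hypothetical helper: follows Equations (2.2)-(2.5) of the text.
    """
    sizes = [n for n, _, _ in rows]
    N = sum(sizes)   # total sample size
    C = len(rows)    # number of clusters

    # Equation (2.2): pooled within-cluster variance.
    sigma2 = sum((n - 1) * var for n, _, var in rows) / (N - C)

    # Assumption: the overall mean is the size-weighted mean of cluster means.
    grand_mean = sum(n * mean for n, mean, _ in rows) / N

    # The quantity the text calls n (an adjusted average cluster size).
    n_adj = (N - sum(n * n for n in sizes) / N) / (C - 1)

    # Equation (2.3): between-cluster variance of the means.
    s_b2 = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in rows) / (n_adj * (C - 1))

    # Equation (2.4) and Equation (2.5).
    tau2 = s_b2 - sigma2 / n_adj
    rho = tau2 / (tau2 + sigma2)
    return {"sigma2": sigma2, "n": n_adj, "s_b2": s_b2, "tau2": tau2, "rho": rho}


result = icc_from_summaries(table_2_1)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because the script uses the unrounded variances, its within-cluster variance is closer to 3.86 than to the 3.9 obtained by hand, but the resulting ICC still rounds to about 0.017.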
Given that ρ̂I is a sample estimate, we know that it is subject to sampling
variation, which can be estimated with a standard error as in Equation (2.6):
$$s_{\rho_I} = (1 - \rho_I)\left(1 + (n - 1)\rho_I\right)\sqrt{\frac{2}{n(n - 1)(N - 1)}}. \tag{2.6}$$
The terms in Equation (2.6) are as defined previously, and the assumption is
that all clusters are of equal size. As noted earlier in the chapter, equal
cluster sizes are not a requirement, and an alternative formulation exists
for cases in which this condition does not hold. Nonetheless, Equation (2.6)
provides sufficient insight for our purposes into the estimation of the
standard error of the ICC.
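As an illustration only, the standard error can be written as a small Python function. This sketch assumes the square-root form of Equation (2.6) and takes n and N exactly as the text defines them; the function name `icc_standard_error` is hypothetical, and the inputs in the example call are simply the rounded values from the five-school calculation.

```python
import math


def icc_standard_error(rho, n, N):
    """Standard error of the ICC, assuming the square-root form of Equation (2.6).

    rho: estimated intraclass correlation
    n:   cluster size term, as defined in the text
    N:   total sample size, as defined in the text
    """
    return (1 - rho) * (1 + (n - 1) * rho) * math.sqrt(2.0 / (n * (n - 1) * (N - 1)))


# Illustrative call with the rounded values from the five-school example.
se = icc_standard_error(rho=0.017, n=53.7, N=278)
print(f"standard error: {se:.4f}")
```

The first factor shrinks the standard error toward 0 as the ICC approaches 1, while larger values of n and N reduce it through the square-root term.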
The ICC is an important tool in multilevel modeling, in large part because
it is an indicator of the degree to which the multilevel data structure might
impact the outcome variable of interest. Larger values of the ICC are indica-
tive of a greater impact of clustering. Thus, as the ICC increases in value, we
must be more cognizant of employing multilevel modeling strategies in our
data analysis. In the next section, we will discuss the problems associated
with ignoring this multilevel structure, before we turn our attention to meth-
ods for dealing with it directly.
Random Intercept
As we transition from the one-level regression framework of Chapter 1 to
the MLM context, let’s first revisit the basic simple linear regression model
of Equation (1.1), y = β0 + β1x + ε. Here, the dependent variable y is expressed
as a function of an independent variable, x, multiplied by a slope coefficient,
β1, an intercept, β0, and random variation from subject to subject, ε.
We defined the intercept as the conditional mean of y when the value of x
is 0. In the context of a single-level regression model such as this, there is
one intercept that is common to all individuals in the population of inter-
est. However, when individuals are clustered together in some fashion (e.g.
within classrooms, schools, organizational units within a company), there
will potentially be a separate intercept for each of these clusters; that is,
there may be different means for the dependent variable for x = 0 across the
different clusters. We say potentially here because if there is in fact no cluster
effect, then the single intercept model of 1.1 will suffice. In practice, assess-
ing whether there are different means across the clusters is an empirical
question, which we describe below. It should also be noted that in this dis-
cussion we are considering only the case where the intercept is cluster spe-
cific, but it is also possible for β1 to vary by group, or even other coefficients
from more complicated models.
Allowing for group-specific intercepts and slopes leads to the following
notation commonly used for the level 1 (micro level) model in multilevel
modeling:
$$y_{ij} = \beta_{0j} + \beta_{1j} x + \varepsilon_{ij} \tag{2.7}$$
where the subscripts ij refer to the ith individual in the jth cluster. As we con-
tinue our discussion of multilevel modeling notation and structure, we will
begin with the most basic multilevel model: predicting the outcome from just
an intercept which we will allow to vary randomly for each group.
$$y_{ij} = \beta_{0j} + \varepsilon_{ij}. \tag{2.8}$$
$$\beta_{0j} = \gamma_{00} + U_{0j}. \tag{2.9}$$
$$y = \gamma_{00} + U_{0j} + \beta_1 x + \varepsilon. \tag{2.10}$$
Equation (2.10) is termed the full or composite model in which the multiple
levels are combined into a unified equation.
Often in MLM, we begin our analysis of a dataset with this simple random
intercept model, known as the null model, which takes the form
$$y_{ij} = \gamma_{00} + U_{0j} + \varepsilon_{ij}. \tag{2.11}$$
While the null model does not provide information regarding the impact
of specific independent variables on the dependent, it does yield important
information regarding how variation in y is partitioned between variance
among the individuals σ² and variance among the clusters τ². The total
variance of y is simply the sum of σ² and τ². In addition, as we have already seen,
these values can be used to estimate ρI. The null model, as will be seen in
later sections, is also used as a baseline for model building and comparison.
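To make the variance partition concrete, the following Python sketch simulates data from the null model in Equation (2.11) and recovers the two variance components with the equal-cluster-size estimators described earlier in the chapter. The parameter values, the seed, and the function names `simulate_null_model` and `variance_components` are all assumptions chosen for illustration, and normal distributions are assumed for both U0j and εij.

```python
import random
import statistics


def simulate_null_model(n_clusters, cluster_size, gamma00, tau2, sigma2, seed=1):
    """Draw y_ij = gamma00 + U_0j + e_ij with normal U_0j and e_ij (assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        u0 = rng.gauss(0.0, tau2 ** 0.5)  # cluster-level deviation U_0j
        cluster = [gamma00 + u0 + rng.gauss(0.0, sigma2 ** 0.5)
                   for _ in range(cluster_size)]
        data.append(cluster)
    return data


def variance_components(data):
    """Estimate (tau2, sigma2) for clusters of equal size."""
    n = len(data[0])
    # Pooled within-cluster variance (equal sizes, so a simple average).
    sigma2_hat = statistics.fmean([statistics.variance(c) for c in data])
    # Variance of the cluster means, corrected for within-cluster noise.
    cluster_means = [statistics.fmean(c) for c in data]
    tau2_hat = statistics.variance(cluster_means) - sigma2_hat / n
    return tau2_hat, sigma2_hat


data = simulate_null_model(n_clusters=200, cluster_size=50,
                           gamma00=10.0, tau2=1.0, sigma2=4.0)
tau2_hat, sigma2_hat = variance_components(data)
icc_hat = tau2_hat / (tau2_hat + sigma2_hat)
print(f"tau2: {tau2_hat:.2f}, sigma2: {sigma2_hat:.2f}, ICC: {icc_hat:.2f}")
```

With these settings the true ICC is 1 / (1 + 4) = 0.2, so the printed estimate should land near that value, subject to sampling variation.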
Random Slopes
It is a simple matter to expand the random intercept model in 2.9 to accom-
modate one or more independent predictor variables. As an example, if we
add a single predictor (xij) at the individual level (level 1) to the model, we
obtain
$$y_{ij} = \beta_{0j} + \beta_{1j} x + \varepsilon_{ij} \tag{2.13}$$
Level 2:
$$\beta_{0j} = \gamma_{00} + U_{0j} \tag{2.14}$$
$$\beta_{1j} = \gamma_{10} \tag{2.15}$$
This model now includes the predictor and the slope relating it to the depen-
dent variable, γ10, which we acknowledge as being at level 1 by the subscript
10. We interpret γ10 in the same way that we did β1 in the linear regression
model; i.e. a measure of the impact on y of a 1-unit change in x. In addition,
we can estimate ρI exactly as before, though now it reflects the correlation
between individuals from the same cluster after controlling for the independent
variable, x. In this model, both γ10 and γ00 are fixed effects, while σ² and
τ² remain random.
One implication of the model in 2.12 is that the dependent variable is
impacted by variation among individuals (σ2), variation among clusters (τ2),
an overall mean common to all clusters (γ00), and the impact of the indepen-
dent variable as measured by γ10, which is also common to all clusters. In
practice there is no reason that the impact of x on y would need to be com-
mon for all clusters, however. In other words, it is entirely possible that rather
than a single γ10 common to all clusters, there is actually a unique effect for
the cluster of γ10 + U1j, where γ10 is the average relationship of x with y across
clusters, and U1j is the cluster-specific variation of the relationship between
the two variables. This cluster-specific effect is assumed to have a mean of 0
and to vary randomly around γ10. The random slopes model is
$$y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + U_{0j} + U_{1j} x_{ij} + \varepsilon_{ij}. \tag{2.16}$$
Written in this way, we have separated the model into its fixed (γ00 + γ10xij)
and random (U0j + U1jxij + εij) components. Model 2.16 simply states that
there is an interaction between cluster and x, such that the relationship of x
and y is not constant across clusters.
Heretofore we have discussed only one source of between-group variation,
which we have expressed as τ², and which is the variation among
clusters in the intercept. However, Model 2.16 adds a second such source of
$$\hat{\tau}_1^2 = \frac{\sum_{j=1}^{J} (U_{1j} - \bar{U}_{1\cdot})^2}{J - 1} \tag{2.17}$$
for the slopes, and an analogous equation for the intercept random variance.
Doing so, we obtain $\tau_0^2 = 0.439$ and $\tau_1^2 = 0.016$. In other words, much more
variation exists across clusters in the intercepts than in the slopes.
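These two values can be reproduced from the U0j and U1j columns of Table 2.2. The Python sketch below is an illustration rather than the authors' code, and the variable names are hypothetical. It relies on `statistics.variance`, which divides by J − 1 and so matches the form of Equation (2.17).

```python
import statistics

# Cluster-specific deviations from Table 2.2.
u0 = [-1.129, 0.314, 0.348, 0.508, -0.040]  # intercept deviations U_0j
u1 = [0.177, -0.176, 0.001, -0.039, 0.036]  # slope deviations U_1j

# Sample variance with a J - 1 denominator, as in Equation (2.17).
tau0_sq = statistics.variance(u0)
tau1_sq = statistics.variance(u1)

print(f"tau_0^2 = {tau0_sq:.3f}")
print(f"tau_1^2 = {tau1_sq:.3f}")
```

The intercept variance comes out at roughly 27 times the slope variance for these five schools.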
Centering
Centering simply refers to the practice of subtracting the mean of a variable
from each individual value. As a result, the sample mean of the centered
variable is 0, and each individual’s (centered) score represents a deviation
from the mean, rather than whatever meaning its raw
value might have. In the context of regression, centering is commonly used,
for example, to reduce collinearity caused by including an interaction term
in a regression model. If the raw scores of the independent variables are used
to calculate the interaction, and then both the main effects and interaction
terms are included in the subsequent analysis, it is very likely that collin-
earity will cause problems in the standard errors of the model parameters.
Centering is a way to help avoid such problems (e.g. Iversen, 1991). Such
issues are also important to consider in MLMs, in which interactions are
frequently employed. In addition, centering is also a useful tool for avoid-
ing collinearity caused by highly correlated random intercepts and slopes
in MLMs (Wooldridge, 2004). Finally, centering provides a potential advantage
in terms of interpretation of results. Remember from our discussion in
Chapter 1 that the intercept is the value of the dependent variable when the
independent variable is set equal to 0. In many applications, however, the
independent variable cannot reasonably be 0 (e.g. a measure of vocabulary),
which essentially renders the intercept a value that is necessary for fitting
the regression line but that has no ready interpretation. When x has been
centered, the intercept instead takes on the value of the dependent
variable when the independent variable is at its mean. This is a much more
useful interpretation for researchers in many situations, and yet another
reason why centering is an important aspect of modeling, particularly in the
multilevel context.
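The effect of centering on the intercept can be seen in a small single-level example. The Python sketch below uses made-up vocabulary and reading scores (hypothetical data, not taken from the book) and a hand-written simple regression helper, `ols`, to compare the fit before and after grand-mean centering.

```python
import statistics


def ols(x, y):
    """Simple linear regression by least squares; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical scores for six examinees (illustration only).
vocab = [12.0, 15.0, 9.0, 20.0, 14.0, 17.0]
reading = [3.1, 3.9, 2.5, 5.2, 3.6, 4.4]

# Grand-mean centering: subtract the sample mean from every value.
vocab_mean = statistics.fmean(vocab)
vocab_centered = [v - vocab_mean for v in vocab]

raw_intercept, raw_slope = ols(vocab, reading)
cen_intercept, cen_slope = ols(vocab_centered, reading)

print(f"raw:      intercept = {raw_intercept:.3f}, slope = {raw_slope:.3f}")
print(f"centered: intercept = {cen_intercept:.3f}, slope = {cen_slope:.3f}")
print(f"mean reading score  = {statistics.fmean(reading):.3f}")
```

The slope is unchanged by centering, while the centered intercept equals the mean reading score, that is, the predicted value for an examinee with an average vocabulary score.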
assume about the data, and how they differ from one another. For the tech-
nical details we refer the interested reader to Bryk and Raudenbush (2002)
or de Leeuw and Meijer (2008), both of which provide excellent resources
for those desiring more in-depth coverage of these methods. Our purpose
here is to provide the reader with a conceptual understanding that will aid
in applying MLMs in practice.
the number of level-2 clusters increases, the difference in value for MLE and
REML estimates becomes very small (Snijders and Bosker, 1999).
and
Level 2:
$$\beta_{hj} = \gamma_{h0} + \gamma_{h1} z_j + U_{hj}. \tag{2.19}$$
The additional piece of Equation (2.19) is γh1zj, the product of the slope
(γh1) and the average vocabulary score for the school (zj). In other
words, the mean school performance is related directly to the coefficient
linking the individual vocabulary score to the individual reading score. For
our specific example, we can combine 2.18 and 2.19 in order to obtain a single
equation for the two-level MLM.
Each of these model terms has been defined previously in the chapter: γ00
is the intercept or the grand mean for the model, γ10 is the fixed effect of
variable x (vocabulary) on the outcome, U0j represents the random variation
for the intercept across groups, and U1j represents the random variation for
the slope across groups. The additional pieces of the equation in 2.13 are γ01
and γ11. γ01 represents the fixed effect of level-2 variable z (average vocabu-
lary) on the outcome. γ11 represents the slope for, and value of, the average
vocabulary score for the school. The new term in Model 2.20 is the cross-
level interaction, γ1001xijzj. As the name implies, the cross-level interaction
is simply the interaction between the level-1 and level-2 predictors. In this
context, it is the interaction between an individual’s vocabulary score and
the mean vocabulary score for their school. The coefficient for this interaction
term, γ1001, assesses the extent to which the relationship between an
examinee’s vocabulary score and reading achievement is moderated by the mean
vocabulary score for the school that they attend. A large significant value
for this coefficient would indicate that
the relationship between a person’s vocabulary test score and their overall
reading achievement is dependent on the level of vocabulary achievement
at their school.
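The way the cross-level interaction arises can be sketched by substitution. The derivation below is an illustration under stated assumptions: the level-1 model is taken to have the same form as Equation (2.13), with x indexed as $x_{ij}$, and Equation (2.19) is applied with h = 0 and h = 1. The coefficient on the product $x_{ij} z_j$ appears here as $\gamma_{11}$, following the level-2 notation; it corresponds to the cross-level interaction coefficient discussed in the text.

```latex
\begin{align*}
\beta_{0j} &= \gamma_{00} + \gamma_{01} z_j + U_{0j}, \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} z_j + U_{1j}, \\
y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij} \\
       &= \gamma_{00} + \gamma_{01} z_j + U_{0j}
          + (\gamma_{10} + \gamma_{11} z_j + U_{1j})\, x_{ij} + \varepsilon_{ij} \\
       &= \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} z_j + \gamma_{11} x_{ij} z_j
          + U_{0j} + U_{1j} x_{ij} + \varepsilon_{ij}.
\end{align*}
```

The product term appears only because the level-2 predictor enters the equation for the slope; if $z_j$ predicted the intercept alone, the combined model would contain no interaction.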
Here, the subscript k represents the level-3 cluster to which the individual
belongs. Prior to formulating the rest of the model, we must evaluate if the
slopes and intercepts are random at both levels 2 and 3, or only at level 1,
for example. This decision should always be based on the theory surround-
ing the research questions, what is expected in the population, and what is
revealed in the empirical data. We will proceed with the remainder of this
discussion under the assumption that the level-1 intercepts and slopes are
random for both levels 2 and 3, in order to provide a complete description
of the most complex model possible when three levels of data structure are
present. When the level-1 coefficients are not random at both levels, the terms
in the following models for which this randomness is not present would sim-
ply be removed. We will address this issue more specifically in Chapter 4,
when we discuss the fitting of three-level models using R.
The level-2 and level-3 contributions to the MLM described in 2.13 appear
below.
Level 2:
$$\beta_{0jk} = \gamma_{00k} + U_{0jk}$$
$$\beta_{1jk} = \gamma_{10k} + U_{1jk}$$
Level 3:
$$\gamma_{00k} = \delta_{000} + V_{00k}$$
$$\gamma_{10k} = \delta_{100} + V_{10k}$$
We can then use simple substitution to obtain the expression for the level-1
intercept and slope in terms of both level-2 and level-3 parameters.
$$\beta_{0jk} = \delta_{000} + V_{00k} + U_{0jk}$$
and
$$\beta_{1jk} = \delta_{100} + V_{10k} + U_{1jk} \tag{2.23}$$
In turn, these terms can be substituted into Equation (2.15) to provide the full
three-level MLM.
$$y_{ijk} = \delta_{000} + V_{00k} + U_{0jk} + (\delta_{100} + V_{10k} + U_{1jk})\, x_{ijk} + \varepsilon_{ijk}. \tag{2.24}$$
Summary
The goal of this chapter was to introduce the basic theoretical underpin-
nings of multilevel modeling, but not to provide an exhaustive technical
discussion of these issues, as there are a number of useful sources avail-
able in this regard, which you will find among the references at the end of
the text. However, what is given here should stand you in good stead as we
move forward with multilevel modeling using R software. We recommend
that while reading subsequent chapters you make liberal use of the informa-
tion provided here, in order to gain a more complete understanding of the
output that we will be examining from R. In particular, when interpreting
output from R, it may be very helpful for you to come back to this chapter
for reminders on precisely what each model parameter means. In the next
two chapters we will take the theoretical information from Chapter 2 and
apply it to real datasets using two different R libraries, nlme and lme4, both
of which have been developed to conduct multilevel analyses with continu-
ous outcome variables. In Chapter 5, we will examine how these ideas can
be applied to longitudinal data, and in Chapters 7 and 8, we will discuss
multilevel modeling for categorical dependent variables. In Chapter 9, we
will diverge from the likelihood-based approaches described here, and dis-
cuss multilevel modeling within the Bayesian framework, focusing on appli-
cation, and learning when this method might be appropriate and when it
might not.