2.8 Matrix approach to simple linear regression
In this section we will briefly discuss a matrix approach to fitting simple linear
regression models. A random sample of size n gives n equations. For the full
SLRM we have
\[
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_2 + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_n + \varepsilon_n .
\end{aligned}
\]
We can write this in matrix formulation as
\[
Y = X\beta + \varepsilon, \tag{2.22}
\]
where $Y$ is an $(n \times 1)$ vector of response variables (the random sample), $X$ is an $(n \times 2)$ matrix called the design matrix, $\beta$ is a $(2 \times 1)$ vector of unknown parameters and $\varepsilon$ is an $(n \times 1)$ vector of random errors. That is,
\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
\]
The assumptions about the random errors let us write
\[
\varepsilon \sim N_n\!\left(0, \sigma^2 I\right),
\]
that is, the vector $\varepsilon$ has an $n$-dimensional normal distribution with mean
\[
E(\varepsilon) = E\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
= \begin{pmatrix} E(\varepsilon_1) \\ E(\varepsilon_2) \\ \vdots \\ E(\varepsilon_n) \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = 0
\]
and the variance-covariance matrix
var(ε1 ) cov(ε1 , ε2 ) . . . cov(ε1 , εn )
cov(ε2 , ε1) var(ε2 ) . . . cov(ε2 , εn )
Var(ε) = .. .. .. ..
. . . .
cov(εn , ε1 ) cov(εn , ε2 ) . . . var(εn )
σ2 0 . . . 0
0 σ2 . . . 0
2
= .. .. . . .. = σ I
. . . .
0 0 . . . σ2
This formulation is usually called the Linear Model (in β). All the models we
have considered so far can be written in this general form. The dimensions of
matrix X and of vector β depend on the number p of parameters in the model
and, respectively, they are n × p and p × 1. In the full SLRM we have p = 2.
The null model ($p = 1$)
\[
Y_i = \beta_0 + \varepsilon_i \quad \text{for } i = 1, \dots, n
\]
is equivalent to
\[
Y = \mathbf{1}\beta_0 + \varepsilon,
\]
where $\mathbf{1}$ is an $(n \times 1)$ vector of 1's.
The no-intercept model ($p = 1$)
\[
Y_i = \beta_1 x_i + \varepsilon_i \quad \text{for } i = 1, \dots, n
\]
can be written in matrix notation with
\[
X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad \beta = (\beta_1).
\]
Quadratic regression ($p = 3$)
\[
Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i \quad \text{for } i = 1, \dots, n
\]
can be written in matrix notation with
\[
X = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.
\]
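As a concrete illustration, the following sketch (assuming Python with NumPy; the predictor values are made up purely for the example) builds the design matrices for the full SLRM, the null model, the no-intercept model and the quadratic model.

```python
import numpy as np

# Hypothetical predictor values, used only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)

# Full SLRM: rows (1, x_i), so X is n x 2
X_full = np.column_stack([np.ones(n), x])

# Null model: X is the n x 1 vector of 1's
X_null = np.ones((n, 1))

# No-intercept model: X is the n x 1 vector of x_i's
X_noint = x.reshape(-1, 1)

# Quadratic regression: rows (1, x_i, x_i^2), so X is n x 3
X_quad = np.column_stack([np.ones(n), x, x**2])

print(X_full.shape, X_null.shape, X_noint.shape, X_quad.shape)
# (5, 2) (5, 1) (5, 1) (5, 3)
```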
The normal equations obtained in the least squares method are given by
\[
X^T Y = X^T X \hat{\beta}.
\]
It follows that so long as $X^T X$ is invertible, i.e., its determinant is non-zero, the unique solution to the normal equations is given by
\[
\hat{\beta} = (X^T X)^{-1} X^T Y.
\]
This is a common formula for all linear models where $X^T X$ is invertible. For the full simple linear regression model we have
\[
X^T Y = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix} \sum Y_i \\ \sum x_i Y_i \end{pmatrix}
= \begin{pmatrix} n\bar{Y} \\ \sum x_i Y_i \end{pmatrix}
\]
and
\[
X^T X = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}
= \begin{pmatrix} n & n\bar{x} \\ n\bar{x} & \sum x_i^2 \end{pmatrix}.
\]
The determinant of $X^T X$ is given by
\[
|X^T X| = n\sum x_i^2 - (n\bar{x})^2 = n\left(\sum x_i^2 - n\bar{x}^2\right) = nS_{xx}.
\]
Hence, the inverse of $X^T X$ is
\[
(X^T X)^{-1} = \frac{1}{nS_{xx}}
\begin{pmatrix} \sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix}
= \frac{1}{S_{xx}}
\begin{pmatrix} \frac{1}{n}\sum x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}.
\]
So the solution to the normal equations is given by
\[
\begin{aligned}
\hat{\beta} &= (X^T X)^{-1} X^T Y \\
&= \frac{1}{S_{xx}}
\begin{pmatrix} \frac{1}{n}\sum x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}
\begin{pmatrix} n\bar{Y} \\ \sum x_i Y_i \end{pmatrix} \\
&= \frac{1}{S_{xx}}
\begin{pmatrix} \bar{Y}\sum x_i^2 - \bar{x}\sum x_i Y_i \\ \sum x_i Y_i - n\bar{x}\bar{Y} \end{pmatrix} \\
&= \frac{1}{S_{xx}}
\begin{pmatrix} \bar{Y}\sum x_i^2 - n\bar{x}^2\bar{Y} + n\bar{x}^2\bar{Y} - \bar{x}\sum x_i Y_i \\ S_{xY} \end{pmatrix} \\
&= \frac{1}{S_{xx}}
\begin{pmatrix} \bar{Y}\left(\sum x_i^2 - n\bar{x}^2\right) - \bar{x}\left(\sum x_i Y_i - n\bar{x}\bar{Y}\right) \\ S_{xY} \end{pmatrix} \\
&= \frac{1}{S_{xx}}
\begin{pmatrix} \bar{Y}S_{xx} - \bar{x}S_{xY} \\ S_{xY} \end{pmatrix} \\
&= \begin{pmatrix} \bar{Y} - \hat{\beta}_1\bar{x} \\ \hat{\beta}_1 \end{pmatrix},
\end{aligned}
\]
which is the same result as we obtained before.
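A minimal numerical check of this equivalence is sketched below, assuming Python with NumPy and some made-up data; it compares the matrix solution $(X^T X)^{-1} X^T Y$ with the familiar estimates $\hat{\beta}_1 = S_{xY}/S_{xx}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)  # made-up data
n = x.size

X = np.column_stack([np.ones(n), x])

# Matrix solution of the normal equations X^T Y = X^T X beta_hat
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Classical formulas for the full SLRM
Sxx = np.sum((x - x.mean())**2)
SxY = np.sum((x - x.mean()) * (Y - Y.mean()))
b1 = SxY / Sxx
b0 = Y.mean() - b1 * x.mean()

print(beta_hat)                          # [beta0_hat, beta1_hat]
print(np.allclose(beta_hat, [b0, b1]))   # True
```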
Note:
Let $A$ be a vector and $B$ a matrix of real constants, and let $Z$ be a vector of random variables, all of appropriate dimensions so that the addition and multiplication are possible. Then
\[
E(A + BZ) = A + B\,E(Z), \qquad
\operatorname{Var}(A + BZ) = \operatorname{Var}(BZ) = B\operatorname{Var}(Z)B^T.
\]
In particular,
\[
E(Y) = E(X\beta + \varepsilon) = X\beta, \qquad
\operatorname{Var}(Y) = \operatorname{Var}(X\beta + \varepsilon) = \operatorname{Var}(\varepsilon) = \sigma^2 I.
\]
These equalities let us prove the following theorem.
Theorem 2.7. The least squares estimator $\hat{\beta}$ of $\beta$ is unbiased and its variance-covariance matrix is
\[
\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}.
\]
Proof. First we will show that $\hat{\beta}$ is unbiased. Here we have
\[
E(\hat{\beta}) = E\{(X^T X)^{-1} X^T Y\} = (X^T X)^{-1} X^T E(Y)
= (X^T X)^{-1} X^T X\beta = I\beta = \beta.
\]
Now, we will show the result for the variance-covariance matrix:
\[
\begin{aligned}
\operatorname{Var}(\hat{\beta}) &= \operatorname{Var}\{(X^T X)^{-1} X^T Y\} \\
&= (X^T X)^{-1} X^T \operatorname{Var}(Y) X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1} X^T I X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}.
\end{aligned}
\]
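Theorem 2.7 can also be checked empirically. The sketch below is a simulation under assumed values $\beta_0 = 1$, $\beta_1 = 2$ and $\sigma = 0.5$ (all made up for illustration); it repeatedly generates responses from the model and compares the empirical mean and variance-covariance matrix of $\hat{\beta}$ with $\beta$ and $\sigma^2 (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
n = x.size
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])   # assumed true parameters
sigma = 0.5                   # assumed error standard deviation

reps = 20000
estimates = np.empty((reps, 2))
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

print(estimates.mean(axis=0))              # close to beta (unbiasedness)
print(np.cov(estimates, rowvar=False))     # close to sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```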
We denote the vector of residuals as
\[
e = Y - \hat{Y},
\]
where $\hat{Y} = \widehat{E(Y)} = X\hat{\beta}$ is the vector of fitted responses $\hat{\mu}_i$. It can be shown that the following theorem holds.
Theorem 2.8. The $n \times 1$ vector of residuals $e$ has mean
\[
E(e) = 0
\]
and variance-covariance matrix
\[
\operatorname{Var}(e) = \sigma^2 \left[ I - X(X^T X)^{-1} X^T \right].
\]
Hence, the variance of the residual $e_i$ is
\[
\operatorname{var}[e_i] = \sigma^2 (1 - h_{ii}),
\]
where the leverage $h_{ii}$ is the $i$th diagonal element of the hat matrix $H = X(X^T X)^{-1} X^T$, i.e.,
\[
h_{ii} = x_i^T (X^T X)^{-1} x_i,
\]
where $x_i^T = (1, x_i)$ is the $i$th row of the matrix $X$.
The $i$th mean response can be written as
\[
E(Y_i) = \mu_i = x_i^T \beta = (1, x_i)\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \beta_0 + \beta_1 x_i
\]
and its estimator as
\[
\hat{\mu}_i = x_i^T \hat{\beta}.
\]
Then, the variance of the estimator is
\[
\operatorname{var}(\hat{\mu}_i) = \operatorname{var}(x_i^T \hat{\beta}) = \sigma^2 x_i^T (X^T X)^{-1} x_i = \sigma^2 h_{ii}
\]
and the estimator of this variance is
\[
\widehat{\operatorname{var}}(\hat{\mu}_i) = S^2 h_{ii},
\]
where $S^2$ is a suitable unbiased estimator of $\sigma^2$.
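The sketch below shows this calculation numerically, assuming NumPy, made-up data, and $S^2 = \mathrm{RSS}/(n-2)$ as the unbiased estimator of $\sigma^2$ for the full SLRM.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])   # made-up data
Y = 1.0 + 2.0 * x + rng.normal(scale=0.4, size=x.size)
n = x.size

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
fitted = X @ beta_hat

# Unbiased estimator of sigma^2 for the full SLRM (p = 2 parameters)
S2 = np.sum((Y - fitted)**2) / (n - 2)

# Leverages h_ii and the estimated variances S^2 h_ii of the fitted values
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
var_fitted_hat = S2 * h

print(var_fitted_hat)
```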
Using matrix notation we can easily obtain the other results we have seen for the SLRM written in non-matrix notation, both for the full model and for a reduced SLRM (no intercept or zero slope).
We have seen above that
\[
(X^T X)^{-1} = \frac{1}{nS_{xx}}
\begin{pmatrix} \sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix}.
\]
Now, by Theorem 2.7, $\operatorname{Var}[\hat{\beta}] = \sigma^2 (X^T X)^{-1}$. Thus
\[
\operatorname{var}[\hat{\beta}_0] = \sigma^2 \frac{\sum x_i^2}{nS_{xx}},
\]
which, by writing $\sum x_i^2 = \sum x_i^2 - n\bar{x}^2 + n\bar{x}^2$, can be written as
\[
\sigma^2 \left\{ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right\}.
\]
Also,
\[
\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = \sigma^2 \frac{-n\bar{x}}{nS_{xx}} = \frac{-\sigma^2 \bar{x}}{S_{xx}},
\]
and
\[
\operatorname{var}[\hat{\beta}_1] = \frac{\sigma^2}{S_{xx}}.
\]
The quantity $h_{ii}$ is given by
\[
h_{ii} = x_i^T (X^T X)^{-1} x_i
= (1 \;\; x_i)\,\frac{1}{nS_{xx}}
\begin{pmatrix} \sum x_j^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix}
\begin{pmatrix} 1 \\ x_i \end{pmatrix}.
\]
We shall leave it as an exercise to show that this simplifies to
\[
h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}.
\]
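This simplification is also easy to verify numerically. The sketch below (NumPy, with made-up predictor values) compares the diagonal of the hat matrix with $1/n + (x_i - \bar{x})^2 / S_{xx}$.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 12.0])   # made-up predictor values
n = x.size
X = np.column_stack([np.ones(n), x])

# Leverages from the hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T
h_from_H = np.diag(H)

# Closed-form expression 1/n + (x_i - xbar)^2 / Sxx
Sxx = np.sum((x - x.mean())**2)
h_closed = 1.0 / n + (x - x.mean())**2 / Sxx

print(np.allclose(h_from_H, h_closed))   # True
```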
2.8.1 Some specific examples
1. The Null model
As we have seen, this can be written as
\[
Y = X\beta_0 + \varepsilon,
\]
where $X = \mathbf{1}$ is an $(n \times 1)$ vector of 1's. So $X^T X = n$ and $X^T Y = \sum Y_i$, which gives
\[
\hat{\beta} = (X^T X)^{-1} X^T Y = \frac{1}{n}\sum Y_i = \bar{Y} = \hat{\beta}_0,
\qquad
\operatorname{var}[\hat{\beta}] = (X^T X)^{-1}\sigma^2 = \frac{\sigma^2}{n}.
\]
(A numerical check of both reduced models is sketched after the no-intercept example below.)
2. No-intercept model
We saw that this example fits the General Linear Model with
\[
X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad \beta = (\beta_1).
\]
So $X^T X = \sum x_i^2$ and $X^T Y = \sum x_i Y_i$, and we can calculate
\[
\hat{\beta} = (X^T X)^{-1} X^T Y = \frac{\sum x_i Y_i}{\sum x_i^2} = \hat{\beta}_1,
\qquad
\operatorname{Var}[\hat{\beta}] = \sigma^2 (X^T X)^{-1} = \frac{\sigma^2}{\sum x_i^2}.
\]
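For completeness, here is a small numerical check of both reduced models (again a sketch assuming NumPy and made-up data); it confirms that the general formula $(X^T X)^{-1} X^T Y$ reduces to $\bar{Y}$ for the null model and to $\sum x_i Y_i / \sum x_i^2$ for the no-intercept model.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
Y = 1.5 * x + rng.normal(scale=0.2, size=x.size)
n = x.size

# Null model: X is a column of 1's, so beta_hat should equal Ybar
X_null = np.ones((n, 1))
b_null = np.linalg.solve(X_null.T @ X_null, X_null.T @ Y)
print(np.allclose(b_null, Y.mean()))                      # True

# No-intercept model: X is the column of x_i's,
# so beta_hat should equal sum(x_i Y_i) / sum(x_i^2)
X_ni = x.reshape(-1, 1)
b_ni = np.linalg.solve(X_ni.T @ X_ni, X_ni.T @ Y)
print(np.allclose(b_ni, np.sum(x * Y) / np.sum(x**2)))    # True
```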