LECTURE 4: THE K-VARIABLE LINEAR MODEL I
Consider the system
y1 = α + βx1 + ε1
y2 = α + βx2 + ε2
...
yN = α + βxN + εN
or in matrix form
y = Xβ* + ε
where y is Nx1, X is Nx2, β* is 2x1, and ε is Nx1.
N.M. Kiefer, Cornell University, Economics 620, Lecture 4 1
K-Variable Linear Model
X = [1 x1; 1 x2; … ; 1 xN]  (one row per observation),  β* = (α, β)′.

Good practice requires inclusion of the column of ones.
Consider the general model
y = Xβ + ε
Convention: y is Nx1, X is NxK, β is Kx1, and ε is Nx1.

X = [1 x21 … xK1; 1 x22 … xK2; … ; 1 x2N … xKN]  (row i is (1, x2i, …, xKi)),
β = (β1, β2, …, βK)′.
More on the Linear Model
A typical row looks like:
yi = β1 + β2 x 2i + β3 x 3i +...+ βK x Ki + εi
THE LEAST SQUARES METHOD:
First assumption: Ey = Xβ
S(b) = (y - Xb)’(y - Xb)
= y’y - 2b’X’y + b’X’Xb
NORMAL EQUATIONS
X′Xβ̂ - X′y = 0
These equations always have a solution.
If X′X is invertible,
β̂ = (X′X)⁻¹X′y.
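As a numerical sanity check, the normal equations can be solved directly with NumPy. The design matrix, coefficients, and sample size below are made-up values for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N = 100 observations, K = 3 columns of X
# (a column of ones plus two covariates); all values are invented.
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=N)

# Solve the normal equations X'X b = X'y for beta-hat.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# lstsq minimizes S(b) = (y - Xb)'(y - Xb) directly, without
# forming X'X; the two routes agree.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

In practice `lstsq` (QR/SVD-based) is preferred to explicitly inverting X′X, since it is better conditioned.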
Proposition: β̂ is a minimizer of S(b).
Proof: Let b be any other K-vector.
(y - Xb)′(y - Xb)
= (y - Xβ̂ + X(β̂ - b))′(y - Xβ̂ + X(β̂ - b))
= (y - Xβ̂)′(y - Xβ̂) + (β̂ - b)′X′X(β̂ - b)
(the cross terms vanish since X′(y - Xβ̂) = 0 by the normal equations)
≥ (y - Xβ̂)′(y - Xβ̂). (why?)
Definition: e = y - Xβ̂ is the vector of residuals.
Note: Ee = 0 and X′e = 0.
Proposition: The LS estimator is unbiased.
Proof: Eβ̂ = E[(X′X)⁻¹X′y]
= E[(X′X)⁻¹X′(Xβ + ε)]
= β + (X′X)⁻¹X′Eε = β.
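A small Monte Carlo sketch of this unbiasedness claim (the fixed design and coefficient values are invented for illustration): averaging β̂ over repeated draws of ε with Eε = 0 should recover β.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design; illustrative values only.
N, K = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 2.0])

# Draw epsilon repeatedly and recompute beta-hat each time.
reps = 5000
solve_mat = np.linalg.solve(X.T @ X, X.T)  # (X'X)^{-1} X'
draws = np.empty((reps, K))
for r in range(reps):
    eps = rng.normal(size=N)
    draws[r] = solve_mat @ (X @ beta + eps)

# The Monte Carlo mean is close to beta (sampling noise remains).
assert np.allclose(draws.mean(axis=0), beta, atol=0.05)
```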
GEOMETRY OF LEAST SQUARES:
Consider y = Xβ + ε with
y = (y1, y2)′,  X = (x1, x2)′.
Definition: The space spanned by matrix
X is the vector space which consists of all
linear combinations of the column
vectors of X.
Definition: X(X′X)⁻¹X′y is the orthogonal
projection of y onto the space spanned by X.
Proposition: e is perpendicular to X,
i.e. X’e = 0.
Proof:
e = y - Xβ̂ = y - X(X′X)⁻¹X′y
= (I - X(X′X)⁻¹X′)y
⇒ X′e = (X′ - X′X(X′X)⁻¹X′)y = (X′ - X′)y = 0.
Thus the equation y = Xβ̂ + e gives y as the
sum of a vector in R[X] (the range, or column space, of X)
and a vector in N[X′] (the null space of X′).
Common (friendly) projection matrices:
1. The matrix which projects onto the space
orthogonal to the space spanned by X (i.e. onto
N[X′]) is
M = I - X(X′X)⁻¹X′.
Note: e = My. If X has full column rank, M has
rank (N - K).
2. The matrix which projects onto the space
spanned by X is
I - M = X(X′X)⁻¹X′.
Note: ŷ = y - e = y - My = (I - M)y. If X has
full column rank, (I - M) has rank K.
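A minimal numerical check of M and I - M, built from an arbitrary simulated design matrix (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative design matrix with full column rank.
N, K = 20, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])

P = X @ np.linalg.solve(X.T @ X, X.T)  # I - M: projects onto span of X
M = np.eye(N) - P                      # projects onto N[X']

# Both are idempotent.
assert np.allclose(M @ M, M)
assert np.allclose(P @ P, P)

# Residuals e = My are orthogonal to the columns of X: X'e = 0.
y = rng.normal(size=N)
e = M @ y
assert np.allclose(X.T @ e, 0)

# For idempotent matrices trace = rank: tr M = N - K, tr (I - M) = K.
assert np.isclose(np.trace(M), N - K)
assert np.isclose(np.trace(P), K)
```

The trace checks preview property 3 below: since the eigenvalues of an idempotent matrix are 0 or 1, its trace counts the nonzero eigenvalues, i.e. its rank.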
Example in R²
yi = xiβ + εi,  i = 1, 2
[Figure: the vector y in R², its projection xβ̂ onto the line spanned by x, and the perpendicular residual e.]
What is the case of singular X′X?
Properties of projection matrices
1. Projection matrices are idempotent.
E.g. (I - M)(I - M) = (I - M).
Proof: (I - M)(I - M)
= (X(X′X)⁻¹X′)(X(X′X)⁻¹X′)
= X(X′X)⁻¹X′ = (I - M).
2. Idempotent matrices have eigenvalues equal
to zero or one.
Proof: Consider the characteristic equation
Mz = λz ⇒ M²z = Mλz = λ²z.
Since M is idempotent, M²z = Mz.
Thus, λ²z = λz, which implies that λ is either
0 or 1.
3. The number of nonzero eigenvalues of a
matrix is equal to its rank.
⇒ For idempotent matrices, trace = rank.
More assumptions for the K-variable linear
model:
Second assumption: V(y) = V(ε) = σ²I_N,
where y and ε are N-vectors.
With this assumption, we can obtain the
sampling variance of β̂.
Proposition: V(β̂) = σ²(X′X)⁻¹
Proof:
β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′ε,
hence
β̂ = β + (X′X)⁻¹X′ε.
V(β̂) = E(β̂ - Eβ̂)(β̂ - Eβ̂)′
= E[(X′X)⁻¹X′εε′X(X′X)⁻¹]
= (X′X)⁻¹X′(Eεε′)X(X′X)⁻¹
= σ²(X′X)⁻¹.
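The proposition can be checked by Monte Carlo: the sample covariance of β̂ across repeated ε draws should approach σ²(X′X)⁻¹. All numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative fixed design; sigma chosen arbitrarily.
N = 40
sigma = 1.5
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([0.5, -1.0])
V_theory = sigma**2 * np.linalg.inv(X.T @ X)

# Monte Carlo covariance of beta-hat across repeated epsilon draws.
reps = 20000
solve_mat = np.linalg.solve(X.T @ X, X.T)
draws = np.array([solve_mat @ (X @ beta + sigma * rng.normal(size=N))
                  for _ in range(reps)])
V_mc = np.cov(draws, rowvar=False)

# Simulated and theoretical covariance matrices agree closely.
assert np.allclose(V_mc, V_theory, atol=0.01)
```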
Gauss-Markov Theorem: The LS estimator is
BLUE.
Proof: Consider estimating c′β for some c.
A possible estimator is c′β̂,
with variance σ²c′(X′X)⁻¹c.
An alternative linear unbiased estimator: b = a′y.
Eb = a′Ey = a′Xβ.
Since both c′β̂ and b are unbiased, a′X = c′.
Thus, b = a′y = a′(Xβ + ε)
= a′Xβ + a′ε = c′β + a′ε.
Hence, V(b) = σ²a′a.
Now, V(c′β̂) = σ²a′X(X′X)⁻¹X′a since c′ = a′X.
So V(b) - V(c′β̂) = σ²a′Ma ≥ 0, since M is p.s.d.
Hence V(b) ≥ V(c′β̂).
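A numeric illustration of the key step (design and c are arbitrary invented values): any unbiased weight vector a can be written as X(X′X)⁻¹c plus a component Mv in N[X′], and that extra component only adds variance.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative design and target linear combination c'beta.
N, K = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
c = np.array([1.0, 0.5, -2.0])

M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

# Unbiasedness requires X'a = c; parameterize
# a = X(X'X)^{-1} c + M v for an arbitrary v.
v = rng.normal(size=N)
a = X @ np.linalg.solve(X.T @ X, c) + M @ v
assert np.allclose(X.T @ a, c)  # the unbiasedness constraint holds

# With sigma^2 = 1: V(b) - V(c'beta-hat) = a'Ma >= 0.
var_ols = c @ np.linalg.solve(X.T @ X, c)   # c'(X'X)^{-1} c
var_b = a @ a                               # a'a
assert var_b >= var_ols - 1e-10
assert np.isclose(var_b - var_ols, a @ M @ a)
```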
Estimation of σ²
Proposition: s² = e′e/(N - K) is an unbiased
estimator for σ².
Proof: e = y - Xβ̂ = My = Mε ⇒ e′e = ε′Mε
Ee′e = Eε′Mε = E tr ε′Mε (why?)
= tr Eε′Mε = tr EMεε′ (important trick)
= tr M Eεε′ = σ² tr M = σ²(N - K)
⇒ s² = e′e/(N - K) is unbiased for σ².
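A quick simulation consistent with this proposition (all numeric values are illustrative): averaging s² over many ε draws should recover σ².

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative setup; sigma^2 = 4 is arbitrary.
N, K = 25, 3
sigma2 = 4.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta = np.array([1.0, 0.0, 3.0])
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

# Average s^2 = e'e/(N - K) over many epsilon draws.
reps = 20000
s2 = np.empty(reps)
for r in range(reps):
    y = X @ beta + np.sqrt(sigma2) * rng.normal(size=N)
    e = M @ y
    s2[r] = e @ e / (N - K)

# The Monte Carlo mean of s^2 is close to sigma^2.
assert np.isclose(s2.mean(), sigma2, atol=0.1)
```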
FIT: DOES THE REGRESSION MODEL
EXPLAIN THE DATA?
We will need the useful idempotent matrix
A = I - 1(1′1)⁻¹1′ = I - 11′/N,
which sweeps out means.
Here 1 is an N-vector of ones.
Note that AM = M when X contains a constant
term.
Definition: The squared correlation coefficient
in the K-variable case is
R² = (sum of squares due to X)/(total sum of
squares)
= 1 - (e′e/y′Ay).
Using A, y′Ay = Σ (yᵢ - ȳ)², the sum running over i = 1, …, N.
y′Ay = (Ay)′(Ay) = (Aŷ + Ae)′(Aŷ + Ae)
= ŷ′Aŷ + e′Ae since ŷ′e = 0
Thus, y′Ay = ŷ′Aŷ + e′e since Ae = e.
Scaling yields:
1 = ŷ′Aŷ/y′Ay + e′e/y′Ay.
What are the two terms of this split-up?
R² gives the fraction of variation explained by
X:
R² = 1 - (e′e/y′Ay).
Note: The adjusted squared correlation
coefficient is given by
R̄² = 1 - [e′e/(N - K)] / [y′Ay/(N - 1)].
(Why might this be preferable?)
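Computing R² and the adjusted R̄² from the formulas above on simulated data (the design and coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative data; X includes a constant term.
N, K = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
tss = np.sum((y - y.mean())**2)  # y'Ay: total (centered) sum of squares

r2 = 1 - (e @ e) / tss
r2_adj = 1 - (e @ e / (N - K)) / (tss / (N - 1))

# With a constant in X, 0 <= R^2 <= 1, and the adjusted version
# is never larger, since it penalizes extra regressors.
assert 0 <= r2 <= 1
assert r2_adj <= r2
```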
REPORTING
Always report characteristics of the sample, e.g.
means, standard deviations, anything unusual or
surprising, how the data set was collected, and
how the sample was selected.
Report β̂ and standard errors (not t-statistics).
The usual format is
β̂
(s.e. of β̂)
Specify s² or σ̂²ML.
Report N and R².
Plots are important. For example, predicted vs.
actual values or predicted and actual values over
time in time series studies should be presented.
COMMENTS ON LINEARITY:
Consider the following argument: Economic functions don't
change suddenly. Therefore they are continuous. Thus they
are differentiable and hence nearly linear by Taylor's
Theorem.
This argument is false (but irrelevant).
[Two sketches of f(x) against x. Left: a function that is
continuous but not differentiable, yet well-approximated
by a line. Right: a function that is continuous and
differentiable, but not well-approximated by a line.]