
LECTURE 4: THE K-VARIABLE LINEAR MODEL I
Consider the system

$$
\begin{aligned}
y_1 &= \alpha + \beta x_1 + \varepsilon_1 \\
y_2 &= \alpha + \beta x_2 + \varepsilon_2 \\
&\;\;\vdots \\
y_N &= \alpha + \beta x_N + \varepsilon_N
\end{aligned}
$$

or in matrix form

$$y = X\beta^* + \varepsilon,$$

where $y$ is $N \times 1$, $X$ is $N \times 2$, $\beta^*$ is $2 \times 1$, and $\varepsilon$ is $N \times 1$.



K-Variable Linear Model

$$
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad
\beta^* = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}.
$$

Good practice requires inclusion of the column of ones.
Consider the general model

$$y = X\beta + \varepsilon.$$

Convention: $y$ is $N \times 1$, $X$ is $N \times K$, $\beta$ is $K \times 1$, and $\varepsilon$ is $N \times 1$.

$$
X = \begin{pmatrix}
1 & x_{21} & \cdots & x_{K1} \\
1 & x_{22} & \cdots & x_{K2} \\
\vdots & \vdots & & \vdots \\
1 & x_{2N} & \cdots & x_{KN}
\end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix}.
$$
More on the Linear Model

A typical row looks like:

$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$$

THE LEAST SQUARES METHOD:

First assumption: $Ey = X\beta$.

$$S(b) = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb$$

NORMAL EQUATIONS

$$X'X\hat{\beta} - X'y = 0$$

These equations always have a solution. If $X'X$ is invertible,

$$\hat{\beta} = (X'X)^{-1}X'y.$$
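As a quick numerical illustration (not part of the original lecture), here is a minimal NumPy sketch that forms $\hat{\beta}$ by solving the normal equations; the design, coefficients, and sample size are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative assumptions, not from the lecture).
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # includes the column of ones
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=N)

# Solve the normal equations X'X beta_hat = X'y directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be near beta_true
```

In practice `np.linalg.lstsq` is numerically safer than forming $X'X$ explicitly, but the version above mirrors the normal equations as written.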
Proposition: $\hat{\beta}$ is a minimizer.

Proof: Let $b$ be any other $K$-vector.

$$
\begin{aligned}
(y - Xb)'(y - Xb)
&= (y - X\hat{\beta} + X(\hat{\beta} - b))'(y - X\hat{\beta} + X(\hat{\beta} - b)) \\
&= (y - X\hat{\beta})'(y - X\hat{\beta}) + (\hat{\beta} - b)'X'X(\hat{\beta} - b) \\
&\ge (y - X\hat{\beta})'(y - X\hat{\beta}). \quad \text{(why?)}
\end{aligned}
$$

^
Definition: e = y - Xβ is the vector of residuals.

Note: Ee = 0 and X’e = 0.

Proposition: The LS estimator is unbiased.


^
Proof: E β = E[(X’X)-1X’y]

= E[(X’X) -1X’(X β+ ε)] = β

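A small Monte Carlo sketch of this unbiasedness claim (simulated data with illustrative parameter values): averaging $\hat{\beta}$ over many replications of $\varepsilon$ should recover $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design, repeated draws of epsilon (values are illustrative assumptions).
N, R = 50, 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 2.0])

estimates = np.empty((R, 2))
for r in range(R):
    y = X @ beta + rng.normal(size=N)
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))  # close to [1.0, 2.0]
```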


GEOMETRY OF LEAST SQUARES:

Consider $y = X\beta + \varepsilon$ with

$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

Definition: The space spanned by matrix $X$ is the vector space which consists of all linear combinations of the column vectors of $X$.

Definition: $X(X'X)^{-1}X'y$ is the orthogonal projection of $y$ onto the space spanned by $X$.

Proposition: $e$ is perpendicular to $X$, i.e. $X'e = 0$.

Proof:
$$e = y - X\hat{\beta} = y - X(X'X)^{-1}X'y = (I - X(X'X)^{-1}X')y$$
$$\Rightarrow X'e = (X' - X')y = 0. \;\blacksquare$$

Thus the equation $y = X\hat{\beta} + e$ gives $y$ as the sum of a vector in $R[X]$ and a vector in $N[X']$.
Common (friendly) projection matrices:

1. The matrix which projects to the space orthogonal to the space spanned by $X$ (i.e. to $N[X']$) is
$$M = I - X(X'X)^{-1}X'.$$

Note: $e = My$. If $X$ is full column rank, $M$ has rank $N - K$.

2. The matrix which projects to the space spanned by $X$ is
$$I - M = X(X'X)^{-1}X'.$$

Note: $\hat{y} = y - e = y - My = (I - M)y$. If $X$ is full column rank, $(I - M)$ has rank $K$.

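A quick numerical sanity check of these projection facts, on assumed simulated data: $e = My$ is orthogonal to $X$, and $y$ splits into $(I - M)y + My$.

```python
import numpy as np

rng = np.random.default_rng(2)

N, K = 20, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = rng.normal(size=N)

P = X @ np.linalg.solve(X.T @ X, X.T)  # I - M: projection onto the span of X
M = np.eye(N) - P                      # projection onto N[X']

e = M @ y
print(np.allclose(X.T @ e, 0))         # X'e = 0
print(np.allclose(P @ y + e, y))       # y = y_hat + e
```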


Example in $\mathbb{R}^2$:

$$y_i = x_i \beta + \varepsilon_i, \quad i = 1, 2$$

[Figure: $y$ decomposed into the fitted vector $Xb$ and the orthogonal residual $e$.]

What is the case of singular $X'X$?
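On the singular case, a small sketch with an assumed, deliberately collinear design: when one column is a multiple of another, $X'X$ loses rank and the normal equations have infinitely many solutions; NumPy's `lstsq` returns the minimum-norm one.

```python
import numpy as np

rng = np.random.default_rng(3)

# Deliberately collinear design: second column is twice the first.
x = rng.normal(size=10)
X = np.column_stack([x, 2 * x])
y = rng.normal(size=10)

print(np.linalg.matrix_rank(X.T @ X))      # 1 < 2, so X'X is singular
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm least squares solution
print(b)
```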


Properties of projection matrices

1. Projection matrices are idempotent, e.g. $(I - M)(I - M) = (I - M)$.

Proof:
$$(I - M)(I - M) = (X(X'X)^{-1}X')(X(X'X)^{-1}X') = X(X'X)^{-1}X' = (I - M). \;\blacksquare$$

2. Idempotent matrices have eigenvalues equal to zero or one.

Proof: Consider the characteristic equation
$$Mz = \lambda z \Rightarrow M^2 z = M\lambda z = \lambda^2 z.$$
Since $M$ is idempotent, $M^2 z = Mz$. Thus $\lambda^2 z = \lambda z$, which implies that $\lambda$ is either 0 or 1. $\blacksquare$
3. The number of nonzero eigenvalues of a matrix is equal to its rank.

$\Rightarrow$ For idempotent matrices, trace = rank, since the trace is the sum of the eigenvalues and each eigenvalue is 0 or 1.
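A numerical check of properties 2 and 3 for the residual-maker $M$, again on assumed simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)

N, K = 10, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

eigvals = np.linalg.eigvalsh(M)        # M is symmetric
print(np.round(eigvals, 10))           # K zeros and N - K ones
print(round(np.trace(M)), np.linalg.matrix_rank(M))  # trace = rank = N - K
```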

More assumptions for the K-variable linear model:

Second assumption: $V(y) = V(\varepsilon) = \sigma^2 I_N$, where $y$ and $\varepsilon$ are $N$-vectors.

With this assumption, we can obtain the sampling variance of $\hat{\beta}$.

Proposition: $V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$

Proof:
$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon,$$
hence
$$\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon.$$
$$
\begin{aligned}
V(\hat{\beta}) &= E(\hat{\beta} - E\hat{\beta})(\hat{\beta} - E\hat{\beta})' \\
&= E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] \\
&= (X'X)^{-1}X'(E\varepsilon\varepsilon')X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}. \;\blacksquare
\end{aligned}
$$
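A Monte Carlo sketch of this sampling variance (simulated design and an assumed $\sigma$): the empirical covariance of $\hat{\beta}$ across replications should match $\sigma^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(5)

N, R, sigma = 40, 20000, 1.5
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([0.5, 1.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((R, 2))
for r in range(R):
    y = X @ beta + sigma * rng.normal(size=N)
    draws[r] = XtX_inv @ (X.T @ y)

print(np.cov(draws.T))        # empirical covariance of beta_hat
print(sigma**2 * XtX_inv)     # theoretical sigma^2 (X'X)^{-1}
```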

Gauss-Markov Theorem: The LS estimator is BLUE.

Proof: Consider estimating $c'\beta$ for some $c$. A possible estimator is $c'\hat{\beta}$, with variance $\sigma^2 c'(X'X)^{-1}c$.

An alternative linear unbiased estimator: $b = a'y$.

$Eb = a'Ey = a'X\beta$.

Since both $c'\hat{\beta}$ and $b$ are unbiased, $a'X = c'$.



Thus, $b = a'y = a'(X\beta + \varepsilon) = a'X\beta + a'\varepsilon = c'\beta + a'\varepsilon$.

Hence, $V(b) = \sigma^2 a'a$.

Now, $V(c'\hat{\beta}) = \sigma^2 a'X(X'X)^{-1}X'a$ since $c' = a'X$.

So $V(b) - V(c'\hat{\beta}) = \sigma^2 a'Ma \ge 0$, since $M$ is p.s.d.

Hence $V(b) \ge V(c'\hat{\beta})$. $\blacksquare$

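As a concrete (assumed, simulated) illustration of the proof's construction: any other linear unbiased weight vector can be written $a = X(X'X)^{-1}c + Mw$, and its $a'a$, hence its variance, can only be larger than that of the OLS weights.

```python
import numpy as np

rng = np.random.default_rng(6)

N = 30
X = np.column_stack([np.ones(N), rng.normal(size=N)])
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
c = np.array([0.0, 1.0])                 # target linear combination: the slope

a_ols = X @ np.linalg.solve(X.T @ X, c)  # weights giving c'beta_hat
a_alt = a_ols + M @ rng.normal(size=N)   # perturbed weights, still with a'X = c'

print(np.allclose(a_alt @ X, c))         # unbiasedness constraint holds
print(a_ols @ a_ols, a_alt @ a_alt)      # V = sigma^2 a'a: OLS weights are smaller
```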


Estimation of $\sigma^2$

Proposition: $s^2 = e'e/(N - K)$ is an unbiased estimator for $\sigma^2$.

Proof: $e = y - X\hat{\beta} = My = M\varepsilon \;\Rightarrow\; e'e = \varepsilon'M\varepsilon$.

$$
\begin{aligned}
Ee'e &= E\varepsilon'M\varepsilon = E\,\mathrm{tr}(\varepsilon'M\varepsilon) \quad \text{(why?)} \\
&= \mathrm{tr}\,E(\varepsilon'M\varepsilon) = \mathrm{tr}\,E(M\varepsilon\varepsilon') \quad \text{(important trick)} \\
&= \mathrm{tr}(M\,E\varepsilon\varepsilon') = \sigma^2\,\mathrm{tr}\,M = \sigma^2 (N - K)
\end{aligned}
$$

$\Rightarrow s^2 = e'e/(N - K)$ is unbiased for $\sigma^2$. $\blacksquare$

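A simulated check (parameter values are assumptions) that $e'e/(N - K)$ averages to $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(7)

N, K, sigma, R = 25, 3, 2.0, 20000
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

s2 = np.empty(R)
for r in range(R):
    eps = sigma * rng.normal(size=N)
    e = M @ eps                # e = M*epsilon, whatever beta is
    s2[r] = e @ e / (N - K)

print(s2.mean(), sigma**2)     # mean of s^2 is close to sigma^2 = 4.0
```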


FIT: DOES THE REGRESSION MODEL EXPLAIN THE DATA?

We will need the useful idempotent matrix

$$A = I - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}' = I - \mathbf{1}\mathbf{1}'/N,$$

which sweeps out means. Here $\mathbf{1}$ is an $N$-vector of ones.

Note that $AM = M$ when $X$ contains a constant term; a quick check appears below.
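A short sketch (assumed simulated data) verifying that $A$ demeans and that $AM = M$ when the design contains a constant:

```python
import numpy as np

rng = np.random.default_rng(8)

N = 8
ones = np.ones(N)
A = np.eye(N) - np.outer(ones, ones) / N          # sweeps out means
X = np.column_stack([ones, rng.normal(size=N)])   # X contains a constant term
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

y = rng.normal(size=N)
print(np.allclose(A @ y, y - y.mean()))           # Ay demeans y
print(np.allclose(A @ M, M))                      # AM = M, since 1'M = 0
```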

Definition: The squared correlation coefficient in the K-variable case is

$$R^2 = \frac{\text{sum of squares due to } X}{\text{total sum of squares}} = 1 - \frac{e'e}{y'Ay}.$$



Using $A$,
$$y'Ay = \sum_{i=1}^{N} (y_i - \bar{y})^2.$$

$$y'Ay = (Ay)'(Ay) = (A\hat{y} + Ae)'(A\hat{y} + Ae) = \hat{y}'A\hat{y} + e'Ae \quad \text{since } \hat{y}'e = 0$$

Thus, $y'Ay = \hat{y}'A\hat{y} + e'e$ since $Ae = e$.

Scaling yields:

$$1 = \frac{\hat{y}'A\hat{y}}{y'Ay} + \frac{e'e}{y'Ay}$$

What are the two terms of this decomposition?



$R^2$ gives the fraction of variation explained by $X$:

$$R^2 = 1 - \frac{e'e}{y'Ay}.$$

Note: The adjusted squared correlation coefficient is given by

$$\bar{R}^2 = 1 - \frac{e'e/(N - K)}{y'Ay/(N - 1)}.$$

(Why might this be preferable?)

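A short sketch computing both quantities on simulated data (all values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(9)

N, K = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
tss = ((y - y.mean()) ** 2).sum()          # y'Ay: total sum of squares

r2 = 1 - (e @ e) / tss
r2_adj = 1 - (e @ e / (N - K)) / (tss / (N - 1))
print(r2, r2_adj)                           # adjusted R^2 is the smaller of the two
```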


REPORTING

Always report characteristics of the sample: means, standard deviations, anything unusual or surprising, how the data set was collected, and how the sample was selected.

Report $\hat{\beta}$ and standard errors (not t-statistics). The usual format is

$\hat{\beta}$
(s.e. of $\hat{\beta}$)

Specify $s^2$ or $\sigma^2_{ML}$.

Report $N$ and $R^2$.

Plots are important. For example, predicted vs. actual values, or predicted and actual values over time in time-series studies, should be presented.



COMMENTS ON LINEARITY:
Consider the following argument: Economic functions don't
change suddenly. Therefore they are continuous. Thus they
are differentiable and hence nearly linear by Taylor's
Theorem.

This argument is false (but irrelevant).

[Figure: two panels plotting $f(x)$ against $x$. Left: a function that is continuous but not differentiable, yet well-approximated by a line. Right: a function that is continuous and differentiable, but not well-approximated by a line.]

