ECON 5350 Class Notes
Least Squares
1 Introduction
We are interested in estimating the population parameters from the regression equation
$$Y = X\beta + \varepsilon.$$
The population values are $\beta$, $\sigma^2$ and $\varepsilon$. Their sample counterparts are $b$, $\hat{\sigma}^2$ and $e$. The sample counterpart to the error term ($\varepsilon$) is called the residual ($e$). The two are related according to
$$Y = X\beta + \varepsilon = Xb + e.$$
2 Least Squares
2.1 The Problem
We want to estimate the parameter $\beta$ by choosing a fitting criterion that makes the sample regression line as close as possible to the data points. Our criterion is
$$\min_b \; e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y - Y'Xb + b'X'Xb. \tag{1}$$
The criterion is minimized by choosing $b$. Taking the (vector) derivative with respect to $b$ and setting it equal to zero gives
$$\frac{\partial (e'e)}{\partial b} = -2X'Y + 2X'Xb = 0. \tag{2}$$
Provided $X'X$ is nonsingular (guaranteed by Classical assumption two), we solve to get
$$b = (X'X)^{-1}X'Y. \tag{3}$$
The second-order condition gives
$$\frac{\partial^2 (e'e)}{\partial b \, \partial b'} = 2X'X,$$
which satisfies the condition for a minimum since $X'X$ is a positive-definite matrix if $X$ is of full rank (Greene A-114).
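For illustration, here is a minimal Python sketch of equation (3) on simulated data (the data-generating process, sample size, and coefficient values are assumptions for the sketch, not part of the notes). Solving the normal equations $(X'X)b = X'Y$ with a linear solver is numerically preferable to forming the inverse explicitly.

```python
import numpy as np

# Illustrative setup (assumed): n observations, k regressors including a constant
rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -2.0])    # assumed "true" population values
eps = rng.normal(size=n)             # population error term
Y = X @ beta + eps

# Least squares: solve the normal equations (X'X) b = X'Y, eq. (3)
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b                        # residuals, the sample counterpart of eps

# Second-order condition: 2 X'X should be positive definite (all eigenvalues > 0)
assert np.all(np.linalg.eigvalsh(2 * X.T @ X) > 0)
print(b)                             # should be close to beta in large samples
```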
2.2 Example: UW Enrollment and Energy Prices
Consider the bivariate regression over the sample period 1957-2006, where the variables are
$$Y = \text{UW resident undergraduate enrollment and } X = \text{price of oil}.$$
Assume the population regression equation is
$$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t.$$
The objective is to choose $b_1$ and $b_2$ to minimize
$$\sum_{t=1}^T e_t^2 = \sum_{t=1}^T (y_t - b_1 - b_2 x_t)^2,$$
which gives the two first-order conditions
$$\frac{\partial \left( \sum_t e_t^2 \right)}{\partial b_1} = -2 \sum_t (y_t - b_1 - b_2 x_t) = 0 \tag{4}$$
$$\frac{\partial \left( \sum_t e_t^2 \right)}{\partial b_2} = -2 \sum_t (y_t - b_1 - b_2 x_t) x_t = 0. \tag{5}$$
Equations (4) and (5) can be rearranged to produce the normal equations
$$\sum_t y_t = T b_1 + b_2 \sum_t x_t$$
$$\sum_t y_t x_t = b_1 \sum_t x_t + b_2 \sum_t x_t^2.$$
Finally, solving for $b_1$ and $b_2$ gives
$$b_1 = \bar{y} - b_2 \bar{x}$$
$$b_2 = \frac{\sum_t (y_t - \bar{y})(x_t - \bar{x})}{\sum_t (x_t - \bar{x})^2}.$$
This is the same answer you get via matrix algebra, $b = (b_1, b_2)' = (X'X)^{-1}(X'Y)$, for appropriately defined $X$ and $Y$. See MATLAB example 10 for more details.
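The notes point to MATLAB example 10; as a language-agnostic stand-in, the short Python sketch below (with made-up data, since the enrollment/oil-price series is not reproduced here) verifies that the scalar formulas for $b_1$ and $b_2$ match the matrix solution.

```python
import numpy as np

# Illustrative data standing in for the enrollment (y) and oil-price (x) series
rng = np.random.default_rng(1)
T = 50                                   # e.g., annual observations, 1957-2006
x = rng.normal(10, 3, size=T)
y = 2.0 + 0.7 * x + rng.normal(size=T)   # assumed DGP for the sketch

# Scalar formulas from the normal equations
b2 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

# Matrix algebra: b = (X'X)^{-1} X'Y with X = [1, x]
X = np.column_stack([np.ones(T), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

assert np.allclose([b1, b2], b)          # the two routes agree
```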
2.3 Algebra of Least Squares
Consider the normal equations
$$X'(Y - Xb) = X'e = 0. \tag{6}$$
Three interesting results follow from equation (6) (assuming a constant term).
1. The first column of $X$ implies $\sum_i e_i = 0$. Positive and negative residuals exactly cancel out.

2. $\sum_i e_i = 0$ implies that $\bar{e} = \bar{Y} - \bar{X}b = 0$, which implies $\bar{Y} = \bar{X}b$, where $\bar{X}$ is the row vector of regressor means. The regression hyperplane passes through the sample mean.

3. $\hat{Y}'e = (Xb)'e = b'X'e = 0$. The fitted values are orthogonal to the residuals.
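These three properties are easy to confirm numerically; a minimal sketch (simulated data, all values assumed for illustration) checks each in turn.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant term included
Y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
Y_hat = X @ b

assert np.isclose(e.sum(), 0)                    # 1. residuals sum to zero
assert np.isclose(Y.mean(), X.mean(axis=0) @ b)  # 2. hyperplane passes through the means
assert np.isclose(Y_hat @ e, 0)                  # 3. fitted values orthogonal to residuals
```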
2.4 Partitioned and Partial Regressions
Let a regression have two sets of explanatory variables, $X_1$ and $X_2$, such that
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon.$$
The normal equations can be written in partitioned form as
$$\begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'Y \\ X_2'Y \end{bmatrix}.$$
Solving for $b_2$ gives
$$b_2 = [X_2'(I - X_1(X_1'X_1)^{-1}X_1')X_2]^{-1}[X_2'(I - X_1(X_1'X_1)^{-1}X_1')Y] = [X_2'M_1X_2]^{-1}[X_2'M_1Y],$$
where $M_1 = I - X_1(X_1'X_1)^{-1}X_1'$ can be interpreted as a residual-maker matrix (i.e., premultiplying any conformable matrix by $M_1$ will generate the residuals associated with a regression on $X_1$). Note the following:

- Define $e_{Y1} = M_1 Y$.
- Define $e_{21} = M_1 X_2$.
- $M_1$ is symmetric and idempotent (i.e., $M_1 = M_1'$ and $M_1 = M_1 M_1$).

This implies that we can write
$$b_2 = [X_2'M_1X_2]^{-1}[X_2'M_1Y] = [e_{21}'e_{21}]^{-1}[e_{21}'e_{Y1}].$$
This is the result that makes multiple regression analysis so powerful for applied economics. We can interpret $b_2$ as the impact of $X_2$ on $Y$ while "partialing or netting out" the effect of $X_1$. The results for $b_1$ are analogous.
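A quick numerical check of this partialing-out result: regressing the residualized $e_{Y1} = M_1Y$ on the residualized $e_{21} = M_1X_2$ reproduces the $b_2$ block from the full regression. The data and block sizes below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first block (with constant)
X2 = rng.normal(size=(n, 2))                                 # second block
Y = X1 @ np.array([1.0, 0.3, -0.7]) + X2 @ np.array([2.0, 0.5]) + rng.normal(size=n)

# Full regression on [X1, X2]
X = np.column_stack([X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ Y)
b2_full = b[X1.shape[1]:]

# Residual-maker M1 = I - X1 (X1'X1)^{-1} X1'
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
e21, eY1 = M1 @ X2, M1 @ Y

# b2 from the partialed-out regression of eY1 on e21
b2_fwl = np.linalg.solve(e21.T @ e21, e21.T @ eY1)
assert np.allclose(b2_full, b2_fwl)
```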
2.5 Goodness of Fit and Analysis of Variance
We will now assess how well the regression model fits the data. Begin by writing the sample regression equation $Y = Xb + e$ in deviation-from-mean form using the following matrix
$$M^0 = \left(I_n - \frac{1}{n}ii'\right) = \begin{bmatrix} 1 - \frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1 - \frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1 - \frac{1}{n} \end{bmatrix}$$
where $i$ is the unit column vector. We can then write
$$Y - \bar{Y}i = M^0 Y = M^0(Xb + e) = M^0Xb + e. \tag{7}$$
(The last equality uses $M^0e = e$, which holds because the residuals sum to zero when a constant is included.)
Premultiplying (7) by its own transpose, and noting that $M^0$ is a symmetric and idempotent matrix, gives
$$(Y - \bar{Y}i)'(Y - \bar{Y}i) = Y'M^0Y = b'X'M^0Xb + e'e,$$
or SST = SSR + SSE, where the three terms stand for the total, regression, and error sums of squares, respectively. A natural measure of goodness of fit is
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.$$
A few notes about $R^2$:

- $0 \le R^2 \le 1$.
- By adding additional explanatory variables, you can never make $R^2$ smaller (see the sketch below).
- An alternative measure is the adjusted $R^2$, $\bar{R}^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)}$. This measure adds a penalty for additional explanatory variables.
- Be cautious interpreting $R^2$ when no constant is included.
- The value of $R^2$ will depend on the type of data (e.g., cross-sectional data tends to produce low $R^2$s and time-series data often produces high $R^2$s).
- Comparing $R^2$s requires comparable dependent variables.
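As an illustration (simulated data, assumed DGP), the sketch below computes SST and SSE via $M^0$, forms $R^2$ and $\bar{R}^2$, and shows that adding an irrelevant regressor cannot lower $R^2$, although it will typically lower $\bar{R}^2$.

```python
import numpy as np

def fit_stats(X, Y):
    """Return R^2 and adjusted R^2 from an OLS fit of Y on X (constant in column 1)."""
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b
    M0 = np.eye(n) - np.ones((n, n)) / n       # demeaning matrix M^0 = I - ii'/n
    SST = Y @ M0 @ Y
    SSE = e @ e
    R2 = 1 - SSE / SST
    R2_adj = 1 - (SSE / (n - k)) / (SST / (n - 1))
    return R2, R2_adj

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
Y = 1.0 + 0.8 * x + rng.normal(size=n)          # assumed DGP

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, rng.normal(size=n)])  # add an irrelevant regressor

r2_s, r2a_s = fit_stats(X_small, Y)
r2_b, r2a_b = fit_stats(X_big, Y)
assert r2_b >= r2_s                             # R^2 never falls when adding a regressor
print(r2_s, r2a_s, r2_b, r2a_b)                 # adjusted R^2 will typically fall here
```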