ECO2009 Econometrics I
Chapter 3 Multiple Regression Analysis: Estimation
Yaein Baek
Sogang University
Spring 2024
Multiple Regression Model
Suppose we have k independent (explanatory) variables
X1 , X2 , . . . , Xk
A model that explains Y in terms of variables X1 , X2 , . . . , Xk ?
The multiple linear regression (MLR) model can be written in the
population as
Y = β0 + β1 X1 + β2 X2 + · · · + βk Xk + u
▶ β0 : intercept
▶ βj : slope parameter associated with Xj , j = 1, . . . , k
▶ u: error term (disturbance) that contains factors other than
X1 , X2 , . . . , Xk that affect Y
Interpretation of the Multiple Regression Model
The MLR model is linear in parameters β0 , β1 , . . . , βk
∆Y = β1 ∆X1 + β2 ∆X2 + · · · + βk ∆Xk + ∆u
Assuming ∆X1 = · · · = ∆Xj−1 = ∆Xj+1 = · · · = ∆u = 0, we have
βj = ∆Y / ∆Xj
βj measures how much the dependent variable changes if the jth
independent variable is increased by one unit, holding all other
independent variables constant
“Ceteris paribus” interpretation
▶ We still have to assume that unobserved factors do not change if the
explanatory variables are changed, ∆u = 0
Motivation for Multiple Regression
The simple regression model
▶ The error term u represents factors other than X that affect Y
▶ The key assumption for ceteris paribus conclusions (SLR.4 ZCM):
E (u|X ) = 0
▶ This assumption is not likely to hold in many situations
▶ Hardly used in empirical economics
The multiple regression model
▶ Incorporate more explanatory factors into the model
▶ Explicitly control for many other factors that affect the dependent
variable
▶ Allow for more flexible functional forms
Motivation for Multiple Regression
Example: Wage equation
The wage is determined by the two explanatory variables, education
and experience
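A model of this form (consistent with the standard textbook example) is
    wage = β0 + β1 educ + β2 exper + u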
Compared with the simple regression model, exper is taken out of the
error term and put explicitly in the equation
Because the equation contains experience explicitly, we will be able to
measure the effect of education on wage, holding experience fixed
Motivation for Multiple Regression
Example: Average test scores and per student spending
Per student spending is likely to be correlated with average family
income at a given high school because of school financing
Omitting average family income from the regression would lead to a biased
estimate of the effect of spending on average test scores
In a simple regression model, the effect of per student spending would
partly include the effect of family income on test scores
Motivation for Multiple Regression
Example: Family income and family consumption
MLR is also useful for generalizing functional relationships between
variables
Consumption is explained as a quadratic function of income:
    cons = β0 + β1 inc + β2 inc² + u
One has to be very careful when interpreting the coefficients: the marginal
effect of income on consumption is β1 + 2β2 inc, so β1 alone no longer
measures the change in consumption for a one-unit increase in income
Motivation for Multiple Regression
Example: CEO salary, sales and CEO tenure
Model assumes a constant elasticity relationship between CEO salary
and the sales of his or her firm
Model assumes a quadratic relationship between CEO salary and his
or her tenure with the firm
Meaning of “linear” regression: the model has to be linear in the
parameters (not in the variables)
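A specification with both features (a plausible form of the textbook example) is
    log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u
which is linear in the parameters β0 , . . . , β3 even though it is nonlinear in the variables.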
Multiple Regression Model
The (multivariate) ZCM assumption:
E (u|X1 , X2 , . . . , Xk ) = 0
Under the ZCM assumption
E (Y |X1 , X2 , . . . , Xk ) = β0 + β1 X1 + · · · + βk Xk
βj is the partial effect of Xj on E (Y |X1 , X2 , . . . , Xk )
βj = ∂E(Y |X1 = x1 , X2 = x2 , . . . , Xk = xk ) / ∂xj
Obtaining the OLS estimators
Suppose we have a random sample of n observations
{(Yi , Xi1 , Xi2 , . . . , Xik ) : i = 1, 2, . . . , n}
The parameters are estimated by minimizing the sum of squared residuals
min_b Σ_{i=1}^n (Yi − b0 − b1 Xi1 − · · · − bk Xik)²
where b ≡ (b0 , b1 , . . . , bk )
The OLS estimators (β̂0 , β̂1 , . . . , β̂k ) are obtained from (k + 1) FOCs
Σ_{i=1}^n (Yi − β̂0 − β̂1 Xi1 − · · · − β̂k Xik) = 0
Σ_{i=1}^n (Yi − β̂0 − β̂1 Xi1 − · · · − β̂k Xik) Xi1 = 0
⋮
Σ_{i=1}^n (Yi − β̂0 − β̂1 Xi1 − · · · − β̂k Xik) Xik = 0
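As a rough numerical illustration with simulated data, the (k + 1) FOCs are the normal equations (X′X)b = X′y, which can be solved directly; a minimal sketch in Python:

```python
import numpy as np

# Simulated data (hypothetical values, only for illustration)
rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))                 # k explanatory variables
u = rng.normal(size=n)                      # error term
beta_true = np.array([1.0, 0.5, -0.3])      # (beta0, beta1, beta2)
y = beta_true[0] + X @ beta_true[1:] + u

# Design matrix with a column of ones for the intercept
Xmat = np.column_stack([np.ones(n), X])

# The (k+1) first-order conditions are X'(y - Xb) = 0, i.e. the normal
# equations (X'X) b = X'y; solving them gives the OLS estimates
beta_hat = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ y)
print(beta_hat)                             # close to beta_true
```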
Interpreting the OLS Regression Equation
The sample regression function (SRF or the OLS regression line)
Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + · · · + β̂k Xk
The estimators β̂j have partial effect (ceteris paribus) interpretations
∆Ŷ = β̂j ∆Xj
holding X1 , . . . , Xj−1 , Xj+1 , . . . , Xk fixed
→ we have controlled for the variables X1 , . . . , Xj−1 , Xj+1 , . . . , Xk
Interpreting the OLS Regression Equation
Example 3.1 Determinants of college GPA
Holding ACT fixed, another point on the high school grade point average is
associated with 0.453 points more on the college grade point average, on average
Or: If we compare two students with the same ACT , but the hsGPA
of student A is one point higher, we predict student A to have a
colGPA that is 0.453 higher than that of student B
Holding high school grade point average fixed, another 10 points on
ACT are associated with 0.094 points more on college GPA, on average
OLS Fitted Values and Residuals
Fitted value for observation i
ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + · · · + β̂k xik
Residual for observation i
ûi = yi − ŷi
Algebraic properties of OLS regression
1 Σ_{i=1}^n ûi = 0 (the residuals sum to zero)
2 Σ_{i=1}^n xij ûi = 0, j = 1, . . . , k (the sample covariance between each Xj and the residuals û is zero; it follows that the sample covariance between the OLS fitted values ŷ and the OLS residuals û is also zero)
3 ȳ = β̂0 + β̂1 x̄1 + β̂2 x̄2 + · · · + β̂k x̄k
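A minimal sketch with simulated data, checking these properties numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

Xmat = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Xmat, y, rcond=None)[0]
y_hat = Xmat @ beta_hat                     # fitted values
u_hat = y - y_hat                           # residuals

print(u_hat.sum())                          # property 1: ~0
print(Xmat.T @ u_hat)                       # property 2: ~0 for each regressor
print(y_hat @ u_hat)                        # fitted values orthogonal to residuals
print(y.mean() - beta_hat @ np.r_[1.0, X.mean(axis=0)])   # property 3: ~0
```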
“Partialling Out” Interpretation of Multiple Regression
Consider a MLR model with k = 2 independent variables
Yi = β0 + β1 Xi1 + β2 Xi2 + ui ,
the OLS estimate β̂1 can be obtained in two steps
1 Regress Xi1 on Xi2 (here Xi1 plays the role of the dependent variable and Xi2 that of the regressor) and obtain the residuals r̂i1 :
      xi1 = α0 + α1 xi2 + ri1 , i = 1, . . . , n
      r̂i1 = xi1 − α̂0 − α̂1 xi2
2 Regress Yi on the residuals r̂i1 (which now play the role of the regressor); then
      β̂1 = Σ_{i=1}^n r̂i1 yi / Σ_{i=1}^n r̂i1²
   (recall that the OLS residuals r̂i1 from step 1 have zero sample covariance with the regressor xi2 )
r̂i1 is the part of xi1 that is uncorrelated with xi2
Or: r̂i1 is xi1 after the effects of xi2 have been partialled out
(netted out)
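A minimal sketch with simulated data, showing that the two-step procedure reproduces the coefficient from the full multiple regression:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x1 on (1, x2), keep the residuals r1_hat
Z = np.column_stack([np.ones(n), x2])
r1_hat = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Step 2: beta1_hat = sum(r1_hat * y) / sum(r1_hat ** 2)
b1_partial = (r1_hat @ y) / (r1_hat @ r1_hat)

print(b_full[1], b1_partial)                # identical up to rounding error
```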
“Partialling Out” Interpretation of Multiple Regression
β̂1 = Σ_{i=1}^n r̂i1 yi / Σ_{i=1}^n r̂i1²
β̂1 measures the sample relationship between Y and X1 after X2 has
been partialled out
▶ It represents the isolated effect of the explanatory variable on the
dependent variable
In the general model with k explanatory variables, β̂j can be written as
β̂j = Σ_{i=1}^n r̂ij yi / Σ_{i=1}^n r̂ij²
where the residuals r̂ij come from the regression of Xj on
X1 , . . . , Xj−1 , Xj+1 , . . . , Xk (this “takes out” the part of Xj that is
correlated with the other independent variables)
Holding other variables X1 , . . . , Xj−1 , Xj+1 , . . . , Xk fixed, β̂j is the
average effect of Xj on Y
“Partialling Out” Interpretation of Multiple Regression
Frisch-Waugh theorem
It can be shown that β̂j can be obtained from the following procedure
1 Regress Y on X1 , . . . , Xj−1 , Xj+1 , . . . , Xk and obtain the residuals û^Y
2 Regress Xj on X1 , . . . , Xj−1 , Xj+1 , . . . , Xk and obtain the residuals û^Xj
3 Regress û^Y on û^Xj ; the slope coefficient from this regression is β̂j
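A minimal sketch of this three-step (residual-on-residual) procedure with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), x2])       # the "other" regressors (incl. intercept)

u_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]      # step 1: residuals of y
u_x1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]   # step 2: residuals of x1
b1_fwl = (u_x1 @ u_y) / (u_x1 @ u_x1)                   # step 3: slope of u_y on u_x1

X = np.column_stack([np.ones(n), x1, x2])
print(b1_fwl, np.linalg.lstsq(X, y, rcond=None)[0][1])  # same number
```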
Goodness-of-Fit
We cannot use R-squared to determine whether to include variables or not.
Decomposition of total variation
SST = SSE + SSR
where SST = Σᵢ (yi − ȳ)² , SSE = Σᵢ (ŷi − ȳ)² , SSR = Σᵢ ûi²
R-squared
R² ≡ SSE/SST = 1 − SSR/SST
R² is non-decreasing in the number of independent variables:
R-squared never decreases, and usually increases, when another independent
variable is included in the regression.
Even if we add an independent variable that is irrelevant to Y , R-squared (weakly) increases.
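A minimal sketch with simulated data, illustrating that adding an irrelevant regressor does not lower R-squared:

```python
import numpy as np

def r_squared(y, X):
    """R-squared from an OLS regression of y on X (X must contain a constant column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u_hat = y - X @ beta
    ssr = u_hat @ u_hat
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - ssr / sst

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)                   # irrelevant to y by construction

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, junk])
print(r_squared(y, X_small), r_squared(y, X_big))   # second value is never smaller
```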
Goodness-of-Fit
Example 3.5 Explaining Arrest Records
An additional explanatory variable is added
Limited additional explanatory power, as R-squared increases only slightly
The sign of the coefficient on avgsen is also unexpected: longer average
sentence length increases criminal activity
Even if R-squared is small, regression may still provide good estimates of
ceteris paribus effects
Assumptions for the Multiple Regression Model
MLR Assumptions
MLR.1 (Linear in Parameters) The population model is
Y = β0 + β1 X1 + β2 X2 + · · · + βk Xk + u
MLR.2 (Random Sampling) We have a random sample of n
observations, {(xi1 , xi2 , . . . , xik , yi ) : i = 1, 2, . . . , n}, following the
population model in MLR.1.
Assumptions for the Multiple Regression Model
MLR Assumptions
MLR.3 (No Perfect Collinearity) In the sample (and therefore in
the population), none of the independent variables is constant, and
there are no exact linear relationships among the independent
variables.
The assumption only rules out perfect collinearity between
explanatory variables; imperfect correlation is allowed.
If an explanatory variable is a perfect linear combination of other
explanatory variables it is superfluous and may be eliminated.
Constant variables are also ruled out (collinear with intercept).
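A minimal sketch with simulated data, showing how perfect collinearity breaks the normal equations:

```python
import numpy as np

# With an exact linear relationship among regressors, X'X is singular
rng = np.random.default_rng(5)
n = 50
x1 = rng.normal(size=n)
x2 = 3.0 * x1                               # perfect collinearity: MLR.3 fails
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X.T @ X))       # 2 instead of 3: no unique OLS solution
# Solving the normal equations with np.linalg.solve would raise LinAlgError here
```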
Assumptions for the Multiple Regression Model
Example for perfect collinearity: small sample
Example for perfect collinearity: relationships between regressors
Assumptions for the Multiple Regression Model
MLR Assumptions
MLR.4 (Zero Conditional Mean) The error u has an expected
value of zero given any values of the independent variables.
E (u|X1 , X2 , . . . , Xk ) = 0.
Notations:
▶ Vector of n observations of the jth explanatory variable:
Xj = (X1j , . . . , Xij , . . . , Xnj )′ , j = 1, . . . , k
▶ X ≡ [1 X1 X2 · · · Xk ], where 1 is an (n × 1) vector of ones
Assumption MLR.4 with MLR.2 implies E (ui |X) = 0 for i = 1, . . . , n
Assumptions for the Multiple Regression Model
In a MLR, the ZCM assumption is much more likely to hold because
fewer things end up in the error.
The ZCM assumption fails if
▶ The functional relationship between the dependent and independent
variables is misspecified
▶ Omitting an important factor that is correlated with any of
X1 , X2 , . . . , Xk
▶ Measurement error (Ch. 9, 15)
▶ Simultaneous equations models (Ch. 16)
Unbiasedness of OLS
Theorem 3.1 Unbiasedness of OLS
Under Assumptions MLR.1 through MLR.4,
E (β̂j ) = βj , j = 0, 1, . . . , k
for any values of the population parameters βj . In other words, the OLS
estimators are unbiased estimators of the population parameters.
Omitted Variable Bias
The true population model has two explanatory variables and an error
term
Y = β0 + β1 X1 + β2 X2 + u
and assume this model satisfies Assumptions MLR.1-MLR.4
Suppose we specify the model by excluding X2 :
Y = β̃0 + β̃1 X1 + ũ
What will happen to the OLS estimator of β̃1 ?
Omitted Variable Bias
Recall that
β̃ˆ1 = Σ_{i=1}^n (Xi1 − X̄1) Yi / Σ_{i=1}^n (Xi1 − X̄1)²
    = β1 + β2 · [ Σ_{i=1}^n (Xi1 − X̄1) Xi2 / Σ_{i=1}^n (Xi1 − X̄1)² ]
         + Σ_{i=1}^n (Xi1 − X̄1) ui / Σ_{i=1}^n (Xi1 − X̄1)²        (1)
Consider the following regression where E (v |X1 ) = 0
X2 = γ0 + γ1 X1 + v ,
Note that the OLS estimator of γ1 is unbiased
E[ γ̂1 | X1 ] = E[ Σ_{i=1}^n (Xi1 − X̄1) Xi2 / Σ_{i=1}^n (Xi1 − X̄1)² | X1 ] = γ1
where X1 = (X11 , . . . , Xi1 , . . . , Xn1 )′
Omitted Variable Bias
Take the conditional expectation of (1) given X1
E[ β̃ˆ1 | X1 ] = β1 + β2 · E[ Σ_{i=1}^n (Xi1 − X̄1) Xi2 / Σ_{i=1}^n (Xi1 − X̄1)² | X1 ]
              + E[ Σ_{i=1}^n (Xi1 − X̄1) ui / Σ_{i=1}^n (Xi1 − X̄1)² | X1 ].
By Law of Iterated Expectations,
"P # " "P # #
n n
(X i1 − X̄1 )u i (X i1 − X̄1 )ui
E Pin 2
X1 = E E Pin 2
X X1 = 0
i (Xi1 − X̄1 ) i (Xi1 − X̄1 )
under Assumption MLR.4 (ZCM)
We also have
    E[ Σ_{i=1}^n (Xi1 − X̄1) Xi2 / Σ_{i=1}^n (Xi1 − X̄1)² | X1 ] = γ1
Therefore,
E[ β̃ˆ1 | X1 ] = β1 + β2 · γ1
Omitted Variable Bias
The term β2 · γ1 is called the omitted variable bias,
E[ β̃ˆ1 | X1 ] − β1 = β2 · γ1
which can be positive, negative, or zero.
We can infer the sign of the bias in β̃ˆ1 from the signs of β2 and of the
correlation between X1 and X2
There is no omitted variable bias if the omitted variable is
▶ irrelevant: β2 = 0
▶ uncorrelated with X1 : γ1 = 0
Can be generalized to the case of more than 2 regressors
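A minimal Monte Carlo sketch with hypothetical parameter values, illustrating that the simple-regression slope centers on β1 + β2 · γ1 rather than β1:

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, beta2 = 1.0, 2.0, 1.5
gamma0, gamma1 = 0.0, 0.7
n, reps = 200, 2000

slopes = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = gamma0 + gamma1 * x1 + rng.normal(size=n)   # X2 correlated with X1
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    x1c = x1 - x1.mean()
    slopes.append((x1c @ y) / (x1c @ x1c))           # slope from regressing y on x1 only

print(np.mean(slopes))    # close to beta1 + beta2 * gamma1 = 2.0 + 1.5 * 0.7 = 3.05
```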
Omitted Variable Bias
Example: Omitting ability in a wage equation
wage = β0 + β1 educ + β2 abil + u
abil = γ0 + γ1 educ + v
More ability leads to higher productivity and therefore higher wages:
β2 > 0
Likely that educ and abil are positively correlated: γ1 > 0
The OLS estimator of β̃1 from the simple regression
wage = β̃0 + β̃1 educ + ũ
is on average too large (overestimated) because β2 γ1 > 0
The Variance of the OLS Estimators
MLR Assumptions
MLR.5 (Homoskedasticity): The error u has the same variance
given any value of the explanatory variables.
Var(u|X1 , . . . , Xk ) = σ 2
Example: Wage equation
wage = β0 + β1 educ + β2 exper + β3 tenure + u
Var(ui |educi , experi , tenurei ) = σ 2
The Variance of the OLS Estimators
Theorem 3.2 Sampling variances of the OLS slope estimators
Under Assumptions MLR.1 through MLR.5,
Var(β̂j | X) = σ² / [ SSTj (1 − Rj²) ] ,   j = 1, . . . , k
where SSTj = Σ_{i=1}^n (Xij − X̄j)² and Rj² is the R-squared from regressing
Xj on all other independent variables (including an intercept).
Components of the variance
1 The error variance: σ 2
2 The total variation in Xj : SSTj
3 The linear relationship among X1 , . . . , Xk : Rj2
⋆ Variance inflation factor: VIFj = 1/(1 − Rj²)
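A minimal sketch with simulated data, computing Rj², the variance inflation factor, and the Theorem 3.2 variance for one slope:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.3 * rng.normal(size=n)    # x1 and x2 strongly correlated
sigma = 1.0                                 # true error s.d. (known here by construction)
y = 1.0 + 0.5 * x1 + 0.5 * x2 + sigma * rng.normal(size=n)

# R_1^2: R-squared from regressing x1 on the other regressors (constant and x2)
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
R1_sq = 1.0 - (resid @ resid) / ((x1 - x1.mean()) ** 2).sum()

SST1 = ((x1 - x1.mean()) ** 2).sum()
var_beta1 = sigma**2 / (SST1 * (1.0 - R1_sq))   # Theorem 3.2 formula
vif1 = 1.0 / (1.0 - R1_sq)                      # variance inflation factor

print(R1_sq, vif1, var_beta1)
```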
The Components of the OLS Variances: Multicollinearity
1 The error variance σ 2
▶ A high error variance increases the sampling variance because there is
more “noise” in the equation.
▶ The error variance does not decrease with sample size.
2 The total sample variation in the explanatory variable SSTj
▶ More sample variation leads to more precise estimates.
▶ Total sample variation automatically increases with the sample size;
increasing the sample size is thus a way to get more precise
estimates.
▶ A small (nonzero) SSTj is not a violation of Assumption MLR.3.
The Components of the OLS Variances: Multicollinearity
3 Linear relationships among the independent variables
▶ Regress Xj on all other independent variables; the R-squared of this
regression will be higher the better Xj can be explained by the
other independent variables.
▶ The sampling variance of the slope estimator for Xj will be higher when
Xj can be better explained by the other independent variables.
▶ High (but not perfect) correlation between two or more independent
variables is called multicollinearity.
⋆ This is not a violation of Assumption MLR.3.
▶ A high degree of correlation between certain independent variables can
be irrelevant as to how well we can estimate other parameters of
interest.
The Components of the OLS Variances: Multicollinearity
An example for multicollinearity
The different expenditure categories will be strongly correlated
because if a school has a lot of resources it will spend a lot on
everything.
As a consequence, sampling variance of the estimated effects will be
large.
Only the sampling variances of the estimators for the variables involved in
multicollinearity will be inflated; the estimates of the other effects may be
very precise.
Including Irrelevant Variables in a Regression
Suppose that the true population model is
Y = β0 + β1 X1 + β2 X2 + u
and assuming Assumptions MLR.1-MLR.4, we specify the model as
follows:
Y = β̃0 + β̃1 X1 + β̃2 X2 + β̃3 X3 + ũ
What will happen to the OLS estimators of β̃1 and β̃2 ?
Including Irrelevant Variables in a Regression
In terms of unbiasedness of β̃ˆ1 and β̃ˆ2 , there is no effect, which is
immediate from Theorem 3.1.
▶ Remember that unbiasedness means E (β̂j ) = βj for any value of βj
However, inclusion of irrelevant variables generally increases the variances of
the OLS estimators (by raising the Rj² of the other regressors). Remember that
Var(β̂j | X) = σ² / [ SSTj (1 − Rj²) ] ,   j = 1, . . . , k
where SSTj = Σ_{i=1}^n (Xij − X̄j)² and Rj² is the R-squared from
regressing Xj on all other independent variables (including an
intercept)
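A minimal Monte Carlo sketch with hypothetical values, illustrating that an irrelevant but correlated regressor leaves the slope estimator unbiased while inflating its variance:

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 100, 2000
b1_small, b1_big = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = 0.9 * x1 + 0.4 * rng.normal(size=n)   # irrelevant (true coefficient 0) but correlated with x1
    y = 1.0 + 0.5 * x1 - 0.5 * x2 + rng.normal(size=n)
    Xs = np.column_stack([np.ones(n), x1, x2])
    Xb = np.column_stack([np.ones(n), x1, x2, x3])
    b1_small.append(np.linalg.lstsq(Xs, y, rcond=None)[0][1])
    b1_big.append(np.linalg.lstsq(Xb, y, rcond=None)[0][1])

print(np.mean(b1_small), np.mean(b1_big))   # both close to 0.5 (unbiased)
print(np.var(b1_small), np.var(b1_big))     # variance is larger with x3 included
```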
Standard Errors of the OLS Estimators
The unbiased estimator of σ² in an MLR model is
    σ̂² = [1/(n − k − 1)] Σ_{i=1}^n ûi²
where ûi = yi − β̂0 − β̂1 xi1 − · · · − β̂k xik
The degrees of freedom (df)
df = n − (k + 1)
= (number of observations) − (number of estimated parameters)
Standard Errors of the OLS Estimators
Theorem 3.3 Unbiased Estimation of σ²
Under Assumptions MLR.1 through MLR.5, E (σ̂ 2 ) = σ 2
The standard deviation of β̂j
    sd(β̂j) = √Var(β̂j) = σ / √[ SSTj (1 − Rj²) ]
The standard error of β̂j
    se(β̂j) = √V̂ar(β̂j) = σ̂ / √[ SSTj (1 − Rj²) ]
Note that these formulas are only valid under Assumptions
MLR.1-MLR.5 (in particular, there has to be homoskedasticity)
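A minimal sketch with simulated data, computing σ̂² with df = n − k − 1 and the standard errors (using the matrix form σ̂²(X′X)⁻¹, whose diagonal matches the formula above for the slopes):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 150, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

Xmat = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Xmat, y, rcond=None)[0]
u_hat = y - Xmat @ beta_hat

df = n - (k + 1)                            # degrees of freedom
sigma2_hat = (u_hat @ u_hat) / df           # unbiased estimator of sigma^2

# Under MLR.1-MLR.5, Var(beta_hat | X) = sigma^2 (X'X)^{-1}; its estimated
# diagonal gives se(beta_j)^2
var_hat = sigma2_hat * np.linalg.inv(Xmat.T @ Xmat)
se = np.sqrt(np.diag(var_hat))
print(beta_hat, se)
```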
The Gauss-Markov Theorem
Under assumptions MLR.1-MLR.4, OLS is unbiased
However, under these assumptions there may be many other
estimators that are unbiased.
Which one is the unbiased estimator with the smallest variance?
▶ We define the “best” estimator as the one with the smallest variance.
In order to answer this question one usually limits oneself to linear
estimators; i.e., estimators linear in the dependent variable.
β̃j = Σ_{i=1}^n Wij Yi
▶ Wij may be an arbitrary function of the sample values of all the
explanatory variables; the OLS estimator can be shown to be of this
form
The Gauss-Markov Theorem
Theorem 3.4 Gauss-Markov Theorem
Under assumptions MLR.1-MLR.5, the OLS estimator β̂j is the best
linear unbiased estimator (BLUE) of the regression coefficient βj , i.e.
Var(β̂j ) ≤ Var(β̃j ), j = 0, 1, . . . , k
for all β̃j = Σ_{i=1}^n Wij Yi for which E(β̃j) = βj , j = 0, 1, . . . , k.
OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is
heteroskedasticity, OLS no longer has the smallest variance among
linear unbiased estimators