

18-Econometrics-Linear Regression

The document discusses the use of instrumental variables (IV) in econometrics, particularly in the context of endogenous regressors and their impact on regression analysis. It explains the process of transforming equations to achieve homoskedasticity, the importance of choosing appropriate instruments, and the implications of using IV estimators compared to ordinary least squares (OLS). Additionally, it provides examples and exercises related to estimating returns to schooling using parental education as instruments.

Uploaded by

Lorenzo Lucchesi

Econometrics

University of Milan-Bicocca

Course lecturer:
Maryam Ahmadi

Endogenous Regressors and
Instrumental Variables

Problem 17 & Answer.
1- Consider a linear model to explain monthly beer consumption:
𝑏𝑒𝑒𝑟 = 𝛽0 + 𝛽1 𝑖𝑛𝑐 + 𝛽2 𝑝𝑟𝑖𝑐𝑒 + 𝛽3 𝑒𝑑𝑢𝑐 + 𝛽4 𝑓𝑒𝑚𝑎𝑙𝑒 + 𝑢
E(u|inc, price, educ, female) = 0
Var(u|inc, price, educ, female) = σ²·inc²
Write the transformed equation that has a homoskedastic error term.

Var(u|inc, price, educ, female) = σ²·inc² → h(x) = inc², where h(x) is the heteroskedasticity
function. Therefore √h(x) = inc (income is positive), and the transformed equation is obtained
by dividing the original equation by inc:

$\frac{beer}{inc} = \beta_0 \frac{1}{inc} + \beta_1 + \beta_2 \frac{price}{inc} + \beta_3 \frac{educ}{inc} + \beta_4 \frac{female}{inc} + \frac{u}{inc}$

Notice that 𝛽1, which is the slope on inc in the original model, is now the constant (intercept) in the
transformed equation. This is simply a consequence of the form of the heteroskedasticity and
the functional forms of the explanatory variables in the original equation.
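
As a quick check (a worked step that the slide leaves implicit), the error term of the transformed equation, u/inc, has constant conditional variance. Here x is shorthand for (inc, price, educ, female):

```latex
\operatorname{Var}\!\left(\frac{u}{inc}\,\middle|\,\mathbf{x}\right)
  = \frac{1}{inc^{2}}\,\operatorname{Var}(u \mid \mathbf{x})
  = \frac{\sigma^{2}\,inc^{2}}{inc^{2}}
  = \sigma^{2}
```

Running OLS on the transformed equation is therefore weighted least squares on the original equation, with weights 1/inc².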
2- Consider the model y= 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢, and suppose that cov(𝑢,𝑥2 ) ≠ 0.

a) Is it possible to still make appropriate inferences based on the OLS estimator, while
adjusting the standard errors appropriately?
No. If E(u·x2) ≠ 0, the OLS estimator is biased and inconsistent, no matter what other
assumptions we make. Correcting the standard errors does not remove the bias.

b) Explain how an instrumental variable, z, leads to a new moment condition and,
consequently, an alternative estimator for 𝛽.
An instrumental variable z gives rise to a new moment condition that can replace
the invalid one: E(u·z) = E{(y − x′𝛽)·z} = 0. This follows from the exogeneity of z,
that is, from cov(z,u) = 0. In the simple case with a single regressor x, it leads to the
IV estimator $\hat{\beta}_{IV} = \frac{cov(z,y)}{cov(z,x)}$, the ratio of the sample covariance
of z and y to the sample covariance of z and x.
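
The covariance-ratio formula can be tried out on simulated data. The sketch below is illustrative only: the data-generating process, the sample size and all variable names are assumptions made for this example, not part of the exercise. It uses a single regressor x that is endogenous because an unobserved factor enters both x and the error term.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta=1.0, seed=0):
    """Simulated data (assumed design): the unobserved factor `a` enters both
    x and the error u, so cov(x, u) != 0; z shifts x but is unrelated to u."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)             # instrument
        a = rng.gauss(0, 1)              # unobserved confounder
        xi = zi + a + rng.gauss(0, 1)    # endogenous regressor
        ui = a + rng.gauss(0, 1)         # error term, correlated with x through a
        x.append(xi)
        z.append(zi)
        y.append(2.0 + beta * xi + ui)
    return x, y, z

x, y, z = simulate()
beta_ols = cov(x, y) / cov(x, x)   # relies on the invalid moment E(u*x) = 0
beta_iv = cov(z, y) / cov(z, x)    # relies on the valid moment E(u*z) = 0
print(f"OLS slope: {beta_ols:.3f}   IV slope: {beta_iv:.3f}   true slope: 1.0")
```

In this design the OLS slope settles well above the true value, while the IV slope stays close to it, at the price of a larger sampling variance.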
c) Why does this alternative estimator lead to a smaller R² than the OLS one? What does this say about
the R² as a measure for the adequacy of the model?
OLS minimizes the residual sum of squares and therefore maximizes the R². Any other estimator,
including instrumental variables, results in a lower R² (or at best the same one). Note that we are not
interested in obtaining an R² that is as high as possible, but in obtaining consistent estimates for the
coefficients of interest that are as accurate as possible. The R² does not tell us which estimator is the
preferred one. The R² tells us how well the model fits the data (in a given sample) and is typically only
interpreted in this way when the model is estimated by ordinary least squares.
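
A small numerical sketch of this point, on simulated data (the design and all names are assumptions for illustration): the R² is computed as 1 − RSS/TSS with the actual regressor, once with the OLS coefficients and once with the IV coefficients.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def r_squared(y, x, b0, b1):
    """1 - RSS/TSS for the fitted line b0 + b1*x (may be negative for non-OLS fits)."""
    my = mean(y)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return 1 - rss / tss

# Simulated data (assumed design): x is endogenous through the unobserved factor a.
rng = random.Random(1)
x, y, z = [], [], []
for _ in range(5000):
    zi, a = rng.gauss(0, 1), rng.gauss(0, 1)
    xi = zi + a + rng.gauss(0, 1)
    x.append(xi)
    z.append(zi)
    y.append(2.0 + 1.0 * xi + a + rng.gauss(0, 1))

b1_ols = cov(x, y) / cov(x, x)
b1_iv = cov(z, y) / cov(z, x)
b0_ols = mean(y) - b1_ols * mean(x)
b0_iv = mean(y) - b1_iv * mean(x)

r2_ols = r_squared(y, x, b0_ols, b1_ols)
r2_iv = r_squared(y, x, b0_iv, b1_iv)
print(f"R2 with OLS coefficients: {r2_ols:.3f}   R2 with IV coefficients: {r2_iv:.3f}")
```

The OLS line has the higher R² by construction, even though in this design it is the IV slope that is close to the true coefficient.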

d) Why can we not choose z = 𝑥1 as an instrument for 𝑥2, even if E(𝑥1·u) = 0? Would it be possible to use
𝑥1² as an instrument for 𝑥2?
We cannot use x1 as an instrument for x2 because x1 is already included in the model: its moment
condition E(x1·u) = 0 is already used, so it provides no additional information.
In theory, it is possible to use x1² as an instrument for x2. However, being uncorrelated with
u is a necessary condition for an instrumental variable, not a sufficient one. An instrument should
be correlated with x2 and uncorrelated with u, and there should be a convincing argument for using it
as an instrument rather than as a regressor in the model.

Example: Education in a wage equation

• Individual ability is included in u and is correlated with education
• Education is therefore endogenous
• We need an instrumental variable (z) that is correlated with education but uncorrelated
with ability
• We choose father's education as the instrument (z)

Use MROZ.dta

. ivregress 2sls lwage (educ = fatheduc)

Or, in two steps:

. reg educ fatheduc
. predict educhat
. reg lwage educhat

The two-step procedure gives exactly the coefficients of the 2SLS/IV model, provided both
steps use the same sample (in MROZ.dta, lwage is missing for women who do not work), but
its standard errors differ from the 2SLS/IV ones and are not valid.
When the second stage is run by hand, the residuals are:

$r = y - \widehat{educ}\,\hat{\beta}$

But these are not the right residuals for 2SLS/IV. Because we are fitting a structural model, we are
interested in the residuals using the actual values of the endogenous variable.
The correct two-stage least-squares residuals are:

$e = y - educ\,\hat{\beta}$
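
The two claims (same coefficients, different residuals) can be checked numerically. The following sketch uses simulated data; the variable names are borrowed from the example, but the numbers and the data-generating process are assumptions and have nothing to do with MROZ.dta.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    b1 = cov(x, y) / cov(x, x)
    return mean(y) - b1 * mean(x), b1

# Simulated stand-ins for fatheduc, educ and lwage (assumed design).
rng = random.Random(2)
fatheduc, educ, lwage = [], [], []
for _ in range(5000):
    f, ability = rng.gauss(0, 1), rng.gauss(0, 1)
    e = 12 + 0.5 * f + ability + rng.gauss(0, 1)
    fatheduc.append(f)
    educ.append(e)
    lwage.append(1.0 + 0.1 * e + 0.3 * ability + rng.gauss(0, 0.5))

# Manual two-step procedure.
p0, p1 = ols(fatheduc, educ)                       # stage one
educhat = [p0 + p1 * f for f in fatheduc]          # fitted values
b0, b1_manual = ols(educhat, lwage)                # stage two

# Direct IV estimator.
b1_iv = cov(fatheduc, lwage) / cov(fatheduc, educ)

# Mean squared residuals: fitted educ (second-stage OLS) vs. actual educ (2SLS).
n = len(lwage)
mse_fitted = sum((w - b0 - b1_manual * h) ** 2 for w, h in zip(lwage, educhat)) / n
mse_actual = sum((w - b0 - b1_manual * e) ** 2 for w, e in zip(lwage, educ)) / n

print(f"slope, two-step: {b1_manual:.6f}   slope, IV: {b1_iv:.6f}")
print(f"mean squared residual, fitted educ: {mse_fitted:.4f}   actual educ: {mse_actual:.4f}")
```

The slopes agree up to rounding, while the two residual series, and hence the error-variance estimates that feed into the standard errors, do not.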
Importance of choosing the right instrument

• If x and z are only slightly correlated, the sampling variance of the IV estimator can be
very large. The higher the correlation between z and x, the smaller the
variance of the IV estimator.

• This highlights an important cost of performing IV estimation when x and u
are actually uncorrelated: OLS is then consistent and more precise, so IV gives up precision
without any gain.

• This also highlights the importance of choosing an instrument z that
satisfies the instrument relevance assumption (Cov(z,x) ≠ 0).
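
A standard textbook result for the simple regression model with a homoskedastic error makes the role of the correlation explicit. It is stated here as background and is not taken from the slide; σ² denotes Var(u), σ_x² denotes Var(x), and ρ_{x,z} is the correlation between x and z.

```latex
\operatorname{Avar}\big(\hat{\beta}_{IV}\big) = \frac{\sigma^{2}}{n\,\sigma_{x}^{2}\,\rho_{x,z}^{2}},
\qquad
\operatorname{Avar}\big(\hat{\beta}_{OLS}\big) = \frac{\sigma^{2}}{n\,\sigma_{x}^{2}}
```

Because ρ²_{x,z} ≤ 1, the IV variance is never smaller than the OLS variance. With ρ_{x,z} = 0.1, for example, the IV variance is 100 times the OLS variance, so the standard error is 10 times larger.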
An example of an irrelevant instrument, for which Cov(x,z) ≠ 0 does not hold

The log of birth weight, lbwght, is regressed on packs, the number of packs of cigarettes
that the mother smoked per day during pregnancy.

We might worry that packs is correlated with other health factors or with the
availability of good prenatal care, as well as with the mother's education, so packs is
endogenous and the zero conditional mean assumption is violated.

A possible instrumental variable for packs is the average price of cigarettes,
cigprice.
We assume that cigprice is correlated with packs (instrument relevance) but
uncorrelated with u, which contains the health factors (instrument exogeneity).
. ivregress 2sls lbwght (packs = cigprice), first

First-stage regressions

                                  Number of obs   =      1,388
                                  F(   1,   1386) =       0.13
                                  Prob > F        =     0.7179
                                  R-squared       =     0.0001
                                  Adj R-squared   =    -0.0006
                                  Root MSE        =     0.2987

       packs |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
    cigprice |  .0002829     .000783    0.36   0.718    -.0012531   .0018188
       _cons |  .0674257    .1025384    0.66   0.511    -.1337215   .2685728

Instrumental variables (2SLS) regression    Number of obs   =      1,388
                                            Wald chi2(1)    =       0.12
                                            Prob > chi2     =     0.7310
                                            R-squared       =          .
                                            Root MSE        =     .93818

      lbwght |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       packs |  2.988676    8.692619    0.34   0.731    -14.04854    20.0259
       _cons |  4.448136    .9075006    4.90   0.000     2.669468   6.226805

Instrumented: packs
Instruments: cigprice

The estimation results show that:

• In the first stage, there is no relationship between cigprice and the number of packs of
cigarettes smoked (the relevance assumption is violated).

• The IV coefficient on packs is huge and has an unexpected sign.

• The standard error on packs is also very large, as a result of the low correlation between
cigprice and packs.

• This estimation fails because Cov(cigprice, packs) = 0, so the relevance assumption
(Cov(z,x) ≠ 0) is violated.

• This is a case of an irrelevant instrument; however, we can also face the problem of a weak
instrument, in which the covariance between x and z is not zero but is very small.
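
The test statistics in the output can be reproduced from the reported coefficients and standard errors, which is a useful habit when reading first-stage results. The sketch below only redoes that arithmetic; the threshold of 10 for the first-stage F statistic is a widely used rule of thumb and is an addition here, not something stated on the slide.

```python
# Coefficients and standard errors copied from the Stata output above.
first_stage_coef, first_stage_se = 0.0002829, 0.000783   # cigprice in the first stage
iv_coef, iv_se = 2.988676, 8.692619                      # packs in the 2SLS regression

t_first = first_stage_coef / first_stage_se   # t statistic on the instrument
f_first = t_first ** 2                        # with a single instrument, F = t^2
z_iv = iv_coef / iv_se                        # z statistic on packs
wald_iv = z_iv ** 2                           # with a single regressor, Wald chi2 = z^2

RULE_OF_THUMB_F = 10.0   # assumption: common rule of thumb for instrument strength
instrument_looks_weak = f_first < RULE_OF_THUMB_F

print(f"first stage: t = {t_first:.2f}, F = {f_first:.2f}")
print(f"2SLS: z = {z_iv:.2f}, Wald chi2 = {wald_iv:.2f}")
print("instrument looks weak or irrelevant:", instrument_looks_weak)
```

The computed F and Wald statistics match the values Stata reports, and the first-stage F is far below any conventional threshold for a usable instrument.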

IV estimation in the multiple regression model

𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑘 𝑥𝑘 + 𝑢

𝑦 is the dependent variable
𝑥1 is the endogenous regressor (correlated with 𝑢)
𝑥2 to 𝑥𝑘 are the exogenous variables or included exogenous regressors
(uncorrelated with 𝑢)
z is the instrumental variable:
• It does not appear in the regression equation
• It is uncorrelated with the error term
• It is partially correlated with the endogenous explanatory variable
Stage one.

Regress 𝑥1 on all the exogenous variables, that is, on 𝑥2 to 𝑥𝑘 and z, by OLS:

$x_1 = \pi_1 + \pi_2 x_2 + \dots + \pi_k x_k + \pi_{k+1} z + v$

• This is called the “reduced form regression”.

• In a regression of the endogenous explanatory variable on all exogenous variables, the
instrumental variable must have a non-zero coefficient.

• What matters is a statistically significant coefficient on 𝑧, because z is the instrumental
variable and the relevance condition (z partially correlated with 𝑥1, so that the coefficient on z
is non-zero) should hold. The significance of the other variables doesn’t matter.

• Moreover, for all exogenous variables, Cov(𝑥2, 𝑢) = 0, Cov(𝑥3, 𝑢) = 0, …, Cov(𝑥𝑘, 𝑢) = 0

• Compute the predicted values of 𝑥1:

$\hat{x}_1 = \hat{\pi}_1 + \hat{\pi}_2 x_2 + \dots + \hat{\pi}_k x_k + \hat{\pi}_{k+1} z$
Stage two.

Regress 𝑦 on the predicted values of 𝑥1 from stage one and on 𝑥2 to 𝑥𝑘 by OLS:

$y = \beta_0 + \beta_1 \hat{x}_1 + \beta_2 x_2 + \dots + \beta_k x_k + error$

• This is Two Stage Least Squares (2SLS) estimation
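
The two stages can be sketched in a few lines of plain Python. Everything below is an illustrative assumption: the simulated design has one endogenous regressor x1, one exogenous regressor x2 and one instrument z, and the helper functions solve and ols are written for this example only.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y; rows include the constant."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated data (assumed design): true coefficients are 1.0, 2.0 (x1) and -1.0 (x2).
rng = random.Random(3)
x1, x2, z, y = [], [], [], []
for _ in range(20000):
    zi, x2i, a = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    x1i = 0.8 * zi + 0.5 * x2i + a + rng.gauss(0, 1)   # endogenous: depends on a
    x1.append(x1i)
    x2.append(x2i)
    z.append(zi)
    y.append(1.0 + 2.0 * x1i - 1.0 * x2i + a + rng.gauss(0, 1))

# Stage one: regress x1 on a constant, x2 and z, then form the fitted values.
pi_hat = ols([[1.0, v2, vz] for v2, vz in zip(x2, z)], x1)
x1_hat = [pi_hat[0] + pi_hat[1] * v2 + pi_hat[2] * vz for v2, vz in zip(x2, z)]

# Stage two: regress y on a constant, the fitted x1 and x2.
beta_2sls = ols([[1.0, vh, v2] for vh, v2 in zip(x1_hat, x2)], y)

# Plain OLS on the actual x1, for comparison.
beta_ols = ols([[1.0, v1, v2] for v1, v2 in zip(x1, x2)], y)

print("first-stage coefficient on z:", round(pi_hat[2], 3))
print("2SLS coefficient on x1:", round(beta_2sls[1], 3))
print("OLS coefficient on x1:", round(beta_ols[1], 3))
```

Only the coefficients from this manual second stage should be used. Its standard errors are based on residuals computed with the fitted x1 and are not valid, which is one reason to prefer a dedicated command such as ivregress in practice.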

Example. Using the data SCHOOLING, the log of wage is regressed on a set of explanatory variables:
lwage = 𝛽0 + 𝛽1 educ + 𝛽2 exper + 𝛽3 exper² + 𝛽4 black + 𝛽5 smsa + 𝛽6 south + 𝑢
smsa is a dummy variable for living in an SMSA (a metropolitan area)

• Suppose educ is an endogenous variable that is correlated with the error term, because the error
term contains the person's ability, which can be correlated with education and wage at the same time.

• Ability is unobservable, so we don't have data for this variable. Therefore, we should find an
instrument for education that is correlated with education but uncorrelated with ability.

• We choose nearc4 as the instrument for educ. It is a dummy variable equal to 1 if the family lived near a
four-year college in 1966. It is assumed to be correlated with educ and uncorrelated with ability (the error term).

• Typically, an instrument is thought of as a variable that affects the costs of schooling (and thus the
choice of schooling) but not earnings.
OLS estimation results of regressing log(wage) on education, experience, experience-squared and three dummy
variables indicating whether the individual is black, lived in a metropolitan area (SMSA) and lived in the south:

The reduced form, which explains the endogenous regressor by the exogenous regressors and the
instruments, should show a significant effect of the instruments. (If the effect is weak, there is a
weak instruments problem.)

IV estimates are (much) less accurate than OLS estimates; how much depends on the
correlation of the instruments with the endogenous regressor.

The fact that the IV estimate of the returns to schooling is higher than the OLS estimate suggests that
OLS underestimates the true causal effect of schooling.
This is at odds with the ‘ability bias’.

The downward bias of OLS could be due to
• measurement error, or
• the possibility that the true returns to schooling vary across individuals, negatively
related to schooling.

There is no unique definition of an R² if the model is not estimated by ordinary least squares.

When we estimate the model by instrumental variables methods, goodness-of-fit is not what we are
after. Our goal was to consistently estimate the causal effect of schooling on wage, and that is exactly
what instrumental variables is trying to do.

Again, the R² plays no role in comparing alternative estimators.

Problem 18
Consider the data SCHOOLING. The purpose of this exercise is to explore the role of
parents’ education as instruments to estimate the returns to schooling.

a. Estimate a reduced form for schooling that includes mother’s and father’s education
levels instead of the lived-near-college dummy. What do these results indicate about the
possibility of using parents’ education as instruments?

b. Estimate the returns to schooling, on the basis of the same specification as in the
example, using mother’s and father’s education as instruments.

c. Re-estimate the model using also the lived near college dummy.

d. Compare and interpret the different estimates on the returns to schooling from
example, and parts b and c of this exercise.

• With more than one instrument for an endogenous regressor, the command is the same as with one
instrument; just list the additional instruments, e.g. ivregress 2sls lwage76 ( ed76 = nearc4 momed daded ) …