
Introduction to Instrumental Variables Methods
Dismas Alex
The Institute of Finance Management
Introduction
Motivation
• We often use non-experimental data to conduct empirical investigations
• The simple regression model would be:

  y = β0 + β1·T + u

• For example, the return to education, the effect of class size on student achievement, etc.
• What would be the problem with this simple model?
Motivation
• Suppose that we extend the model to include covariates/control variables:

  y = β0 + β1·T + β2·X + u

• What would be the problem with this model?
• Omitted variable bias: T is then endogenous (this is true for many variables we use)
• We need to think about the best way to mitigate the issue
Motivation
• The problem of omitted variable bias or unobserved heterogeneity can be quite extensive
• Often, important personal variables cannot be observed
• The unobservables are correlated with the explanatory variable of interest, T
• Thus T is endogenous
The consequence of an endogenous T
• Recall the key assumption: Cov(T, u) = 0

• If T is endogenous, then Cov(T, u) ≠ 0
• Thus, the estimated coefficient β̂1 is biased

• Instrumental variables (IV) offers one approach to estimating β1 (when instruments are available…)
What are the solutions to OVB and unobserved heterogeneity?
• Ignore the problem – biased and inconsistent estimates of the coefficients
• Find a suitable proxy variable for the unobserved variable, e.g. an IQ test for ability
• Assume that the unobserved variable does not change over time and obtain panel data:
– Fixed effects or
– First-differencing model
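As a small illustration of the panel-data route, the sketch below simulates two periods with a time-invariant unobservable and shows that first-differencing removes it. All variable names and parameter values are invented for this example, not taken from the lecture's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                               # individuals observed in two periods
a = rng.normal(size=n)                # unobserved, time-invariant "ability"
t1 = 0.5 * a + rng.normal(size=n)     # treatment correlated with ability
t2 = 0.5 * a + rng.normal(size=n)
beta = 2.0                            # true effect (chosen for the example)
y1 = beta * t1 + a + rng.normal(size=n)
y2 = beta * t2 + a + rng.normal(size=n)

# Pooled OLS slope is biased because t is correlated with a
pooled_t = np.concatenate([t1, t2])
pooled_y = np.concatenate([y1, y2])
b_ols = np.cov(pooled_t, pooled_y)[0, 1] / np.var(pooled_t, ddof=1)

# First-differencing wipes out a: (y2 - y1) = beta*(t2 - t1) + (e2 - e1)
dt, dy = t2 - t1, y2 - y1
b_fd = (dt @ dy) / (dt @ dt)

print(round(b_ols, 2), round(b_fd, 2))  # OLS biased upward; FD close to 2.0
```

The same logic underlies fixed effects: anything constant within an individual drops out of the within-person comparison.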
Example 1: The Case of Job Training and Earnings
• Suppose we want to measure the impact of job training on earnings. We observe earnings data for people who have and have not completed job training.

• We compare two groups: those who got trained and those who didn't.

• We want to infer the causal effect of job training on earnings.

• What if people who are more "motivated" are more likely to get training and, on average, earn more than less "motivated" people?
‒ The difference in average earnings between the trained and untrained confounds the effects of motivation and training
‒ Omitted variable bias: we would like to control for unobserved (and unobservable?) motivation
Example 1: The Case of Job Training and Earnings
• In this scenario, "motivation" acts as a potential confounding
variable, as it influences both whether someone receives job
training and their final earnings.
• This can indeed bias the observed relationship between
training and earnings, making it difficult to infer a causal
effect.
• Selection bias:
– If motivated individuals are more likely to pursue training, simply
comparing the earnings of those who trained vs. those who didn't will
be misleading.
– The trained group might have inherently higher earning potential due
to their motivation, not necessarily the training itself.
How to address this?
• Randomized controlled trials (RCTs):
– The gold standard for causal inference! If you randomly assign individuals to
receive training or not, any differences in earnings can be attributed to the
training, not pre-existing differences like motivation.

• Control variables:
– You can incorporate variables related to motivation (e.g., education level, prior
work experience) into your analysis. This statistically "controls" for their
influence, allowing you to isolate the effect of training while accounting for
motivation differences.

• Instrumental variables (IVs):


– Find a variable that influences the decision to train but not earnings directly.
This "instrument" can help identify the true causal effect of training by
separating it from the confounding influence of motivation.
Instrumental variables (IVs)

• A solution to the endogeneity problem is to find an instrumental variable (IV)
• An instrument z for the endogenous variable T must satisfy two conditions:
– Relevance: z is correlated with T, i.e. Cov(z, T) ≠ 0
– Exogeneity (exclusion restriction): z is uncorrelated with the error term, i.e. Cov(z, u) = 0, so z affects y only through T
IV Estimation in Multiple Regression
Two-Stage Least Squares (2SLS) Estimation
• Stage 1: regress the endogenous variable T on the instrument(s) z and the exogenous covariates; save the fitted values T̂
• Stage 2: regress y on T̂ and the exogenous covariates; the coefficient on T̂ is the 2SLS estimate of β1
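The two stages can be sketched with simulated data. This is a minimal illustration with invented parameter values; in practice the second-stage standard errors must be corrected for the generated regressor, which ivreg/ivregress do automatically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
u = rng.normal(size=n)                      # unobserved confounder ("motivation")
z = rng.binomial(1, 0.5, n).astype(float)   # instrument: randomized offer
t = 1.0 * z + 0.8 * u + rng.normal(size=n)  # endogenous treatment intensity
y = 1.5 * t + 2.0 * u + rng.normal(size=n)  # true effect of t on y is 1.5

Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), t])

# Stage 1: regress t on the instrument, keep fitted values t_hat
pi = np.linalg.lstsq(Z, t, rcond=None)[0]
t_hat = Z @ pi

# Stage 2: regress y on t_hat; the slope is the 2SLS estimate
X_hat = np.column_stack([np.ones(n), t_hat])
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

b_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(round(b_ols, 2), round(b_2sls, 2))  # OLS overstates the effect; 2SLS is near 1.5
```

Because z is independent of u, the fitted values t_hat carry only the exogenous variation in t, which is what identifies β1.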
Example: Job Training
• OLS Results (from Stata):

regress earnings train x1-x13 , robust

Linear regression Number of obs = 5102


F( 14, 5087) = 38.35
Prob > F = 0.0000
R-squared = 0.0909
Root MSE = 18659

------------------------------------------------------------------------------
| Robust
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
train | 3753.362 536.3832 7.00 0.000 2701.82 4804.904
.
.
.

If intuition about the source of endogeneity is correct, this should be an overestimate of the effect of training.
Example: Job Training
• First-Stage Results (from Stata):

regress train offer x1-x13 , robust

Linear regression Number of obs = 5102


F( 14, 5087) = 390.75
Prob > F = 0.0000
R-squared = 0.3570
Root MSE = .39619

------------------------------------------------------------------------------
| Robust
train | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
offer | .6088885 .0087478 69.60 0.000 .591739 .6260379
.
.
.
Strong evidence that E[zixi] ≠ 0
Example: Job Training
• Reduced-Form Results (from Stata):

regress earnings offer x1-x13 , robust

Linear regression Number of obs = 5102


F( 14, 5087) = 34.19
Prob > F = 0.0000
R-squared = 0.0826
Root MSE = 18744

------------------------------------------------------------------------------
| Robust
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
offer | 970.043 545.6179 1.78 0.075 -99.60296 2039.689
.
.
.
Moderate evidence of a non-zero treatment effect
(maintaining exclusion restriction)
Example: Job Training
• IV Results (from Stata):
  Note: Some software reports R² after IV regression. This object is NOT meaningful and should not be used.

ivreg earnings (train = offer) x1-x13 , robust

Instrumental variables (2SLS) regression Number of obs = 5102


F( 14, 5087) = 34.38
Prob > F = 0.0000
R-squared = 0.0879
Root MSE = 18689

------------------------------------------------------------------------------
| Robust
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
train | 1593.137 894.7528 1.78 0.075 -160.9632 3347.238
.
.
.
Moderate evidence of a positive treatment effect (maintaining
exclusion restriction). Substantially attenuated relative to OLS,
consistent with intuition.
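With one instrument and one endogenous regressor, the IV estimate equals the reduced-form coefficient divided by the first-stage coefficient (indirect least squares). The coefficients printed in the three Stata outputs above line up exactly this way:

```python
# Coefficients taken from the Stata output above
first_stage = 0.6088885    # offer coefficient in: regress train offer x1-x13
reduced_form = 970.043     # offer coefficient in: regress earnings offer x1-x13

iv_estimate = reduced_form / first_stage
print(round(iv_estimate, 1))  # matches the ivreg coefficient on train (1593.137)
```

This ratio form also explains why a weak first stage is dangerous: a small denominator blows up both the estimate and its sampling variability.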
Example: Returns to Schooling
• Structural Equation:

  lwage = β0 + β1·educ + year-of-birth and state-of-birth controls + u

• First-Stage Equation:

  educ = π0 + π1,1·qob2 + π1,2·qob3 + π1,3·qob4 + controls + v

  – Note: E[zixi] ≠ 0 => π1,1 ≠ 0 or π1,2 ≠ 0 or π1,3 ≠ 0

• Reduced-Form Equation:

  lwage = δ0 + δ1,1·qob2 + δ1,2·qob3 + δ1,3·qob4 + controls + e

Example: Returns to Schooling
• OLS Results (from Stata):

xi: reg lwage educ i.yob i.sob , robust


i.yob _Iyob_30-39 (naturally coded; _Iyob_30 omitted)
i.sob _Isob_1-56 (naturally coded; _Isob_1 omitted)

Linear regression Number of obs = 329509


F( 60,329448) = 649.29
Prob > F = 0.0000
R-squared = 0.1288
Root MSE = .63366

------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .067339 .0003883 173.40 0.000 .0665778 .0681001
.
.
.
If intuition about the source of endogeneity is correct, this should be an overestimate of the effect of schooling.
Example: Returns to Schooling
• First-Stage Results (from Stata):
xi: regress educ i.qob i.sob i.yob , robust
Linear regression Number of obs = 329509
F( 62,329446) = 292.87
Prob > F = 0.0000
R-squared = 0.0572
Root MSE = 3.1863

------------------------------------------------------------------------------
| Robust
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iqob_2 | .0455652 .015977 2.85 0.004 .0142508 .0768797
_Iqob_3 | .1060082 .0155308 6.83 0.000 .0755683 .136448
_Iqob_4 | .1525798 .0157993 9.66 0.000 .1216137 .1835459

.
.
.
testparm _Iqob*

( 1) _Iqob_2 = 0
( 2) _Iqob_3 = 0
( 3) _Iqob_4 = 0 First-stage F-statistic.
F( 3,329446) = 36.06
Prob > F = 0.0000
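The first-stage F-statistic reported by testparm is the standard joint test of the excluded instruments, computed from the restricted and unrestricted sums of squared residuals. A self-contained sketch on simulated data (the coefficient sizes and sample size are invented, not the actual Census extract):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
qob = rng.integers(0, 4, size=n)       # quarter of birth, coded 0..3
Q = np.eye(4)[qob][:, 1:]              # dummies for quarters 2-4
# Hypothetical first stage: later quarters get slightly more schooling
educ = 12 + Q @ np.array([0.2, 0.4, 0.6]) + rng.normal(scale=3.2, size=n)

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

X_u = np.column_stack([np.ones(n), Q])  # unrestricted: intercept + qob dummies
X_r = np.ones((n, 1))                   # restricted: intercept only
q = 3                                   # restrictions: the three qob coefficients

F = ((ssr(X_r, educ) - ssr(X_u, educ)) / q) / (ssr(X_u, educ) / (n - X_u.shape[1]))
print(round(F, 1))
```

A common rule of thumb treats a first-stage F below about 10 as a sign of weak instruments; the value of 36.06 above comfortably clears that bar.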
Example: Returns to Schooling
• Reduced-Form Results (from Stata):
xi: regress lwage i.qob i.sob i.yob , robust

Linear regression Number of obs = 329509


F( 62,329446) = 147.83
Prob > F = 0.0000
R-squared = 0.0290
Root MSE = .66899

------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iqob_2 | .0028362 .0033445 0.85 0.396 -.0037188 .0093912
_Iqob_3 | .0141472 .0032519 4.35 0.000 .0077736 .0205207
_Iqob_4 | .0144615 .0033236 4.35 0.000 .0079472 .0209757
.
.

testparm _Iqob*

( 1) _Iqob_2 = 0
( 2) _Iqob_3 = 0
( 3) _Iqob_4 = 0

F( 3,329446) = 10.43
Prob > F = 0.0000
Example: Returns to Schooling
• 2SLS Results (from Stata):
xi: ivregress 2sls lwage (educ = i.qob) i.yob i.sob , robust

Instrumental variables (2SLS) regression Number of obs = 329509


Wald chi2(60) = 9996.12
Prob > chi2 = 0.0000
R-squared = 0.0929
Root MSE = .64652

------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1076937 .0195571 5.51 0.000 .0693624 .146025
.
.
.

Bigger than OLS? Two common explanations: measurement error in reported schooling attenuates OLS toward zero, and with heterogeneous returns 2SLS estimates the return for those whose schooling responds to quarter of birth (a local average treatment effect).
