Class 7 After

The document discusses Instrumental Variables (IV) regression as a method for causal inference in observational data, emphasizing the importance of finding valid instruments that meet relevance and exogeneity conditions. It outlines the Two-Stage Least Squares (TSLS) estimator process and provides examples of instruments for demand estimation and their validity. Additionally, it covers testing for instrument validity, including relevance and exogeneity, and provides guidance on implementing IV regression in Python and R.

Uploaded by

angela8779a

PMDA

Lecture 7: Instrumental Variables (IV)

A few updates and reminders

Problem sets:
● Problem Set 3: Panel Data & Diff-in-diff due Feb 22nd
● Problem Set 4: Lasso Regression due March 15th
Agenda today

● Another method for causal inference in observational data:

○ Instrumental Variables (IV) regression
○ Workshop
Instrumental Variables (IV) regression

● IV is a general method to eliminate endogeneity, i.e. when you have a
regression model

Y = β_0 + β_1 X + e

with cov(X,e)≠0
● The key requirement is that you must find at least one other variable, an
instrument Z, that satisfies two conditions:

1. (Relevance) it is correlated with X: cov(Z,X)≠0

2. (Exogeneity) it is not correlated with e: cov(Z,e)=0
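
To make the two conditions concrete, here is a minimal simulation sketch using only the Python standard library. The data-generating process, the parameter values and the helper `cov` are assumptions chosen for illustration; they are not from the lecture. Because the data are simulated, the error e is observable here, which is never the case with real data.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

random.seed(0)
n = 20_000
beta_0, beta_1 = 1.0, 2.0                     # true parameters (illustrative)

z = [random.gauss(0, 1) for _ in range(n)]    # instrument
u = [random.gauss(0, 1) for _ in range(n)]    # unobserved confounder
e = [ui + random.gauss(0, 1) for ui in u]     # the error term contains u
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # X depends on Z and u
y = [beta_0 + beta_1 * xi + ei for xi, ei in zip(x, e)]

print("cov(X, e):", round(cov(x, e), 3))      # far from 0 -> X is endogenous
print("cov(Z, X):", round(cov(z, x), 3))      # far from 0 -> Z is relevant
print("cov(Z, e):", round(cov(z, e), 3))      # close to 0 -> Z is exogenous

ols_slope = cov(x, y) / cov(x, x)             # OLS estimate of beta_1
print("OLS slope:", round(ols_slope, 3), "vs true beta_1 =", beta_1)
```

In this setup the OLS slope overstates β_1 because X and e share the component u.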

IV in a causal graph

[Figure: causal graph linking the instrument Z to the regressor X]

● Visualization of the two conditions for instrument validity:

○ Relevance = there is a link between Z and X
○ Exogeneity = there is no direct link between Z and Y

● Q: How is this different from the causal graphs with omitted variables we
have considered before?
● A: An omitted variable by definition is part of the error e, so it must have a
direct link to Y
The two-stage least squares (TSLS) estimator

● Idea: use Z to isolate the variation in X that is not correlated with e (i.e., the
exogenous variation in X) and use it to estimate β_1

● TSLS procedure. Run two regressions:

1. First-stage regression of X on Z: X = a_0 + a_1 Z + v

Exogenous part is the predicted value, X̂ = â_0 + â_1 Z

2. Second-stage regression of Y on the predicted value X̂

Because X̂ is uncorrelated with e (why?), the estimator of β_1 is consistent
(it can still be biased in small samples)
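
The two stages can be sketched by hand as follows, again with the standard library only. The simulated data and the helper names `mean` and `ols` are assumptions made for this illustration; in practice a dedicated package should be used, not least because it reports correct standard errors.

```python
import random

def mean(a):
    return sum(a) / len(a)

def ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(1)
n = 20_000
beta_0, beta_1 = 1.0, 2.0                     # true parameters (illustrative)

z = [random.gauss(0, 1) for _ in range(n)]    # instrument
u = [random.gauss(0, 1) for _ in range(n)]    # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta_0 + beta_1 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress X on Z and keep the predicted values.
a_0, a_1 = ols(z, x)
x_hat = [a_0 + a_1 * zi for zi in z]

# Stage 2: regress Y on the predicted values.
_, beta_1_tsls = ols(x_hat, y)

# Plain OLS of Y on X, for comparison.
_, beta_1_ols = ols(x, y)

print("OLS :", round(beta_1_ols, 3))
print("TSLS:", round(beta_1_tsls, 3))
```

With a single instrument, the second-stage slope is algebraically the same as the ratio cov(Z,Y)/cov(Z,X).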

Not to be confused with the Frisch-Waugh theorem

The example we saw of the Frisch-Waugh theorem:

1. First regress X on Z

2. Then regress Y on the residual

Q: What are the differences?

A:
Frisch-Waugh:
● Z is an omitted variable that causes endogeneity if not included
● Z has a direct effect on Y
● It is the residual from the stage 1 regression that eliminates the endogeneity

TSLS:
● Z is exogenous
● Z has no direct effect on Y
● It is the predicted value from the stage 1 regression that eliminates the endogeneity
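
The contrast can be checked numerically. The sketch below simulates one scenario of each kind (both data-generating processes are assumptions for illustration) and runs the same first stage in both, then regresses Y once on the residual and once on the fitted value.

```python
import random

def mean(a):
    return sum(a) / len(a)

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def stage_one(z, x):
    """Regress x on z; return (fitted values, residuals)."""
    a_1 = slope(z, x)
    a_0 = mean(x) - a_1 * mean(z)
    fitted = [a_0 + a_1 * zi for zi in z]
    resid = [xi - fi for xi, fi in zip(x, fitted)]
    return fitted, resid

random.seed(3)
n = 20_000
beta_1 = 2.0   # true effect of X on Y in both scenarios

# Scenario A (Frisch-Waugh): Z is an omitted variable with a direct effect on Y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [beta_1 * xi + 1.5 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]
fitted, resid = stage_one(z, x)
fw_on_resid, fw_on_fitted = slope(resid, y), slope(fitted, y)

# Scenario B (TSLS): Z is exogenous; X is endogenous through an unobserved u.
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta_1 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
fitted, resid = stage_one(z, x)
iv_on_resid, iv_on_fitted = slope(resid, y), slope(fitted, y)

print("Scenario A: residual ->", round(fw_on_resid, 2), "| fitted ->", round(fw_on_fitted, 2))
print("Scenario B: residual ->", round(iv_on_resid, 2), "| fitted ->", round(iv_on_fitted, 2))
```

In scenario A only the residual regression lands near β_1; in scenario B only the fitted-value regression does.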
Example 1. Demand estimation

ln(Q) = β_0 + β_1 ln(P) + e

● β_1 is demand elasticity = % change in quantity for a 1% change in price
(this is the interpretation in a log-log model)
● Data: you observe prices and quantities in different regions or time periods
● We saw that this is an example of simultaneous causality bias: there is a
supply curve in the background so P and Q are determined in equilibrium
Example 1. Demand estimation

● Fitting a regression through the scatterplot gives a biased estimate of
elasticity
● But what if only the supply curve shifted and the demand stayed constant?

● TSLS estimates the demand curve by isolating shifts in price and quantity
that arise from shifts in supply
● Z is a variable that shifts supply but not demand
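
A small simulated market illustrates the point. The demand and supply equations, their coefficients and the supply shifter below are assumptions chosen for illustration; prices and quantities are in logs and are generated from the market-clearing condition.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

random.seed(2)
n = 20_000
elasticity = -1.0        # true demand elasticity beta_1 (illustrative)
supply_slope = 1.0       # illustrative slope of the supply curve

z, log_p, log_q = [], [], []
for _ in range(n):
    z_i = random.gauss(0, 1)    # supply shifter, e.g. rainfall
    u_i = random.gauss(0, 1)    # demand shock
    v_i = random.gauss(0, 1)    # supply shock
    # Demand: log_q = 5 + elasticity * log_p + u
    # Supply: log_q = 1 + supply_slope * log_p + z + v
    # Setting demand equal to supply gives the equilibrium price.
    p_i = (5 - 1 + u_i - z_i - v_i) / (supply_slope - elasticity)
    q_i = 5 + elasticity * p_i + u_i
    z.append(z_i)
    log_p.append(p_i)
    log_q.append(q_i)

# OLS through the price-quantity scatterplot mixes demand and supply.
ols_elasticity = cov(log_p, log_q) / cov(log_p, log_p)
# IV uses only the price variation induced by the supply shifter.
iv_elasticity = cov(z, log_q) / cov(z, log_p)

print("OLS estimate:", round(ols_elasticity, 3))
print("IV estimate :", round(iv_elasticity, 3), "(true value:", elasticity, ")")
```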
Instruments for demand estimation

Examples of instruments for the regression ln(Q) = β_0 + β_1 ln(P) + e

1. Suppose the demand is for milk and use Z = rain in dairy-producing regions
○ Q: Relevant?

A: Less rain → less milk produced → higher price

○ Q: Exogenous?
A: Rain doesn’t directly affect how much milk you buy
2. Suppose the demand is for cigarettes and use Z = sales tax in different US
states
○ Q: Relevant?

A: Sales tax affects all prices, so also the cigarette price

○ Q: Exogenous?
A: Sales tax not determined by cigarette market
Another Example

Does putting criminals in jail reduce crime? (Levitt 1996)

● You have data of crime rates and incarceration rates at the state level
● You want to measure how crime rates are affected by incarceration rates
● You also have data on economic conditions, age, and other demographics

What type of bias can we have?

A: Simultaneous causality bias:
more crime → more prisoners (if police do their job)
more prisoners → less crime

What instrument would you use?

A: “Lawsuits aimed at reducing prison overcrowding”
● Relevance: They slow the growth of the prison population
● Exogeneity: Only if overcrowding litigation is induced by prison
conditions but not by the crime rate or its determinants
Instruments can isolate “as if random” variation

● Say we want to understand the effect of class size on students’ performance

test_scores = β_0 + β_1 class_size + e
○ Q: Endogeneity problem in this regression?
○ A: OVB, for example due to parental income/involvement
● Suppose you have a natural experiment: an earthquake forces affected
districts to double up classrooms in some schools (others have to close)
● Let Z = distance to the epicenter
○ Q: Is this a valid instrument?
○ A: Relevant because class size is correlated with the distance to the
epicenter: districts close to the epicenter are most severely affected
○ Exogenous if distance to the epicenter is unrelated to any other factors
affecting student performance, such as being an English learner or other
disruptive effects of the earthquake on performance
○ Q: Can you explain in words how TSLS gives you a causal estimate?
○ A: The first stage regression isolates the variation in class size that is “as
if randomly assigned”. Using this part in the second stage regression
thus gives a causal estimate
Discussion Question

Suppose you wish to measure the impact of smoking on the weight of
newborns.
You are planning to use the following model,

bw_i = β_0 + β_1 male_i + β_2 order_i + β_3 y_i + β_4 cig_i + e_i

where
● bw is the birth weight,
● male is a dummy variable equal to 1 if the baby is a boy or 0 otherwise,
● order is the birth order of the child,
● y is the log income of the family,
● cig is the number of cigarettes per day smoked during pregnancy,
● i indexes the observation,
● β’s are the unknown parameters.

Q1: What could be the problem in using OLS to estimate the above model?
Q2: Suppose you have data on the average price of cigarettes in the state of
residence. Would this information help you to identify the true parameters of
the model?
Generalizations

● We can have more than one instrument: Z_1,…,Z_m

● We can also add control variables: W_1,…,W_r

1. First-stage regression: regress X on Z_1,…,Z_m and W_1,…,W_r and get the
predicted value X̂

2. Second-stage regression: regress Y on X̂ and W_1,…,W_r

○ Note: the standard errors from this second regression are not the correct
ones because they do not account for estimation uncertainty in X̂
○ Standard software packages account for this when computing
standard errors
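
The general procedure, and the standard-error issue in the note, can be sketched with NumPy. The simulated data and all variable names are assumptions for illustration. The corrected standard errors below are the homoskedasticity-only version, where the only change is that the residuals are computed with the actual X in place of its predicted value; packages may use a different degrees-of-freedom convention or robust formulas.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data: two instruments, one control, one endogenous regressor.
Z = rng.normal(size=(n, 2))
W = rng.normal(size=(n, 1))
u = rng.normal(size=n)                       # unobserved confounder
X = 0.6 * Z[:, 0] + 0.4 * Z[:, 1] + 0.5 * W[:, 0] + u + rng.normal(size=n)
Y = 1.0 + 2.0 * X - 1.0 * W[:, 0] + 2.0 * u + rng.normal(size=n)

const = np.ones((n, 1))

# Stage 1: regress X on the instruments and the controls.
first_stage_regressors = np.hstack([const, Z, W])
a_hat, *_ = np.linalg.lstsq(first_stage_regressors, X, rcond=None)
X_hat = first_stage_regressors @ a_hat

# Stage 2: regress Y on the predicted X and the controls.
second_stage_regressors = np.hstack([const, X_hat[:, None], W])
beta_hat, *_ = np.linalg.lstsq(second_stage_regressors, Y, rcond=None)

# Homoskedasticity-only standard errors: sqrt(sigma^2 * diag((R'R)^-1)).
dof = n - second_stage_regressors.shape[1]
RtR_inv = np.linalg.inv(second_stage_regressors.T @ second_stage_regressors)

# Naive: residuals computed with the predicted X (what plain OLS would report).
resid_naive = Y - second_stage_regressors @ beta_hat
se_naive = np.sqrt(resid_naive @ resid_naive / dof * np.diag(RtR_inv))

# Corrected: residuals computed with the actual X.
actual_regressors = np.hstack([const, X[:, None], W])
resid_tsls = Y - actual_regressors @ beta_hat
se_tsls = np.sqrt(resid_tsls @ resid_tsls / dof * np.diag(RtR_inv))

print("TSLS coefficient on X:", round(beta_hat[1], 3))
print("naive SE:", round(se_naive[1], 4), "| corrected SE:", round(se_tsls[1], 4))
```

In this particular simulation the naive standard error is too large; in general it can be too large or too small.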
Generalizations

● We can also have more than one endogenous regressor: X_1,…,X_k

● In this case, the model must be exactly identified or overidentified:
○ Exactly identified: when m=k, that is, same number of instruments and
endogenous regressors
○ Overidentified: when m>k
○ Underidentified: when m<k

● In the underidentified case, we cannot estimate all the model parameters
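
The counting rule can be written as a tiny helper. The function name `identification_status` is hypothetical. Note that counting instruments is only a necessary condition: the instruments must also be relevant for the endogenous regressors.

```python
def identification_status(m: int, k: int) -> str:
    """Classify an IV model with m instruments and k endogenous regressors."""
    if m < 0 or k < 1:
        raise ValueError("need m >= 0 instruments and k >= 1 endogenous regressors")
    if m < k:
        return "underidentified"
    if m == k:
        return "exactly identified"
    return "overidentified"

for m, k in [(1, 1), (3, 1), (1, 2)]:
    print(f"m={m}, k={k}: {identification_status(m, k)}")
```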

Can you test for instrument validity?

● Requirements for validity when you have multiple instruments Z_1,…,Z_m

1. Relevance: at least one instrument is correlated with X (a bit more
involved when we have multiple endogenous regressors, see Stock & Watson
Chapter 12)
2. Exogeneity: all the instruments are uncorrelated with e

● Can you test these conditions?

○ You can test for relevance by looking at the first-stage regression
○ You can test for exogeneity, but only when you have more instruments than
endogenous regressors
Testing relevance

● Relevance is akin to sample size: the more relevant the instruments, the more
variation in X is explained by the instruments, so that we have more info for the
IV regression => More relevant implies more accurate estimator

● First stage regression:

X = a_0 + a_1 Z_1 + … + a_m Z_m + a_{m+1} W_1 + … + a_{m+r} W_r + v

○ The instruments are said to be weak if all the a_1,…, a_m are zero or nearly
zero
○ Weak instruments explain little of the variation of X:
■ E.g., something that shifts the supply curve but only by little
○ Weak instruments imply that the TSLS estimator is biased towards the OLS
estimator, and usual statistical inference (standard errors, hypothesis tests)
is misleading
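
The pull of TSLS towards OLS under weak instruments can be seen in a small Monte Carlo sketch. Everything here is an assumption for illustration: the function name `simulate_bias`, the design with 10 instruments that all share the first-stage coefficient `pi`, and the sample size. All variables have mean zero, so the regressions are run without an intercept.

```python
import numpy as np

def simulate_bias(pi, n=200, m=10, reps=500, beta_1=1.0, seed=0):
    """Return the median (OLS bias, TSLS bias) over `reps` simulated samples.

    Each of the m instruments enters the first stage with coefficient `pi`.
    """
    rng = np.random.default_rng(seed)
    ols_est, tsls_est = [], []
    for _ in range(reps):
        Z = rng.normal(size=(n, m))
        v = rng.normal(size=n)
        e = 0.8 * v + 0.6 * rng.normal(size=n)   # e correlated with v -> X endogenous
        X = Z @ np.full(m, pi) + v
        Y = beta_1 * X + e
        # First stage: fitted values from regressing X on the instruments.
        a_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
        X_hat = Z @ a_hat
        ols_est.append((X @ Y) / (X @ X))
        tsls_est.append((X_hat @ Y) / (X_hat @ X_hat))
    return np.median(ols_est) - beta_1, np.median(tsls_est) - beta_1

weak_ols, weak_tsls = simulate_bias(pi=0.02)     # instruments barely move X
strong_ols, strong_tsls = simulate_bias(pi=0.5)  # instruments move X a lot

print("weak  : OLS bias", round(weak_ols, 3), "| TSLS bias", round(weak_tsls, 3))
print("strong: OLS bias", round(strong_ols, 3), "| TSLS bias", round(strong_tsls, 3))
```

With weak instruments the median TSLS estimate sits close to the OLS estimate; with strong instruments it sits close to the true value.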
Testing relevance

● You can test for weak instruments as follows:

○ Compute the F-statistic testing the hypothesis that a_1,…, a_m are all zero
○ Rule of thumb: if first-stage F-statistic < 10, you have weak instruments
○ Note that simply rejecting the null hypothesis that the coefficients are zero
isn’t enough – you need the F-statistic to be large (> 10)
○ There are more formal tests for weak instruments (e.g., built into Python
and R packages)
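
A first-stage F-statistic can be computed by comparing the first-stage regression with and without the instruments. The helper `first_stage_f` below is a hypothetical name, and it implements the homoskedasticity-only F-statistic; packages often report a heteroskedasticity-robust version, so the numbers need not match exactly.

```python
import numpy as np

def first_stage_f(X, Z, W=None):
    """Homoskedasticity-only F-statistic for H0: all instrument coefficients are zero."""
    n, m = Z.shape
    const = np.ones((n, 1))
    controls = const if W is None else np.hstack([const, W])
    unrestricted = np.hstack([controls, Z])

    def ssr(regressors):
        # Sum of squared residuals from regressing X on the given regressors.
        coef, *_ = np.linalg.lstsq(regressors, X, rcond=None)
        resid = X - regressors @ coef
        return resid @ resid

    ssr_r, ssr_u = ssr(controls), ssr(unrestricted)
    dof = n - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / m) / (ssr_u / dof)

rng = np.random.default_rng(0)
n = 1_000
Z = rng.normal(size=(n, 2))
noise = rng.normal(size=n)
X_strong = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + noise     # instruments explain a lot of X
X_weak = 0.02 * Z[:, 0] + 0.02 * Z[:, 1] + noise     # instruments explain almost nothing

f_strong = first_stage_f(X_strong, Z)
f_weak = first_stage_f(X_weak, Z)
print("F (strong instruments):", round(f_strong, 1))
print("F (weak instruments)  :", round(f_weak, 1))
```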
Testing relevance

What if you have weak instruments?

○ If you have many instruments, a few strong and many weak, drop some of
the weak ones until you get F-stat > 10

○ Your standard errors may increase, but keep in mind that the original
standard errors were not meaningful

○ If the coefficients are exactly identified and you don’t have strong enough
instruments:
■ Try to find additional stronger instruments
■ Proceed with weak instruments but use other methods (e.g., Limited
information maximum likelihood estimator, LIML)
Testing exogeneity

● You can test for exogeneity only if you have more instruments than
endogenous regressors (m>k)
● The test is called the “J-test of overidentifying restrictions”: it tests that all
instruments are exogenous (H_0)
● Intuition:
○ Suppose there are two instruments → could compute two separate
TSLS estimates → if estimates are very different something must be
wrong: one of the two instruments (or both) must be invalid
○ The J-test makes this comparison in a statistically precise way
Testing exogeneity
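More formally:
1. Build the TSLS residuals, using the actual X rather than the predicted value:

ê = Y − β̂_0 − β̂_1 X − β̂_2 W_1 − … − β̂_{r+1} W_r

2. Estimate:

ê = delta_0 + delta_1 Z_1 + … + delta_m Z_m + delta_{m+1} W_1 + … + delta_{m+r} W_r + u

3. Test whether the delta’s are zero by computing the F-statistic for:
delta_1=...=delta_m=0. The J-statistic is J = mF; under H_0 (and with
homoskedastic errors) it follows, in large samples, a chi-squared distribution
with m−k degrees of freedom

Why can’t we use the same approach when m=k (as many instruments as
endogenous regressors)?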

Note: If the J-test rejects, you don’t know which instruments are endogenous.
In this case you need to use your knowledge of the problem to decide what is
the most appropriate next step (see workshop)
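
The three steps can be sketched with NumPy for the simplest overidentified case: one endogenous regressor, two instruments, no controls. The helper names and the simulated data are assumptions for illustration. With m−k = 1 degree of freedom, the 5% critical value of the chi-squared distribution is 3.84.

```python
import numpy as np

def ols_fit(regressors, target):
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return coef

def j_statistic(Y, X, Z):
    """J = m * F for one endogenous regressor X and instruments Z (n x m), no controls."""
    n, m = Z.shape
    const = np.ones((n, 1))
    exog = np.hstack([const, Z])
    # TSLS: first stage, then second stage on the fitted values.
    X_hat = exog @ ols_fit(exog, X)
    beta = ols_fit(np.column_stack([const, X_hat]), Y)
    # Step 1: TSLS residuals use the actual X, not the fitted values.
    resid = Y - np.column_stack([const, X]) @ beta
    # Step 2: regress the residuals on a constant and the instruments.
    u = resid - exog @ ols_fit(exog, resid)
    # Step 3: F-statistic for delta_1 = ... = delta_m = 0.
    ssr_u = u @ u
    centered = resid - resid.mean()          # restricted model: constant only
    ssr_r = centered @ centered
    f_stat = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * f_stat

rng = np.random.default_rng(0)
n = 5_000
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)                        # unobserved confounder
X = 0.7 * Z[:, 0] + 0.7 * Z[:, 1] + u + rng.normal(size=n)
e = u + rng.normal(size=n)

Y_valid = 1.0 + 2.0 * X + e                   # both instruments exogenous
Y_invalid = Y_valid + 1.0 * Z[:, 1]           # second instrument affects Y directly

j_valid = j_statistic(Y_valid, X, Z)
j_invalid = j_statistic(Y_invalid, X, Z)
critical_value = 3.84                         # chi-squared, 1 degree of freedom, 5% level

print("J with valid instruments   :", round(j_valid, 2))
print("J with an invalid instrument:", round(j_invalid, 2), "(reject if >", critical_value, ")")
```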
IV regression in Python

● The TSLS estimator is computed in Python using the command IV2SLS
from linearmodels.iv: “from linearmodels.iv import IV2SLS”
● Syntax is iv_reg = IV2SLS(Y, [const, W], X, Z).fit() where X is the endogenous
variable, Z the instrument(s) and W the control(s)
● Diagnostic tests can be run on the fitted results iv_reg by
○ iv_reg.first_stage: Reports statistics from the first stage, including the partial
F-statistic which tests for relevance (weakness of instruments)

○ iv_reg.sargan: This is the J-test of instrument exogeneity. It can only be
used if you have more instruments than endogenous regressors. The null
hypothesis is of exogeneity
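
A minimal end-to-end sketch of these commands on simulated data follows. The data, the column names and the choice `cov_type="unadjusted"` (homoskedastic standard errors) are assumptions for illustration. The `try`/`except` guard is only there so the script still runs where pandas or linearmodels is not installed.

```python
import numpy as np

try:
    import pandas as pd
    from linearmodels.iv import IV2SLS
except ImportError:          # third-party packages not installed
    IV2SLS = None

rng = np.random.default_rng(0)
n = 2_000
z1, z2, w, u = (rng.normal(size=n) for _ in range(4))
x = 0.7 * z1 + 0.7 * z2 + 0.5 * w + u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x - 1.0 * w + u + rng.normal(size=n)         # true coefficient on x is 2

if IV2SLS is not None:
    df = pd.DataFrame({"y": y, "x": x, "w": w, "z1": z1, "z2": z2, "const": 1.0})
    # Argument order: dependent, exogenous (constant and controls), endogenous, instruments.
    iv_reg = IV2SLS(df["y"], df[["const", "w"]], df[["x"]], df[["z1", "z2"]]).fit(
        cov_type="unadjusted"
    )
    print(iv_reg.params)                     # TSLS coefficient estimates
    print(iv_reg.first_stage.diagnostics)    # includes the partial F-statistic
    print(iv_reg.sargan)                     # J-test of overidentifying restrictions
```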
IV regression in R

● The TSLS estimator is computed in R using the command ivreg

● Syntax is ivreg (Y ~ X + W | W + Z ) where X is the endogenous variable, Z
the instrument(s) and W the control(s)
● Diagnostic tests can be run by adding ,diagnostics = TRUE inside the
summary command. This reports:
○ Weak instruments: This tests the null hypothesis that we have weak
instruments
○ Wu-Hausman (we didn’t discuss this)
○ Sargan: This is the J-test of instrument exogeneity. It can only be used if
you have more instruments than endogenous regressors. The null
hypothesis is of exogeneity
