
ECONS303: Applied Quantitative Research Methods

Lecture set 7: Instrumental Variables (IV) Regression

Outline
1. IV Regression: Why and What; Two Stage Least Squares
2. The General IV Regression Model
3. Checking Instrument Validity
a) Weak and strong instruments
b) Instrument exogeneity

4. Application: Demand for cigarettes


5. Examples: Where Do Instruments Come From?

IV Regression: Why?
Three important threats to internal validity are:
• Omitted variable bias from a variable that is correlated with X
but is unobserved (so cannot be included in the regression) and
for which there are inadequate control variables;
• Simultaneous causality bias (X causes Y, Y causes X );
• Errors-in-variables bias (X is measured with error)
All three problems result in E(u|X) ≠ 0.
• Instrumental variables regression can eliminate bias when E(u|X )
≠ 0 – using an instrumental variable (IV), Z.

The IV Estimator with a Single Regressor and a Single Instrument (SW Section 12.1)
Yi = β0 + β1Xi + ui

• The goal is to obtain an estimate of the causal effect β1.


However, X is correlated with the error term, and we cannot
solve the problem simply by including control variables.
• Instrumental variables (IV) regression breaks X into two parts: a
part that might be correlated with u, and a part that is not. By
isolating the part that is not correlated with u, it is possible to
estimate β1.
• This is done using an instrumental variable, Zi, which is
correlated with Xi but uncorrelated with ui.

Terminology: Endogeneity and Exogeneity


An endogenous variable is one that is correlated with u
An exogenous variable is one that is uncorrelated with u
In IV regression, we focus on the case that X is endogenous and
there is an instrument, Z, which is exogenous.
Digression on terminology: “Endogenous” literally means
“determined within the system.” If X is jointly determined with
Y, then a regression of Y on X is subject to simultaneous causality
bias. But this definition of endogeneity is too narrow because IV
regression can be used to address OV bias and errors-in-variable
bias. Thus we use the broader definition of endogeneity above.

Two Conditions for a Valid Instrument


Yi = β0 + β1Xi + ui
For an instrumental variable (an “instrument”) Z to be valid, it
must satisfy two conditions:
1. Instrument relevance: corr(Zi, Xi) ≠ 0
2. Instrument exogeneity: corr(Zi, ui) = 0
Suppose for now that you have such a Zi (we’ll discuss how to find
instrumental variables later). How can you use Zi to estimate β1?

The IV estimator with one X and one Z (1 of 7)


Explanation #1: Two Stage Least Squares (TSLS)
As it sounds, TSLS has two stages – two regressions:
(1) Isolate the part of X that is uncorrelated with u by regressing X
on Z using OLS:
Xi = π0 + π1Zi + vi (1)
• Because Zi is uncorrelated with ui, π0 + π1Zi is uncorrelated with
ui. We don’t know π0 or π1 but we have estimated them, so…

• Compute the predicted values of Xi: X̂i = π̂0 + π̂1Zi, i = 1,…, n.

The IV estimator with one X and one Z (2 of 7)


(2) Replace Xi by X̂i in the regression of interest: regress Y
on X̂i using OLS:
Yi = β0 + β1X̂i + ui (2)
• Because X̂i is uncorrelated with ui, the first least squares
assumption holds for regression (2). (This requires n to be
large so that π0 and π1 are precisely estimated.)
• Thus, in large samples, β1 can be estimated by OLS using
regression (2).
• The resulting estimator is called the Two Stage Least Squares
(TSLS) estimator, β̂1TSLS.

Two Stage Least Squares: Summary


Suppose Zi satisfies the two conditions for a valid instrument:
1. Instrument relevance: corr(Zi, Xi) ≠ 0
2. Instrument exogeneity: corr(Zi, ui) = 0
Two-stage least squares:
Stage 1: Regress Xi on Zi (including an intercept), obtain the
predicted values X̂i.
Stage 2: Regress Yi on X̂i (including an intercept); the coefficient
on X̂i is the TSLS estimator, β̂1TSLS.
β̂1TSLS is a consistent estimator of β1.


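The two stages can be sketched with a small simulation (illustrative only: the data-generating process, coefficient values, and variable names below are invented for this example, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
beta0, beta1 = 1.0, 2.0          # true coefficients (chosen for this illustration)

# Simulate an endogenous regressor: u enters both X and Y, so corr(X, u) != 0,
# while the instrument Z is independent of u (exogeneity) and drives X (relevance).
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # X depends on Z and on u -> endogenous
y = beta0 + beta1 * x + u

# Stage 1: regress X on Z (with intercept), form the predicted values X-hat
Z1 = np.column_stack([np.ones(n), z])
pi_hat = np.linalg.lstsq(Z1, x, rcond=None)[0]
x_hat = Z1 @ pi_hat

# Stage 2: regress Y on X-hat (with intercept); the slope is the TSLS estimator
X2 = np.column_stack([np.ones(n), x_hat])
b_tsls = np.linalg.lstsq(X2, y, rcond=None)[0][1]

# Plain OLS of Y on X for comparison -- biased upward here because corr(X, u) > 0
b_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]

print(f"OLS: {b_ols:.3f}   TSLS: {b_tsls:.3f}   (true beta1 = {beta1})")
```

In this simulation OLS is pushed above the true β1 = 2 by the built-in correlation between X and u, while the TSLS slope recovers β1.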

The IV estimator with one X and one Z (3 of 7)


Explanation #2: A direct algebraic derivation
Yi = β0 + β1Xi + ui
Thus,
cov(Yi, Zi) = cov(β0 + β1Xi + ui, Zi)
= cov(β0, Zi) + cov(β1Xi, Zi) + cov(ui, Zi)
= 0 + cov(β1Xi, Zi) + 0
= β1cov(Xi, Zi)
where cov(ui, Zi) = 0 by instrument exogeneity; thus
β1 = cov(Yi, Zi) / cov(Xi, Zi)

The IV estimator with one X and one Z (4 of 7)


β1 = cov(Yi, Zi) / cov(Xi, Zi)

The IV estimator replaces these population covariances with
sample covariances:

β̂1TSLS = sYZ / sXZ,

where sYZ and sXZ are the sample covariances. This is the TSLS
estimator – just a different derivation!
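The "same estimator, different derivation" claim can be checked numerically: in any sample, the ratio sYZ/sXZ and the stage-2 slope coincide exactly (simulated data with invented coefficients, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.7 * z + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

# IV estimator as a ratio of sample covariances
s_yz = np.cov(y, z)[0, 1]
s_xz = np.cov(x, z)[0, 1]
b_iv = s_yz / s_xz

# Same number via the two-stage route: slope of Y on the fitted X-hat
pi = np.polyfit(z, x, 1)           # stage 1: X on Z
x_hat = np.polyval(pi, z)
b_2s = np.polyfit(x_hat, y, 1)[0]  # stage 2: Y on X-hat

print(b_iv, b_2s)  # identical up to floating-point error
```

The equality is algebraic, not approximate: the stage-2 slope is cov(y, x̂)/var(x̂) = π̂1 sYZ / (π̂1² sZZ), and substituting π̂1 = sXZ/sZZ gives sYZ/sXZ.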

The IV estimator with one X and one Z (5 of 7)


Explanation #3: Derivation from the “reduced form”
The “reduced form” relates Y to Z and X to Z:
Xi = π0 + π1Zi + vi
Yi = γ0 + γ1Zi + wi
where wi is an error term. Because Z is exogenous, Z is uncorrelated with
both vi and wi.

The idea: A unit change in Zi results in a change in Xi of π1 and a change


in Yi of γ1. Because that change in Xi arises from the exogenous change in
Zi, that change in Xi is exogenous.

The IV estimator with one X and one Z (6 of 7)


The math:
Xi = π0 + π1Zi + vi
Yi = γ0 + γ1Zi + wi
Solve the X equation for Z:
Zi = –π0/π1 + (1/π1)Xi – (1/π1)vi
Substitute this into the Y equation and collect terms:
Yi = γ0 + γ1Zi + wi
= γ0 + γ1[–π0/π1 + (1/π1)Xi – (1/π1)vi] + wi
= [γ0 – π0γ1/π1] + (γ1/π1)Xi + [wi – (γ1/π1)vi]
= β0 + β1Xi + ui,
where β0 = γ0 – π0γ1 /π1, β1 = γ1/π1, and ui = wi – (γ1/π1)vi.

The IV estimator with one X and one Z (7 of 7)


Xi = π0 + π1Zi + vi
Yi = γ0 + γ1Zi + wi
yields
Yi = β0 + β1Xi + ui,
where
β1 = γ1/π1
Interpretation: An exogenous change in Xi of π1 units is associated
with a change in Yi of γ1 units – so the effect on Y of an exogenous
unit change in X is β1 = γ1/π1.

Consistency of the TSLS estimator

β̂1TSLS = sYZ / sXZ

The sample covariances are consistent: sYZ →p cov(Y, Z)
and sXZ →p cov(X, Z). Thus,

β̂1TSLS = sYZ / sXZ →p cov(Y, Z) / cov(X, Z) = β1

• The instrument relevance condition, cov(X, Z) ≠ 0, ensures that
you don’t divide by zero.

Example: Supply and demand for butter (1 of 2)

IV regression was first developed to estimate demand elasticities
for agricultural goods, for example, butter:

ln(Qibutter) = β0 + β1 ln(Pibutter) + ui

• β1 = price elasticity of demand for butter = percent change in
quantity for a 1% change in price (recall the log-log specification
discussion)
• Data: observations on price and quantity of butter for different
years
• The OLS regression of ln(Qibutter) on ln(Pibutter) suffers from
simultaneous causality bias (why?)

Example: Supply and demand for butter (2 of 2)

Simultaneous causality bias in the OLS regression of ln(Qibutter)
on ln(Pibutter) arises because price and quantity are determined by
the interaction of demand and supply:

This interaction of demand and supply produces data like…

Would a regression using these data produce the demand curve?

But… what would you get if only supply shifted?

• TSLS estimates the demand curve by isolating shifts in price and
quantity that arise from shifts in supply.
• Z is a variable that shifts supply but not demand.

TSLS in the supply-demand example (1 of 2)

ln(Qibutter) = β0 + β1 ln(Pibutter) + ui

Let Z = rainfall in dairy-producing regions.
Is Z a valid instrument?
(1) Relevant? corr(raini, ln(Pibutter)) ≠ 0?
Plausibly: insufficient rainfall means less grazing means
less butter means higher prices
(2) Exogenous? corr(raini, ui) = 0?
Plausibly: whether it rains in dairy-producing regions
shouldn’t affect demand for butter

TSLS in the supply-demand example (2 of 2)

ln(Qibutter) = β0 + β1 ln(Pibutter) + ui
Zi = raini = rainfall in dairy-producing regions.

Stage 1: regress ln(Pibutter) on raini, get the predicted values ln(P̂ibutter)
ln(P̂ibutter) isolates changes in log price that arise from supply
(part of supply, at least)

Stage 2: regress ln(Qibutter) on ln(P̂ibutter)
The regression counterpart of using shifts in the supply curve to
trace out the demand curve.

Inference using TSLS (1 of 5)


• In large samples, the sampling distribution of the TSLS estimator
is normal
• Inference (hypothesis tests, confidence intervals) proceeds in the
usual way, e.g. ± 1.96SE
• The idea behind the large-sample normal distribution of the
TSLS estimator is that – like all the other estimators we have
considered – it involves an average of mean zero i.i.d. random
variables, to which we can apply the CLT.
• Here is the math (SW App. 12.3)…

Inference using TSLS (5 of 5)


β̂1TSLS is approximately distributed N(β1, σ²β̂1TSLS),

where σ²β̂1TSLS = (1/n) · var[(Zi − μZ)ui] / [cov(Zi, Xi)]²
• Statistical inference proceeds in the usual way.
• The justification is (as usual) based on large samples
• This all assumes that the instruments are valid – we’ll discuss
what happens if they aren’t valid shortly.
• Important note on standard errors:
– The OLS standard errors from the second stage regression aren’t right –
they don’t take into account the estimation in the first stage (X̂i is estimated).
– Instead, use a single specialized command that computes the TSLS
estimator and the correct SEs.
– As usual, use heteroskedasticity-robust SEs
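The variance formula above can be sketched by hand on simulated data (all names and coefficient values invented for this illustration). Note that the residuals use the actual X, not X̂; this hand computation is heteroskedasticity-robust by construction, but it is a sketch, not a substitute for a packaged TSLS command:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

# TSLS point estimates (covariance-ratio form, plus intercept)
b1 = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]
b0 = y.mean() - b1 * x.mean()

# Residuals use the ACTUAL X, not X-hat
u_hat = y - b0 - b1 * x

# Variance formula from the slide: (1/n) * var[(Z - mean Z) * u] / cov(Z, X)^2
num = np.var((z - z.mean()) * u_hat)
den = np.cov(z, x)[0, 1] ** 2
se = np.sqrt(num / den / n)

print(f"beta1-hat = {b1:.3f}, SE = {se:.4f}")
```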

Example: Demand for Cigarettes (1 of 3)


ln(Qicigarettes) = β0 + β1 ln(Picigarettes) + ui

Why is the OLS estimator of β1 likely to be biased?


• Data set: Panel data on annual cigarette consumption and
average prices paid (including tax), by state, for the 48
continental US states, 1985–1995.
• Proposed instrumental variable:
• Zi = general sales tax per pack in the state = SalesTaxi
• Do you think this instrument is plausibly valid?
1. Relevant? corr(SalesTaxi, ln(Picigarettes)) ≠ 0?
2. Exogenous? corr(SalesTaxi, ui) = 0?

Example: Demand for Cigarettes (2 of 3)


For now, use data from 1995 only.
First stage OLS regression:
ln(Picigarettes) = 4.63 + .031 SalesTaxi, n = 48

Second stage OLS regression:
ln(Qicigarettes) = 9.72 – 1.08 ln(P̂icigarettes), n = 48

Combined TSLS regression with correct, heteroskedasticity-robust
standard errors:
ln(Qicigarettes) = 9.72 – 1.08 ln(Picigarettes), n = 48
                  (1.53) (0.32)

STATA Example: Cigarette demand, First stage


Instrument = Z = rtaxso = general sales tax (real $/pack)
X Z
. reg lravgprs rtaxso if year==1995, r

Regression with robust standard errors Number of obs = 48


F( 1, 46) = 40.39
Prob > F = 0.0000
R-squared = 0.4710
Root MSE = .09394

------------------------------------------------------------------------------
| Robust
lravgprs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rtaxso | .0307289 .0048354 6.35 0.000 .0209956 .0404621
_cons | 4.616546 .0289177 159.64 0.000 4.558338 4.674755
------------------------------------------------------------------------------

X-hat
. predict lravphat
Now we have the predicted values from the 1st stage

Second stage
Y X-hat
. reg lpackpc lravphat if year==1995, r

Regression with robust standard errors Number of obs = 48


F( 1, 46) = 10.54
Prob > F = 0.0022
R-squared = 0.1525
Root MSE = .22645

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravphat | -1.083586 .3336949 -3.25 0.002 -1.755279 -.4118932
_cons | 9.719875 1.597119 6.09 0.000 6.505042 12.93471
------------------------------------------------------------------------------

• These coefficients are the TSLS estimates


• The standard errors are wrong because they ignore the fact that
the first stage was estimated

Combined into a single command:


Y X Z
. ivregress 2sls lpackpc (lravgprs = rtaxso) if year==1995, vce(robust);

Instrumental variables (2SLS) regression Number of obs = 48


Wald chi2(1) = 12.05
Prob > chi2 = 0.0005
R-squared = 0.4011
Root MSE = .18635

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.083587 .3122035 -3.47 0.001 -1.695494 -.471679
_cons | 9.719876 1.496143 6.50 0.000 6.78749 12.65226
------------------------------------------------------------------------------
Instrumented: lravgprs This is the endogenous regressor
Instruments: rtaxso This is the instrumental variable
------------------------------------------------------------------------------

Estimated cigarette demand equation:


ln(Qicigarettes) = 9.72 – 1.08 ln(Picigarettes), n = 48
                  (1.53) (0.31)

Summary of IV Regression with a Single X and Z

• A valid instrument Z must satisfy two conditions:
1. relevance: corr(Zi, Xi) ≠ 0
2. exogeneity: corr(Zi, ui) = 0
• TSLS proceeds by first regressing X on Z to get X̂, then
regressing Y on X̂
• The key idea is that the first stage isolates part of the variation in
X that is uncorrelated with u
• If the instrument is valid, then the large-sample sampling
distribution of the TSLS estimator is normal, so inference
proceeds as usual

The General IV Regression Model (SW Section 12.2)

• So far we have considered IV regression with a single
endogenous regressor (X) and a single instrument (Z).
• We need to extend this to:
– multiple endogenous regressors (X1,…, Xk)
– multiple included exogenous variables (W1,…, Wr) or control variables
– multiple instrumental variables (Z1,…, Zm). Having more (relevant)
instruments can produce a smaller variance of TSLS: the R² of the
first stage increases, so you have more variation in X̂.
• New terminology: identification & overidentification

Identification (1 of 2)
• In general, a parameter is said to be identified if different values
of the parameter produce different distributions of the data.
• In IV regression, whether the coefficients are identified depends
on the relation between the number of instruments (m) and the
number of endogenous regressors (k)
• Intuitively, if there are fewer instruments than endogenous
regressors, we can’t estimate β1,…,βk
– For example, suppose k = 1 but m = 0 (no instruments)!

Identification (2 of 2)
The coefficients β1,…, βk are said to be:
• exactly identified if m = k.
There are just enough instruments to estimate β1,…,βk.
• overidentified if m > k.
There are more than enough instruments to estimate β1,…, βk. If
so, you can test whether the instruments are valid (a test of the
“overidentifying restrictions”) – we’ll return to this later
• underidentified if m < k.
There are too few instruments to estimate β1,…, βk. If so, you
need to get more instruments!
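The m-versus-k taxonomy can be written as a one-line rule (the helper name is ours, for illustration):

```python
def identification(m: int, k: int) -> str:
    """Classify an IV model with m instruments and k endogenous regressors."""
    if m < k:
        return "underidentified"   # too few instruments: the betas cannot be estimated
    if m == k:
        return "exactly identified"
    return "overidentified"        # extra instruments permit a test of overidentifying restrictions

print(identification(1, 1))  # exactly identified
print(identification(2, 1))  # overidentified
print(identification(1, 2))  # underidentified
```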

The General IV Regression Model: Summary of Jargon
Yi = β0 + β1X1i + … + βk Xki + βk+1W1i + … + βk+rWri + ui
• Yi is the dependent variable
• X1i,…, Xki are the endogenous regressors (potentially correlated with ui)
• W1i,…,Wri are the included exogenous regressors (uncorrelated with ui)
or control variables (included so that Zi is uncorrelated with ui, once the
W’s are included)
• β0, β1,…, βk+r are the unknown regression coefficients
• Z1i,…,Zmi are the m instrumental variables (the excluded exogenous
variables)
• The coefficients are overidentified if m > k; exactly identified if m = k;
and underidentified if m < k.

TSLS with a Single Endogenous Regressor


Yi = β0 + β1X1i + β2W1i + … + β1+rWri + ui

• m instruments: Z1i,…, Zmi
• First stage
– Regress X1 on all the exogenous regressors: regress X1 on W1,…, Wr,
Z1,…, Zm, and an intercept, by OLS
– Compute predicted values X̂1i, i = 1,…, n
• Second stage
– Regress Y on X̂1i, W1,…, Wr, and an intercept, by OLS
– The coefficients from this second stage regression are the TSLS estimators, but SEs
are wrong
• To get correct SEs, do this in a single step in your regression software


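A minimal sketch of the two stages with one X, one W, and two Z's on simulated data (the data-generating process and coefficients are invented for this example). As the slide notes, the SEs from this manual second stage would be wrong; only the point estimates are illustrated:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
w = rng.normal(size=n)                    # included exogenous regressor
z1 = rng.normal(size=n)                   # two instruments
z2 = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + 0.3 * w + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.7 * w + u           # true beta1 = 2, coefficient on W = 0.7

ones = np.ones(n)

# First stage: regress X on ALL the exogenous variables (intercept, W, Z1, Z2)
A = np.column_stack([ones, w, z1, z2])
x_hat = A @ np.linalg.lstsq(A, x, rcond=None)[0]

# Second stage: regress Y on an intercept, X-hat, and W
B = np.column_stack([ones, x_hat, w])
coef = np.linalg.lstsq(B, y, rcond=None)[0]
print(f"beta1-hat = {coef[1]:.3f}, W coefficient = {coef[2]:.3f}")
```

Note that W appears in both stages; only the instruments are excluded from the second stage.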

Example: Demand for cigarettes (3 of 3)


Suppose income is exogenous (this is plausible – why?), and we
also want to estimate the income elasticity:

ln(Qicigarettes) = β0 + β1 ln(Picigarettes) + β2 ln(Incomei) + ui

We actually have two instruments:


Z1i = general sales taxi
Z2i = cigarette-specific taxi

• Endogenous variable: ln(Picigarettes) (“one X”)
• Included exogenous variable: ln(Incomei) (“one W”)
• Instruments (excluded exogenous variables): general sales tax,
cigarette-specific tax (“two Zs”)
• Is β1 over-, under-, or exactly identified?

Example: Cigarette demand, one instrument


IV: rtaxso = real overall sales tax in state
Y W X Z
. ivreg lpackpc lperinc (lravgprs = rtaxso) if year==1995, r

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 8.19
Prob > F = 0.0009
R-squared = 0.4189
Root MSE = .18957

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.143375 .3723025 -3.07 0.004 -1.893231 -.3935191
lperinc | .214515 .3117467 0.69 0.495 -.413375 .842405
_cons | 9.430658 1.259392 7.49 0.000 6.894112 11.9672
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso STATA lists ALL the exogenous regressors
as instruments – slightly different
terminology than we have been using
------------------------------------------------------------------------------
• Running IV as a single command yields the correct SEs
• Use , r for heteroskedasticity-robust SEs

Example: Cigarette demand, two instruments (1 of 2)

Y W X Z1 Z2
. ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r;

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 16.17
Prob > F = 0.0000
R-squared = 0.4294
Root MSE = .18786

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.277424 .2496099 -5.12 0.000 -1.780164 -.7746837
lperinc | .2804045 .2538894 1.10 0.275 -.230955 .7917641
_cons | 9.894955 .9592169 10.32 0.000 7.962993 11.82692
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso rtax STATA lists ALL the exogenous regressors
as “instruments” – slightly different
terminology than we have been using
------------------------------------------------------------------------------

Example: Cigarette demand, two instruments (2 of 2)

TSLS estimates, Z = sales tax (m = 1)
ln(Qicigarettes) = 9.43 – 1.14 ln(Picigarettes) + 0.21 ln(Incomei)
                  (1.26)  (0.37)                 (0.31)

TSLS estimates, Z = sales tax & cig-only tax (m = 2)
ln(Qicigarettes) = 9.89 – 1.28 ln(Picigarettes) + 0.28 ln(Incomei)
                  (0.96)  (0.25)                 (0.25)

• Smaller SEs for m = 2. Using 2 instruments gives more information –
more “as-if random variation.”
• Low income elasticity (not a luxury good); income elasticity not
statistically significantly different from 0
• Surprisingly high price elasticity

The General Instrument Validity Assumptions
Yi = β0 + β1X1i + … + βk Xki + βk+1W1i + … + βk+rWri + ui
(1) Instrument exogeneity: corr(Z1i, ui) = 0,…, corr(Zmi, ui) = 0
(2) Instrument relevance: General case, multiple X’s
Suppose the second stage regression could be run using the
predicted values from the population first stage regression. Then:
there is no perfect multicollinearity in this (infeasible) second
stage regression.
• Special case of one X: the general assumption is equivalent to
(a) at least one instrument must enter the population
counterpart of the first stage regression, and (b) the W ’s are
not perfectly multicollinear.

The IV Regression Assumptions


Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui
1. E(ui|W1i,…,Wri) = 0
• the additional regressors are exogenous.

2. (Yi, X1i,…,Xki,W1i,…,Wri,Z1i,…,Zmi) are i.i.d.


3. The X’s, W’s, Z’s, and Y have nonzero, finite 4th moments
4. The instruments (Z1i,…,Zmi) are valid.
• We have discussed this

• Under 1–4, TSLS and its t-statistic are normally distributed


• The critical requirement is that the instruments be valid

W ’s as control variables (1 of 2)
• In many cases, the purpose of including the W’s is to control for
omitted factors, so that once the W’s are included, Z is
uncorrelated with u.
• If so, W’s don’t need to be exogenous; instead, the W’s need to
be effective control variables in the sense discussed in Chapter 7.
• Technically, the condition for W’s being effective control
variables is that the conditional mean of ui does not depend on
Zi, given Wi:

E(ui|Wi, Zi) = E(ui|Wi)



W ’s as control variables (2 of 2)
• Thus an alternative to IV regression assumption #1 is that
conditional mean independence holds:
E(ui|Wi, Zi) = E(ui|Wi)
This is the IV version of the conditional mean independence
(CMI) assumption in Chapter 7.

• Here is the key idea: in many applications you need to include


control variables (W’s) so that Z is plausibly exogenous
(uncorrelated with u).

Checking Instrument Validity (SW Section 12.3)


Recall the two requirements for valid instruments:

1. Relevance (special case of one X)


At least one instrument must enter the population counterpart
of the first stage regression.
2. Exogeneity
All the instruments must be uncorrelated with the error term:
corr(Z1i, ui) = 0,…, corr(Zmi, ui) = 0

What happens if one of these requirements isn’t satisfied? How


can you check? What do you do?
If you have multiple instruments, which should you use?

Checking Assumption #1: Instrument Relevance
We will focus on a single included endogenous regressor:
Yi = β0 + β1Xi + β2W1i + … + β1+rWri + ui
First stage regression:
Xi = π0 + π1Z1i +…+ πmZmi + πm+1W1i +…+ πm+rWri + vi
• The instruments are relevant if at least one of π1,…, πm are
nonzero.
• The instruments are said to be weak if all the π1,…, πm are either
zero or nearly zero.
• Weak instruments explain very little of the variation in X.

What are the consequences of weak instruments?


If instruments are weak, the sampling distribution of TSLS and its
t-statistic are not (at all) normal, even with n large.
Consider the simplest case of 1 X, 1 Z, no control variables:

The IV estimator is β̂1TSLS = sYZ / sXZ

• If cov(X, Z) is zero or small, then sXZ will be small: With weak
instruments, the denominator is nearly zero.
• If so, the sampling distribution of β̂1TSLS (and its t-statistic) is not well
approximated by its large-n normal approximation…

An example: The sampling distribution of the TSLS t-statistic with weak instruments

[Figure: dark line = irrelevant instruments; dashed light line = strong instruments]

Why does our trusty normal approximation fail us?

• If cov(X, Z) is small, small changes in sXZ (from one sample to the
next) can induce big changes in β̂1TSLS
• Thus the large-n normal approximation is a poor approximation
to the sampling distribution of β̂1TSLS
• A better approximation is that β̂1TSLS is distributed as the ratio of two
correlated normal random variables (see SW App. 12.4)
• If instruments are weak, the usual methods of inference are
unreliable – potentially very unreliable.

Measuring the Strength of Instruments in Practice: The First-Stage F-statistic
• The first stage regression (one X ):
• Regress X on Z1,..,Zm,W1,…,Wk.
• Totally irrelevant instruments ↔ all the coefficients on Z1,…,Zm
are zero.
• The first-stage F-statistic tests the hypothesis that Z1,…,Zm do
not enter the first stage regression.
• Weak instruments imply a small first stage F-statistic.
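The first-stage F can be sketched from restricted and unrestricted sums of squared residuals (a homoskedasticity-only version; the simulated strong and weak instruments below are invented for illustration):

```python
import numpy as np

def first_stage_F(x, Z, W):
    """Homoskedasticity-only F-statistic for H0: all instrument coefficients are 0
    in the first-stage regression of x on [intercept, W, Z]."""
    n = len(x)
    ones = np.ones((n, 1))
    unrestricted = np.hstack([ones, W, Z])   # includes the instruments
    restricted = np.hstack([ones, W])        # drops the instruments
    ssr_u = np.sum((x - unrestricted @ np.linalg.lstsq(unrestricted, x, rcond=None)[0]) ** 2)
    ssr_r = np.sum((x - restricted @ np.linalg.lstsq(restricted, x, rcond=None)[0]) ** 2)
    m = Z.shape[1]                           # number of restrictions tested
    df = n - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / m) / (ssr_u / df)

rng = np.random.default_rng(5)
n = 500
W = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
noise = rng.normal(size=n)

x_strong = Z @ np.array([0.8, 0.5]) + 0.3 * W[:, 0] + noise   # instruments matter
x_weak = Z @ np.array([0.02, 0.01]) + 0.3 * W[:, 0] + noise   # nearly irrelevant

print(f"strong: F = {first_stage_F(x_strong, Z, W):.1f}")
print(f"weak:   F = {first_stage_F(x_weak, Z, W):.1f}")
```

Against the rule of thumb on the next slide, the strong design clears 10 easily and the weak design does not.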

Checking for Weak Instruments with a Single X (1 of 2)
• Compute the first-stage F-statistic.
Rule-of-thumb: If the first stage F-statistic is less than 10, then
the set of instruments is weak.
• If so, the TSLS estimator will be biased, and statistical
inferences (standard errors, hypothesis tests, confidence
intervals) can be misleading.

Checking for Weak Instruments with a Single X (2 of 2)
• Why compare the first-stage F to 10?
• Simply rejecting the null hypothesis that the coefficients on the
Z’s are zero isn’t enough – you need substantial predictive
content for the normal approximation to be a good one.
• Comparing the first-stage F to 10 tests whether the bias
of TSLS, relative to OLS, is less than 10%.
• If F is smaller than 10, the relative bias exceeds 10%—that
is, TSLS can have substantial bias (see SW App. 12.5).

What to do if you have weak instruments


• Get better instruments (often easier said than done!)
• If you have many instruments, some are probably weaker than
others and it’s a good idea to drop the weaker ones (dropping an
irrelevant instrument will increase the first-stage F)
• If you only have a few instruments, and all are weak, then you
need to employ methods other than TSLS.
• Some other methods for IV analysis are less sensitive than TSLS,
and some of these methods are discussed in SW edition 4
Appendix 12.5.

Weak Instruments and Heteroskedasticity


The foregoing discussion applies to the homoskedasticity case. In
practice, you would want to use robust SEs, either
heteroskedasticity-robust or, in panel data, clustered SEs.
• If you have 1 X and 1 Z:
– Assess instrument strength using the robust first-stage F, which you can
compare to 10
– Compute weak-instrument confidence intervals by the Anderson-Rubin
method, using robust SEs in the regression of Yi – β1,0Xi on W1i,…, Wri,
Z1i,…, Zmi

• If you have more than one Z, then the methods for weak-
instrument robust inference go beyond the scope of this book. A
reasonable compromise – better than ignoring the weak
instrument problem – is to use homoskedasticity-only SEs for
the first stage F and the CLR (if available) for confidence
intervals for β1

Checking Assumption #2: Instrument Exogeneity
• Instrument exogeneity: All the instruments are uncorrelated with
the error term: corr(Z1i, ui) = 0,…, corr(Zmi, ui) = 0

• If the instruments are correlated with the error term, the first stage
of TSLS cannot isolate a component of X that is uncorrelated with
the error term, so X̂ is correlated with u and TSLS is inconsistent.

• If there are more instruments than endogenous regressors, it is


possible to test – partially – for instrument exogeneity.

Testing Overidentifying Restrictions


Consider the simplest case:
Yi = β0 + β1Xi + ui,
• Suppose there are two valid instruments: Z1i, Z2i
• Then you could compute two separate TSLS estimates.
• Intuitively, if these 2 TSLS estimates are very different from
each other, then something must be wrong: one or the other (or
both) of the instruments must be invalid.
• The J-test of overidentifying restrictions makes this comparison
in a statistically precise way.
• This can only be done if #Z’s > #X ’s, i.e., m > k (overidentified).

The J-test of Overidentifying Restrictions (1 of 2)


Suppose # instruments = m > # X’s = k (overidentified)

Yi = β0 + β1X1i + … + βk Xki + βk+1W1i + … + βk+rWri + ui


The J-test is the Anderson-Rubin test, using the TSLS estimator instead
of the hypothesized value β1,0. The recipe:
1. First estimate the equation of interest using TSLS and all m
instruments; compute the predicted values Ŷi, using the actual
X’s (not the X̂’s used to estimate the second stage)
2. Compute the residuals ûi = Yi – Ŷi
3. Regress ûi against Z1i,…, Zmi, W1i,…, Wri
4. Compute the F-statistic testing the hypothesis that the coefficients
on Z1i,…, Zmi are all zero;
5. The J-statistic is J = mF
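The five steps can be sketched on simulated data with m = 2 valid instruments and k = 1 endogenous regressor (homoskedasticity-only F, invented data-generating process, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2_000
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * z1 + 0.6 * z2 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u                      # both instruments are valid here

ones = np.ones(n)

# Step 1: TSLS with m = 2 instruments, k = 1 endogenous regressor
A = np.column_stack([ones, z1, z2])
x_hat = A @ np.linalg.lstsq(A, x, rcond=None)[0]
b0, b1 = np.linalg.lstsq(np.column_stack([ones, x_hat]), y, rcond=None)[0]

# Step 2: residuals from the ACTUAL X
u_hat = y - b0 - b1 * x

# Steps 3-4: regress u-hat on the instruments; F-test that both coefficients are 0
fit = A @ np.linalg.lstsq(A, u_hat, rcond=None)[0]
ssr_u = np.sum((u_hat - fit) ** 2)
ssr_r = np.sum((u_hat - u_hat.mean()) ** 2)   # intercept-only regression
m, k = 2, 1
F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))

# Step 5: J = m * F, distributed chi-squared(m - k) under the null
J = m * F
print(f"J = {J:.2f} (compare to a chi-squared with {m - k} df)")
```

With both instruments valid, J should usually fall below the chi-squared(1) critical value; building in an endogenous instrument would inflate it.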

The J-test of Overidentifying Restrictions (2 of 2)


J = mF, where F = the F-statistic testing the coefficients on
Z1i,…,Zmi in a regression of the TSLS residuals against Z1i,…,Zmi,
W1i,…,Wri.
Distribution of the J-statistic
• Under the null hypothesis that all the instruments are exogenous,
J has a chi-squared distribution with m–k degrees of freedom
• If m = k, J = 0 (does this make sense?)
• If some instruments are exogenous and others are endogenous,
the J statistic will be large, and the null hypothesis that all
instruments are exogenous will be rejected.

Checking Instrument Validity: Summary (1 of 2)


This summary considers the case of a single X. The two
requirements for valid instruments are:
1. Relevance
• At least one instrument must enter the population
counterpart of the first stage regression.
• If instruments are weak, then the TSLS estimator is biased,
and its t-statistic has a non-normal distribution
• To check for weak instruments with a single included
endogenous regressor, check the first-stage F
– If F > 10, instruments are strong – use TSLS
– If F < 10, weak instruments – take some action.

Checking Instrument Validity: Summary (2 of 2)


2. Exogeneity
• All the instruments must be uncorrelated with the error
term: corr(Z1i,ui) = 0,…, corr(Zmi,ui) = 0
• We can partially test for exogeneity: if m > 1, we can test
the null hypothesis that all the instruments are exogenous,
against the alternative that as many as m – 1 are endogenous
(correlated with u)
• The test is the J-test, which is constructed using the TSLS
residuals.
• If the J-test rejects, then at least some of your instruments
are endogenous – so you must make a difficult decision and
jettison some (or all) of your instruments.

Application to the Demand for Cigarettes (SW Section 12.4)
Why are we interested in knowing the elasticity of demand for
cigarettes?
• Theory of optimal taxation. The optimal tax rate is inversely
related to the price elasticity: the greater the elasticity, the less
quantity is affected by a given percentage tax, so the smaller is
the change in consumption and deadweight loss.
• Externalities of smoking – role for government intervention to
discourage smoking
– health effects of second-hand smoke? (non-monetary)
– monetary externalities

Panel data set


• Annual cigarette consumption, average prices paid by end
consumer (including tax), personal income, and tax rates
(cigarette-specific and general statewide sales tax rates)
• 48 continental US states, 1985–1995
Estimation strategy
• We need to use IV estimation methods to handle the
simultaneous causality bias that arises from the interaction of
supply and demand.
• State binary indicators = W variables (control variables) which
control for unobserved state-level characteristics that affect the
demand for cigarettes and the tax rate, as long as those
characteristics don’t vary over time.
61

Fixed-effects model of cigarette demand


ln(Qit^cigarettes) = αi + β1ln(Pit^cigarettes) + β2ln(Incomeit) + uit

• i = 1,…,48, t = 1985, 1986,…,1995


• corr(ln(Pit^cigarettes), uit) is plausibly nonzero because of supply/demand
interactions
• αi reflects unobserved omitted factors that vary across states but
not over time, e.g. attitude towards smoking
• Estimation strategy:
– Use panel data regression methods to eliminate αi
– Use TSLS to handle simultaneous causality bias
– Use T = 2 with 1985 – 1995 changes (“changes” method) – look at long-
term response, not short-term dynamics (short- v. long-run elasticities)
62

The “changes” method (when T = 2)


• One way to model long-term effects is to consider 10-year
changes, between 1985 and 1995
• Rewrite the regression in “changes” form:
ln(Qi1995^cigarettes) – ln(Qi1985^cigarettes)
   = β1[ln(Pi1995^cigarettes) – ln(Pi1985^cigarettes)]
   + β2[ln(Incomei1995) – ln(Incomei1985)] + (ui1995 – ui1985)

• Create “10-year change” variables, for example:


• 10-year change in log price = ln(Pi1995) – ln(Pi1985)
• Then estimate the demand elasticity by TSLS using 10-year
changes in the instrumental variables
• This is equivalent to using the original data and including the
state binary indicators (“W ” variables) in the regression
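The claimed equivalence between the changes regression and the regression with state binary indicators can be checked numerically. Below is a minimal numpy sketch on a simulated two-period panel (made-up DGP): the fixed-effects (within) estimator and the no-intercept OLS regression of Δy on Δx give identical slopes when T = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states = 48

# Simulated two-period panel: y_it = alpha_i + 2*x_it + u_it
alpha = rng.standard_normal(n_states)
x = rng.standard_normal((n_states, 2))
y = alpha[:, None] + 2.0 * x + 0.1 * rng.standard_normal((n_states, 2))

# Fixed-effects (within) estimator: demean by state, then OLS slope
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_within = (xd * yd).sum() / (xd * xd).sum()

# "Changes" estimator: OLS of the 1985-1995 change in y on the change in x
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
beta_changes = (dx * dy).sum() / (dx * dx).sum()

print(beta_within, beta_changes)  # identical up to rounding
```

Adding an intercept to the changes regression (as in the Stata runs below) corresponds to also allowing a common time effect in the fixed-effects specification.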
63

STATA: Cigarette demand


First create “10-year change” variables
10-year change in log price
= ln(Pit) – ln(Pit–10) = ln(Pit /Pit–10)

. gen dlpackpc = log(packpc/packpc[_n-10]) _n-10 is the 10-yr lagged value


. gen dlavgprs = log(avgprs/avgprs[_n-10])
. gen dlperinc = log(perinc/perinc[_n-10])
. gen drtaxs = rtaxs-rtaxs[_n-10]
. gen drtax = rtax-rtax[_n-10]
. gen drtaxso = rtaxso-rtaxso[_n-10]
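For readers working outside Stata, here is a hypothetical pandas analogue of these `gen` lines (column names assumed to match the Stata data set; a one-state mini-panel stands in for the 48-state data):

```python
import numpy as np
import pandas as pd

# Hypothetical mini-panel with the same variable names as the Stata data set:
# one state, years 1985-1995 (the real data set has 48 states)
df = pd.DataFrame({
    "state": ["AL"] * 11,
    "year": list(range(1985, 1996)),
    "packpc": np.linspace(120.0, 95.0, 11),
    "avgprs": np.linspace(100.0, 160.0, 11),
})
df = df.sort_values(["state", "year"])

# 10-year change in logs, computed within each state -- the analogue of
# Stata's:  gen dlpackpc = log(packpc/packpc[_n-10])
for v in ["packpc", "avgprs"]:
    df["dl" + v] = np.log(df[v] / df.groupby("state")[v].shift(10))
```

Note that `groupby("state").shift(10)` never crosses state boundaries, whereas Stata's `[_n-10]` relies on the data being sorted by state and year with a full run of years per state.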
64

Use TSLS to estimate the demand elasticity using the
“10-year changes” specification
Y W X Z
. ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso) , r

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 12.31
Prob > F = 0.0001
R-squared = 0.5499
Root MSE = .09092

------------------------------------------------------------------------------
| Robust
dlpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dlavgprs | -.9380143 .2075022 -4.52 0.000 -1.355945 -.5200834
dlperinc | .5259693 .3394942 1.55 0.128 -.1578071 1.209746
_cons | .2085492 .1302294 1.60 0.116 -.0537463 .4708446
------------------------------------------------------------------------------
Instrumented: dlavgprs
Instruments: dlperinc drtaxso
------------------------------------------------------------------------------
NOTE:
- All the variables – Y, X, W, and Z’s – are in 10-year changes
- Estimated elasticity = –.94 (SE = .21) – surprisingly elastic!
- Income elasticity small, not statistically different from zero
- Must check whether the instrument is relevant…
65

Check instrument relevance: compute first-stage F
. reg dlavgprs drtaxso dlperinc

Source | SS df MS Number of obs = 48


-------------+------------------------------ F( 2, 45) = 23.86
Model | .191437213 2 .095718606 Prob > F = 0.0000
Residual | .180549989 45 .004012222 R-squared = 0.5146
-------------+------------------------------ Adj R-squared = 0.4931
Total | .371987202 47 .007914621 Root MSE = .06334
------------------------------------------------------------------------------
dlavgprs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drtaxso | .0254611 .0037374 6.81 0.000 .0179337 .0329885
dlperinc | -.2241037 .2119405 -1.06 0.296 -.6509738 .2027664
_cons | .5321948 .031249 17.03 0.000 .4692561 .5951334
------------------------------------------------------------------------------

. test drtaxso
( 1) drtaxso = 0 We didn’t need to run “test” here!
With m=1 instrument, the F-stat is
F( 1, 45) = 46.41 the square of the t-stat:
Prob > F = 0.0000 6.81*6.81 = 46.41

First-stage F = 46.4 > 10, so the instrument is not weak

Can we check instrument exogeneity? No: m = k
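The annotation's point that with one instrument the F-stat equals the squared t-stat is an exact algebraic identity under the classical (homoskedastic) formulas used by plain `reg`. A quick numpy check on simulated data (made-up DGP):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 48
z = rng.standard_normal(n)
x = 0.5 + 0.03 * z + 0.05 * rng.standard_normal(n)  # "first stage", one instrument

# OLS of x on a constant and z, classical standard errors
zc = z - z.mean()
b1 = (zc * x).sum() / (zc * zc).sum()
b0 = x.mean() - b1 * z.mean()
resid = x - b0 - b1 * z
s2 = (resid @ resid) / (n - 2)
t = b1 / np.sqrt(s2 / (zc * zc).sum())

# F-test of the single restriction b1 = 0
rss_r = ((x - x.mean()) ** 2).sum()   # restricted model: constant only
rss_u = resid @ resid
F = (rss_r - rss_u) / (rss_u / (n - 2))

print(F, t**2)  # equal
```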


66

Cigarette demand, 10-year changes – 2 IVs


Y W X Z1 Z2
. ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso drtax) , vce(r)

Instrumental variables (2SLS) regression Number of obs = 48


Wald chi2(2) = 45.44
Prob > chi2 = 0.0000
R-squared = 0.5466
Root MSE = .08836

------------------------------------------------------------------------------
| Robust
dlpackpc | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dlavgprs | -1.202403 .1906896 -6.31 0.000 -1.576148 -.8286588
dlperinc | .4620299 .2995177 1.54 0.123 -.1250139 1.049074
_cons | .3665388 .1180414 3.11 0.002 .1351819 .5978957
------------------------------------------------------------------------------
Instrumented: dlavgprs
Instruments: dlperinc drtaxso drtax
------------------------------------------------------------------------------

drtaxso = general sales tax only


drtax = cigarette-specific tax only
Estimated elasticity is -1.2, even more elastic than using general
sales tax only!
67

First-stage F – both instruments


X Z1 Z2 W
. reg dlavgprs drtaxso drtax dlperinc

Source | SS df MS Number of obs = 48


-------------+------------------------------ F( 3, 44) = 51.36
Model | .289359873 3 .096453291 Prob > F = 0.0000
Residual | .082627329 44 .001877894 R-squared = 0.7779
-------------+------------------------------ Adj R-squared = 0.7627
Total | .371987202 47 .007914621 Root MSE = .04333

------------------------------------------------------------------------------
dlavgprs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drtaxso | .013457 .0030498 4.41 0.000 .0073106 .0196033
drtax | .0075734 .0010488 7.22 0.000 .0054597 .009687
dlperinc | -.0289943 .1474923 -0.20 0.845 -.3262455 .2682568
_cons | .4919733 .0220923 22.27 0.000 .4474492 .5364973
------------------------------------------------------------------------------
. test drtaxso drtax
( 1) drtaxso = 0
( 2) drtax = 0
F( 2, 44) = 75.65 75.65 > 10 so instruments aren’t weak
Prob > F = 0.0000

With m>k, we can test the overidentifying restrictions…


68

Test the overidentifying restrictions (1 of 2)


. predict e, resid Computes residuals from the most recently
estimated regression (the previous TSLS regression)
. reg e drtaxso drtax dlperinc Regress e on Z’s and W’s

Source | SS df MS Number of obs = 48


-------------+------------------------------ F( 3, 44) = 1.64
Model | .037769176 3 .012589725 Prob > F = 0.1929
Residual | .336952289 44 .007658007 R-squared = 0.1008
-------------+------------------------------ Adj R-squared = 0.0395
Total | .374721465 47 .007972797 Root MSE = .08751

------------------------------------------------------------------------------
e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drtaxso | .0127669 .0061587 2.07 0.044 .000355 .0251789
drtax | -.0038077 .0021179 -1.80 0.079 -.008076 .0004607
dlperinc | -.0934062 .2978459 -0.31 0.755 -.6936752 .5068627
_cons | .002939 .0446131 0.07 0.948 -.0869728 .0928509
------------------------------------------------------------------------------
. test drtaxso drtax
( 1) drtaxso = 0 Compute J-statistic, which is m*F,
( 2) drtax = 0 where F tests whether coefficients on
the instruments are zero
F( 2, 44) = 2.47 so J = 2 × 2.47 = 4.93
Prob > F = 0.0966 ** WARNING – this uses the wrong d.f. **
69

Test the overidentifying restrictions (2 of 2)


The correct degrees of freedom for the J-statistic is m–k:
• J = mF, where F = the F-statistic testing the coefficients on Z1i,…,Zmi
in a regression of the TSLS residuals against Z1i,…,Zmi, W1i,…,Wri.
• Under the null hypothesis that all the instruments are exogenous, J
has a chi-squared distribution with m–k degrees of freedom
• Here, J = 4.93, distributed chi-squared with d.f. = 1; the 5% critical
value is 3.84, so reject at 5% sig. level.
• In STATA:
. dis "J-stat = " r(df)*r(F) " p-value = " chiprob(r(df)-1,r(df)*r(F))
J-stat = 4.9319853 p-value = .02636401

J = 2 × 2.47 = 4.93 p-value from chi-squared(1) distribution
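The corrected p-value calculation can be sketched in a few lines of Python. The F value below is taken from the slide's J-statistic (J/2 = 2.46599; the slide rounds the F to 2.47); for m – k = 1 degree of freedom the chi-squared survival function has a closed form via erfc, so no extra packages are needed:

```python
import math

m, k = 2, 1       # number of instruments, number of endogenous regressors
F = 2.4659927     # F on the instruments in the residual regression (slide: 2.47)
J = m * F         # J-statistic; under H0, J ~ chi-squared(m - k)

# For 1 d.f.: P(chi2(1) > J) = erfc(sqrt(J/2)); for general d.f. use
# scipy.stats.chi2.sf(J, m - k)
p = math.erfc(math.sqrt(J / 2))
print(f"J = {J:.2f}, p-value = {p:.4f}")   # J = 4.93, p-value = 0.0264
```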

Now what???
Tabular summary of these results: (table not reproduced here)
70
71

How should we interpret the J-test rejection?


• J-test rejects the null hypothesis that both the instruments are
exogenous
• This means that either rtaxso is endogenous, or rtax is
endogenous, or both!
• The J-test doesn’t tell us which! You must exercise judgment…
• Why might rtax (cig-only tax) be endogenous?
– Political forces: history of smoking or lots of smokers → political pressure
for low cigarette taxes
– If so, cig-only tax is endogenous

• This reasoning doesn’t apply to general sales tax


• → use just one instrument, the general sales tax
72

The Demand for Cigarettes: Summary of Empirical Results
• Use the estimated elasticity based on TSLS with the general
sales tax as the only instrument:

Elasticity = –.94, SE = .21

• This elasticity is surprisingly large (not inelastic) – a 1%


increase in prices reduces cigarette sales by nearly 1%. This is
much more elastic than conventional wisdom in the health
economics literature.

• This is a long-run (ten-year change) elasticity. What would you


expect a short-run (one-year change) elasticity to be – more or
less elastic?
73

Assess the Validity of the Study (1 of 2)


Remaining threats to internal validity?
1. Omitted variable bias?
– The fixed effects estimator controls for unobserved factors that vary
across states but not over time

2. Functional form mis-specification? (could check this)


3. Remaining simultaneous causality bias?
– Not if the general sales tax is a valid instrument, once state fixed effects
are included!

4. Errors-in-variables bias?
5. Selection bias? (no, we have all the states)
6. An additional threat to internal validity of IV regression studies is
whether the instrument is (1) relevant and (2) exogenous. How
significant are these threats in the cigarette elasticity application?
74

Assess the Validity of the Study (2 of 2)


External validity?
• We have estimated a long-run elasticity – can it be generalized to
a short-run elasticity? Why or why not?
• Suppose we want to use the estimated elasticity of –0.94 to guide
policy today. Here are two changes since the period covered by
the data (1985–95) – do these changes pose a threat to external
validity (generalization from 1985–95 to today)?
– Levels of smoking today are lower than in 1985–1995
– Cultural attitudes toward smoking have changed against smoking since
1985–95.
75

Where Do Valid Instruments Come From?


(SW Section 12.5)
General comments
The hard part of IV analysis is finding valid instruments
• Method #1: “variables in another equation” (e.g. supply shifters that do
not affect demand)
• Method #2: look for exogenous variation (Z) that is “as if ” randomly
assigned (does not directly affect Y ) but affects X.
• These two methods are different ways to think about the same issues –
see the link…
– Rainfall shifts the supply curve for butter but not the demand curve;
rainfall is “as if ” randomly assigned
– Sales tax shifts the supply curve for cigarettes but not the demand curve;
sales taxes are “as if ” randomly assigned
76

Conclusion (SW Section 12.6)


• A valid instrument lets us isolate a part of X that is uncorrelated
with u, and that part can be used to estimate the effect of a change
in X on Y
• IV regression hinges on having valid instruments:
1. Relevance: Check via first-stage F
2. Exogeneity: Test overidentifying restrictions via the J-statistic
• A valid instrument isolates variation in X that is “as if ” randomly
assigned.
• The critical requirement of at least m valid instruments cannot be
tested – you must use your head.
77

Some IV FAQs (1 of 2)
1. When might I want to use IV regression?
Any time that X is correlated with u and you have a valid instrument. The
primary reasons for correlation between X and u could be:

• Omitted variable(s) that lead to OV bias


– Ex: ability bias in returns to education

• Measurement error
– Ex: measurement error in years of education

• Selection bias
– Patients select treatment

• Simultaneous causality bias


– Ex: supply and demand for butter, cigarettes
78

THE END
79

APPENDIX
80

Inference using TSLS (2 of 5)


β̂1^TSLS = sYZ / sXZ
        = [ (1/(n–1)) Σi (Yi – Ȳ)(Zi – Z̄) ] / [ (1/(n–1)) Σi (Xi – X̄)(Zi – Z̄) ]
        = [ Σi Yi(Zi – Z̄) ] / [ Σi Xi(Zi – Z̄) ]

Substitute in Yi = β0 + β1Xi + ui and simplify:

β̂1^TSLS = [ β1 Σi Xi(Zi – Z̄) + Σi ui(Zi – Z̄) ] / [ Σi Xi(Zi – Z̄) ]

so…
81

Inference using TSLS (3 of 5)


β̂1^TSLS = β1 + [ Σi ui(Zi – Z̄) ] / [ Σi Xi(Zi – Z̄) ]

So  β̂1^TSLS – β1 = [ Σi ui(Zi – Z̄) ] / [ Σi Xi(Zi – Z̄) ]

Multiply through by √n:

√n(β̂1^TSLS – β1) = [ (1/√n) Σi (Zi – Z̄)ui ] / [ (1/n) Σi Xi(Zi – Z̄) ]
82

Inference using TSLS (4 of 5)


√n(β̂1^TSLS – β1) = [ (1/√n) Σi (Zi – Z̄)ui ] / [ (1/n) Σi Xi(Zi – Z̄) ]

• (1/n) Σi Xi(Zi – Z̄) = (1/n) Σi (Xi – X̄)(Zi – Z̄) →p cov(X, Z) ≠ 0

• (1/√n) Σi (Zi – Z̄)ui is distributed N(0, var[(Z – μZ)u]) (CLT)

so: β̂1^TSLS is approx. distributed N(β1, σ²β̂1TSLS),

where σ²β̂1TSLS = (1/n) · var[(Zi – μZ)ui] / [cov(Zi, Xi)]²

where cov(X, Z) ≠ 0 because the instrument is relevant


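This consistency argument can be sanity-checked by simulation. The sketch below uses a made-up DGP in which X is endogenous: the IV ratio estimator cov(Y, Z)/cov(X, Z) recovers β1, while OLS converges to β1 + cov(X, u)/var(X):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta0, beta1 = 1.0, 2.0

z = rng.standard_normal(n)                  # instrument: relevant, exogenous
v = rng.standard_normal(n)                  # shock that makes x endogenous
x = 0.5 + 1.0 * z + v
u = 0.8 * v + 0.6 * rng.standard_normal(n)  # corr(x, u) != 0
y = beta0 + beta1 * x + u

def cov(a, b):
    return ((a - a.mean()) * (b - b.mean())).mean()

beta1_iv = cov(y, z) / cov(x, z)    # sample analogue of cov(Y,Z)/cov(X,Z)
beta1_ols = cov(y, x) / cov(x, x)   # biased: plim = beta1 + cov(x,u)/var(x)

print(beta1_iv, beta1_ols)   # IV ≈ 2.0, OLS ≈ 2.4 (biased up by 0.8/2)
```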
83

Confidence Intervals with Weak Instruments (1 of 2)


• With weak instruments, TSLS confidence intervals are not valid –
but some other confidence intervals are. Here are two ways to
compute confidence intervals that are valid in large samples, even
if instruments are weak:
1. The Anderson-Rubin confidence interval
• The Anderson-Rubin confidence interval is based on the
Anderson-Rubin test statistic testing β1 = β1,0:
– Compute ũi = Yi – β1,0Xi
– Regress ũi on W1i,…, Wri, Z1i,…, Zmi
– The AR test is the F-statistic on Z1i,…, Zmi

• Now invert this test: the 95% AR confidence interval is the


set of β1 not rejected at the 5% level by the AR test.
• Computation: a pain by hand! use specialized software.
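For the just-identified case with one instrument and no W's, the grid inversion can be sketched in a few lines (simulated data with a made-up DGP; real applications should use specialized software):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
z = rng.standard_normal(n)
v = rng.standard_normal(n)
x = 1.0 * z + v
u = 0.5 * v + rng.standard_normal(n)
y = 1.0 + 2.0 * x + u                  # true beta1 = 2

def ar_stat(b10):
    """AR statistic for H0: beta1 = b10 -- F on z in a regression of y - b10*x on (1, z)."""
    ut = y - b10 * x
    zc = z - z.mean()
    b = (zc * ut).sum() / (zc * zc).sum()
    resid = ut - ut.mean() - b * zc
    s2 = (resid @ resid) / (n - 2)
    t = b / np.sqrt(s2 / (zc * zc).sum())
    return t * t                        # one restriction: F = t^2

# Invert the test: keep every beta1 the 5% AR test does not reject
# (3.84 = 5% critical value of chi-squared(1), the large-sample F cutoff)
grid = np.linspace(1.5, 2.5, 401)
ar_ci = grid[[ar_stat(b) < 3.84 for b in grid]]
print(ar_ci.min(), ar_ci.max())
```

Note that the AR statistic is exactly zero at the IV point estimate, so the interval always contains it.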
84

Confidence Intervals with Weak Instruments (2 of 2)


2. Moreira’s Conditional Likelihood Ratio confidence interval
• The Conditional Likelihood Ratio (CLR) confidence
interval is based on inverting Moreira’s Conditional
Likelihood Ratio test. Computing this test, its critical value,
and the CLR confidence interval requires specialized
software.
• The CLR confidence interval tends to be tighter than the
Anderson-Rubin confidence interval, especially when there
are many instruments.
• If your software produces the CLR confidence interval, this
is the one to use.
85

Example #1: Effect of Studying on Grades (4 of 6)


Yi = β0 + β1Xi + ui
Y = first-semester GPA
X = average study hours per day
Z = 1 if roommate brought video game, = 0 otherwise
Roommates were randomly assigned
Can you think of a reason that Z might be correlated with u – even
though it is randomly assigned? What else enters the error term –
what are other determinants of grades, beyond time spent studying?
86
Example #1: Effect of Studying on Grades (5 of 6)
Yi = β0 + β1Xi + ui
Why might Z be correlated with u?
• Here’s a hypothetical possibility: the student’s sex. Suppose:
– Roommates are randomly assigned – except always men with men and
women with women.
– Women get better grades than men, holding constant hours spent studying
– Men are more likely to bring a video game than women
– Then corr(Zi, ui) < 0 (males are more likely to have a [male] roommate
who brings a video game – but males also tend to have lower grades,
holding constant the amount of studying).

• Because corr(Zi, ui) < 0, the IV (roommate brings video game)


isn’t valid.
– This is the IV version of OV bias.
– The solution to OV bias is to control for (or include) the OV – in this
case, sex.
87

Example #1: Effect of Studying on Grades (6 of 6)


• This logic leads you to include W = student’s sex as a control
variable in the IV regression:
Yi = β0 + β1Xi + β2Wi + ui
• The TSLS estimate reported above is from a regression that
included gender as a W variable – along with other variables
such as individual i’s major.
• The conditional mean independence condition for an exogenous
instrument is, E(ui|Zi,Wi) = E(ui|Wi).
– In words: among men (conditional on W = male), roommates are
randomly assigned, so whether your roommate brings a video game is
random. Same thing among women (conditional on W = female).
– The instrument is not exogenous if W isn’t included in the regression.
– But when W is included, the conditional mean independence condition
E(ui|Zi,Wi) = E(ui|Wi) holds, and the instrument is valid.
88

Estimation with Weak Instruments


There are no unbiased estimators if instruments are weak or
irrelevant. However, some estimators have a distribution more
centered around β1 than TSLS.

• One such estimator is the limited information maximum


likelihood estimator (LIML)
• The LIML estimator
– can be derived as a maximum likelihood estimator
– is the value of β1 that minimizes the AR test statistic, i.e., maximizes the p-value of the AR test(!)

• For more discussion about estimators, tests, and confidence


intervals when you have weak instruments, see SW, App. 12.5
