Chapter 6: Panel Data

Peter Hull

Mathematical Econometrics I
Brown University
Spring 2024
Motivation

We've seen how to estimate causal effects when a treatment is as good as randomly assigned conditional on observable characteristics.

But often we're worried that there are unobservable characteristics we haven't properly accounted for (i.e., confounding variables).

Next we'll think about how we can deal with certain types of unobserved confounding variables when we have panel data.
What is Panel Data?

Panel data refers to a situation where we observe each unit i (say, a person or state) across multiple periods t.

Why is this useful? It allows us to look at differences in outcomes between treated and untreated units before the treatment occurred.

If treated and control outcomes differ before the treatment, this must be the result of confounding factors.

So we can potentially use pre-treatment differences to learn about the confounds and adjust for them.

Let's see how this works in an example of difference-in-differences, the most common panel data method used in applied microeconomic research.
Outline

1. Diff-in-Diff Basics

2. DiD Meets Regression

3. The DiD Frontier

Hastings (2004)

In 2004, Justine Hastings (a former Brown prof!) wrote a study analyzing how mergers in the gas industry affect gas prices.

In particular, she studied an episode in California where a refinery, ARCO, bought one of the largest independent gas station chains, Thrifty.

How do you think such a merger might affect prices?
On the one hand, it could reduce competition and increase prices.
On the other, a merger could reduce the costs of providing gas and decrease prices (synergies).

Hastings attempted to answer this question empirically using data on gas prices by neighborhood in CA.
The data contain info on neighborhoods both with and without Thrifty stations.
Suppose first that we only had data on gas prices from after the merger occurred.

We could compare prices in areas that had a Thrifty beforehand (Di = 1) and places that didn't have a Thrifty beforehand (Di = 0) to estimate the causal effect of a Thrifty conversion.

Why might this not give us the causal effect of converting Thrifties? Omitted variables!

In particular, places that already had a Thrifty beforehand likely had more competition than places without a Thrifty. We thus might expect them to have lower prices.

With panel data, we can test this empirically by looking at prices before the merger!
Before the merger, stations in markets competing with Thrifty had gas prices about 3 cents lower in every period.

Is it reasonable to assume unconfoundedness after the merger? No!

A better assumption might be that the gap would have remained 3 cents if not for the merger. This is the idea of difference-in-differences.
After the merger, stations in areas with a Thrifty had higher prices by about 2 cents.

If we assume that they would have had lower prices by 3 cents (as before the merger), then this implies a treatment effect of 2 − (−3) = 5 cents.

This is the post-treatment difference (2) between treatment and control minus the pre-treatment difference (−3), i.e. a difference-in-differences.
Formalizing the Assumptions of DiD

Assume there are 2 periods, t = 1, 2. Treated units (Di = 1) are treated in period 2; control units are never treated.

Let Yit be the observed outcome for unit i in period t.
Assume Yit = Di Yit(1) + (1 − Di) Yit(0).

No anticipation assumption: Yi1(0) = Yi1(1)
Your treatment in period 2 doesn't affect your outcome in period 1.

Parallel trends assumption:

E[Yi2(0) − Yi1(0) | Di = 1] = E[Yi2(0) − Yi1(0) | Di = 0]
(change in Y(0) for treated = change in Y(0) for control)

Equivalently,

E[Yi2(0) | Di = 1] − E[Yi2(0) | Di = 0] = E[Yi1(0) | Di = 1] − E[Yi1(0) | Di = 0]
(selection bias in period 2 = selection bias in period 1)
Under these assumptions, we have

E[Yi2 − Yi1 | Di = 1] − E[Yi2 − Yi1 | Di = 0]
(observed change for treated minus observed change for control)

= E[Yi2(1) − Yi1(1) | Di = 1] − E[Yi2(0) − Yi1(0) | Di = 0]  (observed data rule)

= E[Yi2(1) − Yi1(0) | Di = 1] − E[Yi2(0) − Yi1(0) | Di = 0]  (no anticipation)

= E[Yi2(1) − Yi2(0) | Di = 1] + E[Yi2(0) − Yi1(0) | Di = 1] − E[Yi2(0) − Yi1(0) | Di = 0]  (adding and subtracting)

= E[Yi2(1) − Yi2(0) | Di = 1]  (parallel trends)

Thus, the difference-in-differences of means identifies

τATT = E[Yi2(1) − Yi2(0) | Di = 1].

This is called the average treatment effect on the treated (ATT).
It is the average effect in period 2 for treated units.
Estimating the ATT

We've shown that under the DiD assumptions (parallel trends and no anticipation), the ATT is identified as

τATT = E[Yi2 − Yi1 | Di = 1] − E[Yi2 − Yi1 | Di = 0]
(change in population mean for treated minus change in population mean for control)

How can we estimate this? Plug in sample means!

Our estimate is

τ̂ATT = (Ȳ12 − Ȳ11) − (Ȳ02 − Ȳ01),
(change in sample mean for treated minus change in sample mean for control)

where Ȳdt is the sample mean for units with Di = d in period t.
Example

Consider Hastings's example, comparing June (period 1) to October (period 2):

τ̂ATT = (Ȳ12 − Ȳ11) − (Ȳ02 − Ȳ01) = (1.43 − 1.25) − (1.41 − 1.28) = 0.05
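In Stata, the plug-in estimator is just four sample means. Here is a minimal sketch, assuming a station-by-period dataset with hypothetical variables price, treat (= 1 for markets that competed with a Thrifty), and post (= 1 for October); these names are illustrative, not from Hastings's actual data.

* plug-in DiD estimate: (Y12 - Y11) - (Y02 - Y01), using hypothetical variables
summarize price if treat == 1 & post == 1
local y12 = r(mean)
summarize price if treat == 1 & post == 0
local y11 = r(mean)
summarize price if treat == 0 & post == 1
local y02 = r(mean)
summarize price if treat == 0 & post == 0
local y01 = r(mean)
display "DiD estimate of the ATT: " (`y12' - `y11') - (`y02' - `y01')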
Outline

1. Diff-in-Diff Basics✓

2. DiD Meets Regression

3. The DiD Frontier

DiD as Regression

Consider the regression
Yit = β0 + β1 Postt + β2 Di + β3 Di × Postt + εit ,
where Postt = 1[t = 2].

Claim: the population regression coefficient β3 is equal to τATT under the DiD assumptions.

Why? The regression above models the CEF as:
E[Yit | Di = 0, Postt = 0] = β0
E[Yit | Di = 0, Postt = 1] = β0 + β1
E[Yit | Di = 1, Postt = 0] = β0 + β2
E[Yit | Di = 1, Postt = 1] = β0 + β1 + β2 + β3

Thus,
β3 = (E[Yit | Di = 1, Postt = 1] − E[Yit | Di = 1, Postt = 0]) − (E[Yit | Di = 0, Postt = 1] − E[Yit | Di = 0, Postt = 0]) = τATT

Analogously, β̂3 = (Ȳ12 − Ȳ11) − (Ȳ02 − Ȳ01) = τ̂ATT.
Example

Suppose we take the Hastings data from June/October and estimate
Yit = β0 + β1 Postt + β2 Di + β3 Di × Postt + εit
via OLS, where Postt is 1 for October and 0 for June.

We get the regression coefficients:

Constant (β̂0)        1.28
Post (β̂1)            0.13
Treated (β̂2)        -0.03
Treated × Post (β̂3)  0.05
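Here is a sketch of how one might estimate this regression in Stata, reusing the hypothetical variable names from above (price, treat, post) and clustering by local market in anticipation of the standard-error discussion later in the chapter:

* DiD as a fully interacted regression; variable names are hypothetical
gen treat_post = treat * post
regress price post treat treat_post, cluster(market)
* equivalently, using factor-variable notation:
regress price i.treat##i.post, cluster(market)

The coefficient on treat_post is the DiD estimate β̂3.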
DiD with Multiple Periods

Often we have more than 2 periods for a DiD analysis.

This is useful for two reasons:
1. We can test whether parallel trends appears to hold prior to treatment
2. We can analyze how the ATT changes over time

How do we do this?
DiD with Multiple Periods

Suppose that we have periods t = −T, ..., T̄. Treated units begin getting treatment at period 1.

For each period s ≠ 0, we can estimate a 2-period DiD between period s and period 0:

β̂s = (Ȳ1s − Ȳ0s) − (Ȳ10 − Ȳ00),
(difference in period s minus difference in period 0)

where Ȳdt is the average for treatment group d in period t.

Conveniently, the β̂s are equal to the OLS estimates of the regression

Yit = φt + Di γ + Σs≠0 Di × 1[t = s] × βs + εit

You can also replace Di γ with a unit fixed effect λi and you get the exact same β̂s (a code sketch follows below).
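Here is a minimal Stata sketch of this event-study regression on simulated data; all variable names and parameter values are made up for illustration. It builds the interaction dummies by hand and shows that the group-dummy and unit-fixed-effect versions give the same β̂s.

* simulated panel: 200 units, periods t = -3,...,3; treatment starts at t = 1
clear
set seed 42
set obs 200
gen id = _n
gen d = (runiform() < 0.5)                       // treated-group indicator
expand 7
bysort id: gen t = _n - 4                        // t = -3,...,3
gen y = 0.5*d + 0.2*t + d*(t >= 1) + rnormal()   // true effect of 1 from t = 1 on
forvalues k = 0/6 {
    local s = `k' - 3
    if `s' != 0 {
        gen P`k' = (t == `s')                    // period dummy (period 0 omitted)
        gen D`k' = d * (t == `s')                // interaction: coefficient is beta_s
    }
}
regress y d P* D*, cluster(id)                   // version with the group dummy Di
areg y P* D*, absorb(id) cluster(id)             // unit fixed effects: same D* coefficients

The D* coefficients should be close to zero before treatment and close to 1 from t = 1 on, matching the simulated effect.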
Example - Medicaid Expansion

The Affordable Care Act (ACA, aka Obamacare) expanded Medicaid coverage to people with income up to 138% of the federal poverty line.

Medicaid expansion went into effect in 2014. However, some Republican-leaning states opted out of expanded coverage.

By 2015, 24 states had expanded Medicaid (more have done so since).

Carey, Miller, and Wherry (2020) study the impacts of Medicaid expansion using a DiD design comparing early-adopting states to non-adopters.
Example - Medicaid Expansion

A slightly simplified version of their regression specification is

Yits = φt + λs + Σr≠−1 Di × 1[t = 2014 + r] × βr + εits,

where Yits is the outcome for person i in year t in state s, and Di = 1 if i is in an expansion state. Let's plot the β̂r estimates and 95% CIs.

Results show similar "pre-trends" but negative effects after treatment.
In a related paper, some of the same authors used a similar research design to estimate the impacts on mortality.
Some Caution about Parallel Trends

DiD relies on the parallel trends assumption, which allows for selection bias but requires it to be stable over time. This rules out time-varying confounding factors.

Often we will be worried about time-varying confounds (e.g., macroeconomic factors might differentially affect Democratic versus Republican states).

Testing for pre-treatment differences ("pre-trends") can help increase our confidence in the research design. But such tests are not perfect. Why?
1. Just because trends were parallel beforehand doesn't mean that they would continue to be parallel afterwards.
2. Often our estimates of pre-trends are noisy, so we're not sure whether they're actually zero or not.
In addition to looking at the point estimates of pre-trends, it's important to consider what the CIs rule out.

A good rule of thumb for whether a plot is convincing is whether you can draw a smooth line through all the confidence intervals.

Are you convinced there's an effect in each of the example plots? Maybe not!
Standard Errors for Panel Regressions

We know how to get standard errors for OLS estimates of
Yi = Xi′β + ei
when (Yi, Xi) are drawn iid.

Now, we have
Yit = Xit′β + eit

Is it reasonable to assume that (Yit, Xit) are iid across i and t? No!
1. We expect Yi1 to be correlated with Yi2; e.g., people with high earnings in 2010 also tend to have higher earnings in 2011. This is called serial correlation (autocorrelation).
2. More subtly, if treatment is assigned at the state level, all people in a given state will have the same value of Dit (which is included in Xit).
Clustered Standard Errors

Clustered standard errors extend the OLS variance formula to allow (Yit, Xit) to be correlated across observations in the same "cluster".

The assumption is that each cluster is sampled independently.
For example, if we cluster at the individual level (i), then we allow Yi1 and Yi2 to be dependent, but assume (Yi1, Yi2) is independent of (Yj1, Yj2) for j ≠ i.

In panel analyses, you should at minimum cluster at the individual level to allow for serial correlation.

If treatment is assigned at a more aggregate level, it is best to cluster at the level where treatment is assigned.

Keep in mind: the number of "effective observations" (the relevant count for the central limit theorem) is the number of clusters.
Clustered SEs will not be reliable when the number of clusters is very small (e.g., < 20).
XKCD

Implementing Clustered SEs

Implementing clustered SEs in Stata is very easy.

Just replace
reg y x, robust
with
reg y x, cluster(clustervar)
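For the DiD regressions sketched earlier in this chapter, clustering at the level of treatment assignment might look like the following (all variable names are hypothetical):

* two-period Hastings-style DiD: treatment varies across local gas markets
regress price i.treat##i.post, cluster(market)
* person-year panel with state-level treatment: cluster by state
gen treat_post = treated * (year >= 2014)
areg y i.year treat_post, absorb(state) cluster(state)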
Outline

1. Diff-in-Diff Basics✓

2. DiD Meets Regression✓

3. The DiD Frontier

A Very Famous DiD

Card and Krueger (1994) ask: how does the minimum wage affect employment?

How would you expect the MW to affect employment, based on what you learned in microeconomic theory?
In a competitive market, a floor on wages (i.e., the price of labor) should induce a decrease in the quantity of labor demanded.

To study this, CK study an episode in 1992 where NJ raised its minimum wage from $4.25 to $5.05.

They use a DiD comparing the change in employment in fast food restaurants in NJ to that in neighboring PA, where the MW was flat at $4.25.
Point estimates suggest an increase in employment of 2.76 FTEs, but the effect is not statistically significant.
Why?!

The result that an increase in the MW does not seem to decrease employment was very surprising (and controversial) at the time.

One explanation for this finding is that labor markets are not perfectly competitive. Rather, firms are monopsonistic.

Consider a firm that employs 100 workers at $7/hour.
Suppose hiring another worker would produce an extra $10 of profit, but would require raising the wage to $8/hour.
Should the firm raise the wage to $8/hour? Not if it means they have to pay all 100 existing workers an extra $1/hour: that is $100/hour in extra wage costs to gain $10!
However, if the MW is raised to $8/hour, then the firm has to pay the first 100 workers $8 anyway, and would gladly hire the 101st worker at $8/hour since this brings $10 of profit.
By modern standards, the CK analysis is perhaps not the most convincing.

The two states do not move exactly in parallel even before the policy change in April 1992. And we only have 2 states!
Staggered Timing

Next I'll show you some more modern evidence on the MW.

But first we need to discuss DiD when treatment timing is staggered (e.g., states pass minimum wages in different years).

Until about 5 years ago, people extended DiD to the staggered setting by running OLS regressions like
Yit = φi + λt + Dit β + eit
where Dit = 1 if unit i is treated in period t.

In the two-period model, this corresponds to the diff-in-diff in sample means between treatment and control.

Unfortunately, it turns out that this estimator is not a simple average of DiDs between treated and untreated units in the staggered case: it can effectively use already-treated units as controls and can even place negative weights on some treatment effects.
See Borusyak and Jaravel (2016), de Chaisemartin and D'Haultfoeuille (2020), Goodman-Bacon (2021).
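For reference, here is a sketch of how this two-way fixed effects regression is typically run in Stata (variable names hypothetical); the next slides explain why its single coefficient can be misleading under staggered timing.

* "static" two-way fixed effects DiD: d_it = 1 once unit i is treated
areg y i.year d_it, absorb(state) cluster(state)
* the user-written reghdfe package is a common alternative, e.g.:
* reghdfe y d_it, absorb(state year) vce(cluster state)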
Over the last few years, there has been a lot of research about "fixing" the issues with these regressions.

The solutions typically involve making "clean comparisons" by hand (a sketch of one such comparison follows below):
1. For units first treated in year g, compare the outcome change between g − 1 and g + k to that of units who weren't treated over that period
2. This is an estimate of the effect k years after treatment for cohort g
3. Do this for every g, and then aggregate them to get an average effect

There are many implementations of this and related approaches, including Callaway and Sant'Anna (2020), Sun and Abraham (2020), and Borusyak, Jaravel & Spiess (2021).
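To make the recipe concrete, here is a minimal Stata sketch of a single clean comparison. It assumes a state-by-year dataset with outcome y, calendar year, and gvar = the year a state first raised its MW (missing if it never did); the cohort g, the horizon k, and all names are hypothetical.

* one "clean" DiD: cohort g, k years after treatment, vs. not-yet-treated states
local g 2000
local k 2
summarize y if gvar == `g' & year == `g' + `k'
local t1 = r(mean)
summarize y if gvar == `g' & year == `g' - 1
local t0 = r(mean)
summarize y if (gvar > `g' + `k' | missing(gvar)) & year == `g' + `k'
local c1 = r(mean)
summarize y if (gvar > `g' + `k' | missing(gvar)) & year == `g' - 1
local c0 = r(mean)
display "Estimated effect for cohort `g', `k' years out: " (`t1' - `t0') - (`c1' - `c0')

Estimators in the papers above repeat this for every cohort and horizon and then aggregate the pieces (for example, weighting by cohort size).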
Cengiz et al. (2019) do a modern version of C&K using 138 MW changes between 1976 and 2016.

For each state that changes its MW, they take a "control group" of states that didn't change their MW in the 4 years before/after.

They compute a DiD between the treated state and the matched control states.

They then take a weighted average of these DiDs to get an overall average effect.
Important Considerations/Caveats

Historical changes in the MW have been fairly modest.
It is not clear that changes in the MW from $4.25 to $5.05 are informative about raises from $7.25 to $15!

Historical analyses of the MW are typically relatively short-run.
Over the long run, MW increases may induce shifts in technology that replace workers.

There is still some debate among economists over whether MWs reduce employment!
Other Panel Data Methods

We've focused on DiD, which is the most commonly used panel data method in applied microeconomics.

But there are many others:
Controls for lagged dependent variables
Synthetic control
Matrix completion

We won't have time to cover these, but if you're interested, I suggest taking more econometrics classes :)
