
Regression Discontinuity Design

Isac Olave
Paris-Dauphine | PSL

Advanced Econometrics

Master 2: Industries de Réseau et Économie Numérique (IREN)

August 23, 2023

Isac Olave RDD August 23, 2023 1 / 29


Overview

1 Introduction

2 Empirical design

3 Estimating an RDD model

4 Example



Introduction

Regression Discontinuity Design (RDD) is another quasi-experimental
model. What does that mean?
"... in a highly rule-based world, some rules are arbitrary and therefore
provide good experiments".
A very powerful design, but one that rests on restrictive assumptions.
A little bit of history
▶ The idea was first developed by Campbell in 1960, but it was only
resurrected in 1999.
▶ In 2010, Lee and Lemieux published their guidelines for RDD. About
1,500 new papers using RDD were published.
▶ In 2020, 5,600 papers were published using this method.
▶ Today it is quite hard to publish an RDD paper.



Introduction
As we said, RDD is based on arbitrary rules.
Can you think of a rule (policy) that could separate a population into two
groups?
▶ Age to enter school (6 years old).
▶ Underweight newborns are placed in incubators (2.5 kg).
▶ Income thresholds for the tax level.
▶ Who receives social aid? E.g., Conditional Cash Transfer programs.

All these examples have something in common: a cutoff k splits the
population into two groups.

$$
\text{Newborns} =
\begin{cases}
\text{Incubator} & \text{if } \text{weight}_i \le 2.5\,\text{kg} \\
\text{Bed} & \text{if } \text{weight}_i > 2.5\,\text{kg}
\end{cases}
$$


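In terms of a treatment indicator $D_i$:

$$
D_i =
\begin{cases}
1 & \text{if } \text{weight}_i \le 2.5\,\text{kg} \\
0 & \text{if } \text{weight}_i > 2.5\,\text{kg}
\end{cases}
$$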



Intuition

Suppose you are interested in measuring the effect of attending a
prestigious college on earnings.
Prestigious colleges admit better students, who would earn higher wages in
the labor market anyway (selection bias).
Hoekstra (2009) exploited an admission rule in the US, collecting
information on college enrollment.
One criterion in the application procedure is the SAT score:

$$
\text{Admitted}_i =
\begin{cases}
\text{Yes } (= 1) & \text{if } SAT_i \ge k \\
\text{No } (= 0) & \text{if } SAT_i < k
\end{cases}
$$

The SAT score is called the running variable and k is the cutoff point.



Visual inspection
Relationship between the SAT score and the enrollment rate at a prestigious
university found by Hoekstra (2009):



Visual inspection
But the author is interested in future earnings.
So he collected earnings from tax reports 10 years after enrollment.



Visual inspection

The author found that, exactly at the cutoff point, workers who had barely
been admitted to a prestigious university earned 10% more than those who
barely missed the cut.



Types of designs
There are two kinds of RDDs:
1 Sharp: the probability of treatment jumps from 0 to 1 at the cutoff.
2 Fuzzy: assignment is not deterministic; the probability of treatment
changes at the cutoff, but by less than 1.



Types of designs
What is the case in the following example?



The Sharp design
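Formally, in the sharp design with $X_i$ the running variable:

$$
D_i =
\begin{cases}
1 & \text{if } X_i \ge k \\
0 & \text{if } X_i < k
\end{cases}
$$

Note also that

$$
Y_i^0 = \alpha + \beta X_i, \qquad Y_i^1 = Y_i^0 + \delta
$$

The identification strategy is:

$$
Y_i = \alpha + \beta X_i + \delta D_i + \epsilon_i
$$

where $\delta$ is the causal effect of interest.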
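
As a minimal sketch of the regression above, δ can be estimated by ordinary least squares. Everything numeric here is an assumption made for illustration: the data are simulated, and the cutoff k = 50, the coefficients and the noise level are made-up values, not taken from any study.

```python
import numpy as np

def sharp_rdd_ols(x, y, k):
    """OLS of y on [1, x, D] with D = 1 if x >= k; returns (alpha, beta, delta)."""
    d = (x >= k).astype(float)
    design = np.column_stack([np.ones_like(x), x, d])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical simulated data: true alpha = 1, beta = 0.5, delta = 2.
rng = np.random.default_rng(0)
k = 50.0
x = rng.uniform(0.0, 100.0, size=5_000)
d = (x >= k).astype(float)
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0.0, 1.0, size=x.size)

alpha_hat, beta_hat, delta_hat = sharp_rdd_ols(x, y, k)
print(f"estimated delta: {delta_hat:.2f}")
```

With this simulated sample the estimate should land close to the true jump of 2, because the linear specification is correct by construction.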



The validity of the empirical design

The continuity assumption is the key identifying assumption.
In the absence of treatment, $Y_i$ is a continuous (smooth) function of
the running variable.
I.e., the expected potential outcomes $E[Y_i^1 \mid X_i]$ and $E[Y_i^0 \mid X_i]$
are continuous at the cutoff: in the absence of treatment, the outcome would
not have jumped.

Intuition when the assumption holds:
▶ Endogeneity is ruled out at the cutoff. If $Y_i$ would not jump, why would
other variables jump? (no omitted variables)

Is it possible to test this assumption using the data?
Not really! $Y_i^0$ is never observed above the cutoff, and $Y_i^1$ is never
observed below it.
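
What the continuity assumption buys can be written out in a few lines. This is a sketch for the sharp design with a constant effect δ, not a formal proof. The first equality uses the fact that only $Y_i^1$ is observed above the cutoff and only $Y_i^0$ below it; the second uses continuity of both conditional expectations at k.

```latex
\begin{align*}
\lim_{x \downarrow k} E[Y_i \mid X_i = x] - \lim_{x \uparrow k} E[Y_i \mid X_i = x]
  &= \lim_{x \downarrow k} E[Y_i^1 \mid X_i = x] - \lim_{x \uparrow k} E[Y_i^0 \mid X_i = x] \\
  &= E[Y_i^1 \mid X_i = k] - E[Y_i^0 \mid X_i = k] \\
  &= E[Y_i^1 - Y_i^0 \mid X_i = k] = \delta
\end{align*}
```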



Challenges to identification

The continuity assumption can be violated in the following cases:
1 The assignment rule is known in advance.
2 Agents have an interest in adjusting.
3 Agents have time to adjust.
4 The cutoff is endogenous to factors that independently cause potential
outcomes to shift.
5 There is nonrandom heaping along the running variable.



The McCrary density test
Even if the continuity assumption is not directly testable, some tests
help to probe the validity of the design.
The McCrary density test is used to check whether units are sorting at
the cutoff.
If the design is valid, the density of the running variable is smooth at
the cutoff.
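
The idea behind a density check can be sketched with a much cruder tool than McCrary's actual test, which relies on a smoothed estimate of the density on each side of the cutoff. The stand-in below only compares the number of observations in a small window just below and just above k, using a normal approximation to the binomial. The function name `density_jump_z`, the window width h and the simulated manipulation are all assumptions made for illustration.

```python
import math
import numpy as np

def density_jump_z(x, k, h):
    """Compare counts in [k - h, k) and [k, k + h).

    Without sorting, an observation in the window is roughly equally likely
    to fall on either side, so (above - n/2) / sqrt(n/4) is approximately N(0, 1).
    """
    below = int(np.sum((x >= k - h) & (x < k)))
    above = int(np.sum((x >= k) & (x < k + h)))
    n = below + above
    if n == 0:
        raise ValueError("no observations inside the window")
    z = (above - n / 2) / math.sqrt(n / 4)
    return below, above, z

rng = np.random.default_rng(1)
k, h = 50.0, 2.0

# Hypothetical running variable without manipulation.
x_clean = rng.uniform(0.0, 100.0, size=10_000)

# Hypothetical manipulation: units scoring just below k move to just above it.
x_sorted = x_clean.copy()
movers = (x_sorted >= k - 1.0) & (x_sorted < k)
x_sorted[movers] = k + rng.uniform(0.0, 1.0, size=int(movers.sum()))

for label, x in [("clean", x_clean), ("sorted", x_sorted)]:
    below, above, z = density_jump_z(x, k, h)
    print(f"{label}: below={below}, above={above}, z={z:.2f}")
```

A large |z| in the manipulated sample signals bunching on one side of the cutoff, which is the pattern a density test is designed to detect.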



Covariate balance
Under the continuity assumption, there should be no discontinuous change in
the average values of observable covariates around the cutoff.
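
One common way to check this, sketched here under assumptions, is to run the same RDD regression with a predetermined covariate in place of the outcome: the estimated jump should be close to zero. The covariate `age`, the cutoff and all coefficients are hypothetical simulated values.

```python
import numpy as np

def jump_at_cutoff(x, v, k):
    """Coefficient on D = 1[x >= k] in an OLS of v on [1, x - k, D]."""
    d = (x >= k).astype(float)
    design = np.column_stack([np.ones_like(x), x - k, d])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return coef[2]

rng = np.random.default_rng(2)
k = 50.0
x = rng.uniform(0.0, 100.0, size=5_000)
d = (x >= k).astype(float)

# Hypothetical predetermined covariate: related to x, but unaffected by treatment.
age = 30.0 + 0.1 * x + rng.normal(0.0, 1.0, size=x.size)
# Hypothetical outcome with a true jump of 2 at the cutoff.
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0.0, 1.0, size=x.size)

covariate_jump = jump_at_cutoff(x, age, k)
outcome_jump = jump_at_cutoff(x, y, k)
print(f"jump in covariate: {covariate_jump:.2f}")
print(f"jump in outcome:   {outcome_jump:.2f}")
```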



Other important concerns I
Another relevant concern is a spurious effect arising from a misspecified model.
The sharp design specified above is

$$
Y_i = \alpha + \beta X_i + \delta D_i + \epsilon_i
$$

However, assuming linearity in the running variable is not always correct.
In fact, the model is flexible enough to accommodate any specification:

$$
Y_i = \alpha + \beta f(X_i) + \delta D_i + \epsilon_i
$$

For instance, a quadratic:

$$
Y_i = \alpha + \beta_1 X_i + \beta_2 X_i^2 + \delta D_i + \epsilon_i
$$



Other important concerns I
Misspecification can lead to a spurious effect:

Rule of thumb: always show different specifications. The estimated effect is
expected to be stable across them.
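
A small simulation can illustrate how a spurious effect arises. In the sketch below the true jump is zero and the outcome is a smooth cubic function of the running variable, so any sizeable estimate of δ is an artifact of the functional form. The data-generating process, the cutoff and the helper `rdd_poly` are assumptions made for illustration.

```python
import numpy as np

def rdd_poly(x, y, k, order):
    """Estimate delta in y = alpha + sum_p beta_p (x - k)^p + delta D + eps by OLS.

    Centering x at k leaves the coefficient on D unchanged, because the
    polynomial terms span the same space as the uncentered ones.
    """
    xc = x - k
    d = (x >= k).astype(float)
    columns = [np.ones_like(x)] + [xc ** p for p in range(1, order + 1)] + [d]
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return coef[-1]

rng = np.random.default_rng(3)
k = 50.0
x = rng.uniform(0.0, 100.0, size=5_000)
t = (x - k) / 50.0
# Hypothetical outcome: smooth cubic in x, no jump at the cutoff (true delta = 0).
y = 4.0 * t ** 3 + rng.normal(0.0, 0.5, size=x.size)

estimates = {order: rdd_poly(x, y, k, order) for order in (1, 2, 3)}
for order, delta_hat in estimates.items():
    print(f"polynomial order {order}: estimated delta = {delta_hat:.2f}")
```

The linear specification should report a clearly negative "effect" even though none exists, while the cubic specification, which matches the simulated curve, should return an estimate near zero. Estimates that move this much across specifications are the warning sign the rule of thumb refers to.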


Other important concerns II

RDD identifies a local effect, i.e., one that is only valid at the cutoff.
Why, then, do we use observations away from the cutoff?
Trade-off:
More observations (a wider window around the cutoff) give more statistical
power, but may bias the estimate.
Fewer observations (a narrower window) reduce the bias, but give a noisier
and less powerful estimate.

This problem is known as bandwidth selection.
Rule of thumb: show different bandwidths. The estimated effect is expected to
be stable across them.
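
The trade-off can be sketched by re-estimating the jump on windows of different width around the cutoff. The example assumes simulated data with a curved relationship and a true jump of 2. It also allows a separate slope on each side of the cutoff, which is a modelling choice that goes beyond the common-slope equation shown earlier. The bandwidths and the helper `local_linear_rdd` are illustrative assumptions.

```python
import numpy as np

def local_linear_rdd(x, y, k, h):
    """Sharp RDD estimate using only |x - k| <= h, with a separate slope per side.

    Returns (estimated jump at the cutoff, number of observations used).
    """
    keep = np.abs(x - k) <= h
    xc = x[keep] - k
    d = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), xc, d, d * xc])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return coef[2], int(keep.sum())

rng = np.random.default_rng(4)
k = 50.0
x = rng.uniform(0.0, 100.0, size=5_000)
t = (x - k) / 50.0
d = (x >= k).astype(float)
# Hypothetical outcome: cubic trend plus a true jump of 2 at the cutoff.
y = 4.0 * t ** 3 + 2.0 * d + rng.normal(0.0, 0.5, size=x.size)

results = {h: local_linear_rdd(x, y, k, h) for h in (5.0, 10.0, 25.0, 50.0)}
for h, (delta_hat, n_used) in results.items():
    print(f"bandwidth {h:>4}: n = {n_used:>4}, estimated delta = {delta_hat:.2f}")
```

In this setup the widest window uses every observation but is biased by the curvature, while the narrow windows stay close to the true jump at the cost of using far fewer observations.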



The Fuzzy design
In the fuzzy design, assignment to treatment is not deterministic: crossing
the cutoff changes the probability of treatment, but not from 0 to 1.



The Fuzzy design

We could follow an IV approach as follows:
Let's define $Z_i = 1$ if $X_i \ge k$ and 0 otherwise.
Let's define $D_i = 1$ if unit $i$ received treatment and 0 otherwise.
Then, the first stage is:

$$
D_i = \alpha + \beta f(X_i) + \pi Z_i + \psi_i
$$

And the second stage is:

$$
Y_i = \gamma + \delta f(X_i) + \kappa \hat{D}_i + \xi_i
$$

The effect is then $\hat{\kappa}$.
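
The two stages can be sketched by hand with two OLS regressions. The example assumes f(X) = X - k (linear), simulated data with an unobserved confounder u that drives both treatment take-up and the outcome, and a true effect κ = 2. All names and numbers are hypothetical. Running the stages manually gives the 2SLS point estimate, but the standard errors of the second OLS would not be valid, so the sketch reports none.

```python
import numpy as np

def ols(design, target):
    """Least-squares coefficients of target on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def fuzzy_rdd_2sls(x, d, y, k):
    """Manual 2SLS with f(X) = X - k and Z = 1[X >= k] as the instrument.

    Returns (first-stage jump pi_hat, second-stage effect kappa_hat).
    """
    xc = x - k
    z = (x >= k).astype(float)
    ones = np.ones_like(xc)
    first = np.column_stack([ones, xc, z])
    first_coef = ols(first, d)
    d_hat = first @ first_coef                      # fitted treatment
    second = np.column_stack([ones, xc, d_hat])
    return first_coef[2], ols(second, y)[2]

rng = np.random.default_rng(5)
n, k = 20_000, 50.0
x = rng.uniform(0.0, 100.0, size=n)
z = (x >= k).astype(float)
u = rng.normal(0.0, 1.0, size=n)                    # unobserved confounder
d = (-0.8 + 1.6 * z + u > 0).astype(float)          # take-up jumps at k, but not from 0 to 1
y = 1.0 + 0.05 * (x - k) + 2.0 * d + u + rng.normal(0.0, 0.5, size=n)

pi_hat, kappa_hat = fuzzy_rdd_2sls(x, d, y, k)
naive = ols(np.column_stack([np.ones(n), x - k, d]), y)[2]
print(f"first-stage jump: {pi_hat:.2f}")
print(f"2SLS effect:      {kappa_hat:.2f}")
print(f"naive OLS on D:   {naive:.2f}")
```

In this simulation the naive regression on actual treatment is pushed upward by the confounder, while the instrumented estimate should stay close to the true effect of 2.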



Anderson and Magruder (2012) Learning from the crowd

Consumers are uncertain about the quality of a product before purchasing it.
▶ Expert opinions, guarantees, etc.
▶ Social learning (peers and family), word of mouth.
▶ Large-scale online consumer reviews (digital word of mouth).

Identification challenge: the causal impact of online reviews on sales.
Endogeneity: products that receive higher ratings are of better quality.
RD design: similar products, different ratings => sales


Anderson and Magruder (2012) Learning from the crowd

The authors use data from Yelp.com on restaurant ratings and availability.
Yelp takes the average of all ratings received by the business and rounds it
to the nearest half-star.
When would the RD design be invalid?
▶ If restaurants can manipulate their average rating to choose which side of
the cutoff they fall on.
▶ If ratings cause other covariates to jump, e.g., appearances in Google.
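
The rounding rule is what creates the discontinuity: two restaurants with almost identical average ratings can display different numbers of stars. A minimal sketch follows, assuming that exact ties round up (the slides do not say how ties are handled) and using made-up average ratings.

```python
import math

def displayed_rating(avg):
    """Round an average rating to the nearest half-star (ties round up: an assumption)."""
    return math.floor(2.0 * avg + 0.5) / 2.0

# Hypothetical averages on either side of the 3.25 and 3.75 rounding thresholds.
for avg in (3.24, 3.26, 3.74, 3.76):
    print(f"average {avg:.2f} -> displayed {displayed_rating(avg):.1f} stars")
```

Here the average rating plays the role of the running variable, and each rounding threshold (x.25 and x.75) acts as a cutoff.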



Anderson and Magruder (2012) Learning from the crowd

Visual evidence:



Anderson and Magruder (2012) Learning from the crowd

Empirical strategy

$$
y_{it} = \alpha + \beta\, DR_{it} + \gamma f(R_{it}) + \epsilon_{it}
$$

Where:
$y_{it}$ = booking indicator in restaurant $i$ on date $t$.
$DR_{it}$ = rating displayed on Yelp.
$R_{it}$ = actual average rating.



Anderson and Magruder (2012) Learning from the crowd

Main results:



Other readings

Angrist and Lavy (1999)


Jacob et al. (2012)
Asadullah (2005)
Lee et al. (2004)
Lee and Lemieux (2010)



References

Anderson, M. and Magruder, J. (2012). Learning from the crowd: Regression discontinuity estimates of the effects of an online review database. The Economic Journal, 122(563):957–989.

Angrist, J. D. and Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. The Quarterly Journal of Economics, 114(2):533–575.

Asadullah, M. N. (2005). The effect of class size on student achievement: Evidence from Bangladesh. Applied Economics Letters, 12(4):217–221.

Hoekstra, M. (2009). The effect of attending the flagship state university on earnings: A discontinuity-based approach. The Review of Economics and Statistics, 91(4):717–724.

Jacob, R., Zhu, P., Somers, M.-A., and Bloom, H. (2012). A practical guide to regression discontinuity. MDRC.

Lee, D. S. and Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48(2):281–355.

Lee, D. S., Moretti, E., and Butler, M. J. (2004). Do voters affect or elect policies? Evidence from the U.S. House. The Quarterly Journal of Economics, 119(3):807–859.
