Regression Discontinuity Design
Isac Olave
Paris-Dauphine | PSL
[email protected]
Advanced Econometrics
Master 2: Industries de Réseau et Économie Numérique (IREN)
August 23, 2023
Isac Olave RDD August 23, 2023 1 / 29
Overview
1 Introduction
2 Empirical design
3 Estimating an RDD model
4 Example
Introduction
Regression Discontinuity Design (RDD) is another quasi-experimental
model. What does it mean?
"... in a highly rule-based world, some rules are arbitrary and therefore
provide good experiments".
A very powerful design, but one that rests on restrictive assumptions.
A little bit of history
▶ The idea was first developed by Thistlethwaite and Campbell in 1960, but it was only revived in 1999.
▶ In 2010, Lee and Lemieux published the guidelines for RDD. About
1,500 new papers using RDD were published.
▶ In 2020, 5,600 papers were published using this method.
▶ Today it is quite hard to publish an RDD paper.
As we said, RDD is based on arbitrary rules.
Can you think of a rule (policy) that separates a population into two
groups?
▶ Age to enter school (6 years old).
▶ Underweight newborns use incubators (2.5kg).
▶ Income thresholds for the tax level.
▶ Who receives social aid? e.g., Conditional Cash Transfer programs.
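All these examples have something in common: a cutoff k splits the
population into two groups.
\[
\text{Newborn}_i =
\begin{cases}
\text{Incubator} & \text{if } \text{weight}_i \le 2.5\,\text{kg} \\
\text{Bed} & \text{if } \text{weight}_i > 2.5\,\text{kg}
\end{cases}
\]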
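Written as a treatment indicator:
\[
D_i =
\begin{cases}
1 & \text{if } \text{weight}_i \le 2.5\,\text{kg} \\
0 & \text{if } \text{weight}_i > 2.5\,\text{kg}
\end{cases}
\]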
Intuition
Suppose you are interested in measuring the effect of attending a
prestigious college on earnings.
Prestigious colleges receive better students who get better wages in
the labor market (selectivity bias).
Hoekstra (2009) studied this question in the US using data on college
enrollment.
One criterion in the application procedure is the SAT score:
\[
\text{Admitted}_i =
\begin{cases}
\text{Yes } (= 1) & \text{if } \text{SAT}_i \ge k \\
\text{No } (= 0) & \text{if } \text{SAT}_i < k
\end{cases}
\]
The SAT score is called the running variable and k is the cutoff point.
Visual inspection
Hoekstra (2009) found the following relationship between the SAT score and
the enrollment rate at a prestigious university:
[Figure: enrollment rate plotted against the SAT score.]
But the author is interested in future earnings.
So he collected earnings from tax reports 10 years after enrollment.
The author found that, exactly at the cutoff point, workers who barely
entered a prestigious university earned 10% more than those who barely
missed the cut.
Types of designs
There are two kinds of RDDs:
1 Sharp: The probability of treatment jumps from 0 to 1 at the cutoff.
2 Fuzzy: The assignment is not deterministic; the probability of treatment jumps at the cutoff, but by less than 1.
Which of the two designs applies in the following example?
The Sharp design
Formally, in the sharp design with Xi the running variable:
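\[
D_i =
\begin{cases}
1 & \text{if } X_i \ge k \\
0 & \text{if } X_i < k
\end{cases}
\]
Note also that the potential outcomes are
\[
Y_i^0 = \alpha + \beta X_i, \qquad Y_i^1 = Y_i^0 + \delta
\]
The identification strategy rests on the regression:
\[
Y_i = \alpha + \beta X_i + \delta D_i + \epsilon_i
\]
where $\delta$ is the causal effect of interest.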
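A minimal simulation sketch of the sharp design, assuming the linear model above holds exactly. The parameter values, the sample size and the helper `ols` are hypothetical choices made for illustration, not part of the original slides; only the Python standard library is used.

```python
import random

def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan solve."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(0)
cutoff = 50.0                               # assumed cutoff k
alpha, beta, delta = 1.0, 0.05, 2.0         # assumed true parameters

x = [random.uniform(0, 100) for _ in range(2000)]        # running variable X_i
d = [1.0 if xi >= cutoff else 0.0 for xi in x]           # sharp rule: D_i = 1 if X_i >= k
y = [alpha + beta * xi + delta * di + random.gauss(0, 1) for xi, di in zip(x, d)]

# Regress Y_i on a constant, X_i and D_i; the coefficient on D_i estimates delta.
alpha_hat, beta_hat, delta_hat = ols([[1.0, xi, di] for xi, di in zip(x, d)], y)
print(f"estimated jump at the cutoff: {delta_hat:.2f}")
```

With these assumed values, `delta_hat` should land close to the true jump of 2, up to sampling noise.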
The validity of the empirical design
The continuity assumption is the key identifying assumption.
In the absence of treatment, the expected outcome $E[Y_i \mid X_i]$ is a
continuous (smooth) function of the running variable.
I.e., the expected potential outcomes $E[Y_i^1 \mid X_i]$ and
$E[Y_i^0 \mid X_i]$ would not jump at the cutoff on their own.
Intuition when the assumption holds:
▶ Endogeneity is ruled out at the cutoff. If $Y_i$ would not jump on its own, why
would other determinants of $Y_i$ jump? (no omitted variables)
Is it possible to test this assumption using the data?
Not really!
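A short worked derivation may help to see why. It uses the sharp-design notation ($D_i = 1$ if $X_i \ge k$, constant effect $\delta$); the one-sided limits are notation added here for illustration. The last equality in each of the first two lines is the continuity assumption. In the second line it equates a limit that the data can estimate with the counterfactual mean $E[Y_i^0 \mid X_i = k]$, which is never observed because every unit at the cutoff is treated.

```latex
\begin{aligned}
\lim_{x \downarrow k} E[Y_i \mid X_i = x]
  &= \lim_{x \downarrow k} E[Y_i^1 \mid X_i = x] = E[Y_i^1 \mid X_i = k] \\
\lim_{x \uparrow k} E[Y_i \mid X_i = x]
  &= \lim_{x \uparrow k} E[Y_i^0 \mid X_i = x] = E[Y_i^0 \mid X_i = k] \\
\lim_{x \downarrow k} E[Y_i \mid X_i = x] - \lim_{x \uparrow k} E[Y_i \mid X_i = x]
  &= E[Y_i^1 - Y_i^0 \mid X_i = k] = \delta
\end{aligned}
```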
Challenges to identification
The continuity assumption may be violated in the following cases:
1 The assignment rule is known in advance.
2 Agents are interested in adjusting.
3 Agents have time to adjust.
4 The cutoff is endogenous to factors that independently cause potential
outcomes to shift.
5 There is nonrandom heaping along the running variable.
The McCrary density test
Even if the continuity assumption is not directly testable, some tests
help to probe the validity of the design.
The McCrary density test is used to check whether units are sorting at
the cutoff.
If the design is valid, then the density of the running variable is smooth at the cutoff.
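The idea behind a density check can be sketched with a much cruder device than McCrary's actual estimator, which smooths a histogram with local linear regressions on each side. The sketch below only compares raw counts in two equal-width windows around the cutoff, assuming the density is roughly flat nearby. The function name, the data and the manipulation pattern are all hypothetical.

```python
import math
import random

def density_jump_z(x, cutoff, h):
    """Counts in [cutoff-h, cutoff) and [cutoff, cutoff+h), plus a z-statistic."""
    below = sum(1 for xi in x if cutoff - h <= xi < cutoff)
    above = sum(1 for xi in x if cutoff <= xi < cutoff + h)
    n = below + above
    # With no sorting and a locally flat density, above ~ Binomial(n, 1/2).
    z = (above - n / 2) / math.sqrt(n / 4)
    return below, above, z

random.seed(1)
smooth = [random.uniform(0, 100) for _ in range(5000)]
# Hypothetical manipulation: units just below 50 push themselves just above it.
manipulated = [xi + 2 if 48 <= xi < 50 else xi for xi in smooth]

below_ok, above_ok, z_ok = density_jump_z(smooth, 50, 2)
below_bad, above_bad, z_bad = density_jump_z(manipulated, 50, 2)
print("no sorting:  ", below_ok, above_ok, round(z_ok, 2))
print("with sorting:", below_bad, above_bad, round(z_bad, 2))
```

A large z-statistic signals bunching on one side of the cutoff, which is the pattern the density test is designed to detect.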
Covariate balance
Under the continuity assumption, there should be no discontinuous change in the
average values of observable (predetermined) covariates around the cutoff.
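A minimal sketch of such a balance check, assuming a simple comparison of covariate means in a narrow window on each side of the cutoff. In practice one would typically rerun the RDD regression with the covariate as the outcome. The covariate, the window width and the function name are hypothetical.

```python
import random
import statistics

def balance_gap(x, w, cutoff, h):
    """Mean of covariate w just above minus just below the cutoff, with a t-ratio."""
    lo = [wi for xi, wi in zip(x, w) if cutoff - h <= xi < cutoff]
    hi = [wi for xi, wi in zip(x, w) if cutoff <= xi < cutoff + h]
    gap = statistics.mean(hi) - statistics.mean(lo)
    se = (statistics.variance(hi) / len(hi) + statistics.variance(lo) / len(lo)) ** 0.5
    return gap, gap / se

random.seed(2)
x = [random.uniform(0, 100) for _ in range(4000)]      # running variable
balanced = [random.gauss(30, 5) for _ in x]            # covariate unrelated to the cutoff
# Hypothetical failure: the covariate itself jumps by 3 units at the cutoff.
unbalanced = [wi + (3 if xi >= 50 else 0) for xi, wi in zip(x, balanced)]

gap_ok, t_ok = balance_gap(x, balanced, 50, 5)
gap_bad, t_bad = balance_gap(x, unbalanced, 50, 5)
print(f"balanced covariate:   gap={gap_ok:.2f}, t={t_ok:.2f}")
print(f"unbalanced covariate: gap={gap_bad:.2f}, t={t_bad:.2f}")
```

A covariate that jumps at the cutoff, as in the second case, suggests that something other than the treatment also changes there.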
Other important concerns I
Another relevant concern is a spurious effect arising from a misspecified model.
The sharp design specified above is
\[
Y_i = \alpha + \beta X_i + \delta D_i + \epsilon_i
\]
However, assuming linearity in the running variable is not always
correct.
In fact, the model can accommodate any functional form $f$:
\[
Y_i = \alpha + \beta f(X_i) + \delta D_i + \epsilon_i
\]
For instance, a quadratic:
\[
Y_i = \alpha + \beta_1 X_i + \beta_2 X_i^2 + \delta D_i + \epsilon_i
\]
Misspecification can lead to a spurious effect:
Rule of thumb: Always show different specifications. The estimated effect
should be stable across them.
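A small illustration of how a spurious effect can arise, under assumed data: the outcome follows a smooth quadratic curve with no jump at all, yet a linear specification reports a discontinuity. The grid, the cutoff at 0.7 and the helper `ols` are hypothetical choices for this sketch.

```python
def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan solve."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Noise-free data on a grid over [0, 1]; the cutoff sits at x = 0.7.
x = [i / 1000 for i in range(1001)]
d = [1.0 if i >= 700 else 0.0 for i in range(1001)]
y = [xi ** 2 for xi in x]                 # true model: quadratic, NO treatment effect

# Linear specification: Y = a + b*X + delta*D
delta_linear = ols([[1.0, xi, di] for xi, di in zip(x, d)], y)[2]
# Quadratic specification: Y = a + b1*X + b2*X^2 + delta*D
delta_quadratic = ols([[1.0, xi, xi ** 2, di] for xi, di in zip(x, d)], y)[3]

print(f"linear spec:    delta = {delta_linear:.3f}")
print(f"quadratic spec: delta = {delta_quadratic:.3f}")
```

The linear fit absorbs the curvature into the coefficient on $D_i$, while the quadratic fit, which matches the assumed truth, returns a jump of essentially zero.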
Other important concerns II
RDD identifies a local effect, i.e., one that is valid only at the cutoff.
Why, then, do we use observations away from the cutoff?
Trade-off:
A wider window uses more observations and gives more statistical power, but it
may bias the estimate.
A narrower window reduces the bias, but leaves fewer observations and less power.
This problem is known as bandwidth selection.
Rule of thumb: show different bandwidths. The estimated effect should
be stable across them.
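The trade-off can be sketched on simulated data. The example assumes a true jump of 2 and a cubic trend in the running variable, and it fits a separate straight line on each side of the cutoff within each bandwidth, which is slightly more flexible than the pooled regression used earlier. All names and parameter values are hypothetical.

```python
import random

def fit_at(points, x0):
    """Least-squares line through (x, y) points, evaluated at x0."""
    n = len(points)
    mx = sum(px for px, _ in points) / n
    my = sum(py for _, py in points) / n
    slope = (sum((px - mx) * (py - my) for px, py in points)
             / sum((px - mx) ** 2 for px, _ in points))
    return my + slope * (x0 - mx)

def rd_estimate(x, y, cutoff, h):
    """Jump at the cutoff from separate linear fits within bandwidth h."""
    left = [(xi, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    return fit_at(right, cutoff) - fit_at(left, cutoff), len(left) + len(right)

random.seed(3)
cutoff, delta = 50.0, 2.0                  # assumed cutoff and true effect
x = [random.uniform(0, 100) for _ in range(4000)]
# Assumed truth: cubic trend around the cutoff plus a jump of delta.
y = [2e-5 * (xi - cutoff) ** 3 + delta * (xi >= cutoff) + random.gauss(0, 1)
     for xi in x]

results = {h: rd_estimate(x, y, cutoff, h) for h in (5, 10, 25, 50)}
for h, (estimate, n_obs) in results.items():
    print(f"h={h:>2}  n={n_obs:>4}  estimate={estimate:.2f}")
```

Under these assumptions the widest bandwidth uses every observation but pulls the estimate well below 2, while the narrow bandwidths stay near 2 with far fewer observations.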
The Fuzzy design
In the fuzzy design, units are assigned to treatment in a non-deterministic
way: the probability of treatment jumps at the cutoff, but not from 0 to 1.
We could follow an IV approach as follows:
Let's define $Z_i = 1$ if $X_i \ge k$ and 0 otherwise.
Let's define $D_i = 1$ if unit $i$ received treatment and 0 otherwise.
Then, the first stage is:
\[
D_i = \alpha + \beta f(X_i) + \pi Z_i + \psi_i
\]
And the second stage is:
\[
Y_i = \gamma + \delta f(X_i) + \kappa \hat{D}_i + \xi_i
\]
The estimated effect is then $\hat{\kappa}$.
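The two stages can be sketched on simulated data, assuming $f(X_i) = X_i$, a true effect $\kappa = 2$, and an unobserved confounder that drives both take-up and the outcome. The data-generating process and the helper `ols` are hypothetical. Running the second stage by hand gives the 2SLS point estimate but not valid standard errors.

```python
import random

def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan solve."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(4)
n, cutoff, kappa = 10000, 50.0, 2.0                  # assumed design and true effect
x = [random.uniform(0, 100) for _ in range(n)]       # running variable X_i
u = [random.gauss(0, 1) for _ in range(n)]           # unobserved confounder
z = [1.0 if xi >= cutoff else 0.0 for xi in x]       # instrument Z_i = 1 if X_i >= k
# Take-up is more likely above the cutoff but also depends on the confounder.
d = [1.0 if 1.5 * zi - 0.75 + ui > 0 else 0.0 for zi, ui in zip(z, u)]
y = [1.0 + 0.02 * xi + kappa * di + ui + random.gauss(0, 0.5)
     for xi, di, ui in zip(x, d, u)]

# First stage: D_i on a constant, X_i and Z_i.
a_hat, b_hat, pi_hat = ols([[1.0, xi, zi] for xi, zi in zip(x, z)], d)
d_hat = [a_hat + b_hat * xi + pi_hat * zi for xi, zi in zip(x, z)]

# Second stage: Y_i on a constant, X_i and the fitted D_i.
kappa_hat = ols([[1.0, xi, dh] for xi, dh in zip(x, d_hat)], y)[2]
# Naive OLS on actual treatment, for comparison (biased by the confounder).
kappa_naive = ols([[1.0, xi, di] for xi, di in zip(x, d)], y)[2]

print(f"first-stage jump: {pi_hat:.2f}")
print(f"2SLS estimate:    {kappa_hat:.2f}")
print(f"naive OLS:        {kappa_naive:.2f}")
```

In this setup the naive regression overstates the effect because treated units have higher values of the confounder, while the two-stage estimate should be close to 2.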
Anderson and Magruder (2012) Learning from the crowd
Consumers are uncertain about the quality of a product before purchasing. Sources of information:
▶ Expert opinions, guarantees, etc.
▶ Social learning (peers and family), word of mouth.
▶ Online large-scale consumer evaluations (digital word of mouth).
Identification challenge: the causal impact of online reviews on sales.
Endogeneity: products that receive higher ratings are of better quality.
RD design: similar products, different displayed ratings => effect on sales.
The authors use data on restaurant ratings from Yelp.com and on reservation
availability.
Yelp takes the average of all ratings received by a business and rounds it
to the nearest half-star.
When would the RD design be invalid?
▶ If restaurants can manipulate their average rating to choose which
side of a rounding threshold they fall on.
▶ If displayed ratings cause other covariates to jump, e.g., appearances on Google.
Visual evidence:
Empirical strategy
\[
y_{it} = \alpha + \beta \, \mathit{DR}_{it} + \gamma f(R_{it}) + \epsilon_{it}
\]
where:
$y_{it}$ = booking indicator for restaurant $i$ on date $t$.
$\mathit{DR}_{it}$ = rating displayed on Yelp.
$R_{it}$ = actual average rating.
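The link between the actual and the displayed rating can be sketched as a rounding rule. The sketch assumes that ties round up, so the displayed rating jumps at averages ending in .25 and .75; the function names are hypothetical and the tie-breaking rule is an assumption, not something stated in the slides.

```python
import math

def displayed_rating(avg):
    """Round an average rating to the nearest half-star (ties assumed to round up)."""
    return math.floor(avg * 2 + 0.5) / 2

def nearest_threshold(avg):
    """Closest rounding threshold (x.25 or x.75) to a given average rating."""
    return (math.floor(avg * 2) + 0.5) / 2

# Two restaurants with nearly identical average ratings R_it ...
for avg in (3.24, 3.26):
    print(avg, "->", displayed_rating(avg), "| threshold:", nearest_threshold(avg))
```

Restaurants with almost the same actual average end up with displayed ratings half a star apart, which is the discontinuity that the design exploits.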
Main results:
Other readings
Angrist and Lavy (1999)
Jacob et al. (2012)
Asadullah (2005)
Lee et al. (2004)
Lee and Lemieux (2010)
References
Anderson, M. and Magruder, J. (2012). Learning from the crowd: Regression discontinuity estimates of the effects of an online review database. The Economic Journal, 122(563):957–989.
Angrist, J. D. and Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. The Quarterly Journal of Economics, 114(2):533–575.
Asadullah, M. N. (2005). The effect of class size on student achievement: Evidence from Bangladesh. Applied Economics Letters, 12(4):217–221.
Hoekstra, M. (2009). The effect of attending the flagship state university on earnings: A discontinuity-based approach. The Review of Economics and Statistics, 91(4):717–724.
Jacob, R., Zhu, P., Somers, M.-A., and Bloom, H. (2012). A practical guide to regression discontinuity. MDRC.
Lee, D. S. and Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48(2):281–355.
Lee, D. S., Moretti, E., and Butler, M. J. (2004). Do voters affect or elect policies? Evidence from the US House. The Quarterly Journal of Economics, 119(3):807–859.