2025 Capacity Building Program for Kurdish Experts of Iraq

Lecture: Regression Discontinuity

Trinh Pham
KDI School of Public Policy and Management
September 2025
Roadmap
Introductions and expectations

Review of correlation vs. causality

Introduction to regression discontinuity design (RD)

Regression discontinuity estimation and assumptions

Case study 1: BRIGHT School construction program in Burkina Faso

Case study 2: Saving lives with indexed disaster funds in Mexico

Case study 3: National borders and nature conservation in Brazil

Brainstorming: Research idea using RD in the Kurdish context

Hands-on practice: Replicating Kazianga et al. (2013)

Self-Study with replication package.

2 / 101
About me
Name: Trinh Pham

Education: Ph.D. Applied Economics (Cornell University, USA)

Position: Tenure-track Assistant Professor of Economics, KDI School

Research interests: development, labor, and the environment

What about you?

3 / 101
Quick review: What do we mean by a causal effect?

4 / 101
Correlation vs. Causality
In the summer, we see increases in both ice cream sales and sunburn rates.

Does it mean ice cream sales cause sunburn? Why or why not?

The underlying omitted factor is the hot weather, leading to increased ice cream consumption and more people going to the
beach, thus increasing the risk of sunburn.

Our challenge is to find methodologies that could help us establish true causality.
5 / 101
Let us consider another example.

Suppose we implement a micro-credit program providing loans to entrepreneurs.

We want to know if the program has improved business outcomes or not.

6 / 101
Working example: loan and business profit
The following figure shows how the outcome evolves over time, before the program starts

7 / 101
Working example: loan and business profit
The following figure shows how the outcome evolves over time, after the program starts

Based on this figure, does the program have: positive impact? negative impact? zero impact?

8 / 101
Working example: loan and business profit
The following figure shows how the outcome evolves over time, after the program starts

We do not have enough information to make any statement about the program impact!

9 / 101
What would have happened had the program not started?

10 / 101
Working example: loan and business profit
What would have happened had the program not started? Would it be like this?

11 / 101
Working example: loan and business profit
What would have happened had the program not started? Would it be like this? or this?

12 / 101
Working example: loan and business profit
Impact = Factual − Counterfactual

the outcome some time after the program has started (factual)

the outcome at the same time point had the program not started (counterfactual)

13 / 101
Counterfactual
Getting the factual is easy.

However, the counterfactual cannot be observed.

We address this challenge by constructing the counterfactual.

Typically by selecting a group of individuals who did not participate in the program.

This group is usually referred to as the control group or comparison group.

How to select/construct the comparison group is the key decision in impact evaluation design.

14 / 101
Throughout this program, you have encountered different econometric approaches to measure causal effects.

Randomized Controlled Trials (RCT): gold standard, but not always feasible or ethical.

The idea is to construct statistically similar treatment and control groups ex ante.

Difference-in-Differences: comparing the change in before vs. after outcomes for treated units with the change in before vs.
after outcomes for non-treated units.

Key identification assumption: the two groups would have followed parallel trends in the absence of treatment.

Instrumental Variable (IV): used to separate causal effects from non-causal correlations.

When participation is endogenous, instrument it with an exogenous factor that influences program take-up but is otherwise uncorrelated with the outcome.

This lecture focuses on another quasi-experimental method: Regression Discontinuity.

15 / 101
Regression Discontinuity (RD): Introduction

16 / 101
Regression discontinuity (RD)
Regression discontinuity (RD) works in very specific circumstances.

When RD works, it is one of the most credible non-experimental methods for causal inference and program evaluation.

Suppose we are interested in studying the effect of selective college enrollment on labor market outcomes.

Can we just compare people who are enrolled in selective colleges with people who are not? Why?

If there is a cutoff in college entrance exam score (e.g., 80) that determines admission:

then people who fall just on either side of the cutoff should not be so different,

so we can compare those just above and below the cutoff.

This is the main idea of regression discontinuity.

17 / 101
RD components
RD requires three components: a score, a cutoff, and a discontinuous treatment assignment rule.

all the units in the study are assigned a value of the score (also known as the running or forcing variable),

and the treatment is assigned only to units whose score value exceeds a known cutoff (also called threshold).

There are two types of RD: sharp (treatment jumps from 0 to 1) vs. fuzzy (probability of treatment jumps).
18 / 101
Example: Exam ranking - college admission/enrollment

19 / 101
Example: distance to border and irrigation (spatial RD)

20 / 101
Example: distance to border and deforestation (spatial RD)

21 / 101
Regression Discontinuity (RD): Formal Framework

22 / 101
RD estimation: sharp RD
Let $X_i$ be the running variable, normalized to be 0 at the cutoff.

$f(X_i)$ is a flexible function of $X_i$ (linear, quadratic, etc.)

In practice, we can estimate sharp RD in a regression:

$Y_i = \alpha + \beta_{\text{sharp}} 1\{X_i \geq 0\} + f(X_i) + f(X_i) \times 1\{X_i \geq 0\} + \varepsilon_i$

Fuzzy RD can be estimated with the "first" and "second" stage IV model:

$D_i = \theta 1\{X_i \geq 0\} + f(X_i) + f(X_i) \times 1\{X_i \geq 0\} + \varepsilon_i$

$Y_i = \beta_{\text{fuzzy}} \hat{D}_i + f(X_i) + f(X_i) \times \hat{D}_i + \eta_i$

It is also common practice to run the "reduced form" regression:

$Y_i = \beta_{\text{reduced-form}} 1\{X_i \geq 0\} + f(X_i) + f(X_i) \times 1\{X_i \geq 0\} + \varepsilon_i$
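
As a rough illustration of the regressions on this slide, the Stata sketch below simulates data and estimates the sharp RD, first-stage, reduced-form, and fuzzy (2SLS) specifications with a linear f(X). Everything in it is an assumption made for illustration: the variable names, the sample size, the true effect of 2, and the take-up probabilities. The fuzzy part instruments D with 1{X ≥ 0} and controls for f(X) and f(X) × 1{X ≥ 0}, which is a common simplification of the two-stage equations above rather than a literal transcription of them.

```stata
* Simulated data; all names and parameter values are illustrative assumptions
clear
set seed 20250901
set obs 2000
gen double x  = runiform(-1, 1)        // running variable, cutoff normalized to 0
gen byte   z  = x >= 0                 // indicator 1{X >= 0}
gen double xz = x * z                  // lets the slope differ on each side

* Sharp RD: treatment equals z, true jump at the cutoff is 2
gen double y_sharp = 1 + 2*z + 0.5*x + 0.3*xz + rnormal()
regress y_sharp z x xz, vce(robust)
scalar b_sharp = _b[z]

* Fuzzy RD: crossing the cutoff raises take-up probability from 0.1 to 0.8
gen byte   d       = runiform() < (0.1 + 0.7*z)
gen double y_fuzzy = 1 + 2*d + 0.5*x + rnormal()

regress d z x xz, vce(robust)                      // first stage
scalar b_first = _b[z]
regress y_fuzzy z x xz, vce(robust)                // reduced form
scalar b_reduced = _b[z]
ivregress 2sls y_fuzzy x xz (d = z), vce(robust)   // fuzzy RD estimate
scalar b_fuzzy = _b[d]

display "sharp = " b_sharp "  first stage = " b_first ///
    "  reduced form = " b_reduced "  fuzzy = " b_fuzzy
```

Because there is one instrument for one endogenous variable, the 2SLS estimate in this sketch equals the reduced-form jump divided by the first-stage jump.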

23 / 101
RD assumptions
RD requires that being on either side of the cutoff is as good as random.

Manipulation around the cutoff can invalidate RD estimates:

e.g., if people know their exam score and the cutoff, they can find ways to change their scores to get on the other side
(retake the exam).

We want to bolster the credibility of an RD analysis with:

A balance test of no RD jump in pre-treatment outcomes.

A density test (McCrary sorting test) of no "bunching" at the cutoff.

24 / 101
Checklist for RD paper
RD jump visualization

Balance test

Bunching test

Robustness to choice of estimation procedure, kernel, bandwidth

25 / 101
Checklist: RD jump visualization
Lee (2008) studies the effect of a Democrat winning an election on subsequent electoral success:

$X_i$: vote share margin of victory; $D_i = 1\{X_i \geq 0\}$: winning the election

$Y_i$: subsequent victory (candidacy/vote share) in an election

27 / 101
A quick aside on graphical presentation
One of the most powerful aspects of regression discontinuity is the ability to present the results graphically.

But what is the right approach?

Raw data is rarely informative.

Instead, plot a version of the scatter plot that shows mean outcomes within bins of the running variable.

RD jump should be clear across different binning strategies.

See Korting et al. (2024) for discussions on this.
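
The binning idea can be sketched in Stata on simulated data. The variable names, the jump of 0.3, and the two bin widths below are assumptions chosen for illustration, not values from Lee (2008). The sketch collapses the data to bin means for each bin width and draws one plot per width, so the jump can be compared across binning choices.

```stata
* Simulated data; variable names, the jump of 0.3, and bin widths are assumptions
clear
set seed 2008
set obs 5000
gen double x = runiform(-50, 50)       // running variable, cutoff at 0
gen double y = 0.4 + 0.3*(x >= 0) + 0.002*x + rnormal(0, 0.5)

foreach w in 1 5 {
    preserve
    * Bin midpoint; bins start at multiples of the width, so none straddles 0
    gen double bin = `w' * floor(x / `w') + `w'/2
    collapse (mean) y, by(bin)
    scalar nbins_`w' = _N              // number of bins at this width
    twoway scatter y bin, xline(0) xtitle("Running variable") ///
        ytitle("Mean outcome in bin") title("Bin width = `w'") ///
        name(bins_`w', replace)
    restore
}
```

Narrow bins stay close to the raw data and look noisy, while wide bins smooth away detail near the cutoff, which is why it helps to show more than one width.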

31 / 101
Graphical presentation: Lee (2008)
This is the raw data. There seems to be a small jump at the cutoff, but the plot is not very informative.

32 / 101
Graphical presentation: Lee (2008)
This is the same data, averaged within 0.1 percent bins.

33 / 101
Graphical presentation: Lee (2008)
This is the same data, averaged within 0.5 percent bins.

34 / 101
Graphical presentation: Lee (2008)
This is the same data, averaged within 4 percent bins.

35 / 101
Checklist for RD paper
RD jump visualization

Balance test

Bunching test

Robustness to choice of estimation procedure, kernel, bandwidth

36 / 101
Checklist: balance test
Lee (2008) studies the effect of a Democrat winning an election on subsequent electoral success.

There is no jump in pre-treatment outcomes.

37 / 101
Checklist: balance test
Lee (2008) studies the effect of a Democrat winning an election on subsequent electoral success.

There is no jump in pre-treatment outcomes.

See Canay and Kamat (2018) for a more complete permutation test (checking whether covariates are approximately identically distributed on each side of the cutoff).
40 / 101
Checklist for RD paper
RD jump visualization

Balance test

Bunching test

Robustness to choice of estimation procedure, kernel, bandwidth

41 / 101
Checklist: Bunching test
There are two reasons RD might fail:

The cutoff is set systematically, such that confounding factors change discontinuously at the cutoff

The running variable is manipulable

e.g., people can retake the exam to get scores above the passing grade

While both of these problems are likely to show up as a failed balance test, manipulation can also be detected through bunching of the running variable.

McCrary (2008) checks whether the density of the running variable changes discontinuously at the cutoff.

A density test of this kind can be run with the rddensity package in Stata.

42 / 101
Checklist: Bunching test

43 / 101
Checklist for RD paper
RD jump visualization

Balance test

Bunching test

Robustness to choice of estimation procedure, kernel, bandwidth

44 / 101
Checklist: Robustness
The actual estimation is subject to many tuning parameters:

choice of estimation procedure, kernel, bandwidth.

In practice, use defaults unless you have a very good reason not to:

Local linear regression, but local quadratic/cubic are also reasonable

Triangular kernel (highest weight at the cutoff, declining linearly with distance) or uniform kernel (equal weight to all observations within the bandwidth)

Optimized bandwidth from estimation procedures in packages (e.g., Cattaneo et al.'s rdrobust)

For discrete running variables, use the rdhonest package

Show robustness to different bandwidths/polynomials (table/graph).
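
To make the kernel and bandwidth choices concrete, here is a minimal sketch using only built-in Stata commands on simulated data. It runs a local linear regression with triangular kernel weights at three hand-picked bandwidths. The variable names, the true jump of 2, and the bandwidth grid are assumptions for illustration. It does not choose an optimal bandwidth or apply the bias-corrected inference that rdrobust provides, so it only shows how the estimate moves as the bandwidth changes.

```stata
* Simulated data; names, the true jump of 2, and the bandwidth grid are assumptions
clear
set seed 4242
set obs 4000
gen double x = runiform(-1, 1)         // running variable, cutoff at 0
gen byte   z = x >= 0
gen double y = 1 + 2*z + sin(x) + rnormal()

foreach h in 0.2 0.4 0.8 {
    * Triangular kernel: weight 1 at the cutoff, falling linearly to 0 at |x| = h
    gen double w = max(0, 1 - abs(x)/`h')
    * Local linear fit with separate slopes on each side, inside the bandwidth only
    quietly regress y i.z##c.x [aweight = w] if abs(x) < `h', vce(robust)
    display "h = `h'   jump = " %6.3f _b[1.z] "   se = " %6.3f _se[1.z]
    drop w
}
```

Smaller bandwidths use fewer observations and give noisier estimates; larger ones gain precision but lean more heavily on the linear approximation holding away from the cutoff.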

45 / 101
Checklist: Robustness

46 / 101
Example of robustness

47 / 101
RD in practice: The Effects of “Girl-Friendly” Schools: Evidence from the BRIGHT School Construction Program in Burkina Faso, by Kazianga et al. (2013)

48 / 101
Background: Education in Burkina Faso
Primary education is officially free for ages 6–12, but families often bear some direct costs (e.g., school supplies).

Enrollment rates remained among the world’s lowest, with a significant gender gap.

In 2003, net enrollment was 42% for boys and 29% for girls.

The Burkinabé Response to Improve Girls’ Chances to Succeed (BRIGHT) program was introduced to improve the education outcomes of children in rural villages.

The program started in 2005 and implemented an integrated package of education interventions in 132 rural villages.

It placed relatively well-resourced schools in these villages, with a number of amenities aimed at encouraging the enrollment of girls.

To allocate these schools, the Ministry of Education scored each of the 293 villages that requested a school, based on the villages’ claims about the number of primary school-aged girls likely to attend a school in their village.

It then assigned schools to the highest-ranking villages.

This allows a regression discontinuity design to study the impact of schools.

49 / 101
Allocation of BRIGHT Schools
The Ministry of Education designed the allocation process based on a predetermined set of criteria:

Departments nominated 293 villages from 10 provinces and 49 departments, proposing villages with low enrollment levels.

Each village then completed a survey with the assistance of a Ministry staff member.

The Ministry then assigned each village a score based largely on the estimated number of children to be served from the
proposed and neighboring villages, giving additional weight to girls.

Within each department, the Ministry ranked the villages and awarded the top half of the villages a BRIGHT school.

If a department proposed an odd number of villages, the median village did not receive a school.

And for the two departments that only nominated one village, the proposed village was automatically accepted.

50 / 101
Allocation of BRIGHT Schools
This process generated a set of 138 villages that should have received a BRIGHT school.

However, not all villages selected to receive a school did so because some locations proved inappropriate.

Likewise, 5 villages not initially selected received one.

This implies a fuzzy RD (treatment does not jump from 0 to 1, but probability of treatment jumps).

The authors identify, for each department, the lowest score among villages assigned a BRIGHT School and the highest score
among villages not assigned one.

They then define the department-specific cutoff (point of discontinuity) as halfway between these two scores.

Note that because each department has a different cutoff, the authors normalize the scores by creating a variable $Rel\_Score_j$, calculated as the village's original score minus the cutoff score for its department.

This normalization centers the running variable at zero for each department’s cutoff.

This step is crucial for accounting for the varying cutoffs across departments.
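
The cutoff construction described on this slide can be sketched in Stata on a toy dataset. The department IDs, scores, and assignments below are made up for illustration, and the variable names are assumptions, not those of the replication files.

```stata
* Toy data; department IDs, scores, and assignments are made up for illustration
clear
input byte department double score byte assigned
1 100 0
1 140 0
1 180 1
1 260 1
2  50 0
2  90 1
2 130 1
end

* Lowest score among assigned villages and highest score among
* non-assigned villages, within each department
egen double min_assigned   = min(cond(assigned == 1, score, .)), by(department)
egen double max_unassigned = max(cond(assigned == 0, score, .)), by(department)

* Department-specific cutoff: halfway between the two scores
gen double cutoff = (min_assigned + max_unassigned) / 2

* Normalized running variable, centered at zero in every department
gen double rel_score = score - cutoff
gen byte   above     = rel_score >= 0

list department score assigned cutoff rel_score above, sepby(department)
```

In this toy example the cutoffs are 160 and 70, so villages from both departments can be pooled on a common scale where zero marks the point of discontinuity.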

51 / 101
Empirical strategy
$Y_{ihjk} = \beta_0 + \beta_1 T_j + f(Rel\_Score_j) + \delta X_{ihjk} + \gamma Z_k + \varepsilon_{ihjk}$

where $i$ is the individual child in household $h$ in village $j$ of department $k$.

$Y_{ihjk}$ is the outcome of interest (test score, enrollment, attendance).

$X_{ihjk}$ is a vector of child and household characteristics.

$Z_k$ is a vector of department fixed effects.

$T_j = 1\{Rel\_Score_j \geq 0\}$: indicator for whether or not a village is at/above the cutoff.

$f(Rel\_Score_j)$ is a polynomial expansion of the running variable.

52 / 101
RD paper checklist
RD jump visualization

Balance test

Bunching test

Robustness to choice of estimation procedure, kernel, bandwidth

53 / 101
First-stage RD jump
The probability of receiving a BRIGHT school increases sharply from nearly zero just below the relative score cutoff, to greater
than 80 percent at the cutoff.

54 / 101
Balance test
Demographic characteristics of the children do not vary discontinuously at zero.

55 / 101
Bunching test
The distribution of villages does not vary discontinuously at zero using the test suggested by McCrary (2008).

57 / 101
ITT (reduced-form) RD jump: enrollment
Enrollment increases by about 20 percentage points across the cutoff.

58 / 101
Summary of results on enrollment

The results are robust to different tuning parameters:

functional forms (linear, quadratic, cubic), bandwidths.


59 / 101
Summary of results: by gender

60 / 101
RD in practice: Saving lives with indexed disaster funds: evidence from Mexico, by del Valle (2024)

61 / 101
Background: Fonden disaster fund
In the last century, there have been 10,514 recorded storms, floods, and other water-related disasters, and 8.3 million deaths
that can be directly attributed to these disasters.

One striking feature of the global distribution of deaths from disasters is that it is strongly skewed toward developing
economies despite disasters not occurring disproportionately in these countries.

A possible contributing factor to the larger death toll in developing economies is bottlenecks in the provision of disaster aid, which delay the restoration of critical services, including roads, safe water, and medical infrastructure. These bottlenecks arise from:

reliance on post-disaster financing

lack of rules and administrative capacity to disburse these resources

62 / 101
Background: Fonden disaster fund
Fonden insures public infrastructure and low-income housing against natural disasters.

It overcomes bottlenecks in disaster aid delivery through a structured response plan.

Pre-disaster funding: ensures funds are available before a disaster through protected budget allocation, excess loss
reinsurance, catastrophe bonds.

Rules-based disbursement: enforces a structured process for fund distribution, including:

verifying disaster occurrence using indexes (e.g., above certain rainfall threshold)

conducting damage assessments, paying reconstruction service providers, auditing reconstruction projects

del Valle (2024) studies the impact of this fund on mortality.

63 / 101
Empirical strategy
$Y_{mt} = \alpha + \beta Above_{mt} + g(R_{mt}) + \varepsilon_{mt}$

where

$Y_{mt}$ is the outcome in municipality $m$ at disaster time $t$.

$Above_{mt} = 1$ if the running variable $R_{mt}$ (rainfall minus threshold) is nonnegative, and 0 otherwise.

The function $g(R_{mt})$ represents a polynomial of order $p$ of the running variable $R_{mt}$ that is fully interacted with the $Above_{mt}$ indicator variable.
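
One way to write a polynomial that is fully interacted with the threshold indicator is Stata's factor-variable notation. The sketch below uses simulated data; the variable names, the jump of -0.8, and the choice of p = 1 and p = 2 are assumptions for illustration and are not taken from del Valle (2024).

```stata
* Simulated data; names, the jump of -0.8, and the polynomial orders are assumptions
clear
set seed 777
set obs 3000
gen double r     = runiform(-1, 1)     // running variable, threshold at 0
gen byte   above = r >= 0
gen double y     = 1 - 0.8*above + 0.5*r + 0.4*r^2 + rnormal(0, 0.5)

* p = 1: linear in r, with a separate slope on each side of the threshold
regress y i.above##c.r, vce(robust)

* p = 2: quadratic in r, fully interacted with the Above indicator
regress y i.above##c.r##c.r, vce(robust)
display "estimated jump at the threshold (p = 2): " _b[1.above]
```

The coefficient on `1.above` is the estimated discontinuity at the threshold; the interaction terms let the shape of the polynomial differ on each side.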

64 / 101
RD paper checklist
RD jump visualization

Balance test

Bunching test

Robustness to choice of estimation procedure, kernel, bandwidth

65 / 101
First-stage RD jump
The likelihood of receiving Fonden increases from 0.65 just below the threshold to 0.87 just above.

66 / 101
ITT (reduced-form) RD jump
Municipalities immediately to the left of the threshold experience roughly 0.79 excess deaths per 1,000 person-years, whereas those eligible for Fonden (immediately to the right) do not experience excess deaths.

67 / 101
Summary of results

68 / 101
Balance test

69 / 101
Bunching test

70 / 101
Robustness checks: tuning parameters

71 / 101
Robustness checks: bandwidths

72 / 101
RD in practice: National borders and the conservation of nature, by Burgess, Costa, and Olken (2023)

73 / 101
Background: Deforestation in Brazil
The Amazon rainforest is the largest in the world.

Brazil's Forest Code sets legal requirements for landowners in the Amazon to maintain a certain percentage of native forest on
their property (e.g., 80%).

Landowners who illegally clear beyond the limit face heavy fines, land confiscation, or imprisonment.

This makes illegal deforestation more expensive than sustainable practices.

However, there has always been a gap between de jure rules and de facto action.

Starting in 2005, Brazil increased enforcement of these policies, strengthening fines and increasing enforcement in a variety of
ways, including satellites.

What is the net effect of this increased enforcement? And is it stable?

Burgess, Costa, and Olken (2023) study this by examining what happens at the border.

74 / 101
The border
Idea: comparing deforestation at the border captures the effect of Brazilian state policy,

holding other factors, such as profitability and soil, constant.

The Brazil effect is estimated using an RD design, using distance to the border as running variable.

78 / 101
Empirical strategy
$Y_i = \alpha + \beta Brazil_i + g(DistBorder_i) + \delta X_i + \varepsilon_i$

where

$Y_i$ is forest cover or annual forest loss.

The function $g(DistBorder_i)$ represents a (linear) polynomial of distance from the border.

$X_i$ are geographic controls (slope, distance to water).

79 / 101
Forest cover as of 2000
Before the policy was implemented, forest cover dropped at the border.

80 / 101
Annual forest loss 2001-2005
During 2001-2005 (i.e., before the enforcement policy), annual forest loss increased significantly at the border.

81 / 101
Annual forest loss 2006-2013
Annual forest loss decreased after the policy was implemented in Brazil in 2005.

82 / 101
Annual forest loss 2014-2020
The effects tend to persist even more than a decade later.

83 / 101
Summary of results

There is a role for state policy to determine the wedge between de jure and de facto conservation policy.

84 / 101
RD in Practice: Exploring Applications in the Kurdish Context

85 / 101
What are pressing social, economic, or policy issues in your region where eligibility or access is determined by a cutoff, threshold, or score?

Below are some examples to spark ideas:

Education: exam score cutoffs for scholarships, school admission, teacher incentives.

Health: age thresholds for child/maternal health programs, vaccination campaigns.

Social policy: poverty score thresholds for cash transfers, subsidies, food aid.

Infrastructure: eligibility rules for housing or reconstruction support.

86 / 101
Please choose two problems you would like to explore.

Use these guiding questions:

What is the cutoff or threshold?

What is assigned above vs. below the threshold?

What would be the outcome of interest?

Is there data (or potential to collect data) around the cutoff?

Activity plan:

1 hour: work individually or in small groups to sketch out your RD research idea.

30 minutes: share your ideas with the group (would be helpful to have presentation slides outlining your idea).

30 minutes: we will discuss together and I will provide feedback and suggestions on feasibility.

87 / 101
Hands-on Practice: Replicating Kazianga et al. (2013)

88 / 101
Replicating Kazianga et al. (2013)
For this exercise, we will use the software Stata and the dataset bright.dta to conduct the analysis in the spirit of the paper.

We will follow the RD checklist in the lecture:

Show the clear (first stage) RD jump (Figure 1)

Check the validity of the RD with bunching test (Figure 2) and balance test (Table 2)

Show the main RD results (Figure 3)

Show the robustness to different specifications (Table 5)

Show the RD results for test scores (Figure 4)

Replicate the heterogeneity analysis by gender (Table 6 Column 1)

Replication files are available in the Google Drive. You can also practice RD using the code.

89 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the clear (first stage) RD jump (Figure 1)

The plot shows a village’s probability of receiving a BRIGHT school (selected) as a function of its relative score.

Use a local linear polynomial estimator with an Epanechnikov kernel and a bandwidth of 25 points, focusing on the narrow range of (−250, 250) and estimating the function separately for villages on either side of zero.

You don’t need to report the fraction of explained variance as in Figure 1.

90 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the clear (first stage) RD jump (Figure 1)

use "bright.dta", clear

// Local linear fits (Epanechnikov kernel, bandwidth 25), estimated separately on each side of the cutoff
lpoly selected rel_score if rel_score >= 0, nograph deg(1) k(ep) bw(25) at(rel_score) gen(pred_above)
lpoly selected rel_score if rel_score < 0, nograph deg(1) k(ep) bw(25) at(rel_score) gen(pred_below)

// Keep each fit only on the side where it was estimated
replace pred_above = . if rel_score < 0
replace pred_below = . if rel_score >= 0

preserve
// Generate the bin variable (bins of width 25)
gen bin = 25*floor(rel_score/25)

// Restrict the data to the range (-250, 250)
drop if abs(rel_score) > 250

// Collapse the "raw data" into bin means
collapse selected pred_*, by(bin)

// Plot the binned means and the fitted lines
gr tw (scatter selected bin) (line pred_above bin) (line pred_below bin), ///
    xline(0) xtitle("Relative score") ytitle("Received a BRIGHT school") ///
    sch(modern) legend(off) xlabel(-250(50)250)
restore
91 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the clear (first stage) RD jump (Figure 1)

92 / 101
Replicating Kazianga et al. (2013)
RD checklist: Check the validity of the RD with bunching test (Figure 2) and balance tests (Table 2)

For bunching tests, we can use rddensity

For balance tests, estimate the main equation without controlling for child and household characteristics.

The outcome Yihjk would be the child/household characteristics.

$Y_{ihjk} = \beta_0 + \beta_1 T_j + f(Rel\_Score_j) + \gamma Z_k + \varepsilon_{ihjk}$

93 / 101
Replicating Kazianga et al. (2013)
RD checklist: Check the validity of the RD with bunching test (Figure 2)

For bunching tests, we can use rddensity

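// Install rddensity and lpdensity (only needed once)
ssc install rddensity
net install lpdensity, from(https://raw.githubusercontent.com/nppackages/lpdensity/master/stata) replace

// Density test on the village-level sample, with a plot of the estimated density on each side of the cutoff
rddensity rel_score if village_level == 1, pl plot_range(-250 250) plot_n(10 10) ///
    nohist plotl_estype(point) plotr_estype(point)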

94 / 101
Replicating Kazianga et al. (2013)
RD checklist: Check the validity of the RD with balance tests (Table 2)

$Y_{ihjk} = \beta_0 + \beta_1 T_j + f(Rel\_Score_j) + \gamma Z_k + \varepsilon_{ihjk}$

// save lists of variables as global macros
gl children Hh_HeadMale Hh_HeadAge Hh_HeadSchool Hh_NumMembers ///
    Hh_NumKids Ch_Age Ch_Girl Ch_HeadChild Ch_HeadGrandChild Ch_HeadNephew

gl religion Hh_ReligionMuslin Hh_Animist Hh_Christian Hh_Lang_Fulfude ///
    Hh_Lang_Gulmachema Hh_Lang_Moore Hh_Ethnicity_Gourmanche Hh_Ethnicity_Mossi Hh_Ethnicity_Peul

gl household Hh_FloorBasic Hh_RoofBasic Hh_Radio Hh_Telmob Hh_Watch Hh_Bike Hh_Cows Hh_Motorbike Hh_Cart

// create the squared term of the relative score
gen rel_score2 = rel_score^2

// loop over all variables in the global "children":
// regress each characteristic on the cutoff indicator and a quadratic in the relative score,
// with department fixed effects and standard errors clustered by clustercode
foreach var of global children {
    eststo c`var': reghdfe `var' proj_selected rel_score rel_score2, a(department) cl(clustercode)
}

// repeat the same for other globals

// report the results
esttab c*, keep(proj_selected)
95 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the main RD results (Figure 3)

The plot shows an individual’s probability of attending school as a function of the village’s relative score.

Use a local linear polynomial estimator with an Epanechnikov kernel and a bandwidth of 25 points, focusing on the narrow range of (−250, 250) and estimating the function separately for villages on either side of zero.

You don’t need to report the fraction of explained variance as in Figure 3

96 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the main RD results (Figure 3)

use "bright.dta", clear

// Local linear fits (Epanechnikov kernel, bandwidth 25), estimated separately on each side of the cutoff
lpoly attending rel_score if rel_score >= 0, nograph deg(1) k(ep) bw(25) at(rel_score) gen(pred_above)
lpoly attending rel_score if rel_score < 0, nograph deg(1) k(ep) bw(25) at(rel_score) gen(pred_below)

// Keep each fit only on the side where it was estimated
replace pred_above = . if rel_score < 0
replace pred_below = . if rel_score >= 0

preserve
// Generate the bin variable (bins of width 25)
gen bin = 25*floor(rel_score/25)

// Restrict the data to the range (-250, 250)
drop if abs(rel_score) > 250

// Collapse the "raw data" into bin means
collapse attending pred_*, by(bin)

// Plot the binned means and the fitted lines
gr tw (scatter attending bin) (line pred_above bin) (line pred_below bin), ///
    xline(0) xtitle("Relative score") ytitle("Attending school") ///
    sch(modern) legend(off) xlabel(-250(50)250)
restore
97 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the main RD results (Figure 3)

98 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the robustness to different specifications (Table 5)

// save child controls (without age) as a global macro;
// $religion and $household are defined in the balance-test code
gl children_nage Hh_HeadMale Hh_HeadAge Hh_HeadSchool Hh_NumMembers ///
    Hh_NumKids Ch_Girl Ch_HeadChild Ch_HeadGrandChild Ch_HeadNephew

// generate higher-order and interaction terms (cap skips variables that already exist)
cap gen rel_score2 = rel_score^2
cap gen rel_score3 = rel_score^3
cap gen rel_scorexproj_selected = rel_score * proj_selected
cap gen rel_score2xproj_selected = rel_score2 * proj_selected

// estimation: models (1) - (4)
est clear

// (1) quadratic with controls
eststo c1: reghdfe attending proj_selected rel_score rel_score2 ///
    $children_nage $religion $household, a(department Ch_Age) cl(clustercode)

// (2) quadratic without controls
eststo c2: reghdfe attending proj_selected rel_score rel_score2, a(department) cl(clustercode)

// (3) linear without controls
eststo c3: reghdfe attending proj_selected rel_score, a(department) cl(clustercode)

// (4) cubic with controls
eststo c4: reghdfe attending proj_selected rel_score rel_score2 rel_score3 ///
    $children_nage $religion $household, a(department Ch_Age) cl(clustercode)

esttab c*, keep(proj_selected)



99 / 101
Replicating Kazianga et al. (2013)
RD checklist: Show the robustness to different specifications (Table 5)

// save child controls (without age) as a global macro;
// $religion and $household are defined in the balance-test code
gl children_nage Hh_HeadMale Hh_HeadAge Hh_HeadSchool Hh_NumMembers ///
    Hh_NumKids Ch_Girl Ch_HeadChild Ch_HeadGrandChild Ch_HeadNephew

// generate higher-order and interaction terms (cap skips variables that already exist)
cap gen rel_score2 = rel_score^2
cap gen rel_score3 = rel_score^3
cap gen rel_scorexproj_selected = rel_score * proj_selected
cap gen rel_score2xproj_selected = rel_score2 * proj_selected

// estimation: models (5) - (8)

// (5) quadratic interacted with the cutoff indicator, with controls
eststo c5: reghdfe attending proj_selected rel_score rel_score2 rel_scorexproj_selected ///
    rel_score2xproj_selected $children_nage $religion $household, a(department Ch_Age) cl(clustercode)

// (6) probit marginal effects
eststo c6: xi: dprobit attending proj_selected rel_score rel_score2 ///
    $children_nage $religion $household i.Ch_Age i.department, cl(clustercode)

// (7) narrow window around the cutoff, no polynomial
eststo c7: reghdfe attending proj_selected if abs(rel_score)<40, a(department) cl(clustercode)

// (8) alternative outcome variable attending2
eststo c8: reghdfe attending2 proj_selected rel_score rel_score2 ///
    $children_nage $religion $household, a(department Ch_Age) cl(clustercode)

// report the results
esttab c*, keep(proj_selected)

100 / 101
Thank you!

Questions can be directed to [email protected].

101 / 101
