Lecture Notes 21
36-705
1 Causal Inference
Much of statistics and machine learning focuses on questions of association. Are X and Y
correlated? Is X predictive of Y , and so on.
In many applications however, our questions are inherently causal: is a medication effective
against a disease? Do masks prevent the spread of Covid? Was someone fired because of
their age? Does making an ad larger on a website make people buy more?
These are not questions of association. Aspirin is strongly associated with headaches but we
don’t think that aspirin causes headaches. We often experience turbulence after the seat
belt sign comes on in a plane. The association is strong. But turning on the seat belt sign
does not cause turbulence. This is what we mean by the phrase: “correlation does not imply
causation.”
2 The Potential Outcomes Framework
There are two essentially equivalent languages for causation: the first is called potential
outcomes or counterfactuals. The second is structural equation models or directed acyclic
graphs. We’ll start with the first one.
Suppose we have two random variables (A, Y ) where A is an exposure or treatment and Y
is an outcome. For now, assume that A is binary such as “take aspirin (A = 1)” and “don’t
take aspirin (A = 0).” A typical dataset looks like this:
A 1 1 1 1 0 0 0 0
Y 97 76 83 93 100 89 13 67
Now introduce more random variables called potential outcomes (or counterfactuals). Let
Y (0) be the outcome that would have been observed if A = 0 and let Y (1) be the outcome
that would have been observed if A = 1. Causal questions involve comparisons of these two
potential outcomes. Note that
Y = Y(0) if A = 0,    Y = Y(1) if A = 1.
We can write this as
Y = Y (A)
or
Y = (1 − A)Y (0) + AY (1).
So now we have four random variables (Y, A, Y (0), Y (1)) where Y is related to Y (0) and
Y (1) by the above consistency relations. Our data set now looks like this:
A 1 1 1 1 0 0 0 0
Y 97 76 83 93 100 89 13 67
Y(0) ? ? ? ? 100 89 13 67
Y(1) 97 76 83 93 ? ? ? ?
Much of the data are missing because we don’t observe Y (0) when A = 1 and we don’t
observe Y (1) when A = 0.
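To make the consistency relation concrete, here is a minimal Python sketch (illustrative, not part of the notes; all numbers and the constant effect are invented). In a simulation we can generate both potential outcomes for every unit, something real data never gives us, and check that the observed outcome is Y = Y(A):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Hypothetical potential outcomes: in a simulation we see BOTH columns,
# which is exactly what real data denies us.
Y0 = rng.integers(50, 100, size=n)
Y1 = Y0 + 5                      # a constant treatment effect, purely for illustration
A = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # treatment pattern as in the table above

# Consistency relation: Y = (1 - A) * Y(0) + A * Y(1), i.e. Y = Y(A).
Y = (1 - A) * Y0 + A * Y1

# The observed column reveals Y(1) only for treated units and Y(0) only
# for control units; the other entries are the "?" in the table.
```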
More generally, if A ∈ R then the set of counterfactuals is (Y (a) : a ∈ R). In this case there
are infinitely many counterfactuals. The observed Y is
Y = Y (A).
You can think of Y (a) as a curve and we get to observe Y (a) evaluated at A.
While all of this might seem rather obvious, thinking formally about treatment and control,
and the potential outcomes is extremely important to causal inference. A point of partic-
ular emphasis is that if you are asking a causal question, ideally you need to be able to
meaningfully say what the “treatment” is and what the potential outcomes are.
Here are a few examples of statements:
1. “Aspirin cures headaches.” In order to cast this in the potential outcomes framework
we could imagine that for a person with a headache (a unit) we could either give
the person aspirin (treatment) or a placebo (control), and observe the corresponding
potential outcome.
2. “She has long hair because she is a girl.” This sounds like a causal statement so we
should be able to describe the experiment. Is a unit a girl/boy? What exactly is a
treatment? Can we meaningfully say what the potential outcomes are?
For some causal questions we can naturally define an associated “experiment”. Murky causal
questions are ubiquitous, and are in some sense interesting and challenging.
3 Causal Estimands
There are many possible parameters of interest. For example, E[Y (a)] which is the outcome
if everyone had A = a. Here is some other notation that is sometimes used:
E[Y (a)] = E[Y |set A = a] = E[Y |do A = a].
In general, E[Y (a)] ≠ E[Y |A = a]!
When A is binary, it is often of interest to estimate the average treatment effect (ATE)
ψ = E[Y (1)] − E[Y (0)].
Think of this as the mean of Y if everyone took treatment minus the mean of Y if nobody
took treatment. In prediction and machine learning one instead focuses on quantities like
α = E[Y |A = 1] − E[Y |A = 0]
which is not, in general, the same as ψ. The latter is some measure of association.
How are we going to estimate ψ?
4 Randomized Experiments
Suppose that A was randomly assigned. (Think of the vaccine trials for Covid.) In that case,
A is independent of (Y (0), Y (1)), which we write as

A ⊥⊥ (Y (0), Y (1)).

Then we have
α = E[Y |A = 1] − E[Y |A = 0] = E[Y (1)|A = 1] − E[Y (0)|A = 0] = E[Y (1)] − E[Y (0)] = ψ.
Randomization ensures that association IS causation. And we can estimate α easily. Suppose,
for example, that we assigned treatment by flipping a coin. Let

\hat{\alpha} = \frac{1}{n_1} \sum_{i: A_i = 1} Y_i - \frac{1}{n_0} \sum_{i: A_i = 0} Y_i \equiv \bar{Y}_1 - \bar{Y}_0

where n_1 = \sum_i I(A_i = 1) and n_0 = \sum_i I(A_i = 0). It is easy to see that
\sqrt{n}(\bar{Y}_1 - \bar{Y}_0 - \psi) \rightsquigarrow N(0, \tau^2), where \tau^2 = 2\sigma_1^2 + 2\sigma_2^2
and \sigma_j^2 = Var[Y |A = j]. Inference is easy. This is why those
companies are spending millions of dollars doing randomized trials.
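The difference-in-means estimator above is easy to try in simulation. The following Python sketch is illustrative, not from the notes: the constant treatment effect of 2, the sample size, and the seed are all arbitrary choices. Treatment is assigned by a coin flip, so the estimator recovers the causal effect ψ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# Simulated potential outcomes with a true average treatment effect of 2.
Y0 = rng.normal(0, 1, n)
Y1 = Y0 + 2
A = rng.integers(0, 2, n)            # randomized assignment: a coin flip per unit
Y = np.where(A == 1, Y1, Y0)         # consistency: we observe only Y(A)

# Difference-in-means estimator from the text.
alpha_hat = Y[A == 1].mean() - Y[A == 0].mean()
# Under randomization, alpha_hat is consistent for psi = E[Y(1)] - E[Y(0)] = 2.
```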
5 Hypothesis testing: Fisher’s Exact p-values
Fisher was one of the first to understand the power of a randomized trial. In agricultural
experiments, he advocated randomized experiments in order to draw rigorous causal con-
clusions. A natural subsequent problem is: given an estimate of the causal effect, assess its
significance (or construct confidence intervals for it).
Fisher gave a way to construct valid p-values under what is called the sharp null, i.e. the null
hypothesis that for every unit i the potential outcomes are the same under the treatment
and control, i.e. the treatment has no effect. The method is reminiscent of the permutation
method we used for two-sample testing.
Suppose we test the sharp null H0 by rejecting when |\hat{\alpha}| is large. Under the null hypothesis, we can
determine both potential outcomes Yi (0) and Yi (1) for all the units.
We can now use the permutation method. Say there are n subjects and m were treated.
Permute the values of Ai and let T′ denote the m units who receive treatment: then our
estimate would be:

\hat{\psi}_{T'} = \frac{1}{m} \sum_{i \in T'} Y_i(1) - \frac{1}{n - m} \sum_{i \notin T'} Y_i(0),
where we can use the sharp null hypothesis to “fill in” the potential outcomes we do not
observe. We can repeat this many times (say B) and compute the p-value:
\text{p-value} = \frac{1}{B} \sum_{b=1}^{B} I(|\hat{\psi}_{T_b}| \ge |\hat{\psi}|).
It is easy to verify that this is a valid p-value.
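Fisher’s procedure can be sketched directly on the toy data set from Section 2. The Python code below is an illustrative sketch (B = 10,000 and the seed are arbitrary choices): under the sharp null every relabelling of the treatment vector yields a fully observed data set, so we permute labels and compare:

```python
import numpy as np

rng = np.random.default_rng(2)
# Observed outcomes and treatment labels from the table in Section 2.
Y = np.array([97, 76, 83, 93, 100, 89, 13, 67], dtype=float)
A = np.array([1, 1, 1, 1, 0, 0, 0, 0])

def diff_in_means(Y, A):
    return Y[A == 1].mean() - Y[A == 0].mean()

obs = abs(diff_in_means(Y, A))       # |observed estimate| = |87.25 - 67.25| = 20

# Under the sharp null Y_i(0) = Y_i(1) = Y_i, so any permutation of the
# labels gives a data set we could have observed; recompute the statistic.
B = 10_000
perm_stats = np.array([abs(diff_in_means(Y, rng.permutation(A)))
                       for _ in range(B)])
p_value = np.mean(perm_stats >= obs)
```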
6 Confounding
For many policy questions, we cannot actually do a randomized trial. For instance, if I
wanted to know if smoking caused lung cancer, there are ethical issues with trying to run a
randomized trial. In this case, we have to use observational i.e. we have information about
many people who are smokers and not, and whether they have lung cancer or not. It is clear
that we can measure the correlation between smoking and lung cancer: the main question
is when, if ever, can we claim a causal relationship?
Here is a motivating example: Suppose that our population has two kinds of people, those
who are always healthy (Yi (1) = Yi (0) = 1) irrespective of whether they take the treatment
or not, and those who are always unhealthy (Yi (1) = Yi (0) = 0) irrespective of whether
they take the treatment or not. Then Yi (1) − Yi (0) = 0 for all i so there is no causal effect.
Suppose further that mostly healthy people take the treatment, while the unhealthy ones
do not take the treatment. The causal effect is ψ = 0, but the estimator above would yield,
ψb ≈ 1, and we might incorrectly conclude that the treatment is beneficial. The data would
look like this:
A 1 1 1 1 0 0 0 0
Y 1 1 1 1 0 0 0 0
Y(0) 1 1 1 1 0 0 0 0
Y(1) 1 1 1 1 0 0 0 0
Suppose however, that we knew who the healthy people were and who the unhealthy people
were (we could gather such information by asking people questions about their lifestyle and
other things). Then we could try to compare healthy people who took the treatment with
healthy people who did not and similarly compare unhealthy people who took the treatment
with unhealthy people who did not (and then try to combine these two estimates in some
way). In this case, when we compared two healthy people who took the treatment and who
did not we would see the treatment had no effect, and similarly for the unhealthy ones. We
would correctly conclude that the treatment has no effect.
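The healthy/unhealthy story is easy to reproduce in a small simulation. In the Python sketch below (all numbers, including the 90%/10% treatment probabilities and the seed, are invented for illustration), the naive difference in means is badly biased toward a large positive "effect", while comparing within health strata recovers the true effect of zero:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
healthy = rng.integers(0, 2, n).astype(bool)     # the confounder

# Both potential outcomes equal the health status: the treatment does nothing.
Y0 = healthy.astype(float)
Y1 = healthy.astype(float)

# Healthy people are much more likely to take the treatment (90% vs 10%).
A = rng.random(n) < np.where(healthy, 0.9, 0.1)
Y = np.where(A, Y1, Y0)

# Naive comparison: badly biased, close to 0.8 in this setup.
naive = Y[A].mean() - Y[~A].mean()

# Stratify on the confounder, then combine the within-stratum comparisons.
effects = []
for h in (True, False):
    m = healthy == h
    effects.append(Y[m & A].mean() - Y[m & ~A].mean())
adjusted = float(np.mean(effects))               # recovers the true effect: 0
```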
The key assumption that makes causal inference from observational data possible is the as-
sumption of no unmeasured confounding or selection on observables or ignorability. Formally,
we suppose that we have access to covariates X (think demographic information) such that,
A ⊥⊥ (Y (1), Y (0)) | X.
This is an assumption. Roughly the assumption is plausible in settings where we believe we
can measure all of the covariates that explain the decision to take the treatment. We also
need the assumption that P(A = 1|X = x) is bounded away from 0 and 1, so that every
individual has some non-zero chance of being either treated or in the control group.
One way to think about this assumption, is that conditional on X we have a randomized
trial: the treatment is independent of the potential outcomes. So if we condition on the
confounders X we no longer have any selection bias.
In what follows we will assume we have random variables (X, A, Y, Y (0), Y (1)) where
Y = AY (1) + (1 − A)Y (0) = Y (A).
7 Identification under no unmeasured confounding
We want to estimate:
ψ = E[Y (1) − Y (0)]
assuming that
A ⊥⊥ (Y (1), Y (0)) | X.
Now
E[Y (1)] = \int E[Y (1)|X = x]\, p(x)\, dx = \int E[Y (1)|X = x, A = 1]\, p(x)\, dx
         = \int E[Y |X = x, A = 1]\, p(x)\, dx = \int \mu_1(x)\, p(x)\, dx
where
R µa (x) = E[Y |X = a, A = a].R Note that thus is NOT equal to E[Y |A = 1] =
µa (x)p(x|1)dx. Similarly, E[Y (0)] = µ0 (x)p(x)dx. So
\psi = E[Y (1) - Y (0)] = \int [\mu_1(x) - \mu_0(x)]\, p(x)\, dx.
This is a function of the observed data (X, A, Y ) so we can estimate it.
In the case that A is continuous, the same argument shows that
E[Y (a)] = \int \mu_a(x)\, p(x)\, dx.
8 Estimation under no unmeasured confounding
The most direct way to estimate ψ is to estimate:
\mu_0(x) = E[Y |X = x, A = 0]
\mu_1(x) = E[Y |X = x, A = 1].
These are two functions of the covariates X, one of them is the average outcome of the
treatment group as a function of the covariates, and the other is the average outcome of the
control group as a function of the covariates.
Estimating a conditional expectation is probably the most common problem
in statistics – it is known as regression. We will delve into this formally in the next few
lectures, but for now let us suppose that someone hands us estimators \hat{\mu}_0 and \hat{\mu}_1 of these two
functions.
Then we can compute the plug-in estimator:
\hat{\psi} = \hat{E}_X[\hat{\mu}_1(X) - \hat{\mu}_0(X)] = \frac{1}{n} \sum_{i=1}^{n} [\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)]
which is just the average of the difference between two regression functions. One approxi-
mately correct way to think about this is that we are using regression to impute the missing
potential outcomes for each individual.
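As a concrete sketch of the plug-in estimator (not from the notes; the linear outcome model, logistic propensity, true effect of 2, and seed are all simulation choices), one can fit \hat{\mu}_1 and \hat{\mu}_0 by least squares on the treated and control groups separately and average their difference over all units:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
X = rng.normal(size=n)
pi = 1 / (1 + np.exp(-X))            # treatment probability depends on X: confounding
A = rng.random(n) < pi
Y = 2 * A + X + rng.normal(size=n)   # true average treatment effect is 2

# Fit the two regression functions. Here mu_a(x) happens to be linear, so
# ordinary least squares (np.polyfit) stands in for whatever estimators
# "someone hands us" in the text.
b1 = np.polyfit(X[A], Y[A], 1)       # mu_1(x) ~ b1[0]*x + b1[1]
b0 = np.polyfit(X[~A], Y[~A], 1)     # mu_0(x) ~ b0[0]*x + b0[1]
mu1_hat = np.polyval(b1, X)
mu0_hat = np.polyval(b0, X)

# Plug-in estimator: average the imputed difference over the whole sample.
psi_hat = np.mean(mu1_hat - mu0_hat)
```

Note that averaging over all Xi (not just the treated or control Xi) is what turns the regression fits into an estimate of \int [\mu_1(x) - \mu_0(x)] p(x) dx.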
There are other ways to try to estimate ψ. The other popular estimator is called the inverse
propensity score estimator. The propensity score is
π(x) = P(A = 1|X = x),
which represents the probability that a unit with covariates x receives treatment. Note that,
E[A|X = x] = π(x)
E[1 − A|X = x] = 1 − π(x).
Let p(y|x, a) denote the density of Y given x and a and recall that π(x) = P(A = 1|X = x).
So, when a = 1,
p(x, a, y) = p(x)p(a|x)p(y|x, a) = p(x)π(x)p(y|x, 1)
and when a = 0,
p(x, a, y) = p(x)p(a|x)p(y|x, 0) = p(x)(1 − π(x))p(y|x, 0).
So, for a = 1,
E[Y (1)] = \int E[Y |X = x, A = 1]\, p(x)\, dx = \iint y\, p(y|x, 1)\, p(x)\, dx\, dy
         = \iint \frac{y}{\pi(x)}\, p(y|x, 1)\, \pi(x)\, p(x)\, dx\, dy
         = \iint \frac{y}{\pi(x)}\, p(x, a = 1, y)\, dx\, dy
         = \iint \frac{ay}{\pi(x)}\, p(x, a = 1, y)\, dx\, dy
         = \sum_{a=0}^{1} \iint \frac{ay}{\pi(x)}\, p(x, a, y)\, dx\, dy
         = E\left[\frac{AY}{\pi(X)}\right].

(In the fourth line we used a = 1 inside the integral; in the fifth, the a = 0 term vanishes,
so summing over a changes nothing.)
Similarly,

E[Y (0)] = E\left[\frac{(1 - A)Y}{1 - \pi(X)}\right].
Therefore,
\psi = E\left[\frac{Y A}{\pi(X)}\right] - E\left[\frac{Y (1 - A)}{1 - \pi(X)}\right].
This suggests the estimator
\hat{\psi} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{Y_i A_i}{\pi(X_i)} - \frac{Y_i (1 - A_i)}{1 - \pi(X_i)} \right].
This is called the Horvitz-Thompson estimator or the inverse probability weighted (IPW)
estimator. This requires that π(x) be known as it would be in a randomized experiment.
Otherwise we have to insert an estimate of π(x). This is again a problem of regression except
the outcome is binary.
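Here is a minimal Python sketch of the IPW estimator in the case where \pi(x) is known, as it would be in a randomized experiment or, as here, a simulation (the logistic propensity, true effect of 2, and seed are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
X = rng.normal(size=n)
pi = 1 / (1 + np.exp(-X))                # known propensity score pi(x)
A = (rng.random(n) < pi).astype(float)
Y = 2 * A + X + rng.normal(size=n)       # true average treatment effect is 2

# Horvitz-Thompson / IPW estimator with the known propensity score:
# weighting by 1/pi(X) (treated) and 1/(1 - pi(X)) (control) undoes
# the selection into treatment.
psi_hat = np.mean(Y * A / pi - Y * (1 - A) / (1 - pi))
```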
9 Advanced topics
This is just the tip of the iceberg. If you take a course in Causal Inference you will see many
other interesting things such as:
1. No unmeasured confounding is just one assumption that leads to identification of a
causal effect. More broadly, in economics, political science and other fields people look
for what are called natural experiments, i.e. roughly some subset of the population for
which the assignment to treatment/control is nearly random.
2. Even in a randomized trial you might have something called non-compliance, i.e. some
people don’t do what they are told. In this case, you need to adjust your estimates.
This is a canonical example of something called an instrumental variable problem.
3. There are many things beyond the average treatment effect that you might want to
estimate. They all have different assumptions under which they are identified (i.e.
can be written in terms of observable quantities) and there are different strategies to
estimate them.
4. There is a very nice/simple way to combine the regression-based and propensity-score
based estimators from above to construct what are called doubly robust estimators.
These have the property that they are consistent if you can estimate either the re-
gression function or the propensity score well (i.e. you do not need to estimate both
well).
5. The plug-in estimator \hat{\psi} = n^{-1} \sum_i [\hat{\mu}(X_i, 1) - \hat{\mu}(X_i, 0)] is not optimal. Finding opti-
mal estimators of functionals is part of semiparametric theory.
6. There are many different languages for talking about causality and causal inference.
We used potential outcomes. Many people use structural equation models and directed
graphs. These lead to the same formulas for causal effects. We might revisit this later.