Lecture 3: Survival Models and Analysis
Estimating the Survival Function
November 9, 2021
Introduction
In an analysis, we first explore the data. As part of this, we want to
estimate the survival function S from right-censored data.
Consider patients entering a study at different times, with both known (•)
and censored (◦) survival times observed.
We extract the following data from the graph.
Example: We want to know the probability that a patient has died by 6
months.
If we ignore censoring, we use the empirical distribution function and
estimate
$$\hat{p}(6) = \frac{1}{10} \sum_{i=1}^{10} I(\text{lifetime}_i \le 6) = \frac{4}{10}$$
However, one patient (no. 4) was censored at 3.2 months, so we only know
that this patient survived at least 3.2 months.
How to deal with this patient?
• Assume no. 4 died before 6 months → overestimation (4/10)
• Assume no. 4 survived 6 months → underestimation (3/10)
• Ignore no. 4 → loss of information (3/9)
Hence, censoring causes problems!
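The three naive options can be sketched in R. The slide's data table is not reproduced in this text, so the times below are hypothetical values chosen only to match the counts quoted above (three observed deaths by 6 months, patient no. 4 censored at 3.2 months); they are not the lecture's data.

```r
# HYPOTHETICAL follow-up times (months) for 10 patients; not the lecture's data.
# status: 1 = death observed, 0 = censored. Patient 4 is censored at 3.2 months.
time   <- c(2.1, 7.5, 4.8, 3.2, 9.0, 5.5, 8.2, 6.7, 10.3, 7.9)
status <- c(1,   1,   1,   0,   0,   1,   1,   1,   0,    1)

# Option 1: treat the censored time as a death time (ignore censoring)
p_died <- mean(time <= 6)

# Option 2: assume the patient censored before 6 months survived past 6 months
p_survived <- mean(time <= 6 & status == 1)

# Option 3: drop the patient censored before 6 months
dropped <- status == 0 & time <= 6
p_dropped <- sum(time <= 6 & status == 1) / sum(!dropped)

c(died = p_died, survived = p_survived, dropped = p_dropped)
```

The three numbers are 4/10, 3/10 and 3/9, and none of them uses the partial information that patient no. 4 was alive at 3.2 months.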
The basic data layout is as follows:
• t - survival time (recorded whether the person experienced the event or
was censored).
• d - dichotomous variable that indicates censoring status (1 if the
person experienced the event, 0 if the person was censored).
One-Sample Nonparametric Methods
If we assume that every subject follows the same survival function
(no covariates or other individual differences), we can easily estimate S(t).
We will consider three methods for estimating the survivorship function
S(t) = P(T ≥ t)
without resorting to parametric methods.
These nonparametric estimators do not use covariates:
1 Kaplan-Meier estimator
2 Life-table (actuarial) estimator
3 Nelson-Aalen estimator, via the cumulative hazard
To estimate a survivor function with covariates, we use the Cox
proportional hazards model.
The Kaplan-Meier Estimator
The Kaplan-Meier (or KM) estimator is probably the most popular
approach. It can be justified from several perspectives:
• product-limit estimator
• likelihood justification
• redistribute-to-the-right estimator
We will start with an intuitive motivation based on conditional
probabilities, then review some of the other justifications.
Motivation:
First, consider an example where there is no censoring. The following are
times of remission (weeks) for 21 leukemia patients receiving control
treatment:
1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23
How would we estimate S(10), the probability that an individual survives
to time 10 or later?
What about S̃(8)? Is it 12/21 or 8/21?
Let's construct a table of S̃(t):

t(j)  rj  dj  cj  Ŝ(tj)
0     21  0   0   21/21=1.000
1     21  2   0   19/21=0.905
2     19  2   0   17/21=0.810
3     17  1   0   16/21=0.762
4     16  2   0   14/21=0.667
5     14  2   0   12/21=0.571
8     12  4   0    8/21=0.381
11     8  2   0    6/21=0.286
12     6  2   0    4/21=0.190
15     4  1   0    3/21=0.143
17     3  1   0    2/21=0.095
22     2  1   0    1/21=0.048
23     1  1   0    0/21=0.000

The last column gives the proportion still surviving just after t(j), that
is, (rj − dj)/21.
Empirical Survival Function: When there is no censoring, the general
formula is:
$$\tilde{S}(t) = \frac{\#\,\text{individuals with } T \ge t}{\text{total sample size}} = \frac{\#\,\text{subjects with event at or after } t}{\text{total sample size}}$$
Under this definition S̃(8) = 12/21; the table entry 8/21 is the value
just after time 8.
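The empirical survival function for the uncensored control arm can be checked with a short R sketch. The function names `emp_surv` and `emp_surv_plus` are illustrative choices, not part of any package.

```r
# Remission times (weeks) for the 21 control patients listed earlier (no censoring)
control <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)

# Proportion with T >= t, matching the definition S(t) = P(T >= t)
emp_surv <- function(t, times) mean(times >= t)

# Proportion with T > t, i.e. the value "just after" t
emp_surv_plus <- function(t, times) mean(times > t)

s10      <- emp_surv(10, control)       # 8 of 21 patients have T >= 10
s8       <- emp_surv(8, control)        # 12 of 21 patients have T >= 8
s8_after <- emp_surv_plus(8, control)   # 8 of 21 patients have T > 8
round(c(S10 = s10, S8 = s8, S8_plus = s8_after), 3)
```

The two candidate answers for time 8 differ only in whether the four patients whose remission ended at exactly 8 weeks are counted as still surviving.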
Example for leukemia data (control arm):
What if there is censoring?
Consider the treated group for the leukemia example above:
6+, 6, 6, 6, 7, 9+, 10+, 10, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+,
32+, 32+, 34+, 35+
[Note: times with + are right censored]
We know S(6) = 21/21, because everyone survived at least until time 6.
But we can't say S(7) = 17/21, because we don't know the
status of the person who was censored at time 6.
In a 1958 paper in the Journal of the American Statistical Association,
Kaplan and Meier proposed a way to nonparametrically estimate S(t),
even in the presence of censoring.
The method is based on the ideas of conditional probability.
A quick review of conditional probability:
Conditional Probability: Suppose A and B are two events. Then,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Multiplication law of probability: can be obtained from the above
relationship, by multiplying both sides by P(B):
$$P(A \cap B) = P(A \mid B)\,P(B)$$
Extension to more than 2 events: Suppose A1, A2, ..., Ak are k different
events. Then, the probability of all k events happening together can be
written as a product of conditional probabilities:
$$P(A_1 \cap A_2 \cap \dots \cap A_k) = P(A_k \mid A_{k-1} \cap \dots \cap A_1) \times P(A_{k-1} \mid A_{k-2} \cap \dots \cap A_1) \times \dots \times P(A_2 \mid A_1) \times P(A_1)$$
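As a concrete instance of the multiplication law, take the nested events A_j = {T ≥ a_j} for the first six distinct control-arm remission times a_1, ..., a_6 = 1, 2, 3, 4, 5, 8 listed earlier, with each probability estimated by a sample proportion. Because the events are nested, conditioning on all earlier events reduces to conditioning on the most recent one, and the product telescopes:

```latex
\hat{P}(T \ge 8)
  = \underbrace{\tfrac{12}{14}}_{\hat{P}(A_6 \mid A_5)}
    \cdot \underbrace{\tfrac{14}{16}}_{\hat{P}(A_5 \mid A_4)}
    \cdot \underbrace{\tfrac{16}{17}}_{\hat{P}(A_4 \mid A_3)}
    \cdot \underbrace{\tfrac{17}{19}}_{\hat{P}(A_3 \mid A_2)}
    \cdot \underbrace{\tfrac{19}{21}}_{\hat{P}(A_2 \mid A_1)}
    \cdot \underbrace{\tfrac{21}{21}}_{\hat{P}(A_1)}
  = \frac{12}{21}
```

The numerators and denominators are the counts of patients with T ≥ a_j (21, 19, 17, 16, 14, 12), so the product recovers the direct proportion 12/21.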
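Now, let's apply these ideas to estimate S(t):
Suppose T takes values only at the discrete times a_1 < a_2 < ⋯, and
a_k < t ≤ a_{k+1}. Then
$$
\begin{aligned}
S(t) &= P(T \ge a_{k+1}) \\
     &= P(T \ge a_1, T \ge a_2, \dots, T \ge a_{k+1}) \\
     &= P(T \ge a_1) \times \prod_{j=1}^{k} P(T \ge a_{j+1} \mid T \ge a_j) \\
     &= \prod_{j=1}^{k} \left[1 - P(T = a_j \mid T \ge a_j)\right] = \prod_{j=1}^{k} \left[1 - \lambda_j\right]
\end{aligned}
$$
The last line uses P(T ≥ a_1) = 1 (a_1 is the smallest possible value of T)
and writes λ_j = P(T = a_j | T ≥ a_j).
So
$$\hat{S}(t) \cong \prod_{j=1}^{k} \left(1 - \frac{d_j}{r_j}\right) = \prod_{j:\, a_j < t} \left(1 - \frac{d_j}{r_j}\right)$$
where
dj is the number of deaths at aj
rj is the number at risk at aj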
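The product formula can be evaluated directly on the treated-arm data listed earlier. The following is a minimal base-R sketch; the helper name `km_at` is an illustrative choice, and the product runs over death times strictly before t, following the convention S(t) = P(T ≥ t) used above.

```r
# Treated-arm leukemia data (status: 1 = event observed, 0 = right censored)
time   <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35)
status <- c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)

# Product-limit estimate at a single time t: product over death times a_j < t
km_at <- function(t, time, status) {
  a <- sort(unique(time[status == 1]))   # distinct death times
  a <- a[a < t]                          # keep only those strictly before t
  if (length(a) == 0) return(1)          # no deaths yet: estimate is 1
  d <- sapply(a, function(s) sum(time == s & status == 1))  # deaths at a_j
  r <- sapply(a, function(s) sum(time >= s))                # at risk at a_j
  prod(1 - d / r)
}

s6  <- km_at(6, time, status)    # no death time before 6
s7  <- km_at(7, time, status)    # one factor: 1 - 3/21
s10 <- km_at(10, time, status)   # two factors: (1 - 3/21) * (1 - 1/17)
round(c(s6, s7, s10), 3)
```

Note that the subject censored at time 6 is still counted in the risk set of 21 at time 6, which is how the censored observation contributes information without being treated as a death.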
Intuition behind the Kaplan-Meier Estimator
Think of dividing the observed time span of the study into a series of fine
intervals so that there is a separate interval for each time of death or
censoring.
Using the law of conditional probability,
$$\Pr(T \ge t) = \prod_{j} \Pr(\text{survive } j\text{-th interval } I_j \mid \text{survived to start of } I_j)$$
where the product is taken over all the intervals including or preceding
time t.
4 possibilities for each interval:
• No events (death or censoring) - the conditional probability of surviving
the interval is 1.
• Censoring only - assume the censored subjects survive to the end of the
interval, so that the conditional probability of surviving the interval is 1.
• Death, but no censoring - the conditional probability of not surviving the
interval is # deaths (d) divided by # ‘at risk’ (r) at the beginning of
the interval. So the conditional probability of surviving the interval is
1 − (d/r).
• Tied deaths and censoring - assume the censored subjects last to the end
of the interval, so that the conditional probability of surviving the
interval is still 1 − (d/r).
General Formula for jth interval:
We can write a general formula for the conditional probability of surviving
the j-th interval that holds for all 4 cases:
$$1 - \frac{d_j}{r_j}$$
We could use the same approach by grouping the event times into
intervals (say, one interval for each month), and then counting up the
number of deaths (events) in each to estimate the probability of surviving
the interval (this is called the life-table estimate).
However, the assumption that those censored last until the end of the
interval wouldn't be quite accurate, so we would end up with a cruder
approximation.
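To see how grouping coarsens the estimate, here is a hedged sketch of the crude grouped version described above, applied to the treated-arm data. The 5-week interval width is an arbitrary illustrative choice, and this is not the standard actuarial estimator, which usually adjusts the risk set for censoring within the interval (commonly r − c/2); no such adjustment is made here.

```r
# Treated-arm leukemia data (status: 1 = event observed, 0 = right censored)
time   <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35)
status <- c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)

width  <- 5                          # assumed interval width (weeks)
starts <- seq(0, 35, by = width)     # intervals [0,5), [5,10), ..., [35,40)

# At risk at the start of each interval, and deaths falling inside it
r <- sapply(starts, function(s) sum(time >= s))
d <- sapply(starts, function(s) sum(status == 1 & time >= s & time < s + width))

# Censored subjects are treated as lasting to the end of their interval,
# so each interval contributes the same factor 1 - d/r
S_grouped <- cumprod(1 - d / r)

data.frame(start = starts, end = starts + width, r = r, d = d,
           S_end = round(S_grouped, 3))
```

With these intervals the grouped estimate after the last death is about 0.478, compared with 0.448 from the exact product over individual death times, which illustrates the cruder approximation.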
As the intervals get finer and finer, the approximations made in estimating
the probabilities of getting through each interval become smaller and
smaller, so that the estimator converges to the true S(t).
This intuition clarifies why an alternative name for the KM is the product
limit estimator.
The Kaplan-Meier estimator is
$$\hat{S}(t) = \prod_{j:\, \tau_j < t} \left(1 - \frac{d_j}{r_j}\right)$$
where
• τ1, ⋯, τK is the set of K distinct death times observed in the sample.
• dj is the number of deaths at τj.
• rj is the number of individuals “at risk” right before the j-th death
time (everyone who dies or is censored at or after that time).
• cj is the number of censored observations between the j-th and
(j+1)-st death times. Censorings tied at τj are included in cj.
Note: two useful formulas are:
1 $r_j = r_{j-1} - d_{j-1} - c_{j-1}$
2 $r_j = \sum_{l \ge j} (c_l + d_l)$
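Both formulas for rj can be checked numerically on the treated-arm data. This is a small base-R sketch; the variable names (`r_direct`, `r_recur`, `r_sum`, `cens`) are illustrative, and the recursion is started from the directly counted r1 because these data have no censoring before the first death time.

```r
# Treated-arm leukemia data (status: 1 = event observed, 0 = right censored)
time   <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35)
status <- c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)

tau <- sort(unique(time[status == 1]))   # distinct death times tau_1 < ... < tau_K
K   <- length(tau)
d   <- sapply(tau, function(s) sum(time == s & status == 1))   # deaths at tau_j

# c_j: censorings in [tau_j, tau_{j+1}), so ties with tau_j are included
next_tau <- c(tau[-1], Inf)
cens <- mapply(function(lo, hi) sum(status == 0 & time >= lo & time < hi),
               tau, next_tau)

# Direct count of the risk set just before each death time
r_direct <- sapply(tau, function(s) sum(time >= s))

# Formula 1: r_j = r_{j-1} - d_{j-1} - c_{j-1}
r_recur <- numeric(K)
r_recur[1] <- r_direct[1]
for (j in seq_len(K)[-1]) {
  r_recur[j] <- r_recur[j - 1] - d[j - 1] - cens[j - 1]
}

# Formula 2: r_j = sum over l >= j of (c_l + d_l)
r_sum <- rev(cumsum(rev(cens + d)))

rbind(tau = tau, d = d, c = cens, r_direct = r_direct, r_recur = r_recur, r_sum = r_sum)
```

All three versions of rj agree (21, 17, 15, 12, 11, 7, 6), which is a quick sanity check when building a Kaplan-Meier table by hand.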
Calculating the KM - Cox and Oakes example
Make a table with a row for every death or censoring time:

t(j)  dj  cj  rj  1-dj/rj       Ŝ(t(j))
0     0   0   21  1             1
6     3   1   21  18/21=0.857   1x0.857=0.857
7     1   0   17  16/17=0.941   0.857x0.941=0.807
9     0   1   16  1             0.807x1=0.807
10    1   1   15  14/15=0.933   0.807x0.933=0.753
11    0   1   13  1             0.753x1=0.753
13    1   0   12  11/12=0.917   0.753x0.917=0.691
16    1   0   11  10/11=0.909   0.691x0.909=0.628
17    0   1   10  1             0.628x1=0.628
19    0   1    9  1             0.628x1=0.628
20    0   1    8  1             0.628x1=0.628
22    1   0    7  6/7=0.857     0.628x0.857=0.538
23    1   0    6  5/6=0.833     0.538x0.833=0.448
25    0   1    5  1             0.448x1=0.448
32    0   2    4  1             0.448x1=0.448
34    0   1    2  1             0.448x1=0.448
35    0   1    1  1             0.448x1=0.448

The products above use rounded intermediate values. Carrying full
precision gives 0.690 at t = 13 and 0.627 at t = 16 through 20.
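The table can be reproduced without intermediate rounding using base R. This is a sketch with illustrative column names; here `c` counts censorings at exactly each listed time, matching the row-per-time layout of the table.

```r
# Treated-arm leukemia data (status: 1 = event observed, 0 = right censored)
time   <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35)
status <- c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)

t_j <- sort(unique(time))                                      # every death or censoring time
d_j <- sapply(t_j, function(s) sum(time == s & status == 1))   # deaths at t_j
c_j <- sapply(t_j, function(s) sum(time == s & status == 0))   # censorings at t_j
r_j <- sapply(t_j, function(s) sum(time >= s))                 # at risk at t_j

surv_factor <- 1 - d_j / r_j        # conditional survival; equals 1 when d_j = 0
S_hat <- cumprod(surv_factor)       # running product, full precision

km_table <- data.frame(t = t_j, d = d_j, c = c_j, r = r_j,
                       factor = round(surv_factor, 3),
                       S = round(S_hat, 3))
km_table
```

Rounding only at the end gives 0.690 at t = 13 and 0.627 at t = 16, slightly different from a hand calculation that multiplies rounded factors.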
Note that:
• Ŝ(t+) only changes at death (failure) times
• Ŝ(t+) is 1 up to the first death time
• Ŝ(t+) only goes to 0 if the last event is a death.
KM plot for treated leukemia patients
Note: most statistical software packages summarize the KM survival
function at τj+, i.e., just after the time of the j-th failure.
In other words, they provide Ŝ(τj+).
When there is no censoring, the empirical survival estimate would then be:
$$\tilde{S}(t^+) = \frac{\#\,\text{individuals with } T > t}{\text{total sample size}}$$
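As a check on this statement, the product of the factors 1 − dj/rj on the uncensored control-arm data should collapse to the proportion with T > t at each event time. A minimal base-R sketch, with illustrative variable names:

```r
# Control-arm remission times (weeks), no censoring
control <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)

tau <- sort(unique(control))                              # distinct event times
d   <- sapply(tau, function(s) sum(control == s))         # events at each time
r   <- sapply(tau, function(s) sum(control >= s))         # at risk at each time

km_plus  <- cumprod(1 - d / r)                            # KM value just after each tau_j
emp_plus <- sapply(tau, function(s) mean(control > s))    # proportion with T > tau_j

agree <- isTRUE(all.equal(km_plus, emp_plus))
data.frame(tau = tau, km_plus = round(km_plus, 3), emp_plus = round(emp_plus, 3))
```

The two columns agree at every event time because, without censoring, each factor is (rj − dj)/rj and the product telescopes to (number with T > τj)/n.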
How to get this in R
## Load the survival package
> library(survival)
## Data: follow-up times and event indicator (1 = event, 0 = censored)
> time <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32,
+           32, 34, 35)
> status <- c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)
> data <- data.frame(time, status)
## Survival object (a "+" marks a right-censored time)
> surv.object <- Surv(time, status)
> surv.object
[1] 6+ 6 6 6 7 9+ 10+ 10 11+ 13 16 17+ 19+ 20+ 22 23 25+
[18] 32+ 32+ 34+ 35+
## Fit the Kaplan-Meier estimator (intercept-only formula, no covariates)
> fit <- survfit(surv.object ~ 1)
> fit
Call: survfit(formula = surv.object ~ 1)
  n events median 0.95LCL 0.95UCL
 21      9     23      16      NA
> summary(fit)
Call: survfit(formula = surv.object ~ 1)
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    6     21       3    0.857  0.0764        0.720        1.000
    7     17       1    0.807  0.0869        0.653        0.996
   10     15       1    0.753  0.0963        0.586        0.968
   13     12       1    0.690  0.1068        0.510        0.935
   16     11       1    0.627  0.1141        0.439        0.896
   22      7       1    0.538  0.1282        0.337        0.858
   23      6       1    0.448  0.1346        0.249        0.807
##Survival plot
> plot(fit,xlab="t",ylab="S(t)")