Introduction to
Survival Analysis
Jerry Dwi Trijoyo Purnomo
INSTITUT TEKNOLOGI SEPULUH NOPEMBER
(ITS)
Surabaya - Indonesia
www.its.ac.id
Instructors:
• Jerry D.T. Purnomo, Ph.D.
• Diaz F. Aksioma, Ph.D.
Class Meeting:
• Thursday, 13.30-16.00
Credit:
• Three (3) credits
Jerry Dwi Trijoyo Purnomo
(B.Sc.-ITS; M.Sc.-ITS; Ph.D.-NCTU, Taiwan)
Education:
B.Sc. : Statistics, Institut Teknologi Sepuluh Nopember, Indonesia, 2003
M.Sc. : Statistics, Institut Teknologi Sepuluh Nopember, Indonesia, 2005
Ph.D. : Biostatistics and Bioinformatics, National Chiao Tung University, Taiwan, 2018
Professional Memberships and Positions:
- Phi Tau Phi Honorary Member, 2018 – present
- Head of the Graduate Program, MT ITS, 2020 – 2024
Textbooks
1. Kleinbaum, D.G., and Klein, M. (2012). Survival Analysis, third edition, Springer Science
and Business Media, LLC.
2. Hosmer, D.W., Lemeshow, S., and May, S. (2008). Applied Survival Analysis, John Wiley
& Sons, Inc., Hoboken, New Jersey.
3. Klein, J.P., and Moeschberger, M.L. (2003). Survival Analysis: Techniques for Censored
and Truncated Data, second edition, Springer, New York.
4. Cox, D.R., and Oakes, D. (1984). Analysis of Survival Data, University Printing House,
Cambridge.
5. Le, C.T. (1997). Applied Survival Analysis, John Wiley & Sons.
6. Purnami, S.W., Andari, S., Prastyo, D.D., and Purnomo, J.D.T. (2024). Analisis Survival
dan Aplikasinya Menggunakan R. ITS Press.
Course Outline
Week 1 – Introduction to Survival Analysis: basic concept and censored data.
Week 2 – Survival Function: Survival Function (parametric), Kaplan-Meier survival
curve, hazard rate.
Week 3 and 4 – Log-Rank (LR) Test: LR test for two groups and for more than two groups.
Week 5, 6 and 7 – Parametric survival regression: exponential, Weibull, and
log-logistic regression.
Week 8 – Midterm
Course Outline Cont.
Week 9-10 – Cox proportional hazards (PH) model: estimation, hazard ratio, interval
estimation
Week 10-11 – Evaluation of the PH assumption: graphical and goodness-of-fit
approaches.
Week 12-13 – Testing the PH assumption using time-dependent covariates
Week 14-15 – Stratified Cox Procedure
Week 16 – Final Exam
Evaluation
• Midterm = 25%
• HW and/or quiz = 25%
What is Survival Analysis? (1/3)
• Survival analysis is a collection of statistical procedures for data analysis for
which the outcome variable of interest is time until an event occurs.
• By time, we mean years, months, weeks, or days from the beginning of
follow-up of an individual until an event occurs; alternatively, time can
refer to the age of an individual when an event occurs.
• By event, we mean death, disease incidence,
relapse from remission, recovery (e.g., return to
work) or any designated experience of interest
that may happen to an individual.
What is Survival Analysis? (2/3)
• Although more than one event may be considered in the same
analysis, we will assume that only one event is of designated interest.
• When more than one event is considered (e.g., death from any of
several causes), the statistical problem can be characterized as either
a recurrent event or a competing risk problem.
What is Survival Analysis? (3/3)
• In a survival analysis, we usually refer to the time variable as
survival time, because it gives the time that an individual has
“survived” over some follow-up period. We also typically refer to
the event as a failure, because the event of interest usually is
death, disease incidence, or some other negative individual
experience.
Applications (1/2)
1. Leukemia patients/time in remission (weeks)
2. Elderly (60+) population/time until death (years)
3. Parolees (recidivism study)/time until rearrest (weeks)
4. Heart transplants/time until death (months)
5. Smoking study/time to first smoking (age)
6. Sociology/birth of the first child
7. Labor economics/time to unemployment (James Heckman: Nobel
prize winner 2000)
8. Industrial statistics – reliability
Applications (2/2)
• Sociology: T = time from marriage to birth of the first child
• Labor economics: T = time from employment to unemployment
Important Descriptive Measures for T
Key points:
➢Understand the meaning and application of each measure.
➢Understand the mathematical relationship between different
measures.
Summary
• Outcome: time until an event occurs (T > 0), measured from the start of follow-up to the event
• Event: death, disease, relapse, recovery
• Assume 1 event
• >1 event: recurrent event or competing risk
• Time ≡ survival time
• Event ≡ failure
Censored Data (1/2)
• Censoring occurs when we have some information about
individual survival time, but we don’t know the survival time
exactly.
Censored Data (2/2)
Three reasons why censoring may occur:
1. study ends – no event
2. lost to follow-up (drop out)
3. withdraws from the study for a reason other than the event of
interest
Types of censored data
• Right censoring (e.g., type I censoring)
• Left censoring
• Interval censoring
• Double censoring
Right Censoring
• Right censoring occurs when a subject leaves the study before an
event occurs, or the study ends before the event has occurred.
• Example: Suppose you’re conducting a study on pregnancy duration.
You’re ready to complete the study and run your analysis, but some
women in the study are still pregnant, so you don’t know exactly how
long their pregnancies will last.
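Illustration for Right Censoring
[Figure: follow-up timelines for subjects 1, 2, 3, 4, ..., n between study begin (January 15, 2005) and study end (December 20, 2012). x marks the event time T and o marks the censoring time C. Subject 1: event observed (x). Subject 2: censored (o) before the event (x). Subject 3: lost to follow-up (o). Subject 4: withdraws (o).]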
Notations (Right Censoring)
• Let C be the censoring variable: the time from the beginning of follow-up to the end of the study
• Observed variables:
X = min(T, C) and δ = I(T ≤ C)
δ = 1 → X = T, T ≤ C
δ = 0 → X = C, T > C
• Objective: recover the information of T based on a random sample
of (X, δ)
• Common assumption: T ⊥ C (independent censorship)
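The construction of the observed pair under right censoring can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the function name `right_censor` and the event and censoring times are hypothetical.

```python
def right_censor(event_times, censor_times):
    """Return a list of (X, delta) pairs under right censoring.

    X = min(T, C) is the observed time and delta = 1 if T <= C
    (event observed), otherwise 0 (right-censored).
    """
    observed = []
    for t, c in zip(event_times, censor_times):
        x = min(t, c)
        delta = 1 if t <= c else 0
        observed.append((x, delta))
    return observed


# Hypothetical event times T and censoring times C (illustrative only).
T = [5.0, 12.0, 7.5, 20.0]
C = [10.0, 8.0, 7.5, 15.0]

sample = right_censor(T, C)
for x, delta in sample:
    print(x, delta)
```

In practice only the pairs in `sample` are available to the analyst; T itself is hidden whenever delta is 0.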
Left Censoring
• The “failure” occurred before a particular time.
• Turnbull and Weiss (1978) report part of a study conducted at the
Stanford-Palo Alto Peer Counseling Program (see Hamburg et al.
[1975] for details of the study). In this study, 191 California high
school boys were asked, “When did you first use marijuana?” The
answers were the exact ages (uncensored observations) or “I have
used it but cannot recall just when the first time was,” which is a
left-censored observation.
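Illustration for Left Censoring
[Figure: follow-up timelines for subjects 1 and 2 between study begin (January 15, 2005) and study end (December 20, 2012), with event times T marked x and censoring times C marked o.]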
Notations (Left Censoring)
• Let C be the left-censoring variable, e.g., the age at recruitment in the
marijuana study
• Observed variables:
X = max(T, C) and δ = I(T ≥ C)
δ = 1 → X = T, T ≥ C
δ = 0 → X = C, T < C
Double Censoring
• Both left and right censoring.
• A rare case.
• Turnbull and Weiss (1978) report part of a study conducted at the Stanford-
Palo Alto Peer Counseling Program (see Hamburg et al. [1975] for details of
the study). In this study, 191 California high school boys were asked, “When
did you first use marijuana?” The answers were the exact ages (uncensored
observations); “I never used it,” which are right-censored observations at
the boys’ current ages; or “I have used it but cannot recall just when the
first time was,” which is a left-censored observation.
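Illustration for Double Censoring
[Figure: follow-up timelines for subjects 1, 2, 3, ..., n between study begin (January 15, 2005) and study end (December 20, 2012), with event times T marked x and censoring times C marked o.]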
Notations (Double Censoring)
• Let (Cl, Cr) be the left and right censoring variables, respectively.
• Observed variables: X=max{min(T, Cr), Cl}
δ=1 if X=T and Cl<T<Cr → exact
δ=0 if X=Cr and T>Cr → right censoring
δ=-1 if X=Cl and T<Cl → left censoring
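The three cases above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function name `double_censor`, the argument names `c_left` and `c_right` (standing for Cl and Cr), and the numbers are hypothetical, and ties with a censoring bound are treated here as exact observations.

```python
def double_censor(t, c_left, c_right):
    """Return (X, delta) for one subject under double censoring.

    X = max(min(T, Cr), Cl); delta is 1 (exact), 0 (right-censored)
    or -1 (left-censored). Assumes c_left < c_right.
    """
    x = max(min(t, c_right), c_left)
    if t < c_left:
        delta = -1   # event happened before the left bound
    elif t > c_right:
        delta = 0    # event had not happened by the right bound
    else:
        delta = 1    # event time observed exactly (ties count as exact here)
    return x, delta


# Hypothetical (T, Cl, Cr) triples, illustrative only.
subjects = [(14, 10, 18), (8, 10, 18), (21, 10, 18)]
results = [double_censor(t, cl, cr) for t, cl, cr in subjects]
print(results)
```

Setting `c_left` to 0 recovers the right-censoring rule X = min(T, Cr) for positive survival times.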
Interval Censoring
• We know the “failure” occurred within some given time period.
• If we don’t know exactly when some students used marijuana but we
know it was within some interval of time, these observations would
be interval-censored.
Terminology and Notation
• T = survival time (T ≥ 0); T is a random variable
• t = specific value for T
• δ = (0, 1) random variable: δ = 1 if failure, δ = 0 if censored
(causes of censoring: study ends, lost to follow-up, withdraws)
• S(t) = P(T > t)= survivor function
• h(t) = hazard function
Survival Curve (1/2)
Theoretical S(t):
Properties:
• They are nonincreasing; that is, they head downward as t increases
• at time t = 0, S(t) = S(0) = 1
• as t → ∞, S(t) → S(∞) = 0
Survival Curve (2/2)
Ŝ(t) in practice:
• In practice, when using actual data, we usually obtain graphs that are step functions.
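One common way to obtain such a step function from right-censored data is the Kaplan-Meier (product-limit) estimator listed under Week 2 of the course outline. The sketch below is a minimal pure-Python version; the function name `kaplan_meier` and the toy data are hypothetical and chosen only for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct failure time.

    times  : observed times X = min(T, C)
    events : 1 = failure observed, 0 = right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Group all subjects sharing the same observed time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        # Failures and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical observed times and event indicators (illustrative only).
times = [2, 3, 3, 5, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Each tuple gives a time at which the estimated curve drops and the value it drops to; between those times the estimate stays flat, which is what produces the step shape.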
Hazard Function (1/3)
• The hazard function is given by
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
• This mathematical formula is difficult to explain in practical terms.
• The hazard function h(t) gives the instantaneous potential per unit time for the
event to occur, given that the individual has survived up to time t.
• Note that, in contrast to the survivor function, which focuses on not failing, the
hazard function focuses on failing, that is, on the event occurring (S(t): not failing
vs. h(t): failing).
Hazard Function (2/3)
• The numerator: conditional probability.
• Because of the “given” sign (the conditioning bar |) in the numerator, the hazard function is sometimes called a
conditional failure rate.
• Rate: 0 to ∞
• The value obtained will give a different number depending on the units of time
used, and may even give a number larger than one.
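The dependence on time units can be illustrated numerically. The sketch below approximates the limit in the hazard definition with a small Δt, using the identity P(t ≤ T < t + Δt | T ≥ t) = [S(t) − S(t + Δt)] / S(t) for a continuous survival time. The exponential survivor function and the rate of 0.5 per day are assumptions made for this example only, and `approx_hazard` is a hypothetical helper name.

```python
import math


def approx_hazard(survivor, t, dt):
    """Approximate h(t) by P(t <= T < t + dt | T >= t) / dt."""
    cond_prob = (survivor(t) - survivor(t + dt)) / survivor(t)
    return cond_prob / dt


RATE_PER_DAY = 0.5  # assumed constant hazard, per day


def s_days(t):
    """Survivor function with time measured in days."""
    return math.exp(-RATE_PER_DAY * t)


def s_weeks(t):
    """The same survivor function with time measured in weeks."""
    return math.exp(-RATE_PER_DAY * 7 * t)


h_day = approx_hazard(s_days, t=2.0, dt=1e-6)         # same instant,
h_week = approx_hazard(s_weeks, t=2.0 / 7, dt=1e-6)   # different units

print(round(h_day, 3), "per day")
print(round(h_week, 3), "per week")
```

The two numbers describe the same risk; only the unit of time differs, and the per-week value exceeds one, which a probability could not do.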
Hazard Function (3/3)
• h(t) ≥ 0: it is always nonnegative, that is, equal to or greater than zero
• h(t) has no upper bound
S(t) vs. h(t)
S(t): directly describes survival
h(t):
• A measure of instantaneous potential
• Identify specific model form
• Math model for survival analysis
Relationship of S(t) and h(t)
• If you know one, you can determine the other.
• General Formulae:
$$S(t) = \exp\left[-\int_0^t h(u)\,du\right]$$
$$h(t) = -\frac{dS(t)/dt}{S(t)}$$
• The first of these formulae describes how the survivor function S(t) can be
written in terms of an integral involving the hazard function.
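As a worked illustration of how the two general formulae fit together, the sketch below derives the first from the second using S(0) = 1, and then specializes to a constant hazard. The constant λ is an assumption introduced for this example (it corresponds to an exponential survival model) and does not appear in the slides.

```latex
% From the second formula to the first, using S(0) = 1.
\begin{align*}
h(t) &= -\frac{dS(t)/dt}{S(t)} = -\frac{d}{dt}\log S(t), \\
\int_0^t h(u)\,du &= -\bigl[\log S(t) - \log S(0)\bigr] = -\log S(t), \\
S(t) &= \exp\left[-\int_0^t h(u)\,du\right].
\end{align*}

% Assumed example: constant hazard h(t) = \lambda > 0.
\begin{align*}
S(t) &= \exp\left[-\int_0^t \lambda\,du\right] = e^{-\lambda t}, \\
-\frac{dS(t)/dt}{S(t)} &= \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda .
\end{align*}
```

Going in either direction returns the function one started from, which is the sense in which knowing one of S(t) and h(t) determines the other.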
Goal of Survival Analysis (1/2)
The basic goals of survival analysis:
• To estimate and interpret survivor and/or hazard functions from survival data
• To compare survivor and/or hazard functions
• To assess the relationship of explanatory variables to survival time
Goal of Survival Analysis (2/2)
• Note that up to 6 weeks, the survivor function for the treatment group lies above
that for the placebo group, but thereafter the two functions are at about the same
level.
• This dual graph indicates that up to 6 weeks the treatment is more effective for
survival than the placebo but has about the same effect thereafter.
Basic Data Layout
Example
Descriptive Measure (1/2)
Descriptive Measure (2/2)
• Placebo hazard > treatment hazard: suggests that treatment is more
effective than placebo
• Descriptive measures give overall comparison; they do not give
comparison over time.
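A comparison of group hazards like the one above rests on a single summary number per group. The following is a minimal sketch, assuming the usual descriptive definitions: average survival time is the sum of observed times divided by the number of subjects (ignoring censoring), and average hazard rate is the number of failures divided by the sum of observed times. The function names and both small data sets are hypothetical and are not the data from the slides.

```python
def average_survival_time(times):
    """Mean of the observed times, ignoring censoring status."""
    return sum(times) / len(times)


def average_hazard_rate(times, events):
    """Number of failures divided by total observed follow-up time."""
    return sum(events) / sum(times)


# Hypothetical observed times (weeks) and event indicators (1 = failure).
treat_times, treat_events = [6, 10, 13, 22, 25], [1, 0, 1, 1, 0]
plac_times, plac_events = [2, 4, 5, 8, 11], [1, 1, 1, 1, 1]

t_bar_treat = average_survival_time(treat_times)
t_bar_plac = average_survival_time(plac_times)
h_bar_treat = average_hazard_rate(treat_times, treat_events)
h_bar_plac = average_hazard_rate(plac_times, plac_events)

print(t_bar_treat, round(h_bar_treat, 4))
print(t_bar_plac, round(h_bar_plac, 4))
```

Both summaries collapse each group to one number, which is why they give an overall comparison but say nothing about how the groups differ at particular times.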
Estimated Survivor Curves
Multivariable Example
Measure of Effect:
• Linear regression: regression coefficient β
• Logistic regression: odds ratio exp(β)
• Survival analysis: hazard ratio exp(β)
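To make the survival-analysis entry concrete, the sketch below turns a coefficient β into a hazard ratio exp(β). The coefficient and standard error are hypothetical values, and the Wald-type interval exp(β ± 1.96·SE) is a standard large-sample approximation assumed here rather than taken from the slides; `hazard_ratio` is an illustrative helper name.

```python
import math


def hazard_ratio(beta, se=None, z=1.96):
    """Return (HR, CI) where HR = exp(beta).

    If a standard error is supplied, CI is the Wald-type interval
    (exp(beta - z*se), exp(beta + z*se)); otherwise CI is None.
    """
    hr = math.exp(beta)
    if se is None:
        return hr, None
    return hr, (math.exp(beta - z * se), math.exp(beta + z * se))


# Hypothetical fitted coefficient and standard error (illustrative only).
beta_hat = math.log(2.0)
se_hat = 0.25

hr, ci = hazard_ratio(beta_hat, se_hat)
print(round(hr, 3), tuple(round(v, 3) for v in ci))
```

A hazard ratio of 2 would mean that, at any given time, the hazard in one group is twice that in the reference group; a value of 1 corresponds to β = 0, that is, no effect.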
Censoring Assumption
Three assumptions about censoring:
• Independent (vs. non-independent) censoring
• Random (vs. non-random) censoring
• Non-informative (vs. informative) censoring
Independent Censoring
• Most useful
• Affects validity
• Independent censoring is random censoring conditional on each
level of covariates.
Non-informative Censoring
• Non-informative censoring occurs if the distribution of survival
times (T) provides no information about the distribution of
censorship times (C), and vice versa.