
Week 1 - Introduction

The document is an introduction to a course on Survival Analysis, taught by Jerry Dwi Trijoyo Purnomo and Diaz F. Aksioma at Institut Teknologi Sepuluh Nopember, covering fundamental concepts, methods, and applications of survival analysis. It includes a detailed course outline, evaluation criteria, and descriptions of various types of censoring in survival data. The course aims to equip students with the skills to analyze time-to-event data and interpret survival functions.


Introduction to Survival Analysis
Jerry Dwi Trijoyo Purnomo
Institut Teknologi Sepuluh Nopember (ITS)
Surabaya - Indonesia
www.its.ac.id

Instructors:
• Jerry D.T. Purnomo, Ph.D.
• Diaz F. Aksioma, Ph.D.

Class Meeting:
• Thursday, 13.30-16.00

Credit:
• Three (3) credits

Jerry Dwi Trijoyo Purnomo
(B.Sc.-ITS; M.Sc.-ITS; Ph.D.-NCTU, Taiwan)

Education:
B.Sc. : Statistics, Institut Teknologi Sepuluh Nopember, Indonesia, 2003
M.Sc. : Statistics, Institut Teknologi Sepuluh Nopember, Indonesia, 2005
Ph.D. : Biostatistics and Bioinformatics, National Chiao Tung University, Taiwan, 2018

Professional Memberships and Positions:
- Phi Tau Phi Honorary Member, 2018 – present
- Head of the MT ITS postgraduate study program, 2020 – 2024

Textbooks
1. Kleinbaum, D.G., and Klein, M. (2012). Survival Analysis, third edition, Springer Science and Business Media, LLC.
2. Hosmer, D.W., Lemeshow, S., and May, S. (2008). Applied Survival Analysis, John Wiley & Sons, Inc., Hoboken, New Jersey.
3. Klein, J.P., and Moeschberger, M.L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, second edition, Springer, New York.
4. Cox, D.R., and Oakes, D. (1984). Analysis of Survival Data, University Printing House, Cambridge.
5. Le, C.T. (1997). Applied Survival Analysis, John Wiley & Sons.
6. Purnami, S.W., Andari, S., Prastyo, D.D., and Purnomo, J.D.T. (2024). Analisis Survival dan Aplikasinya Menggunakan R. ITS Press.

Course Outline
Week 1 – Introduction to Survival Analysis: basic concepts and censored data.
Week 2 – Survival Function: parametric survival function, Kaplan-Meier survival curve, hazard rate.
Weeks 3 and 4 – Log-Rank (LR) Test: LR test for two groups and for more than two groups.
Weeks 5, 6 and 7 – Parametric survival regression: exponential, Weibull, and log-logistic regression.
Week 8 – Midterm

Course Outline Cont.
Week 9-10 – Cox proportional hazards (PH) model: estimation, hazard ratio, interval estimation
Week 10-11 – Evaluating the PH assumption: graphical and goodness-of-fit approaches
Week 12-13 – Testing the PH assumption using time-dependent covariates
Week 14-15 – Stratified Cox procedure
Week 16 – Final Exam

Evaluation
• Midterm = 25%
• HW and/or quiz = 25%

What is Survival Analysis? (1/3)
• Survival analysis is a collection of statistical procedures for analyzing data in which the outcome variable of interest is time until an event occurs.
• By time, we mean years, months, weeks, or days from the beginning of follow-up of an individual until an event occurs; alternatively, time can refer to the age of an individual when an event occurs.
• By event, we mean death, disease incidence, relapse from remission, recovery (e.g., return to work), or any designated experience of interest that may happen to an individual.

What is Survival Analysis? (2/3)
• Although more than one event may be considered in the same
analysis, we will assume that only one event is of designated interest.
• When more than one event is considered (e.g., death from any of
several causes), the statistical problem can be characterized as either
a recurrent event or a competing risk problem.

What is Survival Analysis? (3/3)
• In a survival analysis, we usually refer to the time variable as
survival time, because it gives the time that an individual has
“survived” over some follow-up period. We also typically refer to
the event as a failure, because the event of interest usually is
death, disease incidence, or some other negative individual
experience.

Applications (1/2)
1. Leukemia patients/time in remission (weeks)
2. Elderly (60+) population/time until death (years)
3. Parolees (recidivism study)/time until rearrest (weeks)
4. Heart transplants/time until death (months)
5. Smoking study/time to first smoking (age)
6. Sociology/birth of the first child
7. Labor economics/time to unemployment (James Heckman: Nobel
prize winner 2000)
8. Industrial statistics – reliability

Applications (2/2)
• Sociology: T = time from marriage to birth of the first child
• Labor economics: T = time from employment to unemployment

Important Descriptive Measures for T
Key points:
➢Understand the meaning and application of each measure.
➢Understand the mathematical relationship between different
measures.

Summary
• Outcome: time until an event occurs (T > 0)
  Start of follow-up → Time → Event
• Event: death, disease, relapse, recovery
• Assume 1 event
• >1 event: recurrent event or competing risk
• Time ≡ survival time
• Event ≡ failure

Censored Data (1/2)
• Censoring occurs when we have some information about
individual survival time, but we don’t know the survival time
exactly.

Censored Data (2/2)
Three reasons why censoring may occur:
1. study ends – no event
2. lost to follow-up (drop out)
3. withdraws from the study for a reason other than the event of interest

Types of censored data
• Right censoring (includes type I censoring)
• Left censoring
• Interval censoring
• Double censoring

Right Censoring
• Right censoring occurs when a subject leaves the study before an
event occurs, or the study ends before the event has occurred.
• Example: Suppose you’re conducting a study on pregnancy duration.
You’re ready to complete the study and run your analysis, but some
women in the study are still pregnant, so you don’t know exactly how
long their pregnancies will last.

Illustration for Right Censoring
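[Figure: timelines for subjects 1, 2, 3, 4, ..., n between study begin (January 15, 2005) and study end (December 20, 2012), marked with survival time T, censoring time C, and the symbols o and x. Subject 3 is lost to follow-up; subject 4 withdraws.]

Notations (Right Censoring)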
• Let C be the censoring variable: the time from the beginning to the end of the study
• Observed variables:
X = min(T, C) and δ = I(T ≤ C)
δ = 1 → X = T, T ≤ C
δ = 0 → X = C, T > C
• Objective: recover the information about T from a random sample of (X, δ)
• Common assumption: T ⊥ C (independent censoring)
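
The notation above can be sketched in a few lines of Python. This is only an illustration: the function name `right_censor` and the event and censoring times are hypothetical, and in real data the latent T is never available for censored subjects.

```python
def right_censor(event_times, censor_times):
    """Return (X, delta) pairs with X = min(T, C) and delta = I(T <= C)."""
    observed = []
    for t, c in zip(event_times, censor_times):
        x = min(t, c)                 # the time we actually get to see
        delta = 1 if t <= c else 0    # 1 = event observed, 0 = right-censored
        observed.append((x, delta))
    return observed

# Hypothetical latent event times T and censoring times C, in months.
T = [5.0, 12.0, 30.0, 8.0]
C = [24.0, 10.0, 24.0, 8.0]
observed = right_censor(T, C)
for x, delta in observed:
    print(x, delta)
```

The second and third subjects are censored (δ = 0), so their recorded time is C rather than T.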

Left Censoring
• The “failure” occurred before a particular time.
• Turnbull and Weiss (1978) report part of a study conducted at the Stanford-Palo Alto Peer Counseling Program (see Hamburg et al. [1975] for details of the study). In this study, 191 California high school boys were asked, “When did you first use marijuana?” The answers were the exact ages (uncensored observations) or “I have used it but cannot recall just when the first time was,” which is a left-censored observation.

Illustration for Left Censoring
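[Figure: timelines for subjects 1 and 2 between study begin (January 15, 2005) and study end (December 20, 2012), marked with survival time T, censoring time C, and the symbols o and x.]

Notations (Left Censoring)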
• Let C be the left-censoring variable; in the marijuana example, the age at recruitment
• Observed variables:
X = max(T, C) and δ = I(T ≥ C)
δ = 1 → X = T, T ≥ C
δ = 0 → X = C, T < C
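
As a minimal sketch of the left-censoring notation, the snippet below applies X = max(T, C) and δ = I(T ≥ C). The helper name `left_censor` and the ages are hypothetical, chosen only to show one exact and one left-censored outcome.

```python
def left_censor(event_times, censor_times):
    """Return (X, delta) pairs with X = max(T, C) and delta = I(T >= C)."""
    return [(max(t, c), 1 if t >= c else 0)
            for t, c in zip(event_times, censor_times)]

# Hypothetical ages in years: T = age at the event, C = left-censoring age.
T = [15.0, 13.0, 16.0]
C = [14.0, 14.0, 16.0]
left_observed = left_censor(T, C)
print(left_observed)
```

For the second subject the event happened before C, so only the bound X = C is recorded with δ = 0.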

Double Censoring
• Both left and right censoring.
• A rare case.
• Turnbull and Weiss (1978) report part of a study conducted at the Stanford-Palo Alto Peer Counseling Program (see Hamburg et al. [1975] for details of the study). In this study, 191 California high school boys were asked, “When did you first use marijuana?” The answers were the exact ages (uncensored observations); “I never used it,” which are right-censored observations at the boys’ current ages; or “I have used it but cannot recall just when the first time was,” which is a left-censored observation.

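Illustration for Double Censoring
[Figure: timelines for subjects 1, 2, 3, ..., n between study begin (January 15, 2005) and study end (December 20, 2012), marked with survival time T, censoring times C, and the symbols o and x.]

Notations (Double Censoring)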
• Let (Cl, Cr) be the left and right censoring variables, respectively.
• Observed variables: X=max{min(T, Cr), Cl}
δ=1 if X=T and Cl<T<Cr → exact
δ=0 if X=Cr and T>Cr → right censoring
δ=-1 if X=Cl and T<Cl → left censoring
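
The three-way indicator can be sketched as follows. The function name `double_censor` and the ages are hypothetical; treating a tie at either bound as an exact observation is an assumption of this sketch, since the slide uses strict inequalities.

```python
def double_censor(t, c_left, c_right):
    """Return (X, delta) for one subject, with X = max(min(T, Cr), Cl)."""
    if c_left > c_right:
        raise ValueError("expected c_left <= c_right")
    x = max(min(t, c_right), c_left)
    if t > c_right:
        delta = 0     # right-censored: X = Cr
    elif t < c_left:
        delta = -1    # left-censored: X = Cl
    else:
        delta = 1     # exact: X = T (ties at a bound counted as exact here)
    return x, delta

# Hypothetical ages in years, with Cl = 12 and Cr = 17 for every subject.
exact = double_censor(15.0, 12.0, 17.0)
right = double_censor(20.0, 12.0, 17.0)
left = double_censor(10.0, 12.0, 17.0)
print(exact, right, left)
```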

Interval Censoring
• We know the “failure” occurred within some given time period.
• If we don’t know exactly when some students used marijuana but we
know it was within some interval of time, these observations would
be interval-censored.

Terminology and Notation
• T = survival time (T ≥ 0); T is a random variable
• t = specific value for T
• δ = (0, 1) random variable: δ = 1 if failure, δ = 0 if censored (causes of censoring: study ends, lost to follow-up, withdraws)
• S(t) = P(T > t)= survivor function
• h(t) = hazard function

Survival Curve (1/2)
Theoretical S(t):

Properties:
• They are nonincreasing; that is, they head downward as t increases
• at time t = 0, S(t) = S(0) = 1
• at time t = ∞, S(t) = S(∞) = 0

Survival Curve (2/2)
Ŝ(t) in practice:

• In practice, when using actual data, we usually obtain graphs that are step functions.
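
To see why the estimated curve is a step function, here is a minimal sketch of the Kaplan-Meier estimator, which the course outline lists for Week 2 but which is not derived in this section. The function name `kaplan_meier` and the data are hypothetical. The estimate changes only at observed event times and stays flat in between.

```python
def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct observed event time.

    times  : observed times X
    events : 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # The curve drops only where at least one event occurs.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Hypothetical observed times (weeks) and event indicators.
curve = kaplan_meier([6, 6, 7, 9, 10], [1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 4))
```

Censored subjects never cause a drop; they only shrink the risk set used at later event times.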

Hazard Function (1/3)
• The hazard function is given by
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
• This mathematical formula is difficult to explain in practical terms.
• The hazard function h(t) gives the instantaneous potential per unit time for the
event to occur, given that the individual has survived up to time t.
• Note that, in contrast to the survivor function, which focuses on not failing, the hazard function focuses on failing, that is, on the event occurring (S(t): not failing vs. h(t): failing)

Hazard Function (2/3)
• The numerator: conditional probability.
• Because of the “given” sign (|) in the numerator, the hazard function is sometimes called a conditional failure rate.
• Rate: 0 to ∞
• The value obtained will give a different number depending on the units of time
used, and may even give a number larger than one.
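
A small numeric sketch of the units point, assuming a constant hazard λ, for which the survivor function is exp(−λt). The value 0.2 per day is hypothetical. Re-expressing the same hazard per week gives a number above one, yet the survival probability over the same span is unchanged.

```python
import math

hazard_per_day = 0.2                    # hypothetical constant hazard, per person-day
hazard_per_week = hazard_per_day * 7    # the same hazard expressed per person-week

# Survival over the same two-week span, computed in either unit.
surv_days = math.exp(-hazard_per_day * 14)    # t = 14 days
surv_weeks = math.exp(-hazard_per_week * 2)   # t = 2 weeks

print(hazard_per_week)        # a rate greater than 1
print(surv_days, surv_weeks)  # equal up to floating-point rounding
```

The hazard is a rate, not a probability, which is why it can exceed one while S(t) stays between 0 and 1.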

Hazard Function (3/3)
• h(t) ≥ 0: it is always nonnegative, that is, equal to or greater than zero
• h(t) has no upper bound

S(t) vs. h(t)
S(t): directly describes survival
h(t):
• A measure of instantaneous potential
• Can be used to identify a specific model form
• The vehicle for mathematical modeling of survival data

Relationship of S(t) and h(t)
• If you know one, you can determine the other.
• General formulae:
$$S(t) = \exp\left[-\int_0^t h(u)\,du\right]$$
$$h(t) = -\left[\frac{dS(t)/dt}{S(t)}\right]$$

• The first of these formulae describes how the survivor function S(t) can be
written in terms of an integral involving the hazard function.
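
Both directions of the relationship can be checked numerically. The sketch below assumes a Weibull model with hazard h(t) = λp·t^(p−1) and survivor function S(t) = exp(−λt^p); the parameter values and function names are hypothetical, and the integral and derivative are approximated with simple numerical rules.

```python
import math

LAM, P = 0.1, 1.5   # hypothetical Weibull parameters

def hazard(t):
    return LAM * P * t ** (P - 1)

def survival_closed_form(t):
    return math.exp(-LAM * t ** P)

def survival_from_hazard(t, steps=10_000):
    """S(t) = exp(-integral_0^t h(u) du), integral by the midpoint rule."""
    width = t / steps
    cumulative = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-cumulative)

def hazard_from_survival(t, eps=1e-5):
    """h(t) = -(dS/dt) / S(t), derivative by a central difference."""
    slope = (survival_closed_form(t + eps) - survival_closed_form(t - eps)) / (2 * eps)
    return -slope / survival_closed_form(t)

print(survival_from_hazard(2.0), survival_closed_form(2.0))
print(hazard_from_survival(2.0), hazard(2.0))
```

Each printed pair should agree to several decimal places, which is the sense in which knowing one function determines the other.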

Goal of Survival Analysis (1/2)
The basic goals of survival analysis:
• To estimate and interpret survivor and/or hazard functions from survival data
• To compare survivor and/or hazard functions
• To assess the relationship of explanatory variables to survival time

Goal of Survival Analysis (2/2)

• Note that up to 6 weeks, the survivor function for the treatment group lies above
that for the placebo group, but thereafter the two functions are at about the same
level.
• This dual graph indicates that up to 6 weeks the treatment is more effective for
survival than the placebo but has about the same effect thereafter.

Basic Data Layout

Example

Descriptive Measure (1/2)

Descriptive Measure (2/2)
• Placebo hazard > treatment hazard: suggests that treatment is more
effective than placebo
• Descriptive measures give overall comparison; they do not give
comparison over time.

Estimated Survivor Curves

Multivariable Example

Measure of Effect:
• Linear regression: regression coefficient β
• Logistic regression: odds ratio exp(β)
• Survival analysis: hazard ratio exp(β)
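
As a brief illustration of the hazard ratio, the sketch below assumes a proportional hazards form h(t | x) = h0(t)·exp(βx), which this section does not spell out (the Cox PH model is scheduled for later weeks). The function name and the coefficient value are hypothetical.

```python
import math

def hazard_ratio(beta, x1, x0):
    """Hazard ratio comparing covariate value x1 with x0
    when h(t | x) = h0(t) * exp(beta * x); the baseline h0(t) cancels."""
    return math.exp(beta * (x1 - x0))

beta = math.log(2)               # hypothetical coefficient
hr = hazard_ratio(beta, 1, 0)    # e.g. x = 1 versus x = 0
print(hr)
```

With β = ln 2, the group coded x = 1 has about twice the hazard of the group coded x = 0 at every time t.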

Censoring Assumption
Three assumptions about censoring:
• Independent (vs. non-independent) censoring
• Random (vs. non-random) censoring
• Non-informative (vs. informative) censoring

Independent Censoring
• Most useful
• Affects validity
• Independent censoring is random censoring conditional on each
level of covariates.

Non-informative Censoring
• Non-informative censoring occurs if the distribution of survival
times (T) provides no information about the distribution of
censorship times (C), and vice versa.
