Statistical Principles of Experimental Design: Dov Stekel

The document discusses principles of experimental design for statistical analysis. It covers topics like blocking and randomization to reduce variability, blinding to avoid bias, and power analysis to determine sample sizes. Several examples of microarray experimental designs for different study types are provided, comparing options like paired vs unpaired samples, inclusion of references, and time series designs. The best designs balance reducing variability from external factors while maintaining power and avoiding confounding between variables.

Uploaded by

Atif Hussain

Statistical Principles of Experimental Design

Dov Stekel

Maximum information from minimum effort

Overview

Blocking and randomization

Arrangement of samples and arrays
Class exercise
How many replicates?
Computer practical

Blocking, Randomization and Blinding
Arrangement of experimental design that
minimises problems from extraneous
sources of variability
Use blocking to avoid confounding
Use randomization and blinding to avoid
bias

Toxicity Example
We are interested in characterising the
toxic effect of Benzo(a)pyrene (BP) on rats
8 Rats are to be treated with BP and 8 rats
with a control compound
Each array will be hybridized against a
reference sample
16 Arrays in the experiment

Experimental Design
There are two batches of 8 slides from two
different print runs (1 and 2)
Hybridisation will be done by two
researchers, Alison and Brian.
What is the best way to arrange the
experiment?

Design 1
Alison prepares all 8 BP samples and
hybridises them to the arrays of print run 1
Brian prepares all 8 control samples and
hybridises them to the arrays of print run 2

Design 2
Alison chooses 8 rats and treats 4 with BP and 4
with control substance.
She prepares and hybridises 2 BP samples to
arrays from print run 1 and 2 BP samples to
arrays from print run 2
She prepares and hybridises 2 control samples
to arrays from print run 1 and 2 control samples
to arrays from print run 2
Brian does the same with the other 8 rats

Design 2 (number of samples per cell)

Researcher   Print Run   Control   Treated
Alison       1           2         2
Alison       2           2         2
Brian        1           2         2
Brian        2           2         2

Design 3
8 rats are randomly assigned to Alison, along
with 4 BP preparations and 4 control
preparations. She is not told which
preparations are which.
She prepares and hybridises samples to
randomly pre-arranged arrays so that 2 BP
samples and 2 control samples are hybridised
to 4 arrays from each of print runs 1 and 2.
Brian does the same with the other 8 rats

What is wrong with design 1?

Treatment, researcher and print run are
confounded variables
We cannot tell whether differences between the
two groups of rats result from treatment,
researcher or print run
Use blocking in designs 2 and 3 to deconfound
the variability of interest (treatment) from the
extraneous variabilities (researcher and print run)
Designs 2 and 3 are also balanced which
increases power of analyses
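
The confounding in design 1 can be checked numerically. The following is a minimal sketch in R, assuming design 1 is encoded as a data frame with illustrative column names (treatment, researcher, print_run) that do not come from the slides. It shows that the design matrix has only two independent columns even though three effects are in the model, so the effects cannot be separated.

```r
# Design 1: Alison does all 8 BP samples on print run 1,
# Brian does all 8 control samples on print run 2.
design1 <- data.frame(
  treatment  = rep(c("BP", "control"), each = 8),
  researcher = rep(c("Alison", "Brian"), each = 8),
  print_run  = factor(rep(1:2, each = 8))
)

# Cross-tabulation: every BP sample is Alison's, every control is Brian's.
xt <- table(design1$treatment, design1$researcher)
print(xt)

# Rank of the design matrix: intercept plus ONE effect column,
# although three effects (treatment, researcher, print run) are requested.
X <- model.matrix(~ treatment + researcher + print_run, data = design1)
design_rank <- qr(X)$rank
print(c(columns = ncol(X), rank = design_rank))
```

A rank lower than the number of columns means some effects are aliased with each other, which is the situation the slide describes.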

What is wrong with design 2?

Alison's choice of rats may be biased
For example, she may choose the
healthiest rats, so confounding potential
treatment effects with researcher variability
Use randomization and blinding in design
3 to avoid bias
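
One way to generate an allocation like design 3 is sketched below in R. The data frame layout, the column names and the blinding codes are assumptions made for illustration; the slides only specify that rats are assigned at random, that each researcher handles 2 BP and 2 control samples on each print run, and that researchers are not told which preparation is which.

```r
set.seed(42)  # fixed seed only so that the sketch is reproducible

# Randomly assign the 16 rats to the two researchers, 8 each.
design3 <- data.frame(
  rat        = 1:16,
  researcher = sample(rep(c("Alison", "Brian"), each = 8)),
  treatment  = NA_character_,
  print_run  = NA_integer_
)

# The 8 slots within one researcher (block):
# 2 BP + 2 control on print run 1, and the same on print run 2.
cells <- data.frame(
  treatment = rep(c("BP", "control"), times = 4),
  print_run = rep(c(1L, 2L), each = 4)
)

# Shuffle the slots independently for each researcher.
for (r in c("Alison", "Brian")) {
  idx <- which(design3$researcher == r)
  shuffled <- cells[sample(nrow(cells)), ]
  design3$treatment[idx] <- shuffled$treatment
  design3$print_run[idx] <- shuffled$print_run
}

# Blinding: researchers are given only a random code per preparation.
design3$code <- sprintf("P%02d", sample(16))

counts <- table(design3$researcher, design3$treatment, design3$print_run)
print(counts)
```

Every researcher by treatment by print run cell contains two samples, so the allocation is both blocked and balanced, while the assignment of individual rats is left to chance.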

Arrangement of Samples and Arrays

Is it better to use Affymetrix arrays or a
two-colour array system?
If using a two-colour array system, is it
better to use a reference sample?
If using a two-colour array system, what is
the best arrangement of samples on the
slides?

Several Factors

Available technology
Cost
Statistical considerations
We consider the problem from the perspective of
three different experiments

Example 1:
Hepatocellular Carcinomas
Samples are taken from disease and
healthy tissue from patients suffering from
hepatocellular carcinomas and hybridised
to microarrays. We would like to identify
genes that are up- or down- regulated in
hepatocellular carcinomas relative to
healthy tissue.

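Design 1.1
[Diagram: for each patient, Healthy with Reference Sample on Array 1 and Disease with Reference Sample on Array 2; x 20]

Design 1.2
[Diagram: for each patient, Healthy on GeneChip 1 and Disease on GeneChip 2; x 20]

Design 1.3
[Diagram: for each patient, Healthy and Disease together on Array 1; x 20]

Design 1.4
[Diagram: Healthy and Disease together on one array per patient, in two groups (Healthy 1 and Disease 1 on Array 1; Healthy 11 and Disease 11 on Array 11); x 10]

Design 1.5
[Diagram: for each patient, Healthy and Disease together on Array 1 and again on Array 2; x 20]
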
Which is the best design?

Simple experiment - five different designs!
Design 1.1 is bad because it increases
variability.
Design 1.3 is bad because it confounds
colour with disease state.
Designs 1.4 and 1.5 are best.

Design 1.1
[Diagram: Healthy with Reference Sample on Array 1; Disease with Reference Sample on Array 2]
Coefficient of variability is 30%
Design increases variability to 43%

Design 1.5
[Diagram: Healthy and Disease together on Array 1 and on Array 2]
Coefficient of variability is 30%
Experimental design reduces variability to 21%
Example 2:
B-Cell Lymphomas
Samples are taken from 60 patients
suffering from B-cell lymphomas and
hybridised to microarrays. The aim of the
experiment is to identify clinically relevant
subgroups of patients using a cluster
analysis, and then to build a classification
model to differentiate between the
subgroups.

Design 2.1
[Diagram: Patient 1 and Patient 2 together on Array 1; x 30]

Design 2.2
[Diagram: Patient 1 and Reference together on Array 1; x 60]

Design 2.3
[Diagram: Patient 1 on GeneChip 1; x 60]

Which design is best?

Design 2.1 is bad because it is difficult to
compare patients on equal footing.
Designs 2.2 and 2.3 are good.
Probably most appropriate use of
Affymetrix technology.

Example 3:
Yeast Time Series
Budding yeast can reproduce sexually by
producing haploid cells through a process
called sporulation. Yeast was placed in a
sporulating medium, samples taken at 7
timepoints from the start of sporulation.
We are interested in identifying genes that
show similar profiles in the timecourse.

Design 3.1
[Diagram: each timepoint (Time 0 to Time 6) hybridised together with Time 0 on one array]

Design 3.2
[Diagram: loop design; Time 0 with Time 1, Time 1 with Time 2, and so on to Time 6 with Time 0, one array per pair]

Design 3.3
[Diagram: each timepoint (Time 0 to Time 6) hybridised to its own GeneChip]

Which is the best design?

Design 3.3 is bad because timepoint is
confounded with array.
Design 3.2 is a loop design. It is a good
design, but harder to analyse.
Design 3.1 is the best design.

Bright Timepoint Problem

Imagine we have a "bright" array. This
could be because of:
Higher gene expression
Experimental artifact

Normalising by array mean or median cannot deconfound these factors
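
A small numerical illustration of the problem, using made-up log2 intensities (the values and the helper median_centre are hypothetical):

```r
# Centre an array's log intensities on their median.
median_centre <- function(x) x - median(x)

baseline <- c(5.0, 6.5, 7.0, 8.2, 9.1)  # hypothetical log2 intensities
bright   <- baseline + 1                 # every gene reads 2-fold higher

# After median-centring, the bright array looks exactly like the baseline
# array. The data alone cannot say whether the extra brightness was higher
# gene expression or an experimental artifact.
same_after_centring <- isTRUE(all.equal(median_centre(bright),
                                        median_centre(baseline)))
print(same_after_centring)
```

If the brightness reflected a real, global rise in expression at that timepoint, median normalisation would remove it along with any artifact.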

Time Series Example

Time Series Ratios

Raw Gene Expression for FYV1

FYV1 Normalised to Array

FYV1 Normalised to Reference

Class Exercise
Two strains of Staphylococcus aureus:
methicillin-sensitive and methicillin-resistant
Each strain is cultured and then either
treated or untreated with methicillin
Samples are taken at several time points
(0h, 2h, 6h, 10h)
We want to identify genes involved in
methicillin-resistance
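
As a starting point for the exercise, the experimental conditions can be enumerated in R. This sketch only lists the factor combinations described above and does not propose an arrangement of samples on arrays; the column names and level labels are illustrative.

```r
# All combinations of strain, treatment and timepoint.
conditions <- expand.grid(
  strain    = c("sensitive", "resistant"),
  treatment = c("untreated", "methicillin"),
  time_h    = c(0, 2, 6, 10)
)

n_conditions <- nrow(conditions)  # 2 strains x 2 treatments x 4 timepoints
print(n_conditions)
print(head(conditions))
```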

How Many Replicates?

Use Power Analysis which relates:
Difference in mean we are trying to detect
Population and experimental variability
Type of analysis
Chosen significance threshold
Number of replicates

Population Inference
[Diagram: Population, Sample, Inference]

Confidence
The confidence is the probability of not getting
a false positive result.
It is the probability of accepting the null
hypothesis when the null hypothesis is true.
A false positive result is known as a Type I
Error.
We control for Type I errors explicitly by
selecting an appropriate confidence level
In microarray experiments, we must modify the
confidence level to account for multiplicity

Power
The power is the probability of not getting a false
negative result.
It is the probability of rejecting the null hypothesis
when the null hypothesis is false.
A false negative result is known as a Type II
Error.
We control the power implicitly via the confidence
level and the experimental design.

Type I and Type II Errors

TRUE SITUATION

OUR
DECISION

No effect

Effect

Not significant

Correct

Type II error

Significant

Type I error

Correct

Power Analysis Assumptions

We assume that the data are approximately
log-normally distributed
This corresponds to standard deviation of the
errors of the raw data being proportional to the
signal intensity
This is equivalent to a constant standard
deviation in the logged data
The standard deviation divided by the mean is
called the coefficient of variation
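
Under the log-normal assumption, the coefficient of variation of the raw data and the standard deviation of the logged data determine each other. A minimal sketch of the conversion in R follows. The formula is the standard log-normal relationship; the variable names are illustrative. The result is on the natural-log scale, and dividing by log(2) expresses the same spread in log2 units.

```r
cv <- c(0.20, 0.30, 0.40)        # coefficients of variation of the raw data

sd_ln   <- sqrt(log(1 + cv^2))   # sd of log(x), natural log
sd_log2 <- sd_ln / log(2)        # the same spread in log2 units

print(round(data.frame(cv, sd_ln, sd_log2), 3))
```

For small coefficients of variation the natural-log standard deviation is close to the coefficient of variation itself, which is why the two are often used interchangeably.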

Log Normally Distributed Data

Power Analysis
We will use the power.t.test() function in
R to calculate the power of one- and two-
sample tests
power.t.test(n, delta, sd,
sig.level, power, type,
alternative)

The function is called with exactly one of the first five
arguments set to NULL and calculates that unknown value
(n, delta and power are NULL by default; sd and sig.level are not)
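
A short sketch of the two most common uses of power.t.test(). The numbers are arbitrary and chosen only to show the mechanics.

```r
# Solve for power: n is given, power is left at its default of NULL.
res_power <- power.t.test(n = 10, delta = 1, sd = 0.5, sig.level = 0.05,
                          type = "two.sample", alternative = "two.sided")
print(res_power$power)

# Solve for n: power is given, n is left at its default of NULL.
res_n <- power.t.test(delta = 1, sd = 0.5, sig.level = 0.05, power = 0.9,
                      type = "two.sample", alternative = "two.sided")
print(ceiling(res_n$n))  # round up to whole replicates per group
```

For a two-sample test the returned n is the number in each group, not the total.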

Power Analysis Example:

Doxorubicin Chemotherapy
We are interested in the treatment of breast
cancer patients with doxorubicin chemotherapy
We want to perform a microarray experiment to
determine genes that are up- or down- regulated
as a result of the chemotherapy
We would like to know:
How to design the experiment?
How many patients we need?

Paired vs Unpaired Design

In a paired design, we take samples from each
patient before and after treatment, and for each
gene, look at the difference in expression before
and after treatment
In an unpaired design, we have two groups of
patients, one group treated, the other group
untreated. We look at the difference in gene
expression between the two groups
Which is a better experiment?

Paired and Unpaired Designs

Paired: test if
mean is different
from zero

Unpaired: test if
means of groups
are different

Power Analysis Assumptions

Suppose we know from a pilot study and
evaluation of our technology that the
coefficient of variation is 40%
Let's say that we want to detect genes that are
2-fold regulated
We are testing 10,000 genes so we will use a
significance threshold of 0.001 to compensate
for multiplicity
How many patients do we need for a power of
80%, 90% and 99%?
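
To see what a threshold of 0.001 means across 10,000 tests, the arithmetic can be written out as below. The Bonferroni threshold is shown only for comparison; it is not used in the slides, and the family-wise level of 0.05 is an assumption.

```r
n_genes   <- 10000
threshold <- 0.001

# Expected number of false positives if no gene is truly regulated.
expected_false_positives <- n_genes * threshold
print(expected_false_positives)

# For comparison: per-gene threshold under a Bonferroni correction
# at an assumed family-wise error rate of 0.05.
bonferroni_threshold <- 0.05 / n_genes
print(bonferroni_threshold)
```

The chosen threshold therefore tolerates about ten false positives in exchange for needing far fewer patients than a Bonferroni correction would.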

Paired Experiment
The standard deviation of the underlying normal
distribution equivalent to 40% variability is 0.39
The difference in means is log2(2) = 1
The number of patients we need is:

Power   Number
80%     8
90%     9
99%     11

Unpaired Experiment
The standard deviation and difference in
means is the same.
The number of patients we need is:
Power   Group Size   Number   1-Sample Number
80%     8            16       8
90%     10           20       9
99%     13           26       11
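
The calculation behind these tables can be sketched with power.t.test(), using the values stated in the slides (delta = 1, sd = 0.39, significance threshold 0.001). The helper solve_n and the column names are illustrative, and the output should be compared with the tables above rather than taken as a replacement for them. Note that 0.39 corresponds to sqrt(log(1 + 0.4^2)) on the natural-log scale, while the difference of 1 is in log2 units; the sketch keeps the slide's values unchanged.

```r
powers <- c(0.80, 0.90, 0.99)

# Required sample size for a given power and test type,
# with the effect size, sd and threshold taken from the slides.
solve_n <- function(power, type) {
  power.t.test(delta = 1, sd = 0.39, sig.level = 0.001, power = power,
               type = type, alternative = "two.sided")$n
}

n_paired   <- sapply(powers, solve_n, type = "one.sample")  # patients
n_unpaired <- sapply(powers, solve_n, type = "two.sample")  # patients per group

result <- data.frame(
  power          = powers,
  paired_total   = ceiling(n_paired),
  unpaired_group = ceiling(n_unpaired),
  unpaired_total = 2 * ceiling(n_unpaired)
)
print(result)
```

The paired design is treated as a one-sample test on the within-patient differences, which is how power.t.test() handles type = "paired" as well; in that case sd is interpreted as the standard deviation of the differences.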

Paired vs Unpaired
In this example, we need more than twice
the patients in the unpaired experiment to
obtain the same power as the paired
experiment
Paired experimental design is more
powerful than unpaired experimental
design because the differences between
individuals are factored out in the analysis

Conclusions
Extraneous variability:
Block to avoid confounding variables
Randomisation to avoid bias
Blocked experiments require ANOVA
analyses

Two sample experiments

Reference samples increase variability.
Hybridise both samples to same array.

Conclusions
Multiple patient comparisons
Reference samples or Affymetrix technology
enable comparisons.

Time series analysis

Reference samples are essential.

Number of replicates
Calculate using power analyses.

Computer Practical
Power analysis for population inference test
