Introduction to sample size and power calculations
How much chance do we have to reject the null hypothesis when the alternative is in fact true? (What's the probability of detecting a real effect?) Can we quantify how much power we have for given sample sizes?
Study 1: 263 cases, 1241 controls

[Figure: null distribution and clinically relevant alternative for the difference]

Null distribution: difference = 0. Rejection region: any value ≥ 6.5 (0 + 3.3×1.96). For a 5% significance level, the one-tail area = 2.5% (Z_α/2 = 1.96).
Clinically relevant alternative: difference = 10%.
Power = chance of being in the rejection region if the alternative is true = area to the right of this line (shown in yellow).
Study 1: 263 cases, 1241 controls

Rejection region: any value ≥ 6.5 (0 + 3.3×1.96).

Power here:
$$P\left(Z > \frac{6.5 - 10}{3.3}\right) = P(Z > -1.06) = 85\%$$

Power = chance of being in the rejection region if the alternative is true = area to the right of this line (shown in yellow).
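A quick numerical check of this slide's arithmetic, sketched in Python (assuming scipy is available; the 3.3 standard error is taken from the slide):

```python
from scipy.stats import norm

se = 3.3                 # standard error of the difference (from the slide)
crit = 0 + 1.96 * se     # critical value under the null: ~6.5
alt = 10                 # clinically relevant alternative difference
power = norm.sf((crit - alt) / se)  # area to the right of the critical value
print(round(crit, 1), round(power, 2))  # 6.5 0.86 (the slide rounds to ~85%)
```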
Study 1: 50 cases, 50 controls

With only 50 per group, the standard error of the difference grows to 10, so the critical value = 0 + 10×1.96 = 20 (Z_α/2 = 1.96; one-tail area = 2.5%). Power is closer to 15% now.
Study 2: 18 treated, 72 controls, SD = 2

Critical value = 0 + 0.52×1.96 ≈ 1.
Clinically relevant alternative: difference = 4 points.
Power is nearly 100%!
Study 2: 18 treated, 72 controls, SD = 10

Critical value = 0 + 2.58×1.96 ≈ 5.
Power is about 40%.
Study 2: 18 treated, 72 controls, effect size = 1.0

Critical value = 0 + 0.52×1.96 ≈ 1.
Clinically relevant alternative: difference = 1 point.
Power is about 50%.
Factors Affecting Power
1. Size of the effect
2. Standard deviation of the characteristic
3. Sample size
4. Significance level desired
1. Bigger difference from the null mean
[Figure: null vs. clinically relevant alternative distributions of average weight from samples of 100]

2. Bigger standard deviation
[Figure: average weight from samples of 100]

3. Bigger sample size
[Figure: average weight from samples of 100]

4. Higher significance level
[Figure: rejection region for average weight from samples of 100]
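To see all four factors at work, here is a small Python sketch (the numbers are illustrative, not the slides' weight data) that recomputes power while varying one factor at a time:

```python
from math import sqrt
from scipy.stats import norm

def power_two_means(diff, sd, n_per_group, alpha=0.05):
    """Power to detect `diff` between two equal-sized groups,
    using the slides' one-tail rejection region at alpha/2."""
    se = sqrt(2 * sd**2 / n_per_group)    # standard error of the difference
    crit = norm.ppf(1 - alpha / 2) * se   # critical value under the null
    return norm.sf((crit - diff) / se)    # area beyond it under the alternative

base = dict(diff=10, sd=25, n_per_group=100)
print(power_two_means(**base))                          # baseline: ~0.81
print(power_two_means(**{**base, "diff": 15}))          # 1. bigger effect: ~0.99
print(power_two_means(**{**base, "sd": 40}))            # 2. bigger SD: ~0.42
print(power_two_means(**{**base, "n_per_group": 200}))  # 3. bigger n: ~0.98
print(power_two_means(**base, alpha=0.10))              # 4. higher alpha: ~0.88
```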
Sample size calculations

Based on these elements, you can write a formal mathematical equation that relates power, sample size, effect size, standard deviation, and significance level.

**We will derive these formulas formally shortly.**
Simple formula for difference in means

$$n = \frac{2\sigma^2(Z_{power} + Z_{\alpha/2})^2}{\text{difference}^2}$$

where:
n = sample size in each group (assumes equal-sized groups)
σ = standard deviation of the outcome variable
difference = effect size (the difference in means)
Z_power represents the desired power (typically .84 for 80% power)
Z_α/2 represents the desired level of statistical significance (typically 1.96)
Simple formula for difference in proportions

$$n = \frac{2\bar{p}(1-\bar{p})(Z_{power} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$$

where:
n = sample size in each group (assumes equal-sized groups)
p̄(1−p̄) = a measure of variability (similar to standard deviation)
(p1 − p2) = effect size (the difference in proportions)
Z_power represents the desired power (typically .84 for 80% power)
Z_α/2 represents the desired level of statistical significance (typically 1.96)
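The two formulas translate directly into code. A minimal sketch (the function names are mine; scipy assumed), rounding up since sample sizes must be whole people:

```python
from math import ceil
from scipy.stats import norm

def n_per_group_means(sd, difference, power=0.80, alpha=0.05):
    """n per group for a difference in means: 2*sd^2*(Zpower+Za/2)^2/diff^2."""
    z_power = norm.ppf(power)          # ~.84 for 80% power
    z_alpha = norm.ppf(1 - alpha / 2)  # ~1.96 for alpha = .05
    return ceil(2 * sd**2 * (z_power + z_alpha)**2 / difference**2)

def n_per_group_props(p_bar, difference, power=0.80, alpha=0.05):
    """n per group for a difference in proportions:
    2*p(1-p)*(Zpower+Za/2)^2/(p1-p2)^2."""
    z_power = norm.ppf(power)
    z_alpha = norm.ppf(1 - alpha / 2)
    return ceil(2 * p_bar * (1 - p_bar) * (z_power + z_alpha)**2
                / difference**2)

# exact z-values round up slightly higher than the slides' .84/1.96:
print(n_per_group_means(sd=10, difference=3))         # 175 (slides: 174)
print(n_per_group_props(p_bar=0.5, difference=0.10))  # 393 (slides: 392)
```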
Derivation of sample size formula

Recall Study 2: 18 treated, 72 controls, effect size = 1.0; critical value = 0 + .52×1.96 ≈ 1; power close to 50%.
SAMPLE SIZE AND POWER FORMULAS

Critical value = 0 + standard error(diff) × Z_α/2

Power = area to the right of
$$Z = \frac{\text{critical value} - \text{alternative difference}}{\text{standard error(diff)}}$$

e.g., here:
$$Z = \frac{1 - 1}{\text{standard error(diff)}} = 0; \quad \text{power} \approx 50\%$$

Substituting for the critical value:
$$Z = \frac{Z_{\alpha/2} \times \text{standard error(diff)} - \text{difference}}{\text{standard error(diff)}} = Z_{\alpha/2} - \frac{\text{difference}}{\text{standard error(diff)}}$$

Power is the area to the right of this Z. Equivalently, power is the area to the left of
$$\frac{\text{difference}}{\text{standard error(diff)}} - Z_{\alpha/2}$$
Since normal tables give us the area to the left by convention, we use this second form to get the correct value.
$$Z_{power} = \frac{\text{difference}}{\text{standard error(diff)}} - Z_{\alpha/2} = -Z_\beta$$

Most textbooks just call this Z_β; I'll use the term Z_power to avoid confusion: the area to the left of Z_power equals the area to the right of Z_β.
All-purpose power formula

$$Z_{power} = \frac{\text{difference}}{\text{standard error(difference)}} - Z_{\alpha/2}$$
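A sketch of this all-purpose formula as a Python function (naming is mine; scipy assumed), checked against the Study 1 numbers above:

```python
from scipy.stats import norm

def power_from_se(difference, se_diff, alpha=0.05):
    """All-purpose power formula from the slides:
    Z_power = difference/s.e.(diff) - Z_alpha/2; power = area left of Z_power."""
    z_power = difference / se_diff - norm.ppf(1 - alpha / 2)
    return norm.cdf(z_power)

# Study 1: s.e.(diff) = 3.3, alternative difference = 10
print(power_from_se(10, 3.3))  # ~0.86, matching the slide's ~85%
# Study 1 with 50 cases / 50 controls: s.e.(diff) = 10
print(power_from_se(10, 10))   # ~0.17, the slide's "power closer to 15%"
```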
Derivation of a sample size formula

Sample size is embedded in the standard error:
$$s.e.(\text{diff}) = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}$$

If group 2 is r times the size of group 1:
$$s.e.(\text{diff}) = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{rn_1}}$$
Algebra

$$Z_{power} = \frac{\text{difference}}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{rn_1}}} - Z_{\alpha/2}$$

$$Z_{power} + Z_{\alpha/2} = \frac{\text{difference}}{\sqrt{\dfrac{(r+1)\sigma^2}{rn_1}}}$$

$$(Z_{power} + Z_{\alpha/2})^2 = \frac{\text{difference}^2}{\dfrac{(r+1)\sigma^2}{rn_1}}$$

$$(r+1)\,\sigma^2(Z_{power} + Z_{\alpha/2})^2 = rn_1\,\text{difference}^2$$

$$n_1 = \frac{(r+1)}{r}\cdot\frac{\sigma^2(Z_{power} + Z_{\alpha/2})^2}{\text{difference}^2}$$

If r = 1 (equal groups), then
$$n_1 = \frac{2\sigma^2(Z_{power} + Z_{\alpha/2})^2}{\text{difference}^2}$$
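As a sanity check on the algebra, plugging the derived n1 back into the power formula should return exactly the requested power; a short sketch with illustrative numbers:

```python
from math import sqrt
from scipy.stats import norm

# illustrative inputs (not from the slides)
sd, diff, r, alpha, power = 10, 3, 1, 0.05, 0.80
z_p, z_a = norm.ppf(power), norm.ppf(1 - alpha / 2)

n1 = (r + 1) / r * sd**2 * (z_p + z_a)**2 / diff**2  # derived formula
se = sqrt(sd**2 / n1 + sd**2 / (r * n1))             # s.e. at that n1
print(norm.cdf(diff / se - z_a))                     # ~0.80, as requested
```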
Sample size formula for difference in means

$$n_1 = \frac{(r+1)}{r}\cdot\frac{\sigma^2(Z_{power} + Z_{\alpha/2})^2}{\text{difference}^2}$$

where:
n1 = size of smaller group
r = ratio of larger group to smaller group
σ = standard deviation of the characteristic
difference = clinically meaningful difference in means of the outcome
Z_power corresponds to power (.84 for 80% power)
Z_α/2 corresponds to two-tailed significance level (1.96 for α = .05)
Examples

Example 1: You want to calculate how much power you will have to see a difference of 3.0 IQ points between two groups: 30 male doctors and 30 female doctors. If you expect the standard deviation to be about 10 on an IQ test for both groups, then the standard error for the difference will be about:

$$\sqrt{\frac{10^2}{30} + \frac{10^2}{30}} \approx 2.57$$
Power formula

$$Z_{power} = \frac{d^*}{s.e.(d^*)} - Z_{\alpha/2} = \frac{d^*}{\sqrt{\dfrac{2\sigma^2}{n}}} - Z_{\alpha/2}$$

e.g., here:
$$Z_{power} = \frac{3}{\sqrt{\dfrac{2(10^2)}{30}}} - 1.96 = \frac{3}{2.57} - 1.96 \approx -.79$$

P(Z ≤ −.79) = .21; only 21% power to see a difference of 3 IQ points.
Example 2: How many people would you need to sample in each group to achieve power of 80% (corresponds to Z_power = .84)?

$$n = \frac{2\sigma^2(Z_{power} + Z_{\alpha/2})^2}{(d^*)^2} = \frac{2(10^2)(.84 + 1.96)^2}{(3)^2} \approx 174$$

174/group; 348 altogether.
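Both examples can be reproduced in a few lines (scipy assumed; numbers from the slides):

```python
from scipy.stats import norm

# Example 1: power for d* = 3 IQ points, sd = 10, 30 per group
se = (10**2 / 30 + 10**2 / 30) ** 0.5   # ~2.58
print(norm.cdf(3 / se - 1.96))          # ~0.21, i.e., 21% power

# Example 2: n per group for 80% power (Z_power = .84)
print(2 * 10**2 * (0.84 + 1.96) ** 2 / 3**2)  # ~174.2 -> 174/group
```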
Sample size needed for comparing two proportions:

Example: I am going to run a case-control study to determine if pancreatic cancer is linked to drinking coffee. If I want 80% power to detect a 10% difference in the proportion of coffee drinkers among cases vs. controls (if coffee drinking and pancreatic cancer are linked, we would expect that a higher proportion of cases would be coffee drinkers than controls), how many cases and controls should I sample? About half the population drinks coffee.
Derivation of a sample size formula:

The standard error of the difference of two proportions:
$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}$$

Here, if we assume equal sample sizes and that, under the null hypothesis, the proportion of coffee drinkers is .5 in both cases and controls, then:
$$s.e.(\text{diff}) = \sqrt{\frac{.5(1-.5)}{n} + \frac{.5(1-.5)}{n}} = \sqrt{.5/n}$$
$$Z_{power} = \frac{\text{test statistic}}{s.e.(\text{test statistic})} - Z_{\alpha/2}$$

Here:
$$Z_{power} = \frac{.10}{\sqrt{.5/n}} - 1.96$$

For 80% power:
$$.84 = \frac{.10}{\sqrt{.5/n}} - 1.96 \quad\Rightarrow\quad .84 + 1.96 = \frac{.10}{\sqrt{.5/n}}$$

Squaring both sides and solving for n:
$$n = \frac{.5(.84 + 1.96)^2}{(.10)^2} = 392$$
(There is 80% area to the left of a Z-score of .84 on a standard normal curve; therefore, there is 80% area to the right of −.84, which is why Z_power = .84 corresponds to 80% power.)
Would take 392 cases and 392 controls to have 80% power!
Total=784
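A quick check that n = 392 per group does deliver 80% power under these assumptions (p = .5 in both groups, 10% difference):

```python
from math import sqrt
from scipy.stats import norm

n = 392
se = sqrt(0.5 * 0.5 / n + 0.5 * 0.5 / n)  # = sqrt(.5/n)
print(norm.cdf(0.10 / se - 1.96))         # ~0.80
```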
Question 2:
How many total cases and controls would I
have to sample to get 80% power for the
same study, if I sample 2 controls for
every case?
Ask yourself, what changes here?
$$Z_{power} = \frac{\text{test statistic}}{s.e.(\text{test statistic})} - Z_{\alpha/2}$$

What changes is the standard error. With different-size groups (2 controls per case):
$$s.e.(\text{diff}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{\hat{p}(1-\hat{p})}{2n}} = \sqrt{\frac{.25}{n} + \frac{.25}{2n}} = \sqrt{\frac{.5}{2n} + \frac{.25}{2n}} = \sqrt{.75/2n}$$
$$.84 = \frac{.10}{\sqrt{.75/2n}} - 1.96 \quad\Rightarrow\quad .84 + 1.96 = \frac{.10}{\sqrt{.75/2n}}$$

$$(.10)^2(2n) = (.84 + 1.96)^2(.75) \quad\Rightarrow\quad n = \frac{.75(.84 + 1.96)^2}{2(.10)^2} = 294$$

Need: 294 cases and 2×294 = 588 controls. 882 total.
Note: you get the best power for the lowest sample size if you keep both groups equal (882 > 784).
You would only want to make groups unequal if there was an obvious difference in the cost or ease of
collecting data on one group. E.g., cases of pancreatic cancer are rare and take time to find.
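A sketch comparing the two designs (the function name is mine; p = .5 and the 10% difference come from the example), showing that equal groups reach 80% power with fewer total subjects:

```python
from math import sqrt
from scipy.stats import norm

def power_case_control(n_cases, ratio, p=0.5, diff=0.10, alpha=0.05):
    """Power with `ratio` controls per case, proportion ~p in both
    groups under the null (the slides' coffee example)."""
    se = sqrt(p * (1 - p) / n_cases + p * (1 - p) / (ratio * n_cases))
    return norm.cdf(diff / se - norm.ppf(1 - alpha / 2))

print(power_case_control(392, ratio=1))  # ~0.80 with 392+392 = 784 total
print(power_case_control(294, ratio=2))  # ~0.80 with 294+588 = 882 total
```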
General sample size formula

$$s.e.(\text{diff}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{rn} + \frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{\hat{p}(1-\hat{p}) + r\hat{p}(1-\hat{p})}{rn}} = \sqrt{\frac{(r+1)\,\hat{p}(1-\hat{p})}{rn}}$$

$$n = \frac{r+1}{r}\cdot\frac{\hat{p}(1-\hat{p})(Z_{power} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$$
General sample size needs when outcome is binary:

$$n = \frac{r+1}{r}\cdot\frac{\bar{p}(1-\bar{p})(Z_{power} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$$

where:
n = size of smaller group
r = ratio of larger group to smaller group
p̄(1−p̄) = a measure of variability (similar to standard deviation)
p1 − p2 = clinically meaningful difference in proportions of the outcome
Z_power corresponds to power (.84 for 80% power)
Z_α/2 corresponds to two-tailed significance level (1.96 for α = .05)
Compare with when outcome is continuous:

$$n_1 = \frac{(r+1)}{r}\cdot\frac{\sigma^2(Z_{power} + Z_{\alpha/2})^2}{\text{difference}^2}$$

where:
n1 = size of smaller group
r = ratio of larger group to smaller group
σ = standard deviation of the characteristic
difference = clinically meaningful difference in means of the outcome
Z_power corresponds to power (.84 for 80% power)
Z_α/2 corresponds to two-tailed significance level (1.96 for α = .05)
Question
How many subjects would we need to
sample to have 80% power to detect an
average increase in MCAT biology score
of 1 point, if the average change without
instruction (just due to chance) is plus or
minus 3 points (=standard deviation of
change)?
Standard error here:
$$\frac{\sigma_{change}}{\sqrt{n}} = \frac{3}{\sqrt{n}}$$
$$Z_{power} = \frac{\text{test statistic}}{s.e.(\text{test statistic})} - Z_{\alpha/2} = \frac{D}{\sigma/\sqrt{n}} - Z_{\alpha/2}$$

where D = change from test 1 to test 2 (the difference).

Solving for n:
$$\sqrt{n}\,D = \sigma(Z_{power} + Z_{\alpha/2}) \quad\Rightarrow\quad n = \frac{\sigma^2(Z_{power} + Z_{\alpha/2})^2}{D^2}$$

Therefore, need: (9)(1.96 + .84)²/1 ≈ 70 people total.
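The MCAT arithmetic in code (numbers from the slide; note the exact value rounds up to 71 in practice):

```python
# paired design: n = sd_change^2 * (Z_power + Z_alpha/2)^2 / D^2
sd_change, D = 3, 1
print(sd_change**2 * (1.96 + 0.84) ** 2 / D**2)  # 70.56 -> the slide's ~70
```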
Sample size for paired data:

$$n = \frac{\sigma_d^2(Z_{power} + Z_{\alpha/2})^2}{\text{difference}^2}$$

where:
n = sample size
σ_d = standard deviation of the within-pair difference
difference = clinically meaningful difference
Z_power corresponds to power (.84 for 80% power)
Z_α/2 corresponds to two-tailed significance level (1.96 for α = .05)
Paired data, difference in proportions: sample size:

$$n = \frac{\bar{p}(1-\bar{p})(Z_{power} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$$

where:
n = sample size for one group
p1 − p2 = clinically meaningful difference in dependent proportions
Z_power corresponds to power (.84 for 80% power)
Z_α/2 corresponds to two-tailed significance level (1.96 for α = .05)
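A sketch of this paired-proportion formula as a function (the example values below are hypothetical, not from the slides):

```python
from math import ceil
from scipy.stats import norm

def n_paired_props(p_bar, diff, power=0.80, alpha=0.05):
    """n = p(1-p)(Z_power + Z_alpha/2)^2 / (p1 - p2)^2, rounded up."""
    z = norm.ppf(power) + norm.ppf(1 - alpha / 2)
    return ceil(p_bar * (1 - p_bar) * z**2 / diff**2)

# hypothetical example: dependent proportions p1 = .60, p2 = .50 (p_bar = .55)
print(n_paired_props(p_bar=0.55, diff=0.10))  # ~195
```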