Stats Power
Statistics Lecture 17
Statistical Power
Virginia Tech
Type I and Type II Errors
When we use a fixed-level test, i.e. we choose a particular α in advance, the test results in a firm decision. However, the decision could be the wrong one.
Sometimes we obtain an extreme sample mean, x̄, and hence an extreme test statistic, t_s, which may lie in the extreme tails of the null distribution.
Extreme observations are unlikely, but still possible under the null distribution.
A Type I error is when we reject H0 when it is in fact TRUE. Type I errors are easy to control: we simply select a smaller α prior to conducting the test. The probability of a Type I error is never greater than α.
A Type II error is when we fail to reject H0 when it is in fact FALSE.
Power: The ability to detect an effect/difference when a significant effect/difference really exists, (i.e.
correctly reject H0 when it is actually false.)
The more power in a test, the greater the chance that we will correctly detect a significant result when
one exists (i.e. when H0 is false, and HA is true).
General Idea Behind Power
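[Figure: null and alternative sampling distributions, with µ0, the observed sample mean x̄, and µA marked on the horizontal axis.]
The shaded blue area is the p-value = P(X̄ > x̄) assuming H0 is true.
The shaded red area is the power = P(X̄ > x̄) if HA is true with µA > µ0.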
The power of a hypothesis test is the probability that we detect a specified difference (i.e. when
|µA − µ0 | is large enough) in the means WHEN that difference exists.
Power depends on
- The significance level of the test, α, i.e. the Type I error rate
- The difference of interest µA − µ0
- The variance, σ²
- The sample size
But we should never increase α to increase power because then we increase our type I error rate.
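The four dependencies listed above can be checked numerically. The following is a minimal sketch, assuming a two-sided z-test with known σ. The function name `power_two_sided` and the baseline values (µ0 = 100, µA = 105, σ = 15, n = 36) are hypothetical choices for illustration only, not values from the lecture.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_two_sided(mu0, mu_a, sigma, n, alpha):
    """Power of a two-sided z-test of H0: mu = mu0 when the true mean is mu_a.

    Assumes sigma is known, so the sample mean is N(mu, sigma^2 / n).
    """
    se = sigma / sqrt(n)                        # standard error of the mean
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    lower = mu0 - z_crit * se                   # lower rejection boundary
    upper = mu0 + z_crit * se                   # upper rejection boundary
    xbar = NormalDist(mu_a, se)                 # sampling distribution under HA
    return xbar.cdf(lower) + (1 - xbar.cdf(upper))


# Hypothetical baseline, then one factor changed at a time.
base = power_two_sided(mu0=100, mu_a=105, sigma=15, n=36, alpha=0.05)
larger_alpha = power_two_sided(100, 105, 15, 36, alpha=0.10)
larger_difference = power_two_sided(100, 108, 15, 36, alpha=0.05)
smaller_sigma = power_two_sided(100, 105, 10, 36, alpha=0.05)
larger_n = power_two_sided(100, 105, 15, 100, alpha=0.05)

results = [
    ("baseline", base),
    ("larger alpha", larger_alpha),
    ("larger difference", larger_difference),
    ("smaller sigma", smaller_sigma),
    ("larger n", larger_n),
]
for label, value in results:
    print(f"{label:>18}: {value:.4f}")
```

Each of the four changes raises the power above the baseline. Raising α is the only one that does so at the cost of a higher Type I error rate.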
Suppose we have the following test
H0 : µ = 50
versus
HA : µ ≠ 50
Suppose n = 100, and we wish to conduct the test at the α = 0.05 level.
The null distribution is
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) = N\left(50, \frac{20^2}{100}\right) = N(50, 2^2)$$
The alternative distribution is
$$\bar{X} \sim N\left(\mu_a, \frac{\sigma^2}{n}\right) = N\left(56, \frac{20^2}{100}\right) = N(56, 2^2)$$
What is the Power of the Test?
When HA is true, µa = 56
$$\text{Power} = P\left(\frac{\bar{X} - \mu_a}{\sigma_{\bar{X}}} > \frac{53.92 - 56}{20/\sqrt{100}}\right) = P(Z > -1.04) = 0.8508$$
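For a two-sided test, the rejection regions on the original sampling distribution for the sample mean X̄ have the following boundaries.
$$\bar{x}_L = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$\bar{x}_U = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
[Figure: sampling distribution of X̄ under H0, with the horizontal axis marked from µ0 − 3σ/√n to µ0 + 3σ/√n and the two rejection boundaries indicated.]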
For our example where α = 0.05:
$$\bar{x}_L = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 50 - (1.96)(2) = 46.08$$
$$\bar{x}_U = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 50 + (1.96)(2) = 53.92$$
[Figure: sampling distributions on an axis from 42 to 64, with rejection boundaries at 46.08 and 53.92; power equals 85.08%.]
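The numbers in this example can be reproduced directly. The sketch below assumes the same setup (µ0 = 50, µa = 56, σ = 20, n = 100, α = 0.05) and uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative. It also evaluates the lower rejection region, which the hand calculation leaves out because its contribution is negligible when µa lies above µ0.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_a, sigma, n, alpha = 50, 56, 20, 100, 0.05

se = sigma / sqrt(n)                           # 20 / 10 = 2
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96
x_lower = mu0 - z_crit * se                    # lower rejection boundary
x_upper = mu0 + z_crit * se                    # upper rejection boundary

xbar_under_ha = NormalDist(mu_a, se)           # sampling distribution when mu = mu_a
upper_tail = 1 - xbar_under_ha.cdf(x_upper)    # the term computed by hand above
lower_tail = xbar_under_ha.cdf(x_lower)        # the term the hand calculation omits
power = upper_tail + lower_tail

print(f"boundaries: {x_lower:.2f}, {x_upper:.2f}")
print(f"upper tail: {upper_tail:.4f}")
print(f"lower tail: {lower_tail:.2e}")
print(f"power:      {power:.4f}")
```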
Now suppose we have the following test
H0 : µ = 50
versus
HA : µ ≠ 50
Suppose n = 100, and we wish to conduct the test at the α = 0.02 level.
The null distribution is
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) = N\left(50, \frac{20^2}{100}\right) = N(50, 2^2)$$
The alternative distribution is
$$\bar{X} \sim N\left(\mu_a, \frac{\sigma^2}{n}\right) = N\left(56, \frac{20^2}{100}\right) = N(56, 2^2)$$
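What is the Power of the Test?
When HA is true, µa = 56
$$\text{Power} = P\left(\frac{\bar{X} - \mu_a}{\sigma_{\bar{X}}} > \frac{54.6526 - 56}{20/\sqrt{100}}\right) = P(Z > -0.6737) = 0.7497$$
Here 54.6526 = 50 + z_{0.01}(2) = 50 + 2.3263(2) is the upper rejection boundary for α = 0.02.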
[Figure: sampling distributions on an axis from 42 to 64, with rejection boundaries at 45.35 and 54.65; power equals 74.97%.]
Suppose now that we have the following test
H0 : µ ≤ 50
versus
HA : µ > 50
Suppose n = 100, and we wish to conduct the test at the α = 0.05 level.
The null distribution is
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) = N\left(50, \frac{20^2}{100}\right) = N(50, 2^2)$$
The alternative distribution is
$$\bar{X} \sim N\left(\mu_a, \frac{\sigma^2}{n}\right) = N\left(56, \frac{20^2}{100}\right) = N(56, 2^2)$$
What is the Power of the Test?
When HA is true, µa = 56
$$\text{Power} = P\left(\frac{\bar{X} - \mu_a}{\sigma_{\bar{X}}} > \frac{53.29 - 56}{20/\sqrt{100}}\right) = P(Z > -1.355) = 0.9123$$
Note: We only have one rejection region with critical value in the original sampling distribution X̄ given by
$$\bar{x}_c = \mu_0 + z_{\alpha}\frac{\sigma}{\sqrt{n}} = 50 + 1.645(2) = 53.29 \quad \text{where } \alpha = 0.05$$
[Figure: sampling distributions on an axis from 42 to 64, with the critical value at 53.29.]
Suppose now that we have the following test
H0 : µ ≥ 50
versus
HA : µ < 50
Suppose n = 100, and we wish to conduct the test at the α = 0.05 level.
The null distribution is
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) = N\left(50, \frac{20^2}{100}\right) = N(50, 2^2)$$
The alternative distribution is
$$\bar{X} \sim N\left(\mu_a, \frac{\sigma^2}{n}\right) = N\left(56, \frac{20^2}{100}\right) = N(56, 2^2)$$
What is the Power of the Test?
When HA is true, µa = 56
$$\text{Power} = P\left(\frac{\bar{X} - \mu_a}{\sigma_{\bar{X}}} < \frac{46.71 - 56}{20/\sqrt{100}}\right) = P(Z < -4.645) \approx 0$$
Note: We only have one rejection region with critical value in the original sampling distribution X̄ given by
$$\bar{x}_c = \mu_0 - z_{\alpha}\frac{\sigma}{\sqrt{n}} = 50 - 1.645(2) = 46.71 \quad \text{where } \alpha = 0.05$$
[Figure: sampling distributions on an axis from 42 to 64, with the critical value at 46.71; power approximately 0%.]
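The two one-sided cases differ only in which tail holds the rejection region. As a rough sketch (the helper name `power_one_sided` and its `tail` argument are assumptions made for illustration), both can be computed with the same function. With µa = 56 above µ0 = 50, the right-sided test has high power while the left-sided test has essentially none, because the true mean lies on the wrong side of the rejection region.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(mu0, mu_a, sigma, n, alpha, tail):
    """Power of a one-sided z-test with known sigma.

    tail="right" tests HA: mu > mu0; tail="left" tests HA: mu < mu0.
    """
    se = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # all of alpha goes in one tail
    xbar = NormalDist(mu_a, se)                # sampling distribution when mu = mu_a
    if tail == "right":
        return 1 - xbar.cdf(mu0 + z_alpha * se)
    if tail == "left":
        return xbar.cdf(mu0 - z_alpha * se)
    raise ValueError("tail must be 'right' or 'left'")


right_power = power_one_sided(50, 56, 20, 100, 0.05, "right")
left_power = power_one_sided(50, 56, 20, 100, 0.05, "left")
print(f"right-sided power: {right_power:.4f}")
print(f"left-sided power:  {left_power:.2e}")
```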
Suppose we have the following test
H0 : µ = 50
versus
HA : µ ≠ 50
Suppose n = 64, and we wish to conduct the test at the α = 0.05 level.
The null distribution is
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) = N\left(50, \frac{20^2}{64}\right) = N(50, 2.5^2)$$
The alternative distribution is
$$\bar{X} \sim N\left(\mu_a, \frac{\sigma^2}{n}\right) = N\left(56, \frac{20^2}{64}\right) = N(56, 2.5^2)$$
What is the Power of the Test?
When HA is true, µa = 56
$$\text{Power} = P\left(\frac{\bar{X} - \mu_a}{\sigma_{\bar{X}}} > \frac{54.9 - 56}{20/\sqrt{64}}\right) = P(Z > -0.44) = 0.67$$
[Figure: sampling distributions on an axis from 42 to 64, with rejection boundaries at 45.1 and 54.9; power equals 67%.]
Suppose we have the following test
H0 : µ = 50
versus
HA : µ ≠ 50
Suppose n = 100, and we wish to conduct the test at the α = 0.05 level.
The null distribution is
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) = N\left(50, \frac{20^2}{100}\right) = N(50, 2^2)$$
The alternative distribution is
$$\bar{X} \sim N\left(\mu_a, \frac{\sigma^2}{n}\right) = N\left(60, \frac{20^2}{100}\right) = N(60, 2^2)$$
What is the Power of the Test?
When HA is true, µa = 60
$$\text{Power} = P\left(\frac{\bar{X} - \mu_a}{\sigma_{\bar{X}}} > \frac{53.92 - 60}{20/\sqrt{100}}\right) = P(Z > -3.04) = 0.9988$$
[Figure: sampling distributions on an axis from 42 to 68, with rejection boundaries at 46.08 and 53.92; power equals 99.88%.]
Power Curves
Suppose we have a variable whose values yield a population standard deviation of 3.6. From the population we select a simple random sample of size n = 100. We select a value of α = 0.05 for the hypotheses:
H0 : µ = 17.5
versus
HA : µ ≠ 17.5
When conducting a two-sided test, the visualization of the relationship between the significance level of
the test (based upon the rejection regions) and β is also easy to see.
Suppose µA << µ0 . It’s clear that Power is reasonably high because β is reasonably small.
[Figure: two-sided test with µa far to the left of µ0; the areas α/2 and β are marked.]
Now µA < µ0 , but closer than in the previous case, hence |µA − µ0 | is small. It’s clear that Power has
decreased because β is large.
[Figure: two-sided test with µa slightly to the left of µ0; the areas α/2 and β are marked.]
Next µA > µ0 where |µA − µ0| is still small, hence β is big and we have low Power.
[Figure: two-sided test with µa slightly to the right of µ0; the areas β and α/2 are marked.]
Finally, µA >> µ0 . Since β is now small again, Power has increased.
[Figure: two-sided test with µa far to the right of µ0; the areas β and α/2 are marked.]
Since the power of the test depends on the difference µA − µ0, we can generate power curves to understand when the power will be large or small. Unless |µA − µ0| is large, β is relatively large compared with α and hence we will have lower statistical power.
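A power curve for this setup (µ0 = 17.5, σ = 3.6, n = 100, α = 0.05) can be tabulated with a few lines of code. This is only a sketch: the grid of alternative means and the text bar chart are illustrative choices, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 17.5, 3.6, 100, 0.05
se = sigma / sqrt(n)                           # 3.6 / 10 = 0.36
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se


def power_at(mu_a):
    """Probability that the sample mean lands in either rejection region."""
    xbar = NormalDist(mu_a, se)
    return xbar.cdf(lower) + (1 - xbar.cdf(upper))


# Hypothetical grid of alternative means around mu0.
grid = [16.0, 16.5, 17.0, 17.5, 18.0, 18.5, 19.0]
curve = [(mu_a, power_at(mu_a)) for mu_a in grid]

for mu_a, p in curve:
    print(f"mu_a = {mu_a:5.1f}  power = {p:.3f}  {'#' * round(p * 40)}")
```

The curve is symmetric about µ0, bottoms out at α when µa = µ0, and climbs toward 1 as |µa − µ0| grows.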
One-Sided Power Curves
The mean time laboratory employees now take to do a certain task on a machine is 65 seconds, with a standard deviation of 15 seconds. The times are approximately normally distributed. The manufacturers of a new machine claim that their machine will reduce the mean time required to perform the task.
The quality-control supervisor designs a test to determine whether or not she should believe the claim of the makers of the new machine. She chooses a significance level of α = 0.01 and randomly selects 20 employees to perform the task on the new machine.
H0 : µ ≥ 65
versus
HA : µ < 65
The quality-control supervisor computes, for example, the following value of 1 − β for the alternative µ = 55. The critical value for the test is
$$\bar{x}_c = \mu_0 - z_{0.01}\frac{\sigma}{\sqrt{n}} = 65 - 2.33\frac{15}{\sqrt{20}} \approx 57$$
We can find β and 1 − β as follows:
$$\beta = P(\bar{X} > 57 \mid \mu_a = 55) = P\left(Z > \frac{57 - 55}{15/\sqrt{20}}\right) = P(Z > 0.60) = 1 - 0.7257 = 0.2743 \quad \text{(by table)}$$
Therefore, the power is 1 − β = 1 − 0.2743 = 0.7257.
The power curve for our left-sided test has α = 0.01 and µ0 = 65. The power increases the farther µa is from µ0 in the leftward direction.
An alternative to plotting the power curve is to plot the operating characteristic (OC) curve. To construct an OC curve, we plot the values of β, rather than 1 − β, along the vertical axis. Thus, an OC curve is the complement of the corresponding power curve.
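The relationship between the power curve and the OC curve for the machine example can be sketched as follows. The function names are illustrative. This version uses the unrounded critical value (about 57.2 rather than 57), so its power at µa = 55 comes out a little above the table-based figure of 0.7257.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 65, 15, 20, 0.01
se = sigma / sqrt(n)
# Left-sided test: reject H0 when the sample mean falls below x_crit.
x_crit = mu0 - NormalDist().inv_cdf(1 - alpha) * se


def power_left(mu_a):
    """1 - beta: probability of rejecting H0 when the true mean is mu_a."""
    return NormalDist(mu_a, se).cdf(x_crit)


def beta_left(mu_a):
    """Type II error probability, the quantity plotted on an OC curve."""
    return 1 - power_left(mu_a)


# Hypothetical grid of alternative means at or below mu0.
for mu_a in [45, 50, 55, 60, 65]:
    print(f"mu_a = {mu_a}  power = {power_left(mu_a):.4f}  beta = {beta_left(mu_a):.4f}")
```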
Summary
Power depends on
- The significance level of the test, α, i.e. the Type I error rate
- The difference of interest µA − µ0
- The variance, σ²
- The sample size
Definition: The effect size is a measure of the signal-to-noise ratio. It represents the strength of a
phenomenon in the presence of uncertainty. It combines the difference of interest along with the
variance information.
For hypothesis tests for a single mean, the effect size is given by
$$\text{effect size} = \frac{|\mu_A - \mu_0|}{\sigma}$$
For other hypothesis tests, we have different effect size measures which will be explained in the later
half of this lecture.
Designing a Test Based Upon Power
In designing studies most people consider obtaining a power of 80% or 90% (similar to how we use
95% as the confidence level for confidence interval estimates).
When designing a study, we choose the following: the desired power, the level of significance, α, and the effect size.
Based upon the above choices, we find the sample size that will allow us to obtain the desired power.
The effect size is selected to represent a clinically meaningful or practically important difference in the
parameter of interest.
The following formulas give the minimum sample size that ensures a hypothesis test will meet the desired power. However, the sample size is often chosen to be larger to account for attrition of participants in a study.
Important Note: The sample size formulas in the remaining slides are approximations only. More
precise formulations can be found in literature and in tables. See table on Canvas for example. The
difference between our formula and the table result is usually within 2 observations.
Sample Size for Tests of a Single Mean
Suppose
H0 : µ = µ0
versus
HA : µ ≠ µ0
The formula for determining sample size to ensure that the test has a specified power is given below:
$$n = \left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2$$
where zα/2 is the critical value from the standard normal distribution with α/2 area to the right based
upon the selected level of significance. zβ is also a critical value (area to the right is β) where 1 − β is
the power.
The effect size is ES = |µA − µ0|/σ. The difference in means |µA − µ0| represents what is considered a clinically meaningful or practically important difference. It is the minimum difference that we would like to detect at the desired power level.
The value of σ typically comes from a previous study or a pilot study. It is usually best to use a
conservative estimate, so that the resulting sample size is not too small.
Example
An investigator hypothesizes that in people free of diabetes, fasting blood glucose, a risk factor for
coronary heart disease, is higher in those who drink at least 2 cups of coffee per day. A cross-sectional
study is planned to assess the mean fasting blood glucose levels in people who drink at least two cups
of coffee per day. The mean fasting blood glucose level in people free of diabetes is reported as 95.0
mg/dL with a standard deviation of 9.8 mg/dL.
If the mean blood glucose level in people who drink at least 2 cups of coffee per day is 100 mg/dL, this
would be important clinically. How many patients should be enrolled in the study to ensure that the
power of the test is 80% to detect this difference? A two sided test will be used with a 5% level of
significance.
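Since the desired power is 80% and α = 0.05, it follows that z_{α/2} = 1.96 and z_β = 0.84. The effect size is
$$ES = \frac{|\mu_A - \mu_0|}{\sigma} = \frac{|100 - 95|}{9.8} = 0.51$$
so that
$$n = \left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2 = \left(\frac{1.96 + 0.84}{0.51}\right)^2 = 30.1$$
Therefore, a sample of size n = 31 will ensure that a two-sided test with α = 0.05 has 80% power.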
In the planned study, participants will be asked to fast overnight and to provide a blood sample for analysis of glucose levels. Based on prior experience, the investigators hypothesize that 10% of the participants will fail to fast or will refuse to follow the study protocol. Therefore, a total of 35 participants (31/0.90 ≈ 34.4, rounded up) will be enrolled in the study to ensure that 31 are available for analysis.
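The glucose example can be checked in code. The sketch below is an illustration under stated assumptions: the function name `sample_size_one_mean` is hypothetical, and it uses exact normal quantiles from `statistics.NormalDist` rather than the rounded table values, so the unrounded result differs slightly in the decimals while the rounded-up sample size is unchanged.

```python
from math import ceil
from statistics import NormalDist


def sample_size_one_mean(mu0, mu_a, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided test of a single mean (unrounded)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # area alpha/2 to the right
    z_beta = NormalDist().inv_cdf(power)           # area beta = 1 - power to the right
    effect_size = abs(mu_a - mu0) / sigma
    return ((z_alpha + z_beta) / effect_size) ** 2


n_raw = sample_size_one_mean(mu0=95.0, mu_a=100.0, sigma=9.8)
n_needed = ceil(n_raw)              # always round up to a whole participant
n_enrolled = ceil(n_needed / 0.90)  # inflate for 10% expected attrition

print(f"unrounded n: {n_raw:.2f}")
print(f"needed for analysis: {n_needed}")
print(f"to enroll: {n_enrolled}")
```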
Sample Size for a Single Dichotomous Random Variable
In studies where the plan is to perform a test of hypothesis comparing the proportion of successes in a
dichotomous outcome variable in a single population to a known proportion, the hypotheses of interest
are:
H0 : p = p0
versus
HA : p ≠ p0
The formula for determining sample size to ensure that the test has a specified power is given below:
$$n = \left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2$$
where zα/2 is the critical value from the standard normal distribution with α/2 area to the right based
upon the selected level of significance. zβ is also a critical value (area to the right is β) where 1 − β is
the power.
The effect size in this case is:
$$ES = \frac{|p_a - p_0|}{\sqrt{p_0(1 - p_0)}}$$
Example: A recent report from the Framingham Heart Study indicated that 26% of people free of
cardiovascular disease had elevated LDL cholesterol levels, defined as LDL > 159 mg/dL. An
investigator hypothesizes that a higher proportion of patients with a history of cardiovascular disease
will have elevated LDL cholesterol.
How many patients should be studied to ensure that the power of the test is 90% to detect a 5%
difference in the proportion with elevated LDL cholesterol? A two sided test will be used with a 5%
level of significance.
The effect size is
$$ES = \frac{|p_a - p_0|}{\sqrt{p_0(1 - p_0)}} = \frac{0.05}{\sqrt{0.26(1 - 0.26)}} = 0.11$$
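The slide stops at the effect size, so the sketch below carries the calculation through to a sample size using the single-sample formula given earlier. It is an illustration only: the variable names are assumptions, and the normal quantiles are exact rather than table values. It also shows how sensitive the answer is to rounding the effect size to two decimals.

```python
from math import ceil, sqrt
from statistics import NormalDist

p0, difference = 0.26, 0.05  # null proportion and difference to detect
alpha, power = 0.05, 0.90

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
z_beta = NormalDist().inv_cdf(power)           # z_beta for 90% power

es_exact = difference / sqrt(p0 * (1 - p0))    # about 0.114
es_rounded = round(es_exact, 2)                # 0.11, as reported above

n_rounded = ceil(((z_alpha + z_beta) / es_rounded) ** 2)
n_exact = ceil(((z_alpha + z_beta) / es_exact) ** 2)

print(f"ES = {es_rounded:.2f} -> n = {n_rounded}")
print(f"ES = {es_exact:.4f} -> n = {n_exact}")
```

Because n scales with 1/ES², a small rounding of the effect size moves the required sample size by several dozen participants here, which is one reason these formulas are best treated as approximations.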
Sample Size for Two Independent Samples (Equal Variance Assumption)
In studies where the plan is to perform a test of hypothesis comparing the means of two numeric
variables, the hypotheses are
H0 : µx = µy
versus
HA : µx ≠ µy
The formula for determining sample size to ensure that the test has a specified power is given below:
$$n_i = 2\left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2$$
where i = 1, 2 is the group number
and zα/2 is the critical value from the standard normal distribution with α/2 area to the right based
upon the selected level of significance. zβ is also a critical value (area to the right is β) where 1 − β is
the power. The above formula obtains nx = ny .
The effect size in this case is
$$ES = \frac{|\mu_x - \mu_y|}{\sigma}$$
In this case, it’s quite clear that we wish to investigate whether the means of the two groups are
significantly different from one another. So we wish to have the ability to detect this difference for a
given power.
We know that for pooled t-tests, the estimate of σ comes from the pooled sample standard deviation
$$s_p = \sqrt{\frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}}$$
but prior studies usually don’t involve sample data from two groups (or at least the groups we care
about for this study).
Often, µX is the mean for a placebo or control group, so the only prior data that we have comes from a single sample. Hence σ is estimated only by sx (i.e. from the single group).
Important Note: the formula for the sample size estimate corresponds to samples of equal size. If a
study is planned where different numbers of patients will be assigned or different numbers of patients
will comprise the comparison groups, then alternative formulas can be used.
Example
An investigator is planning a clinical trial to evaluate the efficacy of a new drug designed to reduce
systolic blood pressure. The plan is to enroll participants and to randomly assign them to receive either
the new drug or a placebo. Systolic blood pressures will be measured in each participant after 12 weeks
on the assigned treatment. Based on prior experience with similar trials, the investigator expects that
10% of all participants will be lost to follow up or will drop out of the study. If the new drug shows a 5
unit reduction in mean systolic blood pressure, this would represent a clinically meaningful reduction.
How many patients should be enrolled in the trial to ensure that the power of the test is 80% to detect
this difference? A two sided test will be used with a 5% level of significance.
In order to compute the effect size, an estimate of the variability in systolic blood pressures is needed.
Analysis of data from the Framingham Heart Study showed that the standard deviation of systolic
blood pressure was 19.0. This value can be used to plan the trial.
The effect size is
$$ES = \frac{|\mu_X - \mu_Y|}{\sigma} = \frac{5}{19.0} = 0.26$$
The sample size is
$$n_i = 2\left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2 = 2\left(\frac{1.96 + 0.84}{0.26}\right)^2 = 231.95$$
Therefore, samples of size nx = ny = 232 will ensure that the hypothesis test will have 80% power to detect a 5 unit difference in mean systolic blood pressures in patients receiving the new drug as compared to patients receiving the placebo.
However, the investigators hypothesized a 10% attrition rate (in both groups), and to ensure a final sample size of 232 in each group they need to allow for attrition.
$$N = \frac{\text{desired sample size } n}{\%\ \text{retained}} = \frac{232}{0.90} \approx 258$$
Therefore the investigator must enroll 258 participants per group (516 in total) to be randomly assigned to receive either the new drug or placebo.
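The two steps in this example, the per-group sample size and the inflation for attrition, can be sketched as below. The code deliberately uses the rounded inputs from the lecture (z-values 1.96 and 0.84, effect size 0.26) so that it reproduces the hand calculation; the variable names are illustrative.

```python
from math import ceil

z_alpha, z_beta = 1.96, 0.84  # table values for alpha = 0.05 (two-sided), 80% power
effect_size = 0.26            # 5 / 19.0, rounded as in the lecture
retention = 0.90              # 10% attrition expected in each group

# Per-group sample size for two independent samples of equal size.
n_raw = 2 * ((z_alpha + z_beta) / effect_size) ** 2
n_per_group = ceil(n_raw)

# Inflate so that n_per_group participants remain after attrition.
enroll_per_group = ceil(n_per_group / retention)
enroll_total = 2 * enroll_per_group

print(f"unrounded n per group: {n_raw:.2f}")
print(f"n per group: {n_per_group}")
print(f"enroll per group: {enroll_per_group}, total: {enroll_total}")
```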
Sample Size for Matched Pairs (Mean Difference)
In studies where the plan is to perform a test of hypothesis on the mean difference in a continuous
outcome variable based on matched data, the hypotheses of interest are:
H0 : µd = 0
versus
HA : µd ≠ 0
The formula for determining sample size to ensure that the test has a specified power is given below:
$$n = \left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2$$
where the effect size is given by
$$ES = \frac{\mu_d}{\sigma_d}$$
Example
An investigator wants to evaluate the efficacy of an acupuncture treatment for reducing pain in
patients with chronic migraine headaches. The plan is to enroll patients who suffer from migraine
headaches. Each will be asked to rate the severity of the pain they experience with their next migraine
before any treatment is administered. Pain will be recorded on a scale of 1-100 with higher scores
indicative of more severe pain. Each patient will then undergo the acupuncture treatment. On their
next migraine (post-treatment), each patient will again be asked to rate the severity of the pain. The
difference in pain will be computed for each patient. A two sided test of hypothesis will be conducted,
at α = 0.05, to assess whether there is a statistically significant difference in pain scores before and
after treatment. How many patients should be involved in the study to ensure that the test has 80%
power to detect a difference of 10 units on the pain scale? Assume that the standard deviation in the
difference scores is approximately 20 units.
The effect size is
$$ES = \frac{\mu_d}{\sigma_d} = \frac{10}{20} = 0.50.$$
The sample size is
$$n = \left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2 = \left(\frac{1.96 + 0.84}{0.50}\right)^2 = 31.4$$
Hence a sample of size n = 32 patients with migraine will ensure that a two-sided test with α = 0.05
has 80% power to detect a mean difference of 10 points in pain before and after treatment, assuming
that all 32 patients complete the treatment.
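One way to see why the result is rounded up to 32 is to invert the sample size formula and ask what power a given n achieves. Solving n = ((z_{α/2} + z_β)/ES)² for z_β gives z_β = ES·√n − z_{α/2}, and the approximate power is the standard normal area to the left of z_β. The sketch below applies this; the function name `approx_power` is hypothetical, and the approximation ignores the far rejection tail, as the sample size formula itself does.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided z-test, from the inverted sample size formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = effect_size * sqrt(n) - z_alpha
    return NormalDist().cdf(z_beta)


power_31 = approx_power(0.50, 31)
power_32 = approx_power(0.50, 32)
print(f"n = 31: power = {power_31:.4f}")
print(f"n = 32: power = {power_32:.4f}")
```

With 31 patients the approximate power falls just short of 80%, and with 32 it clears the target.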
Sample Size for Difference of Proportions
In studies where the plan is to perform a test of hypothesis comparing the proportions of successes in
two independent populations, the hypotheses of interest are:
H0 : px = py
versus
HA : px ≠ py
The formula for determining sample size to ensure that the test has a specified power is given below:
$$n_i = 2\left(\frac{z_{\alpha/2} + z_\beta}{ES}\right)^2$$
where i = 1, 2 is the group number. The effect size in this case is
$$ES = \frac{|p_x - p_y|}{\sqrt{p(1 - p)}}$$
where p is the pooled proportion. We can obtain this by taking the average of the two proportions (justified by equal sample size).
Example
An investigator hypothesizes that there is a higher incidence of flu among students who use their
athletic facility regularly than their counterparts who do not. The study will be conducted in the
spring. Each student will be asked if they used the athletic facility regularly over the past 6 months
and whether or not they had the flu. A test of hypothesis will be conducted to compare the proportion
of students who used the athletic facility regularly and got flu with the proportion of students who did
not and got flu. During a typical year, approximately 35% of the students experience flu. The
investigators feel that a 30% increase in flu among those who used the athletic facility regularly would
be clinically meaningful. How many students should be enrolled in the study to ensure that the power
of the test is 80% to detect this difference in the proportions? A two sided test will be used with a 5%
level of significance.
We first compute the effect size by substituting the proportions of students in each group who are expected to develop flu: px = 0.35 and py = 0.46, a 30% increase over px (i.e. 0.35 × 1.30 ≈ 0.46). The overall proportion is p = 0.41 (i.e. (0.46 + 0.35)/2 ≈ 0.41).
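The effect size is
$$ES = \frac{|p_x - p_y|}{\sqrt{p(1 - p)}} = \frac{|0.35 - 0.46|}{\sqrt{0.41(1 - 0.41)}} = 0.22$$
With z_{α/2} = 1.96 and z_β = 0.84, the sample size per group is
$$n_i = 2\left(\frac{1.96 + 0.84}{0.22}\right)^2 = 323.97$$
Hence nx = ny = 324 will ensure that the test of hypothesis will have 80% power to detect a 30% relative difference in the proportions of students who develop flu between those who do and do not use the athletic facilities regularly.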
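The flu example can be reproduced with the lecture's rounded inputs. This is a sketch under those assumptions: the proportions 0.35 and 0.46, the pooled value 0.41, and the table z-values are taken as given, and the variable names are illustrative.

```python
from math import ceil, sqrt

p_x, p_y = 0.35, 0.46         # expected flu proportions, rounded as in the lecture
p_pooled = 0.41               # rounded average of the two proportions
z_alpha, z_beta = 1.96, 0.84  # table values for alpha = 0.05 (two-sided), 80% power

# Effect size for a difference of proportions, rounded to two decimals.
effect_size = round(abs(p_x - p_y) / sqrt(p_pooled * (1 - p_pooled)), 2)

# Per-group sample size for two independent groups of equal size.
n_per_group = ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(f"effect size: {effect_size:.2f}")
print(f"n per group: {n_per_group}")
```

Carrying unrounded values through each step would give a somewhat different n, which is consistent with the earlier note that these formulas are approximations.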