Sample Size
Year:3
Semester:
Module:
Instructor information
Name: Dr. Noha Mohamed Abu Bakr Elsaid
Department: Public health and community medicine
Office hours: Sunday from 9 to 11 am.
Objectives:
By the end of this lecture, students will be able to:
1. Explain the rationale for sample size calculation
2. Identify the factors affecting sample size
3. Apply the principles of sample size calculation
Introduction
Population:
a set which includes all
measurements of interest
to the researcher
(The collection of all
responses, measurements, or
counts that are of interest)
Sample:
A subset of the population
Introduction
A population could be:
• all individuals, families, groups, organizations,
events, objects, or items from which samples are
taken for measurement.
• For example, a population of presidents,
professors, books or students, RBCs.
Target Population:
• The population to be studied/ to which the
investigator wants to generalize his results
Introduction
Sampling:
• The process of obtaining information from a subset
(sample) of a larger group (population)
• The process of sampling has 3 elements
– Selecting the sample
– Collecting information
– Making inference about population
Introduction
Sampling
• One quality of a sample refers to how close the sample’s statistic is to the true population value it represents.
• Another refers to how accurately the sample’s members resemble the members of the entire population it represents.
Why sampling?
Get information about large populations
• Lower cost (economy): it is cheaper to observe a part rather than the whole population
• Less field time
• More accuracy
• The large size of many populations, which makes it impossible to study the whole population
Why Sample Size calculation?
Scientific reasons
• With a small sample size, you cannot be sure about your conclusion, or may mistakenly reach a wrong conclusion.
Ethical reasons
• An undersized study can expose subjects to potential harm without yielding a solid conclusion.
• An oversized study exposes an unnecessarily large number of subjects to a potentially harmful intervention.
Economic reasons
• An undersized study is a waste of resources because it cannot yield useful results.
• An oversized study may produce a statistically significant result of doubtful clinical importance, leading to a waste of resources.
Statistical Terms
• Target Population:
The population to be studied/ to which the investigator wants
to generalize his results
• Sampling Unit:
Subject under observation on which information is collected
Smallest unit from which sample can be selected
Example: children <5 years, hospital discharges, health events
• Sampling fraction
Ratio between sample size and population size
Example: 100 out of 2000 (5%)
• Sampling frame
List of all the sampling units from which sample is drawn
• Sampling scheme
Method of selecting sampling units from sampling frame
Steps of the sampling process
1. Identify the target population.
2. Identify the accessible population to be
sampled (N)
3. Determine the size of the sample needed (n)
4. Select the sampling technique
5. Implement the plan
Types of sampling
• Probability samples:
Samples in which each member of the
population has a known, non-zero (and often equal)
chance (probability) of being selected in the
study
• Non-probability samples:
Samples in which the chances (probability) of
selecting members from the population are
unknown
Sample size justification
How many subjects (sampling units)
should be studied?
◼ The answer to this question is often an arbitrary
choice of a number.
◼ This number often depends, erroneously, only on
feasibility:
Time allowed for the study
Available resources
Frequency of cases, etc.
Unethical sample size
◼ Example: effect of 2 anti-HTN drugs on DBP
Drug A: n=10 mean=78.6 SD=10.2
Drug B: n=10 mean=85.3 SD=10.3
◼ Student t-test=1.462 (p=0.161): NOT statistically significant
Power=0.297, β error=0.703
◼ Another study:
Drug A: n=10000 mean=78.6 SD=10.2
Drug B: n=10000 mean=78.9 SD=10.3
◼ Student t-test=2.070 (p=0.039): Statistically significant
Clinical vs statistical significance
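The two t statistics in the drug example above can be reproduced from the summary figures alone. The following is a minimal sketch, assuming an unpooled two-sample t statistic; the function name `two_sample_t` is illustrative, and the p-value for the large study uses a normal approximation (reasonable only because n is very large), so it may differ slightly from the slide's figure.

```python
import math
from statistics import NormalDist

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic computed from summary statistics (unpooled SE)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    return abs(mean1 - mean2) / se

# Small study: a 6.7 mm Hg difference, 10 subjects per group
t_small = two_sample_t(78.6, 10.2, 10, 85.3, 10.3, 10)

# Large study: a 0.3 mm Hg difference, 10000 subjects per group
t_large = two_sample_t(78.6, 10.2, 10000, 78.9, 10.3, 10000)

# Two-sided p-value for the large study (normal approximation, an assumption)
p_large = 2 * (1 - NormalDist().cdf(t_large))

print(round(t_small, 3), round(t_large, 3))
```

The clinically relevant 6.7 mm Hg difference gives the smaller t statistic, while the clinically trivial 0.3 mm Hg difference gives the larger one, purely because of sample size.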
Sample size justification
Sample size justification is the
process of judging whether a
sample size is too small or too
large, and of estimating the
optimum sample size for a
given study.
Sample size justification
◼ The size of the sample will depend on the
following factors:
Magnitude of the difference to be detected
(effect size)
Variability of the measurement
Level of significance
Power of the study
Significance level (α level)
• The p-value threshold below which we consider the
result statistically significant
(conventionally 5%)
• p-value: the probability of observing an effect at least as
large as the one found if chance alone were operating
(i.e. if the null hypothesis were true).
• The more stringent the significance level
(e.g. 1%), the larger the required sample size
Type 1 error
• The probability of finding a difference in our sample
when there really isn’t one in the population.
• Probability of rejecting a true null hypothesis
• Known as α (the “type 1 error”). Because it is an error,
it is kept to a minimum: the maximum accepted value is 5%;
to be more confident, it can be set as low as 0.001.
• Usually set at 5% (or 0.05)
Sample size justification
◼ Level of significance:
How confident we are of not rejecting a true null hypothesis, i.e.
avoiding "α error" or "type I error".
The maximum level of "α" is conventionally set at 5% (0.05).
To be more confident (α-error of 0.01 or 0.001), the investigator
pays with an increase in sample size.
Power (1 - β level)
• The capacity of the study to detect differences or
relationships that actually exist in the population.
• The likelihood of finding a statistically significant effect of
a given magnitude if one truly exists.
• It is the probability of finding an effect when an effect
actually exists.
• The greater the power, the larger required sample size
Type 2 error
• Known as β (the “type 2 error”)
• Power is related to the β error, which is the risk of
accepting the null hypothesis while it is false, i.e. the
probability of not finding a difference that actually exists
• Power = 1 - β; a β of not more than 20% (a power of at
least 80%) is acceptable
• For a given effect size, there is an inverse relation between
sample size and β error
Sample size justification
◼ Power of the study:
Probability that the study yields a statistically significant
result when a true difference exists.
A high power minimizes the risk of accepting the null
hypothesis although it is false (type II or β-error).
The power is equal to 1 - β.
Studies with β-error of up to 0.2 (power of 0.8 or 80%)
are acceptable.
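To make the link between power and sample size concrete, the sketch below estimates the power of the earlier anti-HTN drug comparison (difference 6.7 mm Hg, SDs 10.2 and 10.3) for different group sizes. It assumes a two-sided test and a normal approximation, so the value for n = 10 comes out near 0.31 rather than the 0.297 quoted earlier; `approx_power` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def approx_power(diff, sd1, sd2, n_per_group, alpha=0.05):
    """Approximate power for comparing two means (normal approximation)."""
    z = NormalDist()
    se = math.sqrt((sd1**2 + sd2**2) / n_per_group)  # SE of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)                # two-sided critical value
    # Probability that the test statistic exceeds the critical value
    # in the direction of the true difference (the far tail is ignored)
    return z.cdf(abs(diff) / se - z_crit)

power_10 = approx_power(6.7, 10.2, 10.3, 10)   # the undersized study
power_50 = approx_power(6.7, 10.2, 10.3, 50)   # same effect, larger groups

print(round(power_10, 2), round(power_50, 2))
```

With 10 subjects per group the study has well under the conventional 80% power; with 50 per group it exceeds it for the same true difference.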
Effect size or desired Precision
• Magnitude of difference to be detected.
• Effect is the presence of the phenomenon, relationship or
difference you are looking for (primary outcome)
• It is the generic term for the magnitude of the
relationship between an independent variable and a
dependent variable.
• Example:
✓ In surveys / single-group research >> width of the confidence interval (margin of error / precision)
✓ In comparative studies >> difference (in means or proportions)
• The smaller the effect or the narrower the confidence interval (i.e. higher precision), the
larger the sample size required.
Sample size justification
◼ Magnitude of the difference (effect size)
A large sample size is needed for detection of a minute
difference
Examples:
◼ Prevalence rate 1/1000 vs 25%
◼ Effect of 2 antihypertensive drugs: difference is in the order of 1-2
mm Hg vs a difference of 5-10 mm Hg.
◼ OR = 1.5 vs OR = 4
The magnitude depends on clinical significance
Variability in the population
• It is measured by the “variance”
• Its measurement depends on the type of outcome measure:
– Continuous: standard deviation squared (s²)
– Dichotomous: p(1-p)
• The larger the variability, the larger the sample size
needed
Sample size justification
◼ Variability of the measurement:
Comparable to background noise: the higher the noise, the
more difficult it is to detect the signal
Variability is reflected by the standard deviation or the
variance.
Higher standard deviation needs larger sample size.
Sample size justification
◼ To summarize, sample size (n) is:
Directly related to standard deviation (s)
Inversely related to:
◼ Effect size
◼ α-error
◼ β-error
$$ n \propto \frac{\text{SD}}{\text{effect size} \cdot \alpha\text{-error} \cdot \beta\text{-error}} $$
Sample size equations – single mean
◼ Estimation of single parameter: mean
◼ e.g.: A survey is done to determine the mean
diastolic BP in a certain community
Expected mean DBP m= 80 mm Hg
Expected standard deviation s = 10 mm Hg
Desired 95% confidence interval 75 - 85 mm Hg:
◼ Absolute precision “e”: ± 5 mm Hg
◼ Standard Error ≈ 2.5 mm Hg (absolute precision / z)
Sample size equations – single mean
$$ n = \frac{Z^2 s^2}{e^2} $$
$$ n = \frac{(1.96)^2 \times 10^2}{5^2} = 15.4 \approx 16 $$
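The single-mean calculation can be checked with a few lines of code. This is a minimal sketch, assuming the result is rounded up to the next whole subject and that Z is taken from the standard normal distribution; `n_single_mean` is an illustrative name, not a library function.

```python
import math
from statistics import NormalDist

def n_single_mean(sd, precision, confidence=0.95):
    """Sample size to estimate a mean within +/- precision."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / precision) ** 2)         # round up to whole subjects

# DBP survey: expected SD 10 mm Hg, desired precision +/- 5 mm Hg
n = n_single_mean(sd=10, precision=5)
print(n)  # 16
```

Halving the precision to ± 2.5 mm Hg roughly quadruples the required sample, because precision enters the formula squared.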
Sample size equations – single proportion
◼ Estimation of single parameter: prevalence
◼ e.g.: a survey is done to determine the prevalence
of hypertension in a community
Expected prevalence P = 20% (0.2)
Variance s² = pq, where q = 1 - p (maximum 0.25, at p = 0.5)
Desired 95% confidence interval 18% to 22%:
◼ Absolute precision “e”: ± 2% (0.02)
◼ Standard Error ≈ 1% (0.01)
Sample size equations – single proportion
$$ n = \frac{Z^2 \, p(1-p)}{e^2} $$
With p and e expressed as percentages:
$$ n = \frac{(1.96)^2 \times 20 \times (100-20)}{(2)^2} = 1536.6 \approx 1537 $$
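The single-proportion calculation can be sketched the same way. The code below assumes proportions are entered as fractions (0.20 rather than 20) and that the result is rounded up; `n_single_proportion` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def n_single_proportion(p, precision, confidence=0.95):
    """Sample size to estimate a proportion p within +/- precision."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    variance = p * (1 - p)                              # pq, at most 0.25
    return math.ceil(z**2 * variance / precision**2)    # round up

# Hypertension survey: expected prevalence 20%, precision +/- 2%
n = n_single_proportion(p=0.20, precision=0.02)
print(n)  # 1537

# Worst case when the prevalence is unknown: p = 0.5 maximizes pq
n_worst = n_single_proportion(p=0.50, precision=0.02)
print(n_worst)  # 2401
```

Using p = 0.5 when no prior estimate exists is a conservative choice, since it gives the largest variance and therefore the largest sample.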
Sample size equations – 2 means
◼ Estimation of difference between 2 means
◼ e.g.: difference between DBP in drug A & B
A: mean1 = 85 s1 = 7 mm Hg
B: mean2 = 82 s2 = 5 mm Hg
Confidence: 95%
Power: 90%
Effect size: 85 - 82 = 3 mm Hg
Variance: s1² + s2² = 7² + 5² = 74
Sample size equations – 2 means
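$$ n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \, (s_1^2 + s_2^2)}{(\text{mean}_1 - \text{mean}_2)^2} $$
$$ n = \frac{(1.96 + 1.28)^2 \times (5^2 + 7^2)}{(85-82)^2} = 86.3 $$
That is, about 86 subjects per group (87 if rounded up).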
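A short sketch of the two-means calculation follows. It assumes equal group sizes, a two-sided α, and rounding up; because it uses unrounded Z values (1.95996 and 1.28155 rather than 1.96 and 1.28), the result can differ by one from the hand calculation. `n_two_means` is an illustrative name.

```python
import math
from statistics import NormalDist

def n_two_means(mean1, sd1, mean2, sd2, confidence=0.95, power=0.90):
    """Sample size per group to detect a difference between two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (1 - confidence) / 2)  # two-sided, about 1.96
    z_beta = z.inv_cdf(power)                      # about 1.28 for 90% power
    variance = sd1**2 + sd2**2
    effect = mean1 - mean2
    return math.ceil((z_alpha + z_beta)**2 * variance / effect**2)

# Drug A: mean 85, SD 7; Drug B: mean 82, SD 5
n = n_two_means(85, 7, 82, 5)
print(n)  # 87 per group
```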
Sample size equations – 2 proportions
◼ Estimation of difference between 2
proportions (rates):
Two samples according to exposure/outcome
Sample size to estimate:
◼ Effect size
◼ Odds Ratio/Relative Risk
Sample size equations – 2 proportions
◼ Case-control:
2 groups:
◼ One has the disease (outcome)
◼ One control (without disease or outcome)
Objective: to compare past exposure in the two groups
Sample size depends on:
◼ Difference in exposure between the 2 groups
◼ Variability of exposure between the 2 groups
◼ Level of confidence desired
◼ Power of the study desired
Sample size equations – 2 proportions
◼ e.g.: compare the prevalence of smoking in IHD cases and controls
Smoking in IHD (cases): P1 = 20% (0.20)
Smoking in non-IHD (controls): P2 = 15% (0.15)
Confidence: 95%
Power: 90%
Effect size: 20% - 15% = 5%
Sample size equations – 2 proportions
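$$ n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \, (p_1 q_1 + p_2 q_2)}{(p_1 - p_2)^2} $$
With proportions expressed as percentages (q = 100 - p):
$$ n = \frac{(1.96 + 1.28)^2 \times ((20 \times 80) + (15 \times 85))}{(20-15)^2} = 1207.2 \approx 1207 $$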
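The two-proportions formula can be sketched in the same style. The code assumes equal group sizes, proportions entered as fractions, unrounded Z values, and rounding up, so it returns a figure one or two subjects higher than the hand calculation with Z = 1.96 and 1.28. `n_two_proportions` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, confidence=0.95, power=0.90):
    """Sample size per group to detect a difference between two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (1 - confidence) / 2)  # two-sided, about 1.96
    z_beta = z.inv_cdf(power)                      # about 1.28 for 90% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # p1*q1 + p2*q2
    return math.ceil((z_alpha + z_beta)**2 * variance / (p1 - p2)**2)

# Smoking in IHD cases (20%) versus controls (15%), 90% power
n_90 = n_two_proportions(0.20, 0.15, power=0.90)
print(n_90)  # 1209 per group

# Same comparison at 80% power needs fewer subjects
n_80 = n_two_proportions(0.20, 0.15, power=0.80)
print(n_80)  # 903 per group
```

Comparing the two calls shows the cost of power: moving from 80% to 90% raises the requirement by about a third for the same effect size.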
Sample size equations – 2 proportions
◼ e.g.: compare the incidence of IHD in smokers and non-smokers
IHD in smokers (exposed): P1 = 20% (0.20)
IHD in non-smokers (unexposed): P2 = 15% (0.15)
Confidence: 95%
Power: 80%
Effect size: 20% - 15% = 5%
Sample size equations – 2 proportions
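$$ n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \, (p_1 q_1 + p_2 q_2)}{(p_1 - p_2)^2} $$
With proportions expressed as percentages (q = 100 - p) and Z = 0.84 for a power of 80%:
$$ n = \frac{(1.96 + 0.84)^2 \times ((20 \times 80) + (15 \times 85))}{(20-15)^2} = 901.6 \approx 902 $$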
Tools for calculating sample size
1. Use of formulae (for each study design)
2. Nomograms
3. Ready-made tables
4. Computer software
Take home message
◼ Too small sample size
Higher β error (lower power)
May accept a false null hypothesis (as in the small anti-HTN drug example)
◼ Too large sample size
Unethical
Can prove anything regardless of clinical significance
Waste of resources