Sample Size Calculation
https://doi.org/10.1007/s43994-024-00153-x
REVIEW
Abstract
Sample size determination is a critical aspect of biomedical research, as it dictates the number of samples needed for a successful experiment whose results can be generalized to the broader population. This paper outlines the methodology for calculating sample sizes in various categories of biomedical and clinical research, encompassing cross-sectional surveys, comparative studies, case–control studies, cohort studies, and animal studies. Detailed examples are provided for each category. The determination of an appropriate sample size holds significant importance from scientific, ethical, and resource allocation perspectives. Research outcomes are often directly influenced by the accuracy of sample size calculations. A robust sample size calculation serves as a cornerstone for researchers, enabling them to draw precise inferences across a spectrum of biomedical and clinical investigations.
Keywords Sample size · Biomedical research · Clinical research · Animal studies · Surveys
Journal of Umm Al-Qura University for Applied Sciences
state that the new drug, on average, is no more effective than the current drug. On the other hand, the alternative hypothesis, denoted as H1, outlines the purpose of a statistical hypothesis test. In the same clinical trial scenario, the alternative hypothesis could propose that the new drug has a different average impact compared to the existing drug. An alternative hypothesis is a statement that suggests or hints at a potential finding or result for an investigator or researcher. These alternative theories are categorized into two groups: directional and non-directional.

strengthens its statistical capability and diminishes the chance of making a Type II error, thereby lowering the likelihood of incorrect negative conclusions. This statistical power is denoted as 1 − β and, in most clinical trials, a power of 0.8 (or 80%) is deemed optimal for effectively detecting a statistically significant distinction. With an 80% power, there remains a 20% possibility of failing to detect a significant difference, even if it truly exists [4].

2.4 Effect size
The value of $10^n$ is usually 1 or 100 for common attributes. The value of $10^n$ might be 1,000, 100,000, or even 1,000,000 for rare attributes and most diseases.

Example for calculating prevalence in a population. An evaluation of 2000 pregnant women attending a hospital in Kano, Nigeria reported a total of 785 pregnant women taking supplements at least 2 times a week during the second trimester. Calculate the prevalence of frequent supplement use in this group.

Prevalence = (785/2000) × 100 = 0.3925 × 100 = 39.25% ≈ 39%

Therefore, the prevalence of frequent supplement use in this group is approximately 39%.

2.6 Incidence rate

An incidence rate is a measure of how quickly a disease spreads across a population. The incidence rate is the ratio of the number of new cases to the total time for which the population is exposed to the disease. The incidence, which can be expressed as a risk or an incidence rate, represents the number of new cases of disease over a given period.

$$\text{Incidence rate} = \frac{\text{Number of new cases of disease}}{\text{Time each person was observed, totalled for all people}} \tag{3}$$

2.8 Standard deviation (SD) in the population

The standard deviation gauges the extent of spread within a given dataset. A smaller standard deviation indicates that the data points are closely clustered around the mean (also known as the expected value), while a higher standard deviation signifies a wider dispersion of values [10]. A standard deviation approaching 0 implies that data points are near the mean, whereas a large standard deviation indicates that data points lie far from the mean, either above or below it [11].

3 Sample size calculation for cross‑sectional studies/surveys

A cross-sectional study, alternatively called a transversal study, prevalence study or cross-analysis, is a type of research that focuses on data from a population or a representative subgroup at a specific point in time.
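The prevalence and incidence-rate definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, and the incidence-rate inputs (12 new cases over 480 person-years) are invented purely to show the call. Only the 785/2000 prevalence figures come from the text.

```python
def prevalence(cases: int, population: int, multiplier: int = 100) -> float:
    """Existing cases divided by population size, scaled by 10^n (default 100)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * multiplier


def incidence_rate(new_cases: int, person_time: float) -> float:
    """Equation (3): new cases divided by observation time summed over all people."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time


# Worked example from the text: 785 of 2000 pregnant women use supplements.
print(f"Prevalence: {prevalence(785, 2000):.2f}%")

# Hypothetical inputs (not from the text): 12 new cases in 480 person-years.
print(f"Incidence rate: {incidence_rate(12, 480.0):.3f} cases per person-year")
```

Passing `multiplier=1000` or larger corresponds to the choice of a bigger 10^n for rare attributes.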
between the case and control groups was 2 mg/day and SD was 10 mg/day.

Given that:
r = 1.
SD = 10.
z1-β = 0.84.
z1-α/2 = 1.95.
d = 2.

using the following equation:

$$\text{Sample size}\,(n) = \frac{(r + 1)\,(SD)^2\,\left(Z_{1-\beta} + Z_{1-\alpha/2}\right)^2}{r\,(d)^2} \tag{8}$$

$$\text{Sample size}\,(n) = \frac{(1 + 1)(10)^2 (0.84 + 1.95)^2}{1\,(2)^2} = 389.205$$

(n) = 389.

The researcher, therefore, needed at least 389 samples.

$$\text{Sample size}\,(n) = \frac{\left( Z_{1-\alpha/2}\sqrt{(1 + 1/m)\,p(1 - p)} + Z_{1-\beta}\sqrt{\left(p_0(1 - p_0)/m\right)\,p_1(1 - p_1)} \right)^2}{\left(p_0 - p_1\right)^2} \tag{9}$$

where
n = number of samples required.
m = number of control subjects per experimental subject.
p0 = proportion (probability) of the outcome in the control group.
p1 = proportion (probability) of the outcome in the experimental group.
p = population proportion.
z1-β = standard normal deviate corresponding to the power of the study.
z1-α/2 = critical value for the chosen confidence level.

Example of calculating sample size estimation for a cohort study. Calculate the sample size needed to assess the association between tobacco smoking and risk of mortality at 95% CI and 80% power of the study with an equal number of case and control subjects, if a previous study indicated that the proportion at risk of mortality is 15% in tobacco smokers and 24% in the control group.

Given that:
m = 1.
p = (p0 + p1)/2 = 0.195.
z1-β = 0.84.
z1-α/2 = 1.96.
p0 = 0.24.
p1 = 0.15.

Applying:

$$\text{Sample size}\,(n) = \frac{\left( Z_{1-\alpha/2}\sqrt{(1 + 1/m)\,p(1 - p)} + Z_{1-\beta}\sqrt{\left(p_0(1 - p_0)/m\right)\,p_1(1 - p_1)} \right)^2}{\left(p_0 - p_1\right)^2}$$
$$\text{Sample size}\,(n) = \frac{\left( 1.96\sqrt{2(0.195)(1 - 0.195)} + 0.84\sqrt{\left(0.24(1 - 0.24)\right)\,0.15(1 - 0.15)} \right)^2}{(0.24 - 0.15)^2} = 185.6593$$
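The two worked examples above (equation (8) for the case–control comparison of means and equation (9) for the cohort study) can be checked numerically with a short Python sketch. The function and parameter names are hypothetical. The `second_root="product"` option follows the second square-root term exactly as it is printed and evaluated in this text. The `"sum"` option is an assumption added for comparison only: it reflects the commonly published form of this formula, in which the two variance terms are added rather than multiplied, and it gives a noticeably larger sample size.

```python
import math


def n_case_control_means(sd, d, z_alpha, z_beta, r=1.0):
    """Equation (8): n = (r + 1) * SD^2 * (z_beta + z_alpha)^2 / (r * d^2)."""
    return (r + 1) * sd**2 * (z_beta + z_alpha) ** 2 / (r * d**2)


def n_cohort(p, p0, p1, z_alpha, z_beta, m=1.0, second_root="product"):
    """Equation (9) for comparing two proportions.

    second_root="product": second square root as printed in the text.
    second_root="sum": assumption, the commonly published additive form.
    """
    first = z_alpha * math.sqrt((1 + 1 / m) * p * (1 - p))
    if second_root == "product":
        inner = (p0 * (1 - p0) / m) * (p1 * (1 - p1))
    elif second_root == "sum":
        inner = p0 * (1 - p0) / m + p1 * (1 - p1)
    else:
        raise ValueError("second_root must be 'product' or 'sum'")
    second = z_beta * math.sqrt(inner)
    return (first + second) ** 2 / (p0 - p1) ** 2


# Case-control example from the text: SD = 10, d = 2, z values 1.95 and 0.84.
print(round(n_case_control_means(sd=10, d=2, z_alpha=1.95, z_beta=0.84), 1))  # 389.2

# Cohort example from the text: p = 0.195, p0 = 0.24, p1 = 0.15, m = 1.
print(round(n_cohort(0.195, 0.24, 0.15, z_alpha=1.96, z_beta=0.84), 2))  # 185.66
```

Calling `n_cohort(..., second_root="sum")` with the same inputs shows how sensitive the result is to that term, which is worth checking against the cited source before using either figure in a protocol.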
8 Sample size formula for animal studies

Taking into account both ethical considerations and financial constraints, it is imperative to conduct experiments with a limited number of animals, but with meticulously designed protocols to ensure precise data analysis. It is strongly advised that researchers engage a statistician in the early stages of experimental design. Furthermore, embarking on any experiment without a clear understanding of how the results will be assessed is inadvisable [12]. The power analysis approach, akin to its application in human studies, is often the preferred method for determining sample size in animal research. Additionally, the 'resource equation' method stands as an equally valid alternative for ascertaining the appropriate sample size in studies involving animals [2, 12, 14, 15]. This approach aligns with the overarching goal of conducting research that is scientifically sound, cost-effective, and ethically responsible.

The resource equation [13] is given as

$$E = (\text{the total number of experimental units}) - (\text{the total number of treatment groups}) \tag{13}$$

where E, the quantity used to judge the sample size, should be between 10 and 20 [16].

The power analysis. Power analysis links six mathematically related variables: power, significance level, sidedness, standard deviation, effect size, and sample size. If the first five are known, the sixth (normally sample size) can be calculated [16].

Sample size calculation for animal study having one experimental group and one control

(I) For an experiment involving the use of one experimental group and one control to be compared using an independent t-test, the number of animals can be calculated using the formula below.

$$df = N - s = sn - s = s(n - 1) \tag{14}$$

where N denotes the total number of subjects, s denotes the number of groups, n denotes the number of subjects per group, and df denotes the degrees of freedom. n is obtained by rearranging the formula:

$$n = df/s + 1 \tag{15}$$

The df in the formula is substituted with the minimum (10) and maximum (20) of the appropriate range of df to obtain the minimum and maximum numbers of animals per group:

For minimum, n = 10/s + 1 and for maximum, n = 20/s + 1.

Example. Calculate the minimum and maximum number of animals required per group to test a new antidiarrheal drug against a control if the research is designed to have 2 groups. Assume df to be 10 and 20 respectively.

$$n = df/s + 1 \tag{16}$$

$$\text{Minimum} = n = \frac{df}{s} + 1 = \frac{10}{2} + 1 = 6 \text{ animals per group}$$

Total sample size = 6 × 2 = 12

$$\text{Maximum} = n = \frac{df}{s} + 1 = \frac{20}{2} + 1 = 11 \text{ animals per group}$$

Total sample size = 11 × 2 = 22

Sample size calculation for animal studies having one control group with two or more experimental groups

(II) For an experiment involving the use of one control group and two or more experimental groups to be treated with the same compound at varying concentrations and analyzed at the end using ANOVA, the formula below can be used.

$$df = (N - 1)(v - 1) \tag{17}$$

where N is the total number of subjects and v is the number of repeated measurements.

$$N = df/(v - 1) + 1 \tag{18}$$

To obtain the minimum and maximum numbers of animals, substitute the df in the formulas below with 10 and 20, respectively:

$$\text{Minimum} = N = \frac{df}{v - 1} + 1 = \frac{10}{v - 1} + 1 \tag{19}$$

$$\text{Maximum} = N = \frac{df}{v - 1} + 1 = \frac{20}{v - 1} + 1 \tag{20}$$

Example. Calculate the sample size required to test the toxicity of different doses of cisplatin (4.5 mg/kg, 5.5 mg/kg, 6.5 mg/kg, and 7.5 mg/kg) against control.
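The rearranged resource-equation formulas, equations (15) and (18), are simple enough to sketch in Python. The function names and the `DF_MIN`/`DF_MAX` constants are hypothetical. The two-group call reproduces the antidiarrheal example from the text, while the value v = 3 in the second call is an invented input used only to show the function and is not a solution to any example in this section. The functions return unrounded values, since the text does not state how a non-integer result should be rounded.

```python
DF_MIN, DF_MAX = 10, 20  # acceptable range of df stated in the text


def animals_per_group(df: float, groups: int) -> float:
    """Equation (15): n = df / s + 1, where s is the number of groups."""
    if groups <= 0:
        raise ValueError("groups must be positive")
    return df / groups + 1


def total_subjects(df: float, v: int) -> float:
    """Equation (18): N = df / (v - 1) + 1."""
    if v <= 1:
        raise ValueError("v must be greater than 1")
    return df / (v - 1) + 1


# Example from the text: 2 groups (new antidiarrheal drug vs. control).
s = 2
n_min = animals_per_group(DF_MIN, s)
n_max = animals_per_group(DF_MAX, s)
print(n_min, n_min * s)  # 6.0 12.0
print(n_max, n_max * s)  # 11.0 22.0

# Hypothetical input (not from the text): v = 3.
print(total_subjects(DF_MIN, 3), total_subjects(DF_MAX, 3))
```

When the division does not come out even, whether to round up or down is a design decision that should be made explicitly, ideally together with a statistician as the section recommends.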
| Tool | Description | Website |
| Epi Info | A suite of free, user-friendly tools for public health professionals, including a sample size calculator | CDC Epi Info Website |
| Sample Size Calculator by Creative Research Systems | An online tool for determining sample size needed for surveys and experiments | Creative Research Systems Website |
| Power and Sample Size Calculation (PS) | A software package for calculating power and sample size for a wide range of study designs | PS Website |
| ClinStat | A statistical calculator with a suite of tools, including sample size calculations for clinical research | ClinStat Website |
| Stata | A comprehensive statistical software that includes tools for sample size determination among its features | Stata Website |