SAMPLE SIZE ESTIMATION
UNIT- X
INTRODUCTION
• ‘Sample size’ is a research term for the number of individuals included in a research study. Researchers choose their sample based on variables such as demographics, for instance age, gender, or physical location.
• Sample size estimation is the process of choosing the right number of observations or people
from a larger group to use in a sample.
• The goal of sample size estimation is to ensure that the sample is large enough to give statistically valid results and accurate estimates of population parameters, yet small enough to be manageable and cost-effective.
The need for sample size estimation
• The sample size (n) can be defined as the number of units in a group under study.
• An adequate sample size allows the researcher to report results with a sufficient degree of confidence and acceptable statistical power; it helps ensure that the study will yield reliable information and interpretable results and minimizes research waste. An inadequate sample size, by contrast, makes the interpretation of negative results difficult and yields statistically inconclusive results.
Factors that affect the sample size calculation
• Several factors must be taken into consideration for sample size estimation; knowing these factors and their effects helps the researcher determine the appropriate sample size for the study. These factors can be summarized as follows:
• (1) The type of data collected in the study. The statistical methods of sample size determination depend on the type of data being collected. If the data are continuous, the concern will be with studying means; if the data are qualitative (categorical), the concern will be with studying proportions (Ralph et al., 2002).
(2) The level of precision
• The level of precision refers to the survey error, or sampling error. It is defined as the difference between the population parameter and the sample statistic associated with that parameter.
• The level of precision is inversely related to the sample size: the lower (tighter) the required precision level, the larger the sample size needed to reach the goal. This level is often expressed as a percentage (such as 5% or 7%). It is important to note that, when determining the sample size, the precision level should be set as low as possible without increasing the sample size beyond what the available resources will allow (Chadha, 2006; Israel, 2009).
(3) The confidence level
• The confidence level is necessary when the results will be presented using confidence intervals. It indicates the probability that the interval computed from the sample contains the parameter being estimated.
• Common confidence levels used in most studies are 99% or 95%. If a 95% confidence level is selected, this means that, if an experiment were repeated 100 times, in 95 out of 100 repetitions the true value of the population parameter would lie within the prespecified range of precision.
• The confidence level is directly related to the sample size. That is, the higher the predetermined confidence level, the larger the sample size needed to achieve the purpose of the study (Rahul, 2007).
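• As an illustration (not taken from the references above), the z-value corresponding to a chosen confidence level can be read from a standard normal table or computed directly; a minimal Python sketch using scipy:

from scipy.stats import norm

# Two-sided z-value for a given confidence level:
# 95% -> about 1.96, 99% -> about 2.58
for conf in (0.95, 0.99):
    alpha = 1 - conf
    z = norm.ppf(1 - alpha / 2)  # upper alpha/2 quantile of N(0, 1)
    print(f"{conf:.0%} confidence -> z = {z:.3f}")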
(4) The significance level
• The significance level refers to the probability of detecting a statistically significant difference that is in fact the result of chance; in other words, this level determines the probability of obtaining an erroneously significant result. This factor also refers to the maximum P-value for which a difference is considered statistically significant.
• The significance level is inversely related to the sample size: as this level is decreased, the sample size needed to detect the difference increases. The significance level is most commonly set at a 5% or 1% chance of erroneously reporting a significant effect (Jan et al., 2003).
(5) The effect size
• The effect size (minimum expected difference) is the smallest difference between comparison groups that the researcher would like the study to detect. It measures the distance between the null hypothesis and a specified value of the alternative hypothesis.
• The effect size has an inverse relationship with the sample size: the smaller the predetermined minimum expected difference, the larger the sample size needed to detect statistical significance.
• There are different ways to calculate effect size depending on the evaluation design used; in general, the effect size is calculated by taking the difference between the means of the two groups and dividing it by the standard deviation of one of the groups (Adcock, 1997; Eng, 2003).
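• A minimal Python sketch of this calculation (the data below are hypothetical; following the description above, the standard deviation of one group, here the control group, is used as the divisor):

import statistics

def effect_size(group1, group2):
    # Difference between the group means divided by the
    # standard deviation of one of the groups (here, group2).
    diff = statistics.mean(group1) - statistics.mean(group2)
    return diff / statistics.stdev(group2)

# Hypothetical data: systolic blood pressure, treatment vs. control
treatment = [118, 122, 115, 120, 119, 117]
control = [126, 130, 124, 128, 131, 127]
print(round(effect_size(treatment, control), 2))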
(6) The degree of variability
• The degree of variability in the variables being measured refers to the distribution of the variables in the population. The more heterogeneous a statistical population (the greater the degree of variability), the larger the sample size required to obtain a given level of precision.
• The more homogeneous a population, the smaller the sample size required. This factor is a vital component of sample size estimation and is represented by the expected variance in the primary variables of interest in the study.
• Cochran (1977) listed several ways of estimating the population variance for sample size estimation: 1) use pilot study results; 2) use data from previous studies of similar populations; 3) estimate or guess the structure of the population assisted by some logical explanation; and 4) the standard deviation (σ).
(7) Type I error (α error)
• Type I error is the probability that a difference revealed by statistical analysis does not really exist; it is the probability of rejecting a null hypothesis (H0) even though it is true. This means that the α error measures the probability of a false-positive result.
• The α error has an inverse relationship with the sample size. That is, the lower the α error is set, the larger the sample size needed to reach the goal of the study.
• The type I error rate used in determining an adequate sample size in most statistical studies is either 5% or 1% (Kenneth & David, 2005).
(8) Type II error (β error)
• Type II error occurs when statistical procedures result in a judgment of no significant difference when such a difference does indeed exist; it is the probability of accepting a null hypothesis even though it is false. This means that the type II error measures the probability of a false-negative result.
• The β error has an inverse relationship with the sample size: the smaller the predetermined β error, the greater the sample size needed. A common β error used in most statistical studies is 20% (Eng, 2003; Jones et al., 2003).
(9) Statistical power (1 - β)
• The power of a statistical test is derived from the β error; it is the complement of β (1 - β) and represents the probability of avoiding a false-negative result. In other words, the power is the probability of rejecting a false null hypothesis.
• Statistical power is defined as the ability to detect a true significant difference between two groups when the difference in fact exists. The power of a test has a direct relationship with the sample size: as the desired power is increased, the required sample size increases.
• The power is affected by several factors: the significance level (direct relationship), the effect size (direct relationship) and the sample size (direct relationship).
• The statistical power is customarily set at a value greater than or equal to 80%. A power of 80% means that, when a true difference exists, the study will detect it about 8 times out of 10 (Elise and Jonathan, 2002; Chadha, 2006).
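• As a rough illustration of these relationships (a normal-approximation two-sample test is assumed here; this is not a formula taken from the references above), power rises as the sample size per group rises for a fixed effect size and significance level:

from math import sqrt
from scipy.stats import norm

def approx_power(n_per_group, delta, sigma, alpha=0.05):
    # Approximate power of a two-sided, two-sample z-test
    # (normal approximation), for illustration only.
    z_alpha = norm.ppf(1 - alpha / 2)
    z = abs(delta) / (sigma * sqrt(2 / n_per_group))
    return norm.cdf(z - z_alpha)

# Hypothetical numbers: difference of 5 units, SD of 10 in each group
for n in (10, 20, 40, 80):
    print(n, round(approx_power(n, delta=5, sigma=10), 2))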
(10) Other factors
• There are other factors that must be taken into consideration for sample size estimation.
• The sampling design: the approach used to determine the sample size depends on the sampling design of the study. In particular, more complex designs, e.g., stratified random samples, must take into consideration the variances of the subpopulations.
• The sample size formulas provide the number of responses that need to be obtained. Many researchers add 10% to the calculated sample size to compensate for persons the researcher is unable to contact. The sample size is also often increased by 30% to compensate for non-response (Israel, 2009).
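• A minimal sketch of these adjustments (the function name is illustrative; the 10% and 30% figures are those quoted above):

import math

def adjust_sample_size(n_required, non_contact=0.10, non_response=0.30):
    # Inflate a calculated sample size to allow for persons who
    # cannot be contacted and for non-response, as described above.
    n = n_required * (1 + non_contact)    # add 10% for non-contact
    n = n * (1 + non_response)            # add a further 30% for non-response
    return math.ceil(n)

print(adjust_sample_size(100))   # 100 -> 110 -> 143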
INFORMATION NEEDED FOR SAMPLE SIZE ESTIMATION
• Prior information on the variation of the parameter of interest (measured by σ or the proportion P)
• Desired precision level (95% or 99%) (greater accuracy of results requires a higher degree of precision)
• Desired margin of error (the error in the estimate should be within limits that can be tolerated)
• Effect size
• Desired power
• Sampling distribution of the formulae to be used (parametric/non-parametric tests, etc.)
• Sampling technique to be used (probability/non-probability sampling, etc.)
• Number of covariates to be considered
• Dropout/attrition rate
FORMULAE FOR SAMPLE SIZE ESTIMATION
• There are several approaches to determining the sample size; these include using a census of a small population as the sample, imitating the sample size of similar studies, using published tables, and applying formulas to calculate the sample size.
• There are two approaches to sample size calculation: the first is precision-based sample size calculation and the second is power-based sample size calculation.
CATEGORY OF FORMULAE
1. For a study trying to measure a variable with a certain precision
2. For a study seeking to demonstrate a significant difference between two groups
1. Measuring one variable:
Abbreviations:
n = sample size
s = standard deviation
e = standard error
(Desired margin of error ‘L’ = Zα × e)
r = rate
p = percentage
Conditions:
- Prior information on parameter of interest
- Desired precision level (95% or 99%)
- Desired margin of error
1.1 Single Mean:
n = s² / e²
where s = standard deviation and e = standard error
1.2 Single Proportion:
n = p(100 - p) / e²
where p = percentage of the group and e = standard error
1.3 Difference between two Means:
n = (s1² + s2²) / e²
where s1 and s2 = standard deviations of the two groups and e = standard error
1.4 Difference between two Proportions:
n = [p1(100 - p1) + p2(100 - p2)] / e²
where p1 and p2 = percentages of the two groups and e = standard error
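A minimal Python sketch of these four precision-based formulas (the function names are illustrative; s, p and e are used exactly as defined above, with p expressed as a percentage):

import math

def n_single_mean(s, e):
    # 1.1: n = s^2 / e^2
    return math.ceil(s**2 / e**2)

def n_single_proportion(p, e):
    # 1.2: n = p(100 - p) / e^2
    return math.ceil(p * (100 - p) / e**2)

def n_two_means(s1, s2, e):
    # 1.3: n = (s1^2 + s2^2) / e^2
    return math.ceil((s1**2 + s2**2) / e**2)

def n_two_proportions(p1, p2, e):
    # 1.4: n = [p1(100 - p1) + p2(100 - p2)] / e^2
    return math.ceil((p1 * (100 - p1) + p2 * (100 - p2)) / e**2)

# Hypothetical example: estimating a mean with s = 12 and a
# tolerated standard error of 1.5 gives n = 144 / 2.25 = 64
print(n_single_mean(12, 1.5))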
2. Measuring a significant difference between two groups:
Abbreviations:
n = sample size
s = standard deviation
m = mean
p = percentage
u = one-sided percentage point of the normal distribution corresponding to (100% - the desired power)
v = two-sided percentage point of the normal distribution corresponding to the desired significance level
2.1 Comparison of two means:
n = (u + v)² (s1² + s2²) / (m1 - m2)²
2.2 Comparison of two proportions:
n = (u + v)² [p1(100 - p1) + p2(100 - p2)] / (p1 - p2)²
where u = one-sided percentage point of the normal distribution corresponding to (100% - the desired power),
v = two-sided percentage point of the normal distribution corresponding to the desired significance level,
m1 and m2 = means of the two groups, and
p1 and p2 = percentages of the two groups
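For illustration, u and v can be read from standard normal tables or computed directly; a minimal Python sketch of formulas 2.1 and 2.2 (the example figures are hypothetical):

from scipy.stats import norm

def u_value(power):
    # One-sided % point of the normal distribution
    # corresponding to (100% - the desired power)
    return norm.ppf(power)               # e.g. power = 0.90 -> about 1.28

def v_value(sig_level):
    # Two-sided % point of the normal distribution
    # corresponding to the desired significance level
    return norm.ppf(1 - sig_level / 2)   # e.g. 0.05 -> about 1.96

def n_compare_means(m1, m2, s1, s2, power=0.90, sig_level=0.05):
    # Formula 2.1: sample size per group
    u, v = u_value(power), v_value(sig_level)
    return (u + v)**2 * (s1**2 + s2**2) / (m1 - m2)**2

def n_compare_proportions(p1, p2, power=0.90, sig_level=0.05):
    # Formula 2.2: sample size per group, p1 and p2 as percentages
    u, v = u_value(power), v_value(sig_level)
    numerator = (u + v)**2 * (p1 * (100 - p1) + p2 * (100 - p2))
    return numerator / (p1 - p2)**2

# Hypothetical example: detecting a fall in prevalence from 30% to 20%
# with 90% power at the 5% significance level needs about 389 per group
print(round(n_compare_proportions(30, 20)))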