Sample size calculation and Sampling Methods

Sample size calculation


One of the most difficult decisions facing the researcher is how large his
sample should be. Two common approaches are employed in research
studies: the empirical and the analytical. The empirical approach involves
using sample sizes that have been used in similar studies. This has no
scientific basis, and will only be satisfactory if the previous studies had
acceptable limits on the errors of generalization, and the current study is very
similar in its scope (objectives, design, study population, etc.). This method
is not recommended.
The analytical (scientific) approach to determining the appropriate size of the
sample to be included in the study depends on the assessment of errors of
inference, and a desire to minimize ‘sampling error’.
The main determinant of the sample size is how accurate the results need to
be. This depends on the purpose of the study (descriptive or analytical study).
Sample sizes for descriptive studies
In the case of descriptive studies, often the objective is to obtain an estimate
of a population parameter. The determination of the size of sample depends
on several factors:
a. What is the measure of interest? This would have been determined by the
study objectives. The identification of the characteristic of primary
importance determines the next steps in the process of defining the sample
size. For example, if a prevalence rate in the population is to be estimated
by observing a sample from the population, the measure is the proportion
of people in the sample with the disease.
b. What is the underlying probability distribution of the characteristic of
interest? Most research questions fall into one of two possible scenarios:
the binomial distribution (when one wants to estimate the proportion of a
certain event), and the normal distribution (when one wants to estimate
an average value). For example, the market researcher has the preference
of a brand as the characteristic, with two possible outcomes. If one
assumes that there is possibly a fixed proportion (π) of people with
preference for the brand, then the number of people expressing this
preference in any fixed set of people will follow a binomial distribution,
with the proportion (p) of the people showing the preference as a good
estimate of the population proportion. For the nutritionist, the daily caloric
intake of individuals follows a normal distribution with some average (μ),

1|Sampling and sample size determination in research -By


Wondimagegn W.(MPHE)
April,2021
and the average of the daily caloric intake of the sample of people (x̄)
observed would be a good estimate of this population value.
c. What is the sampling distribution of the measure? Drawing inferences
from the sample to the population involves inherent errors, which are
measured by the sampling distribution. If we observed several samples,
under the same method of selecting the samples, the measures from each
of these samples would vary, resulting in a ‘probability distribution’ for the
sample measure. This distribution is called the sampling distribution, and
it depends on the type of study design and on how the samples were
obtained. In calculating sample sizes, it is often assumed that the
sampling involves simple random sampling. Sometimes the sampling
design is much more complicated (e.g. multistage cluster sampling
techniques) and more complicated formulae will have to be used to
calculate sample sizes appropriately.
d. How accurate do you want the results to be? Basically, one is interested
in obtaining an estimate as close to the population value as possible.
Therefore, some measure of the difference between the estimate and the
population value has to be considered. In most cases, a mean-squared
error (average of the squared deviation of the sample value from the
population value) is used. A concise way of expressing this error is to use
the ‘standard error of the estimate’. The standard error comes from the
sampling distribution of the estimate. If the sampling is done properly
(with appropriate probabilistic methods), one can predict what this
distribution should be, and based on this, one can estimate how close to
the population value the sample estimate will be:
For example, in the case of estimating the population proportion, the
sampling distribution of the sample proportion, p, is approximately normal,
with mean π and variance π(1-π)/n, where n is the sample size. This gives
the (1-α) confidence interval for π to be

$$p \pm z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$

where $z_{1-\alpha/2}$ is the appropriate cut-off point on the standard normal
distribution. (For example, for 95% confidence, z = 1.96.)
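The interval above can be sketched in a few lines of Python. This is only an illustration of the normal-approximation interval; the function name `proportion_ci` and the survey figures (90 cases among 300 people) are hypothetical and not taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    # Cut-off point on the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical survey: 90 of 300 people have the characteristic (p = 0.30).
low, high = proportion_ci(successes=90, n=300)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With these hypothetical figures the interval is roughly 0.248 to 0.352, that is, about ± 5 percentage points around the sample proportion.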

The accuracy of the estimate therefore depends on two quantities: how narrow
this interval is (width of the interval) and how confident we are (e.g. 95%).
The calculation of the size of the sample for a descriptive study therefore
depends on the two parameters – the width of the confidence interval and the
confidence coefficient. Computer programs are readily available (e.g. EPIINFO
has a module that allows for the computation of sample sizes). The two
common scenarios, estimating a population proportion and estimating a
population mean, are illustrated below:
i. Estimating a population proportion (p). Suppose we want to conduct a survey
to determine the prevalence of a relatively common disease in a
community. We want to determine how many people should be observed to
obtain a reasonably accurate picture of the prevalence. The following steps
are necessary:
· Specify the parameters of error:
Confidence coefficient (1-α): 95%
Width of the interval: 10% (that is, ± 5%)
· Make a guess as to the value of π: 30%


The problem is to calculate the sample size required for estimating the
prevalence of the disease within ± 5% of the true value, with 95% confidence.
Since the confidence interval actually depends on the true value, π, we have
to make a guess as to what this value might be. This is done based on prior
experience; if no guess is available, use the value 50%, which will give the
largest sample size. Using the fact that the sample proportion (p) has the
confidence interval given above, the sample size (n) can be calculated using
the formula:

$$n = \left(\frac{z_{1-\alpha/2}}{d}\right)^2 \pi(1-\pi)$$

where d is the margin of error (half the width of the interval, here 0.05).
In the above example, therefore, n = (1.96/0.05)^2 × (0.30 × 0.70) = 322.7,
which is rounded up to 323; we need a
minimum of 323 subjects observed to assure that the 95% confidence interval
for the estimated proportion will be within 5% of the true prevalence. If the
true prevalence is less than 30%, the confidence interval will be narrower.
The maximum sample size required will occur when the true prevalence is
50%, in which case n = (1.96/0.05)^2 × (0.50 × 0.50) = 384.2, rounded up to 385.
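The calculation for a proportion can be checked with a short Python sketch. The function name `sample_size_proportion` is hypothetical; the inputs are the figures from the example (z = 1.96, margin of error 0.05, guessed prevalence 30% or 50%).

```python
from math import ceil


def sample_size_proportion(p, d, z=1.96):
    """Sample size needed to estimate a proportion p to within +/- d.

    Assumes simple random sampling from a large population.
    """
    return ceil((z / d) ** 2 * p * (1 - p))


n_30 = sample_size_proportion(p=0.30, d=0.05)  # guessed prevalence of 30%
n_50 = sample_size_proportion(p=0.50, d=0.05)  # worst case, no prior guess
print(n_30, n_50)  # prints: 323 385
```

Rounding is always upward, because rounding down would give an interval slightly wider than the one specified.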
The above calculation assumes a simple random sample from a relatively
large population. In practice, the population from which the samples are
drawn may be fixed and small, in which case corrections to the above
formulae are required. (See the EPIINFO program for variations of this formula
and its use under different sampling designs:
https://silo.tips/download/statcalc-calculating-a-sample-size-with-epi-info)
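One widely used correction for a small, fixed population is the finite population correction. The text does not give this formula, so the sketch below is an assumption based on the standard textbook form n / (1 + (n - 1)/N); the function name and the population size of 1000 are hypothetical.

```python
from math import ceil


def finite_population_correction(n0, population_size):
    """Adjust a sample size n0, computed for a large population,
    for a finite population of the given size (standard textbook form)."""
    return ceil(n0 / (1 + (n0 - 1) / population_size))


# Hypothetical community of 1000 people, starting from n0 = 385.
n_adjusted = finite_population_correction(385, 1000)
print(n_adjusted)  # prints: 279
```

As the population grows, the adjusted value approaches the uncorrected sample size, so the correction matters only when the sample is a sizeable fraction of the population.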
ii. Estimating a population average (μ): Suppose we want to estimate the
average daily caloric intake of people in a community. The daily caloric
intake is assumed to have a normal distribution around μ, with a
standard deviation (σ). The sample measure used to estimate μ is the
sample mean. The sampling distribution of the sample mean is also
normal, with the same mean, μ, and standard deviation σ/√n (the
standard error of the mean).
Notice that we need to know the value of σ to proceed further. It is
either obtained from other similar studies, or by actually obtaining a
small number of observations at random in a test study. If neither of
these is possible, one may make a reasonable guess by taking the
maximum range (maximum value possible – minimum value possible)
and dividing this range by 4. (This uses the supposition that, for a normal
distribution, 95% of values will be within ± 2 standard deviations of
the mean, and the mean will be the central value.) Then the following
formula can be used to calculate the sample size:

$$n = \left(\frac{z_{1-\alpha/2}\,\sigma}{d}\right)^2$$

where d is the margin of error one is willing to tolerate in the estimate.

Example: A researcher wishes to estimate the mean serum
cholesterol in a population of men. From previous similar studies, a
standard deviation of 40 mg/100ml was reported. If he is willing to
tolerate a margin of error of up to 5 mg/100ml in his estimate, how
many subjects should be included in his study? (α = 5%, two sided).
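The cholesterol example can be worked through with a short Python sketch. The function names are hypothetical; the inputs (σ = 40, margin of error 5, z = 1.96 for a two-sided α of 5%) come from the example, while the range used to illustrate the "range divided by 4" guess is invented.

```python
from math import ceil


def sample_size_mean(sigma, d, z=1.96):
    """Sample size needed to estimate a mean to within +/- d."""
    return ceil((z * sigma / d) ** 2)


def sd_from_range(minimum, maximum):
    """Rough guess at the standard deviation: range divided by 4."""
    return (maximum - minimum) / 4


n_cholesterol = sample_size_mean(sigma=40, d=5)
print(n_cholesterol)  # prints: 246

# Hypothetical range of 100 to 300 gives a guessed standard deviation of 50.
guessed_sd = sd_from_range(100, 300)
```

So (1.96 × 40 / 5)^2 = 245.9, and the researcher would need at least 246 subjects.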

Sample sizes for analytical studies


Since the primary purpose of an analytical study is to test (one or more) null hypotheses, the
determination of the sample sizes requires the specification of the limits of errors one is willing to
accept in accepting or rejecting the null hypothesis (type I and type II errors). By equating the two
types of errors based on the sampling distribution to the pre-set limits on these errors, we can work
out the sample size. For example, suppose we are comparing two proportions and decide to accept a type I error, or α (probability of
making a false conclusion that the two proportions are not equal in the population, when they are
in fact equal). The calculation of a type II error, or β (probability of making a false decision that
the two proportions are equal when they are not) depends on a precise definition of ‘null hypothesis
is not true’. The simplest way to do this is to define the smallest difference (δ) in the two
proportions that we consider meaningful (clinically significant difference) and calculate β under
this hypothesis. Clearly, if the difference is larger than δ, the probability of type II error will be
less. Using this approach, formulae have been derived for calculating sample sizes for various
types of statistical tests. [Note: In statistical tests, the discussion of type II errors may be worded
in terms of ‘statistical power’, which is simply 1-β: i.e. having a 5% type II error is the same as
the study having 95% ‘power’.]

Different formulae have been derived for each study design. However, they are more complicated
and are rarely applied by hand. Instead, automated computer programs and web pages designed to
calculate sample sizes for each study design are used, for example Epi Info and STATA.
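To give a feel for what such programs compute, here is a minimal Python sketch for comparing two proportions. The text does not state a formula, so this uses one common normal-approximation form (without continuity correction) as an assumption; the function name and the proportions 30% and 20% are hypothetical, and dedicated software may give slightly different answers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a difference between p1 and p2.

    Assumed formula (normal approximation, two-sided test, equal groups):
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # type I error cut-off
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical study: detect a drop from 30% to 20% with 80% power.
n_per_group = sample_size_two_proportions(0.30, 0.20)
print(n_per_group)  # prints: 291
```

A smaller meaningful difference, or a higher power, increases the required number of subjects in each group.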
Sampling methods
Once the population has been identified and the size of the sample determined,
we need to decide how we are going to choose the sample from the population.
The size of the sample will also depend on this choice and therefore, the issue of
sample size may have to be revisited after the choice of the sampling method.
Before proceeding to the sampling techniques, let us define some concepts.
Sampling: a process by which we study a small part of a population to make
judgments about the entire population. It is used to make estimates about the
population of interest.
Population: the complete set of items (or individuals) that could be observed,
measured, or collected while researching the problem.
Target /reference/source population: the population of interest, to which the
investigators would like to generalize the results of the study, and from which a
representative sample is to be drawn.
Study or Sample population: the population included in the sample /actually
studied population/
Study unit: the unit on which information is collected
Sampling unit: the unit of selection in the sampling process. The sampling unit
is not necessarily the same as the study unit. If the objective is to determine the
availability of latrines, then the study unit would be the household; if the objective
is to determine the prevalence of trachoma, then the study unit would be the
individual.
Sampling frame: the list of all the units in the reference population, from which
a sample is to be selected
Types of sampling methods/techniques
There are two broad divisions of sampling methods. These are probability and
non-probability sampling methods.
Non-Probability Sampling method
In the case of non-probability sampling methods, units of the sample are chosen
on the basis of personal judgment or convenience. They are used when a sampling
frame does not exist. The chance (probability) of selecting members from the
population is unknown. There are NO statistical techniques for measuring
random sampling error in a non-probability sample, and generalizability is never
statistically appropriate. Thus, these methods are inappropriate if the aim is to measure variables
and generalize findings obtained from a sample to the population.
The common non-probability sampling methods include:
Judgmental/Purposive: Researchers choose the sample based on who they
think would be appropriate for the study. It is primarily used when there is a limited
number of people who have expertise in the area being researched. It is efficient
and economical when the sample sizes are small.
Quota: The population is first segmented into mutually exclusive sub-groups, as
in stratified sampling. Subjects are then selected until a specific number of units (the quota)
for each sub-group has been filled. There are no rules for selecting the subjects. This
is one of the most common forms of non-probability sampling.
Convenience/Haphazard: Selection of subjects based on easy availability and
accessibility. “Take them where you find them”. Example: people who just
happen to be walking by.
Snowball: Involves a process of “chain referrals” (a friend of a friend). Suitable for
locating key informants. You start with one or two key informants and ask them
if they know persons who know a lot about your topic of interest. Used when
trying to interview hard-to-reach groups.
Volunteer/self-selection: Subjects selected are volunteers who show interest in
the study. It is commonly used when the selection characteristics of the people
create abnormal or undesirable conditions. Common in trials demanding long
duration. Payments to subjects may sometimes be involved. Introduces strong
self-selection (volunteer) bias.
Probability Sampling method
A sampling technique in which every member of the population has a known,
nonzero probability of being selected.
The common probability sampling methods include:
a. Simple random sample
This is the most common and the simplest of the sampling methods. In this
method, the subjects are chosen from the population with equal probability of
selection. One may use a random number table, or use techniques such as
putting the names of the people into a hat and selecting the appropriate number
of names blindly. Recently, computer programs have been developed to draw
simple random samples from a given population. The simple random sample has
the advantages that it is easy to administer, is representative of the population
in the long run, and the analysis of data using such a sampling scheme is
straightforward. The disadvantage is that the selected sample may not be truly
representative of the population, especially if the sample size is small. This
method requires a complete and updated list of sampling units, which may be
difficult to obtain in emergency situations.
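Drawing a simple random sample by computer can be sketched as follows. This is a minimal illustration using Python's standard `random` module; the sampling frame of 200 household labels and the function name are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame, without replacement,
    each unit having the same probability of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: a complete list of 200 households.
srs_frame = [f"household_{i:03d}" for i in range(1, 201)]
srs = simple_random_sample(srs_frame, 20, seed=42)
print(srs[:5])
```

Note that the frame must be a complete, up-to-date list; the code cannot compensate for units that are missing from it.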
b. Systematic and stratified sampling
Systematic random sampling is based on selection of units situated at a certain
predetermined interval called the sampling interval. One of its main advantages
is that it can also be used without having a list of basic sampling units, as in
situations where dwellings are well organized in rows, blocks, or along a river
or main road, for example.
In stratified sampling, when the size of the sample
is small and we have some information about the distribution of a particular
variable (e.g. gender: 50% male/50% female), it may be advantageous to select
simple random samples from within each of the subgroups defined by that
variable. By choosing half the sample from males and half from females, we
ensure that the sample is representative of the population with respect to gender.
When confounding is an important issue (such as in case-control studies),
stratified sampling will reduce potential confounding by selecting homogeneous
subgroups.
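Selection at a fixed sampling interval and selection within subgroups can both be sketched in Python. This is an illustration under stated assumptions: the function names, the list of 200 dwellings, and the 100/100 gender split are hypothetical, and the stratified version uses proportional allocation, which is one of several possible allocation rules.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start within the first interval, then every k-th unit."""
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random starting position, 0 .. k-1
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, sized in proportion
    to the stratum (rounded sizes may not add up exactly to n)."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / total))
        for name, units in strata.items()
    }


dwellings = list(range(1, 201))  # hypothetical row of 200 dwellings
sys_sample = systematic_sample(dwellings, 20, seed=1)

strata = {
    "male": [f"M{i}" for i in range(100)],
    "female": [f"F{i}" for i in range(100)],
}
strat_sample = stratified_sample(strata, 20, seed=1)
print(sys_sample[:3], {k: len(v) for k, v in strat_sample.items()})
```

In the systematic case only the starting point is random; every later unit follows from the interval, which is why no full list of units is needed in the field.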
c. Cluster sampling
In many administrative surveys, studies are done on large populations which
may be geographically quite dispersed. To obtain the required number of
subjects for the study by a simple random sample method will require large costs
and will be inconvenient. In such cases, clusters may be identified (e.g.
households) and random samples of clusters will be included in the study; then
every member of the cluster will also be part of the study. It is preferable to select
a large number of small clusters rather than a small number of large clusters.
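Cluster sampling can be sketched in Python as follows. The function name and the data (50 households of 4 members each) are hypothetical; the point is only that clusters are drawn at random and every member of a chosen cluster is included.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: 50 households, 4 members in each.
households = {
    f"hh_{h:02d}": [f"hh_{h:02d}_person_{p}" for p in range(1, 5)]
    for h in range(1, 51)
}
selected_households = cluster_sample(households, 5, seed=7)
n_people = sum(len(members) for members in selected_households.values())
print(n_people)  # prints: 20
```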
d. Multi-stage sampling
Multistage sampling entails two or more stages of random sampling based on
the hierarchical structure of natural clusters within the population. Clusters are
natural groupings of people—for example, electoral wards, general practices,
schools, or households. A different type of cluster is randomly sampled at each
stage, with the clusters nested within each other at successive stages. The final
stage of sampling involves choosing a random sample of people in the clusters
selected at the penultimate stage.
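A two-stage version of this idea can be sketched in Python. The names and data (10 schools of 30 pupils each) are hypothetical, and real multistage designs often select clusters with probability proportional to size, which this minimal sketch does not do.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: random sample of clusters.
    Stage 2: random sample of people within each selected cluster."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in stage_one
    }


# Hypothetical population: 10 schools with 30 pupils each.
schools = {
    f"school_{s}": [f"school_{s}_pupil_{p}" for p in range(1, 31)]
    for s in range(1, 11)
}
two_stage = two_stage_sample(schools, n_clusters=3, n_per_cluster=5, seed=3)
print({name: len(pupils) for name, pupils in two_stage.items()})
```

Unlike cluster sampling, only a random subset of each selected cluster is studied at the final stage.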
Bias in sampling

Sampling bias is all about a lack of representativeness. The crucial step in most design aspects
lies in the phrase “a sample that represents the population.” Sampling bias can
arise in many ways. Clear thinking about this step avoids most of the problems.
Although true randomness is a sampling goal, too often it is not achievable. In
the spirit of “some information is better than none,” many studies are carried
out on convenience samples that include biases of one sort or another. These
studies cannot be considered conclusive and must be interpreted in the spirit in
which they were sampled.

Increasing representativeness by random samples


The attempt to ensure representative samples is a study in itself. One important
approach is to choose the sample randomly. A random sample is a sample of
elements in which the selection is due to chance alone, with no influence by any
other causal factor. Usually, but not always (e.g., in choosing two control
patients per experimental patient), the random sample is chosen such that any
member of the population is as likely to be drawn as any other member. A sample
is not random if we have any advance knowledge at all of what value an element
will have. If the effectiveness of two drugs is being compared, the drug allocated
to be given to the next arriving patient should be chosen by chance alone,
perhaps by the roll of a die or by a flip of a coin.
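The coin flip can equally be done by computer. The sketch below is a minimal illustration of allocation by chance alone; the drug labels, the seed, and the function name are hypothetical, and it does not balance group sizes the way a formal randomization scheme might.

```python
import random


def allocate_next_patient(rng):
    """Choose the drug for the next arriving patient by chance alone."""
    return rng.choice(["drug A", "drug B"])


rng = random.Random(2021)  # a seed only makes the illustration reproducible
assignments = [allocate_next_patient(rng) for _ in range(10)]
print(assignments)
```

The allocation does not depend on anything known about the patient, which is the property that makes it random.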
Sources of bias
The sources of bias in a study are myriad and no list of possible biases can be
complete. Some of the more common sampling biases to be alert to are given in
the following list.
1. Bias resulting from method of selection. Included would be, for example,
patients referred from primary health-care sources, advertisements for
patients (biased by patient awareness or interest), patients who gravitate
to care facilities that have certain reputations, and assignment to clinical
procedures according to therapy risks.
2. Bias resulting from membership in certain groups. Included would be, for
example, patients in a certain geographical region, in certain cultural
groups, in certain economic groups, in certain job-category groups, and in
certain age groups.
3. Bias resulting from missing data. Included would be patients whose data
are missing because of, for example, dropping out of the study because
they got well, or not responding to a survey because they were too ill, too
busy, or illiterate.

4. State-of-health bias (Berkson’s bias). Included would be patients selected
from a biased pool, that is, people with atypical health. The combination
of exposure to a risk and occurrence of the disease makes it more likely
that an individual will be admitted to hospital. In a case-control study, this
means the hospital cases could have higher risk exposures or disease than
cases from the population at large. This can affect the estimates of the
association between the exposure and the disease.
5. Prevalence-incidence bias (Neyman’s bias). Included would be patients
selected from a short sub period for having a disease showing a pattern of
occurrence irregular in time. The very sick or very well (or both) are
erroneously excluded from a study. The bias (“error”) in your results can
be skewed in two directions:
· Excluding patients who have died will make conditions look less severe.
· Excluding patients who have recovered will make conditions look more
severe.
6. Comorbidity bias. Included would be patients selected for study who have
concurrent diseases affecting their health.
7. Reporting bias. Some socially unacceptable diseases are underreported.
