SAMPLING PLAN
UNIT 3 – Data Collection - Syllabus
➢ Types of data
❖Primary Vs secondary data
➢ Methods of primary data collection
❖Survey Vs observation
❖Experiments
➢ Construction of questionnaire and instrument
❖Validation of questionnaire
➢ Sampling plan
❖Sample size
❖Determinants of optimal sample size
❖Sampling Techniques
❖Probability Vs non-probability sampling methods
Terminology
➢A population is:
❖the total collection of elements about which we wish to make
some inferences
➢A population element is:
❖A single member of the population on which the measurement
is taken
❖It is the unit of study
❖It may be a person or any object of interest.
Terminology … …
➢ A census is a count of all the elements in a population
➢ A sample is a subset of the population. It comprises only some
elements of the population
➢ A sample frame is the listing of all population elements from
which the sample will be drawn
➢ A sampling unit is a single member of the sample
Terminology … …
➢ Sampling refers to the selection of some elements of the
population
➢ The basic idea is that by selecting some of the elements in a
population, we may draw conclusions about the entire population
Why Sample?
➢ Sampling provides:
❖Lower cost
❖Greater speed
❖Greater accuracy
❖Availability of elements
Sampling vs Non-Sampling Error
➢Sampling error: This error arises when a sample is not
representative of the population.
➢Non-sampling error: This error arises not because the sample is
unrepresentative of the population but for other reasons. Some of
these reasons are listed below:
❖ Plain lying by the respondent
❖ Errors made while transferring the data from the questionnaire
to the spreadsheet on the computer
❖ Errors at the time of coding, tabulation and computation
What Is A Good Sample?
➢ Accuracy is the degree to which bias (systematic variance) is
absent from the sample
➢ Precision is measured by the standard error of estimate
➢ Together, accuracy and precision represent the validity of the
sample
Sampling Design
➢ Probability Sampling Design
❖Used in conclusive research
❖Here, each and every element of the population has a known
chance of being selected in the sample.
➢Types of Probability Sampling Design
❖Simple random sampling
❖Complex random sampling
▪ Systematic sampling
▪ Stratified random sampling
▪ Cluster sampling
▪ Double sampling
Sampling Design … …
➢Non-probability Sampling Designs
❖Here, the elements of the population do not have any known
chance of being selected in the sample.
➢Types of Non-Probability Sampling Design
❖Convenience sampling
❖Purposive sampling
▪ Judgment sampling
▪ Quota sampling
❖Snowball sampling
Larger Sample Sizes – When?
➢ Population variance
➢ Desired precision
➢ Small error range
➢ Confidence level
➢ Number of subgroups
Use Larger Sample Sizes When….
➢The greater the dispersion or variance within the population, the
larger the sample must be to provide estimation precision
➢The greater the desired precision of the estimate, the larger the
sample must be
➢The narrower or smaller the error range, the larger the sample must
be
➢The higher the confidence level in the estimate, the larger the sample
must be
➢The greater the number of subgroups of interest within a sample, the
greater the sample size must be, as each subgroup must meet
minimum sample size requirements.
Simple Random Sampling (SRS)
➢In SRS, each population element has an equal chance of being
selected into the sample
➢The sample is drawn using a random number table or generator
➢The probability of selection is equal to the sample size divided by
the population size.
How to Choose a Random Sample?
➢ The steps are as follows:
❖Assign each element within the sampling frame a unique number
❖Identify a random start from the random number table
❖Determine how the digits in the random number table will be
assigned to the sampling frame
❖Select the sample elements from the sampling frame that
match the numbers read from the table.
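The steps above can be sketched in Python, with a seeded pseudo-random generator standing in for the random number table. The frame of 500 numbered households, the sample size of 25, and the function name `simple_random_sample` are illustrative assumptions, not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)      # seeded generator plays the role of the random number table
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: households numbered 1..500
frame = list(range(1, 501))
sample = simple_random_sample(frame, n=25, seed=42)

# Probability of selection = sample size / population size
p_select = len(sample) / len(frame)
print(sorted(sample))
print(p_select)   # 0.05
```

Fixing the seed only makes the draw reproducible; any seed gives every element the same chance of selection.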
Simple Random Sampling Method (SRS)
➢ Advantages
❖Easy to implement with random dialing
➢ Disadvantages
❖Requires list of population elements
❖Time consuming
❖Uses larger sample sizes
❖Produces larger errors
❖High cost
Systematic Sampling
➢Here, an element of the population is selected at the beginning
with a random start and then every kth element is selected until
the required sample size is reached.
➢k is the skip interval
➢Skip interval is the interval between successive sample elements
drawn from a sample frame in systematic sampling
➢It is determined by dividing the population size by the sample
size (k = N/n).
How to Draw a Systematic Sample?
➢To draw a systematic sample, the steps are as follows:
❖Identify, list, and number the elements in the population
❖Identify the skip interval
❖Identify the random start
❖Draw a sample by choosing every kth entry.
➢To protect against subtle biases, the researcher can
❖Randomize the population before sampling
❖Change the random start several times in the process, and
❖Replicate a selection of different samples.
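A minimal sketch of the drawing procedure, assuming the frame is an ordered Python list and the skip interval is rounded down when the population size is not an exact multiple of the sample size. The function name and the frame of 100 elements are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth element of the frame after a random start."""
    rng = random.Random(seed)
    k = len(frame) // n          # skip interval = population size / sample size
    start = rng.randrange(k)     # random start among the first k elements
    return frame[start::k][:n]   # every kth entry, trimmed to n elements

# Hypothetical frame: elements numbered 1..100
frame = list(range(1, 101))
sample = systematic_sample(frame, n=10, seed=1)
print(sample)
```

Shuffling `frame` before the call is one way to apply the "randomize the population before sampling" safeguard.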
Systematic Sampling Method
➢ Advantages
❖Simple to design
❖Easier than simple random
❖Easy to determine sampling distribution of mean or proportion
➢ Disadvantages
❖Periodicity within population may skew sample and results
❖Trends in list may bias results
❖Moderate cost
Stratified Sampling
➢ Here, the population is divided into subpopulations or strata and
simple random sampling is used within each stratum
➢ The cost is high.
➢ Stratified sampling may be proportionate or disproportionate
➢ In proportionate stratified sampling, the sample drawn from each
stratum is proportionate to the stratum’s share of the population
➢ Any stratification that departs from the proportionate relationship
is disproportionate.
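Proportionate allocation can be illustrated as follows. This is a sketch under stated assumptions: the strata (UG, PG, PhD) and their sizes are made up, and per-stratum sample sizes are simply rounded, so in general they may not add up exactly to n.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized by its population share."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / total)    # stratum's share of the total sample
        sample[name] = rng.sample(members, n_h)  # SRS within the stratum
    return sample

# Hypothetical population of 1000 students in three strata
strata = {
    "UG": [f"UG{i}" for i in range(600)],
    "PG": [f"PG{i}" for i in range(300)],
    "PhD": [f"PhD{i}" for i in range(100)],
}
sample = proportionate_stratified_sample(strata, n=50, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})
# {'UG': 30, 'PG': 15, 'PhD': 5}
```

Replacing `n_h` with sizes chosen on any other basis would make the design disproportionate.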
Stratified Sampling Method
➢ Advantages
❖Control of sample size in strata
❖Increased statistical efficiency
❖Provides data to represent and analyze subgroups
❖Enables use of different methods in strata
➢ Disadvantages
❖Increased error will result if subgroups are selected at
different rates
❖Especially expensive if strata on population must be created
❖High cost
Cluster Sampling
➢In drawing a sample with cluster sampling, the population is
divided into internally heterogeneous subgroups (clusters)
➢Some of these subgroups are randomly selected for further study
➢Two conditions foster the use of cluster sampling:
❖The need for more economic efficiency than can be provided by simple
random sampling, and
❖The frequent unavailability of a practical sampling frame for individual
elements.
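A small sketch of one-stage cluster sampling, assuming every element of a selected cluster is studied. The village names and household numbers are invented for illustration. Note that only a list of clusters is needed, not a frame of individual elements.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random choice of subgroups
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: households grouped by village
clusters = {
    "Village A": [101, 102, 103],
    "Village B": [201, 202],
    "Village C": [301, 302, 303, 304],
    "Village D": [401, 402, 403],
}
selected = cluster_sample(clusters, n_clusters=2, seed=5)
print(selected)
```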
Stratified Vs Cluster Sampling
➢ Stratified
❖Population divided into few subgroups
❖Homogeneity within subgroups
❖Heterogeneity between subgroups
❖Choice of elements from within each subgroup
➢ Cluster
❖Population divided into many subgroups
❖Heterogeneity within subgroups
❖Homogeneity between subgroups
❖Random choice of subgroups
Area Sampling
➢ Cluster sampling confined to a particular area is Area Sampling
➢ The area could be a population with well-defined political or
geographic boundaries
➢ As geographical areas are often chosen as sampling units, the
method of sampling is called Area Sampling
➢ Ex: Districts, Taluks, Blocks, Villages, etc.
➢ It is a low-cost and frequently used method.
Cluster Sampling Method
➢ Advantages
❖Provides an unbiased estimate of population parameters if
properly done
❖Economically more efficient than simple random
❖Lowest cost per sample
❖Easy to do without list
➢ Disadvantages
❖Often lower statistical efficiency due to subgroups being
homogeneous rather than heterogeneous
❖Moderate cost
Double Sampling
➢ Collect preliminary data from a sample and choose a sub-sample of
that sample for more detailed investigation
➢ In drawing a sample with double sampling, data are collected using
a previously defined technique
➢ Based on the information found, a subsample is selected for further
study
➢ Example:
❖ To estimate the demand for a Club Resort, you may first collect basic
information by telephonic survey
❖ You might then stratify the respondents by degree of interest and subsample
among them for intensive interviewing
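The Club Resort example can be sketched as a two-phase procedure. Everything concrete here is an assumption for illustration: the population of 300 respondents, the `interest` field (standing in for what the telephone survey would reveal), and the phase sizes.

```python
import random

def double_sample(population, n_phase1, n_per_stratum, seed=None):
    """Phase 1: broad preliminary sample. Phase 2: subsample within interest strata."""
    rng = random.Random(seed)
    phase1 = rng.sample(population, n_phase1)   # preliminary (telephone) survey
    strata = {}
    for respondent in phase1:                   # stratify by degree of interest
        strata.setdefault(respondent["interest"], []).append(respondent)
    phase2 = {}
    for level, members in strata.items():       # subsample for intensive interviewing
        phase2[level] = rng.sample(members, min(n_per_stratum, len(members)))
    return phase1, phase2

# Hypothetical population; "interest" is only observed for phase-1 respondents
population = [{"id": i, "interest": ("high", "medium", "low")[i % 3]} for i in range(300)]
phase1, phase2 = double_sample(population, n_phase1=60, n_per_stratum=5, seed=7)
print({level: [r["id"] for r in members] for level, members in phase2.items()})
```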
Double Sampling … …
➢ In multistage sampling, the sampling units are of different types
at different stages as in the example above (states, districts,
villages..)
➢ In multiphase sampling, the different phases of observation relate
to sample units of same type
➢ Multiphase sampling is also known as sequential sampling or
double sampling
Double Sampling Method
➢ Advantages
❖May reduce costs if first stage results in enough data to
stratify or cluster the population
➢ Disadvantages
❖Increased costs if indiscriminately used
Nonprobability Samples – When Used?
➢ Issues:
❖No need to generalize
❖Limited objectives
❖Feasibility
❖Time
❖Cost
Convenience Sampling
➢ Unrestricted, least reliable design
➢ Normally the cheapest and easiest
➢ Researcher is free to choose whomever they find, hence the
name ‘Convenience’
➢ Examples:
❖TV reporting with person-on-the-street
❖Intercept Interviews
❖Use of employees to evaluate a new snack food
➢ Used to gain ideas about a subject of interest
➢ Used in early stages of exploratory research
Purposive Sampling
➢ Two major types: Judgment and Quota sampling
➢ In judgment sampling the experience of an expert is used to
identify a representative sample
➢ Ex 1: Shoppers at a mall may be taken to represent the residents
of a city, or some cities may be taken to represent the
country
➢ Ex 2: Using employees as a ‘biased’ group for screening a new
product (more favorably disposed)
Purposive Sampling … …
➢ In quota sampling the sample is selected on the basis of certain
demographic characteristics such as age, gender, occupation, education,
etc. to increase representativeness
➢ The sample includes a minimum number from each specified subgroup
➢ Researchers specify one or more control dimensions
➢ For example, if we believe that responses to a question should vary
depending on gender, we should seek proportional representation of
men and women (or say, UG and PG students…)
➢ Where predictive validity has been checked, quota sampling has been
generally satisfactory
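Quota filling can be sketched as follows, assuming respondents are approached in whatever order is convenient and accepted until each subgroup's quota is met. The respondent list, the `level` control dimension, and the quotas are hypothetical.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until every subgroup quota is filled."""
    counts = {group: 0 for group in quotas}
    selected = []
    for person in respondents:
        group = person[key]
        if group in counts and counts[group] < quotas[group]:
            selected.append(person)
            counts[group] += 1
        if counts == quotas:        # all quotas met, stop collecting
            break
    return selected

# Hypothetical respondents with one control dimension (UG vs PG)
respondents = [{"name": f"R{i}", "level": "UG" if i % 3 else "PG"} for i in range(1, 21)]
quotas = {"UG": 4, "PG": 2}
selected = quota_sample(respondents, quotas, key="level")
print([(p["name"], p["level"]) for p in selected])
```

Selection within each subgroup is not random here, which is what keeps quota sampling a non-probability design.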
Snowball Sampling
➢ Snowball sampling is generally used when it is difficult to identify
the members of the desired population
➢ Ex: Deep-sea divers, families with triplets, doctors specializing in
a particular field, etc.
➢ Under this design each respondent, after being interviewed, is
asked to refer others who possess similar characteristics and who,
in turn, identify others
➢ The snowball gathers subjects as it rolls along
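The referral process can be sketched as a breadth-first walk over a referral network. The `referrals` mapping, the respondent labels, and the size cap are illustrative assumptions; in practice referrals only become known as interviews proceed.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Interview seed respondents, then those they refer, and so on."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)                    # interview this respondent
        for other in referrals.get(person, []):   # ask for referrals
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return sampled

# Hypothetical referral network among members of a hard-to-find population
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
print(snowball_sample(referrals, seeds=["A"], max_size=5))
# ['A', 'B', 'C', 'D', 'E']
```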
DETERMINING SAMPLE SIZE
The Central Limit Theorem
➢ When sampling from a population with mean µ and finite standard
deviation σ, the sampling distribution of the sample mean will
tend to a normal distribution with mean µ and standard deviation
σ/√n as the sample size becomes large (n > 30).
[Figure: sampling distribution of the sample mean for n = 5, n = 20, and large n]
Sample Size for Estimating Population Mean
The formula for determining sample size is given
as:
$$n = \frac{Z^2 \sigma^2}{e^2}$$
Where
n = Sample size
σ = Population standard deviation
e = Margin of error (X̄ − µ) or the value given in the problem
Z = The Z value for the given confidence level
Illustration 1
➢ An economist is interested in estimating the average monthly
household expenditure on food items by the households of a
town
➢ Based on past data, the population standard deviation of
monthly expenditure on food items is estimated to be Rs 30
➢ With allowable error set at Rs 7, estimate the sample size
required at a 90% confidence.
Ans: 50 (approx.)
Illustration 2
➢ You are given a population with a standard deviation of 8.6.
Determine the sample size needed to estimate the mean of the
population within ± 0.5 with a 99 % confidence.
Ans: 1962 (approx.)
Illustration 3
➢ It is desired to estimate the mean life-time of a vacuum cleaner
➢ Given that the population standard deviation is 320 days, how
large a sample is needed to be able to assert with a confidence
level of 96 % that the mean of the sample will differ from the
population mean by less than 45 days?
Ans: 214 (approx.)
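Illustrations 1 to 3 can be checked with a short script, assuming the standard formula n = (Zσ/e)² rounded up to the next whole unit. The function name is hypothetical. Z is computed exactly from the standard normal distribution rather than read from a table, so Illustration 2 comes out as 1963 instead of the 1962 obtained with the table value Z = 2.575.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence):
    """Sample size needed to estimate a mean within +/- e at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided Z value
    return math.ceil((z * sigma / e) ** 2)               # round up to a whole unit

print(sample_size_for_mean(sigma=30, e=7, confidence=0.90))     # Illustration 1: 50
print(sample_size_for_mean(sigma=8.6, e=0.5, confidence=0.99))  # Illustration 2: 1963
print(sample_size_for_mean(sigma=320, e=45, confidence=0.96))   # Illustration 3: 214
```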
Sample Size for Estimating Population Proportion
1. When population proportion p is known:
$$n = \frac{Z^2 \, p(1-p)}{e^2}$$
2. When population proportion p is not known (p = 0.5 is assumed, so p(1 − p) = 0.25):
$$n = \frac{Z^2 (0.25)}{e^2}$$
Where
n = Sample size
p = Population proportion
e = Margin of error or the value given in the problem
Z = The Z value for the given confidence level
Illustration 4
➢ A market researcher for a consumer electronics company would
like to study the television viewing habits of the residents of a
particular, small city
➢ What sample size is needed if he wishes to be 95% confident of
being within ±0.035 of the true proportion who watch the evening
news on at least three weeknights if no previous estimate is
available?
$$n = \frac{Z^2 (0.25)}{e^2}$$
Ans: 784 (approx.)
Illustration 5
➢ A manager of a department store would like to
study women’s spending per year on cosmetics
➢ He is interested in knowing the population
proportion of women who purchase their
cosmetics primarily from his store
➢ If he wants to have a 90% confidence of
estimating the true proportion to be within ±0.045,
what sample size is needed?
$$n = \frac{Z^2 (0.25)}{e^2}$$
Ans: 335 (approx.)
Illustration 6
➢ A consumer electronics company wants to
determine the job satisfaction levels of its
employees
➢ For this, they ask a simple question, “Are you
satisfied with your job?”
➢ It was estimated that no more than 30% of the
employees would answer yes
➢ What should be the sample size for this company
to estimate the population proportion so as to ensure
95% confidence in the result, and to be within 0.04 of
the true population proportion?
Ans: 505 (approx.)
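Illustrations 4 to 6 can be reproduced with a sketch that assumes the standard formula n = Z²p(1 − p)/e², with p = 0.5 (so p(1 − p) = 0.25) when no estimate of p is available. The function name is hypothetical and the result is rounded up to the next whole unit.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(e, confidence, p=0.5):
    """Sample size needed to estimate a proportion within +/- e.

    p defaults to 0.5, the most conservative value, for use when
    no previous estimate of the proportion is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided Z value
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

print(sample_size_for_proportion(e=0.035, confidence=0.95))        # Illustration 4: 784
print(sample_size_for_proportion(e=0.045, confidence=0.90))        # Illustration 5: 335
print(sample_size_for_proportion(e=0.04, confidence=0.95, p=0.3))  # Illustration 6: 505
```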
Exercise 1
➢ Given a population with a SD of 8.6, what size sample is needed
to estimate the mean of the population within ±0.5 with 99
percent confidence?
Determining An Interval
➢ Find a symmetrically distributed interval around
µ that will include 95% of the sample means
when µ = 368, σ = 15, and n = 25.
❖Since the interval contains 95% of the sample
means, 5% of the sample means will be outside the
interval
❖Since the interval is symmetric, 2.5% will be above
the upper limit and 2.5% will be below the lower limit.
❖From the standardized normal table, the Z score with
2.5% (0.0250) below it is -1.96 and the Z score with
2.5% (0.0250) above it is 1.96.
Determining An Interval
➢ Calculating the lower limit of the interval
$$\bar{X}_L = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (-1.96)\frac{15}{\sqrt{25}} = 362.12$$
➢ Calculating the upper limit of the interval
$$\bar{X}_U = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (1.96)\frac{15}{\sqrt{25}} = 373.88$$
➢ 95% of all sample means of sample size 25 are
between 362.12 and 373.88
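The same limits can be computed directly, as a check on the arithmetic. This sketch assumes the sample mean is normally distributed with standard error σ/√n. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_mean_interval(mu, sigma, n, coverage):
    """Symmetric interval around mu containing the given share of sample means."""
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)   # about 1.96 for 95%
    se = sigma / math.sqrt(n)                          # standard error of the mean
    return mu - z * se, mu + z * se

lower, upper = sample_mean_interval(mu=368, sigma=15, n=25, coverage=0.95)
print(round(lower, 2), round(upper, 2))   # 362.12 373.88
```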
Central Limit Theorem
➢ The mean of the sampling distribution of the mean will equal the
population mean
❖regardless of the sample size
❖even if the population is not normal
➢ As the sample size increases, the sampling distribution of the
mean will approach normality
❖Regardless of the shape of the population distribution
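A small simulation can illustrate both points. It assumes an exponential population with mean 1 and standard deviation 1 (strongly skewed, so clearly not normal). The sample sizes, number of draws, and seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

def sample_means(n, draws, rng):
    """Means of `draws` independent samples of size n from an exponential population."""
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(draws)]

rng = random.Random(0)
for n in (5, 20, 30):
    means = sample_means(n, draws=5000, rng=rng)
    # The mean of the sample means stays near the population mean (1.0),
    # while their spread shrinks towards sigma / sqrt(n)
    print(n, round(mean(means), 3), round(stdev(means), 3), round(1 / n ** 0.5, 3))
```

Plotting a histogram of `means` for each n would show the shape moving from skewed towards the bell curve as n grows.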