Chapter 5: Sampling and Data Collection
Dr. Mohammed Shamim Uddin Khan
Professor and Ex-Chairman
Department of Finance
University of Chittagong
Terminology
Sample: Subset of a larger population
Sampling: The process of obtaining information from a
subset (sample) of a larger group (population)
Population: Any Complete Group
– People
– Sales Territories
– Stores
Census: Investigation of all individual elements that make
up a population
Population Vs. Sample
Population of Interest
– Population: described by a parameter
– Sample (drawn from the population): described by a statistic
We measure the sample using statistics in order to draw
inferences about the population and its parameters.
Steps in Sampling Process
1. Define the population
2. Identify the sampling frame
3. Select a sampling design or procedure
4. Determine the sample size
5. Draw the sample
Sampling Design Process
1. Define Population
2. Determine Sampling Frame
3. Determine Sampling Procedure
– Probability Sampling (type of procedure): Simple Random Sampling, Stratified Sampling, Cluster Sampling
– Non-Probability Sampling (type of procedure): Convenience, Judgmental, Quota
4. Determine Appropriate Sample Size
5. Execute Sampling Design
Sampling Frame
A list of elements from which the sample may be
drawn
Working Population
Mailing Lists - Data Base Marketers
Sampling Frame Error
Random Sampling Error
The difference between the sample results and
the result of a census conducted using identical
procedures
Statistical fluctuation due to chance variations
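A minimal sketch of random sampling error, assuming a made-up population of the integers 1 to 1000 and using the sample mean as the statistic. The function name `sampling_error` and the population are illustrative assumptions, not part of the original slides. Each sample's mean differs from the population mean by chance alone, while a "sample" that covers the whole population (a census) has no such error.

```python
import random
import statistics

def sampling_error(population, n, seed=None):
    """Return (sample mean - population mean) for one simple random sample of size n."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # draw without replacement
    return statistics.mean(sample) - statistics.mean(population)

# Hypothetical population, for illustration only.
population = list(range(1, 1001))

# Repeat the sampling 200 times: the errors scatter around zero by chance.
errors = [sampling_error(population, 50, seed=s) for s in range(200)]

# A census (n equal to the population size) leaves no random sampling error.
census_error = sampling_error(population, len(population), seed=0)

print(min(errors), max(errors), census_error)
```

The spread of `errors` shrinks as the sample size grows, which is the statistical fluctuation the slide refers to.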
Systematic Errors
Non-sampling errors
Unrepresentative sample results
Not due to chance
Due to study design or imperfections in
execution
Errors Associated with Sampling
Sampling Frame Error
Random Sampling Error
Non-response Error
Two Major Categories of Sampling
Probability Sampling
Non-probability Sampling
Non-probability Sampling
Convenience Sampling (Chunk Sampling)
Judgment Sampling (Purposive Sampling)
Quota Sampling
Snowball Sampling
Probability Sampling
Simple Random Sample
Systematic Sample
Stratified Sample
Cluster Sample
Multistage Area Sample
Convenience Sampling
Also called haphazard or accidental sampling
The sampling procedure of obtaining the people
or units that are most conveniently available
Judgment Sampling
Also called purposive sampling
An experienced individual selects the sample
based on his or her judgment about some
appropriate characteristics required of the
sample member
Quota Sampling
The population is divided into cells on the basis of
relevant control characteristics.
A quota of sample units is established for each cell.
A convenience sample is drawn for each cell until
the quota is met.
(Similar to stratified sampling, but the units within each cell are chosen by
convenience rather than at random, so the two should not be confused.)
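The quota procedure above can be sketched as follows. This is only an illustration under stated assumptions: respondents are taken in the order they happen to arrive (the convenience element), and the names `quota_sample`, `cell_of` and the example data are hypothetical.

```python
def quota_sample(arrivals, cell_of, quotas):
    """Accept respondents in arrival order until every cell's quota is met."""
    counts = {cell: 0 for cell in quotas}
    selected = []
    for person in arrivals:
        cell = cell_of(person)
        # Accept only if this person's cell still has room.
        if cell in counts and counts[cell] < quotas[cell]:
            selected.append(person)
            counts[cell] += 1
        if counts == quotas:  # all quotas filled, stop collecting
            break
    return selected

# Hypothetical arrivals: (name, gender); gender is the control characteristic.
arrivals = [("A", "F"), ("B", "M"), ("C", "M"), ("D", "M"), ("E", "F"), ("F", "F")]
sample = quota_sample(arrivals, cell_of=lambda p: p[1], quotas={"F": 2, "M": 2})
print(sample)
```

Because no random selection is involved inside a cell, the result is a non-probability sample even though the cells look like strata.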
Snowball Sampling
A variety of procedures
Initial respondents are selected by probability
methods
Additional respondents are obtained from
information provided by the initial respondents
Simple Random Sampling
A sampling procedure that ensures that each
element in the population will have an equal
chance of being included in the sample
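A minimal sketch of simple random sampling, assuming the sampling frame is available as a Python list. The frame contents and the function name `simple_random_sample` are made up for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame of 100 units.
frame = [f"unit_{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Fixing `seed` only makes the draw reproducible; leaving it as `None` gives a fresh random draw each time.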
Systematic Sampling
A simple process
Every nth name from the list will be drawn
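A minimal sketch of systematic sampling, assuming the list is the sampling frame and that a random starting point is chosen within the first interval (a common convention, not stated on the slide). To avoid a clash with the sample size n, the sampling interval is called `k` here; `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval, then take every k-th element."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: names numbered 1..100; draw 10, so every 10th name is taken.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```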
Stratified Sampling
Probability sample
Subsamples are drawn within different strata
The elements within each stratum are more or less
alike on some characteristic
Do not confuse with quota sample
Cluster Sampling
The purpose of cluster sampling is to sample
economically while retaining the characteristics
of a probability sample.
The primary sampling unit is no longer the
individual element in the population
The primary sampling unit is a larger cluster of
elements located in proximity to one another
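A minimal sketch of one-stage cluster sampling, assuming that whole clusters are selected at random and every element of a selected cluster is included. The block and household labels, and the function name `cluster_sample`, are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; the cluster is the primary sampling unit."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every element of a chosen cluster enters the sample.
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: city blocks and the households located in them.
clusters = {
    "Block 1": ["h1", "h2", "h3"],
    "Block 2": ["h4", "h5"],
    "Block 3": ["h6", "h7", "h8"],
    "Block 4": ["h9"],
}
sample = cluster_sample(clusters, 2, seed=7)
print(sample)
```

Only the selected blocks need to be visited, which is where the economy of cluster sampling comes from.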
Examples of Clusters
Population Element       Possible Clusters in the United States
U.S. adult population    States, Counties, Metropolitan Statistical Areas, Census tracts, Blocks, Households
College seniors          Colleges
Manufacturing firms      Counties, Metropolitan Statistical Areas, Localities, Plants
Airline travelers        Airports, Planes
Sports fans              Football stadiums, Basketball arenas, Baseball parks
What is the
Appropriate Sample Design?
Degree of Accuracy
Resources
Time
Advanced Knowledge of the Population
National versus Local
Need for Statistical Analysis
Determination of Sample Size
In sampling analysis, the most delicate question is: what
should be the size of the sample (n)? If the sample size is
too small, it may not serve to achieve the objectives. If it
is too large, we may incur huge costs and waste resources. As
a general rule, the sample must be of an optimum size, i.e. it
should be neither excessively large nor too small. The two
alternative approaches for determining the size of the
sample are:
1. Estimating the sample size based on a proportion
2. Estimating the sample size based on a mean
Estimating the Sample Size Based
on a Proportion
Example 1: A nutrition survey is to be conducted in a Rohingya
camp. Assume that 40% of the children suffer from malnutrition.
How large a sample would be needed in order to be 95%
certain that the estimated prevalence does not differ from the
true prevalence by more than 0.05?
Solution: Assume that the population is large. Here z = 1.96,
maximum allowable error, e = 0.05, and proportion of
children suffering from malnutrition, p = 0.40. Thus we
employ
$$ n_0 = \frac{z^2 pq}{e^2} = \frac{(1.96)^2 (0.4)(0.6)}{(0.05)^2} \approx 369 $$
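The arithmetic of Example 1 can be checked with a short sketch. The function name `sample_size_proportion` is an illustrative assumption; the formula is the large-population one used above, with q = 1 - p.

```python
import math

def sample_size_proportion(p, e, z=1.96):
    """Sample size n0 = z^2 * p * q / e^2 for a large population."""
    q = 1 - p
    return (z ** 2) * p * q / (e ** 2)

n0 = sample_size_proportion(p=0.40, e=0.05)
n0_rounded = math.ceil(n0)  # round up to a whole number of children
print(n0, n0_rounded)
```

The unrounded value is about 368.79, which gives the 369 reported in the solution.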
Estimating the Sample Size
Based on a Mean
Example 2: Suppose a researcher wishes to investigate the
average (mean) income level of employees in a city within a
margin of error of ±0.25 and desires a 95% confidence level
in estimating the true mean. On the basis of prior studies the
researcher believes that the standard deviation can be
estimated as 1.5. What would be the required sample size?
Solution: Here z = 1.96, maximum allowable error, e = 0.25,
and standard deviation, σ = 1.5.
Thus we employ
$$ n_0 = \frac{z^2 \sigma^2}{e^2} = \frac{(1.96)^2 (1.5)^2}{(0.25)^2} \approx 138 $$
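A short sketch to verify the arithmetic of Example 2. The function name `sample_size_mean` is an illustrative assumption. The unrounded value is about 138.3; the solution reports it rounded to the nearest whole number (138), whereas rounding up, as is often done for sample sizes, would give 139.

```python
def sample_size_mean(sigma, e, z=1.96):
    """Sample size n0 = z^2 * sigma^2 / e^2 for estimating a mean."""
    return (z ** 2) * (sigma ** 2) / (e ** 2)

n0 = sample_size_mean(sigma=1.5, e=0.25)
n0_nearest = round(n0)  # nearest whole number, as reported in the solution
print(n0, n0_nearest)
```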
Stratified Sampling
Example 3: A population with 300 university students is
divided according to the faculty they belong to: Science,
Arts, Social Sciences and Business studies. The numbers
of students in these faculties were respectively 50, 120,
70, and 60. A stratified sample of 30 is to be selected.
Use proportional allocation technique to allocate sample
size to different strata.
Solution:
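Proportional Allocation Method:
$$ n_i = n \cdot \frac{N_i}{N} $$
where n = sample size = 30, N_i = size of stratum i (N_1 = 50, N_2 = 120,
N_3 = 70, N_4 = 60), and N = size of population = 300.
Example 3 Cont.
$$ n_1 = n \cdot \frac{N_1}{N} = 30 \times \frac{50}{300} = 5 $$
$$ n_2 = n \cdot \frac{N_2}{N} = 30 \times \frac{120}{300} = 12 $$
$$ n_3 = n \cdot \frac{N_3}{N} = 30 \times \frac{70}{300} = 7 $$
$$ n_4 = n \cdot \frac{N_4}{N} = 30 \times \frac{60}{300} = 6 $$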
Thus using stratified random sampling, we will select 5
students from stratum 1 (Science), 12 students from stratum
2 (Arts), 7 students from stratum 3 (Social Sciences) and 6
students from stratum 4 (Business Studies) to make up a
total of n = 30. Note that all of the four strata have a
uniform sampling fraction 1/10 = 10%.
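The proportional allocation of Example 3 can be reproduced with a short sketch. The function name `proportional_allocation` is an illustrative assumption. In this example every share is a whole number; when it is not, some rounding rule is needed, which this sketch does not attempt.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n to strata in proportion to their sizes."""
    N = sum(strata_sizes.values())  # population size
    return {name: n * Ni / N for name, Ni in strata_sizes.items()}

# Faculty sizes from Example 3.
strata = {"Science": 50, "Arts": 120, "Social Sciences": 70, "Business Studies": 60}
allocation = proportional_allocation(strata, n=30)
print(allocation)

# Sampling fraction in each stratum: n_i / N_i
fractions = {name: allocation[name] / strata[name] for name in strata}
print(fractions)
```

Each stratum ends up with the same sampling fraction of 0.1, matching the 10% noted above.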
Data and Its Classification
Data: The raw materials of statistics consist of numbers or observations,
usually obtained by some process of counting or measurement. Collectively,
they are referred to as data. Thus, ‘A set of observations is called data’.
Classification of Data: Data can be classified in a number of ways.
1. Data according to origin: (a) Population data (b) Sample data.
2. Data according to variable: (a) Qualitative (categorical) data (b)
Quantitative data.
3. Data according to time: (a) Time series data (b) Cross-section data (c) Panel
data.
4. Data according to scale of measurement: (a) Nominal data (b) Ordinal data
(c) Interval data (d) Ratio data.
5. Data according to subject (discipline): (a) Economic data (b) Agriculture
data (c) Medical data (d) Business data (e) Meteorological data (f) Import data
(g) Export data, etc.
N.B. Again quantitative data can be classified as (i) Discrete data (ii)
Continuous data
Types of Data
1. Categorical: (e.g., Sex, Marital Status, income category)
2. Continuous: (e.g., Age, income, weight, height, time to
achieve an outcome)
3. Discrete: (e.g.,Number of Children in a family)
4. Binary or Dichotomous: (e.g., responses to Yes or No
type questions)
Scale of Data
1. Nominal: These data do not represent an amount or quantity
(e.g., marital status, religion, race, sex)
2. Ordinal: These data represent an ordered series of
relationship (e.g., level of education)
3. Interval: These data are measured on an interval scale having
equal units but an arbitrary zero point (e.g., temperature in
Fahrenheit)
4. Ratio: Variable such as weight for which we can compare
meaningfully one weight versus another (say, 100 Kg is
twice 50 Kg)
Methods of Data Collection
The task of data collection begins after a research problem
has been defined and research design/plan chalked out.
While deciding about the method of data collection to be
used for the study, the researcher should keep in mind two
types of data viz., primary and secondary.
A researcher as per requirement of study may decide on
use of primary data or secondary data or both.
Both primary and secondary data have their own pros and
cons.
Primary and Secondary Data
Primary Data: The primary data are those which are
collected afresh and for the first time, and thus happen to
be original in character. Primary Data are collected by
the researcher.
Secondary Data: The secondary data are those which have
already been collected by some other agency and which
have already been processed. Secondary data are collected
by someone else and have already passed through
the statistical process.
Collecting Secondary Data
Sources of secondary data are existing literature,
Reports of professional agencies, Departments,
Archives, Internet, etc.
While collecting secondary data one has to
follow legal procedures required and maintain
the academic ethics.
Scrutiny of Secondary Data
1. Suitability: The compiler should satisfy himself that the data contained
in the publication will be suitable for his study. In particular, the
conformity of the definitions, units of measurement and time frame
should be checked.
2. Reliability: The reliability of the secondary data can be ascertained
from the collecting agency, mode of collection and the time period
of collection. For instance, secondary data collected by a voluntary
agency with unskilled investigators are unlikely to be reliable.
3. Adequacy: The source of data may be suitable and reliable but the data
may not be adequate for the proposed enquiry. The original data may
cover a bigger or narrower geographical region or the data may not
cover suitable periods.
4. Accuracy: The user must be satisfied about the accuracy of the
secondary data. The process of collecting raw data, the reproduction
of processed data in the publication, the degree of accuracy desired
and achieved should also be satisfactory and acceptable to the
researcher.
Methods of Collecting Primary Data
There are several methods of collecting primary data,
particularly in surveys and descriptive research.
Important ones are-
Observation
Interview
Questionnaire
Schedule
Other Methods
Primary Data Collection
Techniques
Quantitative Data Qualitative Data
Collection Techniques Collection Techniques
1. Interviewing Method 1. Unstructured interview
2. Observation Method 2. Observation Method
3. Mail Questionnaire 3. Focus Group Discussion
4. Experimental Method 4. Document Study
5. Data Base 5. Content Analysis
Other Data Collection
Techniques
1. Delphi Technique
2. Panel Study
3. Rapid Rural Appraisal
4. Participatory Rural Appraisal
5. Nominal Group Technique
6. Key Informant Interview
7. Community Risk Assessment
Observation
See what is happening
– traffic patterns
– land use patterns
– layout of city and rural areas
– quality of housing
– condition of roads
– conditions of buildings
– who goes to a health clinic
Observation is Helpful when:
Need direct information
Trying to understand ongoing behavior
There is physical evidence, products,
or outputs that can be observed
Need to provide alternative when
other data collection is infeasible or
inappropriate
Types of Observation
Participatory and Non Participatory
Candid and Covert
Structured, Semi-structured and
Unstructured.
Controlled and Uncontrolled
Advantages/Disadvantages of
Observation
Advantages:
Subjective bias eliminated
Researcher gets current information
Independent of Respondents
Disadvantages:
Expensive, Time consuming
Limited information
Unforeseen factors may influence observation
Interview
The interview method of collecting data
involves presentation of oral-verbal stimuli
and reply in terms of oral-verbal responses.
This method can be used through personal
interviews or telephone interviews.
Structured, Semi-Structured or Unstructured
Interview.
Interview Types
Personal Interviews: The interviewer asks questions,
generally in face-to-face contact with the other person
or persons. This may take the form of direct personal
investigation or indirect oral investigation.
Focused Interview is meant to focus attention on the
given experience of the respondent and its effects.
Clinical Interview is concerned with broad underlying
feelings or motivations or with the course of
individual’s life experience.
Non-directive Interview is that where the
interviewer’s function is simply to encourage the
respondent to talk about the given topic with a bare
minimum of direct questioning.
Skill of Interviewer
The main game in interviewing is to
facilitate an interviewee’s ability to
answer. This involves:
– easing respondents into the interview
– asking strategic questions
– prompting and probing appropriately
– keeping it moving
– winding it down when the time is right
Merits/Demerits of Interview
Merits:
More and in depth information obtained
Personal Information
Greater Flexibility
Adaptation as per the respondent
Demerits:
Bias of Interviewer
Expensive/Time Consuming
Need expertise
Questionnaire Method
A questionnaire is sent (usually by post) to persons
concerned with a request to answer the questions
and return the questionnaire.
A questionnaire consists of a number of questions
printed in a definite order.
The respondents have to answer the questions on
their own.
Steps in Questionnaire
Construction
Preparation
Constructing the first draft
Self-evaluation
External evaluation
Revision
Pre-test or Pilot study
Revision
Second pre-testing
Preparing final draft
Advantages of Questionnaire
Lower cost
Time saving
Accessibility to widespread respondents
No interviewer’s bias
Greater anonymity
Respondent’s convenience
Standard wordings
No Variation
Disadvantages of Questionnaire
Questionnaires can be used only for educated people.
Sometimes different respondents interpret questions
differently
Questionnaires do not provide an opportunity to collect
additional information
Researchers are not sure whether the person to whom the
questionnaire was mailed has himself answered the
questions.
Many questions remain unanswered
The respondent can consult other persons before filling
in the questionnaire.
Essentials of a Good Questionnaire
1. Number of questions should be kept to the minimum.
2. Questions should be simple, short, and unambiguous
3. Questions should be arranged from simple to difficult.
4. Questions of a sensitive/personal nature, technical terms and vague
expressions should be avoided.
5. Answers to questions should not require calculations.
6. Questions should be capable of an objective answer.
7. Questions should be arranged logically.
8. Proper words should be used in the questionnaire.
9. Questionnaire should look attractive.
10. Questionnaire should be pre-tested to find out its shortcomings if
any.
11. Cross-Check and footnotes should be considered in the
questionnaire.
12. Necessary instructions should be given to the informant.
Collection of Data Through Schedule
Schedules like questionnaires contain a set of
questions.
The researcher or appointed enumerators collect data
through schedules.
Enumerators go to the field, put questions to the
respondents and fill the schedules.
Enumerators need to be trained.
Questionnaire Vs. Schedule
Questionnaire                                  Schedule
Mailed; filled in by the respondent            Direct contact; filled in by the researcher or enumerator
Economical                                     Expensive
Non-response high                              Non-response low
Time consuming                                 Time bound
Literate, co-operative respondents             No such precondition
Success depends on quality of questionnaire    Success depends on quality of enumerator
Some Other Methods
Warranty Cards: Postcard-size cards are sent to customers
and feedback is collected by asking questions.
Distributor or Store Audits: These are performed by the
manufacturer/distributor through salesmen. The information
so obtained is used to estimate market size, market
share, seasonal sales pattern, etc.
Pantry Audits: The customer's pantry is observed
to learn people's purchase habits (which
product, what brand, etc.). Questions may be asked at the
time of audit.
Consumer Panels: A pantry audit carried out on a regular
basis is known as a ‘consumer panel’, where a set of
consumers agrees to maintain detailed daily records of
their consumption and to make them available to the
investigator on demand.
Projective Techniques: Developed by psychologists, these use
the projections of respondents to infer underlying
motives, urges, or intentions which the respondent
either resists revealing or is unable to
figure out himself.
Use of Mechanical Devices: An eye camera is used
to record the focus of a respondent's eyes on a
specific portion of a sketch, diagram or written
material. A psychogalvanometer is used for
measuring the extent of body excitement as a
result of the visual stimulus. A motion picture
camera is used to record the movement of consumers
at the time of purchase. An audiometer is used to learn
viewers' preferences for TV channels and programmes.
Depth Interviews: These interviews are designed
to discover underlying motives and desires and are often
used in motivational research. Indirect questions or
projective techniques are used to learn about the behaviour
of respondents.
Content Analysis: Analyzing the contents of
documentary materials such as books, magazines and
newspapers, and the contents of all other verbal materials,
which can be either spoken or printed.
Editing of Primary Data
Editing involves reviewing the data collected by investigators to ensure maximum
accuracy and unambiguity. It should be done as soon as possible after the data
have been collected. The different steps of editing are discussed below:
1. Checking legibility: Obviously, the data must be legible to be used. If a
response is not presented clearly, the concerned investigator should be asked
to rewrite it.
2. Checking Completeness: An omitted entry on a fully structured questionnaire
may mean that no attempt was made to collect data from the respondent or that
the investigator simply did not record the data. If the investigator did not
record the data, prompt editing and questioning of the investigator may
provide the missing item.
3. Checking Consistency: The editor should examine each questionnaire to
check inconsistency or inaccuracy if any, in the statement. The income and
expenditure figures may be unduly inconsistent. The age and the date of birth
may disagree. The concerned investigators should be asked to make the
necessary corrections.
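The completeness and consistency checks above can be sketched in code. This is an illustration under stated assumptions: each questionnaire is a dictionary, age is recorded in completed years, and the field names, the function name `edit_record` and the example record are all hypothetical.

```python
from datetime import date

def edit_record(record, required_fields, survey_date):
    """Return a list of problems found in one questionnaire record."""
    problems = []
    # Completeness: every required entry must be present and non-empty.
    for field in required_fields:
        if record.get(field) in (None, ""):
            problems.append(f"missing: {field}")
    # Consistency: stated age must agree with date of birth on the survey date.
    age, dob = record.get("age"), record.get("date_of_birth")
    if age is not None and dob is not None:
        had_birthday = (survey_date.month, survey_date.day) >= (dob.month, dob.day)
        expected = survey_date.year - dob.year - (0 if had_birthday else 1)
        if age != expected:
            problems.append(f"inconsistent: age {age} vs date of birth (expected {expected})")
    return problems

# Hypothetical record with an omitted income and a mismatched age.
record = {"name": "R1", "age": 30, "date_of_birth": date(1990, 5, 1), "income": ""}
problems = edit_record(record, ["name", "age", "income"], survey_date=date(2024, 1, 15))
print(problems)
```

Records that come back with a non-empty list would be referred to the concerned investigator for correction.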
Selection of Appropriate Method
of Data Collection
Nature, Scope and Object of enquiry
Availability of Fund
Availability of Time
Degree of Precision Required
Precautions in Data Collection
The data must be relevant to the research problem.
It should be collected through formal or standardized research
tools.
The data should be such that they can be subjected to statistical
treatment easily.
The data should have minimum measurement error.
The data must be tenable for the verification of the hypotheses.
The data should be collected through objective procedure.
The data should be accurate and precise.
The data should be reliable and valid
The data should be complete in itself and also comprehensive in
nature.