Research Methodology
UNIT III - Part A
Unit III Syllabus
Sampling Design: Fundamentals of Sampling; Probability and Non-probability Sampling; Sample Size Determination; Reliability and Validity.
Questionnaire Design: Techniques, Precautions, Design; Measurement and Scaling Techniques: Types of Data; Rating Scale and Ranking Scales.
Data Collection and Preparation: Primary and Secondary Sources of Data; Data Tabulation, Editing and Coding.
Sampling
Sampling is a fundamental concept in statistics and research methodology. It
involves selecting a subset of a population to study, rather than examining the
entire population.
Sampling is the art of taking representative elements from the population: by selecting some of the elements in a population, we can draw conclusions about the entire population.
Sampling is the process of selecting a sample from a population.
--- Kothari
Researchers select a sample due to various limitations that may not
allow researching the whole population.
--- Mugenda & Mugenda
In sampling, a section of the population that is selected represents the
entire population.
The Sampling Design Process
1 Define the Population
2 Determine the Sampling Frame
3 Select Sampling Technique(s)
4 Determine the Sample Size
5 Execute the Sampling Process
The sampling design process involves several key steps to ensure the research study is conducted effectively. First, the
population must be clearly defined. Next, the sampling frame, or the list of all the elements in the population, is
determined. The appropriate sampling technique(s) are then selected, such as simple random sampling or quota
sampling. The required sample size is also calculated to ensure statistical significance. Finally, the sampling process is
executed to collect the necessary data from the target population.
Classification of Sampling Techniques
Sampling is the selection of a group of observations (a sample) from a larger population in order to obtain information. There are two broad types of sampling methods:
Non-Probability Sampling
Here, the researcher (or his client) selects samples based on some subjective judgment rather than random selection. Therefore, some individuals might have a lower or zero probability of being selected. Researchers use this method in studies where it is impossible to draw a random probability sample due to time or cost considerations.
Probability Sampling
In these sampling techniques, everyone in the population has an equal chance of being included in the sample. These techniques are used to extract representative samples in order to make inferences about the population.
Classification of Sampling Techniques
Probability Sampling: Simple Random Sampling, Systematic Sampling, Stratified Sampling, Cluster Sampling.
Non-Probability Sampling: Convenience Sampling, Judgment Sampling, Quota Sampling, Snowball Sampling.
Convenience Sampling
Convenience sampling attempts to obtain a sample of
convenient elements. Often, respondents are selected because
they happen to be in the right place at the right time.
Some examples of convenience sampling include the use of students and members of social organizations; mall-intercept interviews without qualifying the respondents; department stores using charge account lists; and "people on the street" interviews.
Judgment Sampling
Judgment sampling, also referred to as judgmental sampling or authoritative sampling, is a non-probability sampling technique in which the researcher selects the units to be sampled based on his or her own existing knowledge or professional judgment.
Judgment sampling can also be referred to as purposive sampling. This is because judgment sampling is used in cases where the knowledge of an authority can produce a more representative sample, which can in turn yield more accurate results than if probability sampling techniques were used.
Judgment sampling is prone to researcher bias
Quota Sampling
Quota sampling may be viewed as two-stage restricted judgmental sampling.
– The first stage consists of developing control categories, or quotas, of population elements.
– In the second stage, sample elements are selected based on convenience or judgment.
Snowball Sampling
In snowball sampling, an initial group of respondents is
selected, usually at random.
– After being interviewed, these respondents are asked to
identify others who belong to the target population of
interest.
– Subsequent respondents are selected based on the
referrals.
Simple Random Sampling
• Each element in the population has a known and equal probability of selection.
• Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected.
• This implies that every element is selected independently of every other element.
• It is often used for small, finite populations.
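As an illustration, here is a minimal Python sketch of simple random sampling without replacement; the population and sample size are assumed for the example:

# A minimal sketch of simple random sampling (SRS) without replacement.
import random

population = list(range(1, 501))  # hypothetical finite population of 500 elements
n = 30                            # desired sample size

# random.sample draws n distinct elements such that every subset of size n
# is equally likely -- the defining property of SRS without replacement.
sample = random.sample(population, n)
print(sorted(sample))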
Systematic Random Sampling
• The sample is chosen by selecting a random starting point and
then picking every ith element in succession from the sampling
frame.
• The sampling interval, i, is determined by dividing the population size N
by the sample size n and rounding to the nearest integer.
• When the ordering of the elements is related to the characteristic
of interest, systematic sampling increases the representativeness
of the sample.
• If the ordering of the elements produces a cyclical pattern,
systematic sampling may decrease the representativeness of the
sample.
For example, there are 100,000 elements in the population and a sample
of 1,000 is desired. In this case the sampling interval, i, is 100. A
random number between 1 and 100 is selected. If, for example, this
number is 23, the sample consists of elements 23, 123, 223, 323, 423,
523, and so on.
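A minimal Python sketch of this procedure, using the numbers from the example above:

# A minimal sketch of systematic sampling for the example above:
# N = 100,000 elements, n = 1,000, so the interval i = N / n = 100.
import random

N, n = 100_000, 1_000
i = round(N / n)                       # sampling interval

start = random.randint(1, i)           # random starting point, e.g. 23
sample = list(range(start, N + 1, i))  # elements start, start+i, start+2i, ...
print(sample[:5], "...", len(sample), "elements selected")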
Stratified Sampling
• A two-step process in which the population is partitioned into
subpopulations, or strata.
• The strata should be mutually exclusive and collectively exhaustive in that
every population element should be assigned to one and only one stratum
and no population elements should be omitted.
• Next, elements are selected from each stratum by a random procedure,
usually SRS.
• A major objective of stratified sampling is to increase precision without
increasing cost.
• The elements within a stratum should be as homogeneous as possible, but
the elements in different strata should be as heterogeneous
as possible.
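A minimal Python sketch of proportionate stratified sampling; the strata and their sizes are invented for illustration:

# A minimal sketch of proportionate stratified sampling: an SRS is drawn
# from each stratum in proportion to its share of the population.
import random

strata = {
    "stratum_A": list(range(0, 600)),     # 600 elements
    "stratum_B": list(range(600, 900)),   # 300 elements
    "stratum_C": list(range(900, 1000)),  # 100 elements
}
n = 50  # total sample size
N = sum(len(units) for units in strata.values())

sample = []
for name, units in strata.items():
    n_h = round(n * len(units) / N)       # proportional allocation to stratum h
    sample.extend(random.sample(units, n_h))

print(len(sample), "sampled units")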
Clus ter Sampling
• The target population is first divided into mutually exclusive and
collectively exhaustive subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a probability
sampling technique such as SRS.
• For each selected cluster, either all the elements are included in the sample (one-stage) or a sample of elements is drawn probabilistically (two-stage).
• Elements within a cluster should be as heterogeneous as possible, but
clusters themselves should be as homogeneous as possible. Ideally, each
cluster should be a small-scale representation of the population.
• In probability proportionate to size sampling, the clusters are
sampled with probability proportional to size. In the second stage, the
probability of selecting a sampling unit in a selected cluster varies inversely
with the size of the cluster.
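A minimal Python sketch of two-stage cluster sampling; the cluster layout and stage sizes are invented for illustration:

# A minimal sketch of two-stage cluster sampling: stage 1 draws an SRS of
# clusters, stage 2 draws an SRS of elements within each selected cluster.
import random

# 20 hypothetical clusters of 50 elements each
clusters = {c: [f"cluster{c}_elem{e}" for e in range(50)] for c in range(20)}

selected = random.sample(list(clusters), 4)        # stage 1: pick 4 clusters
sample = []
for c in selected:
    sample.extend(random.sample(clusters[c], 10))  # stage 2: 10 elements each

print(len(sample), "elements drawn from clusters", selected)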
DO IT YOURSELF
• List advantages and disadvantages of each type
of sampling technique
• Differentiate between probability and non-probability sampling
Sample Size Determination
• The most important objective of a statistical analysis is to draw inferences
about the population using sample information. “How big a sample is
required?” , is one of the most frequently asked questions by the
investigators.
• The inference to be drawn is related to some parameters of the population
such as the mean, standard deviation or some other features like the
proportion of an attribute occurring in the population.
• If the inference about the population is to be drawn on the basis of the
sample, the sample must conform to certain criteria: the sample must be
representative of the whole population.
BUT HOW??
Basic Factors to be Considered
• Level of Precision - The ‘degree of precision’ is the margin of permissible error between the
estimated value and the population value. In other words, it is the measure of how close an estimate
is to the actual characteristic in the population. It depends on the amount of risk a researcher is
willing to accept while using the data to make decisions.
If the sampling error or margin of error is ±5%, and 70% of the units in the sample have some attribute, then it can be concluded that 65% to 75% of the units in the population have that attribute.
• The error which arises due to only a sample being used to estimate the population parameters is
termed as sampling error or sampling fluctuation.
Relationship between Sample Size and Sampling Error
• A sample with the smallest sampling error will always be considered a good representative of the population. Bigger samples have smaller sampling errors.
• On the other hand, smaller samples may be easier to manage and have less non-sampling error. Handling bigger samples is more expensive than handling smaller ones.
In a census survey, sampling error is zero.
Basic Factors to be Considered
• Confidence level desired: The confidence or risk level is ascertained through the well-established probability model called the normal distribution and an associated theorem called the Central Limit Theorem.
• The confidence level tells how confident one can be that the error toleration does not exceed what was planned for in the precision specification.
• Usually 95% and 99% are taken as the two standard degrees of confidence for specifying the interval within which one may ascertain the existence of the population parameter.
A 95% confidence level means that if an investigator takes 100 independent samples from the same population, then 95 out of the 100 samples will provide an estimate within the precision set by him.
Central Limit Theorem at a Glance
• In general, the normal curve results whenever there are a large number of
independent small factors influencing the final outcome. It is for this reason that
many practical distributions, be it the distribution of annual rainfall, the weight at
birth of babies, or the heights of individuals, are all more or less normal, if a
sufficiently large number of items is included in the population.
• It can be shown that even when the original population is not normal, if we draw
samples of n items from it and obtain the distribution of the sample means, we
notice that the distribution of the sample means become more and more normal as
the sample size increases. This fact is proved mathematically in the Central Limit
theorem.
The theorem says that if we take samples of size n from any arbitrary population (with any arbitrary distribution) and calculate the sample mean x̄, then the sampling distribution of x̄ will approach the normal distribution, with mean µ and standard error σ/√n, as the sample size n increases.
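A small Python simulation can illustrate this; the exponential population used here is an assumption, chosen because it is clearly non-normal:

# A minimal simulation of the Central Limit Theorem: draw repeated samples
# from a non-normal (exponential) population and watch the distribution of
# sample means tighten around the population mean as n grows.
import random
import statistics

mu = 2.0  # mean (and standard deviation) of the exponential population

for n in (5, 30, 200):
    means = [statistics.mean(random.expovariate(1 / mu) for _ in range(n))
             for _ in range(2000)]
    # By the CLT, the standard error should approach sigma / sqrt(n) = mu / sqrt(n).
    print(f"n={n:4d}  mean={statistics.mean(means):.3f}  "
          f"sd={statistics.stdev(means):.3f}  mu/sqrt(n)={mu / n ** 0.5:.3f}")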
Basic Factors to be Considered
• Degree of variability: The degree of variability in the attributes being measured
refers to the distribution of attributes in the population.
• The more heterogeneous a population is, the larger the sample size required to obtain a given level of precision.
• For less variable (more homogeneous) populations, smaller sample sizes work nicely.
• Note that a proportion of 50% indicates a greater level of variability than that of
20% or 80%. This is because 20% and 80% indicate that a large majority do not or
do, respectively, have the attribute of interest.
Because a proportion of 0.5 indicates the maximum variability in a population, it is often
used in determining a more conservative sample size.
Strategies for Determining Sample Size
• Cochran’s formula for calculating sample size when the population is infinite:
n₀ = (z² × p × q) / e²
• where,
– n₀ is the required sample size,
– z is the selected critical value of the desired confidence level,
– p is the estimated proportion of an attribute that is present in the population,
– q = 1 − p,
– and e is the desired level of precision.
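A one-function Python sketch of this formula (the function name is my own):

# A minimal sketch of Cochran's formula for an infinite population:
# n0 = (z^2 * p * q) / e^2.
def cochran_infinite(z: float, p: float, e: float) -> float:
    q = 1 - p
    return (z ** 2 * p * q) / e ** 2

# e.g. 95% confidence (z = 1.96), maximum variability (p = 0.5), ±5% precision:
print(cochran_infinite(1.96, 0.5, 0.05))  # ~384.16, usually rounded up to 385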
Standard Normal Table (z): for a 95% confidence level z = 1.96, and for a 99% confidence level z = 2.58.
Strategies for Determining Sample Size
• Cochran’s formula for calculating sample size when the population size is finite:
n = n₀ / (1 + (n₀ − 1) / N)
• where,
– n₀ is the sample size calculated for an infinite population (from the formula above),
– and N is the population size.
Strategies for Determining Sample Size
• Yamane’s formula for calculating sample size:
• According to him, for a 95% confidence level and p = 0.5, the size of the sample should be
n = N / (1 + N × e²)
• where,
– N is the population size,
– and e is the desired level of precision.
Questions to Solve
Example Scenario:
• Population size (N): 50,000
• Desired confidence level: 95%
• Estimated proportion (p): 0.4 (i.e., 40% of the population is estimated to
have the characteristic of interest)
• Margin of error (e): 5% (0.05)
Use Cochran’s and Yamane’s methods to determine sample size.
Answer (Cochran, infinite population): n₀ = (1.96² × 0.4 × 0.6) / 0.05² = 0.9220 / 0.0025 ≈ 368.79, i.e. a sample of 369.
Answer (Cochran, finite population): n = 368.79 / (1 + 367.79 / 50,000) ≈ 366.10, i.e. a sample of 367.
Answer (Yamane): n = 50,000 / (1 + 50,000 × 0.05²) = 50,000 / 126 ≈ 396.83, i.e. a sample of 397.
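The following Python sketch reproduces the three answers (variable names are my own):

# Reproducing the worked answers above with Cochran's and Yamane's formulas.
import math

N, z, p, e = 50_000, 1.96, 0.4, 0.05
q = 1 - p

n0 = (z ** 2 * p * q) / e ** 2        # Cochran, infinite population
n_fin = n0 / (1 + (n0 - 1) / N)       # Cochran, finite population correction
n_yam = N / (1 + N * e ** 2)          # Yamane

print(math.ceil(n0), math.ceil(n_fin), math.ceil(n_yam))  # 369 367 397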
R eliability & Validity
• Validity
▫ Extent to which a test measures what we actually wish to measure
• Reliability
▫ Accuracy and Precision of a measurement procedure
▫ Whether the same measurement process yields the same results
▫ To the degree a measure supplies consistent results
▫ Concerned with the degree to which a
measurement is free of random or unusable error
Reliability is necessary but not sufficient for Validity
• Can have a reliable measure that is not valid
• A measure must be reliable for it to be valid
Reliability & Validity
[2×2 grid of reliability (high/low) against validity (high/low): a measure can be high in reliability yet low in validity, but it cannot be high in validity without being reliable.]
VALIDITY
Extent to which a test measures what we actually wish to measure
• External Validity - Ability of the data to be generalized across persons, settings, and times
• Internal Validity - Ability of a research instrument to measure what it is purported to measure
• Classification: Content Validity; Criterion Validity; Construct Validity
Type: Content
What is measured: Whether the measurement instrument provides adequate coverage of the investigative questions guiding the study
Example: Sales rep job satisfaction (measures covering all parts of the job – duties, fellow workers, management, pay, promotion, etc.)
Methods: 1. Judgment; 2. Panel evaluation

Type: Criterion-Related
What is measured: Degree to which the predictor is adequate in capturing the relevant aspects of the criterion
– Concurrent: Description of the current; criterion data are available at the same time as predictor scores. Example: GMAT score and MBA program grades. Method: 1. Correlation
– Predictive: Prediction of the future; criterion data are measured after the passage of time

Type: Construct
What is measured: Answers the question, "What accounts for the variance in the measure?"; attempts to identify the underlying construct(s) being measured and determine how well the test represents it (them)
Example: Does the measure confirm or deny the hypothesized relationships among the constructs (convergent and/or discriminant validity)?
Methods: 1. Judgmental; 2. Correlation of proposed test with established one; 3. Convergent-discriminant techniques; 4. Factor analysis; 5. Multitrait-multimethod analysis
VALIDITY
• Construct validity addresses the question of what construct or characteristic the scale is, in fact, measuring. Construct validity includes convergent, discriminant, and nomological validity.
– Convergent validity is the extent to which the scale correlates positively with other
measures of the same construct
– Discriminant validity is the extent to which a measure does not correlate with other constructs
from which it is supposed to differ
– Nomological validity is the extent to which the scale correlates in theoretically predicted ways
with measures of different but related constructs
RELIABILITY
Refers to the extent to which a scale produces consistent results if repeated measurements are made
• Not affected by systematic sources of error
• Assessing Reliability : Test-retest reliability ; Alternative forms reliability ; Internal consistency
Test- Retest Reliability (Repeatability of a measure)
– Respondents are administered identical sets of scale items at two different times, and the degree of similarity between the two measurements is determined
Issues
– Sensitive to the time interval between testing
– Initial measurement may alter the characteristic being measured
– It may be impossible to make repeated measurements
– First measurement may have a carryover effect to the second or subsequent measurements
– Characteristic being measured may change between measurements
– The test-retest reliability coefficient can be inflated by the correlation of each item with itself
RELIABILITY
Alternative Forms Reliability
– Two equivalent forms of the scale are constructed and the same respondents are measured at two
different times, with a different form being used each time
– The scores from the administrations of the alternative-scale forms are correlated to assess reliability
Issues
– It is time consuming and expensive to construct an equivalent form of the scale
– It is difficult to construct two equivalent forms of the scale as the two scales should be equivalent with respect
to content and the scale items should have the same means, variances, and intercorrelations
Internal Consistency
– Association between items for multiple-item measures (coefficient alpha / Cronbach's alpha)
– Determines the extent to which different parts of a summated scale are consistent in what they indicate about the characteristic being measured
– In split-half reliability, the items on the scale are divided into two halves and the resulting half scores are correlated. High correlation indicates high internal consistency
– The coefficient alpha, or Cronbach's alpha, is the average of all possible split-half coefficients resulting from different ways of splitting the scale items. This coefficient varies from 0 to 1, and a value of 0.6 or less generally indicates unsatisfactory internal consistency reliability
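As a worked illustration, here is a minimal Python sketch that computes coefficient alpha from the standard formula alpha = k/(k−1) × (1 − Σ item variances / variance of total scores); the response data are invented:

# A minimal sketch of coefficient (Cronbach's) alpha for a summated scale.
import statistics

# rows = respondents, columns = items of the scale (illustrative 5-point data)
responses = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
]

k = len(responses[0])
item_vars = [statistics.variance(col) for col in zip(*responses)]
total_var = statistics.variance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")  # values well above 0.6 are satisfactory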
….. Continued in Part B