Samar Colleges, Inc.
A. Mabini Ave., Catbalogan City, Samar, Phils 6700
Tel: +6355-5438381/8594
Website: samarcollege.edu.ph
Email: [email protected] [email protected]
HANDOUT ON SAMPLING AND FREQUENCY DISTRIBUTIONS
1. Discuss/distinguish the different types of probability sampling;
2. Discuss/distinguish the different types of non-probability sampling;
3. Organize data into frequency distribution;
4. Discuss some important concepts relative to frequency distributions; and
5. Discuss the different types of commonly used graphs.
A representative sample accurately reflects the characteristics of the larger population from
which it is drawn. An appropriately sized sample can minimize sampling error and increase the
validity of research findings.
Key Factors Influencing Sample Size
a. Population Size: The total number of individuals in the population can influence sample
size. Larger populations usually require larger samples to ensure representativeness, but
the relationship is not linear: once the population is large, the required sample size
levels off and grows very little as the population grows further.
b. Margin of Error: This is the largest expected difference between the sample estimate
and the true population parameter; it is half the width of the confidence interval. A
smaller margin of error requires a larger sample size. Typical margins are 5% or 10%.
c. Confidence Level: This represents the degree of certainty that the true population
parameter falls within the margin of error. Common confidence levels are 90%, 95%, and
99%. A higher confidence level necessitates a larger sample size.
d. Population Variability: The degree of variability or heterogeneity within the population
impacts sample size. If the population is diverse, a larger sample is needed to capture this
variability accurately.
e. Study Design: The type of study and its objectives may also dictate the sample size. For
instance, exploratory studies may get by with smaller samples than confirmatory studies.
Determining Sample Size: Formula and Steps
A common formula for calculating sample size for a proportion is:
\[ n = \frac{Z^2 \cdot p \cdot (1-p)}{E^2} \]
Where:
• \( n \) = required sample size
• \( Z \) = Z-score (found in Z-tables corresponding to the chosen confidence level)
• \( p \) = estimated proportion of the attribute of interest (if unknown, use 0.5 for
maximum variability)
• \( E \) = margin of error (expressed as a decimal)
Example Calculation
Let’s say you want to determine the sample size for a survey with:
• A 95% confidence level (Z = 1.96)
• A margin of error of 5% (0.05)
• An estimated proportion of 50% (0.5)
Plug the values into the formula:
\[ n = \frac{(1.96^2) \cdot (0.5) \cdot (1-0.5)}{(0.05^2)} \]
\[ n = \frac{(3.8416) \cdot (0.25)}{0.0025} = 384.16 \]
Thus, you would need a sample size of 385 (always round up).
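The calculation above can be checked with a short Python sketch. The function names `z_score` and `sample_size_for_proportion` are illustrative and not part of the handout. The Z-score is taken from the standard normal distribution instead of a printed Z-table, which gives a slightly more precise value than 1.96 for a 95% confidence level.

```python
import math
from statistics import NormalDist


def z_score(confidence_level: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def sample_size_for_proportion(z: float, p: float, e: float) -> int:
    """Apply n = Z^2 * p * (1 - p) / E^2 and round up to a whole person."""
    n = (z ** 2) * p * (1 - p) / (e ** 2)
    return math.ceil(n)


# Values from the worked example: Z = 1.96, p = 0.5, E = 0.05
n = sample_size_for_proportion(z=1.96, p=0.5, e=0.05)
print(n)

# Same calculation with the Z-score computed instead of looked up
print(round(z_score(0.95), 2))
print(sample_size_for_proportion(z_score(0.95), 0.5, 0.05))
```

Rounding up with `math.ceil` follows the "always round up" rule stated above.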
Adjusting for Finite Population
If the population size is small, you may need to adjust the sample size using the finite
population correction formula:
\[ n_{adj} = \frac{n}{1 + \frac{(n-1)}{N}} \]
Where:
• \( n_{adj} \) = adjusted sample size
• \( N \) = population size
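As a rough illustration of the correction, the sketch below applies the formula to the unrounded n = 384.16 from the earlier example. The population sizes (1,000 and 10,000,000) are hypothetical, and rounding the adjusted value up is an assumption carried over from the "always round up" rule.

```python
import math


def adjust_for_finite_population(n: float, population_size: int) -> int:
    """Apply n_adj = n / (1 + (n - 1) / N) and round up."""
    n_adj = n / (1 + (n - 1) / population_size)
    return math.ceil(n_adj)


# Hypothetical small population of 1,000 members
small = adjust_for_finite_population(384.16, 1_000)
print(small)

# Hypothetical very large population: the correction barely changes n
large = adjust_for_finite_population(384.16, 10_000_000)
print(large)
```

The second call shows why the correction matters only for small populations: when N is very large, the denominator is close to 1 and the adjusted size stays at 385.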
Thus, determining the right sample size involves carefully considering various factors such
as population size, margin of error, confidence level, and population variability. By applying
statistical formulas, researchers can derive a sample size that enhances the representativeness
of their study, ultimately leading to more reliable results. Always remember to validate your
assumptions and conduct pilot studies if necessary to fine-tune your sample size
determination process.
Sampling is a statistical procedure concerned with the selection of individual
observations; it allows us to make statistical inferences about the population with the
help of samples.
(https://www.statisticssolutions.com/sample-size-calculation-and-sample-size-justification/sampling/)
There are two basic sampling techniques: 1) Probability Sampling Technique, and 2) Non-
Probability Sampling Technique.
Probability Sampling Technique refers to a process of selecting samples in such a way
that all individuals in the defined population have an equal and independent chance of being
selected for the sample, the process being called randomization (Subong, 2005: 20). Thus, every
member in the target population has the same probability of being chosen as the other members in
the target population, and the selection of any member will not in any way affect the selection of
other members in the target population.
1. Simple Random Sampling
A sample X1, X2, X3, … Xn of values of a numerical variable X is called a random
sample if the sampled values are selected independently from the population.
The simplest method of random sampling is by lottery. From the given
population, the names of the persons or objects are listed on small slips of paper, put in
a container and then jumbled thoroughly. Without looking at the slips of paper, draw
the desired sample size. In this procedure, each item has an equal chance of being
chosen as sample.
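The lottery method can be imitated in Python with the standard `random` module. This is only a sketch: the function name `lottery_sample`, the list of 50 names, and the seed are made up for illustration.

```python
import random


def lottery_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each with an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: 50 slips of paper in the container
names = [f"Person {i}" for i in range(1, 51)]
chosen = lottery_sample(names, 10, seed=1)
print(chosen)
```

`random.sample` draws without replacement, which matches pulling slips from the container without putting them back.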
2. Stratified Random Sampling
In this type of sampling, the population is divided into categories or strata. From
these divisions, the members will be drawn proportionate to each stratum. In using this
technique, the researcher must first determine the characteristics of the population in
order to obtain appropriate divisions needed in the problem. Essentially, the goal of
stratification is to ensure that there is some relationship between being in a particular
stratum and the characteristic being studied. For example, a company president is
planning to improve the salary scheme for his employees. The population could be
stratified by work efficiency, length of service in the company, nature of work, or
educational qualification. The population would not be stratified by sex or civil status,
since these qualities were not found to affect the salaries of employees. This sampling
technique is sometimes called proportionate probability sampling.
The following steps may be followed:
a) Identify and define the population.
b) Determine desired sample size.
c) Identify the variable and subgroups (strata) for which you want to guarantee
appropriate representation.
d) Come up with the sampling frame by classifying all members of the
population as members of one of the identified subgroups.
e) Compute the proportion of samples to be taken from each group using the
formula p = n/N (n refers to the computed sample size and N refers to the
population size), then multiply p by the size of each stratum to get the
number of samples to draw from it.
f) Randomly select (using a table of random numbers or by using lottery
method) the samples from each stratum.
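Steps (d) to (f) can be sketched in Python as follows. The strata (length of service) and their sizes are hypothetical, and the function name `proportionate_stratified_sample` is illustrative. Rounding each stratum's share to the nearest whole number is an assumption; with other figures the rounded shares may add up to one more or one less than the desired sample size.

```python
import random


def proportionate_stratified_sample(strata, sample_size, seed=None):
    """strata maps each stratum name to the list of its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    p = sample_size / population_size  # step (e): p = n / N
    sample = {}
    for name, members in strata.items():
        share = round(p * len(members))  # samples owed by this stratum
        sample[name] = rng.sample(members, share)  # step (f)
    return sample


# Hypothetical sampling frame: 200 employees grouped by length of service
strata = {
    "under 5 years": [f"A{i}" for i in range(120)],
    "5 to 10 years": [f"B{i}" for i in range(60)],
    "over 10 years": [f"C{i}" for i in range(20)],
}
sample = proportionate_stratified_sample(strata, sample_size=50, seed=1)
for name, members in sample.items():
    print(name, len(members))
```

Here p = 50/200 = 0.25, so each stratum contributes one quarter of its members.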
3. Cluster Sampling
This sampling is used if the population is spread out over a wide area, and if the
complete list of the members of the population is not available.
In this kind of sampling, the total population is divided into a number of
relatively small areas, and some of these areas or clusters are randomly selected for
inclusion in the overall sample. This sampling process involves the random selection
of clusters (not individual members) and all the members comprising the cluster are
taken as part of the samples.
The steps in cluster sampling are not very different from those involved in
random sampling. The major difference is that random selection of groups (clusters)
is involved, not individuals. Cluster sampling involves the following:
a) Identify and define the population.
b) Determine the desired sample size.
c) Identify and define a logical cluster.
d) List all clusters (or obtain a list) that comprise the population.
e) Estimate the average number of population members per cluster.
f) Determine the number of clusters needed by dividing the sample size by the
estimated size of a cluster.
g) Randomly select the needed number of clusters (using a table of random
numbers or by lottery method).
h) Include in your study all population members in each selected cluster.
Cluster sampling can be done in stages, involving selection of clusters within
clusters. This process is called multi-stage sampling. For example, schools can be
randomly selected and then classrooms within each selected school can be randomly
selected.
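One-stage cluster sampling (steps a to h) might look like the sketch below. The 20 barangays with 30 households each are invented for illustration, and the function name `cluster_sample` is not from the handout. Rounding the number of clusters up is an assumption, made so that the sample is never smaller than desired.

```python
import math
import random


def cluster_sample(clusters, sample_size, seed=None):
    """clusters maps each cluster name to the list of its members."""
    rng = random.Random(seed)
    total_members = sum(len(members) for members in clusters.values())
    average_size = total_members / len(clusters)  # step (e)
    clusters_needed = math.ceil(sample_size / average_size)  # step (f)
    chosen = rng.sample(sorted(clusters), clusters_needed)  # step (g)
    # step (h): every member of a selected cluster joins the sample
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical population: 20 barangays with 30 households each
clusters = {
    f"Barangay {b:02d}": [f"B{b:02d}-H{h:02d}" for h in range(1, 31)]
    for b in range(1, 21)
}
chosen, members = cluster_sample(clusters, sample_size=100, seed=1)
print(chosen)
print(len(members))
```

Because whole clusters are taken, the final sample (120 households) is larger than the desired 100.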
4. Systematic Sampling
In some instances, the most practical way of sampling is to select, say, every
25th name on a list, every 7th house on one side of a street, every 10th piece coming off
a production line, and so on. This is called systematic sampling. An element of
randomness can be introduced by using random numbers to pick the unit with which to
start, referred to as a random start.
Steps in Systematic Sampling
a) Identify and define the population.
b) Determine the desired sample size.
c) Obtain a list of population.
d) Determine the sampling interval (k) by dividing the size of the population by the
desired sample size.
e) Start at some random place at the top of the population list.
f) Starting at that point, take every kth name on the list until the desired sample
size is reached.
g) If the end of the list is reached before the desired sample size is reached, go
back to the top of the list.
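A minimal Python sketch of these steps is given below. It assumes the random start may fall anywhere on the list and wraps around to the top when the end is reached, as in step (g). The function name `systematic_sample` and the list of 100 numbered members are hypothetical.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Take every k-th member from a random start, wrapping around."""
    rng = random.Random(seed)
    size = len(population)
    k = size // sample_size  # step (d): sampling interval
    start = rng.randrange(size)  # step (e): random start
    # steps (f) and (g): every k-th member, going back to the top if needed
    positions = [(start + j * k) % size for j in range(sample_size)]
    return k, [population[pos] for pos in positions]


# Hypothetical population list of 100 numbered members
population = list(range(1, 101))
k, sample = systematic_sample(population, sample_size=10, seed=1)
print(k)
print(sample)
```

Since k times the sample size never exceeds the population size, the wrap-around cannot select the same member twice.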
Non-Probability Sampling Technique refers to a process of selecting samples in which
randomization is not used. The samples or subjects that are needed are merely taken
or selected for a certain purpose of the study.
a. Accidental or Incidental Sampling. It is a process of taking as subjects whoever
happens to be available during the period of the study. For example, if the researcher
would like to conduct a survey on which brands of toothpaste are top sellers in Region VI,
the researcher has to identify the peak shopping hours in a certain mall, stand by at the
exit gate, and interview shoppers coming out about the brand of toothpaste they bought
until the desired sample size is met.
b. Quota Sampling. It is a process of getting a sample of subjects through a quota
system. For example, in making an opinion survey about the legalization of divorce in the
Philippines, the researcher can assign a quota of subjects, such as all
fourth year students taking Political Science courses in all Higher Education Institutions in
every region.
c. Purposive Sampling. In this sampling technique, the researcher simply picks out the
subjects that suit the purpose of the study.
For example, in a study on NSAT Performance of Science Students, the researchers can
study only those students belonging to a high socioeconomic status or those with average
socioeconomic status, rather than a representative sample of all students.
d. Volunteer Sampling. In this sampling technique, the researcher simply includes subjects
who are willing to participate in the investigation. This is usually resorted to when the
study is quite sensitive, or may pose danger to the participants.
When data are collected, one needs to organize them to facilitate analysis.
Organizing a small data set (usually N < 30) is easier than organizing voluminous data,
because the important characteristics of the data must still be properly reflected when
they are in organized form. Usually, data are presented in tabular form, where the data are
grouped into different classes and the number of observations falling in each class is
determined. Such an arrangement is most commonly referred to as a frequency distribution.
Construction of Frequency Distribution
This section presents a step-by-step procedure of constructing a frequency distribution.
Consider the following data as the results of a 150-item summative test in Mathematics
taken by 100 first year college students of Samar State University:
130 105 91 80 111 97 85 112 117 129
140 110 120 87 108 99 86 133 118 131
90 95 139 106 99 102 87 113 119 134
100 70 114 105 93 80 104 114 120 135
110 76 93 123 74 129 105 115 121 136
107 89 75 132 73 141 106 116 122 137
98 99 74 128 92 81 89 71 124 138
75 80 71 94 95 82 103 73 125 70
88 72 108 91 96 83 108 88 126 73
77 87 109 70 101 102 109 90 127 75
Step 1. Compute the range (highest value – lowest value).
Using the above example:
Range = H – L
= 141 – 70
Range = 71
Step 2. Decide on the minimum size of groupings desired.
According to Espina (1995: 75), the maximum number of interval steps is 20, the
minimum is 7. The ideal number of interval steps is 10 to 15 and the most common is 10.
Hence, the value derived from this step is arbitrary.
This step does not entail any computation. All you have to do is decide on the desired
number of steps or classes.
In the above given example, let’s say that you decide on 14 to be the number of steps
or grouping size.
Step 3. Compute the interval size, i.
In the example given:
\[ i = \frac{\text{Range}}{\text{size of groupings}} = \frac{71}{14} = 5.07 \text{ or } 5 \]
Note: If the value of i turns out to be even, it is preferable to round off
the value to the nearest odd integer less than the computed value.
Step 4. Set up the classes.
One easy way is to start with the lowest odd integer less than or equal to
the lowest value as the lower limit of the lowest class. Then simply add to this
value the computed i in Step 3 until you reach the highest value or score in the
raw distribution. Next, one needs to set up the upper limits; each upper limit is
the lower limit of its class plus i – 1.
(Remember that these steps need to be mutually exclusive: no overlapping should
occur, and there should be no gaps or missing values.)
In our example:
139 – 143
134 – 138
129 – 133
124 – 128
119 – 123
114 – 118
109 – 113
104 – 108
99 – 103
94 – 98
89 – 93
84 – 88
79 – 83   (lower limit: 74 + 5; upper limit: 78 + 5)
74 – 78   (lower limit: 69 + 5; upper limit: 73 + 5)
69 – 73
Step 5. Tally the scores, accordingly, to find out the number of scores falling in each class
interval, thus:
Class Limits    Tally           Frequency
139 – 143       lll             3
134 – 138       llll            5
129 – 133       llll – l        6
124 – 128       llll            5
119 – 123       llll – l        6
114 – 118       llll – l        6
109 – 113       llll – ll       7
104 – 108       llll – llll     10
99 – 103        llll – lll      8
94 – 98         llll – l        6
89 – 93         llll – llll     9
84 – 88         llll – ll       7
79 – 83         llll – l        6
74 – 78         llll – ll       7
69 – 73         llll – llll     9
                                -------
                                N = 100
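Steps 1 to 5 can be reproduced with the Python sketch below, using the 100 test scores given earlier. The function names `build_classes` and `tally` are illustrative. Rounding i to the nearest whole number before applying the odd-integer note is an assumption about how "5.07 or 5" was obtained. The classes are built from the lowest upward and printed from the highest downward to match the table.

```python
scores = [
    130, 105, 91, 80, 111, 97, 85, 112, 117, 129,
    140, 110, 120, 87, 108, 99, 86, 133, 118, 131,
    90, 95, 139, 106, 99, 102, 87, 113, 119, 134,
    100, 70, 114, 105, 93, 80, 104, 114, 120, 135,
    110, 76, 93, 123, 74, 129, 105, 115, 121, 136,
    107, 89, 75, 132, 73, 141, 106, 116, 122, 137,
    98, 99, 74, 128, 92, 81, 89, 71, 124, 138,
    75, 80, 71, 94, 95, 82, 103, 73, 125, 70,
    88, 72, 108, 91, 96, 83, 108, 88, 126, 73,
    77, 87, 109, 70, 101, 102, 109, 90, 127, 75,
]


def build_classes(data, desired_classes):
    lowest, highest = min(data), max(data)
    value_range = highest - lowest  # Step 1
    i = max(1, round(value_range / desired_classes))  # Step 3
    if i % 2 == 0:
        i -= 1  # prefer the odd integer just below an even i
    # Step 4: lowest odd integer less than or equal to the lowest score
    lower = lowest if lowest % 2 == 1 else lowest - 1
    classes = []
    while lower <= highest:
        classes.append((lower, lower + i - 1))
        lower += i
    return value_range, i, classes


def tally(data, classes):
    """Step 5: count the scores falling inside each class interval."""
    return [sum(1 for x in data if lo <= x <= hi) for lo, hi in classes]


value_range, i, classes = build_classes(scores, desired_classes=14)
frequencies = tally(scores, classes)
for (lo, hi), f in reversed(list(zip(classes, frequencies))):
    print(f"{lo:>3} - {hi:<3} {f:>3}")
print("N =", sum(frequencies))
```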
In a frequency distribution, several terminologies are used, like size of groupings or
number of steps, interval size (i), class limits, class boundaries , class midpoint (Xm), class
frequency (f), cumulative frequency (F), modal class, etc.
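Size of grouping. This is sometimes called the number of steps or classes. In our
example given, the size of grouping, if you count the number of steps, is 15. This is one
more than the 14 chosen in Step 2 because i was rounded down from 5.07 to 5, so 14
classes starting at 69 would end at 138, short of the highest score, 141.
Interval size (i). This refers to the number of scores within an interval step.
Sometimes, interval size is also called class width.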
In our example, taking the lowest step 69 – 73, there are five scores included,
namely: 69, 70, 71, 72, and 73. Hence, i = 5. You can also compute i by getting the
difference between one lower limit to the next lower limit or one upper limit to the next
upper limit. Hence, i = 74 – 69 = 5 or i = 78 – 73 = 5
Class Limits. Class limits are categorized into two: the upper class limit
(UL), which refers to the number to the right of the class interval, and the lower class
limit (LL), which refers to the number to the left of the class interval.
Class Boundaries. Like the class limits, class boundaries are classified as upper
class boundary (UB) and lower class boundary (LB). Class boundaries are always carried out
to one more decimal place than the recorded observations (Walpole, 1982: 48). Class
boundaries are sometimes referred to as exact class limits. The upper boundary is also
referred to as the exact upper limit and the lower boundary is also referred to as the exact
lower limit.
In the example given, the class boundary for the class interval 69 – 73 is 68.5 –
73.5. This means that to get the lower class boundary, 0.5 is subtracted from the lower
limit. Meanwhile, to get the upper class boundary, 0.5 is added to the upper limit.
Class Midpoint or Class Mark (Xm). This refers to the midpoint between the
upper and lower boundaries or class limits of a class interval (Walpole, 1982: 49). Usually,
this is obtained by the following formula:
\[ X_m = \frac{LL + UL}{2} \quad \text{or} \quad X_m = \frac{LB + UB}{2} \]
Thus, in our example, the class midpoint of the class interval 69 – 73 is:
\[ X_m = \frac{69 + 73}{2} = \frac{142}{2} = 71 \quad \text{or} \quad X_m = \frac{68.5 + 73.5}{2} = \frac{142}{2} = 71 \]
Class frequency (f). This refers to the number of cases falling in a particular interval.
This can be determined after accurately tallying the scores. In our example, the class
frequency, f of 69 – 73 is 9.
Cumulative frequency (F). There are two types of cumulative frequencies: the less than
or equal to cumulative frequency (<F) and the greater than or equal to cumulative frequency
(>F). With reference to the obtained class frequencies for each distribution, <F is obtained by
adding all the frequencies starting from the lowest class interval up to the particular interval
you are interested in. Meanwhile, >F is obtained by adding all the frequencies from the highest
class down to the particular interval you are interested in.
In our example, let’s take the interval 94 – 98. Its < F is obtained by getting the sum
of 9+7+6+7+9+6 = 44. Its > F is obtained by getting the sum of 3+5+6+5+6+6+7+10+8+6 =
62.
The following table summarizes all the terminologies just discussed for the scores of
the 100 first year college students in a 150- item test in Mathematics.
Class        Class            Class      Class       Cumulative
Intervals    Boundaries       Midpoint   Frequency   Frequency
                              (Xm)       (f)         >F     <F
139 – 143    138.5 – 143.5    141        3           3      100
134 – 138    133.5 – 138.5    136        5           8      97
129 – 133    128.5 – 133.5    131        6           14     92
124 – 128    123.5 – 128.5    126        5           19     86
119 – 123    118.5 – 123.5    121        6           25     81
114 – 118    113.5 – 118.5    116        6           31     75
109 – 113    108.5 – 113.5    111        7           38     69
104 – 108    103.5 – 108.5    106        10          48     62
99 – 103     98.5 – 103.5     101        8           56     52
94 – 98      93.5 – 98.5      96         6           62     44
89 – 93      88.5 – 93.5      91         9           71     38
84 – 88      83.5 – 88.5      86         7           78     29
79 – 83      78.5 – 83.5      81         6           84     22
74 – 78      73.5 – 78.5      76         7           91     16
69 – 73      68.5 – 73.5      71         9           100    9
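The columns of this table can be derived from the class limits and class frequencies alone, as the Python sketch below suggests. The variable names are illustrative, and the 0.5 adjustment assumes whole-number scores, as in this example. Classes are listed from lowest to highest here, the reverse of the table.

```python
from itertools import accumulate

# Class limits and frequencies from the tally, lowest class first
classes = [(lo, lo + 4) for lo in range(69, 140, 5)]
frequencies = [9, 7, 6, 7, 9, 6, 8, 10, 7, 6, 6, 5, 6, 5, 3]

# <F: running total from the lowest class upward
less_than = list(accumulate(frequencies))
# >F: running total from the highest class downward
greater_than = list(accumulate(reversed(frequencies)))[::-1]

rows = []
for (lo, hi), f, lt, gt in zip(classes, frequencies, less_than, greater_than):
    rows.append({
        "limits": (lo, hi),
        "boundaries": (lo - 0.5, hi + 0.5),  # LB = LL - 0.5, UB = UL + 0.5
        "midpoint": (lo + hi) / 2,  # Xm = (LL + UL) / 2
        "f": f,
        ">F": gt,
        "<F": lt,
    })

for row in reversed(rows):  # print highest class first, as in the table
    print(row)
```

For the interval 94 – 98, this gives <F = 44 and >F = 62, agreeing with the sums worked out by hand.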
Exercise 2
Consider the values below, which represent the travel time to work of 170 employees in a
large downtown office building. The times are given in minutes and each value represents an
employee’s average time over five consecutive work days.
29 20 20 34 35 28 37 46 35 19 20 56 43 25 48 43 37
31 22 32 36 38 49 32 86 13 27 14 39 39 27 44 39 40
11 5 15 16 29 38 44 38 74 23 11 40 30 10 35 30 31
24 20 30 29 41 50 30 12 114 42 47 32 39 25 44 39 43
30 30 40 35 56 31 45 90 100 39 37 49 28 35 33 28 58
8 23 33 13 27 48 53 41 40 28 24 38 62 28 67 62 29
25 12 32 30 46 18 41 23 24 90 90 43 30 17 35 30 48
51 18 28 56 25 13 54 115 16 39 41 34 67 23 72 67 27
41 15 25 46 37 18 51 103 52 20 65 22 60 20 65 60 39
37 14 24 42 42 16 108 44 43 39 38 41 40 19 45 40 44
Make a frequency distribution (use 15 as the minimum size of grouping in Step 2) and
complete the table below:
Class Class Class Class Cumulative
Intervals Boundaries Midpoint frequency Frequency
(Xm) (f) >F <F