Biostat101Lecture Chapter 3
Objectives
At the end of the subject matter, the students are expected to:
1. Distinguish primary from secondary sources of data
2. Select the appropriate method of collecting a required set of data
3. Define the concept of sampling
4. Differentiate a parameter from a statistic
5. Determine the most appropriate sampling technique to be used given a research problem
6. Apply the different methods of presenting data.
7. Construct a frequency distribution from a set of raw data
8. Select the appropriate graphical presentation for a given set of data
9. Construct a graph of a set of data.
The world is full of potential data. However, only the relevant and specifically required
data are needed to investigate a research problem.
Collection of Data
Data may be gathered from two general sources: 1) primary and 2) secondary.
Primary sources are those from which information is gathered directly from the original
source, or which are based on direct or first-hand experiences. They include:
a. Interview
b. Questionnaires
c. Personal accounts and diaries are used in sociological research. They seek information on
people’s actions and feelings by asking people to give their own interpretation or account of what
they experience.
d. Observations and physical surveys involve actual visits to the subject under study. For example,
to determine the area of a classroom, one has to measure it using a measuring device.
e. Standard scales and tests, such as mental ability tests and other tests developed by professionals
or groups.
f. The Internet can be used for the whole range of surveys, from structured questionnaires to
unstructured interviews, and even for observational studies and experimental designs. Data
gathering through private chat, instant messaging, video chat, and similar tools is the latest
web-based approach.
On the other hand, secondary sources are those from which information is gathered from
published or unpublished materials that were previously collected by other individuals or agencies
and are now used for a purpose other than the one for which they were originally collected. These include:
c. Government departments and commercial and professional bodies that hold much statistical
information, both current and historic.
d. The field which includes ancient cities, buildings, archaeological digs, etc.
2. The indirect or questionnaire method uses prepared questions intended to elicit answers
to the problems of a study. Questionnaires may be mailed or hand-carried.
1. Begin thinking about the type of data you will have to collect.
2. Think about where you will be obtaining the data.
3. Make sure that the data collection form you are using is clear and easy to use.
4. Make a duplicate copy of the data file and keep it in a separate location.
5. Do not rely on other people to collect or transfer your data unless you personally have trained
them and are confident that they understand the data collection process as well as you do.
6. Plan a detailed schedule of when and where you will be collecting your data.
7. Cultivate possible sources for your participant pool.
8. Try to follow up on subjects who missed their testing session or interview.
9. Never discard original data.
10. Follow the previous 9.
After the data are collected, the next step is to summarize the information using three
methods: 1) tabular, 2) graphical, and 3) textual.
A. Tabular Presentation
Table heading shows the table number and the title. The table number gives the table its
identity for reference purposes, while the title briefly explains what is being presented in the table.
Stub contains the stub head and row labels. The stub head indicates what the row labels are.
Row labels are the major categories of data contained in rows.
Box head contains the captions that appear above the columns. It includes the stub head,
master caption, column captions and row totals. Master caption and column captions describe the
data that are in their respective columns. Body is the main part of the table that contains the
quantitative information.
Footnote is written immediately below the bottom line of the table to explain or clarify
codes and abbreviations used in the table. Source note is generally written below the footnote and
is used to show the reference and acknowledge the origin of the data.
B. Graphical Presentation
Graphical presentation refers to the pictorial representation of data through the use of
graphs or charts. Graphs or charts are simply illustrations of numerical data and are a good
way of communicating the numerical figures found in tables. Charts facilitate analysis when they
reveal probable relationships among variables. Graphical presentation also allows comparison of
different series or groups.
Types of Graphs
a. Bar Graph
A bar graph consists of bars of equal width, either all vertical or all horizontal. The length
of each bar represents the magnitude of the quantity being compared. Vertical bars are generally used
for chronological comparison or for comparing data taken at a particular time, while horizontal bars
are used to show categorical comparison. There are three types: 1) the simple bar graph, where the bars
stand singly, apart from one another; 2) the compound (multiple) bar graph, where two or more bars are
drawn for each item; and 3) the component bar graph, which shows the proportional variation or changes
in the segments of a whole and in the whole itself.
b. Line Graph
Line graph is an effective device for portraying changes in values over time.
Variations in the data are indicated by a series of line segments formed by joining consecutive
points plotted above the categories.
c. Band Chart
Band chart is a time series linear graph. It shows the proportional variation of the
component parts of a whole over a period of time.
d. Pie Chart or Circle Graph
Pie Chart or Circle Graph is appropriate for portraying the relative magnitude of the
component parts of a whole.
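In a pie chart, each sector's central angle is proportional to its share of the whole, that is, (f / n) × 360°. The short Python sketch below shows only that arithmetic; the function name `pie_angles` and the category counts are hypothetical, chosen for illustration and not taken from any table in this chapter.

```python
def pie_angles(counts):
    """Central angle of each sector in degrees: (f / n) x 360."""
    total = sum(counts.values())
    # Multiply before dividing so whole-number shares stay exact.
    return {label: f * 360 / total for label, f in counts.items()}


# Hypothetical counts for four component parts of a whole (illustration only).
counts = {"A": 20, "B": 15, "C": 10, "D": 5}
angles = pie_angles(counts)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles always add up to 360°. A rectangle graph uses the same proportions, applied to the length of the rectangle instead of to 360°.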
e. Rectangle Graph
Rectangle Graph is a variation of the pie chart wherein a rectangle takes the place of the
circle and is divided proportionally.
f. Pictograph
Pictograph uses picture symbols to represent values. The symbols used should be
appropriate to the data being presented.
g. Statistical Maps or Cartograms
Statistical maps or cartograms are one of the best ways to present geographical data.
MS Excel is a widely used program for creating tables and graphs. To create a chart in
MS Excel, open the Insert tab of the ribbon and choose the desired graph from the Charts group.
C. Textual Presentation
Textual presentation summarizes the data in paragraph or narrative form, with the important
figures placed in the text of the report. These may include summary statistics such as the minimum,
maximum, mean, median, standard deviation, percentage, or total. Textual presentation allows us
to highlight the significant figures of the study.
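The summary figures named above can be computed with Python's standard `statistics` module and then written into a sentence. This is a minimal sketch that uses the 50 examination scores from the frequency distribution example later in this chapter. It assumes the sample standard deviation (`statistics.stdev`) is the one wanted, and the sentence template is only illustrative.

```python
import statistics

# The 50 examination scores from the Biological Statistics example in this chapter.
SCORES = [
    45, 89, 32, 67, 51, 60, 70, 65, 72, 70,
    75, 55, 50, 75, 65, 49, 58, 71, 87, 73,
    63, 93, 75, 75, 43, 76, 73, 85, 64, 45,
    35, 78, 54, 65, 59, 55, 89, 85, 40, 82,
    51, 58, 35, 48, 55, 97, 67, 56, 70, 55,
]

summary = {
    "minimum": min(SCORES),
    "maximum": max(SCORES),
    "mean": statistics.mean(SCORES),
    "median": statistics.median(SCORES),
    "standard deviation": statistics.stdev(SCORES),  # sample standard deviation
    "total": sum(SCORES),
}

# Put the important figures into narrative form.
print(
    f"The {len(SCORES)} scores ranged from {summary['minimum']} to {summary['maximum']}, "
    f"with a mean of {summary['mean']:.2f}, a median of {summary['median']:.2f}, "
    f"and a standard deviation of {summary['standard deviation']:.2f}."
)
```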
FREQUENCY DISTRIBUTION
Table 2
Distribution of Students in a Biology Class Grouped according to Year Level and Section
Table 3
Distribution of Examination Results in a Biology Class
Class Limits
These are the lowest and highest values that can be included in a class.
Frequency
This is the number of values that fall in a given interval. Frequencies are usually listed in one
column and are represented by “f”. The sum of the frequencies is equal to the total number “n” of raw data.
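Class Mark (Midpoint)
The class mark is the midpoint of a class interval, computed as
$X_m = \frac{LL_i + UL_i}{2}$
where $X_m$ is the class midpoint, $LL_i$ is the lower limit of a given class interval, and $UL_i$ is the upper
limit of that class interval.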
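Class Boundaries
The upper class boundary of one class interval is also the lower class boundary of the next one, and is computed as
$UCB_i = \frac{LL_i + UL_{i+1}}{2} = LCB_{i+1}$
where $UCB_i$ is the upper class boundary of a given class interval, $LL_i$ is its lower limit, $UL_{i+1}$ is the
upper limit of the next class interval, and $LCB_{i+1}$ is the lower class boundary of the next class interval.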
If only one class interval is given and the class limits are whole numbers, adding 0.5 to the upper
limit and subtracting 0.5 from the lower limit gives the class boundaries. If the class limits have one
decimal place, add 0.05 to the upper limit and subtract 0.05 from the lower limit. If they have two
decimal places, add or subtract 0.005, and so on.
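The class mark and the add/subtract rule above can be sketched in Python. This version assumes the limits are passed as strings, so that `decimal.Decimal` keeps their number of decimal places and the half-unit (0.5, 0.05, 0.005, ...) can be read from them. The function name `class_mark_and_boundaries` is hypothetical. When all classes have the same width, this rule gives the same boundaries as the $UCB_i$ formula.

```python
from decimal import Decimal


def class_mark_and_boundaries(lower, upper):
    """Return (class mark, lower boundary, upper boundary) for one class interval.

    Limits are given as strings so their decimal places are preserved:
    the half-unit is 0.5 for whole numbers, 0.05 for one decimal place,
    0.005 for two decimal places, and so on.
    """
    lo, hi = Decimal(lower), Decimal(upper)
    # Number of decimal places in the more precise of the two limits.
    places = max(0, -lo.as_tuple().exponent, -hi.as_tuple().exponent)
    half_unit = Decimal(5) / Decimal(10) ** (places + 1)
    midpoint = (lo + hi) / 2            # X_m = (LL + UL) / 2
    return midpoint, lo - half_unit, hi + half_unit


# First class of Table 4: limits 10 - 14.
print(class_mark_and_boundaries("10", "14"))
```

For the class 10 - 14 this gives a class mark of 12 and boundaries of 9.5 and 14.5, which agrees with the first row of Table 4.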
Table 4
Frequency Distribution of the Examination Results in a Biostatistics Class
Class Interval   f    Xm   Class Boundaries
10 - 14          2    12   9.5 – 14.5
15 - 19          5    17   14.5 – 19.5
20 - 24         12    22   19.5 – 24.5
25 - 29         15    27   24.5 – 29.5
30 - 34         10    32   29.5 – 34.5
35 - 39          4    37   34.5 – 39.5
40 - 44          2    42   39.5 – 44.5
n = 50
Step 1. Determine the range of the set of data using the formula, R = HV – LV, where R is the
range, HV is the highest value and LV is the lowest value.
Step 2. Determine the approximate number of classes, K, using the formula K = √n, where n is the
sample size or the number of observations.
Step 3. Determine the class size c using the formula $c = \frac{R}{K}$, where c is the class size, R is the
range computed in Step 1, and K is the value computed in Step 2.
Step 4. Choose an appropriate lower limit (LL) for the first class interval: a number equal to or less
than the lowest observed value that is divisible by the class size. If the lowest value in the
distribution is lower than the class size, use the lowest value itself as the lower limit of the first
class interval.
Step 5. Determine the upper limit (UL) of the first class interval using the formula UL = LL + c - 1.
Step 6. Determine the rest of the class intervals by adding the class size to the lower and upper
limits of the previous class interval. Continue until the highest value falls within a class
interval and the desired number of classes is met.
Step 7. Count the number of observations falling within each class, enter the results in the
frequency column, and get their sum. A tally column makes this easier.
Step 8. Complete the distribution by providing the rest of the columns for class marks, class
boundaries, and others as needed.
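Steps 1 to 8 can be sketched in Python for whole-number data, using the 50 scores from the example that follows. This is a minimal illustration, not a definitive implementation. The function name `frequency_distribution` is hypothetical. The sketch rounds c to the nearest whole number (halves up) and stops adding classes once the highest value is covered, which gives 8 classes here even though K ≈ 7.07.

```python
import math

# Examination scores of the 50 Biological Statistics students (example below).
SCORES = [
    45, 89, 32, 67, 51, 60, 70, 65, 72, 70,
    75, 55, 50, 75, 65, 49, 58, 71, 87, 73,
    63, 93, 75, 75, 43, 76, 73, 85, 64, 45,
    35, 78, 54, 65, 59, 55, 89, 85, 40, 82,
    51, 58, 35, 48, 55, 97, 67, 56, 70, 55,
]


def frequency_distribution(data):
    """Build a frequency distribution of whole-number data following Steps 1-8.

    Returns the class size c and one dict per class interval.
    """
    low, high = min(data), max(data)
    r = high - low                      # Step 1: range R = HV - LV
    k = math.sqrt(len(data))            # Step 2: K = sqrt(n), all decimals kept
    c = max(1, int(r / k + 0.5))        # Step 3: c = R / K, rounded to a whole number
    # Step 4: largest multiple of c not exceeding the lowest value,
    # or the lowest value itself when it is smaller than c.
    ll = low if low < c else (low // c) * c
    rows = []
    while ll <= high:                   # Step 6: stop once the highest value is covered
        ul = ll + c - 1                 # Step 5: UL = LL + c - 1
        f = sum(1 for x in data if ll <= x <= ul)   # Step 7: count (tally) the class
        rows.append({
            "limits": (ll, ul),
            "f": f,
            "midpoint": (ll + ul) / 2,              # Step 8: class mark
            "boundaries": (ll - 0.5, ul + 0.5),     # Step 8: whole-number limits, so +/- 0.5
        })
        ll += c                         # Step 6: next lower limit
    return c, rows


c, rows = frequency_distribution(SCORES)
print(f"class size c = {c}")
for row in rows:
    (ll, ul), (lcb, ucb) = row["limits"], row["boundaries"]
    print(f"{ll} - {ul}\tf = {row['f']}\tXm = {row['midpoint']:g}\t{lcb} - {ucb}")
print("n =", sum(row["f"] for row in rows))
```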
Example:
Construct a frequency distribution using the following examination results of 50
students in Biological Statistics.
45 89 32 67 51 60 70 65 72 70
75 55 50 75 65 49 58 71 87 73
63 93 75 75 43 76 73 85 64 45
35 78 54 65 59 55 89 85 40 82
51 58 35 48 55 97 67 56 70 55
Solution:
1. The highest value in the distribution is 97, while the lowest value is 32. So, the range is
R = HV – LV = 97 – 32 = 65.
2. Since there are 50 students, K = √n = √50 = 7.07. You may keep all the decimal places
shown on your calculator for a more exact value of K.
3. From the results of Steps 1 and 2, the class size is $c = \frac{R}{K} = \frac{65}{7.07} = 9.19 \approx 9$.
Always round the final answer to a whole number.
4. The lowest value in the distribution is 32, but it is not divisible by 9. The largest number
below 32 that is divisible by 9 is 27 (9 × 3 = 27).
[Figure: number line from 25 to 45 marking the lowest score, 32]
Hence, the lowest limit of the distribution, which is also the lower limit (LL) of the first class
interval, is 27. If we chose 36 instead, the students who scored below 36 would be left out.
5. To find the upper limit of the first class interval, we use the formula UL = LL + c - 1.
Hence, UL = 27 + (9 - 1) = 27 + 8 = 35.
The first (lowest) class interval is therefore 27 – 35.
6. From the first class interval, we can fill out the rest of the class intervals. The lower
limit of the second class interval is 27 + 9 (the class size computed in Step 3) = 36. Each
subsequent lower limit is found by again adding 9 to the lower limit of the previous class, giving
45, 54, 63, 72, 81, and 90.
For the upper limits, follow the same process used for the lower limits. Starting from 35,
adding 9 gives 44; 44 + 9 = 53, and so on. Be guided by the highest value in the distribution
so that you do not create more class intervals than needed.
27 - 35
36 - 44
45 - 53
54 - 62
63 - 71
72 - 80
81 - 89
90 - 98
7. Next, fill out the frequency (f) column. As mentioned, tallying is done to come up with the
values. Be careful in this step, because scores may be counted twice if the tallying is not done
properly and systematically. Some students put check marks on the scores already tallied, and some
double-check the tally.
Class Interval   Tally               f
27 - 35          │││                 3
36 - 44          ││                  2
45 - 53          ││││- ││            7
54 - 62          ││││- ││││-        10
63 - 71          ││││- ││││- │      11
72 - 80          ││││- ││││          9
81 - 89          ││││- │             6
90 - 98          ││                  2
n = 50
(││││- stands for a completed group of five tally marks.)
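8a. The class mark (Xm) of each class is the midpoint of its class limits; for the first class
interval, Xm = (27 + 35)/2 = 31.
Class Interval   f    Xm
27 - 35          3    31
36 - 44          2    40
45 - 53          7    49
54 - 62         10    58
63 - 71         11    67
72 - 80          9    76
81 - 89          6    85
90 - 98          2    94
n = 50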
8b. To obtain the upper and lower class boundaries, we use the formula:
$UCB_i = \frac{LL_i + UL_{i+1}}{2} = LCB_{i+1}$
In the first class interval, the lower class boundary is 27 – 0.5 = 26.5, while the upper class
boundary is $UCB_1 = \frac{27 + 44}{2} = 35.5$. For the second class interval, the lower class boundary
is equal to the upper class boundary of the first class interval, while the upper class boundary is
$UCB_2 = \frac{36 + 53}{2} = 44.5$. A more convenient way is to subtract 0.5 from each lower limit
and add 0.5 to each upper limit.
Class Interval   f    Xm   Class Boundaries
27 - 35          3    31   26.5 – 35.5
36 - 44          2    40   35.5 – 44.5
45 - 53          7    49   44.5 – 53.5
54 - 62         10    58   53.5 – 62.5
63 - 71         11    67   62.5 – 71.5
72 - 80          9    76   71.5 – 80.5
81 - 89          6    85   80.5 – 89.5
90 - 98          2    94   89.5 – 98.5
n = 50
From a given frequency distribution, we can construct other distributions, such as the
relative frequency and cumulative frequency distributions.
Relative Frequency
It shows the frequency of each class as a percentage of the total frequency. It is denoted
by %f or rf.
To continue with our table, we compute the relative frequency of each class interval using the
formula $\%f = \frac{f}{n} \times 100$. Substituting the values for the first class interval, we have
$\%f = \frac{3}{50} \times 100 = 6.00$ (you may round off your answers to two decimal places). The same
process is used to complete the column. Make sure that the relative frequencies of all the class
intervals add up to 100.00.
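Class Interval   f     %f
27 - 35          3     6.00
36 - 44          2     4.00
45 - 53          7    14.00
54 - 62         10    20.00
63 - 71         11    22.00
72 - 80          9    18.00
81 - 89          6    12.00
90 - 98          2     4.00
n = 50               100.00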
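The relative frequency formula can be applied to a whole column at once. This Python sketch uses the eight class frequencies of the worked example. The function name `relative_frequencies` is hypothetical. With other data, percentages rounded to two decimal places may add up to 99.99 or 100.01 instead of exactly 100.00.

```python
def relative_frequencies(freqs, places=2):
    """Return %f = f / n * 100 for each class, rounded to `places` decimals."""
    n = sum(freqs)
    # Multiply before dividing so whole-number percentages stay exact.
    return [round(f * 100 / n, places) for f in freqs]


# Frequencies of the classes 27 - 35 up to 90 - 98 in the worked example.
freqs = [3, 2, 7, 10, 11, 9, 6, 2]
rf = relative_frequencies(freqs)
print(rf)
print(round(sum(rf), 2))
```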
Cumulative Frequencies
These indicate the number of observations that lie above (greater than) or below (less
than) a class boundary. They are determined by accumulating, or adding up, the absolute
frequencies of the distribution.
To determine the less than cumulative frequency (<cf), start from the lowest class interval and
successively add the frequency of each class interval. The last value should be equal to the
sample size.
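Class Interval   f     %f     <cf
27 - 35          3     6.00     3
36 - 44          2     4.00     5
45 - 53          7    14.00    12
54 - 62         10    20.00    22
63 - 71         11    22.00    33
72 - 80          9    18.00    42
81 - 89          6    12.00    48
90 - 98          2     4.00    50
n = 50               100.00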
To determine the greater than cumulative frequency (>cf), start from the highest class interval
and successively add the frequency of each class interval. The last value should be equal to the
sample size.
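Class Interval   f     %f     <cf   >cf
27 - 35          3     6.00     3    50
36 - 44          2     4.00     5    47
45 - 53          7    14.00    12    45
54 - 62         10    20.00    22    38
63 - 71         11    22.00    33    28
72 - 80          9    18.00    42    17
81 - 89          6    12.00    48     8
90 - 98          2     4.00    50     2
n = 50               100.00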
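Both cumulative columns can be sketched in Python with `itertools.accumulate`. The less than column adds up from the lowest class, and the greater than column adds up from the highest class and is then listed back in the original order. The function name `cumulative_frequencies` is hypothetical, and the frequencies are those of the worked example.

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Return (<cf, >cf) with both lists ordered from the lowest class up."""
    less_than = list(accumulate(freqs))                        # add up from the lowest class
    greater_than = list(accumulate(reversed(freqs)))[::-1]     # add up from the highest class
    return less_than, greater_than


freqs = [3, 2, 7, 10, 11, 9, 6, 2]   # lowest class (27 - 35) first
less_than, greater_than = cumulative_frequencies(freqs)
print(less_than)
print(greater_than)
```

As a check, the last <cf value and the first >cf value both equal the sample size.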
2. Frequency Polygon
It is a line curve constructed by plotting the class frequencies on the y-axis against the class
midpoints on the x-axis.
3. Ogive
It is the graph of a cumulative frequency distribution, formed by plotting the cumulative
frequencies on the y-axis against the class boundaries on the x-axis. A less than ogive rises from
left to right, while a greater than ogive falls.
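The plotting coordinates for the frequency polygon and the two ogives of the worked example can be computed before drawing them in Excel or any other plotting tool. In this sketch the function names are hypothetical. It follows a common convention: the ogives are plotted at the k + 1 class boundaries, with the less than ogive starting at 0 at the lowest boundary and the greater than ogive ending at 0 at the highest boundary.

```python
from itertools import accumulate


def frequency_polygon(midpoints, freqs):
    """Points (class midpoint, f) for a frequency polygon."""
    return list(zip(midpoints, freqs))


def ogives(boundaries, freqs):
    """Points for the less than (rising) and greater than (falling) ogives.

    `boundaries` holds the k + 1 class boundaries from the lowest class up.
    """
    n = sum(freqs)
    below = [0] + list(accumulate(freqs))       # observations below each boundary
    above = [n - count for count in below]      # observations above each boundary
    return list(zip(boundaries, below)), list(zip(boundaries, above))


midpoints = [31, 40, 49, 58, 67, 76, 85, 94]
freqs = [3, 2, 7, 10, 11, 9, 6, 2]
boundaries = [26.5 + 9 * i for i in range(len(freqs) + 1)]   # 26.5, 35.5, ..., 98.5

polygon = frequency_polygon(midpoints, freqs)
rising, falling = ogives(boundaries, freqs)
print(polygon)
print(rising)
print(falling)
```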
Interpreting Frequency Counts and Percentages
Age f %
65 - 69 1 2.63
60 - 64 4 10.53
55 - 59 13 34.22
50 - 54 7 18.42
45 - 49 3 7.89
40 - 44 5 13.16
35 - 39 2 5.26
30 - 34 2 5.26
25 - 29 1 2.63
Total 38 100.00
Applying our table guide, we may interpret it as:
A great number of the Biology professors (13; 34.22%) are 55-59 years old.
As indicated in the guide, use the descriptive term for a percentage only once. You may,
however, also discuss the categories with the highest and lowest frequencies and other
numerical values that you want to highlight.
The guide is also a great help if you wish to present thematic data that summarize
verbatim responses.
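The figures quoted in such an interpretation can be checked with a short Python sketch that recomputes each percentage from the age table and picks out the class with the highest frequency. The variable names are hypothetical. Computed directly, 13/38 × 100 rounds to 34.21, and the rounded percentages add up to 99.99. The 34.22 in the table therefore appears to have been adjusted so that the column totals 100.00.

```python
# Age distribution of the 38 Biology professors (from the table above).
AGES = [
    ("65 - 69", 1), ("60 - 64", 4), ("55 - 59", 13), ("50 - 54", 7), ("45 - 49", 3),
    ("40 - 44", 5), ("35 - 39", 2), ("30 - 34", 2), ("25 - 29", 1),
]

n = sum(f for _, f in AGES)
percent = {age: round(f * 100 / n, 2) for age, f in AGES}

# The class with the highest frequency is the one usually highlighted in the text.
top_age, top_f = max(AGES, key=lambda row: row[1])
print(f"A great number of the Biology professors ({top_f}; {percent[top_age]:.2f}%) "
      f"are {top_age} years old.")
print("Sum of rounded percentages:", round(sum(percent.values()), 2))
```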
Activity No. 4
A. Fill out the table by citing five research titles. For each, give instances where you may use
primary and secondary sources of information, indicate at least one method of data collection that
may be applied, and describe how the method is to be carried out.
B. The following table shows the civil status of females in a certain locality. Present the data using
two appropriate graphical presentations.
D. Refer to the frequency distribution table in our example (50 students in Biological Statistics)
and answer the following:
E. Determine the class midpoint, class boundaries and class size of each of the following:
b. 2.3 – 5.6
c. 12.54 – 16.22
d. 31 - 37