Biostat101Lecture Chapter 3

Uploaded by Chriz Gel
Copyright © All Rights Reserved
Chapter 3

Collection and Presentation of Data

Objectives
At the end of the subject matter, the students are expected to:
1. Distinguish primary from secondary sources of data
2. Select the appropriate method of collecting a required set of data
3. Define the concept of sampling
4. Differentiate a parameter from a statistic
5. Determine the most appropriate sampling technique to be used given a research problem
6. Apply the different methods of presenting data
7. Construct a frequency distribution from a set of raw data
8. Select the appropriate graphical presentation for a given set of data
9. Construct a graph of a set of data

The world is full of potential data. However, only the relevant data specifically required to
investigate a research problem need to be collected.

Collection of Data

Data may be gathered from two general sources: 1) primary and 2) secondary.
Primary sources are those from which information is gathered directly from the original
source, or which are based on direct or first-hand experience. They include:

a. Interview

b. Questionnaires

c. Personal accounts and diaries are used in sociological research. They seek information on
people's actions and feelings by asking them to give their own interpretation, or account, of what
they experience.

d. Observations and physical surveys involve actual visitation of the subject under study. For
example, to determine the area of a classroom, one has to measure it using a measuring device.

e. Standard scales and tests, like mental ability tests and other tests developed by professionals or
groups.

f. The Internet can be used for the whole range of surveys, from structured questionnaires to
unstructured interviews, and even for observational studies and experimental designs. Gathering
data through private chat, instant messaging, video chat, and similar tools is among the latest
web-based technologies.
On the other hand, secondary sources are those from which information is gathered from
published or unpublished materials that were previously collected by other individuals or agencies
and are now used for a purpose other than the one for which they were originally collected. These include:

a. Libraries and archives, which are generally equipped with sophisticated catalogue systems.

b. Museums and collections that keep artifacts and other things that tell something about the past.

c. Government departments and commercial and professional bodies that hold much statistical
information, both current and historic.

d. The field, which includes ancient cities, buildings, archaeological digs, etc.

e. The Internet, which offers a wide array of websites.

Methods of Data Collection

1. Direct/Interview Method is a person-to-person exchange of ideas or answers between the
interviewer and the interviewee.

2. Indirect or questionnaire method utilizes prepared questions that are intended to elicit answers
to the problems of a study. Questionnaires may be mailed or hand-carried.

3. Registration or Records Review consists purely of perusal of existing records of an agency or
person, such as vital statistics (births, deaths, marriages), the number of households, and records
of motor vehicles and licenses. Certain laws enforce this method of gathering information.

4. Observation Method involves recording an object, event, or behavior through systematically
planned observation. In certain situations, the person collecting the data may act as a participant
observer to get a first-hand experience of the event he/she is studying.

5. Experimentation Method involves setting up an experiment by employing the three basic
principles of experimental design, namely randomization, replication, and error control, in order to
obtain relevant and objective information from the experiment.

Ten Commandments of Data Collection (by: Neil J. Salkind)

1. Begin thinking about the type of data you will have to collect.
2. Think about where you will be obtaining the data.
3. Make sure that the data collection form you are using is clear and easy to use.
4. Make a duplicate copy of the data file and keep it in a separate location.
5. Do not rely on other people to collect or transfer your data unless you personally have trained
them and are confident that they understand the data collection process as well as you do.
6. Plan a detailed schedule of when and where you will be collecting your data.
7. Cultivate possible sources for your participant pool.
8. Try to follow up on subjects who missed their testing session or interview.
9. Never discard original data.
10. Follow the previous 9.

Methods of Data Presentation

After the collection of data, the next step is to summarize the information using any of three
methods: 1) tabular, 2) graphical, and 3) textual.

A. Tabular Presentation

Tabular presentation summarizes classificatory data in a systematic and logical arrangement
of rows and columns called a statistical table. Figures are arranged in rows and columns for easy
reading and analysis. Tabular presentation allows us to compare and look for relationships among
the variables of interest. Tables can contain frequency counts, proportions, percentages, and other
summary measures such as totals and averages.

Table heading shows the table number and the title. The table number gives the table its
identity for reference purposes, while the title briefly explains what is being presented in the table.

Stub contains the stub head and the row labels. The stub head indicates what the row labels are.
Row labels are the major categories of data contained in the rows.

Box head contains the captions that appear above the columns. It includes the stub head,
master caption, column captions, and row totals. The master caption and column captions describe
the data in their respective columns.

Body is the main part of the table that contains the quantitative information.
Footnote is written immediately below the bottom line of the table to explain or clarify
codes and abbreviations used in the table. Source note is generally written below the footnote and
is used to show the reference and acknowledge the origin of the data.

B. Graphical Presentation

Graphical presentation refers to the pictorial representation of data through the use of
graphs or charts. Graphs or charts are simply illustrations of numerical data, and they are a good
way of communicating the numerical figures found in tables. Charts facilitate analysis when they
reveal probable relationships among variables. Graphical presentation also allows comparison of
different series or groups.

Types of Graphs

a. Bar Graph

A bar graph consists of bars of equal width, either all vertical or all horizontal. The length
of each bar represents the magnitude of the quantity being compared. Vertical bars are generally
used for chronological comparison or for comparing data taken at a particular time. Horizontal bars
are used to show categorical comparison. There are three types: 1) simple bar graph, where the bars
stand singly apart from one another; 2) compound (multiple) bar graph, where two or more bars are
drawn for each item; and 3) component bar graph, which shows the proportional variation or changes
of the segments of a whole and of the whole itself.
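To illustrate that bar length is proportional to magnitude, the sketch below draws a simple horizontal bar graph in plain text from the year-level counts of Table 2 in this chapter. The function name `text_bar_graph` and the `width` scaling parameter are illustrative assumptions; in practice a spreadsheet or plotting tool would draw the bars.

```python
def text_bar_graph(counts: dict[str, int], width: int = 40) -> list[str]:
    """Return one text line per category; bar length is proportional to the count."""
    largest = max(counts.values())
    lines = []
    for label, value in counts.items():
        # Scale so that the largest category gets exactly `width` characters.
        length = round(value / largest * width)
        lines.append(f"{label:<6}{'#' * length} {value}")
    return lines


# Number of students per year level and section (Table 2).
table2 = {"I-A": 30, "I-B": 33, "II-A": 34, "II-B": 35,
          "III-A": 31, "III-B": 31, "IV": 42}

graph = text_bar_graph(table2)
for line in graph:
    print(line)
```

Scaling to the largest category keeps every bar on the same scale, which is what makes the lengths comparable.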
b. Line Graph

Line Graph is an effective device used to portray changes in values with respect to time.
Variations in the data are indicated by a series of line segments formed by joining consecutive
points plotted above the categories.
c. Band Chart

Band chart is a time series linear graph. It shows the proportional variation of the
component parts of a whole over a period of time.
d. Pie Chart or Circle Graph

Pie Chart or Circle Graph is appropriate for portraying the relative magnitude of the
component parts of a whole.

e. Rectangle Graph

Rectangle Graph is a variation of the pie chart wherein a rectangle takes the place of the
circle and is divided proportionally.
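Both the pie chart and the rectangle graph divide a whole in proportion to its parts: a sector's central angle is (count ÷ total) × 360°, and a rectangle segment's length is (count ÷ total) × the rectangle's length. The sketch below applies both to the counts in Table 2 of this chapter; the function names and the 20 cm rectangle length are illustrative assumptions.

```python
def pie_angles(counts: dict[str, int]) -> dict[str, float]:
    """Central angle of each sector in degrees: (count / total) x 360."""
    total = sum(counts.values())
    return {label: value / total * 360 for label, value in counts.items()}


def rectangle_segments(counts: dict[str, int], length: float) -> dict[str, float]:
    """Length of each segment when a rectangle is divided in proportion to the counts."""
    total = sum(counts.values())
    return {label: value / total * length for label, value in counts.items()}


# Number of students per year level and section (Table 2).
table2 = {"I-A": 30, "I-B": 33, "II-A": 34, "II-B": 35,
          "III-A": 31, "III-B": 31, "IV": 42}

angles = pie_angles(table2)
segments = rectangle_segments(table2, length=20.0)   # assumed 20 cm rectangle
for label in table2:
    print(f"{label:<6}{angles[label]:7.2f} degrees  {segments[label]:5.2f} cm")
```

The sectors always add up to 360° and the segments to the full rectangle length, which is a quick check on the arithmetic.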
f. Pictograph

Pictograph uses picture symbols to represent values. The symbols used should be
appropriate to the data being presented.
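A pictograph needs a scale such as "one symbol = 10 students". The sketch below assumes that scale and uses `*` as a stand-in for a picture symbol; it converts each count in Table 2 into whole symbols and shows any remainder as a fraction of one symbol. The helper name `pictograph_row` is hypothetical.

```python
def pictograph_row(value: int, per_symbol: int, symbol: str = "*") -> str:
    """Whole symbols for the value, followed by the leftover fraction of one symbol."""
    whole, remainder = divmod(value, per_symbol)
    row = symbol * whole
    if remainder:
        # Show the partial symbol as a fraction, e.g. " +0.3".
        row += f" +{remainder / per_symbol:.1f}"
    return row


# Number of students per year level and section (Table 2); 1 symbol = 10 students.
table2 = {"I-A": 30, "I-B": 33, "II-A": 34, "II-B": 35,
          "III-A": 31, "III-B": 31, "IV": 42}

for label, count in table2.items():
    print(f"{label:<6}{pictograph_row(count, per_symbol=10)}")
```

On paper, the fractional part is usually drawn as a partial symbol.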
g. Statistical Maps or Cartograms

Statistical Maps or Cartograms are one of the best ways to present geographical data.

MS Excel is widely used for creating tables and graphs. To create a chart in MS Excel, open the
INSERT ribbon and choose the desired graph from the CHARTS group.

C. Textual Presentation

Textual Presentation summarizes the data in paragraph or narrative form. Important figures
are placed in the text of the report, such as summary statistics like the minimum, maximum, mean,
median, standard deviation, percentage, or total. Textual presentation allows us to highlight the
significant figures of the study.
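As a sketch of how the figures for a textual presentation can be obtained, the code below applies Python's standard `statistics` module to the 50 examination scores from this chapter's frequency distribution example and places the results in a sentence. It assumes the scores are a sample, so it uses `statistics.stdev`; `statistics.pstdev` would be the population version. The sentence wording is only one possible phrasing.

```python
import statistics

# Examination scores of the 50 students in Biological Statistics (example in this chapter).
scores = [45, 89, 32, 67, 51, 60, 70, 65, 72, 70,
          75, 55, 50, 75, 65, 49, 58, 71, 87, 73,
          63, 93, 75, 75, 43, 76, 73, 85, 64, 45,
          35, 78, 54, 65, 59, 55, 89, 85, 40, 82,
          51, 58, 35, 48, 55, 97, 67, 56, 70, 55]

lowest, highest = min(scores), max(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)   # sample standard deviation

summary = (f"The {len(scores)} students' scores ranged from {lowest} to {highest}, "
           f"with a mean of {mean:.2f}, a median of {median:g}, "
           f"and a standard deviation of {sd:.2f}.")
print(summary)
```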

FREQUENCY DISTRIBUTION

An array is an arrangement of data simply in ascending or descending order of magnitude.
It is usually used for small numbers of observations.

Frequency distribution is a systematic arrangement of data that reduces the data to manageable
forms without losing informative details. It is a tabular presentation of qualitative data grouped
into categories, or of quantitative data grouped into non-overlapping numerical intervals called
classes, together with the number of observations in each category or class.
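An array, as defined above, is just the data sorted by magnitude. A minimal Python sketch, using the first ten scores from the 50-student example later in this chapter:

```python
# First ten scores of the 50-student example in this chapter.
scores = [45, 89, 32, 67, 51, 60, 70, 65, 72, 70]

ascending = sorted(scores)                  # array in ascending order of magnitude
descending = sorted(scores, reverse=True)   # array in descending order of magnitude
print(ascending)
print(descending)
```

Unlike a frequency distribution, an array keeps every individual value, which is why it suits only small sets of observations.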

Here is an example of qualitative data grouped into categories:

Table 2
Distribution of Students in a Biology Class Grouped according to Year Level and Section

Year Level and Section    No. of Students
I-A                       30
I-B                       33
II-A                      34
II-B                      35
III-A                     31
III-B                     31
IV                        42
Total                     236

Here is an example of quantitative data grouped into non-overlapping numerical intervals:

Table 3
Distribution of Examination Results in a Biology Class

Score      No. of Students
10 - 14    10
15 - 19    34
20 - 24    12
25 - 29    8
30 - 34    42
35 - 39    24
40 - 44    106
Total      236

Components of a Quantitative Frequency Distribution

Classes or Class intervals

These are the groupings in terms of numerical intervals. In Table 3, the interval 10 - 14
is the lowest class interval and 40 - 44 is the highest class interval.

Class Limits
They refer to the lowest and highest values that can be entered in a class.

Lower Class Limit is the lowest value in each class.

Upper Class Limit is the highest value in each class.

Frequency
This is the number of values that fall in a given interval. Frequencies are usually listed in one
column and are represented by “f”. The sum of the frequencies is equal to the total number “n” of raw data.

Class midpoint or class mark

It is a single value that serves as the representative of the given class interval or class boundaries.
It has the formula

$$X_m = \frac{LL_i + UL_i}{2}$$

where $X_m$ is the class midpoint, $LL_i$ is the lower limit of a given class interval, and $UL_i$ is the upper limit
of a given class interval.

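Class boundary or True Limit

It refers to the value midway between the upper limit of an interval and the lower limit of the next.
It has the formula:

$$UCB_i = \frac{LL_i + UL_{i+1}}{2} = LCB_{i+1}$$

where $UCB_i$ is the upper class boundary of a given class interval, $LCB_{i+1}$ is the lower class boundary of
the next class interval, $LL_i$ is the lower limit of the given class interval, and $UL_{i+1}$ is the upper limit of the
next class interval. Because the class size is constant, $LL_i + UL_{i+1} = UL_i + LL_{i+1}$, so this value is exactly
midway between $UL_i$ and $LL_{i+1}$.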

If only one class interval is given, the class boundaries are obtained by subtracting 0.5 from the lower
limit and adding 0.5 to the upper limit when the class limits are whole numbers. If the class limits have one
decimal place, we subtract 0.05 from the lower limit and add 0.05 to the upper limit. If they have two decimal
places, we subtract and add 0.005, and so on.

Class Size (c)

It is the length of a class interval or of its class boundaries. It may be obtained by determining the
difference between the upper and the lower class boundaries, or by getting the difference between two
successive upper or lower limits. The size of the class intervals must be constant throughout the distribution.
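For a single class interval, the class mark, class boundaries, and class size can be computed together. The sketch below is a hypothetical helper (`describe_class` is not from the source) that takes the limits as strings and uses Python's `decimal` module, so the half unit (0.5, 0.05, 0.005, ...) follows the number of decimal places exactly, without floating-point rounding. The intervals 10 - 14 (from Table 4) and 1.2 - 1.8 (an illustrative value) are used as inputs.

```python
from decimal import Decimal


def describe_class(lower: str, upper: str):
    """Class mark, class boundaries, and class size of one class interval.

    Limits are given as strings so that their number of decimal places is kept exactly.
    """
    ll, ul = Decimal(lower), Decimal(upper)
    # Number of decimal places used in the limits (0 for whole numbers).
    places = max(0, -ll.as_tuple().exponent, -ul.as_tuple().exponent)
    half_unit = Decimal(5) / Decimal(10) ** (places + 1)   # 0.5, 0.05, 0.005, ...
    lcb, ucb = ll - half_unit, ul + half_unit               # class boundaries
    midpoint = (ll + ul) / 2                                # class mark
    size = ucb - lcb                                        # class size
    return midpoint, (lcb, ucb), size


for lower, upper in [("10", "14"), ("1.2", "1.8")]:
    xm, (lcb, ucb), c = describe_class(lower, upper)
    print(f"{lower} - {upper}: Xm = {xm}, boundaries = {lcb} - {ucb}, c = {c}")
```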

Table 4
Frequency Distribution of the Examination Results in a Biostatistics Class

Classes    f     Xm    Class Boundaries
10 - 14    2     12    9.5 – 14.5
15 - 19    5     17    14.5 – 19.5
20 - 24    12    22    19.5 – 24.5
25 - 29    15    27    24.5 – 29.5
30 - 34    10    32    29.5 – 34.5
35 - 39    4     37    34.5 – 39.5
40 - 44    2     42    39.5 – 44.5
           n = 50

Construction of a Frequency Distribution

Step 1. Determine the range of the set of data using the formula R = HV – LV, where R is the
range, HV is the highest value, and LV is the lowest value.
Step 2. Determine “K”, the approximate number of classes of the distribution, with the formula
K = √n, where n is the sample size or the number of observations.
Step 3. Determine the class size “c” with the formula $c = \frac{R}{K}$, where c is the class size, R is the
range computed in Step 1, and K is the value computed in Step 2.

Step 4. Choose an appropriate lower limit (LL) for the first class interval. Choose a number equal
to or less than the lowest observed value that is divisible by the class size. If the lowest value in
the distribution is lower than the class size, consider the lowest value as the lower limit of the first
class interval.

Step 5. Determine the upper limit (UL) of the lowest class:

UL = LL + c – 1, where the 1 stands for one unit of measure of the data

Step 6. Determine the rest of the class intervals by adding the value of the class size to the upper
and lower limits to get the next class interval. Continue until the highest value is within the class
interval and the desired number of classes is met.

Step 7. Count the number of observations falling within each class and enter the result in the
frequency column and get the sum. This is facilitated by using a tally column.

Step 8. Complete the distribution by providing the rest of the columns for class marks, class
boundaries, and others as needed.
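The eight steps can be sketched in Python as follows. This is a minimal sketch assuming whole-number data (one unit of measure = 1, so the boundaries are the limits ±0.5). It follows this chapter's rules: K = √n, c rounded to the nearest whole number, and LL chosen as in Step 4. The function name `frequency_distribution` is hypothetical. Run on the 50 examination scores of the example that follows, it should reproduce the table constructed there by hand.

```python
import math


def frequency_distribution(data: list[int]) -> tuple[int, list[dict]]:
    """Build a frequency distribution following Steps 1-8 (whole-number data assumed)."""
    n = len(data)
    hv, lv = max(data), min(data)
    r = hv - lv                                   # Step 1: R = HV - LV
    k = math.sqrt(n)                              # Step 2: K = sqrt(n)
    # Step 3: c = R / K, rounded half up to a whole number (at least 1).
    c = max(1, math.floor(r / k + 0.5))
    # Step 4: largest multiple of c not exceeding the lowest value,
    # or the lowest value itself when it is smaller than c.
    ll = lv if lv < c else (lv // c) * c
    rows = []
    while ll <= hv:                               # Step 6: stop once HV is inside a class
        ul = ll + c - 1                           # Step 5: UL = LL + c - 1
        f = sum(ll <= x <= ul for x in data)      # Step 7: count the observations in the class
        rows.append({
            "class": (ll, ul),
            "f": f,
            "xm": (ll + ul) / 2,                  # Step 8: class mark
            "boundaries": (ll - 0.5, ul + 0.5),   # Step 8: class boundaries
        })
        ll += c                                   # Step 6: next lower limit
    return c, rows


scores = [45, 89, 32, 67, 51, 60, 70, 65, 72, 70,
          75, 55, 50, 75, 65, 49, 58, 71, 87, 73,
          63, 93, 75, 75, 43, 76, 73, 85, 64, 45,
          35, 78, 54, 65, 59, 55, 89, 85, 40, 82,
          51, 58, 35, 48, 55, 97, 67, 56, 70, 55]

c, rows = frequency_distribution(scores)
print("Classes    f    Xm    Class boundaries")
for row in rows:
    lo, hi = row["class"]
    lcb, ucb = row["boundaries"]
    print(f"{lo} - {hi}    {row['f']:>2}    {row['xm']:g}    {lcb} - {ucb}")
print(f"n = {sum(row['f'] for row in rows)}")
```

Counting with a comparison (`ll <= x <= ul`) plays the role of the tally column, so no observation is counted twice.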

Example:
Construct a frequency distribution using the following examination results of 50 students in
Biological Statistics.

45 89 32 67 51 60 70 65 72 70
75 55 50 75 65 49 58 71 87 73
63 93 75 75 43 76 73 85 64 45
35 78 54 65 59 55 89 85 40 82
51 58 35 48 55 97 67 56 70 55

Solution:

By following the steps stated above, we do the following:

1. The highest value in the distribution is 97, while the lowest value is 32. So the range is
R = HV – LV = 97 – 32 = 65.

2. Since there are 50 students, K = √n = √50 = 7.07. You may keep all the decimal places shown
on your calculator for a more exact value of K.
3. From the results in steps 1 and 2, the class size is $c = \frac{R}{K} = \frac{65}{7.07} = 9.19 \approx 9$. Always
round the final answer to a whole number.
4. The lowest value of the distribution is 32, but it is not divisible by 9. On a number line from 25
to 45, the scores divisible by 9 are 27, 36, and 45; the largest of these that does not exceed the lowest
score of 32 is 27.

Hence, the lowest limit of the distribution, which is also the lower limit (LL) of the first class
interval, is 27. If we chose 36, the students who got scores below 36 would be left out.

5. To find the upper limit of the first class interval, we use the formula UL = LL + c – 1.
Hence, UL = 27 + (9 – 1) = 27 + 8 = 35.

Thus, the lowest (first) class interval is 27 – 35.

6. From the first class interval, we are ready to fill out the rest of the class intervals. The lower
limit of the second class interval is 27 + 9 (the class size computed in step 3) = 36. Each subsequent
lower limit is determined by again adding 9 to the lower limit of the previous class, hence we have
45, 54, 63, 72, 81, and 90.

For the upper limit of each class, perform the same process as in determining the lower limit.
Hence, from 35, we add 9 to get 44; 44 + 9 = 53, and so on. Be guided by the highest value in the
distribution so that you will not exceed the needed number of class intervals.

Classes    f    Xm    Class boundaries
27 - 35
36 - 44
45 - 53
54 - 62
63 - 71
72 - 80
81 - 89
90 - 98

7. Next, fill out the frequency (f) column. As mentioned, tallying is done to come up with the
values. Be careful in this step, because an observation may be counted twice or missed if the
tallying is not done properly and systematically. Some students put check marks on the scores
already tallied, and some double-check the tallying.

Classes    Tally            f     Xm    Class boundaries
27 - 35    │││              3
36 - 44    ││               2
45 - 53    ││││/ ││         7
54 - 62    ││││/ ││││/      10
63 - 71    ││││/ ││││/ │    11
72 - 80    ││││/ ││││       9
81 - 89    ││││/ │          6
90 - 98    ││               2
                            n = 50

(Do not include the Tally column in your frequency distribution table.)

8a. For the class marks, use the formula $X_m = \frac{LL_i + UL_i}{2}$. For the first class interval,
$X_m = \frac{27 + 35}{2} = 31$. Do this for the rest of the class intervals. A more convenient way of
completing this column is to add the class size to each class midpoint value. The class midpoint of the
first class interval is 31; to determine the midpoint of the next class, we just add 9, giving 40, and so on.

Classes    f     Xm    Class boundaries
27 - 35    3     31
36 - 44    2     40
45 - 53    7     49
54 - 62    10    58
63 - 71    11    67
72 - 80    9     76
81 - 89    6     85
90 - 98    2     94
           n = 50

8b. To obtain the upper and lower class boundaries, we use the formula:

$$UCB_i = \frac{LL_i + UL_{i+1}}{2} = LCB_{i+1}$$

In the first class interval, the lower class boundary is 27 – 0.5 = 26.5, while the upper class
boundary is $UCB_1 = \frac{27 + 44}{2} = 35.5$. For the second class interval, the lower class boundary is
equal to the upper class boundary of the first class interval, while the upper class boundary is
$UCB_2 = \frac{36 + 53}{2} = 44.5$. For a more convenient computation, you may subtract 0.5 from each
lower limit and add 0.5 to each upper limit.

Classes    f     Xm    Class boundaries
27 - 35    3     31    26.5 – 35.5
36 - 44    2     40    35.5 – 44.5
45 - 53    7     49    44.5 – 53.5
54 - 62    10    58    53.5 – 62.5
63 - 71    11    67    62.5 – 71.5
72 - 80    9     76    71.5 – 80.5
81 - 89    6     85    80.5 – 89.5
90 - 98    2     94    89.5 – 98.5
           n = 50

Derived Frequency Distribution

From a given frequency distribution, we can construct other frequency distributions like
the relative and the cumulative frequencies.

Relative Frequency
It shows the proportion, in percent, of the frequency of each class to the total frequency. It
is denoted by %f or rf.

The formula is:

$$\%f = \frac{f}{n} \times 100$$

where %f is the relative frequency, f is the frequency of each class interval, and n is the
sample size (number of observations).

To continue with our table, we compute the relative frequency of each class interval using the
formula $\%f = \frac{f}{n} \times 100$. Substituting the values for the first class interval, we have
$\%f = \frac{3}{50} \times 100 = 6.00$. (You may round off your answers to two decimal places.) The same
process is employed to complete the column. Make sure that a sum of 100.00 is obtained when the
relative frequencies of all the class intervals are added.

Classes    f     Xm    Class boundaries    %f
27 - 35    3     31    26.5 – 35.5         6.00
36 - 44    2     40    35.5 – 44.5         4.00
45 - 53    7     49    44.5 – 53.5         14.00
54 - 62    10    58    53.5 – 62.5         20.00
63 - 71    11    67    62.5 – 71.5         22.00
72 - 80    9     76    71.5 – 80.5         18.00
81 - 89    6     85    80.5 – 89.5         12.00
90 - 98    2     94    89.5 – 98.5         4.00
           n = 50                          100.00
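The %f column can be checked with a few lines of Python. This sketch uses the f column of the worked example above and computes f × 100 / n, rounded to two decimal places. With other data, rounding each value separately can make the total differ slightly from 100.00 (for example, 99.99).

```python
# f column of the worked example (50 students), lowest class first.
frequencies = [3, 2, 7, 10, 11, 9, 6, 2]
n = sum(frequencies)

# %f = f / n x 100, rounded to two decimal places.
relative = [round(f * 100 / n, 2) for f in frequencies]
print(relative)
print(f"Total: {sum(relative):.2f}")
```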

Cumulative Frequencies
These indicate the number of observations that lie above (or greater than) or below (or less
than) a class boundary. They are determined by cumulating or adding-up the absolute frequencies
of the distribution.

Two Types of Cumulative Frequency Distribution

1. Less than Cumulative Frequency (<cf)

It is obtained when successive frequencies are added from the smallest to the largest class
boundaries.

To determine the less than cumulative frequency, start from the lowest class interval. From
there, add up the frequencies of each class interval. The last value should be equal to the sample
size.

Classes    f     Xm    Class boundaries    %f        <cf
27 - 35    3     31    26.5 – 35.5         6.00      3
36 - 44    2     40    35.5 – 44.5         4.00      5
45 - 53    7     49    44.5 – 53.5         14.00     12
54 - 62    10    58    53.5 – 62.5         20.00     22
63 - 71    11    67    62.5 – 71.5         22.00     33
72 - 80    9     76    71.5 – 80.5         18.00     42
81 - 89    6     85    80.5 – 89.5         12.00     48
90 - 98    2     94    89.5 – 98.5         4.00      50
           n = 50                          100.00

2. Greater than Cumulative Frequency (>cf)

It is obtained when successive frequencies are added from the largest to the smallest class
boundaries.

To determine the greater than cumulative frequency, start from the highest class interval. From
there, add up the frequencies of each class interval, moving toward the lowest class interval. The
last value obtained, at the lowest class interval, should be equal to the sample size.

Classes    f     Xm    Class boundaries    %f        <cf    >cf
27 - 35    3     31    26.5 – 35.5         6.00      3      50
36 - 44    2     40    35.5 – 44.5         4.00      5      47
45 - 53    7     49    44.5 – 53.5         14.00     12     45
54 - 62    10    58    53.5 – 62.5         20.00     22     38
63 - 71    11    67    62.5 – 71.5         22.00     33     28
72 - 80    9     76    71.5 – 80.5         18.00     42     17
81 - 89    6     85    80.5 – 89.5         12.00     48     8
90 - 98    2     94    89.5 – 98.5         4.00      50     2
           n = 50                          100.00
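Both cumulative columns are running sums, so they can be sketched with `itertools.accumulate` from Python's standard library, using the f column of the worked example. The <cf column adds from the lowest class upward; the >cf column adds from the highest class downward and is then reversed to line up with the table rows.

```python
from itertools import accumulate

# f column of the worked example, lowest class first.
frequencies = [3, 2, 7, 10, 11, 9, 6, 2]

less_than = list(accumulate(frequencies))                      # <cf
greater_than = list(accumulate(reversed(frequencies)))[::-1]   # >cf
print(less_than)
print(greater_than)
```

In both columns the value that equals n (here 50) is the last one accumulated, which matches the check described above.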

Graphical Presentation of Frequency Distribution

Frequency distributions are presented in graphical form to give more meaningful
information. They may be in the form of: 1) histogram, 2) frequency polygon, or 3) ogive.
1. Histogram
It is a bar graph consisting of a set of rectangular bars whose bases lie on the x-axis, centered
on the class midpoints and extending between the class boundaries. The width of each base
corresponds to the class size, and the heights of the rectangles correspond to the class frequencies.

2. Frequency Polygon
It is a line graph constructed by plotting the class frequencies on the y-axis against the class
midpoints on the x-axis.

3. Ogive
It is the graph of a cumulative frequency distribution, formed by plotting the cumulative
frequencies on the y-axis against the class boundaries on the x-axis. A less than ogive (using <cf)
rises from left to right, while a greater than ogive (using >cf) falls from left to right.
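The three graphs differ only in which points are plotted. The sketch below computes those points from the worked example (lower limits, class size 9, and the f column) so they can be entered into any graphing tool. The convention of anchoring each ogive at zero at its open end (the first lower boundary for the less than ogive, the last upper boundary for the greater than ogive) is an assumption of this sketch, though it is a common one.

```python
from itertools import accumulate

# Worked example: lower limits of the eight classes, class size, and frequencies.
lower_limits = [27, 36, 45, 54, 63, 72, 81, 90]
c = 9
frequencies = [3, 2, 7, 10, 11, 9, 6, 2]

lcb = [ll - 0.5 for ll in lower_limits]                 # lower class boundaries
ucb = [b + c for b in lcb]                              # upper class boundaries
midpoints = [ll + (c - 1) / 2 for ll in lower_limits]   # (LL + UL) / 2 with UL = LL + c - 1

# Histogram: each bar spans LCB to UCB and has height f.
histogram_bars = list(zip(lcb, ucb, frequencies))

# Frequency polygon: f plotted against the class midpoints.
polygon_points = list(zip(midpoints, frequencies))

# Ogives: <cf against upper boundaries, >cf against lower boundaries.
less_cf = list(accumulate(frequencies))
greater_cf = list(accumulate(reversed(frequencies)))[::-1]
less_than_ogive = [(lcb[0], 0)] + list(zip(ucb, less_cf))          # rises left to right
greater_than_ogive = list(zip(lcb, greater_cf)) + [(ucb[-1], 0)]   # falls left to right

print(histogram_bars)
print(polygon_points)
print(less_than_ogive)
print(greater_than_ogive)
```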
Interpreting Frequency Counts and Percentages

Frequencies and percentages may be described using the following guide:

Highest Percentage    Description of the Percentage Results of the Table
in the Table          (used to start the table interpretation; use only once per table)
100%                  All of the respondents…
97% - 99%             Almost all of the respondents…
86% - 96%             Most of the respondents…
76% - 85%             Great majority of the respondents…
51% - 75%             Majority of the respondents…
50%                   Half of the respondents…
49% and below         A great percentage of the respondents…
                      A great number of the respondents…

Source: Communication Research Training conducted by the Development Academy of the Philippines (DAP)

Table 5 presents quantitative data grouped into non-overlapping numerical intervals.

Table 5
Distribution of Biology Professors according to Age

Age        f     %
65 - 69    1     2.63
60 - 64    4     10.53
55 - 59    13    34.22
50 - 54    7     18.42
45 - 49    3     7.89
40 - 44    5     13.16
35 - 39    2     5.26
30 - 34    2     5.26
25 - 29    1     2.63
Total      38    100.00
Applying our table guide, we may interpret it as:

A great number of the Biology professors (13; 34.22%) are 55-59 years old.

As indicated in the guide, you use the description of the percentage only once. However,
you may add to the discussion the categories with the highest and lowest frequencies and other
numerical values that you want to highlight.

The guide is also helpful if you wish to present thematic data that summarize verbatim
responses.
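The guide can be applied mechanically to the highest percentage in a table. The sketch below is a hypothetical helper (`describe_highest_percentage` is not from the source). The guide gives whole-number bands only, so how values in between are handled is an assumption stated in the code. For the 49%-and-below band it returns the "A great number of the" wording used in the Table 5 interpretation.

```python
def describe_highest_percentage(p: float) -> str:
    """Opening phrase for the highest percentage in a table, following the DAP guide."""
    # Assumption: a percentage that falls between the guide's whole-number bands
    # (e.g. 96.5) is given the lower band, except that anything above 50 is a majority.
    if p >= 100:
        return "All of the"
    if p >= 97:
        return "Almost all of the"
    if p >= 86:
        return "Most of the"
    if p >= 76:
        return "Great majority of the"
    if p > 50:
        return "Majority of the"
    if p == 50:
        return "Half of the"
    return "A great number of the"


# Highest percentage in Table 5: 13 professors (34.22%) aged 55-59.
phrase = describe_highest_percentage(34.22)
sentence = f"{phrase} Biology professors (13; 34.22%) are 55-59 years old."
print(sentence)
```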

Activity No. 4

A. Fill out the table by citing five research titles. For each title, give instances where you may use
primary and secondary sources of information, indicate at least one method of data collection that may
be applied, and describe how the method is to be carried out.

Research Title    Possible Primary Source    Possible Secondary Source    Method of Data Collection    Description

B. The following table shows the civil status of females in a certain locality. Present the data using
two appropriate graphical presentations.

Civil Status          No. of Females
Single                63
Married               188
Widowed               32
Separated/Annulled    7
Total                 290

C. Construct a rectangle graph and a pictograph showing the distribution of acceptors by birth
control method.

Birth Control Method    No. of Acceptors
Pills                   258
Condom                  89
IUD                     62
Rhythm                  43
Injectables             17
Foam Tablets            25

D. Refer to the frequency distribution table in our example (50 students in Biological Statistics)
and answer the following:

___ a. class size
___ b. midpoint of the 4th class
___ c. lower limit of the 5th class
___ d. upper class boundary of the 3rd class
___ e. class interval where the highest number of students is distributed
___ f. lowest upper class limit
___ g. highest upper class boundary
___ h. class interval where the lowest number of students is distributed
___ i. upper limit of the 6th class
___ j. number of students whose scores are in the lowest class

E. Determine the class midpoint, class boundaries, and class size of each of the following:

Classes             Class Midpoint    Class Boundaries    Class Width (Class Size)
a. 10 - 19
b. 2.3 – 5.6
c. 12.54 – 16.22
d. 31 - 37
