Data Presentation

The document discusses data presentation methods, focusing on organizing raw data into frequency distributions using tables and graphical representations. It explains key concepts such as class intervals, cumulative frequencies, and various methods for constructing frequency tables. Additionally, it highlights the importance of graphical methods like histograms and bar charts for visualizing data patterns.

Uploaded by

anyasidivine
Copyright
© © All Rights Reserved
DATA PRESENTATION
2.0 INTRODUCTION
When raw data are collected, they are organized numerically by distributing them into classes or categories in
order to determine the number of individuals belonging to each class. One of the first things that you may wish
to do when you have entered the data onto a computer is to summarize them in some way so that you can get an
insight into the vital information about the dataset. This can be done by producing diagrams, tables or summary
statistics. Diagrams are often powerful tools for conveying information about the data, for providing simple
summary pictures.
2.1 FREQUENCY DISTRIBUTION TABULAR METHODS
A frequency table is a tabular arrangement of data into various classes together with their corresponding
frequencies. Data are presented in a meaningful way by preparing a frequency distribution, that is, a table listing
all classes and their frequencies.
Some Basic Definitions.
Variable: This is a characteristic of a population which can take different values. Basically, we have two types,
namely: continuous variable and discrete variable.
A continuous variable is a variable which may take all values within a given range. Its values are obtained by
measurements e.g. height, volume, time, exam score etc.
A discrete variable is one whose values change by steps. Its values may be obtained by counting. It normally
takes integer values e.g. number of cars, number of chairs.
Class interval: This is a sub-division of the total range of values which a (continuous) variable may take. It is
a symbol defining a class, e.g. 0-9, 10-19 etc. There are three methods of forming class intervals, namely: the
exclusive, inclusive and open-end class methods.
Exclusive method:
When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, it is
known as the exclusive method of classification. E.g. let the expenditures of some families be grouped as follows:
0–1000, 1000–2000, etc. The exclusive method ensures continuity of data inasmuch as the upper limit
of one class is the lower limit of the next class. In the above example, the first class contains the families whose
expenditure is between 0 and 999.99. A family whose expenditure is 1000 would be included in the class interval
1000–2000.

Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower and upper limits are included in
the class interval. This type of classification may be used for a grouped frequency distribution for discrete
variable like members in a family, number of workers in a factory etc., where the variable may take only integral
values. It cannot be used with fractional values like age, height, weight etc. In case of continuous variables, the
exclusive method should be used. The inclusive method should be used in case of discrete variable.
Open end classes:
A class limit is missing either at the lower end of the first class interval or at the upper end of the last class
interval, or both are not specified. The necessity of open-end classes arises in a number of practical situations,
particularly relating to economic and medical data, when there are a few very high values or a few very low values
which are far apart from the majority of observations.
Class limit: This represents the end points of a class interval (lower class limit and upper class limit). A class
interval in which either the upper class limit or the lower class limit is not indicated is called an open class
interval, e.g. "less than 25", "25 and above".
Class boundaries: The point of demarcation between a class interval and the next class interval is called a class
boundary. For example, the class boundaries of 10-19 are 9.5 and 19.5.
Cumulative frequency: This is the sum of the frequency of a particular class and the frequencies of all the classes
before it.
Procedure for forming frequency distribution

Given a set of observations $x_1, x_2, x_3, \ldots, x_n$ for a single variable:


1. Determine the range (R) = L – S where L = largest observation in the raw data; and S = smallest observation
in the raw data.
2. Determine the appropriate number of classes or groups (K). The choice of K is arbitrary but as a general rule,
it should be a number (integer) between 5 and 20 depending on the size of the data given. There are several
suggested guide lines aimed at helping one decide on how many class intervals to employ. Two of such methods
are:

(a) $K = 1 + 3.322\log_{10} n$

(b) $K = \sqrt{n}$, where n = number of observations.

3. Determine the width w of the class interval. It is determined as $w = \frac{R}{K}$.
4. Determine the numbers of observations falling into each class interval i.e. find the class frequencies.
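The first three steps of the procedure can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and rounding K to the nearest whole number and rounding w up are assumptions, since the procedure itself leaves the rounding to judgment.

```python
import math

def number_of_classes_sturges(n: int) -> int:
    """Rule (a): K = 1 + 3.322 log10(n), rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(n))

def number_of_classes_sqrt(n: int) -> int:
    """Rule (b): K = sqrt(n), rounded to the nearest whole number."""
    return round(math.sqrt(n))

def class_width(largest: float, smallest: float, k: int) -> int:
    """Width w = R / K with R = largest - smallest, rounded up (an assumption)."""
    return math.ceil((largest - smallest) / k)

n = 50                                   # number of observations
k = number_of_classes_sqrt(n)
print(number_of_classes_sturges(n), k)   # the two rules may or may not agree
print(class_width(73, 47, k))            # width for a range of 73 - 47 = 26
```

Both rules are only guides; the final choice of K should still fall in the suggested range of 5 to 20 classes.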

Ungrouped Frequency Distribution


Example 1. Let us demonstrate the concept of a frequency distribution by using the following set of data.
1 5 3 4 1 3 2 5 2 4 1 3 2 0 1 2 1 2 0 2 1 4 5 3
Let x represent these data values. We can use a frequency distribution to represent this set of data by listing the
x values with their frequencies f, as in Table 1 below.
Table 1. Frequency distribution table
x 0 1 2 3 4 5
f 2 6 6 4 3 3
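The counting behind Table 1 can be checked with a short Python sketch. It assumes the data of Example 1 are the 24 single-digit observations shown above; the variable names are chosen only for this illustration.

```python
from collections import Counter

# The 24 single-digit observations of Example 1, written as one string.
raw = "153413252413201212021453"
values = [int(ch) for ch in raw]

# Counter maps each distinct value x to its frequency f.
freq = Counter(values)
for x in sorted(freq):
    print(x, freq[x])

print("total:", sum(freq.values()))
```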
Grouped Frequency Distribution
Where there are many different values of x and many of them have low frequencies, it often makes sense to combine
the data into groups or classes. This kind of frequency distribution is called a grouped frequency distribution.
Example 2. Let us demonstrate this with this example:
55 60 61 35 41 43 50 78 72 83 45 70 76 31 49
65 79 83 41 86 53 62 52 47 38 57 64 78 47 54
43 73 85 48 66 48 85 86 82 48 56 84 37 57 57
45 95 45 73 39
The following guidelines and terminology will be used to group continuous-type data into classes of equal
length. These guidelines can also be used for sets of discrete data that have a large range.
1. Determine the largest (maximum) and smallest (minimum) observations. The range is the difference, R =
maximum – minimum

2. A frequency distribution should have a minimum of 5 classes and a maximum of 20. For small data sets, use
between 5 and 10 classes. For large data sets, use up to 20 classes.

3. Each data entry must fall into one and only one class.

4. There should be no gaps. Moreover, if there are no entries for a particular class, that class must still be included
with a frequency of 0.

5. The first interval should begin about as much below the smallest value as the last interval ends above the
largest.
The intervals are called class intervals and the boundaries are called class boundaries.

The class limits are the smallest and largest possible observed values in a class.
The class mark is the midpoint of a class.
We set up the following classes for the above data 30 – 39, 40 – 49, 50 – 59, etc. We now create a summary
table below:
Table 2. Frequency distribution table
Class Class limits Frequency Class boundaries Class center
1 30 – 39 5 29.5 – 39.5 34.5
2 40 – 49 13 39.5 – 49.5 44.5
3 50 – 59 9 49.5 – 59.5 54.5
4 60 – 69 6 59.5 – 69.5 64.5
5 70 – 79 8 69.5 – 79.5 74.5
6 80 – 89 8 79.5 – 89.5 84.5
7 90 – 99 1 89.5 – 99.5 94.5
Tables like this show us how the data are spread out or distributed; we call this a frequency distribution table or
simply a frequency distribution.
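A table like Table 2 can be rebuilt with a short Python sketch. It assumes the data set is the 50 whole-number observations that Table 2 summarises; the helper name grouped_table and its parameters are invented for this illustration.

```python
# The 50 observations summarised in Table 2 (assumed data set).
data = [55, 60, 61, 35, 41, 43, 50, 78, 72, 83, 45, 70, 76, 31, 49,
        65, 79, 83, 41, 86, 53, 62, 52, 47, 38, 57, 64, 78, 47, 54,
        43, 73, 85, 48, 66, 48, 85, 86, 82, 48, 56, 84, 37, 57, 57,
        45, 95, 45, 73, 39]

def grouped_table(values, first_lower, width, k):
    """Return one row per class:
    (lower limit, upper limit, frequency, lower boundary, upper boundary, class mark)."""
    rows = []
    for i in range(k):
        lower = first_lower + i * width
        upper = lower + width - 1              # inclusive limits for whole-number data
        f = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, f, lower - 0.5, upper + 0.5, (lower + upper) / 2))
    return rows

table = grouped_table(data, first_lower=30, width=10, k=7)
for row in table:
    print(row)
```

The boundaries are placed half a unit outside the limits because the observations are recorded to the nearest whole number.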
Cumulative and Relative Frequencies
When frequencies of two or more classes are added up, such total frequencies are called cumulative
frequencies. The cumulative frequency of a class is the total of all class frequencies up to and including the
present class. On the other hand, relative frequencies express the frequency of each value or class as a proportion
of the total frequency. The relative frequency for a class is the number of entries in the class divided by the total
number of entries. It is calculated as
$$\text{Relative frequency} = \frac{\text{frequency in the class}}{\text{total number of observations}}$$
Percentage relative frequency = (relative frequency × 100)%
For example, the percentage relative frequency of class 50-59 is $\frac{9}{50} \times 100\% = 18\%$.
The cumulative frequency distribution of the example given above is as follows:

Table 3: Cumulative Frequency Table


Class Class limits Freq. Cum Freq. % Relative Cum. Freq.
1 30 – 39 5 5 10%
2 40 – 49 13 18 36%
3 50 – 59 9 27 54%
4 60 – 69 6 33 66%
5 70 – 79 8 41 82%
6 80 – 89 8 49 98%
7 90 – 99 1 50 100%
Relative Cumulative Frequency is also called Percentage Cumulative Frequency. For example, the Relative
Cumulative Frequency for class 60 – 69 is (33/50) × 100% = 66%.
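The columns of Table 3 follow mechanically from the class frequencies. A minimal Python sketch, using the seven frequencies of the table (the variable names are chosen only for this illustration):

```python
from itertools import accumulate

freqs = [5, 13, 9, 6, 8, 8, 1]      # class frequencies, 30-39 up to 90-99
total = sum(freqs)                  # total number of observations

cum = list(accumulate(freqs))                   # running totals
rel_pct = [100 * f / total for f in freqs]      # percentage relative frequency
cum_pct = [100 * c / total for c in cum]        # percentage cumulative frequency

for f, c, r, p in zip(freqs, cum, rel_pct, cum_pct):
    print(f, c, f"{r:.0f}%", f"{p:.0f}%")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.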
Example 3: The following are the marks of 50 students in STAT 102:
48 70 60 47 51 55 59 63 68 63 47 53 72 53 67 62 64 70 57 56
48 51 58 63 65 62 49 64 53 59 63 50 61 67 72 56 64 66 49 52
62 71 58 53 63 69 59 64 73 56
(a) Construct a frequency table for the above data.
(b) Answer the following questions using the table obtained:
(i) how many students scored between 51 and 62?
(ii) how many students scored above 50?
(iii) what is the probability that a student selected at random from the class
will score less than 63?
Solution:
(a) Range (R) = 73 − 47 = 26

Number of classes: $K = \sqrt{n} = \sqrt{50} = 7.07 \approx 7$
Class size: $w = \frac{R}{K} = \frac{26}{7} = 3.7 \approx 4$

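Table 4.
Mark Tally Frequency
47 – 50 11111 11 7
51 – 54 11111 11 7
55 – 58 11111 11 7
59 – 62 11111 111 8
63 – 66 11111 11111 1 11
67 – 70 11111 1 6
71 – 74 1111 4

$\sum f = 50$
(b) (i) 7 + 7 + 8 = 22 students (classes 51 – 54 to 59 – 62)
(ii) 50 − 7 = 43 students
(iii) (7 + 7 + 7 + 8)/50 = 29/50 = 0.58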
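The answers to Example 3 can be cross-checked directly against the raw marks. The sketch below is illustrative only; it reads "between 51 and 62" as inclusive of both ends and "above 50" as strictly greater than 50, which is how the class limits of the table divide the marks.

```python
marks = [48, 70, 60, 47, 51, 55, 59, 63, 68, 63, 47, 53, 72, 53, 67, 62, 64, 70, 57, 56,
         48, 51, 58, 63, 65, 62, 49, 64, 53, 59, 63, 50, 61, 67, 72, 56, 64, 66, 49, 52,
         62, 71, 58, 53, 63, 69, 59, 64, 73, 56]

# (a) Seven classes of width 4, starting at the smallest mark, 47.
classes = [(47 + 4 * i, 50 + 4 * i) for i in range(7)]
freq = {c: sum(c[0] <= m <= c[1] for m in marks) for c in classes}
for (lo, hi), f in freq.items():
    print(f"{lo}-{hi}: {f}")

# (b) The three questions, answered from the raw marks.
between_51_and_62 = sum(51 <= m <= 62 for m in marks)
above_50 = sum(m > 50 for m in marks)
p_less_than_63 = sum(m < 63 for m in marks) / len(marks)
print(between_51_and_62, above_50, p_less_than_63)
```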
Example 4: The following data represent the ages (in years) of people living in a housing estate in Aba. 18 31
30 6 16 17 18 43 2 8 32 33 9 18 33 19 21 13 13 14 14 6 52 45 61 23 26 15 14 15 14 27 36 19 37 11 12 11 20 12
39 20 40 69 63 29 64 27 15 28.
Present the above data in a frequency table showing the following columns; class interval, class boundary, class
mark (mid-point), tally, frequency and cumulative frequency in that order.
Solution:
Range (R) = 69 − 2 = 67

Number of classes: $K = \sqrt{n} = \sqrt{50} = 7.07 \approx 7$

Class width: $w = \frac{R}{K} = \frac{67}{7} = 9.57 \approx 10$
Frequency Table 5
Class interval Class boundary Class mark Tally Frequency Cum.freq
2-11 1.5-11.5 6.5 11111 11 7 7
12-21 11.5-21.5 16.5 11111 11111 11111 11111 1 21 28
22-31 21.5-31.5 26.5 11111 111 8 36
32-41 31.5-41.5 36.5 11111 11 7 43
42-51 41.5-51.5 46.5 11 2 45
52-61 51.5-61.5 56.5 11 2 47
62-71 61.5-71.5 66.5 111 3 50
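The full table of Example 4, including the tally column, can be generated from the raw ages. This is a sketch only; the tally helper is a made-up name, and it writes tallies as groups of five strokes in the same style as the table.

```python
from itertools import accumulate

ages = [18, 31, 30, 6, 16, 17, 18, 43, 2, 8, 32, 33, 9, 18, 33, 19, 21, 13, 13, 14,
        14, 6, 52, 45, 61, 23, 26, 15, 14, 15, 14, 27, 36, 19, 37, 11, 12, 11, 20, 12,
        39, 20, 40, 69, 63, 29, 64, 27, 15, 28]

def tally(count: int) -> str:
    """Render a count as tally strokes in groups of five."""
    full, rest = divmod(count, 5)
    groups = ["11111"] * full + (["1" * rest] if rest else [])
    return " ".join(groups)

# Seven classes of width 10, starting at the smallest age, 2.
classes = [(2 + 10 * i, 11 + 10 * i) for i in range(7)]
freqs = [sum(lo <= a <= hi for a in ages) for lo, hi in classes]
cum = list(accumulate(freqs))

for (lo, hi), f, c in zip(classes, freqs, cum):
    # class interval, class boundaries, class mark, tally, frequency, cumulative frequency
    print(f"{lo}-{hi}  {lo - 0.5}-{hi + 0.5}  {(lo + hi) / 2}  {tally(f)}  {f}  {c}")
```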

2.2 GRAPHICAL METHODS


Graphical (pictorial) representation of the data reveals patterns of behaviour of the variable being studied. There
are several graphic (pictorial) ways to describe data. The type of data and the idea to be presented determines
the method used.
Some of the common graphical presentations of statistical data are: 1) Histogram, 2) Bar chart, 3) Polygon,
4) Curve, 5) Pictogram, 6) Cumulative (less than) distribution, 7) Cumulative (more than) distribution,
8) Pie chart, and 9) Cumulative frequency curve (Ogive).
The most common among all graphical presentations of statistical data is the histogram, an example of which is
shown in figure 2.1.

A histogram is constructed by representing measurement or observations that are grouped (in figure 2.1 the
scores) on a horizontal scale, the class frequencies on a vertical scale, and drawing rectangles whose bases equal
the class interval and whose heights are determined by the corresponding class frequencies. The markings on
the horizontal scale can be the class limits as in figure 2.1, the class boundaries, the class marks, or arbitrary key
values.
For easy readability it is generally preferable to indicate the class limits, although the bases of the rectangles
actually go from one class boundary to the next. Similar to histograms are bar charts, like the one of figure 2.2,
where the lengths of the bars are proportional to the class frequencies, but there is no pretense of having a
continuous (horizontal) scale. There are several points that must be watched in the construction of histograms.
First, it must be remembered that this kind of figure cannot be used for distributions with open classes. Second,
it should be noted that the picture presented by a histogram can be very misleading if a distribution has unequal
classes and no suitable adjustments are made. To illustrate this point, let us regroup the distribution of the 150
scores by combining all those from 60 to 79 into one class. Thus, the new distribution is given by the following
table 6.
Frequency Table 6
Scores Frequency
10-19 1
20-29 6
30-39 9
40-49 31
50-59 42
60-79 49
80-89 10
90-99 2
and its histogram (with the class frequencies represented by the heights of the rectangles) is shown in figure 2.3.
An alternate, though less widely used, form of graphical presentation is the frequency polygon (see figure 2.6).
Here the class frequencies are plotted at the class marks and the successive points are connected by means of
straight lines. Note that we added classes with zero frequencies

Multiple bar chart


In this type of chart, the component figures are shown as separate bars adjoining each other. The height of each
bar represents the actual value of the component figure. It depicts distributional pattern of more than one
variable.
Example 5
The following table shows the intake through JAMB by the Faculty of Science of a certain University in three
consecutive years.
Table 7
Department 2002 2003 2004
Botany 43 40 35
Chemistry 28 35 42
Zoology 45 40 35
Computer Science 33 25 28
Physics 40 35 38
Mathematics 35 42 45
Biology 37 40 42
Total 261 257 265
Draw (i) a multiple bar chart, department by department, for the three years.
(ii) a component bar chart.
Solution

Figure 2.8: Multiple bar chart for JAMB Admission


Component (or sub-divided) Bar Diagram

Bars are sub-divided into component parts. These diagrams are constructed when each total is built up from two
or more component figures. They can be of two kinds:
Actual Component Bar Diagram: the overall height of each bar and the individual component lengths
represent actual figures.
Percentage Component Bar Diagram: the individual component lengths represent the percentage each
component forms of the overall total. Note that a series of such bars will all be the same total height, i.e., 100
percent.
(ii)

Figure 2.9: Component bar chart for JAMB Admission
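For a percentage component bar diagram, each department's intake must first be converted to a percentage of its year's total. A minimal Python sketch using the figures of Table 7 (the variable names and the rounding to one decimal place are choices made for this illustration):

```python
# Intake by department for 2002, 2003 and 2004, as given in Table 7.
intake = {
    "Botany": (43, 40, 35),
    "Chemistry": (28, 35, 42),
    "Zoology": (45, 40, 35),
    "Computer Science": (33, 25, 28),
    "Physics": (40, 35, 38),
    "Mathematics": (35, 42, 45),
    "Biology": (37, 40, 42),
}
years = (2002, 2003, 2004)

# Column totals: the overall height of each actual component bar.
totals = [sum(row[i] for row in intake.values()) for i in range(len(years))]

# Component lengths for the percentage component bars, one list per department.
percent = {
    dept: [round(100 * row[i] / totals[i], 1) for i in range(len(years))]
    for dept, row in intake.items()
}

print(dict(zip(years, totals)))
for dept, shares in percent.items():
    print(dept, shares)
```

Stacking the percentages for one year gives a bar of height 100, apart from small rounding differences.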


Pie-chart
The pie chart (circle graph) is used to display relative frequencies rather than actual frequencies for the data
(qualitative or quantitative discrete data). We draw a circle and then divide it into a series of wedges or slices to
represent each class in the relative frequency distribution. The size of each slice is proportional to the percentage
of the data that fall into the corresponding class.
EXAMPLE 6
The population of five towns in a country X in 1986 is as follows:
Town Population
A 50,000
B 100,000
C 25,000
D 12,500
E 12,500
Draw a pie chart to represent these data.
Solution:

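Sectorial angle = (population of the town / total population) × 360°, where the total population is 200,000.
Town Population Sectorial Angle
A 50,000 90°
B 100,000 180°
C 25,000 45°
D 12,500 22.5°
E 12,500 22.5°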

Fig. 2.10: A pie chart showing the population of five towns in country X in 1986
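Each sectorial angle is the town's share of the total population multiplied by 360 degrees. A short Python sketch of that calculation, with variable names chosen only for this illustration:

```python
# Population of the five towns in Example 6.
population = {"A": 50_000, "B": 100_000, "C": 25_000, "D": 12_500, "E": 12_500}
total = sum(population.values())

# Angle of each sector in degrees: share of the total times 360.
angles = {town: 360 * p / total for town, p in population.items()}

for town, angle in angles.items():
    print(town, population[town], angle)

# The angles of all sectors together make up the full circle.
print(sum(angles.values()))
```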

Histograms
A histogram is the graph of the frequency distribution of continuous measurement variables (quantitative
continuous data). It is constructed on the basis of the following principles:
i. The horizontal axis is a continuous scale running from one extreme end of the distribution to the other. It
should be labelled with the name of the variable and the units of measurement.

ii. For each class in the distribution a vertical rectangle is drawn such that
(i) its base lies on the horizontal axis and extends from one class boundary of the class to the other, so there
is never any gap between the histogram rectangles;
(ii) the width of its base is determined by the width of the class interval. If a distribution with unequal
class intervals is to be presented by means of a histogram, it is necessary to make an adjustment for the varying
widths of the class intervals.
Values for the class boundaries, class limits, or class marks may be labeled along the x-axis. Use whichever one
of these sets of class numbers best represents the variable.
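The usual adjustment for unequal class intervals is to make the area of each rectangle, rather than its height, proportional to the class frequency. The sketch below illustrates this under a stated assumption: it takes the distribution of Table 2 and, purely as a hypothetical regrouping, merges the classes 70 – 79 and 80 – 89 into a single class 70 – 89 with frequency 8 + 8 = 16. The function name adjusted_height is invented for this illustration.

```python
def adjusted_height(freq, lower_boundary, upper_boundary, standard_width):
    """Height that keeps the rectangle's area proportional to the frequency.

    A class of the standard width keeps its frequency as its height; a class
    twice as wide is drawn at half its frequency.
    """
    return freq * standard_width / (upper_boundary - lower_boundary)

# (lower boundary, upper boundary, frequency); the 69.5-89.5 class is the
# hypothetical merged class and is twice as wide as the others.
classes = [(29.5, 39.5, 5), (39.5, 49.5, 13), (49.5, 59.5, 9),
           (59.5, 69.5, 6), (69.5, 89.5, 16), (89.5, 99.5, 1)]

heights = [adjusted_height(f, lo, hi, standard_width=10) for lo, hi, f in classes]
for (lo, hi, f), h in zip(classes, heights):
    print(f"{lo}-{hi}: frequency {f}, height {h}")
```

Without the adjustment the merged class would be drawn with height 16 and would wrongly suggest a peak between 70 and 89.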
Example: use the Table below to draw the histogram.
Table 8
Class Class limits Frequency Class boundaries Class center
1 30 – 39 5 29.5 – 39.5 34.5
2 40 – 49 13 39.5 – 49.5 44.5
3 50 – 59 9 49.5 – 59.5 54.5
4 60 – 69 6 59.5 – 69.5 64.5
5 70 – 79 8 69.5 – 79.5 74.5
6 80 – 89 8 79.5 – 89.5 84.5
7 90 – 99 1 89.5 – 99.5 94.5
Frequency Polygon
If we join the midpoints of the tops of the adjacent rectangles of the histogram with line segments a frequency
polygon is obtained. Note that it is not essential to draw histogram in order to obtain frequency polygon. It can
be drawn without erecting rectangles of histogram as follows:
i. The scale should be marked in the numerical values of the mid- points of intervals.
ii. Erect ordinates on the midpoints of the interval - the length or altitude of an ordinate representing the
frequency of the class on whose mid-point it is erected.

iii. Join the tops of the ordinates and extend the connecting lines to the scale of sizes.
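The points of a frequency polygon can be listed without drawing the histogram. In this sketch, based on the class marks and frequencies of Table 8, the polygon is closed by adding one class mark with zero frequency at each end, which is one common way of extending the lines down to the horizontal scale; the variable names are chosen only for this illustration.

```python
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [5, 13, 9, 6, 8, 8, 1]
width = 10          # common class width

# Add a zero-frequency class mark one class width beyond each end,
# so that the polygon starts and finishes on the horizontal axis.
points = ([(class_marks[0] - width, 0)]
          + list(zip(class_marks, freqs))
          + [(class_marks[-1] + width, 0)])

for x, y in points:
    print(x, y)
```

Joining these points in order with straight lines gives the polygon.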
Cumulative Frequency Curve or Ogive
A frequency distribution can easily be converted to a cumulative frequency distribution by replacing the
frequencies with cumulative frequencies. When the cumulative frequencies of a distribution are graphed the
resulting curve is called Ogive Curve.
The vertical scale represents either the cumulative frequencies or the relative cumulative frequencies. The
horizontal scale represents the upper class boundaries. Until the upper class boundary of a class has been
reached, you cannot be sure you have accumulated all the data in that class. Therefore, the horizontal scale for
an Ogive is always based on the upper class boundaries.

Example 7
Using the frequency distribution in Table 8:
i. Prepare an Ogive

ii. Give the estimates of the quartiles

iii. Find the median

iv. Estimate the 30th and 70th percentiles

v. Obtain the range, interquartile range and semi-interquartile range

vi. How many students scored marks between 60% and 80%?

vii. What will be the pass mark if 60% of the students failed?

Solution

Figure 9: Cumulative frequency curve


Quartiles
(ii) Q1 = 25th percentile = 46.5
Q2 = 50th percentile = 59.5
Q3 = 75th percentile = 72
(iii) The median is the 50th percentile and it is equal to 59.5
(iv) 30th percentile = 49.5
70th percentile = 68
(v) Range = Highest mark − Lowest mark
= 95 − 31 = 64 (from the raw data in section 2.1)
Interquartile range = Q3 − Q1
= 72 − 46.5 = 25.5
Semi-interquartile range = (Q3 − Q1)/2 = (72 − 46.5)/2 = 25.5/2
= 12.75
(vi) A mark of 60% meets the curve at a cumulative frequency of 25 students and a mark of 80% meets the
curve at a cumulative frequency of 43. Therefore, the number of students who scored between 60% and 80% is
43 − 25 = 18 students.
(vii) If 60% of the students failed, the pass mark is the 60th percentile mark. Tracing this on the curve gives a
pass mark of 67.
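Values read from a hand-drawn ogive are approximate. As a rough numerical cross-check, the sketch below plots cumulative frequency against upper class boundary for the distribution of Table 8 and estimates percentiles by assuming straight-line segments between the plotted points. That straight-line assumption is made only for this sketch; a smooth freehand curve, such as the one read in the solution above, gives somewhat different values, so the two sets of estimates should not be expected to agree exactly. The function names are invented for this illustration.

```python
from itertools import accumulate

boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
freqs = [5, 13, 9, 6, 8, 8, 1]

# Ogive points: zero at the first lower boundary, then the cumulative
# frequency reached at each upper class boundary.
cum = [0] + list(accumulate(freqs))
ogive = list(zip(boundaries, cum))

def mark_at(cum_target):
    """Mark at which the straight-line ogive reaches a cumulative frequency."""
    for (x0, c0), (x1, c1) in zip(ogive, ogive[1:]):
        if c0 <= cum_target <= c1 and c1 > c0:
            return x0 + (cum_target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("cumulative frequency outside the table")

def percentile(p):
    """Estimated p-th percentile, with p given as a number from 0 to 100."""
    return mark_at(p * cum[-1] / 100)

for p in (25, 50, 75):
    print(p, round(percentile(p), 2))
print("interquartile range:", round(percentile(75) - percentile(25), 2))
```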
Line graph
Line graphs are diagrammatic representations of the relation between two variables x and y. The co-ordinate
points of these variables are joined together to form the line graph.
The line graph is especially useful for the study of some variables according to the passage of time. The time,
in weeks, months or years is marked along the horizontal axis; and the value of the quantity that is being studied
is marked on the vertical axis. The distance of each plotted point above the base-line indicates its numerical
value. The line graph is suitable for depicting a consecutive trend of a series over a long period.
Example 8
Draw a line graph to represent the information below:
Before 14 20 21 24 22 25 26
After 16 24 23 25 30 27 34
SOLUTION

Figure 2.11: Line graph



2.3 EXERCISES ON CHAPTER TWO


1. The following are the weights of babies delivered in a general hospital:
2.57 4.21 1.05 3.06 4.50 5.05 3.45 2.15 0.92 4.10 2.21 2.18 3.33 2.48 1.47
3.12 2.67 0.76 4.13 5.93 4.15 2.03 0.57 1.85 3.41 5.86 4.29 5.35 3.81 0.82
1.86 2.53 1.46 3.85 5.12 3.24 1.89 2.51 0.95 1.24 3.57 3.50 1.27 4.25 0.91
i. Classify these data into a grouped frequency distribution by using classes of 0.01 – 1.00, 1.01 – 2.00, . . .,
5.01 – 6.00
ii. Find the class width
iii. For the class 4.01 – 5.00, name the value of:
the class center
the class limits,
the class boundaries
iv. Construct a relative frequency histogram of these data.
2. The following table shows the frequency distribution of marks of 200 students in a mathematics examination.
Mark 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 4 26 52 16 36 34 20 12
i. Draw a cumulative frequency curve and estimate the Quartiles.
ii. Calculate the interquartile and semi-interquartile range from your graph.

iii. Find the pass mark if only 20% of the students should pass.

iv. How many of the students scored between 60% and 85%?
3. Use the table in Exercise 2 to answer the following questions
i. Draw a bar chart of the frequency distribution
ii. Draw a pie chart of the frequency distribution
iii. Draw a histogram of the frequency distribution
4. The following table shows the number admitted into the postgraduate programme for 2 years.
Department 2003 2004
Chemical Engr. 41 37
Electrical Engr. 40 48
Surveying 35 25
Geography 45 48
Mechanical Engr. 50 45
Geology 35 45
Draw component and multiple bar charts for these data.

5. The year, x, and the birthrate, y, for 1980 – 2000 were as follows:
Year (x) Birthrate (y)
1980 25,004
1981 25,100
1982 24,345
1983 24,850
1984 23,563
1985 23,236
1986 24,450
1987 18,053
1988 19,245
1989 18,348
1990 15,434
1991 13,347
1992 14,111
1993 15,243
1994 16,172
1995 18,815
1996 17,345
1997 16,457
1998 18,413
1999 19,400
2000 18,721
i. Construct a line graph of these birthrates.

ii. Interpret your output/result
