Data Management
Introduction
• Statistics is the branch of science that deals with
the collection, presentation, organization,
analysis and interpretation of data.
• The population is the collection of all elements
under consideration in a statistical inquiry. The
sample is a subset of the population.
• A variable is a characteristic or attribute of
the elements in a collection that can assume
different values for the different elements.
• The parameter is a summary measure describing a
specific characteristic of the population. The statistic is a
summary measure describing a specific characteristic of
the sample.
• Fields of Statistics
1. Applied Statistics is concerned with the procedures
and techniques used in collection, presentation,
organization, analysis and interpretation of data.
2. Theoretical statistics is concerned with the
development of the mathematical foundations of the
methods used in applied statistics.
Areas of interest in Applied Statistics
1. Descriptive statistics includes all the techniques
used in organizing, summarizing and presenting the
data on hand.
2. Inferential statistics includes all the techniques used
in analysing the sample data that will lead to
generalizations about the population from which the
sample came.
Descriptive Statistics
Measures of Central Tendency are descriptive measures that
are used to describe the center of a set of data arranged
in numerical order.
1. The arithmetic mean is the most common type of average.
It is the sum of all the observed values divided by the
number of observations.
2. The median is the value that divides the array into two
equal parts.
3. The mode is the observed value that occurs with the
greatest frequency.
Consider the given set of data:
Set A: 9, 12, 13, 15, 15, 17, 24
x̄ = 15   Median = 15   Mode = 15
Set B: 7, 11, 15, 15, 17, 19, 21
x̄ = 15   Median = 15   Mode = 15
Set C: 11, 11, 14, 15, 17, 17, 18, 20
x̄ = 15.38   Median = 16   Mode = 11 and 17 (bimodal)
• Give the mean, median and mode
5, 5, 2, 7, 9, 10, 7, 8, 6 , 14, 20, 25
Mean = 9.83
Median = 7.5
Mode = 5 and 7
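The three measures for ungrouped data can be checked with a short Python sketch. It relies on the standard library's `statistics` module; the helper name `central_tendency` is only illustrative and is not part of the original notes.

```python
from statistics import mean, median, multimode


def central_tendency(values):
    """Return (mean, median, modes) for a list of raw observations."""
    # multimode returns every value tied for the highest count,
    # so bimodal data yields two modes.
    return mean(values), median(values), sorted(multimode(values))


data = [5, 5, 2, 7, 9, 10, 7, 8, 6, 14, 20, 25]
avg, med, modes = central_tendency(data)
print(round(avg, 2), med, modes)  # 9.83 7.5 [5, 7]
```

Note that `multimode` returns every value when no value repeats, so a data set with all distinct values would report each observation as a mode.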
Weighted mean - a type of mean calculated by
multiplying each quantitative outcome by its associated weight
(or probability), summing all the products, and dividing by
the sum of the weights. When the weights are probabilities
that sum to 1, the division leaves the sum unchanged.
What is the weighted mean of the following:
Course        Units   Grade   Units × Grade
Mathematics   3       3.00    9
English       3       2.00    6
P.E.          2       1.25    2.5
Total         8               17.5
Weighted mean = 17.5 / 8 = 2.1875, or about 2.19
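As a minimal sketch, a weighted mean can be computed by dividing the sum of the grade-times-units products by the total units. The function name `weighted_mean` is an assumption for illustration; the grades and units are the three courses listed in the table.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


grades = [3.00, 2.00, 1.25]  # Mathematics, English, P.E.
units = [3, 3, 2]            # the weights
print(weighted_mean(grades, units))
```

When the weights are probabilities that already sum to 1, the division has no effect and the result is simply the sum of the products.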
THE MEASURES OF CENTRAL TENDENCY FOR
GROUPED DATA
Data which are arranged in a frequency distribution are
called grouped data.
Computation of the Mean for Grouped Data
To determine the mean of data organized into a frequency
distribution, multiply the frequency of each class by the
midpoint of the class, sum the products, and divide by the
total number of frequencies:
\[
\bar{x} = \frac{\sum fX}{N}
\]
where:
x̄ = the arithmetic mean
X = the midpoint of each class
f = the frequency of each class
N = the total number of frequencies
Class interval    f     X      fX
172 – 180         2     176    352
163 – 171         4     167    668
154 – 162         7     158    1,106
145 – 153         10    149    1,490
136 – 144         9     140    1,260
127 – 135         5     131    655
118 – 126         3     122    366
                  N = 40       ΣfX = 5,897
\[
\bar{x} = \frac{\sum fX}{N} = \frac{5{,}897}{40} \approx 147.43
\]
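The grouped-mean computation can be sketched in Python as follows. The tuple layout `(lower limit, upper limit, frequency)` and the name `grouped_mean` are assumptions made for this example; the class midpoint X is taken as the average of the two class limits.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (172, 180, 2), (163, 171, 4), (154, 162, 7), (145, 153, 10),
    (136, 144, 9), (127, 135, 5), (118, 126, 3),
]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * X divided by N."""
    n = sum(f for _, _, f in classes)                        # N
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of fX
    return total / n


print(grouped_mean(classes))  # 147.425
```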
Computation of the Median for Grouped Data
• The median is the preferred measure of central tendency when
one does not want extreme scores to influence the average.
• To compute the median from grouped data, we have to
determine the value which divides the distribution into
two equal parts; thus, we use the less-than cumulative
frequency (<Cf).
• The median is estimated by first locating the median class.
• The median class is the first class whose less-than cumulative
frequency reaches N/2, that is, half of the observations.
• Example:
• Given the frequency distribution below, solve for
the median.
Class interval    F     <Cf
172 – 180         2     40
163 – 171         4     38
154 – 162         7     34
*145 – 153        10    27
136 – 144         9     17
127 – 135         5     8
118 – 126         3     3
                  N = 40
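• Substitute the following values
in the formula for median:
LL = lower class boundary of the median class = (144 + 145) / 2 = 144.5
F = frequency of the median class = 10
Cf = less-than cumulative frequency of the class below the median class = 17
i = class size = 9
\[
\text{Median (Md)} = LL + \left( \frac{N/2 - Cf}{F} \right) i
\]
\[
\begin{aligned}
\text{Md} &= 144.5 + \left( \frac{40/2 - 17}{10} \right)(9) \\
&= 144.5 + \left( \frac{20 - 17}{10} \right)(9) \\
&= 144.5 + \left( \frac{3}{10} \right)(9) \\
&= 144.5 + 2.7 \\
&= 147.2
\end{aligned}
\]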
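The median interpolation can be sketched in Python under two stated assumptions: class limits are integers (so each lower class boundary sits 0.5 below the lower limit) and the class size is the upper limit minus the lower limit plus 1. The function name `grouped_median` and the tuple layout are illustrative only.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (172, 180, 2), (163, 171, 4), (154, 162, 7), (145, 153, 10),
    (136, 144, 9), (127, 135, 5), (118, 126, 3),
]


def grouped_median(classes):
    """Interpolated median: LL + ((N/2 - Cf) / F) * i."""
    ordered = sorted(classes)               # ascending by lower limit
    n = sum(f for _, _, f in ordered)       # N
    if n == 0:
        raise ValueError("total frequency must be positive")
    cum = 0                                 # Cf below the current class
    for lo, hi, f in ordered:
        if cum + f >= n / 2:                # first class whose <Cf reaches N/2
            boundary = lo - 0.5             # LL (assumes integer limits)
            width = hi - lo + 1             # i (assumes integer limits)
            return boundary + (n / 2 - cum) * width / f
        cum += f


print(round(grouped_median(classes), 2))  # 147.2
```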
Computation of the Mode for Grouped Data
• The Mode in a frequency distribution is within the class
interval with the highest frequency. The class interval with
the highest frequency is known as the modal class.
• A crude mode may be determined by taking the class
mark with the highest frequency. However, the rough
approximation may be improved by considering the
frequencies adjoining the modal frequency.
• Formula:
\[
\text{Mode (Mo)} = L_m + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i
\]
where:
Lm = lower limit (lower class boundary) of the modal class
Δ1 = difference between the frequencies of the modal class
and the next class lower in value
Δ2 = difference between the frequencies of the modal
class and the next class higher in value
i = class size
• Example:
• Given the frequency distribution below, solve for
the mode
Class interval F
172 – 180 2
163 – 171 4
154 – 162 7
*145 – 153 10
136 – 144 9
127 – 135 5
118 – 126 3
N= 40
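Substitute the following values in the formula for mode:
Lm = (144 + 145) / 2 = 144.5
Δ1 = 10 – 9 = 1
Δ2 = 10 – 7 = 3
i = 9
\[
\begin{aligned}
\text{Mode (Mo)} &= L_m + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i \\
&= 144.5 + \left( \frac{1}{1 + 3} \right)(9) \\
&= 144.5 + \left( \frac{1}{4} \right)(9) \\
&= 144.5 + 2.25 = 146.75
\end{aligned}
\]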
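The mode interpolation can be sketched the same way. This is a minimal sketch, not a definitive implementation: it assumes integer class limits, picks the lowest-valued class when several classes tie for the highest frequency, treats a missing neighbouring class as having frequency 0, and falls back to the class mark when Δ1 + Δ2 is zero. The name `grouped_mode` and the tuple layout are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (172, 180, 2), (163, 171, 4), (154, 162, 7), (145, 153, 10),
    (136, 144, 9), (127, 135, 5), (118, 126, 3),
]


def grouped_mode(classes):
    """Interpolated mode: Lm + (d1 / (d1 + d2)) * i."""
    ordered = sorted(classes)                # ascending by lower limit
    freqs = [f for _, _, f in ordered]
    m = freqs.index(max(freqs))              # modal class (first one if tied)
    lo, hi, f = ordered[m]
    below = freqs[m - 1] if m > 0 else 0     # next class lower in value
    above = freqs[m + 1] if m + 1 < len(freqs) else 0  # next class higher
    d1 = f - below                           # delta 1
    d2 = f - above                           # delta 2
    if d1 + d2 == 0:                         # flat neighbourhood: use class mark
        return (lo + hi) / 2
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)


print(grouped_mode(classes))  # 146.75
```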