STATISTICS AND PROBABILITY
STATISTICS
is an art and science that deals with the collection, organization, creative
presentation, analysis, and interpretation of quantitative data.
FIELDS OF STATISTICS
DESCRIPTIVE STATISTICS
is concerned with the methods of collecting, organizing, and presenting
data appropriately and creatively to describe or assess group
characteristics.
INFERENTIAL STATISTICS
is concerned with inferring or drawing conclusions about the population
based on preselected elements of that population.
JOHN PAUL D. GUNNAWA
CONSTANT AND VARIABLE
CONSTANT
refers to a fundamental quantity that does not change in value.
VARIABLE
are quantities that may take any one of a specified set of values. Variables
can be classified as qualitative (categorical) or quantitative (numerical)
variables.
QUALITATIVE VARIABLES
are non-measurable characteristics that cannot assume a numerical value but can
be classified into two or more categories.
QUANTITATIVE VARIABLES
are those quantities that can be counted with your bare hands, can be measured
with the use of some measuring devices, or can be calculated with the use of a
mathematical formula.
Quantitative variables are classified as discrete and continuous.
DISCRETE VARIABLES
consist of variates usually obtained by counting.
CONTINUOUS VARIABLES
are obtained by measurement, usually with units, such as height in meters,
weight in kilograms, and time in minutes.
DATA AND INFORMATION
DATA
usually refers to facts concerning things such as the status in life of people,
the defectiveness of objects, or the effect of an event on society.
INFORMATION
is a set of data that have been processed and presented in a form suitable for
human interpretation, usually with a purpose of revealing trends or patterns
about the population.
SOURCES OF DATA
There are two sources of data:
1. PRIMARY SOURCE
is a source from which firsthand information is obtained, usually by means of personal
interview and actual observation.
2. SECONDARY SOURCE
is taken from other works, news reports, readings, and records kept by agencies such as the
Philippine Statistics Authority, the Securities and Exchange Commission, and the S.S.S.
SCALES OF MEASURING DATA
These classifications, called scales of measurement, are the following:
1. NOMINAL SCALE
classifies objects or people's responses so that all of those in a single category are
equal with respect to some attribute, and then each category is coded numerically.
2. ORDINAL SCALE
classifies objects or individuals' responses according to degree or level, then each
level is coded numerically.
3. INTERVAL SCALE
refers to quantitative measurement in which lower and upper limits are
adopted to classify the relative order of, and differences between, item numbers or actual scores.
4. RATIO SCALE
takes into account the interval size and the ratio of two related quantities, which
are usually based on a standard measurement.
METHODS OF COLLECTING DATA
1. DIRECT OR INTERVIEW METHOD
is a person-to-person interaction between an interviewer and an interviewee.
2. INDIRECT OR QUESTIONNAIRE METHOD
is an alternative to the interview method.
3. REGISTRATION METHOD
is enforced by private organizations or government agencies for recording
purposes.
4. OBSERVATION
is a scientific method of investigation that makes possible use of all senses to
measure or obtain outcomes/responses from the object of study.
5. EXPERIMENTATION
is used when the objective is to determine the cause-and-effect relationship of a certain
phenomenon under some controlled conditions.
POPULATION AND SAMPLE
POPULATION
is a finite or infinite collection of objects, events, or individuals with specified
class or characteristics under consideration.
SAMPLE
is a finite or limited collection of objects, events, or individuals selected from a population.
SYMBOLS FOR PARAMETERS
• POPULATION SIZE ............................................ N
• POPULATION MEAN ............................................ μ
• POPULATION STANDARD DEVIATION .............................. σ
• POPULATION VARIANCE ........................................ σ²
• POPULATION COEFFICIENT OF CORRELATION ...................... ρ
SYMBOLS FOR STATISTICS
• SAMPLE SIZE ................................................ n
• SAMPLE MEAN ................................................ x̄
• SAMPLE STANDARD DEVIATION .................................. s
• SAMPLE VARIANCE ............................................ s²
• SAMPLE COEFFICIENT OF CORRELATION .......................... r
CENSUS AND SAMPLING TECHNIQUES
CENSUS
is a vital tool if the information gathered would be used for administrative purposes
and if it is of local or national concern.
SAMPLE
is a portion or sub-aggregate of the population that should represent the common
qualities or characteristics of the population.
RANDOM AND NON-RANDOM SAMPLING
RANDOM SAMPLING
is the most commonly used sampling technique in which each member in the
population is given an equal chance of being selected in the sample.
NON-RANDOM SAMPLING
is a method of collecting a small portion of the population in which some
members of the population are deliberately left out of the selection for varied
reasons, so not every member has a chance of being selected.
PROPERTIES OF RANDOM SAMPLING
1. EQUIPROBABILITY
means that each member of the population has an equal chance of being
selected and included in the sample.
2. INDEPENDENCE
means that the chance of one member being drawn does not affect the
chance of any other member.
TWO KINDS OF RANDOM SAMPLING
1. RESTRICTED RANDOM SAMPLING
involves certain restrictions intended to improve the validity of the sampling.
2. UNRESTRICTED RANDOM SAMPLING
is considered the best random sampling design because there are no
restrictions imposed and every member in the population has an equal chance of
being included in the sample.
RANDOM SAMPLING TECHNIQUES
1. LOTTERY OR FISHBOWL SAMPLING
this is done by simply writing the names or numbers of all members of the population
on small rolled pieces of paper which are later placed in a container.
2. SAMPLING WITH THE USE OF A TABLE OF RANDOM NUMBERS
if the population is large, a more practical procedure is the use of a table of random
numbers, which contains rows and columns of digits randomly ordered by a computer.
3. SYSTEMATIC SAMPLING
this method of sampling is done by taking every kth element in the population.
4. STRATIFIED RANDOM SAMPLING
when the population can be partitioned into several strata or subgroups, it may be
wiser to employ the stratified technique to ensure that each group is represented in the
sample.
5. MULTI-STAGE OR MULTIPLE SAMPLING
this technique uses several stages or
phases in getting the sample from the population.
NON-RANDOM SAMPLING TECHNIQUES
1. JUDGEMENT OR PURPOSIVE SAMPLING
this method is also referred to as non-random or non-probability sampling.
2. QUOTA SAMPLING
this is a relatively quick and inexpensive method to operate, since the choice of the number
of persons or elements to be included in a sample is done at the researcher's own
convenience.
3. CLUSTER SAMPLING
this is sometimes referred to as area sampling because it is usually applied on a
geographical basis.
4. INCIDENTAL SAMPLING
this design is applied to those samples which are taken because they are the most
available.
5. CONVENIENCE SAMPLING
this method has been widely used in television and radio programs to find out the opinions of
TV viewers and listeners regarding a controversial issue.
ORGANIZATION AND PRESENTATION OF DATA
DATA/INFORMATION
TEXTUAL
TABULAR
MAP GRAPH/CARTOGRAPH
SCATTER POINT DIAGRAM
PIE/PICTURE GRAPH
FORMS OF PRESENTATION OF DATA
A. TEXTUAL
this form of presentation combines text and numerical facts in a statistical report.
B. TABULAR
this form of presentation is better than textual form because it provides numerical
facts in a more concise and systematic manner.
Advantages of Tabular Presentation
1. It is brief; it reduces the matter to the minimum.
2. It provides the reader a good grasp of the meaning of the quantitative relationship
indicated in the report.
3. The columns and rows make comparison easier.
C. GRAPHICAL PRESENTATION
this form is the most effective means of organizing and presenting statistical data
because the important relationships are brought out more clearly and creatively in
virtually solid and colorful figures.
DIFFERENT KINDS OF GRAPHS/CHARTS
1. LINE GRAPH
it shows relationships between two sets of quantities. This is done by plotting
points of X set of quantities along the horizontal axis against the Y set of
quantities along the vertical axis in a Cartesian coordinate plane.
2. BAR GRAPH
it consists of bars or rectangles of equal widths, drawn either vertically or
horizontally, segmented or non-segmented.
3. CIRCLE GRAPH or PIE CHART
it represents the relationship of the different components of a single total as revealed
in the sectors of a circle.
4. PICTURE GRAPH or PICTOGRAM
it is a visual presentation of statistical quantities by means of drawing pictures or
symbols related to the subject under study.
5. MAP GRAPH or CARTOGRAM
it is one of the best ways to present geographical data.
6. SCATTER POINT DIAGRAM
it is a graphical device to show the relationship between two quantitative variables.
FREQUENCY DISTRIBUTION
is a tabulation or grouping of data into appropriate categories showing the
number of observations in each group or category.
Consider the given data below which show the scores of 60 students in a statistics
test.
5 13 8 6 13 10 5 13 15 16
8 12 15 10 12 16 12 9 3 7
11 15 11 7 15 2 13 5 9 12
13 9 12 9 9 14 12 11 19 13
16 18 3 13 18 10 15 14 18 11
10 12 6 9 5 17 9 6 9 18
The numbers shown above are called raw data.
PARTS OF A FREQUENCY TABLE
1. CLASS LIMITS
groupings or categories defined by lower and upper limits.
Example:
16 – 20
21 – 25
26 – 30
Lower class limits are the smallest numbers that belong to the different classes.
Upper class limits are the highest numbers that belong to the different classes.
2. CLASS SIZE
is the width of each class interval.
L.L U.L
16 - 20 (class size is 5)
21 - 25
3. CLASS BOUNDARIES
are the numbers used to separate classes without the gaps created by class limits.
C.I C.B
L.L U.L L.C.B U.C.B
16 - 20 15.5 - 20.5
21 - 25 20.5 - 25.5
26 - 30 25.5 - 30.5
4. CLASS MARK
is the midpoint of the lower and upper class limits.
C.I CLASS MARK (X)
16 - 20 18
21 - 25 23
26 - 30 28
The construction of a frequency distribution is a very simple activity that requires the
following steps.
1. Get the value of the range. The range denoted by R, refers to the difference
between the highest and the lowest value in the distribution.
R=H–L
2. The number of classes can be approximated by using the relationship
k = 1 + 3.3 log n
Where : k is the number of classes
n is the sample size
3. Determine the size of the class interval. The value of c can be obtained by
dividing the range by the desired number of classes.
c = R / k
4. Construct the classes.
Example:
Test Scores Obtained by the Sixty Students in a Statistics Class
48 73 57 57 69 88 11 80 82 47
46 70 49 45 75 81 33 65 38 59
94 59 62 36 58 69 45 55 58 65
30 49 73 29 41 53 37 35 61 48
22 51 56 55 60 37 56 59 57 36
12 36 50 63 68 30 56 70 53 28
Step 1. Get the range.
R = H – L = 94 – 11 = 83
Step 2. Determine the number of class intervals.
k = 1 + 3.3 log n
= 1 + 3.3 log 60
= 6.87 or 7
Step 3. Determine the size of the class interval.
c = R / k = 83 / 7 = 11.86 or 12
Step 4. Construct the classes.
Classes f x Classes Boundaries
11 - 22 3 16.5 10.5 – 22.5
23 - 34 5 28.5 22.5 – 34.5
35 - 46 11 40.5 34.5 – 46.5
47 - 58 19 52.5 46.5 – 58.5
59 - 70 14 64.5 58.5 – 70.5
71 - 82 6 76.5 70.5 – 82.5
83 - 94 2 88.5 82.5 – 94.5
n = 60
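The four steps above can be sketched in Python as a quick check of the table. This is only an illustration under stated assumptions: the function name frequency_table is hypothetical, and k and c are rounded to the nearest whole number as in the worked example, which may not cover the highest value for every data set.

```python
import math

scores = [
    48, 73, 57, 57, 69, 88, 11, 80, 82, 47,
    46, 70, 49, 45, 75, 81, 33, 65, 38, 59,
    94, 59, 62, 36, 58, 69, 45, 55, 58, 65,
    30, 49, 73, 29, 41, 53, 37, 35, 61, 48,
    22, 51, 56, 55, 60, 37, 56, 59, 57, 36,
    12, 36, 50, 63, 68, 30, 56, 70, 53, 28,
]

def frequency_table(data):
    """Return one (lower limit, upper limit, frequency, class mark) row per class."""
    low, high = min(data), max(data)
    value_range = high - low                        # Step 1: R = H - L
    k = round(1 + 3.3 * math.log10(len(data)))      # Step 2: number of classes
    c = round(value_range / k)                      # Step 3: class size (assumed rounding)
    rows = []
    for i in range(k):                              # Step 4: construct the classes
        lower = low + i * c
        upper = lower + c - 1
        f = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, f, (lower + upper) / 2))
    return rows

table = frequency_table(scores)
for lower, upper, f, mark in table:
    print(f"{lower} - {upper}  f = {f}  x = {mark}")
```

The printed rows should match the Classes, f, and x columns of the table above.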
Example 2.
The intelligence quotients of 100 freshman students admitted to the Senior High
School Department of a certain university were taken and are shown below.
95 115 110 119 98 93 112 91 94 111
99 111 110 115 107 96 107 105 108 108
83 85 109 89 107 100 103 100 94 116
106 101 108 105 101 120 90 100 112 107
107 102 90 105 87 118 94 117 108 100
91 88 120 106 107 106 107 106 100 97
98 103 106 106 106 106 110 107 94 97
114 99 96 100 106 103 110 109 101 107
107 95 99 97 92 100 113 101 106 106
119 114 96 107 108 112 97 106 105 112
Solution: Using the same procedure:
Step 1. H = 120 ; L = 83
Step 2. R = H – L = 120 – 83 = 37
Step 3. The number of classes is given. We can say that k = 10.
Step 4. The size of the class interval:
c = R / k = 37 / 10 = 3.7 or 4
Steps 5 and 6. Determine the classes and the frequency of each class.
Classes f x Class Boundaries
83 - 86 2 84.5 82.5 – 86.5
87 - 90 5 88.5 86.5 – 90.5
91 - 94 8 92.5 90.5 – 94.5
95 - 98 11 96.5 94.5 – 98.5
99 - 102 15 100.5 98.5 – 102.5
103 - 106 26 104.5 102.5 – 106.5
107 - 110 15 108.5 106.5 – 110.5
111 - 114 9 112.5 110.5 – 114.5
115 - 118 5 116.5 114.5 – 118.5
119 - 122 4 120.5 118.5 – 122.5
n = 100
Derived Frequency Distribution
we can construct other frequency distributions like the relative frequency
distribution and the cumulative frequency distribution.
Relative Frequency Distribution
shows the proportion, in percent, of the frequency of each
class to the total frequency. It is denoted by %f.
%f = (f / n) x 100
where %f – the relative frequency of each class interval
f – the frequency of each class
n – the sample size
The relative frequency of the first class interval, 11 - 22, can be obtained as follows:
%f = (3 / 60) x 100 = 5%
We can continue converting class frequency to percent, then we shall come up
with the relative frequency distribution below. (use example 1)
Classes f x Classes Boundaries %f
11 - 22 3 16.5 10.5 – 22.5 5
23 - 34 5 28.5 22.5 – 34.5 8.33
35 - 46 11 40.5 34.5 – 46.5 18.33
47 - 58 19 52.5 46.5 – 58.5 31.67
59 - 70 14 64.5 58.5 – 70.5 23.33
71 - 82 6 76.5 70.5 – 82.5 10
83 - 94 2 88.5 82.5 – 94.5 3.33
n = 60 Total = 99.99%
Cumulative Frequency Distribution
this distribution can be obtained by simply adding the class
frequencies successively.
Two types of Cumulative Frequency Distribution
1. Less than Cumulative frequency distribution
refers to the distribution whose frequencies are less than or
below the upper class boundary they correspond to.
<cumf
2. Greater than cumulative frequency distribution
refers to the distribution whose frequencies are greater than or
above the lower class boundary they correspond to.
>cumf
Less than and greater than cumulative frequency distribution
Classes f x Classes Boundaries %f <cumf >cumf
11 - 22 3 16.5 10.5 – 22.5 5 3 60
23 - 34 5 28.5 22.5 – 34.5 8.33 8 57
35 - 46 11 40.5 34.5 – 46.5 18.33 19 52
47 - 58 19 52.5 46.5 – 58.5 31.67 38 41
59 - 70 14 64.5 58.5 – 70.5 23.33 52 22
71 - 82 6 76.5 70.5 – 82.5 10 58 8
83 - 94 2 88.5 82.5 – 94.5 3.33 60 2
n = 60 Total = 99.99%
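The three derived columns can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the class frequencies are those of Example 1.

```python
from itertools import accumulate

freqs = [3, 5, 11, 19, 14, 6, 2]     # class frequencies, lowest class first
n = sum(freqs)

# Relative frequency in percent, rounded to two decimals as in the table
rel_freq = [round(f / n * 100, 2) for f in freqs]

# Less-than cumulative frequency: running total from the lowest class
less_cumf = list(accumulate(freqs))

# Greater-than cumulative frequency: running total from the highest class
greater_cumf = list(accumulate(reversed(freqs)))[::-1]

print(rel_freq)
print(less_cumf)
print(greater_cumf)
```

Because each percentage is rounded separately, the %f column adds up to 99.99 rather than exactly 100.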
HISTOGRAM
refers to the data presentation that uses bars in presenting the frequencies of
each class.
MEASURES OF CENTRAL TENDENCY
MEAN
is one of the simplest and most efficient measures of central tendency.
It is the value obtained by adding the values in the distribution and dividing the
sum by the total number of values.
MEAN FOR UNGROUPED DATA
To compute the mean for ungrouped data:
x̄ = (sum of all the values in the distribution) / (number of values in the distribution)
x̄ = ∑x / n
Example 1. Consider the following values.
21, 10, 36, 42, 39, 52, 30, 25, 26
Compute the value of the mean.
Solution: To compute the value of the mean,
x̄ = ∑x / n
= (21 + 10 + 36 + 42 + 39 + 52 + 30 + 25 + 26) / 9
= 281 / 9
x̄ = 31.22
Example 2. The ages of 15 students in a certain class were taken.
15, 18, 17, 16, 19, 21, 18, 23, 24, 18, 16, 17, 20, 21, 19
Solution: To compute the value of the mean,
x̄ = (15 + 18 + 17 + 16 + 19 + 21 + 18 + 23 + 24 + 18 + 16 + 17 + 20 + 21 + 19) / 15
= 282 / 15
x̄ = 18.80
WEIGHTED MEAN
Formula: x̄ = ∑wx / ∑w
Where x = represents the item value
w = represents the weight associated with x
Example: Suppose we are interested in computing the weighted mean grade of a
student whose grades and numbers of units per subject are shown below.
Subject No. of units (w) Grade (x)
1 3 2.0
2 3 3.0
3 5 1.25
4 1 3.0
5 2 2.5
6 3 2.5
To compute the value of the weighted mean,
x̄ = ∑wx / ∑w
= [3(2.0) + 3(3.0) + 5(1.25) + 1(3.0) + 2(2.5) + 3(2.5)] / (3 + 3 + 5 + 1 + 2 + 3)
= 36.75 / 17
x̄ = 2.16
The weighted mean can also be computed by constructing
another column representing the products of the item values and
their corresponding weights.
Example 2: Suppose we want to compute the weighted mean grade of the student
in our example using vertical addition. Let x be the grade of the student and w
be the number of units per subject.
Subject No. of units (w) Grade (x) wx
1 3 2.0 6.0
2 3 3.0 9.0
3 5 1.25 6.25
4 1 3.0 3.0
5 2 2.5 5.0
6 3 2.5 7.5
∑w = 17 ∑wx = 36.75
x̄ = ∑wx / ∑w
= 36.75 / 17
x̄ = 2.16
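As a quick check of the worked examples, the mean and the weighted mean can be sketched in Python. The function names mean and weighted_mean are illustrative assumptions, not part of the original slides.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

ages = [15, 18, 17, 16, 19, 21, 18, 23, 24, 18, 16, 17, 20, 21, 19]
grades = [2.0, 3.0, 1.25, 3.0, 2.5, 2.5]   # x: grade per subject
units = [3, 3, 5, 1, 2, 3]                 # w: number of units per subject

print(round(mean(ages), 2))                    # 18.8
print(round(weighted_mean(grades, units), 2))  # 2.16
```

Note that the unweighted mean of the six grades would be about 2.38; the five-unit subject with a grade of 1.25 pulls the weighted mean down to 2.16.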
MEAN FOR GROUPED DATA
To compute the value of the mean of data presented in a frequency
distribution, we shall consider two methods:
1. Midpoint method
2. Unit deviation method
The formula for the midpoint method is:
x̄ = ∑fx / n
Where: f – the frequency of each class
x – the midpoint of each class
n – the total number of frequencies or sample size
Steps: 1. Get the midpoint of each class.
2. Multiply each midpoint by its corresponding frequency.
3. Get the sum of the products in step 2.
4. Divide the sum obtained in step 3 by the total number of frequencies. The
result shall be rounded off to two decimal places.
Example: Consider the frequency distribution of the examination scores of the sixty
students in a statistics class. (MIDPOINT METHOD)
Solution: To compute the value of the mean,
Step 1. Get the midpoint of each class. The midpoints are shown in the third
column.
Classes f x
11 - 22 3 16.5
23 - 34 5 28.5
35 - 46 11 40.5
47 - 58 19 52.5
59 - 70 14 64.5
71 - 82 6 76.5
83 - 94 2 88.5
Step 2. Multiply each midpoint by its corresponding frequency. The products are
shown in the 4th column.
Classes f x fx
11 - 22 3 16.5 49.5
23 - 34 5 28.5 142.5
35 - 46 11 40.5 445.5
47 - 58 19 52.5 997.5
59 - 70 14 64.5 903
71 - 82 6 76.5 459
83 - 94 2 88.5 177
Step 3. Get the sum of the products in step 2.
Classes f x fx
11 - 22 3 16.5 49.5
23 - 34 5 28.5 142.5
35 - 46 11 40.5 445.5
47 - 58 19 52.5 997.5
59 - 70 14 64.5 903
71 - 82 6 76.5 459
83 - 94 2 88.5 177
n = 60 ∑fx = 3,174
Step 4. Divide the result in step 3 by the sample size. The
result is the mean of the distribution.
x̄ = ∑fx / n
= 3,174 / 60
x̄ = 52.90
Example 2: Consider the frequency distribution of the ages of 75 mayors. Compute the
mean age of the mayors.
Solution: Using the same procedure.
Classes f x fx
25 - 30 3 27.5 82.5
31 - 36 6 33.5 201
37 - 42 11 39.5 434.5
43 - 48 27 45.5 1,228.5
49 - 54 16 51.5 824
55 - 60 7 57.5 402.5
61 - 66 4 63.5 254
67 - 72 1 69.5 69.5
n = 75 ∑fx = 3,496.5
x̄ = ∑fx / n
= 3,496.5 / 75
x̄ = 46.62
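The midpoint method lends itself to a short Python sketch. The function name grouped_mean and the (lower limit, upper limit, frequency) tuple layout are assumptions made for illustration.

```python
def grouped_mean(classes):
    """Midpoint method: sum of f*x over n, where x is each class midpoint."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n

# (lower limit, upper limit, frequency) for the two worked examples
scores = [(11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
          (59, 70, 14), (71, 82, 6), (83, 94, 2)]
mayors = [(25, 30, 3), (31, 36, 6), (37, 42, 11), (43, 48, 27),
          (49, 54, 16), (55, 60, 7), (61, 66, 4), (67, 72, 1)]

print(round(grouped_mean(scores), 2))   # 52.9
print(round(grouped_mean(mayors), 2))   # 46.62
```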
UNIT DEVIATION METHOD
The formula is: x̄ = xₐ + (∑fd / n) c
Where: xₐ – the assumed mean
f – the frequency of each class
d – the unit deviation
c – the size of the class interval
n – the sample size
Follow the steps:
1. Choose an assumed mean by getting the midpoint of any interval.
2. Construct the unit deviation column.
3. Multiply the frequencies by their corresponding unit deviations. Add the
products.
4. Divide the sum in step 3 by the sample size.
5. Multiply the result in step 4 by the size of the class interval.
6. Add the value obtained in step 5 to the assumed mean. The result,
which is the mean, should be rounded off to two decimal places.
Example 1. Compute the value of the mean of the data using the unit deviation
method.
Solution:
Step 1. Choose an assumed mean. Here we take the midpoint of the class 47 - 58, so xₐ = 52.5.
Classes f
11 - 22 3
23 - 34 5
35 - 46 11
47 - 58 19
59 - 70 14
71 - 82 6
83 - 94 2
n = 60
Step 2. Construct the unit deviation column.
Classes f d
11 - 22 3 -3
23 - 34 5 -2
35 - 46 11 -1
47 - 58 19 0
59 - 70 14 1
71 - 82 6 2
83 - 94 2 3
Step 3. Multiply the frequencies by their corresponding unit deviation. Add the products.
Classes f d fd
11 - 22 3 -3 -9
23 - 34 5 -2 -10
35 - 46 11 -1 -11
47 - 58 19 0 0
59 - 70 14 1 14
71 - 82 6 2 12
83 - 94 2 3 6
∑fd = 2
Steps 4, 5 and 6.
x̄ = xₐ + (∑fd / n) c
= 52.5 + (2 / 60)(12)
= 52.5 + 24 / 60
= 52.5 + 0.4
x̄ = 52.9
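The six steps of the unit deviation method can be sketched as follows. The function name and the assumed_index parameter (the position of the class whose midpoint serves as the assumed mean) are hypothetical names introduced for this illustration.

```python
def unit_deviation_mean(midpoints, freqs, assumed_index, c):
    """Mean = assumed mean + (sum of f*d / n) * c, with d counted in class steps."""
    n = sum(freqs)
    assumed_mean = midpoints[assumed_index]
    # d is 0 at the assumed class, -1, -2, ... below it and 1, 2, ... above it
    sum_fd = sum(f * (i - assumed_index) for i, f in enumerate(freqs))
    return assumed_mean + (sum_fd / n) * c

midpoints = [16.5, 28.5, 40.5, 52.5, 64.5, 76.5, 88.5]
freqs = [3, 5, 11, 19, 14, 6, 2]

mean_a = unit_deviation_mean(midpoints, freqs, assumed_index=3, c=12)
mean_b = unit_deviation_mean(midpoints, freqs, assumed_index=2, c=12)
print(round(mean_a, 2), round(mean_b, 2))
```

Both calls should give the same mean as the midpoint method, which illustrates that the choice of assumed mean does not affect the result.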
MEDIAN
is a positional measure defined as the middlemost value in the
distribution.
MEDIAN FOR UNGROUPED DATA
The values must always be arranged in order of
magnitude, either from lowest to highest or vice versa.
Let x̃ be the median.
x̃ = x_((n + 1)/2) if n is odd
x̃ = [x_(n/2) + x_(n/2 + 1)] / 2 if n is even
Here the subscript gives the position of a value in the ordered list.
Example 1. Find the median of the following values.
21, 10, 36, 42, 39, 52, 30, 25, 26
Solution: Before identifying the value of the median, it is
necessary that the values be arranged in order of magnitude.
10, 21, 25, 26, 30, 36, 39, 42, 52
Since n = 9 and is odd,
x̃ = x_((n + 1)/2)
= x_((9 + 1)/2)
= x₅ (refers to the fifth value)
x̃ = 30
Example 2. The following values are the numbers of students in the first 8 classes in a
certain college taken for inspection:
21, 25, 26, 30, 36, 39, 42, 55
Determine the median.
Solution: The values are already arranged in order of magnitude. Since n = 8 and is
even,
x̃ = [x_(n/2) + x_(n/2 + 1)] / 2
= [x_(8/2) + x_(8/2 + 1)] / 2
= (x₄ + x₅) / 2
= (30 + 36) / 2
x̃ = 33
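The odd and even cases can be checked with a small Python sketch. The function name median is an illustrative choice; Python lists are indexed from 0, so position k in the ordered list is index k - 1.

```python
def median(values):
    """Middlemost value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # position (n + 1)/2
    return (ordered[mid - 1] + ordered[mid]) / 2  # positions n/2 and n/2 + 1

print(median([21, 10, 36, 42, 39, 52, 30, 25, 26]))  # 30
print(median([21, 25, 26, 30, 36, 39, 42, 55]))      # 33.0
```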
MEDIAN FOR GROUPED DATA
The computing formula for grouped data is given below.
x̃ = x₁ + ((n/2 − cumfₐ) / f) c
Where:
x₁ - the lower boundary of the median class
cumfₐ - the cumulative frequency before the median class
f – the frequency of the median class
c – the size of the class interval
n – the sample size
To be able to apply the formula, we shall follow the steps below.
1. Get ½ of the total number of values.
2. Determine the value of cumfₐ.
3. Determine the median class.
4. Determine the lower boundary and the frequency of the median class and the size of the class
interval.
5. Substitute the values obtained in steps 1-4. Round off the final result to two decimal places.
Example: Compute the value of the median of the examination scores of the
students in Statistics.
Solution: We shall first construct the less than cumulative frequency column.
Classes f <cumf
11 - 22 3 3
23 - 34 5 8
35 - 46 11 19 (cumfₐ = 19)
47 - 58 19 38 (median class, f = 19)
59 - 70 14 52
71 - 82 6 58
83 - 94 2 60
Steps:
1. n / 2 = 60 / 2 = 30
2. cumfₐ = 19
3. Median class: 47 – 58
4. x₁ = 46.5 ; f = 19 ; c = 12
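The final substitution can be sketched in Python, assuming the usual grouped-data median formula, lower boundary + ((n/2 − cumulative frequency before) / f) × c. The function and variable names are hypothetical.

```python
def grouped_median(lower_boundaries, freqs, c):
    """Grouped median: x1 + ((n/2 - cumf_before) / f) * c for the median class."""
    half = sum(freqs) / 2
    cum_before = 0
    for x1, f in zip(lower_boundaries, freqs):
        if cum_before + f >= half:      # first class whose <cumf reaches n/2
            return x1 + (half - cum_before) / f * c
        cum_before += f

lower_boundaries = [10.5, 22.5, 34.5, 46.5, 58.5, 70.5, 82.5]
freqs = [3, 5, 11, 19, 14, 6, 2]

median_score = grouped_median(lower_boundaries, freqs, c=12)
print(round(median_score, 2))  # 53.45
```

With the values from the steps above, this is 46.5 + ((30 − 19) / 19)(12), or about 53.45.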
Example 2. A researcher is conducting an investigation regarding the income of the
alumni of a certain university 5 years after graduation. The monthly incomes of the
200 respondents were taken and are presented below.
Classes f
3,500 – 4,999 6
5,000 – 6,499 23
6,500 – 7,999 36
8,000 – 9,499 40
9,500 – 10,999 59
11,000 – 12,499 20
12,500 – 13,999 8
14,000 – 15,499 6
15,500 – 16,999 2
n = 200
Determine the median of the monthly income of the 200 respondents.
Solution: By using the same procedure.
Classes f <cumf
3,500 – 4,999 6 6
5,000 – 6,499 23 29
6,500 – 7,999 36 65
8,000 – 9,499 40 105 (median class)
9,500 – 10,999 59 164
11,000 – 12,499 20 184
12,500 – 13,999 8 192
14,000 – 15,499 6 198
15,500 – 16,999 2 200
Steps:
1. n / 2 = 200 / 2 = 100
2. cumfₐ = 65
3. Median class: 8,000 – 9,499
4. x₁ = 7,999.5 ; f = 40 ; c = 1,500
MODE
this type of average is the simplest both in concept and in application.
MODE FOR UNGROUPED DATA
the value of the mode can be obtained through inspection, thus, no computation
is needed.
Example 1. Consider the following sets of measurements.
a: 21, 23, 16, 15, 26, 27, 19, 24
b: 31, 21, 16, 15, 21, 27, 19, 18
c: 17, 25, 24, 25, 27, 19, 19, 24
Solution:
a: There is no mode, since no value occurs more than once.
b: x̂ = 21
c: x̂ = 25, 19, 24 (each occurs twice, so the set has three modes)
MODE FOR GROUPED DATA
it is necessary to identify the class interval that contains the mode.
MODAL CLASS
is the class that contains the highest frequency in the distribution.
To be able to apply the formula, we shall consider the following steps:
1. Determine the modal class.
2. Get the value of d₁, the frequency of the modal class minus the frequency of the class before it.
3. Get the value of d₂, the frequency of the modal class minus the frequency of the class after it.
4. Get the lower boundary of the modal class.
5. Apply the formula by substituting the values obtained in the preceding steps.
Example 1: Consider the frequency distribution of the examination scores of sixty
students. Compute the mode of that distribution.
Solution: The frequency distribution of the data is reproduced below.
Classes f
11 - 22 3
23 - 34 5
35 - 46 11
47 - 58 19 (modal class)
59 - 70 14
71 - 82 6
83 - 94 2
To get the value of d₁ and d₂, we have
d₁ = 19 – 11 = 8
d₂ = 19 – 14 = 5
Substituting these values:
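The substitution step can be sketched in Python. The slides do not show the formula itself, so this sketch assumes the commonly used grouped-data mode formula, lower boundary + (d₁ / (d₁ + d₂)) × c; the function name grouped_mode is hypothetical.

```python
def grouped_mode(lower_boundaries, freqs, c):
    """Assumed formula: x1 + d1 / (d1 + d2) * c for the modal class."""
    i = freqs.index(max(freqs))                          # modal class position
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before                               # 19 - 11 = 8
    d2 = freqs[i] - after                                # 19 - 14 = 5
    return lower_boundaries[i] + d1 / (d1 + d2) * c

lower_boundaries = [10.5, 22.5, 34.5, 46.5, 58.5, 70.5, 82.5]
freqs = [3, 5, 11, 19, 14, 6, 2]

mode_score = grouped_mode(lower_boundaries, freqs, c=12)
print(round(mode_score, 2))  # 53.88
```

Under that assumption the mode is 46.5 + (8 / 13)(12), or about 53.88.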
QUARTILES
refer to the values that divide the distribution into four equal parts. There are 3
quartiles, represented by Q₁, Q₂, and Q₃.
Q₁ refers to the value below which the first one fourth of the
distribution, arranged in magnitude, falls.
Q₂ this value corresponds to the median.
Q₃ this value corresponds to three fourths of the distribution.
| <…………………..3/4……………….>|
|<……………1/2…………..>|
|<.…1/4…..>|
The Procedure for Computing the First (Q₁), Second (Q₂), and Third (Q₃)
Quartiles in a Given Set of Data
For grouped data, the procedure for computing the values of the first and the third
quartiles is similar to that for computing the value of the median. The computing formula for
the kth quartile, where k = 1, 2, 3, is given by
Qₖ = x₁ + ((kn/4 − cumfₐ) / fₒ) c
Where x₁ - lower boundary of the kth quartile class
cumfₐ - cumulative frequency before the kth quartile class
fₒ - frequency of the kth quartile class
c - size of the class interval
Example 1. For purposes of illustration, let us again reproduce the less than cumulative
frequency distribution of the examination results of the 60 students and compute
the values of the first quartile and the third quartile.
Solution: The frequency distribution is reproduced below.
Classes f <cumf
11 - 22 3 3
23 - 34 5 8
35 - 46 11 19
47 - 58 19 38
59 - 70 14 52
71 - 82 6 58
83 - 94 2 60
To compute the value of Q₁, we shall follow the procedure used in computing the
value of the median.
Steps:
1. Get ¼ of the total number of frequencies.
n/4 = 60/4 = 15
2. Get the value of the cumulative frequency before the first quartile class.
cumfₐ = 8
3. Determine the first quartile class.
1st quartile class: 35 – 46
4. Determine the lower boundary of the first quartile class.
x₁ₐ = 34.5
5. Get the frequency of the first quartile class.
fₐ₁ = 11
6. Substitute all values and compute.
To compute the third quartile , we shall follow the procedure used in computing
the value of the first quartile:
Steps:
1. 3n/4 = 3(60)/4 = 180/4 = 45
2. cumfₐ = 38
3. Third quartile class: 59 – 70
4. x₁ₐ = 58.5
5. fₐ₃ = 14
6. Substitute all values and compute.
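The substitution for Q₁ and Q₃ can be sketched in Python, assuming the same form as the grouped median formula with kn/4 in place of n/2. The function name grouped_quartile is hypothetical.

```python
def grouped_quartile(k, lower_boundaries, freqs, c):
    """kth quartile of grouped data: x1 + ((k*n/4 - cumf_before) / f) * c."""
    target = k * sum(freqs) / 4
    cum_before = 0
    for x1, f in zip(lower_boundaries, freqs):
        if cum_before + f >= target:    # first class whose <cumf reaches kn/4
            return x1 + (target - cum_before) / f * c
        cum_before += f

lower_boundaries = [10.5, 22.5, 34.5, 46.5, 58.5, 70.5, 82.5]
freqs = [3, 5, 11, 19, 14, 6, 2]

q1 = grouped_quartile(1, lower_boundaries, freqs, c=12)
q3 = grouped_quartile(3, lower_boundaries, freqs, c=12)
print(round(q1, 2), round(q3, 2))  # 42.14 64.5
```

These correspond to 34.5 + ((15 − 8) / 11)(12) for Q₁ and 58.5 + ((45 − 38) / 14)(12) for Q₃.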
DECILES
If a set of data is divided into ten equal parts, then we have nine points of
division, called deciles. The method of computing the values of these measures is just the
same as in the median or quartiles.
Example: Using the same frequency distribution as in the preceding example,
determine the value of the following:
a. D₃
b. D₅
Solution: the frequency distribution is reproduced below.
Classes f <cumf
11 - 22 3 3
23 - 34 5 8
35 - 46 11 19
47 - 58 19 38
59 - 70 14 52
71 - 82 6 58
83 - 94 2 60
a. To compute the value of D₃, we have
1. 3n/10 = 3(60)/10 = 180/10 = 18
2. cumfₐ= 8
3. 3rd decile class: 35 – 46
4. x₁ₐ = 34.5
5. fₐ₃ = 11
b. To compute the value of D₅, we shall have
1. 5n/10 = 5(60)/10 = 300/10 = 30
2. cumfₐ = 19
3. Fifth decile class: 47 – 58
4. x₁ₐ = 46.5 ; fₒ₅ = 19
PERCENTILES
refer to those values that divide a distribution into one hundred equal parts.
Example: Determine the value of the 43rd percentile using the same frequency
distribution of the median, quartile or decile.
Solution: The frequency distribution is reproduced below.
Classes f <cumf
11 - 22 3 3
23 - 34 5 8
35 - 46 11 19
47 - 58 19 38
59 - 70 14 52
71 - 82 6 58
83 - 94 2 60
To compute the value of P₄₃, we have
Steps:
1. 43n/100 = 43(60)/100 = 2,580/100 = 25.8
2. cumfₐ = 19
3. 43rd percentile class: 47 – 58
4. x₁ₐ = 46.5
5. fₐ₄₃ = 19
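The remaining substitution can be sketched in Python, again assuming the median-style formula with pn/100 as the target position. Since the kth decile is the (10k)th percentile, the same hypothetical function grouped_percentile also checks D₃ and D₅ from the decile example.

```python
def grouped_percentile(p, lower_boundaries, freqs, c):
    """pth percentile of grouped data: x1 + ((p*n/100 - cumf_before) / f) * c."""
    target = p * sum(freqs) / 100
    cum_before = 0
    for x1, f in zip(lower_boundaries, freqs):
        if cum_before + f >= target:    # first class whose <cumf reaches pn/100
            return x1 + (target - cum_before) / f * c
        cum_before += f

lower_boundaries = [10.5, 22.5, 34.5, 46.5, 58.5, 70.5, 82.5]
freqs = [3, 5, 11, 19, 14, 6, 2]

p43 = grouped_percentile(43, lower_boundaries, freqs, c=12)
d3 = grouped_percentile(30, lower_boundaries, freqs, c=12)   # D3 = P30
d5 = grouped_percentile(50, lower_boundaries, freqs, c=12)   # D5 = P50 = median
print(round(p43, 2), round(d3, 2), round(d5, 2))  # 50.79 45.41 53.45
```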
SEMI-INTERQUARTILE RANGE OR QUARTILE DEVIATION
this value is obtained by getting one half the difference between the third and
the first quartiles.
Q = (Q₃ − Q₁) / 2
Example 1: The examination scores of 50 students in a statistics class resulted in
the following values: Q₃ = 75.43 and Q₁ = 54.24. Determine the value of the
semi-interquartile range.
Solution: To compute the value of Q,
Q = (Q₃ − Q₁) / 2
Q = (75.43 – 54.24) / 2
Q = 21.19 / 2
Q = 10.60
MEASURES OF VARIATION
these are the values used to determine the scatter of values in a distribution.
RANGE
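R = H – L
Example 1. Determine the value of the range of the data.
Solution:
Classes f
11 - 22 3
23 - 34 5
35 - 46 11
47 - 58 19
59 - 70 14
71 - 82 6
83 - 94 2
n = 60
For grouped data, H is taken as the upper boundary of the highest class and L as the
lower boundary of the lowest class.
R = H – L
= 94.5 – 10.5
R = 84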
Example 2: suppose the performance ratings of 100 faculty members of a certain
college were taken and are presented in frequency distribution as follows:
Classes F
71 - 74 3
75 - 78 10
79 - 82 13
83 - 86 18
87 - 90 25
91 - 94 19
95 - 98 12
n = 100
Compute the value of the semi-inter quartile range.
Solution: We will first compute the values of Q₁ and Q₃, since only the frequency
distribution is given. Since n/4 = 25 and 3n/4 = 75, the first quartile class is 79 - 82
and the third quartile class is 91 - 94.
Classes f <cumf
71 - 74 3 3
75 - 78 10 13
79 - 82 13 26 (1st quartile class)
83 - 86 18 44
87 - 90 25 69
91 - 94 19 88 (3rd quartile class)
95 - 98 12 100
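The rest of this computation can be sketched in Python, assuming the grouped quartile formula, lower boundary + ((kn/4 − cumulative frequency before) / f) × c, with class size c = 4. The function name grouped_quartile is hypothetical.

```python
def grouped_quartile(k, lower_boundaries, freqs, c):
    """kth quartile of grouped data: x1 + ((k*n/4 - cumf_before) / f) * c."""
    target = k * sum(freqs) / 4
    cum_before = 0
    for x1, f in zip(lower_boundaries, freqs):
        if cum_before + f >= target:
            return x1 + (target - cum_before) / f * c
        cum_before += f

# Performance ratings of the 100 faculty members
lower_boundaries = [70.5, 74.5, 78.5, 82.5, 86.5, 90.5, 94.5]
freqs = [3, 10, 13, 18, 25, 19, 12]

q1 = grouped_quartile(1, lower_boundaries, freqs, c=4)
q3 = grouped_quartile(3, lower_boundaries, freqs, c=4)
semi_iqr = (q3 - q1) / 2        # Q = (Q3 - Q1) / 2
print(round(q1, 2), round(q3, 2), round(semi_iqr, 2))  # 82.19 91.76 4.79
```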
AVERAGE DEVIATION
refers to the arithmetic mean of the absolute deviations of the values from the
mean of the distribution.
AVERAGE DEVIATION FOR UNGROUPED DATA
AD = ∑│x − x̄│ / n
We shall follow the steps below:
1. Arrange the values in a column according to magnitude.
2. Compute the value of the mean (x̄).
3. Determine the deviations (x − x̄).
4. Convert the deviations in step 3 into positive deviations. Use the absolute value
sign │x − x̄│.
5. Get the sum of the absolute deviations in step 4.
6. Divide the sum in step 5 by n.
Example. Consider the following values.
x: 13, 16, 9, 6, 15, 7, 11
Determine the value of the average deviation.
Solution: First, we arrange the values in vertical column and then we compute the
value of the mean.
x
6
7
9
11
13
15
16
∑x = 77
x̄ = ∑x / n = 77 / 7 = 11
We get the deviations of the individual items from the mean.
x x − x̄
6 6 – 11 = -5
7 7 – 11 = -4
9 9 – 11 = -2
11 11 – 11 = 0
13 13 – 11 = 2
15 15 – 11 = 4
16 16 – 11 = 5
Notice that some of the deviations from the mean are negative. Hence, we convert
all deviations into positive deviations by introducing the absolute
value sign, and then add all these absolute deviations.
x x − x̄ │x − x̄│
6 -5 5
7 -4 4
9 -2 2
11 0 0
13 2 2
15 4 4
16 5 5
∑│x − x̄│ = 22
If we divide the sum of the absolute deviations by n, we obtain
the value of the average deviation.
AD = ∑│x − x̄│ / n = 22 / 7 = 3.14
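The six steps can be condensed into a short Python sketch. The function name average_deviation is an illustrative assumption; sorting is skipped because the order of the values does not affect the result.

```python
def average_deviation(values):
    """Mean of the absolute deviations of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

x = [13, 16, 9, 6, 15, 7, 11]
ad = average_deviation(x)
print(round(ad, 2))  # 3.14
```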
AVERAGE DEVIATION FOR GROUPED DATA
For grouped data, the computing formula for the mean absolute deviation or
average deviation is given by:
AD = ∑f│x − x̄│ / n
We shall follow the steps below.
1. Compute the value of the mean.
2. Get the deviations by using the expression x − x̄, then take their absolute values.
3. Multiply each absolute deviation by its corresponding frequency.
4. Add the results in step 3.
5. Divide the sum in step 4 by n.
Example: Compute the value of the average deviation of the frequency distribution.
Solution: First, we compute the value of the mean.
Classes      f      x        fx
11 – 22      3     16.5      49.5
23 – 34      5     28.5     142.5
35 – 46     11     40.5     445.5
47 – 58     19     52.5     997.5
59 – 70     14     64.5     903
71 – 82      6     76.5     459
83 – 94      2     88.5     177
         n = 60        ∑fx = 3,174
x̄ = ∑fx / n = 3,174 / 60 = 52.9
Second, we construct the deviation column x - x̄.
Classes      f      x        fx      x - x̄
11 – 22      3     16.5      49.5    -36.4
23 – 34      5     28.5     142.5    -24.4
35 – 46     11     40.5     445.5    -12.4
47 – 58     19     52.5     997.5     -0.4
59 – 70     14     64.5     903       11.6
71 – 82      6     76.5     459       23.6
83 – 94      2     88.5     177       35.6
Third, we convert the deviations to positive deviations.
Classes      f      x        fx      x - x̄    │x - x̄│
11 – 22      3     16.5      49.5    -36.4     36.4
23 – 34      5     28.5     142.5    -24.4     24.4
35 – 46     11     40.5     445.5    -12.4     12.4
47 – 58     19     52.5     997.5     -0.4      0.4
59 – 70     14     64.5     903       11.6     11.6
71 – 82      6     76.5     459       23.6     23.6
83 – 94      2     88.5     177       35.6     35.6
Fourth, we multiply the positive deviations by their corresponding frequencies.
Classes      f      x        fx      x - x̄    │x - x̄│    f│x - x̄│
11 – 22      3     16.5      49.5    -36.4     36.4       109.2
23 – 34      5     28.5     142.5    -24.4     24.4       122
35 – 46     11     40.5     445.5    -12.4     12.4       136.4
47 – 58     19     52.5     997.5     -0.4      0.4         7.6
59 – 70     14     64.5     903       11.6     11.6       162.4
71 – 82      6     76.5     459       23.6     23.6       141.6
83 – 94      2     88.5     177       35.6     35.6        71.2
                                          ∑f│x - x̄│ = 750.4
Finally, we divide the sum by n.
AD = ∑f│x - x̄│ / n
   = 750.4 / 60
AD = 12.51
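As a cross-check of the grouped computation, the same steps can be sketched in Python. The function name grouped_average_deviation and the representation of each class as a (lower limit, upper limit) pair are assumptions made for this illustration; the midpoint of each class is taken as the average of its two limits.

```python
def grouped_average_deviation(classes, freqs):
    """Average deviation of grouped data: sum(f * |x - mean|) / n, x = class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    return sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n


classes = [(11, 22), (23, 34), (35, 46), (47, 58), (59, 70), (71, 82), (83, 94)]
freqs = [3, 5, 11, 19, 14, 6, 2]
print(round(grouped_average_deviation(classes, freqs), 2))  # 12.51
```

The second class contributes 5 × │28.5 - 52.9│ = 5 × 24.4 = 122 to the sum, so the total of the f│x - x̄│ column is 750.4 and 750.4 / 60 rounds to 12.51.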
VARIANCE FOR UNGROUPED DATA
If we let s² be the variance, then we have
s² = ∑(x - x̄)² / n
We shall consider the following steps.
1. Compute the value of the mean.
2. Get the deviation of each value from the mean.
3. Square the deviations.
4. Calculate the sum of the squared deviations.
5. Divide the sum by the total number of values.
Example: Compute the value of the variance of the following measurements.
13, 5, 7, 9, 10, 17, 15, 12
Solution: For simplicity, we first arrange these values by magnitude in a vertical
column and then follow the steps indicated above.
 x     x - x̄    (x - x̄)²
 5      -6         36
 7      -4         16
 9      -2          4
10      -1          1
12       1          1
13       2          4
15       4         16
17       6         36
∑x = 88        ∑(x - x̄)² = 114
x̄ = ∑x / n = 88 / 8 = 11
s² = ∑(x - x̄)² / n = 114 / 8 = 14.25
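The five steps for the variance can be sketched in Python as follows. The function name variance is chosen only for this illustration. Note that the sketch divides by n, as the formula in this section does, and not by n - 1.

```python
def variance(values):
    """Variance of ungrouped data with divisor n: sum((x - mean)^2) / n."""
    n = len(values)
    mean = sum(values) / n                            # step 1
    squared_devs = [(x - mean) ** 2 for x in values]  # steps 2 and 3
    return sum(squared_devs) / n                      # steps 4 and 5


measurements = [13, 5, 7, 9, 10, 17, 15, 12]
print(variance(measurements))  # 14.25
```

Python's standard library function statistics.pvariance uses the same divisor n and should give the same value for these measurements.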
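The variance can also be computed directly from the values, without first getting the
deviations, by using the equivalent formula:
s² = ∑x² / n - (∑x / n)²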
Example: Find the value of the variance of the distribution used in example 1 of this
section.
Solution: By substituting into the formula above, we shall have
 x      x²
 5      25
 7      49
 9      81
10     100
12     144
13     169
15     225
17     289
∑x = 88    ∑x² = 1,082
s² = ∑x² / n - (∑x / n)²
   = 1,082 / 8 - (88 / 8)²
   = 135.25 - 7,744 / 64
   = 135.25 - 121
s² = 14.25
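The computation above needs only two running totals, ∑x and ∑x². A minimal Python sketch is given below; the function name variance_from_sums is an assumption made for this illustration.

```python
def variance_from_sums(values):
    """Variance with divisor n, computed as sum(x^2)/n - (sum(x)/n)^2."""
    n = len(values)
    sum_x = sum(values)                   # 88 for the example data
    sum_x2 = sum(x * x for x in values)   # 1,082 for the example data
    return sum_x2 / n - (sum_x / n) ** 2


measurements = [13, 5, 7, 9, 10, 17, 15, 12]
print(variance_from_sums(measurements))  # 14.25
```

The result agrees with the value obtained from the squared deviations, which is expected because the two formulas are algebraically equivalent.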
VARIANCE FOR GROUPED DATA
s² = ∑f(x - x̄)² / n
We shall consider the steps below.
1. Compute the value of the mean.
2. Determine the deviation x - x̄ by subtracting the mean from
the midpoint of each class interval.
3. Square the deviation obtained in step 2.
4. Multiply each squared deviation by its corresponding
frequency.
5. Add the results in step 4.
6. Divide the result in step 5 by the sample size.
Example. Calculate the value of the variance of the distribution.
Solution: First, we shall reproduce the frequency distribution.
Classes      f      x        fx      x - x̄    (x - x̄)²     f(x - x̄)²
11 – 22      3     16.5      49.5    -36.4    1,324.96     3,974.88
23 – 34      5     28.5     142.5    -24.4      595.36     2,976.8
35 – 46     11     40.5     445.5    -12.4      153.76     1,691.36
47 – 58     19     52.5     997.5     -0.4        0.16         3.04
59 – 70     14     64.5     903       11.6      134.56     1,883.84
71 – 82      6     76.5     459       23.6      556.96     3,341.76
83 – 94      2     88.5     177       35.6    1,267.36     2,534.72
         n = 60        ∑fx = 3,174          ∑f(x - x̄)² = 16,406.4
x̄ = ∑fx / n = 3,174 / 60 = 52.9
s² = ∑f(x - x̄)² / n = 16,406.4 / 60 = 273.44
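The grouped computation can be checked with a short Python sketch. The function name grouped_variance and the (lower limit, upper limit) representation of the classes are assumptions made for this illustration.

```python
def grouped_variance(classes, freqs):
    """Variance of grouped data with divisor n: sum(f * (x - mean)^2) / n."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    return sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n


classes = [(11, 22), (23, 34), (35, 46), (47, 58), (59, 70), (71, 82), (83, 94)]
freqs = [3, 5, 11, 19, 14, 6, 2]
print(round(grouped_variance(classes, freqs), 2))  # 273.44
```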
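s² = [∑fd² / n - (∑fd / n)²] c²
Here d is the unit deviation of each class and c is the class size (the width of
each class interval).
The procedure for the computation of the variance using the unit
deviation method is as follows: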
1. Determine the unit deviation column.
2. Multiply the frequency by its corresponding unit deviation.
3. Square the unit deviation.
4. Multiply the squared unit deviation by its corresponding
frequency.
5. Add the result in step 2.
6. Add the result in step 4.
7. Apply the formula through substitution.
Example: Using the data in example 3, compute the value of the variance using the
unit deviation method.
Solution:
Classes      f      d      fd      d²     fd²
11 – 22      3     -3      -9      9      27
23 – 34      5     -2     -10      4      20
35 – 46     11     -1     -11      1      11
47 – 58     19      0       0      0       0
59 – 70     14      1      14      1      14
71 – 82      6      2      12      4      24
83 – 94      2      3       6      9      18
         n = 60      ∑fd = 2      ∑fd² = 114
s² = [∑fd² / n - (∑fd / n)²] c²
s² = [114 / 60 - (2 / 60)²] (12)²
s² = 273.44
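The unit deviation method can be sketched in Python as shown below. The function name and parameter names are assumptions made for this illustration. The unit deviations are entered directly as in the table, with d = 0 assigned to the class 47 – 58, and the class size is taken as 12.

```python
def variance_unit_deviation(freqs, unit_devs, class_size):
    """Grouped variance: [sum(f*d^2)/n - (sum(f*d)/n)^2] * c^2."""
    n = sum(freqs)
    sum_fd = sum(f * d for f, d in zip(freqs, unit_devs))       # step 5
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, unit_devs))  # step 6
    return (sum_fd2 / n - (sum_fd / n) ** 2) * class_size ** 2  # step 7


freqs = [3, 5, 11, 19, 14, 6, 2]
unit_devs = [-3, -2, -1, 0, 1, 2, 3]
print(round(variance_unit_deviation(freqs, unit_devs, 12), 2))  # 273.44
```

The value matches the one obtained from the squared deviations of the midpoints, while the arithmetic involves only small whole numbers.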
STANDARD DEVIATION
The standard deviation is obtained by extracting the square root of the value of the
variance.
s = √s²
Example. Suppose the value of the variance of a set of
measurements was computed to be equal to 128.93.
Determine the value of the standard deviation.
Solution: The standard deviation is simply the square root
of the variance.
s = √s² = √128.93 = 11.35
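For completeness, the same step in Python is a single square root. The function name standard_deviation is an assumption made for this sketch.

```python
import math


def standard_deviation(variance_value):
    """Standard deviation as the square root of a given variance."""
    return math.sqrt(variance_value)


print(round(standard_deviation(128.93), 2))  # 11.35
```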