STATISTICS AS A TOOL
DATA MANAGEMENT
PART 1
INTRODUCTION
01 BASIC STATISTICAL CONCEPTS
03 MEASURES OF DISPERSION
04 NORMAL DISTRIBUTION
LEARNING OUTCOME:
1. Organize and present data in forms that are both meaningful and
useful to decision makers;
2. Use a variety of statistical tools to process and manage numerical
data;
3. Use the methods of linear regression and correlations to predict the
value of a variable given certain conditions; and
4. Advocate the use of statistical data in making important decisions.
BASIC STATISTICAL CONCEPTS
Statistics is a branch of mathematics that deals with
the theory and method of collecting, organizing,
presenting, analyzing, and interpreting data.
Infinite Population
Example:
ID number of a student, numbers on the uniform
jerseys of basketball players, and plate numbers of
vehicles.
FOUR LEVELS OF MEASUREMENT
• ORDINAL LEVEL the numbers are used not only to
classify items but also to reflect some rank or order of
the individuals or objects.
Example:
Winners in a singing contest, hotel classifications,
military ranks.
• INTERVAL LEVEL the measurements have all the
properties of ordinal data; in addition they also
measure the degree of differences between any two
classes.
Example:
Temperature reading in Celsius, Scores in
Intelligence tests, and Scholastic Grades of students.
• RATIO LEVEL has the same properties as interval level
but the zero point value of this level is absolute.
Example:
Height, Weight, Time, and Volume.
PRESENTATION OF DATA
1. TEXTUAL PRESENTATION
- presents data in a paragraph form which combines text and numerical
facts in statistical report.
Example:
Data in business, Finance, Economics, or Industries
2. TABULAR PRESENTATION
- presents data in tables. It gives more precise, systematic, and orderly
presentation of data in rows and columns.
3. GRAPHICAL PRESENTATION
- is an effective method of presenting statistical results and can present
clear pictures of the data. There are several kinds of graphs, and some of these
are as follows:
a. Bar Graph consists of bars drawn either vertically or horizontally
and is usually constructed for comparative purposes. The lengths of
the bars represent the frequencies or magnitudes of the quantities
being compared.
b. Line Graph shows the relationship between two or more sets of
quantities. It may show the relationship between two variables, and
it is best used to establish trends.
c. Pie Chart is used to represent quantities that make up a whole.
It is a circular diagram cut into subdivisions. It can be constructed
using percentages or the actual figures.
[Figure: pie chart. Legible sector labels: INFO. TECH 6%, CRIMINOLOGY 30%]
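When a pie chart is constructed from percentages, each sector's central angle is that percentage of 360 degrees. The sketch below is only an illustration of that conversion; the function name `pie_angles` is hypothetical, and it uses only the two sector labels that are legible in the figure, not the full chart.

```python
def pie_angles(percents):
    """Convert category percentages to central angles in degrees."""
    # Each sector takes the same share of 360 degrees as its share of the whole.
    return {label: pct * 360 / 100 for label, pct in percents.items()}


# Only two sectors are legible in the source figure; the remaining
# sectors of the chart are not reproduced here.
visible_sectors = {"INFO. TECH": 6, "CRIMINOLOGY": 30}
angles = pie_angles(visible_sectors)

for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

A 30% sector therefore spans 108 degrees, and a 6% sector spans 21.6 degrees.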
MEASURES OF CENTRAL TENDENCY
A measure of central tendency is one type of measure used to describe
a data set. It yields information about the center or
majority of a group of numbers.
The median is the middlemost value in the data set. It is found between the highest and the lowest
value in a rank-ordered distribution and divides the distribution into two equal parts.
Example:
Find the median of the following set of measurements.
25, 41, 56, 34, 28, 67, 49, 37, 52
Solution:
Arrange the data in ascending order
25, 28, 34, 37, 41, 49, 52, 56, 67
Locate the middlemost value. The middlemost value is the median.
Median: x̃ = 41
MEDIAN
Example:
Find the median of the given data set.
4.5, 2.8, 5.6, 9.2, 3.5, 6.7, 3.9, 8.4
Solution:
Arrange the data in ascending order
2.8, 3.5, 3.9, 4.5, 5.6, 6.7, 8.4, 9.2
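Locate the middlemost value. Because there are eight values, no single value is in the middle;
the median is the mean of the two middle values, 4.5 and 5.6.
Median: x̃ = (4.5 + 5.6) / 2 = 5.05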
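The two median examples can be checked with a short Python sketch. The helper name `median_of` is hypothetical; it sorts the data, then takes the middle value for an odd count or the mean of the two middle values for an even count. The standard library's `statistics.median` is used only as a cross-check.

```python
import statistics


def median_of(values):
    """Return the median: the middle value of the sorted data, or the
    mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_set = [25, 41, 56, 34, 28, 67, 49, 37, 52]          # nine values
even_set = [4.5, 2.8, 5.6, 9.2, 3.5, 6.7, 3.9, 8.4]     # eight values

median_odd = median_of(odd_set)
median_even = median_of(even_set)

# Cross-check against the standard library implementation.
assert median_odd == statistics.median(odd_set)
assert abs(median_even - statistics.median(even_set)) < 1e-12

print(median_odd)             # 41
print(round(median_even, 2))  # 5.05
```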
Example:
Solution:
a. In the first data set, 12 has the highest frequency in the distribution;
therefore, the mode is
Mode: x̂ = 12
MODE
Example:
Find the mode of the following data sets.
b. 3.4, 2.2, 3.5, 3.4, 2.2, 2.6, 2.1, 3.9, 2.2, 3.4
Solution:
b. In the second data set, two values have the highest frequency (2.2 and 3.4 each occur three times);
therefore, there are two modes and the distribution is called bimodal.
The modes are 2.2 and 3.4.
Solution:
c. In the third data set, no value occurs more often than the others; therefore, there is
no mode in the distribution.
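The three cases (one mode, two modes, no mode) can be illustrated with a small frequency count. This is a sketch under stated assumptions: the function name `modes` is hypothetical, "no mode" is taken to mean that every distinct value occurs equally often, and `hypothetical_no_mode` is made-up data because the first and third data sets are not shown in the text.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes of a non-empty data set;
    an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumption: if several distinct values all occur equally often,
    # no value stands out, so the data set has no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Data set b from the example.
set_b = [3.4, 2.2, 3.5, 3.4, 2.2, 2.6, 2.1, 3.9, 2.2, 3.4]
# Hypothetical data, used only to show the no-mode case.
hypothetical_no_mode = [5, 7, 9, 11]

print(modes(set_b))                 # [2.2, 3.4]
print(modes(hypothetical_no_mode))  # []
```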
RANGE
R = HV − LV, where HV is the highest value and LV is the lowest value.
Set A: 81, 83, 87, 90, 94
Set B: 84, 86, 87, 88, 90
Set C: 85, 86, 87, 88, 89
Solution:
Set A: Range = HV − LV
             = 94 − 81
             = 13
Set B: Range = HV − LV
             = 90 − 84
             = 6
Set C: Range = HV − LV
             = 89 − 85
             = 4
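The range computation for the three sets can be sketched in a few lines of Python. The function name `value_range` is hypothetical. Set B is entered with the five values that are legible in the text; the range depends only on the highest and lowest values, so it is 6 either way.

```python
def value_range(values):
    """Range = highest value (HV) minus lowest value (LV)."""
    return max(values) - min(values)


data_sets = {
    "A": [81, 83, 87, 90, 94],
    "B": [84, 86, 87, 88, 90],   # five legible values, as read from the text
    "C": [85, 86, 87, 88, 89],
}

ranges = {name: value_range(values) for name, values in data_sets.items()}

for name, r in ranges.items():
    print(f"Set {name}: Range = {r}")
```

All three sets have the same middle value, 87, but Set A is the most spread out and Set C the least.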
x       x − x̄     (x − x̄)²
77      −14       196
84      −7        49
91      0         0
98      7         49
105     14        196
Total             Σ(x − x̄)² = 490
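The table of deviations can be reproduced step by step in Python. The text gives only the sum of squared deviations, so the last step below is an assumption: the sketch shows both the population form (divide by n) and the sample form (divide by n − 1) of the variance, without claiming which one the lesson intends.

```python
import math

data = [77, 84, 91, 98, 105]

mean = sum(data) / len(data)                  # 91.0
deviations = [x - mean for x in data]         # x minus the mean
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)      # 490.0, the table total

# Assumption: the source does not state which divisor is used,
# so both common conventions are shown.
population_variance = sum_of_squares / len(data)        # divide by n
sample_variance = sum_of_squares / (len(data) - 1)      # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(sum_of_squares)
print(population_variance, round(population_sd, 4))
print(sample_variance, round(sample_sd, 4))
```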
THE NORMAL DISTRIBUTION
- The normal distribution is also called the Gaussian distribution, after
C.F. Gauss (1777 – 1855), who proposed it as a model for the relative
frequency distribution of errors, such as errors of measurement.
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$
Where:
Y = the height of the curve at a particular value of X
X = any score in the distribution
σ = the standard deviation of the population
μ = the population mean
π ≈ 3.1416
e ≈ 2.7183
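The equation of the normal curve can be evaluated directly. The following is a minimal sketch; the function name `normal_curve_height` and its default arguments (mean 0 and standard deviation 1, the standard normal curve) are illustrative choices, not part of the lesson.

```python
import math


def normal_curve_height(x, mu=0.0, sigma=1.0):
    """Height Y of the normal curve at score X = x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


# The curve is highest at the mean and symmetrical about it.
peak = normal_curve_height(0.0)
left = normal_curve_height(-1.0)
right = normal_curve_height(1.0)

print(round(peak, 4))
print(round(left, 4), round(right, 4))
```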
Four-Step Process in Finding the Areas
Under the Normal Curve Given a 𝑧 −Value
Step 1. Express the given 𝑧 −value into a three-digit form.
Step 2. Using the 𝑧 −Table, find the first two digits on the left
column.
Step 3. Match the third digit with the appropriate column on the
right.
Step 4. Read the area at the intersection of the row and the column.
This is the required area.
Example 1
Find the area that corresponds to 𝑧 = 2.
Finding the area that corresponds to z = 2 is the same as finding the
area between z = 0 and z = 2.
STEPS AND SOLUTION
Step 1. Express the given z-value as a three-digit number.
Solution: z = 2 becomes z = 2.00
Step 2. In the table, find the first two digits in the first column.
Solution: 2.0
Step 3. Find the third digit in the first row.
Solution: .00
Step 4. Read the probability at the intersection of row 2.0 and column .00.
Solution: The area at the intersection is 0.4772.
Example 2
Find the area that corresponds to z = 1.45.
Finding the area that corresponds to z = 1.45 is the same as finding the area
between z = 0 and z = 1.45.
STEPS AND SOLUTION
Step 1. Express the given z-value as a three-digit number.
Solution: z = 1.45 (it is already a three-digit number, so just copy the given value)
Step 2. In the table, find the first two digits in the first column.
Solution: 1.4
Step 3. Find the third digit in the first row.
Solution: .05
Step 4. Read the probability at the intersection of row 1.4 and column .05.
Solution: The area at the intersection is 0.4265.
Example 3
Find the area that corresponds to 𝑧 = – 2.5
The negative sign in z = −2.5 indicates where the area is located under the curve: the
measurement X that corresponds to z = −2.5 lies on the left side of the curve. Because the
normal curve is symmetrical about the mean, finding the area for z = −2.5 is the same as
finding the area for z = 2.5.
STEPS AND SOLUTION
Step 1. Express the given z-value as a three-digit number.
Solution: z = 2.5 becomes z = 2.50
Step 2. In the table, find the first two digits in the first column.
Solution: 2.5
Step 3. Find the third digit in the first row.
Solution: .00
Step 4. Read the probability at the intersection of row 2.5 and column .00.
Solution: The area at the intersection is 0.4938.
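The table lookups in the three examples can be cross-checked numerically. The sketch below does not read a z-Table; it assumes the table lists the area between z = 0 and the given z-value, and computes that area from the error function in Python's `math` module, rounded to four decimal places. The function name `table_area` is hypothetical.

```python
import math


def table_area(z):
    """Area under the standard normal curve between 0 and z,
    rounded to four decimals like a printed z-Table entry."""
    # The curve is symmetrical about the mean, so a negative z
    # has the same area as the corresponding positive z.
    area = 0.5 * math.erf(abs(z) / math.sqrt(2.0))
    return round(area, 4)


for z in (2, 1.45, -2.5):
    print(f"z = {z}: area = {table_area(z):.4f}")
```

The printed areas are 0.4772, 0.4265, and 0.4938, matching the values read from the table in Examples 1 to 3.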
Identifying Regions of Areas
Under the Normal Curve
Find the area of the regions between any two specific z-values under the normal curve.
Example 1
Find the area of the region between z = 1 and z = −1.
Solution:
The problem states that we need to find the area of the region
between the given two z-values (red line).
In doing so, we need to add the area that
corresponds to z = 1 to the area that
corresponds to 𝑧 = −1.
Thus, we have 0.3413 + 0.3413 = 0.6826
Therefore, the area between 𝑧 = 1 and 𝑧 = −1 is 0.6826.
Example 3.
Find the area of the region between 𝑧 = 2 and 𝑧 = −1.5.
Solution:
The point z = 2 describes the area of the region from z = 0 to
z = 2. Using the z-Table, the corresponding area is 0.4772.
Similarly, z = −1.5 describes the area of the region from
z = 0 to z = −1.5, with a corresponding area of 0.4332
(using the z-Table).
z = 1.5 has the same area as z = −1.5 since the curve is
symmetrical about the mean. Therefore, finding the area for
z = −1.5 is the same as finding the area for z = 1.5. The negative
sign indicates only the location of the z-value under the curve:
because it is negative, it is located on the left side of the curve.
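The problem asks for the area of the region between the two given
z-values (red line). Because the two z-values lie on opposite sides of
the mean, the two areas are added: 0.4772 + 0.4332 = 0.9104.
Therefore, the area between z = 2 and z = −1.5 is 0.9104.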
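The reasoning used in these examples can be sketched as a rule: when the two z-values lie on opposite sides of the mean, their table areas are added. The subtraction branch for two z-values on the same side is an assumption added here for completeness. Both function names are hypothetical, and the table areas are computed from `math.erf` rather than read from a printed table.

```python
import math


def table_area(z):
    """Four-decimal area between 0 and z under the standard normal curve."""
    return round(0.5 * math.erf(abs(z) / math.sqrt(2.0)), 4)


def area_between(z1, z2):
    """Area of the region between two z-values, built from table areas."""
    a1, a2 = table_area(z1), table_area(z2)
    if z1 * z2 < 0:
        # Opposite sides of the mean: the two regions meet at z = 0.
        total = a1 + a2
    else:
        # Same side of the mean (or one value is 0): take the difference.
        total = abs(a1 - a2)
    return round(total, 4)


print(area_between(1, -1))     # 0.6826
print(area_between(2, -1.5))   # 0.9104
```

Because each table entry is already rounded to four decimals, a sum such as 0.6826 can differ in the last digit from the directly computed area (about 0.6827).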
Determining Probabilities
Let us familiarize ourselves with some of the notation involved.
• P(a < z < b) denotes the probability that the z-value is between a and b.
• P(z > a) denotes the probability that the z-value is above a.
• P(z < a) denotes the probability that the z-value is below a.
Here a and b are z-score values.
• P(z = a) = 0 denotes that the probability that the z-value is exactly equal to a is 0.
A single z-value corresponds to exactly one point on the curve. Through that point only a line
can be drawn, and a line encloses no area: probability lies below or above it, but not on it.
That is why the probability that a z-value is exactly equal to a given value is 0.
Steps and Solution
Step 1. Draw a normal curve. Locate the required z-values. Shade the required region.
Step 2. Locate from the z-Table the corresponding areas of the given z-values.
Solution: z = 2 has a corresponding area of 0.4772; z = 3 has a corresponding area of 0.4987.
Step 3. With the graph, decide on what operation will be used to identify the proportion of the
area of the region. Use probability notation to avoid lengthy expressions.
Solution: With the given graph, the operation to be used is subtraction.
P(2 < z < 3) = 0.4987 − 0.4772 = 0.0215
Step 4. Make a concluding statement.
Solution: The required area between z = 2 and z = 3 is 0.0215.
EXAMPLE 2: Find the proportion of the area below z = 1.
Steps and Solution
Step 1. Draw a normal curve. Locate the required z-value. Shade the required region.
Step 2. Locate from the z-Table the corresponding area of the given z-value.
Solution: z = 1 has a corresponding area of 0.3413. This area covers only the region from
z = 0 to z = 1.
Step 3. With the graph, decide on what operation will be used to identify the proportion of the
area of the region. Use probability notation to avoid lengthy expressions.
Solution: With the given graph, the operation to be used is addition.
P(z < 1) = 0.5000 + 0.3413 = 0.8413
This is so because the area of the region to the left of z = 0 is 0.5, since it represents half of
the normal curve. The total area under the curve is equal to 1; therefore, half of its area is
0.5000 or 0.5.
Step 4. Make a concluding statement.
Solution: The required area below z = 1 is 0.8413.
EXAMPLE 3: Find the probability that the z-value is exactly equal to 1.
Steps and Solution
Step 1. Draw a normal curve. Locate the required z-value. Shade the required region.
Step 2. With the graph, decide on what operation will be used to identify the proportion of the
area of the region. Use probability notation to avoid lengthy expressions.
Solution: With the given graph, there is no need to decide on an operation. As defined, if a
z-value is exactly equal to one number, then its probability, or the proportion of the area of
the region, is automatically 0.
P(z = 1) = 0
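The probability notation can also be evaluated with the cumulative distribution function of the standard normal curve. This is a sketch, not the table method used in the examples: it relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later), and the helper names `p_below`, `p_above`, and `p_between` are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def p_below(a):
    """P(z < a): area to the left of a."""
    return standard_normal.cdf(a)


def p_above(a):
    """P(z > a): area to the right of a."""
    return 1.0 - standard_normal.cdf(a)


def p_between(a, b):
    """P(a < z < b): area between a and b, with a < b."""
    return standard_normal.cdf(b) - standard_normal.cdf(a)


print(round(p_below(1), 4))        # 0.8413
print(round(p_above(1), 4))        # 0.1587
print(round(p_between(2, 3), 4))   # 0.0214
```

The direct computation gives about 0.0214 for P(2 < z < 3), while subtracting the four-decimal table entries gives 0.0215. The small difference comes from rounding in the table, not from a different method.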
Z-score
A z-score measures the distance between a data point and the mean in units of
standard deviations.
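A minimal sketch of this definition uses the standard formula z = (x − mean) / standard deviation. The function name `z_score` and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / standard_deviation


# Hypothetical values: a score of 85 in a group with mean 75 and
# standard deviation 5 lies two standard deviations above the mean.
above = z_score(85, 75, 5)
below = z_score(70, 75, 5)

print(above)   # 2.0
print(below)   # -1.0
```

A positive z-score places the data point to the right of the mean, and a negative z-score places it to the left.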
STEPS Solution