CSC 1311: STATISTICS FOR PHYSICAL SCIENCE AND ENGINEERING
By HABEEBAH ADAMU KAKUDI PH.D.
1) Measures of location, partition and dispersion
2) Elements of Probability
3) Probability distributions: binomial, Poisson, geometric, hypergeometric,
negative binomial, normal
4) Estimation (Point and interval) and tests of hypotheses concerning population means,
proportions and variances, confidence interval
5) Regression and correlation
6) Non-parametric tests
7) Contingency table analysis
8) Introduction to design of experiments
9) Analysis of variance
1) Measures of location, partition and dispersion by H. A. Kakudi
After studying this lesson, you will be able to:
• explain the meaning of dispersion through examples;
• define various measures of dispersion - range, mean deviation, variance
and standard deviation;
• calculate mean deviation from the mean of raw and grouped data;
• calculate variance and standard deviation for raw and grouped data; and
• illustrate the properties of variance and standard deviation.
To explain the meaning of dispersion, let us consider an example.
Two sections of 10 students each in class X in a certain school were given a
common test in Mathematics (40 maximum marks). The scores of the students
are given below:
Section A: 6 9 11 13 15 21 23 28 29 35
Section B: 15 16 16 17 18 19 20 21 23 25
The average score in section A is 19.
The average score in section B is 19.
Let us construct a dot diagram on the same scale for section A and section B.
The position of the mean is marked by an arrow in the dot diagram.
Clearly, the extent of spread, or dispersion, of the data in section A differs
from that in section B.
The measurement of the scatter of the given data about the average is said to
be a measure of dispersion or scatter.
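As a quick check of the figures above, the following Python sketch recomputes both section means and lists each score's deviation from its mean. The variable names are illustrative and not part of the lesson.

```python
from statistics import mean

# Scores from the example above (maximum 40 marks).
section_a = [6, 9, 11, 13, 15, 21, 23, 28, 29, 35]
section_b = [15, 16, 16, 17, 18, 19, 20, 21, 23, 25]

mean_a = mean(section_a)
mean_b = mean(section_b)

# Deviation of each score from its own section mean.
deviations_a = [x - mean_a for x in section_a]
deviations_b = [x - mean_b for x in section_b]

print("Mean A:", mean_a, "Mean B:", mean_b)
print("Deviations A:", deviations_a)
print("Deviations B:", deviations_b)
```

Both means come out as 19, yet the deviations in section A reach 16 marks while those in section B never exceed 6, which is the difference in scatter that a measure of dispersion is meant to capture.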
Population Vs Sample Data
A population data set contains all members of a specified group (the entire list
of possible data values). [Utilizes the count n in formulas.]
Example: The population may be "ALL people living in Nigeria."
A sample data set contains a part, or a subset, of a population. The size of a
sample is always less than the size of the population from which it is taken.
[Utilizes the count n - 1 in formulas.]
Example: The sample may be "SOME people living in Nigeria."
When calculating the mean absolute deviation (MAD), variance, and standard
deviation, it is important to know whether you are working with an entire
population (where you have all of the possible data) or with only a sample
(a part) of the data. In addition, if you are using a sample of the data, you
need to know whether you will be making generalizations about the entire
population based upon this sample.
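For reference, the two conventions can be written out for the variance and standard deviation. This is the standard textbook form, using the count n as in the notes above; the symbols μ, σ and s are notation assumed here rather than taken from the lesson.

```latex
% Population (all n values are known): divide by n
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2, \qquad \sigma = \sqrt{\sigma^2}

% Sample (n values drawn from a larger population): divide by n - 1
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad s = \sqrt{s^2}
```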
Range
In the example cited above, we observe that:
(i) the scores of the students in section A range from 6 to 35;
(ii) the scores of the students in section B range from 15 to 25;
(iii) the difference between the largest and the smallest scores in section
A is 29 (35 - 6);
(iv) the difference between the largest and the smallest scores in section B is
10 (25 - 15).
Thus, the difference between the largest and the smallest values of a
data set is termed the range of the distribution.
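The definition translates directly into code. A minimal sketch, where the function name data_range is an illustrative choice:

```python
def data_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)

# Scores from the two sections in the example.
section_a = [6, 9, 11, 13, 15, 21, 23, 28, 29, 35]
section_b = [15, 16, 16, 17, 18, 19, 20, 21, 23, 25]

print(data_range(section_a))  # 29
print(data_range(section_b))  # 10
```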
Properties of Range
Mean Absolute Deviation of Raw Data
Mean absolute deviation (MAD) gives a sense of how "spread out" the values
in a data set are. It is the average distance between each data value and the
mean, that is, the average of the absolute ("positive") distances of each point
from the mean.
The larger the MAD, the greater the variability in the data (the data are
more spread out).
The MAD also helps determine whether the set's mean is a useful indicator of the
values within the set: the larger the MAD, the less representative the mean is
of those values.
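As an illustration, here is a minimal Python sketch that applies this description to the two sections from the opening example. It divides by the number of values, and the function name is an illustrative choice.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)

section_a = [6, 9, 11, 13, 15, 21, 23, 28, 29, 35]
section_b = [15, 16, 16, 17, 18, 19, 20, 21, 23, 25]

print(mean_absolute_deviation(section_a))  # 8.2
print(mean_absolute_deviation(section_b))  # 2.6
```

Section A's MAD of 8.2 marks is more than three times section B's 2.6, so the shared mean of 19 describes section B's scores far better than section A's.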
Properties of MAD
Variance
To get a more representative idea of spread we need to take into account the
actual values of each score in a data set. The absolute deviation, variance
and standard deviation are such measures.
Unlike the absolute deviation, which uses the absolute value of the deviation
in order to "rid itself" of the negative values, the variance achieves positive
values by squaring each of the deviations instead.
As a measure of variability, the variance is useful. If the scores in our group of
data are spread out, the variance will be a large number.
Conversely, if the scores are spread closely around the mean, the variance will
be a smaller number. However, there are two potential problems with the
variance.
First, because the deviations of scores from the mean are squared, extreme
scores receive more weight. If our data contain outliers (in other words,
one or a small number of scores that are particularly far from the mean
and perhaps do not represent our data well as a whole), squaring can give undue
weight to these scores.
Secondly, the variance is not in the same units as the scores in our data set:
variance is measured in the units squared. This means we cannot place it on
our frequency distribution and cannot directly relate its value to the values in
our data set.
Therefore, the numerical value of the variance can appear somewhat arbitrary.
Calculating the standard deviation rather than the variance rectifies this
problem. Nonetheless, analysing variance is extremely important in some
statistical analyses, such as the analysis of variance.
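Both points can be seen on the section scores from the opening example. The sketch below treats each section as a complete population (an assumption, so it divides by n) and compares how much of the total the most extreme score contributes under squared and absolute deviations.

```python
from statistics import mean

def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    centre = mean(values)
    return sum((x - centre) ** 2 for x in values) / len(values)

section_a = [6, 9, 11, 13, 15, 21, 23, 28, 29, 35]
section_b = [15, 16, 16, 17, 18, 19, 20, 21, 23, 25]

print(population_variance(section_a))  # 84.2 (in marks squared)
print(population_variance(section_b))  # 9.6  (in marks squared)

# Share of the total contributed by the most extreme score in section A.
centre = mean(section_a)
squared = [(x - centre) ** 2 for x in section_a]
absolute = [abs(x - centre) for x in section_a]
share_squared = max(squared) / sum(squared)     # about 0.30
share_absolute = max(absolute) / sum(absolute)  # about 0.20
print(share_squared, share_absolute)
```

The score of 35 accounts for roughly 30% of the summed squared deviations but only about 20% of the summed absolute deviations, and the variances are in marks squared rather than marks.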
Standard Deviation of Raw Data
The standard deviation is a measure of the spread of scores within a set of
data. We are usually interested in the population standard deviation,
because the population contains all the values we are interested in. You
would therefore calculate the population standard deviation if: (1) you
have the entire population, or (2) you have a sample of a larger population but
are interested only in this sample and do not wish to generalize your
findings to the population.
In statistics, however, we are usually presented with a sample from which we
wish to estimate (generalize to) a population, and the standard deviation is no
exception. Therefore, if all you have is a sample but you wish to make a
statement about the standard deviation of the population from which the sample
is drawn, you need to use the sample standard deviation.
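Python's standard statistics module provides both versions, which makes the distinction easy to try out. A short sketch using the section A scores from the opening example; whether those ten scores count as a population or a sample depends on the question being asked.

```python
from statistics import pstdev, stdev

section_a = [6, 9, 11, 13, 15, 21, 23, 28, 29, 35]

# Population standard deviation: squared deviations are divided by n.
population_sd = pstdev(section_a)

# Sample standard deviation: squared deviations are divided by n - 1.
sample_sd = stdev(section_a)

print(round(population_sd, 2), round(sample_sd, 2))
```

The sample value (about 9.67) is slightly larger than the population value (about 9.18) because of the smaller divisor, and the gap shrinks as n grows.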
Properties of Variance/Standard deviation
Range of Grouped Data
Mean Absolute Deviation for Grouped Data
where 𝑓ᵢ = frequency, 𝑥ᵢ = observation value, 𝑥̅ = mean, N = number of observations (count)
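With these symbols, the usual grouped form of the mean absolute deviation weights each absolute deviation by its frequency and divides by N, the sum of the frequencies. The sketch below assumes that form. The frequency table is hypothetical (not from the lesson), and for class intervals the class midpoint would stand in for the observation value.

```python
# Hypothetical frequency table: observation values and their frequencies.
values = [2, 4, 6, 8, 10]
frequencies = [1, 2, 4, 2, 1]

def grouped_mean(xs, fs):
    """Frequency-weighted mean: sum(f * x) / N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def grouped_mad(xs, fs):
    """Frequency-weighted mean absolute deviation: sum(f * |x - mean|) / N."""
    x_bar = grouped_mean(xs, fs)
    n = sum(fs)
    return sum(f * abs(x - x_bar) for x, f in zip(xs, fs)) / n

print(grouped_mean(values, frequencies))  # 6.0
print(grouped_mad(values, frequencies))   # 1.6
```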
Variance/Standard Deviation for Grouped Data-Method I
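A minimal sketch of the direct computation, assuming Method I means taking frequency-weighted squared deviations about the mean and dividing by N. The frequency table is hypothetical, and the data are treated as a population.

```python
from math import sqrt

# Hypothetical frequency table: observation values and their frequencies.
values = [2, 4, 6, 8, 10]
frequencies = [1, 2, 4, 2, 1]

def grouped_variance_direct(xs, fs):
    """Variance as sum(f * (x - mean)^2) / N, with N = sum(f)."""
    n = sum(fs)
    x_bar = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - x_bar) ** 2 for x, f in zip(xs, fs)) / n

variance = grouped_variance_direct(values, frequencies)
standard_deviation = sqrt(variance)
print(variance)  # 4.8
print(round(standard_deviation, 3))
```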
Variance/Standard Deviation for Grouped Data-Method II
If 𝑥̅ is not given, or if 𝑥̅ is a decimal so that the calculations become rather tedious,
we employ the alternative formula for the calculation of the SD given below:
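One common shortcut of this kind is the assumed-mean (step-deviation) formula, SD = h * sqrt(mean(f u^2) - (mean(f u))^2) with u = (x - A) / h. Whether this is exactly the formula intended here is an assumption. The sketch below uses a hypothetical frequency table, and the names A (assumed mean) and h (class width) are illustrative.

```python
from math import sqrt

# Hypothetical frequency table: observation values and their frequencies.
values = [2, 4, 6, 8, 10]
frequencies = [1, 2, 4, 2, 1]

def grouped_variance_step(xs, fs, assumed_mean, class_width):
    """Variance via step deviations u = (x - A) / h, avoiding the true mean."""
    n = sum(fs)
    us = [(x - assumed_mean) / class_width for x in xs]
    mean_u = sum(f * u for u, f in zip(us, fs)) / n
    mean_u_squared = sum(f * u * u for u, f in zip(us, fs)) / n
    return class_width ** 2 * (mean_u_squared - mean_u ** 2)

# Any convenient value works as the assumed mean; here A = 4 and h = 2.
variance = grouped_variance_step(values, frequencies, assumed_mean=4, class_width=2)
standard_deviation = sqrt(variance)
print(round(variance, 6), round(standard_deviation, 3))
```

For this table the result agrees with the direct computation about the true mean (variance 4.8), whichever assumed mean is chosen.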
Range for grouped data
What is the cumulative frequency of the modal class in this distribution?
Height    No. of patients
10-19     1
20-29     19
30-39     9
40-49     13
50-59     3
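One way to work through this question is to build the running totals and read off the entry for the class with the highest frequency. A short sketch using the table above, with illustrative variable names:

```python
from itertools import accumulate

# Height classes and number of patients from the table above.
classes = ["10-19", "20-29", "30-39", "40-49", "50-59"]
frequencies = [1, 19, 9, 13, 3]

# Running total of the frequencies up to and including each class.
cumulative = list(accumulate(frequencies))  # [1, 20, 29, 42, 45]

# The modal class is the class with the largest frequency.
modal_index = frequencies.index(max(frequencies))
modal_class = classes[modal_index]
modal_cumulative = cumulative[modal_index]

print(modal_class, modal_cumulative)  # 20-29 20
```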