MATHEMATICAL STATISTICS
UNIT-III
MEASURE OF DISPERSION
Measures of central tendency describe the central value, called an average, around which
the values in a data set appear to concentrate. But these measures do not reveal how the
values are dispersed (spread or scattered) on each side of the central value. Therefore,
while describing a data set, it is equally important to know how far the items in the data
lie close around, or scattered away from, the measure of central tendency.
Definitions
Dispersion is the measure of the variation of the items – A L Bowley
Dispersion or spread is the degree of the scatter or variation of the variable about a central value
– Brooks & Dick
Example
Look at the runs scored by the two cricket players in a test match:
Players     I Innings   II Innings   Mean
Player 1    0           100          50
Player 2    40          60           50
Comparing the averages of the two players we may come to the conclusion that they were
playing alike. But player 1 scored 0 runs in I innings and 100 in II innings. Player 2 scored
nearly equal runs in both the innings. Therefore it is necessary for us to understand data by
measuring dispersion.
Characteristics of a good Measure of Dispersion
An ideal measure of dispersion should satisfy the following characteristics.
(i) It should be well defined without any ambiguity.
(ii) It should be based on all observations in the data set.
(iii) It should be easy to understand and compute.
(iv) It should be capable of further mathematical treatment.
(v) It should not be affected by fluctuations of sampling.
(vi) It should not be affected by extreme observations.
Dr.J.Kesavan, Biostatistician Research, VMRF(DU) Page 1
Types of measures of dispersion
The measures of dispersion are classified into two categories, namely
1. Absolute measures
2. Relative measures
1. Absolute Measures
They involve the units of measurement of the observations. For example,
(i) the dispersion of salary of employees is expressed in rupees
(ii) the variation of time required for workers is expressed in hours.
Such measures are not suitable for comparing the variability of two data sets that
are expressed in different units of measurement.
Range
Raw Data
Range is defined as the difference between the largest and smallest observations in the
data set.
Range (R) = Largest value in the data set (L) - Smallest value in the data set (S)
R = L - S
Grouped Data
For grouped frequency distribution of values in the data set, the range is the difference
between the upper class limit of the last class interval and the lower class limit of first class
interval.
Coefficient of Range
The relative measure of range is called the coefficient of range
Coefficient of Range = (L - S) / (L + S)
Example
The following data relate to the heights of 10 students (in cm) in a school. Calculate
the range and the coefficient of range.
158, 164, 168, 170, 142, 160, 154, 174, 159, 146
Solution
L = 174 S = 142
Range = L – S = 174 – 142
Range = 32
Coefficient of range = (L – S) / (L + S)
= (174 – 142) / (174 + 142) = 32 / 316
Coefficient of range = 0.101
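The raw-data calculation above can be cross-checked with a few lines of Python. This is a minimal sketch; the function names `value_range` and `coefficient_of_range` are illustrative choices, not part of the notes.

```python
def value_range(values):
    """Range R = L - S: largest observation minus smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure of range: (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


heights = [158, 164, 168, 170, 142, 160, 154, 174, 159, 146]
print(value_range(heights))                      # 32
print(round(coefficient_of_range(heights), 3))   # 0.101
```

Because the coefficient divides by L + S, it is only meaningful when the observations are positive.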
Example
Calculate the range and the co-efficient of range for the marks obtained by 100 students
in a school
Marks              60-63   63-66   66-69   69-72   72-75
No. of students    5       18      42      27      8
Solution
L = Upper limit of highest class = 75
S = Lower limit of lowest class = 60
Range = L – S
= 75 – 60
Range = 15
Coefficient of range = (L – S) / (L + S)
= (75 - 60) / (75 + 60)
Coefficient of range = 0.111
Merits
Range is the simplest measure of dispersion
It is well defined, and easy to compute
It is widely used in quality control, weather forecasting, stock market variations etc.
Limitations
The calculation of range is based on only two values - the largest value and the smallest value.
It is largely influenced by two extreme values.
It cannot be computed in the case of open-ended frequency distributions
It is not suitable for further mathematical treatment.
Mean Deviation
The Mean Deviation (MD) is defined as the arithmetic mean of the absolute deviations of
the individual values from a measure of central tendency of the data set. It is also known as the
average deviation.
The measure of central tendency is either mean or median. If the measure of central
tendency is mean (or median), then we get the mean deviation about the mean (or median).
MD (about mean) = $\frac{\sum |D|}{N}$, where $|D| = |x - \bar{x}|$
MD (about median) = $\frac{\sum |D_m|}{N}$, where $|D_m| = |x - \text{median}|$
The coefficient of mean deviation (CMD) is the relative measure of dispersion
corresponding to mean deviation and it is given by
Coefficient of Mean Deviation (CMD) = $\frac{\text{MD (about mean or median)}}{\text{mean or median}}$
Example
The following are the weights of 10 children admitted in a hospital on a particular day.
Find the mean deviation about mean, median and their coefficients of mean deviation.
7 4 10 9 15 12 7 9 9 18
Solution
n = 10
Mean: $\bar{x} = \frac{\sum x}{n} = \frac{100}{10} = 10$
Median: The arranged data is: 4 7 7 9 9 9 10 12 15 18
Median = $\frac{9 + 9}{2} = \frac{18}{2} = 9$
Weight (x)     |D| = |x - mean|    |Dm| = |x - median|
7              3                   2
4              6                   5
10             0                   1
9              1                   0
15             5                   6
12             2                   3
7              3                   2
9              1                   0
9              1                   0
18             8                   9
Total = 100    30                  28
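MD (about mean) = $\frac{\sum |D|}{N} = \frac{30}{10} = 3$
Mean Deviation about mean = 3
Coefficient of Mean Deviation about mean = $\frac{\text{MD (about mean)}}{\bar{x}} = \frac{3}{10} = 0.3$
MD (about median) = $\frac{\sum |D_m|}{N} = \frac{28}{10} = 2.8$
Mean Deviation about median = 2.8
Coefficient of Mean Deviation about median = $\frac{\text{MD (about median)}}{\text{median}} = \frac{2.8}{9} = 0.311$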
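The same figures can be reproduced programmatically. The sketch below uses Python's standard `statistics` module for the mean and median; the helper name `mean_deviation` is an illustrative choice.

```python
from statistics import mean, median


def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations from `centre`."""
    return sum(abs(x - centre) for x in values) / len(values)


weights = [7, 4, 10, 9, 15, 12, 7, 9, 9, 18]
x_bar = mean(weights)       # arithmetic mean of the weights
med = median(weights)       # average of the two middle values

md_mean = mean_deviation(weights, x_bar)
md_median = mean_deviation(weights, med)

print(md_mean, round(md_mean / x_bar, 3))      # MD about mean and its coefficient
print(md_median, round(md_median / med, 3))    # MD about median and its coefficient
```

Passing the centre as an argument lets one function serve both the mean-based and the median-based definition.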
Example
Calculate mean deviation from the following series
x 10 11 12 13 14
f 3 12 18 12 3
Solution
x        f     c.f    |D| = |x - median|    f|D|
10       3     3      2                     6
11       12    15     1                     12
12       18    33     0                     0
13       12    45     1                     12
14       3     48     2                     6
Total    48                                 36
MD = $\frac{\sum f|D|}{N}$
Median = size of the $\left(\frac{N + 1}{2}\right)$th item = $\frac{48 + 1}{2}$ = 24.5th item
Size of 24.5th item is 12, hence median = 12
MD = $\frac{\sum f|D|}{N} = \frac{36}{48} = 0.75$
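For a discrete frequency distribution the steps are: locate the median through the cumulative frequencies, then take the frequency-weighted average of the absolute deviations. A minimal sketch follows; the function names are hypothetical, and the median lookup simply returns the first value whose cumulative frequency reaches the (N + 1)/2-th position, which is a simplification when that position falls exactly between two different values.

```python
def median_discrete(values, freqs):
    """Value whose cumulative frequency first reaches the (N + 1) / 2-th item."""
    n = sum(freqs)
    position = (n + 1) / 2
    cumulative = 0
    for x, f in sorted(zip(values, freqs)):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("empty distribution")


def mean_deviation_discrete(values, freqs, centre):
    """Sum of f * |x - centre| divided by N = sum of f."""
    n = sum(freqs)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n


x = [10, 11, 12, 13, 14]
f = [3, 12, 18, 12, 3]
med = median_discrete(x, f)
md = mean_deviation_discrete(x, f, med)
print(med, md)   # 12 0.75
```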
Example
Find the median and mean deviation of the following data
Size 0-10 10-20 20-30 30-40 40-50 50-60 60-70
f 7 12 18 25 16 14 8
Solution
Size     f      c.f    x     |D| = |x - median|    f|D|
0-10     7      7      5     30.2                  211.4
10-20    12     19     15    20.2                  242.4
20-30    18     37     25    10.2                  183.6
30-40    25     62     35    0.2                   5.0
40-50    16     78     45    9.8                   156.8
50-60    14     92     55    19.8                  277.2
60-70    8      100    65    29.8                  238.4
Total    100                                       1314.8
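Median = size of the $\left(\frac{N}{2}\right)$th item = $\frac{100}{2}$ = 50th item
Median lies in the class 30 - 40.
Median = $L + \frac{\frac{N}{2} - c.f.}{f} \times i$
L = 30, N/2 = 50, c.f. = 37, f = 25, i = 10
Median = $30 + \frac{50 - 37}{25} \times 10 = 30 + 5.2 = 35.2$
MD = $\frac{\sum f|D|}{N} = \frac{1314.8}{100} = 13.148$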
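The grouped calculation combines the interpolation formula for the median with a frequency-weighted mean deviation taken at the class mid-values. The sketch below assumes equal-width classes described by their lower limits; the function names are illustrative.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = L + ((N/2 - c.f.) / f) * i for equal-width classes."""
    n = sum(freqs)
    cumulative = 0   # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")


def grouped_mean_deviation(lower_limits, width, freqs, centre):
    """Sum of f * |mid-value - centre| divided by N."""
    n = sum(freqs)
    mids = [lower + width / 2 for lower in lower_limits]
    return sum(f * abs(x - centre) for x, f in zip(mids, freqs)) / n


lowers = [0, 10, 20, 30, 40, 50, 60]
freqs = [7, 12, 18, 25, 16, 14, 8]
med = grouped_median(lowers, 10, freqs)
md = grouped_mean_deviation(lowers, 10, freqs, med)
print(round(med, 1), round(md, 3))   # 35.2 13.148
```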
Standard Deviation
Definition: Standard deviation is the positive square root of the average of the squared
deviations of all the observations taken from the mean. It is denoted by the Greek letter σ.
Ungrouped data
If x1, x2, x3, ..., xn are the ungrouped data, then the standard deviation is calculated by
1. Actual mean method:
Standard deviation $\sigma = \sqrt{\frac{\sum d^2}{N}}$, $d = x - \bar{x}$
2. Assumed mean method:
Standard deviation $\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$, $d = x - A$
Grouped Data (Discrete)
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$, $d = x - A$
Where, f = frequency of each value
N = total number of observations (or elements) in the population
x = value of the variable
A = an assumed A.M.
Grouped Data (continuous)
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times C$, $d = \frac{x - A}{C}$
Where, f = frequency of each class interval
N = total number of observations (or elements) in the population
C = width of class interval
x = mid-value of each class interval
A = an assumed A.M.
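The assumed mean formula is not a separate definition; it follows from the actual mean formula. A short derivation for ungrouped data, writing $d = x - A$ so that $\bar{x} = A + \frac{\sum d}{N}$ (the grouped versions follow the same steps with frequencies as weights):

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum (x - \bar{x})^2
          = \frac{1}{N}\sum \left(d - \frac{\sum d}{N}\right)^2 \\
         &= \frac{\sum d^2}{N} - 2\left(\frac{\sum d}{N}\right)^2 + \left(\frac{\sum d}{N}\right)^2 \\
         &= \frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2
\end{aligned}
```

The result does not depend on the choice of A, so A can be any convenient value that keeps the arithmetic simple.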
Merits
The value of standard deviation is based on every observation in a set of data.
It is less affected by fluctuations of sampling.
Of the measures discussed here, it is the one best suited to further algebraic treatment.
Limitations
Compared to other measures of dispersion, the calculation of standard deviation is
more difficult.
While calculating standard deviation, more weight is given to extreme values and less to
those near the mean.
It cannot be calculated for distributions with open-ended class intervals.
If two or more data sets are given in different units, the variation among those data sets
cannot be compared using the standard deviation.
Example
The weights of children admitted to a hospital are given below. Calculate the standard
deviation of the weights.
13 15 12 19 10.5 11.3 13 15 12 9
Solution
A.M., $\bar{x} = \frac{\sum x}{n} = \frac{13 + 15 + 12 + 19 + 10.5 + 11.3 + 13 + 15 + 12 + 9}{10} = \frac{129.8}{10}$
$\bar{x} = 12.98$
x        d = x - 12.98    d²
13       0.02             0.0004
15       2.02             4.0804
12       -0.98            0.9604
19       6.02             36.2404
10.5     -2.48            6.1504
11.3     -1.68            2.8224
13       0.02             0.0004
15       2.02             4.0804
12       -0.98            0.9604
9        -3.98            15.8404
Total    129.8            71.136   (totals of x and d²)
Standard deviation $\sigma = \sqrt{\frac{\sum d^2}{N}}$, $d = x - \bar{x}$
$= \sqrt{\frac{71.136}{10}}$
= 2.67
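The actual mean method translates directly into code. The sketch below implements the formula by hand (the name `population_sd` is illustrative) and compares it with `statistics.pstdev` from the Python standard library, which also divides by N.

```python
from math import sqrt
from statistics import pstdev


def population_sd(values):
    """Actual mean method: sqrt(sum((x - mean)^2) / N)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)


weights = [13, 15, 12, 19, 10.5, 11.3, 13, 15, 12, 9]
sigma = population_sd(weights)
print(round(sigma, 2))            # 2.67
print(round(pstdev(weights), 2))  # same value from the standard library
```

Note that `statistics.stdev` (without the `p`) divides by N - 1 and would give a slightly larger figure; the notes use the divisor N throughout.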
Example
The wholesale price of a commodity for seven consecutive days in a month is as follows:
Days                           1     2     3     4     5     6     7
Commodity price per quintal    240   260   270   245   255   286   264
Calculate the variance and standard deviation.
Solution
We assume the A.M. = 255.
Observations (x)    d = x - A    d²
240                 -15          225
260                 5            25
270                 15           225
245                 -10          100
255                 0            0
286                 31           961
264                 9            81
Total               35           1617
Variance $\sigma^2 = \frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2$
$= \frac{1617}{7} - \left(\frac{35}{7}\right)^2$
$= 231 - 5^2 = 231 - 25$
Variance $\sigma^2 = 206$
Standard deviation $\sigma = \sqrt{\text{variance}}$
$= \sqrt{206} = 14.35$
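A short sketch of the assumed mean method shows that the answer does not depend on which value of A is chosen. The function name `variance_assumed_mean` is an illustrative choice; the second call uses A = 260 purely as a different trial value.

```python
from math import sqrt


def variance_assumed_mean(values, assumed_mean):
    """Variance = sum(d^2)/n - (sum(d)/n)^2 with d = x - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2


prices = [240, 260, 270, 245, 255, 286, 264]
variance = variance_assumed_mean(prices, 255)   # A = 255 as in the solution
sd = sqrt(variance)
print(round(variance, 2), round(sd, 2))         # 206.0 14.35

# A different assumed mean leads to the same variance.
print(round(variance_assumed_mean(prices, 260), 2))   # 206.0
```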
Example
A study of 100 engineering companies gives the following information
Profit (Rs. in crore)    0-10   10-20   20-30   30-40   40-50   50-60
No. of Companies         8      12      20      30      20      10
Calculate the standard deviation of the profit earned.
Solution
A = 35, C = 10
Profit (Rs. in crore)    Mid-value (x)    d = (x - A)/C    f      fd     fd²
0-10                     5                -3               8      -24    72
10-20                    15               -2               12     -24    48
20-30                    25               -1               20     -20    20
30-40                    35               0                30     0      0
40-50                    45               1                20     20     20
50-60                    55               2                10     20     40
Total                                                      100    -28    200
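Standard deviation $\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times C$
$= \sqrt{\frac{200}{100} - \left(\frac{-28}{100}\right)^2} \times 10$
$= \sqrt{2 - 0.0784} \times 10$
= 13.862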
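The step-deviation calculation for continuous grouped data can be sketched as follows. The function name `grouped_sd` is hypothetical; A and C are passed in explicitly, as in the solution. The last decimal place may differ slightly from a hand calculation, depending on where intermediate values are rounded.

```python
from math import sqrt


def grouped_sd(midpoints, freqs, assumed_mean, width):
    """sigma = sqrt(sum(f d^2)/N - (sum(f d)/N)^2) * C with d = (x - A)/C."""
    n = sum(freqs)
    d = [(x - assumed_mean) / width for x in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width


midpoints = [5, 15, 25, 35, 45, 55]
freqs = [8, 12, 20, 30, 20, 10]
sigma = grouped_sd(midpoints, freqs, assumed_mean=35, width=10)
print(round(sigma, 2))   # 13.86
```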
Coefficient of Variation
The coefficient of variation is obtained by dividing the standard deviation by the mean
and multiplying it by 100. Symbolically,
Coefficient of Variation (C.V.) = $\frac{\sigma}{\bar{x}} \times 100$
Merit
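The CV is independent of the unit in which the measurement has been taken, whereas the
standard deviation depends on the units of measurement. Hence the coefficient of variation,
rather than the standard deviation, should be used to compare the variability of data sets
measured in different units or having widely different means.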
Limitations
If the value of the mean approaches 0, the coefficient of variation approaches infinity, so
minute changes in the mean produce large changes in the CV.
Example
The scores of two batsmen, A and B, in ten innings during a certain season, are as under:
A: Mean score = 50; Standard deviation = 5
B: Mean score = 75; Standard deviation = 25
Find which of the batsmen is more consistent in scoring
Solution
Coefficient of Variation (C.V.) = $\frac{\sigma}{\bar{x}} \times 100$
C.V. for batsman A = $\frac{5}{50} \times 100$ = 10%
C.V. for batsman B = $\frac{25}{75} \times 100$ = 33.33%
The batsman with the smaller C.V. is more consistent.
Since the C.V. is smaller for batsman A, he is more consistent than B.
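The comparison can be expressed as a one-line function. This is a minimal sketch; the name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation / mean) * 100, expressed in per cent."""
    return sd / mean * 100


cv_a = coefficient_of_variation(sd=5, mean=50)
cv_b = coefficient_of_variation(sd=25, mean=75)
print(round(cv_a, 2), round(cv_b, 2))   # 10.0 33.33

# The smaller C.V. indicates the more consistent batsman.
more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)                  # A
```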
Example
The weekly sales of two products A and B were recorded as given below
Product A 59 75 27 63 27 28 56
Product B 150 200 125 310 330 250 225
Find out which of the two shows greater fluctuations in sales.
Solution:
For comparing the fluctuations in sales of two products, we will prefer to calculate coefficient of
variation for both the products.
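Product A: Let A = 56 be the assumed mean of sales for product A.
Sales (x)    Frequency (f)    d = x - A    fd     fd²
27           2                -29          -58    1682
28           1                -28          -28    784
56 (= A)     1                0            0      0
59           1                3            3      9
63           1                7            7      49
75           1                19           19     361
Total        7                             -57    2885
$\bar{x} = A + \frac{\sum fd}{N} = 56 + \frac{-57}{7} = 47.86$
Variance $\sigma^2 = \frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2 = \frac{2885}{7} - \left(\frac{-57}{7}\right)^2 = 412.143 - 66.306 = 345.84$
SD $\sigma = \sqrt{\text{Variance}} = \sqrt{345.84} = 18.60$
C.V. (A) = $\frac{\sigma}{\bar{x}} \times 100 = \frac{18.60}{47.86} \times 100 = 38.86\%$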
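Product B: Let A = 225 be the assumed mean of sales for product B.
Sales (x)    Frequency (f)    d = x - A    fd      fd²
125          1                -100         -100    10000
150          1                -75          -75     5625
200          1                -25          -25     625
225 (= A)    1                0            0       0
250          1                25           25      625
310          1                85           85      7225
330          1                105          105     11025
Total        7                             15      35125
$\bar{x} = A + \frac{\sum fd}{N} = 225 + \frac{15}{7} = 227.14$
Variance $\sigma^2 = \frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2 = \frac{35125}{7} - \left(\frac{15}{7}\right)^2 = 5017.86 - 4.59 = 5013.27$
SD $\sigma = \sqrt{\text{Variance}} = \sqrt{5013.27} = 70.80$
C.V. (B) = $\frac{\sigma}{\bar{x}} \times 100 = \frac{70.80}{227.14} \times 100 = 31.17\%$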
Since the coefficient of variation for product A is greater than that of product B,
the fluctuation in sales of product A is higher than that of product B.
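The comparison can also be run directly on the raw weekly sales, without an assumed mean. The sketch below uses `mean` and `pstdev` (divisor N) from the Python standard library; the helper name `cv` is illustrative. Results computed this way can differ in the second decimal place from a hand calculation that rounds intermediate values.

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation in per cent, using the population SD."""
    return pstdev(values) / mean(values) * 100


product_a = [59, 75, 27, 63, 27, 28, 56]
product_b = [150, 200, 125, 310, 330, 250, 225]

cv_a, cv_b = cv(product_a), cv(product_b)
print(f"C.V.(A) = {cv_a:.1f}%, C.V.(B) = {cv_b:.1f}%")

# The product with the larger C.V. shows the greater fluctuation in sales.
print("A" if cv_a > cv_b else "B")   # A
```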
Inter Quartile Range or Quartile Deviation
The quartiles Q1, Q2 and Q3 have been introduced and studied earlier.
Inter quartile range is defined as:
Inter quartile Range (IQR) = Q3 - Q1
Quartile Deviation is defined as half of the distance between Q1 and Q3,
Quartile Deviation Q.D. = (Q3 - Q1) / 2
Coefficient of Q.D. = $\frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Example
Find out the value of quartile deviation and its coefficient from the following data:
Roll No 1 2 3 4 5 6 7
Marks 20 28 40 12 30 15 50
Solution
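Marks arranged in ascending order
12 15 20 28 30 40 50
Q1 = size of the $\left(\frac{N + 1}{4}\right)$th item
= size of the $\left(\frac{7 + 1}{4}\right)$th item = 2nd item
Size of 2nd item is 15. Thus Q1 = 15
Q3 = size of the $3\left(\frac{N + 1}{4}\right)$th item
= size of the $\left(\frac{3 \times 8}{4}\right)$th item = 6th item
Size of 6th item is 40. Thus Q3 = 40
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{40 - 15}{2}$
Q.D. = 12.5
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 15}{40 + 15} = \frac{25}{55}$
Coefficient of Q.D. = 0.455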
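The positional rule used above (the k(N + 1)/4-th item of the sorted data) can be sketched in code. The function name `quartile` is illustrative, and the linear interpolation for fractional positions is an added assumption that this example does not exercise, since both positions are whole numbers here. Library routines may use different quartile conventions and give different values on small samples.

```python
def quartile(values, k):
    """k-th quartile as the size of the k(N + 1)/4-th item of the sorted data.

    Fractional positions are linearly interpolated (an assumption of this sketch).
    """
    data = sorted(values)
    position = k * (len(data) + 1) / 4   # 1-based position
    lower = int(position)
    if lower < 1:
        return data[0]
    if lower >= len(data):
        return data[-1]
    fraction = position - lower
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


marks = [20, 28, 40, 12, 30, 15, 50]
q1, q3 = quartile(marks, 1), quartile(marks, 3)
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coefficient, 3))   # 15.0 40.0 12.5 0.455
```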
Example
Compute coefficient of quartile deviation from the following data:
Marks 10 20 30 40 50 60
No. of Students 4 7 15 8 7 2
Solution:
Marks Frequency c.f
10 4 4
20 7 11
30 15 26
40 8 34
50 7 41
60 2 43
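Q1 = size of the $\left(\frac{N + 1}{4}\right)$th item = $\frac{43 + 1}{4}$ = 11th item
Size of 11th item is 20. Thus Q1 = 20
Q3 = size of the $3\left(\frac{N + 1}{4}\right)$th item = $3 \times \frac{43 + 1}{4}$ = 33rd item
Size of 33rd item is 40. Thus Q3 = 40
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = \frac{20}{60} = 0.333$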
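For a discrete frequency distribution the quartile is read off the cumulative frequency column. A minimal sketch of that lookup follows; the function name is hypothetical, and it returns the first value whose cumulative frequency reaches the required position.

```python
def quartile_from_frequencies(values, freqs, k):
    """Value whose cumulative frequency first reaches the k(N + 1)/4-th item."""
    n = sum(freqs)
    position = k * (n + 1) / 4
    cumulative = 0
    for x, f in sorted(zip(values, freqs)):
        cumulative += f
        if cumulative >= position:
            return x
    return max(values)   # position beyond the last item


marks = [10, 20, 30, 40, 50, 60]
students = [4, 7, 15, 8, 7, 2]
q1 = quartile_from_frequencies(marks, students, 1)
q3 = quartile_from_frequencies(marks, students, 3)
print(q1, q3, (q3 - q1) / 2, round((q3 - q1) / (q3 + q1), 3))   # 20 40 10.0 0.333
```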
Example
Calculate quartile deviation and the coefficient of quartile deviation from the following data
Wages in Rupees per hour    Less than 35   35-37   38-40   41-43   Over 43
Number of wage earners      14             62      99      18      7
Solution
Wages (Rs. per hour)    f      c.f
Less than 35            14     14
35-37                   62     76
38-40                   99     175
41-43                   18     193
Over 43                 7      200
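Q.D. = $\frac{Q_3 - Q_1}{2}$
Q1 = size of the $\left(\frac{N}{4}\right)$th item = $\frac{200}{4}$ = 50th item
Q1 lies in the class 35 - 37.
Q1 = $L + \frac{\frac{N}{4} - c.f.}{f} \times i$
L = 35, N/4 = 50, c.f. = 14, f = 62, i = 2
Q1 = $35 + \frac{50 - 14}{62} \times 2 = 35 + 1.16 = 36.16$
Q3 = size of the $\left(\frac{3N}{4}\right)$th item = $\frac{3 \times 200}{4}$ = 150th item
Q3 lies in the class 38 - 40.
Q3 = $L + \frac{\frac{3N}{4} - c.f.}{f} \times i$
L = 38, 3N/4 = 150, c.f. = 76, f = 99, i = 2
Q3 = $38 + \frac{150 - 76}{99} \times 2 = 38 + 1.49 = 39.49$
Q.D. = $\frac{39.49 - 36.16}{2} = 1.67$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{39.49 - 36.16}{39.49 + 36.16} = \frac{3.33}{75.65} = 0.044$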
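The interpolation formula for grouped quartiles can be sketched as below. The function name `grouped_quantile` and the `(lower limit, width)` class representation are illustrative assumptions. The lower limits and the width i = 2 are taken exactly as in the worked solution, and open-ended classes are marked with `None` because their limits are unknown; the calculation still works as long as neither quartile falls in an open-ended class.

```python
def grouped_quantile(classes, freqs, fraction):
    """Quantile = L + ((fraction * N - c.f.) / f) * i.

    classes: list of (lower_limit, width); use None for open-ended classes.
    """
    n = sum(freqs)
    target = fraction * n
    cumulative = 0   # c.f. of the classes before the current one
    for (lower, width), f in zip(classes, freqs):
        if cumulative + f >= target:
            if lower is None or width is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


classes = [(None, None), (35, 2), (38, 2), (41, 2), (43, None)]
freqs = [14, 62, 99, 18, 7]
q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coefficient, 3))
# 36.16 39.49 1.67 0.044
```

This also illustrates why the quartile deviation remains usable with open-ended classes: only the classes containing Q1 and Q3 need known limits.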
Merits
It is not affected by the extreme (highest and lowest) values in the data set.
It is an appropriate measure of variation for a data set summarized in open-ended class
intervals.
It is a positional measure of variation; therefore it is useful in the cases of erratic or
highly skewed distributions.
Limitations
The QD is based on the middle 50 per cent observed values only and is not based on all
the observations in the data set, therefore it cannot be considered as a good measure of
variation.
It is not suitable for mathematical treatment.
It is affected by sampling fluctuations.
The QD is a positional measure and has no relationship with any average in the data set.