
Unit -2

Measure of Central Tendency

Measures of Central tendency
A measure of central tendency (also called a measure of location) gives an idea about the central part of the distribution.
The main objectives of a measure of central tendency are:
• To condense data into a single value.
• To facilitate comparison between data sets.
Requisites of a Good Measure of Central Tendency
(i) It should be rigidly defined.
(ii) It should be simple to understand and easy to calculate.
(iii) It should be based upon all values of given data.
(iv) It should be capable of further mathematical treatment.
(v) It should have sampling stability.
(vi) It should not be unduly affected by extreme values.

There are different types of averages, each with its own advantages and disadvantages.

Measure of Central Tendency
1. Location (positional average)
   (a) Partition values: Median, Quartiles, Deciles, Percentiles
   (b) Mode
2. Mathematical Average
   (a) Arithmetic Mean
   (b) Geometric Mean
   (c) Harmonic Mean

1. Arithmetic Mean:
Case I: Discrete data
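Arithmetic Mean (A.M.) of a set of observations is their sum divided by the number of observations.
Let $X_1, X_2, \ldots, X_n$ be $n$ observations, then
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Example 1: Let there be five numbers 5, 4, 3, 7, 6, then the arithmetic mean is
$$\bar{X} = \frac{5 + 4 + 3 + 7 + 6}{5} = \frac{25}{5} = 5$$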
Case II: Discrete or ungrouped frequency distribution
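In case of a frequency distribution $(X_i, f_i)$, $i = 1, 2, \ldots, n$, where $f_i$ is the frequency of the variable $X_i$, then
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i X_i$$
where $N = \sum_{i=1}^{n} f_i$.
Example 2: Let the observations and their frequencies be given as
X: 3 5 7 8 9 10
f: 2 3 1 2 3 2
Then the arithmetic mean is calculated as
$$\bar{X} = \frac{(2)(3) + (3)(5) + (1)(7) + (2)(8) + (3)(9) + (2)(10)}{2 + 3 + 1 + 2 + 3 + 2} = \frac{91}{13} = 7$$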

Case III: Grouped or continuous frequency distribution

In case of grouped or continuous frequency distribution, $X_i$ is taken as the midpoint of the corresponding class.

Example 3a: The distribution of marks of students in a class is given below:

Marks:            0 - 10   10 - 20   20 - 30   30 - 40   40 - 50
No. of Students:     5        3         9         2         6
Find the arithmetic mean.
Solution:
Marks      No. of Students ($f_i$)   Mid Point ($X_i$)   $f_i X_i$
0 - 10              5                      5                25
10 - 20             3                     15                45
20 - 30             9                     25               225
30 - 40             2                     35                70
40 - 50             6                     45               270
Total              25                                      635

Therefore, arithmetic mean is
$$\bar{X} = \frac{\sum f_i X_i}{N} = \frac{635}{25} = 25.4$$

Example 3b: The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail order company:

No. of Orders:   10 - 12   13 - 15   16 - 18   19 - 21
Frequency:          4         12        20        14

Find the arithmetic mean.

Solution:
No. of Orders   Frequency ($f_i$)   Mid Point ($X_i$)   $f_i X_i$
10 - 12               4                  11                44
13 - 15              12                  14               168
16 - 18              20                  17               340
19 - 21              14                  20               280
Total                50                                   832

Therefore, arithmetic mean is
$$\bar{X} = \frac{\sum f_i X_i}{N} = \frac{832}{50} = 16.64$$
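
The frequency-distribution formula used in Examples 2, 3a and 3b can be checked with a few lines of Python. This is only a minimal sketch; the function name `frequency_mean` is chosen here for illustration and is not part of the notes.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_f = sum(freqs)                                   # N
    total_fx = sum(f * x for x, f in zip(values, freqs))   # sum of f_i * X_i
    return total_fx / total_f

# Example 2: discrete values with frequencies (91 / 13)
print(frequency_mean([3, 5, 7, 8, 9, 10], [2, 3, 1, 2, 3, 2]))
# Example 3a: class midpoints of the marks distribution (635 / 25)
print(frequency_mean([5, 15, 25, 35, 45], [5, 3, 9, 2, 6]))
# Example 3b: class midpoints of the orders distribution (832 / 50)
print(frequency_mean([11, 14, 17, 20], [4, 12, 20, 14]))
```

For grouped data the midpoints must be supplied by the caller, since the mean then depends on the assumption that each class is represented by its midpoint.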

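Properties of Arithmetic Mean:

1. Arithmetic mean is dependent on change of origin and scale. That is, if $d_i = \frac{X_i - A}{h}$, then $\bar{X} = A + h\bar{d}$.

Example 4: Calculate the mean marks of students by the short cut method using the data given in Example 3a.

Marks (CI)   No. of Students ($f_i$)   Mid Point ($X_i$)   $d_i = \frac{X_i - 25}{10}$   $f_i d_i$
0 - 10               5                      5                    -2                 -10
10 - 20              3                     15                    -1                  -3
20 - 30              9                     25                     0                   0
30 - 40              2                     35                     1                   2
40 - 50              6                     45                     2                  12
Total               25                                                                1

Here $A = 25$ and $h = 10$.
Then,
$$\bar{d} = \frac{\sum f_i d_i}{N} = \frac{1}{25} = 0.04$$
Therefore,
$$\bar{X} = A + h\bar{d} = 25 + 10(0.04) = 25.4$$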
2. The algebraic sum of the deviations of a set of values from their arithmetic mean is always zero.
That is
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
where $\bar{X}$ is the arithmetic mean.

Example 5: Find the sum of the deviations of the values 3, 4, 6, 8, 14 from their mean.
$$\bar{X} = \frac{3 + 4 + 6 + 8 + 14}{5} = \frac{35}{5} = 7$$
$$\sum_{i=1}^{n} (X_i - \bar{X}) = (3 - 7) + (4 - 7) + (6 - 7) + (8 - 7) + (14 - 7) = 0$$
3. The sum of the squares of the deviations of a set of values is minimum when taken about the mean.
For the frequency distribution $(X_i, f_i)$, $i = 1, 2, \ldots, n$,
let $Z = \sum_{i=1}^{n} f_i (X_i - A)^2$, where $A$ is any arbitrary point.
Then $Z$ is minimum when $A = \bar{X}$.
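4. Mean of combined (composite) series
Let there be $k$ series having $n_1, n_2, \ldots, n_k$ observations respectively.

Series No.:     1             2             3             ...   k
Observations:   $n_1$         $n_2$         $n_3$         ...   $n_k$
Means:          $\bar{X}_1$   $\bar{X}_2$   $\bar{X}_3$   ...   $\bar{X}_k$

Then the mean of the combined series is given by
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$
and for $k = 2$, the above formula reduces to
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$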

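Example 6: In a class, the average weight of 25 boys is 48 kg and the average weight of 15 girls is 40 kg.
Find the average weight of the students in the class.
Solution: Here $n_1 = 25$, $\bar{X}_1 = 48$, $n_2 = 15$, $\bar{X}_2 = 40$.

Therefore, the average weight of the students is
$$\bar{X} = \frac{(25)(48) + (15)(40)}{25 + 15} = \frac{1800}{40} = 45 \text{ kg}$$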

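Example 7: In an English test, the average marks of Arts students is 7 and the average marks of Science students is 9. The mean marks of all the students is 8. Calculate the percentage (%) of Arts and Science students.
Solution: Let $n_1$ and $n_2$ be the numbers of Arts and Science students, with $\bar{X}_1 = 7$, $\bar{X}_2 = 9$ and $\bar{X} = 8$.

Therefore,
$$8 = \frac{7n_1 + 9n_2}{n_1 + n_2}$$
or
$$8n_1 + 8n_2 = 7n_1 + 9n_2, \quad \text{that is,} \quad n_1 = n_2$$
That means the percentages of Arts and Science students are 50 each.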
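
As a quick numerical check of the combined-mean formula, the following sketch recomputes Example 6 and verifies the answer of Example 7. The helper name `combined_mean` is hypothetical, and the 50/50 split in the second call is just the answer being tested, not extra data.

```python
def combined_mean(sizes, means):
    """Mean of a combined series: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Example 6: 25 boys averaging 48 kg and 15 girls averaging 40 kg
print(combined_mean([25, 15], [48, 40]))
# Example 7 check: equal shares of Arts (mean 7) and Science (mean 9) students
print(combined_mean([50, 50], [7, 9]))
```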

5. Weighted Average or Weighted Mean

The weighted mean is a kind of arithmetic mean of a set of numbers in which some elements of the set carry more importance (weight) than others.
Let there be $n$ observations $X_1, X_2, \ldots, X_n$ and let $w_1, w_2, \ldots, w_n$ be their respective weights in the set.
Then the weighted mean is calculated using the formula
$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$

Example 8: Grades are often computed using a weighted average.
Suppose that homework counts 10%, quizzes 20% and tests 70%.
If Mr. Zaid has a homework grade of 92, a quiz grade of 68, and a test grade of 81, then Zaid's overall grade is
$$\bar{X}_w = \frac{(10)(92) + (20)(68) + (70)(81)}{10 + 20 + 70} = \frac{7950}{100} = 79.5$$
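
The weighted mean of Example 8 can be reproduced with a short Python sketch. The function name `weighted_mean` is an illustrative choice, and the weights are written as whole percentages (10, 20, 70) rather than decimals.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 8: homework 92 (10%), quiz 68 (20%), test 81 (70%)
print(weighted_mean([92, 68, 81], [10, 20, 70]))
```

Only the ratios between the weights matter, so weights of 1, 2 and 7 would give the same result.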

6. Trimmed Mean
A trimmed mean is computed by "trimming away" a certain percent of both the largest and the smallest set of values. For example, the t% trimmed mean is found by eliminating the largest t% and smallest t% of the values and computing the average of the remaining values. It is denoted by $\tilde{X}$.

Example 9: Find the sample mean and the 10% trimmed mean of the following data set
0.28, 0.32, 0.36, 0.37, 0.38, 0.42, 0.43, 0.43, 0.47, 0.53.

Solution:
The sample mean is
$$\bar{X} = \frac{0.28 + 0.32 + 0.36 + 0.37 + 0.38 + 0.42 + 0.43 + 0.43 + 0.47 + 0.53}{10} = 0.3990$$
To compute the 10% trimmed mean, we remove the smallest value and the largest value, because there are 10 values and 10% of 10 is 1. Hence
$$\tilde{X} = \frac{0.32 + 0.36 + 0.37 + 0.38 + 0.42 + 0.43 + 0.43 + 0.47}{8} = 0.3975$$
Remark:
The trimmed mean is, in general, less sensitive to outliers (extreme values) than the sample mean.
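
The trimming step can be sketched in Python as follows. The name `trimmed_mean` is hypothetical, and rounding the number of trimmed values down when t% of n is not a whole number is an assumption of this sketch, not a rule stated in the notes.

```python
def trimmed_mean(values, percent):
    """t% trimmed mean: drop the smallest and largest t% of values, then average."""
    data = sorted(values)
    k = int(len(data) * percent / 100)   # values dropped at each end (rounded down: an assumption)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

data = [0.28, 0.32, 0.36, 0.37, 0.38, 0.42, 0.43, 0.43, 0.47, 0.53]
print(round(sum(data) / len(data), 4))      # sample mean of Example 9
print(round(trimmed_mean(data, 10), 4))     # 10% trimmed mean of Example 9
```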

Merits of Arithmetic Mean:


1. It is rigidly defined.
2. It is easy to understand and easy to calculate.
3. It is based upon all values of the given data.
4. It is capable of further mathematical treatment.
5. It is not much affected by sampling fluctuations.
Demerits of Arithmetic Mean:
1. It cannot be calculated if any observation is missing.
2. It cannot be calculated for the data with open-end classes.
3. It is affected by extreme values.
4. It cannot be located graphically.
5. It may be a number which is not present in the data.
6. It cannot be calculated for the data representing qualitative characteristic.

Solved Examples:
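Problem 1: The sum of the deviations of a certain number of observations measured from 4 is 72 and the sum of the deviations of the observations measured from 7 is -3. Find the number of observations and their mean.
Solution:
Let $n$ be the required number of observations.
We are given that
$\sum (X_i - 4) = 72$, therefore $\sum X_i - 4n = 72$ (i)
and $\sum (X_i - 7) = -3$, therefore $\sum X_i - 7n = -3$ (ii)
Subtracting (ii) from (i) we get $3n = 75$, and hence $n = 25$.
Substituting in (i), $\sum X_i = 72 + 4(25) = 172$. Therefore, the required mean is given by
$$\bar{X} = \frac{\sum X_i}{n} = \frac{172}{25} = 6.88$$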
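Problem 2: The mean weight of 98 students is found to be 50 lbs. It is later discovered that the frequency of the class interval (30 - 40) was wrongly taken as 8 instead of 10. Calculate the correct mean.
Solution:
We are given that
Incorrect mean $= 50$ and incorrect $N = 98$.
Therefore,
$$\text{Incorrect mean} = \frac{\text{Incorrect } \sum f_i X_i}{N}, \quad \text{so} \quad \text{Incorrect } \sum f_i X_i = 98 \times 50 = 4900$$
Now, the midpoint of the class (30 - 40) is 35, so
$$\text{Correct } \sum f_i X_i = 4900 - (8)(35) + (10)(35) = 4970$$
Also the correct $N = 98 - 8 + 10 = 100$.

Therefore, the correct mean
$$= \frac{4970}{100} = 49.7 \text{ lbs}$$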

Problem 3: The average marks of three batches of students having 70, 50 and 30 students respectively are 50, 55 and 45. Find the average marks of all the 150 students, taken together.
Solution:
We are given that
Batch:             I     II    III
Average Marks:     50    55    45
No. of Students:   70    50    30

Let $\bar{X}$ be the average marks of all 150 students taken together.
Therefore,
$$\bar{X} = \frac{(70)(50) + (50)(55) + (30)(45)}{70 + 50 + 30} = \frac{7600}{150}$$
or, $\bar{X} = 50.67$ marks.
Practice Exercises:
1. The arithmetic mean of 3 numbers is 60. If two numbers are 50 and 60, what is the third number?
2. If the mean of numbers 28, x, 42, 78 and 104 is 62, then what is the mean of 128, 255, 511, 1023
and x.
3. The average of nine numbers is 9. When a tenth number is added, the average of the ten numbers is also 9. What is the tenth number?
4. The arithmetic mean of a set of 10 numbers is 20. If each number is first multiplied by 2 and then
increased by 5, then what is the mean of new numbers?
5. The mean of 25 observations is 36. The mean of first 13 observations is 32 and that of last 13
observations is 39. What is the value of 13th observation?
6. The average age of 6 persons living in a house is 23.5 years. Three of them are majors and their
average age is 42 years. The difference in ages of the three minor children is same. What is the
mean of the ages of minor children?
7. The mean age of combined group of men and women is 25 years. If the mean age of group of men
is 26 and that of group of women is 21, then what is the percentage of men and women in the
group?

Median
Median of a distribution is the value of the variable which divides it into two equal parts, that is the
value such that the number of observations above it is equal to the number of observations below it.
The median is a positional average.
Application:
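Median is the only average that can be used while dealing with qualitative data which cannot be measured quantitatively but can be arranged in ascending or descending order of magnitude.
Case I: Discrete data
Arrange all the observations in ascending or descending order.
• If the number of observations $n$ is odd, then
$$\text{Median} = X_{\left(\frac{n+1}{2}\right)}, \text{ the } \left(\tfrac{n+1}{2}\right)\text{th ranked observation}$$

Example 10: Let the observations be 3, 2, 5, 1, 9, 8, 7
Then arrange these observations as
1, 2, 3, 5, 7, 8, 9 (ascending order)
or 9, 8, 7, 5, 3, 2, 1 (descending order)
Here $n = 7$, so
$$\text{Median} = X_{(4)} = 5$$

• If the number of observations $n$ is even, then
$$\text{Median} = \frac{1}{2}\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right]$$

Example 11: Let the observations be 3, 2, 5, 1, 9, 8, 7, 6
Then arrange these observations as
1, 2, 3, 5, 6, 7, 8, 9 (ascending order)
or 9, 8, 7, 6, 5, 3, 2, 1 (descending order)
Here $n = 8$, so
$$\text{Median} = \frac{1}{2}\left[X_{(4)} + X_{(5)}\right] = \frac{5 + 6}{2} = 5.5$$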
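
The odd and even cases for raw data can be combined in one small function. This is a minimal sketch with a name chosen for illustration; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median of raw data: middle value (n odd) or mean of the two middle values (n even)."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]              # the (n+1)/2-th ranked value
    return (data[n // 2 - 1] + data[n // 2]) / 2   # mean of the n/2-th and (n/2 + 1)-th values

print(median([3, 2, 5, 1, 9, 8, 7]))      # Example 10
print(median([3, 2, 5, 1, 9, 8, 7, 6]))   # Example 11
```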

Case II: Discrete frequency distribution

Steps for calculating median
• Find $\frac{N}{2}$, where $N = \sum f_i$.
• Locate the cumulative frequency (c.f.) just greater than $\frac{N}{2}$.
• The value of $X$ corresponding to that c.f. is the median.

Example 12: Find the median for the following data

X     f     c.f.
1     3      3
3     2      5
4     9     14
6     6     20
9     2     22
10    5     27
12    9     36
15    4     40

Solution: Here $N = 40$, so $\frac{N}{2} = 20$.

• Cumulative frequency (c.f.) just greater than 20 is 22.
• The value of $X$ corresponding to 22 is 9.
Therefore,
Median = 9
Case III: Continuous frequency distribution (grouped data)

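Steps for calculating median

• Find $\frac{N}{2}$, where $N = \sum f_i$.
• Locate the cumulative frequency (c.f.) just greater than $\frac{N}{2}$.
• The class corresponding to that c.f. is the median class.
• Then use the following formula to calculate the median
$$\text{Median} = l_m + \frac{h}{f_m}\left(\frac{N}{2} - C_{pm}\right)$$
where,
$l_m$ = lower limit of the median class,
$f_m$ = frequency of the median class,
$h$ = width of the median class,
$C_{pm}$ = cumulative frequency of the class preceding the median class.

Example 13: Find the median wage (salary) per day of workers from the following data

Wages in dollar:   20 - 30   30 - 40   40 - 50   50 - 60   60 - 70
No. of Workers:       3         5         20        10        2

Solution:
Wages (in dollar)   No. of Workers ($f$)   c.f.
20 - 30                    3                 3
30 - 40                    5                 8
40 - 50                   20                28
50 - 60                   10                38
60 - 70                    2                40

Here N = 40, therefore $\frac{N}{2} = 20$.

• Cumulative frequency (c.f.) just greater than 20 is 28.
• The class corresponding to 28 is (40 - 50). This is the median class.
• Thus the
$$\text{Median} = 40 + \frac{10}{20}(20 - 8) = 40 + 6 = 46$$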
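
The steps for the grouped median (locate the first cumulative frequency just greater than N/2, then interpolate inside that class) can be sketched as below. The function name and argument layout are assumptions made for illustration, and all classes are assumed to have the same width.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a grouped distribution with equal class widths."""
    half = sum(freqs) / 2                # N / 2
    cum_before = 0                       # c.f. of the classes before the current one
    for l_m, f_m in zip(lower_limits, freqs):
        if cum_before + f_m > half:      # first c.f. just greater than N / 2
            return l_m + (width / f_m) * (half - cum_before)
        cum_before += f_m
    raise ValueError("frequencies must be positive")

# Example 13: wages 20-30, ..., 60-70 with frequencies 3, 5, 20, 10, 2
print(grouped_median([20, 30, 40, 50, 60], 10, [3, 5, 20, 10, 2]))
```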
Merits:
1. It is rigidly defined.
2. It is easy to calculate and understand
3. It can be located merely by inspection.
4. It is not at all affected by extreme values.
5. It can be calculated for the distributions with open-end classes.
Demerits:
1. In case of even number of observations median cannot be determined exactly.
2. It is not based on all observations.
3. It is not suitable for algebraic treatment.
4. As compared to the mean, it is affected more by fluctuations of sampling.

Quartiles: The three points which divide the whole distribution into four equal parts are called quartiles.

A ---- Q1 ---- Q2 ---- Q3 ---- B

Calculation of Quartiles:
Case I: Discrete data
• The first quartile $Q_1$ is the value such that 25% of the ranked data are smaller and 75% are larger.
$$Q_1 = \begin{cases} X_{\frac{n+1}{4}}, & \text{when } n \text{ is odd} \\ \frac{1}{2}\left[X_{\frac{n}{4}} + X_{\frac{n}{4}+1}\right], & \text{when } n \text{ is even} \end{cases}$$

• The second quartile $Q_2$ is the Median.

• The third quartile $Q_3$ is the value such that 75% of the ranked data are smaller and 25% are larger.
$$Q_3 = \begin{cases} X_{\frac{3(n+1)}{4}}, & \text{when } n \text{ is odd} \\ \frac{1}{2}\left[X_{\frac{3n}{4}} + X_{\frac{3n}{4}+1}\right], & \text{when } n \text{ is even} \end{cases}$$
Example 14: Find the first quartile, second quartile (median) and third quartile of the following data of scores:
12 25 15 5 22 7 14 36 53 30 42
Solution: First, arrange the data in ascending order:
5 7 12 14 15 22 25 30 36 42 53
Here $n = 11$ (odd), so $Q_1$, $Q_2$ (Median) and $Q_3$ are the 3rd, 6th and 9th ranked values.

$$Q_1 = X_{(n+1)/4} = X_{(11+1)/4} = X_3 = 12$$
$$Q_2 = X_{2(n+1)/4} = X_{2(11+1)/4} = X_6 = 22 = \text{Median}$$
$$Q_3 = X_{3(n+1)/4} = X_{3(11+1)/4} = X_9 = 36$$

Example 15: Find the first quartile, median and third quartile, if 65 is added to the data of Example 14.
Solution: First, arrange the data in ascending order:
5 7 12 14 15 22 25 30 36 42 53 65
Here $n = 12$ (even), so

$$Q_1 = \frac{1}{2}\left[X_{\frac{n}{4}} + X_{\frac{n}{4}+1}\right] = \frac{1}{2}\left[X_3 + X_4\right] = \frac{1}{2}(12 + 14) = 13$$
$$Q_2 = \frac{1}{2}\left[X_{\frac{2n}{4}} + X_{\frac{2n}{4}+1}\right] = \frac{1}{2}\left[X_6 + X_7\right] = \frac{1}{2}(22 + 25) = 23.5 = \text{Median}$$
$$Q_3 = \frac{1}{2}\left[X_{\frac{3n}{4}} + X_{\frac{3n}{4}+1}\right] = \frac{1}{2}\left[X_9 + X_{10}\right] = \frac{1}{2}(36 + 42) = 39$$

When the subscript of $X$ is not a whole number (whether $n$ is odd or even), we use the following steps.
Step 1: Calculate $Q_1$ and $Q_3$ using the formulae
$$Q_1 = X_{\frac{n+1}{4}} \quad \text{and} \quad Q_3 = X_{\frac{3(n+1)}{4}}$$
Step 2: Select the ranked positions immediately below and above the number calculated.
For example, for 10 values, $Q_1 = X_{\frac{n+1}{4}} = X_{\frac{11}{4}} = X_{2.75}$. So select the second and third ranked values.
With these values, do the following:
• Multiply the larger ranked value by the decimal fraction of the original result (0.75 in the example).
• Multiply the smaller ranked value by 1 minus the decimal fraction of the original result (0.25 for the example, because 1 - 0.75 is 0.25).
• Add the two products to determine the quartile value.
Special case: If the two ranked values selected are the same number, the quartile is that number and the above two multiplications and the addition can be skipped.

Example 16: Find the first and third quartiles of the following data values:
52 39 44 39 31 40 43 35 44 29
Solution: First, arrange the data in ascending order:

29 31 35 39 39 40 43 44 44 52

• Calculate $Q_1 = X_{\frac{n+1}{4}} = X_{\frac{10+1}{4}} = X_{2.75}$
The second and third ranked values are 31 and 35.
1. Multiply the larger ranked value by the decimal fraction of the original result, we get
$35 \times 0.75 = 26.25$
2. Multiply the smaller ranked value by 1 minus the decimal fraction of the original result, we get
$31 \times 0.25 = 7.75$
3. Therefore final $Q_1 = 26.25 + 7.75 = 34$

• Calculate $Q_3 = X_{\frac{3(n+1)}{4}} = X_{\frac{3(10+1)}{4}} = X_{8.25}$
The eighth and ninth ranked values are 44 and 44.
1. Multiply the larger ranked value by the decimal fraction of the original result, we get
$44 \times 0.25 = 11$
2. Multiply the smaller ranked value by 1 minus the decimal fraction of the original result, we get
$44 \times 0.75 = 33$
3. Therefore final $Q_3 = 11 + 33 = 44$
OR
Since the eighth and ninth ranked values are the same, that is 44, therefore $Q_3 = 44$.
Example 17: Find the first and third quartiles of the following data values:
52 39 44 39 31 40 43 35 44
Solution: First, arrange the data in ascending order:

31 35 39 39 40 43 44 44 52

• Calculate $Q_1 = X_{\frac{n+1}{4}} = X_{\frac{9+1}{4}} = X_{2.5}$
The second and third ranked values are 35 and 39.
1. Multiply the larger ranked value by the decimal fraction of the original result, we get
$39 \times 0.5 = 19.50$
2. Multiply the smaller ranked value by 1 minus the decimal fraction of the original result, we get
$35 \times 0.5 = 17.50$
3. Therefore final $Q_1 = 19.50 + 17.50 = 37$

• Calculate $Q_3 = X_{\frac{3(n+1)}{4}} = X_{\frac{3(9+1)}{4}} = X_{7.5}$
The seventh and eighth ranked values are 44 and 44.
1. Multiply the larger ranked value by the decimal fraction of the original result, we get
$44 \times 0.5 = 22$
2. Multiply the smaller ranked value by 1 minus the decimal fraction of the original result, we get
$44 \times 0.5 = 22$
3. Therefore final $Q_3 = 22 + 22 = 44$
OR
Since the seventh and eighth ranked values are the same, that is 44, therefore $Q_3 = 44$.
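
The interpolation steps used in Examples 16 and 17 can be sketched as a small Python function. The name `quartile` is chosen here for illustration, and the sketch always uses the ranked position k(n+1)/4. It therefore reproduces Examples 14, 16 and 17, but not the separate even-n averaging rule used in Example 15. It also assumes at least three observations.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at ranked position k(n+1)/4, with interpolation."""
    data = sorted(values)
    pos = k * (len(data) + 1) / 4   # ranked position, possibly fractional
    lower = int(pos)                # rank immediately below the position
    frac = pos - lower              # decimal fraction of the position
    if frac == 0:
        return data[lower - 1]      # whole-number rank: take that ranked value
    # smaller ranked value times (1 - frac) plus larger ranked value times frac
    return data[lower - 1] * (1 - frac) + data[lower] * frac

ex16 = [52, 39, 44, 39, 31, 40, 43, 35, 44, 29]
ex17 = [52, 39, 44, 39, 31, 40, 43, 35, 44]
print(quartile(ex16, 1), quartile(ex16, 3))   # Example 16
print(quartile(ex17, 1), quartile(ex17, 3))   # Example 17
```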
Case II: Discrete frequency distribution (ungrouped data)
Steps for calculating quartiles
• Find $\frac{iN}{4}$, $i = 1, 2, 3$, where $N = \sum f_i$.
• See the cumulative frequency (c.f.) just greater than $\frac{iN}{4}$.
• The value of $X$ corresponding to that c.f. is the $i$th quartile $Q_i$.
Example 18: Calculate all quartiles for the given data

X     f     c.f.
1     3      3
2     2      5
3     9     14
4     6     20
5     2     22
6     5     27
7     4     31
8     2     33
9     3     36
10    4     40

Solution: Here $N = 40$, thus $\frac{N}{4} = 10$, $\frac{2N}{4} = 20$ and $\frac{3N}{4} = 30$.

• Cumulative frequencies (c.f.) just greater than 10, 20 and 30 are 14, 22 and 31 respectively.
• The values of $X$ corresponding to 14, 22 and 31 are 3, 5 and 7 respectively.
Therefore, $Q_1 = 3$, $Q_2 = 5$ and $Q_3 = 7$.
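Case III: Continuous frequency distribution (grouped data)
Steps for calculating quartiles
• Find $\frac{iN}{4}$, $i = 1, 2, 3$, where $N = \sum f_i$.
• See the cumulative frequency (c.f.) just greater than $\frac{iN}{4}$.
• The class corresponding to that c.f. is the given quartile class.
• Then use the following formula to calculate quartiles
$$Q_i = l + \frac{h}{f}\left(\frac{iN}{4} - C\right), \quad i = 1, 2, 3$$
where,
$l$ = lower limit of the quartile class,
$f$ = frequency of the quartile class,
$h$ = width of the quartile class,
$C$ = cumulative frequency of the class preceding the quartile class.

Example 19: Calculate the first and third quartiles of wage (salary) per day of workers from the following data
Wages in dollar:   20 - 30   30 - 40   40 - 50   50 - 60   60 - 70   70 - 80
No. of Workers:       5         8         17        11        4         3

Solution:
Wages (in dollar)   No. of Workers ($f$)   c.f.
20 - 30                    5                 5
30 - 40                    8                13
40 - 50                   17                30
50 - 60                   11                41
60 - 70                    4                45
70 - 80                    3                48

Here N = 48, therefore
• $\frac{N}{4} = 12$ and $\frac{3N}{4} = 36$.
• Cumulative frequency (c.f.) just greater than 12 is 13, so the first quartile class is 30 - 40.
• Cumulative frequency (c.f.) just greater than 36 is 41, so the third quartile class is 50 - 60.
• Thus the
$$Q_1 = 30 + \frac{10}{8}(12 - 5) = 38.75$$
and
$$Q_3 = 50 + \frac{10}{11}(36 - 30) = 55.45$$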
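Deciles: The nine points which divide the whole distribution into ten equal parts are called deciles.

A ---- D1 ---- D2 ---- ... ---- D9 ---- B

Method of Calculating Deciles

Case I: Discrete frequency distribution (ungrouped data)
Steps for calculating deciles
• Find $\frac{iN}{10}$, $i = 1, 2, \ldots, 9$, where $N = \sum f_i$.
• See the cumulative frequency (c.f.) just greater than $\frac{iN}{10}$.
• The value of $X$ corresponding to that c.f. is the $i$th decile $D_i$.

Case II: Continuous frequency distribution (grouped data)
Steps for calculating deciles
• Find $\frac{iN}{10}$, $i = 1, 2, \ldots, 9$, where $N = \sum f_i$.
• See the cumulative frequency (c.f.) just greater than $\frac{iN}{10}$.
• The class corresponding to that c.f. is the given decile class.
• Then use the following formula to calculate deciles
$$D_i = l + \frac{h}{f}\left(\frac{iN}{10} - C\right), \quad i = 1, 2, \ldots, 9$$
where,
$l$ = lower limit of the decile class,
$f$ = frequency of the decile class,
$h$ = width of the decile class,
$C$ = cumulative frequency of the class preceding the decile class.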

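Percentiles: The ninety-nine points which divide the whole distribution into hundred equal parts are called percentiles.

A ---- P1 ---- P2 ---- ... ---- P99 ---- B

Method of Calculating Percentiles

Case I: Discrete frequency distribution (ungrouped data)
Steps for calculating percentiles
• Find $\frac{iN}{100}$, $i = 1, 2, \ldots, 99$, where $N = \sum f_i$.
• See the cumulative frequency (c.f.) just greater than $\frac{iN}{100}$.
• The value of $X$ corresponding to that c.f. is the $i$th percentile $P_i$.

Case II: Continuous frequency distribution (grouped data)
Steps for calculating percentiles
• Find $\frac{iN}{100}$, $i = 1, 2, \ldots, 99$, where $N = \sum f_i$.
• See the cumulative frequency (c.f.) just greater than $\frac{iN}{100}$.
• The class corresponding to that c.f. is the given percentile class.
• Then use the following formula to calculate percentiles
$$P_i = l + \frac{h}{f}\left(\frac{iN}{100} - C\right), \quad i = 1, 2, \ldots, 99$$
where,
$l$ = lower limit of the percentile class,
$f$ = frequency of the percentile class,
$h$ = width of the percentile class,
$C$ = cumulative frequency of the class preceding the percentile class.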

Example: In Example 19, calculate D4 , D7 , P30 and P78 .
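
Quartiles, deciles and percentiles of grouped data all follow the same pattern: find the class whose cumulative frequency is just greater than iN/parts and interpolate within it. The sketch below captures that pattern in one function. The name `grouped_partition_value` and its arguments are hypothetical, and equal class widths are assumed. It can be used to check answers to the exercise above.

```python
def grouped_partition_value(lower_limits, width, freqs, i, parts):
    """i-th partition value when the distribution is split into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    target = i * sum(freqs) / parts      # i * N / parts
    cum_before = 0                       # c.f. of the preceding classes (C)
    for l, f in zip(lower_limits, freqs):
        if cum_before + f > target:      # first c.f. just greater than the target
            return l + (width / f) * (target - cum_before)
        cum_before += f
    raise ValueError("i must be smaller than parts")

# Data of Example 19
limits = [20, 30, 40, 50, 60, 70]
freqs = [5, 8, 17, 11, 4, 3]
for name, i, parts in [("Q1", 1, 4), ("Q3", 3, 4), ("D4", 4, 10),
                       ("D7", 7, 10), ("P30", 30, 100), ("P78", 78, 100)]:
    print(name, round(grouped_partition_value(limits, 10, freqs, i, parts), 2))
```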

Mode
Mode is the value which occurs most frequently in a set of observations.
Case I: Discrete frequency distribution
In case of discrete frequency distribution, mode is the value, which corresponds to maximum
frequency.
Example 20: Find the mode for the following data

X     f
1     4
2     9
3    16
4    25
5    22
6    15
7     7
8     3

Here the maximum frequency is 25 and the value corresponding to 25 is 4, therefore the mode is 4.

Note: In any one or more of the following cases
• If the maximum frequency is repeated.
• If the maximum frequency occurs in the very beginning or at the end of the distribution.
• If there are irregularities in the distribution.
the value of the mode is determined by the method of grouping.
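Example 21: Find the mode of the following distribution
X: 1 2 3 4 5 6 7 8 9 10 11 12
f: 3 8 15 23 35 40 32 28 20 45 14 6

Solution:
Here irregularities in the distribution may be observed. So the mode will be calculated by the grouping method. Column (i) lists the frequencies; columns (ii) and (iii) give sums of two consecutive frequencies, starting from the first and the second value respectively; columns (iv), (v) and (vi) give sums of three consecutive frequencies, starting from the first, second and third value respectively. Each sum is written on the row where its group starts.

X     (i)    (ii)   (iii)   (iv)    (v)    (vi)
1      3     11             26
2      8            23              46
3     15     38                             73
4     23            58      98
5     35     75                    107
6     40            72                     100
7     32     60             80
8     28            48              93
9     20     65                             79
10    45            59      65
11    14     20
12     6

Analysis Table
Col. No.   Max. freq.   Values of X in the group with maximum frequency
(i)           45        10
(ii)          75        5, 6
(iii)         72        6, 7
(iv)          98        4, 5, 6
(v)          107        5, 6, 7
(vi)         100        6, 7, 8

X:       1   2   3   4   5   6   7   8   9   10   11   12
Total:   -   -   -   1   3   5   3   1   -    1    -    -

Therefore, Mode = 6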
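
The grouping and analysis tables of Example 21 can be rebuilt programmatically. The following is a sketch under the usual six-column layout (single frequencies, pairs starting at the first and second value, triples starting at the first, second and third value). The function name `mode_by_grouping` is an assumption for illustration.

```python
def mode_by_grouping(xs, fs):
    """Return (modes, tally): tally counts how often each X lies in a column's maximum group."""
    n = len(fs)
    tally = {x: 0 for x in xs}
    # (group size, starting index) for columns (i) to (vi)
    for size, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [range(i, i + size) for i in range(start, n - size + 1, size)]
        sums = [sum(fs[j] for j in g) for g in groups]
        best = max(sums)                      # maximum frequency in this column
        for g, s in zip(groups, sums):
            if s == best:
                for j in g:
                    tally[xs[j]] += 1         # one mark per value in the maximum group
    top = max(tally.values())
    return [x for x in xs if tally[x] == top], tally

xs = list(range(1, 13))
fs = [3, 8, 15, 23, 35, 40, 32, 28, 20, 45, 14, 6]
modes, tally = mode_by_grouping(xs, fs)
print(modes)   # value(s) with the highest total in the analysis table
print(tally)
```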
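Case II: Continuous frequency distribution (grouped data)
• Identify the modal class corresponding to the maximum frequency.
• Calculate the mode using the following formula
$$\text{Mode} = l_m + \frac{f_m - f_p}{2f_m - f_p - f_s} \times h_m$$
where,
$l_m$ = lower limit of the modal class,
$h_m$ = width of the modal class,
$f_m$ = frequency of the modal class,
$f_p$ = frequency of the class preceding the modal class,
$f_s$ = frequency of the class succeeding the modal class.

Example 22: Calculate the mode of the following data

Class Interval (CI)   Frequency
00 - 10                   5
10 - 20                   8
20 - 30                   7
30 - 40                  12
40 - 50                  28
50 - 60                  20
60 - 70                  10
70 - 80                  10

Solution: Here the maximum frequency is 28, therefore the modal class is 40 - 50, with $l_m = 40$, $h_m = 10$, $f_m = 28$, $f_p = 12$ and $f_s = 20$.

Now,
$$\text{Mode} = 40 + \frac{28 - 12}{2(28) - 12 - 20} \times 10 = 40 + \frac{16}{24} \times 10 = 46.67$$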
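
A minimal Python sketch of the grouped-mode formula follows. The function name is hypothetical. The modal class is taken simply as the class with the largest frequency, so the grouping method is not applied, and a missing neighbouring class is treated as having frequency 0, which is an assumption of this sketch.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a grouped distribution, using the class with the maximum frequency."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])      # index of the modal class
    f_m = freqs[m]
    f_p = freqs[m - 1] if m > 0 else 0                      # preceding class (0 if none: assumption)
    f_s = freqs[m + 1] if m < len(freqs) - 1 else 0         # succeeding class (0 if none: assumption)
    return lower_limits[m] + (f_m - f_p) / (2 * f_m - f_p - f_s) * width

# Example 22: classes 0-10, ..., 70-80
print(round(grouped_mode(list(range(0, 80, 10)), 10, [5, 8, 7, 12, 28, 20, 10, 10]), 2))
```

The formula divides by zero when the modal class and both neighbours have equal frequencies, so this sketch is not suitable for that case.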

Remarks:
1. In any one or more of the cases as defined for discrete case modal class is determined by the
method of grouping.
2. If the method of grouping gives the modal class which does not correspond to the maximum
frequency or in some cases , in such cases mode is obtained by using formula

Merits:
1. It is easy to understand and easy to calculate.
2. It is not affected by extreme values or sampling fluctuations.
3. It can be located just by inspection in many cases.
4. It is always present within data.
5. It is applicable for both quantitative and qualitative data.

Demerits:
1. It is not rigidly defined.
2. It is not based upon all values of the given data.
3. It is not capable of further mathematical treatment.

Relationship Between Mean, Median and Mode

• In a positively skewed distribution, the presence of exceptionally high values affects the mean more than the median and the mode. Therefore
Mean > Median > Mode

• In a negatively skewed distribution, the presence of exceptionally low values depresses the mean most, followed by the median and the mode. Thus
Mean < Median < Mode

In the above two situations, it can be observed that the median always lies between the mean and the mode.
Thus, if the number of observations in any set of data is large enough to yield a fairly smooth and moderately skewed distribution, the mean, median and mode are empirically related as
Mode = 3 Median - 2 Mean
• In the case of a symmetrical distribution, the mean, median and mode coincide. That is
Mean = Median = Mode
Example 23: The marks of 36 students in an entrance test are given below:

Grades (less than):   40   50   60   70   80   90   100
No. of Students:       3    7   13   23   29   33    36

Find the Mean, Median and Mode. It is noted that the minimum marks of any student is 30.
Solution: The given data is in cumulative frequency form. First, we shall convert this into a simple frequency table.

Grades      f     cf     X     $d = \frac{X - 65}{h}$     fd
30 - 40     3      3    35          -3                    -9
40 - 50     4      7    45          -2                    -8
50 - 60     6     13    55          -1                    -6
60 - 70    10     23    65           0                     0
70 - 80     6     29    75           1                     6
80 - 90     4     33    85           2                     8
90 - 100    3     36    95           3                     9
           $\sum f = 36$                                  $\sum fd = 0$

Mean
$$\bar{X} = A + h\bar{d} = A + h\,\frac{\sum fd}{\sum f} = 65 + 10 \times \frac{0}{36} = 65$$
Median
Here the cf just greater than $\frac{N}{2} = 18$ is 23. Thus, the median class is 60 - 70.
Therefore
$$\text{Median} = l_m + \frac{h}{f_m}\left(\frac{N}{2} - C_{pm}\right) = 60 + \frac{10}{10}(18 - 13) = 65$$
Mode
By inspection, the mode lies in the class 60 - 70. Therefore
$$\text{Mode} = l_m + \frac{f_m - f_p}{2f_m - f_p - f_s} \times h_m = 60 + \frac{10 - 6}{20 - 6 - 6} \times 10 = 65$$
Since Mean = Median = Mode, the distribution is symmetrical.
Geometric Mean
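Geometric Mean (GM) of a set of $n$ observations is the $n$th root of their product.
Case I: Discrete data
Let $X_1, X_2, \ldots, X_n$ be $n$ observations, then
$$GM = \left(X_1 \cdot X_2 \cdots X_n\right)^{1/n}$$
or
$$GM = \text{Antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)$$
For example, for a sample of two observations 4 and 9, $GM = \sqrt{4 \times 9} = 6$.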

Example 23: Suppose the hypothetical weekly incomes of 9 families living in a particular locality are 70, 15, 75, 500, 8, 45, 250, 40 and 36. Compute the geometric mean.
Solution:

$X_i$     $\log X_i$
70        1.8451
15        1.1761
75        1.8751
500       2.6990
8         0.9031
45        1.6532
250       2.3979
40        1.6021
36        1.5563
Total     15.7079

Therefore
$$\frac{1}{n}\sum \log X_i = \frac{15.7079}{9} = 1.7453$$
Geometric Mean (GM) $= \text{Antilog}(1.7453) \approx 55.63$

Example 24: The annual percentage change in the sales of a popular brand of toys for five successive years was observed as 19.5, 20.8, 30.6, 28.5 and 27.2. Find the average annual percentage change in sales.
Solution:
Years   %age change   $X_i$ = 100 + change   $\log X_i$
1          19.5            119.5              2.07740
2          20.8            120.8              2.08202
3          30.6            130.6              2.11591
4          28.5            128.5              2.10891
5          27.2            127.2              2.10451
Total                                        10.48875

Therefore
$$\frac{1}{n}\sum \log X_i = \frac{10.48875}{5} = 2.0978$$
Geometric Mean (GM) $= \text{Antilog}(2.0978) = 125.26$

It means that sales have increased by (125.26 - 100.00) = 25.26 % per year on average.
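
The log-and-antilog procedure used in the two examples above can be sketched in Python with base-10 logarithms. The function name `geometric_mean` is chosen for illustration. Results computed this way may differ in the second decimal place from values read off four-figure log tables.

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed as Antilog(mean of log10 values)."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

print(round(geometric_mean([4, 9]), 4))                                      # two observations 4 and 9
print(round(geometric_mean([70, 15, 75, 500, 8, 45, 250, 40, 36]), 2))       # weekly incomes
growth = [119.5, 120.8, 130.6, 128.5, 127.2]                                 # 100 + percentage change
print(round(geometric_mean(growth) - 100, 2))                                # average annual % change
```

Working with logarithms avoids forming the full product, which can become very large for long series.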


Case II: Discrete frequency distribution
In case of a frequency distribution $(X_i, f_i)$, $i = 1, 2, \ldots, n$, where $f_i$ is the frequency of the variable $X_i$, the geometric mean is calculated by using the formula
$$GM = \left(X_1^{f_1} \cdot X_2^{f_2} \cdots X_n^{f_n}\right)^{1/N}$$
or
$$GM = \text{Antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log X_i\right),$$
where $N = \sum_{i=1}^{n} f_i$.

Note: For computation of the weighted geometric mean, $f_i$ is replaced with the weight $w_i$ of the $i$th observation.
Example 25: Suppose there are five observations 5, 12, 20, 50, 90 and their corresponding frequencies in the data set are 16, 11, 7, 4 and 2. Compute the geometric mean of the given data set.

Solution:

$X_i$    $f_i$    $\log X_i$    $f_i \log X_i$
5         16      0.6990        11.1840
12        11      1.0792        11.8712
20         7      1.3010         9.1070
50         4      1.6990         6.7960
90         2      1.9031         3.8062
Total     $\sum f_i = 40$   $\sum \log X_i = 6.6813$   $\sum f_i \log X_i = 42.7644$

$$GM = \text{Antilog}\left(\frac{\sum f_i \log X_i}{\sum f_i}\right) = \text{Antilog}\left(\frac{42.7644}{40}\right)$$
$$GM = \text{Antilog}(1.0691) = 11.72$$

Example 26: Obtain the geometric mean of the following distribution

Class Intervals:   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60
Frequencies:          5         8         12        10        5

Solution:
Class Interval   $X_i$   $f_i$   $\log X_i$   $f_i \log X_i$
10 - 20           15       5     1.17609       5.8804
20 - 30           25       8     1.39794      11.1835
30 - 40           35      12     1.54407      18.5288
40 - 50           45      10     1.65321      16.5321
50 - 60           55       5     1.74036       8.7018
Total                  $\sum f_i = 40$     $\sum f_i \log X_i = 60.8266$

$$GM = \text{Antilog}\left(\frac{\sum f_i \log X_i}{\sum f_i}\right) = \text{Antilog}\left(\frac{60.8266}{40}\right)$$
$$GM = \text{Antilog}(1.5206) = 33.16$$

Computation of Average Population Growth Rate

Example 27: The population of a city has recorded a growth rate of 20% in the first decade, 30% in the second decade, and 40% in the third decade. Obtain the average population growth rate.

Decade    Population Growth Rate    Population at the End of Decade ($X_i$)    $\log X_i$
First            20%                          120                              2.0792
Second           30%                          130                              2.1139
Third            40%                          140                              2.1461
Total                                                        $\sum \log X_i = 6.3392$

Therefore
$$GM = \text{Antilog}\left(\frac{1}{n}\sum \log X_i\right) = \text{Antilog}\left(\frac{6.3392}{3}\right)$$
$$GM = \text{Antilog}(2.1131) = 129.7$$

It means that the city population over the three decade period has increased at an average rate of (129.7 - 100) = 29.7 per cent per decade.

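Combined Geometric Mean

Let $G_1$ and $G_2$ be the geometric means of two series of sizes $n_1$ and $n_2$, then the geometric mean of the combined series is given by
$$\log GM_c = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$
or
$$GM_c = \text{Antilog}\left(\frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}\right)$$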

26 Lecture notes by Dr. Haseeb Athar, Department of Statistics & O.R., A.M.U., Aligarh, India
The above formula for calculating the combined geometric mean can be generalized to $k$ series as
$$GM_c = \text{Antilog}\left(\frac{n_1 \log G_1 + n_2 \log G_2 + \cdots + n_k \log G_k}{n_1 + n_2 + \cdots + n_k}\right).$$
Example 28: The geometric means of two sets of data consisting of 20 and 25 observations were reported to be 115.80 and 180.50, respectively. Find the combined geometric mean.
Solution: Given that
$n_1 = 20$
$n_2 = 25$
$G_1 = 115.80$
$G_2 = 180.50$
Therefore
$$GM_c = \text{Antilog}\left(\frac{20 \times \log 115.80 + 25 \times \log 180.50}{20 + 25}\right) = \text{Antilog}(2.17077) = 148.19$$
Uses:
1. To find rate of population growth and rate of interest.
2. In construction of index number.

Harmonic Mean
Harmonic mean of the given observations is defined as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations.
If $X_1, X_2, \ldots, X_n$ are $n$ sample observations, with their reciprocals $\frac{1}{X_1}, \frac{1}{X_2}, \ldots, \frac{1}{X_n}$, then the Harmonic Mean (HM) is given by
$$HM = \frac{1}{\frac{1}{n}\left(\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}\right)} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}.$$

In the case of ungrouped frequency distribution
$$HM = \frac{1}{\frac{1}{N}\left(\frac{f_1}{X_1} + \frac{f_2}{X_2} + \cdots + \frac{f_n}{X_n}\right)} = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{X_i}}, \quad N = \sum_{i=1}^{n} f_i$$

In the case of grouped frequency distribution, $X_i$, $i = 1, 2, \ldots, n$ are the mid points of the class intervals.

Note: For computation of the weighted harmonic mean, $f_i$ is replaced with the weight $w_i$ of the $i$th observation.
Uses
• The harmonic mean is often used to calculate the average of ratios or rates. It is the most appropriate measure for ratios and rates because it equalizes the weights of each data point. For instance, the arithmetic mean places a high weight on large data points, while the geometric mean gives a lower weight to the smaller data points.
• It is the most appropriate average where the unit of observation (such as per day, per hour, per unit, per worker etc.) remains the same for the act being performed, such as covering a distance.
Relationship between AM, GM and HM
If we compute these three averages for the same data set of positive observations, then we get the relation
$$AM > GM > HM$$
provided that the observations comprising the set of data are not all the same.
If all the observations in the set of data are the same, then
$$AM = GM = HM$$
Example 29: A car rallyist had to cover a total distance of 500 km spread over four zones. He covered the first zone distance of 120 km at a speed of 80 kmph, the second zone distance of 160 km at a speed of 110 kmph, the third zone distance of 140 km at a speed of 140 kmph, and the fourth zone distance of 80 km at a speed of 160 kmph. Find the average speed at which he drove the car to cover the entire distance.
Solution: We know that
Speed = (Total distance)/(Time taken)
Here Total distance = 500 km
and
$$\text{Time taken} = \frac{120}{80} + \frac{160}{110} + \frac{140}{140} + \frac{80}{160} = 1.5 + 1.4545 + 1 + 0.5 = 4.4545 \text{ hrs.}$$
Therefore
$$\text{Average Speed} = \frac{500}{4.4545} = 112.24 \text{ km/hr}$$
which is the weighted harmonic mean of the speeds, with the zone distances as weights.
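
The average speed in Example 29 is a weighted harmonic mean with distances as weights, which the following sketch makes explicit. The function name `weighted_harmonic_mean` is an illustrative choice.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w) / sum(w / x)."""
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

speeds = [80, 110, 140, 160]       # kmph in each zone
distances = [120, 160, 140, 80]    # km covered in each zone (weights)
print(round(weighted_harmonic_mean(speeds, distances), 2))   # average speed in km/hr
```

With all weights equal to 1 the same function returns the ordinary harmonic mean n / sum(1/X_i).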

Department of Statistics & O.R.

Aligarh Muslim University Aligarh

BA/BSc I Semester

Descriptive Statistics (STB 151)

by

Dr. Haseeb Athar


Unit -2
Measure of Dispersion

Measures of Dispersion
A measure of variability or measure of dispersion indicates how spread out the data are around the mean.

The following are the measures of dispersion:


1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation

Characteristics of an ideal measure of dispersion


(i) It should be rigidly defined.
(ii) It should be simple to understand and easy to calculate.
(iii) It should be based upon all values of given data.
(iv) It should be capable of further mathematical treatment.
(v) It should have sampling stability.

1. Range:
The range is the difference between the largest and smallest values of the distribution.
Let $X_1, X_2, \ldots, X_n$ be the set of observations, then
Range = $X_{\max} - X_{\min}$
Example 1: Let there be five numbers 25, 34, 13, 27, 36, then
Range = 36 - 13 = 23

• It is one of the simplest measures of variability to calculate.
• It depends only on the extreme values and provides no information about how the remaining data are distributed.
2. Quartile Deviation
Quartile Deviation or Semi Inter Quartile Range is given by
$$QD = \frac{Q_3 - Q_1}{2}$$
where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution.
Example 2: Suppose there are 12 observations
12, 25, 15, 5, 22, 7, 14, 36, 53, 30, 42, 65
In order to calculate the quartiles we sort the observations,
5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53, 65

Here $n = 12$ (even), so
$$Q_1 = \frac{1}{2}\left[X_3 + X_4\right] = \frac{12 + 14}{2} = 13, \quad Q_3 = \frac{1}{2}\left[X_9 + X_{10}\right] = \frac{36 + 42}{2} = 39$$
Therefore Quartile Deviation is given by
$$QD = \frac{39 - 13}{2} = 13$$
Similarly, for a discrete frequency distribution and a grouped frequency distribution, the Quartile Deviation can be calculated after getting the values of $Q_1$ and $Q_3$.

• It is better than the range, as it uses 50% of the observations. Since it ignores the other 50% of the observations, it cannot be regarded as an ideal measure.

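3. Mean Deviation
Case I: Discrete data
Let $X_1, X_2, \ldots, X_n$ be $n$ observations, then
$$MD = \frac{1}{n}\sum_{i=1}^{n} |X_i - A|$$
where $A$ is generally taken as the mean, median or mode.

Example 3: Find the mean deviation from the mean for the given data 5, 4, 3, 7, 6
Solution: Here
$$\bar{X} = \frac{5 + 4 + 3 + 7 + 6}{5} = 5$$
So
$$\sum |X_i - \bar{X}| = 0 + 1 + 2 + 2 + 1 = 6$$
Therefore the mean deviation from the mean is given by
$$MD = \frac{6}{5} = 1.2$$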

Case II: Discrete frequency distribution

In case of a frequency distribution $(X_i, f_i)$, $i = 1, 2, \ldots, n$, where $f_i$ is the frequency of the variable $X_i$, then
$$MD = \frac{1}{N}\sum_{i=1}^{n} f_i |X_i - A|$$
where $N = \sum_{i=1}^{n} f_i$.

Example 4: Find the mean deviation from the mean for the given data
X: 3 5 7 8 9 10
f: 2 3 1 2 3 2
Solution:
Here the arithmetic mean is 7. Therefore

$X_i$    $f_i$    $|X_i - 7|$    $f_i |X_i - 7|$
3         2          4               8
5         3          2               6
7         1          0               0
8         2          1               2
9         3          2               6
10        2          3               6
Total    13                         28

Therefore
$$MD = \frac{28}{13} = 2.15$$
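
The mean deviation about the mean for a frequency distribution can be checked with the sketch below. The function name `mean_deviation` is hypothetical. Passing a list of ones as frequencies reproduces the raw-data case.

```python
def mean_deviation(values, freqs):
    """Mean deviation about the arithmetic mean: sum(f * |x - mean|) / sum(f)."""
    N = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / N
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / N

print(mean_deviation([5, 4, 3, 7, 6], [1, 1, 1, 1, 1]))                       # raw data 5, 4, 3, 7, 6
print(round(mean_deviation([3, 5, 7, 8, 9, 10], [2, 3, 1, 2, 3, 2]), 2))      # frequency distribution
```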

Case III: Grouped or continuous frequency distribution


In case of grouped or continuous frequency distribution, is taken as the midpoint of the
corresponding class.
Example 5: The distribution of marks of students in a class is given below:
Marks:             0 - 10    10 - 20    20 - 30    30 - 40    40 - 50
No. of Students:     5          4          8          2          6
Find the mean deviation from the mean.
Solution:

Marks      No. of Students   Mid Point   f_i X_i   |X_i - 25|   f_i |X_i - 25|
                (f_i)          (X_i)
 0 - 10          5               5          25         20            100
10 - 20          4              15          60         10             40
20 - 30          8              25         200          0              0
30 - 40          2              35          70         10             20
40 - 50          6              45         270         20            120
Total           25                         625                       280

Here, the arithmetic mean is
\[ \bar{X} = \frac{1}{N}\sum f_i X_i = \frac{625}{25} = 25. \]
Therefore mean deviation from mean is
\[ \text{M.D.} = \frac{1}{N}\sum f_i\,|X_i - \bar{X}| = \frac{280}{25} = 11.2 \]
• It is based on all the observations, so it is a better measure of dispersion than the above two. But
ignoring the negative signs of the deviations is artificial, and the measure is not capable of further
mathematical treatment.

4. Standard Deviation
Case I: Discrete data
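Let $X_1, X_2, \ldots, X_n$ be $n$ observations, then the Standard Deviation, generally denoted by $\sigma$, is given by
\[ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}, \]
where $\bar{X}$ is the arithmetic mean.

The square of the Standard Deviation is called the Variance. Therefore the formula for calculating Variance
is
\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2. \]
The above formulation can also be written as
\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2. \]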

Example 6: Find Standard Deviation and Variance from the given data 5, 4, 3, 7, 6
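Solution: Here $\bar{X} = 5$ and
\[ \sum_{i=1}^{5} (X_i - \bar{X})^2 = 0 + 1 + 4 + 4 + 1 = 10. \]
Therefore standard deviation is given by
\[ \sigma = \sqrt{\frac{10}{5}} = \sqrt{2} \approx 1.414 \]
Hence the variance is
\[ \sigma^2 = 2 \]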

Case II: Discrete frequency distribution


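In case of a frequency distribution $(X_i, f_i)$, $i = 1, 2, \ldots, n$, where $f_i$ is the frequency of the variable $X_i$, then
standard deviation is
\[ \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}, \]
where
\[ N = \sum_{i=1}^{n} f_i \quad \text{and} \quad \bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i X_i. \]
The above formulation can also be written as
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i X_i^2 - \bar{X}^2. \]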

Example 7: Find Standard Deviation for the given data
X: 3 5 7 8 9 10
f: 2 3 1 2 3 2
Solution:
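Here the Arithmetic Mean is $\bar{X} = 91/13 = 7$.

 X_i    f_i    (X_i - 7)^2    f_i (X_i - 7)^2
   3      2        16               32
   5      3         4               12
   7      1         0                0
   8      2         1                2
   9      3         4               12
  10      2         9               18
Total    13                         76

Therefore
\[ \sigma = \sqrt{\frac{76}{13}} \approx \sqrt{5.846} \approx 2.418 \]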

Case III: Grouped or continuous frequency distribution


In case of a grouped or continuous frequency distribution, the formula of Case II applies with $X_i$
taken as the midpoint of the corresponding class.
Example 8: The distribution of marks of students in a class is given below:
Marks:             0 - 10    10 - 20    20 - 30    30 - 40    40 - 50
No. of Students:     5          4          8          2          6
Find the standard deviation and hence the variance.

Solution:

Marks      No. of Students   Mid Point   f_i X_i   (X_i - 25)^2   f_i (X_i - 25)^2
                (f_i)          (X_i)
 0 - 10          5               5          25         400             2000
10 - 20          4              15          60         100              400
20 - 30          8              25         200           0                0
30 - 40          2              35          70         100              200
40 - 50          6              45         270         400             2400
Total           25                         625                         5000

Here, the arithmetic mean is
\[ \bar{X} = \frac{625}{25} = 25. \]
Therefore
\[ \sigma = \sqrt{\frac{5000}{25}} = \sqrt{200} \approx 14.142 \]
and
\[ \sigma^2 = 200 \]
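The three standard-deviation cases reduce to one computation once raw data are treated as having unit frequencies and grouped data are represented by class midpoints. The following is a minimal sketch under that view; the helper name `variance_and_sd` is illustrative, and the grouped call uses the frequencies 5, 4, 8, 2, 6 from the solution table.

```python
import math

def variance_and_sd(values, freqs=None):
    """Return (variance, standard deviation) using the divisor N = sum(f_i)."""
    if freqs is None:
        freqs = [1] * len(values)                      # raw (ungrouped) data
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return var, math.sqrt(var)

var_raw, sd_raw = variance_and_sd([5, 4, 3, 7, 6])
var_freq, sd_freq = variance_and_sd([3, 5, 7, 8, 9, 10], [2, 3, 1, 2, 3, 2])
var_grouped, sd_grouped = variance_and_sd([5, 15, 25, 35, 45], [5, 4, 8, 2, 6])
print(round(sd_raw, 3), round(sd_freq, 3), round(sd_grouped, 3))
```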

• It is based on all the observations; it is a better measure of dispersion than mean deviation.
• The artificiality created in the case of mean deviation is removed by squaring the deviations.
• It is capable of further mathematical treatment.
 Population Standard Deviation and Population Variance

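Let $X_1, X_2, \ldots, X_N$ be the population values and let the mean of the population be $\mu$; then the population Standard
Deviation and population Variance are given by
\[ \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2} \]
and
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2. \]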

 Sample Standard Deviation and Sample Variance

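Let $x_1, x_2, \ldots, x_n$ be $n$ observations taken from a population. When the sample size is small, we
replace the denominator $n$ by $n - 1$ in the formulas of standard deviation and variance.
Therefore the Sample Standard Deviation, generally denoted by $s$, is given by
\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
and
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2. \]
This is also called the sample mean square.

But when the sample size is large, there is no need for this modification, as $n - 1 \approx n$.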
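The difference between the two divisors can be seen with Python's standard `statistics` module, where `pvariance`/`pstdev` divide by $n$ and `variance`/`stdev` divide by $n - 1$. This is a small illustrative sketch using the data 5, 4, 3, 7, 6 from Example 6.

```python
import statistics

data = [5, 4, 3, 7, 6]

pop_var = statistics.pvariance(data)   # divisor n
samp_var = statistics.variance(data)   # divisor n - 1
pop_sd = statistics.pstdev(data)
samp_sd = statistics.stdev(data)

print(pop_var, samp_var)
print(round(pop_sd, 4), round(samp_sd, 4))
```

The sample variance equals the population-formula variance multiplied by $n/(n-1)$, a factor that approaches 1 as $n$ grows.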

 Root Mean Square Deviation


Let $X_1, X_2, \ldots, X_n$ be $n$ observations with respective frequencies $f_1, f_2, \ldots, f_n$; then the root mean
square deviation $s$ about an arbitrary point $A$ is given by
\[ s^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A)^2, \qquad N = \sum_{i=1}^{n} f_i. \]

Relation Between Standard and Root Mean Square Deviation
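Theorem: The root mean square deviation is least when deviations are measured from the mean.
Proof. We have
\[
\begin{aligned}
s^2 &= \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A)^2 \\
    &= \frac{1}{N}\sum_{i=1}^{n} f_i \{(X_i - \bar{X}) + (\bar{X} - A)\}^2 \\
    &= \frac{1}{N}\sum_{i=1}^{n} f_i \{(X_i - \bar{X})^2 + (\bar{X} - A)^2 + 2(X_i - \bar{X})(\bar{X} - A)\} \\
    &= \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^2 + (\bar{X} - A)^2 + 0 \\
    &= \sigma_x^2 + (\bar{X} - A)^2,
\end{aligned}
\]
since $\sum f_i (X_i - \bar{X}) = 0$. Writing $d_x = \bar{X} - A$,
\[ s^2 = \sigma_x^2 + d_x^2. \]
Now
\[ \frac{ds^2}{dA} = 0 \implies -2(\bar{X} - A) = 0. \]
This gives $A = \bar{X}$, and
\[ \frac{d^2 s^2}{dA^2} = 2 > 0. \]
Hence the root mean square deviation is least when deviations are measured from the mean.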
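The identity $s^2 = \sigma_x^2 + (\bar{X} - A)^2$ from the proof can be checked numerically. The sketch below uses the frequency distribution of Example 4 and a handful of arbitrarily chosen trial points $A$ (the trial points are illustrative, not from the notes).

```python
def mean_square_deviation(values, freqs, about):
    """s^2 = (1/N) * sum(f_i * (X_i - A)^2) about the point A."""
    n = sum(freqs)
    return sum(f * (x - about) ** 2 for x, f in zip(values, freqs)) / n

values = [3, 5, 7, 8, 9, 10]
freqs = [2, 3, 1, 2, 3, 2]
mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
variance = mean_square_deviation(values, freqs, mean)

trial_points = [5.0, 6.5, mean, 8.0, 10.0]
results = []
for a in trial_points:
    s2 = mean_square_deviation(values, freqs, a)
    results.append((a, s2, variance + (mean - a) ** 2))
    print(f"A = {a:5.2f}  s^2 = {s2:8.4f}  variance + (mean - A)^2 = {results[-1][2]:8.4f}")
```

The two printed columns agree for every trial point, and the smallest $s^2$ occurs at $A = \bar{X}$.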

 Standard deviation and Variance are independent of change in origin but not the scale.
Let $X_1, X_2, \ldots, X_n$ be $n$ observations with respective frequencies $f_1, f_2, \ldots, f_n$ and mean $\bar{X}$.
Define the new variate
\[ d_i = \frac{X_i - A}{h}, \quad i = 1, 2, \ldots, n, \]
where $A$ is the new origin and $h$ is the scale factor.
Now,
\[ h\sum_{i=1}^{n} f_i d_i = \sum_{i=1}^{n} f_i X_i - NA, \qquad N = \sum_{i=1}^{n} f_i, \]
or, $h\bar{d} = \bar{X} - A$, that is,
\[ \bar{X} = A + h\bar{d}. \]
We have
\[
\begin{aligned}
\sigma_X^2 &= \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^2 \\
           &= \frac{1}{N}\sum_{i=1}^{n} f_i \{(X_i - A) - (\bar{X} - A)\}^2 \\
           &= \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A)^2 + (\bar{X} - A)^2 - 2(\bar{X} - A)\,\frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A) \\
           &= h^2\left[\frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 + \bar{d}^2 - 2\bar{d}^2\right] = h^2\sigma_d^2 .
\end{aligned}
\]
Also $\sigma_X = h\,\sigma_d$ (for $h > 0$).
If $h = 1$, then $\sigma_X^2 = \sigma_d^2$ and $\sigma_X = \sigma_d$.
This shows variance and standard deviation are independent of change in origin but not scale.
 Variance of the Combined Series
Let there be two series having $n_1$ and $n_2$ observations respectively.

Series No.       1                                  2
Observations     $X_{11}, X_{12}, \ldots, X_{1n_1}$     $X_{21}, X_{22}, \ldots, X_{2n_2}$
Means            $\bar{X}_1$                            $\bar{X}_2$
Variance         $\sigma_1^2$                           $\sigma_2^2$

The variance of series 1 is given by
\[ \sigma_1^2 = \frac{1}{n_1}\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 \]
and the variance of series 2 is
\[ \sigma_2^2 = \frac{1}{n_2}\sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2. \]
The variance of the combined series is
\[ \sigma_c^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X})^2 + \sum_{j=1}^{n_2} (X_{2j} - \bar{X})^2}{n_1 + n_2}. \]
Note that
\[ \sum_{i=1}^{n_1} (X_{1i} - \bar{X})^2 = \sum_{i=1}^{n_1} \{(X_{1i} - \bar{X}_1) + (\bar{X}_1 - \bar{X})\}^2 = n_1(\sigma_1^2 + d_1^2), \]
where
\[ d_1 = \bar{X}_1 - \bar{X} \quad \text{and} \quad \bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}. \]
Similarly
\[ \sum_{j=1}^{n_2} (X_{2j} - \bar{X})^2 = n_2(\sigma_2^2 + d_2^2), \qquad d_2 = \bar{X}_2 - \bar{X}. \]
Therefore
\[ \sigma_c^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}. \]
The above result can be generalized for $k$ series:
\[ \sigma_c^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) + \ldots + n_k(\sigma_k^2 + d_k^2)}{n_1 + n_2 + \ldots + n_k}. \]
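The combined-variance formula can be sketched in a few lines and compared against the variance computed directly from the pooled observations. The function names are illustrative; the two series reuse data sets that appear elsewhere in these notes (5, 4, 3, 7, 6 and 4, 7, 9, 12, 16, 24).

```python
import statistics

def combined_mean_variance(groups):
    """groups: iterable of (n_k, mean_k, variance_k) triples, one per series."""
    groups = list(groups)
    total_n = sum(n for n, _, _ in groups)
    mean_c = sum(n * m for n, m, _ in groups) / total_n
    # sigma_c^2 = sum n_k (sigma_k^2 + d_k^2) / sum n_k, with d_k = mean_k - mean_c
    var_c = sum(n * (v + (m - mean_c) ** 2) for n, m, v in groups) / total_n
    return mean_c, var_c

def summarize(series):
    """Return (n, mean, population variance) of one series."""
    return len(series), statistics.fmean(series), statistics.pvariance(series)

series_1 = [5, 4, 3, 7, 6]
series_2 = [4, 7, 9, 12, 16, 24]
mean_c, var_c = combined_mean_variance([summarize(series_1), summarize(series_2)])
direct_var = statistics.pvariance(series_1 + series_2)
print(round(mean_c, 4), round(var_c, 4), round(direct_var, 4))
```

Because the function accepts any number of triples, it also covers the generalization to $k$ series.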

 Relation Between Standard Deviation and Mean Deviation from Mean


Theorem: For any discrete distribution, the standard deviation is not less than the mean deviation
from the mean.

Proof. We have to show that

S.D. $\ge$ Mean deviation from mean
or Variance $\ge$ (Mean deviation from mean)$^2$, that is,
\[ \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^2 \ge \left[\frac{1}{N}\sum_{i=1}^{n} f_i\,|X_i - \bar{X}|\right]^2. \]
Let $d_i = |X_i - \bar{X}|$, then we have to show
\[ \frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 \ge \left[\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right]^2 \]
or
\[ \frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left[\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right]^2 \ge 0 \]
or Variance of $d$, $\sigma_d^2 \ge 0$,
which is true. Hence the theorem.
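As a quick numerical illustration of the theorem (not a substitute for the proof), the sketch below computes both measures for the frequency distribution of Example 4. The helper name `md_and_sd` is an illustrative choice.

```python
import math

def md_and_sd(values, freqs):
    """Return (mean deviation from mean, standard deviation)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
    sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n)
    return md, sd

md, sd = md_and_sd([3, 5, 7, 8, 9, 10], [2, 3, 1, 2, 3, 2])
# The gap sd^2 - md^2 is the variance of the absolute deviations, hence >= 0.
gap = sd ** 2 - md ** 2
print(round(md, 3), round(sd, 3), gap >= 0)
```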

 Some Useful Results:


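Let $X$ be any variable and $a$ be any constant. If we denote Mean by M and Variance by V, then
For Mean
1. $M(X + a) = M(X) + a$
2. $M(X - a) = M(X) - a$
3. $M(aX) = a\,M(X)$
4. $M(aX + b) = a\,M(X) + b$,
where $b$ is any other constant
For Variance and SD
1. $V(X + a) = V(X)$
2. $V(X - a) = V(X)$
3. $V(aX) = a^2\,V(X)$, so that $SD(aX) = |a|\,SD(X)$
4. $V(aX + b) = a^2\,V(X)$, so that $SD(aX + b) = |a|\,SD(X)$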

Example 9: Consider the following data set

4 7 9 12 16 24
Calculate
i) Mean and variance of the above data
ii) If 2 is subtracted from each value, then mean and variance of new series
iii) If each value is multiplied by 3, then mean, standard deviation and variance of new series
iv) If each value is multiplied by 3 and then increased by 5, then mean and variance of new series.
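One way to check an answer to this example is to transform the data and recompute, as in the sketch below. It uses the population formulas (divisor $n$) via `statistics.pvariance`, which is an assumption consistent with the definitions used earlier in these notes; the helper name `mean_and_variance` is illustrative.

```python
import math
import statistics

data = [4, 7, 9, 12, 16, 24]

def mean_and_variance(xs):
    """Mean and population variance (divisor n)."""
    return statistics.fmean(xs), statistics.pvariance(xs)

original = mean_and_variance(data)                         # part (i)
shifted = mean_and_variance([x - 2 for x in data])         # part (ii)
scaled = mean_and_variance([3 * x for x in data])          # part (iii)
affine = mean_and_variance([3 * x + 5 for x in data])      # part (iv)
scaled_sd = math.sqrt(scaled[1])

for label, result in [("X", original), ("X - 2", shifted),
                      ("3X", scaled), ("3X + 5", affine)]:
    print(f"{label:7s} mean = {result[0]:6.2f}  variance = {result[1]}")
```

The shift leaves the variance unchanged, while multiplying by 3 multiplies the variance by 9 and the standard deviation by 3.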

Coefficient of Dispersion
Sometimes we want to compare the variability of two series which differ widely in their averages or
which are measured in different units. In such cases we do not merely calculate the measure of dispersion;
we calculate the coefficient of dispersion, which is a pure number independent of the unit of measurement.

The Coefficients of Dispersion (C.D.) based on different measures of dispersion are as below:

1. C.D. based on range
\[ \text{C.D.} = \frac{A - B}{A + B}, \]
where A is the largest value and B is the smallest value in the series.

2. C.D. based on quartile deviation
\[ \text{C.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}, \]
where $Q_1$ and $Q_3$ are first and third quartiles respectively.

3. C.D. based on mean deviation
\[ \text{C.D.} = \frac{\text{Mean deviation}}{\text{Average from which it is calculated}} \]

4. C.D. based on standard deviation
\[ \text{C.D.} = \frac{\text{S.D.}}{\text{Mean}} = \frac{\sigma}{\bar{X}} \]

Coefficient of Variation
100 times the coefficient of dispersion based on standard deviation (S.D.) is called Coefficient of
Variation, that is

\[ \text{Coefficient of Variation (C.V.)} = 100 \times \frac{\sigma}{\bar{X}} \]

Example 10: The goals scored by two soccer teams A and B in ten matches are given
below:

Match No.: 1 2 3 4 5 6 7 8 9 10
Team A : 2 0 4 3 1 0 5 2 1 2
Team B : 5 0 1 4 1 1 4 3 2 6
Determine which team is more consistent.
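Consistency is compared through the coefficient of variation: the series with the smaller C.V. is the more consistent one. The sketch below assumes the population standard deviation (divisor $n$), in line with the definition of $\sigma$ used in these notes; the function name is illustrative.

```python
import statistics

def coefficient_of_variation(xs):
    """C.V. = 100 * sigma / mean, with sigma the population S.D."""
    return 100 * statistics.pstdev(xs) / statistics.fmean(xs)

team_a = [2, 0, 4, 3, 1, 0, 5, 2, 1, 2]
team_b = [5, 0, 1, 4, 1, 1, 4, 3, 2, 6]

cv_a = coefficient_of_variation(team_a)
cv_b = coefficient_of_variation(team_b)
more_consistent = "Team A" if cv_a < cv_b else "Team B"
print(f"C.V.(A) = {cv_a:.2f}%, C.V.(B) = {cv_b:.2f}%, more consistent: {more_consistent}")
```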

Practice Exercise:

1. What is the standard deviation of the first 10 natural numbers (1 to 10)?

2. Mr. Zaid did a survey of the number of pets owned by his classmates, with the following results:

What was the standard deviation?

3. What is the population standard deviation for the numbers: 75, 83, 96, 100, 121 and 125?

4. A booklet has 12 pages with the following numbers of words:


271, 354, 296, 301, 333, 326, 285, 298, 327, 316, 287 and 314
Find the mean number of words per page and its standard deviation.
MCQs
1. The population standard deviation of the numbers 3, 8, 12, 17, and 25 is 7.563 correct to 3 decimal
places. What happens to the standard deviation if each of the five numbers is multiplied by 3?
a) The standard deviation remains the same
b) The standard deviation is increased by 3
c) The standard deviation is multiplied by 3
d) The standard deviation is multiplied by 9

2. For comparison of two different series, the best measure of dispersion is


a) Range
b) Coefficient of variation
c) Standard deviation
d) none

3. If a constant value 5 is subtracted from each observation of a set of values, the variance is
a) Reduced by 5
b) Reduced by 25
c) Unaltered
d) Increased by 25
4. If the mean and standard deviation of two series A and B are given as

Which of the two series is more consistent?


a) Series A
b) Series B
c) Series A and Series B are equally consistent
d) None

5. A researcher has collected the following sample data. The mean of the sample is 5.
3 5 12 3 2
The coefficient of variation is

a) 72.66%
b) 81.24%
c) 264%
d) 330%

Department of Statistics & O.R.

Aligarh Muslim University Aligarh

BA/BSc I Semester

Descriptive Statistics (STB 151)

by

Dr. Haseeb Athar


Unit -2
Moments
&
Measure of Skewness and Kurtosis

Moments
The concept of moments has crept into the statistical literature from mechanics. In mechanics,
this concept refers to the turning or the rotating effect of a force whereas it is used to describe the
characteristic of a frequency distribution in statistics. Moments represent a convenient and unifying
method for summarizing many of the most commonly used statistical measures such as measures of
central tendency, variation, skewness and kurtosis.
Types of Moments
There are two types of moments we calculate
 Moments about arbitrary point or raw moments
 Moments about mean or central moments

1. Moments about Arbitrary Point


For Discrete Data

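Let $X_1, X_2, \ldots, X_n$ be $n$ observations; the $r$th moment about any arbitrary point $A$ is given by
\[ \mu'_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^r ; \quad r = 0, 1, 2, \ldots \]
In particular
\[ \mu'_0 = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^0 = 1 \quad : \text{Zero order moment} \]
\[ \mu'_1 = \frac{1}{n}\sum_{i=1}^{n} (X_i - A) \quad : \text{First order moment} \]
\[ \mu'_2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^2 \quad : \text{Second order moment} \]
\[ \mu'_3 = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^3 \quad : \text{Third order moment} \]
\[ \mu'_4 = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^4 \quad : \text{Fourth order moment} \]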

For Ungrouped Frequency Distribution

Let $X_1, X_2, \ldots, X_n$ be $n$ observations with respective frequencies $f_1, f_2, \ldots, f_n$; then the $r$th
moment about any arbitrary point $A$ is given by
\[ \mu'_r = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A)^r ; \quad r = 0, 1, 2, \ldots, \]
where $N = \sum_{i=1}^{n} f_i$.

Remark 1: In the case of a grouped frequency distribution, $X_i$, $i = 1, 2, \ldots, n$, are the mid points of the
class intervals.
Remark 2: If we take the arbitrary point $A = 0$, then we get the moments about the origin as
\[ \mu'_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r ; \quad r = 0, 1, 2, \ldots \quad : \text{For discrete data} \]
\[ \mu'_r = \frac{1}{N}\sum_{i=1}^{n} f_i X_i^r ; \quad r = 0, 1, 2, \ldots \quad : \text{For ungrouped and grouped frequency distribution} \]
where $N = \sum_{i=1}^{n} f_i$.

2. Moments about Mean or Central Moments

When we take deviations from the actual mean (i.e. $A = \bar{X}$) and calculate moments, these are
called moments about the mean or central moments.
For Discrete Data

Let $X_1, X_2, \ldots, X_n$ be $n$ observations; then the $r$th moment about the mean ($\bar{X}$), or $r$th central moment,
is given by
\[ \mu_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^r ; \quad r = 0, 1, 2, \ldots \]
In particular
\[ \mu_0 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^0 = 1 \quad : \text{Zero order central moment} \]
\[ \mu_1 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}) = 0 \quad : \text{First order central moment} \]
\[ \mu_2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \quad : \text{Second order central moment, also called variance} \]
\[ \mu_3 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^3 \quad : \text{Third order central moment} \]
\[ \mu_4 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^4 \quad : \text{Fourth order central moment} \]

For Ungrouped Frequency Distribution

Let $X_1, X_2, \ldots, X_n$ be $n$ observations with respective frequencies $f_1, f_2, \ldots, f_n$; then the $r$th
moment about the mean $\bar{X}$, or $r$th central moment, is given by
\[ \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^r ; \quad r = 0, 1, 2, \ldots, \]
where $N = \sum_{i=1}^{n} f_i$.

Remark: In the case of a grouped frequency distribution, $X_i$, $i = 1, 2, \ldots, n$, are the mid points of the
class intervals.

Relation between Central Moments and Raw Moments and conversely

We know that the $r$th moment about the mean, or central moment, is
\[ \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^r ; \quad r = 0, 1, 2, \ldots, \]
\[ \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A + A - \bar{X})^r \tag{1} \]
Let $d_i = X_i - A$; then we have
\[ \mu'_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A) = \frac{1}{N}\sum_{i=1}^{n} f_i d_i = \bar{d}. \]
Also
\[ \mu'_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A) = \bar{X} - A. \]
Therefore from (1), we have
\[ \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (d_i - \mu'_1)^r \tag{2} \]
Expanding $(d_i - \mu'_1)^r$ in (2) binomially, we get
\[
\begin{aligned}
\mu_r &= \frac{1}{N}\sum_{i=1}^{n} f_i \left[ d_i^r - \binom{r}{1} d_i^{r-1}\mu'_1 + \binom{r}{2} d_i^{r-2}(\mu'_1)^2 - \binom{r}{3} d_i^{r-3}(\mu'_1)^3 + \ldots + (-1)^r (\mu'_1)^r \right] \\
      &= \mu'_r - \binom{r}{1}\mu'_{r-1}\,\mu'_1 + \binom{r}{2}\mu'_{r-2}\,(\mu'_1)^2 - \binom{r}{3}\mu'_{r-3}\,(\mu'_1)^3 + \ldots + (-1)^r (\mu'_1)^r .
\end{aligned}
\]
In particular, on putting $r = 1, 2, 3$ and $4$ in the above expression, we get
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu'_2 - (\mu'_1)^2 \]
\[ \mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3 \]
\[ \mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4 \]
Converse

The $r$th moment about any point $A$ is given by
\[ \mu'_r = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A)^r ; \quad r = 0, 1, 2, \ldots, \]
\[ \mu'_r = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X} + \bar{X} - A)^r \tag{3} \]
Let $z_i = X_i - \bar{X}$ and $\bar{X} - A = d$ (so that $d = \mu'_1$); then from (3), we have
\[
\begin{aligned}
\mu'_r &= \frac{1}{N}\sum_{i=1}^{n} f_i (z_i + d)^r \\
       &= \frac{1}{N}\sum_{i=1}^{n} f_i \left[ z_i^r + \binom{r}{1} z_i^{r-1} d + \binom{r}{2} z_i^{r-2} d^2 + \binom{r}{3} z_i^{r-3} d^3 + \ldots + d^r \right] \\
       &= \frac{1}{N}\sum_{i=1}^{n} f_i z_i^r + \binom{r}{1} d\,\frac{1}{N}\sum_{i=1}^{n} f_i z_i^{r-1} + \binom{r}{2} d^2\,\frac{1}{N}\sum_{i=1}^{n} f_i z_i^{r-2} + \binom{r}{3} d^3\,\frac{1}{N}\sum_{i=1}^{n} f_i z_i^{r-3} + \ldots + d^r
\end{aligned}
\]
\[ \mu'_r = \mu_r + \binom{r}{1}\mu_{r-1}\,\mu'_1 + \binom{r}{2}\mu_{r-2}\,(\mu'_1)^2 + \binom{r}{3}\mu_{r-3}\,(\mu'_1)^3 + \ldots + (\mu'_1)^r \tag{4} \]
In particular, at $r = 2, 3, 4$ with $\mu_0 = 1$ and $\mu_1 = 0$, we get
\[ \mu'_2 = \mu_2 + (\mu'_1)^2 \]
\[ \mu'_3 = \mu_3 + 3\mu_2\,\mu'_1 + (\mu'_1)^3 \]
\[ \mu'_4 = \mu_4 + 4\mu_3\,\mu'_1 + 6\mu_2\,(\mu'_1)^2 + (\mu'_1)^4 \]
Example 1: The first four moments of a distribution about the value 5 of a variable are 1, 10, 20 and
25. Find the central moments.
Solution: We are given that
\[ \mu'_1 = 1, \quad \mu'_2 = 10, \quad \mu'_3 = 20 \quad \text{and} \quad \mu'_4 = 25. \]
Therefore,
\[ \mu_2 = \mu'_2 - (\mu'_1)^2 = 10 - 1 = 9 \]
\[ \mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3 = 20 - 3 \times 10 \times 1 + 2 \times (1)^3 = -8 \]
\[ \mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4 = 25 - 4 \times 20 \times 1 + 6 \times 10 \times 1^2 - 3 \times 1^4 = 2 \]

Effect of Change in Origin and Scale on Moments

Let
\[ u_i = \frac{X_i - A}{h}, \]
then $X_i = A + h u_i$, which implies $\bar{X} = A + h\bar{u}$.

Therefore
\[ X_i - \bar{X} = h(u_i - \bar{u}). \]
Thus, the $r$th moment of $X$ about the arbitrary point $A$ is given by
\[
\begin{aligned}
\mu'_r(x) &= \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - A)^r \\
          &= \frac{1}{N}\sum_{i=1}^{n} f_i (h u_i)^r \\
          &= h^r\,\frac{1}{N}\sum_{i=1}^{n} f_i u_i^r = h^r \mu'_r(u).
\end{aligned}
\]
Again, the $r$th moment of $X$ about the mean, or central moment, is given as
\[
\begin{aligned}
\mu_r(x) &= \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^r \\
         &= \frac{1}{N}\sum_{i=1}^{n} f_i \{h(u_i - \bar{u})\}^r \\
         &= h^r\,\frac{1}{N}\sum_{i=1}^{n} f_i (u_i - \bar{u})^r = h^r \mu_r(u).
\end{aligned}
\]
Thus, the $r$th moment of the variable $X$ is $h^r$ times the $r$th moment of the new variable $u$ after
changing the origin and scale.
Example 2: Wages of workers are given in the following table:
Weekly wages:     10-12   12-14   14-16   16-18   18-20   20-22   22-24
No. of workers:     1       3       7      20      12       4       3

Find the first four raw moments by suitably changing origin and scale and then convert into central
moments.
Solution: Calculation of moments (taking $A = 17$ and $h = 2$)

Wages     No. of Workers   Mid Point   u_i = (X_i - 17)/2   f u_i   f u_i^2   f u_i^3   f u_i^4
              (f_i)          (X_i)
10 - 12         1             11              -3              -3        9       -27        81
12 - 14         3             13              -2              -6       12       -24        48
14 - 16         7             15              -1              -7        7        -7         7
16 - 18        20             17               0               0        0         0         0
18 - 20        12             19               1              12       12        12        12
20 - 22         4             21               2               8       16        32        64
22 - 24         3             23               3               9       27        81       243
Total          50                                             13       83        67       455

The formula for the $r$th raw moment is given by
\[ \mu'_r = \mu'_r(x) = h^r\,\frac{1}{N}\sum_{i=1}^{n} f_i u_i^r \]
Therefore,
\[ \mu'_1 = h \times \frac{1}{N}\sum_{i=1}^{n} f_i u_i = 2 \times \frac{13}{50} = 0.52 \]
\[ \mu'_2 = h^2 \times \frac{1}{N}\sum_{i=1}^{n} f_i u_i^2 = 4 \times \frac{83}{50} = 6.64 \]
\[ \mu'_3 = h^3 \times \frac{1}{N}\sum_{i=1}^{n} f_i u_i^3 = 8 \times \frac{67}{50} = 10.72 \]
\[ \mu'_4 = h^4 \times \frac{1}{N}\sum_{i=1}^{n} f_i u_i^4 = 16 \times \frac{455}{50} = 145.6 \]
Now the central moments are
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu'_2 - (\mu'_1)^2 = 6.64 - 0.2704 = 6.3696 \]
\[ \mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3 = 10.72 - 3 \times 6.64 \times 0.52 + 2 \times (0.52)^3 = 0.6428 \]
\[ \mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4 = 145.6 - 4 \times 10.72 \times 0.52 + 6 \times 6.64 \times 0.2704 - 3 \times 0.0731 = 133.8558 \]

Sheppard’s Correction for Grouping Errors


When moments are calculated for continuous frequency distributions or grouped data (i.e. data that
has been binned), it is assumed that the data is centred on the class interval mid-points. This erroneous
assumption introduces “grouping errors” in calculation of moments. The grouping errors occur
only in even moments (i.e. second and fourth moments).
The correction should only be made to data with the following characteristics:
 Frequencies should taper to zero in both the positive and negative direction. In other words,
frequencies should be symmetrical and gradually taper off (like the behaviour you would see in
a normal distribution).
 Variables should be continuous. The method is not suited to discrete variables.
 Class intervals should be equal in width.
 Class intervals should be more than 1/20th of the total range.
W. F. Sheppard suggested some corrections to be made to get rid of the so called “grouping errors”
that enter into the calculation of moments.

\[ \mu_2\ (\text{Corrected}) = \mu_2 - \frac{h^2}{12} \]
\[ \mu_3\ (\text{No correction needed}) \]
\[ \mu_4\ (\text{Corrected}) = \mu_4 - \frac{1}{2}h^2\mu_2 + \frac{7}{240}h^4, \]
where h is the width of the class interval.
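The corrections above amount to two arithmetic adjustments, sketched below. The function name and the input values ($\mu_2 = 10$, $\mu_3 = 0$, $\mu_4 = 300$, $h = 2$) are hypothetical and chosen only to illustrate the call; they do not come from any example in these notes.

```python
def sheppard_correct(mu2, mu3, mu4, h):
    """Apply Sheppard's corrections to central moments of grouped data.

    h is the common class width. Returns (mu2_corrected, mu3, mu4_corrected);
    the third moment needs no correction.
    """
    mu2_corrected = mu2 - h ** 2 / 12
    mu4_corrected = mu4 - 0.5 * h ** 2 * mu2 + 7 * h ** 4 / 240
    return mu2_corrected, mu3, mu4_corrected

# Hypothetical uncorrected moments, for illustration only.
mu2_c, mu3_c, mu4_c = sheppard_correct(mu2=10.0, mu3=0.0, mu4=300.0, h=2.0)
print(round(mu2_c, 4), mu3_c, round(mu4_c, 4))
```

Note that the correction to $\mu_4$ uses the uncorrected $\mu_2$, as in the formula.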

Example 3: Compute the first four moments for the following distribution of marks after applying
Sheppard's correction.
Marks out of 20 5 6 7 8 9 10 11 12 13 14 15
No. of Students 1 2 5 10 20 51 22 11 5 3 1

Solution: Do it by yourself
Factorial Moments: The $r$th factorial moment of a variable $X$ about the origin is given as
\[ \mu'_{(r)} = \frac{1}{N}\sum_{i=1}^{n} f_i\,x_i^{(r)}, \qquad N = \sum_{i=1}^{n} f_i, \]
where
\[ x^{(r)} = x(x-1)(x-2)\ldots(x-r+1). \]
The $r$th factorial moment of a variable $X$ about any point $x = a$ is given by
\[ \mu'_{(r)} = \frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - a)^{(r)}, \qquad N = \sum_{i=1}^{n} f_i, \]
where
\[ (x-a)^{(r)} = (x-a)(x-a-1)(x-a-2)\ldots(x-a-r+1). \]
In particular
\[ \mu'_{(1)} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \mu'_1 \]
\[ \mu'_{(2)} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^{(2)} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i(x_i - 1) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \frac{1}{N}\sum_{i=1}^{n} f_i x_i \]
\[ \mu'_{(2)} = \mu'_2 - \mu'_1 \]
\[ \mu'_{(3)} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^{(3)} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i(x_i - 1)(x_i - 2) \]
\[ \mu'_{(3)} = \mu'_3 - 3\mu'_2 + 2\mu'_1 \]
\[ \mu'_{(4)} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^{(4)} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i(x_i - 1)(x_i - 2)(x_i - 3) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i(x_i^3 - 6x_i^2 + 11x_i - 6) \]
\[ \mu'_{(4)} = \mu'_4 - 6\mu'_3 + 11\mu'_2 - 6\mu'_1 . \]

Example 4: In Example 2, convert first four raw moments into factorial moments.
Solution: Do it by yourself.
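As a way to check such a conversion, the sketch below computes factorial moments both directly from their definition and from the raw moments about the origin. The helper names are illustrative, and the data are the small frequency distribution X: 3, 5, 7, 8, 9, 10 with f: 2, 3, 1, 2, 3, 2 used earlier in these notes rather than the data of Example 2.

```python
def falling_factorial(x, r):
    """x^(r) = x (x - 1) (x - 2) ... (x - r + 1)."""
    result = 1
    for k in range(r):
        result *= x - k
    return result

def raw_moment(values, freqs, r):
    """r-th moment about the origin."""
    return sum(f * x ** r for x, f in zip(values, freqs)) / sum(freqs)

def factorial_moment(values, freqs, r):
    """r-th factorial moment about the origin, from the definition."""
    return sum(f * falling_factorial(x, r) for x, f in zip(values, freqs)) / sum(freqs)

values = [3, 5, 7, 8, 9, 10]
freqs = [2, 3, 1, 2, 3, 2]
m1, m2, m3, m4 = (raw_moment(values, freqs, r) for r in range(1, 5))

direct = [factorial_moment(values, freqs, r) for r in range(1, 5)]
from_raw = [m1, m2 - m1, m3 - 3 * m2 + 2 * m1, m4 - 6 * m3 + 11 * m2 - 6 * m1]
print([round(v, 4) for v in direct])
print([round(v, 4) for v in from_raw])
```

The two printed lists agree, which confirms the four conversion formulas for this data set.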
Absolute Moment
The absolute moment of order $r$ about the origin is given as
\[ \frac{1}{N}\sum_{i=1}^{n} f_i\,|X_i|^r, \qquad N = \sum_{i=1}^{n} f_i . \]

Measure of Skewness
The term skewness refers to lack of symmetry or departure from symmetry. When a distribution is
not symmetrical (or is asymmetrical), it is called a skewed distribution. Measures of skewness
indicate how the distribution of the observations differs from a symmetrical (or normal)
distribution. The concept of skewness gains
importance from the fact that statistical theory is often based upon the assumption of the normal
distribution.
Nature of Skewness: Skewness can be positive or negative or zero.
 When the values of mean, median and mode are equal, there is no skewness.
 When mean > median > mode, skewness will be positive.
 When mean < median < mode, skewness will be negative.

There are different measures of skewness, which are discussed below.


1. Karl Pearson’s Coefficient of Skewness
This method is most frequently used for measuring skewness. The formula for the
coefficient based on mean, mode and standard deviation is given by
\[ sk_P = \frac{\text{Mean} - \text{Mode}}{\text{SD}}, \]
where $sk_P$ denotes the Pearsonian coefficient of skewness.

This formula can be used for fairly symmetric data. However, if the data is moderately skewed or the
mode is ill defined, then the above formula can be modified by using the empirical relation
Mean $-$ Mode $= 3$(Mean $-$ Median) as
\[ sk_P = \frac{3(\text{Mean} - \text{Median})}{\text{SD}} \]
• If $sk_P = 0$, the distribution is symmetrical.
• If the mean is greater than the mode, then $sk_P > 0$ and the skewness is positive.
• If the mean is less than the mode, then $sk_P < 0$ and the skewness is negative.

Karl Pearson's coefficient of skewness lies between $-3$ and $3$.


2. Bowley’s Coefficient of Skewness
This method is based on quartiles. The formula for calculating the coefficient of skewness is
\[ sk_B = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_1 - 2Q_2 + Q_3}{Q_3 - Q_1} \]
• If $sk_B = 0$, then it is a symmetrical distribution.
• If $sk_B > 0$, then it is a positively skewed distribution.
• If $sk_B < 0$, then it is a negatively skewed distribution.

Bowley's coefficient of skewness is used for moderately skewed distributions and distributions
having open-end classes. Bowley's coefficient of skewness lies between $-1$ and $1$.

Example 5: The IQ scores of 50 students of a class are given below

IQ Score:      50 - 60   60 - 70   70 - 80   80 - 90   90 - 100
No. of Std:       5         8        16        12         9
Calculate
(i) Karl Pearson coefficient of skewness based on median and mode.
(ii) Bowley's coefficient of skewness.

Solution: To calculate all the above measures, first we shall calculate mean, median, mode, standard
deviation, first quartile and third quartile. Take $A = 75$ and $h = 10$.

IQ Score    No. of Std. (f)    cf     X     d = (X - A)/h     fd     fd^2
50 - 60           5             5    55          -2          -10      20
60 - 70           8            13    65          -1           -8       8
70 - 80          16            29    75           0            0       0
80 - 90          12            41    85           1           12      12
90 - 100          9            50    95           2           18      36
Total       N = 50                                      \sum fd = 12   \sum fd^2 = 76

\[ \bar{X} = A + h\bar{d} = 75 + 10 \times 0.24 = 77.4 \]
\[ \sigma_d^2 = \frac{1}{N}\sum f d^2 - \bar{d}^2 = 1.52 - 0.0576 = 1.4624 \]
\[ \sigma_x^2 = h^2\sigma_d^2 = 100 \times 1.4624 = 146.24 \]
\[ \sigma_x = 12.09 \]
With $l$ the lower limit, $f$ the frequency of the class containing the quartile and $C$ the cumulative
frequency of the preceding class,
\[ Q_1 = l_{q_1} + \frac{h}{f_{q_1}}\left(\frac{N}{4} - C_{q_1}\right) = 60 + \frac{10}{8}(12.5 - 5) = 69.38 \]
\[ Q_2 = l_{q_2} + \frac{h}{f_{q_2}}\left(\frac{N}{2} - C_{q_2}\right) = 70 + \frac{10}{16}(25 - 13) = 77.5 \]
\[ Q_3 = l_{q_3} + \frac{h}{f_{q_3}}\left(\frac{3N}{4} - C_{q_3}\right) = 80 + \frac{10}{12}(37.5 - 29) = 87.08 \]
The modal class is 70 - 80, so
\[ \text{Mode} = 70 + 10 \times \frac{16 - 8}{2 \times 16 - 8 - 12} = 76.67 \]

(i) The Karl Pearson coefficient of skewness based on mean, mode and standard deviation is given by
\[ sk_P = \frac{\bar{X} - \text{Mode}}{\sigma_x} = \frac{77.4 - 76.67}{12.09} = 0.06 \]
The Karl Pearson coefficient of skewness based on mean, median and standard deviation is given
by
\[ sk_P = \frac{3(\bar{X} - M_d)}{\sigma_x} = \frac{3(77.4 - 77.5)}{12.09} = -0.025 \]
(ii) The Bowley's coefficient of skewness is
\[ sk_B = \frac{Q_1 - 2Q_2 + Q_3}{Q_3 - Q_1} = \frac{69.38 - 2 \times 77.5 + 87.08}{87.08 - 69.38} = 0.08 \]
All three coefficients are close to zero, so the distribution is nearly symmetrical.
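The quantities in this example can be recomputed directly from the frequency table. The sketch below is illustrative: the helper `grouped_quantile` and the variable names are not from the notes, and it assumes the usual interpolation formulas for grouped data ($l + \frac{h}{f}(pN - C)$ for quantiles and $l + h\frac{f_1 - f_0}{2f_1 - f_0 - f_2}$ for the mode).

```python
import math

lowers = [50, 60, 70, 80, 90]      # lower class limits
freqs = [5, 8, 16, 12, 9]
h = 10                             # common class width
n = sum(freqs)
mids = [low + h / 2 for low in lowers]

mean = sum(f * x for x, f in zip(mids, freqs)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n)

def grouped_quantile(p):
    """Quantile l + (h / f) * (p * N - C) for a grouped distribution."""
    target = p * n
    cumulative = 0                 # C: cumulative frequency before the class
    for low, f in zip(lowers, freqs):
        if cumulative + f >= target:
            return low + h / f * (target - cumulative)
        cumulative += f
    raise ValueError("p must lie in (0, 1]")

q1, q2, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))

# Mode by interpolation within the modal class.
i = freqs.index(max(freqs))
f1 = freqs[i]
f0 = freqs[i - 1] if i > 0 else 0
f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
mode = lowers[i] + h * (f1 - f0) / (2 * f1 - f0 - f2)

sk_pearson_mode = (mean - mode) / sd
sk_pearson_median = 3 * (mean - q2) / sd
sk_bowley = (q1 - 2 * q2 + q3) / (q3 - q1)
print(round(sk_pearson_mode, 3), round(sk_pearson_median, 3), round(sk_bowley, 3))
```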

3. Measure of Skewness based on Moments


Karl Pearson defined the following four coefficients, based upon the first four central moments.
The coefficients $\beta_1$ and $\gamma_1$ are used to measure the skewness. These coefficients are defined as
below:
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\sigma^3} \]
For a symmetrical distribution $\beta_1 = 0$, but $\beta_1$ does not tell the direction of skewness, that is
positive or negative, because $\mu_3^2 \ge 0$ and the variance $\mu_2$ is always positive. This drawback is
removed by calculating the $\gamma_1$ coefficient, which takes the sign of $\mu_3$. Thus, the sign of skewness depends upon
whether $\mu_3$ is positive or negative.

• If $\gamma_1 > 0$, then positively skewed distribution

• If $\gamma_1 < 0$, then negatively skewed distribution

• If $\gamma_1 = 0$, then symmetrical distribution

Measure of Kurtosis
The relative flatness of the top is called kurtosis or convexity of curve. The coefficients $\beta_2$ and $\gamma_2$
are used to measure the kurtosis. These coefficients are defined as below:
\[ \beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3 \]
The quantity $\gamma_2 = \beta_2 - 3$ is called excess of kurtosis.

• If $\beta_2 < 3$, then the curve is called Platykurtic.
• If $\beta_2 = 3$, then the curve is called Mesokurtic.
• If $\beta_2 > 3$, then the curve is called Leptokurtic.

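Example 6: Refer to Example 2 and calculate the $\beta_1$, $\gamma_1$, $\beta_2$ and $\gamma_2$ coefficients. Also study the nature
of the distribution.
Solution: For the data of Example 2, the first four central moments (computed from its frequency table
with $\sum f_i u_i^2 = 83$) are
\[ \mu_1 = 0, \quad \mu_2 = 6.3696, \quad \mu_3 = 0.6428 \quad \text{and} \quad \mu_4 = 133.8558 \]
Therefore
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0.6428)^2}{(6.3696)^3} = \frac{0.4132}{258.4262} = 0.0016 \]
and
\[ \gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\sigma^3} = \frac{0.6428}{(2.5238)^3} = 0.04 \]
Since $\gamma_1 > 0$, the distribution is positively skewed, although only slightly.

Further,
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{133.8558}{(6.3696)^2} = 3.2992 \]
and
\[ \gamma_2 = \beta_2 - 3 = 0.2992 \]
Since $\gamma_2 > 0$, the distribution is leptokurtic.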
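The four coefficients can also be obtained by working directly from the frequency table of Example 2, which avoids carrying rounding errors through the raw-to-central conversion. This is a minimal sketch with illustrative names; it computes central moments from the class midpoints without any change of origin or scale.

```python
mids = [11, 13, 15, 17, 19, 21, 23]     # class midpoints of Example 2
freqs = [1, 3, 7, 20, 12, 4, 3]
n = sum(freqs)
mean = sum(f * x for x, f in zip(mids, freqs)) / n

def central_moment(r):
    """r-th moment about the mean for the grouped data."""
    return sum(f * (x - mean) ** r for x, f in zip(mids, freqs)) / n

mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)

beta1 = mu3 ** 2 / mu2 ** 3
gamma1 = mu3 / mu2 ** 1.5               # carries the sign of mu3
beta2 = mu4 / mu2 ** 2
gamma2 = beta2 - 3                      # excess of kurtosis

skew = "positive" if gamma1 > 0 else "negative" if gamma1 < 0 else "none"
shape = "leptokurtic" if gamma2 > 0 else "platykurtic" if gamma2 < 0 else "mesokurtic"
print(f"beta1 = {beta1:.4f}, gamma1 = {gamma1:.4f}, beta2 = {beta2:.4f}, gamma2 = {gamma2:.4f}")
print(f"skewness: {skew}, kurtosis: {shape}")
```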

Practice Exercises
1. The first three moments of a distribution about the value 2 of the variable are 1, 16 and -40. Show
that the mean is 3, the variance is 15 and $\mu_3 = -86$. Also show that the first three moments about
$x = 0$ are 3, 24 and 76.
2. The first four moments of a distribution about the value 5 of the variable are 2, 20, 40 and 50.
Show that the mean is 7, variance 16, $\mu_3 = -64$, $\mu_4 = 162$, $\beta_1 = 1$ and $\beta_2 = 0.63$.

3. For a certain distribution, the mean is 10, variance is 16, $\gamma_1$ is 1 and $\beta_2$ is 4. Find the first four
moments about the origin.
4. Calculate the first four moments about the mean for the following data. Also calculate $\beta_1$ and $\beta_2$.

x: 1 2 3 4 5 6 7 8 9
f: 1 6 13 25 30 22 9 5 2
5. In a certain distribution, the first four moments about the point 4 are -1.5, 17, -30 and 108.
Calculate $\beta_1$ and $\beta_2$ and state whether the distribution is leptokurtic or platykurtic.

6. Show that for a discrete distribution

(i) $\beta_2 \ge 1$

(ii) $\beta_2 > \beta_1$.
