Statistics: Central Tendency & Variation

The document discusses various statistical concepts including descriptive statistics, measures of central tendency, and the arithmetic mean. It provides definitions and formulas for calculating the arithmetic mean of univariate, bivariate, and multivariate data, as well as grouped and ungrouped data. Examples are given to demonstrate how to calculate the arithmetic mean from raw data sets, frequency distributions, and grouped data using methods like the assumed mean and step-deviation approaches. Properties of the arithmetic mean are also outlined.

Course Content

Statistical Data & Descriptive Statistics: Nature & Classification of data: univariate, bivariate &
multivariate data; time-series & cross-sectional data, Measures of Central Tendency, Mathematical
averages including arithmetic mean, geometric mean & harmonic mean. Properties & applications;
Positional Averages Mode & Median (& other partition values including quartiles, deciles, &
percentiles) (including graphic determination); Measures of Variation: absolute & relative; Range,
quartile deviation, mean deviation, standard deviation, & their coefficients, Properties of standard
deviation/variance; Skewness: Meaning, Measurement using Karl Pearson & Bowley’s measures;
Concept of Kurtosis.

What is statistics?
• Statistics has been defined in two ways. Some writers define it as 'statistical data', i.e.,
numerical statements of facts,
• while others define it as 'statistical methods', i.e., the complete body of the principles and
techniques used in collecting and analyzing such data.
• Combining the two: "Statistics is the science which deals with the collection, analysis and
interpretation of numerical data."
STATISTICAL TECHNIQUES
Statistical techniques are techniques based on the concepts of statistics. These include the methods of
collection of data, classification and tabulation of data, analysis of data through statistical measures
such as mean, standard deviation, coefficients of correlation and regression, decision theory, index
numbers, time series analysis, theory of sampling, probability and probability distributions, etc.
Some Important Statistical Techniques
• Method of collecting data.
• Classification and tabulation of data.
• Measures of central tendency.
• Measures of dispersion.
• Correlation and regression analysis.

IMPORTANCE AND SCOPE OF STATISTICS

In modern times, Statistics is viewed not as a mere device for collecting numerical data but as a
means of developing sound techniques for their handling and analysis and for drawing valid inferences
from them. It has therefore become indispensable in all phases of human endeavour: planning,
economics, business, mathematics, biology, astronomy, medical science, engineering, etc.

FREQUENCY DISTRIBUTIONS
When observations, discrete or continuous, are available on a single characteristic of a large number
of individuals, it often becomes necessary to condense the data as far as possible without losing any
information of interest.
Let us consider the marks in Statistics obtained by 250 candidates selected at random from among
those appearing in a certain examination.

This representation of the data does not furnish any useful information and is rather confusing to
the mind.

This representation, though better than an array, does not condense the data much and it is quite
cumbersome to go through this huge mass of data.

AVERAGES OR MEASURES OF CENTRAL TENDENCY, OR MEASURES OF LOCATION:
An average in a statistical series is the value of the variable which is representative of the entire
distribution. There are five measures of central tendency in common use:
(i) Arithmetic Mean or simply Mean,
(ii) Median,
(iii) Mode,
(iv) Geometric Mean, and
(v) Harmonic Mean.
Requisites for an Ideal Measure of Central Tendency
(i) It should be rigidly defined.
(ii) It should be readily comprehensible and easy to calculate.
(iii) It should be based on all the observations.
(iv) It should be suitable for further mathematical treatment. By this we mean that if we are given the
averages and sizes of a number of series, we should be able to calculate the average of the
composite series obtained on combining the given series.
(v) It should be affected as little as possible by fluctuations of sampling.
In addition to the above criteria we may add the following (which is not due to Prof. Yule):
(vi) It should not be affected much by extreme values.

ARITHMETIC MEAN:
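Arithmetic mean of a set of observations is their sum divided by the number of observations, e.g., the
arithmetic mean x̅ of n observations x1, x2, ..., xn is given by

$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Weighted/frequency arithmetic mean

In case of the frequency distribution xi | fi, i = 1, 2, ..., n, where fi is the frequency of the variable xi,

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad N = \sum_{i=1}^{n} f_i$$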

The arithmetic mean of a set of observations is their sum divided by the number of observations.
Let x1, x2, ..., xn be n observations. Then their average or arithmetic mean is given by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For example,

The marks obtained by 10 students in Class XII in a physics examination are 25, 30, 21, 55, 40,
45, 17, 48, 35, 42. The arithmetic mean of the marks is given by

$$\bar{x} = \frac{25 + 30 + 21 + 55 + 40 + 45 + 17 + 48 + 35 + 42}{10} = \frac{358}{10} = 35.8$$

If the observations consist of n distinct values x1, x2, ..., xn of the observed variable x,
occurring with frequencies f1, f2, ..., fn respectively, then the arithmetic mean is given by

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
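
The two definitions above can be checked with a short Python sketch. The function names and the small frequency table are illustrative assumptions, not part of the notes; only the marks come from the example.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)


def frequency_mean(xs, fs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


# Marks of the 10 students from the example above.
marks = [25, 30, 21, 55, 40, 45, 17, 48, 35, 42]
mean_marks = arithmetic_mean(marks)

# Hypothetical frequency table (not from the text): value x occurs f times.
xs = [1, 2, 3, 4]
fs = [2, 3, 4, 1]
mean_freq = frequency_mean(xs, fs)

print(mean_marks)  # 358 / 10 = 35.8
print(mean_freq)   # 24 / 10 = 2.4
```

The frequency form is simply the raw form with each repeated value counted once and weighted by how often it occurs.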

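Arithmetic Mean of Grouped Data

In case of a grouped or continuous frequency distribution, x is taken as the mid-value of the
corresponding class, and the arithmetic mean is given by

$$\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i}$$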

Example 3

A company is planning to improve plant safety. For this, accident data for the last 50 weeks
was compiled. These data are grouped into the frequency distribution as shown below.
Calculate the arithmetic mean of the number of accidents per week.

Solution

The given class intervals are inclusive. However, they need not be converted into exclusive
class intervals for the calculation of the mean, because the conversion leaves the class
mid-values unchanged.

ARITHMETIC MEAN FROM ASSUMED MEAN

If the values of x and (or) f are large, the calculation of the mean becomes quite time consuming
and tedious. In such cases, a provisional mean 'a' is taken as that value of x (mid-value of the
class interval) which corresponds to the highest frequency or which comes near the middle
value of the frequency distribution. This number is called the assumed mean. Writing
d = x - a for the deviations from the assumed mean, the arithmetic mean is

$$\bar{x} = a + \frac{\sum_i f_i d_i}{\sum_i f_i}, \qquad d_i = x_i - a$$

ARITHMETIC MEAN BY THE STEP-DEVIATION METHOD

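When the class intervals in grouped data are equal, the calculation can be simplified by the
step-deviation method. In such cases, the deviations of the variate x from the assumed mean a
(i.e., d = x - a) are divided by the common factor h, which is equal to the width of the class
interval. With u = (x - a)/h, the arithmetic mean is

$$\bar{x} = a + h\,\frac{\sum_i f_i u_i}{\sum_i f_i}, \qquad u_i = \frac{x_i - a}{h}$$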
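
As a rough check, the direct, assumed-mean and step-deviation calculations should agree on any grouped table. The sketch below uses a hypothetical table (class mid-values and frequencies invented for illustration) with assumed mean a = 25 and class width h = 10.

```python
# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50 (not from the text).
mids = [5, 15, 25, 35, 45]    # class mid-values x
freqs = [5, 8, 15, 16, 6]     # class frequencies f
N = sum(freqs)

# Direct method: sum(f * x) / N
direct = sum(f * x for x, f in zip(mids, freqs)) / N

# Assumed-mean method: a + sum(f * d) / N, with d = x - a
a = 25
assumed = a + sum(f * (x - a) for x, f in zip(mids, freqs)) / N

# Step-deviation method: a + h * sum(f * u) / N, with u = (x - a) / h
h = 10
step = a + h * sum(f * (x - a) / h for x, f in zip(mids, freqs)) / N

print(direct, assumed, step)  # all three are 27.0
```

The shortcuts only change the size of the numbers being multiplied; the result is the same mean.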

PROPERTIES OF ARITHMETIC MEAN
Property 1. The algebraic sum of the deviations of a set of values from their arithmetic mean is zero.

$$\sum_i f_i (x_i - \bar{x}) = 0$$

Property 2. The sum of the squares of the deviations of a set of values is minimum when taken about
the mean.
Property 3. If two groups contain n1 and n2 observations with means x̅1 and x̅2 respectively, then the
mean of the composite group of (n1 + n2) observations is given by

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

In general, if there are several groups, the ith group containing ni observations with mean x̅i, the mean
of the composite series is

$$\bar{x} = \frac{\sum_i n_i \bar{x}_i}{\sum_i n_i}$$
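
The three properties can be verified numerically. The sketch below reuses the ten physics marks from the earlier example; the split into two groups is an arbitrary choice made only for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values, about):
    """Sum of squared deviations of the values from the point `about`."""
    return sum((x - about) ** 2 for x in values)


marks = [25, 30, 21, 55, 40, 45, 17, 48, 35, 42]
m = mean(marks)

# Property 1: deviations from the mean sum to zero (up to rounding error).
dev_sum = sum(x - m for x in marks)

# Property 2: the sum of squares about the mean is smaller than about other points.
ss_mean = sum_sq_dev(marks, m)
ss_other = sum_sq_dev(marks, m + 1)

# Property 3: the composite mean is recovered from group sizes and group means.
g1, g2 = marks[:4], marks[4:]  # arbitrary split into two groups
combined = (len(g1) * mean(g1) + len(g2) * mean(g2)) / (len(g1) + len(g2))

print(dev_sum, ss_mean < ss_other, combined)
```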

Merits and Demerits of Arithmetic Mean


Merits
(i) It is rigidly defined.
(ii) It is easy to understand and easy to calculate.
(iii) It is based upon all the observations.
(iv) It is suitable for algebraic treatment. The mean of the composite series in terms of the means
and sizes of the component series is given by

$$\bar{x} = \frac{\sum_i n_i \bar{x}_i}{\sum_i n_i}$$

(v) Of all the averages, arithmetic mean is affected least by fluctuations of sampling. This property
is sometimes described by saying that arithmetic mean is a stable average.

Demerits
(i) It cannot be determined by inspection, nor can it be located graphically.
(ii) Arithmetic mean cannot be used if we are dealing with qualitative characteristics which cannot
be measured quantitatively, such as intelligence, honesty, beauty, etc. In such cases median
(discussed later) is the only average to be used.
(iii) Arithmetic mean cannot be obtained if a single observation is missing or lost or is illegible,
unless we drop it out and compute the arithmetic mean of the remaining values.
(iv) Arithmetic mean is affected very much by extreme values.
(v) Arithmetic mean may lead to wrong conclusions if the details of the data from which it is
computed are not given. Let us consider the following marks obtained by two students A and B in
three tests, viz., terminal test, half-yearly examination and annual examination respectively.

Marks in:   I Test   II Test   III Test
A           50%      60%       70%
B           70%      60%       50%

Thus the average marks obtained by each of the two students at the end of the year are 60%. If we
are given the average marks alone, we conclude that the level of intelligence of both the students
at the end of the year is the same. This is a fallacious conclusion, since we find from the data that
student A has improved consistently while student B has deteriorated consistently.
(vi) Arithmetic mean cannot be calculated if the extreme class is open, e.g., below 10 or above 90.
Moreover, even if a single observation is missing, the mean cannot be calculated.
(vii) In an extremely asymmetrical (skewed) distribution, arithmetic mean is usually not a suitable
measure of location.

MEDIAN
Median of a distribution is the value of the variable which divides it into two equal parts. It is the
value such that the number of observations above it is equal to the number of observations below it.
The median is thus a positional average.
In case of ungrouped data, if the number of observations is odd then the median is the middle value
after the values have been arranged in ascending or descending order of magnitude. In case of an even
number of observations, there are two middle terms and the median is obtained by taking the arithmetic
mean of the middle terms.
e.g., the median of the values 25, 20, 15, 35, 18, i.e., of 15, 18, 20, 25, 35 arranged in ascending
order, is 20,
and the median of 8, 20, 50, 25, 15, 30, i.e., of 8, 15, 20, 25, 30, 50, is
1/2 (20 + 25) = 22.5
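
A minimal Python sketch of the rule for ungrouped data follows. The function name is an illustrative choice, and the first data set is taken as 25, 20, 15, 35, 18 so that it matches the ordered list 15, 18, 20, 25, 35.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([25, 20, 15, 35, 18])       # ordered: 15, 18, 20, 25, 35
even_case = median([8, 20, 50, 25, 15, 30])   # ordered: 8, 15, 20, 25, 30, 50

print(odd_case)   # 20
print(even_case)  # 22.5
```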
In case of a Discrete Frequency Distribution, the median is obtained by considering the cumulative
frequencies. The steps for calculating the median are given below:
(i) Find N/2, where N = ∑fi.
(ii) See the (less than) cumulative frequency (c.f.) just greater than N/2.
(iii) The corresponding value of x is the median.
Example: Obtain the median for the following frequency distribution:
x: 1 2 3 4 5 6 7 8 9
f: 8 10 11 16 20 25 15 9 6
Solution.
x       f       c.f.
1       8       8
2       10      18
3       11      29
4       16      45
5       20      65
6       25      90
7       15      105
8       9       114
9       6       120
Total   120

Hence N = 120, so N/2 = 60.

The cumulative frequency (c.f.) just greater than N/2 is 65, and the value of x corresponding to 65 is 5.
Therefore, the median is 5.
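
The three steps can be sketched in Python as follows. The variable names are illustrative; `itertools.accumulate` builds the "less than" cumulative frequencies.

```python
from itertools import accumulate

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
fs = [8, 10, 11, 16, 20, 25, 15, 9, 6]

# Step (i): N / 2, where N is the total frequency.
cum = list(accumulate(fs))   # less-than cumulative frequencies
N = cum[-1]
half = N / 2

# Steps (ii) and (iii): first x whose cumulative frequency exceeds N / 2.
median_x = next(x for x, cf in zip(xs, cum) if cf > half)

print(cum)       # [8, 18, 29, 45, 65, 90, 105, 114, 120]
print(median_x)  # 5
```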
Median for Continuous Frequency Distribution: In the case of a continuous frequency distribution, the
class corresponding to the c.f. just greater than N/2 is called the median class and the value of the
median is obtained by the following formula:

$$\text{Median} = l + \frac{h}{f_{md}}\left(\frac{N}{2} - c\right)$$

where l is the lower limit of the median class,
f_md is the frequency of the median class,
h is the magnitude of the median class,
c is the c.f. of the class preceding the median class, and N = ∑fi.
Example: Find the median wage of the following distribution:
Wages (in Rs.):     20-30   30-40   40-50   50-60   60-70
No. of labourers:   3       5       20      10      5

Solution.
Wages (in Rs.)   No. of labourers   c.f.
20-30            3                  3
30-40            5                  8 (= c)
40-50            20 (= f_md)        28
50-60            10                 38
60-70            5                  43

Here N/2 = 43/2 = 21.5. The cumulative frequency just greater than 21.5 is 28 and the corresponding
class is 40-50. Thus the median class is 40-50. Hence
Median = 40 + (10/20)(21.5 - 8)
= 40 + 6.75 = 46.75
Thus the median wage is Rs. 46.75.
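
The interpolation used in the wage example, median = l + (h / f)(N/2 - c), can be sketched as below. The function name and argument layout are assumptions made for illustration, and equal class widths are assumed.

```python
from itertools import accumulate


def grouped_median(lower_limits, width, freqs):
    """Median of a continuous frequency distribution with equal class widths."""
    cum = list(accumulate(freqs))
    half = cum[-1] / 2
    for i, cf in enumerate(cum):
        if cf > half:                       # median class found
            c = cum[i - 1] if i > 0 else 0  # c.f. of the preceding class
            return lower_limits[i] + width / freqs[i] * (half - c)


wage_median = grouped_median([20, 30, 40, 50, 60], 10, [3, 5, 20, 10, 5])
print(wage_median)  # 40 + (10 / 20) * (21.5 - 8) = 46.75
```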

Merits and Demerits of Median
Merits
(i) It is rigidly defined.
(ii) It is easy to understand and easy to calculate. In some cases it can be located merely by
inspection.
(iii) It is not at all affected by extreme values.
(iv) It can be calculated for distributions with open-end classes.

Demerits
(i) In case of an even number of observations, the median cannot be determined exactly. We merely
estimate it by taking the mean of the two middle terms.
(ii) It is not based on all the observations. For example, the median of 10, 25, 50, 60 and 65 is 50.
We can replace the observations 10 and 25 by any two values which are smaller than 50, and the
observations 60 and 65 by any two values greater than 50, without affecting the value of the median.
This property is sometimes described by saying that the median is insensitive.
(iii) It is not suitable for algebraic treatment.
(iv) As compared with the mean, it is affected much by fluctuations of sampling.

Uses
(i) Median is the only average to be used while dealing with qualitative data which cannot be
measured quantitatively but can still be arranged in ascending or descending order of magnitude,
e.g., to find the average intelligence or average honesty among a group of people.
(ii) It is to be used for determining the typical value in problems concerning wages, distribution of
wealth, etc.

MODE:
Mode is the value which occurs most frequently in a set of observations, i.e., the value of the
variable which is predominant in the series.
For example, in the following frequency distribution:
x: 1 2 3 4 5 6 7 8
f: 4 9 16 25 22 15 7 3
the maximum frequency, 25, corresponds to x = 4. Hence the mode is 4.

In case of a continuous frequency distribution, the mode is given by the formula:

$$\text{Mode} = l + \frac{h\,(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

where l is the lower limit, h the magnitude and f1 the frequency of the modal class, and f0 and f2 are
the frequencies of the classes preceding and succeeding the modal class respectively.
Ex.: Find the mode for the following distribution:
Class-interval:   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency:        5      8       7       12      28      20      10      10

Solution. Here the maximum frequency is 28. Thus the class 40-50 is the modal class. The value of the
mode is given by

Mode = 40 + 10(28 - 12)/(2 x 28 - 12 - 20) = 40 + 6.666 = 46.67 (approx.)
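
The calculation in this example, mode = l + h(f1 - f0)/(2 f1 - f0 - f2), can be sketched in Python. The function name is illustrative; treating a missing neighbouring class as frequency 0 is an assumption of this sketch, and the case of a zero denominator is not handled.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a continuous frequency distribution with equal class widths."""
    i = freqs.index(max(freqs))                        # modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                  # preceding class (assumed 0 if absent)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0     # succeeding class (assumed 0 if absent)
    return lower_limits[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)


lower = [0, 10, 20, 30, 40, 50, 60, 70]
freq = [5, 8, 7, 12, 28, 20, 10, 10]
mode = grouped_mode(lower, 10, freq)
print(round(mode, 2))  # 40 + 10 * 16 / 24 = 46.67
```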

Remarks.
1. In case of irregularities in the distribution, or the maximum frequency being repeated, or the
maximum frequency occurring at the very beginning or at the end of the distribution, the modal class
is determined by the method of grouping.
Sometimes the mode is estimated from the mean and the median. For a symmetrical distribution, mean,
median and mode coincide. If the distribution is moderately asymmetrical, the mean, median and
mode obey the following empirical relationship (due to Karl Pearson):
(Mean - Mode) = 3 (Mean - Median), i.e.,
Mode = 3 Median - 2 Mean
2. If the method of grouping gives a modal class which does not correspond to the maximum
frequency, i.e., the frequency of the modal class is not the maximum frequency, then in some
situations we may get 2f1 - f0 - f2 = 0. In such cases, the value of the mode can be obtained by
the formula:

$$\text{Mode} = l + \frac{h\,|f_1 - f_0|}{|f_1 - f_0| + |f_1 - f_2|}$$

Merits and Demerits of Mode


Merits
(i) Mode is readily comprehensible and easy to calculate. Like median, mode can be located in some
cases merely by inspection.
(ii) Mode is not at all affected by extreme values.
(iii) Mode can be conveniently located even if the frequency distribution has class-intervals of
unequal magnitude, provided the modal class and the classes preceding and succeeding it are of the
same magnitude. Open-end classes also do not pose any problem in the location of mode.

Demerits
(i) Mode is ill-defined. It is not always possible to find a clearly defined mode. In some cases, we
may come across distributions with two modes. Such distributions are called bi-modal. If a
distribution has more than two modes, it is said to be multimodal.
(ii) It is not based upon all the observations.
(iii) It is not capable of further mathematical treatment.
(iv) As compared with mean, mode is affected to a greater extent by fluctuations of sampling.

Uses: Mode is the average to be used to find the ideal size, e.g., in business forecasting, in the
manufacture of ready-made garments, shoes, etc.

Geometric Mean:
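Geometric mean of a set of n observations is the nth root of their product. Thus the geometric mean G
of n observations xi, i = 1, 2, ..., n is

$$G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$

In case of a frequency distribution xi | fi, i = 1, 2, ..., n, the geometric mean G is given by

$$G = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \qquad N = \sum_{i=1}^{n} f_i$$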
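
In practice the nth root of a long product is usually computed through logarithms, since log G is the arithmetic mean of the logs. The sketch below takes that route; the function names and the sample values are illustrative assumptions, and only strictly positive observations are accepted.

```python
import math


def geometric_mean(values):
    """nth root of the product, computed as exp(mean of logs)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive observations")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_mean_freq(xs, fs):
    """Frequency form: exp(sum(f * log x) / N), with N = sum(f)."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs strictly positive observations")
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / sum(fs))


g_raw = geometric_mean([2, 8])                  # sqrt(2 * 8), about 4
g_freq = geometric_mean_freq([1, 9], [1, 1])    # sqrt(1 * 9), about 3
print(g_raw, g_freq)
```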

Merits and Demerits

Merits
(i) It is rigidly defined.
(ii) It is based upon all the observations.
(iii) It is suitable for further mathematical treatment.
(iv) It is not affected much by fluctuations of sampling.
(v) It gives comparatively more weight to small items.

Demerits
(i) Because of its abstract mathematical character, geometric mean is not easy to understand and to
calculate for a non-mathematics person.
(ii) If any one of the observations is zero, geometric mean becomes zero, and if any one of the
observations is negative, geometric mean may become imaginary regardless of the magnitude of the
other items.

Uses:
(i) To find the rate of population growth and the rate of interest.
(ii) In the construction of index numbers.

HARMONIC MEAN:
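Harmonic mean of a number of observations is the reciprocal of the arithmetic mean of the
reciprocals of the given values. Thus, the harmonic mean H of n observations xi, i = 1, 2, ..., n is

$$H = \frac{1}{\dfrac{1}{n}\sum_{i=1}^{n} \dfrac{1}{x_i}}$$

In case of a frequency distribution xi | fi, i = 1, 2, ..., n, the harmonic mean H is

$$H = \frac{1}{\dfrac{1}{N}\sum_{i=1}^{n} \dfrac{f_i}{x_i}}, \qquad N = \sum_{i=1}^{n} f_i$$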
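
A small Python sketch of both forms follows. The function names and the two sample values are hypothetical and serve only to show the arithmetic.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


def harmonic_mean_freq(xs, fs):
    """Frequency form: N / sum(f / x), with N = sum(f)."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


# Hypothetical values: 2 / (1/40 + 1/60) = 48
h_raw = harmonic_mean([40, 60])
h_freq = harmonic_mean_freq([40, 60], [1, 1])
print(h_raw, h_freq)
```

Both calls fail with a ZeroDivisionError if any observation is zero, which mirrors the fact that the harmonic mean cannot be calculated in that case.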

Merits
Harmonic mean is rigidly defined, based upon all the observations and is suitable for further
mathematical treatment.
Like geometric mean, it is not affected much by fluctuations of sampling.
It gives greater importance to small items and is useful only when small items have to be given a
greater weightage.

Demerits
Harmonic mean is not easily understood and is difficult to compute.
Like geometric mean, it cannot be calculated if any of the observations is zero.
It may not be an actual value of the variable.

Uses:
(i) To find the rate of population growth and the rate of interest.
(ii) In the construction of index numbers.

MEASURES OF DISPERSION:
The measures of central tendency give us an idea of the concentration of the observations about the
central part of the distribution.
If we know the average alone we cannot form a complete idea about the distribution, e.g.,
consider the series (i) 7, 8, 9, 10, 11, (ii) 3, 6, 9, 12, 15, (iii) 1, 5, 9, 13, 17.
In all these cases we see that n, the number of observations, is 5 and the mean is 9. If we are given
that the mean of 5 observations is 9, we cannot form an idea as to whether it is the average of the
first series or second series or third series or of any other series of 5 observations whose sum is 45.
Thus we see that the measures of central tendency are inadequate to give us a complete idea of the
distribution. They must be supported and supplemented by some other measures. One such measure
is dispersion.
The literal meaning of dispersion is scatteredness.
We study dispersion to have an idea about the homogeneity or heterogeneity of the distribution. In
the above case we say that series (i) is more homogeneous (less dispersed) than the
series (ii) or (iii), or we say that series (iii) is more heterogeneous (more scattered) than the series (i)
or (ii).
Characteristics for an Ideal Measure of Dispersion
These are the same as those for an ideal measure of central tendency.
Measures of dispersion
(i) Range,
(ii) Quartile deviation or Semi-interquartile range,
(iii) Mean deviation, and
(iv) Standard deviation.

Range
The range is the difference between two extreme observations of the distribution.
Range = Xmax - Xmin
Range is the simplest but a crude measure of dispersion.
Quartile deviation or semi-interquartile range

Q = (Q3 – Q1)/2

where Q1 and Q3 are the first and third quartiles of the distribution respectively. Quartile deviation is
definitely a better measure than the range as it makes use of 50% of the data. But since it ignores the
other 50% of the data, it cannot be regarded as a reliable measure.
Mean deviation

If xi | fi, i = 1, 2, ..., n is the frequency distribution, then the mean deviation from the average A
(usually mean, median or mode) is given by

$$\text{Mean deviation} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|, \qquad N = \sum_{i=1}^{n} f_i$$

Since mean deviation is based on all the observations, it is a better measure of dispersion than range
or quartile deviation. But the step of ignoring the signs of the deviations (xi - A) creates artificiality and
renders it useless for further mathematical treatment.
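
To see how these measures separate series with the same mean, the sketch below computes the range and the mean deviation about the mean for the three five-value series with mean 9 discussed earlier (series (i) is taken as 7, 8, 9, 10, 11). The function names are illustrative.

```python
def value_range(values):
    """Difference between the two extreme observations."""
    return max(values) - min(values)


def mean_deviation(values, about=None):
    """Mean of absolute deviations from `about` (defaults to the arithmetic mean)."""
    if about is None:
        about = sum(values) / len(values)
    return sum(abs(x - about) for x in values) / len(values)


series = {
    "i": [7, 8, 9, 10, 11],
    "ii": [3, 6, 9, 12, 15],
    "iii": [1, 5, 9, 13, 17],
}
ranges = {name: value_range(v) for name, v in series.items()}
mean_devs = {name: mean_deviation(v) for name, v in series.items()}

print(ranges)     # {'i': 4, 'ii': 12, 'iii': 16}
print(mean_devs)  # {'i': 1.2, 'ii': 3.6, 'iii': 4.8}
```

All three series share the mean 9, yet both measures grow from series (i) to series (iii).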

Standard deviation and Root mean square deviation

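The positive square root of the arithmetic mean of the squares of the deviations of the given values
from their arithmetic mean is known as the standard deviation (S.D.). For the frequency distribution
xi | fi, i = 1, 2, ..., n,

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}, \qquad N = \sum_{i=1}^{n} f_i$$

The square of the standard deviation, σ², is called the variance.
The step of squaring the deviations (xi - x̅) overcomes the drawback of ignoring the signs in mean
deviation.
Standard deviation is also suitable for further mathematical treatment. Moreover, of all the measures,
standard deviation is affected least by fluctuations of sampling.

Root mean square deviation, denoted by 's', is given by

$$s = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2}$$

where A is any arbitrary number.
Standard deviation is the minimum value of root mean square deviation: since
s² = σ² + (x̅ - A)², s is least when A = x̅.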
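
The claim that the standard deviation is the least root mean square deviation can be checked numerically. The sketch below uses the series 3, 6, 9, 12, 15 from the dispersion discussion and an arbitrary point A = 10; the function names are illustrative.

```python
import math


def rms_deviation(values, a):
    """Root mean square deviation of the values about the point a."""
    return math.sqrt(sum((x - a) ** 2 for x in values) / len(values))


def standard_deviation(values):
    """Root mean square deviation about the arithmetic mean."""
    return rms_deviation(values, sum(values) / len(values))


series = [3, 6, 9, 12, 15]          # mean 9
sd = standard_deviation(series)     # sqrt(90 / 5) = sqrt(18)
s = rms_deviation(series, 10)       # about A = 10: sqrt(95 / 5) = sqrt(19)

print(sd ** 2, s ** 2)              # approximately 18 and 19 = 18 + (9 - 10)**2
```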


Different Formulae For Calculating Variance

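By definition,

$$\sigma^2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2$$

If x̅ is not a whole number but comes out to be in fractions, the calculation of σ² by using the above
formula is difficult and cumbersome.
In order to overcome this difficulty, we shall develop different forms of the above formula which
reduce the arithmetic to a great extent and are very useful for computational work.

$$\sigma^2 = \frac{1}{N}\sum_i f_i x_i^2 - \left(\frac{1}{N}\sum_i f_i x_i\right)^2$$

If the values of x and f are large, the calculation of fx and fx² is quite tedious. In that case we take
deviations d = x - A from an arbitrary point A:

$$\sigma^2 = \frac{1}{N}\sum_i f_i d_i^2 - \left(\frac{1}{N}\sum_i f_i d_i\right)^2, \qquad d_i = x_i - A$$

and, with u = (x - A)/h,

$$\sigma_x^2 = h^2\left[\frac{1}{N}\sum_i f_i u_i^2 - \left(\frac{1}{N}\sum_i f_i u_i\right)^2\right] = h^2 \sigma_u^2$$

Hence, variance is independent of change of origin but not of scale.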
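
The statement about origin and scale can be illustrated with a short sketch. It assumes the usual shortcut form, variance = (1/N)·sum(f·x²) - ((1/N)·sum(f·x))², and uses a hypothetical grouped table with A = 25 and h = 10; all names and data are illustrative.

```python
def variance(xs, fs):
    """Definition: mean of squared deviations from the arithmetic mean."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n


def variance_shortcut(xs, fs):
    """Shortcut: mean of squares minus square of the mean."""
    n = sum(fs)
    return sum(f * x * x for x, f in zip(xs, fs)) / n - (sum(f * x for x, f in zip(xs, fs)) / n) ** 2


# Hypothetical grouped data (class mid-values and frequencies, not from the text).
mids = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 16, 6]
A, h = 25, 10

var_x = variance(mids, freqs)                          # 132.0
var_d = variance_shortcut([x - A for x in mids], freqs)       # change of origin only
var_u = variance_shortcut([(x - A) / h for x in mids], freqs)  # change of origin and scale

print(var_x, var_d, h ** 2 * var_u)  # all approximately 132
```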

Coefficient of Dispersion

Whenever we want to compare the variability of the two series which differ widely in their averages or
which are measured in different units, we do not merely calculate the measures of dispersion but we
calculate the coefficients of dispersion which are pure numbers independent of the units of
measurement.
The coefficients of dispersion (C.D.) based on different measures of dispersion are as follows:
1. C.D. based upon range = (A - B)/(A + B), where A and B are the greatest and the smallest items
in the series.
2. Based upon quartile deviation = (Q3 - Q1)/(Q3 + Q1)
3. Based upon mean deviation = Mean deviation / Average from which it is calculated
4. Based upon standard deviation = S.D./Mean = σ/x̅

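Coefficient of variation

100 times the coefficient of dispersion based upon standard deviation is called the coefficient of
variation (C.V.):

$$\text{C.V.} = 100 \times \frac{\sigma}{\bar{x}}$$

C.V. is the percentage variation in the mean, standard deviation being considered as the total
variation in the mean.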
For comparing the variability of two series, we calculate the co-efficient of variation for each series.
The series having greater C. V. is said to be more variable than the other and the series having lesser
C.V. is said to be more consistent (or homogeneous) than the other.
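
A comparison of two series by C.V. might be sketched as follows, taking C.V. = 100 × σ / mean. The two series are hypothetical and were chosen to share the same mean but differ in spread.

```python
import math


def coefficient_of_variation(values):
    """100 * (standard deviation / mean)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return 100 * sd / mean


# Hypothetical series with the same mean (50) but different spread.
series_p = [48, 50, 52]
series_q = [20, 50, 80]

cv_p = coefficient_of_variation(series_p)
cv_q = coefficient_of_variation(series_q)
more_consistent = "P" if cv_p < cv_q else "Q"

print(round(cv_p, 2), round(cv_q, 2), more_consistent)
```

Series P has the smaller C.V. and is therefore the more consistent of the two.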

MOMENT
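Moment is a familiar mechanical term for the measure of a force with reference to its tendency to
produce rotation.
The strength of this tendency depends upon (i) the amount of the force and (ii) the distance from the
origin of the point at which the force is exerted.
In statistics, in place of the distance from the origin of the point at which the force is exerted, we
use the deviation of the value from the average (assumed mean or mean).
The rth moment of a variable X about any point X = A, usually denoted by μ'r, is given by

$$\mu'_r = \frac{1}{N}\sum_i f_i (x_i - A)^r, \qquad N = \sum_i f_i$$

If A = x̅, we get the rth moment about the mean, denoted by μr:

$$\mu_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r$$

Relation between moments about mean in terms of moments about any point and vice versa.

$$\mu_2 = \mu'_2 - \mu'^2_1, \quad \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1, \quad \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu'^2_1 - 3\mu'^4_1$$

Conversely, with d = x̅ - A = μ'1,

$$\mu'_2 = \mu_2 + d^2, \quad \mu'_3 = \mu_3 + 3\mu_2 d + d^3, \quad \mu'_4 = \mu_4 + 4\mu_3 d + 6\mu_2 d^2 + d^4$$

Effect of Change of Origin and Scale on Moments.

If u = (x - A)/h, then

$$\mu_r(x) = h^r \mu_r(u)$$

i.e., the rth moment of the variable x about its mean is h^r times the rth moment of the variable u
about its mean.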

PEARSON'S 𝛽 AND 𝛾 COEFFICIENTS.


Karl Pearson defined the following four coefficients, based upon the first four moments about mean:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = +\sqrt{\beta_1}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3$$

The beta coefficients can never be negative and are pure numbers independent of change of scale
and origin of the observations.
Since for a symmetrical distribution all odd order moments about the mean are zero,
β1 is zero when the distribution is symmetrical.
Beta coefficients are used for measuring skewness and kurtosis.
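
The sketch below computes moments and the beta and gamma coefficients for a small ungrouped series, assuming the usual definitions β1 = μ3²/μ2³, β2 = μ4/μ2² and γ2 = β2 - 3. Giving γ1 the sign of μ3 is a common convention adopted here as an assumption. The series 1, 5, 9, 13, 17 is symmetrical, so β1 should vanish.

```python
import math


def moment(values, r, about):
    """rth moment of an ungrouped series about the point `about`."""
    return sum((x - about) ** r for x in values) / len(values)


def central_moment(values, r):
    """rth moment about the arithmetic mean."""
    return moment(values, r, sum(values) / len(values))


series = [1, 5, 9, 13, 17]   # symmetrical about its mean 9
mu2, mu3, mu4 = (central_moment(series, r) for r in (2, 3, 4))

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma1 = math.copysign(math.sqrt(beta1), mu3)  # sign convention is an assumption
gamma2 = beta2 - 3

# Moments about an arbitrary point A = 10, to check mu2 = mu'2 - mu'1 ** 2.
m1, m2 = moment(series, 1, 10), moment(series, 2, 10)

print(mu2, beta1, beta2, gamma2)
print(m2 - m1 ** 2)  # equals mu2
```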

Skewness
Literally, skewness means 'lack of symmetry'.
We study skewness to have an idea about the shape of the curve which we can draw with the help of
the given data. A distribution is said to be skewed if
(i) Mean, median and mode fall at different points,
i.e., Mean ≠ Median ≠ Mode,
(ii) Quartiles are not equidistant from the median, and
(iii) The curve drawn with the help of the given data is not symmetrical but stretched more to one side
than to the other.

Measures of Skewness

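(1) Sk = M - Md

(2) Sk = M - Mo

(3) Sk = (Q3 - Md) - (Md - Q1)

where M is the mean, Md the median and Mo the mode of the distribution.

These are the absolute measures of skewness.

For comparing two series we use the relative measures, called the coefficients of skewness, which are
pure numbers independent of units of measurement. The following are the coefficients of skewness.

Prof. Karl Pearson's Coefficient of Skewness.

$$S_k = \frac{M - M_o}{\sigma}$$

If mode is ill-defined, then using the empirical relation Mo = 3Md - 2M,

$$S_k = \frac{3(M - M_d)}{\sigma}$$

Bowley's Coefficient of Skewness. Based on quartiles,

$$S_k = \frac{(Q_3 - M_d) - (M_d - Q_1)}{(Q_3 - M_d) + (M_d - Q_1)} = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$$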
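
The coefficients can be sketched in Python from summary values alone, assuming the usual forms: Karl Pearson's (mean - mode)/σ, its median-based variant 3(mean - median)/σ, and Bowley's (Q3 + Q1 - 2·median)/(Q3 - Q1). The function names and the summary values are hypothetical; the values were chosen to satisfy Mode = 3 Median - 2 Mean.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Variant for an ill-defined mode: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical summary values (mode = 3 * 48 - 2 * 50 = 44).
sk_mode = pearson_skewness(mean=50, mode=44, sd=10)
sk_median = pearson_skewness_median(mean=50, median=48, sd=10)
sk_bowley = bowley_skewness(q1=40, median=48, q3=60)

print(sk_mode, sk_median, sk_bowley)  # 0.6 0.6 0.2
```

All three values are positive here, indicating a distribution stretched more towards the higher values.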

Kurtosis.

Kurtosis enables us to have an idea about the flatness or peakedness of the curve. It is measured by
the coefficient β2 or its derivative γ2 = β2 - 3.
