STAT-101 Introduction to Statistics BS Programs

Chapter # 01 Statistics

Data and Datum:-
The word data is used for numerical facts, and a single numerical fact is called a datum.
Statistics:-
The science of collection, presentation, analysis and interpretation of numerical data
is called statistics.
It is also defined as the mathematical science of making decisions and drawing conclusions
from data in situations of uncertainty.
Kinds of Statistics:-
(i) Descriptive Statistics:-
It provides procedures for organizing, summarizing and presenting the
data collected from a sample.
(ii) Inferential Statistics:-
It includes all those techniques and methods that help in drawing conclusions
about a population on the basis of a sample only.
Population:-
The total group under study, or the group to which the results will be generalized, is
called the population.
Sample:-
A representative part of the population is called a sample. In other words, a sample is a
subset of the population.
Ratio:-
The ratio of A to B is the fraction A/B.
Proportion:-
A proportion is a special ratio of a part to its total.
Parameter:-
Any numerical quantity calculated from the population is called a parameter. For example,
$\mu$ and $\sigma^2$.
Statistic:-
A numerical quantity calculated from sample observations is called a statistic. For
example, $\bar{x}$ and $S^2$.

Sanan Fazal Lecturer in Statistics


M.Phil Statistics
+92-313-6212440 University of Gujrat 1
Variable:-
Any observed characteristic which can vary/change from individual to individual is called a
variable.
Constant:-
Any observed characteristic which cannot vary/change from individual to individual is called a
constant.
Quantitative Variable:-
A variable which can assume numerical values is called a quantitative variable. For
example, age, height, weight, marks, etc.
Qualitative Variable:-
A variable which assumes non-numerical values (categories) is called a qualitative variable. For
example, eye color, gender, etc.
Discrete Variable:-
A quantitative variable which can assume only some specific, finite or countable
values within a given range is called a discrete variable.
Continuous Variable:-
A quantitative variable which can assume every possible value within a given
interval is called a continuous variable.
Primary Data / Raw Data / Ungrouped Data:-
The data which has not undergone any statistical treatment is called primary data.
First-hand collected data is called primary data. It is also called raw data or ungrouped data.
Secondary Data / Grouped Data:-
The data which has undergone some statistical treatment is called secondary data.
Second-hand collected data is called secondary data. It is also called grouped data.
Sources of Primary Data:-
(i) Direct Personal Observation
(ii) Indirect Oral Observation
(iii) Estimates through Correspondents
(iv) Investigation through Schedules/Questionnaires
Sources of Secondary Data:-
(i) Government Organization
(ii) Semi-Government Organization
(iii) Published Sources
(iv) Unpublished Sources
(v) Internet

Chapter # 02 Presentation of Data

Classification:-
The process of arranging observations into different classes or categories according to some
common characteristics is called classification.
Tabulation:-
The process of making tables or arranging data into rows and columns is called
tabulation.
Steps in Constructing a Table:-
(i) Title:-
It is the heading at the top of the table.
(ii) Head Note / Prefatory Note:-
It appears after the title of the table and gives further description of the
title.
(iii) Column Captions and Box Head:-
The headings of columns are called column captions. The part of the table that
contains the column captions is called the box head.
(iv) Row Captions and Stub:-
The headings of rows are called row captions. The part of the table that contains
the row captions is called the stub.
(v) Body:-
The entries in the different cells of the rows and columns of a table are called the
body of the table.
(vi) Source Note:-
Source notes are given at the end of the table.
(vii) Foot Note:-
It is given at the bottom of the table.


Frequency:-
The number of values falling in a particular class is called the frequency of that class.
Frequency Distribution:-
A frequency distribution is a compact tabular form of the data which displays all
categories of observations according to their magnitudes and frequencies.
Cumulative Frequency Distribution:-
It is a table that displays the class intervals and the corresponding cumulative frequencies.
Relative Frequency Distribution:-
It is a table that displays the class intervals and the corresponding relative frequencies.
Class Limits:-
Class limits are used to express non-overlapping classes such as 20-29, 30-39, 40-49
and so on.
Class Boundaries:-
Class boundaries are used to express overlapping (continuous) classes such as 20-30,
30-40, 40-50 and so on.
Mid-Point / Class Mark:-
The value midway between the lower and upper class limits (or class boundaries) of a
class is called its mid-point or class mark.
Class Size / Interval:-
Class size or class interval is the difference between the upper and lower class
boundaries of a class.
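The definitions above (class limits, class boundaries, mid-points, class size, cumulative and relative frequencies) can be tied together in a small Python sketch. It assumes integer-valued data and equal-width classes, and it takes each class boundary halfway between adjacent class limits (so the class 20-29 gets the boundaries 19.5-29.5). The helper `frequency_table` and the marks are hypothetical and are not part of these notes.

```python
def frequency_table(data, start, width, k):
    """Hypothetical helper: k equal-width classes with integer class limits
    start..start+width-1, start+width..start+2*width-1, and so on."""
    n = len(data)
    rows, cum = [], 0
    for i in range(k):
        lo = start + i * width                      # lower class limit
        hi = lo + width - 1                         # upper class limit
        f = sum(1 for x in data if lo <= x <= hi)   # frequency of the class
        cum += f                                    # cumulative frequency
        rows.append({
            "limits": (lo, hi),
            "boundaries": (lo - 0.5, hi + 0.5),     # assumes integer data
            "mid_point": (lo + hi) / 2,             # class mark
            "f": f,
            "cum_f": cum,
            "rel_f": f / n,                         # relative frequency
        })
    return rows

marks = [23, 35, 41, 47, 52, 38, 44, 29, 45, 56]    # hypothetical marks
for row in frequency_table(marks, start=20, width=10, k=4):
    print(row)
```

With these ten marks the frequencies are 2, 2, 4 and 2, the cumulative frequencies are 2, 4, 8 and 10, and every class has size 29.5 - 19.5 = 10.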
Simple Bar Diagram:-
It is used to get an impression of the distribution of a discrete or categorical data set.
Multiple Bar Chart:-
It is an extension of the simple bar diagram and is used to represent two or more
related sets of data in the form of groups of simple bars.
Sub-Divided Bar Chart / Component Bar Chart:-
In certain situations the total represented by a simple bar can be divided further
into different component segments; the resulting chart is called a sub-divided or
component bar chart.
Pie Chart:-
It is a division of a circular region into different sectors, one for each component.
The angle of the sector for a component is
$$\text{Angle} = \frac{\text{Component Part}}{\text{Total Part}} \times 360^\circ$$
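The sector-angle formula can be applied mechanically, as in this minimal Python sketch. The function name `pie_angles` and the expenditure figures are hypothetical.

```python
def pie_angles(parts):
    """Sector angle in degrees = component part / total part * 360."""
    total = sum(parts.values())
    return {name: value / total * 360 for name, value in parts.items()}

# Hypothetical monthly expenditure in rupees
spending = {"Food": 400, "Rent": 300, "Education": 200, "Other": 100}
print(pie_angles(spending))
```

Here the total is 1000, so the sectors are 144°, 108°, 72° and 36°, which add up to 360°.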


Histogram:-
The graph of a frequency distribution is called a histogram. It is a useful graphic
representation of data to get a visual impression of its distribution.
Historigram:-
The graph of a time series is called a historigram.
Frequency Polygon:-
It is a closed geometric figure used to display a frequency distribution graphically.
Frequency Curve:-
A smoothed frequency polygon is called a frequency curve; it is useful for getting a
visual impression of the data.

Chapter # 03 Measures of Central Tendency or Averages
Measures of Central Tendency:-
A single numerical value which represents the whole data is called an average. Since such a
value tends to lie in the centre of the distribution, it is called a measure of central tendency.
It is also called a measure of location because it locates the general position of the distribution.

Characteristics or Qualities of a Good Average:-
(i) It should be clearly defined by a mathematical formula.
(ii) It should be simple to understand and easy to calculate.
(iii) It should be based on all the values.
(iv) It should be capable of further algebraic treatment.
(v) It should be less affected by extreme values.
(vi) It should be relatively stable in repeated sampling.

Types of Averages:
(1) Arithmetic Mean
(2) Geometric Mean
(3) Harmonic Mean
(4) Median
(5) Mode

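Arithmetic Mean:-
The arithmetic mean (A.M.) is obtained by dividing the sum of the values by their number.
It is denoted by $\bar{x}$ and computed by the following formulas.

Direct Method:
For Ungrouped Data: $$\bar{x} = \frac{\sum x}{n}$$
For Grouped Data: $$\bar{x} = \frac{\sum fx}{\sum f}$$

Indirect / Shortcut Method:
For Ungrouped Data: $$\bar{x} = A + \frac{\sum D}{n}$$
For Grouped Data: $$\bar{x} = A + \frac{\sum fD}{\sum f}$$
where $D = x - A$ and $A$ = assumed value or provisional mean.

Step-Deviation / Coding Method:
For Ungrouped Data: $$\bar{x} = A + \frac{\sum u}{n} \times h$$
For Grouped Data: $$\bar{x} = A + \frac{\sum fu}{\sum f} \times h$$
where $u = \frac{x - A}{h}$ and $h$ = class interval.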
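As a check that the three methods agree, the following Python sketch computes the arithmetic mean of a grouped distribution by the direct, shortcut and step-deviation formulas. The class marks, frequencies, assumed mean $A = 44.5$ and class interval $h = 10$ are hypothetical values chosen only for illustration, as are the function names.

```python
def mean_ungrouped(xs):
    """Direct method, ungrouped data: x_bar = sum(x) / n."""
    return sum(xs) / len(xs)

def mean_direct(x, f):
    """Direct method, grouped data: x_bar = sum(f*x) / sum(f), x = class marks."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)

def mean_shortcut(x, f, A):
    """Shortcut method: x_bar = A + sum(f*D) / sum(f), with D = x - A."""
    return A + sum(fi * (xi - A) for xi, fi in zip(x, f)) / sum(f)

def mean_coding(x, f, A, h):
    """Step-deviation method: x_bar = A + sum(f*u) / sum(f) * h, with u = (x - A) / h."""
    return A + sum(fi * (xi - A) / h for xi, fi in zip(x, f)) / sum(f) * h

class_marks = [24.5, 34.5, 44.5, 54.5]   # hypothetical classes 20-29, ..., 50-59
freqs = [3, 5, 8, 4]
print(mean_direct(class_marks, freqs),
      mean_shortcut(class_marks, freqs, A=44.5),
      mean_coding(class_marks, freqs, A=44.5, h=10))
```

For these values $\sum f = 20$ and $\sum fx = 820$, so every method gives $\bar{x} = 41$; the shortcut and coding methods only change how the arithmetic is organised.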

Properties of Arithmetic Mean:-
(i) The mean of a constant is the constant itself: $\bar{a} = a$, where $a$ is a constant.
(ii) The sum of deviations from the mean is always zero:
$$\sum (x - \bar{x}) = 0 \ \text{(ungrouped data)}, \qquad \sum f(x - \bar{x}) = 0 \ \text{(grouped data)}$$
(iii) The sum of squared deviations from the mean is least (minimum):
$$\sum (x - \bar{x})^2 < \sum (x - a)^2 \ \text{(ungrouped data)}, \qquad \sum f(x - \bar{x})^2 < \sum f(x - a)^2 \ \text{(grouped data)}$$
for any value $a \neq \bar{x}$.

Merits:-
(i) It is simple to understand and easy to calculate.
(ii) It is clearly defined by a mathematical formula.
(iii) It is based on all the values.
(iv) It is capable of further algebraic treatment.
(v) It is a stable average in repeated sampling.

Demerits:-
(i) It is greatly affected by extreme values.
(ii) It cannot be accurately computed for open-end classes without assuming limits for
the open ends.
(iii) It cannot be located graphically.
(iv) It is not a suitable average for highly skewed distributions.

Geometric Mean:-
The nth root of the product of all n observations is called the geometric mean (G.M.). It is
computed by the following formulas.

For Ungrouped Data:
$$G.M. = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} = (x_1 x_2 x_3 \cdots x_n)^{1/n} \quad \text{or} \quad G.M. = \text{Antilog}\left(\frac{\sum \log x}{n}\right)$$
For Grouped Data:
$$G.M. = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right)$$

Properties of Geometric Mean:-

(i) If there are k sets with $n_1, n_2, n_3, \ldots, n_k$ observations and geometric means
$G_1, G_2, G_3, \ldots, G_k$ respectively, then the combined geometric mean of all the
observations is given by
$$G.M._{combined} = \text{Antilog}\left(\frac{\sum_{i=1}^{k} n_i \log G_i}{\sum_{i=1}^{k} n_i}\right)$$

(ii) If there are two sets, each consisting of n positive observations, $x_{11}, x_{12}, \ldots, x_{1n}$
with geometric mean $G_1$ and $x_{21}, x_{22}, \ldots, x_{2n}$ with geometric mean $G_2$, then the
geometric mean G of the ratios $x_{1j}/x_{2j}$ is the ratio of the two geometric means:
$$G = \frac{G_1}{G_2}$$
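A minimal Python sketch of the log-based G.M. formula and of the two properties above follows. It uses base-10 logarithms to mirror the Antilog notation (any base gives the same result); the function names and data are hypothetical.

```python
import math

def geometric_mean(xs):
    """Ungrouped data: G.M. = Antilog(sum(log x) / n)."""
    if any(x < 0 for x in xs):
        raise ValueError("G.M. is not computed here for negative observations")
    if any(x == 0 for x in xs):
        return 0.0              # the product, and hence the G.M., is zero
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))

def geometric_mean_grouped(x, f):
    """Grouped data: G.M. = Antilog(sum(f log x) / sum(f))."""
    return 10 ** (sum(fi * math.log10(xi) for xi, fi in zip(x, f)) / sum(f))

def combined_geometric_mean(sizes, gms):
    """Property (i): Antilog(sum(n_i log G_i) / sum(n_i))."""
    return 10 ** (sum(n * math.log10(g) for n, g in zip(sizes, gms)) / sum(sizes))

set1 = [2, 8]        # hypothetical data, G1 = 4
set2 = [1, 3, 9]     # hypothetical data, G2 = 3
g1, g2 = geometric_mean(set1), geometric_mean(set2)

# Property (i): the formula matches the G.M. of the pooled observations
print(combined_geometric_mean([2, 3], [g1, g2]), geometric_mean(set1 + set2))

# Property (ii): G.M. of the ratios equals the ratio of the G.M.s (equal-sized sets)
a, b = [2, 8], [1, 2]
print(geometric_mean([p / q for p, q in zip(a, b)]),
      geometric_mean(a) / geometric_mean(b))
```

Each `print` shows two numbers that agree up to floating-point rounding, which is exactly what properties (i) and (ii) assert.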

Merits:-
(i) It is rigorously defined by a mathematical formula.
(ii) It is based on all observed values.
(iii) It is amenable to mathematical treatment in certain cases.
(iv) It gives equal weightage to all the observations.
(v) It is not much affected by sampling variability or extreme values.
(vi) It is an appropriate average to use when rates of change or ratios are to be
averaged.

Demerits:-
(i) It is neither easy to calculate nor to understand.
(ii) It becomes zero if any of the observations is zero.
(iii) In case of negative values, it cannot be computed at all.
Harmonic Mean:-
The reciprocal of the arithmetic mean of the reciprocals of the observations is called the
harmonic mean (H.M.). It is computed by the following formulas.

For Ungrouped Data: $$H.M. = \frac{n}{\sum \frac{1}{x}}$$
For Grouped Data: $$H.M. = \frac{\sum f}{\sum \frac{f}{x}}$$

Properties of Harmonic Mean:-

(i) If there are k sets with $n_1, n_2, n_3, \ldots, n_k$ observations and harmonic means
$H_1, H_2, H_3, \ldots, H_k$ respectively, then the combined harmonic mean of all the
observations is given by
$$H.M._{combined} = \frac{\sum_{i=1}^{k} n_i}{\sum_{i=1}^{k} \frac{n_i}{H_i}}$$
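The harmonic mean formulas and the combined-H.M. property can be checked with a minimal Python sketch. The function names and the speed data are hypothetical; the example uses the familiar case of averaging speeds over equal distances, where the H.M. is the appropriate average for rates.

```python
def harmonic_mean(xs):
    """Ungrouped data: H.M. = n / sum(1/x)."""
    if any(x == 0 for x in xs):
        raise ZeroDivisionError("H.M. cannot be calculated if an observation is zero")
    return len(xs) / sum(1 / x for x in xs)

def harmonic_mean_grouped(x, f):
    """Grouped data: H.M. = sum(f) / sum(f/x)."""
    return sum(f) / sum(fi / xi for xi, fi in zip(x, f))

def combined_harmonic_mean(sizes, hms):
    """Property (i): sum(n_i) / sum(n_i / H_i)."""
    return sum(sizes) / sum(n / h for n, h in zip(sizes, hms))

# Hypothetical: equal distances driven at 40 km/h and at 60 km/h
print(harmonic_mean([40, 60]))
# Combining the two-speed set with a one-speed set (30 km/h) via property (i)
print(combined_harmonic_mean([2, 1], [harmonic_mean([40, 60]), 30]),
      harmonic_mean([40, 60, 30]))
```

The first line evaluates 2 / (1/40 + 1/60), i.e. 48 km/h up to rounding (not the A.M. of 50 km/h); the second line prints two equal values, 40, as property (i) says.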
Merits:-
(i) It is rigorously defined by a mathematical formula.
(ii) It is based on all the observations in the data.
(iii) It is amenable to mathematical treatment.
(iv) It is not much affected by sampling variability.
(v) It is an appropriate type for averaging rates and ratios.

Demerits:-
(i) It is not readily understood.
(ii) It cannot be calculated, if any one observation is zero.
(iii) It gives less weight to large values and more weight to small values.

Weighted Mean:-
The arithmetic mean is used when all the observations are given equal importance, but
in certain situations different observations carry different weights. In such situations
the weighted mean is preferred. It is denoted by $\bar{x}_w$ and calculated by
$$\bar{x}_w = \frac{\sum wx}{\sum w}$$
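A minimal Python sketch of the weighted mean; the marks and the credit-hour weights are hypothetical, as is the function name.

```python
def weighted_mean(x, w):
    """x_bar_w = sum(w*x) / sum(w)."""
    if len(x) != len(w):
        raise ValueError("each observation needs exactly one weight")
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

# Hypothetical: marks in three courses weighted by their credit hours
marks = [80, 70, 90]
credit_hours = [3, 2, 1]
print(weighted_mean(marks, credit_hours))   # (240 + 140 + 90) / 6
print(weighted_mean(marks, [1, 1, 1]))      # equal weights reduce to the A.M.
```

The weighted mean here is 470/6 ≈ 78.33, below the unweighted A.M. of 80, because the lowest mark carries more weight than the highest.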
Median:-
The median is the value which divides an ordered data set into two equal parts, one part
comprising the observations greater than it and the other the observations smaller than it.
More precisely, the median is the value below which 50% of the ordered data lie. It is
denoted by $\tilde{x}$ and computed by the following formulas.

For Ungrouped Data (after arranging the data in order):
Median = the value of the $\left(\frac{n+1}{2}\right)$th observation
Note: If this is, for example, the 3.5th observation, then
Median = 3rd Obs. + 0.5(4th Obs. - 3rd Obs.)

For Grouped Data:
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where
l = lower class boundary of the median group
f = frequency of the median group
h = class interval
n = $\sum f$
c = cumulative frequency of the group preceding the median group
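Both median rules can be sketched in Python as below. The grouped version assumes the classes are given by their boundaries (one more boundary than there are frequencies) and picks the median group as the first class whose cumulative frequency reaches n/2; the function names and data are hypothetical.

```python
def median_ungrouped(xs):
    """Value of the ((n + 1)/2)th observation of the ordered data; a fractional
    position such as 3.5 is interpolated: 3rd + 0.5 * (4th - 3rd)."""
    s = sorted(xs)
    pos = (len(s) + 1) / 2              # 1-based position in the array
    k = int(pos)                        # whole part, e.g. 3 for the 3.5th obs.
    frac = pos - k                      # 0 or 0.5
    if frac == 0:
        return s[k - 1]
    return s[k - 1] + frac * (s[k] - s[k - 1])

def median_grouped(boundaries, f):
    """Median = l + (h/f)(n/2 - c)."""
    n = sum(f)
    if n == 0:
        raise ValueError("empty frequency distribution")
    c = 0                               # cumulative frequency before the class
    for i, fi in enumerate(f):
        if c + fi >= n / 2:             # first class reaching n/2 = median group
            l, h = boundaries[i], boundaries[i + 1] - boundaries[i]
            return l + h / fi * (n / 2 - c)
        c += fi

print(median_ungrouped([7, 3, 9, 1, 5, 11]))      # hypothetical, n = 6
print(median_grouped([19.5, 29.5, 39.5, 49.5, 59.5], [3, 5, 8, 4]))
```

In the first call the median is the 3.5th observation of 1, 3, 5, 7, 9, 11, i.e. 5 + 0.5(7 - 5) = 6. In the second, n/2 = 10 falls in the group 39.5-49.5 with c = 8 and f = 8, so the median is 39.5 + (10/8)(10 - 8) = 42.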

Properties of Median:-

i) If a constant a is added to each of the n observations $X_1, X_2, X_3, \ldots, X_n$ having
median M, then the median of $a + X_1, a + X_2, a + X_3, \ldots, a + X_n$ is $a + M$.
ii) If each of the n observations is multiplied by a constant a, then the median of
$aX_1, aX_2, aX_3, \ldots, aX_n$ is $aM$.
iii) The sum of absolute deviations of the observations from their median is least,
i.e., $\sum |X - \text{Median}|$ is a minimum.
iv) For a symmetrical distribution the median is equidistant from the first and third
quartiles,
i.e., $Q_3 - \text{Median} = \text{Median} - Q_1$,
where $Q_1$ and $Q_3$ are the first and third quartiles respectively.

Merits:-
(i) It is easily calculated and simple to understand.
(ii) It can be located even when the values are not capable of quantitative measurement.
(iii) It is not affected by extreme values.
(iv) It can be computed when a frequency distribution involves "open-end" classes
like those of income and prices.
(v) In a highly skewed distribution, the median is an appropriate average to use.
Demerits:-
(i) It is not rigorously defined.
(ii) It does not lend itself to further algebraic treatment.
(iii) It necessitates the arrangement of data into an array which can be tedious and time
consuming for a large body of data.
Quantiles:-
Quartiles, deciles, percentiles and other values obtained by equal subdivision of the
given set of data are collectively called quantiles or sometimes fractiles. Quantiles
are usually calculated only when the number of observations is fairly large.

Quartiles:-
The three values which divide the distribution into four equal parts are called
quartiles. These values are denoted by $Q_1$, $Q_2$ and $Q_3$ respectively. $Q_1$ is called the first or
lower quartile, $Q_2$ the second quartile and $Q_3$ the third or upper quartile.
$Q_2$ is equal to the median.

For Ungrouped Data:
$Q_1$ = the value of the $\frac{(n+1)}{4}$th observation
$Q_2$ = the value of the $\frac{2(n+1)}{4}$th observation = Median
$Q_3$ = the value of the $\frac{3(n+1)}{4}$th observation

For Grouped Data:
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right), \qquad Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - c\right) = \text{Median}, \qquad Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$$
where l, f and c refer to the group containing the quartile in question, and h is the class interval.
Deciles:-
The nine values which divide the distribution into ten equal parts are called deciles and are
denoted by $D_1, D_2, D_3, \ldots, D_9$. $D_5$ is equal to the median.

For Ungrouped Data:
$D_j$ = the value of the $\frac{j(n+1)}{10}$th observation, $j = 1, 2, \ldots, 9$
(so $D_1$ is the $\frac{(n+1)}{10}$th, $D_2$ the $\frac{2(n+1)}{10}$th, $D_5$ the $\frac{5(n+1)}{10}$th = Median, and $D_9$ the $\frac{9(n+1)}{10}$th observation)

For Grouped Data:
$$D_j = l + \frac{h}{f}\left(\frac{jn}{10} - c\right), \quad j = 1, 2, \ldots, 9, \qquad D_5 = l + \frac{h}{f}\left(\frac{5n}{10} - c\right) = \text{Median}$$

Percentiles:-
The ninety-nine values dividing the data into one hundred equal parts are called
percentiles and are denoted by $P_1, P_2, P_3, \ldots, P_{99}$, where $P_{25}$ is equal to $Q_1$, $P_{50}$ is equal to the
median and $P_{75}$ is equal to $Q_3$.

For Ungrouped Data:
$P_j$ = the value of the $\frac{j(n+1)}{100}$th observation, $j = 1, 2, \ldots, 99$
(for example, $P_{25}$ is the $\frac{25(n+1)}{100}$th, $P_{50}$ the $\frac{50(n+1)}{100}$th and $P_{75}$ the $\frac{75(n+1)}{100}$th observation)

For Grouped Data:
$$P_j = l + \frac{h}{f}\left(\frac{jn}{100} - c\right), \quad j = 1, 2, \ldots, 99$$
so that $P_{25} = l + \frac{h}{f}\left(\frac{25n}{100} - c\right) = Q_1$, $P_{50} = l + \frac{h}{f}\left(\frac{50n}{100} - c\right)$ = Median and $P_{75} = l + \frac{h}{f}\left(\frac{75n}{100} - c\right) = Q_3$.
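Quartiles, deciles and percentiles all follow the same pattern, so one function per data type covers them: the j-th of k equal parts, with k = 4, 10 or 100. The Python sketch below is a hypothetical illustration, not the notes' own method. It assumes linear interpolation between neighbouring ordered observations for fractional positions (as in the median note), and clamps positions that fall below the 1st or above the nth observation; that clamping is an added assumption, since the notes do not cover that case.

```python
def quantile_ungrouped(xs, j, k):
    """j-th k-tile: value of the (j(n + 1)/k)th ordered observation."""
    s = sorted(xs)
    pos = j * (len(s) + 1) / k
    pos = min(max(pos, 1), len(s))      # assumption: clamp to 1..n
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return s[lower - 1]
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])

def quantile_grouped(boundaries, f, j, k):
    """l + (h/f)(jn/k - c) for the group containing the (jn/k)th item."""
    n = sum(f)
    target = j * n / k
    c = 0                               # cumulative frequency before the group
    for i, fi in enumerate(f):
        if fi > 0 and c + fi >= target:
            h = boundaries[i + 1] - boundaries[i]
            return boundaries[i] + h / fi * (target - c)
        c += fi
    raise ValueError("position lies beyond the last class")

xs = list(range(1, 12))                 # hypothetical data 1, 2, ..., 11
print(quantile_ungrouped(xs, 1, 4), quantile_ungrouped(xs, 5, 10),
      quantile_ungrouped(xs, 75, 100))  # Q1, D5, P75

b = [19.5, 29.5, 39.5, 49.5, 59.5]      # hypothetical class boundaries
f = [3, 5, 8, 4]
print(quantile_grouped(b, f, 1, 4), quantile_grouped(b, f, 3, 4))  # Q1, Q3
```

For the grouped data, $Q_1$ uses $n/4 = 5$ in the group 29.5-39.5, giving 29.5 + (10/5)(5 - 3) = 33.5, and $Q_3$ uses $3n/4 = 15$ in the group 39.5-49.5, giving 39.5 + (10/8)(15 - 8) = 48.25.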

Mode:-
The most frequent value in the data is called the mode. A distribution with only one
modal value is called a unimodal distribution, one with two modal values is called a
bimodal distribution, and one with more than two modal values is called a multimodal
distribution. The mode is denoted by $\hat{x}$ and computed by the following formulas.

For Ungrouped Data:
Mode = the most frequent (most repeated) value

For Grouped Data:
$$\text{Mode} = l + \frac{(f_m - f_1)}{(f_m - f_1) + (f_m - f_2)} \times h$$
where
l = lower class boundary of the modal group
$f_m$ = maximum frequency (frequency of the modal group)
$f_1$ = frequency of the group preceding the modal group
$f_2$ = frequency of the group following the modal group
h = class interval
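A minimal Python sketch of both mode rules, using `collections.Counter` for ungrouped data. Two assumptions are not in the notes: when the modal group is the first or last class, the missing neighbouring frequency is taken as 0; and if several groups tie for the maximum frequency, the first one is used. The function names and data are hypothetical.

```python
from collections import Counter

def modes_ungrouped(xs):
    """All values with the highest frequency (two or more means bi-/multimodal)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(value for value, c in counts.items() if c == top)

def mode_grouped(boundaries, f):
    """Mode = l + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    m = f.index(max(f))                          # modal group (first if tied)
    fm = f[m]
    f1 = f[m - 1] if m > 0 else 0                # assumption: 0 before the first class
    f2 = f[m + 1] if m + 1 < len(f) else 0       # assumption: 0 after the last class
    l = boundaries[m]
    h = boundaries[m + 1] - boundaries[m]
    return l + (fm - f1) / ((fm - f1) + (fm - f2)) * h

print(modes_ungrouped([2, 3, 3, 5, 5, 7]))       # hypothetical bimodal data
print(mode_grouped([19.5, 29.5, 39.5, 49.5, 59.5], [3, 5, 8, 4]))
```

The first call returns both 3 and 5, so those data are bimodal. In the second, the modal group is 39.5-49.5 with $f_m = 8$, $f_1 = 5$ and $f_2 = 4$, so the mode is 39.5 + 3/7 × 10 ≈ 43.79. `Counter` also accepts non-numeric values, which matches the merit that the mode applies to qualitative data.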

Properties of Mode:-

i) If a constant a is added to each of the n observations $X_1, X_2, X_3, \ldots, X_n$ having mode
m, then the mode of $a + X_1, a + X_2, a + X_3, \ldots, a + X_n$ is $a + m$.
ii) If each of the n observations is multiplied by a constant a, then the mode of
$aX_1, aX_2, aX_3, \ldots, aX_n$ is $am$.

Merits:-
(i) It is simply defined and easily calculated.
(ii) In many cases, it is extremely easy to locate the mode.
(iii) It is not affected by abnormally large or small observations.
(iv) It can be determined for both the qualitative and the quantitative data.
Demerits:-
(i) It is not rigorously defined.
(ii) It is often indeterminate and indefinite.
(iii) It is not based on all the observations made.
(iv) It is not capable of lending itself to further statistical treatment.
(v) When the distribution consists of a small number of values the mode may not
exist.

Chapter # 04 Measures of Dispersion, Moments and Skewness
Measure of Dispersion:-
The degree to which the observations are scattered about some central value is called
dispersion. So dispersion is the variability of the values about a measure of central
tendency. A numerical value that describes the spread of the values in a set of data is
called a measure of dispersion or variability.
Types of Measures of Dispersion:-
There are two types of measures of dispersion or variability
i) Absolute Measures ii) Relative Measures
Absolute Measures of Dispersion:-
An absolute measure of dispersion is one that measures the dispersion in the same units
as the data (or in the square of those units). For example, if the data are in rupees,
meters, kilograms, etc., the measures of dispersion will also be in rupees, meters,
kilograms, etc. (or their squares, in the case of the variance).
The most common measures of absolute variability are
a) Range
b) Quartile Deviation
c) Mean Deviation
d) Variance
e) Standard Deviation
Relative Measures of Dispersion:-
A relative measure of dispersion is one that is expressed in the form of a ratio,
co-efficient or percentage and is independent of the units of measurement.
The most common measures of relative variability are
a) Co-efficient of Range
b) Co-efficient of Quartile Deviation
c) Co-efficient of Mean Deviation
d) Co-efficient of Variation
e) Co-efficient of Standard Deviation
Range:-
The difference between the maximum and minimum values is called the range. It is denoted
by R and computed by the following formula.
$$R = x_m - x_0$$
where $x_m$ is the largest and $x_0$ the smallest observation.

Its relative measure, known as the co-efficient of range or co-efficient of dispersion, is
defined by the following relation:
$$\text{Co-efficient of range} = \frac{x_m - x_0}{x_m + x_0}$$

Merits:-
i) It is easy to calculate.
ii) It is a useful measure in small samples.
Demerits:-
i) It is not based on all the observations.
ii) It ignores all the information available from the intermediate observations.
iii) It depends only upon the extreme observations.
Quartile Deviation:-

Half the difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$) is called the
quartile deviation or semi-interquartile range. It is denoted by Q.D.
$$Q.D. = \frac{Q_3 - Q_1}{2}$$

Its relative measure, called the co-efficient of quartile deviation or of semi-interquartile
range, is defined by the relation
$$\text{Co-efficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
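The range and quartile-deviation formulas, with their co-efficients, reduce to a few lines of Python. This is a minimal sketch; the function names and the numbers are hypothetical, and the quartiles are assumed to have been found already, for example by the quartile formulas above.

```python
def range_measures(xs):
    """Return R = x_m - x_0 and the co-efficient (x_m - x_0) / (x_m + x_0)."""
    xm, x0 = max(xs), min(xs)
    return xm - x0, (xm - x0) / (xm + x0)

def quartile_deviation(q1, q3):
    """Return Q.D. = (Q3 - Q1)/2 and its co-efficient (Q3 - Q1)/(Q3 + Q1)."""
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

print(range_measures([12, 5, 20, 8]))   # hypothetical data
print(quartile_deviation(3, 9))         # hypothetical quartiles
```

For the data 12, 5, 20, 8 the range is 20 - 5 = 15 and its co-efficient is 15/25 = 0.6; for $Q_1 = 3$ and $Q_3 = 9$ the Q.D. is 3 and its co-efficient is 6/12 = 0.5.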

Merits:-
i) It is simple to understand and easy to calculate.
ii) It is not affected by extreme values.
Demerits:-
i) It is not based on all the values.
ii) Q.D. has the same value for all distributions having the same quartiles.
iii) It gives no information about the position of observations lying outside the two
quartiles.
iv) It is not amenable to mathematical treatment.
Mean Deviation:-
The mean of the absolute deviations of the observations from the mean, median or mode is
called the mean deviation.
OR
The mean of the deviations from a central value (mean, median or mode), ignoring the
algebraic signs, is called the mean deviation. It is denoted by M.D. It is
calculated as follows.

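Mean Deviation from Mean:
$$M.D.(\bar{x}) = \frac{\sum |x - \bar{x}|}{n} \ \text{(ungrouped data)}, \qquad M.D.(\bar{x}) = \frac{\sum f|x - \bar{x}|}{\sum f} \ \text{(grouped data)}$$
Mean Deviation from Median:
$$M.D.(\tilde{x}) = \frac{\sum |x - \tilde{x}|}{n} \ \text{(ungrouped data)}, \qquad M.D.(\tilde{x}) = \frac{\sum f|x - \tilde{x}|}{\sum f} \ \text{(grouped data)}$$
Mean Deviation from Mode:
$$M.D.(\hat{x}) = \frac{\sum |x - \hat{x}|}{n} \ \text{(ungrouped data)}, \qquad M.D.(\hat{x}) = \frac{\sum f|x - \hat{x}|}{\sum f} \ \text{(grouped data)}$$

The co-efficient of mean deviation is obtained by dividing the mean deviation by the
average used in calculating the deviations:
$$\text{Co-efficient of } M.D.(\bar{x}) = \frac{M.D.(\bar{x})}{\bar{x}}, \qquad \text{Co-efficient of } M.D.(\tilde{x}) = \frac{M.D.(\tilde{x})}{\tilde{x}}, \qquad \text{Co-efficient of } M.D.(\hat{x}) = \frac{M.D.(\hat{x})}{\hat{x}}$$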

Properties of Mean Deviation:-

i) The M.D. about the median is less than the M.D. about any other value,
i.e., $\frac{\sum |X_i - \text{Median}|}{n}$ is least.
ii) It is always greater than or equal to zero,
i.e., $M.D. \ge 0$.
iii) For a normal (symmetrical, bell-shaped) distribution the following relation holds
approximately: $M.D. \approx \frac{4}{5}\sigma$.
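The following Python sketch computes the mean deviation about each of the three averages, using the standard-library `statistics` module for the averages themselves, and illustrates property (i). The function names and the sample are hypothetical.

```python
import statistics

def mean_deviation(xs, centre):
    """M.D. about a chosen average: sum(|x - centre|) / n."""
    return sum(abs(x - centre) for x in xs) / len(xs)

def mean_deviation_grouped(x, f, centre):
    """Grouped form: sum(f * |x - centre|) / sum(f)."""
    return sum(fi * abs(xi - centre) for xi, fi in zip(x, f)) / sum(f)

data = [2, 4, 4, 5, 10]                      # hypothetical sample
averages = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}
for name, avg in averages.items():
    md = mean_deviation(data, avg)
    print(f"M.D. about {name}: {md:.2f}, co-efficient: {md / avg:.3f}")
```

Here the mean is 5, while the median and the mode are both 4. The M.D. about the mean is 10/5 = 2.0, but about the median it is only 9/5 = 1.8, the smaller value, as property (i) states.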

Merits:-
i) It is easy to calculate.
ii) It is based on all the values.
iii) It gives more information than the range or the quartile deviation.

Demerits:-
i) It is affected by the extreme values.
ii) It is not readily capable of mathematical development.
iii) It does not take into account the negative signs of the deviations from some
average.

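Variance:-
The mean of the squared deviations from the mean is called the variance. It is denoted
by $\sigma^2$ for a population and by $S^2$ for a sample.
It is computed by the following formulas.

Direct Method:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n} \ \text{(ungrouped data)}, \qquad S^2 = \frac{\sum f(X - \bar{X})^2}{\sum f} \ \text{(grouped data)}$$
Shortcut Method:
$$S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 \ \text{(ungrouped data)}, \qquad S^2 = \frac{\sum fX^2}{\sum f} - \left(\frac{\sum fX}{\sum f}\right)^2 \ \text{(grouped data)}$$
Deviation Method (with $D = X - A$):
$$S^2 = \frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2 \ \text{(ungrouped data)}, \qquad S^2 = \frac{\sum fD^2}{\sum f} - \left(\frac{\sum fD}{\sum f}\right)^2 \ \text{(grouped data)}$$
Coding Method (with $u = \frac{X - A}{h}$):
$$S^2 = h^2\left[\frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2\right] \ \text{(ungrouped data)}, \qquad S^2 = h^2\left[\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2\right] \ \text{(grouped data)}$$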

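Standard Deviation:-
The square root of the mean of the squared deviations from the mean is called the standard
deviation. It is denoted by $\sigma$ for a population and by $S$ for a sample.
It is computed by the following formulas.

Direct Method:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \ \text{(ungrouped data)}, \qquad S = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}} \ \text{(grouped data)}$$
Shortcut Method:
$$S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} \ \text{(ungrouped data)}, \qquad S = \sqrt{\frac{\sum fX^2}{\sum f} - \left(\frac{\sum fX}{\sum f}\right)^2} \ \text{(grouped data)}$$
Deviation Method (with $D = X - A$):
$$S = \sqrt{\frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2} \ \text{(ungrouped data)}, \qquad S = \sqrt{\frac{\sum fD^2}{\sum f} - \left(\frac{\sum fD}{\sum f}\right)^2} \ \text{(grouped data)}$$
Coding Method (with $u = \frac{X - A}{h}$):
$$S = h\sqrt{\frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2} \ \text{(ungrouped data)}, \qquad S = h\sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2} \ \text{(grouped data)}$$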
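To confirm that the four variance formulas are algebraically equivalent, this Python sketch evaluates all of them for the same grouped distribution (divisor $\sum f$, as in these notes) and takes the square root for the S.D. The class marks, frequencies, $A = 44.5$ and $h = 10$ are hypothetical, and the function name is illustrative only; the ungrouped forms are the same with every $f = 1$.

```python
import math

def grouped_variances(x, f, A, h):
    """S^2 by the direct, shortcut, deviation and coding methods.
    x = class marks, f = frequencies, A = assumed mean, h = class interval."""
    n = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / n
    direct = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / n
    shortcut = sum(fi * xi ** 2 for xi, fi in zip(x, f)) / n - mean ** 2
    D = [xi - A for xi in x]                         # D = X - A
    deviation = (sum(fi * d ** 2 for d, fi in zip(D, f)) / n
                 - (sum(fi * d for d, fi in zip(D, f)) / n) ** 2)
    u = [(xi - A) / h for xi in x]                   # u = (X - A) / h
    coding = h ** 2 * (sum(fi * ui ** 2 for ui, fi in zip(u, f)) / n
                       - (sum(fi * ui for ui, fi in zip(u, f)) / n) ** 2)
    return direct, shortcut, deviation, coding

class_marks = [24.5, 34.5, 44.5, 54.5]   # hypothetical grouped data
freqs = [3, 5, 8, 4]
variances = grouped_variances(class_marks, freqs, A=44.5, h=10)
print(variances)
print("S.D. =", math.sqrt(variances[0]))
```

All four entries equal 92.75 up to rounding, so $S = \sqrt{92.75} \approx 9.63$. The coding method works with the small integers $u = -2, -1, 0, 1$, which is why it is convenient by hand.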

Properties of Variance and S.D:-
i) The variance or S.D. of a constant is zero,
i.e., $Var(a) = 0$ and $S.D.(a) = 0$.
ii) The variance or S.D. is independent of origin.
OR
The variance or S.D. remains unchanged if a constant is added to or subtracted from each
value of the variable,
i.e., $Var(X \pm a) = Var(X)$ and $S.D.(X \pm a) = S.D.(X)$.
iii) The variance or S.D. is not independent of scale.
OR
If each value of the variable is multiplied or divided by a constant, the variance is
multiplied or divided by the square of that constant,
i.e., $Var(bX) = b^2\, Var(X)$ or $Var\left(\frac{X}{c}\right) = \frac{1}{c^2}\, Var(X)$.
The S.D. is multiplied or divided by the absolute value of that constant,
i.e., $S.D.(bX) = |b|\, S.D.(X)$ or $S.D.\left(\frac{X}{c}\right) = \frac{1}{|c|}\, S.D.(X)$.
iv) The variance of the sum (or difference) of two independent variables is equal to the
sum of their separate variances; the S.D. of the sum (or difference) is the square root of
that sum,
i.e., $Var(X \pm Y) = Var(X) + Var(Y)$ and $S.D.(X \pm Y) = \sqrt{Var(X) + Var(Y)}$.
v) If k subgroups of data containing $n_1, n_2, n_3, \ldots, n_k$ observations ($\sum n_i = N$) have
respective means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$ and variances $S_1^2, S_2^2, S_3^2, \ldots, S_k^2$, then the variance
of the combined observations is given by
$$S_c^2 = \frac{\sum_{i=1}^{k} n_i\left[S_i^2 + (\bar{X}_i - \bar{X}_c)^2\right]}{\sum_{i=1}^{k} n_i}$$
where
$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$
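Properties (ii), (iii) and (v) can be checked numerically. The Python sketch below uses variances with divisor $n$, as in these notes; the helper names `pvar` and `combined_variance` and the data are hypothetical.

```python
def pvar(xs):
    """Variance with divisor n, as in these notes."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def combined_variance(sizes, means, variances):
    """Property (v): S_c^2 = sum(n_i [S_i^2 + (mean_i - mean_c)^2]) / sum(n_i)."""
    N = sum(sizes)
    mean_c = sum(n * m for n, m in zip(sizes, means)) / N
    return sum(n * (v + (m - mean_c) ** 2)
               for n, m, v in zip(sizes, means, variances)) / N

g1, g2 = [1, 2, 3], [4, 6]                   # hypothetical subgroups
print(combined_variance([3, 2], [2, 5], [pvar(g1), pvar(g2)]), pvar(g1 + g2))

xs = [2, 4, 4, 5, 10]
print(pvar([x + 7 for x in xs]), pvar(xs))        # property (ii): origin
print(pvar([3 * x for x in xs]), 9 * pvar(xs))    # property (iii): scale
```

Each line prints a pair of equal values (up to rounding). In the first, both the formula and the pooled data give 14.8/5 = 2.96; note that the combined variance needs the term $(\bar{X}_i - \bar{X}_c)^2$ and is not simply the weighted mean of $S_1^2$ and $S_2^2$.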

Merits of Variance and S.D:-


i) It is based on all the observations of a series.
ii) It is easy to calculate and simple to understand.
Demerits of Variance and S.D:-
i) It is affected by extreme values.

Co-efficient of Variation:-
The percentage ratio of the S.D. to the mean is called the co-efficient of variation (C.V.). It is
used to compare the variability of two or more distributions. A large value of C.V. indicates that
the variability is great, and a small value of C.V. indicates less variability. It is computed by
$$C.V. = \frac{S}{\bar{X}} \times 100$$

Co-efficient of Standard Deviation:-
The ratio of the S.D. to the mean is called the co-efficient of S.D. It is computed by
$$C.SD = \frac{S}{\bar{X}}$$
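A minimal Python sketch of the C.V. comparison. It uses `statistics.pstdev`, which divides by $n$ as these notes do, and the marks of the two sections are hypothetical.

```python
import statistics

def coefficient_of_variation(xs):
    """C.V. = S / mean * 100, with S computed using divisor n."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100

# Hypothetical marks of two sections: which is more variable?
section_a = [50, 60, 70]
section_b = [55, 60, 65]
print(coefficient_of_variation(section_a), coefficient_of_variation(section_b))
# C.V. is unit-free: rescaling the data leaves it unchanged
print(coefficient_of_variation([5, 6, 7]))
```

Both sections have mean 60, but section A has the larger C.V. (about 13.6% against 6.8%), so it is the more variable. The last line gives the same 13.6% as section A because the C.V. does not depend on the units of measurement. Dividing any C.V. by 100 gives the co-efficient of S.D.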

Moments:-
The moments about the mean are the means of the deviations from the mean after raising them
to integer powers. The rth population moment about the mean is denoted by $\mu_r$ and is defined as:
$$\mu_r = \frac{\sum_{i=1}^{N} (X_i - \mu)^r}{N} \ \text{(ungrouped data)}, \qquad \mu_r = \frac{\sum_i f_i (X_i - \mu)^r}{\sum_i f_i} \ \text{(grouped data)}, \qquad r = 1, 2, 3, \ldots$$

The rth sample moment about the mean is denoted by $m_r$ and is defined as:
$$m_r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n} \ \text{(ungrouped data)}, \qquad m_r = \frac{\sum_i f_i (x_i - \bar{x})^r}{\sum_i f_i} \ \text{(grouped data)}, \qquad r = 1, 2, 3, \ldots$$

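Moments about the Mean:-
(In each line the first formula is for ungrouped data and the second for grouped data.)
$$m_1 = \frac{\sum (x_i - \bar{x})}{n}, \qquad m_1 = \frac{\sum f_i (x_i - \bar{x})}{\sum f_i}, \qquad m_1 = 0 \ \text{(always)}$$
$$m_2 = \frac{\sum (x_i - \bar{x})^2}{n}, \qquad m_2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}, \qquad m_2 = \text{Variance (always)}$$
$$m_3 = \frac{\sum (x_i - \bar{x})^3}{n}, \qquad m_3 = \frac{\sum f_i (x_i - \bar{x})^3}{\sum f_i}$$
$$m_4 = \frac{\sum (x_i - \bar{x})^4}{n}, \qquad m_4 = \frac{\sum f_i (x_i - \bar{x})^4}{\sum f_i}$$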

Moments about an Arbitrary Origin (a):-

i) For Equal and Unequal Class Intervals (with $D_i = x_i - a$)
$$m'_r = \frac{\sum (x_i - a)^r}{n} = \frac{\sum D_i^r}{n} \ \text{(ungrouped data)}, \qquad m'_r = \frac{\sum f_i (x_i - a)^r}{\sum f_i} = \frac{\sum f_i D_i^r}{\sum f_i} \ \text{(grouped data)}, \qquad r = 1, 2, 3, 4$$

ii) For Equal Class Intervals (with $u_i = \frac{x_i - a}{h}$)
$$m'_r = h^r\,\frac{\sum u_i^r}{n} \ \text{(ungrouped data)}, \qquad m'_r = h^r\,\frac{\sum f_i u_i^r}{\sum f_i} \ \text{(grouped data)}, \qquad r = 1, 2, 3, 4$$

Moments about the Origin Zero:-
$$m'_r = \frac{\sum (x_i - 0)^r}{n} = \frac{\sum x_i^r}{n} \ \text{(ungrouped data)}, \qquad m'_r = \frac{\sum f_i (x_i - 0)^r}{\sum f_i} = \frac{\sum f_i x_i^r}{\sum f_i} \ \text{(grouped data)}, \qquad r = 1, 2, 3, 4$$

If $m'_1, m'_2, m'_3$ and $m'_4$ are given and we want to calculate the first four moments about the mean, then
we use the following formulas:
$$m_1 = m'_1 - m'_1 = 0$$
$$m_2 = m'_2 - (m'_1)^2 = \text{Variance}$$
$$m_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3$$
$$m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4$$
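The conversion formulas can be verified by computing the moments about an arbitrary origin, converting them, and comparing with the moments computed directly about the mean. This is a minimal Python sketch; the function names and the sample are hypothetical.

```python
def raw_moments(xs, a=0.0, r_max=4):
    """m'_r = sum((x - a)^r) / n for r = 1..r_max (moments about origin a)."""
    n = len(xs)
    return [sum((x - a) ** r for x in xs) / n for r in range(1, r_max + 1)]

def central_from_raw(m1p, m2p, m3p, m4p):
    """Convert the first four moments about a to moments about the mean."""
    m1 = 0.0
    m2 = m2p - m1p ** 2
    m3 = m3p - 3 * m2p * m1p + 2 * m1p ** 3
    m4 = m4p - 4 * m3p * m1p + 6 * m2p * m1p ** 2 - 3 * m1p ** 4
    return m1, m2, m3, m4

data = [2, 4, 4, 5, 10]                              # hypothetical sample
about_4 = raw_moments(data, a=4)                     # origin a = 4
about_mean = raw_moments(data, a=sum(data) / len(data))
print(central_from_raw(*about_4))
print(about_mean)                                    # direct computation
```

About $a = 4$ the raw moments are $m'_1 = 1$, $m'_2 = 8.2$, $m'_3 = 41.8$ and $m'_4 = 262.6$; both printed lines then give $m_1 = 0$, $m_2 = 7.2$, $m_3 = 19.2$ and $m_4 = 141.6$ (up to rounding). Passing `a=0` gives the moments about the origin zero.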

Importance of Moments:-
The measures of location along with measures of dispersion/variability are useful to
describe a data set, but they tell us nothing about the shape of the distribution. For this
purpose we need certain other measures. Some important measures of the shape of a
distribution depend on what we call moments; these are discussed under skewness and kurtosis.
Sheppard's Corrections:-
In the calculation of moments from a grouped frequency distribution, certain errors
are introduced by the assumption that the frequencies associated with a class are located at
the midpoint of the class interval. These errors therefore need corrections. It has been shown
by W.F. Sheppard that, if the frequency distribution (i) is continuous and (ii) tails off to zero
at each end, the corrected moments are as given below:
$$m_2(\text{corrected}) = m_2(\text{uncorrected}) - \frac{h^2}{12}$$
$$m_3(\text{corrected}) = m_3(\text{uncorrected})$$
$$m_4(\text{corrected}) = m_4(\text{uncorrected}) - \frac{h^2}{2}\, m_2(\text{uncorrected}) + \frac{7}{240}h^4$$
where h = class interval.
Note:-
These corrections are not applicable to highly skewed distributions or to distributions
having unequal class intervals.
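Sheppard's corrections translate directly into code. The Python sketch below assumes the conditions stated above hold (continuous distribution, tails off to zero at both ends, equal class intervals). The function name and the input moments are hypothetical values for a grouped distribution with $h = 10$.

```python
def sheppard_corrected(m2, m3, m4, h):
    """Sheppard's corrections for moments of a grouped distribution (class interval h)."""
    m2_c = m2 - h ** 2 / 12
    m3_c = m3                                           # third moment is unchanged
    m4_c = m4 - (h ** 2 / 2) * m2 + (7 / 240) * h ** 4  # uses the uncorrected m2
    return m2_c, m3_c, m4_c

# Hypothetical uncorrected moments of a grouped distribution with h = 10
print(sheppard_corrected(m2=92.75, m3=-150.0, m4=20000.0, h=10))
```

With $h = 10$ the variance is reduced by $100/12 \approx 8.33$ (to about 84.42), $m_3$ is left alone, and $m_4$ becomes $20000 - 50 \times 92.75 + 7 \times 10000/240 \approx 15654.17$.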
Moment-Ratios:-
There are certain ratios in which both the numerators and denominators are moments.
They are independent of origin and units of measurement, i.e., they are pure numbers.
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \qquad \text{(population data)}$$
$$b_1 = \frac{m_3^2}{m_2^3}, \qquad b_2 = \frac{m_4}{m_2^2} \qquad \text{(sample data)}$$
The 1st moment ratio is the square of the third moment expressed in standard units, and the 2nd
moment ratio is the fourth standardized moment.

Skewness:-
A distribution in which values equidistant from the mean have equal frequencies
is defined to be symmetrical, and any departure from symmetry is called skewness.
If the mean, median and mode coincide and the two tails of the
distribution are equal in length from the mean, the distribution is symmetrical,
i.e., Mean = Median = Mode ⇒ Distribution is Symmetrical.
If Mean > Median > Mode and the right tail is longer than the left tail, the distribution is said
to have positive skewness,
i.e., Mean > Median > Mode ⇒ Distribution is Positively Skewed.
If Mean < Median < Mode and the left tail is longer than the right tail, the distribution is said
to have negative skewness,
i.e., Mean < Median < Mode ⇒ Distribution is Negatively Skewed.
Measures of Skewness:-
The following four measures of skewness are commonly used.
i) Karl Pearson 1st Co-efficient of Skewness:-

$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$
ii) Karl Pearson 2nd Co-efficient of Skewness:-
When the mode is ill-defined or difficult to locate by simple methods, it is replaced
using the empirical relation Mode ≈ 3 Median − 2 Mean, which turns Mean − Mode into
3(Mean − Median) and gives

$$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}, \qquad -3 < S_k < +3$$

This coefficient usually lies between -3 (negative skewness) and +3 (positive skewness), and
its sign indicates the direction of the skewness.
iii) Bowley’s Co-efficient of Skewness Based on Quartiles:-

$$S_k = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}, \qquad -1 < S_k < +1$$

This coefficient lies between -1 and +1. For a symmetrical distribution its value is zero.

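iv) Moment Ratio (b1):-

$$b_1 = \frac{m_3^2}{m_2^3}$$

Because m3 is squared, b1 can never be negative: it measures the amount of skewness, while
the direction is given by the sign of m3 (equivalently, by √b1 taken with the sign of m3).

If b1 = 0 (m3 = 0) ⇒ Distribution is symmetric
If b1 > 0 and m3 > 0 ⇒ Distribution is positively skewed
If b1 > 0 and m3 < 0 ⇒ Distribution is negatively skewed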
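The four measures of skewness can be sketched with Python's standard `statistics` module. This is an illustration under stated assumptions, not a prescribed procedure: the standard deviation is taken in its population form (`pstdev`), the quartiles come from `statistics.quantiles` with its default 'exclusive' method (other textbook quartile rules give slightly different values), and `mode` returns the first mode it meets when there are ties. The function names and the sample are hypothetical.

```python
from statistics import fmean, median, mode, pstdev, quantiles


def pearson_sk1(data):
    """Karl Pearson's 1st coefficient: (Mean - Mode) / Standard Deviation."""
    return (fmean(data) - mode(data)) / pstdev(data)


def pearson_sk2(data):
    """Karl Pearson's 2nd coefficient: 3 * (Mean - Median) / Standard Deviation."""
    return 3 * (fmean(data) - median(data)) / pstdev(data)


def bowley_sk(data):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(data, n=4)  # default 'exclusive' method; q2 is the median
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def moment_skewness(data, tol=1e-12):
    """Return b1 = m3**2 / m2**3 and a direction label read from the sign of m3."""
    xbar = fmean(data)
    n = len(data)
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m3 = sum((x - xbar) ** 3 for x in data) / n
    b1 = m3 ** 2 / m2 ** 3  # never negative
    if abs(m3) <= tol:      # tol absorbs floating-point round-off
        label = "symmetric"
    elif m3 > 0:
        label = "positively skewed"
    else:
        label = "negatively skewed"
    return b1, label


data = [2, 3, 3, 3, 4, 5, 6, 8]  # hypothetical sample: mean 4.25 > median 3.5 > mode 3
print(pearson_sk1(data), pearson_sk2(data), bowley_sk(data), moment_skewness(data))
```

The direction label comes from m3 rather than from b1, because b1 alone cannot be negative.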

Kurtosis:-
The word kurtosis is used to indicate the length of the tails and the peakedness of
symmetrical distributions. A symmetrical distribution may be platykurtic, mesokurtic (normal)
or leptokurtic.
i) A mesokurtic distribution is the usual normal distribution.
ii) A leptokurtic distribution is more peaked, with many values close to the mean and also in
the tails far from the mean. A leptokurtic distribution may be a composite of two
normal distributions with the same mean but different variances.
iii) A platykurtic distribution is flatter, with more values between the mean and the tails. A
platykurtic distribution may be a composite of two normal distributions with the
same variance but different means.
Measures of Kurtosis:-

i) Moment Ratio (b2):-

$$b_2 = \frac{m_4}{m_2^2}$$

If b2 = 3 ⇒ Distribution is mesokurtic
If b2 > 3 ⇒ Distribution is leptokurtic
If b2 < 3 ⇒ Distribution is platykurtic

ii) Percentile Co-efficient of Kurtosis:-

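$$K = \frac{Q.D}{P_{90} - P_{10}}, \qquad 0 < K < 0.50$$

where Q.D = (Q3 − Q1)/2 is the quartile deviation and P10, P90 are the 10th and 90th
percentiles. K = 0.263 for a normal distribution.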
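A similar sketch for the two measures of kurtosis, again assuming moments with divisor n, and percentiles taken from `statistics.quantiles(data, n=100)`, whose result list holds P1 to P99 so that P_k sits at index k − 1. The tolerance used to decide whether b2 "equals" 3 is an assumption needed for floating-point data. The function names and samples are hypothetical.

```python
from statistics import fmean, quantiles


def moment_kurtosis(data, tol=1e-9):
    """Return b2 = m4 / m2**2 and the kurtosis class it indicates."""
    xbar = fmean(data)
    n = len(data)
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m4 = sum((x - xbar) ** 4 for x in data) / n
    b2 = m4 / m2 ** 2
    if abs(b2 - 3) <= tol:  # treat values within tol of 3 as equal to 3
        label = "mesokurtic"
    elif b2 > 3:
        label = "leptokurtic"
    else:
        label = "platykurtic"
    return b2, label


def percentile_kurtosis(data):
    """Percentile coefficient K = Q.D / (P90 - P10), with Q.D = (Q3 - Q1) / 2."""
    p = quantiles(data, n=100)  # p[k - 1] is the k-th percentile P_k
    q1, q3 = p[24], p[74]
    p10, p90 = p[9], p[89]
    return ((q3 - q1) / 2) / (p90 - p10)


print(moment_kurtosis([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical sample
print(percentile_kurtosis(list(range(1, 100))))    # hypothetical values 1, 2, ..., 99
```

For the evenly spread values 1 to 99 the quartiles are 25 and 75 and the 10th and 90th percentiles are 10 and 90, so K = 25/80 = 0.3125, above the normal value 0.263 and consistent with a flat (platykurtic) shape.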

Chapter # 05 Index Numbers
Index Number:-

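An index number is a device that measures the change (variation) in a variable, such as
price, between a base period and another period, expressed as a percentage of its base-period
value. For the price of a commodity,

$$I_n = \frac{P_n}{P_o} \times 100$$

where P_n = price in current year n and P_o = price in base year.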

Types of Index Number:-

(i) Simple Index Number (ii) Composite or aggregate Index Number

Simple Index Number:-

An index number computed for a single commodity or variable is called a simple index number.

Composite or Aggregate Index Number:-

An index number computed for more than one commodity or variable is called a composite
(or aggregate) index number.

Uses of Index Number:-

Index numbers are used as economic barometers for measuring prevailing conditions as well
as changes in economic variables such as wholesale prices, consumer prices, production,
investment, imports, exports, business conditions and terms of trade.

Limitations of Index Numbers:-

(i) No single index number is suitable for all purposes.
(ii) They are based on samples, so sampling errors creep into the calculations.
(iii) Comparisons of changes in a variable over long periods are not reliable.
(iv) The choice of a normal (base) period is difficult.
(v) It is not practicable to price all the goods and services.

Price Index Number:-

An index number which measures the changes in the wholesale or retail prices of a
particular commodity or a group of commodities is called a price index number.

Quantity Index Number:-

An index number which measures the changes in the quantity or volume of goods produced,
consumed, exported or imported is called a quantity (or volume) index number.

Price Relative:-

The percentage ratio of the price in the current year to the price in the base year is called
the price relative. It is computed by

$$\frac{P_n}{P_o} \times 100$$

Link Relative:-

The percentage ratio of the price in the current year to the price in the preceding year is
called the link relative. It is computed by

$$\frac{P_n}{P_{n-1}} \times 100$$
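For a single commodity both relatives are simple ratios within one price series. A minimal sketch, assuming prices are listed year by year and the first year is the base unless another position is given; the function names and prices are hypothetical.

```python
def price_relatives(prices, base=0):
    """Price relatives (P_n / P_o) * 100, taking prices[base] as the base-year price P_o."""
    p_o = prices[base]
    return [p_n / p_o * 100 for p_n in prices]


def link_relatives(prices):
    """Link relatives (P_n / P_(n-1)) * 100 for every year after the first."""
    return [prices[t] / prices[t - 1] * 100 for t in range(1, len(prices))]


prices = [40, 50, 60]  # hypothetical prices of one commodity in three successive years
print(price_relatives(prices))  # first year as base
print(link_relatives(prices))
```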

Fixed Base Method:-

In the fixed base method, one time period is chosen as the base, the prices of the various
time periods are divided by the base-period price, and the results are expressed as
percentages. These results are the price relatives.

Chain Base Method or Chain Indices:-

In the chain base method, the price of the preceding year is taken as the base and the link
relatives are computed by the formula

$$\frac{P_n}{P_{n-1}} \times 100$$

Link relatives converted to a fixed base are called chain indices.

The chain index for a year is obtained by multiplying the average of the link relatives of
that year by the chain index of the preceding year and dividing the product by 100:

$$\text{Chain index}_n = \frac{\text{Average link relative}_n \times \text{Chain index}_{n-1}}{100}$$
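A sketch of the chain base method for several commodities, assuming each commodity's prices are given as a list over the same years, the first year is the starting base (chain index 100), and the link relatives of a year are averaged with the arithmetic mean (the median or geometric mean could be used instead). The function name and price data are hypothetical.

```python
from statistics import fmean


def chain_indices(price_series):
    """Chain indices from yearly prices of several commodities.

    price_series holds one list of prices per commodity, all covering the same
    years. The first year is the starting base, so its chain index is 100.
    """
    years = len(price_series[0])
    chain = [100.0]
    for t in range(1, years):
        # link relative of each commodity for year t, averaged arithmetically
        avg_link = fmean(p[t] / p[t - 1] * 100 for p in price_series)
        # chain index = average link relative * preceding chain index / 100
        chain.append(avg_link * chain[-1] / 100)
    return chain


print(chain_indices([[40, 50, 60]]))               # one commodity
print(chain_indices([[10, 12, 15], [20, 22, 24]]))  # two commodities
```

With a single commodity the average link relative is just its own link relative, so the chain indices reproduce the fixed-base price relatives.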

Un-Weighted Price (or Quantity) Index Number:-

An index number that measures the changes in the price or quantity of a group of
commodities without taking the relative importance of the commodities into account is called
an un-weighted index number.

Weighted Price (or Quantity) Index Number:-

An index number that measures the changes in the price or quantity of a group of
commodities when the relative importance of the commodities is taken into account is called
a weighted index number.

Simple Aggregative Price Index Number:-

The simple aggregative price index number is the percentage ratio of the sum of commodity
prices in the current year to the sum of commodity prices in the base year:

$$P_{on} = \frac{\sum p_n}{\sum p_o} \times 100$$
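A one-function sketch of the simple aggregative index; the function name and price lists are hypothetical. Because it simply adds prices, a commodity quoted in a larger unit (and therefore at a higher price) dominates the index, which is one reason weighted indices are preferred.

```python
def simple_aggregative_index(p_o, p_n):
    """Simple aggregative price index: (sum of current prices / sum of base prices) * 100."""
    return sum(p_n) / sum(p_o) * 100


print(simple_aggregative_index(p_o=[10, 20], p_n=[12, 25]))  # hypothetical prices
```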

Weighted Aggregative Price Index Number:-

The weighted aggregative price index number is the percentage ratio of the sum of weighted
commodity prices in the current year to the sum of weighted commodity prices in the base
year. There are four commonly used formulae for weighted aggregative price (and quantity)
index numbers.

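Price Index Number (P_on) and Quantity Index Number (Q_on):

Laspeyres':
$$P_{on} = \frac{\sum p_n q_o}{\sum p_o q_o} \times 100 \qquad Q_{on} = \frac{\sum q_n p_o}{\sum q_o p_o} \times 100$$

Paasche's:
$$P_{on} = \frac{\sum p_n q_n}{\sum p_o q_n} \times 100 \qquad Q_{on} = \frac{\sum q_n p_n}{\sum q_o p_n} \times 100$$

Fisher's Ideal:
$$P_{on} = \sqrt{\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum p_n q_n}{\sum p_o q_n}} \times 100 \qquad Q_{on} = \sqrt{\frac{\sum q_n p_o}{\sum q_o p_o} \times \frac{\sum q_n p_n}{\sum q_o p_n}} \times 100$$

Marshall-Edgeworth:
$$P_{on} = \frac{\sum p_n q_o + \sum p_n q_n}{\sum p_o q_o + \sum p_o q_n} \times 100 \qquad Q_{on} = \frac{\sum q_n p_o + \sum q_n p_n}{\sum q_o p_o + \sum q_o p_n} \times 100$$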

Where

pn = Price in Current Year

po = Price in Base Year

qn = Quantity in Current Year

qo = Quantity in Base Year
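A minimal sketch of the four weighted price index formulae, assuming the prices and quantities of the same commodities are given as parallel lists in the same order. The function name `price_indices` and the data are hypothetical. Fisher's index is computed as the geometric mean of the Laspeyres and Paasche values, which matches the square-root formula because both are already multiplied by 100.

```python
import math


def price_indices(p_o, p_n, q_o, q_n):
    """Laspeyres, Paasche, Fisher and Marshall-Edgeworth price indices (base = 100)."""
    def agg(p, q):
        return sum(pi * qi for pi, qi in zip(p, q))

    laspeyres = agg(p_n, q_o) / agg(p_o, q_o) * 100
    paasche = agg(p_n, q_n) / agg(p_o, q_n) * 100
    fisher = math.sqrt(laspeyres * paasche)  # geometric mean of the two
    marshall_edgeworth = (agg(p_n, q_o) + agg(p_n, q_n)) / (agg(p_o, q_o) + agg(p_o, q_n)) * 100
    return {"laspeyres": laspeyres, "paasche": paasche,
            "fisher": fisher, "marshall_edgeworth": marshall_edgeworth}


p_o, p_n = [10, 20], [12, 25]  # hypothetical base- and current-year prices
q_o, q_n = [5, 4], [6, 3]      # hypothetical base- and current-year quantities
print(price_indices(p_o, p_n, q_o, q_n))
# Quantity indices: swap the roles of prices and quantities.
print(price_indices(q_o, q_n, p_o, p_n))
```

The quantity formulae are the price formulae with p and q interchanged, so the same function serves for both when its arguments are swapped.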

Value Index Number:-

$$V_{on} = \frac{\sum p_n q_n}{\sum p_o q_o} \times 100$$

where value = product of price and quantity (p × q).
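A minimal sketch of the value index, assuming parallel lists of prices and quantities as in the weighted formulae; the function name and data are hypothetical. Unlike the weighted price indices, both price and quantity are allowed to change, so the result reflects their combined effect.

```python
def value_index(p_o, p_n, q_o, q_n):
    """Value index: (sum p_n q_n / sum p_o q_o) * 100, where value = price * quantity."""
    base_value = sum(p * q for p, q in zip(p_o, q_o))
    current_value = sum(p * q for p, q in zip(p_n, q_n))
    return current_value / base_value * 100


print(value_index(p_o=[10, 20], p_n=[12, 25], q_o=[5, 4], q_n=[6, 3]))  # hypothetical data
```

Since Σq_o p_n = Σp_n q_o, the value ratio factors as (Σp_n q_o / Σp_o q_o) × (Σq_n p_n / Σq_o p_n), i.e. Laspeyres' price ratio times Paasche's quantity ratio.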

Main Steps in the Construction of a Price Index Number:-

(i) Purpose and Scope:-

The purpose for which the index number is constructed must be clearly defined. It
should state clearly why, where and what changes are to be measured.

(ii) Selection of Commodities:-

A reasonable number of relatively important commodities should be included. The
included commodities should be representative of the tastes, habits and
requirements of the people concerned and should be easily recognizable.

(iii) Collection of Prices:-

The prices of the selected commodities are to be carefully collected from the
selected markets through enumerators, trade associations, chambers of commerce,
news correspondents, government price reporters, etc.

(iv) Selection of Base Period:-

The period with which the prices of other periods are compared is called the base
period. It should be a normal year. There are two methods for selecting the base
period: (i) Fixed Base Method (ii) Chain Base Method.

(v) Choice of Average:-

When price relatives (fixed base method) or link relatives (chain base method) are
computed for more than one commodity, these relatives are averaged to obtain the
index number.

For this, the following averages may be used:

(a) Arithmetic Mean (b) Median (c) Geometric Mean


(vi) Selection of Weights:-
Not all the included commodities are equally important; for example, wheat is
more important than rice and maize. Therefore an appropriate weight should be
given to each commodity, keeping in view its relative importance. The weights may be
suitable numbers or the quantities of the various commodities.
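Step (v) can be illustrated by averaging one set of relatives with each of the three averages listed, using the standard `statistics` module. The function name and the relatives are hypothetical. For positive values the geometric mean never exceeds the arithmetic mean, so the choice of average does affect the resulting index.

```python
from statistics import fmean, geometric_mean, median


def average_relatives(relatives):
    """Average a list of price (or link) relatives with the three averages in step (v)."""
    return {
        "arithmetic_mean": fmean(relatives),
        "median": median(relatives),
        "geometric_mean": geometric_mean(relatives),
    }


print(average_relatives([120, 150, 100]))  # hypothetical relatives of three commodities
```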

Cost of Living Index Number:-

An index that measures the changes in the price of a specific basket of goods and
services between the current year and the base year is called a cost of living index number.
The basket of goods and services contains (i) Food (ii) Clothing (iii) House Rent
(iv) Education (v) Miscellaneous items.

$$P_{on} = \frac{\sum I W}{\sum W}$$

where W = p_o q_o (the base-year value of each item, so that ΣW = Σp_o q_o) and

$$I = \frac{p_n}{p_o} \times 100$$
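A minimal sketch of the cost of living index as a weighted average of price relatives, assuming W = p_o q_o as in the formula above; the function name and the two-item basket are hypothetical.

```python
def cost_of_living_index(p_o, p_n, q_o):
    """Cost of living index sum(I * W) / sum(W), with I = p_n / p_o * 100 and W = p_o * q_o."""
    relatives = [pn / po * 100 for po, pn in zip(p_o, p_n)]  # I for each item
    weights = [po * qo for po, qo in zip(p_o, q_o)]          # W: base-year value of each item
    return sum(i * w for i, w in zip(relatives, weights)) / sum(weights)


# Hypothetical basket of two items
print(cost_of_living_index(p_o=[10, 20], p_n=[12, 25], q_o=[5, 4]))
```

With W = p_o q_o each term I × W equals 100 p_n q_o, so this weighted average of relatives works out to the same value as Laspeyres' price index.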

Uses of Price Index Number:-

(i) Price index numbers are used to measure the changes in the price of a commodity
or of a group of related commodities.
(ii) They measure the purchasing power of money.
(iii) They are used to measure the changes in the level of industrial production.
(iv) They serve as economic barometers for forecasting the future business conditions of
a country and for detecting seasonal fluctuations and business cycles.

Consumer price index numbers are used to measure the changes in the retail prices of
specified goods and services, and they are used in formulating policies.
