Lecture 1
ON
MATH-208
(Probability and Statistics)
BY
Kiran Kumar Shrestha
Department of Mathematics
School of Science
Kathmandu University
TO
CIVE - II – II Group
Topics Covered
Numerical representation of data
Measurement of Central Values
Measurement of Variation
Date: Friday, Nov. 19, 2021
MATH 208 (Probability and Statistics)
Chapter I – Data Representation
Section I – Numerical Representation of Data
Section II – Graphical Representation of Data
Numerical Representation of Data
Data
Data can be defined as some numeric or literal value describing some attribute of one or more entities.
Example: The age of Ram is 19.
In raw form, data are not meaningful and no decision can be made from them. To make any decision
with the help of data, we need to process them. Processing data yields meaningful values called
information.
Data Processing
It is the activity of working on data to obtain meaningful values, called information, so that the
data can be used to make decisions.
Some examples are:
Arranging data in order
Finding the maximum/minimum value
Finding the average/variation, etc.
Measuring Central Values of Data
Types-
Mean
Median
Mode
Mean
Types-
Arithmetic mean
Geometric mean
Harmonic mean
Arithmetic Mean
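#.1 For individual series -
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
Or,
$$\bar{x} = \frac{\sum x}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
#.2 For discrete frequency distribution
$$\bar{x} = \frac{\sum f x}{N}$$
where $N = \sum f$ is the sum of frequencies.
Or,
$$\bar{x} = \frac{\sum f x}{\sum f}$$
#.3 For Continuous Frequency Distribution (with classes defined)
$$\bar{x} = \frac{\sum f x}{N}$$
where $x$ is the mid-value of each class and $N = \sum f$.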
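As a rough illustration of the three cases above, the following Python sketch computes the arithmetic mean as (sum of x)/n for an individual series and as (sum of f·x)/(sum of f) for a frequency distribution, using class mid-values as x in the continuous case. The function names are illustrative only, and the sample data are borrowed from the worked examples later in this lecture.

```python
def mean_individual(values):
    """Arithmetic mean of an individual series: sum(x) / n."""
    return sum(values) / len(values)


def mean_frequency(xs, fs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_frequency = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / total_frequency


def class_mid_values(classes):
    """Mid-value of each (lower, upper) class, used as x in the continuous case."""
    return [(lower + upper) / 2 for lower, upper in classes]


# Individual series (data from the s.d. example in this lecture).
series = [30, 40, 35, 22, 25, 48, 45]
print(mean_individual(series))  # 35.0

# Continuous frequency distribution (marks data from this lecture).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [7, 12, 24, 10, 7]
print(mean_frequency(class_mid_values(classes), students))  # about 24.67
```

The same `mean_frequency` function serves the discrete case directly: pass the observed values as `xs` instead of class mid-values.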
Median
The median of a data distribution is the value which divides it into two equal parts (halves), so that 50% of
the data lie above it and 50% lie below it.
The mean is preferred when the actual values are important, and the median is preferred when some attribute of the
values is important. For example, if the actual time is important then the mean is used; however, if the timing is
important then the median is preferred.
Measurement of median
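#.1 Individual series
With the data arranged in ascending order,
$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item}$$
Example-
23 24 43 44 45 53 67 82
Here n = 8,
Now,
$$\text{Median} = \text{value of the } \left(\frac{8+1}{2}\right)^{\text{th}} \text{ item} = \text{value of the } (4.5)^{\text{th}} \text{ item}$$
$$= 4^{\text{th}} \text{ item} + 0.5\,(5^{\text{th}} \text{ item} - 4^{\text{th}} \text{ item}) = 44 + 0.5\,(45 - 44) = 44.5$$
#.2 For discrete frequency distribution-
$$\text{Median} = \text{value of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item}$$
where $N = \sum f$, and the item is located with the help of the cumulative frequencies.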
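#.3 For continuous frequency distribution
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h$$
where the median class is the class whose cumulative frequency first reaches $N/2$, $L$ is its lower limit, $f$ is its frequency, $h$ is its width, and $c.f.$ is the cumulative frequency of the class preceding it.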
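The three median rules can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a fractional position such as 4.5 is resolved by linear interpolation between neighbouring items, the discrete case takes the value whose cumulative frequency first reaches (N+1)/2, and the continuous case uses the standard textbook interpolation inside the median class. The function names are illustrative, and the data are taken from examples in this lecture.

```python
def median_individual(values):
    """Value of the ((n + 1) / 2)-th item of the sorted data."""
    data = sorted(values)
    position = (len(data) + 1) / 2       # 1-based; ends in .5 when n is even
    whole = int(position)
    fraction = position - whole
    lower_item = data[whole - 1]
    if fraction == 0:
        return lower_item
    upper_item = data[whole]
    return lower_item + fraction * (upper_item - lower_item)


def median_discrete(xs, fs):
    """Value whose cumulative frequency first reaches (N + 1) / 2."""
    target = (sum(fs) + 1) / 2
    cumulative = 0
    for x, f in sorted(zip(xs, fs)):
        cumulative += f
        if cumulative >= target:
            return x


def median_continuous(classes, fs):
    """L + (N/2 - c.f.) / f * h, with classes given as (lower, upper) in order."""
    half = sum(fs) / 2
    cf_before = 0                        # cumulative frequency of earlier classes
    for (lower, upper), f in zip(classes, fs):
        if cf_before + f >= half:
            return lower + (half - cf_before) / f * (upper - lower)
        cf_before += f


print(median_individual([23, 24, 43, 44, 45, 53, 67, 82]))   # 44.5

wages = [10000, 20000, 30000, 40000, 50000, 60000]
workers = [51, 128, 248, 356, 95, 22]
print(median_discrete(wages, workers))                        # 40000

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [7, 12, 24, 10, 7]
print(median_continuous(classes, students))                   # about 24.58
```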
Mode
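#.1 For individual series
Mode = the value that is repeated the greatest number of times.
If no value is repeated then the mode is not defined. The mode is also not defined if two or more values are
repeated the same (greatest) number of times.
#.2 For discrete frequency distribution
Mode = the value with the highest frequency.
#.3 For continuous frequency distribution-
$$\text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h$$
where the modal class is the class with the highest frequency, $L$ is its lower limit, $f_1$ is its frequency, $f_0$ and $f_2$ are the frequencies of the classes preceding and following it, and $h$ is its width.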
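A small Python sketch of the mode for each type of data is given below. It assumes the convention stated for individual series (no mode when nothing repeats or when there is a tie for the most frequent value) and, for the continuous case, the standard textbook interpolation L + (f1 - f0)/(2·f1 - f0 - f2) × h inside the modal class. The function names are illustrative, and a missing neighbouring class is treated as having frequency 0, which is an assumption of this sketch.

```python
from collections import Counter


def mode_individual(values):
    """Most repeated value, or None when the mode is not defined."""
    counts = Counter(values)
    highest = max(counts.values())
    winners = [value for value, count in counts.items() if count == highest]
    if highest == 1 or len(winners) > 1:
        return None                      # nothing repeats, or a tie
    return winners[0]


def mode_discrete(xs, fs):
    """Value with the highest frequency, or None on a tie."""
    highest = max(fs)
    winners = [x for x, f in zip(xs, fs) if f == highest]
    return winners[0] if len(winners) == 1 else None


def mode_continuous(classes, fs):
    """Interpolated mode inside the modal class (first class with highest f)."""
    i = fs.index(max(fs))
    lower, upper = classes[i]
    f1 = fs[i]
    f0 = fs[i - 1] if i > 0 else 0              # assumed 0 if no earlier class
    f2 = fs[i + 1] if i < len(fs) - 1 else 0    # assumed 0 if no later class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


print(mode_individual([2, 3, 3, 5, 7]))    # 3
print(mode_individual([2, 3, 5, 7]))       # None

wages = [10000, 20000, 30000, 40000, 50000, 60000]
workers = [51, 128, 248, 356, 95, 22]
print(mode_discrete(wages, workers))       # 40000

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [7, 12, 24, 10, 7]
print(mode_continuous(classes, students))  # about 24.62
```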
Partition Values
Types-
1. Median
2. Quartiles
3. Deciles
4. Percentiles
Quartiles
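Quartiles are the three values which divide a given set of data into four equal parts; they are denoted by Q1, Q2
and Q3.
Notes:
#.1 Q2 is the same as the median.
#.2 25% of the data lie below Q1, 25% of the data lie above Q3, and 50% of the data lie between Q1 and Q3.
Measurement
#.1 for individual series
$$Q_1 = \text{value of the } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item}$$
$$Q_3 = \text{value of the } \left\{3\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ item}$$
#.2 for discrete freq. dist.
$$Q_1 = \text{value of the } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$$
$$Q_3 = \text{value of the } \left\{3\left(\frac{N+1}{4}\right)\right\}^{\text{th}} \text{ item}$$
#.3 continuous freq. dist.
$$Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times h, \qquad Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times h$$
where, for each quartile, $L$, $f$ and $h$ are the lower limit, frequency and width of the class containing that quartile, and $c.f.$ is the cumulative frequency of the class preceding it.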
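The quartile rules can be sketched in Python in the same spirit. This sketch assumes that the i-th quartile of an individual series is the value of the i(n+1)/4-th item with linear interpolation for fractional positions, that the discrete case takes the value whose cumulative frequency first reaches i(N+1)/4, and that the continuous case interpolates inside the class containing the i·N/4-th item. Function names are illustrative; the data come from examples in this lecture.

```python
def item_at(data, position):
    """Value at a 1-based, possibly fractional position in sorted data."""
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return data[0]
    if whole >= len(data):
        return data[-1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])


def quartile_individual(values, i):
    """i-th quartile (i = 1, 2, 3): value of the i(n + 1)/4-th item."""
    data = sorted(values)
    return item_at(data, i * (len(data) + 1) / 4)


def quartile_discrete(xs, fs, i):
    """Value whose cumulative frequency first reaches i(N + 1)/4."""
    target = i * (sum(fs) + 1) / 4
    cumulative = 0
    for x, f in sorted(zip(xs, fs)):
        cumulative += f
        if cumulative >= target:
            return x
    return max(xs)


def quartile_continuous(classes, fs, i):
    """L + (iN/4 - c.f.) / f * h, with classes given as (lower, upper) in order."""
    target = i * sum(fs) / 4
    cf_before = 0
    for (lower, upper), f in zip(classes, fs):
        if cf_before + f >= target:
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f


data = [23, 24, 43, 44, 45, 53, 67, 82]
print([quartile_individual(data, i) for i in (1, 2, 3)])    # [28.75, 44.5, 63.5]

wages = [10000, 20000, 30000, 40000, 50000, 60000]
workers = [51, 128, 248, 356, 95, 22]
print(quartile_discrete(wages, workers, 1), quartile_discrete(wages, workers, 3))

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [7, 12, 24, 10, 7]
print(quartile_continuous(classes, students, 1), quartile_continuous(classes, students, 3))
```

Calling any of these functions with i = 2 reproduces the median, which is a quick consistency check.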
Measurement of Variation/ Scatteredness/ Dispersion/ Uniformity
Variation of a data distribution can be defined as a measure of heterogeneity (or homogeneity) of data.
Different measures of variation can be divided into two broad classes :
1. Absolute measure of variation
2. Relative measure of variation
The different types of absolute measure of variation (with which the unit used to measure the data is associated),
listed together with their relative counterparts, are:
Range/ coefficient of range
Interquartile range/ quartile deviation
Mean deviation
Standard deviation/ variance/ coefficient of variation
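Range
$$\text{Range} = x_{\max} - x_{\min}$$
where $x_{\max}$ is the largest value and $x_{\min}$ is the smallest value.
Note: (i) For continuous freq. distr. with classes defined,
$$\text{Range} = \text{upper limit of the highest class} - \text{lower limit of the lowest class}$$
(ii) Range is an absolute measure of variation (since the unit used to express the data is associated with it). A relative
measure of range (with which the unit of measurement is not associated) is given by
$$\text{Coefficient of range} = \frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}}$$
Inter-quartile Range
$$\text{Inter-quartile range} = Q_3 - Q_1$$
Quartile Deviation (Q.D.)/ Semi-interquartile range
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
A relative measure of Q.D. is given by
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$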
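The range-based and quartile-based measures reduce to a few lines of arithmetic, sketched below in Python. The sketch takes the range as largest minus smallest value, the coefficient of range as (largest - smallest)/(largest + smallest), and the quartile deviation as half of Q3 - Q1 with coefficient (Q3 - Q1)/(Q3 + Q1). The function names are illustrative; the inputs are the wage values from the discrete-distribution problem in this lecture, with Q1 = 30000 and Q3 = 40000 taken as its quartiles.

```python
def range_measures(values):
    """Return (range, coefficient of range) for a collection of values."""
    largest, smallest = max(values), min(values)
    value_range = largest - smallest
    return value_range, value_range / (largest + smallest)


def quartile_measures(q1, q3):
    """Return (inter-quartile range, quartile deviation, coefficient of Q.D.)."""
    iqr = q3 - q1
    return iqr, iqr / 2, iqr / (q3 + q1)


wages = [10000, 20000, 30000, 40000, 50000, 60000]
print(range_measures(wages))             # range 50000, coefficient about 0.714
print(quartile_measures(30000, 40000))   # IQR 10000, Q.D. 5000.0, coefficient about 0.143
```

Note that the two coefficients are pure numbers, whereas the range and Q.D. carry the unit of the data.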
Mean Deviation
For individual series:
$$\text{M.D.} = \frac{\sum |x - \bar{x}|}{n}$$
For discrete and continuous frequency distribution:
$$\text{M.D.} = \frac{\sum f\,|x - \bar{x}|}{N}$$
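As an illustration, the mean deviation about the mean can be computed with the short Python sketch below: the average of |x - mean| for an individual series, and the frequency-weighted average of |x - mean| for a frequency distribution (with class mid-values as x in the continuous case). The function names are illustrative, and the sample data are reused from other examples in this lecture.

```python
def mean_deviation(values):
    """Mean deviation about the mean for an individual series."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def mean_deviation_frequency(xs, fs):
    """Mean deviation about the mean for a frequency distribution."""
    total_frequency = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total_frequency
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / total_frequency


series = [30, 40, 35, 22, 25, 48, 45]
print(mean_deviation(series))                          # 8.0

mid_values = [5, 15, 25, 35, 45]                       # classes 0-10, ..., 40-50
students = [7, 12, 24, 10, 7]
print(mean_deviation_frequency(mid_values, students))  # about 8.46
```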
Standard Deviation
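For Individual series
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
Example:
Find the s.d. of the following data: 30, 40, 35, 22, 25, 48, 45.
Method I:
Here, n = 7 and the mean is
$$\bar{x} = \frac{\sum x}{n} = \frac{245}{7} = 35$$
Now, the s.d. is
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{25 + 25 + 0 + 169 + 100 + 169 + 100}{7}}$$
$$= \sqrt{\frac{588}{7}} = \sqrt{84} \approx 9.17$$
Method II:
We have
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
$$= \sqrt{\frac{9163}{7} - \left(\frac{245}{7}\right)^2}$$
$$= \sqrt{1309 - (35)^2}$$
$$= \sqrt{1309 - 1225} = \sqrt{84} \approx 9.17$$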
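The two methods of this example can be checked numerically with the Python sketch below. Method I averages the squared deviations from the mean; Method II uses the mean of squares minus the square of the mean. Both divide by n, as in this lecture. The function names are illustrative.

```python
import math
import statistics


def sd_from_deviations(values):
    """Method I: sqrt( sum((x - mean)^2) / n )."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_from_squares(values):
    """Method II: sqrt( sum(x^2)/n - (sum(x)/n)^2 )."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


data = [30, 40, 35, 22, 25, 48, 45]
print(round(sd_from_deviations(data), 2))   # 9.17
print(round(sd_from_squares(data), 2))      # 9.17
print(round(statistics.pstdev(data), 2))    # 9.17, library check with divisor n
```

Python's `statistics.pstdev` uses the same divisor n, whereas `statistics.stdev` divides by n - 1 and would give a slightly larger value.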
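For Discrete and Continuous F.D.
$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} = \sqrt{\frac{\sum f x^2}{N} - \bar{x}^2} = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$$
Problem:
Find the s.d. of the following data.
Marks              0-10   10-20   20-30   30-40   40-50
No. of Students    7      12      24      10      7
Solution-
Working Table-
Marks    Mid-Value (x)   No. of Students (f)   fx      fx^2
0-10     5               7                     35      175
10-20    15              12                    180     2700
20-30    25              24                    600     15000
30-40    35              10                    350     12250
40-50    45              7                     315     14175
Total                    N=60                  1480    44300
Now
$$\sigma = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2} = \sqrt{\frac{44300}{60} - \left(\frac{1480}{60}\right)^2}$$
$$= \sqrt{738.33 - 608.44} = \sqrt{129.89} \approx 11.40$$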
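The working table and the final value for the marks problem can be reproduced with a short Python sketch. It assumes the shortcut form of the s.d. for a frequency distribution, sqrt(sum(f·x²)/N - (sum(f·x)/N)²), with class mid-values as x. The function name `grouped_sd` is illustrative.

```python
import math


def grouped_sd(xs, fs):
    """s.d. of a frequency distribution from the sums of f*x and f*x^2."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


mid_values = [5, 15, 25, 35, 45]      # mid-values of classes 0-10, ..., 40-50
students = [7, 12, 24, 10, 7]

# Rebuild the fx and fx^2 columns of the working table.
for x, f in zip(mid_values, students):
    print(x, f, f * x, f * x * x)

print(sum(f * x for x, f in zip(mid_values, students)))       # 1480
print(sum(f * x * x for x, f in zip(mid_values, students)))   # 44300
print(round(grouped_sd(mid_values, students), 2))             # 11.4
```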
Notes:
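#.1 The square of the standard deviation is called the variance of the data, i.e.,
$$\text{Variance} = \sigma^2$$
#.2 The relative measure of s.d. is called the coefficient of standard deviation and is given by
$$\text{Coefficient of s.d.} = \frac{\sigma}{\bar{x}}$$
#.3 If the coefficient of s.d. is multiplied by 100 to express it as a percentage, then it is called the coefficient of
variation (C.V.), so
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100\%$$
#.4 The coefficient of variation is used to compare the variations of two or more sets of data values. The set with the
smaller C.V. is the more uniform (more consistent) one.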
Problem/ Example
#.(A) For individual data values-
Discussed in the previous problem.
#.(B) For discrete frequency distribution-
The following is the frequency distribution of the weekly wages of 900 workers in a construction project.
Wage Frequency
10000 51
20000 128
30000 248
40000 356
50000 95
60000 22
Calculate following values of weekly wage: (a) mean (b) median (c) quartiles (d) mode (e) range (f)
coefficient of range (g) quartile deviation (h) coefficient of quartile deviation (i) standard deviation (j)
variance (k) coefficient of variation.
Solution-
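#.(a) Mean
$$\bar{x} = \frac{\sum f x}{N} = \frac{30820000}{900} \approx 34244.44$$
where $\sum f x$ is taken from the working table for s.d. in part (i).
#.(b) Median
Working table for median
Wage     Frequency   Cum. Freq.
10000    51          51
20000    128         179
30000    248         427
40000    356         783
50000    95          878
60000    22          900
$$\text{Median} = \text{value of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{value of the } \left(\frac{900+1}{2}\right)^{\text{th}} = (450.5)^{\text{th}} \text{ item} = 40000$$
since 783 is the first cumulative frequency that reaches 450.5.
#.(c) Quartiles
$$Q_1 = \text{value of the } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{value of the } (225.25)^{\text{th}} \text{ item} = 30000$$
$$Q_3 = \text{value of the } \left\{3\left(\frac{N+1}{4}\right)\right\}^{\text{th}} \text{ item} = \text{value of the } (675.75)^{\text{th}} \text{ item} = 40000$$
#.(d) Mode
Mode = 40000, the wage with the highest frequency (356).
#.(e) Range
$$\text{Range} = 60000 - 10000 = 50000$$
#.(f) Coefficient of range
$$\frac{60000 - 10000}{60000 + 10000} = \frac{50000}{70000} \approx 0.714$$
#.(g) Quartile deviation
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{40000 - 30000}{2} = 5000$$
#.(h) Coefficient of quartile deviation
$$\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{10000}{70000} \approx 0.143$$
#.(i) Standard deviation
$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}}$$
Working table (for calculation of s.d.)-
Wage (x)   Frequency (f)   fx          x - mean    (x - mean)^2   f(x - mean)^2
10000      51              510000      -24244.4    587792871      29977436417
20000      128             2560000     -14244.4    202904071      25971721077
30000      248             7440000     -4244.44    18015271       4467787187
40000      356             14240000    5755.56     33126471       11793023645
50000      95              4750000     15755.56    248237671      23582578737
60000      22              1320000     25755.56    663348871      14593675160
Total      900             30820000                               110386222222.24
Here,
$$\sigma = \sqrt{\frac{110386222222.24}{900}} = \sqrt{122651358.02} \approx 11074.81$$
#.(j) Variance
$$\sigma^2 \approx 122651358.02$$
#.(k) Coefficient of variation
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100\% = \frac{11074.81}{34244.44} \times 100\% \approx 32.34\%$$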
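All eleven quantities asked for in this problem can be cross-checked with the Python sketch below. It is only a sketch under the conventions used in this lecture: positional measures are read off the cumulative frequencies (first cumulative frequency reaching the required position), and the s.d. divides by N. The helper name `item_value` and the variable names are illustrative.

```python
import math

wages = [10000, 20000, 30000, 40000, 50000, 60000]
freqs = [51, 128, 248, 356, 95, 22]
N = sum(freqs)  # 900 workers


def item_value(position):
    """Wage whose cumulative frequency first reaches the given position."""
    cumulative = 0
    for x, f in zip(wages, freqs):
        cumulative += f
        if cumulative >= position:
            return x
    return wages[-1]


mean = sum(f * x for x, f in zip(wages, freqs)) / N
median = item_value((N + 1) / 2)
q1 = item_value((N + 1) / 4)
q3 = item_value(3 * (N + 1) / 4)
mode = wages[freqs.index(max(freqs))]
value_range = max(wages) - min(wages)
coeff_range = value_range / (max(wages) + min(wages))
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
variance = sum(f * (x - mean) ** 2 for x, f in zip(wages, freqs)) / N
sd = math.sqrt(variance)
cv = sd / mean * 100

results = [
    ("(a) mean", mean), ("(b) median", median),
    ("(c) Q1", q1), ("(c) Q3", q3), ("(d) mode", mode),
    ("(e) range", value_range), ("(f) coeff. of range", coeff_range),
    ("(g) Q.D.", qd), ("(h) coeff. of Q.D.", coeff_qd),
    ("(i) s.d.", sd), ("(j) variance", variance), ("(k) C.V. (%)", cv),
]
for label, value in results:
    print(f"{label:22s}{value:.3f}")
```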
#.(C) Long Answer Problem
The following data represent the lives of two models of refrigerators, A and B.
Life    No. of Refrigerators of Model A    No. of Refrigerators of Model B
0-2 5 2
2-4 16 7
4-6 13 17
6-8 7 19
8-10 5 9
10-12 4 1
Which model has greater (less) uniformity (consistency, variation, dispersion)?
Solution-
Working table-
Life     Mid-value (x)   fA    fB    fA.x   fB.x   fA.x^2   fB.x^2
0-2      1               5     2     5      2      5        2
2-4      3               16    7     48     21     144      63
4-6      5               13    17    65     85     325      425
6-8      7               7     19    49     133    343      931
8-10     9               5     9     45     81     405      729
10-12    11              4     1     44     11     484      121
Total                    50    55    256    333    1706     2271
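Calculation of mean
$$\bar{x}_A = \frac{\sum f_A x}{N_A} = \frac{256}{50} = 5.12$$
$$\bar{x}_B = \frac{\sum f_B x}{N_B} = \frac{333}{55} \approx 6.0545$$
Calculation of s.d.
We have
$$\sigma_A = \sqrt{\frac{\sum f_A x^2}{N_A} - \bar{x}_A^2} = \sqrt{\frac{1706}{50} - \left(\frac{256}{50}\right)^2} = \sqrt{7.9056} \approx 2.8117$$
$$\sigma_B = \sqrt{\frac{\sum f_B x^2}{N_B} - \bar{x}_B^2} = \sqrt{\frac{2271}{55} - \left(\frac{333}{55}\right)^2} = \sqrt{4.6334} \approx 2.1525$$
Calculation of C.V.
$$\text{C.V.}_A = \frac{\sigma_A}{\bar{x}_A} \times 100\% = \frac{2.8117}{5.12} \times 100\% \approx 54.92\%$$
$$\text{C.V.}_B = \frac{\sigma_B}{\bar{x}_B} \times 100\% = \frac{2.1525}{6.0545} \times 100\% \approx 35.55\%$$
Conclusion-
Since the C.V. of Model B is smaller than the C.V. of Model A: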
More uniform – Model B
More dispersed/ varied/ scattered – Model A
More consistent – Model B
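The comparison can be verified with a short Python sketch that computes the mean, s.d. and C.V. of each model from the class mid-values and frequencies in the working table. It assumes the shortcut s.d. formula with divisor N, and the function name `grouped_summary` is illustrative.

```python
import math


def grouped_summary(mid_values, freqs):
    """Return (mean, s.d., C.V. in percent) of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mid_values, freqs)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(mid_values, freqs)) / n
    sd = math.sqrt(mean_of_squares - mean ** 2)
    return mean, sd, sd / mean * 100


mid_values = [1, 3, 5, 7, 9, 11]          # mid-values of 0-2, 2-4, ..., 10-12
model_a = [5, 16, 13, 7, 5, 4]
model_b = [2, 7, 17, 19, 9, 1]

mean_a, sd_a, cv_a = grouped_summary(mid_values, model_a)
mean_b, sd_b, cv_b = grouped_summary(mid_values, model_b)

print(f"Model A: mean={mean_a:.4f}, s.d.={sd_a:.4f}, C.V.={cv_a:.2f}%")
print(f"Model B: mean={mean_b:.4f}, s.d.={sd_b:.4f}, C.V.={cv_b:.2f}%")

# The model with the smaller C.V. is the more uniform (consistent) one.
more_uniform = "A" if cv_a < cv_b else "B"
print("More uniform model:", more_uniform)   # B
```

Comparing C.V. rather than s.d. matters here because the two models have different mean lives, so the spreads are judged relative to their own averages.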