Credit goes to www.scribd.com


Chapter 2 Data Processing New Version2

Chapter 2 discusses data processing, emphasizing the importance of transforming raw data into valuable information through various statistical measures. It covers measures of central tendency, dispersion, and relationships, with a focus on calculating averages, including arithmetic mean and median, using different methods. The chapter also highlights the objectives, requisites, merits, and limitations of these statistical measures.

Uploaded by racheldaimai
Copyright © All Rights Reserved

Chapter-2: Data Processing

INTRODUCTION
In layman's terms, data is described as a collection of facts, such as measurements, observations or simply descriptions of things. Data are also the characteristics or
information, usually numerical, that are collected through observation. In a more technical sense, data are the set of values of qualitative or quantitative variables
about one or more persons or objects.

The numerical values collected through measurements, observations or descriptions need to be processed to become useful and valuable
information. In other words, the collected data need to be compiled, analysed and presented through different graphical methods to make them understandable
to all. For this, the following measures are used:

1. Measures of Central Tendency
2. Measures of Dispersion
3. Measures of Relationship

The measures of central tendency provide a value that is an ideal representative of a set of observations. The measures of dispersion take into account the
internal variations of the data, often around a measure of central tendency. The measures of relationship, on the other hand, provide the degree of association
between two or more related phenomena, like rainfall and the incidence of floods, or fertilizer consumption and crop yield.

1. Measures of Central Tendency
I. Introduction
One of the most important objectives of statistical analysis is to get one single value that describes the characteristics of the entire mass of unwieldy
data. Such a value is called the central value, an “average”, or the expected value of the variable. The word average is very commonly used in day-to-day
conversation. For example, we often talk about the average boy in a class, the average height of an Indian, average income, the average marks of the class, etc. When
we say that ‘he is an average student’, we mean that he is neither a very good nor a very bad student, just a mediocre one. However, in statistics, the term average
has a different meaning.
In statistics, an average does not mean mediocre; rather, it is a single value that represents a group of values. Such a value is significant because it depicts the
characteristics of the whole group. Since an average represents the entire data, it lies somewhere between the two extremes, i.e. the largest item and the
smallest item. For this reason, an average is frequently referred to as a “measure of central tendency”.

II. Objectives of Averaging
The main objectives of studying averages are:
A. To get a single value that describes the characteristics of the entire group.
B. To facilitate comparisons.

III. Requisites of a Good Average
A. Easy to understand
B. Simple to compute
C. Based on all the items
D. Not unduly affected by extreme observations
E. Rigidly defined
F. Capable of further algebraic treatment
G. Sampling stability
IV. Types of Averages
The following are the types of averages:
A. Arithmetic Mean
   i. Simple Arithmetic Mean
   ii. Weighted Arithmetic Mean
B. Median
C. Mode
D. Geometric Mean
E. Harmonic Mean
Apart from these, there are other less important averages like moving averages, progressive averages, etc. These averages have a very limited field of
application and are therefore not so popular.
A. Arithmetic Mean
The most popular and widely used measure for representing the entire data by one value is what most laymen call an “average” and what statisticians
call the arithmetic mean. Its value is obtained by adding together all the items and dividing this total by the number of items. The arithmetic mean may
be either:
i. Simple Arithmetic Mean
ii. Weighted Arithmetic Mean
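The two forms can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: it assumes the usual definition of the weighted mean, Σ(wX)/Σw, where each item X carries a weight w. The function names, marks and weights are hypothetical.

```python
def simple_mean(values):
    """Simple arithmetic mean: total of all items divided by the number of items."""
    if not values:
        raise ValueError("at least one value is required")
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted arithmetic mean (assumed definition): sum of w * X divided by sum of w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical marks in three subjects and hypothetical weights.
marks = [60, 70, 80]
weights = [1, 1, 2]
print(simple_mean(marks))             # 70.0
print(weighted_mean(marks, weights))  # (60 + 70 + 160) / 4 = 72.5
```

Giving the third subject double weight pulls the weighted mean above the simple mean.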

Both the simple arithmetic mean and the weighted arithmetic mean are calculated for 3 types of observations:

i. Individual observations
ii. Discrete series observations
iii. Continuous series observations

For each type, the mean can be calculated by:
a. Direct method
b. Short-cut method
Computing Arithmetic Mean
Calculation of Arithmetic Mean – Individual Observations
(Calculate the simple arithmetic mean for the data given below: tabulate the values, find the sum of all the values and then divide this sum by the total number of
individuals or items. One has been done for you. Try doing the rest.)

Q. The following table gives the monthly income of 10 employees in an office:

Income (Rs.) 14780, 15760, 26690, 27750, 24840, 24920, 16100, 17810, 27050, 26950

Calculate the Arithmetic mean of the income by Direct Method and Short-cut method.

Soln.

Direct Method
Calculation of Arithmetic Mean

Employees   Monthly Income (Rs.) (X)
1           14780
2           15760
3           26690
4           27750
5           24840
6           24920
7           16100
8           17810
9           27050
10          26950
N = 10      ΣX = 222650

$\bar{x} = \frac{\sum X}{N} = \frac{222650}{10} = 22265$

The average income of the employees is Rs. 22265.
Short-cut Method

$\bar{x} = A + \frac{\sum d}{N}$

Employees   Monthly Income (Rs.) (X)   d = (X - A) = (X - 22000)
1           14780                      -7220
2           15760                      -6240
3           26690                      +4690
4           27750                      +5750
5           24840                      +2840
6           24920                      +2920
7           16100                      -5900
8           17810                      -4190
9           27050                      +5050
10          26950                      +4950
N = 10      ΣX = 222650                Σd = 2650

Here A = 22000 is the assumed mean. It is taken as 22000 because ΣX/N = 222650/10 = 22265, which rounds to 22000 to the nearest thousand; any convenient value
would give the same answer. Also Σd = 2650 and N = 10.

$\bar{x} = A + \frac{\sum d}{N} = 22000 + \frac{2650}{10} = 22000 + 265 = 22265$

The average income of the employees is Rs. 22265.
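Both methods above can be checked with a short Python sketch. The helper names `mean_direct` and `mean_shortcut` are hypothetical; the incomes are those given in the question. A second assumed mean is tried to show that the choice of A does not change the result.

```python
def mean_direct(values):
    """Direct method: x̄ = ΣX / N."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short-cut method: x̄ = A + Σd / N, where d = X - A."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


# Monthly incomes (Rs.) of the 10 employees from the question.
incomes = [14780, 15760, 26690, 27750, 24840, 24920, 16100, 17810, 27050, 26950]

print(mean_direct(incomes))           # 22265.0
print(mean_shortcut(incomes, 22000))  # 22265.0 (Σd = 2650)
print(mean_shortcut(incomes, 20000))  # 22265.0, a different assumed mean gives the same answer
```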
Calculation of Arithmetic Mean – Discrete Series Observations
In a discrete series, the arithmetic mean may be computed by applying:
i. Direct Method
ii. Short-cut (Indirect) Method

Calculation
Q. From the following data of marks obtained by 60 students of a class, calculate the arithmetic mean using the Direct Method as well as the Short-cut
Method.

Marks            20  30  40  50  60  70
No. of Students  8   12  20  10  6   4
Soln.
Direct Method
Let the marks be X and the number of students be f.

Marks (X)   No. of students (f)   f.X
20          8                     160
30          12                    360
40          20                    800
50          10                    500
60          6                     360
70          4                     280
            Σf = N = 60           ΣfX = 2460

$\bar{x} = \frac{\sum fX}{N}$, where f = frequency, X = the variable in the question, N = total number of observations, i.e. Σf.

$\bar{x} = \frac{2460}{60} = 41$
Short-cut Method

Marks (X)   No. of students (f)   d = (X - A), here A = 40   f.d
20          8                     -20                        -160
30          12                    -10                        -120
40          20                    0                          0
50          10                    +10                        +100
60          6                     +20                        +120
70          4                     +30                        +120
            N = 60                                           Σfd = 60

$\bar{x} = A + \frac{\sum fd}{N}$, where A = assumed mean (any of the values of X, or any other value whether or not it exists in the data, can be taken as
the assumed mean and the final answer will still be the same; however, the nearer the assumed mean is to the actual mean, the fewer the calculations),
d = deviation, i.e. (X - A), and N = total number of observations, i.e. Σf.

$\bar{x} = 40 + \frac{60}{60} = 40 + 1 = 41$
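The direct and short-cut formulas for a discrete series can be sketched as below, using the marks and frequencies from the question. The function names are hypothetical. The extra call with A = 55 illustrates that any assumed mean gives the same answer.

```python
def discrete_mean_direct(xs, fs):
    """Direct method: x̄ = ΣfX / N, with N = Σf."""
    n = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / n


def discrete_mean_shortcut(xs, fs, assumed_mean):
    """Short-cut method: x̄ = A + Σfd / N, with d = X - A."""
    n = sum(fs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + sum_fd / n


marks = [20, 30, 40, 50, 60, 70]   # X
students = [8, 12, 20, 10, 6, 4]   # f

print(discrete_mean_direct(marks, students))        # 41.0 (2460 / 60)
print(discrete_mean_shortcut(marks, students, 40))  # 41.0 (40 + 60 / 60)
print(discrete_mean_shortcut(marks, students, 55))  # 41.0 (55 + (-840) / 60)
```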
Calculation of Arithmetic Mean – Continuous Series Observations
In a continuous series, the arithmetic mean may be computed by applying:
i. Direct Method
ii. Short-cut (Indirect) Method

Calculation
Q. From the following data, compute the arithmetic mean by the Direct and Short-cut Methods.

Marks            0-10  10-20  20-30  30-40  40-50  50-60
No. of Students  5     10     25     30     20     10

Soln.
Direct Method

Marks (X)   No. of students (f)   Mid-points (m)   f.m
0-10        5                     5                25
10-20       10                    15               150
20-30       25                    25               625
30-40       30                    35               1050
40-50       20                    45               900
50-60       10                    55               550
            Σf = N = 100                           Σfm = 3300

$\bar{x} = \frac{\sum fm}{N}$, where f = frequency, m = mid-point of the various classes, N = total number of observations, i.e. Σf.

$\bar{x} = \frac{3300}{100} = 33$

Therefore, the average mark of the students is 33.


Short-cut Method

Marks (X)   No. of students (f)   Mid-points (m)   d = (m - 35)   f.d
0-10        5                     5                -30            -150
10-20       10                    15               -20            -200
20-30       25                    25               -10            -250
30-40       30                    35               0              0
40-50       20                    45               +10            +200
50-60       10                    55               +20            +200
            Σf = N = 100                                          Σfd = -200

$\bar{x} = A + \frac{\sum fd}{N}$, where A = assumed mid-point value (any of the mid-points m, or any other value whether or not it exists in the data, can be
taken as the assumed mean and the final answer will still be the same; however, the nearer the assumed mean is to the actual mean, the fewer the calculations),
d = deviation, i.e. (m - A), and N = total number of observations, i.e. Σf. Here A = 35.

$\bar{x} = 35 + \frac{-200}{100} = 35 + (-2) = 33$
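A rough Python sketch of the continuous-series calculation follows. It assumes each class is given as a (lower, upper) pair and takes the mid-point as (lower + upper) / 2. The function names are hypothetical.

```python
def midpoints(classes):
    """Mid-point m of each class (lower, upper) = (lower + upper) / 2."""
    return [(lo + hi) / 2 for lo, hi in classes]


def grouped_mean_direct(classes, fs):
    """Direct method: x̄ = Σfm / N."""
    ms = midpoints(classes)
    return sum(f * m for m, f in zip(ms, fs)) / sum(fs)


def grouped_mean_shortcut(classes, fs, assumed_mid):
    """Short-cut method: x̄ = A + Σfd / N, with d = m - A."""
    ms = midpoints(classes)
    sum_fd = sum(f * (m - assumed_mid) for m, f in zip(ms, fs))
    return assumed_mid + sum_fd / sum(fs)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [5, 10, 25, 30, 20, 10]

print(grouped_mean_direct(classes, students))        # 33.0 (3300 / 100)
print(grouped_mean_shortcut(classes, students, 35))  # 33.0 (35 + (-200) / 100)
```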
B. Median
The median, by definition, refers to the middle value in a distribution. One half of the items in the distribution have a value the
size of the median or smaller, and the other half have a value the size of the median or larger. The
median is the 50th percentile, the value below which 50% of the values in the sample fall. It splits the observations into two halves.

Unlike the arithmetic mean, which is calculated from the value of each item in the series, the median is a positional average.
The term “position” refers to the place of a value in the series. The place of the median in a series is such that an equal number of items lie on
either side of it.

Since the median is located by its position, when there is an odd number of observations, the median is the value at the middle of the
series. When there is an even number of observations, there is no single middle value, so the median is taken to be the
arithmetic mean of the two middlemost items. Thus, when N is odd, the median is an actual value of the series, with the remainder of the series in two equal
parts on either side of it; but when N is even, the median is a derived figure, i.e. half the sum of the two middle values.
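The odd/even rule can be expressed as a small Python function. This is a sketch with a hypothetical name, `median`; it is applied to the seven wages used in the individual-series example of this section and to a small hypothetical even-sized list.

```python
def median(values):
    """Median of individual observations.

    Odd N: the value of the (N + 1) / 2 th item after sorting.
    Even N: the arithmetic mean of the two middle items.
    """
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2                      # 0-based index of the upper-middle item
    if n % 2 == 1:
        return ordered[mid]           # the (N + 1) / 2 th item
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([14100, 14150, 16080, 17120, 15200, 16160, 17400]))  # 16080 (4th item)
print(median([10, 20, 30, 40]))                                    # 25.0 (mean of 20 and 30)
```

Python's standard `statistics.median` follows the same odd/even rule and can be used to cross-check results.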

i. Merits
a. It is especially useful in the case of open-end classes, since only the position and not the values of the items must be known. The median is also
recommended if the distribution has unequal classes, since it is easier to compute than the mean.
b. Extreme values do not affect the median as strongly as they affect the mean.
c. In markedly skewed distributions, such as income or price distributions, where the arithmetic mean would be distorted by
extreme values, the median is especially useful. Consequently, the median income may for some purposes be regarded as a more representative
figure, for half the income earners must be receiving at least the median income and the other half no more than it.
d. It is the most appropriate average for dealing with qualitative data, i.e. where ranks are given or where items are not
counted or measured but are scored.
e. The value of the median can be determined graphically, whereas the value of the mean cannot be ascertained graphically.
f. Perhaps the greatest advantage of the median, however, is that it actually does indicate what many people incorrectly believe
the arithmetic mean indicates: the value of the middle item in the distribution. This clear-cut meaning makes the
median a measure that can be easily explained.
ii. Limitations
a. To calculate the median, it is necessary to arrange the data; other averages do not need any arrangement.
b. Since it is a positional average, its value is not determined by each and every observation.
c. It is not capable of algebraic treatment.
d. The value of the median is affected more by sampling fluctuations than the value of the arithmetic mean.
e. In some cases, the median cannot be computed as exactly as the mean. When the number of items in a series of data is even, the
median is determined approximately as the mid-point of the two middle items.
f. It is erratic if the number of items is small.
Computing Median
Calculation of Median – Individual Series Observations
Q. From the following data of the wages of 7 workers, compute the median wage.
Wages (in Rs.)  14100  14150  16080  17120  15200  16160  17400

Soln.
A. Arrange the data in ascending or descending order; here we arrange it in ascending order.
Sl. No.         1      2      3      4      5      6      7
Wages (in Rs.)  14100  14150  15200  16080  16160  17120  17400

B. Tabulate the arranged data and calculate the median.


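Calculation of median

Sl. No.   Wages arranged in ascending order (Rs.)
1         14100
2         14150
3         15200
4         16080
5         16160
6         17120
7         17400

$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{7+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } 4^{\text{th}} \text{ item}$

Median = Rs. 16080.00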

Interpretation
We thus find that the median is the middlemost item: 3 persons get a wage less than Rs. 16080 and an equal number, i.e. 3 persons, get more than Rs. 16080.
Calculation of Median – Discrete Series Observations
Q. From the following data of the income of some individuals, find the median income.
Income (in Rs.)  15000  15500  16800  18000  18500  17800
No. of Persons   24     26     20     16     6      30

Soln.
A. Arrange the data in ascending or descending order; here we arrange it in ascending order.
Income (in Rs.)  15000  15500  16800  17800  18000  18500
No. of Persons   24     26     20     30     16     6

B. Tabulate the arranged data and calculate the median.


Calculation of median

Income (in Rs.)   No. of Persons   C.f.
15000             24               24
15500             26               50
16800             20               70
17800             30               100
18000             16               116
18500             6                122

$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{122+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } 61.5^{\text{th}} \text{ item}$

Size of the 61.5th item = Rs. 16800. (The first cumulative frequency equal to or greater than 61.5 is 70; items 51 to 70 all have the value 16800, so the
61.5th item lies in this group.)

Interpretation
Here the number of items (N = 122) is even, so the median lies between the two middle items, the 61st and the 62nd. Both fall in the 16800 group, so the median
income is Rs. 16800. This group is the 3rd from the top of the table: 2 income groups earn less and 3 income groups earn more than the median income of Rs. 16800.
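For a discrete series, the cumulative-frequency rule used above can be sketched as follows. The function name is hypothetical; it sorts the values, locates the (N + 1)/2 th position and returns the first value whose c.f. is equal to or greater than that position.

```python
from itertools import accumulate


def discrete_median(xs, fs):
    """Median of a discrete series using cumulative frequencies (c.f.)."""
    pairs = sorted(zip(xs, fs))                  # arrange X in ascending order
    position = (sum(fs) + 1) / 2                 # the (N + 1) / 2 th item
    cumulative = accumulate(f for _, f in pairs)
    for (x, _), cf in zip(pairs, cumulative):
        if cf >= position:                       # first c.f. reaching the position
            return x
    raise ValueError("empty series")


incomes = [15000, 15500, 16800, 18000, 18500, 17800]
persons = [24, 26, 20, 16, 6, 30]
print(discrete_median(incomes, persons))  # 16800 (61.5th item, c.f. 70)
```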
Calculation of Median – Continuous Series Observations
Q. From the following data of marks of some students, find the median marks of the students.
Marks            45-50  40-45  35-40  30-35  25-30  20-25  15-20  10-15  5-10
No. of students  10     15     26     30     42     31     24     15     7

Soln.

A. Arrange the data in ascending or descending order; here we arrange it in ascending order.
Marks            5-10  10-15  15-20  20-25  25-30  30-35  35-40  40-45  45-50
No. of students  7     15     24     31     42     30     26     15     10

B. Tabulate the arranged data and calculate the median.


Calculation of median

Marks   No. of Students   C.f.
5-10    7                 7
10-15   15                22
15-20   24                46
20-25   31                77
25-30   42                119
30-35   30                149
35-40   26                175
40-45   15                190
45-50   10                200

$\text{Median} = \text{size of the } \left(\frac{N}{2}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{200}{2}\right)^{\text{th}} \text{ item} = \text{size of the } 100^{\text{th}} \text{ item}$

The median lies in the class 25-30. (The first cumulative frequency equal to or greater than 100 is 119, so the class 25-30, whose c.f. is 119, is the
median class.)

$\text{Median} = L + \frac{\frac{N}{2} - \text{C.f.}}{f} \times i$

where L = lower limit of the median class, i.e. the class in which the middle item of the distribution lies; C.f. = cumulative frequency of the class preceding
the median class, i.e. the sum of the frequencies of all the classes lower than the median class; f = frequency of the median class; i = class interval of the
median class. Here L = 25, C.f. = 77, f = 42 and i = 5.

$\text{Median} = 25 + \frac{\frac{200}{2} - 77}{42} \times 5 = 25 + \frac{100 - 77}{42} \times 5 = 25 + \frac{23}{42} \times 5 = 25 + 2.74 = 27.74$

The median mark of the students is 27.74.
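The interpolation formula for a continuous series can be sketched as below. It assumes the classes are listed in ascending order as (lower, upper) pairs; the function name is hypothetical.

```python
from itertools import accumulate


def grouped_median(classes, fs):
    """Median of a continuous series by interpolation.

    Median = L + (N/2 - C.f.) / f * i, where L is the lower limit of the
    median class, C.f. the cumulative frequency of the preceding class,
    f the frequency of the median class and i its class interval.
    """
    n = sum(fs)
    if n == 0:
        raise ValueError("the frequencies must not all be zero")
    position = n / 2                          # the N/2 th item
    cfs = list(accumulate(fs))                # cumulative frequencies
    for k, cf in enumerate(cfs):
        if cf >= position:                    # first c.f. reaching N/2: median class
            lower, upper = classes[k]
            cf_before = cfs[k - 1] if k > 0 else 0
            return lower + (position - cf_before) / fs[k] * (upper - lower)


classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30),
           (30, 35), (35, 40), (40, 45), (45, 50)]
students = [7, 15, 24, 31, 42, 30, 26, 15, 10]
print(round(grouped_median(classes, students), 2))  # 27.74
```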
C. Mode
The mode, or modal value, is the value in a series of observations which occurs with the greatest frequency. For example, the mode of the
series 3, 5, 8, 5, 4, 5, 9, 3 is 5, since this value occurs more often than any of the others. The mode is often said to be the value which occurs most often in
the data, that is, with the highest frequency. While this statement is quite helpful in interpreting the mode, it cannot safely be applied to every distribution,
because of the vagaries of sampling. Even fairly large samples drawn from a statistical population with a single, well-defined mode may exhibit very erratic
fluctuations in this average if the mode is defined as the exact value in the ungrouped data of each sample which occurs most frequently. Rather, it should
be thought of as the value about which the items are most closely concentrated. It is the value which has the greatest frequency density in its immediate
neighbourhood. For this reason, it is also called the most typical or fashionable value of a distribution.

Merits (look at the pictures sent in the group and write down the merits and limitations)
i. ....
ii. ....
iii. ....
iv. ....
v. ....

Limitations
i. ....
ii. ....
iii. ....
iv. ....
v. ....
Computing Mode
Calculation of Mode – Individual Series Observations
Q. Calculate the mode from the following data of the marks obtained by 10 students.
10, 27, 24, 12, 27, 27, 20, 18, 15, 30

Soln.

i. Arrange the items in ascending or descending order.

10, 12, 15, 18, 20, 24, 27, 27, 27, 30

ii. Create a statistical table for the calculation of the mode.

Calculation of mode

Size of the item   No. of times it occurs
10                 1
12                 1
15                 1
18                 1
20                 1
24                 1
27                 3
30                 1

Since the number 27 occurs the maximum number of times, i.e. 3, the modal mark is 27.
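Counting occurrences, as in the table above, can be done with Python's `collections.Counter`. The helper name `modes` is hypothetical, and returning a list (so that tied values are all reported) is a design choice of this sketch rather than something the notes specify.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency, in ascending order."""
    if not values:
        raise ValueError("at least one value is required")
    counts = Counter(values)                 # value -> number of times it occurs
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


marks = [10, 27, 24, 12, 27, 27, 20, 18, 15, 30]
print(modes(marks))                     # [27] (27 occurs 3 times)
print(modes([3, 5, 8, 5, 4, 5, 9, 3]))  # [5], the series from the definition of the mode
```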
Calculation of Mode – Continuous Series Observations

Q. Calculate the mode from the following data.

Marks    No. of Students
0-10     3
10-20    5
20-30    7
30-40    10
40-50    12
50-60    15
60-70    12
70-80    6
80-90    2
90-100   8

Soln.

i. Convert to cumulative frequency.

Marks    No. of Students (f)   Cumulative Frequency (C.f.)
0-10     3                     3
10-20    5                     8
20-30    7                     15
30-40    10                    25
40-50    12                    37
50-60    15                    52
60-70    12                    64
70-80    6                     70
80-90    2                     72
90-100   8                     80

ii. Since this is continuous data, we have to see which class has the highest frequency among all the class groups. The class with the highest
frequency is the modal class.

By inspection, the class 50-60 has the highest frequency and hence is the modal class for the entire data set.

$M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$

where L = lower limit of the modal class, f1 = frequency of the modal class, f0 = frequency of the class preceding the modal class, f2 = frequency of the
class succeeding the modal class, and i = class interval of the modal class. Here L = 50, f1 = 15, f0 = 12, f2 = 12 and i = 10.

$M_o = 50 + \frac{15 - 12}{2 \times 15 - 12 - 12} \times 10 = 50 + \frac{3}{6} \times 10 = 50 + 5 = 55$
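The grouped-data formula can be sketched as a Python function. The name `grouped_mode` is hypothetical, and two behaviours are assumptions of this sketch, not taken from the notes: if several classes share the highest frequency the first one is used, and a missing neighbouring class (when the modal class is the first or last class) is treated as having frequency 0.

```python
def grouped_mode(classes, fs):
    """Mode of a continuous series: Mo = L + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    # Modal class = class with the highest frequency (first one if tied: an assumption).
    k = max(range(len(fs)), key=lambda j: fs[j])
    lower, upper = classes[k]
    f1 = fs[k]
    # Assumption: a neighbour that does not exist is given frequency 0.
    f0 = fs[k - 1] if k > 0 else 0
    f2 = fs[k + 1] if k + 1 < len(fs) else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
    return lower + (f1 - f0) / denominator * (upper - lower)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
students = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]
print(grouped_mode(classes, students))  # 55.0
```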
