
Module 1 - 4

statistics theory 4 modules

Uploaded by

Kusuma Kusuma

SHASHIKUMAR C R BUSINESS STATISTICS

MODULE-1
INTRODUCTION TO STATISTICS
Definitions
“Statistics are numerical statements of facts in any department of enquiry placed in relation
to each other” – Bowley

“Statistics are the classified facts representing the conditions of the people in a state,
especially those facts which can be stated in numbers or in tables of numbers or in any tabular
or classified arrangement” – Webster

Importance and scope of statistics


1. Statistics in planning
2. Statistics in state
3. Statistics in mathematics
4. Statistics in economics
5. Statistics in Business & Management
6. Statistics in accountancy & auditing
7. Statistics in social sciences
8. Statistics in physical sciences
9. Statistics in biology and medical sciences
10. Statistics in astronomy

Limitations of statistics
1. Statistics does not study qualitative phenomena
2. Statistics does not study individuals
3. Statistical laws are not exact
4. Statistics is liable to be misused

DEPARTMENT OF MBA SJBIT Page 1



COLLECTION OF DATA:
Primary Data:
“The data which are originally collected by an investigator or agency for the first time for any statistical
investigation & used by them in the statistical analysis are termed as primary data.”

Methods of collecting primary data:-

1. Direct personal investigation: - In this method the investigator collects the data personally
from the source concerned. The investigator has to go to the field personally to make
enquiries & collect the information from the respondents. This method should be used only if
the investigation is local, i.e. confined to a single locality or area.

2. Indirect oral interviews: - In this method information about a person is collected by interviewing
his personal friends, relatives, or neighbours who know him thoroughly well. In these types of enquiries,
factual data on different problems are collected by interviewing persons who are directly or indirectly
concerned with the subject matter of the enquiry & who are in possession of the requisite
information. A list of questions is prepared & put to these persons, known as witnesses, & their
answers are recorded. This procedure is usually adopted by the enquiry committees or commissions
appointed by the government.

3. Information received through local agencies: - In this method the information is not collected
formally by the investigator. Instead, the investigator appoints local agents, called
correspondents, in different parts of the field of enquiry. The correspondents in
different regions collect the information in their own ways & submit their reports
periodically to the central or head office, where the data are processed for final analysis. This
technique of data collection is usually employed by newspaper or periodical agencies which require
information in different fields such as sports, economic trends, business, the stock & share market,
policies & so on.

4. Mailed questionnaire method: - This method consists in preparing a questionnaire which is mailed to
the respondents with a request for a quick response within the specified time. A very polite covering
note is attached, explaining in detail the aims & objectives of collecting the information & also the
operational definitions of the various terms & concepts used in the questionnaire.
Respondents are requested to extend their full co-operation by furnishing the correct replies &
returning the questionnaire, duly filled in, in time, & are assured that the information will be kept
strictly confidential & secret. In this method the questionnaire is the only medium of communication
between the investigator & the respondents.

Secondary Data:
The data which have already been collected & processed by some agency or person & are taken over from
there & used by any other agency for its statistical work are termed secondary data.
Methods of collecting secondary Data:-
1. Published data
2. Unpublished data


AVERAGES OR MEASURES OF CENTRAL TENDENCY

Averages are measures which condense a huge, unwieldy set of numerical data into single
numerical values which are representative of the entire distribution.
Averages are sometimes referred to as measures of central tendency.
“Averages are statistical constants which enable us to comprehend in a single effort the significance
of the whole” – A.L Bowley.

Averages are very useful:

1. For describing the distribution in a concise manner.
2. For comparative study of different distributions.
3. For computing various other statistical measures such as dispersion, skewness, kurtosis and
various other basic characteristics of a mass of data.
Requisites of a good Average:
1. It should be rigidly defined: - The definition should be clear & unambiguous so that
it leads to one & only one Interpretation by different persons. In other words the
definition should not leave anything to the discretion of the investigator or the observer.
2. It should be easy to understand & calculate: - The average should be easy to understand
even for a non-mathematical person, i.e. it should be readily comprehensible & should
be computed with sufficient ease & rapidity & should not involve heavy arithmetical
calculation.
3. It should be based on all the observations: - In the computation of an ideal average
the entire set of data at our disposal should be used & there should
not be any loss of information resulting from not using the available data.
4. It should be suitable for further mathematical treatment: - The average
should possess some important & interesting mathematical properties so that its use in
further statistical theory is enhanced.
5. It should not be affected much by extreme observations: - By extreme observations
we mean very small & very large observations. Thus a few very
small or very large observations should not unduly affect the value of a good average.


Various Measures of Central Tendency:


1. Arithmetic Mean or Simple Mean
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean

1) Arithmetic Mean or Simple Mean:


Arithmetic Mean or Simple Mean of a given set of observations is their sum divided by the
number of observations.
E.g. the arithmetic mean of 3, 5, 10, 15, 19 and 25 is

$$\frac{3 + 5 + 10 + 15 + 19 + 25}{6} = \frac{77}{6} \approx 12.83$$

In general, if $X_1, X_2, \dots, X_n$ are the given n observations, then their arithmetic mean, usually
denoted by $\bar{X}$, is

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X}{n}$$

In case of a frequency distribution, where the values $X_1, X_2, \dots, X_n$ occur with frequencies
$f_1, f_2, \dots, f_n$, the A.M of X is given by

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$$

Where N = ∑f is the total frequency.


Steps for the computation of Arithmetic Mean:

1. Multiply each value of X or the mid value of the class (in case of grouped or continuous
frequency distribution) by the corresponding frequency f.
2. Obtain the total of the products obtained in step 1 to get ∑fX
3. Divide the total obtained in step 2 by N, the total frequency.

E.g. 1. The intelligence quotients of 9 boys in a class are given below.

50, 70, 85, 95, 100, 120, 115, 125, 105. Find the mean I.Q.

Sol: The mean I.Q. of the 9 boys is given by

$$\bar{X} = \frac{\sum X}{n} = \frac{865}{9} \approx 96.1$$

2. The following is the frequency distribution of the number of telephone calls received in 245
successive one-minute intervals at an exchange.

Number of calls    0   1   2   3   4   5   6   7
Frequency         14  21  25  43  51  40  39  12

Obtain the mean number of calls per minute.

Sol:

No. of calls (X)    Frequency (f)    fX
0                   14                 0
1                   21                21
2                   25                50
3                   43               129
4                   51               204
5                   40               200
6                   39               234
7                   12                84
Total               N = 245          ∑fX = 922

$$\bar{X} = \frac{\sum fX}{N} = \frac{922}{245} \approx 3.763 \text{ calls/min}$$
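
The three computation steps can be sketched in Python for the telephone-call distribution above. This is only an illustrative sketch; the function name `frequency_mean` is an assumption and does not come from the notes.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    # Steps 1 and 2: multiply each value by its frequency and total the products.
    total_fx = sum(f * x for x, f in zip(values, freqs))
    # Step 3: divide by N, the total frequency.
    return total_fx / sum(freqs)

calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]

mean_calls = frequency_mean(calls, freqs)
print(round(mean_calls, 3))  # 3.763
```

For a grouped (continuous) distribution the same function can be used with the class mid-values in place of `values`.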

Step deviation method for computing arithmetic mean:

Step deviation method is a method of computing the A.M which consists in taking the deviations
(differences) of the given observations from an arbitrary value A and dividing them by the common
class width h. The formula to calculate the mean is

$$\bar{X} = A + h \cdot \frac{\sum fd}{N}, \qquad d = \frac{x - A}{h}$$

Steps for computation of mean by step deviation method:

1. Compute d = (x – A)/h, ‘A’ being any arbitrary number and ‘h’ the common magnitude
of the classes.
2. Multiply ‘d’ by the corresponding frequency ‘f’ to get fd
3. Find the sum of the products obtained in step 2 to get ∑fd
4. Divide the sum obtained in step 3 by N, the total frequency
5. Multiply the value obtained in step 4 by ‘h’.
6. Add ‘A’ to the value obtained in step 5
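
A minimal Python sketch of these six steps follows. It uses the marks distribution of problem 3 in the problem set that follows (classes 0-10 to 60-70, so the mid-values are 5, 15, ..., 65); the choice A = 35 and the function name are assumptions made for illustration. The deviations are divided by h so that multiplying by h in step 5 restores the original scale.

```python
def step_deviation_mean(mids, freqs, A, h):
    """Mean by the step deviation method: A + h * sum(f*d) / N, with d = (x - A) / h."""
    d = [(x - A) / h for x in mids]                    # step 1
    sum_fd = sum(f * di for f, di in zip(freqs, d))    # steps 2 and 3
    N = sum(freqs)
    return A + h * (sum_fd / N)                        # steps 4, 5 and 6

mids = [5, 15, 25, 35, 45, 55, 65]
freqs = [6, 5, 8, 15, 7, 6, 3]

direct = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)
by_steps = step_deviation_mean(mids, freqs, A=35, h=10)
print(direct, by_steps)  # both methods give about 33.4
```

Any other value of A gives the same mean; a value near the centre of the distribution simply keeps the arithmetic small.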

PROBLEMS

1. Find the arithmetic mean from the following table.

Marks              52  58  60  65  68  70  75
No. of students     7   5   4   6   3   3   2

2. Calculate the mean of the following marks obtained by students in English using the direct
method & the step deviation method.

Marks               5  10  15  20  25  30  35  40  45  50
No. of students    20  43  75  67  72  45  39   9   8   6

3. Calculate the mean for the following frequency distribution by the direct method and by the
step deviation method.

Marks              0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of students       6      5      8     15      7      6      3


4. Calculate the mean from the following table.

Wages       0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
Frequency      1      4     10     22     30     35     10      7      1

5. Calculate the average marks by the step deviation method from the following data.

Marks              0-10  10-20  20-30  30-40  40-50  50-60
No. of students      42     44     58     35     26     15

6. From the following data of income distribution calculate the arithmetic mean. It is given
that the total income of persons in the highest group is Rs 435 and none is earning less
than Rs 20.

Income (Rs)    No. of persons
Below 30       16
Below 40       36
Below 50       61
Below 60       76
Below 70       87
Below 80       95
80 & above      5

7. Find the missing frequency from the following series, if the value of the arithmetic
average is 33.

X    10  12  60  70  40
Y     5  10   ?   2   5

8. From the following data find the missing frequency when the mean is 15.38.

Size         10  12  14  16  18  20
Frequency     3   7   ?  20   8   5

9. A certain number of salesmen were appointed in different territories and the following
data were compiled from their sales reports. If the average sale is believed to be Rs 19920,
find the missing frequency.

Sales ('000 Rs)    4-8  8-12  12-16  16-20  20-24  24-28  28-32  32-36  36-40
No. of salesmen     11    13     16     14      ?      9     17      6      4

10. Find the missing frequencies of the following series, if the arithmetic average is 39.5 and
the total number of items is 100.

Marks    0-10  10-20  20-30  30-40  40-50  50-60  60-70
F           5     10      ?      4     20      3      ?


11. From the following frequency distribution of 100 families the mean is 50. Find the unknown
frequencies f1 and f2 for the classes 20-40 and 60-80.

Expenditure        0-20  20-40  40-60  60-80  80-100
No. of families      14     f1     27     f2      15


MEDIAN

“The median is that value of the variable which divides the group into two equal parts, one part
comprising all the values greater than the median and the other all the values less than the median”

Calculation of median

Case i) Discrete frequency distribution: Here the variable takes the
values X1, X2, ..., Xn with respective frequencies f1, f2, ..., fn, and the cumulative frequency (C.F)
distribution facilitates the calculation. The steps involved are:

1. Prepare the ‘less than’ C.F distribution
2. Find N/2
3. See the C.F just greater than N/2
4. The corresponding value of the variable gives the median.

PROBLEMS

1. Eight coins were tossed together and the number of heads resulting was noted. The
operation was repeated 256 times and the frequency distribution of the number of heads
is given below. Find the median.

No. of heads    0   1   2   3   4   5   6   7   8
Frequency       1   9  26  59  72  52  29   7   1

Sol:
Computation of median

X    F     Less than C.F
0     1      1
1     9     10
2    26     36
3    59     95
4    72    167
5    52    219
6    29    248
7     7    255
8     1    256

∑f = N = 256
N/2 = 256/2 = 128

The C.F just greater than 128 is 167 and the value of X corresponding to 167 is 4. Hence the median
number of heads is 4.
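
The cumulative-frequency rule used in this solution can be sketched in Python as follows. The function name `discrete_median` is an assumption for illustration; the data are the coin-tossing frequencies above.

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Return the value whose 'less than' cumulative frequency first exceeds N/2."""
    cum_freqs = list(accumulate(freqs))   # step 1: 'less than' C.F column
    half = cum_freqs[-1] / 2              # step 2: N/2
    for x, cf in zip(values, cum_freqs):
        if cf > half:                     # step 3: first C.F just greater than N/2
            return x                      # step 4: corresponding value of the variable

heads = list(range(9))
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]

median_heads = discrete_median(heads, freqs)
print(median_heads)  # 4
```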
Case ii) Continuous frequency distribution

Steps involved for its computation are

1. Prepare the ‘less than’ C.F distribution
2. Find N/2
3. See the C.F just greater than N/2
4. The corresponding class contains the median value and is called the median class.

The value of the median is obtained by using the formula

$$\text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - C\right)$$

Where
L is the lower limit of the median class
f is the frequency of the median class
h is the magnitude or width of the median class
N is the total frequency
C is the C.F of the class preceding the median class
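
As a hedged illustration of the formula, the sketch below locates the median class and interpolates inside it. The data are the class intervals and frequencies of problem 2 that follows (N = 71); the function name and argument layout are assumptions, and equal class widths are assumed.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous frequency distribution: L + (h / f) * (N/2 - C)."""
    half = sum(freqs) / 2        # N/2
    C = 0                        # C.F of the classes preceding the current one
    for L, f in zip(lower_limits, freqs):
        if C + f > half:         # this is the median class
            return L + (width / f) * (half - C)
        C += f

lower_limits = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [2, 4, 3, 7, 12, 18, 10, 15]

median = grouped_median(lower_limits, 10, freqs)
print(round(median, 2))  # 54.17
```

Here N/2 = 35.5, the median class is 50-60 (C = 28, f = 18), so the median is 50 + (10/18)(35.5 - 28).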

2. Calculate the median for the following

Class interval    Frequency
0-10               2
10-20              4
20-30              3
30-40              7
40-50             12
50-60             18
60-70             10
70-80             15
Total             N = ∑f = 71

3. The following table gives the marks obtained by 50 students in economics. Find the
median.

Marks              10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49
No. of students        4      6     10      5      7      3      9      6


4. The following table shows the age distribution of persons in a particular region.

Age (years)    No. of persons ('000)
Below 10        2
Below 20        5
Below 30        9
Below 40       12
Below 50       14
Below 60       15
Below 70       15.5
70 & over      15.6

a) Find the median age.
b) Why is the median a more suitable measure of central tendency than the mean in this case?

5. Find the missing frequency from the following distribution of daily sales of shops, given
that the median sale of shops is Rs 2400.

Sales (in hundred Rs)    0-10  10-20  20-30  30-40  40-50
No. of shops                5     25      ?     18     17

6. In the frequency distribution of 100 families given below, the numbers of families
corresponding to the expenditure groups 20-40 and 60-80 are missing from the table.
However, the median is known to be 50. Find the missing frequencies.

Expenditure        0-20  20-40  40-60  60-80  80-100
No. of families      14      ?     27      ?      15

7. An incomplete frequency distribution is given below.

X    10-20  20-30  30-40  40-50  50-60  60-70  70-80
F       12     30      ?     65      ?     25     18

8. Calculate the median from the following data. 7 marks (July 2008) (Jan 2007)

Value        1-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency       4     12     24     36     20     16      8      5

9. Compute the median from the following data

Mid values    115  125  135  145  155  165  175  185  195
Frequency       6   25   48   72  116   60   38   22    3


10. The expenditure of 1000 families is given below. The median of the distribution is Rs 87.
Calculate the missing frequencies.

Expenditure        40-59  60-79  80-99  100-119  120-139
No. of families       50      ?    500        ?       50


MODE

Mode is the value which occurs most frequently in a set of observations and around which the
other items of the set cluster densely.

According to A.M. Tuttle, “Mode is the value which has the greatest frequency density in its
immediate neighbourhood”

In case of a continuous frequency distribution, the class corresponding to the maximum frequency
is called the modal class and the value of the mode is obtained by the formula

$$\text{Mode} = L + \frac{h\,(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

Where
L is the lower limit of the modal class
h is the magnitude or width of the modal class
f1 is the frequency of the modal class
f0 is the frequency of the class preceding the modal class
f2 is the frequency of the class succeeding the modal class
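
The interpolation can be sketched in Python as below, using the standard form of the formula with f1 - f0 in the numerator. The data are the earnings distribution of problem 1 that follows; the function name is an assumption, and treating a missing neighbouring class as having frequency 0 is also an assumption of this sketch.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a continuous distribution: L + h * (f1 - f0) / (2*f1 - f0 - f2)."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # class preceding the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0       # class succeeding the modal class
    return lower_limits[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)

lower_limits = [66, 67, 68, 69, 70, 71]
freqs = [15, 24, 40, 20, 14, 11]

mode = grouped_mode(lower_limits, 1, freqs)
print(round(mode, 2))  # 68.44
```

The modal class is 68-69 (f1 = 40, f0 = 24, f2 = 20), so the mode is 68 + 16/36.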

1. Find the value of mean, mode and median from the data given below.

Earnings (Rs)      66-67  67-68  68-69  69-70  70-71  71-72
No. of persons        15     24     40     20     14     11

2. Find the value of mean, mode and median from the data given below. 12 marks (Jan
2009, Jan 2010)

Weight (kg)        93-97  98-102  103-107  108-112  113-117  118-122  123-127  128-132
No. of students        3       5       12       17       14        6        3        1

3. The median and mode of the following wage distribution are known to be Rs 33.5
and Rs 34 respectively. Three frequency values from the table are missing. Find
out those values. 5 marks (July 2008, Jan 2009)

Wages in Rs        0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of persons        4     16      ?      ?      ?      6      4
N = 230


4. Given below is the frequency distribution of marks obtained by 90 students.


Compute the arithmetic mean, median and mode.
Marks Frequency
15-19 6
20-24 14
25-29 12
30-34 10
35-39 10
40-44 9
45-49 9
50-54 10
55-59 5
60-64 4
65-69 1


DISPERSION
“Dispersion is the measure of the variation of the items” – A.L. Bowley.
“Dispersion is a Measure of the extent to which the individual items vary” – L R Connor.

Objectives or Significance of Dispersion:-

1. To find out the reliability of an average
2. To control the variation of the data from the central value.
3. To compare two or more sets of data regarding their variability.
4. To obtain other statistical measures for further analysis of data.

Measures of Dispersion:
The various measures of dispersion are
1. Range
2. Quartile deviation
3. Mean deviation
4. Standard deviation

I. RANGE

Range is defined as the difference between the two extreme observations of the
distribution, i.e. the greatest and the smallest observations of the distribution.

$$\text{Range} = X_{\max} - X_{\min} = L - S$$

where L = largest observation and S = smallest observation.

Co-efficient of Range is the ratio of the difference between the two extreme observations of the
distribution to their sum.

$$\text{Co-efficient of Range} = \frac{L - S}{L + S}$$

Ex: 1. Calculate the range and the co-efficient of range of A’s monthly earnings for a year

Month    Monthly earnings (00 Rs)    Month    Monthly earnings (00 Rs)
1        139                         7        160
2        150                         8        161
3        151                         9        162
4        151                         10       162
5        157                         11       173
6        158                         12       175

Range = L - S
= 17500 - 13900
= Rs. 3,600

$$\text{Co-efficient of Range} = \frac{L - S}{L + S} = \frac{17500 - 13900}{17500 + 13900} = \frac{36}{314} = 0.115$$
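
The same calculation can be checked with a few lines of Python. This is a sketch using the twelve monthly earnings above (in hundreds of Rs); the variable names are assumptions.

```python
# Monthly earnings in hundreds of Rs, months 1 to 12
earnings = [139, 150, 151, 151, 157, 158, 160, 161, 162, 162, 173, 175]

largest, smallest = max(earnings), min(earnings)
value_range = (largest - smallest) * 100                    # converted to Rs
coeff_range = (largest - smallest) / (largest + smallest)   # unit-free ratio

print(value_range, round(coeff_range, 3))  # 3600 0.115
```

Because the coefficient is a ratio, it is the same whether the earnings are expressed in Rs or in hundreds of Rs.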

2. The following table gives the age distribution of a group of 50 individuals.

Age (in years):    16-20  21-25  26-30  31-35
No. of persons:       10     15     17      8

Calculate the range and the co-efficient of range.

Solution: -
Since age is a continuous variable, convert the given classes into continuous classes.
The first class is 15.5-20.5 and the last class is 30.5-35.5.
Largest value = 35.5, smallest value = 15.5
Range = 35.5 - 15.5
= 20 yrs

$$\text{Co-efficient of Range} = \frac{35.5 - 15.5}{35.5 + 15.5} = \frac{20}{51} = 0.39$$

II. QUARTILE DEVIATION OR SEMI-INTER-QUARTILE RANGE

It is a measure of dispersion based on the upper quartile Q3 and the lower quartile Q1.

$$\text{Inter-quartile range} = Q_3 - Q_1$$

Quartile deviation is obtained from the inter-quartile range on dividing by 2 and hence is also known
as the semi-inter-quartile range, thus

$$\text{Quartile Deviation (Q.D)} = \frac{Q_3 - Q_1}{2}$$

For comparative studies of the variability of two distributions we need a relative measure, which is
known as the co-efficient of quartile deviation and is given by

$$\text{Coefficient of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
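
A short Python sketch of these three measures for a grouped distribution is given below. The notes do not state how Q1 and Q3 are obtained; the sketch assumes the usual interpolation, which is the median formula with N/4 and 3N/4 in place of N/2. The data are the distribution of problem 1 that follows, and the function name is hypothetical.

```python
def grouped_quantile(lower_limits, width, freqs, fraction):
    """Interpolated quantile: L + (h / f) * (fraction * N - C), as for the median."""
    target = fraction * sum(freqs)
    C = 0                          # cumulative frequency of preceding classes
    for L, f in zip(lower_limits, freqs):
        if C + f > target:         # class containing the quantile
            return L + (width / f) * (target - C)
        C += f

lower_limits = [0, 15, 30, 45, 60, 75, 90]
freqs = [8, 26, 30, 45, 20, 17, 4]

q1 = grouped_quantile(lower_limits, 15, freqs, 0.25)
q3 = grouped_quantile(lower_limits, 15, freqs, 0.75)
iqr = q3 - q1                      # inter-quartile range
qd = iqr / 2                       # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation

print(q1, q3, qd)  # 31.75 62.625 15.4375
```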


1. Find the inter-quartile range, quartile deviation and coefficient of quartile deviation for the
following distribution:

Class interval    0-15  15-30  30-45  45-60  60-75  75-90  90-105
F                    8     26     30     45     20     17       4

2. Find the inter-quartile range, quartile deviation and coefficient of quartile deviation for the
following distribution:

Marks              10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
No. of students       60     45    120     25     90     80    120     60

3. Evaluate an appropriate measure of dispersion for the following data

Income            Less than 50  50-70  70-90  90-110  110-130  130-150  Above 150
No. of persons              54    100    140     300      230      125         51

III. Mean Deviation or Average Deviation

“Mean Deviation is the Average amount of scatter of items in a distribution from either mean or
the median, ignoring the signs of the deviation. The average that is taken of the scatter is an
arithmetic mean which accounts for the fact that this measure is often called the mean deviation”

$$\text{M.D} = \frac{1}{N}\sum |X - A| = \frac{1}{N}\sum |d|, \qquad |d| = |X - A|$$

where A is any one of the averages: mean (M), median (Md) or mode (Mo).

In case of a frequency distribution

$$\text{M.D} = \frac{1}{N}\sum f\,|d|, \qquad |d| = |x - A|$$

$$\text{M.D (about mean)} = \frac{1}{N}\sum f\,|x - M|$$

$$\text{M.D (about median)} = \frac{1}{N}\sum f\,|x - Md|$$

$$\text{M.D (about mode)} = \frac{1}{N}\sum f\,|x - Mo|$$

Short cut method of computation of Mean Deviation:-

$$\text{M.D (about mean)} = \frac{1}{N}\left[\sum f\,|x - a| + (M - a)\left(\sum f_B - \sum f_A\right)\right]$$

Where M is the mean
a is the arbitrary constant near the mean
∑fB is the sum of all the class frequencies before the mean value
∑fA is the sum of all the class frequencies after the mean value

Relative Measure of Mean Deviation: The relative measure of dispersion is called the co-efficient
of mean deviation and is given by

$$\text{Co-efficient of M.D about mean} = \frac{\text{M.D about mean}}{\text{Mean}}$$

$$\text{Co-efficient of M.D about median} = \frac{\text{M.D about median}}{\text{Median}}$$
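
The frequency-distribution formula and its coefficient can be sketched in Python as follows. The data are the class intervals 2-4, 4-6, 6-8 and 8-10 of problem 1 that follows, represented by their mid-values; the function name is an assumption.

```python
def mean_deviation(mids, freqs, about):
    """Mean deviation about a chosen average: (1/N) * sum(f * |x - A|)."""
    N = sum(freqs)
    return sum(f * abs(x - about) for x, f in zip(mids, freqs)) / N

mids = [3, 5, 7, 9]      # mid-values of the classes 2-4, 4-6, 6-8, 8-10
freqs = [3, 4, 2, 3]

mean = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)
md_about_mean = mean_deviation(mids, freqs, mean)
coeff_md = md_about_mean / mean

print(round(mean, 2), round(md_about_mean, 2))  # 5.83 1.97
```

Passing the median or the mode as `about` gives the mean deviation about those averages in the same way.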

PROBLEMS:-

1. Calculate the mean deviation from the mean for the following data

Class interval    2-4  4-6  6-8  8-10
Frequency           3    4    2     3

Also calculate the co-efficient of mean deviation from the median.

2. Calculate the mean deviation from the median of the following distribution

Class interval    50-100  100-150  150-200  200-250  250-300  300-350
Frequency              7       18       25       31       15        4

3. a. Find the mean deviation from the mean for the following data

Class interval    0-10  10-20  20-30  30-40  40-50  50-60  60-70
Frequency            8     12     10      8      3      2      7

b. Also find the M.D about the median
c. Compare the results obtained in a & b.

4. Calculate the mean deviation from the median for the following data

Marks less than     80  70  60  50  40  30  20  10
No. of students    100  90  80  60  32  20  13   5

5. From the following series determine the value of the mean deviation and its co-efficient
from the median

Marks              0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of students       4      8     11     15     12      6      3


6. From the following frequency distribution find the mean deviation and co-efficient of MD
from mode.
Heights No of workers
100-104 4
105-109 14
110-114 60
115-119 138
120-124 206
125-129 298
130-134 380
135-139 476
140-144 500
145-149 430
150-154 260
155-159 128
160-164 66
165-169 28
170-174 12

STANDARD DEVIATION:

Standard deviation is usually denoted by the letter “σ”.

It is defined as the positive square root of the arithmetic mean of the squares of the deviations
of the given observations from their arithmetic mean.

Thus if x1, x2, x3, ..., xn is a set of n observations, then

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

In case of a frequency distribution

$$\sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{N}}$$

Variance and Mean Square deviation:

Variance is the mean of the squared deviations about the mean of a series. Variance is the square
of the S.D and is denoted by

$$\sigma^2 = \frac{\sum f\,(x - \bar{x})^2}{N}$$


The mean square deviation, usually denoted by s², is given by

$$s^2 = \frac{\sum f\,(x - A)^2}{N}$$

Where A is any arbitrary number.
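
A hedged Python sketch of the variance, the standard deviation and the mean square deviation follows. It uses the distribution of problem 1 below (classes 30-39 to 90-99, represented by the mid-values 34.5 to 94.5); the function names and the choice A = 64.5 are assumptions. The sketch also checks the well-known identity s² = σ² + (x̄ - A)², which shows that the mean square deviation is smallest when A is the mean.

```python
from math import sqrt

def weighted_mean(mids, freqs):
    return sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)

def mean_square_deviation(mids, freqs, A):
    """s^2 = sum(f * (x - A)^2) / N for an arbitrary origin A."""
    return sum(f * (x - A) ** 2 for x, f in zip(mids, freqs)) / sum(freqs)

mids = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [1, 4, 14, 20, 22, 12, 2]

mean = weighted_mean(mids, freqs)
variance = mean_square_deviation(mids, freqs, mean)   # variance: origin taken at the mean
sd = sqrt(variance)                                   # standard deviation
s2 = mean_square_deviation(mids, freqs, A=64.5)       # any other origin gives a larger value

print(round(mean, 1), round(sd, 2))
```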

PROBLEMS

1. Calculate the Mean and SD from the following data. January 2010, 6 Marks

Value:    90-99  80-89  70-79  60-69  50-59  40-49  30-39
F:            2     12     22     20     14      4      1

2. Calculate the Mean, SD and Co-efficient of SD from the following data

Age under (in years)     10  20  30  40   50   60   70   80
No. of persons dying     15  30  53  75  100  110  115  125

Coefficient of Variation

Coefficient of variation is the percentage variation in the mean, the standard deviation being considered
as the total variation in the mean.
For comparing the variability of two distributions we compute the coefficient of variation for each
distribution.

$$\text{C.V} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$$

1. From the prices X and Y of shares A and B respectively given below, state which share is
more stable in value

Price of share X     55   54   52   53   56   58   52   50   51   49
Price of share Y    108  107  105  105  106  107  104  103  104  101
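
The comparison asked for in this problem can be sketched in Python as below. The sketch uses the standard deviation with divisor N, as defined in these notes; the function name is an assumption.

```python
from math import sqrt

def coefficient_of_variation(data):
    """C.V = (standard deviation / mean) * 100, with divisor N in the S.D."""
    n = len(data)
    mean = sum(data) / n
    sd = sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sd / mean * 100

price_x = [55, 54, 52, 53, 56, 58, 52, 50, 51, 49]
price_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(price_x)
cv_y = coefficient_of_variation(price_y)
print(f"{cv_x:.2f} {cv_y:.2f}")  # 4.99 1.90
```

The series with the smaller coefficient of variation is the more stable one, so on these figures the price Y varies less relative to its mean.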


MODULE-2
CORRELATION AND REGRESSION
“Correlation is a statistical tool which studies the relationship between two variables”

“When the relationship is of a quantitative nature, the appropriate statistical tool for
discovering and measuring the relationship and expressing it in a brief formula is known as
correlation” – Croxton & Cowden.

“Correlation is an analysis of the co-variation between two or more variables” – A.M Tuttle

Two variables are said to be correlated if a change in one variable results in a corresponding
change in the other variable.

TYPES OF CORRELATION:

1. Positive & negative correlation: Correlation is said to be positive or direct if the values of the
two variables deviate in the same direction, i.e. if an increase (or decrease) in the value of one
variable results in a corresponding increase (or decrease) in the value of the other variable.
Ex: Heights and weights
The family income and expenditure

Correlation is said to be negative or inverse if the variables deviate in opposite directions,
i.e. if an increase (or decrease) in the value of one variable results in a corresponding decrease
(or increase) in the value of the other variable.
Ex: Price and demand of a commodity.

2. Linear and non-linear correlation:

The correlation between two variables is said to be linear if a unit change in one variable results
in a constant change in the value of the other variable.
Ex:
X 1 2 3 4 5
Y 5 8 11 14 17

The correlation between two variables is said to be non-linear if a unit change in one variable
results in a change in the other variable that is not at a constant rate.
Ex:
X 1 2 3 4 5
Y 5 7 10 11 13


METHODS OF STUDYING CORRELATION:

1. Scatter diagram method.


2. Karl Pearson’s coefficient of correlation
3. Rank method
4. Concurrent deviation method.

SCATTER DIAGRAM:

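A scatter diagram is one of the simplest ways of representing a bivariate distribution
diagrammatically and helps us to understand the correlation between two variables.

1. Perfect positive correlation:

If the points on the scatter diagram lie on a straight line rising from the lower left-hand corner
towards the upper right-hand corner, the correlation is perfect and positive, i.e. r = +1.

2. Perfect negative correlation:

If the points lie on a straight line falling from the upper left-hand corner towards the lower
right-hand corner, the correlation is perfect and negative, i.e. r = -1.

3. No correlation:

If the points are scattered over the diagram without any pattern, there is no correlation between
the two variables, i.e. r = 0.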

CORRELATION PROBLEMS

1. Calculate Karl Pearson’s co-efficient of correlation between expenditure on advertising
and sales from the data given below

Advertising expenses ('000 Rs)    39  65  62  90  82  75  25  98  36  78
Sales (lakh Rs)                   47  53  58  86  62  68  60  91  51  84

2. Find if there is any significant correlation between the heights and weights given below

Height in inches     57   59   62   63   64   65   55   58   57
Weight in kgs       113  117  126  126  130  129  111  116  112

3. Calculate Karl Pearson’s coefficient of correlation from the following data

X     6   8  12  15  18  20  24  28  31
Y    10  12  15  15  18  25  22  26  28

4. Making use of the data given below calculate the coefficient of correlation r12

Case     A   B   C   D   E   F   G   H
X1      10   6   9  10  12  13  11   9
X2       9   4   6   9  11  13   8   4

5. Find the coefficient of correlation for the following data

X    19  21  23  25  32
Y    65  66  65  68  75


6. Compute Karl Pearson’s coefficient of correlation in the following series relating to cost
of living and wages:

Wages (Rs)        100  101  103  102  100   99   97   98   96   95
Cost of living     98   99   99   97   95   92   95   94   90   91

7. Compute Karl Pearson’s coefficient of correlation for the following ages of husbands and
wives at the time of their marriage

Age of husband (in years)    23  27  28  28  28  30  30  33  35  38
Age of wife (in years)       18  20  22  27  21  29  27  29  28  29

8. Calculate Karl Pearson’s coefficient of correlation for the following data using 20 as the
working mean for price and 70 as the working mean for demand

Price     14  16  17  18  19  20  21  22  23
Demand    84  78  70  75  66  67  62  58  60

9. Calculate Karl Pearson’s coefficient of correlation for the following data using 44 and 26
respectively as the origins of X and Y

X    43  44  46  40  44  42  45  42  38  40  42  57
Y    29  31  19  18  19  27  27  29  41  30  26  10

Note: When the actual mean is not a whole number but a fraction, or when the series is large, the
direct method becomes cumbersome, so we use the assumed mean method.

10. Find the coefficient of correlation in the following case

Height of father (X)    65  66  67  67  68  69  71  73
Height of son (Y)       67  68  64  68  72  70  69  70

11. Calculate the co-efficient of correlation between the sales and expenses from the following data

Sales (lakhs)    50  50  55  60  65  65  65  60  60
Expenses         11  13  14  16  16  15  15  14  13

12. Calculate the co-efficient of correlation taking 31 and 25 as the assumed means for the X & Y
series for calculation purposes.

X    23  27  28  29  30  31  33  35  36
Y    18  22  23  24  25  26  28  29  30


13. The following table gives the distribution of the total population and of those who are totally
or partially blind among them. Find out if there is any relation between age and blindness.

Age (years)              0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of persons ('000)     100     60     40     36     24     11      6      3
Blind                      55     40     40     40     36     22     18     15

14. Calculate the coefficient of correlation between age group and mortality from the
following data.

Age group            0-20  20-40  40-60  60-80  80-100
Rate of mortality     350    280    540    760     900

15. Find Karl Pearson’s co-efficient of correlation between age and playing habit from
the following information. Also mention what your calculated value indicates.

Age              15-20  20-25  25-30  30-35  35-40  40-45
No. of people      200    270    340    360    400    300
Players            150    162    170    180    180    120
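
The notes do not write out Karl Pearson's formula, so the following Python sketch assumes the standard product-moment form, r = (n∑uv - ∑u∑v) / √[(n∑u² - (∑u)²)(n∑v² - (∑v)²)], where u = x - a and v = y - b are deviations from any assumed means a and b. With a = b = 0 this is the direct method. The data are those of problem 5 above; the function name and the assumed means 23 and 66 are illustrative choices.

```python
from math import sqrt

def pearson_r(xs, ys, a=0, b=0):
    """Karl Pearson's r computed from deviations u = x - a, v = y - b."""
    n = len(xs)
    u = [x - a for x in xs]
    v = [y - b for y in ys]
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(ui * vi for ui, vi in zip(u, v))
    sum_uu = sum(ui ** 2 for ui in u)
    sum_vv = sum(vi ** 2 for vi in v)
    numerator = n * sum_uv - sum_u * sum_v
    denominator = sqrt((n * sum_uu - sum_u ** 2) * (n * sum_vv - sum_v ** 2))
    return numerator / denominator

x = [19, 21, 23, 25, 32]
y = [65, 66, 65, 68, 75]

r_direct = pearson_r(x, y)                 # direct method
r_assumed = pearson_r(x, y, a=23, b=66)    # assumed mean method
print(round(r_direct, 3))  # 0.951
```

Both calls return the same value, because r does not depend on the origin from which the deviations are measured; this is why the working means given in problems 8, 9 and 12 are only a computational convenience.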

Probable error-

It is a measure of the reliability (or significance) of the co-efficient of
correlation.

If the value of “r” is less than the probable error then “r” is not significant. If “r” is
more than 6 times the probable error then “r” is significant.

$$\text{Probable error} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$

where n is the number of pairs of observations.
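
As an illustration, the sketch below evaluates the probable error in its standard form, 0.6745(1 - r²)/√n, for r = 0.5457 and n = 10 (the values of problem 17 in the problems that follow) and applies the two rules stated above. The function name and the label "inconclusive" for values between the two thresholds are assumptions of this sketch.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

r, n = 0.5457, 10
pe = probable_error(r, n)

if abs(r) < pe:
    verdict = "not significant"
elif abs(r) > 6 * pe:
    verdict = "significant"
else:
    verdict = "inconclusive"   # neither rule in the notes applies

print(round(pe, 4), verdict)  # 0.1498 inconclusive
```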

PROBLEMS

16. Find Karl Pearson’s coefficient of correlation from the following series of marks secured
by 10 students in a class test in mathematics and statistics.

Marks in mathematics    45  70  65  30  90  40  50  75  85  60
Marks in statistics     35  90  70  40  95  40  60  80  80  50

Also calculate its probable error, assuming 60 and 65 as working means.

17. Find the probable error if r = 0.5457 and n = 10


18. Calculate the probable error from the following

X     7   6   5   4   3   2   1
Y    18  16  14  12  10   6   8

SPEARMAN’S RANK CORRELATION:

Qualitative characteristics such as duty, honesty and intelligence cannot be measured
quantitatively, but they can be arranged serially (in ranks).
For such data Pearson’s co-efficient cannot be calculated; hence Spearman’s rank
correlation is used.
It is calculated by using the following formula.

$$\rho = 1 - \frac{6\sum d^2}{n\,(n^2 - 1)}$$

Where d = x - y is the difference between the two ranks of the same individual and n is the
number of individuals ranked.

19. The ranks of the same 15 students in 2 subjects A and B are given below.
The 2 numbers in each pair denote the ranks of the same student in A and B respectively.
(1,10) (2,7) (3,2) (4,6) (5,4) (6,8) (7,3) (8,1) (9,11) (10,15) (11,9) (12,5) (13,14)
(14,12) (15,13)
Use Spearman’s formula to find the rank correlation coefficient.

20. 10 competitors in a beauty contest are ranked by 3 judges in the following order

Judge 1    1  6  5  10  3   2  4   9  7  8
Judge 2    3  5  8   4  7  10  2   1  6  9
Judge 3    6  4  9   8  1   2  3  10  5  7

Use the rank correlation coefficient to determine which pair of judges has the nearest approach to
common taste in beauty.
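
Spearman's formula can be sketched in Python as follows, using the 15 pairs of ranks from problem 19. The function name is an assumption, and the sketch assumes there are no tied ranks, as is the case in this data.

```python
def spearman_rho(rank_pairs):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = x - y."""
    n = len(rank_pairs)
    sum_d2 = sum((x - y) ** 2 for x, y in rank_pairs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# (rank in subject A, rank in subject B) for the same 15 students
ranks = [(1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1),
         (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13)]

rho = spearman_rho(ranks)
print(round(rho, 3))  # 0.514
```

Here ∑d² = 272 and n(n² - 1) = 3360. For problem 20 the same function can be applied to each of the three pairs of judges and the largest coefficient identifies the pair with the closest tastes.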


REGRESSION ANALYSIS

Regression helps us to estimate one variable from another variable.

“Regression analysis is a mathematical measure of the average relation between two or more
variables in terms of the original units of the data.”
Ex: The yield of a crop depends on the rainfall; the cost or price of a product depends on the
production and advertising expenditure; the expenditure of a person depends on his income.

In regression analysis there are two types of variables. The variable whose value is
influenced or is to be predicted is called the dependent variable, and the variable which influences the
value or is used for prediction is called the independent variable.

LINES OF REGRESSION:

A line of regression is the line which gives the best estimate of one variable for any given value of
the other variable. In the case of two variables X and Y, we shall have two lines of regression: one of
Y on X and the other of X on Y.

Line of regression of Y on X is the line which gives best estimate for the value of Y for
any specified value of X.
Line of regression of X on Y is the line which gives the best estimate for the value of X
for any specified value of Y.
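
As a minimal sketch, the two lines can be obtained by taking deviations from the actual means. It assumes the standard least-squares regression coefficients b_yx = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)² and b_xy = ∑(x − x̄)(y − ȳ) / ∑(y − ȳ)². The function names and the data are hypothetical.

```python
def regression_lines(x, y):
    """Return (mean_x, mean_y, b_yx, b_xy) using deviations from actual means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b_yx = sxy / sxx  # regression coefficient of Y on X
    b_xy = sxy / syy  # regression coefficient of X on Y
    return mean_x, mean_y, b_yx, b_xy


def estimate_y(x_value, mean_x, mean_y, b_yx):
    """Line of Y on X: Y - mean_y = b_yx (X - mean_x)."""
    return mean_y + b_yx * (x_value - mean_x)


def estimate_x(y_value, mean_x, mean_y, b_xy):
    """Line of X on Y: X - mean_x = b_xy (Y - mean_y)."""
    return mean_x + b_xy * (y_value - mean_y)


# Hypothetical data, for illustration only.
mx, my, b_yx, b_xy = regression_lines([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b_yx, b_xy, estimate_y(6, mx, my, b_yx))
```

Both lines pass through the point of means (x̄, ȳ), and the product of the two regression coefficients equals r².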

REGRESSION PROBLEMS

1. From the following data, obtain the two regression equations.

Sales 91 97 108 121 67 124 51 73 111 57
Purchase 71 75 69 97 70 91 39 61 80 47

2. Calculate the two regression equations of X on Y and Y on X from the data given below,
taking deviations from the actual means of X and Y.

Price 10 12 13 12 16 15
Demand 40 38 43 45 37 43
Also estimate the likely demand when the price is Rs 20.

3. Calculate the regression equation of X and Y from the following data

X 1 2 3 4 5
Y 2 5 3 8 7

4. From the data given below, find
i) The two regression coefficients
ii) The coefficient of correlation between the marks in economics and statistics
iii) The most likely marks in statistics when the marks in economics are 30.

Marks in economics (X) 25 28 35 32 31 36 29 38 34 32
Marks in statistics (Y) 43 46 49 41 36 32 31 30 33 39

Take deviations from an assumed mean.

5. Price indices of cotton and wool are given below for the 12 months of a year. Obtain the
equation of line of regression between the indices

Price index of cotton 78 77 85 88 87 82 81 77 76 83 97 93
Price index of wool 84 82 82 85 89 90 89 92 83 89 98 99

MODULE-4
TIME SERIES ANALYSIS

A time series is an arrangement of statistical data in chronological order, i.e. in accordance with
its time of occurrence. Thus a time series is a set of quantitative readings of some variable
recorded at equal intervals of time. The interval may be an hour, a day, a week, a month or a
year.
E.g.: hourly temperature reading, daily sales in a shop, weekly sales in a market, monthly
production in an industry, yearly agricultural production.

USES OF TIME SERIES:

Time series analysis is very important in economic and business planning, research work etc. for the
following reasons.
1. It helps in understanding past behavior, which in turn helps in estimating future behavior.
2. It helps in planning and forecasting.
3. It makes it possible to compare the data of one period with those of another period.
4. It helps to evaluate the progress in any field of economic and business activity.

COMPONENTS OF TIME SERIES:

A large number of forces affect a time series, and as a result there are fluctuations in the
time series. There are 4 basic types of variation, and these are called the components or elements of
a time series.

Components of time series:
Long term: Secular trend, Cyclic variation
Short term: Seasonal variation, Random variation

1. SECULAR TREND: The general tendency of time series data to increase or decrease
during a long period of time is called the secular trend. The trend may be upward or
downward.
E.g. Increase in population, production, prices etc.
Decrease in deaths.

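Upward and downward trend
[Figure: two graphs of production (Y axis) against time (X axis), one showing an upward trend and one a downward trend]

Linear trend and non-linear trend
[Figure: two graphs against time (X axis), one showing a linear trend Y = a + bx and one a non-linear trend]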

When the rate of growth of a time series remains constant in the long run, it is known as a linear
trend. It can be expressed as Y = a + bx.

When the long-run growth of a time series is not at a constant rate, it is called a non-linear
trend.

MEASUREMENT OF SECULAR TREND:

By measuring the trend we can find out the direction of a long-term series, i.e. whether it is
growing or declining. The reason for measuring the trend is to find out the characteristics of the
series.

For e.g.: we can compare the growth in agricultural production in one state with the
growth in agricultural production in another state.

The following are the 4 methods which can be used for determining the trend.

1. Free hand method
2. Semi average method
3. Moving average method
4. Method of least squares

METHOD OF MOVING AVERAGES: The method of moving averages is a very simple and
flexible method of measuring trend. In this method the average value of a number of years
(months, weeks or days) is taken as the trend value for the middle point of the period of the moving
average. The process of averaging smoothens the curve and reduces the fluctuations.

ODD PERIOD OF MOVING AVERAGE:

Steps for calculating moving averages for an odd number of years, i.e. 3, 5, 7, 9 (shown here for 3 years)

1. Add up the values of the 1st 3 years and place the 3-year total against the middle year.
2. Leave the 1st year value, add up the values of the next 3 years and place the 3-year
total against the middle year.
3. Continue this process until the last year’s value has been included in a
total.
4. Divide each 3-yearly total by 3 and place the result in the next column. This is the
trend value (moving average).
The formula for calculating 3-yearly moving averages is as follows.

(a+b+c)/3, (b+c+d)/3, (c+d+e)/3, ...

For 5-yearly moving averages:

(a+b+c+d+e)/5, (b+c+d+e+f)/5, ...
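
The steps for an odd period can be sketched as follows. This is a minimal illustration. The function name and the sample values are made up, and `None` marks the years at each end for which no trend value can be computed.

```python
def moving_averages(values, period=3):
    """Odd-period moving averages, each placed against the middle year."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    trend = [None] * len(values)
    for mid in range(half, len(values) - half):
        # Total of the `period` values centred on position `mid`.
        window = values[mid - half: mid + half + 1]
        trend[mid] = sum(window) / period
    return trend


# Hypothetical yearly values.
print(moving_averages([2, 4, 6, 8, 10], period=3))
```

A 3-yearly average loses one trend value at each end of the series, and a 5-yearly average loses two at each end.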

PROBLEMS:

1. Calculate 3 yearly moving averages for the following data.

Years 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
No of students 15 18 17 20 23 25 29 33 36 40

2. Gross revenue data (Rs.in million) for a travel agency for a 10 year period is as follows.

Years 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
Revenue 3 6 10 8 7 12 14 14 18 19
Calculate 3 yearly moving averages for the revenue earned

3. Calculate three yearly and five yearly moving averages for the following data and
comment on the results.

Years 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Y 242 250 252 249 253 255 251 257 260 265 262

EVEN PERIOD OF MOVING AVERAGES:

If the period of the moving average is even (4, 6, 8), the totals cannot be placed against any year.
For a 4-yearly average, for example, the middle point 2.5 lies between the 2nd year and the 3rd year, so
the total should be placed between the 2nd and 3rd years. We must centre the moving averages in order
to place them against a year.

Steps for calculating moving averages for an even number of years, i.e. 4, 6, 8 (shown here for 4 years)

1. Add up the values of the 1st 4 years & place the total in b/w the 2nd & 3rd years.
2. Leave the 1st year value, add up the values of the next 4 years & place the total in b/w
the 3rd & 4th years.
3. Continue this process until the last year is taken into account.
4. Add the 1st two 4-yearly totals & place the sum against the middle year (3rd year).
5. Leave the 1st 4-yearly total, add the next two 4-yearly totals & place the sum against the 4th year.
6. Continue this method until all the 4-yearly totals have been used.
7. Divide each of the above sums by 8 (each is the sum of two 4-yearly totals) & put the result in the next
column. This is the trend value.
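
The centring procedure can be sketched as below. This is a minimal illustration with a made-up function name and made-up data. Each pair of neighbouring totals is added and divided by twice the period (8 for a 4-yearly average).

```python
def centred_moving_averages(values, period=4):
    """Even-period moving averages, centred so each falls against a year."""
    if period % 2 != 0:
        raise ValueError("this sketch handles even periods only")
    # Moving totals; each belongs between two years.
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    half = period // 2
    trend = [None] * len(values)
    for i in range(len(totals) - 1):
        # The sum of two neighbouring totals falls against year i + half.
        trend[i + half] = (totals[i] + totals[i + 1]) / (2 * period)
    return trend


# Hypothetical yearly values.
print(centred_moving_averages([1, 2, 3, 4, 5, 6], period=4))
```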

PROBLEMS:

1. Calculate 4 yearly moving averages for the following data.

Years 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Production 464 515 518 467 502 540 557 571 586 612

2. From the following data, calculate the trend values using four yearly moving average
Years 1989 1990 1991 1992 1993 1994 1995 1996 1997
Values 506 620 1036 673 588 696 1116 738 663

II. METHOD OF LEAST SQUARES:-

The method of least squares can be used to fit a linear or a non-linear trend, that is, a straight
line trend or a parabolic trend. The straight line trend is Yc = a + bX.

Yc = Required trend value.
X = Unit of time, measured as a deviation from the middle of the period so that ∑X = 0.
a = Mean value of the Y values.

a = ∑Y / N & b = ∑XY / ∑X²

b is the rate of change or increment per unit of time.
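
A minimal sketch of the straight line fit is given below. It assumes time is measured as a deviation from the mean of the years, so that ∑X = 0 and the formulas a = ∑Y / N and b = ∑XY / ∑X² apply. The function names and the data are hypothetical.

```python
def fit_linear_trend(years, values):
    """Fit Yc = a + bX by least squares, with X = year - mean of the years."""
    n = len(years)
    origin = sum(years) / n             # middle of the period
    x = [year - origin for year in years]   # deviations, so sum(x) == 0
    a = sum(values) / n
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi ** 2 for xi in x)
    return origin, a, b


def trend_value(year, origin, a, b):
    """Trend value (or forecast) for any year."""
    return a + b * (year - origin)


# Hypothetical yearly figures.
origin, a, b = fit_linear_trend([2001, 2002, 2003, 2004, 2005], [10, 12, 14, 16, 18])
print(a, b, trend_value(2007, origin, a, b))
```

With an even number of years the origin falls midway between the two middle years, so the deviations are in half-year units. Textbooks often double them to avoid fractions, which halves the value of b but leaves the trend values unchanged.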

PROBLEMS:

1. Calculate the trend values by the method of least square from the following data given
below and estimate the sales for 1993.

Year 1986 1987 1988 1989 1990
Sales (lakhs) 70 74 80 86 90

2. Fit a linear trend to the following data by least square method and also estimate the
production for the year 2007.

Year 1998 2000 2002 2004 2006
Production (‘000) 18 21 23 27 16

3. The sales of a company in millions of rupees for the years 1994-2001 are given below.

Year 1994 1995 1996 1997 1998 1999 2000 2001
Sales 550 560 555 585 540 525 545 585

Fit the linear trend equation and also estimate the sales for the year 1993.

4. Fit a straight line trend to the following data using the method of least squares and
calculate the production for the year 2001

Year 1996 1997 1998 1999 2000
Production (‘000) 83 92 74 90 166

5. The following table shows the number of salesmen working in a certain concern.
Year 1990 1991 1992 1993 1994
No of salesmen 28 38 46 40 56

Use the method of least square to fit a straight line trend and estimate the number of salesmen in
1995.

6. Fit a straight line trend to the following data using the method of least squares and project
the probable sales for the next two years.

Year 1999 2000 2001 2002 2003 2004
Sales (in ‘000) 164 180 186 187 190 192

7. Fit a straight line trend to the following data

Year 1991 1992 1993 1994 1995
Sale of sugar 80 90 92 93 94

Seasonal variation:

The objective of studying seasonal variation is to determine the effect of seasonal variation on
the value of a given phenomenon and to eliminate it, i.e. to determine the size of the variable
free of seasonal effects. It is important in deciding the business policy of various firms. Time series data are
recorded monthly, quarterly, weekly, daily or hourly, and there will be differences in them due to
seasonal variation. There are 4 methods of measuring seasonal variation.

a) Method of simple average
b) Ratio to trend method
c) Ratio to moving average method
d) Link relative method

Simple average method:

1. Arrange the data by years and by months or quarters.
2. Find the total for each month or quarter over all the years.
3. Divide each total by the number of years for which data are given. If we are given monthly
data for 4 years, we must 1st get the total for each month for the 4 years and divide each total
by 4 to get an average.
4. Take the average of the monthly or quarterly averages (the general average) as 100 and get the
seasonal index as follows.

S.I. = (Monthly or quarterly average / General average) × 100
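
The simple average method can be sketched as follows. This is a minimal illustration that assumes one row of data per year and one column per quarter (or month). The function name and the figures are made up.

```python
def seasonal_indices(rows):
    """Seasonal indices by the simple average method.

    rows: one list per year, each holding the values for every season
    (4 quarters or 12 months) in the same order.
    """
    n_years = len(rows)
    n_seasons = len(rows[0])
    # Average of each quarter (or month) over all the years.
    season_averages = [
        sum(row[s] for row in rows) / n_years for s in range(n_seasons)
    ]
    # General average = average of the seasonal averages, taken as 100.
    general_average = sum(season_averages) / n_seasons
    return [100 * avg / general_average for avg in season_averages]


# Hypothetical quarterly data for two years.
print(seasonal_indices([[10, 20, 30, 40], [10, 20, 30, 40]]))
```

The indices for quarterly data total 400 and those for monthly data total 1200, which serves as a check on the arithmetic.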

1. Compute the average seasonal movements for the following series

Quarterly production
Year I II III IV
1984 3.5 3.9 3.4 3.6
1985 3.5 4.1 3.7 4.0
1986 3.5 3.9 3.7 4.2
1987 4.0 4.6 3.8 4.5
1988 4.1 4.4 4.2 4.5

2. Compute the average seasonal movements for the following series

Quarter 1990 1991 1992 1993 1994 1995
I 3.5 3.5 3.5 4.0 4.1 4.2
II 3.9 4.1 3.9 4.6 4.4 4.6
III 3.4 3.7 3.7 3.8 4.2 4.3
IV 3.6 4.8 4.0 4.5 4.5 4.7

3. Compute the average seasonal movements for the following series

Quarterly production
Year I II III IV
1990 106 124 104 90
1991 84 114 107 88
1992 90 112 101 85
1993 76 94 91 76
1994 80 104 95 83
1995 104 112 102 84

RATIO TO MOVING AVERAGE METHOD: This is an improvement over the Ratio to trend
method as it tries to eliminate the cyclic variations which are mixed up with seasonal indices in
the ratio to trend method. Ratio to moving average is the most widely used method of measuring
seasonal fluctuations.
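
The notes do not list the steps of this method, so the following is only a sketch of the procedure as it is commonly taught: compute centred moving averages, express each original value as a percentage of its moving average, average the percentages quarter by quarter, and adjust the averages so that they total 400. The function name, the use of the arithmetic mean (some texts use the median) and the data are assumptions. The series is assumed to start in the first quarter and to cover at least two full years.

```python
def ratio_to_moving_average(values, period=4):
    """Seasonal indices by the ratio to moving average method (sketch)."""
    # Moving totals of `period` consecutive values.
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    half = period // 2
    ratios = [[] for _ in range(period)]
    for i in range(len(totals) - 1):
        # Centred moving average, falling against position i + half.
        centred = (totals[i] + totals[i + 1]) / (2 * period)
        position = i + half
        # Original value as a percentage of its moving average.
        ratios[position % period].append(100 * values[position] / centred)
    # Average the percentages for each quarter.
    averages = [sum(r) / len(r) for r in ratios]
    # Adjust so that the indices total 100 x period (400 for quarters).
    correction = 100 * period / sum(averages)
    return [avg * correction for avg in averages]


# Hypothetical quarterly data for three years with a fixed seasonal pattern.
data = [80, 120, 100, 100] * 3
print(ratio_to_moving_average(data))
```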

PROBLEMS:

1. Calculate seasonal indices by the ratio to moving average method from the following data.
Years I Quarter II Quarter III Quarter IV Quarter
1991 68 62 61 63
1992 65 58 66 61
1993 68 63 63 67

2. Calculate the seasonal indices by the ‘ratio to moving average’ method from the following data.

YEAR Quarter Y 4-quarterly moving average (centred)
1992 I 75 -----
II 60 -----
III 54 63.375
IV 59 65.375

1993 I 86 67.125
II 65 70.875
III 63 70.000
IV 80 75.375

1994 I 90 76.625
II 72 77.625
III 66 79.500
IV 85 81.500

1995 I 100 83.000
II 78 84.750
III 72 -----
IV 93 -----
