Unit 4
Topics:
• Correlation
• Properties of correlation
• Types of correlation
• Correlation coefficient
• Rank correlation
• Regression
• Regression coefficient
• Properties of regression coefficient
• Expressions of regression coefficient
1. Correlation
Correlation is the relationship between two or more variables. Two variables are said to be
correlated if a change in one variable is accompanied by a change in the other. Data that
relate two variables are called bivariate data. Thus, correlation is a statistical analysis which
measures and analyses the degree or extent to which two variables fluctuate with reference
to each other. Examples: the relation between the price and demand of a commodity, and the
relation between rainfall and the yield of crops.
2. Types of correlation
Silver Oak University-CE/IT Degree Engineering-3rd Sem-Maths IV-Unit 4-Correlation and Regression-Dr. Moksha Satia Page | 1
1. Positive and negative correlation
Depending on the direction in which the variables vary, the correlation may be
positive or negative.
i. Positive correlation
If both the variables vary in the same direction, the correlation is said to be positive.
In other words, if the value of one variable increases, the value of the other variable also
increases, or, if the value of one variable decreases, the value of the other variable also decreases,
e.g., the correlation between the heights and weights of a group of persons is a positive correlation.
Weight (cm) 60 62 64 65 67 69
ii. Negative correlation
If the two variables vary in opposite directions, the correlation is said to be
negative. In other words, if the value of one variable increases, the value of the other variable
decreases, or, if the value of one variable decreases, the value of the other variable increases,
e.g., the correlation between the price and demand of a commodity is a negative correlation.
2. Simple and multiple correlation
Depending on the number of variables studied, the correlation may be simple or
multiple.
i. Simple correlation
When only two variables are studied, the relation is described as simple correlation,
e.g., the quantity of money and price level, demand and price, etc.
ii. Multiple correlation
When more than two variables are studied, the relationship is described as multiple
correlation, e.g., the relationship of price, demand, and supply of a commodity.
3. Partial and total correlation
i. Partial correlation
When more than two variables are studied excluding some other variables, the
relationship is termed as partial correlation.
ii. Total correlation
When more than two variables are studied without excluding any variables, the
relationship is termed total correlation.
4. Linear and nonlinear correlation
Depending upon the ratio of change between two variables, the correlation may be linear or
nonlinear.
i. Linear correlation
If the ratio of change between two variables is constant, the correlation is said to be
linear. If such variables are plotted on a graph paper, a straight line is obtained, e.g.,
Milk (l) 5 10 15 20 25 30
Curd (kg) 2 4 6 8 10 12
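The constant-ratio condition can be checked numerically. The following is a minimal sketch using the milk and curd figures from the table above; the helper name `change_ratios` is an illustrative choice, not part of the notes.

```python
# Data from the milk/curd table above.
milk = [5, 10, 15, 20, 25, 30]   # litres
curd = [2, 4, 6, 8, 10, 12]      # kilograms

def change_ratios(x, y):
    """Return (change in y) / (change in x) for each consecutive pair of points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

ratios = change_ratios(milk, curd)

# The correlation is linear when every ratio equals the first one.
is_linear = all(abs(r - ratios[0]) < 1e-12 for r in ratios)

print(ratios)      # [0.4, 0.4, 0.4, 0.4, 0.4]
print(is_linear)   # True
```

Every step of 5 l of milk yields 2 kg more curd, so the ratio stays at 0.4 and the plotted points fall on a straight line.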
ii. Nonlinear correlation
If the ratio of change between two variables is not constant, the correlation is said to
be nonlinear. The graph of a nonlinear or curvilinear relationship will be a curve, e.g.,
Sales (₹ in lacs) 10 12 15 15 16
There are two different methods of studying correlation: graphic methods and
mathematical models.
4. Scatter diagram
The scatter diagram is a diagrammatic representation of bivariate data used to find the correlation
between two variables. The various relationships between two variables are represented
by the following scatter diagrams.
If all the plotted points lie on a straight line rising from the lower left-hand corner to the
upper right-hand corner, the correlation is said to be perfectly positive.
If all the plotted points lie on a straight line falling from the upper left-hand corner to the
lower right-hand corner, the correlation is said to be perfectly negative.
If all the plotted points lie in the narrow strip, rising from the lower left-hand corner to the
upper right-hand corner, it indicates a high degree of positive correlation.
If all the plotted points lie in the narrow strip, falling from the upper left-hand corner to
the lower right-hand corner, it indicates a high degree of negative correlation.
5. No correlation
If all the plotted points lie on a straight line parallel to the x-axis or y-axis or in a haphazard
manner, it indicates the absence of any relationship between the variables.
1. The scatter diagram is a simple, nonmathematical method of finding the correlation
between the variables.
2. It gives an indication of the degree of linear correlation between the variables.
3. It is easy to understand.
4. It is not influenced by the size of extreme items.
5. Simple graph
The coefficient of correlation is the measure of correlation between two random variables X
and Y, and is denoted by r.
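\[ r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y} \]
where
\[ \operatorname{cov}(X, Y) = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}), \qquad
   \sigma_X = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \qquad
   \sigma_Y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} \]
Substituting these expressions,
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2} \, \sqrt{\sum (y - \bar{y})^2}} \]
Expanding the sums gives the equivalent form
\[ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}} \, \sqrt{\sum y^2 - \dfrac{(\sum y)^2}{n}}} \]
If d_x = x - a and d_y = y - b are the deviations from assumed means a and b, the same form holds for the deviations.
Hence,
\[ r = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}} \, \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}} \]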
iii. Two independent variables are uncorrelated, i.e., r = 0.
The converse of the above property is not true, i.e., two uncorrelated variables need not
be independent.
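A small numerical illustration of why the converse fails is sketched below. The data set (y = x² on symmetric x values) is a hypothetical example chosen for this purpose, and `pearson_r` is an illustrative helper implementing the sums form of r. Here y is completely determined by x, yet r comes out as zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient from the sums form:
    (sum xy - sum x * sum y / n) divided by the two square-root terms."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = sxy - sx * sy / n
    den = sqrt(sxx - sx ** 2 / n) * sqrt(syy - sy ** 2 / n)
    return num / den

# Hypothetical data: y depends on x exactly (y = x^2), but not linearly.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]

r = pearson_r(x, y)
print(r)  # 0.0
```

The coefficient r only measures linear association, so a value of zero does not rule out a nonlinear dependence such as this one.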
Example 1: Calculate the correlation coefficient between x and y using the following data:
x 2 4 5 6 8 11
y 18 12 10 8 7 5
Solution: n = 6
x      y      x²     y²     xy
2      18     4      324    36
4      12     16     144    48
5      10     25     100    50
6      8      36     64     48
8      7      64     49     56
11     5      121    25     55
∑x = 36    ∑y = 60    ∑x² = 266    ∑y² = 706    ∑xy = 293
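\[
\begin{aligned}
r &= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}} \, \sqrt{\sum y^2 - \dfrac{(\sum y)^2}{n}}} \\
  &= \frac{293 - \dfrac{(36)(60)}{6}}{\sqrt{266 - \dfrac{(36)^2}{6}} \, \sqrt{706 - \dfrac{(60)^2}{6}}} \\
  &= \frac{293 - 360}{(7.0711)(10.2956)} \\
  &= \frac{-67}{72.8012} \\
  &= -0.9203
\end{aligned}
\]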
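As a cross-check, the same data can be run through the defining formula r = cov(X, Y) / (σ_X σ_Y) rather than the sums form used in the solution. This is a minimal sketch; the variable names are illustrative.

```python
from math import sqrt

# Data of Example 1.
x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance and standard deviations, each with divisor n as in the definitions.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
sigma_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sigma_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)

r = cov_xy / (sigma_x * sigma_y)
print(round(r, 4))  # -0.9203
```

Both routes give the same value because the factor 1/n cancels between the numerator and the denominator.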
Example: Calculate the correlation coefficient for the following values of demand
and the corresponding price of a commodity:
Demand in quintals 65 66 67 67 68 69 70 72
Price in rupees per kg 67 68 65 68 72 72 69 71
Solution: Let the demand in quintal be denoted by x and the price in rupees per kg be denoted
by y.
n = 8
\[ \bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68, \qquad \bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69 \]
x      y      x − x̄   y − ȳ   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
65     67     -3      -2      9          4          6
66     68     -2      -1      4          1          2
67     65     -1      -4      1          16         4
67     68     -1      -1      1          1          1
68     72     0       3       0          9          0
69     72     1       3       1          9          3
70     69     2       0       4          0          0
72     71     4       2       16         4          8
∑x = 544   ∑y = 552   ∑(x − x̄) = 0   ∑(y − ȳ) = 0   ∑(x − x̄)² = 36   ∑(y − ȳ)² = 44   ∑(x − x̄)(y − ȳ) = 24
\[
\begin{aligned}
r &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2} \, \sqrt{\sum (y - \bar{y})^2}} \\
  &= \frac{24}{\sqrt{36} \, \sqrt{44}} \\
  &= \frac{24}{(6)(6.6332)} \\
  &= 0.6030
\end{aligned}
\]
Example : Calculate the correlation coefficient between x and y using the following data:
x 17 19 21 26 20 28 26 27
y 23 27 25 26 27 25 30 33
Solution:
𝑑𝑥 = 𝑥 − 𝑎 = 𝑥 − 23,
𝑑𝑦 = 𝑦 − 𝑏 = 𝑦 − 27,
𝑛=8
x      y      d_x    d_y    d_x²   d_y²   d_x d_y
17     23     -6     -4     36     16     24
19     27     -4     0      16     0      0
21     25     -2     -2     4      4      4
26     26     3      -1     9      1      -3
20     27     -3     0      9      0      0
28     25     5      -2     25     4      -10
26     30     3      3      9      9      9
27     33     4      6      16     36     24
∑d_x = 0   ∑d_y = 0   ∑d_x² = 124   ∑d_y² = 70   ∑d_x d_y = 48
\[
\begin{aligned}
r &= \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}} \, \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}} \\
  &= \frac{48 - 0}{\sqrt{124 - 0} \, \sqrt{70 - 0}} \\
  &= 0.515
\end{aligned}
\]
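The choice of the assumed means a = 23 and b = 27 only simplifies the arithmetic; it does not change r. The sketch below (with an illustrative helper `r_from_deviations`) evaluates the deviation formula for this example with the origins used in the solution and again with a = b = 0, and compares the two results.

```python
from math import sqrt

def r_from_deviations(x, y, a, b):
    """Correlation coefficient from deviations d_x = x - a and d_y = y - b."""
    n = len(x)
    dx = [v - a for v in x]
    dy = [v - b for v in y]
    num = sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den_x = sqrt(sum(p * p for p in dx) - sum(dx) ** 2 / n)
    den_y = sqrt(sum(q * q for q in dy) - sum(dy) ** 2 / n)
    return num / (den_x * den_y)

x = [17, 19, 21, 26, 20, 28, 26, 27]
y = [23, 27, 25, 26, 27, 25, 30, 33]

r_assumed = r_from_deviations(x, y, 23, 27)  # origins used in the solution
r_raw = r_from_deviations(x, y, 0, 0)        # no shift of origin, for comparison

print(round(r_assumed, 3))  # 0.515
print(round(r_raw, 3))      # 0.515
```

Shifting the origin leaves every term of the formula unchanged in effect, which is why any convenient a and b may be chosen.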
Exercise
Exercise 1:Calculate the correlation coefficient between x and y using the following data:
x 10 12 18 24 23 27
y 13 18 12 25 30 10
Ans:0.223
Exercise 2:Calculate the correlation coefficient between x and y using the following data:
x 62 64 65 69 70 71 72 74
Ans:0.9032
RANK CORRELATION
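Let a group of n individuals be ranked according to two characteristics A and B. The correlation between these
n pairs of ranks is called the rank correlation.
Spearman's rank correlation coefficient is defined as
\[ r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]
where d is the difference between the two ranks of an individual and n is the number of pairs of ranks.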
x 1 3 7 5 4 6 2 10 9 8
y 3 1 4 5 6 9 7 8 10 2
Solution: 𝑛 = 10
x      y      d = x − y   d²
1      3      -2          4
3      1      2           4
7      4      3           9
5      5      0           0
4      6      -2          4
6      9      -3          9
2      7      -5          25
10     8      2           4
9      10     -1          1
8      2      6           36
∑d = 0    ∑d² = 96
\[ r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(96)}{10\left[(10)^2 - 1\right]} = 0.418 \]
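The calculation above can be reproduced with a few lines of code. This is a minimal sketch for untied ranks only; the function name `spearman_from_ranks` is illustrative.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Ranks from the worked example above.
x = [1, 3, 7, 5, 4, 6, 2, 10, 9, 8]
y = [3, 1, 4, 5, 6, 9, 7, 8, 10, 2]

r = spearman_from_ranks(x, y)
print(round(r, 3))  # 0.418
```

The inputs are assumed to be ranks already; raw scores would first have to be converted to ranks.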
2.Tied Rank
If there is a tie between the ranks of two or more individuals, the ranks they would occupy are shared equally among
them, e.g., if two items tie for the fourth rank, the 4th and 5th ranks are divided between them
equally and each is given the rank (4 + 5)/2 = 4.5. If three items tie for the 4th
rank, each of them is given the rank (4 + 5 + 6)/3 = 5. As a result, the following adjustment or
correction is made in the rank correlation formula:
\[ r = 1 - \frac{6\left[\sum d^2 + \dfrac{m_1^3 - m_1}{12} + \dfrac{m_2^3 - m_2}{12} + \cdots\right]}{n(n^2 - 1)} \]
where m_1, m_2, ... are the numbers of items sharing a common rank in each group of tied items.
x 10 12 18 18 15 40
y 12 18 25 25 50 25
Solution: 𝑛 = 6
x      y      Rank of x   Rank of y   d = rank of x − rank of y   d²
10     12     1           1           0                           0
12     18     2           2           0                           0
18     25     4.5         4           0.5                         0.25
18     25     4.5         4           0.5                         0.25
15     50     3           6           -3                          9
40     25     6           4           2                           4
∑d² = 13.5
There are two items in the x series having equal values at the rank 4. Each is given the rank
4.5. Similarly, there are three items in the y series at the rank 3. Each of them is given the
rank 4.
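m_1 = 2, m_2 = 3
\[
\begin{aligned}
r &= 1 - \frac{6\left[\sum d^2 + \dfrac{m_1^3 - m_1}{12} + \dfrac{m_2^3 - m_2}{12}\right]}{n(n^2 - 1)} \\
  &= 1 - \frac{6\left[13.5 + \dfrac{8 - 2}{12} + \dfrac{27 - 3}{12}\right]}{6\left((6)^2 - 1\right)} \\
  &= 0.5429
\end{aligned}
\]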
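The tied-rank procedure can be sketched in code as follows. The helpers `average_ranks` and `spearman_tied` are illustrative names; ranks are assigned in ascending order, as in the solution, and tied values share the mean of the positions they occupy.

```python
from collections import Counter

def average_ranks(values):
    """Rank in ascending order; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_tied(x, y):
    """Rank correlation with the (m^3 - m)/12 correction for each group of ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12
                     for series in (x, y)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (d_squared + correction) / (n * (n ** 2 - 1))

x = [10, 12, 18, 18, 15, 40]
y = [12, 18, 25, 25, 50, 25]

print(average_ranks(x))               # [1.0, 2.0, 4.5, 4.5, 3.0, 6.0]
print(average_ranks(y))               # [1.0, 2.0, 4.0, 4.0, 6.0, 4.0]
print(round(spearman_tied(x, y), 4))  # 0.5429
```

The two tied x values contribute (8 − 2)/12 = 0.5 and the three tied y values contribute (27 − 3)/12 = 2 to the correction, matching the hand calculation.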
Homework examples:
Example 1: Compute Spearman's rank correlation coefficient from the following data:
x 18 20 34 52 12
y 39 23 35 18 46
Ans: - 0.9
Example 2: Compute Spearman's rank correlation coefficients from the following data
and determine which pair of judges has the nearest approach to common liking in voice.
Judge x 6 10 2 9 8 1 5 3 4 7
Judge y 5 4 10 1 9 3 8 7 2 6
Judge z 4 8 2 10 7 6 9 1 3 6
Score 35 40 25 55 85 90 65 55 45 50
IQ 100 100 110 140 150 130 100 120 140 110
Ans: 0.47
Regression
Regression is defined as a method of estimating the value of one variable when that of the
other is known and the variables are correlated. Regression analysis is used to predict or
estimate one variable in terms of the other variables. It is a highly valuable tool for prediction
purposes in economics and business.
Types of Regression
Depending on the number of variables studied, regression may be simple or
multiple.
I. Simple regression
Regression analysis that studies only two variables at a time is
described as simple regression.
II. Multiple regression
Regression analysis that studies more than two variables at a time is known as
multiple regression.
I. Linear Regression: If the regression curve is a straight line, the regression is said
to be linear
II. Nonlinear Regression: If the regression curve is not a straight line i.e. not a first
degree equation in the variables x and y, the regression is said to be nonlinear or
curvilinear.
Methods of studying Regression
1. Method of scatter Diagram
It is the simplest method of obtaining the line of regression. The data are plotted on
graph paper by taking the independent variable on the x-axis and the dependent
variable on the y-axis. The plotted points are generally scattered in a narrow strip. If
the correlation is perfect, i.e., if r = +1 or r = −1, the points will
lie on a line, which is the line of regression.
2. Method of Least square
It is used for obtaining the equation of a curve which fits best to a given set of
observations. It is based on the principle that the sum of the squares of the differences
between the estimated values and the actual observed values is a
minimum.
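For a straight line the least-squares condition can be written out explicitly. The following is a sketch of the standard derivation, assuming the fitted line is written as y = a + bx and S denotes the sum of squared differences; the symbol S is introduced here only for illustration.

```latex
% Sum of squared differences between observed y and the estimate a + b x
S = \sum (y - a - bx)^2

% Setting the partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; \sum y = na + b \sum x
\qquad
\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum xy = a \sum x + b \sum x^2

% Solving the two normal equations for b and a
b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}},
\qquad
a = \bar{y} - b\,\bar{x}
```

The second relation shows that the fitted line always passes through the point of means (x̄, ȳ).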
Line of Regression
If all the points in the scatter diagram cluster around a straight line, the line is called
line of regression. The line of regression is the line of best fit and is obtained by the
principle of least squares.
Line of regression of 𝑦 𝑜𝑛 𝑥
It is the line which gives the best estimate for the values of 𝑦 for any given values of
𝑥. The regression equation of 𝑦 𝑜𝑛 𝑥 is given by
\[ y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \]
It is also written as
𝑦 = 𝑎 + 𝑏𝑥
Line of regression of 𝑥 𝑜𝑛 𝑦
It is the line which gives the best estimate for the values of 𝑥 for any given values of
𝑦. The regression equation of 𝑥 𝑜𝑛 𝑦 is given by
\[ x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \]
It is also written as
𝑥 = 𝑎 + 𝑏𝑦
where x̄ and ȳ are the means of the x series and y series respectively, σ_x and σ_y are the
standard deviations of the x series and y series respectively, and r
is the correlation coefficient between x and y.
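The regression equation of y on x can be turned into a small routine. The sketch below computes the slope as r·σ_y/σ_x and the intercept from the means, and, purely for illustration, reuses the data of Example 1 from the correlation section; the function name `line_y_on_x` is an assumption of this sketch.

```python
from math import sqrt

def line_y_on_x(x, y):
    """Return (a, b) for the line y = a + b*x, with b = r * sigma_y / sigma_x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sigma_x = sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sigma_y = sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / n
    r = cov / (sigma_x * sigma_y)
    b = r * sigma_y / sigma_x   # slope of the line of y on x
    a = mean_y - b * mean_x     # the line passes through (mean_x, mean_y)
    return a, b

# Data reused from Example 1 of the correlation section.
x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]

a, b = line_y_on_x(x, y)
print(round(a, 2), round(b, 2))  # 18.04 -1.34
```

For these data the line of regression of y on x is therefore y = 18.04 − 1.34x. Swapping the roles of x and y in the call gives the line of x on y.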
Regression Coefficients
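The regression coefficients b_yx = r σ_y/σ_x and b_xy = r σ_x/σ_y are the slopes of the lines of regression of y on x and of x on y respectively. They can be expressed as follows:
(i)
\[ b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
\quad \text{and} \quad
b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} \]
(ii)
\[ b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
\quad \text{and} \quad
b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}} \]
(iii)
\[ b_{yx} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}}
\quad \text{and} \quad
b_{xy} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}} \]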
Example 1: Find the regression coefficients 𝑏𝑦𝑥 and 𝑏𝑥𝑦, and hence find the correlation coefficient
between 𝑥 and 𝑦, for the following data:
x 4 2 3 4 2
y 2 3 2 4 4
Solution: n=5
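x      y      x²     y²     xy
4      2      16     4      8
2      3      4      9      6
3      2      9      4      6
4      4      16     16     16
2      4      4      16     8
∑x = 15    ∑y = 15    ∑x² = 49    ∑y² = 49    ∑xy = 44
\[ b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{44 - \dfrac{(15)(15)}{5}}{49 - \dfrac{(15)^2}{5}} = -0.25 \]
and
\[ b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}} = \frac{44 - \dfrac{(15)(15)}{5}}{49 - \dfrac{(15)^2}{5}} = -0.25 \]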
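The example also asks for the correlation coefficient, which the solution above does not state. One way to finish, sketched here, uses the relations b_yx = r σ_y/σ_x and b_xy = r σ_x/σ_y implied by the two regression equations: their product is r², and r carries the common sign of the two coefficients.

```latex
% Product of the two regression coefficients
b_{yx} \, b_{xy} = \left( r \frac{\sigma_y}{\sigma_x} \right) \left( r \frac{\sigma_x}{\sigma_y} \right) = r^2

% Both coefficients are negative here, so the negative root is taken
r = -\sqrt{b_{yx} \, b_{xy}} = -\sqrt{(-0.25)(-0.25)} = -0.25
```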
Example 2: Find the two regression lines from the following data and hence find the
correlation coefficient.
x 6 2 10 4 8
y 9 11 5 8 7
Solution: n=5
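\[ \bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6, \qquad \bar{y} = \frac{\sum y}{n} = \frac{40}{5} = 8 \]
x      y      x − x̄   y − ȳ   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
6      9      0       1       0          1          0
2      11     -4      3       16         9          -12
10     5      4       -3      16         9          -12
4      8      -2      0       4          0          0
8      7      2       -1      4          1          -2
∑(x − x̄)² = 40    ∑(y − ȳ)² = 20    ∑(x − x̄)(y − ȳ) = -26
\[ b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-26}{40} = -0.65 \]
and
\[ b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{-26}{20} = -1.3 \]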
The equation of the regression line of x on y is
\[ x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \]
\[ x - 6 = -1.3 (y - 8) \]
\[ x = -1.3y + 16.4 \]
The equation of the regression line of y on x is
\[ y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \]
\[ y - 8 = -0.65 (x - 6) \]
\[ y = -0.65x + 11.9 \]
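The whole of this example, including the correlation coefficient that the question asks for, can be checked with a short script. This is a sketch under the assumption that r is obtained from the identity r² = b_yx·b_xy, with r taking the sign of the regression coefficients; the variable names are illustrative.

```python
from math import sqrt, copysign

# Data of Example 2.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squares and products of deviations from the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

b_yx = sxy / sxx   # regression coefficient of y on x
b_xy = sxy / syy   # regression coefficient of x on y

# r^2 = b_yx * b_xy, and r has the same sign as the regression coefficients.
r = copysign(sqrt(b_yx * b_xy), b_yx)

intercept_y_on_x = mean_y - b_yx * mean_x   # y = intercept + b_yx * x
intercept_x_on_y = mean_x - b_xy * mean_y   # x = intercept + b_xy * y

print(round(b_yx, 2), round(b_xy, 2))                          # -0.65 -1.3
print(round(intercept_y_on_x, 1), round(intercept_x_on_y, 1))  # 11.9 16.4
print(round(r, 4))                                             # -0.9192
```

The value r ≈ −0.9192 is not given in the solution above; it follows from the two coefficients found there.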
Example 3: Find the two regression lines from the following data and hence find the
correlation coefficient.
Purchase(y) 85 90 70 72 95 81 74
𝑑𝑥 = 𝑥 − 𝑎 = 𝑥 − 93,
𝑑𝑦 = 𝑦 − 𝑏 = 𝑦 − 81
𝑛=7
𝒙 𝒚 𝒅𝒙 𝒅𝒚 𝒅𝟐𝒙 𝒅𝟐𝒚 𝒅𝒙 𝒅𝒚
100 85 7 4 49 16 28
98 90 5 9 25 81 45
85 72 -8 -9 64 81 72
93 81 0 0 0 0 0
80 74 -13 -7 169 49 91
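\[ b_{yx} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}} = 0.785 \]
and
\[ b_{xy} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}} = 1.1746 \]
The equation of the regression line of x on y is
\[ x - 92 = 1.1746 (y - 81) \]
\[ x = 1.1746y - 3.1426 \]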
The equation of regression line of 𝑦 𝑜𝑛 𝑥 is
\[ y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \]
𝑦 − 81 = 0.785 (𝑥 − 92)
𝑦 = 0.785𝑥 + 8.78
Example 1: Find the regression coefficients 𝑏𝑦𝑥 and 𝑏𝑥𝑦, and hence find the correlation coefficient
between 𝑥 and 𝑦, for the following data:
x 7 4 8 6 5
y 6 5 9 8 2
Example 2: Find the two regression lines from the following data and hence find the
correlation coefficient.
x 25 22 28 26 35 20 22 40 20 18
y 18 15 20 17 22 14 16 21 15 14
Example 3: Find the two regression lines from the following data and hence find the
correlation coefficient and estimate 𝑦 for 𝑥 = 73.
x 70 72 74 76 78 80
Example 4: Find the regression coefficients 𝑏𝑦𝑥 and 𝑏𝑥𝑦, and hence find the correlation coefficient
between 𝑥 and 𝑦, for the following data:
x 2 4 6 8
y 1 2 2.5 3