Unit-2 Linear Regression Numericals
Linear regression is one of the most basic and commonly used forms of predictive analysis. One variable is
considered to be an explanatory variable, and the other is considered to be a dependent variable. For example,
a modeler might want to relate the weights of individuals to their heights using a linear regression model.
There are several types of regression analysis available to the researcher.
Simple linear regression
• One dependent variable (interval or ratio)
• One independent variable (interval or ratio or dichotomous)
Multiple linear regression
• One dependent variable (interval or ratio)
• Two or more independent variables (interval or ratio or dichotomous)
Logistic regression
• One dependent variable (binary)
• Two or more independent variable(s) (interval or ratio or dichotomous)
Ordinal regression
• One dependent variable (ordinal)
• One or more independent variable(s) (nominal or dichotomous)
Multinomial regression
• One dependent variable (nominal)
• One or more independent variable(s) (interval or ratio or dichotomous)
Discriminant analysis
• One dependent variable (nominal)
• One or more independent variable(s) (interval or ratio)
The linear regression equation is given by:
y = a + bx
a and b are given by the following formulas:
b (slope) = (n∑xy − (∑x)(∑y)) / (n∑x² − (∑x)²)
a (intercept) = (∑y∑x² − ∑x∑xy) / (n∑x² − (∑x)²)
Where,
x and y are two variables on the regression line.
n = Number of data pairs (x, y).
b = Slope of the line.
a = y-intercept of the line.
x = Values of the first data set.
y = Values of the second data set.
Solved Examples
Question: Find the linear regression equation for the following two sets of data:
x 2 4 6 8
y 3 7 5 10
Solution:
Construct the following table:
x     y     x²     xy
2     3     4      6
4     7     16     28
6     5     36     30
8     10    64     80
∑x = 20   ∑y = 25   ∑x² = 120   ∑xy = 144
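With n = 4:
b = (n∑xy − (∑x)(∑y)) / (n∑x² − (∑x)²)
= (4×144 − 20×25) / (4×120 − 20²) = 76/80
b = 0.95
a = (∑y∑x² − ∑x∑xy) / (n∑x² − (∑x)²)
= (25×120 − 20×144) / (4×120 − 20²) = 120/80
a = 1.5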
Linear regression is given by:
y = a + bx
y = 1.5 + 0.95 x
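The result above can be checked with a short script. The following is a minimal Python sketch, not part of the original notes; the function name `fit_line` is an assumption chosen for illustration. It applies the same sum formulas, with a as the intercept and b as the slope.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line y = a + b*x using the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    a = (sum_y * sum_xx - sum_x * sum_xy) / denom  # intercept
    return a, b


# Data from the solved example: x = 2, 4, 6, 8 and y = 3, 7, 5, 10
a, b = fit_line([2, 4, 6, 8], [3, 7, 5, 10])
print(f"y = {a} + {b} x")  # prints: y = 1.5 + 0.95 x
```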
Linear Regression
Problems with Solutions
Linear regression and modelling problems are presented below, followed by their solutions. A linear
regression calculator and grapher may also be used to check answers and create more opportunities for
practice.
Review
If the plot of n pairs of data (x , y) for an experiment appears to indicate a "linear relationship" between y
and x, then the method of least squares may be used to write a linear relationship between x and y.
The least squares regression line is the line that minimizes the sum of the squares (d1² + d2² + d3² + d4²) of
the vertical deviations from each data point to the line (see figure below as an example of 4 points).
Figure 1. Linear regression where the sum of the squared vertical distances d1² + d2² + d3² + d4² between
observed and predicted (line and its equation) values is minimized.
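The least squares regression line for the set of n data points is given by the equation of a line in
slope-intercept form:
y = a x + b
where a and b are given by
a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)
b = (1/n)(Σy - a Σx)
Figure 2. Formulas for the constants a and b included in the linear regression.
Note that in this section a denotes the slope and b the y-intercept, the reverse of the convention
y = a + bx used in the first solved example.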
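To illustrate what "minimizes the sum of the squares" means, here is a minimal Python sketch (not from the original notes; the helper names `least_squares_line` and `sum_squared_deviations` are assumptions for illustration). It fits the line to the four points of the earlier solved example and checks that nudging the slope or intercept makes the sum of squared vertical deviations larger.

```python
def sum_squared_deviations(points, slope, intercept):
    """Sum of squared vertical deviations between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def least_squares_line(points):
    """Return (a, b) for y = a*x + b, with a the slope and b the intercept."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b


points = [(2, 3), (4, 7), (6, 5), (8, 10)]  # data from the solved example
a, b = least_squares_line(points)
best = sum_squared_deviations(points, a, b)

# Any change to the slope or the intercept increases the sum of squares.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5), (0.05, -0.2)]:
    assert sum_squared_deviations(points, a + da, b + db) > best
```

The perturbations tried here are arbitrary; the check passes for any nonzero change because the sum of squares has a single minimum at the least squares values.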
• Problem 1
Consider the following set of points: {(-2 , -1) , (1 , 1) , (3 , 2)}
a) Find the least squares regression line for the given data points.
b) Plot the given points and the regression line in the same rectangular system of axes.
• Problem 2
a) Find the least squares regression line for the following set of data
{(-1 , 0),(0 , 2),(1 , 4),(2 , 5)}
b) Plot the given points and the regression line in the same rectangular system of axes.
• Problem 3
The values of x and their corresponding values of y are shown in the table below
x 0 1 2 3 4
y 2 3 5 4 6
a) Find the least squares regression line y = a x + b.
b) Estimate the value of y when x = 10.
• Problem 4
The sales of a company (in million dollars) for each year are shown in the table below.
x (year) 2005 2006 2007 2008 2009
y (sales) 12 19 29 37 45
a) Find the least squares regression line y = a x + b.
b) Use the least squares regression line as a model to estimate the sales of the company in 2012.
Solutions to the Above Problems
1. a) Let us organize the data in a table.
x     y     xy    x²
-2    -1    2     4
1     1     1     1
3     2     6     9
Σx = 2   Σy = 2   Σxy = 9   Σx² = 14
We now use the above formulas to calculate a and b as follows (n = 3)
a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (3*9 - 2*2) / (3*14 - 2²) = 23/38
b = (1/n)(Σy - a Σx) = (1/3)(2 - (23/38)*2) = 5/19
b) We now graph the regression line given by y = a x + b and the given points.
Figure 3. Graph of linear regression in problem 1.
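Because the answers to Problem 1 are fractions, exact rational arithmetic is a convenient way to check them. The sketch below is an illustration only (not part of the original solution) and uses Python's standard `fractions` module; the variable names are assumptions.

```python
from fractions import Fraction

points = [(-2, -1), (1, 1), (3, 2)]  # data from Problem 1
n = len(points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxy = sum(x * y for x, y in points)
sxx = sum(x * x for x, _ in points)

a = Fraction(n * sxy - sx * sy, n * sxx - sx ** 2)  # slope
b = (sy - a * sx) / n                                # intercept
print(a, b)  # prints: 23/38 5/19
```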
2. a) We use a table as follows
x     y     xy    x²
-1    0     0     1
0     2     0     0
1     4     4     1
2     5     10    4
Σx = 2   Σy = 11   Σxy = 14   Σx² = 6
We now use the above formulas to calculate a and b as follows (n = 4)
a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (4*14 - 2*11) / (4*6 - 2²) = 17/10 = 1.7
b = (1/n)(Σy - a Σx) = (1/4)(11 - 1.7*2) = 1.9
b) We now graph the regression line given by y = a x + b and the given points.
Figure 4. Graph of linear regression in problem 2.
3. a) We use a table to calculate a and b.
x     y     xy    x²
0     2     0     0
1     3     3     1
2     5     10    4
3     4     12    9
4     6     24    16
Σx = 10   Σy = 20   Σxy = 49   Σx² = 30
We now calculate a and b using the least squares regression formulas for a and b (n = 5).
a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (5*49 - 10*20) / (5*30 - 10²) = 0.9
b = (1/n)(Σy - a Σx) = (1/5)(20 - 0.9*10) = 2.2
b) Now that we have the least squares regression line y = 0.9 x + 2.2, substitute 10 for x to find the
value of the corresponding y.
y = 0.9 * 10 + 2.2 = 11.2
4. a) We first change the variable x into t such that t = x - 2005 and therefore t represents the
number of years after 2005. Using t instead of x makes the numbers smaller and therefore
more manageable. The table of values becomes:
t (years after 2005) 0 1 2 3 4
y (sales) 12 19 29 37 45
We now use the table to calculate a and b included in the least squares regression line formula.
t     y     ty    t²
0     12    0     0
1     19    19    1
2     29    58    4
3     37    111   9
4     45    180   16
Σt = 10   Σy = 142   Σty = 368   Σt² = 30
We now calculate a and b using the least squares regression formulas for a and b (n = 5).
a = (nΣty - ΣtΣy) / (nΣt² - (Σt)²) = (5*368 - 10*142) / (5*30 - 10²) = 8.4
b = (1/n)(Σy - a Σt) = (1/5)(142 - 8.4*10) = 11.6
b) In 2012, t = 2012 - 2005 = 7
The estimated sales in 2012 are: y = 8.4 * 7 + 11.6 = 70.4 million dollars.
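The substitution t = x - 2005 changes the intercept but not the slope or the predictions. The following Python sketch (an illustration only; `least_squares_line` is a hypothetical helper, not something defined in these notes) fits the line both ways and compares the 2012 estimates.

```python
import math


def least_squares_line(xs, ys):
    """Return (a, b) for y = a*x + b, with a the slope and b the intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b


years = [2005, 2006, 2007, 2008, 2009]
sales = [12, 19, 29, 37, 45]

# Fit with the shifted variable t = x - 2005, as in the solution above
a_t, b_t = least_squares_line([x - 2005 for x in years], sales)
# Fit with the raw years for comparison
a_x, b_x = least_squares_line(years, sales)

pred_t = a_t * (2012 - 2005) + b_t
pred_x = a_x * 2012 + b_x

# Same slope and same prediction; only the intercept differs.
assert math.isclose(a_t, a_x)
assert math.isclose(pred_t, pred_x, abs_tol=1e-6)
```

With the raw years the intercept is a large negative number, which is why the shifted variable is easier to work with by hand.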
Example 9.9
Calculate the regression coefficient and obtain the lines of regression for the following data
Solution:
Regression coefficient of X on Y
(i) Regression equation of X on Y
(ii) Regression coefficient of Y on X
(iii) Regression equation of Y on X
Y = 0.929X–3.716+11
= 0.929X+7.284
The regression equation of Y on X is Y= 0.929X + 7.284
Example 9.10
Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations
from the actual means of X and Y.
Estimate the likely demand when the price is Rs. 20.
Solution:
Calculation of Regression equation
(i) Regression equation of X on Y
(ii) Regression Equation of Y on X
When X is 20, Y will be
Y = –0.25(20) + 44.25
= –5 + 44.25
= 39.25 (when the price is Rs. 20, the likely demand is 39.25)
Example 9.11
Obtain regression equation of Y on X and estimate Y when X=55 from the following
Solution:
(i) Regression coefficients of Y on X
(ii) Regression equation of Y on X
Y – 51.57 = 0.942(X – 48.29)
Y = 0.942X – 45.49 + 51.57
Y = 0.942X + 6.08
The regression equation of Y on X is Y = 0.942X + 6.08
Estimation of Y when X = 55
Y = 0.942(55) + 6.08 = 57.89
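The step from the mean form Y – Ȳ = b(X – X̄) to the slope-intercept form can be sketched in Python as follows. This is an illustration only: the function name `line_through_means` is an assumption, and the three numbers are taken from the equation Y – 51.57 = 0.942(X – 48.29) in this example.

```python
def line_through_means(x_mean, y_mean, b_yx):
    """Return (slope, intercept) of the line Y - y_mean = b_yx * (X - x_mean)."""
    return b_yx, y_mean - b_yx * x_mean


slope, intercept = line_through_means(48.29, 51.57, 0.942)
estimate = slope * 55 + intercept  # estimated Y when X = 55
print(round(intercept, 2), round(estimate, 2))  # prints: 6.08 57.89
```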
Example 9.12
Find the means of X and Y variables and the coefficient of correlation between them from the following
two regression equations:
2Y–X–50 = 0
3Y–2X–10 = 0.
Solution:
We are given
2Y–X–50 = 0 ... (1)
3Y–2X–10 = 0 ... (2)
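Solving equations (1) and (2)
We get Y = 90
Putting the value of Y in equation (1)
We get X = 130
Since both regression lines pass through the point of means, the mean of X is 130 and the mean of Y is 90.
Calculating correlation coefficient
Let us assume equation (1) be the regression equation of Y on X
2Y = X + 50
Y = 0.5X + 25, so the regression coefficient of Y on X is 0.5
Then equation (2) is the regression equation of X on Y
2X = 3Y – 10
X = 1.5Y – 5, so the regression coefficient of X on Y is 1.5
The product of the two regression coefficients is 0.5 × 1.5 = 0.75, which does not exceed 1, so the
assumption is valid.
r = √0.75 = 0.866 (positive, because both regression coefficients are positive)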
Example 9.13
Find the means of X and Y variables and the coefficient of correlation between them from the following
two regression equations:
4X–5Y+33 = 0
20X–9Y–107 = 0
Solution:
We are given
4X–5Y+33 = 0 ... (1)
20X–9Y–107 =0 ... (2)
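Solving equations (1) and (2)
We get Y = 17
Putting the value of Y in equation (1)
We get X = 13
So the mean of X is 13 and the mean of Y is 17.
Calculating correlation coefficient
Let us assume equation (1) be the regression equation of X on Y
4X = 5Y – 33, so the regression coefficient of X on Y is 5/4 = 1.25
Let us assume equation (2) be the regression equation of Y on X
9Y = 20X – 107, so the regression coefficient of Y on X is 20/9 = 2.22
But this is not possible because both the regression coefficients are greater than 1 (their product,
which equals r², cannot exceed 1).
So our above assumption is wrong. Therefore, treating equation (1) as the regression equation of Y on X and
equation (2) as the regression equation of X on Y, we get
5Y = 4X + 33, so the regression coefficient of Y on X is 4/5 = 0.8
20X = 9Y + 107, so the regression coefficient of X on Y is 9/20 = 0.45
r = √(0.8 × 0.45) = √0.36 = 0.6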
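The procedure used in Examples 9.12 and 9.13 (solve the two equations for the means, then pick the assignment of "Y on X" and "X on Y" whose coefficients have a product between 0 and 1) can be sketched in Python. This is a minimal illustration, not the textbook's method verbatim; the function name `means_and_r` and the coefficient-tuple representation are assumptions, and the sketch assumes no coefficient of X or Y is zero.

```python
import math


def means_and_r(eq1, eq2):
    """Each eq is (p, q, c), meaning p*X + q*Y + c = 0.

    Returns (x_mean, y_mean, r). Assumes all p and q are nonzero.
    """
    (p1, q1, c1), (p2, q2, c2) = eq1, eq2
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("the two lines are parallel or identical")
    # The lines intersect at the point of means (Cramer's rule).
    x_mean = (q1 * c2 - q2 * c1) / det
    y_mean = (p2 * c1 - p1 * c2) / det

    # Try both assignments; keep the one with 0 <= b_yx * b_xy <= 1.
    for y_on_x, x_on_y in ((eq1, eq2), (eq2, eq1)):
        b_yx = -y_on_x[0] / y_on_x[1]  # from Y = -(p/q) X - c/q
        b_xy = -x_on_y[1] / x_on_y[0]  # from X = -(q/p) Y - c/p
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r takes the common sign of the two regression coefficients.
            return x_mean, y_mean, math.copysign(math.sqrt(product), b_yx)
    raise ValueError("no consistent assignment of the two equations")


# Example 9.12: 2Y - X - 50 = 0 and 3Y - 2X - 10 = 0
ex12 = means_and_r((-1, 2, -50), (-2, 3, -10))
# Example 9.13: 4X - 5Y + 33 = 0 and 20X - 9Y - 107 = 0
ex13 = means_and_r((4, -5, 33), (20, -9, -107))
print(ex12, ex13)
```

The two candidate products are reciprocals of each other, so at most one assignment can satisfy the condition unless the product is exactly 1.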
Example 9.16
For 5 pairs of observations the following results are obtained: ∑X = 15, ∑Y = 25, ∑X² = 55, ∑Y² = 135,
∑XY = 83. Find the equations of the lines of regression and estimate the value of X on the first line
when Y = 12 and the value of Y on the second line if X = 8.
Solution:
Y – 5 = 0.8(X – 3)
Y = 0.8X + 2.6
When X = 8 the value of Y is estimated as
Y = 0.8(8) + 2.6
= 9
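Both regression lines in Example 9.16 can be computed directly from the five given sums. The Python sketch below is an illustration under the standard formulas bYX = (N∑XY – ∑X∑Y) / (N∑X² – (∑X)²) and bXY = (N∑XY – ∑X∑Y) / (N∑Y² – (∑Y)²); the X on Y line and its estimate at Y = 12 are this sketch's own computation, since that part of the solution is not shown above. Function and variable names are assumptions.

```python
n = 5
sum_x, sum_y = 15, 25
sum_xx, sum_yy, sum_xy = 55, 135, 83

x_mean, y_mean = sum_x / n, sum_y / n           # 3.0 and 5.0
common = n * sum_xy - sum_x * sum_y             # shared numerator
b_yx = common / (n * sum_xx - sum_x ** 2)       # regression coefficient of Y on X
b_xy = common / (n * sum_yy - sum_y ** 2)       # regression coefficient of X on Y


def x_on_y(y):
    """Estimate X from the regression line of X on Y."""
    return x_mean + b_xy * (y - y_mean)


def y_on_x(x):
    """Estimate Y from the regression line of Y on X."""
    return y_mean + b_yx * (x - x_mean)


print(round(x_on_y(12), 2), round(y_on_x(8), 2))  # prints: 8.6 9.0
```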