
UCD Business Analytics - Practical Sheet Solution

Miguel Nicolau

Chapter 3: Linear Regression


Exercise 1
Table 1: Employees and sales in a small sample of companies.

employees   sales (thousands of Euros)
---------   --------------------------
1           15
4           25
5           100
7           120

1. For the data shown in Table 1, calculate a linear regression model of the form y = a + bx, using
employees as the predictor and sales as the response. Apply the Least Squares method, using the
formulas below.

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$

$$a = \bar{y} - b\bar{x}$$

Solution
This exercise instructs us to use employees as the predictor variable (x), and sales as the response
variable (y). So in order to apply the Least Squares equations to obtain the a and b coefficients, we
need to calculate some required values:

• $n$ (number of samples): 4;

• $\sum xy$ (sum of each x value multiplied by the corresponding y value): $1 \times 15 + 4 \times 25 + 5 \times 100 + 7 \times 120 = 1455$;

• $\sum x$ (sum of all x values): $1 + 4 + 5 + 7 = 17$;

• $\sum y$ (sum of all y values): $15 + 25 + 100 + 120 = 260$;

• $\sum x^2$ (sum of each x value squared): $1^2 + 4^2 + 5^2 + 7^2 = 91$;

• $\left(\sum x\right)^2$ (sum of all x values, squared): $17^2 = 289$;

• $\bar{x}$ (average of all x values): $17/4 = 4.25$;

• $\bar{y}$ (average of all y values): $260/4 = 65$.

So the slope will be:

$$b = \frac{4 \times 1455 - 17 \times 260}{4 \times 91 - 289} = \frac{5820 - 4420}{364 - 289} = \frac{1400}{75} = 18.667$$

And the intercept will be:

$$a = 65 - 18.667 \times 4.25 = 65 - 79.335 = -14.335$$
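
As a cross-check, the Least Squares formulas can be evaluated directly in a few lines of Python. This is only a minimal sketch using the Table 1 data; the variable names are chosen here for illustration and are not part of the exercise.

```python
# Table 1 data: employees (x) and sales in thousands of Euros (y).
x = [1, 4, 5, 7]
y = [15, 25, 100, 120]

n = len(x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # sum of x*y products
sum_x = sum(x)                                  # sum of all x values
sum_y = sum(y)                                  # sum of all y values
sum_x2 = sum(xi ** 2 for xi in x)               # sum of squared x values

# Least Squares slope and intercept, following the formulas of the exercise.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"b = {b:.3f}, a = {a:.3f}")
```

Because the code does not round b before computing a, it reports an intercept of about -14.333 rather than the -14.335 obtained by hand with b rounded to 18.667. The difference comes from rounding only.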

2. Calculate the predictions of the model for each of the data points of the training set (i.e. x = 1, 4, 5, 7).

Solution
$$\begin{aligned}
f(1) &= -14.335 + 18.667 \times 1 = 4.332 \\
f(4) &= -14.335 + 18.667 \times 4 = 60.333 \\
f(5) &= -14.335 + 18.667 \times 5 = 79 \\
f(7) &= -14.335 + 18.667 \times 7 = 116.334
\end{aligned}$$
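
The same predictions can be reproduced with a small Python function. This is a sketch that assumes the rounded coefficients from the solution to question 1; the function name `f` mirrors the notation used here, and the list names are illustrative.

```python
# Rounded coefficients from the solution to question 1.
a, b = -14.335, 18.667


def f(x):
    """Predicted sales (thousands of Euros) for a company with x employees."""
    return a + b * x


train_x = [1, 4, 5, 7]
# Round to three decimals to match the precision used in the hand calculation.
predictions = [round(f(x), 3) for x in train_x]
print(predictions)
```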
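3. Calculate the train RMSE and $R^2$, using the formulas below.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - (a + bx_i)\right)^2}{n}}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - (a + bx_i)\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$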

Solution
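$$\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{(15 - 4.332)^2 + (25 - 60.333)^2 + (100 - 79)^2 + (120 - 116.334)^2}{4}} \\
&= \sqrt{\frac{10.668^2 + (-35.333)^2 + 21^2 + 3.666^2}{4}} \\
&= \sqrt{\frac{113.806 + 1248.421 + 441 + 13.440}{4}} \\
&= \sqrt{\frac{1816.667}{4}} \\
&= \sqrt{454.167} \\
&= 21.311
\end{aligned}$$

$$\begin{aligned}
R^2 &= 1 - \frac{1816.667}{(15 - 65)^2 + (25 - 65)^2 + (100 - 65)^2 + (120 - 65)^2} \\
&= 1 - \frac{1816.667}{(-50)^2 + (-40)^2 + 35^2 + 55^2} \\
&= 1 - \frac{1816.667}{2500 + 1600 + 1225 + 3025} \\
&= 1 - \frac{1816.667}{8350} \\
&= 1 - 0.218 \\
&= 0.782
\end{aligned}$$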
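
The train RMSE and R² can also be checked numerically. The sketch below assumes the rounded coefficients a = -14.335 and b = 18.667 from question 1 and the Table 1 data; the names `sse` and `sst` (sum of squared errors, total sum of squares) are labels introduced here, not terms from the exercise.

```python
import math

# Table 1 data and the rounded coefficients from question 1.
x = [1, 4, 5, 7]
y = [15, 25, 100, 120]
a, b = -14.335, 18.667

# Numerator shared by both formulas: sum of squared prediction errors.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)

# Denominator of the R^2 formula: squared deviations of y from its mean.
y_mean = sum(y) / len(y)
sst = sum((yi - y_mean) ** 2 for yi in y)

rmse = math.sqrt(sse / len(y))
r2 = 1 - sse / sst
print(f"RMSE = {rmse:.3f}, R2 = {r2:.3f}")
```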

4. Table 2 shows available test data. Use it to calculate test RMSE and $R^2$ values. Which would you
typically expect to be a larger value: train RMSE or test RMSE? What about train vs. test $R^2$?

Table 2: Test data for employees and sales.

employees   sales (thousands of Euros)
---------   --------------------------
3           26
10          135

Solution
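$$\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{(26 - f(3))^2 + (135 - f(10))^2}{2}} \\
&= \sqrt{\frac{(26 - 41.666)^2 + (135 - 172.335)^2}{2}} \\
&= \sqrt{\frac{(-15.666)^2 + (-37.335)^2}{2}} \\
&= \sqrt{\frac{245.424 + 1393.902}{2}} \\
&= \sqrt{\frac{1639.326}{2}} \\
&= \sqrt{819.663} \\
&= 28.63
\end{aligned}$$

$$\begin{aligned}
R^2 &= 1 - \frac{1639.326}{(26 - 80.5)^2 + (135 - 80.5)^2} \\
&= 1 - \frac{1639.326}{(-54.5)^2 + 54.5^2} \\
&= 1 - \frac{1639.326}{2970.25 + 2970.25} \\
&= 1 - \frac{1639.326}{5940.5} \\
&= 1 - 0.276 \\
&= 0.724
\end{aligned}$$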

Typically you would expect train RMSE to be smaller than test RMSE: RMSE is an error measure, and the
coefficients were chosen to minimise the squared error on the train data. That is the case here
(21.311 vs. 28.63).
Likewise, you would expect the train $R^2$ to be higher than the test $R^2$, because the model was fitted
to the variation of y in the train dataset, not in the test dataset. This also holds here (0.782 vs. 0.724).
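
The train vs. test comparison can be reproduced with one helper applied to both tables. This is a sketch under the assumption that the rounded coefficients from question 1 are used; the helper name `rmse_and_r2` is hypothetical and introduced only for this illustration.

```python
import math

# Rounded coefficients from question 1.
a, b = -14.335, 18.667


def rmse_and_r2(xs, ys):
    """Return (RMSE, R^2) of the fitted line on the given data set."""
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
    y_mean = sum(ys) / len(ys)  # mean of this data set's own y values
    sst = sum((yi - y_mean) ** 2 for yi in ys)
    return math.sqrt(sse / len(ys)), 1 - sse / sst


train_rmse, train_r2 = rmse_and_r2([1, 4, 5, 7], [15, 25, 100, 120])  # Table 1
test_rmse, test_r2 = rmse_and_r2([3, 10], [26, 135])                  # Table 2

print(f"train: RMSE = {train_rmse:.3f}, R2 = {train_r2:.3f}")
print(f"test:  RMSE = {test_rmse:.3f}, R2 = {test_r2:.3f}")
```

Note that the test R² uses the mean of the test y values (80.5), as in the hand calculation.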
5. For each of the points in Table 2, is that an in-sample or out-of-sample point?

Solution
In-sample data is the data that was used to train the model. Table 2 does not contain any data from
the training set (i.e. from Table 1), therefore both observations are out-of-sample data.

6. For each of your predictions for Table 2, is it an interpolation or extrapolation?

Solution
Interpolation means making predictions for x values within the range of x values used to train the
model; extrapolation means making predictions for x values outside that range. The range of x values
used to train the model was [1, 7]; this means that a prediction for x = 3 is an interpolation, whereas
a prediction for x = 10 is an extrapolation.
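
Both classifications can be expressed as simple checks against the training data. The following is a minimal sketch using the data from Table 1 and Table 2; the variable names and the tuple layout of `results` are choices made for this illustration.

```python
# Training observations (Table 1) and test observations (Table 2),
# each stored as (employees, sales).
train = [(1, 15), (4, 25), (5, 100), (7, 120)]
test = [(3, 26), (10, 135)]

train_set = set(train)
train_x = [x for x, _ in train]
lo, hi = min(train_x), max(train_x)  # range of x seen during training: [1, 7]

results = []
for x, y in test:
    # In-sample: the exact observation was part of the training data.
    sample = "in-sample" if (x, y) in train_set else "out-of-sample"
    # Interpolation: x lies within the range of the training x values.
    kind = "interpolation" if lo <= x <= hi else "extrapolation"
    results.append((x, sample, kind))
    print(f"x = {x}: {sample}, {kind}")
```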
