© Jitesh Khurkhuriya – Azure ML Online Course
Section 7
Regression
Regression Analysis
• Statistical process for estimating the relationships among variables
• Models the relationship between a dependent variable and one or more independent variables (or 'predictors')
• The predictor is a continuous variable
• Can also be used to infer causal relationships between dependent and independent variables
[Figure: scatter plot of Number of Medical Claims (Y) against Age (X)]
Regression Analysis
[Figure: the same Age vs Number of Medical Claims scatter, now with a fitted Regression Line Y = f(X); for a given Age, the point on the line is the Predicted Value]
Causal Relationship?
[Figure: four scatter plots of Y against X — Strong Positive Correlation, Weak Positive Correlation, No Correlation, and Strong Negative Correlation]
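The strength of these linear relationships can be quantified with Pearson's correlation coefficient r, which runs from +1 (strong positive) through 0 (none) to −1 (strong negative). A minimal sketch in plain Python (the helper name `pearson_r` is my own, not from the course):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfectly linear increasing relationship gives r = +1,
# a perfectly linear decreasing one gives r = -1.
```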
Linear Regression
Simple regression uses a single predictor:

Y = 𝛽0 + 𝛽1X

[Figure: Age (X) vs Number of Medical Claims (Y) with the Regression Line Y = f(X); points on the line are Predicted Values]

With more than one predictor this becomes multivariate linear regression:

Y = 𝛽0 + 𝛽1X1 + 𝛽2X2 + 𝛽3X3 + … + 𝛽nXn
Simple Linear Regression
Y = b0 + b1X

Y is the Dependent Variable and X the Independent Variable; b0 is the intercept (the value of Y where the line crosses the Y axis) and b1 is the slope.

[Figure: regression line with a Predicted Value Ŷ marked; the line crosses the Y axis at b0]
Simple Linear Regression

Y = b0 + b1X

Hrs Studied (X)  Marks (Y)  X – X̄ (A)  Y – Ȳ (B)    A^2      A*B
      0             40        -5.38      -26.31     28.99   141.66
      2             52        -3.38      -14.31     11.46    48.43
      3             53        -2.38      -13.31      5.69    31.73
      4             55        -1.38      -11.31      1.92    15.66
      4             56        -1.38      -10.31      1.92    14.27
      5             72        -0.38        5.69      0.15    -2.19
      6             71         0.62        4.69      0.38     2.89
      6             88         0.62       21.69      0.38    13.35
      7             56         1.62      -10.31      2.61   -16.65
      7             74         1.62        7.69      2.61    12.43
      8             89         2.62       22.69      6.84    59.35
      9             67         3.62        0.69     13.07     2.50
      9             89         3.62       22.69     13.07    82.04
Mean: 5.38         66.31                      Sum: 89.08   405.46

b1 = 𝛴(X – X̄)(Y – Ȳ) / 𝛴(X – X̄)²
   = 405.46 / 89.08
   = 4.55
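The column sums in the table can be reproduced in a few lines of Python. This is a sketch mirroring the hand calculation above (variable names are mine, not from the course):

```python
# Hours studied (X) and marks (Y) from the worked example.
X = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
Y = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]

n = len(X)
x_mean = sum(X) / n          # ≈ 5.38
y_mean = sum(Y) / n          # ≈ 66.31

# b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)
num = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))   # ≈ 405.46
den = sum((x - x_mean) ** 2 for x in X)                        # ≈ 89.08
b1 = num / den

# The least-squares line passes through the point of means.
b0 = y_mean - b1 * x_mean
```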
Simple Linear Regression

Y = b0 + b1X, with b1 = 4.55 from the previous step.

The least-squares line always passes through the point of means (X̄, Ȳ) = (5.38, 66.31), so b0 follows by substitution:

66.31 = b0 + 4.55 (5.38)
b0 = 41.8
Simple Linear Regression

With b1 = 4.55 and b0 = 41.8, the fitted regression line is:

Y = 41.8 + 4.55X
Common Regression Terms
Ordinary Least Square
[Figure: Regression Line Y = f(X); for an observation (Xi, Yi), the vertical distance between Yi and the prediction Ŷi is the Error]

Ordinary least squares chooses the line that minimizes the sum of squared errors:

  minimize  𝛴ᵢ₌₁ⁿ (yᵢ – ŷᵢ)²
Mean Absolute Error
[Figure: residuals — the vertical Error between each observation Yi and its prediction Ŷi]

  MAE = (1/n) · 𝛴ᵢ₌₁ⁿ |yᵢ – ŷᵢ|

Mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes.
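A minimal sketch of the formula in plain Python (the helper name `mae` is my own):

```python
def mae(actual, predicted):
    """Mean absolute error: the average magnitude of the residuals."""
    n = len(actual)
    return sum(abs(y - yh) for y, yh in zip(actual, predicted)) / n

# Example: absolute errors 1, 0, 2 average out to 1.0.
```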
Root Mean Square Error
  RMSE = √( (1/n) · 𝛴ᵢ₌₁ⁿ (yᵢ – ŷᵢ)² )

• Very commonly used and makes for an excellent general-purpose error metric for numerical predictions.
• Compared to the similar Mean Absolute Error, RMSE amplifies and severely punishes large errors.
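The same kind of sketch for RMSE (helper name `rmse` is my own); note how squaring before averaging weights a single large error more heavily than MAE would:

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    n = len(actual)
    mse = sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / n
    return math.sqrt(mse)
```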
Relative Absolute Error
[Figure: top — the model's errors |yᵢ – ŷᵢ| against the regression line; bottom — the errors of simply predicting the mean, |yᵢ – ȳ|]

  RAE = 𝛴ᵢ₌₁ⁿ |yᵢ – ŷᵢ|  /  𝛴ᵢ₌₁ⁿ |yᵢ – ȳ|
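RAE compares the model's total absolute error to that of a naive model that always predicts the mean; a sketch (helper name `rae` is my own):

```python
def rae(actual, predicted):
    """Relative absolute error: model's absolute error divided by the
    absolute error of always predicting the mean of the actuals."""
    mean = sum(actual) / len(actual)
    model_err = sum(abs(y - yh) for y, yh in zip(actual, predicted))
    mean_err = sum(abs(y - mean) for y in actual)
    return model_err / mean_err

# Values below 1.0 mean the model beats the mean-only baseline.
```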
R Squared or Coefficient of Determination
Coefficient of Determination
How much (what %) of the variation in Y is explained by the variation in X?

[Figure: scatter of Y against X with the horizontal mean line Ȳ drawn through it]
R-Square With an Example

Using the line fitted earlier (b1 = 4.55, b0 = 41.8):

Y = 41.8 + 4.55X
R-Square With an Example

Each predicted mark is Ŷ = 41.8 + 4.55X. For the first row (X = 0, Y = 40), Ŷ = 41.80, so its contributions are (Y – Ȳ)² = (40 – 66.31)² and (Ŷ – Ȳ)² = (41.8 – 66.31)².
R-Square With an Example

Ŷ = 41.8 + 4.55X

Hrs Studied (X)  Marks (Y)  Predicted Ŷ  (Y – Ȳ)²  (Ŷ – Ȳ)²
      0             40         41.80       692.22    600.74
      2             52         50.90       204.78    237.47
      3             53         55.45       177.16    117.94
      4             55         60.00       127.92     39.82
      4             56         60.00       106.30     39.82
      5             72         64.55        32.38      3.10
      6             71         69.10        22.00      7.78
      6             88         69.10       470.46      7.78
      7             56         73.65       106.30     53.88
      7             74         73.65        59.14     53.88
      8             89         78.20       514.84    141.37
      9             67         82.75         0.48    270.27
      9             89         82.75       514.84    270.27
Mean: 5.38         66.31              Sum: 3028.77   1844.12
                                           (SST)     (SSR)
Coefficient of Determination
Sum of Squares Due to Regression:

  SSR = 𝛴ᵢ₌₁ⁿ (ŷᵢ – ȳ)²

Total Sum of Squares:

  SST = 𝛴ᵢ₌₁ⁿ (yᵢ – ȳ)²

[Figure: at Xi, the gap Ŷ – Ȳ is the deviation explained by the regression; Y – Ȳ is the total deviation from the mean]
Coefficient of Determination
  R² = SSR / SST = 1844.12 / 3028.77 = 0.60886

The higher the value, the more of the variation in Y is explained by the variation in X.
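The SST/SSR arithmetic from the worked example can be checked with a short script (a sketch; variable names are mine):

```python
# Data and coefficients from the worked example.
X = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
Y = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]
b0, b1 = 41.8, 4.55

y_mean = sum(Y) / len(Y)
Y_hat = [b0 + b1 * x for x in X]

sst = sum((y - y_mean) ** 2 for y in Y)          # total sum of squares
ssr = sum((yh - y_mean) ** 2 for yh in Y_hat)    # explained by the regression
r_squared = ssr / sst
```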
Gradient Descent
Hypothesis
"A proposed explanation made on the basis of limited evidence, as a starting point for further investigation."

  h(x) = b0 + b1x

Find values of b0 and b1 such that h(x) ≈ Y for the given observations.

[Figure: candidate straight lines drawn through the scatter of observations]
Example of Linear Regression
Hrs Studied (X)  Marks (Y)
      0             40
      2             52
      3             53
      4             55
      4             56
      5             72
      6             71
      6             88
      7             56
      7             74
      8             89
      9             67
      9             89

[Figure: scatter plot of Marks (0–100) against Hrs Studied (0–10)]
Cost Function
Hypothesis: h(x) = b0 + b1x
Cost:

  C(b0, b1) = (1/2n) · 𝛴ᵢ₌₁ⁿ (ŷᵢ – yᵢ)²
Cost Function

Hypothesis: h(x) = b0 + b1x. Try b0 = 0 and b1 = 1: every prediction is simply Ŷ = X, so for X = 0 the predicted mark is 0, for X = 2 it is 2, and so on.
Cost Function

Hypothesis: h(x) = b0 + b1x, with b0 = 0 and b1 = 1.

Hrs Studied (X)  Marks (Y)  Predicted Ŷ  (Y – Ŷ)²
      0             40           0          1600
      2             52           2          2500
      3             53           3          2500
      4             55           4          2601
      4             56           4          2704
      5             72           5          4489
      6             71           6          4225
      6             88           6          6724
      7             56           7          2401
      7             74           7          4489
      8             89           8          6561
      9             67           9          3364
      9             89           9          6400

  C(0, 1) = (1/2n) · 𝛴ᵢ₌₁ⁿ (ŷᵢ – yᵢ)² = 1944.538
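The table's result C(0, 1) = 1944.538 can be verified directly; a sketch with a hypothetical `cost` helper:

```python
def cost(b0, b1, X, Y):
    """Squared-error cost C(b0, b1) = (1/2n) * sum((ŷ - y)^2)."""
    n = len(X)
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(X, Y)) / (2 * n)

X = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
Y = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]
```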
Cost Function Plot

Holding b0 = 0 and varying b1:

  b1    Cost
   1   1944.54
   2   1610.08
   3   1311.46
   5    821.77
   7    475.46
   8    356.08
  10    224.85
  12    237.00
  14    392.54
  15    524.08
  16    691.46
  17    894.69
  18   1133.77
  20   1719.46
  21   2066.08

[Figure: Cost plotted against b1 — a convex, bowl-shaped curve with its minimum near b1 ≈ 11]
Cost Function with b0 and b1
[Figure: 3-D surface Z = C(b0, b1), with b1 on the X axis and b0 on the Y axis; plotted with https://academo.org/]
Gradient Descent
Cost Function: C(b1)

Each coefficient is repeatedly updated in the downhill direction of the cost:

  bj := bj – α ∂C/∂bj

where α is the Learning Rate.

[Figure: convex cost curve C(b1); each update steps down the slope toward the minimum]
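A minimal batch-gradient-descent sketch for the hours-studied dataset, assuming the usual gradients of the (1/2n) squared-error cost; the learning rate and iteration count are illustrative choices, not values from the course:

```python
# Gradients of C(b0, b1) = (1/2n) * sum((ŷ - y)^2):
#   ∂C/∂b0 = (1/n) * sum(ŷ - y)
#   ∂C/∂b1 = (1/n) * sum((ŷ - y) * x)
X = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
Y = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]
n = len(X)

b0, b1 = 0.0, 0.0
alpha = 0.01                 # learning rate
for _ in range(20000):       # generous iteration budget for convergence
    errors = [b0 + b1 * x - y for x, y in zip(X, Y)]
    grad_b0 = sum(errors) / n
    grad_b1 = sum(e * x for e, x in zip(errors, X)) / n
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1
```

Run long enough, the updates settle on the same line the closed-form calculation gave: b1 ≈ 4.55 and b0 ≈ 41.8.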
Gradient Descent?
[Figure: a non-convex cost surface with both a Local Minimum and a Global Minimum — gradient descent can settle in a local minimum instead of finding the global one]
Batch Gradient Descent
  bj := bj – α ∂C/∂bj

• The gradient is summed over all examples X1, X2, …, Xn before taking one step (one epoch)
• Can take a long time to reach the bottom
• The work per step depends on the number of examples, the number of features, and the learning rate
Batch Vs Stochastic Gradient Descent
Batch Gradient Descent:
• bj := bj – α ∂C/∂bj, with the gradient summed over all examples X1, X2, …, Xn
• Work per step depends on the number of examples, the number of features, and the learning rate

Stochastic Gradient Descent:
• Randomly shuffle the dataset
• Repeat the update for every example, one at a time
• Modify the coefficients at every step
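The three stochastic steps above (shuffle, visit every example, update at every step) can be sketched as follows; the seed, learning rate, and epoch count are illustrative assumptions, not course values:

```python
import random

X = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
Y = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]

def cost(b0, b1):
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(X, Y)) / (2 * len(X))

rng = random.Random(0)       # fixed seed so the run is repeatable
b0, b1 = 0.0, 0.0
alpha = 0.005
start_cost = cost(b0, b1)

for _ in range(200):                     # epochs
    data = list(zip(X, Y))
    rng.shuffle(data)                    # randomly shuffle the dataset
    for x, y in data:                    # one example at a time
        error = b0 + b1 * x - y
        b0 -= alpha * error              # modify coefficients at every step
        b1 -= alpha * error * x

end_cost = cost(b0, b1)
```

With a constant learning rate the coefficients hover near the minimum rather than settling exactly on it, which is the usual trade-off against batch gradient descent.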
Stochastic Gradient Descent
Decision Tree Regression
Decision Tree Terms
[Figure: tree diagram — the Root Node at the top; internal Decision Nodes below it; Terminal Nodes (leaves) at the bottom. A decision node together with its descendants forms a Branch or Subtree.]
[Figure: example tree — the root splits on Income Level (High / Low / Medium); the Medium branch splits on Credit Score (Low / High) and then on Employment Type (SE / Salaried). The splits yield pure subsets:

  LID  IL      CS   ET        Status
  L1   Medium  Low  SE        No
  L8   Medium  Low  SE        No        ← pure subset (all "No")

  L4   Medium  Low  Salaried  Yes       ← pure subset (all "Yes")]
Decision Tree Regression
[Figure: regression tree — the root tests X1 > 30; one branch then tests X2 < 20, the other tests X2 < 40 and then X1 < 60; the five leaves are the regions R1 … R5]
Decision Tree Regression
[Figure: the same tree shown next to the (X1, X2) feature space — the thresholds X1 = 30, X1 = 60, X2 = 20, and X2 = 40 partition the plane into the rectangular regions R1 … R5]
Decision Tree Regression
The output for every region is the mean of the training values in that region; alternatively, every leaf may have a regression line fitted to the data points within that region.

[Figure: the partitioned feature space, each region carrying its own constant or linear prediction]
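A toy one-split regression "tree" (a stump) illustrates the predict-the-region-mean idea; this is a sketch of the principle, not Azure ML's implementation, and the helper names are mine:

```python
def fit_stump(xs, ys):
    """One-split regression tree: find the threshold on x that minimizes
    the total squared error when each side predicts its own mean."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue                      # a split must leave both sides non-empty
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Two clearly separated clusters: the stump should split between them.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
stump = fit_stump(xs, ys)
```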
Boosted Decision Tree Regression
• Uses the MART gradient boosting algorithm
• Builds each regression tree in a step-wise fashion, with each new tree fitted to the errors of the ensemble so far
• Uses a predefined loss function to measure the error at each step and correct for it in the next
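MART builds depth-limited trees; as a toy sketch of the step-wise idea only, the loop below fits a one-split stump to the current residuals each round and adds a shrunken copy of it (the shrinkage `nu`, the round count, and all names are illustrative assumptions, not the Azure ML implementation):

```python
def fit_stump(xs, rs):
    """Best single split on x, each side predicting its mean residual."""
    best = None
    for t in sorted(set(xs))[:-1]:        # last value would leave the right side empty
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]                       # (threshold, left_mean, right_mean)

X = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
Y = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]

nu = 0.5                                  # learning rate (shrinkage)
pred = [sum(Y) / len(Y)] * len(Y)         # start from the overall mean
start_sse = sum((y - p) ** 2 for y, p in zip(Y, pred))

for _ in range(50):                       # boosting rounds
    residuals = [y - p for y, p in zip(Y, pred)]   # errors of the model so far
    t, lm, rm = fit_stump(X, residuals)
    pred = [p + nu * (lm if x <= t else rm) for p, x in zip(pred, X)]

end_sse = sum((y - p) ** 2 for y, p in zip(Y, pred))
```

Each round fits the loss (squared error) of the current ensemble and corrects part of it, so the training error shrinks step-wise.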
Thank You and Have a Great Time…!