The document provides an overview of regression analysis, focusing on the relationship between dependent and independent variables, and introduces concepts such as simple linear regression, ordinary least squares, and various error metrics like Mean Absolute Error and Root Mean Square Error. It explains the calculation of the coefficient of determination (R-squared) to measure how much variation in the dependent variable is explained by the independent variable. Additionally, it includes examples and formulas for understanding and applying regression techniques.

Uploaded by

sundar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views50 pages

Section+07+ +regression

The document provides an overview of regression analysis, focusing on the relationship between dependent and independent variables, and introduces concepts such as simple linear regression, ordinary least squares, and various error metrics like Mean Absolute Error and Root Mean Square Error. It explains the calculation of the coefficient of determination (R-squared) to measure how much variation in the dependent variable is explained by the independent variable. Additionally, it includes examples and formulas for understanding and applying regression techniques.

Uploaded by

sundar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 50

© Jitesh Khurkhuriya – Azure ML Online Course

Section 7
Regression



Regression Analysis

• Statistical process for estimating the relationships among variables
• Models the relationship between a dependent variable and one or more independent variables (or 'predictors')
• The predictor is a continuous variable
• Can also be used to infer causal relationships between dependent and independent variables

[Figure: scatter plot of Number of Medical Claims (Y) against Age (X)]

[Figure: the same scatter plot with a fitted regression line Y = f(X); points on the line are the predicted values]


Causal Relationship?



[Figure: four scatter plots of Y against X illustrating Strong Positive Correlation, Weak Positive Correlation, No Correlation, and Strong Negative Correlation]
Linear Regression

Simple regression:

Y = β0 + β1X

[Figure: regression line Y = f(X) through the medical-claims scatter plot; points on the line are the predicted values]

With more than one independent variable, this becomes multivariate (multiple) linear regression:

Y = β0 + β1X1 + β2X2 + β3X3 + … + βnXn
Simple Linear Regression



Simple Linear Regression

Y = b0 + b1X

where Y is the dependent variable, X is the independent variable, b0 is the intercept, and b1 is the slope.

[Figure: regression line giving the predicted value Ŷ; b0 is where the line crosses the Y axis]


Simple Linear Regression

Y = b0 + b1X

b1 = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)²

Hrs Studied (X) | Marks (Y) | X – X̄ (A) | Y – Ȳ (B) |    A² |    A·B
              0 |        40 |      -5.38 |     -26.31 | 28.99 | 141.66
              2 |        52 |      -3.38 |     -14.31 | 11.46 |  48.43
              3 |        53 |      -2.38 |     -13.31 |  5.69 |  31.73
              4 |        55 |      -1.38 |     -11.31 |  1.92 |  15.66
              4 |        56 |      -1.38 |     -10.31 |  1.92 |  14.27
              5 |        72 |      -0.38 |       5.69 |  0.15 |  -2.19
              6 |        71 |       0.62 |       4.69 |  0.38 |   2.89
              6 |        88 |       0.62 |      21.69 |  0.38 |  13.35
              7 |        56 |       1.62 |     -10.31 |  2.61 | -16.65
              7 |        74 |       1.62 |       7.69 |  2.61 |  12.43
              8 |        89 |       2.62 |      22.69 |  6.84 |  59.35
              9 |        67 |       3.62 |       0.69 | 13.07 |   2.50
              9 |        89 |       3.62 |      22.69 | 13.07 |  82.04
Mean: X̄ = 5.38, Ȳ = 66.31          Sum: ΣA² = 89.08, ΣA·B = 405.46

b1 = 405.46 / 89.08 = 4.55
Simple Linear Regression

To find b0, substitute the means (X̄ = 5.38, Ȳ = 66.31) and b1 = 4.55 into Y = b0 + b1X:

66.31 = b0 + 4.55 × 5.38
b0 = 66.31 – 24.48 = 41.8
Simple Linear Regression

With b1 = 4.55 and b0 = 41.8, the fitted regression line is:

Y = 41.8 + 4.55X
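The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not course material; the function and variable names (fit_simple_regression, hours, marks) are my own.

```python
# Fitting the simple regression line to the hours-studied data from the slides.

def fit_simple_regression(xs, ys):
    """Return (b0, b1) for Y = b0 + b1*X using the least-squares formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # b1 = sum((X - X_mean)(Y - Y_mean)) / sum((X - X_mean)^2)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    b1 = num / den
    b0 = y_mean - b1 * x_mean   # Y_mean = b0 + b1 * X_mean
    return b0, b1

hours = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
marks = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]

b0, b1 = fit_simple_regression(hours, marks)
print(round(b0, 1), round(b1, 2))  # 41.8 4.55, matching the worked example
```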
Common Regression Terms



Ordinary Least Square

[Figure: regression line Y = f(X); for a point Xi, the error is the vertical distance between the actual value Yi and the predicted value Ŷi]

Ordinary least squares chooses the line that minimises the sum of squared errors:

minimise  Σᵢ₌₁ⁿ (yᵢ – ŷᵢ)²
Mean Absolute Error

MAE = (1/n) · Σᵢ₌₁ⁿ |yᵢ – ŷᵢ|

Mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes.
Root Mean Square Error

RMSE = √( (1/n) · Σᵢ₌₁ⁿ (yᵢ – ŷᵢ)² )

• Very commonly used and makes for an excellent general-purpose error metric for numerical predictions.
• Compared to the similar Mean Absolute Error, RMSE amplifies and severely punishes large errors.


Relative Absolute Error

RAE = Σᵢ₌₁ⁿ |yᵢ – ŷᵢ|  /  Σᵢ₌₁ⁿ |yᵢ – ȳ|

[Figure: the prediction error (yᵢ – ŷᵢ) compared against the deviation of each point from the mean (yᵢ – ȳ)]
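The three error metrics can be sketched directly from their formulas. A minimal illustration; the function names and the sample values are my own, assuming y_true holds actual values and y_pred the model's predictions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))

def rae(y_true, y_pred):
    """Relative absolute error: total absolute error relative to a
    baseline that always predicts the mean of y_true."""
    y_mean = sum(y_true) / len(y_true)
    num = sum(abs(a - p) for a, p in zip(y_true, y_pred))
    den = sum(abs(a - y_mean) for a in y_true)
    return num / den

y_true = [40, 52, 53, 55]
y_pred = [41.8, 50.9, 55.45, 60.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), rae(y_true, y_pred))
```

Because RMSE squares each error before averaging, one large miss raises RMSE far more than it raises MAE, which is the "punishes large errors" point above.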
R Squared or Coefficient of
Determination



Coefficient of Determination

How much (what %) of the variation in Y is described by the variation in X?

[Figure: scatter of Y against X with the regression line]


R-Square With an Example

Recall the fitted line for the hours-studied data (X̄ = 5.38, Ȳ = 66.31), with b1 = 4.55 and b0 = 41.8:

Y = 41.8 + 4.55X
R-Square With an Example

Using Y = 41.8 + 4.55X, compute the predicted marks Ŷ, then the squared deviations of the actual and predicted values from the mean Ȳ = 66.31. For the first row, (Y – Ȳ)² = (40 – 66.31)² = 692.22 and (Ŷ – Ȳ)² = (41.8 – 66.31)² = 600.74.

Hrs Studied (X) | Marks (Y) | Predicted Marks (Ŷ) | (Y – Ȳ)² | (Ŷ – Ȳ)²
              0 |        40 |               41.80 |   692.22 |   600.74
              2 |        52 |               50.90 |   204.78 |   237.47
              3 |        53 |               55.45 |   177.16 |   117.94
              4 |        55 |               60.00 |   127.92 |    39.82
              4 |        56 |               60.00 |   106.30 |    39.82
              5 |        72 |               64.55 |    32.38 |     3.10
              6 |        71 |               69.10 |    22.00 |     7.78
              6 |        88 |               69.10 |   470.46 |     7.78
              7 |        56 |               73.65 |   106.30 |    53.88
              7 |        74 |               73.65 |    59.14 |    53.88
              8 |        89 |               78.20 |   514.84 |   141.37
              9 |        67 |               82.75 |     0.48 |   270.27
              9 |        89 |               82.75 |   514.84 |   270.27
Sum: SST = 3028.77, SSR = 1844.12


Coefficient of Determination

Sum of Squares Due to Regression:

SSR = Σᵢ₌₁ⁿ (ŷᵢ – ȳ)²

Total Sum of Squares:

SST = Σᵢ₌₁ⁿ (yᵢ – ȳ)²

[Figure: for a point Xi, the deviations of the predicted value Ŷ and the actual value Y from the mean line Ȳ]
Coefficient of Determination

R² = SSR / SST = 1844.12 / 3028.77 = 0.60886

The higher the value, the more of the variation in Y is explained by the variation in X. Here, about 61% of the variation in marks is explained by hours studied.
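The R² calculation above can be sketched as follows, using the slides' SSR/SST definition. The function name and variable names are my own.

```python
def r_squared(y_true, y_pred):
    """R^2 = SSR / SST, following the slides' definition."""
    y_mean = sum(y_true) / len(y_true)
    ssr = sum((p - y_mean) ** 2 for p in y_pred)   # sum of squares due to regression
    sst = sum((a - y_mean) ** 2 for a in y_true)   # total sum of squares
    return ssr / sst

hours = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
marks = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]
predicted = [41.8 + 4.55 * x for x in hours]
print(round(r_squared(marks, predicted), 3))  # 0.609, matching the slide
```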


Gradient Descent



Hypothesis

"Proposed explanation made on the basis of limited evidence as a starting point for further investigation"

h(x) = b0 + b1x

Find values of b0 and b1 such that Y ≈ h(x) for the given observations.

[Figure: candidate hypothesis lines through the scatter of observations]
Example of Linear Regression

Hrs Studied (X) | Marks (Y)
              0 |        40
              2 |        52
              3 |        53
              4 |        55
              4 |        56
              5 |        72
              6 |        71
              6 |        88
              7 |        56
              7 |        74
              8 |        89
              9 |        67
              9 |        89

[Figure: scatter plot of Marks (0 to 100) against Hrs Studied (0 to 10)]
Cost Function

Hypothesis: h(x) = b0 + b1x

C(b0, b1) = (1/2n) · Σᵢ₌₁ⁿ (yᵢ – ŷᵢ)²


Cost Function

Hypothesis: h(x) = b0 + b1x, with b0 = 0 and b1 = 1, so the predicted mark is Ŷ = X.

Hrs Studied (X) | Marks (Y) | Predicted (Ŷ) | (Y – Ŷ)²
              0 |        40 |             0 |     1600
              2 |        52 |             2 |     2500
              3 |        53 |             3 |     2500
              4 |        55 |             4 |     2601
              4 |        56 |             4 |     2704
              5 |        72 |             5 |     4489
              6 |        71 |             6 |     4225
              6 |        88 |             6 |     6724
              7 |        56 |             7 |     2401
              7 |        74 |             7 |     4489
              8 |        89 |             8 |     6561
              9 |        67 |             9 |     3364
              9 |        89 |             9 |     6400

Cost for b0 = 0, b1 = 1:  (1/2n) · Σ(Y – Ŷ)² = 50558 / 26 = 1944.538
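The cost computation above can be sketched directly from the formula. A minimal illustration with my own function name:

```python
# Cost function C(b0, b1) = (1/2n) * sum((y - h(x))^2) for h(x) = b0 + b1*x.
def cost(b0, b1, xs, ys):
    n = len(xs)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (2 * n)

hours = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
marks = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]
print(round(cost(0, 1, hours, marks), 3))  # 1944.538, matching the table
```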
Cost Function Plot

Holding b0 = 0 and varying b1:

b0 | b1 |    Cost
 0 |  1 | 1944.54
 0 |  2 | 1610.08
 0 |  3 | 1311.46
 0 |  5 |  821.77
 0 |  7 |  475.46
 0 |  8 |  356.08
 0 | 10 |  224.85
 0 | 12 |  237.00
 0 | 14 |  392.54
 0 | 15 |  524.08
 0 | 16 |  691.46
 0 | 17 |  894.69
 0 | 18 | 1133.77
 0 | 20 | 1719.46
 0 | 21 | 2066.08

[Figure: bowl-shaped plot of Cost against b1 (0 to 25), with the minimum near b1 = 10]
Cost Function with b0 and b1

With both parameters free, the cost is a surface Z = C(b0, b1) over the plane (X axis: b1, Y axis: b0). Such surfaces can be visualised at https://academo.org/
Gradient Descent

For the cost function C(b1), each parameter is repeatedly updated in the direction of steepest descent:

bj := bj – α · ∂C/∂bj

where α is the learning rate.

[Figure: cost curve C(b1); repeated steps move b1 downhill towards the minimum]
Gradient Descent?

[Figure: a cost surface with both a Local Minimum and a Global Minimum; gradient descent can settle in a local minimum instead of the global one]


Batch Gradient Descent

bj := bj – α · ∂C/∂bj

• Sums the error over all examples (X1, X2, …, Xn) before taking one step (an epoch)
• Can take a long time to reach the bottom
• The work per step grows with the number of examples and the number of features, and depends on the learning rate
Batch Vs Stochastic Gradient Descent

Both apply the update bj := bj – α · ∂C/∂bj.

Batch Gradient Descent:
• Uses all examples (X1, X2, …, Xn) for every step
• Work per step grows with the number of examples, the number of features, and depends on the learning rate

Stochastic Gradient Descent:
• Randomly shuffle the dataset
• Repeat the steps for every example
• Modify the coefficients at every step
Stochastic Gradient Descent

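The batch and stochastic procedures can be sketched for the simple hypothesis h(x) = b0 + b1x with squared-error cost. This is a minimal sketch: the learning rates, epoch counts, and function names are illustrative choices, not values from the course.

```python
import random

def batch_gd(xs, ys, lr=0.01, epochs=5000):
    """Batch: sum the gradient over ALL examples before taking one step."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = sum((b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum(((b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

def stochastic_gd(xs, ys, lr=0.001, epochs=5000, seed=0):
    """Stochastic: shuffle, then update the coefficients at every example."""
    b0, b1 = 0.0, 0.0
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)              # randomly shuffle the dataset
        for x, y in data:              # one update per example
            err = (b0 + b1 * x) - y
            b0 -= lr * err
            b1 -= lr * err * x
    return b0, b1

hours = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
marks = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]
print(batch_gd(hours, marks))  # both should approach b0 ≈ 41.8, b1 ≈ 4.55
```

With a fixed learning rate the stochastic version keeps jittering around the minimum rather than settling exactly on it, which is why it typically uses a smaller or decaying rate.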


Decision Tree Regression



Decision Tree Terms

[Diagram: a decision tree. The Root Node splits into branches (subtrees); internal Decision Nodes split further; Terminal Nodes, also called Leaves, end the tree]

• Root Node: the topmost node, where the first split is made
• Decision Node: an internal node that splits further
• Branch or Subtree: the section of the tree below a split
• Terminal Node (Leaf): a node that does not split; it holds the prediction


[Diagram: a classification tree splitting first on Income Level (High / Low / Medium), then on Credit Score (Low / High), then on Employment Type (SE / Salaried)]

LID | IL     | CS  | ET | Status
L1  | Medium | Low | SE | No
L8  | Medium | Low | SE | No
(Pure subset)

LID | IL     | CS  | ET       | Status
L4  | Medium | Low | Salaried | Yes
(Pure subset)
Decision Tree Regression

[Diagram: a regression tree over two features. Root split on X1 at 30. One side splits on X2 < 20 (yes → R1, no → R2). The other splits on X2 < 40; if no → R3; if yes, a further split on X1 < 60 gives R4 (yes) and R5 (no)]

[Figure: the (X1, X2) plane partitioned into the regions R1–R5 by the split thresholds X1 = 30, X1 = 60, X2 = 20, and X2 = 40]


Decision Tree Regression

• The output for every region is the mean of that region

Or

• Every leaf may have a regression line for the data points within that region

[Figure: piecewise predictions over the partitioned (X1, X2) plane]
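The core of this idea, finding a split and predicting the mean of each resulting region, can be sketched for a single feature. This is a depth-1 illustration under my own names (best_split, sse); a real regression tree recurses on each side.

```python
# Minimal decision tree regression step: choose the split threshold that
# minimises the summed squared error, then predict the mean of each region
# ("the output for every region is the mean of that region").

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) with the lowest total SSE."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs))[1:]:          # skip the minimum so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    _, t, lm, rm = best
    return t, lm, rm

xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 5, 20, 21, 19]
t, left_mean, right_mean = best_split(xs, ys)
print(t, left_mean, right_mean)  # splits at 10: means ≈ 5.33 and 20.0
```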


Boosted Decision Tree Regression

• Uses the MART (Multiple Additive Regression Trees) gradient boosting algorithm
• Builds each regression tree in a step-wise fashion
• Uses a predefined loss function to measure the error in each step; each new tree corrects the errors made so far
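The step-wise idea can be sketched with squared loss: each round fits a tiny one-split tree to the current residuals and adds a shrunken copy of it to the ensemble. This is a generic boosting sketch, not Azure ML's MART implementation; the 0.5 learning rate, round count, and names are illustrative.

```python
def fit_stump(xs, residuals):
    """One-split regression tree: a threshold plus the mean of each side."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def boost(xs, ys, rounds=50, lr=0.5):
    """Step-wise boosting: each tree is fit to the errors of the ensemble so far."""
    pred = [sum(ys) / len(ys)] * len(xs)                # start from the mean
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]   # error measured this step
        stump = fit_stump(xs, residuals)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return pred

hours = [0, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9]
marks = [40, 52, 53, 55, 56, 72, 71, 88, 56, 74, 89, 67, 89]
fitted = boost(hours, marks)
```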


Thank You and Have a Great
Time…!
