AI VIETNAM
All-in-One Course
Machine Learning
Nonlinear Regression
Nguyen Quoc Thai
Year 2023
CONTENT
(1) – Linear Regression
(2) – Nonlinear Regression
(3) – Polynomial Regression
(4) – Multivariable Polynomial Regression
(5) – Summary
1 – Linear Regression
! Problem
Data (learning)          Prediction
Level   Salary           Level   Salary
0       8                3.5     ???
1       15               10      ???
2       18
3       22
4       26
5       30
6       38
7       47
1 – Linear Regression
! Problem
Data Visualization

Level   Salary
0       8
1       15
2       18
3       22
4       26
5       30
6       38
7       47

[Plot: the data points lie close to the line y = 6x + 7, i.e. y = f(x) is a linear function]
1 – Linear Regression
! Linear Regression
Data: the Level/Salary table above
Modeling: y = ax + b. Find a and b to fit the data.
Visualization: the fitted line y = 6x + 7; y = f(x) is a linear function
1 – Linear Regression
! Linear Regression using Gradient Descent
Modeling: y = ax + b
Init θ: a = 2, b = 2  →  y = 2x + 2,  learning rate lr = 0.1
Data: one sample, x = 2 (feature vector x = [1  2]ᵀ), y = 18
Prediction: ŷ = 2·2 + 2 = 6

[Plot: the initial line y = 2x + 2 and the sample point (2, 18)]
1 – Linear Regression
! Linear Regression using Gradient Descent
Modeling: y = ax + b
Init θ: a = 2, b = 2  →  y = 2x + 2,  learning rate lr = 0.1
Data: one sample, x = 2 (feature vector x = [1  2]ᵀ), y = 18
Prediction: ŷ = 2·2 + 2 = 6

Loss: L = (6 − 18)² = 144
(the squared difference between the predicted and the actual value)
1 – Linear Regression
! Linear Regression using Gradient Descent
Modeling: y = ax + b
Init θ: a = 2, b = 2  →  y = 2x + 2,  learning rate lr = 0.1
Data: one sample, x = 2 (feature vector x = [1  2]ᵀ), y = 18
Prediction: ŷ = 2·2 + 2 = 6
Loss: L = (ŷ − y)² = (6 − 18)² = 144

Gradient: L′ = 2(ŷ − y) · [1  2]ᵀ = [−24  −48]ᵀ
Update:  θ ← θ − lr · L′ = [2  2]ᵀ − 0.1 · [−24  −48]ᵀ = [4.4  6.8]ᵀ
1 – Linear Regression
! Linear Regression using Gradient Descent
Modeling: y = ax + b
Updated: θ = [4.4  6.8]ᵀ  →  y = 6.8x + 4.4

[Plot: the updated line y = 6.8x + 4.4 moves toward the data, compared with the initial line y = 2x + 2]
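A minimal NumPy sketch of this single update step (variable names are illustrative): it reproduces the numbers above, i.e. prediction 6, loss 144, gradient [−24, −48], updated θ = [4.4, 6.8].

```python
import numpy as np

theta = np.array([2.0, 2.0])      # [b, a]: initial model y = 2x + 2
lr = 0.1                          # learning rate
x = np.array([1.0, 2.0])          # feature vector [1, x] for the sample x = 2
y = 18.0                          # actual salary at level 2

y_hat = theta @ x                 # prediction: 2*1 + 2*2 = 6
loss = (y_hat - y) ** 2           # squared error: (6 - 18)^2 = 144
grad = 2 * (y_hat - y) * x        # dL/dtheta = [-24, -48]
theta = theta - lr * grad         # update: [4.4, 6.8] -> y = 6.8x + 4.4
print(theta)                      # [4.4 6.8]
```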
1 – Linear Regression
! Linear Regression using Gradient Descent
Data: the Level/Salary table above

Inputs / Features and Target:

$$X = \begin{bmatrix} 1 & \varphi_1(1) & \cdots & \varphi_{M-1}(1) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \varphi_1(N) & \cdots & \varphi_{M-1}(N) \end{bmatrix}, \qquad Y = \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix}$$

Weights and Prediction:

$$\theta = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_{M-1} \end{bmatrix}, \qquad \hat{Y} = \begin{bmatrix} \theta_0 + \theta_1\varphi_1(1) + \cdots + \theta_{M-1}\varphi_{M-1}(1) \\ \vdots \\ \theta_0 + \theta_1\varphi_1(N) + \cdots + \theta_{M-1}\varphi_{M-1}(N) \end{bmatrix} = X\theta$$
1 – Linear Regression
! Linear Regression using Gradient Descent
Data (learning):

$$X = \begin{bmatrix} 1 & \varphi_1(1) & \cdots & \varphi_{M-1}(1) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \varphi_1(N) & \cdots & \varphi_{M-1}(N) \end{bmatrix}, \qquad Y = \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix}$$

Modeling: for y = ax + b,

$$\theta = \begin{bmatrix} b \\ a \end{bmatrix}; \qquad \text{in general,} \quad \theta = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_{M-1} \end{bmatrix}$$
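Putting the pieces together, a hedged NumPy sketch of full-batch gradient descent on the Level/Salary data in this matrix form (the learning rate and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

# Level/Salary data from the slides
levels = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
salary = np.array([8, 15, 18, 22, 26, 30, 38, 47], dtype=float)

# Design matrix X = [1, x] (bias column + feature), target Y
X = np.column_stack([np.ones_like(levels), levels])
Y = salary
N = len(Y)

theta = np.zeros(2)                       # theta = [b, a]
lr = 0.01                                 # illustrative learning rate

for _ in range(10000):
    Y_hat = X @ theta                     # predictions for all samples at once
    grad = (2 / N) * X.T @ (Y_hat - Y)    # gradient of the mean squared error
    theta -= lr * grad

print(theta)                              # converges to the least-squares solution for this data
```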
1 – Linear Regression
! Optimal Learning Rate
[Plots: loss over iterations for a learning rate that is too slow, optimal, and too high]
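To see these regimes concretely, a small sketch rerunning the batch update with three learning rates (the specific values are illustrative for this dataset only):

```python
import numpy as np

levels = np.arange(8, dtype=float)
salary = np.array([8, 15, 18, 22, 26, 30, 38, 47], dtype=float)
X = np.column_stack([np.ones_like(levels), levels])
N = len(salary)

for lr in (0.00001, 0.01, 0.06):  # too slow / reasonable / too high for this data
    theta = np.zeros(2)
    for _ in range(1000):
        theta -= lr * (2 / N) * X.T @ (X @ theta - salary)
    loss = np.mean((X @ theta - salary) ** 2)
    print(f"lr={lr}: loss={loss:.3g}")
# too-slow lr: loss barely decreases; too-high lr: loss blows up
```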
1 – Linear Regression
! Limitation
Modeling: y = ax + b

The main disadvantage of this technique is that the model is linear in both the
parameters and the features. This is a very restrictive assumption, since quite often
the data exhibit behaviours that are nonlinear in the features.

Extend this approach to more flexible models…
2 – Nonlinear Regression
! Moving beyond linearity
[Plot: the data with a linear fit and a polynomial fit]

Linear function:
$$\hat{y}(i) = \theta_0 + \theta_1 \varphi(i)$$

Polynomial function:
$$\hat{y}(i) = \theta_0 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2$$

Nonlinear regression estimates the output based on a nonlinear function of the features.
Notice that the prediction is still linear in the parameters but nonlinear in the features.
2 – Nonlinear Regression
! Moving beyond linearity
[Plots: fits using a 2-degree polynomial, a 3-degree polynomial, a step function, a sinusoid, and an exponential function]
3 – Polynomial Regression
! Polynomial Features
2-degree polynomial function:
$$\hat{y}(i) = \theta_0 \cdot 1 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2$$

Find θ₀, θ₁, θ₂ to fit the data.

[Plot: data with the fitted curve y = 5x² + 6x + 7]
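As a quick sketch, NumPy's polyfit performs this degree-2 fit directly (the data here are synthetic points generated from the curve above, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
y = 5 * x**2 + 6 * x + 7 + rng.normal(0, 3, x.shape)  # noisy samples of the curve

coeffs = np.polyfit(x, y, deg=2)   # least-squares fit, highest degree first
print(coeffs)                      # approximately [5, 6, 7]
```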
3 – Polynomial Regression
! Polynomial Features
2-degree polynomial function:
$$\hat{y}(i) = \theta_0 \cdot 1 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2$$

Create polynomial features:
$$\psi(\varphi(i)) = \begin{bmatrix} 1 \\ \varphi(i) \\ \varphi(i)^2 \end{bmatrix}$$

ψ(·) is referred to as a basis function: it can be seen as a function that transforms the
input in some way (in this case, raising it to powers).
3 – Polynomial Regression
! Polynomial Features
2-degree polynomial function:
$$\hat{y}(i) = \theta_0 \cdot 1 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2$$

Data                          ψ(φ(i))
Level   Salary        Input   1   φ(i)   φ(i)²
0       45000         0       1   0      0
1       50000         1       1   1      1
2       60000         2       1   2      4
3       80000         3       1   3      9
4       110000        4       1   4      16
5       160000        5       1   5      25
3 – Polynomial Regression
! Polynomial Features
2-degree polynomial function:
$$\hat{y}(i) = \theta_0 \cdot 1 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2$$

Features               Target
1   φ(i)   φ(i)²
1   0      0           45000
1   1      1           50000
1   2      4           60000
1   3      9           80000
1   4      16          110000
1   5      25          160000
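A minimal sketch of building this feature matrix with NumPy (column order matches the table):

```python
import numpy as np

level = np.array([0, 1, 2, 3, 4, 5], dtype=float)
salary = np.array([45000, 50000, 60000, 80000, 110000, 160000], dtype=float)

# psi(phi(i)) = [1, phi(i), phi(i)^2] as rows of the design matrix
X = np.column_stack([np.ones_like(level), level, level**2])
print(X)
```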
3 – Polynomial Regression
! Polynomial Features
3-degree polynomial function:
$$\hat{y}(i) = \theta_0 \cdot 1 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2 + \theta_3 \varphi(i)^3$$

Features                       Target
1   φ(i)   φ(i)²   φ(i)³
1   0      0       0           45000
1   1      1       1           50000
1   2      4       8           60000
1   3      9       27          80000
1   4      16      64          110000
1   5      25      125         160000
3 – Polynomial Regression
! Polynomial Features
Input → Features → Algorithm

Input    1   φ(i)   φ(i)²
0        1   0      0
1        1   1      1
2        1   2      4
3        1   3      9
4        1   4      16
5        1   5      25
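One way to realize this input → features → algorithm flow is a scikit-learn pipeline; a minimal sketch using the Level/Salary values from the tables above (PolynomialFeatures generates the 1, φ, φ² columns, LinearRegression fits θ):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

level = np.array([[0], [1], [2], [3], [4], [5]], dtype=float)
salary = np.array([45000, 50000, 60000, 80000, 110000, 160000], dtype=float)

# degree=2 expands each input x into [1, x, x^2]
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(level, salary)
print(model.predict([[3.5]]))   # salary estimate for level 3.5
```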
3 – Polynomial Regression
! Model
Data: the Level/Salary table above

Inputs / Features with degree b, and Target:

$$X = \begin{bmatrix} 1 & \varphi_1(1) & \cdots & \varphi_1(1)^b \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \varphi_1(N) & \cdots & \varphi_1(N)^b \end{bmatrix}, \qquad Y = \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix}$$

Weights and Prediction:

$$\theta = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_b \end{bmatrix}, \qquad \hat{Y} = \begin{bmatrix} \theta_0 + \theta_1\varphi_1(1) + \cdots + \theta_b\varphi_1(1)^b \\ \vdots \\ \theta_0 + \theta_1\varphi_1(N) + \cdots + \theta_b\varphi_1(N)^b \end{bmatrix} = X\theta$$
3 – Polynomial Regression
! Model
Both models are linear in the parameters and are trained with gradient descent on the same loss:

$$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}(i) - y(i) \right)^2$$

Nonlinear Regression Model:

$$X = \begin{bmatrix} 1 & \varphi_1(1) & \cdots & \varphi_1(1)^b \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \varphi_1(N) & \cdots & \varphi_1(N)^b \end{bmatrix},\;
Y = \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix},\;
\theta = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_b \end{bmatrix},\;
\hat{Y} = \begin{bmatrix} \theta_0 + \theta_1\varphi_1(1) + \cdots + \theta_b\varphi_1(1)^b \\ \vdots \\ \theta_0 + \theta_1\varphi_1(N) + \cdots + \theta_b\varphi_1(N)^b \end{bmatrix}$$

Linear Regression Model:

$$X = \begin{bmatrix} 1 & \varphi_1(1) \\ \vdots & \vdots \\ 1 & \varphi_1(N) \end{bmatrix},\;
Y = \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix},\;
\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix},\;
\hat{Y} = \begin{bmatrix} \theta_0 + \theta_1\varphi_1(1) \\ \vdots \\ \theta_0 + \theta_1\varphi_1(N) \end{bmatrix}$$
3 – Polynomial Regression
! Degree Choice
b-degree polynomial function:
$$\hat{y}(i) = \theta_0 \cdot 1 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2 + \cdots + \theta_b \varphi(i)^b$$

The choice of the degree of the polynomial is critical and depends on the dataset at hand.

[Plots: fits with a 1-degree, 2-degree, 3-degree, and 9-degree polynomial.
1-degree: too simple / not flexible enough; 2- and 3-degree: just right; 9-degree: overfitting]
3 – Polynomial Regression
! Degree Choice
b-degree polynomial function:
$$\hat{y}(i) = \theta_0 \cdot 1 + \theta_1 \varphi(i) + \theta_2 \varphi(i)^2 + \cdots + \theta_b \varphi(i)^b$$

The choice of the degree of the polynomial is critical and depends on the dataset at hand.

A good method for choosing the degree: K-fold cross-validation. Choose the degree
with the lowest out-of-sample error, as in the sketch below.
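A hedged scikit-learn sketch of this selection procedure (synthetic quadratic data; the fold count and degree range are illustrative choices):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=(60, 1))
y = 5 * x[:, 0]**2 + 6 * x[:, 0] + 7 + rng.normal(0, 5, 60)  # quadratic ground truth

for degree in range(1, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # 5-fold cross-validated MSE (sklearn returns negated scores)
    mse = -cross_val_score(model, x, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: out-of-sample MSE = {mse:.1f}")
# pick the degree with the lowest MSE (it should come out around 2 here)
```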
3 – Polynomial Regression
! Disadvantages
Increasing the degree of the polynomial always results in a model that is more
sensitive to stochastic noise (even if that degree is the best one obtained from
validation), especially at the boundaries (where we often have less data).
4 – Multivariable Polynomial Regression
! Extension to the multivariable case
Build a machine learning model to predict the weight of a fish based on body
measurement data for seven fish species.
4 – Multivariable Polynomial Regression
! Extension to the multivariable case
[Plots: visualization of the fish measurement data]
4 – Multivariable Polynomial Regression
! Simple Approach
Square each variable separately, with no cross term:
(a + b)² ⇒ a² + b² + a + b + 1

Multivariable input (x₁, x₂) and features:

x₁   x₂       1   x₁   x₂   x₁²   x₂²
0    2        1   0    2    0     4
1    1        1   1    1    1     1
2    2        1   2    2    4     4
3    1        1   3    1    9     1
4    2        1   4    2    16    4
5    1        1   5    1    25    1
4 – Multivariable Polynomial Regression
! Simple Approach
(a + b)² ⇒ a² + b² + a + b + 1 (per-variable powers only, no cross term); a sketch of this feature construction follows.
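A minimal NumPy sketch of the simple approach (the helper name is illustrative): each input column is raised to powers independently, so no interaction columns are produced.

```python
import numpy as np

def simple_poly_features(X, degree=2):
    """Per-variable powers only: [1, x1, x2, x1^2, x2^2, ...] -- no cross terms."""
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        cols.append(X ** d)        # element-wise power of every column
    return np.column_stack(cols)

X = np.array([[0, 2], [1, 1], [2, 2], [3, 1], [4, 2], [5, 1]], dtype=float)
print(simple_poly_features(X))     # rows match the table above
```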
4 – Multivariable Polynomial Regression
! Advanced Approach
Include the interaction term as well:
(a + b)² ⇒ a² + b² + ab + a + b + 1

Multivariable input (x₁, x₂) and features:

x₁   x₂       1   x₁   x₂   x₁²   x₂²   x₁x₂
0    2        1   0    2    0     4     0
1    1        1   1    1    1     1     1
2    2        1   2    2    4     4     4
3    1        1   3    1    9     1     3
4    2        1   4    2    16    4     8
5    1        1   5    1    25    1     5
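scikit-learn's PolynomialFeatures implements this advanced expansion directly; note its column ordering differs slightly from the table above (it emits 1, x₁, x₂, x₁², x₁x₂, x₂²):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0, 2], [1, 1], [2, 2], [3, 1], [4, 2], [5, 1]], dtype=float)

poly = PolynomialFeatures(degree=2)    # includes the cross term x1*x2
print(poly.fit_transform(X))
print(poly.get_feature_names_out())    # ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
```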
4 – Multivariable Polynomial Regression
! Practice: Fish Dataset
4 – Multivariable Polynomial Regression
! Practice: Fish Dataset
(1) – Preprocessing
4 – Multivariable Polynomial Regression
! Practice: Fish Dataset
(2) – EDA (Exploratory Data Analysis)
4 – Multivariable Polynomial Regression
! Practice: Fish Dataset
(3) – Representation + Polynomial Feature
Category feature: the fish species column
4 – Multivariable Polynomial Regression
! Practice: Fish Dataset
(3) – Representation + Polynomial Feature

Category feature → category encoding → one-hot encoding:

Index   Category     One-Hot
0       Bream        [1 0 0]
1       Parkki       [0 1 0]
2       Perch        [0 0 1]
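A minimal scikit-learn sketch of this one-hot step (the species list follows the slide; the sparse_output flag requires scikit-learn ≥ 1.2):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

species = np.array([["Bream"], ["Parkki"], ["Perch"]])

encoder = OneHotEncoder(sparse_output=False)   # dense 0/1 matrix
print(encoder.fit_transform(species))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```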
4 – Multivariable Polynomial Regression
! Practice: Fish Dataset
(4) – Modeling (see the sketch below)
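A hedged end-to-end sketch for this modeling step. It assumes the common Fish market CSV layout with columns Species, Weight, Length1, Length2, Length3, Height, Width and a file named Fish.csv; adjust the names to your actual data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("Fish.csv")              # assumed file name and schema
numeric = ["Length1", "Length2", "Length3", "Height", "Width"]

# one-hot encode the species, polynomial-expand the body measurements
features = ColumnTransformer([
    ("species", OneHotEncoder(), ["Species"]),
    ("poly", PolynomialFeatures(degree=2), numeric),
])
model = make_pipeline(features, LinearRegression())

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="Weight"), df["Weight"], random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```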
SUMMARY
(1) – Linear Regression
(2) – Nonlinear Regression
(3) – Polynomial Regression
(4) – Multivariable Polynomial Regression
(5) – Summary
Thanks!
Any questions?