ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN
VNU-University of Engineering and Technology
INT3405 - Machine Learning
Lecture 3: Linear Regression
Duc-Trong Le & Viet-Cuong Ta
Hanoi, 09/2023
Outline
● Supervised Learning
● Linear Regression with One Variable
○ Model Representation
○ Cost Functions
○ Gradient Descent
● Linear Regression with Multiple Variables
○ Learning rate
○ Normal Equation
FIT-CS INT3405 - Machine Learning 2
Recap: Random Variables
Supervised Learning
●Supervised (Inductive) Learning
●Formalization
○ Input: x ∈ X (features)
○ Output: y ∈ Y (labels / targets)
○ Target function: f : X → Y (unknown)
○ Training Data: D = {(x⁽¹⁾, y⁽¹⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)}
○ Hypothesis: h : X → Y, chosen so that h ≈ f
○ Hypothesis space: H, the set of candidate hypotheses
A Learning Problem
[diagram: an unknown function f maps each input x to an output y; learning infers f from input–output examples]
The Statistical Learning Framework
[figures illustrating the statistical learning framework]
Hypothesis Spaces
●Linear models, e.g. h(x) = ax + b
○ Infinitely many possible hypotheses!
○ Any choice of coefficients a and b yields a possible hypothesis
● Polynomial models
● Arbitrary nonlinear models
Two Views of Learning
●Learning is the removal of our remaining uncertainty.
○ If we know that x and y are linearly related, then we can use
the training data to infer the linear function
●Learning requires guessing a good, small hypothesis class.
○ We could start with a very small / simple class, and enlarge it until it
contains a hypothesis that fits the data
●But we could be wrong
○ Our prior knowledge might be wrong
○ Our guess of the hypothesis class could be wrong
■ The smaller the hypothesis class, the more likely we are wrong
Two Strategies for Machine Learning
●Develop Languages for Expressing Prior Knowledge
○ Rule grammars and stochastic models
●Develop Flexible Hypothesis Spaces
○ Nested collections of hypotheses, rules, linear models, decision trees,
neural networks, etc.
●In either case, the key is to
○ Develop efficient algorithms for finding a hypothesis that best
approximates the target function on the training data
Key Issues in Machine Learning
● What are good hypothesis spaces?
○ Which spaces have been useful in practical applications and why?
● What algorithms can work with these spaces?
○ Are there general design principles for machine learning algorithms?
● How can we find the best hypothesis in an efficient way?
○ How to find the optimal solution efficiently (“optimization” question)
● How can we optimize accuracy on future data?
○ Known as the “overfitting” problem (i.e., “generalization” theory)
● How can we have confidence in the results?
○ How much training data is required to find an accurate hypothesis? (“statistical” question)
● Are some learning problems computationally intractable? (“computational” question)
● How can we formulate application problems as machine learning problems? (“engineering”
question)
Regression with One Variable (1)
Housing Prices (Portland, OR)
[figure: scatter plot of price (in 1000s of dollars) vs. size (feet²)]
Supervised Learning: the “right answer” is given for each example in the data.
Regression Problem: predict a real-valued output.
Regression with One Variable (2)
Training set of housing prices (Portland, OR):

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460
  1416                 232
  1534                 315
   852                 178
  …                    …
Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
Model Representation
Training Set → Learning Algorithm → Hypothesis h
h maps the size of a house (x) to an estimated price (y).

How do we represent h? Linear regression with one variable
(“univariate linear regression”):
  h_θ(x) = θ₀ + θ₁x
How do we choose the parameters θ₀, θ₁?
Formulation: Cost Function (1)
Hypothesis: h_θ(x) = θ₀ + θ₁x
Parameters: θ₀, θ₁
Cost Function: mean squared error (MSE)
  J(θ₀, θ₁) = (1/2m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
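As a sketch (my code, not from the slides), the MSE cost can be evaluated directly with NumPy on the toy housing data:

```python
import numpy as np

# Toy housing data from the slides: size (feet^2) and price (in $1000s)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1/2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(x)
    h = theta0 + theta1 * x          # hypothesis h_theta(x) for every example
    return np.sum((h - y) ** 2) / (2 * m)

print(cost(0.0, 0.2, x, y))   # cost of the line h(x) = 0.2 * x
```

Different lines give different costs; minimizing J over (θ₀, θ₁) picks the best-fitting line.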
Formulation: Cost Function (2)
Simplified (set θ₀ = 0)
Hypothesis: h_θ(x) = θ₁x
Parameters: θ₁
Cost Function: J(θ₁) = (1/2m) Σᵢ₌₁ᵐ (θ₁x⁽ⁱ⁾ − y⁽ⁱ⁾)²
Goal: minimize J(θ₁) over θ₁
Cost Function: Examples (1)–(3)
[figures: for each fixed value of θ₁, h_θ(x) is a function of x (left); J(θ₁) is a function of the parameter θ₁ (right)]
Cost Function (1)
Hypothesis: h_θ(x) = θ₀ + θ₁x
Parameters: θ₀, θ₁
Cost Function: J(θ₀, θ₁) = (1/2m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
Cost Function (2)–(5)
[figures: for fixed θ₀, θ₁, h_θ(x) is a function of x, plotted over the data (price ($) in 1000's vs. size in feet² (x)); J(θ₀, θ₁) is a function of the parameters, shown as a 3-D surface and as contour plots, where each contour collects parameter settings with equal cost]
Gradient Descent for Optimization (1)
Given some objective function J(θ₀, θ₁)
Want to optimize: min over θ₀, θ₁ of J(θ₀, θ₁)
Outline:
• Start with some initial θ₀, θ₁ (e.g., θ₀ = 0, θ₁ = 0)
• Keep changing θ₀, θ₁ to reduce J(θ₀, θ₁),
until we hopefully end up at a minimum
Gradient Descent for Optimization (2)–(3)
[figures: gradient descent steps on a cost surface from different starting points]
Gradient Descent Algorithm
Gradient descent algorithm:
  repeat until convergence {
    θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁)   (for j = 0 and j = 1)
  }
α is the learning rate parameter (rule of thumb: 0.1)
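A minimal sketch of the update rule on a one-variable objective of my own choosing (J(θ) = (θ − 3)², gradient 2(θ − 3)), to show the repeat loop:

```python
def gradient_descent_1d(grad, theta=0.0, alpha=0.1, n_iters=100):
    """Repeat the update theta := theta - alpha * grad(theta)."""
    for _ in range(n_iters):
        theta = theta - alpha * grad(theta)
    return theta

# Minimize J(theta) = (theta - 3)^2; its gradient is 2 * (theta - 3)
theta = gradient_descent_1d(lambda t: 2 * (t - 3))
print(round(theta, 4))   # converges to the minimizer theta = 3
```

With α = 0.1 each step shrinks the error by a constant factor; for this objective, any α > 1 makes the iterates diverge, previewing the learning-rate discussion below.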
Gradient Descent for Linear Regression (1)
Gradient descent algorithm:
  repeat until convergence {
    θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁)   (j = 0, 1)
  }
Linear Regression Model:
  h_θ(x) = θ₀ + θ₁x
  J(θ₀, θ₁) = (1/2m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Gradient Descent for Linear Regression (2)
Gradient descent algorithm:
  repeat until convergence {
    θ₀ := θ₀ − α (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
    θ₁ := θ₁ − α (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
  }
Update θ₀ and θ₁ simultaneously.
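The two updates can be sketched with explicit temporaries so both parameters are computed from the current values before either is overwritten (toy data of my own, chosen so the exact fit is θ₀ = 0, θ₁ = 1):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])        # perfectly linear data: y = x

theta0, theta1, alpha = 0.0, 0.0, 0.1
for _ in range(1000):
    h = theta0 + theta1 * x
    # compute both updates from the CURRENT parameters ...
    temp0 = theta0 - alpha * np.mean(h - y)
    temp1 = theta1 - alpha * np.mean((h - y) * x)
    theta0, theta1 = temp0, temp1    # ... then assign simultaneously

print(round(theta0, 3), round(theta1, 3))   # approaches 0.0 and 1.0
```

Updating θ₀ first and then reusing it inside the θ₁ update would mix old and new values, which is not the algorithm the slide specifies.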
Gradient Descent Examples (1)–(9)
[figures: left, the hypothesis h_θ(x) over the data at successive iterations (for the current fixed θ₀, θ₁, a function of x); right, the contour plot of J(θ₀, θ₁) as a function of the parameters, with the descent path moving toward the minimum]
Batch Gradient Descent
“Batch”: Each step of gradient descent uses all the
training examples.
Multivariate Linear Regression (1)
Multiple features (variables).
  Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
  2104           5                    1                  45                    460
  1416           3                    2                  40                    232
  1534           3                    2                  30                    315
   852           2                    1                  36                    178
  …              …                    …                  …                     …
Notation:
n = number of features
x⁽ⁱ⁾ = input (features) of the iᵗʰ training example
xⱼ⁽ⁱ⁾ = value of feature j in the iᵗʰ training example
Multivariate Linear Regression (2)
Hypothesis: h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
Previously: h_θ(x) = θ₀ + θ₁x
For convenience of notation, define x₀ = 1. Then
  h_θ(x) = θᵀx, with θ, x ∈ ℝⁿ⁺¹
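In code, the x₀ = 1 convention is just a column of ones prepended to the feature matrix (a sketch; the variable names are my own):

```python
import numpy as np

# Raw features per example: size, bedrooms, floors, age
X_raw = np.array([[2104, 5, 1, 45],
                  [1416, 3, 2, 40],
                  [1534, 3, 2, 30],
                  [ 852, 2, 1, 36]], dtype=float)

m = X_raw.shape[0]
X = np.hstack([np.ones((m, 1)), X_raw])   # x0 = 1 for every example; shape (m, n+1)

theta = np.zeros(X.shape[1])              # theta in R^{n+1}
h = X @ theta                             # h_theta(x) = theta^T x, all examples at once
print(X.shape, h.shape)                   # (4, 5) (4,)
```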
Gradient Descent for Multivariate LR
Hypothesis: h_θ(x) = θᵀx = θ₀x₀ + θ₁x₁ + … + θₙxₙ
Parameters: θ = (θ₀, θ₁, …, θₙ)
Cost function: J(θ) = (1/2m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Gradient descent:
  Repeat { θⱼ := θⱼ − α (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ }
  (simultaneously update θⱼ for every j = 0, …, n)
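The per-coordinate rule above collapses into one vectorized step, θ := θ − (α/m)·Xᵀ(Xθ − y). A sketch on synthetic data of my own (the true parameters are [1, 2, −1]):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=500):
    """Batch gradient descent for linear regression; X's first column is all ones."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m    # all partial derivatives at once
        theta = theta - alpha * grad        # simultaneous update of every theta_j
    return theta

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
X = np.column_stack([np.ones(50), x1, x2])
y = 1 + 2 * x1 - x2                         # noiseless target
theta = gradient_descent(X, y)
print(np.round(theta, 3))                   # close to [1, 2, -1]
```

The vectorized form makes the simultaneous update automatic: the whole gradient is computed before any component of θ changes.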
Univariate LR vs Multivariate LR
Gradient Descent
Previously (n = 1):
  Repeat {
    θ₀ := θ₀ − α (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
    θ₁ := θ₁ − α (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x⁽ⁱ⁾
  }
New algorithm (n ≥ 1):
  Repeat {
    θⱼ := θⱼ − α (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
  }
  (simultaneously update θⱼ for j = 0, …, n)
Convergence and Learning Rate
Example automatic convergence test:
declare convergence if J(θ) decreases by less than some small threshold ε in one iteration.
[figure: J(θ) vs. number of iterations]
For sufficiently small α, J(θ) should decrease on every iteration.
But if α is too small, gradient descent can be slow to converge.
If α is too large, J(θ) may not decrease on every iteration and may not converge at all.
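The automatic convergence test can be sketched as a stopping rule (the value of ε and the toy data are my choices, not from the slides):

```python
import numpy as np

def gd_until_converged(X, y, alpha=0.1, eps=1e-9, max_iters=10000):
    """Run gradient descent until J(theta) decreases by less than eps in one iteration."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])

    def cost(t):
        return np.sum((X @ t - y) ** 2) / (2 * m)

    prev = cost(theta)
    for i in range(1, max_iters + 1):
        theta = theta - alpha * X.T @ (X @ theta - y) / m
        cur = cost(theta)
        if prev - cur < eps:        # declare convergence
            return theta, i
        prev = cur
    return theta, max_iters

X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 1 + 2x
theta, iters = gd_until_converged(X, y)
print(np.round(theta, 2), iters)    # theta near [1, 2], well before max_iters
```

A looser ε stops earlier but farther from the minimum, so ε trades runtime against accuracy.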
Learning Rate
[figure: J(θ) vs. iterations for different learning-rate schedules: too small (slow convergence), constant, too large (divergence), and gradually decreased]
Normal Equation (1)
Gradient Descent
• Iterative approach
Normal Equation
• Analytical method: solve for the parameters in closed form
Intuition (1D example): if J(w) is a quadratic function of a single
parameter w, solve the equation dJ/dw = 0 to find w.
Normal Equation (2)–(3)
●Matrix-vector formulation
  X ∈ ℝᵐˣ⁽ⁿ⁺¹⁾ (design matrix, one row per training example), y ∈ ℝᵐ
  J(θ) = (1/2m) ‖Xθ − y‖²
●Analytical solution (set the gradient ∇J(θ) to zero)
  θ = (XᵀX)⁻¹ Xᵀ y
The Pseudo-inverse
If XᵀX is not invertible (e.g., redundant features, or fewer examples than features), replace (XᵀX)⁻¹Xᵀ with the Moore–Penrose pseudo-inverse X⁺, giving θ = X⁺y.
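A sketch of both formulas on the example data below (m = 4 examples but n + 1 = 5 columns, so XᵀX is singular here and the pseudo-inverse route is the one that works; `np.linalg.pinv` computes the Moore–Penrose pseudo-inverse):

```python
import numpy as np

# Design matrix with x0 = 1 (first column), from the example table
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460.0, 232.0, 315.0, 178.0])

# theta = X^+ y  (equals (X^T X)^{-1} X^T y whenever that inverse exists)
theta = np.linalg.pinv(X) @ y
print(np.round(X @ theta, 3))   # reproduces the training prices
```

With more examples than features and XᵀX well conditioned, `np.linalg.solve(X.T @ X, X.T @ y)` implements the slide's formula directly.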
Normal Equation: Example
Example:
  x₀   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
  1    2104           5                    1                  45                    460
  1    1416           3                    2                  40                    232
  1    1534           3                    2                  30                    315
  1     852           2                    1                  36                    178
θ = (XᵀX)⁻¹Xᵀy, where (XᵀX)⁻¹ is the inverse of the matrix XᵀX.
Gradient Descent vs Normal Equation
m training examples, n features.

Gradient Descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation:
• No need to choose α.
• No need to iterate.
• Need to compute (XᵀX)⁻¹.
• Slow if n is very large.
Summary
● Supervised Learning
● Linear Regression with One Variable
○ Model Representation
○ Cost Functions
○ Gradient Descent
● Linear Regression with Multiple Variables
○ Learning rate
○ Normal Equation
Duc-Trong Le