
Class 5 - Machine Learning Concepts

Part II

Prof. Pedram Jahangiry


Motivation

Machine learning fundamental concepts:


• Inference and prediction
• Part I: The Model
• Part II: Evaluation metrics

• Part III: Bias-Variance tradeoff


• Part IV: Resampling methods
• Part V: Solvers/learners (GD, SGD, Adagrad, Adam, …)
• Part VI: How do machines learn?
• Part VII: Scaling the features



Part V
Solvers (GD, SGD, Adagrad, Adam, …)



Solvers (learners)!
A loss function tells us how good our model's predictions are for a given set of parameters. The loss (cost)
function has its own curve and its own gradients, and the slope of this curve tells us how to update the parameters to
make the model more accurate.
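As a concrete illustration (not from the slides), mean squared error is a common differentiable loss; here it is sketched in Python for a simple linear model y ≈ Xθ, together with its gradient:

import numpy as np

# Illustrative example: mean squared error loss for a linear model y ~ X @ theta,
# together with its gradient with respect to the parameters theta.

def mse_loss(theta, X, y):
    residuals = X @ theta - y
    return np.mean(residuals**2)

def mse_gradient(theta, X, y):
    residuals = X @ theta - y
    return 2.0 * X.T @ residuals / len(y)   # slope of the loss curve at theta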

The two most frequently used optimization algorithms when the loss function is differentiable are:
1) Gradient Descent (GD)
2) Stochastic Gradient Descent (SGD)

Gradient Descent is an iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a
function using gradient descent, one starts at some random point and takes steps proportional to the negative of the gradient of
the function at the current point.
$\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta)$

• $\theta_j$ is the model's $j$-th parameter
• $\alpha$ is the learning rate
• $J(\theta)$ is the loss function (which is differentiable)
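To make the update rule concrete, here is a minimal Python sketch (an illustration, not from the slides) that applies it to a simple one-parameter loss J(θ) = (θ − 3)²:

# Minimal gradient descent sketch for J(theta) = (theta - 3)**2,
# whose gradient is dJ/dtheta = 2 * (theta - 3).

def grad_J(theta):
    return 2.0 * (theta - 3.0)

theta = 10.0   # start at some arbitrary point
alpha = 0.1    # learning rate

for step in range(100):
    theta = theta - alpha * grad_J(theta)   # theta := theta - alpha * dJ/dtheta

print(theta)   # approaches the minimizer theta = 3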



Gradient Descent Visualization

$\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta)$

Gradient descent proceeds in epochs. An epoch consists of one full pass over the entire training set to update
each parameter. The learning rate $\alpha$ controls the size of each update.

[Figure: loss $J(\theta_j)$ plotted against the parameter $\theta_j$, illustrating the downhill steps of gradient descent]



Learning rate schedules

$\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta)$

• If $\alpha$ is too small, gradient descent can be slow.
• If $\alpha$ is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
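One common response to this trade-off is a schedule that starts with a larger α and decays it over epochs. A minimal Python sketch (an illustrative exponential decay, not a schedule prescribed by the slides):

# Illustrative exponential-decay schedule: alpha_t = alpha_0 * decay**epoch.
# The values of alpha_0 and decay are assumptions for the example.

alpha_0 = 0.5
decay = 0.9

for epoch in range(10):
    alpha = alpha_0 * decay**epoch
    print(f"epoch {epoch}: alpha = {alpha:.4f}")
    # ... run the gradient descent updates for this epoch using this alpha ...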



Beyond Gradient Descent?
Disadvantages of gradient descent:
• Full batch: the entire training set is used for every parameter update
• Sensitive to the choice of the learning rate
• Slow for large datasets

(Minibatch) Stochastic Gradient Descent is a version of the algorithm that speeds up the computation by approximating the gradient using smaller batches (subsets) of the training data. SGD itself has various “upgrades”, such as the two listed below (a minibatch sketch follows the list).
1) Adagrad
2) Adam
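A minimal sketch of the minibatch idea in Python; the names grad_J, X, y and the hyperparameter values are illustrative assumptions, with grad_J(theta, Xb, yb) returning the gradient of the loss on one batch:

import numpy as np

# Minibatch SGD sketch: shuffle the data each epoch, then update the parameters
# using the gradient computed on one small batch at a time.

def minibatch_sgd(theta, X, y, grad_J, alpha=0.01, batch_size=32, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    for epoch in range(epochs):
        idx = rng.permutation(n)                    # reshuffle once per epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            g = grad_J(theta, X[batch], y[batch])   # gradient from a small subset
            theta = theta - alpha * g               # same update rule, cheaper gradient
    return theta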



Why SGD?



SGD vs GD



Why upgrade SGD?



Beyond Stochastic Gradient Descent?
Disadvantages of stochastic gradient descent:
• It can get trapped in suboptimal local minima (for non-convex loss functions)
• The same learning rate applies to all parameter updates

SGD upgrades:
1) Momentum
2) Adagrad
3) Adam



Momentum
• Momentum is a method that helps accelerate SGD in the relevant direction and dampens
oscillations.
• Essentially, when using momentum, we push a ball down a hill. The ball accumulates
momentum as it rolls downhill, becoming faster and faster on the way.
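A minimal Python sketch of one common formulation of the momentum update, assuming the gradient for the current batch is already computed; the coefficient 0.9 is a typical default, used here as an assumption:

# Classical momentum sketch: the velocity v accumulates past gradients,
# like a ball picking up speed as it rolls downhill.

def momentum_step(theta, v, grad, alpha=0.01, beta=0.9):
    v = beta * v + grad          # exponentially decaying sum of past gradients
    theta = theta - alpha * v    # step in the accumulated direction
    return theta, v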



Adagrad
• Adaptive Gradient Algorithm is a version of SGD that scales the learning rate for each
parameter according to the history of gradients. As a result, the learning rate is
reduced for very large gradients and vice-versa.

• It adapts the learning rate to the parameters, performing smaller updates (low learning
rates) for parameters associated with frequently occurring features, and larger updates
(high learning rates) for parameters associated with infrequent features. For this
reason, it is well-suited for dealing with sparse data.
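A minimal Python sketch of the Adagrad update for a NumPy parameter vector; the names and the learning-rate/epsilon values are illustrative assumptions:

import numpy as np

# Adagrad sketch: each parameter keeps a running sum of its squared gradients,
# and its effective step size shrinks as that sum grows.

def adagrad_step(theta, G, grad, alpha=0.01, eps=1e-8):
    G = G + grad**2                                    # per-parameter gradient history
    theta = theta - alpha * grad / (np.sqrt(G) + eps)  # bigger history => smaller step
    return theta, G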



Adam

• Adaptive Moment Estimation takes both momentum and an adaptive learning rate (RMSprop) and puts them together.

• Whereas momentum can be seen as a ball running down a slope, Adam behaves like a heavy ball with friction, which thus prefers flat minima in the error surface.
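A minimal Python sketch of the Adam update, combining a momentum-style first moment with an RMSprop-style second moment; the default hyperparameter values shown are commonly used ones, included here as assumptions:

import numpy as np

# Adam sketch: m is a momentum-like average of gradients, v is an RMSprop-like
# average of squared gradients; both are bias-corrected before the update.

def adam_step(theta, m, v, grad, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad                 # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad**2              # second moment (adaptive scale)
    m_hat = m / (1 - beta1**t)                         # bias correction, t = 1, 2, ...
    v_hat = v / (1 - beta2**t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v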



Final message!

Notice that gradient descent and its variants are not machine
learning algorithms. They are solvers of minimization problems
in which the function to minimize has a gradient (at most points
of its domain).



Question of the day!
