Optimization for ML (2)
CS771: Introduction to Machine Learning
Piyush Rai
The Plan
Some basic techniques for solving optimization problems
First-order optimality
Gradient descent
Dealing with non-differentiable functions
Sub-gradients and sub-differential
Optimization Problems in ML
The general form of an optimization problem in ML will usually be

    \hat{\mathbf{w}} = \arg\min_{\mathbf{w} \in \mathcal{C}} L(\mathbf{w})

Here L denotes the loss function to be optimized, usually a sum of the training error and a regularizer
\mathcal{C} is the constraint set that the solution must belong to, e.g.,
  Non-negativity constraint: all entries in \mathbf{w} must be non-negative
  Sparsity constraint: \mathbf{w} is a sparse vector with at most K non-zeros
(It is possible to have constraints of both kinds at once. The linear and ridge regression problems we saw earlier were unconstrained: \mathbf{w} was a real-valued vector.)
If no \mathcal{C} is specified, it is an unconstrained optimization problem
Constrained optimization problems can be converted into unconstrained ones (will see later)
For now, assume we have an unconstrained optimization problem
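For concreteness, the "training error + regularizer" objective for ridge regression can be written in a few lines of Python (a minimal sketch; the function name and toy data are illustrative assumptions, not from the slides):

```python
import numpy as np

def ridge_objective(w, X, y, lam):
    """L(w) = training error (squared loss) + L2 regularizer."""
    train_error = np.sum((y - X @ w) ** 2)  # sum of per-example squared errors
    regularizer = lam * np.sum(w ** 2)      # penalizes large weights
    return train_error + regularizer

# Tiny example: 3 training examples, 2 features
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(ridge_objective(np.zeros(2), X, y, lam=0.1))  # → 14.0 (all-zeros model)
```

Minimizing this objective over all real-valued w is an unconstrained problem; adding, say, a non-negativity requirement on w would make it constrained.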
Methods for Solving Optimization Problems
Method 1: Using First-Order Optimality
Very simple. We already used this approach for linear and ridge regression
Called "first-order" since only the gradient is used, and the gradient provides the first-order information about the function being optimized
First-order optimality: the gradient must be equal to zero at the optima

    \nabla L(\mathbf{w}) = 0

This approach works only for very simple problems where the objective is convex and there are no constraints on the values \mathbf{w} can take
Sometimes, setting \nabla L(\mathbf{w}) = 0 and solving for \mathbf{w} gives a closed-form solution
If a closed-form solution is not available, the gradient vector can still be used in iterative optimization algorithms, such as gradient descent
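For example, applying first-order optimality to the ridge objective \sum_n (y_n - \mathbf{w}^\top \mathbf{x}_n)^2 + \lambda \|\mathbf{w}\|^2 gives the closed-form solution \mathbf{w} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}. A minimal sketch (the function name and synthetic data are my own):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve the first-order optimality condition grad L(w) = 0,
    i.e. (X^T X + lam * I) w = X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                            # noiseless targets
print(ridge_closed_form(X, y, lam=1e-8))  # ≈ w_true since lam is tiny and data is noiseless
```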
Method 2: Iterative Optimiz. via Gradient Descent
Can I used this approach For max. problems we Iterative since it requires
to solve maximization several steps/iterations to find
can use gradient ascent
problems? the optimal solution
Fact: Gradient gives the For convex functions, Good
direction of steepest Will move in the
GD will converge to initialization
change in function’s direction of the gradient
the global minima needed for non-
value Gradient Descent convex functions
The learning rate
very imp. Should be
Initialize as set carefully (fixed
or chosen
adaptively). Will
For iteration (or until convergence) discuss some
Calculate the gradient using the current iterates strategies later
Set the learning rate Will see the Sometimes may be
justification shortly tricky to to assess
Move in the opposite direction of gradient convergence? Will
(𝑡 +1) (𝑡 ) (𝑡 ) see some methods
𝒘 =𝒘 −𝜂 𝒈
𝑡 later
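The loop above can be sketched in a few lines of Python (a minimal sketch; the function names and the toy objective are illustrative assumptions, and the learning rate is kept fixed for simplicity):

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, iters=200):
    """Gradient descent with a fixed learning rate:
    w^(t+1) = w^(t) - eta * g^(t)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        w = w - eta * grad(w)  # move opposite to the gradient
    return w

# Minimize the convex function L(w) = (w - 3)^2, whose gradient is 2(w - 3)
w_star = gradient_descent(lambda w: 2 * (w - 3.0), w0=[0.0])
print(w_star)  # → close to [3.0], the global minimum
```

With a fixed eta = 0.1 each update here is w ← 0.8 w + 0.6, a contraction toward the minimizer 3; too large an eta would make the iterates diverge, which is why the learning rate matters so much.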
Gradient Descent: An Illustration
[Figure: two plots of a loss L(\mathbf{w}). Left: a convex loss; where the gradient is positive we move in the negative direction, where it is negative we move in the positive direction, and the iterates \mathbf{w}^{(0)}, \mathbf{w}^{(1)}, \mathbf{w}^{(2)}, \mathbf{w}^{(3)} reach the global minimum \mathbf{w}^*. Right: a non-convex loss; a poor initialization gets stuck at a local minimum, while a good initialization finds the global minimum. Both the learning rate and the initialization are very important.]
GD: An Example
Let's apply GD for least squares linear regression

    L(\mathbf{w}) = \sum_{n=1}^{N} (y_n - \mathbf{w}^\top \mathbf{x}_n)^2

The gradient:

    \mathbf{g} = \nabla L(\mathbf{w}) = -2 \sum_{n=1}^{N} (y_n - \mathbf{w}^\top \mathbf{x}_n)\, \mathbf{x}_n

Each GD update will be of the form

    \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + 2 \eta_t \sum_{n=1}^{N} (y_n - \mathbf{w}^{(t)\top} \mathbf{x}_n)\, \mathbf{x}_n

Here y_n - \mathbf{w}^{(t)\top} \mathbf{x}_n is the prediction error of the current model on the n-th training example; training examples on which the current model's error is large contribute more to the update
Exercise: Assume a single training input (\mathbf{x}_n, y_n), and show that the GD update improves the prediction on it, i.e., \mathbf{w}^{(t+1)\top} \mathbf{x}_n is closer to y_n than \mathbf{w}^{(t)\top} \mathbf{x}_n is
This is sort of a proof that GD updates are "corrective" in nature (and it actually is true not just for linear regression but can also be shown for various other ML models)
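These updates can be sketched on synthetic data as follows (the names and data are my own; a small fixed learning rate keeps the sum-of-squares gradient stable):

```python
import numpy as np

def gd_least_squares(X, y, eta=0.001, iters=2000):
    """GD for L(w) = sum_n (y_n - w^T x_n)^2.
    Gradient: g = -2 * sum_n (y_n - w^T x_n) x_n, so examples with
    larger prediction error contribute more to each update."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        err = y - X @ w       # per-example prediction errors
        g = -2 * X.T @ err    # gradient of the squared-error loss
        w = w - eta * g       # GD update
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
w_true = np.array([2.0, -1.0])
y = X @ w_true                # noiseless targets, so GD can recover w_true
print(gd_least_squares(X, y))
```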
Dealing with Non-differentiable Functions
In many ML problems, the objective function will be non-differentiable
Some examples that we have already seen: linear regression with absolute loss, or Huber loss, or \epsilon-insensitive loss; even the \ell_1 norm regularizer is non-differentiable
[Figure: plots of the absolute loss |y_n - f(\mathbf{x}_n)|, the Huber loss (quadratic on [-\delta, \delta], linear outside), and the \epsilon-insensitive loss |y_n - f(\mathbf{x}_n)| - \epsilon (zero on [-\epsilon, \epsilon]); each has points where it is not differentiable]
Basically, any function whose plot has a kink is non-differentiable at the kink
At such points, gradients are not defined. Reason: we can't define a unique tangent at such points
Sub-gradients
For a convex non-differentiable function, we can define sub-gradients at the point(s) of non-differentiability
[Figure: a convex function f(x); being convex, it lies above all its tangents. At a differentiable point x_1 there is a unique tangent; at a non-differentiable point x_2 the two extreme tangents bound a region containing all the sub-gradients]
For a convex, non-differentiable function f, a sub-gradient at \mathbf{x}^* is any vector \mathbf{g} such that

    f(\mathbf{x}) \geq f(\mathbf{x}^*) + \mathbf{g}^\top (\mathbf{x} - \mathbf{x}^*) \quad \forall \mathbf{x}
Sub-gradients, Sub-differential, and Some Rules
The set of all sub-gradients at a non-differentiable point \mathbf{x}^* is called the sub-differential

    \partial f(\mathbf{x}^*) \triangleq \{ \mathbf{g} : f(\mathbf{x}) \geq f(\mathbf{x}^*) + \mathbf{g}^\top (\mathbf{x} - \mathbf{x}^*) \;\; \forall \mathbf{x} \}

Some basic rules of sub-differential calculus to keep in mind
  Scaling rule: \partial(\alpha f) = \alpha \, \partial f for \alpha > 0
  Sum rule: \partial(f_1 + f_2) = \partial f_1 + \partial f_2
  Affine transform rule: if g(\mathbf{x}) = f(\mathbf{A}\mathbf{x} + \mathbf{b}), then \partial g(\mathbf{x}) = \mathbf{A}^\top \partial f(\mathbf{A}\mathbf{x} + \mathbf{b}) (a special case of the more general chain rule)
  Max rule: if f = \max(f_1, f_2), then we calculate \partial f at \mathbf{x} as follows: \partial f_1 if f_1(\mathbf{x}) > f_2(\mathbf{x}); \partial f_2 if f_2(\mathbf{x}) > f_1(\mathbf{x}); any convex combination of sub-gradients of f_1 and f_2 if f_1(\mathbf{x}) = f_2(\mathbf{x})
\mathbf{x}^* is a stationary point for a non-differentiable function f if the zero vector belongs to the sub-differential at \mathbf{x}^*, i.e., \mathbf{0} \in \partial f(\mathbf{x}^*)
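As a small worked instance of the max rule, note that |x| = \max(x, -x); a sketch (the helper name and the tie-breaking argument, which must lie in [-1, 1], are my own):

```python
def abs_subgradient(x, tie_value=0.0):
    """Sub-gradient of |x| = max(f1, f2) with f1(x) = x and f2(x) = -x.
    Max rule: use the derivative of the active function away from the tie;
    at the tie x = 0, any value in [-1, 1] is a valid sub-gradient."""
    if x > 0:
        return 1.0       # f1 = x is active, f1'(x) = 1
    if x < 0:
        return -1.0      # f2 = -x is active, f2'(x) = -1
    return tie_value     # default 0 = convex combination 0.5*(1) + 0.5*(-1)

print(abs_subgradient(2.5))   # → 1.0
print(abs_subgradient(-0.1))  # → -1.0
print(abs_subgradient(0.0))   # → 0.0; since 0 is in the sub-differential, x = 0 is a stationary point
```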
Sub-Gradient For Absolute Loss Regression
The loss function for linear regression with absolute loss:

    L(\mathbf{w}) = \sum_{n=1}^{N} | y_n - \mathbf{w}^\top \mathbf{x}_n |

Non-differentiable at any \mathbf{w} where y_n - \mathbf{w}^\top \mathbf{x}_n = 0 for some n
Can use the affine transform rule of sub-differential calculus
Assume t_n = y_n - \mathbf{w}^\top \mathbf{x}_n and f(t) = |t|. Then

    \partial f(t_n) = 1 if t_n > 0, \quad -1 if t_n < 0, \quad any g \in [-1, 1] if t_n = 0

and a sub-gradient of L at \mathbf{w} is -\sum_{n=1}^{N} \partial f(t_n) \, \mathbf{x}_n (choosing one element from each \partial f(t_n))
Sub-Gradient Descent
Suppose we have a non-differentiable function L
Sub-gradient descent is almost identical to GD, except we use sub-gradients

Sub-Gradient Descent
Initialize \mathbf{w} as \mathbf{w}^{(0)}
For iteration t = 0, 1, 2, \ldots (or until convergence)
  Calculate a sub-gradient \mathbf{g}^{(t)} at the current iterate
  Set the learning rate \eta_t
  Move in the opposite direction of the sub-gradient:

    \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta_t \mathbf{g}^{(t)}
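Putting the pieces together for absolute-loss regression (a sketch with assumed names and synthetic data; the loss is averaged over examples and the step size decays as 1/\sqrt{t}, a common choice for sub-gradient methods):

```python
import numpy as np

def subgradient_descent_abs(X, y, iters=5000):
    """Sub-gradient descent for L(w) = (1/N) sum_n |y_n - w^T x_n|."""
    N, D = X.shape
    w = np.zeros(D)
    for t in range(1, iters + 1):
        s = np.sign(y - X @ w)     # sign(t_n); np.sign gives 0 at the kink, a valid sub-gradient
        g = -(X.T @ s) / N         # sub-gradient of the averaged loss
        eta_t = 0.5 / np.sqrt(t)   # decaying learning rate
        w = w - eta_t * g          # same update as GD, with a sub-gradient
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true                     # noiseless targets
print(subgradient_descent_abs(X, y))  # approaches w_true
```

The decaying step size matters here: unlike the gradient, the sub-gradient's magnitude does not shrink near the optimum, so a fixed learning rate would leave the iterates oscillating around the minimizer.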
Coming up next
Making GD faster: Stochastic gradient descent
Constrained optimization
Co-ordinate descent
Alternating optimization
Practical issues in optimization for ML