Deterministic Unconstrained
Optimisation – Part I
Rosenbrock's banana function
$J = f(x_1, x_2) = (a - x_1)^2 + b\,(x_2 - x_1^2)^2$
Global minimum is at $x_1 = a$ and $x_2 = a^2$
minimum $f = 0$
(Usually a and b are set to 1 and 100 respectively)
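The function is easy to code directly; a minimal Python/NumPy sketch with the usual a = 1, b = 100:

```python
import numpy as np

def rosenbrock(x, a=1.0, b=100.0):
    """Rosenbrock banana function J = (a - x1)^2 + b*(x2 - x1^2)^2."""
    x1, x2 = x
    return (a - x1)**2 + b * (x2 - x1**2)**2

# The global minimum lies at (a, a^2); for a = 1 this is (1, 1) with J = 0.
print(rosenbrock(np.array([1.0, 1.0])))   # -> 0.0
```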
Contents
Comments on characteristics of real-life problems
Classification of optimization problems
Deterministic optimization methods
General procedure
Gradient & Hessian
General line search methods
Steepest descent method
Conjugate gradient method
Some other methods
Unconstrained minimization
Characteristics of real-life problems:
Design variables are invariably more than one
The objective function may be non-linear
The objective function may be non-deterministic (not an
issue for the time being)
Evaluation of objective function may be expensive
Gradient or Hessian of objective function may not be
available
We discuss various deterministic methods of
optimization when the number of design variables is
more than one
We also assume that the design variables have
only side constraints (unconstrained optimisation)
Brute force method – line search along the $e_i$
Choose the unit vectors $e_i$ as the set of search directions
Minimize J by searching along the unit vectors one after the other till the function is minimum
The method fails if J has a narrow valley at an angle to the unit vectors
[figure: line-search path along $e_1$ and $e_2$ on 2D contours]
Note that a better set of directions than the $e_i$'s
should be possible. Such directions should permit
large step sizes along narrow valleys and be
"non-interfering" directions
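A minimal sketch of this brute-force coordinate search (my own illustration, using scipy.optimize.minimize_scalar for each 1D minimisation and SciPy's built-in Rosenbrock function as the test problem):

```python
import numpy as np
from scipy.optimize import minimize_scalar, rosen

def coordinate_search(J, x0, n_sweeps=50):
    """Minimise J by line searches along the unit vectors e_i, one after the other."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(n_sweeps):
        for i in range(n):
            e_i = np.zeros(n)
            e_i[i] = 1.0
            # 1D minimisation of J along the i-th coordinate direction
            alpha = minimize_scalar(lambda a: J(x + a * e_i)).x
            x = x + alpha * e_i
    return x

# On the Rosenbrock function the narrow curved valley makes progress very slow.
print(coordinate_search(rosen, [-1.2, 1.0]))
```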
Powell’s method
Powell’s method is an extension of brute force line
search method which uses basis vectors as the
search directions.
Powell’s method starts with initial guess P0 and uses
each of the basis vector direction, one after the
other, to minimise the function in n steps to locate
Pn. This step is identical to brute force line search
method.
It then locates the optimal point by line search
method using the vector given by (Pn − P0)
The method is iterative and the each iteration
requires (n+1) line searches
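For reference, SciPy ships a derivative-free implementation of Powell's method; a minimal usage sketch on the Rosenbrock function:

```python
from scipy.optimize import minimize, rosen

# Powell's method: repeated line searches along a set of directions that is
# updated with the (Pn - P0) vector after each cycle; no gradients required.
res = minimize(rosen, x0=[-1.2, 1.0], method="Powell")
print(res.x, res.fun)   # expected to approach (1, 1) with f ~ 0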
Gradient based multidimensional
unconstrained minimization
Optimization methods in n dimensions:
Gradient-based methods
    methods that do not require the Hessian
    methods that require the Hessian
Non-gradient-based methods
    deterministic: Nelder–Mead Simplex, Divided Rectangles (DIRECT) method
    non-deterministic: Genetic Algorithms, Simulated Annealing, Particle Swarm Optimization
Gradient based multidimensional
unconstrained minimization
General procedure
Assume that the mathematical statement of the
problem is ready involving
the objective function
the design variables (they must be independent) and
other parameters
Iteratively search for the optimum by
identifying the search direction along which the optimum lies
searching along that direction with a line-search method to locate
the position of the optimum
Most procedures require the objective function
and its gradient G
Some procedures also require Hessian H
Convex design space
Most optimisation algorithms assume convex
design space
A real-valued function defined on an n-
dimensional interval is convex if the line segment
between any two points on the graph of the
function is above or on the graph in a Euclidean
space
In reality design space can be non-convex.
It is essential to find out if the design space is
convex before attempting optimisation
Convex/concave design space in 2D
[figure: a convex 2D domain and a concave (non-convex) 2D domain, each showing the points $(x_1^1, x_2^1)$ and $(x_1^2, x_2^2)$ joined by a line segment]
Convex Sets
A set $S$ is convex if $a, b \in S \;\Rightarrow\; \lambda a + (1 - \lambda) b \in S$ for all $\lambda \in [0, 1]$
Convex vs non-convex function
Condition for convexity:
$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2), \quad 0 \le \lambda \le 1$
[figure: a convex function, for which the local optimum is also the global optimum and every chord lies on or above the graph, versus a non-convex function with a local optimum distinct from the global optimum]
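This condition can be spot-checked numerically along the segment between two points; the helper below is my own sketch, assuming the function accepts a NumPy array:

```python
import numpy as np
from scipy.optimize import rosen

def violates_convexity(f, x1, x2, n_lambda=21):
    """True if f(l*x1 + (1-l)*x2) > l*f(x1) + (1-l)*f(x2) for some sampled l in [0, 1]."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    for lam in np.linspace(0.0, 1.0, n_lambda):
        lhs = f(lam * x1 + (1.0 - lam) * x2)
        rhs = lam * f(x1) + (1.0 - lam) * f(x2)
        if lhs > rhs + 1e-12:          # chord lies below the graph -> not convex
            return True
    return False

def quad(x):                           # a convex quadratic
    return x @ x

print(violates_convexity(quad, [-2.0, 1.0], [3.0, -1.0]))   # False: condition always holds
print(violates_convexity(rosen, [-1.0, 1.0], [1.0, 1.0]))   # True: Rosenbrock is non-convex
```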
Gradient and Hessian
The gradient of a function is
$G = \nabla J = \begin{bmatrix} \dfrac{\partial J}{\partial x_1} \\ \dfrac{\partial J}{\partial x_2} \\ \vdots \\ \dfrac{\partial J}{\partial x_n} \end{bmatrix}$
The gradient vector is perpendicular to the hyperplane
tangent to the contour surfaces of constant J
For n = 1 and n = 2, the contour surfaces are points and
contour lines respectively
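For the Rosenbrock example SciPy supplies the analytic gradient (rosen_der); a simple forward-difference helper of my own is a common cross-check:

```python
import numpy as np
from scipy.optimize import rosen, rosen_der

def fd_gradient(J, x, h=1e-6):
    """Forward-difference approximation of the gradient G_i = dJ/dx_i."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (J(x + e) - J(x)) / h
    return g

x = np.array([-1.2, 1.0])
print(rosen_der(x))            # analytic gradient
print(fd_gradient(rosen, x))   # finite-difference check (should agree closely)
```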
Hessian and its use
The second derivative of the objective function produces
n(n+1)/2 distinct second-order partial derivatives:
$\dfrac{\partial^2 J}{\partial x_i \partial x_j}$ if $i \ne j$, and $\dfrac{\partial^2 J}{\partial x_i^2}$ if $i = j$
The second-order partial derivatives form the
Hessian matrix, a real square symmetric (n × n) matrix:
$H = \nabla^2 J = \begin{bmatrix} \dfrac{\partial^2 J}{\partial x_1^2} & \dfrac{\partial^2 J}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 J}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 J}{\partial x_2 \partial x_1} & \dfrac{\partial^2 J}{\partial x_2^2} & \cdots & \dfrac{\partial^2 J}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 J}{\partial x_n \partial x_1} & \dfrac{\partial^2 J}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 J}{\partial x_n^2} \end{bmatrix}$
We note that any real square symmetric matrix
has only real eigenvalues
has real, distinct, orthogonal eigenvectors if the eigenvalues are distinct
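As a quick check of these properties, the sketch below evaluates SciPy's Rosenbrock Hessian at the minimum (1, 1) and inspects its symmetry and eigenvalues:

```python
import numpy as np
from scipy.optimize import rosen_hess

H = rosen_hess(np.array([1.0, 1.0]))   # Hessian at the minimum (1, 1)
print(np.allclose(H, H.T))             # real symmetric matrix
print(np.linalg.eigvalsh(H))           # real eigenvalues, all positive here
```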
Hessian and its use
Near the minimum, J can be approximated as a quadratic
in X and expressed in terms of the gradient G and the
Hessian matrix H:
$J(X) = \tfrac{1}{2} X^T H X + G^T X + C$
It can be seen that the condition for a minimum is
$\nabla J(X) = \tfrac{1}{2}(H^T X + H X) + G = 0$
If H is symmetric, then $H X + G = 0$
Thus minimisation of J is identical to the solution of the
linear algebraic equations usually written as
$A X = b$ (with $A = H$ and $b = -G$)
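This equivalence can be exercised directly with a linear solve; a small sketch with an H and G of my own choosing (any symmetric positive definite H works):

```python
import numpy as np

# Quadratic model J(X) = 0.5*X^T H X + G^T X + C, with H symmetric positive definite
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
G = np.array([-1.0, -2.0])

# Minimisation of J is equivalent to solving A X = b with A = H, b = -G
x_star = np.linalg.solve(H, -G)
print(x_star)            # minimiser of the quadratic
print(H @ x_star + G)    # gradient at x_star, ~ [0, 0]
```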
Hessian and its use
Expanding J(x) about the stationary point $x^*$ in a
direction p, and noting that $G(x^*) = 0$, the behaviour of
the function at the stationary point is determined by H:
$J(x^* + \epsilon p) = J(x^*) + \epsilon\, G(x^*)^T p + \tfrac{1}{2}\epsilon^2 p^T H p = J(x^*) + \tfrac{1}{2}\epsilon^2 p^T H p$
H is a symmetric matrix, and therefore it has real
orthogonal eigenvectors, i.e.
$H u_i = \lambda_i u_i, \quad \|u_i\| = 1$
$J(x^* + \epsilon u_i) = J(x^*) + \tfrac{1}{2}\epsilon^2 u_i^T H u_i = J(x^*) + \tfrac{1}{2}\epsilon^2 \lambda_i$
Gradient and Hessian
Thus $J(x^* + \epsilon u_i)$ increases above, decreases below,
or stays equal to $J(x^*)$ depending on whether $\lambda_i$ is
positive, negative or zero
For J to be a minimum, H must be positive
definite, i.e. all its eigenvalues must be
positive
Gradient based methods
Assume that J is quadratic and G and H are
constants:
$J(x) = a + G^T x + \tfrac{1}{2} x^T H x$ and $\nabla J = G + H x$
Therefore the unique minimum of J is given by
$\nabla J = G + H x^* = 0$, or
$x^* = -H^{-1} G$
If n is very large, the method is not feasible as it
requires the inverse of the (n × n) matrix H
Realistic methods minimize the n-dimensional
function through several 1D line-minimizations.
Line search methods
Start with $X_0$ and a direction (a vector $S_0$ in n
dimensions)
Use a 1D minimization method to minimize $J(\alpha) =
J(X_0 + \alpha S_0)$, where $S_0$ (or $p_0$) is the initial search
direction and $\alpha$ is the step size.
$S_k$ is the search direction for the k-th major iteration,
and $\alpha_k$ is the step length from the line search
The important distinguishing feature of a gradient-
based algorithm is its search direction
Any line search that satisfies sufficient decrease
can be used, but one that satisfies the Strong
Wolfe conditions (on step size) is recommended.
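SciPy provides such a line search; the sketch below asks scipy.optimize.line_search for a step length satisfying the strong Wolfe conditions along the steepest-descent direction of the Rosenbrock function (default Wolfe constants assumed):

```python
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

xk = np.array([-1.2, 1.0])
pk = -rosen_der(xk)                   # steepest-descent search direction

# Returns a step length satisfying the strong Wolfe conditions (None if it fails)
alpha = line_search(rosen, rosen_der, xk, pk)[0]
if alpha is not None:
    print(alpha, rosen(xk + alpha * pk) < rosen(xk))   # sufficient decrease achieved
```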
A general gradient based method
Input: initial guess $X_0$
Output: optimum $X^*$

k ← 0
while not converged do
    Compute a search direction $S_k$
    Line search: find a step length $\alpha_k$ such that $J(X_k + \alpha_k S_k) < J(X_k)$
      (the curvature condition may also be included)
    Update the design variables: $X_{k+1} \leftarrow X_k + \alpha_k S_k$
    k ← k + 1
end while

[flow chart: start → compute search direction → line search → update X → is J minimum? → if no, repeat; if yes, stop]
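The loop above maps almost one-to-one onto code; a generic sketch (my own skeleton, with the search-direction rule passed in as a function and a simple backtracking decrease check standing in for the line search):

```python
import numpy as np
from scipy.optimize import rosen, rosen_der

def gradient_based_minimize(J, grad, x0, direction_rule, max_iter=500, tol=1e-6):
    """Generic gradient-based loop: search direction -> step length -> update."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # converged on the gradient
            break
        s = direction_rule(x, g)                 # compute a search direction S_k
        alpha = 1.0
        while J(x + alpha * s) >= J(x):          # backtrack until J(X_k + a S_k) < J(X_k)
            alpha *= 0.5
            if alpha < 1e-14:                    # no decrease found along s
                return x
        x = x + alpha * s                        # update the design variables
    return x

# With the negative gradient as the direction rule this becomes steepest descent;
# on the Rosenbrock function progress along the curved valley is slow.
print(gradient_based_minimize(rosen, rosen_der, [-1.2, 1.0], lambda x, g: -g))
```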
Standard procedure (flow chart)
Some methods do not need H(X).
[flow chart: Input $X^0$ → Analysis: calculate $J(X)$, $G(X)$, $H(X)$ → Sensitivity analysis: calculate the search direction $S^q$ → Perform 1D search: $X^q = X^{q-1} + \alpha S^q$, $q = q + 1$ → Converged? if no, repeat; if yes, stop]
The search direction
There are many algorithms
Random search
Powell method
Steepest descent
Fletcher–Reeves (FR) method
Davidon–Fletcher–Powell (DFP) method
Broyden–Fletcher–Goldfarb–Shanno (BFGS) method
Newton's method
Some of the above are explained
Newton's method – the simplest variant
If J is twice differentiable, J can be expressed by a
Taylor series in terms of G and H:
$G(X^{k+1}) \approx G(X^k) + H(X^k)\,(X^{k+1} - X^k)$
but $G(X^{k+1}) = 0$ (condition for optimality), so
$\Delta X = X^{k+1} - X^k = -H^{-1} G(X^k)$, or
$X^{k+1} = X^k - H^{-1} G(X^k) = X^k - H^{-1} \nabla J(X^k)$
The above expression is similar to Newton's
method in 1D:
$y'(x^{k+1}) \approx y'(x^k) + y''(x^k)\,(x^{k+1} - x^k)$, or
$x^{k+1} = x^k - y'(x^k)/y''(x^k)$
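A minimal sketch of this Newton iteration on the Rosenbrock function (solving $H\,\Delta X = -G$ rather than forming $H^{-1}$ explicitly); SciPy's rosen_der and rosen_hess supply the gradient and Hessian:

```python
import numpy as np
from scipy.optimize import rosen, rosen_der, rosen_hess

x = np.array([-1.2, 1.0])
for k in range(20):
    G, H = rosen_der(x), rosen_hess(x)
    if np.linalg.norm(G) < 1e-8:
        break
    x = x + np.linalg.solve(H, -G)   # X_{k+1} = X_k - H^{-1} G(X_k)
print(k, x)                          # typically reaches (1, 1) in a handful of steps
```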
A variant of Newton's method -
Method of Steepest descent
In the quasi-Newton method, the Hessian matrix
is approximated to be the identity matrix:
$X^{k+1} = X^k - \alpha\, I\, \nabla J(X^k)$
This is the method of steepest descent. It uses
the negative of the gradient of the objective function
(the steepest direction) as the search direction
Choose $0 < \alpha < 1$ for stability (as is usually done)
We may assume that the change in the magnitude
of X is the same as the one obtained in the
previous iteration. Note that $p^k = -G^k/|G^k|$, which
gives the step-length estimate
$\alpha^k = \alpha^{k-1}\, \dfrac{G^{(k-1)T} p^{k-1}}{G^{(k)T} p^k}$
A variant of Newton's method -
Method of Steepest descent
Alternatively, an analytic formula for $\alpha^k$ can be
found by assuming J is quadratic in x, with $G = -b$
and $H = A$ evaluated at $x^k$:
$J(x) = \tfrac{1}{2} x^T A x - x^T b$
$J(x^k + \alpha p^k) = \tfrac{1}{2}(x^k + \alpha p^k)^T A (x^k + \alpha p^k) - (x^k + \alpha p^k)^T b$
$= \tfrac{1}{2}\alpha^2 p^{(k)T} A p^k + \alpha\, p^{(k)T} A x^k - \alpha\, p^{(k)T} b + \text{constants}$
since A is an (n × n) symmetric and positive definite matrix
To minimise J with respect to $\alpha$, we set $dJ/d\alpha = 0$, which gives
$\alpha^k = -\dfrac{p^{(k)T} (A x^k - b)}{p^{(k)T} A p^k}$
Method of Steepest descent
Justification for quasi Newton method
$X^{k+1} = X^k - \alpha\, I\, \nabla J(X^k), \quad 0 < \alpha < 1$
Consider a Taylor expansion about $X^k$:
$J(X^k + \Delta X) - J(X^k) \approx \nabla J(X^k)^T \Delta X$
The LHS, and hence the RHS, must be negative for the step to reduce J:
$\nabla J(X^k)^T \Delta X < 0$
Choosing $\Delta X = -\alpha \nabla J(X^k)$ guarantees this, since
$\nabla J(X^k)^T \Delta X = -\alpha \|\nabla J(X^k)\|^2 < 0$
It can be seen that the method of steepest descent
uses the negative of the gradient of the
objective function as the search direction
It can be shown that the method does not give fast
convergence when close to a local minimum
Method of Steepest descent
Input: Initial guess, X0, convergence tolerances, εg, εa and εr.
Output: Optimum design variables, X*
k←0
repeat
Compute J and $G(X_k) \equiv \nabla J(X_k)$; if $|G(X_k)| < \varepsilon_g$, stop;
otherwise compute the search direction, $S_k \leftarrow -G(X_k)/|G(X_k)|$
Perform a line search to find the step length $\alpha_k$
Update the current point, $X_{k+1} \leftarrow X_k + \alpha_k S_k$
k ← k + 1
until $|J(X_k) - J(X_{k-1})| \le \varepsilon_a + \varepsilon_r |J(X_{k-1})|$
$\varepsilon_g$: absolute tolerance on the gradient (typically $10^{-6}$)
$\varepsilon_a$: absolute tolerance on the objective function (typically $10^{-2}$)
$\varepsilon_r$: relative tolerance on the objective function (typically $10^{-2}$)
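A sketch of this algorithm, assuming the 1D line search is delegated to scipy.optimize.minimize_scalar; the tolerance defaults here are tighter than the typical values quoted above so that the slow progress is visible:

```python
import numpy as np
from scipy.optimize import minimize_scalar, rosen, rosen_der

def steepest_descent(J, grad, x0, eps_g=1e-6, eps_a=1e-10, eps_r=1e-10, max_iter=5000):
    """Steepest descent with a delegated 1D line search."""
    x = np.asarray(x0, dtype=float)
    J_old = J(x)
    for k in range(max_iter):
        G = grad(x)
        if np.linalg.norm(G) < eps_g:                      # gradient tolerance
            break
        S = -G / np.linalg.norm(G)                         # normalised steepest-descent direction
        alpha = minimize_scalar(lambda a: J(x + a * S)).x  # line search for the step length
        x = x + alpha * S
        J_new = J(x)
        if abs(J_new - J_old) <= eps_a + eps_r * abs(J_old):   # successive-reduction check
            break
        J_old = J_new
    return x, k

x_opt, iters = steepest_descent(rosen, rosen_der, [-1.2, 1.0])
print(x_opt, iters)   # many iterations: slow crawl along the banana valley is expected
```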
Method of Steepest descent
$|J(X_{k+1}) - J(X_k)| \le \varepsilon_a + \varepsilon_r |J(X_k)|$ is a check on the successive
reductions in J
If J is of order 1, $\varepsilon_r$ dominates; if J is much smaller than 1, then
the absolute tolerance $\varepsilon_a$ dominates
The method of steepest descent has the problem that, with
an exact line search, the steepest-descent direction at
each iteration is orthogonal to the previous one:
$\dfrac{dJ(X^{k+1})}{d\alpha^k} = 0$
$\dfrac{\partial J(X^{k+1})}{\partial X^{k+1}} \cdot \dfrac{\partial (X^k + \alpha^k S^k)}{\partial \alpha^k} = 0$
$\nabla J(X^{k+1})^T S^k = 0$
$G^T(X^{k+1})\, G(X^k) = 0$
Method of Steepest descent
Example: $J(X) = \tfrac{1}{2}(x_1^2 + 10\, x_2^2)$
The method is inefficient because successive search directions
are perpendicular to each other.
The error decreases in the first few iterations, but the method is
slow near the minimum.
The algorithm is guaranteed to converge, but the number of
iterations required can be arbitrarily large. The rate of convergence is linear.
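The orthogonality of successive directions can be reproduced in a few lines by applying steepest descent with the exact (analytic) step length to this quadratic, where $A = \mathrm{diag}(1, 10)$ and $b = 0$ so that $G = Ax$ (starting point chosen arbitrarily):

```python
import numpy as np

A = np.diag([1.0, 10.0])          # J(x) = 0.5 * x^T A x, i.e. 0.5*(x1^2 + 10*x2^2)
x = np.array([10.0, 1.0])         # arbitrary starting point
for k in range(8):
    G = A @ x                     # gradient of the quadratic
    p = -G                        # steepest-descent direction
    alpha = -(p @ (A @ x)) / (p @ (A @ p))   # exact line-search step (b = 0)
    x_new = x + alpha * p
    # successive gradients (and hence directions) are orthogonal: ~0 up to round-off
    print(k, x_new, (A @ x_new) @ G)
    x = x_new
```

The iterates zigzag toward the origin, with the error shrinking only linearly per step.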
Steepest descent – graphical interpretation
[figure: contours of the objective with the zigzag path of the steepest-descent iterates]
The method suffers from poor convergence
Lecture Ends