May 27th 2015
Numerical Optimization:
Basic Concepts and Algorithms
R. Duvigneau
Outline
I Some basic concepts in optimization
I Some classical descent algorithms
I Some (less classical) semi-deterministic approaches
I Illustrations on various analytical problems
I Constrained optimality
I Some algorithms to account for constraints
Some basic concepts
Problem description
Definition of a single-criterion parametric problem with real unknowns (a numerical sketch follows)
Minimize f(x), x ∈ Rn (cost function)
subject to gi(x) = 0, i = 1, ..., l (equality constraints)
and hj(x) > 0, j = 1, ..., m (inequality constraints)
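A minimal sketch (toy cost and constraints of my own choosing, not from the slides) of how such a problem can be posed numerically with scipy:

```python
# Minimal sketch (toy functions, not from the slides): encoding
#   min f(x)  s.t.  g(x) = 0  and  h(x) >= 0
import numpy as np
from scipy.optimize import minimize

def f(x):                       # cost function (toy example)
    return np.sum(x**2)

def g(x):                       # equality constraint, g(x) = 0
    return x[0] + x[1] - 1.0

def h(x):                       # inequality constraint, h(x) >= 0
    return x[1] - 0.2

x0 = np.array([2.0, 2.0])       # initial guess
res = minimize(f, x0,
               constraints=[{"type": "eq", "fun": g},
                            {"type": "ineq", "fun": h}])
print(res.x)                    # approximately [0.5, 0.5]
```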
What does your cost function look like?
Illustrations: a convex problem, a multi-modal problem, a noisy problem
Some commonly used algorithms
I Descent methods: adapted to convex cost functions
steepest descent, conjugate gradient, quasi-Newton, Newton, etc.
I Evolutionary methods: adapted to multi-modal cost functions
genetic algorithms, evolution strategies, particle swarm, ant colony, simulated annealing, etc.
I Pattern search methods: adapted to noisy cost functions
Nelder-Mead simplex, Torczon's multidirectional search, etc.
Optimality conditions
Definition of a minimum
x⋆ is a minimum of f : Rn → R if and only if there exists ρ > 0 such that:
I f is defined on B(x⋆, ρ)
I f(x⋆) < f(y) ∀y ∈ B(x⋆, ρ), y ≠ x⋆
→ not very useful for building algorithms...
Characterization
A sufficient condition for x⋆ to be a minimum (if f is twice differentiable; a numerical check is sketched below):
I ∇f(x⋆) = 0 (stationarity of the gradient vector)
I ∇²f(x⋆) > 0 (Hessian matrix positive definite)
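A minimal numerical check of these two conditions, assuming a toy cost function and finite-difference derivatives (all names are illustrative, not from the slides):

```python
# Minimal numerical check of the two conditions (toy function, assumed
# finite-difference helpers; names are illustrative).
import numpy as np

def f(x):                                       # toy cost function
    return (x[0] - 1.0)**2 + 2.0 * (x[1] + 0.5)**2

def grad_fd(f, x, eps=1e-6):                    # central-difference gradient
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def hess_fd(f, x, eps=1e-4):                    # central-difference Hessian
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        H[:, i] = (grad_fd(f, x + e) - grad_fd(f, x - e)) / (2 * eps)
    return 0.5 * (H + H.T)                      # symmetrize

x_star = np.array([1.0, -0.5])
print(np.linalg.norm(grad_fd(f, x_star)))       # ~0: stationarity
print(np.linalg.eigvalsh(hess_fd(f, x_star)))   # all > 0: positive definite
```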
Some classical descent algorithms
Descent methods
Model algorithm
For each iteration k (starting from xk), as sketched in the code below:
I Evaluate the gradient ∇f(xk)
I Define a search direction dk(∇f(xk))
I Line search: choose a step length ρk
I Update: xk+1 = xk + ρk dk
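A minimal sketch of this model loop, assuming a steepest-descent direction and a naive constant step length (the line-search strategies are discussed later):

```python
# Minimal sketch of the model descent loop (steepest-descent direction,
# naive constant step; line-search strategies are discussed later).
import numpy as np

def descent(grad, x0, rho=0.05, n_iter=200, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        g = grad(x)                  # evaluate gradient at x_k
        if np.linalg.norm(g) < tol:  # stop near a stationary point
            break
        d = -g                       # search direction d_k
        x = x + rho * d              # update x_{k+1} = x_k + rho_k d_k
    return x

# usage on the quadratic f(x) = x0^2 + 10 x1^2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(descent(grad, [1.0, 1.0]))     # tends towards [0, 0]
```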
Choice of the search direction
Steepest-descent method:
I dk = −∇f (xk )
I Descent condition ensured:
∇f (xk ) · dk = −∇f (xk ) · ∇f (xk ) < 0
I But this yields an oscillatory path:
dk+1 · dk = (−∇f (xk+1 )) · dk = 0 (if
exact line search)
I Linear convergence rate:
limk→∞ ‖xk+1 − x⋆‖ / ‖xk − x⋆‖ = a > 0
Illustration of steepest-descent path
Choice of the search direction
Quasi-Newton method:
I dk = −Hk⁻¹ · ∇f(xk), where Hk is an approximation of the Hessian matrix ∇²f(xk)
I Hk should fulfill the following conditions:
I Symmetry
I Positive definiteness, so that the descent condition holds: ∇f(xk) · dk = −∇f(xk) · Hk⁻¹ · ∇f(xk) < 0
I 1D approximation of the curvature (secant condition): Hk+1(xk+1 − xk) = ∇f(xk+1) − ∇f(xk)
I Ex: BFGS method (sketched in code below):
Hk+1 = Hk − (Hk sk skᵀ Hkᵀ) / (skᵀ Hk sk) + (yk ykᵀ) / (ykᵀ sk)
where sk = xk+1 − xk and yk = ∇f(xk+1) − ∇f(xk)
Illustration of quasi-Newton method
I Super-linear convergence rate:
limk→∞ ‖xk+1 − x⋆‖ / ‖xk − x⋆‖ = 0
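A minimal sketch of a quasi-Newton iteration with the BFGS update above, assuming a unit step instead of a proper line search and a toy quadratic test problem:

```python
# Minimal sketch of a quasi-Newton iteration with the BFGS update above
# (assumptions: unit step instead of a line search, toy quadratic problem).
import numpy as np

def bfgs_update(H, s, y):
    """H_{k+1} = H_k - (H s s^T H^T)/(s^T H s) + (y y^T)/(y^T s)."""
    Hs = H @ s
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / (y @ s)

def quasi_newton(grad, x0, n_iter=50, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                     # initial Hessian approximation
    g = grad(x)
    for k in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -np.linalg.solve(H, g)         # d_k = -H_k^{-1} grad f(x_k)
        x_new = x + d                      # (a real code would do a line search here)
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g        # s_k and y_k as defined above
        if y @ s > 1e-12:                  # keep H positive definite
            H = bfgs_update(H, s, y)
        x, g = x_new, g_new
    return x

# usage on the convex quadratic f(x) = x0^2 + 2 x1^2
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
print(quasi_newton(grad, [1.0, 1.0]))      # converges to [0, 0] in a few iterations
```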
Choice of the step length
A classical criterion to ensure convergence: Armijo-Goldstein
I f(xk + ρk dk) < f(xk) + α∇f(xk) · ρk dk (Armijo)
I f(xk + ρk dk) > f(xk) + β∇f(xk) · ρk dk (Goldstein)
Illustration of Armijo-Goldstein criterion
Choice of the step length
Another criterion to ensure convergence (gradient required): Armijo-Wolfe
I f(xk + ρk dk) < f(xk) + α∇f(xk) · ρk dk (Armijo)
I ∇f(xk + ρk dk) · dk > β∇f(xk) · dk (Wolfe)
Illustration of Armijo-Wolfe criterion
Choice of the step length
The step length is determined using an iterative 1D search:
I Start from an initial guess ρk^(p) (p = 0)
I Update to ρk^(p+1) by:
I Bisection method
I Polynomial interpolation
I ...
I until stopping criteria are fulfilled
A balance is necessary between computational cost and line-search accuracy.
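A minimal sketch of such a 1D search, with assumed parameter values: a backtracking loop (bisection-style halving of the step) that stops once the Armijo sufficient-decrease condition holds:

```python
# Minimal sketch of a backtracking 1D search (assumed parameter values):
# halve the step until the Armijo sufficient-decrease condition holds.
import numpy as np

def backtracking(f, grad_fk, xk, dk, rho0=1.0, alpha=1e-4, max_halvings=30):
    fk = f(xk)
    slope = grad_fk @ dk            # directional derivative, must be < 0
    rho = rho0                      # initial guess rho_k^(0)
    for _ in range(max_halvings):
        if f(xk + rho * dk) < fk + alpha * rho * slope:   # Armijo condition
            return rho
        rho *= 0.5                  # bisection-style update rho_k^(p+1)
    return rho

# usage: steepest-descent direction on f(x) = x0^2 + 10 x1^2 at xk = [1, 1]
f = lambda x: x[0]**2 + 10.0 * x[1]**2
g = np.array([2.0, 20.0])           # gradient at xk
print(backtracking(f, g, np.array([1.0, 1.0]), -g))       # 0.0625
```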
Some (less classical) semi-deterministic approaches
Evolutionary algorithms
Principles
Inspired by the Darwinian theory of evolution:
I A population is composed of individuals with different characteristics
I The fittest individuals survive and reproduce
I An offspring population is generated from the survivors
→ Mechanisms to progressively improve the performance of the population!
Evolution strategies
Model algorithm (λ, µ)-ES
At each iteration k, a population is characterized by its mean x̄k and its variance σ̄k².
Generation of population k + 1:
I Generation of λ perturbation amplitudes σi = σ̄k e^(τ N(0,1))
I Generation of λ new individuals xi = x̄k + σi N(0, Id) (mutation)
with N(0, Id) multi-variate normal distribution
I Evaluation of the fitness of the λ individuals
I Choice of µ survivors among the λ new individuals (selection)
I Update of the population characteristics (crossover and self-adaptation), as sketched in code below:
x̄k+1 = (1/µ) Σi=1..µ xi    σ̄k+1 = (1/µ) Σi=1..µ σi
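A minimal sketch of this loop, with assumed values for λ, µ, τ and the number of generations, tested on a simple sphere function:

```python
# Minimal sketch of this ES loop (assumed values for lambda, mu, tau and the
# number of generations), tested on a simple sphere function.
import numpy as np

rng = np.random.default_rng(0)

def es_minimize(f, x0, sigma0=1.0, lam=20, mu=5, n_gen=100):
    n = len(x0)
    x_mean, sigma_mean = np.asarray(x0, dtype=float), sigma0
    tau = 1.0 / np.sqrt(2.0 * n)                 # common choice for tau (assumption)
    for _ in range(n_gen):
        sigmas = sigma_mean * np.exp(tau * rng.standard_normal(lam))   # amplitudes
        xs = x_mean + sigmas[:, None] * rng.standard_normal((lam, n))  # mutation
        fitness = np.array([f(x) for x in xs])                         # evaluation
        best = np.argsort(fitness)[:mu]                                # selection
        x_mean = xs[best].mean(axis=0)                                 # crossover
        sigma_mean = sigmas[best].mean()                               # self-adaptation
    return x_mean

print(es_minimize(lambda x: np.sum(x**2), [3.0, -2.0]))    # tends towards [0, 0]
```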
Evolution strategy
Some results
I Proof of convergence towards the global optimum in a statistical sense:
∀ε > 0, limk→∞ P(|f(x̄k) − f(x⋆)| ≤ ε) = 1
I Linear convergence rate
I Capability to avoid local optima
I Limited to a rather small number of parameters (O(10))
Illustration of evolution strategy step
Evolution strategies
CMA-ES method (Covariance Matrix Adaptation)
Improvement of the ES algorithm by using an anisotropic distribution
I The offspring population is generated using a covariance matrix Ck:
xi = x̄k + σ̄k N(0, Ck) = x̄k + σ̄k Bk Dk N(0, Id)
with Bk the matrix of eigenvectors of Ck and Dk the eigenvalue matrix to the power 1/2
I Iterative construction of the covariance matrix (sketched in code below):
C0 = Id
Ck+1 = (1 − c) Ck + (c/m) pk pkᵀ + c (1 − 1/m) Σi=1..µ ωi yi yiᵀ
where (1 − c) Ck is the previous estimation, (c/m) pk pkᵀ is a rank-one (1D) update along the evolution path pk (last moves), the last term is the covariance of the parents, and yi = (xi − x̄k)/σ̄k
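A heavily simplified sketch of this covariance update, with assumed constants c and m, equal weights ωi and a given evolution path pk (the full CMA-ES bookkeeping for pk and the step size is omitted):

```python
# Heavily simplified sketch of the covariance update above (assumed constants
# c and m, equal weights, and a given evolution path p_k; the full CMA-ES
# bookkeeping for p_k and the step size is omitted).
import numpy as np

def cma_update(C, p, ys, weights, c=0.2, m=5.0):
    """C_{k+1} = (1-c) C_k + (c/m) p p^T + c (1 - 1/m) sum_i w_i y_i y_i^T."""
    rank_mu = sum(w * np.outer(y, y) for w, y in zip(weights, ys))
    return (1 - c) * C + (c / m) * np.outer(p, p) + c * (1 - 1 / m) * rank_mu

# usage with dummy data: 2 parameters, 3 selected offspring, equal weights
C = np.eye(2)                                  # C_0 = Id
p = np.array([0.3, -0.1])                      # evolution path (last moves)
ys = [np.array([0.5, 0.2]), np.array([0.4, -0.3]), np.array([0.1, 0.6])]
weights = [1.0 / len(ys)] * len(ys)            # omega_i, assumed equal
print(cma_update(C, p, ys, weights))
```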
Some illustrations using analytical functions
Rosenbrock function
I Non-convex unimodal function ("banana valley"), defined in the sketch below
I Dimension n = 16
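```python
# The standard n-dimensional Rosenbrock function (assumed to be the variant
# used here), with n = 16 as on the slide.
import numpy as np

def rosenbrock(x):
    x = np.asarray(x)
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

x = np.zeros(16)          # a common starting point
print(rosenbrock(x))      # 15.0; the global minimum is 0 at x = (1, ..., 1)
```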
Rosenbrock function
Illustrations: steepest descent and quasi-Newton
Rosenbrock function
Illustrations: ES and CMA-ES
Camelback function
I Dimension n = 2
I Six local minima
I Two global minima (see the sketch below)
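```python
# The classical six-hump camelback function (assumed to be the variant used here);
# its two global minima are approximately (0.0898, -0.7126) and (-0.0898, 0.7126).
def camelback(x, y):
    return ((4.0 - 2.1 * x**2 + x**4 / 3.0) * x**2
            + x * y
            + (-4.0 + 4.0 * y**2) * y**2)

print(camelback(0.0898, -0.7126))   # about -1.0316, the global minimum value
```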
Camelback function
Illustration: quasi-Newton optimization path
Camelback function
Illustration: ES optimization path
Constrained optimality
Introduction
Necessity of constraints
I Often required to define a well-posed problem from a mathematical point of view (existence, uniqueness)
I Often required to define a problem that makes sense from an industrial point of view (manufacturing)
Different types of constraints
I Equality / inequality constraints
I Linear / non-linear constraints
Linear constraints
Optimality conditions
A sufficient condition for x⋆ to be a minimum of f subject to A · x = b:
I A · x⋆ = b (admissibility)
I ∇f(x⋆) = λ⋆ · A, with λ⋆ the Lagrange multipliers (stationarity)
I A · ∇²f(x⋆) · A > 0 (projected Hessian positive definite)
Illustration of optimality conditions for linear constraints
Linear constraints
Projection algorithm for descent methods
At each iteration k, starting from an admissible point xk (see the sketch below):
I Evaluation of the gradient ∇f(xk)
I Choice of an admissible search direction Z · dk, with Z a projection matrix onto the admissible space (A · Z = 0)
I Line search: choice of step length ρk
I Update: xk+1 = xk + ρk Z · dk
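A minimal sketch of this projection idea, where Z is built from the null space of A via an SVD and the direction is a projected steepest-descent step (toy problem and step length are illustrative assumptions):

```python
# Minimal sketch of the projection idea: Z spans the null space of A (A Z = 0),
# so moves of the form Z d keep A x = b satisfied. Toy problem and step length
# are illustrative assumptions.
import numpy as np

def null_space_basis(A):
    """Columns of Z span {d : A d = 0}."""
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-12))
    return vh[rank:].T

def projected_gradient(grad, A, b, x0, rho=0.1, n_iter=200):
    x = np.asarray(x0, dtype=float)
    assert np.allclose(A @ x, b)    # x0 must be admissible
    Z = null_space_basis(A)
    for _ in range(n_iter):
        d = -Z.T @ grad(x)          # reduced steepest-descent direction
        x = x + rho * (Z @ d)       # the update stays in the admissible space
    return x

# usage: minimize x0^2 + x1^2 subject to x0 + x1 = 1
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
grad = lambda x: 2.0 * x
print(projected_gradient(grad, A, b, np.array([1.0, 0.0])))   # tends to [0.5, 0.5]
```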
Non-linear constraints
Optimality conditions
A sufficient condition for x⋆ to be a minimum of f subject to c(x) = 0:
I c(x⋆) = 0 (admissibility)
I ∇f(x⋆) = λ⋆ · A(x⋆), with A(x) = ∇c(x) (stationarity)
I A(x⋆) · ∇²L(x⋆, λ⋆) · A(x⋆) > 0, with L(x, λ) = f(x) − λ · c(x) (projected Hessian of the Lagrangian positive definite)
Illustration of optimality conditions for non-linear constraints
Non-linear constraints
Quadratic penalization algorithm
Cost function with penalization: fq(x, κ) = f(x) + (κ/2) c(x) · c(x)
It can be shown that: limκ→∞ x⋆(κ) = x⋆
Algorithm with quadratic penalization (see the sketch below):
I Initialisation of κ
I Minimisation of fq(x, κ)
I Increase κ to reduce the constraint violation
Illustration of quadratic penalization
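A minimal sketch of this loop, assuming a toy cost function and constraint, a simple κ-increase schedule, and scipy's unconstrained minimizer for the inner problem:

```python
# Minimal sketch of this loop (toy cost and constraint, assumed kappa schedule),
# reusing scipy's unconstrained minimizer for the inner problem.
import numpy as np
from scipy.optimize import minimize

def f(x):                              # toy cost: squared distance to the origin
    return np.sum(x**2)

def c(x):                              # toy equality constraint, c(x) = 0
    return np.array([x[0] + x[1] - 1.0])

def quadratic_penalty(f, c, x0, kappa=1.0, n_outer=8):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        fq = lambda x, k=kappa: f(x) + 0.5 * k * np.dot(c(x), c(x))
        x = minimize(fq, x).x          # minimization of the penalized cost
        kappa *= 10.0                  # increase kappa to reduce the violation
    return x

print(quadratic_penalty(f, c, [0.0, 0.0]))   # tends to [0.5, 0.5]
```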
Non-linear constraints
Absolute penalization algorithm
Cost function with penalization: fa(x, κ) = f(x) + κ ‖c(x)‖
It can be shown that: ∃κ⋆ such that x⋆(κ) = x⋆ ∀κ > κ⋆
Algorithm with absolute penalization (see the sketch below):
I Initialisation of κ
I Minimisation of fa(x, κ)
I Increase κ until the constraint is satisfied
Illustration of absolute penalization
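A minimal sketch for the same toy problem; since the absolute penalty is non-smooth, a derivative-free inner solver (Nelder-Mead) is used here as an assumption:

```python
# Minimal sketch for the same toy problem; the absolute penalty is non-smooth,
# so a derivative-free inner solver (Nelder-Mead) is used here as an assumption.
import numpy as np
from scipy.optimize import minimize

f = lambda x: np.sum(x**2)                       # toy cost
c = lambda x: np.array([x[0] + x[1] - 1.0])      # toy equality constraint

kappa, x = 1.0, np.zeros(2)
for _ in range(6):
    fa = lambda x, k=kappa: f(x) + k * np.linalg.norm(c(x))
    x = minimize(fa, x, method="Nelder-Mead").x  # minimization of the penalized cost
    if np.linalg.norm(c(x)) < 1e-4:              # stop once the constraint is (nearly) met
        break
    kappa *= 2.0                                 # otherwise increase kappa
print(x)                                         # close to [0.5, 0.5] for moderate kappa
```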
Non-linear constraints
Optimality conditions in terms of the Lagrangian L(x, λ) = f(x) − λ · c(x)
I ∇λL(x⋆, λ⋆) = 0 (admissibility)
I ∇xL(x⋆, λ⋆) = 0 (stationarity)
I A(x⋆) · ∇²L(x⋆, λ⋆) · A(x⋆) > 0 (positive definiteness)
SQP algorithm (Sequential Quadratic Programming)
At each iteration k, a Newton method step is applied in (x, λ) (sketched in code below):
( ∇²f(xk) − λk · ∇²c(xk)   −A(xk) )   ( δx )   ( −∇f(xk) + λk · A(xk) )
(        −A(xk)               0    ) · ( δλ ) = (         c(xk)         )
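A minimal sketch of one Newton step on this system, solved with a dense linear solver on a toy problem (quadratic cost, one linear constraint) so that a single step is exact; all names are illustrative:

```python
# Minimal sketch of one Newton step on the system above, for a toy problem
# (quadratic cost, one linear constraint) so that a single step is exact.
import numpy as np

def sqp_step(x, lam, grad_f, hess_f, c, A, hess_c):
    n, m = x.size, lam.size
    W = hess_f(x) - sum(lam[i] * hess_c(x)[i] for i in range(m))  # Lagrangian Hessian
    Amat = A(x)                                                   # Jacobian of c, shape (m, n)
    K = np.block([[W, -Amat.T],
                  [-Amat, np.zeros((m, m))]])                     # KKT matrix
    rhs = np.concatenate([-grad_f(x) + Amat.T @ lam, c(x)])       # right-hand side
    delta = np.linalg.solve(K, rhs)
    return x + delta[:n], lam + delta[n:]

# toy problem: minimize x0^2 + x1^2 subject to c(x) = x0 + x1 - 1 = 0
grad_f = lambda x: 2.0 * x
hess_f = lambda x: 2.0 * np.eye(2)
c      = lambda x: np.array([x[0] + x[1] - 1.0])
A      = lambda x: np.array([[1.0, 1.0]])
hess_c = lambda x: [np.zeros((2, 2))]            # one constraint, zero curvature

x, lam = sqp_step(np.array([2.0, -1.0]), np.array([0.0]),
                  grad_f, hess_f, c, A, hess_c)
print(x, lam)                                    # [0.5, 0.5] and lambda = 1
```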