Engineering Optimization
Concepts and Applications
Lise Noël
Matthijs Langelaar
3mE-PME 34-G-1-300
[email protected]
[email protected]
1ME46060
Unconstrained optimization
algorithms
● Single-variable methods
● Multiple variable methods (‘multivariate optimization’)
– 0th order
– 1st order
– 2nd order
Recap optimization algorithms
● Aspects to consider:
– Efficiency (speed of convergence, computational effort,
scaling with nr. of variables)
– Use of derivatives
– Ability to handle non-smooth problems
– Ability to find global optima
– Termination criteria
● Exhaustive approaches (brute force) generally not
feasible
Summary single variable methods
● Bracketing, plus:
– 0th order: dichotomous sectioning, Fibonacci sectioning, golden ratio sectioning, quadratic interpolation, cubic interpolation
– 1st order: bisection method, secant method
– 2nd order: Newton method
● In practice, additional “tricks” are needed to deal with: multimodality, strong fluctuations, round-off errors, divergence
● And many more!
Unconstrained optimization
algorithms
● Single-variable methods: min_x f(x), x ∈ R
● Multiple variable methods: min_x f(x), x ∈ R^n
– 0th order: direct search methods
– 1st order: descent methods (also called hill-climbing methods or gradient-based methods)
– 2nd order
Contents
● General aspects
● Direct search methods:
– Random methods
– Cyclic coordinate search / Powell’s conjugate directions
– Nelder-Mead simplex method
– Biologically inspired methods
Genetic algorithms
Particle swarm / ant colony
● First order methods
Algorithm performance
● Comparison of performance of algorithms:
– Mathematical convergence proofs
– Performance on benchmark problems (test functions)
● Examples of test functions:
– Rosenbrock’s function (“banana function”)
f = 100(x2 − x1^2)^2 + (1 − x1)^2
Optimum: (1, 1)
Test functions
● Quadratic function:
f = (x1 + 2x2 − 7)^2 + (2x1 + x2 − 5)^2
Optimum: (1, 3)
● Many local optima:
f = x1^2 + x2^2 − 50 cos(x1) cos(x2)
Optimum: (0, 0)
● And many others …
Please note:
● In this lecture: isolines of the objective function are shown for illustration purposes. Many function evaluations are needed to create such a plot!
● In reality: the underlying function is unknown! The optimizer must evaluate f(x) at individual designs.
Practical example
● Try your own approach!
● 2 design variables
● Single, global optimum
● Try to locate minimum in
least number of
evaluations
(evaluations are
expensive)
Contents
● General aspects
● Direct search methods:
– Random methods
– Cyclic coordinate search / Powell’s conjugate directions
– Nelder-Mead simplex method
– Biologically inspired methods
Genetic algorithms
Particle swarm / ant colony
● First order methods
Random (stochastic) methods
● Random jumping method:
(random search)
– Generate random points,
remember the best
● Random walk method:
– Generate random unit direction
vectors
– “Walk” to new point if better
– Decrease stepsize after N steps
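The random walk method above can be sketched in a few lines of Python. This is a minimal sketch only: the function and parameter names (step, tries_per_level, shrink, levels) are our own, not prescribed by the slides.

```python
import random

def random_walk(f, x, step=1.0, tries_per_level=50, shrink=0.5, levels=4):
    """Random walk method (sketch): try random unit directions, move only
    when the objective improves, decrease the step size after N tries."""
    fx = f(x)
    for _ in range(levels):
        for _ in range(tries_per_level):
            # generate a random unit direction vector
            d = [random.gauss(0.0, 1.0) for _ in x]
            norm = sum(di * di for di in d) ** 0.5
            y = [xi + step * di / norm for xi, di in zip(x, d)]
            fy = f(y)
            if fy < fx:                 # "walk" to the new point if better
                x, fx = y, fy
        step *= shrink                  # decrease step size after N steps
    return x, fx
```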
Simulated Annealing
● Random method inspired by physical process: annealing
= Heating and gradual cooling
of metal/glass to relieve
internal stresses
– End result: minimum internal energy
– Temperature-dependent probability
of local internal energy change
– Some chance of a local energy increase exists!
(figure: local internal energy vs. time)
Simulated Annealing Algorithm
1. Set a starting “temperature” T, pick a starting design x, and obtain f(x)
2. Randomly generate a new design y close to x, and obtain f(y)
3. If f(y) < f(x), accept y and continue with step 4. Otherwise:
   1. Compute the probability P(T) of accepting the worse design
   2. Use a random number to accept design y or not; continue with step 4
4. Reduce the temperature T; continue with step 2 (until convergence)
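The four steps above can be sketched in Python as follows. The starting temperature, cooling factor, iteration count, and step size are illustrative choices, not values prescribed by the slides.

```python
import math
import random

def simulated_annealing(f, x, T=5.0, cooling=0.95, n_iter=500, step=0.5):
    """Sketch of the SA loop: accept a worse design with probability
    exp((f(x) - f(y)) / T), then reduce the temperature."""
    fx = f(x)
    best, fbest = x[:], fx
    for _ in range(n_iter):
        # step 2: random new design close to x
        y = [xi + random.uniform(-step, step) for xi in x]
        fy = f(y)
        # steps 3-4: accept if better, else with probability e^((f(x)-f(y))/T)
        if fy < fx or random.random() < math.exp((fx - fy) / T):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x[:], fx
        T *= cooling                      # step 5: reduce the temperature
    return best, fbest
```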
Simulated Annealing Algorithm (2)
● Probability of accepting a ‘bad’ step depends on T, as in physical annealing:
  P_accept[f(y) > f(x)] = e^((f(x) − f(y)) / T),  with f(x) − f(y) < 0
● Test: generate a random number r, accept if r < P.
● As the temperature reduces, the probability of accepting a bad step reduces as well:
  r < P = e^((f(x) − f(y)) / T) → 0,
  since the exponent (f(x) − f(y)) / T becomes increasingly negative as T decreases.
Simulated Annealing Properties
● Accepting bad steps (“energy increase”) likely in initial
phase, but less likely at the end
T = 0: basic random walk method
SA can escape local optima,
especially at the start
● Variants: several steps before test,
cooling schemes,
reheating, …
Matlab: simulannealbnd
Random methods properties
● Very robust: work also for discontinuous /
nondifferentiable functions
● Can find global minimum (unknown when)
● Quite inefficient, but can be used in
initial stage to determine promising
starting point
● Last resort: when all else fails
● S.A. known to perform well on several
hard problems (“traveling salesman”)
● Drawback: results not repeatable
(unless you initialize random number generator with fixed settings)
Contents
● General aspects
● Direct search methods:
– Random methods
– Cyclic coordinate search / Powell’s conjugate directions
– Nelder-Mead simplex method
– Biologically inspired methods
Genetic algorithms
Particle swarm / ant colony
● First order methods
Cyclic coordinate search
● Search alternately in each coordinate direction (design variable)
● Perform a single-variable optimization along each direction s (line search = partial minimization):
  min_α f(x + α s)
● Directions are fixed: can lead to slow convergence
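A minimal Python sketch of cyclic coordinate search, using a golden-section line search as the single-variable optimizer (the helper, bracket size, and cycle count are assumptions of this sketch):

```python
def golden_section(phi, a, b, tol=1e-8):
    """Golden-section line search on [a, b] (standard helper, assumed here)."""
    g = (5 ** 0.5 - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def cyclic_coordinate_search(f, x, n_cycles=20, bracket=10.0):
    """Cyclic coordinate search (sketch): line-search along each
    coordinate direction in turn."""
    x = list(x)
    for _ in range(n_cycles):
        for i in range(len(x)):
            # partial minimization min_a f(x + a*e_i) along coordinate i
            phi = lambda a, i=i: f(x[:i] + [x[i] + a] + x[i + 1:])
            x[i] += golden_section(phi, -bracket, bracket)
    return x
```

On the quadratic test function f = (x1 + 2x2 − 7)^2 + (2x1 + x2 − 5)^2 this converges toward (1, 3), but only geometrically per cycle: the fixed coordinate directions force the slow staircase pattern mentioned above.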
Powell’s Conjugate Directions method
● Adjusting the search directions improves convergence
● Idea: after a full cycle, also search in the combined direction, and add it to the direction set while removing the first one:
  Cycle 1: search along s1, s2; combined direction s3 = α1^(1) s1 + α2^(1) s2
  Cycle 2: search along s2, s3; combined direction s4 = α2^(2) s2 + α3^(2) s3
● Guaranteed to converge in n cycles for quadratic functions! (theoretically)
  (= n(n+1) line searches)
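Powell's idea can be sketched as below: after each cycle of line searches, the combined move becomes a new direction and the oldest direction is dropped. The golden-section helper, bracket size, and cycle count are assumptions of this sketch.

```python
def golden_section(phi, a, b, tol=1e-8):
    """Golden-section line search on [a, b] (standard helper, assumed here)."""
    g = (5 ** 0.5 - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def powell(f, x, n_cycles=5, bracket=10.0):
    """Powell's conjugate directions (sketch)."""
    n = len(x)
    x = list(x)
    # start from the coordinate directions e_1 ... e_n
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n_cycles):
        x0 = x[:]
        for d in dirs:
            alpha = golden_section(
                lambda a: f([xi + a * di for xi, di in zip(x, d)]),
                -bracket, bracket)
            x = [xi + alpha * di for xi, di in zip(x, d)]
        combined = [xi - x0i for xi, x0i in zip(x, x0)]
        if any(abs(c) > 1e-12 for c in combined):
            dirs = dirs[1:] + [combined]       # drop oldest, add combined
            alpha = golden_section(
                lambda a: f([xi + a * ci for xi, ci in zip(x, combined)]),
                -bracket, bracket)
            x = [xi + alpha * ci for xi, ci in zip(x, combined)]
    return x
```

On the quadratic f = (x1 + 2x2 − 7)^2 + (2x1 + x2 − 5)^2, the second combined direction comes out conjugate to the first, so the method lands on the optimum (1, 3) after n = 2 cycles, matching the theoretical guarantee.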
Nelder-Mead Simplex method
● Simplex: figure of n + 1 points in R^n
● Gradually move toward the minimum by reflecting the worst point through the centroid of the other points
  (figure: vertex with f = 10 reflected through the centroid of the vertices with f = 5 and f = 7)
● For better performance: expansion/contraction and other adjustments
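The basic reflection step can be sketched as follows (a single step only; the expansion/contraction refinements of the full Nelder-Mead method are omitted, and the function name is our own):

```python
def reflect_worst(f, simplex):
    """One basic simplex step (sketch): reflect the worst vertex through
    the centroid of the remaining vertices, keeping it only if it improves."""
    simplex = sorted(simplex, key=f)              # best ... worst
    worst = simplex[-1]
    rest = simplex[:-1]
    n = len(rest)
    centroid = [sum(p[i] for p in rest) / n for i in range(len(worst))]
    reflected = [2 * ci - wi for ci, wi in zip(centroid, worst)]
    if f(reflected) < f(worst):
        simplex[-1] = reflected
    return simplex
```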
Nelder-Mead Simplex in action
f(x1, x2) = (x1^2 + x2 − 7)^2 + (x1 + x2^2 − 11)^2
Matlab: see fminsearch
Contents
● General aspects
● Direct search methods:
– Random methods
– Cyclic coordinate search / Powell’s conjugate directions
– Nelder-Mead simplex method
– Biologically inspired methods
Genetic algorithms
Particle swarm / ant colony
● First order methods
Biologically inspired methods
● Popular: inspiration for algorithms
from biological processes:
– Genetic algorithms / evolutionary optimization
– Particle swarms / flocks
– Ant colony methods
● Typically make use of population (collection of designs)
● Computationally intensive
● Stochastic nature, global optimization properties
Genetic algorithms (GA)
● Based on evolution theory of Darwin:
Survival of the fittest
● Objective = fitness function
● Designs are encoded in chromosomal
strings, ~ genes: e.g. binary strings:
  1 1 0 1 0 0 1 0 1 1 0 0 1 0 1
  (one part of the string encodes x1, the rest x2)
● Can also include discrete variables!
GA flowchart
1. Create initial population
2. Evaluate fitness of all individuals
3. Test termination criteria (if met: quit)
4. Select individuals for reproduction
5. Create new population (crossover, mutation, reproduction); continue with step 2
GA population operators
● Reproduction:
– Exact copy/copies of individual
● Mutation:
– Randomly flip some bits of a gene string
– Used sparingly, but important to explore new designs
1 1 0 1 0 0 1 0 1 1 0 0 1 0 1
1 1 0 1 0 1 1 0 1 1 0 0 1 0 1
GA population operators (2)
● Crossover:
– Randomly exchange genes of
different parents
– Many possibilities: how many
genes, parents, children …
Parent 1 Parent 2
1 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 0 0 1 0 1 1 0 0 0 1
0 1 1 0 0 0 1 0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1 0 0 0 1
Child 1 Child 2
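The mutation and single-point crossover operators shown above can be sketched directly on bit lists (the mutation rate 0.05 is illustrative; mutation is used sparingly):

```python
import random

def mutate(bits, p=0.05):
    """Mutation (sketch): flip each bit with a small probability p."""
    return [b ^ 1 if random.random() < p else b for b in bits]

def crossover(parent1, parent2):
    """Single-point crossover (sketch): children swap tails after a random cut."""
    cut = random.randint(1, len(parent1) - 1)
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]
```

At each bit position the two children together carry exactly the same pair of bits as the two parents; crossover only recombines, it never invents new bit values.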
Genetic Algorithm properties
● Random:
– Very robust: work also for
discontinuous / nondifferentiable
functions
– Can find global minimum (but unknown when)
● Many different variations / strategies / parameters, not
easy to determine best settings
● Computationally intensive (population generations)
● Population: set of results
Matlab: see ga
Particle swarms / flocks
● No genes and reproduction, but a population that
travels through the design space
● Derived from simulations of flocks/schools in nature
● Individuals tend to follow the individual with the best
fitness value, but also determine their own path
● Some randomness added to give exploration properties (“craziness parameter”)
Ant colony (Matlab): MIDACO*
*http://www.midaco-solver.com/
PSO (Particle Swarm Optimization)
Example:
http://www.itm.uni-stuttgart.de/research/pso_opt
Basic particle swarm optimization algorithm
Matlab: see particleswarm
1. Initialize location x0 and speed v0 of individuals (random)
2. Evaluate fitness (=objective) for each individual
3. Update best positions: individual (y) and overall (Y)
4. Update velocity and position:
   v_{i+1} = v_i + c1 r1 (y_i − x_i) + c2 r2 (Y_i − x_i)
   x_{i+1} = x_i + v_{i+1}
   (c1, c2 control “individual” vs. “social” behavior; r1, r2 are random number vectors between 0 and 1)
Overview 0th order methods
● Random:
– Jumping, Walk, Simulated Annealing
– Biologically inspired: GA, particle swarm / ant colony
● Cyclic coordinate search (series of 1D optimizations)
● Powell’s conjugate directions
● Nelder-Mead Simplex
(for ~smooth problems)
Summary 0th order methods
● Nelder-Mead beats Powell in most cases
● Robust: most can deal with discontinuity etc.
● Less attractive for many design variables (>10)
● Stochastic techniques:
– Computationally expensive, but
– Global optimization properties
– Versatile
● Population-based algorithms are easy to
combine with parallel computing
Unconstrained optimization
algorithms
● Single-variable methods
● Multiple variable methods
– 0th order
– 1st order
– 2nd order
Practical example 2
● Now also gradient
information available in
each point
● Again, try to locate the
optimum in the least
number of steps!
Steepest descent method
● Move in the direction of largest decrease in f:
  Taylor: f(x + hs) = f(x) + ∇f^T s h + o(h^2)
  df = f(x + hs) − f(x) ≈ ∇f^T s h
  Best direction: s = −∇f
● Example:
  f = x1^4 − 2 x1^2 x2 + x1^2 + x2^2
  ∇f = [ 4 x1^3 − 4 x1 x2 + 2 x1 ;  2(x2 − x1^2) ]
  (figure: isolines f = 1.9, f = 0.044, f = 7.2 with steepest descent directions −∇f)
● Divergence occurs! Remedy: line search
Steepest Descent algorithm
1. Start with arbitrary x1
2. Set first search direction: d1 = −∇f1
3. Line search to find next point: x_{i+1} = x_i + α_i d_i
4. Update search direction: d_{i+1} = −∇f_{i+1}
5. Repeat from step 3
Steepest Descent method (2)
● With line search:  min_α f(x − α ∇f)
– The gradient is perpendicular to the isoline (f = const)
– The line search direction is tangent to the isoline at the line search minimum. Along the isoline the directional derivative is zero:
  f(x + h t) − f(x) ≈ ∇f^T t h = 0  ⇒  ∇f ⊥ t
– The new gradient after the line search is perpendicular to the previous search direction
Steepest Descent algorithm
1. Start with arbitrary x1
2. Set first search direction: d1 = −∇f1
3. Line search to find next point: x_{i+1} = x_i + α_i d_i
4. Update search direction: d_{i+1} = −∇f_{i+1}  (note: d_{i+1} ⊥ d_i)
5. Repeat from step 3
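A minimal Python sketch of the algorithm above. One assumption: the exact line search is replaced by simple backtracking with an Armijo sufficient-decrease condition (the constants 0.1 and 0.5 are illustrative).

```python
def steepest_descent(f, grad_f, x, n_iter=200):
    """Steepest descent with a backtracking line search (sketch)."""
    for _ in range(n_iter):
        g = grad_f(x)
        d = [-gi for gi in g]                       # step 4: d = -grad f
        slope = -sum(gi * gi for gi in g)           # directional derivative
        fx = f(x)
        alpha = 1.0
        # step 3: backtrack until a sufficient decrease is found (Armijo)
        while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + 0.1 * alpha * slope:
            alpha *= 0.5
            if alpha < 1e-12:
                break
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x
```

On the quadratic f = (x1 + 2x2 − 7)^2 + (2x1 + x2 − 5)^2 this reaches the optimum (1, 3), but only after many zig-zagging steps, which is exactly the behavior the next slide illustrates.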
Steepest descent convergence
● Zig-zag convergence behavior:
  (each step perpendicular to the previous one)
Effect of variable scaling
on Steepest Descent
● Scaling of variables helps a lot!
  f = x1^2 + 16 x2^2
  With y1 = x1, y2 = 4 x2:  f = y1^2 + y2^2
● Ideal scaling is hard to determine (requires Hessian information)
  (figures: elongated isolines in (x1, x2) space vs. circular isolines in (y1, y2) space)
Fletcher-Reeves
Conjugate Gradient method (CG)
● Based on building a set of N conjugate directions, combined with line searches
● Conjugate directions (A symmetric, positive definite):
  d_i^T A d_j = 0 for i ≠ j
  (Examples: orthogonal directions, eigenvectors)
● Conjugate gradient method:
  – Matrix A not needed; the set of directions d is constructed during the process
  – Guaranteed convergence: minimizes quadratic problems in N line search steps
    (recall Powell’s Conjugate Directions: N cycles of N+1 line searches)
Conjugate Gradient algorithm
1. Start with arbitrary x1
2. Set first search direction: d1 = −∇f1
3. Line search to find next point: x_{i+1} = x_i + α_i d_i
4. Next search direction: d_{i+1} = −∇f_{i+1} + (‖∇f_{i+1}‖^2 / ‖∇f_i‖^2) d_i
5. Repeat from step 3
6. Restart every (n+1) steps, using step 2
(proofs & underlying math omitted)
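The algorithm above can be sketched in Python. Assumptions of this sketch: a golden-section search stands in for the exact line search, and the step-length bracket and cycle count are illustrative.

```python
def golden_section(phi, a, b, tol=1e-8):
    """Golden-section line search on [a, b] (standard helper, assumed here)."""
    g = (5 ** 0.5 - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def fletcher_reeves(f, grad_f, x, n_cycles=3, bracket=5.0):
    """Fletcher-Reeves conjugate gradient method (sketch)."""
    n = len(x)
    for _ in range(n_cycles):                      # step 6: periodic restart
        g = grad_f(x)
        d = [-gi for gi in g]                      # step 2: d1 = -grad f1
        for _ in range(n + 1):
            alpha = golden_section(
                lambda a: f([xi + a * di for xi, di in zip(x, d)]),
                0.0, bracket)
            x = [xi + alpha * di for xi, di in zip(x, d)]   # step 3
            g_new = grad_f(x)
            if max(abs(gi) for gi in g_new) < 1e-12:
                return x
            # step 4: beta = |grad f_{i+1}|^2 / |grad f_i|^2
            beta = sum(gi * gi for gi in g_new) / sum(gi * gi for gi in g)
            d = [-gi + beta * di for gi, di in zip(g_new, d)]
            g = g_new
    return x
```

On the quadratic f = (x1 + 2x2 − 7)^2 + (2x1 + x2 − 5)^2 (N = 2) the method reaches the optimum (1, 3) in two line searches, in contrast to the many zig-zag steps of steepest descent.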
CG properties
● Theoretically converges in N steps
or less for quadratic functions
● No zig-zag like in steepest descent
● In practice:
  – Non-quadratic functions
  – Finite line search accuracy      → slower convergence, > N steps
  – Round-off errors
● After N steps / bad convergence: restart procedure
  d_{N+1} = −∇f_{N+1}, etc.
First order Multivariate Unconstrained
Optimization Algorithms - Summary
● First order methods (descent methods):
  – Steepest descent method (with line search)
  – Fletcher-Reeves Conjugate Gradient method
  – Quasi-Newton methods (next lecture)
● Conclusions (for now):
  – Scaling important for Steepest Descent (zig-zag)
  – For quadratic problems, CG converges in N steps
● More in next lecture
Final remarks
● Papalambros: par. 7.1, 7.2
● Exercise 4.1 + 4.2: unconstrained optimization,
steepest descent (4.3 later)
– Hand-in Exercises 3 & 4: Friday May 20, 11 pm, via BS (keep it brief, no extensive reports needed)
– Pairs: please both submit