Numerical geometry of non-rigid shapes
Numerical Optimization
Alexander Bronstein, Michael Bronstein, 2008. All rights reserved. Web: tosca.cs.technion.ac.il
Slowest, longest, shortest, maximal, minimal, fastest, largest, smallest...
Common denominator: optimization problems
Optimization problems
Generic unconstrained minimization problem:

$\min_{x \in X} f(x)$

where
- $X$ is a vector space, the search space;
- $f : X \to \mathbb{R}$ is a cost (or objective) function;
- a solution $x^* = \mathrm{argmin}_{x \in X} f(x)$ is the minimizer of $f$;
- the value $f(x^*)$ is the minimum.
Local vs. global minimum
Find minimum by analyzing the local behavior of the cost function
Local minimum
Global minimum
Local vs. global in real life
False summit 8,030 m
Main summit 8,047 m
Broad Peak (K3), 12th highest mountain on Earth
Convex functions
A function $f$ defined on a convex set $C$ is called convex if

$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y)$

for any $x, y \in C$ and $\alpha \in [0, 1]$.
For a convex function, local minimum = global minimum.
Convex
Non-convex
One-dimensional optimality conditions
Approximate a function around a point $x^*$ as a parabola using the Taylor expansion

$f(x^* + t) \approx f(x^*) + f'(x^*)\, t + \tfrac{1}{2} f''(x^*)\, t^2.$

$x^*$ is a local minimizer of a $C^2$-function $f$ if $f'(x^*) = 0$ and $f''(x^*) > 0$:
- $f'(x^*) = 0$ guarantees an extremum at $x^*$;
- $f''(x^*) > 0$ guarantees the parabola is convex, so the extremum is a minimum.
Gradient
In the multidimensional case, linearization of the function according to Taylor,

$f(x + dx) \approx f(x) + \langle \nabla f(x), dx \rangle,$

gives a multidimensional analogy of the derivative. The function $\nabla f(x)$, denoted as $\nabla f$, is called the gradient of $f$. In the one-dimensional case, it reduces to the standard definition of the derivative.
Gradient
In a Euclidean space ($X = \mathbb{R}^n$), the gradient can be represented in the standard basis $e_1, \dots, e_n$ in the following way:

$(\nabla f(x))_i = \lim_{t \to 0} \frac{f(x + t e_i) - f(x)}{t} = \frac{\partial f}{\partial x_i}$

(the one of $e_i$ stands in the $i$-th place), which gives

$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)^{\mathrm{T}}.$
Example 1: gradient of a matrix function
Compute the gradient of the function $f(X)$, where $X$ is an $n \times m$ matrix and the search space is $\mathbb{R}^{n \times m}$ (the space of real $n \times m$ matrices) with the standard inner product $\langle X, Y \rangle = \mathrm{tr}(X^{\mathrm{T}} Y)$. For square matrices the result takes a simpler form.
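As a concrete instance (our illustration, not necessarily the function on the original slide): for the linear function $f(X) = \mathrm{tr}(A^{\mathrm{T}} X)$, the gradient with respect to the trace inner product is $\nabla f(X) = A$, which a finite-difference check confirms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3))   # an arbitrary perturbation direction

f = lambda X: np.trace(A.T @ X)   # linear matrix function
grad = A                          # its gradient w.r.t. <X, Y> = tr(X^T Y)

# directional derivative two ways: finite difference vs. <grad, D>
t = 1e-6
fd = (f(X + t * D) - f(X)) / t
exact = np.trace(grad.T @ D)
```

Since $f$ is linear, the finite difference matches $\langle \nabla f, D \rangle$ up to roundoff.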
Example 2: gradient of a matrix function
Compute the gradient of the function $f(X)$, where $X$ is an $n \times m$ matrix and the search space is $\mathbb{R}^{n \times m}$.
Hessian
Linearization of the gradient,

$\nabla f(x + dx) \approx \nabla f(x) + \nabla^2 f(x)\, dx,$

gives a multidimensional analogy of the second-order derivative. The function $\nabla^2 f(x)$, denoted as $\nabla^2 f$, is called the Hessian of $f$ (after Ludwig Otto Hesse, 1811-1874).
In the standard basis, the Hessian is the symmetric matrix of mixed second-order derivatives

$(\nabla^2 f(x))_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}.$
Optimality conditions, bis
Approximate a function around a point $x^*$ as a paraboloid using the Taylor expansion

$f(x^* + d) \approx f(x^*) + \langle \nabla f(x^*), d \rangle + \tfrac{1}{2} \langle \nabla^2 f(x^*)\, d, d \rangle.$

$x^*$ is a local minimizer of a $C^2$-function $f$ if $\nabla f(x^*) = 0$ and the Hessian is a positive definite matrix (denoted $\nabla^2 f(x^*) \succ 0$), i.e., $\langle \nabla^2 f(x^*)\, d, d \rangle > 0$ for all $d \neq 0$:
- $\nabla f(x^*) = 0$ guarantees an extremum at $x^*$;
- $\nabla^2 f(x^*) \succ 0$ guarantees the paraboloid is convex, so the extremum is a minimum.
Optimization algorithms
$x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$, where $d^{(k)}$ is the descent direction and $\alpha^{(k)}$ is the step size.
Generic optimization algorithm
Start with some $x^{(0)}$; set $k = 0$. Repeat:
- Determine a descent direction $d^{(k)}$;
- Choose a step size $\alpha^{(k)}$ such that $f(x^{(k)} + \alpha^{(k)} d^{(k)}) < f(x^{(k)})$;
- Update the iterate $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$;
- Increment the iteration counter $k \leftarrow k + 1$;
until convergence (stopping criterion). The final iterate is the solution $x^*$.
Stopping criteria
Near a local minimum, $\nabla f(x) \approx 0$ (or equivalently, $\|\nabla f(x)\| \approx 0$).
- Stop when the gradient norm becomes small: $\|\nabla f(x^{(k)})\| < \varepsilon$.
- Stop when the step size becomes small: $\|x^{(k+1)} - x^{(k)}\| < \varepsilon$.
- Stop when the relative objective change becomes small: $|f(x^{(k+1)}) - f(x^{(k)})| / |f(x^{(k)})| < \varepsilon$.
Line search
The optimal step size can be found by solving a one-dimensional optimization problem

$\alpha^{(k)} = \mathrm{argmin}_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)}).$

One-dimensional optimization algorithms for finding the optimal step size are generically called exact line search.
Armijo [ar-mi-xo] rule
Armijo rule (Larry Armijo, 1966): start with some step size $\alpha$ and decrease it by multiplying by some $\beta \in (0, 1)$ until the function sufficiently decreases. The function sufficiently decreases if

$f(x + \alpha d) \le f(x) + \sigma \alpha \langle \nabla f(x), d \rangle$

for some fixed $\sigma \in (0, 1)$.
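A minimal sketch of the Armijo backtracking rule in Python (NumPy); the parameter names alpha0, beta, sigma are ours:

```python
import numpy as np

def armijo_step(f, grad_f, x, d, alpha0=1.0, beta=0.5, sigma=0.1):
    """Backtracking line search: shrink alpha until the Armijo
    sufficient-decrease condition holds."""
    alpha = alpha0
    slope = grad_f(x) @ d          # directional derivative <grad f, d>
    while f(x + alpha * d) > f(x) + sigma * alpha * slope:
        alpha *= beta
    return alpha

# usage: quadratic f(x) = ||x||^2, descent direction d = -gradient
f = lambda x: x @ x
g = lambda x: 2 * x
x = np.array([1.0, -2.0])
alpha = armijo_step(f, g, x, -g(x))
```

The loop always terminates for a smooth $f$ and a true descent direction, since the condition holds for small enough $\alpha$.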
Descent direction
How to descend in the fastest way? Go in the direction in which the contour lines are the densest.
Devils Tower
Topographic map
Steepest descent
Directional derivative $\langle \nabla f(x), d \rangle$: how much $f$ changes in the direction $d$ (negative for a descent direction).
Find a unit-length direction minimizing the directional derivative:

$d = \mathrm{argmin}_{\|d\| = 1} \langle \nabla f(x), d \rangle.$
Steepest descent
- L2 norm: normalized steepest descent, $d = -\nabla f(x) / \|\nabla f(x)\|_2$;
- L1 norm: coordinate descent (a step along the coordinate axis in which the descent is maximal).
Steepest descent algorithm
Start with some $x^{(0)}$; set $k = 0$. Repeat:
- Compute the steepest descent direction $d^{(k)} = -\nabla f(x^{(k)})$;
- Choose a step size $\alpha^{(k)}$ using line search;
- Update the iterate $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$;
- Increment the iteration counter $k \leftarrow k + 1$;
until convergence.
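The algorithm can be sketched in Python (NumPy): a minimal version with backtracking line search and a small-gradient stopping criterion (all tolerances and constants are our choices):

```python
import numpy as np

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=1000):
    """Steepest descent with backtracking (Armijo) line search."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:        # stop when the gradient is small
            break
        d = -g                             # steepest descent direction
        alpha = 1.0
        while f(x + alpha * d) > f(x) - 0.1 * alpha * (g @ g):
            alpha *= 0.5                   # backtrack until sufficient decrease
        x = x + alpha * d
    return x

# usage: minimize the ill-conditioned quadratic f(x) = x1^2 + 10 x2^2
f = lambda x: x[0]**2 + 10 * x[1]**2
g = lambda x: np.array([2 * x[0], 20 * x[1]])
x_star = steepest_descent(f, g, np.array([3.0, 1.0]))
```

The minimizer of this quadratic is the origin; the iterates zigzag toward it, slowly in the flat direction.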
MATLAB intermezzo: steepest descent
Condition number
The condition number is the ratio of the maximal and minimal eigenvalues of the Hessian, $\kappa = \lambda_{\max}(\nabla^2 f) / \lambda_{\min}(\nabla^2 f)$.
A problem with a large condition number is called ill-conditioned. The steepest descent convergence rate is slow for ill-conditioned problems.
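For instance, the Hessian of the quadratic $f(x) = x_1^2 + 10 x_2^2$ (our toy example) has condition number 10:

```python
import numpy as np

# Hessian of the quadratic f(x) = x1^2 + 10 x2^2
H = np.array([[2.0, 0.0],
              [0.0, 20.0]])

eigvals = np.linalg.eigvalsh(H)          # eigenvalues of a symmetric matrix
kappa = eigvals.max() / eigvals.min()    # condition number lambda_max / lambda_min
```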
Q-norm
A symmetric positive definite matrix $Q$ defines the Q-norm $\|x\|_Q = (x^{\mathrm{T}} Q x)^{1/2}$, which becomes the L2 norm after the change of coordinates $\tilde{x} = Q^{1/2} x$:
- Function: $\tilde{f}(\tilde{x}) = f(Q^{-1/2} \tilde{x}) = f(x)$;
- Gradient: $\nabla \tilde{f}(\tilde{x}) = Q^{-1/2} \nabla f(x)$;
- Descent direction: steepest descent with respect to the Q-norm is $d = -Q^{-1} \nabla f(x)$.
Preconditioning
Using the Q-norm for steepest descent can be regarded as a change of coordinates, called preconditioning. The preconditioner $Q$ should be chosen to improve the condition number of the Hessian in the proximity of the solution. In the $\tilde{x}$ system of coordinates, the Hessian at the solution is $Q^{-1/2} \nabla^2 f(x^*) Q^{-1/2}$; the ideal choice $Q = \nabla^2 f(x^*)$ makes it the identity (a dream, as $x^*$ is unknown).
Newton method as optimal preconditioner
The best theoretically possible preconditioner is $Q = \nabla^2 f(x^*)$, giving the descent direction $d = -(\nabla^2 f(x^*))^{-1} \nabla f(x)$ and the ideal condition number $\kappa = 1$.
Problem: the solution $x^*$ is unknown in advance.
Newton direction: use the Hessian at the current iterate as a preconditioner,

$d^{(k)} = -(\nabla^2 f(x^{(k)}))^{-1} \nabla f(x^{(k)}).$
Another derivation of the Newton method
Approximate the function as a quadratic function using the second-order Taylor expansion

$f(x + d) \approx f(x) + \langle \nabla f(x), d \rangle + \tfrac{1}{2} \langle \nabla^2 f(x)\, d, d \rangle$

(a quadratic function in $d$); minimizing it over $d$ yields the Newton direction.
Close to the solution, the function looks like a quadratic function, so the Newton method converges fast.
Newton method
Start with some $x^{(0)}$; set $k = 0$. Repeat:
- Compute the Newton direction $d^{(k)} = -(\nabla^2 f(x^{(k)}))^{-1} \nabla f(x^{(k)})$;
- Choose a step size $\alpha^{(k)}$ using line search;
- Update the iterate $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$;
- Increment the iteration counter $k \leftarrow k + 1$;
until convergence.
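A minimal sketch in Python (NumPy) with a fixed unit step, whereas the slide's version chooses $\alpha^{(k)}$ by line search. On a quadratic objective, Newton converges in a single step:

```python
import numpy as np

def newton(grad_f, hess_f, x0, tol=1e-8, max_iter=50):
    """Newton's method with unit step (alpha = 1)."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess_f(x), -g)   # Newton direction: H d = -g
        x = x + d
    return x

# usage: for the quadratic f(x) = 0.5 x^T A x - b^T x the minimizer is A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = newton(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```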
Frozen Hessian
Observation: close to the optimum, the Hessian does not change significantly. The number of Hessian inversions can therefore be reduced by keeping the Hessian from previous iterations and updating it only once in a few iterations. Such a method is called Newton with frozen Hessian.
Cholesky factorization
Decompose the Hessian as

$\nabla^2 f(x) = L L^{\mathrm{T}},$

where $L$ is a lower triangular matrix (Andre Louis Cholesky, 1875-1918). Solve the Newton system $\nabla^2 f(x)\, d = -\nabla f(x)$ in two steps:
- Forward substitution: $L y = -\nabla f(x)$;
- Backward substitution: $L^{\mathrm{T}} d = y$.
Complexity: $O(n^3)$ for the factorization and $O(n^2)$ for the two substitutions, better than straightforward matrix inversion.
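The two-step solve can be sketched with NumPy; here np.linalg.solve stands in for dedicated triangular solvers (scipy.linalg.solve_triangular would exploit the triangular structure):

```python
import numpy as np

H = np.array([[4.0, 2.0],    # a positive definite "Hessian"
              [2.0, 3.0]])
g = np.array([2.0, -1.0])    # the gradient

L = np.linalg.cholesky(H)    # H = L L^T, L lower triangular
y = np.linalg.solve(L, -g)   # forward substitution:  L y = -grad
d = np.linalg.solve(L.T, y)  # backward substitution: L^T d = y
# d now solves the Newton system H d = -g
```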
Truncated Newton
Solve the Newton system approximately. A few iterations of conjugate gradients, or of another iterative algorithm for the solution of linear systems, can be used. Such a method is called truncated or inexact Newton.
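A sketch of the inner solver (a textbook conjugate-gradient loop, our implementation):

```python
import numpy as np

def cg(A, b, iters=10):
    """A few conjugate-gradient iterations for A x = b (A symmetric
    positive definite): the inner solver of truncated Newton."""
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A x, with x = 0
    p = r.copy()
    for _ in range(iters):
        rr = r @ r
        if rr < 1e-16:        # already converged
            break
        Ap = A @ p
        a = rr / (p @ Ap)
        x = x + a * p
        r = r - a * Ap
        p = r + ((r @ r) / rr) * p
    return x

# usage: approximate the Newton direction d from H d = -g
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
d = cg(H, -g, iters=2)    # for n = 2, CG reaches the exact solution in 2 steps
```

Truncating the loop after a few iterations gives an inexact but cheap Newton direction.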
Non-convex optimization
Using convex optimization methods with non-convex functions does not guarantee global convergence! There is no theoretically guaranteed global optimization, just heuristics.
Local minimum
Global minimum
Good initialization
Multiresolution
Iterative majorization
Construct a majorizing function $g(x, x^{(k)})$ satisfying the majorizing inequality

$g(x, x^{(k)}) \ge f(x)$ for all $x$, with $g(x^{(k)}, x^{(k)}) = f(x^{(k)}).$

$g$ should be convex or otherwise easier to optimize with respect to $x$.
Iterative majorization
Start with some $x^{(0)}$; set $k = 0$. Repeat:
- Find $x^{(k+1)}$ such that $g(x^{(k+1)}, x^{(k)}) \le g(x^{(k)}, x^{(k)})$, e.g. by minimizing $g(\cdot, x^{(k)})$;
- Update the iterate;
- Increment the iteration counter $k \leftarrow k + 1$;
until convergence. The final iterate is the solution $x^*$.
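A minimal sketch with a standard example (ours, not from the slides): minimize $f(x) = (x - a)^2 + \lambda |x|$ by majorizing $|x|$ at $x^{(k)} \neq 0$ with the quadratic $\frac{x^2}{2 |x^{(k)}|} + \frac{|x^{(k)}|}{2}$, which touches $|x|$ at $x = x^{(k)}$ and lies above it elsewhere. Each iteration minimizes the convex quadratic majorizer in closed form:

```python
a, lam = 2.0, 1.0                      # problem data (our choice)
f = lambda x: (x - a) ** 2 + lam * abs(x)

x = 1e-3                               # start away from 0 (majorizer needs x_k != 0)
for _ in range(100):
    # argmin of (x - a)^2 + lam * x^2 / (2 |x_k|): a closed-form update
    x = a / (1.0 + lam / (2.0 * abs(x)))
# x converges to the true minimizer a - lam/2 = 1.5 (valid since a > lam/2)
```

Each update decreases $f$, since $f(x^{(k+1)}) \le g(x^{(k+1)}, x^{(k)}) \le g(x^{(k)}, x^{(k)}) = f(x^{(k)})$.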
Constrained optimization
MINEFIELD CLOSED ZONE
Constrained optimization problems
Generic constrained minimization problem:

$\min f(x)$ subject to $c_i(x) \le 0,\ i = 1, \dots, m$ (inequality constraints) and $h_j(x) = 0,\ j = 1, \dots, p$ (equality constraints).

The subset of the search space in which the constraints hold is called the feasible set. A point belonging to the feasible set is called a feasible solution. A minimizer of the unconstrained problem may be infeasible!
An example
[Figure: a feasible set bounded by an equality constraint and inequality constraints]
An inequality constraint $c_i$ is active at a point $x$ if $c_i(x) = 0$, and inactive otherwise.
A point $x$ is regular if the gradients of the equality constraints and of the active inequality constraints are linearly independent.
Lagrange multipliers
Main idea for solving constrained problems: arrange the objective and the constraints into a single function

$L(x, \lambda, \mu) = f(x) + \sum_i \lambda_i c_i(x) + \sum_j \mu_j h_j(x),$

and minimize it as an unconstrained problem. $L$ is called the Lagrangian, and $\lambda_i, \mu_j$ are called Lagrange multipliers.
KKT conditions
If $x^*$ is a regular point and a local minimum, there exist Lagrange multipliers $\lambda_i^*$ and $\mu_j^*$ such that the gradient of the Lagrangian vanishes at $x^*$, with $\lambda_i^* \ge 0$ for active constraints and $\lambda_i^* = 0$ for inactive constraints. These are known as the Karush-Kuhn-Tucker (KKT) conditions. Necessary but not sufficient!
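Written out in the notation of the Lagrangian above (a standard formulation), the first-order KKT conditions at a regular local minimum $x^*$ are:

```latex
\begin{aligned}
\nabla f(x^*) + \sum_i \lambda_i^* \nabla c_i(x^*) + \sum_j \mu_j^* \nabla h_j(x^*) &= 0
  && \text{(stationarity)}\\
c_i(x^*) \le 0, \qquad h_j(x^*) &= 0 && \text{(feasibility)}\\
\lambda_i^* \ge 0, \qquad \lambda_i^*\, c_i(x^*) &= 0 && \text{(complementary slackness)}
\end{aligned}
```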
KKT conditions
Sufficient conditions: if the objective $f$ is convex, the inequality constraints $c_i$ are convex, the equality constraints $h_j$ are affine, and $x^*$ satisfies the KKT conditions (with $\lambda_i^* \ge 0$ for active constraints and zero for inactive ones), then $x^*$ is the solution of the constrained problem (the global constrained minimizer).
Geometric interpretation
Consider a simpler problem with a single equality constraint:

$\min f(x)$ subject to $h(x) = 0.$

The gradients of the objective and of the constraint must line up at the solution: $\nabla f(x^*)$ is parallel to $\nabla h(x^*)$.
Penalty methods
Define a penalty aggregate

$f(x) + \sum_i \varphi_\rho(c_i(x)) + \sum_j \psi_\rho(h_j(x)),$

where $\varphi_\rho$ and $\psi_\rho$ are parametric penalty functions. For larger values of the parameter $\rho$, the penalty on the constraint violation is stronger.
Penalty methods
Inequality penalty
Equality penalty
Penalty methods
Start with some $x^{(0)}$ and an initial value of $\rho$; set $k = 0$. Repeat:
- Find $x^{(k+1)}$ by solving an unconstrained optimization problem (minimization of the penalty aggregate), initialized with $x^{(k)}$;
- Increase $\rho$;
- Increment the iteration counter $k \leftarrow k + 1$;
until convergence. The final iterate is the solution $x^*$.
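A minimal sketch (our example): minimize $\|x\|^2$ subject to $x_1 + x_2 = 1$ with a quadratic penalty $\rho (x_1 + x_2 - 1)^2$. Because everything here is quadratic, each inner unconstrained problem is solved exactly via its normal equations:

```python
import numpy as np

# min ||x||^2  s.t.  a^T x = b, with quadratic penalty rho * (a^T x - b)^2
a = np.array([1.0, 1.0])
b = 1.0

rho = 1.0
x = np.zeros(2)
for _ in range(6):
    # inner problem: min ||x||^2 + rho * (a^T x - b)^2, a quadratic;
    # its optimality condition is (2 I + 2 rho a a^T) x = 2 rho b a
    H = 2.0 * np.eye(2) + 2.0 * rho * np.outer(a, a)
    x = np.linalg.solve(H, 2.0 * rho * b * a)
    rho *= 10.0          # strengthen the penalty
# x approaches the constrained minimizer [0.5, 0.5] as rho grows
```

The iterates are infeasible for every finite $\rho$ and only approach the feasible set in the limit, which is why $\rho$ must be increased gradually.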