LMI Methods in Optimal and Robust Control
Matthew M. Peet
Arizona State University
Lecture 02: Optimization (Convex and Otherwise)
What is Optimization?
An Optimization Problem has 3 parts.
min_{x∈F} f(x) : subject to
g_i(x) ≤ 0,  i = 1, ..., K_1
h_i(x) = 0,  i = 1, ..., K_2
Variables: x ∈ F
• The things you must choose.
• F represents the set of possible choices for the variables.
• Can be vectors, matrices, functions, systems, locations, colors...
I However, computers prefer vectors or matrices.
Objective: f (x)
• A function which assigns a scalar value to any choice of variables.
I e.g. (x_1, x_2) ↦ x_1 − x_2; red ↦ 4; etc.
Constraints: g(x) ≤ 0; h(x) = 0
• Define which choices of the variables are acceptable.
• Equality and Inequality constraints are common.
I x is OK if g(x) ≤ 0 and h(x) = 0.
• Constraints mean variables are not independent. (Constrained optimization
is much harder).
Formulating Optimization Problems
What do we need to know?
Topics to Cover:
Formulating Constraints
• Tricks of the Trade for expressing constraints.
• Converting everything to equality and inequality constraints.
Equivalence:
• How to Recognize if Two Optimization Problems are Equivalent.
• May be true despite different variables, constraints and objectives
Knowing which Problems are Solvable
• The Convex Ones.
• Some others, if the problem is relatively small.
Least Squares
Unconstrained Optimization
Problem: Given a bunch of data in the form
• Inputs: ai
• Outputs: bi
Find the function f (a) = b which best fits the data.
For Least Squares: assume f(a) = x^T a + x_c for some unknown x ∈ R^n, x_c ∈ R, and minimize
Σ_{i=1}^{K} |x^T a_i + x_c − b_i|^2
The Optimization Problem is:
min_{x∈R^{n+1}} ‖Ax − b‖^2
where the last entry of x plays the role of x_c and
A := [a_1^T 1; ... ; a_K^T 1],  b := [b_1; ... ; b_K]
Least Squares
Unconstrained Optimization
Least squares problems are easy-ish to solve.
x = (A^T A)^{-1} A^T b
Note that A is assumed to be skinny.
• More rows than columns.
• More data points than inputs.
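A minimal MATLAB sketch of such a fit (my own illustration; the data a_i, b_i below are made up):

% Least-squares fit of f(a) = x'*a + xc to made-up data.
a = [1 1; 2 1; 3 2; 4 3; 5 5];        % K = 5 inputs, stored as rows a_i'
b = [2.1; 3.0; 4.2; 5.1; 7.0];        % K outputs b_i
A = [a, ones(size(a,1),1)];           % rows [a_i' 1]; the extra column multiplies xc
x = (A'*A)\(A'*b);                    % the normal-equation formula x = (A'A)^{-1} A'b

In practice x = A\b is preferred; it solves the same least-squares problem without forming A'*A explicitly.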
Integer Programming Example
MAX-CUT
Figure: Division of a set of nodes to maximize the weighted cost of separation
Goal: Assign each node i an index x_i = −1 or x_i = +1 to maximize the overall cost.
• The cost if x_i and x_j do not share the same index is w_{ij}.
• The cost if they share an index is 0.
• The weights w_{ij} are given.
Integer Programming Example
MAX-CUT
Goal: Assign each node i an index x_i = −1 or x_i = +1 to maximize the overall cost.
Variables: x ∈ {−1, 1}^n
• Referred to as Integer Variables or Binary Variables.
• Binary constraints can be incorporated explicitly: x_i^2 = 1
Integer/Binary variables may be declared directly in YALMIP:
x = intvar(n,1);    % n-by-1 vector of integer variables
y = binvar(n,1);    % n-by-1 vector of binary {0,1} variables (use 2*y-1 for ±1 values)
Integer Programming Example
MAX-CUT
Objective: We use the trick:
• (1 − x_i x_j) = 0 if x_i and x_j have the same sign (Together).
• (1 − x_i x_j) = 2 if x_i and x_j have the opposite sign (Apart).
Then the objective function to maximize is
(1/2) Σ_{i,j} w_{ij} (1 − x_i x_j)
The optimization problem is the integer program:
max_{x_i^2 = 1}  (1/2) Σ_{i,j} w_{ij} (1 − x_i x_j)
MAX-CUT
The optimization problem is the integer program:
max_{x_i^2 = 1}  (1/2) Σ_{i,j} w_{ij} (1 − x_i x_j)
Consider the MAX-CUT problem with 5 nodes
w12 = w23 = w45 = w15 = w34 = .5 and w14 = w24 = w25 = 0
where wij = wji .
An Optimal Cut is:
• x_1 = x_3 = x_4 = 1
• x_2 = x_5 = −1
This cut has objective value
f(x) = 2.5 − .5x_1x_2 − .5x_2x_3 − .5x_3x_4 − .5x_4x_5 − .5x_1x_5 = 4
Figure: An Optimal Cut (nodes 1, 3, 4 on one side of the partition; nodes 2, 5 on the other)
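For an instance this small, the integer program can be solved by brute force. A plain-MATLAB sketch (my own illustration) that enumerates all 2^n assignments for this 5-node instance:

% Brute-force MAX-CUT on the 5-node example above (only feasible for small n).
n = 5;
W = zeros(n);
W(1,2) = .5; W(2,3) = .5; W(3,4) = .5; W(4,5) = .5; W(1,5) = .5;
W = W + W';                                   % w_ij = w_ji
best = -inf; xbest = [];
for k = 0:2^n-1
    x = 2*double(dec2bin(k,n)' == '1') - 1;   % k-th assignment as a +/-1 vector
    cut = 0.5*sum(sum(W.*(1 - x*x')));        % (1/2) sum_ij w_ij (1 - x_i x_j)
    if cut > best
        best = cut; xbest = x;
    end
end
% best = 4 for this instance, matching the cut shown in the figure.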
Dynamic Programming
Open-Loop Case
Objective Function: Let's minimize a quadratic cost
x(N)^T S x(N) + Σ_{k=1}^{N−1} [ x(k)^T Q x(k) + u(k)^T R u(k) ]
Variables: The states x(k), and inputs, u(k).
Constraint: The dynamics define how u 7→ x.
x(k + 1) = Ax(k) + Bu(k), k = 0, · · · , N
x(0) = 1
Note: The choice of x(0) = 1 does not affect the solution
Combined:
min_{x,u}  x(N)^T S x(N) + Σ_{k=1}^{N−1} [ x(k)^T Q x(k) + u(k)^T R u(k) ]
subject to  x(k + 1) = Ax(k) + Bu(k),  k = 0, ..., N
x(0) = 1
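A minimal simulation sketch (my own illustration; A, B, Q, R, S, N, x(0) and the candidate input sequence are made up) showing how the dynamics constraint turns a choice of u into a cost:

% Evaluate the open-loop cost of a candidate input sequence u(0), ..., u(N-1).
A = [1 .1; 0 1];  B = [0; .1];           % made-up system
Q = eye(2);  R = 1;  S = 10*eye(2);  N = 20;
u = 0.1*randn(1, N);                     % candidate inputs u(k), k = 0..N-1
x = zeros(2, N+1);  x(:,1) = [1; 1];     % column k+1 stores x(k); x(0) is fixed
for k = 0:N-1
    x(:,k+2) = A*x(:,k+1) + B*u(k+1);    % x(k+1) = A x(k) + B u(k)
end
J = x(:,N+1)'*S*x(:,N+1);                % terminal cost x(N)' S x(N)
for k = 1:N-1
    J = J + x(:,k+1)'*Q*x(:,k+1) + u(k+1)'*R*u(k+1);   % stage costs
end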
Dynamic Programming
Closed-Loop Case
Objective Function: Let's minimize a quadratic cost
x(N)^T S x(N) + Σ_{k=1}^{N−1} [ x(k)^T Q x(k) + u(k)^T R u(k) ]
Variables: We want a fixed policy (gain matrix, K) which determines u(k)
based on x(k) as u(k) = Kx(k).
Constraint: The dynamics define how u 7→ x.
x(k + 1) = Ax(k) + Bu(k), k = 0, · · · , N
u(k) = Kx(k), x(0) = 1
Combined:
min_{K,x,u}  x(N)^T S x(N) + Σ_{k=1}^{N−1} [ x(k)^T Q x(k) + u(k)^T R u(k) ]
subject to  x(k + 1) = Ax(k) + Bu(k),  k = 0, ..., N
u(k) = Kx(k),  x(0) = 1
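A matching closed-loop sketch (my own illustration, with the same made-up system as before and a hand-picked gain): the input is no longer a free sequence but is generated by the policy u(k) = Kx(k).

% Evaluate the cost of a fixed feedback gain K.
A = [1 .1; 0 1];  B = [0; .1];  Q = eye(2);  R = 1;  S = 10*eye(2);  N = 20;
K = [-1 -2];                             % candidate policy (made-up value)
x = zeros(2, N+1);  x(:,1) = [1; 1];
u = zeros(1, N);
for k = 0:N-1
    u(k+1)   = K*x(:,k+1);               % u(k) = K x(k)
    x(:,k+2) = A*x(:,k+1) + B*u(k+1);    % x(k+1) = A x(k) + B u(k)
end
J = x(:,N+1)'*S*x(:,N+1);
for k = 1:N-1
    J = J + x(:,k+1)'*Q*x(:,k+1) + u(k+1)'*R*u(k+1);
end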
Formulating Optimization Problems
Equivalence
Definition 1.
Two optimization problems are Equivalent if a solution to one can be used to
construct a solution to the other.
Example 1: Equivalent Objective Functions
Problem 1:  min_x f(x)  subject to  A^T x ≥ b
Problem 2:  min_x 10f(x) − 12  subject to  A^T x ≥ b
Problem 3:  max_x 1/f(x)  subject to  A^T x ≥ b
In this case x_1^* = x_2^* = x_3^*. Proof:
• For any feasible x ≠ x_1^*, we have f(x) > f(x_1^*). Thus 10f(x) − 12 > 10f(x_1^*) − 12 and 1/f(x) < 1/f(x_1^*), i.e., x is suboptimal for all three problems.
Formulating Optimization Problems
Equivalence in Variables
Example 2: Equivalent Variables
Problem 1:  min_x f(x)  subject to  A^T x ≥ b
Problem 2:  min_x f(Tx + c)  subject to  (T^T A)^T x ≥ b − A^T c
Here x_1^* = T x_2^* + c and x_2^* = T^{-1}(x_1^* − c).
• The change of variables is invertible (given x ≠ x_2^*, you can show it is suboptimal).
Example 3: Variable Separability
Problem 1:  min_{x,y} f(x) + g(y)  subject to  A_1^T x ≥ b_1,  A_2^T y ≥ b_2
Problem 2:  min_x f(x)  subject to  A_1^T x ≥ b_1
Problem 3:  min_y g(y)  subject to  A_2^T y ≥ b_2
Here x_1^* = x_2^* and y_1^* = y_3^*.
• Neither feasibility nor minimality is coupled between x and y.
Formulating Optimization Problems
Constraint Equivalence
Example 4: Constraint/Objective Equivalence
Problem 1:  min_x f(x)  subject to  g(x) ≤ 0
Problem 2:  min_{x,t} t  subject to  g(x) ≤ 0,  t ≥ f(x)
Here x_1^* = x_2^* and t_2^* = f(x_1^*).
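A hedged YALMIP sketch of this epigraph reformulation, on my own example f(x) = x^T x and g(x) = 1 − x_1 (assumes YALMIP and a convex QP solver are installed):

% Epigraph form: minimize t subject to g(x) <= 0 and t >= f(x).
x = sdpvar(2,1);  t = sdpvar(1,1);
F = [1 - x(1) <= 0,  t >= x'*x];      % g(x) <= 0 and the epigraph constraint
optimize(F, t);                       % minimize t
value(x), value(t)                    % expected: x ~ [1; 0], t ~ 1, so t* = f(x*)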
Some other Equivalences:
• Redundant Constraints
• Polytopes (Vertices vs. Hyperplanes)
Machine Learning
Classification and Support-Vector Machines
In Classification we are given data inputs x_i, each of which has a binary label y_i ∈ {−1, +1}.
• y_i = +1 means x_i belongs to group 1
• y_i = −1 means x_i belongs to group 2
We want to find a rule (a classifier) which takes the data x and predicts which group it is in.
• Our rule has the form of a function f(x) = w^T x − b. Then
I x is in group 1 if f(x) = w^T x − b > 0.
I x is in group 2 if f(x) = w^T x − b < 0.
Question: How do we find the best w and b?
Figure: We want to find a rule which separates two sets of data.
Machine Learning
Classification and Support-Vector Machines
Definition 2.
• A Hyperplane is the generalization of the concept of a line/plane to multiple dimensions: {x : w^T x − b = 0}.
• A Half-Space is one of the two regions on either side of a Hyperplane:
{x : w^T x − b ≥ 0}  OR  {x : w^T x − b ≤ 0}
Machine Learning
Classification and Support-Vector Machines
We want to separate the data into disjoint half-spaces and maximize the
distance between these half-spaces
Variables: w ∈ R^n and b define the hyperplane.
Constraint: Each existing data point should be correctly labelled.
• w^T x_i − b > 1 when y_i = +1 and w^T x_i − b < −1 when y_i = −1 (Strict Separation)
• Alternatively: y_i(w^T x_i − b) ≥ 1.
These two constraints are Equivalent.
Figure: Maximizing the distance between two sets of Data
Objective: The distance between the Hyperplanes {x : w^T x − b = 1} and {x : w^T x − b = −1} is
f(w, b) = 2 / √(w^T w)
Machine Learning
Unconstrained Form (Soft-Margin SVM)
Machine Learning algorithms solve
min_{w∈R^p, b∈R}  (1/2) w^T w,  subject to
y_i(w^T x_i − b) ≥ 1,  ∀i = 1, ..., K.
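A hedged YALMIP sketch of this hard-margin QP (my own illustration; X is a K-by-p matrix with rows x_i^T, y is a K-vector of ±1 labels, the data are assumed separable, and a QP-capable solver is assumed installed):

% Hard-margin SVM as a QP in YALMIP.
[K, p] = size(X);
w = sdpvar(p,1);  b = sdpvar(1,1);
F = [y .* (X*w - b) >= 1];            % y_i (w'x_i - b) >= 1 for every data point
optimize(F, 0.5*(w'*w));              % minimize (1/2) w'w
wstar = value(w);  bstar = value(b);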
Soft Margin Problems
The hard margin problem can be relaxed to maximize the distance between hyperplanes PLUS the magnitude of the classification errors:
min_{w∈R^p, b∈R}  (1/2)‖w‖^2 + c Σ_{i=1}^{n} max(0, 1 − (w^T x_i − b) y_i).
Figure: Data separation using the soft-margin metric and distances to the associated hyperplanes (H_w, H_{w1}, H_{w2}, H_ξ)
Link: Repository of Interesting Machine Learning Data Sets
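This unconstrained form can be attacked directly with a subgradient method. A minimal sketch (my own illustration; X and y as in the previous sketch, with a hand-picked penalty c and step size eta):

% Subgradient descent on (1/2)||w||^2 + c * sum_i max(0, 1 - y_i (w'x_i - b)).
[K, p] = size(X);
c = 1;  eta = 1e-3;  iters = 5000;
w = zeros(p,1);  b = 0;
for it = 1:iters
    m = y .* (X*w - b);                      % margins y_i (w'x_i - b)
    viol = m < 1;                            % points with nonzero hinge loss
    gw = w - c * X(viol,:)' * y(viol);       % subgradient w.r.t. w
    gb = c * sum(y(viol));                   % subgradient w.r.t. b
    w = w - eta*gw;  b = b - eta*gb;
end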
Geometric vs. Functional Constraints
These Problems are all equivalent:
The Classical Representation:
min_{x∈R^n} f(x) : subject to
g_i(x) ≤ 0,  i = 1, ..., k
The Geometric Representation is:
min_{x∈R^n} f(x) : subject to  x ∈ S
where S := {x ∈ R^n : g_i(x) ≤ 0, i = 1, ..., k}.
The Pure Geometric Representation:
min_γ γ : subject to
S_γ ≠ ∅  (S_γ has at least one element)
where S_γ := {x : γ − f(x) ≥ 0, g_i(x) ≤ 0, i = 1, ..., k}.
Proposition: Optimization is only as hard as determining feasibility!
Solving by Bisection (Do you have an Oracle?)
Optimization Problem:
γ* = min γ : subject to S_γ ≠ ∅
Figure: Bisection on γ, testing whether S_γ is empty at successive trial levels γ_1, γ_2, ... between γ_l and γ_u
Bisection Algorithm:
1 Initialize feasible γ_u = b
2 Initialize infeasible γ_l = a
3 Set γ = (γ_u + γ_l)/2
4 If S_γ is infeasible, set γ_l = (γ_u + γ_l)/2
5 If S_γ is feasible, set γ_u = (γ_u + γ_l)/2
6 k = k + 1
7 Go to 3
Then γ* ∈ [γ_l, γ_u] and |γ_u − γ_l| ≤ (b − a)/2^k.
Bisection with an oracle also solves the Primary Problem (max γ : S_γ = ∅).
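A minimal MATLAB sketch of the loop (my own illustration; isFeasible is a hypothetical oracle returning true iff S_γ is nonempty, and [a, b] is an assumed initial bracket with a infeasible and b feasible):

% Bisection with a feasibility oracle.
a = 0;  b = 10;                      % assumed initial bracket
gl = a;  gu = b;
for k = 1:30
    g = (gl + gu)/2;
    if isFeasible(g)                 % oracle call: is S_gamma nonempty?
        gu = g;                      % shrink from above
    else
        gl = g;                      % shrink from below
    end
end
% gamma* lies in [gl, gu], an interval of width (b - a)/2^30.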
Computational Complexity
In Computer Science, we focus on Complexity of the PROBLEM
• NOT complexity of the algorithm.
On a Turing machine, the number of steps is a function of the problem size (number of variables):
• NL: A logarithmic # (SORT)
• P: A polynomial # (LP)
• NP: A polynomial # for verification (TSP)
• NP HARD: at least as hard as NP (TSP)
• NP COMPLETE: A set of Equivalent* NP
problems (MAX-CUT, TSP)
• EXPTIME: Solvable in 2^{p(n)} steps, p polynomial. (Chess)
• EXPSPACE: Solvable with 2^{p(n)} memory.
*Equivalent means there is a polynomial-time reduction from one to the other.
How Hard is Optimization?
The Classical Representation:
min_{x∈R^n} f(x) : subject to
g_i(x) ≤ 0,  i = 1, ..., k
h_i(x) = 0,  i = 1, ..., k
Answer: Easy (P) if f, gi are all Convex and hi are linear.
The Geometric Representation:
min_{x∈R^n} f(x) : subject to  x ∈ S
Answer: Easy (P) if f is Convex and S is a Convex Set.
The Pure Geometric Representation:
max_{γ, x∈R^n} γ : subject to
(γ, x) ∈ S′
Answer: Easy (P) if S′ is a Convex Set.
Convex Functions
Definition 3.
An OBJECTIVE FUNCTION or CONSTRAINT function is convex if
f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2) for all x_1, x_2 and all λ ∈ [0, 1].
Useful Facts:
• e^{ax} and ‖x‖ are convex. x^n (n ≥ 1 or n ≤ 0) and −log x are convex on x > 0.
• If f1 is convex and f2 is convex, then f3 (x) := f1 (x) + f2 (x) is convex.
• A twice-differentiable function is convex if ∇^2 f(x) is positive semidefinite for all x.
• If f1 , f2 are convex, then f3 (x) := max(f1 (x), f2 (x)) is convex.
• If f1 , f2 are convex, and f1 is increasing, then f3 (x) := f1 (f2 (x)) is convex.
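A quick numeric check of the Hessian test on my own example, the quadratic f(x) = x^T Q x + c^T x, whose Hessian is the constant matrix 2Q:

% Convexity of a quadratic reduces to positive semidefiniteness of its Hessian.
Q = [2 1; 1 1];  c = [1; -1];        % made-up example data
H = 2*Q;                             % Hessian of f(x) = x'Qx + c'x
eig(H)                               % all eigenvalues >= 0  =>  f is convex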
Convex Sets
Definition 4.
A FEASIBLE SET Q is convex if for any x, y ∈ Q,
{µx + (1 − µ)y : µ ∈ [0, 1]} ⊂ Q.
The line connecting any two points lies in the set.
Facts:
• If f is convex, then {x : f (x) ≤ 0} is convex.
• The intersection of convex sets is convex.
I If S_1 and S_2 are convex, then S_3 := {x : x ∈ S_1, x ∈ S_2} is convex.
Descent Algorithms (Why Convex Optimization is Easy)
Unconstrained Optimization
All descent algorithms are iterative, with a search direction (∆x ∈ R^n) and step size (t ≥ 0):
x_{k+1} = x_k + t ∆x
Gradient Descent:  ∆x = −∇f(x)
Newton’s Algorithm:  ∆x = −(∇^2 f(x))^{-1} ∇f(x), which tries to solve the equation ∇f(x) = 0.
Both converge for sufficiently small step size.
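A minimal gradient-descent sketch (my own illustration) on the convex quadratic f(x) = x^T Q x + c^T x, whose gradient is 2Qx + c:

% Gradient descent with a fixed step size on a convex quadratic.
Q = [2 1; 1 1];  c = [1; -1];         % made-up data with Q > 0
x = [5; 5];                           % initial guess
t = 0.1;                              % fixed step size (small enough here)
for k = 1:200
    dx = -(2*Q*x + c);                % search direction: negative gradient
    x = x + t*dx;                     % x_{k+1} = x_k + t*dx
end
xstar = -Q\(c/2);                     % exact minimizer, where 2Qx + c = 0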
Descent Algorithms
Dealing with Constraints
Method 1: Gradient Projection
Figure: Must project step (t∆x) onto feasible Set
Method 2: Barrier Functions
min_x  f(x) − log(−g(x))
(the log-barrier for the constraint g(x) ≤ 0)
Converts a Constrained problem to an unconstrained problem.
(Interior-Point Methods)
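Method 1 can be sketched for a simple box constraint, where the projection is just a clamp. A minimal illustration (my own example, reusing the made-up quadratic from the previous sketch):

% Projected gradient descent: minimize x'Qx + c'x over the box 0 <= x <= 1.
Q = [2 1; 1 1];  c = [1; -1];
x = [1; 1];  t = 0.1;
for k = 1:200
    x = x - t*(2*Q*x + c);            % unconstrained gradient step
    x = min(max(x, 0), 1);            % project back onto the feasible box
end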
Non-Convexity and Local Optima
1. For convex optimization problems, Descent Methods Always find the global
optimal point.
2. For non-convex optimization, Descent Algorithms may get stuck at local
optima.
Important Classes of Optimization Problems
Linear Programming

Linear Programming (LP)
min_{x∈R^n} c^T x : subject to
Ax ≤ b
A_0 x = b_0

Example:
minimize x_1 + x_2 : subject to
3x_1 + x_2 ≥ 3
x_2 ≥ 1
x_1 ≤ 4
−x_1 + 5x_2 ≤ 20
x_1 + 4x_2 ≤ 20

• EASY: Simplex/Ellipsoid Algorithm (P)
• Can solve for >10,000 variables
Link: A List of Solvers, Performance and Benchmark Problems
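A quick check of the example using MATLAB's linprog (assumes the Optimization Toolbox; the ≥ constraints are flipped into ≤ form):

% Solve the example LP: minimize x1 + x2 subject to the five constraints above.
f = [1; 1];
A = [-3 -1;                           % 3x1 + x2 >= 3   ->  -3x1 - x2 <= -3
      0 -1;                           % x2 >= 1         ->  -x2 <= -1
      1  0;                           % x1 <= 4
     -1  5;                           % -x1 + 5x2 <= 20
      1  4];                          % x1 + 4x2 <= 20
b = [-3; -1; 4; 20; 20];
x = linprog(f, A, b)                  % expected: x ~ [2/3; 1]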
Important Classes of Optimization Problems
Quadratic Programming
Quadratic Programming (QP)
min_{x∈R^n} x^T Q x + c^T x : subject to
Ax ≤ b
• EASY (P): If Q ≥ 0.
• HARD (NP-Hard): Otherwise
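A small sketch of the easy case using MATLAB's quadprog (assumes the Optimization Toolbox; note quadprog minimizes (1/2) x^T H x + f^T x, so H = 2Q here):

% Convex QP with one linear constraint (made-up data, Q > 0).
Q = [2 1; 1 1];  c = [1; -1];
A = [1 1];  b = 2;                    % constraint x1 + x2 <= 2
x = quadprog(2*Q, c, A, b)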
Important Classes of Optimization Problems
Mixed-Integer Linear Programming
Mixed-Integer Linear Programming (MILP)
min_{x∈R^n} c^T x : subject to
Ax ≤ b
x_i ∈ Z,  i = 1, ..., K
• HARD (NP-Hard)
Mixed-Integer Nonlinear Programming (MINLP)
min_{x∈R^n} f(x) : subject to
g_i(x) ≤ 0
x_i ∈ Z,  i = 1, ..., K
• Very Hard
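A tiny MILP sketch using MATLAB's intlinprog (my own example; assumes the Optimization Toolbox):

% Maximize x1 + 2*x2 subject to x1 + x2 <= 4, 3*x1 + x2 <= 9, x >= 0, x integer.
f = [-1; -2];                         % intlinprog minimizes, so negate the objective
intcon = 1:2;                         % both variables are integer
A = [1 1; 3 1];  b = [4; 9];
lb = [0; 0];  ub = [];
x = intlinprog(f, intcon, A, b, [], [], lb, ub)   % expected: x = [0; 4]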
Next Time:
Positive Matrices, SDP and LMIs
• Also a bit on Duality, Relaxations.