Convexity I: Sets and Functions
Ryan Tibshirani
Convex Optimization 10-725
See supplements for reviews of
• basic real analysis
• basic multivariate calculus
• basic linear algebra
Last time: why convexity?
Why convexity? Simply put: because we can broadly understand
and solve convex optimization problems
Nonconvex problems are mostly treated on a case by case basis
Reminder: a convex optimization problem is of the form
$$\min_{x \in D} \; f(x) \quad \text{subject to} \quad g_i(x) \le 0, \; i = 1, \ldots, m, \quad h_j(x) = 0, \; j = 1, \ldots, r$$
where $f$ and $g_i$, $i = 1, \ldots, m$ are all convex, and $h_j$, $j = 1, \ldots, r$ are affine. Special property: any local minimizer is a global minimizer
Outline
Today:
• Convex sets
• Examples
• Key properties
• Operations preserving convexity
• Same, for convex functions
Convex sets
Convex set: $C \subseteq \mathbb{R}^n$ such that
$$x, y \in C \implies tx + (1 - t)y \in C \quad \text{for all } 0 \le t \le 1$$
In words, line segment joining any two elements lies entirely in set
[Figure 2.2: Some simple convex and nonconvex sets. Left. The hexagon, which includes its boundary (shown darker), is convex. Middle. The kidney shaped set is not convex, since the line segment between the two points in the set shown as dots is not contained in the set. Right. The square contains some boundary points but not others, and is not convex.]
Convex combination of $x_1, \ldots, x_k \in \mathbb{R}^n$: any linear combination
$$\theta_1 x_1 + \cdots + \theta_k x_k$$
with $\theta_i \ge 0$, $i = 1, \ldots, k$, and $\sum_{i=1}^k \theta_i = 1$. Convex hull of a set $C$, $\mathrm{conv}(C)$, is all convex combinations of elements. Always convex
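The definition of a convex combination can be sketched numerically. The snippet below is only an illustration under assumed choices (the unit disk in the plane as the convex set, five sample points, and the hypothetical helper names `convex_combination`, `random_simplex_weights`, `in_unit_disk`): it draws points of the disk, forms a convex combination, and checks that the result is still in the disk.

```python
import math
import random

def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i] for points given as tuples."""
    dim = len(points[0])
    return tuple(sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim))

def random_simplex_weights(k, rng):
    """Draw nonnegative weights that sum to 1 by normalizing positive draws."""
    raw = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

def in_unit_disk(p, tol=1e-12):
    """Membership test for the (convex) unit disk, with a small tolerance."""
    return math.hypot(p[0], p[1]) <= 1.0 + tol

rng = random.Random(0)
# Assumed sample: points of the unit disk, drawn by rejection from the square.
points = []
while len(points) < 5:
    candidate = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    if in_unit_disk(candidate):
        points.append(candidate)

weights = random_simplex_weights(len(points), rng)
combo = convex_combination(points, weights)
print(in_unit_disk(combo))
```

A passing check on samples does not prove convexity; it only fails to refute it.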
Examples of convex sets
• Trivial ones: empty set, point, line
• Norm ball: $\{x : \|x\| \le r\}$, for given norm $\|\cdot\|$, radius $r$
• Hyperplane: $\{x : a^T x = b\}$, for given $a, b$
• Halfspace: $\{x : a^T x \le b\}$
• Affine space: $\{x : Ax = b\}$, for given $A, b$
• Polyhedron: $\{x : Ax \le b\}$, where inequality $\le$ is interpreted componentwise. Note: the set $\{x : Ax \le b, \; Cx = d\}$ is also a polyhedron (why?)
[Figure 2.11: The polyhedron $P$ (shown shaded) is the intersection of five halfspaces, with outward normal vectors $a_1, \ldots, a_5$.]
• Simplex: special case of polyhedra, given by $\mathrm{conv}\{x_0, \ldots, x_k\}$, where these points are affinely independent. The canonical example is the probability simplex,
$$\mathrm{conv}\{e_1, \ldots, e_n\} = \{w : w \ge 0, \; 1^T w = 1\}$$
Cones
Cone: $C \subseteq \mathbb{R}^n$ such that
$$x \in C \implies tx \in C \quad \text{for all } t \ge 0$$
Convex cone: cone that is also convex, i.e.,
$$x_1, x_2 \in C \implies t_1 x_1 + t_2 x_2 \in C \quad \text{for all } t_1, t_2 \ge 0$$
[Figure 2.4: The pie slice shows all points of the form $\theta_1 x_1 + \theta_2 x_2$, where $\theta_1, \theta_2 \ge 0$. The apex of the slice (which corresponds to $\theta_1 = \theta_2 = 0$) is at $0$; its edges (which correspond to $\theta_1 = 0$ or $\theta_2 = 0$) pass through the points $x_1$ and $x_2$.]
Conic combination of $x_1, \ldots, x_k \in \mathbb{R}^n$: any linear combination
$$\theta_1 x_1 + \cdots + \theta_k x_k$$
with $\theta_i \ge 0$, $i = 1, \ldots, k$. Conic hull collects all conic combinations
Examples of convex cones
• Norm cone: $\{(x, t) : \|x\| \le t\}$, for a norm $\|\cdot\|$. Under the $\ell_2$ norm $\|\cdot\|_2$, called second-order cone
• Normal cone: given any set $C$ and point $x \in C$, we can define
$$N_C(x) = \{g : g^T x \ge g^T y, \text{ for all } y \in C\}$$
This is always a convex cone, regardless of $C$
• Positive semidefinite cone: $\mathbb{S}^n_+ = \{X \in \mathbb{S}^n : X \succeq 0\}$, where $X \succeq 0$ means that $X$ is positive semidefinite (and $\mathbb{S}^n$ is the set of $n \times n$ symmetric matrices)
Key properties of convex sets
• Separating hyperplane theorem: two disjoint convex sets have a separating hyperplane between them
[Figure 2.19: The hyperplane $\{x \mid a^T x = b\}$ separates the disjoint convex sets $C$ and $D$. The affine function $a^T x - b$ is nonpositive on $C$ and nonnegative on $D$.]
Formally: if $C, D$ are nonempty convex sets with $C \cap D = \emptyset$, then there exist $a \ne 0$ and $b$ such that
$$C \subseteq \{x : a^T x \le b\}$$
$$D \subseteq \{x : a^T x \ge b\}$$
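For a concrete instance of the separating hyperplane theorem, take two disjoint disks in the plane. The sketch below is an assumed construction, not a general algorithm: it picks $a$ as the difference of the centers and $b$ so that the hyperplane passes through the midpoint of the gap between the disks, then checks the two inclusions on random samples. The helper names and the specific disks are illustrative.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def separating_hyperplane(c1, r1, c2, r2):
    """Return (a, b) with a^T x <= b on disk(c1, r1) and a^T x >= b on disk(c2, r2)."""
    a = tuple(q - p for p, q in zip(c1, c2))
    norm_a = math.sqrt(dot(a, a))
    if norm_a <= r1 + r2:
        raise ValueError("disks are not disjoint")
    u = tuple(ai / norm_a for ai in a)
    # Closest points of the two disks along the line joining the centers.
    p = tuple(ci + r1 * ui for ci, ui in zip(c1, u))
    q = tuple(ci - r2 * ui for ci, ui in zip(c2, u))
    mid = tuple(0.5 * (pi + qi) for pi, qi in zip(p, q))
    return a, dot(a, mid)

def sample_disk(center, radius, rng):
    """Draw a point of the disk using a random angle and radius."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    rad = radius * rng.random()
    return (center[0] + rad * math.cos(angle), center[1] + rad * math.sin(angle))

rng = random.Random(1)
a, b = separating_hyperplane((0.0, 0.0), 1.0, (3.0, 1.0), 1.5)
on_c_side = all(dot(a, sample_disk((0.0, 0.0), 1.0, rng)) <= b for _ in range(200))
on_d_side = all(dot(a, sample_disk((3.0, 1.0), 1.5, rng)) >= b for _ in range(200))
print(on_c_side, on_d_side)
```

For general convex sets the theorem only guarantees that some $(a, b)$ exists; this closed-form choice relies on the sets being balls.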
• Supporting hyperplane theorem: a boundary point of a convex
set has a supporting hyperplane passing through it
Formally: if $C$ is a nonempty convex set, and $x_0 \in \mathrm{bd}(C)$, then there exists $a \ne 0$ such that
$$C \subseteq \{x : a^T x \le a^T x_0\}$$
Both of the above theorems (separating and supporting hyperplane
theorems) have partial converses; see Section 2.5 of BV
Operations preserving convexity
• Intersection: the intersection of convex sets is convex
• Scaling and translation: if C is convex, then
aC + b = {ax + b : x ∈ C}
is convex for any a, b
• Affine images and preimages: if $f(x) = Ax + b$ and $C$ is convex then
$$f(C) = \{f(x) : x \in C\}$$
is convex, and if $D$ is convex then
$$f^{-1}(D) = \{x : f(x) \in D\}$$
is convex
Example: linear matrix inequality solution set
Given $A_1, \ldots, A_k, B \in \mathbb{S}^n$, a linear matrix inequality is of the form
$$x_1 A_1 + x_2 A_2 + \cdots + x_k A_k \preceq B$$
for a variable $x \in \mathbb{R}^k$. Let’s prove the set $C$ of points $x$ that satisfy the above inequality is convex
Approach 1: directly verify that $x, y \in C \Rightarrow tx + (1 - t)y \in C$. This follows by checking that, for any $v$,
$$v^T \Big( B - \sum_{i=1}^k (t x_i + (1 - t) y_i) A_i \Big) v \ge 0$$
Approach 2: let $f : \mathbb{R}^k \to \mathbb{S}^n$, $f(x) = B - \sum_{i=1}^k x_i A_i$. Note that $C = f^{-1}(\mathbb{S}^n_+)$, affine preimage of convex set
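A small numerical instance of the linear matrix inequality can make the argument concrete. The sketch below assumes $n = k = 2$ and hypothetical data `A1`, `A2`, `B` (chosen so that the solution set works out to the unit disk), and uses the fact that a symmetric $2 \times 2$ matrix is positive semidefinite exactly when its diagonal entries and determinant are nonnegative. It checks that points on the segment between two feasible points stay feasible.

```python
def is_psd_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff both diagonal entries and the determinant are >= 0."""
    (p, q), (_, r) = m
    return p >= -tol and r >= -tol and p * r - q * q >= -tol

def lmi_slack(x, mats, b):
    """Return f(x) = B - sum_i x_i A_i as a nested tuple."""
    return tuple(
        tuple(b[i][j] - sum(xk * a[i][j] for xk, a in zip(x, mats)) for j in range(2))
        for i in range(2)
    )

def in_lmi_set(x, mats, b):
    """x satisfies the LMI exactly when the slack matrix f(x) is PSD."""
    return is_psd_2x2(lmi_slack(x, mats, b))

# Assumed illustrative data: f(x) = [[1 - x1, -x2], [-x2, 1 + x1]].
A1 = ((1.0, 0.0), (0.0, -1.0))
A2 = ((0.0, 1.0), (1.0, 0.0))
B = ((1.0, 0.0), (0.0, 1.0))

x, y = (0.6, 0.0), (0.0, -0.8)
segment_ok = all(
    in_lmi_set(tuple(t * xi + (1 - t) * yi for xi, yi in zip(x, y)), (A1, A2), B)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
)
print(segment_ok, in_lmi_set((1.0, 0.5), (A1, A2), B))
```

The determinant test is specific to $2 \times 2$ matrices; larger instances would need an eigenvalue or Cholesky-based check.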
More operations preserving convexity
• Perspective images and preimages: the perspective function is $P : \mathbb{R}^n \times \mathbb{R}_{++} \to \mathbb{R}^n$ (where $\mathbb{R}_{++}$ denotes positive reals),
$$P(x, z) = x/z$$
for $z > 0$. If $C \subseteq \mathrm{dom}(P)$ is convex then so is $P(C)$, and if $D$ is convex then so is $P^{-1}(D)$
• Linear-fractional images and preimages: the perspective map composed with an affine function,
$$f(x) = \frac{Ax + b}{c^T x + d}$$
is called a linear-fractional function, defined on $c^T x + d > 0$. If $C \subseteq \mathrm{dom}(f)$ is convex then so is $f(C)$, and if $D$ is convex then so is $f^{-1}(D)$
Example: conditional probability set
Let $U, V$ be random variables over $\{1, \ldots, n\}$ and $\{1, \ldots, m\}$. Let $C \subseteq \mathbb{R}^{nm}$ be a set of joint distributions for $U, V$, i.e., each $p \in C$ defines joint probabilities
$$p_{ij} = P(U = i, V = j)$$
Let $D \subseteq \mathbb{R}^{nm}$ contain corresponding conditional distributions, i.e., each $q \in D$ defines
$$q_{ij} = P(U = i \mid V = j)$$
Assume $C$ is convex. Let’s prove that $D$ is convex. Write
$$D = \Big\{ q \in \mathbb{R}^{nm} : q_{ij} = \frac{p_{ij}}{\sum_{k=1}^n p_{kj}}, \text{ for some } p \in C \Big\} = f(C)$$
where $f$ is a linear-fractional function, hence $D$ is convex
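The map from joint to conditional probabilities can be sketched directly. The code below uses two hypothetical joint distributions `p1`, `p2` on a $2 \times 2$ grid (assumptions for illustration) and checks, for each fixed $j$, that the conditional distribution of a mixture $t p_1 + (1-t) p_2$ is a convex combination of the two conditional distributions, with weight $s_j = t c_{1j} / (t c_{1j} + (1-t) c_{2j})$, where $c_{1j}, c_{2j}$ are the column sums. Note that the weight depends on $j$, so this sketch verifies the statement only column by column (for each fixed value of $V$), not for the whole array $q$ at once.

```python
def conditional_from_joint(p):
    """Map joint p[i][j] = P(U=i, V=j) to q[i][j] = P(U=i | V=j)."""
    n, m = len(p), len(p[0])
    col_sums = [sum(p[k][j] for k in range(n)) for j in range(m)]
    if any(s <= 0 for s in col_sums):
        raise ValueError("every column needs positive total probability")
    return [[p[i][j] / col_sums[j] for j in range(m)] for i in range(n)]

def mix(p1, p2, t):
    """Entrywise convex combination t * p1 + (1 - t) * p2."""
    return [[t * a + (1 - t) * b for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]

def column_weight(p1, p2, t, j):
    """Weight s_j with q_mix[:, j] = s_j * q1[:, j] + (1 - s_j) * q2[:, j]."""
    c1 = sum(row[j] for row in p1)
    c2 = sum(row[j] for row in p2)
    return t * c1 / (t * c1 + (1 - t) * c2)

# Assumed joint distributions over {1, 2} x {1, 2}.
p1 = [[0.1, 0.2], [0.3, 0.4]]
p2 = [[0.4, 0.1], [0.1, 0.4]]
q1, q2 = conditional_from_joint(p1), conditional_from_joint(p2)
q_mix = conditional_from_joint(mix(p1, p2, 0.5))

for j in range(2):
    s = column_weight(p1, p2, 0.5, j)
    column = [q_mix[i][j] for i in range(2)]
    expected = [s * q1[i][j] + (1 - s) * q2[i][j] for i in range(2)]
    print(j, round(s, 4), all(abs(a - b) < 1e-12 for a, b in zip(column, expected)))
```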
Convex functions
Convex function: $f : \mathbb{R}^n \to \mathbb{R}$ such that $\mathrm{dom}(f) \subseteq \mathbb{R}^n$ convex, and
$$f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y) \quad \text{for } 0 \le t \le 1$$
and all $x, y \in \mathrm{dom}(f)$
[Figure 3.1: Graph of a convex function. The chord (i.e., line segment) between any two points on the graph lies above the graph.]
In words, function lies below the line segment joining $f(x)$, $f(y)$
Concave function: opposite inequality above, so that
$$f \text{ concave} \iff -f \text{ convex}$$
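The defining inequality suggests a simple randomized test for univariate functions. The sketch below (helper names `chord_gap` and `looks_convex` are hypothetical) samples pairs of points and a weight $t$, and reports whether any sample violates the inequality. It can refute convexity but never prove it.

```python
import math
import random

def chord_gap(f, x, y, t):
    """Return t*f(x) + (1-t)*f(y) - f(t*x + (1-t)*y); nonnegative for convex f."""
    return t * f(x) + (1 - t) * f(y) - f(t * x + (1 - t) * y)

def looks_convex(f, lo, hi, trials=2000, tol=1e-9, seed=0):
    """Randomized necessary check on [lo, hi]: False means a violation was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, t = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        if chord_gap(f, x, y, t) < -tol:
            return False
    return True

print(looks_convex(math.exp, -3.0, 3.0))                    # exp is convex
print(looks_convex(math.sin, 0.0, math.pi))                 # sin is concave on [0, pi]
print(looks_convex(lambda x: -math.sin(x), 0.0, math.pi))   # so -sin is convex there
```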
Important modifiers:
• Strictly convex: $f(tx + (1 - t)y) < t f(x) + (1 - t) f(y)$ for $x \ne y$ and $0 < t < 1$. In words, $f$ is convex and has greater curvature than a linear function
• Strongly convex with parameter $m > 0$: $f - \frac{m}{2} \|x\|_2^2$ is convex. In words, $f$ is at least as convex as a quadratic function
Note: strongly convex ⇒ strictly convex ⇒ convex
(Analogously for concave functions)
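Three univariate examples (chosen here for illustration, not taken from the slides) show that neither implication in the chain can be reversed:

```latex
% f(x) = |x|: convex but not strictly convex (equality holds for x, y > 0)
\tfrac12 |x| + \tfrac12 |y| = \big| \tfrac12 x + \tfrac12 y \big| \quad \text{for } x, y > 0

% f(x) = x^4: strictly convex but not strongly convex for any m > 0
\frac{d^2}{dx^2} \Big( x^4 - \tfrac{m}{2} x^2 \Big) = 12 x^2 - m < 0 \quad \text{for } |x| < \sqrt{m/12}

% f(x) = x^2: strongly convex with any parameter 0 < m \le 2
x^2 - \tfrac{m}{2} x^2 = \big( 1 - \tfrac{m}{2} \big) x^2 \quad \text{is convex} \iff m \le 2
```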
Examples of convex functions
• Univariate functions:
  - Exponential function: $e^{ax}$ is convex for any $a$ over $\mathbb{R}$
  - Power function: $x^a$ is convex for $a \ge 1$ or $a \le 0$ over $\mathbb{R}_+$ (nonnegative reals; for $a < 0$, restrict to positive reals, since $x^a$ is undefined at $0$)
  - Power function: $x^a$ is concave for $0 \le a \le 1$ over $\mathbb{R}_+$
  - Logarithmic function: $\log x$ is concave over $\mathbb{R}_{++}$
• Affine function: $a^T x + b$ is both convex and concave
• Quadratic function: $\frac{1}{2} x^T Q x + b^T x + c$ is convex provided that $Q \succeq 0$ (positive semidefinite)
• Least squares loss: $\|y - Ax\|_2^2$ is always convex (since $A^T A$ is always positive semidefinite)
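• Norm: $\|x\|$ is convex for any norm; e.g., $\ell_p$ norms,
$$\|x\|_p = \Big( \sum_{i=1}^n |x_i|^p \Big)^{1/p} \;\text{for } p \ge 1, \qquad \|x\|_\infty = \max_{i=1,\ldots,n} |x_i|$$
and also operator (spectral) and trace (nuclear) norms,
$$\|X\|_{\mathrm{op}} = \sigma_1(X), \qquad \|X\|_{\mathrm{tr}} = \sum_{i=1}^r \sigma_i(X)$$
where $\sigma_1(X) \ge \ldots \ge \sigma_r(X) \ge 0$ are the singular values of the matrix $X$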
• Indicator function: if $C$ is convex, then its indicator function
$$I_C(x) = \begin{cases} 0 & x \in C \\ \infty & x \notin C \end{cases}$$
is convex
• Support function: for any set $C$ (convex or not), its support function
$$I_C^*(x) = \max_{y \in C} \; x^T y$$
is convex
• Max function: f (x) = max{x1 , . . . , xn } is convex
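The support function example can be checked on a small case. The sketch below assumes a hypothetical finite (hence nonconvex) set of three points in the plane and verifies the convexity inequality for its support function on random samples; the function is a maximum of finitely many linear functions of $x$, which is why the check passes even though the set is not convex.

```python
import random

def support_function(points, x):
    """I*_C(x) = max over y in C of x^T y, for a finite set C given as tuples."""
    return max(sum(xi * yi for xi, yi in zip(x, y)) for y in points)

# Assumed nonconvex set: three isolated points in the plane.
C = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

rng = random.Random(0)
worst_gap = float("inf")
for _ in range(1000):
    x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    t = rng.random()
    z = tuple(t * a + (1 - t) * b for a, b in zip(x, y))
    # Chord value minus function value; should never be negative for a convex function.
    gap = t * support_function(C, x) + (1 - t) * support_function(C, y) - support_function(C, z)
    worst_gap = min(worst_gap, gap)
print(worst_gap >= -1e-9)
```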
Key properties of convex functions
• A function is convex if and only if its restriction to any line is
convex
• Epigraph characterization: a function f is convex if and only
if its epigraph
epi(f ) = {(x, t) ∈ dom(f ) × R : f (x) ≤ t}
is a convex set
• Convex sublevel sets: if f is convex, then its sublevel sets
{x ∈ dom(f ) : f (x) ≤ t}
are convex, for all t ∈ R. The converse is not true
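A standard counterexample for the failed converse is $f(x) = \sqrt{|x|}$ on $\mathbb{R}$ (chosen here for illustration): every sublevel set is an interval, yet the function is not convex. The sketch below evaluates the convexity inequality at one pair of points where it fails.

```python
import math

def f(x):
    """f(x) = sqrt(|x|): convex sublevel sets, but not a convex function."""
    return math.sqrt(abs(x))

def sublevel_interval(t):
    """Sublevel set {x : f(x) <= t} is the interval [-t^2, t^2] for t >= 0, else empty."""
    return None if t < 0 else (-t * t, t * t)

x, y, t = 0.0, 1.0, 0.5
lhs = f(t * x + (1 - t) * y)       # f(0.5), about 0.707
rhs = t * f(x) + (1 - t) * f(y)    # 0.5
print(sublevel_interval(2.0), lhs > rhs)
```

Since `lhs > rhs`, the chord between $(0, f(0))$ and $(1, f(1))$ lies below the graph at the midpoint, which rules out convexity.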
• First-order characterization: if $f$ is differentiable, then $f$ is convex if and only if $\mathrm{dom}(f)$ is convex, and
$$f(y) \ge f(x) + \nabla f(x)^T (y - x)$$
for all $x, y \in \mathrm{dom}(f)$. Therefore for a differentiable convex function $\nabla f(x) = 0 \iff x$ minimizes $f$
• Second-order characterization: if $f$ is twice differentiable, then $f$ is convex if and only if $\mathrm{dom}(f)$ is convex, and $\nabla^2 f(x) \succeq 0$ for all $x \in \mathrm{dom}(f)$
• Jensen’s inequality: if f is convex, and X is a random variable
supported on dom(f ), then f (E[X]) ≤ E[f (X)]
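The first-order characterization can be illustrated on a small example. The sketch below assumes the hypothetical function $f(x) = (x_1 - 1)^2 + e^{x_2} - x_2$ on $\mathbb{R}^2$ (a sum of convex terms), checks the gradient inequality on random pairs, and confirms that the point where the gradient vanishes has the smallest sampled value.

```python
import math
import random

def f(x):
    return (x[0] - 1.0) ** 2 + math.exp(x[1]) - x[1]

def grad_f(x):
    return (2.0 * (x[0] - 1.0), math.exp(x[1]) - 1.0)

def first_order_gap(x, y):
    """f(y) - [f(x) + grad f(x)^T (y - x)]; nonnegative when f is convex."""
    linear = f(x) + sum(g * (yi - xi) for g, xi, yi in zip(grad_f(x), x, y))
    return f(y) - linear

rng = random.Random(0)
pairs = [
    ((rng.uniform(-2, 2), rng.uniform(-2, 2)), (rng.uniform(-2, 2), rng.uniform(-2, 2)))
    for _ in range(1000)
]
min_gap = min(first_order_gap(x, y) for x, y in pairs)

x_star = (1.0, 0.0)  # grad_f vanishes here, so x_star should minimize f
print(min_gap >= -1e-9, grad_f(x_star), f(x_star))
```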
Operations preserving convexity
• Nonnegative linear combination: f1 , . . . , fm convex implies
a1 f1 + · · · + am fm convex for any a1 , . . . , am ≥ 0
• Pointwise maximization: if $f_s$ is convex for any $s \in S$, then $f(x) = \max_{s \in S} f_s(x)$ is convex. Note that the set $S$ here (number of functions $f_s$) can be infinite
• Partial minimization: if $g(x, y)$ is convex in $x, y$, and $C$ is convex, then $f(x) = \min_{y \in C} g(x, y)$ is convex
Example: distances to a set
Let $C$ be an arbitrary set, and consider the maximum distance to $C$ under an arbitrary norm $\|\cdot\|$:
$$f(x) = \max_{y \in C} \|x - y\|$$
Let’s check convexity: $f_y(x) = \|x - y\|$ is convex in $x$ for any fixed $y$, so by pointwise maximization rule, $f$ is convex
Now let $C$ be convex, and consider the minimum distance to $C$:
$$f(x) = \min_{y \in C} \|x - y\|$$
Let’s check convexity: $g(x, y) = \|x - y\|$ is convex in $x, y$ jointly, and $C$ is assumed convex, so apply partial minimization rule
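A one-dimensional sketch shows why convexity of $C$ matters for the minimum distance but not for the maximum distance. The sets below are assumptions for illustration: the interval $[-1, 1]$ (convex) and the two-point set $\{-1, 1\}$ (not convex). The helper `midpoint_gap` evaluates the convexity inequality at $t = 1/2$.

```python
def dist_to_interval(x, lo=-1.0, hi=1.0):
    """Minimum distance from x to the convex set [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

def dist_to_points(x, points=(-1.0, 1.0)):
    """Minimum distance from x to a finite (nonconvex) set of points."""
    return min(abs(x - p) for p in points)

def max_dist_to_points(x, points=(-1.0, 1.0)):
    """Maximum distance from x to the same finite set."""
    return max(abs(x - p) for p in points)

def midpoint_gap(f, x, y):
    """0.5*f(x) + 0.5*f(y) - f(midpoint); a negative value refutes convexity."""
    return 0.5 * f(x) + 0.5 * f(y) - f(0.5 * (x + y))

print(midpoint_gap(dist_to_interval, -3.0, 2.0))    # nonnegative: consistent with convexity
print(midpoint_gap(dist_to_points, -1.0, 1.0))      # negative: min distance to {-1, 1} is not convex
print(midpoint_gap(max_dist_to_points, -1.0, 1.0))  # nonnegative: max distance is convex for any set
```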
More operations preserving convexity
• Affine composition: if f is convex, then g(x) = f (Ax + b) is
convex
• General composition: suppose $f = h \circ g$, where $g : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R} \to \mathbb{R}$, $f : \mathbb{R}^n \to \mathbb{R}$. Then:
  - $f$ is convex if $h$ is convex and nondecreasing, $g$ is convex
  - $f$ is convex if $h$ is convex and nonincreasing, $g$ is concave
  - $f$ is concave if $h$ is concave and nondecreasing, $g$ concave
  - $f$ is concave if $h$ is concave and nonincreasing, $g$ convex
How to remember these? Think of the chain rule when $n = 1$:
$$f''(x) = h''(g(x)) \, g'(x)^2 + h'(g(x)) \, g''(x)$$
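Two worked instances of the chain-rule formula, with functions chosen here for illustration, show how the signs of $h''$, $h'$ and $g''$ combine in the first two rules:

```latex
% Rule 1: h(u) = e^u convex and nondecreasing, g(x) = x^2 convex
f(x) = e^{x^2}: \quad
f''(x) = e^{x^2} (2x)^2 + e^{x^2} \cdot 2 = (4x^2 + 2) \, e^{x^2} > 0

% Rule 2: h(u) = -\log u convex and nonincreasing, g(x) = 1 - x^2 concave (and positive on (-1, 1))
f(x) = -\log(1 - x^2): \quad
f''(x) = \frac{1}{(1 - x^2)^2} (-2x)^2 + \Big( -\frac{1}{1 - x^2} \Big) (-2)
       = \frac{4x^2}{(1 - x^2)^2} + \frac{2}{1 - x^2} > 0 \quad \text{for } |x| < 1
```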
• Vector composition: suppose that
$$f(x) = h\big(g(x)\big) = h\big(g_1(x), \ldots, g_k(x)\big)$$
where $g : \mathbb{R}^n \to \mathbb{R}^k$, $h : \mathbb{R}^k \to \mathbb{R}$, $f : \mathbb{R}^n \to \mathbb{R}$. Then:
  - $f$ is convex if $h$ is convex and nondecreasing in each argument, $g$ is convex
  - $f$ is convex if $h$ is convex and nonincreasing in each argument, $g$ is concave
  - $f$ is concave if $h$ is concave and nondecreasing in each argument, $g$ is concave
  - $f$ is concave if $h$ is concave and nonincreasing in each argument, $g$ is convex
Example: log-sum-exp function
Log-sum-exp function: $g(x) = \log\big( \sum_{i=1}^k e^{a_i^T x + b_i} \big)$, for fixed $a_i, b_i$, $i = 1, \ldots, k$. Often called “soft max”, as it smoothly approximates $\max_{i=1,\ldots,k} (a_i^T x + b_i)$
How to show convexity? First, note it suffices to prove convexity of $f(x) = \log\big( \sum_{i=1}^n e^{x_i} \big)$ (affine composition rule)
Now use second-order characterization. Calculate
$$\nabla_i f(x) = \frac{e^{x_i}}{\sum_{\ell=1}^n e^{x_\ell}}$$
$$\nabla^2_{ij} f(x) = \frac{e^{x_i}}{\sum_{\ell=1}^n e^{x_\ell}} \, 1\{i = j\} - \frac{e^{x_i} e^{x_j}}{\big( \sum_{\ell=1}^n e^{x_\ell} \big)^2}$$
Write $\nabla^2 f(x) = \mathrm{diag}(z) - z z^T$, where $z_i = e^{x_i} / \big( \sum_{\ell=1}^n e^{x_\ell} \big)$. This matrix is diagonally dominant, hence positive semidefinite
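The Hessian formula can be checked numerically. The sketch below builds $\mathrm{diag}(z) - z z^T$ at an assumed random point in $\mathbb{R}^5$, tests diagonal dominance row by row, and evaluates the quadratic form $v^T \nabla^2 f(x) v$ on random vectors. Subtracting $\max_i x_i$ before exponentiating is a numerical-stability choice made here, not part of the derivation; it leaves $z$ unchanged.

```python
import math
import random

def softmax(x):
    """z_i = exp(x_i) / sum_l exp(x_l), shifted by max(x) for numerical stability."""
    shift = max(x)
    exps = [math.exp(xi - shift) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]

def lse_hessian(x):
    """Hessian of f(x) = log(sum_i exp(x_i)): diag(z) - z z^T."""
    z = softmax(x)
    n = len(z)
    return [[(z[i] if i == j else 0.0) - z[i] * z[j] for j in range(n)] for i in range(n)]

def is_diagonally_dominant(h, tol=1e-12):
    """Check h[i][i] >= sum_{j != i} |h[i][j]| for every row, up to tol."""
    return all(
        row[i] + tol >= sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(h)
    )

def quadratic_form(h, v):
    """Return v^T h v."""
    return sum(v[i] * h[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

rng = random.Random(0)
x = [rng.uniform(-3, 3) for _ in range(5)]
H = lse_hessian(x)
min_form = min(quadratic_form(H, [rng.gauss(0, 1) for _ in range(5)]) for _ in range(500))
print(is_diagonally_dominant(H), min_form >= -1e-12)
```

Each row of the Hessian sums to zero, so dominance holds with equality; the tolerance absorbs floating-point rounding.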
References and further reading
• S. Boyd and L. Vandenberghe (2004), “Convex optimization”,
Chapters 2 and 3
• J.P. Hiriart-Urruty and C. Lemarechal (1993), “Fundamentals
of convex analysis”, Chapters A and B
• R. T. Rockafellar (1970), “Convex analysis”, Chapters 1–10