
I. Convexity

Georgia Tech ECE 6270 Notes by M. Davenport and J. Romberg. Last updated August 29, 2022.

Convex sets
In this section, we introduce some of the mathematical fundamentals of convex sets. To motivate the definitions, we will look at the closest point problem from several different angles. The tools and concepts we develop here, however, have many other applications, both in this course and beyond.

A set C ⊂ ℝᴺ is convex if

x, y ∈ C ⇒ (1 − θ)x + θy ∈ C for all θ ∈ [0, 1].

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

[Figures: some sets in ℝ² that are convex, and some that are not.]
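The definition above is easy to test numerically, at least in one direction: sampling points along a segment can refute convexity, though it can never certify it. Here is a minimal Python sketch (the helper segment_in_set and the two example sets are ours, not from the notes):

    import numpy as np

    def segment_in_set(x, y, member, n=101):
        """Test (1 - theta)*x + theta*y in the set for sampled theta in [0, 1]."""
        return all(member((1 - t) * x + t * y) for t in np.linspace(0.0, 1.0, n))

    disc = lambda p: np.linalg.norm(p) <= 1.0              # convex
    annulus = lambda p: 0.5 <= np.linalg.norm(p) <= 1.0    # not convex

    x, y = np.array([0.9, 0.0]), np.array([-0.9, 0.0])
    print(segment_in_set(x, y, disc))      # True
    print(segment_in_set(x, y, annulus))   # False: the segment crosses the hole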

Examples of convex (and nonconvex) sets
• Subspaces. Recall that if S is a subspace of ℝᴺ, then
  x, y ∈ S ⇒ ax + by ∈ S for all a, b ∈ ℝ.
  So S is clearly convex.
• Affine sets. Affine sets are just subspaces that have been offset from the origin:
  {x ∈ ℝᴺ : x = y + v, y ∈ T }, T a subspace,
  for some fixed vector v. An equivalent definition is that
  x, y ∈ C ⇒ θx + (1 − θ)y ∈ C for all θ ∈ ℝ.
  The difference between this definition and that for a subspace is that subspaces must include the origin.
• Bound constraints. Rectangular sets of the form
  C = {x ∈ ℝᴺ : ℓ₁ ≤ x₁ ≤ u₁, ℓ₂ ≤ x₂ ≤ u₂, . . . , ℓ_N ≤ x_N ≤ u_N }
  for some ℓ₁, . . . , ℓ_N, u₁, . . . , u_N ∈ ℝ are convex.


• The “filled in” simplex in ℝᴺ,
  {x ∈ ℝᴺ : x₁ + x₂ + · · · + x_N ≤ 1, x₁, x₂, . . . , x_N ≥ 0},
  is convex.
• Any subset of ℝᴺ that can be expressed as a set of linear inequality constraints,

  {x ∈ ℝᴺ : Ax ≤ b},

  is convex. Notice that both rectangular sets and the simplex fall into this category — for the previous example, take

      ⎡  1   1   1  ⋯   1 ⎤        ⎡ 1 ⎤
      ⎢ −1   0   0  ⋯   0 ⎥        ⎢ 0 ⎥
  A = ⎢  0  −1   0  ⋯   0 ⎥,   b = ⎢ 0 ⎥ .
      ⎢  ⋮   ⋮       ⋱    ⎥        ⎢ ⋮ ⎥
      ⎣  0   0   ⋯  0  −1 ⎦        ⎣ 0 ⎦

  Sets of this form are called polyhedra; when they are also bounded, they are called polytopes. (A numerical version of this construction appears after this list.)
• Norm balls. If ‖ · ‖ is a valid norm on ℝᴺ, then
  B_r = {x : ‖x‖ ≤ r}
  is a convex set.
• Ellipsoids. An ellipsoid is a set of the form
  E = {x : (x − x₀)ᵀ P⁻¹ (x − x₀) ≤ r},
  for a symmetric positive-definite matrix P. Geometrically, the ellipsoid is centered at x₀, its axes are oriented with the eigenvectors of P, and the relative widths along these axes are proportional to the square roots of the eigenvalues of P.
• A single point {x} is convex.
• The empty set is convex.
• The set
  {x ∈ ℝ² : x₁² − 2x₁ − x₂ + 1 ≤ 0}
  is convex. (Sketch it! Since x₁² − 2x₁ + 1 = (x₁ − 1)², this is the region above the parabola x₂ = (x₁ − 1)².)

• The set
  {x ∈ ℝ² : x₁² − 2x₁ − x₂ + 1 ≥ 0}
  is not convex.
• The set
  {x ∈ ℝ² : x₁² − 2x₁ − x₂ + 1 = 0}
  is certainly not convex.
• Sets defined by linear inequality constraints where only some of the constraints have to hold are in general not convex. For example,
  {x ∈ ℝ² : x₁ − x₂ ≤ −1 and x₁ + x₂ ≤ −1}
  is convex, while
  {x ∈ ℝ² : x₁ − x₂ ≤ −1 or x₁ + x₂ ≤ −1}
  is not convex.
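As promised above, here is a small numerical sketch (ours, in Python with NumPy) of the simplex written as {x : Ax ≤ b}; the helper in_simplex is hypothetical:

    import numpy as np

    N = 4
    A = np.vstack([np.ones((1, N)), -np.eye(N)])   # (N+1) x N: the sum row, then -I
    b = np.concatenate([[1.0], np.zeros(N)])       # b = (1, 0, ..., 0)

    def in_simplex(x, tol=1e-12):
        return bool(np.all(A @ x <= b + tol))      # tolerance guards against float error

    print(in_simplex(np.full(N, 1.0 / N)))   # True: sits on the face x1 + ... + xN = 1
    print(in_simplex(np.full(N, 0.5)))       # False: the entries sum to 2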

Cones
A cone is a set C such that

x ∈ C ⇒ θx ∈ C for all θ ≥ 0.

Convex cones are sets that are both convex and a cone. C is a convex cone if

x₁, x₂ ∈ C ⇒ θ₁x₁ + θ₂x₂ ∈ C for all θ₁, θ₂ ≥ 0.

Given x₁ and x₂, the set of all their combinations with non-negative weights makes a wedge. For practice, sketch the region that consists of all such combinations of x₁ and x₂:

[Figure: two vectors x₁ and x₂ in the plane; the set of their non-negative combinations is the wedge between the rays through x₁ and x₂.]

We will mostly be interested in proper cones, which, in addition to being convex, are closed, have a non-empty interior (“solid”; see the Technical Details section for the precise definition), and do not contain entire lines (“pointed”).

Examples:
Non-negative orthant. The set of vectors whose entries are non-negative,

ℝᴺ₊ = {x ∈ ℝᴺ : xₙ ≥ 0 for n = 1, . . . , N },

is a proper cone.
Positive semi-definite cone. The set of N × N symmetric matrices with non-negative eigenvalues is a proper cone.
Non-negative polynomials. Vectors of coefficients of polynomials that are non-negative on [0, 1],

{x ∈ ℝᴺ : x₁ + x₂t + x₃t² + · · · + x_N t^(N−1) ≥ 0 for all 0 ≤ t ≤ 1},

form a proper cone. Notice that it is not necessary that all the xₙ ≥ 0; for example, t − t² (x₁ = 0, x₂ = 1, x₃ = −1) is non-negative on [0, 1].
Norm cones. The subset of ℝᴺ⁺¹ defined by

{(x, t) : x ∈ ℝᴺ, t ∈ ℝ, ‖x‖ ≤ t}

is a proper cone for any valid norm ‖ · ‖. (We encountered this cone in our discussion about SOCPs in the last set of notes.)

Every proper cone K defines a partial ordering or generalized inequality. We write

x ⪯_K y when y − x ∈ K.

For example, for vectors x, y ∈ ℝᴺ, we say

x ⪯_{ℝᴺ₊} y when xₙ ≤ yₙ for all n = 1, . . . , N.

For symmetric matrices X, Y, we say

X ⪯_{S₊ᴺ} Y when Y − X has non-negative eigenvalues.

We will typically just use ⪯ when the context makes it clear. In fact, for ℝᴺ₊ we will just write x ≤ y (as we did above) to mean that the entries in x are component-by-component upper-bounded by the entries in y.

Partial orderings share many of the properties of the standard ≤ on the real line. For example:

x ⪯ y, u ⪯ v ⇒ x + u ⪯ y + v.

But other properties do not hold; for example, it is not necessarily the case that either x ⪯ y or y ⪯ x. An extensive list of properties of partial orderings (most of which will make perfect sense on sight) can be found in [BV04, Chapter 2.4].
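The two generalized inequalities above are easy to check numerically. The sketch below is our illustration (the helper names are made up): it tests ⪯ with respect to ℝᴺ₊ componentwise, and ⪯ with respect to the PSD cone via eigenvalues.

    import numpy as np

    def leq_orthant(x, y):
        """x <= y with respect to R^N_+, i.e., componentwise."""
        return bool(np.all(y - x >= 0))

    def leq_psd(X, Y, tol=1e-10):
        """X <= Y with respect to the PSD cone: Y - X has non-negative eigenvalues."""
        return bool(np.all(np.linalg.eigvalsh(Y - X) >= -tol))

    x, y = np.array([1.0, 2.0]), np.array([1.5, 3.0])
    print(leq_orthant(x, y), leq_orthant(y, x))  # True False: the ordering is partial

    X = np.eye(2)
    Y = np.array([[2.0, 0.5], [0.5, 2.0]])
    print(leq_psd(X, Y))   # True: the eigenvalues of Y - X are 0.5 and 1.5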

Affine sets
Recall the definition of a linear subspace: a set T ⊂ ℝᴺ is a subspace if

x, y ∈ T ⇒ αx + βy ∈ T for all α, β ∈ ℝ.

Affine sets (also referred to as affine spaces) are not fundamentally different from subspaces. An affine set S is simply a subspace that has been offset from the origin:

S = T + v₀,

for some subspace T and v₀ ∈ ℝᴺ. (It thus makes sense to talk about the dimension of S as being the dimension of this underlying subspace.) We can recast this as a definition similar to the above: a set S ⊂ ℝᴺ is affine if

x, y ∈ S ⇒ λx + (1 − λ)y ∈ S for all λ ∈ ℝ.

Just as we can find the smallest subspace that contains a finite set of vectors {v₁, . . . , v_K} by taking their span,

Span({v₁, . . . , v_K}) = { x ∈ ℝᴺ : x = Σ_{k=1}^{K} αₖ vₖ, αₖ ∈ ℝ },

we can define the affine hull (the smallest affine set that contains the vectors) as

Aff({v₁, . . . , v_K}) = { x ∈ ℝᴺ : x = Σ_{k=1}^{K} λₖ vₖ, λₖ ∈ ℝ, Σ_{k=1}^{K} λₖ = 1 }.

Example: Let

v₁ = (1, 0), v₂ = (1/2, 1/2).

Then Span({v₁, v₂}) is all of ℝ², while Aff({v₁, v₂}) is the line that connects v₁ and v₂:

Aff({v₁, v₂}) = { x ∈ ℝ² : x₁ + x₂ = 1 }.
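Since Aff({v₁, . . . , v_K}) is v₁ plus the span of the differences vₖ − v₁, its dimension is the rank of the matrix of those differences. A quick numerical check of the example above (our sketch, not from the notes):

    import numpy as np

    v1 = np.array([1.0, 0.0])
    v2 = np.array([0.5, 0.5])

    V = np.column_stack([v1, v2])
    D = np.column_stack([v2 - v1])       # differences relative to v1

    print(np.linalg.matrix_rank(V))      # 2: Span({v1, v2}) is all of R^2
    print(np.linalg.matrix_rank(D))      # 1: Aff({v1, v2}) is a line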

Just as any linear subspace T of dimension K can be described using a homogeneous set of equations,

x ∈ T ⇔ Ax = 0,

using any (N − K) × N matrix A whose nullspace is T, any affine set S of dimension K can be described as the solution to a linear system of equations,

x ∈ S ⇔ Ax = b,

for some (N − K) × N matrix A and b ∈ ℝᴺ⁻ᴷ.


It should be clear that every subspace is an affine set, but not every
affine set is a subspace. It is easy to show that an affine set is a
subspace if and only if it contains the 0 vector.
Affine sets are of course convex.
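Here is a sketch of the Ax = b description in Python (our illustration; it assumes SciPy is available). Given a basis V for T and an offset v₀, the rows of A should span the orthogonal complement of T, which scipy.linalg.null_space recovers from Vᵀ:

    import numpy as np
    from scipy.linalg import null_space

    N, K = 4, 2
    rng = np.random.default_rng(0)
    V = rng.standard_normal((N, K))       # columns form a basis for T
    v0 = rng.standard_normal(N)           # the offset

    A = null_space(V.T).T                 # (N - K) x N, with nullspace exactly T
    b = A @ v0

    x = V @ rng.standard_normal(K) + v0   # an arbitrary point of S = T + v0
    print(np.allclose(A @ x, b))          # True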

Hyperplanes and halfspaces


Hyperplanes and halfspaces are both very simple constructs, but they will be crucial to our understanding of convex sets, functions, and optimization problems.

A hyperplane is an affine set of dimension N − 1; it has the form

{x ∈ ℝᴺ : ⟨x, a⟩ = t}

for some fixed vector a ≠ 0 and scalar t. When t = 0, this set is a subspace of dimension N − 1, and contains all vectors that are orthogonal to a. For t ≠ 0, this is an affine space consisting of all the vectors orthogonal to a (call this set A^⊥) offset by some x₀:

{x ∈ ℝᴺ : ⟨x, a⟩ = t} = x₀ + A^⊥,

for any x₀ with ⟨x₀, a⟩ = t. We might take x₀ = t · a/‖a‖₂², for instance. The point is, a is a normal vector of the set.

Here are some examples in ℝ²:

[Figure: hyperplanes {x ∈ ℝ² : ⟨x, a⟩ = t} for two different normal vectors a and offsets t = −1, 1, 5.]

A halfspace is a set of the form

{x ∈ ℝᴺ : ⟨x, a⟩ ≤ t}

for some fixed vector a ≠ 0 and scalar t. For t = 0, the halfspace contains all vectors whose inner product with a is non-positive (i.e., the angle between x and a is at least 90°). Here is a simple example:

[Figure: the halfspace {x ∈ ℝ² : ⟨x, a⟩ ≤ t} for a = (1, 2) and t = 5.]
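A small sketch (ours) of testing hyperplane and halfspace membership, including the particular point x₀ = t · a/‖a‖₂² mentioned above:

    import numpy as np

    a = np.array([1.0, 2.0])
    t = 5.0

    on_hyperplane = lambda x: bool(np.isclose(x @ a, t))   # <x, a> = t
    in_halfspace = lambda x: bool(x @ a <= t)              # <x, a> <= t

    x0 = t * a / np.dot(a, a)         # one point on the hyperplane
    print(on_hyperplane(x0))          # True
    print(in_halfspace(np.zeros(2)))  # True: <0, a> = 0 <= 5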

Separating hyperplanes
If two convex sets are disjoint, then there is a hyperplane that separates them. Here is a picture:

[Figure: two disjoint convex sets with a hyperplane passing between them.]

This fact is intuitive, and is incredibly useful in understanding the solutions to convex optimization programs (we will see this even in the next section). It is also not true in general if one of the sets is nonconvex; observe:

[Figure: a nonconvex set wrapped around a convex set, so that no hyperplane can separate the two.]

For sets C, D ⊂ ℝᴺ, we say that a hyperplane H = {x : ⟨x, a⟩ = t}

• separates C and D if
  ⟨c, a⟩ ≤ t ≤ ⟨d, a⟩ for all c ∈ C, d ∈ D; (1)
• properly separates C and D if (1) holds and C and D are not both contained in H themselves;
• strictly separates C and D if
  ⟨c, a⟩ < t < ⟨d, a⟩ for all c ∈ C, d ∈ D;
• strongly separates C and D if there exists ε > 0 such that
  ⟨c, a⟩ ≤ t − ε and ⟨d, a⟩ ≥ t + ε for all c ∈ C, d ∈ D.

Note that we can switch the roles of C and D above, i.e., we also say H separates C and D if ⟨d, a⟩ ≤ t ≤ ⟨c, a⟩ for all c ∈ C, d ∈ D.

Let us start by showing the following:

Strong separating hyperplane theorem
Let C and D be disjoint nonempty closed convex sets, and let C be bounded. Then there is a hyperplane that strongly separates C and D.

To prove this, we show how to explicitly construct a strongly separating hyperplane. Let d(x, D) be the distance of a point x to the set D:

d(x, D) = inf_{y∈D} ‖x − y‖₂.

As we will see below, since D is closed and convex, there is a unique closest point to x that achieves the infimum on the right. It is also true that d(x, D) is continuous as a function of x, so by the Weierstrass extreme value theorem it achieves its minimum value over the compact set C. That is to say, there exist points c ∈ C and d ∈ D that achieve the minimum distance

‖c − d‖₂ = inf_{x∈C, y∈D} ‖x − y‖₂.

Since C and D are disjoint, we have c − d ≠ 0.

Define

a = d − c,   t = (‖d‖₂² − ‖c‖₂²)/2,   ε = ‖c − d‖₂²/2.

We will show that for these choices,

⟨x, a⟩ ≤ t − ε for x ∈ C,   ⟨x, a⟩ ≥ t + ε for x ∈ D.

To see this, we will set

f(x) = ⟨x, a⟩ − t,

and show that for any point u ∈ D, we have f(u) ≥ ε.
Here is a picture to help visualize the proof:

[Figure: the sets C and D with closest points c and d, a = d − c, and the hyperplane {x : ⟨x, a⟩ = t} with t = (‖d‖₂² − ‖c‖₂²)/2 crossing the segment between them.]

First, we prove the following basic geometric fact: for any two vectors x, y,

if ‖x + θy‖₂ ≥ ‖x‖₂ for all θ ∈ [0, 1], then ⟨x, y⟩ ≥ 0. (2)

To establish this, we expand the norm as

‖x + θy‖₂² = ‖x‖₂² + θ²‖y‖₂² + 2θ⟨x, y⟩,

from which we can immediately deduce that

(θ/2)‖y‖₂² + ⟨x, y⟩ ≥ 0 for all θ ∈ (0, 1],

and taking θ → 0 gives ⟨x, y⟩ ≥ 0.

Now let u be an arbitrary point in D. Since D is convex, we know that d + θ(u − d) ∈ D for all θ ∈ [0, 1]. Since d is as close to c as any other point in D, we have

‖d + θ(u − d) − c‖₂ ≥ ‖d − c‖₂,

and so by (2), we know that

⟨d − c, u − d⟩ ≥ 0.

This means

f(u) = ⟨u, d − c⟩ − (‖d‖₂² − ‖c‖₂²)/2
     = ⟨u, d − c⟩ − ⟨d + c, d − c⟩/2
     = ⟨u − (d + c)/2, d − c⟩
     = ⟨(u − d) + (d − c)/2, d − c⟩
     = ⟨u − d, d − c⟩ + ‖c − d‖₂²/2
     ≥ ‖c − d‖₂²/2 = ε.

The argument that f(v) ≤ −ε for every v ∈ C is exactly the same.
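To make the construction concrete, here is a worked check (our sketch) on two sets whose closest points are known in closed form: C the unit ball at the origin and D the unit ball centered at (4, 0), so that c = (1, 0) and d = (3, 0).

    import numpy as np

    c = np.array([1.0, 0.0])    # closest point of C = B(0, 1) to D
    d = np.array([3.0, 0.0])    # closest point of D = B((4,0), 1) to C

    a = d - c                                  # normal vector
    t = (np.dot(d, d) - np.dot(c, c)) / 2      # offset
    eps = np.dot(c - d, c - d) / 2             # margin

    print(a, t, eps)              # [2. 0.] 4.0 2.0
    print(np.dot(c, a), t - eps)  # 2.0 2.0: <c, a> <= t - eps, with equality at c
    print(np.dot(d, a), t + eps)  # 6.0 6.0: <d, a> >= t + eps, with equality at d

    # The ball centers satisfy the bounds strictly:
    print(np.dot(np.zeros(2), a) <= t - eps)           # True: 0 <= 2
    print(np.dot(np.array([4.0, 0.0]), a) >= t + eps)  # True: 8 >= 6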

We will not prove it here, but there is an even more interesting result that says that the sets C and D do not even have to be disjoint — they can intersect at one or more points along their boundaries:

[Figure: two convex sets touching at a boundary point, separated by a hyperplane through that point.]

Separating hyperplane theorem
Nonempty convex sets C, D ⊂ ℝᴺ can be (properly) separated by a hyperplane if and only if their relative interiors are disjoint:

relint(C) ∩ relint(D) = ∅.

See the Technical Details section for what exactly is meant by “relative interior”; it is basically everything not on the natural boundary of the set, once we account for the fact that the set might have dimension smaller than N.

Supporting hyperplanes
A direct consequence of the separating hyperplane theorem is that every point on the boundary of a convex set C can be separated from its interior. If a ≠ 0 satisfies ⟨x, a⟩ ≤ ⟨x₀, a⟩ for all x ∈ C, then

H = {x : ⟨x, a⟩ = ⟨x₀, a⟩}

is called a supporting hyperplane to C at x₀. Here’s a picture:

[Figure: a convex set C with a supporting hyperplane touching its boundary at x₀.]

The hyperplane is tangent to C at x₀, and the halfspace {x : ⟨x, a⟩ ≤ ⟨x₀, a⟩} contains C. We call H a proper supporting hyperplane if ⟨y, a⟩ < ⟨x₀, a⟩ for at least one y ∈ C.

The supporting hyperplane theorem says that a supporting hyperplane exists at every point x₀ on the boundary of a (non-empty) convex set C.

Supporting hyperplane theorem
Let C ⊂ ℝᴺ be a convex set, and let x₀ ∈ C. Then there is a supporting hyperplane at x₀ if and only if x₀ ∉ relint(C).

A proof of this theorem (and of the general separating hyperplane theorem above) can be found in [Roc70, Chap. 11]. Note that there might be more than one supporting hyperplane at a boundary point:

[Figure: a convex set with a corner at x₀, where many supporting hyperplanes exist.]

The closest point problem

Let x₀ ∈ ℝᴺ be given, and let C be a non-empty, closed, convex set. The projection of x₀ onto C is the closest point (in the standard Euclidean distance, for now) in C to x₀:

P_C(x₀) = arg min_{y∈C} ‖x₀ − y‖₂.

We will see below that there is a unique minimizer to this problem, and that the solution has geometric properties that are analogous to the case where C is a subspace.

Projection onto a subspace

Let’s recall how we solve this problem in the special case where C := T is a K-dimensional subspace. In this case, the solution x̂ = P_T(x₀) is unique, and is characterized by the orthogonality principle:

x₀ − x̂ ⊥ T,

meaning that ⟨y, x₀ − x̂⟩ = 0 for all y ∈ T. The proof of this fact is reviewed in the Technical Details section at the end of these notes.
The orthogonality principle leads immediately to an algorithm for calculating P_T(x₀). Let v₁, . . . , v_K be a basis for T; we can write the solution as

x̂ = Σ_{k=1}^{K} αₖ vₖ;

solving for the expansion coefficients αₖ is the same as solving for x̂. We know that

⟨x₀ − Σ_{j=1}^{K} αⱼ vⱼ, vₖ⟩ = 0 for k = 1, . . . , K,

and so the αₖ must obey the linear system of equations

Σ_{j=1}^{K} αⱼ ⟨vⱼ, vₖ⟩ = ⟨x₀, vₖ⟩ for k = 1, . . . , K.

Concatenating the vₖ as columns in the N × K matrix V, and the entries αₖ into the vector α ∈ ℝᴷ, we can write the equations above as

VᵀV α = Vᵀx₀.

Since the {vₖ} are a basis for T (i.e., they are linearly independent), VᵀV is invertible, and we can solve for the best expansion coefficients:

α̂ = (VᵀV)⁻¹Vᵀx₀.

Using these expansion coefficients to reconstruct x̂ yields

x̂ = V α̂ = V(VᵀV)⁻¹Vᵀx₀.

In this case, the projector P_T(·) is a linear map, specified by the N × N matrix V(VᵀV)⁻¹Vᵀ.
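Here is a minimal numerical sketch (ours) of this computation. In practice one would use a QR factorization or np.linalg.lstsq rather than forming VᵀV explicitly, but the normal equations mirror the derivation above:

    import numpy as np

    rng = np.random.default_rng(0)
    N, K = 5, 2
    V = rng.standard_normal((N, K))      # columns form a basis for T
    x0 = rng.standard_normal(N)

    alpha = np.linalg.solve(V.T @ V, V.T @ x0)   # the normal equations
    x_hat = V @ alpha                            # projection of x0 onto T

    # Orthogonality principle: the residual is orthogonal to every basis vector.
    print(np.allclose(V.T @ (x0 - x_hat), 0.0))  # True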

Projection onto an affine set

As discussed above, affine sets are not fundamentally different from subspaces; any affine set C can be written as a subspace T plus an offset v₀:

C = T + v₀ = {x : x = y + v₀, y ∈ T }.

This makes it easy to translate the results for subspaces above to say that the projection onto an affine set is unique, and obeys the orthogonality principle

⟨y − x̂, x₀ − x̂⟩ = 0 for all y ∈ C. (3)

You can solve this problem by shifting x₀ and C by −v₀, projecting x₀ − v₀ onto the subspace C − v₀, and then shifting the answer back.
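The shift-project-shift recipe translates directly into code. A sketch (ours, reusing the subspace projection from above) that also verifies the orthogonality principle (3):

    import numpy as np

    def project_subspace(V, x):
        return V @ np.linalg.solve(V.T @ V, V.T @ x)

    def project_affine(V, v0, x0):
        # Shift by -v0, project onto the subspace, shift back.
        return project_subspace(V, x0 - v0) + v0

    rng = np.random.default_rng(0)
    N, K = 5, 2
    V = rng.standard_normal((N, K))
    v0 = rng.standard_normal(N)
    x0 = rng.standard_normal(N)

    x_hat = project_affine(V, v0, x0)
    y = V @ rng.standard_normal(K) + v0                 # an arbitrary point of C = T + v0
    print(np.isclose((y - x_hat) @ (x0 - x_hat), 0.0))  # True, as (3) requires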

Projection onto a general convex set

In general, there is no closed-form expression for the projector onto a given convex set. However, the concepts above (orthogonality, projection onto a subspace) can help us understand the solution for an arbitrary convex set.

Uniqueness of closest point
If C ⊂ ℝᴺ is closed and convex, then for any x₀, the program

minimize_{y∈C} ‖x₀ − y‖₂ (4)

has a unique solution.

First, let’s argue that (4) has at least one minimizer. Let x′ be any point in C, and set B = {x : ‖x − x₀‖₂ ≤ ‖x₀ − x′‖₂}. By construction, if a minimizer exists, it must be in the set C ∩ B. Since C ∩ B is closed and bounded and ‖x₀ − y‖₂ is a continuous function of y, by the Weierstrass extreme value theorem we know that there is at least one point in the set where this function achieves its minimum. Hence there exists at least one solution x̂ to (4).
We can now argue that x̂ is the only minimizer of (4). Consider first all the points y ∈ C such that y − x₀ is co-aligned with x̂ − x₀. Let

I = {α ∈ ℝ : x̂ + α(x₀ − x̂) ∈ C}.

(Note that if y = x̂ + α(x₀ − x̂), then y − x₀ = (1 − α)(x̂ − x₀), and so the two difference vectors are co-aligned.) Since C is convex and closed, I is a closed interval of the real line (that contains at least the point α = 0). The function

g(α) = ‖x₀ − x̂ − α(x₀ − x̂)‖₂² = (1 − α)²‖x₀ − x̂‖₂²

captures the distance of the co-aligned vector for every value of α. Since, as a function of α, this is a parabola with strictly positive second derivative, it takes its minimum at exactly one place on the interval I, and by construction this is α = 0. So any y ≠ x̂ with y − x₀ co-aligned with x̂ − x₀ cannot be another minimizer of (4).

Now let y be any point in C such that the difference vectors are not co-aligned. We will show that y cannot minimize (4) because the point x̂/2 + y/2 ∈ C is definitively closer to x₀. We have

‖x₀ − x̂/2 − y/2‖₂² = ‖(x₀ − x̂)/2 + (x₀ − y)/2‖₂²
                   = ‖x₀ − x̂‖₂²/4 + ‖x₀ − y‖₂²/4 + ⟨x₀ − x̂, x₀ − y⟩/2
                   < ‖x₀ − x̂‖₂²/4 + ‖x₀ − y‖₂²/4 + ‖x₀ − x̂‖₂‖x₀ − y‖₂/2
                   = (‖x₀ − x̂‖₂/2 + ‖x₀ − y‖₂/2)²
                   ≤ ‖x₀ − y‖₂².

The strict inequality above follows from Cauchy-Schwarz (which is strict because x₀ − x̂ and x₀ − y are not co-aligned), while the last inequality follows from the fact that x̂ is a minimizer, so ‖x₀ − x̂‖₂ ≤ ‖x₀ − y‖₂. This shows that no y ≠ x̂ can also minimize (4), and so x̂ is unique.

Similar to the orthogonality principle for projecting onto an affine set or linear space, there is a clean geometric optimality condition for the closest point in a convex set.

Obtuseness principle
P_C(x₀) = x̂ if and only if

⟨y − x̂, x₀ − x̂⟩ ≤ 0 for all y ∈ C. (5)

Compare (5) with (3) above. Here is a picture:

[Figure: a convex set C with x₀ outside it; at x̂ = P_C(x₀), the angle between x₀ − x̂ and any direction y − x̂ into the set is obtuse.]

We first prove that (5) ⇒ P_C(x₀) = x̂. For any y ∈ C, we have

‖y − x₀‖₂² = ‖y − x̂ + x̂ − x₀‖₂²
           = ‖y − x̂‖₂² + ‖x̂ − x₀‖₂² + 2⟨y − x̂, x̂ − x₀⟩
           ≥ ‖x̂ − x₀‖₂² + 2⟨y − x̂, x̂ − x₀⟩.

Note that the inner product term above is the same as in (5), but with the sign of the second argument flipped, so we know this term must be non-negative. Thus

‖y − x₀‖₂² ≥ ‖x̂ − x₀‖₂².

Since this holds uniformly over all y ∈ C, x̂ must be the closest point in C to x₀.

We now show that P_C(x₀) = x̂ ⇒ (5). Let y be an arbitrary point in C. Since C is convex, the point x̂ + θ(y − x̂) must also be in C for all θ ∈ [0, 1]. Since x̂ = P_C(x₀),

‖x̂ + θ(y − x̂) − x₀‖₂ ≥ ‖x̂ − x₀‖₂ for all θ ∈ [0, 1].

By our intermediate result (2) from a few pages ago, this means

⟨y − x̂, x̂ − x₀⟩ ≥ 0 ⇒ ⟨y − x̂, x₀ − x̂⟩ ≤ 0.
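One case where the projector is simple is a box (the bound constraints from the start of these notes): projection is coordinatewise clipping. The sketch below (ours) checks the obtuseness principle (5) at random points of the box:

    import numpy as np

    lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
    project_box = lambda x: np.clip(x, lo, hi)   # projection = coordinatewise clipping

    x0 = np.array([2.0, -0.5])
    x_hat = project_box(x0)                      # [1.0, 0.0]

    rng = np.random.default_rng(0)
    for _ in range(5):
        y = rng.uniform(lo, hi)                  # a random point of the box
        print((y - x_hat) @ (x0 - x_hat) <= 1e-12)  # True every time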

Technical Details: closest point to a subspace
In this section, we establish the orthogonality principle for the projection of a point x₀ onto a subspace T. Let x̂ ∈ T be a vector which obeys

ê = x₀ − x̂ ⊥ T.

We will show that x̂ is the unique closest point to x₀ in T. Let y be any other vector in T, and set

e = x₀ − y.

We will show that

‖e‖ > ‖ê‖ (i.e., that ‖x₀ − y‖ > ‖x₀ − x̂‖).

Note that

‖e‖² = ‖x₀ − y‖² = ‖ê − (y − x̂)‖²
     = ⟨ê − (y − x̂), ê − (y − x̂)⟩
     = ‖ê‖² + ‖y − x̂‖² − ⟨ê, y − x̂⟩ − ⟨y − x̂, ê⟩.

Since y − x̂ ∈ T and ê ⊥ T,

⟨ê, y − x̂⟩ = 0 and ⟨y − x̂, ê⟩ = 0,

and so

‖e‖² = ‖ê‖² + ‖y − x̂‖².

Since all three quantities in the expression above are non-negative and

‖y − x̂‖ > 0 ⇔ y ≠ x̂,

we see that

y ≠ x̂ ⇔ ‖e‖ > ‖ê‖.

We leave it as an exercise to establish the converse: that if ⟨y, x̂ − x₀⟩ = 0 for all y ∈ T, then x̂ is the projection of x₀ onto T.

Technical Details: Basic analysis in ℝᴺ
This section contains a brief review of basic topological concepts in ℝᴺ. Our discussion will take place using the standard Euclidean distance (i.e., the ℓ₂ norm), but all of these definitions can be generalized to other metrics. An excellent source for this material is [Rud76].

Basic topology

We say that a sequence of vectors {xₖ, k = 1, 2, . . .} converges to x̂ if

‖xₖ − x̂‖₂ → 0 as k → ∞.

More precisely, this means that for every ε > 0, there exists an n such that

‖xₖ − x̂‖₂ ≤ ε for all k ≥ n.

It is easy to show that a sequence of vectors converges if and only if its individual components converge point-by-point. For a convergent sequence like the above, we often write lim_{k→∞} xₖ = x̂.

A set X is open if we can draw a small ball around every point in X which is also entirely contained in X. More precisely, let B(x, ε) be the set of all points within ε of x:

B(x, ε) = {y ∈ ℝᴺ : ‖x − y‖₂ ≤ ε}.

Then X is open if for every x ∈ X, there exists an εₓ > 0 such that B(x, εₓ) ⊂ X. The standard example here is an open interval of the real line, e.g. (0, 1).

There are many ways to define closed sets. The easiest is that a set X is closed if its complement is open. A more illuminating (and equivalent) definition is that X is closed if it contains all of its limit points. A vector x̂ is a limit point of X if there exists a sequence of vectors {xₖ} ⊂ X that converges to x̂.

The closure of a general set X, denoted cl(X), is the set of all limit points of X. Note that every x ∈ X is trivially a limit point (take the sequence xₖ = x), so X ⊂ cl(X). By construction, cl(X) is the smallest closed set that contains X.

The set X is bounded if we can find a uniform upper bound on the distance between any two points it contains; this upper bound is commonly referred to as the diameter of the set:

diam X = sup_{x,y∈X} ‖x − y‖₂.

The set X ⊂ ℝᴺ is compact if it is closed and bounded. A key fact about compact sets is that every sequence has a convergent subsequence — this is known as the Bolzano–Weierstrass theorem.

Interiors and boundaries of sets

Related to the definitions of open and closed sets are the technical definitions of boundary and interior. The interior of a set X is the collection of points around which we can place a ball of finite width which remains in the set:

int(X) = {x ∈ X : ∃ ε > 0 such that B(x, ε) ⊂ X }.

We will actually be more concerned with the concept of relative interior. Let’s motivate this quickly with an example. Consider the unit simplex in ℝ³,

∆ = {x ∈ ℝ³ : x₁ + x₂ + x₃ = 1, xᵢ ≥ 0}.

This is essentially a “flat” (two-dimensional) triangle embedded into three-dimensional space:

[Figure: the unit simplex ∆, a triangle with vertices at the standard basis vectors in ℝ³.]

By the technical definition of interior above, no point in ∆ is an interior point, as any ball drawn around an x₀ ∈ ∆ will contain points with x₁ + x₂ + x₃ ≠ 1. But somehow this does not capture the fact that the 2D triangle itself has “points on the edges” and “points in the middle”.
To rectify this, we introduce the relative interior of a set as

relint(X) = {x ∈ X : ∃ ε > 0 such that B(x, ε) ∩ Aff(X) ⊂ X },

where Aff(X) is the smallest affine set that contains X. This means that if the set we are analyzing can be embedded in a low-dimensional affine space, then we define interior points relative to this set. For the simplex, we have

Aff(∆) = {x : x₁ + x₂ + x₃ = 1},

and

relint(∆) = {x ∈ ∆ : x₁, x₂, x₃ > 0}.

For convex sets, we also have the equivalent and perhaps more intuitive definition of relative interior,

relint(C) = {x ∈ C : ∀ y ∈ C, ∃ λ > 1 such that λx + (1 − λ)y ∈ C}.

The boundary of X is the set of points in cl(X) that are not in the (relative) interior:

bd(X) = cl(X) \ relint(X).

Functions, continuity, and extrema

A function f : ℝᴺ → ℝ is continuous if for every ε > 0 there exists a δ > 0 such that

‖x₁ − x₂‖₂ ≤ δ ⇒ |f(x₁) − f(x₂)| ≤ ε

for all x₁, x₂. f is called Lipschitz if the δ can be taken proportional to ε; that is, there exists an L such that

|f(x₁) − f(x₂)| ≤ L‖x₁ − x₂‖₂ for all x₁, x₂.

A function f is called bounded on a set X if there exists an M such that |f(x)| ≤ M for all x ∈ X. The supremum of f on X is the smallest upper bound on f, and the infimum of f is the greatest lower bound. We need these terms since maximizers and minimizers do not always exist. For example,

min_{x≥0} e^(−x)

does not exist; there is no value x₀ that we can choose where we can definitively say that e^(−x₀) ≤ e^(−x) for all x ≥ 0. The infimum, however, always exists:

inf_{x≥0} e^(−x) = 0.

When there is a point in X where f achieves its infimum, then of course the two operations agree, e.g.

inf_{x∈[0,1]} (x − 1/2)² = min_{x∈[0,1]} (x − 1/2)² = 0.

It is also true that every continuous function on a compact set is bounded. The Weierstrass extreme value theorem tells us even more; it states that a continuous function always achieves its infimum (has a minimizer) and supremum (has a maximizer) over a compact set X. That is, there exists x⁺ ∈ X such that

f(x⁺) = sup_{x∈X} f(x),

and x⋆ ∈ X such that

f(x⋆) = inf_{x∈X} f(x).

Because of this, we can freely replace sup with max and inf with min. This might be viewed as a fundamental result in optimization, as it gives very general (sufficient) conditions under which optimization problems have well-defined solutions.

References
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Roc70] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[Rud76] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, 1976.
