
CHAPTER 3. PROJECTED GRADIENT DESCENT

Hao Yuan, Kun Yuan

October 10, 2023

1 Problem formulation

This chapter considers the following constrained problem:

$$\min_{x \in \mathbb{R}^d} f(x), \quad \text{subject to } x \in \mathcal{X}, \tag{1}$$

where $f(x)$ is a differentiable objective function and $\mathcal{X}$ is a closed convex subset of $\mathbb{R}^d$.

Notation. We introduce the following notation:

• Let $x^\star := \arg\min_{x \in \mathcal{X}} f(x)$ be the optimal solution to problem (1).

• Let $f^\star := \min_{x \in \mathcal{X}} f(x)$ be the optimal function value.

2 Projection onto closed convex sets

Lemma 2.1. Given a closed convex set $C \subseteq \mathbb{R}^d$, for any $x \in \mathbb{R}^d$ there exists a unique $z^* \in C$ such that $\|x - z^*\| \le \|x - z\|$ for any $z \in C$. The point $z^*$ is called the projection of $x$ onto $C$ and is denoted by $P_C[x]$.

Proof. First, we show the existence of $z^*$. Fix $x \in \mathbb{R}^d$, and denote $\delta := \inf\{\|z - x\| : z \in C\}$. It is evident that $\delta \ge 0$. Now let $\{z_k\}_{k \ge 1}$ be a sequence of points in $C$ such that

$$\|z_k - x\|^2 \le \delta^2 + \frac{1}{k}$$

for any $k$. We first show that $\{z_k\}_{k \ge 1}$ is a Cauchy sequence. Let $k, l > 0$ be arbitrary, and notice that, since $C$ is convex, we have $\frac{1}{2}(z_k + z_l) \in C$, which implies

$$\left\| \tfrac{1}{2}(z_k + z_l) - x \right\|^2 \ge \delta^2. \tag{2}$$

Expanding this inequality leads to

$$\frac{1}{2}\langle z_k - x, z_l - x \rangle \ge \delta^2 - \frac{1}{4}\|z_k - x\|^2 - \frac{1}{4}\|z_l - x\|^2.$$

We now calculate $\|z_k - z_l\|^2$ and get

$$\|z_k - z_l\|^2 = \|z_k - x\|^2 + \|z_l - x\|^2 - 2\langle z_k - x, z_l - x \rangle \le 2\left(\|z_k - x\|^2 + \|z_l - x\|^2\right) - 4\delta^2 \le \frac{2}{k} + \frac{2}{l},$$

where we used inequality (2) in the second step. Thus for any $\epsilon > 0$, as long as $k, l \ge \lceil 4/\epsilon^2 \rceil$, we have $\|z_k - z_l\| \le \epsilon$, showing that $\{z_k\}$ is a Cauchy sequence. By the completeness of $\mathbb{R}^d$, $z^* = \lim_{k \to \infty} z_k$ exists. Since $C$ is closed, we have $z^* \in C$, and by the continuity of the norm function, we have

$$\|z^* - x\| = \lim_{k \to \infty} \|z_k - x\| = \delta,$$

which is less than or equal to $\|z - x\|$ for any $z \in C$, by the definition of $\delta$.


Next, we show the uniqueness of $z^*$. Suppose both $z_1^*$ and $z_2^*$ satisfy $\|z_1^* - x\| = \|z_2^* - x\| = \delta$. Denote $\bar{z} = \frac{1}{2}(z_1^* + z_2^*)$, which lies in $C$ by convexity, and we have

$$\delta^2 \le \|\bar{z} - x\|^2 = \left\| \frac{1}{2}(z_1^* - x) + \frac{1}{2}(z_2^* - x) \right\|^2 = \frac{1}{2}\delta^2 + \frac{1}{2}\langle z_1^* - x, z_2^* - x \rangle,$$

which leads to

$$\langle z_1^* - x, z_2^* - x \rangle \ge \delta^2.$$

Consequently,

$$\|z_1^* - z_2^*\|^2 = \|z_1^* - x\|^2 + \|z_2^* - x\|^2 - 2\langle z_1^* - x, z_2^* - x \rangle \le \delta^2 + \delta^2 - 2\delta^2 = 0,$$

implying that $z_1^* = z_2^*$. The proof is now complete.

Lemma 2.2. Let $C \subseteq \mathbb{R}^d$ be a closed convex set. Then for any $x \in \mathbb{R}^d$ and $y \in C$, we have $y = P_C[x]$ if and only if $\langle z - y, x - y \rangle \le 0$ for any $z \in C$.

Proof. We first prove that if $y = P_C[x]$, then $\langle z - y, x - y \rangle \le 0$ for any $z \in C$. To this end, we fix $x$ and $y$ and suppose there exists $z_0 \in C$ such that $\langle z_0 - y, x - y \rangle > 0$; it then follows that $z_0 \ne y$. Set $z = y + t(z_0 - y)$ for $t > 0$; it holds that

$$\|x - z\|^2 - \|x - y\|^2 = \|x - y - t(z_0 - y)\|^2 - \|x - y\|^2 = \|z_0 - y\|^2 t^2 - 2t\langle x - y, z_0 - y \rangle = \|z_0 - y\|^2\, t \left( t - \frac{2\langle x - y, z_0 - y \rangle}{\|z_0 - y\|^2} \right). \tag{3}$$

Defining $t^* := \min\left\{1, \frac{\langle x - y, z_0 - y \rangle}{\|z_0 - y\|^2}\right\}$, we have $0 < t^* \le 1$ and $z^* := y + t^*(z_0 - y) = (1 - t^*)y + t^* z_0 \in C$. Substituting $0 < t^* \le \frac{\langle x - y, z_0 - y \rangle}{\|z_0 - y\|^2}$ into (3), we have $\|x - z^*\| < \|x - y\|$, which contradicts $y = P_C[x]$.

Next, we prove that if $\langle z - y, x - y \rangle \le 0$ for any $z \in C$, then $y = P_C[x]$. We notice that $\|x - z\|^2 = \|(x - y) - (z - y)\|^2 = \|x - y\|^2 + \|z - y\|^2 - 2\langle x - y, z - y \rangle$ for any $z \in C$. With $\langle x - y, z - y \rangle \le 0$, we have $\|x - z\| \ge \|x - y\|$. Since $z \in C$ is arbitrary, we conclude that $y = P_C[x]$.

Lemma 2.3. Let $C \subseteq \mathbb{R}^d$ be a closed convex set. Then $\|P_C[x] - P_C[y]\| \le \|x - y\|$ for any $x, y \in \mathbb{R}^d$.

Proof. We first notice that

$$\begin{aligned}
\|x - y\|^2 &= \|(x - P_C[x] + P_C[y] - y) + (P_C[x] - P_C[y])\|^2 \\
&= \|x - P_C[x] + P_C[y] - y\|^2 + \|P_C[x] - P_C[y]\|^2 + 2\langle x - P_C[x] + P_C[y] - y,\; P_C[x] - P_C[y] \rangle \\
&= \|x - P_C[x] + P_C[y] - y\|^2 + \|P_C[x] - P_C[y]\|^2 - 2\langle y - P_C[y],\; P_C[x] - P_C[y] \rangle - 2\langle x - P_C[x],\; P_C[y] - P_C[x] \rangle.
\end{aligned} \tag{4}$$

From Lemma 2.2, we know

$$\langle y - P_C[y],\; P_C[x] - P_C[y] \rangle \le 0, \qquad \langle x - P_C[x],\; P_C[y] - P_C[x] \rangle \le 0.$$

Substituting the above inequalities into (4), we reach $\|x - y\|^2 \ge \|P_C[x] - P_C[y]\|^2$.

3 Examples of projections

• Box: $C = [\eta_1, \eta_2]^N$. For any $x \in \mathbb{R}^N$, $(P_C[x])_i = \max\{\eta_1, \min\{x_i, \eta_2\}\}$, $i = 1, \cdots, N$.

• Hyperplane: $C = \{x \mid u^\top x = \eta\}$ with $u \in \mathbb{R}^N$, $\eta \in \mathbb{R}$. For any $x \in \mathbb{R}^N$, $P_C[x] = x + \frac{\eta - u^\top x}{\|u\|_2^2}\, u$ (see the sketch below).
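Both formulas above are one-liners in code. The following Python sketch (NumPy-based; the names proj_box and proj_hyperplane are our own, chosen for illustration) implements them and numerically spot-checks the optimality condition of Lemma 2.2 and the nonexpansiveness of Lemma 2.3:

    import numpy as np

    def proj_box(x, eta1, eta2):
        # (P_C[x])_i = max{eta_1, min{x_i, eta_2}}: componentwise clipping
        return np.clip(x, eta1, eta2)

    def proj_hyperplane(x, u, eta):
        # P_C[x] = x + (eta - u^T x) / ||u||_2^2 * u
        return x + (eta - u @ x) / (u @ u) * u

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=5), rng.normal(size=5)
    px, py = proj_box(x, -1.0, 1.0), proj_box(y, -1.0, 1.0)

    # Lemma 2.3 (nonexpansiveness): ||P_C[x] - P_C[y]|| <= ||x - y||
    assert np.linalg.norm(px - py) <= np.linalg.norm(x - y) + 1e-12

    # Lemma 2.2 (optimality condition): <z - P_C[x], x - P_C[x]> <= 0 for z in C
    z = proj_box(rng.normal(size=5), -1.0, 1.0)   # an arbitrary point of C
    assert (z - px) @ (x - px) <= 1e-12

    # The hyperplane projection indeed lands on the hyperplane: u^T P_C[x] = eta
    u, eta = rng.normal(size=5), 0.7
    assert abs(u @ proj_hyperplane(x, u, eta) - eta) < 1e-10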

4 Projected gradient descent

For optimization problem (1), given an arbitrary initialization $x_0 \in \mathcal{X}$, projected gradient descent iterates as follows:

$$y_{k+1} = x_k - \gamma \nabla f(x_k), \tag{5a}$$

$$x_{k+1} = P_{\mathcal{X}}[y_{k+1}], \quad \forall\, k = 0, 1, 2, \cdots \tag{5b}$$

where $\gamma$ is the learning rate.
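As a concrete illustration, here is a minimal Python sketch of iteration (5), applied to a box-constrained least-squares instance. The problem data and all names here (projected_gradient_descent, A, b, the box) are our own illustrative choices, not from the text; we set gamma = 1/L with L the largest eigenvalue of A^T A, matching the step size used in the analysis below.

    import numpy as np

    def projected_gradient_descent(grad_f, proj, x0, gamma, num_iters):
        x = x0
        for _ in range(num_iters):
            y = x - gamma * grad_f(x)   # (5a): gradient step
            x = proj(y)                 # (5b): project back onto X
        return x

    # Example: min 0.5 * ||Ax - b||^2 over the box X = [-1, 1]^5
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
    L = np.linalg.eigvalsh(A.T @ A).max()      # smoothness constant of f
    grad_f = lambda x: A.T @ (A @ x - b)
    proj = lambda y: np.clip(y, -1.0, 1.0)     # box projection from Section 3

    x_hat = projected_gradient_descent(grad_f, proj, np.zeros(5), 1.0 / L, 500)
    print(x_hat)   # feasible (inside the box) and near-optimal over X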

5 Convergence analysis

5.1 Smooth and generally convex problem

Lemma 5.1. Suppose $f(x)$ is $L$-smooth. If $\gamma = \frac{1}{L}$, then the sequence generated by projected gradient descent (5) with arbitrary $x_0 \in \mathcal{X}$ satisfies

$$f(x_{k+1}) \le f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2 + \frac{L}{2}\|y_{k+1} - x_{k+1}\|^2, \quad k = 0, 1, 2, \cdots$$

Proof. Since $f(x)$ is $L$-smooth, it holds that

$$\begin{aligned}
f(x_{k+1}) &\le f(x_k) + \langle \nabla f(x_k), x_{k+1} - x_k \rangle + \frac{L}{2}\|x_{k+1} - x_k\|^2 \\
&\overset{(5a)}{=} f(x_k) - L\langle y_{k+1} - x_k, x_{k+1} - x_k \rangle + \frac{L}{2}\|x_{k+1} - x_k\|^2 \\
&= f(x_k) - \frac{L}{2}\left( \|y_{k+1} - x_k\|^2 + \|x_{k+1} - x_k\|^2 - \|y_{k+1} - x_{k+1}\|^2 \right) + \frac{L}{2}\|x_{k+1} - x_k\|^2 \\
&= f(x_k) - \frac{L}{2}\|y_{k+1} - x_k\|^2 + \frac{L}{2}\|y_{k+1} - x_{k+1}\|^2 \\
&= f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2 + \frac{L}{2}\|y_{k+1} - x_{k+1}\|^2,
\end{aligned}$$

where the third line uses the identity $2\langle a, b \rangle = \|a\|^2 + \|b\|^2 - \|a - b\|^2$, and the last line uses $y_{k+1} - x_k = -\frac{1}{L}\nabla f(x_k)$ from (5a).

Lemma 5.2. Suppose $f(x)$ is $L$-smooth. If $\gamma = \frac{1}{L}$, then the sequence generated by projected gradient descent (5) with arbitrary $x_0 \in \mathcal{X}$ satisfies

$$f(x_{k+1}) \le f(x_k) - \frac{L}{2}\|x_{k+1} - x_k\|^2, \quad k = 0, 1, 2, \cdots$$

Proof. From Lemma 2.2 (applied with $z = x_k \in \mathcal{X}$), we have

$$\begin{aligned}
P_{\mathcal{X}}[x_k - \gamma \nabla f(x_k)] = x_{k+1}
&\;\Rightarrow\; \langle (x_k - \gamma \nabla f(x_k)) - x_{k+1},\; x_k - x_{k+1} \rangle \le 0 \\
&\;\Rightarrow\; \|x_{k+1} - x_k\|^2 + \gamma \langle \nabla f(x_k), x_{k+1} - x_k \rangle \le 0 \\
&\;\Rightarrow\; \langle \nabla f(x_k), x_{k+1} - x_k \rangle \le -\frac{1}{\gamma}\|x_{k+1} - x_k\|^2 = -L\|x_{k+1} - x_k\|^2.
\end{aligned} \tag{6}$$

Since $f(x)$ is $L$-smooth, we have

$$f(x_{k+1}) \le f(x_k) + \langle \nabla f(x_k), x_{k+1} - x_k \rangle + \frac{L}{2}\|x_{k+1} - x_k\|^2 \overset{(6)}{\le} f(x_k) - \frac{L}{2}\|x_{k+1} - x_k\|^2. \tag{7}$$
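Lemma 5.2 says that with $\gamma = 1/L$, projected gradient descent is a descent method. As a quick numerical sanity check of (7), here is a short Python sketch; the least-squares objective and box constraint are our own illustrative choices, as before.

    import numpy as np

    rng = np.random.default_rng(1)
    A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
    L = np.linalg.eigvalsh(A.T @ A).max()            # f is L-smooth with this L
    f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
    grad_f = lambda x: A.T @ (A @ x - b)

    x = np.clip(rng.normal(size=5), -1.0, 1.0)       # start inside X = [-1, 1]^5
    for _ in range(100):
        x_next = np.clip(x - grad_f(x) / L, -1.0, 1.0)   # one PGD step, gamma = 1/L
        # Inequality (7): f(x_{k+1}) <= f(x_k) - (L/2) * ||x_{k+1} - x_k||^2
        assert f(x_next) <= f(x) - 0.5 * L * np.linalg.norm(x_next - x) ** 2 + 1e-9
        x = x_next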

Theorem 5.3. Suppose $f(x)$ is convex and $L$-smooth. If $\gamma = \frac{1}{L}$, then the sequence generated by projected gradient descent (5) with arbitrary $x_0 \in \mathcal{X}$ satisfies

$$f(x_K) - f(x^\star) \le \frac{L}{2K}\|x_0 - x^\star\|^2, \quad K > 0.$$

Proof. First, since $y_{k+1} = x_k - \gamma \nabla f(x_k)$, expanding $\|y_{k+1} - x^\star\|^2$ gives

$$\langle \nabla f(x_k), x_k - x^\star \rangle = \frac{1}{2\gamma}\left( \gamma^2 \|\nabla f(x_k)\|^2 + \|x_k - x^\star\|^2 - \|y_{k+1} - x^\star\|^2 \right). \tag{8}$$

From Lemma 2.2, it holds that

$$\langle y_{k+1} - x_{k+1}, x^\star - x_{k+1} \rangle \le 0,$$

which, by expanding $\|y_{k+1} - x^\star\|^2 = \|(y_{k+1} - x_{k+1}) + (x_{k+1} - x^\star)\|^2$, leads to

$$\|x_{k+1} - x^\star\|^2 + \|y_{k+1} - x_{k+1}\|^2 \le \|y_{k+1} - x^\star\|^2. \tag{9}$$

Substituting (9) into (8), we have

$$\langle \nabla f(x_k), x_k - x^\star \rangle \le \frac{1}{2\gamma}\left( \gamma^2 \|\nabla f(x_k)\|^2 + \|x_k - x^\star\|^2 - \|x_{k+1} - x^\star\|^2 - \|y_{k+1} - x_{k+1}\|^2 \right). \tag{10}$$

Then, by the convexity of $f$ and by (10) with $\gamma = \frac{1}{L}$ (the $\|x_k - x^\star\|^2 - \|x_{k+1} - x^\star\|^2$ terms telescope),

$$\sum_{k=0}^{K-1} \left( f(x_k) - f(x^\star) \right) \le \sum_{k=0}^{K-1} \langle \nabla f(x_k), x_k - x^\star \rangle \le \frac{1}{2L}\sum_{k=0}^{K-1} \|\nabla f(x_k)\|^2 + \frac{L}{2}\|x_0 - x^\star\|^2 - \frac{L}{2}\sum_{k=0}^{K-1} \|y_{k+1} - x_{k+1}\|^2. \tag{11}$$

From Lemma 5.1, we have

$$\frac{1}{2L}\sum_{k=0}^{K-1} \|\nabla f(x_k)\|^2 \le \sum_{k=0}^{K-1} \left( f(x_k) - f(x_{k+1}) + \frac{L}{2}\|y_{k+1} - x_{k+1}\|^2 \right) = f(x_0) - f(x_K) + \frac{L}{2}\sum_{k=0}^{K-1} \|y_{k+1} - x_{k+1}\|^2.$$

Plugging this into (11) and rearranging, we have

$$\sum_{k=1}^{K} \left( f(x_k) - f(x^\star) \right) \le \frac{L}{2}\|x_0 - x^\star\|^2.$$

By Lemma 5.2, $f(x_K) \le f(x_k)$ for every $k \le K$, so the left-hand side is at least $K\left( f(x_K) - f(x^\star) \right)$, which completes the proof.
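The $O(1/K)$ bound of Theorem 5.3 can be observed numerically. A minimal sketch, under the same illustrative box-constrained least-squares setup as above, where $x^\star$ is approximated by simply running PGD to convergence:

    import numpy as np

    rng = np.random.default_rng(2)
    A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
    L = np.linalg.eigvalsh(A.T @ A).max()
    f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
    grad_f = lambda x: A.T @ (A @ x - b)
    step = lambda x: np.clip(x - grad_f(x) / L, -1.0, 1.0)   # one PGD iteration

    x0 = np.zeros(5)
    x_star = x0
    for _ in range(100_000):          # approximate x* by running PGD to convergence
        x_star = step(x_star)

    x = x0
    for K in range(1, 201):
        x = step(x)
        bound = L * np.linalg.norm(x0 - x_star) ** 2 / (2 * K)
        assert f(x) - f(x_star) <= bound + 1e-9   # Theorem 5.3's bound holds at every K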

5.2 Smooth and strongly convex problem

Theorem 5.4. Let $\mathcal{X} \subseteq \mathbb{R}^d$ be a closed convex set, and let $f: \mathcal{X} \to \mathbb{R}$ be differentiable, $L$-smooth and $\mu$-strongly convex. If $\gamma = \frac{1}{L}$, projected gradient descent (5) with arbitrary $x_0 \in \mathcal{X}$ satisfies

$$\|x_K - x^\star\|^2 \le \left(1 - \frac{\mu}{L}\right)^K \|x_0 - x^\star\|^2, \quad K > 0.$$

Proof. We have

$$f(x_k) - f(x^\star) \le \langle \nabla f(x_k), x_k - x^\star \rangle - \frac{\mu}{2}\|x_k - x^\star\|^2 \le \frac{1}{2\gamma}\left( \gamma^2 \|\nabla f(x_k)\|^2 + \|x_k - x^\star\|^2 - \|x_{k+1} - x^\star\|^2 - \|y_{k+1} - x_{k+1}\|^2 \right) - \frac{\mu}{2}\|x_k - x^\star\|^2, \tag{12}$$

where the first inequality follows from strong convexity and the second from (10). Multiplying (12) by $2\gamma$ and rearranging leads to

$$\|x_{k+1} - x^\star\|^2 \le 2\gamma\left( f(x^\star) - f(x_k) \right) + \gamma^2 \|\nabla f(x_k)\|^2 - \|y_{k+1} - x_{k+1}\|^2 + (1 - \gamma\mu)\|x_k - x^\star\|^2. \tag{13}$$

Using Lemma 5.1 and the fact that $f(x^\star) \le f(x_{k+1})$ (since $x_{k+1} \in \mathcal{X}$), we have

$$f(x^\star) - f(x_k) \le f(x_{k+1}) - f(x_k) \le -\frac{1}{2L}\|\nabla f(x_k)\|^2 + \frac{L}{2}\|y_{k+1} - x_{k+1}\|^2. \tag{14}$$

Substituting (14) into (13) with $\gamma = \frac{1}{L}$ (the gradient and $\|y_{k+1} - x_{k+1}\|^2$ terms cancel exactly), we have

$$\|x_{k+1} - x^\star\|^2 \le \left(1 - \frac{\mu}{L}\right)\|x_k - x^\star\|^2,$$

which completes the proof.

Remark: In Theorem 5.4, if we instead suppose $f$ is differentiable, $L$-smooth and $\mu$-strongly convex on all of $\mathbb{R}^d$, then the result can be strengthened to

$$\|x_K - x^\star\| \le \left(1 - \frac{\mu}{L}\right)^K \|x_0 - x^\star\|, \quad K > 0.$$

The proof can be found in Chapter 4, Theorem 5.7.
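The linear rate of Theorem 5.4 is also easy to verify numerically. A minimal sketch, assuming an illustrative regularized least-squares objective (chosen so that $\mu$ and $L$ are explicit eigenvalues of the Hessian):

    import numpy as np

    rng = np.random.default_rng(3)
    A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
    H = A.T @ A + np.eye(5)              # Hessian of f(x) = ||Ax-b||^2/2 + ||x||^2/2
    mu, L = np.linalg.eigvalsh(H).min(), np.linalg.eigvalsh(H).max()
    grad_f = lambda x: H @ x - A.T @ b
    step = lambda x: np.clip(x - grad_f(x) / L, -1.0, 1.0)   # PGD step on X = [-1, 1]^5

    x_star = np.zeros(5)
    for _ in range(100_000):             # approximate x* by running PGD to convergence
        x_star = step(x_star)

    x0 = x = np.clip(rng.normal(size=5), -1.0, 1.0)          # arbitrary x0 in X
    for K in range(1, 101):
        x = step(x)
        rate = (1 - mu / L) ** K         # contraction factor from Theorem 5.4
        assert np.linalg.norm(x - x_star) ** 2 <= rate * np.linalg.norm(x0 - x_star) ** 2 + 1e-9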
