L. Vandenberghe ECE236C (Spring 2022)

2. Subgradients

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative

Subgradients 2.1
Basic inequality

recall the basic inequality for differentiable convex functions:

𝑓 (𝑦) ≥ 𝑓 (𝑥) + ∇ 𝑓 (𝑥)𝑇 (𝑦 − 𝑥) for all 𝑦 ∈ dom 𝑓

[figure: graph of 𝑓 with the first-order approximation at (𝑥, 𝑓 (𝑥)); the vector (∇ 𝑓 (𝑥), −1) is normal to the supporting hyperplane]

• the first-order approximation of 𝑓 at 𝑥 is a global lower bound


• ∇ 𝑓 (𝑥) defines non-vertical supporting hyperplane to epigraph of 𝑓 at (𝑥, 𝑓 (𝑥)) :
(∇ 𝑓 (𝑥), −1)𝑇 ((𝑦, 𝑡) − (𝑥, 𝑓 (𝑥))) ≤ 0 for all (𝑦, 𝑡) ∈ epi 𝑓

Subgradients 2.2
Subgradient

𝑔 is a subgradient of a convex function 𝑓 at 𝑥 ∈ dom 𝑓 if

𝑓 (𝑦) ≥ 𝑓 (𝑥) + 𝑔𝑇 (𝑦 − 𝑥) for all 𝑦 ∈ dom 𝑓

[figure: graph of 𝑓 (𝑦) with affine lower bounds 𝑓 (𝑥1) + 𝑔𝑇1 (𝑦 − 𝑥1), 𝑓 (𝑥1) + 𝑔𝑇2 (𝑦 − 𝑥1), and 𝑓 (𝑥2) + 𝑔𝑇3 (𝑦 − 𝑥2)]

𝑔1, 𝑔2 are subgradients at 𝑥1; 𝑔3 is a subgradient at 𝑥2

Subgradients 2.3
Subdifferential

the subdifferential 𝜕 𝑓 (𝑥) of 𝑓 at 𝑥 is the set of all subgradients:

𝜕 𝑓 (𝑥) = {𝑔 | 𝑔𝑇 (𝑦 − 𝑥) ≤ 𝑓 (𝑦) − 𝑓 (𝑥), ∀𝑦 ∈ dom 𝑓 }

Properties

• 𝜕 𝑓 (𝑥) is a closed convex set (possibly empty)


this follows from the definition: 𝜕 𝑓 (𝑥) is an intersection of halfspaces

• if 𝑥 ∈ int dom 𝑓 then 𝜕 𝑓 (𝑥) is nonempty and bounded


proof on next two pages

Subgradients 2.4
Proof: we show that 𝜕 𝑓 (𝑥) is nonempty when 𝑥 ∈ int dom 𝑓

• (𝑥, 𝑓 (𝑥)) is in the boundary of the convex set epi 𝑓

• therefore there exists a supporting hyperplane to epi 𝑓 at (𝑥, 𝑓 (𝑥)) :


∃(𝑎, 𝑏) ≠ 0, (𝑎, 𝑏)𝑇 ((𝑦, 𝑡) − (𝑥, 𝑓 (𝑥))) ≤ 0 ∀(𝑦, 𝑡) ∈ epi 𝑓

• 𝑏 > 0 gives a contradiction as 𝑡 → ∞

• 𝑏 = 0 gives a contradiction for 𝑦 = 𝑥 + 𝜖 𝑎 with small 𝜖 > 0

• therefore 𝑏 < 0 and 𝑔 = (1/|𝑏|) 𝑎 is a subgradient of 𝑓 at 𝑥

Subgradients 2.5
Proof: 𝜕 𝑓 (𝑥) is bounded when 𝑥 ∈ int dom 𝑓

• for small 𝑟 > 0, define a set of 2𝑛 points

𝐵 = {𝑥 ± 𝑟𝑒 𝑘 | 𝑘 = 1, . . . , 𝑛} ⊂ dom 𝑓

and define 𝑀 = max𝑦∈𝐵 𝑓 (𝑦) < ∞
• for every 𝑔 ∈ 𝜕 𝑓 (𝑥) , there is a point 𝑦 ∈ 𝐵 with

𝑟 ‖𝑔‖∞ = 𝑔𝑇 (𝑦 − 𝑥)

(choose an index 𝑘 with |𝑔𝑘 | = ‖𝑔‖∞, and take 𝑦 = 𝑥 + 𝑟 sign(𝑔𝑘 )𝑒𝑘 )

• since 𝑔 is a subgradient, this implies that

𝑓 (𝑥) + 𝑟 ‖𝑔‖∞ = 𝑓 (𝑥) + 𝑔𝑇 (𝑦 − 𝑥) ≤ 𝑓 (𝑦) ≤ 𝑀

• we conclude that 𝜕 𝑓 (𝑥) is bounded:

‖𝑔‖∞ ≤ (𝑀 − 𝑓 (𝑥))/𝑟 for all 𝑔 ∈ 𝜕 𝑓 (𝑥)

Subgradients 2.6
Example

𝑓 (𝑥) = max { 𝑓1 (𝑥), 𝑓2 (𝑥)} with 𝑓1, 𝑓2 convex and differentiable

[figure: graphs of 𝑓1 (𝑦), 𝑓2 (𝑦), and their pointwise maximum 𝑓 (𝑦)]

• if 𝑓1 (𝑥ˆ) = 𝑓2 (𝑥ˆ), subdifferential at 𝑥ˆ is line segment [∇ 𝑓1 (𝑥ˆ), ∇ 𝑓2 (𝑥ˆ)]

• if 𝑓1 (𝑥ˆ) > 𝑓2 (𝑥ˆ), subdifferential at 𝑥ˆ is {∇ 𝑓1 (𝑥ˆ)}

• if 𝑓1 (𝑥ˆ) < 𝑓2 (𝑥ˆ), subdifferential at 𝑥ˆ is {∇ 𝑓2 (𝑥ˆ)}

Subgradients 2.7
Examples

Absolute value 𝑓 (𝑥) = |𝑥|

[figure: graph of 𝑓 (𝑥) = |𝑥| and of 𝜕 𝑓 (𝑥), which steps from −1 to 1 at 𝑥 = 0]

Euclidean norm 𝑓 (𝑥) = ‖𝑥‖2

𝜕 𝑓 (𝑥) = {𝑥/‖𝑥‖2} if 𝑥 ≠ 0, 𝜕 𝑓 (𝑥) = {𝑔 | ‖𝑔‖2 ≤ 1} if 𝑥 = 0
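a minimal Python sketch of a subgradient oracle for this example, following the case split above (the numeric spot-check and all names are illustrative, NumPy assumed):

    import numpy as np

    def subgrad_l2norm(x):
        """Return one subgradient of f(x) = ||x||_2 at x."""
        nrm = np.linalg.norm(x)
        if nrm > 0:
            return x / nrm          # unique subgradient: the gradient x / ||x||_2
        return np.zeros_like(x)     # at x = 0 any g with ||g||_2 <= 1 works; take g = 0

    # spot-check f(y) >= f(x) + g^T (y - x) at a nonzero point and at 0
    rng = np.random.default_rng(0)
    for x in (np.array([1.0, -2.0, 0.0]), np.zeros(3)):
        g = subgrad_l2norm(x)
        for _ in range(1000):
            y = rng.standard_normal(3)
            assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-12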

Subgradients 2.8
Monotonicity

the subdifferential of a convex function is a monotone operator:

(𝑢 − 𝑣)𝑇 (𝑥 − 𝑦) ≥ 0 for all 𝑥 , 𝑦 , 𝑢 ∈ 𝜕 𝑓 (𝑥) , 𝑣 ∈ 𝜕 𝑓 (𝑦)

Proof: by definition

𝑓 (𝑦) ≥ 𝑓 (𝑥) + 𝑢𝑇 (𝑦 − 𝑥), 𝑓 (𝑥) ≥ 𝑓 (𝑦) + 𝑣𝑇 (𝑥 − 𝑦)

combining the two inequalities shows monotonicity

Subgradients 2.9
Examples of non-subdifferentiable functions

the following functions are not subdifferentiable at 𝑥 = 0

• 𝑓 : R → R, dom 𝑓 = R+

𝑓 (𝑥) = 1 if 𝑥 = 0, 𝑓 (𝑥) = 0 if 𝑥 > 0

• 𝑓 : R → R, dom 𝑓 = R+, 𝑓 (𝑥) = −√𝑥

the only supporting hyperplane to epi 𝑓 at (0, 𝑓 (0)) is vertical

Subgradients 2.10
Subgradients and sublevel sets

if 𝑔 is a subgradient of 𝑓 at 𝑥 , then

𝑓 (𝑦) ≤ 𝑓 (𝑥) =⇒ 𝑔𝑇 (𝑦 − 𝑥) ≤ 0

[figure: sublevel set {𝑦 | 𝑓 (𝑦) ≤ 𝑓 (𝑥)} with a supporting hyperplane defined by a subgradient at 𝑥]

the nonzero subgradients at 𝑥 define supporting hyperplanes to the sublevel set

{𝑦 | 𝑓 (𝑦) ≤ 𝑓 (𝑥)}

Subgradients 2.11
Outline

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative

Subgradient calculus

Weak subgradient calculus: rules for finding one subgradient

• sufficient for most nondifferentiable convex optimization algorithms


• if you can evaluate 𝑓 (𝑥) , you can usually compute a subgradient

Strong subgradient calculus: rules for finding 𝜕 𝑓 (𝑥) (all subgradients)

• some algorithms, optimality conditions, etc., need entire subdifferential


• can be quite complicated

we will assume that 𝑥 ∈ int dom 𝑓
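a sampling-based sanity check for a candidate subgradient — a minimal sketch, NumPy assumed; a failed sample disproves 𝑔 ∈ 𝜕 𝑓 (𝑥), while passing samples are only evidence:

    import numpy as np

    def check_subgradient(f, x, g, trials=1000, tol=1e-9):
        """Test g in subdiff f(x) by sampling: f(y) >= f(x) + g^T (y - x)."""
        rng = np.random.default_rng(0)
        fx = f(x)
        for _ in range(trials):
            y = x + rng.standard_normal(x.shape)
            if f(y) < fx + g @ (y - x) - tol:
                return False                   # counterexample found
        return True

    f = lambda v: np.abs(v).sum()              # f(x) = ||x||_1
    print(check_subgradient(f, np.zeros(2), np.array([0.5, -1.0])))   # True
    print(check_subgradient(f, np.zeros(2), np.array([2.0, 0.0])))    # False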

Subgradients 2.12
Basic rules

Differentiable functions: 𝜕 𝑓 (𝑥) = {∇ 𝑓 (𝑥)} if 𝑓 is differentiable at 𝑥

Nonnegative linear combination

if 𝑓 (𝑥) = 𝛼1 𝑓1 (𝑥) + 𝛼2 𝑓2 (𝑥) with 𝛼1, 𝛼2 ≥ 0, then

𝜕 𝑓 (𝑥) = 𝛼1 𝜕 𝑓1 (𝑥) + 𝛼2 𝜕 𝑓2 (𝑥)

(right-hand side is addition of sets)

Affine transformation of variables: if 𝑓 (𝑥) = ℎ( 𝐴𝑥 + 𝑏) , then

𝜕 𝑓 (𝑥) = 𝐴𝑇 𝜕ℎ( 𝐴𝑥 + 𝑏)
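an illustration of the affine rule — a sketch with ℎ = ‖ · ‖2, so that 𝜕ℎ(𝑧) = {𝑧/‖𝑧‖2} for 𝑧 ≠ 0 (see page 2.8):

    import numpy as np

    def subgrad_affine_l2(A, b, x):
        """One subgradient of f(x) = ||Ax + b||_2 via the affine rule
        with h = ||.||_2."""
        z = A @ x + b
        nrm = np.linalg.norm(z)
        if nrm > 0:
            return A.T @ (z / nrm)     # A^T times the gradient of h at z
        return np.zeros(A.shape[1])    # A^T g for any ||g||_2 <= 1; take g = 0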

Subgradients 2.13
Pointwise maximum

𝑓 (𝑥) = max { 𝑓1 (𝑥), . . . , 𝑓𝑚 (𝑥)}

define 𝐼 (𝑥) = {𝑖 | 𝑓𝑖 (𝑥) = 𝑓 (𝑥)}, the ‘active’ functions at 𝑥

Weak result

to compute a subgradient at 𝑥 , choose any 𝑘 ∈ 𝐼 (𝑥) and take any subgradient of 𝑓 𝑘 at 𝑥

Strong result

𝜕 𝑓 (𝑥) = conv ⋃𝑖∈𝐼 (𝑥) 𝜕 𝑓𝑖 (𝑥)

• the convex hull of the union of subdifferentials of ‘active’ functions at 𝑥


• if 𝑓𝑖 ’s are differentiable, 𝜕 𝑓 (𝑥) = conv {∇ 𝑓𝑖 (𝑥) | 𝑖 ∈ 𝐼 (𝑥)}

Subgradients 2.14
Example: piecewise-linear function

𝑓 (𝑥) = max𝑖=1,...,𝑚 (𝑎𝑇𝑖 𝑥 + 𝑏𝑖 )

[figure: piecewise-linear 𝑓 (𝑥) formed as the maximum of affine functions 𝑎𝑇𝑖 𝑥 + 𝑏𝑖 ]

the subdifferential at 𝑥 is a polyhedron

𝜕 𝑓 (𝑥) = conv {𝑎𝑖 | 𝑖 ∈ 𝐼 (𝑥)}

with 𝐼 (𝑥) = {𝑖 | 𝑎𝑇𝑖 𝑥 + 𝑏𝑖 = 𝑓 (𝑥)}
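a Python sketch of the weak result for this example; deciding activity within a tolerance is an implementation choice, not part of the slides:

    import numpy as np

    def subgrad_piecewise_linear(A, b, x, tol=1e-9):
        """f(x) = max_i (a_i^T x + b_i): return f(x), one subgradient, and
        the active set I(x), decided within tolerance tol."""
        vals = A @ x + b
        fx = vals.max()
        active = np.flatnonzero(vals >= fx - tol)
        return fx, A[active[0]], active    # weak result: any active a_i works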


Subgradients 2.15
Example: ℓ1-norm

𝑓 (𝑥) = ‖𝑥‖1 = max𝑠∈{−1,1}𝑛 𝑠𝑇 𝑥

the subdifferential is a product of intervals



 [−1, 1]
 𝑥𝑘 = 0
𝜕 𝑓 (𝑥) = 𝐽1 × · · · × 𝐽𝑛 , 𝐽 𝑘 = {1} 𝑥𝑘 > 0

 {−1}
 𝑥𝑘 < 0

(1, 1)
(−1, 1) (1, 1) (1, 1)

(−1, −1) (1, −1)


(1, −1)

𝜕 𝑓 (0, 0) = [−1, 1] × [−1, 1] 𝜕 𝑓 (1, 0) = {1} × [−1, 1] 𝜕 𝑓 (1, 1) = {(1, 1)}
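a sketch of both results for the ℓ1-norm: the full subdifferential as a list of intervals, and one subgradient (sign(·) picks 0 from [−1, 1] at zero coordinates):

    import numpy as np

    def subdiff_l1(x):
        """Strong result: subdifferential of f = ||.||_1 as a list of
        intervals (lo_k, hi_k), one per coordinate."""
        return [(-1.0, 1.0) if xk == 0 else (np.sign(xk), np.sign(xk)) for xk in x]

    def subgrad_l1(x):
        """Weak result: one subgradient of f = ||.||_1."""
        return np.sign(x)

    print(subdiff_l1(np.array([1.0, 0.0])))    # [(1.0, 1.0), (-1.0, 1.0)]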

Subgradients 2.16
Pointwise supremum

𝑓 (𝑥) = sup𝛼∈A 𝑓𝛼 (𝑥), 𝑓𝛼 (𝑥) convex in 𝑥 for every 𝛼

Weak result: to find a subgradient at 𝑥ˆ ,

• find any 𝛽 for which 𝑓 (𝑥ˆ) = 𝑓𝛽 (𝑥ˆ) (assuming maximum is attained)

• choose any 𝑔 ∈ 𝜕 𝑓𝛽 (𝑥ˆ)

(Partial) strong result: define 𝐼 (𝑥) = {𝛼 ∈ A | 𝑓𝛼 (𝑥) = 𝑓 (𝑥)}


conv ⋃𝛼∈𝐼 (𝑥) 𝜕 𝑓𝛼 (𝑥) ⊆ 𝜕 𝑓 (𝑥)

equality requires extra conditions (for example, A compact, 𝑓𝛼 continuous in 𝛼)

Subgradients 2.17
Exercise: maximum eigenvalue

Problem: explain how to find a subgradient of

𝑓 (𝑥) = 𝜆max ( 𝐴(𝑥)) = sup‖𝑦‖2=1 𝑦𝑇 𝐴(𝑥)𝑦

where 𝐴(𝑥) = 𝐴0 + 𝑥 1 𝐴1 + · · · + 𝑥 𝑛 𝐴𝑛 with symmetric coefficients 𝐴𝑖

Solution: to find a subgradient at 𝑥ˆ ,

• choose any unit eigenvector 𝑦 with eigenvalue 𝜆max ( 𝐴(𝑥ˆ))

• the gradient of 𝑦𝑇 𝐴(𝑥)𝑦 at 𝑥ˆ is a subgradient of 𝑓 :

(𝑦𝑇 𝐴1 𝑦, . . . , 𝑦𝑇 𝐴𝑛 𝑦) ∈ 𝜕 𝑓 (𝑥ˆ)
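a sketch of this solution with numpy.linalg.eigh, which returns eigenvalues in ascending order; A_list collects 𝐴0, 𝐴1, . . . , 𝐴𝑛 and is an illustrative calling convention:

    import numpy as np

    def subgrad_lambda_max(A_list, x):
        """One subgradient of f(x) = lambda_max(A0 + x1*A1 + ... + xn*An),
        all Ai symmetric."""
        A = A_list[0] + sum(xi * Ai for xi, Ai in zip(x, A_list[1:]))
        w, V = np.linalg.eigh(A)    # ascending eigenvalues, orthonormal V
        y = V[:, -1]                # unit eigenvector for lambda_max
        return np.array([y @ Ai @ y for Ai in A_list[1:]])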

Subgradients 2.18
Minimization

𝑓 (𝑥) = inf𝑦 ℎ(𝑥, 𝑦), ℎ jointly convex in (𝑥, 𝑦)

Weak result: to find a subgradient at 𝑥ˆ ,

• find 𝑦ˆ that minimizes ℎ(𝑥ˆ, 𝑦) (assuming minimum is attained)

• find a subgradient (𝑔, 0) ∈ 𝜕ℎ(𝑥ˆ, 𝑦ˆ)

Proof: for all 𝑥 , 𝑦 ,


ℎ(𝑥, 𝑦) ≥ ℎ(𝑥ˆ, 𝑦ˆ) + 𝑔𝑇 (𝑥 − 𝑥ˆ) + 0𝑇 (𝑦 − 𝑦ˆ)
        = 𝑓 (𝑥ˆ) + 𝑔𝑇 (𝑥 − 𝑥ˆ)

therefore

𝑓 (𝑥) = inf𝑦 ℎ(𝑥, 𝑦) ≥ 𝑓 (𝑥ˆ) + 𝑔𝑇 (𝑥 − 𝑥ˆ)

Subgradients 2.19
Exercise: Euclidean distance to convex set

Problem: explain how to find a subgradient of

𝑓 (𝑥) = inf𝑦∈𝐶 ‖𝑥 − 𝑦‖2

where 𝐶 is a closed convex set

Solution: to find a subgradient at 𝑥ˆ ,

• if 𝑓 (𝑥ˆ) = 0 (that is, 𝑥ˆ ∈ 𝐶), take 𝑔 = 0

• if 𝑓 (𝑥ˆ) > 0, find the projection 𝑦ˆ = 𝑃(𝑥ˆ) of 𝑥ˆ on 𝐶 and take

𝑔 = (𝑥ˆ − 𝑦ˆ)/‖𝑥ˆ − 𝑦ˆ‖2 = (𝑥ˆ − 𝑃(𝑥ˆ))/‖𝑥ˆ − 𝑃(𝑥ˆ)‖2
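a sketch for one concrete choice of 𝐶 — a box {𝑦 | 𝑙 ⪯ 𝑦 ⪯ 𝑢}, whose projection is a componentwise clip; the box is an assumption for illustration, and any 𝐶 with a computable projection works the same way:

    import numpy as np

    def subgrad_dist_to_box(l, u, x):
        """One subgradient of f(x) = dist(x, C) for C = {y : l <= y <= u}."""
        p = np.clip(x, l, u)            # projection P(x) onto the box
        d = np.linalg.norm(x - p)
        if d == 0:
            return np.zeros_like(x)     # x in C: take g = 0
        return (x - p) / d              # g = (x - P(x)) / ||x - P(x)||_2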

Subgradients 2.20
Composition

𝑓 (𝑥) = ℎ( 𝑓1 (𝑥), . . . , 𝑓 𝑘 (𝑥)), ℎ convex and nondecreasing, 𝑓𝑖 convex

Weak result: to find a subgradient at 𝑥ˆ ,

• find 𝑧 ∈ 𝜕ℎ( 𝑓1 (𝑥ˆ), . . . , 𝑓 𝑘 (𝑥ˆ)) and 𝑔𝑖 ∈ 𝜕 𝑓𝑖 (𝑥ˆ)

• then 𝑔 = 𝑧1 𝑔1 + · · · + 𝑧 𝑘 𝑔 𝑘 ∈ 𝜕 𝑓 (𝑥ˆ)

reduces to standard formula for differentiable ℎ, 𝑓𝑖

Proof:

𝑓 (𝑥) ≥ ℎ( 𝑓1 (𝑥ˆ) + 𝑔𝑇1 (𝑥 − 𝑥ˆ), . . . , 𝑓 𝑘 (𝑥ˆ) + 𝑔𝑇𝑘 (𝑥 − 𝑥ˆ))
      ≥ ℎ( 𝑓1 (𝑥ˆ), . . . , 𝑓 𝑘 (𝑥ˆ)) + 𝑧𝑇 (𝑔𝑇1 (𝑥 − 𝑥ˆ), . . . , 𝑔𝑇𝑘 (𝑥 − 𝑥ˆ))
      = ℎ( 𝑓1 (𝑥ˆ), . . . , 𝑓 𝑘 (𝑥ˆ)) + (𝑧1 𝑔1 + · · · + 𝑧 𝑘 𝑔 𝑘 )𝑇 (𝑥 − 𝑥ˆ)
      = 𝑓 (𝑥ˆ) + 𝑔𝑇 (𝑥 − 𝑥ˆ)
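an instance of the weak rule, sketched with ℎ = log-sum-exp, which is convex and nondecreasing in each argument; here ℎ is differentiable, so 𝑧 is just its gradient, the softmax weights:

    import numpy as np

    def subgrad_composition_lse(fs, subgrads, x):
        """Weak rule for f(x) = log(sum_i exp(f_i(x))):
        g = z_1 g_1 + ... + z_k g_k with z = softmax of the f_i(x)."""
        vals = np.array([f(x) for f in fs])
        z = np.exp(vals - vals.max())
        z /= z.sum()                              # z = grad h(vals), z >= 0
        gs = [gf(x) for gf in subgrads]
        return sum(zi * gi for zi, gi in zip(z, gs))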

Subgradients 2.21
Optimal value function

define 𝑓 (𝑢, 𝑣) as the optimal value of convex problem

minimize 𝑓0 (𝑥)
subject to 𝑓𝑖 (𝑥) ≤ 𝑢𝑖 , 𝑖 = 1, . . . , 𝑚
𝐴𝑥 = 𝑏 + 𝑣

(functions 𝑓𝑖 are convex; optimization variable is 𝑥 )

Weak result: suppose 𝑓 (𝑢ˆ, 𝑣ˆ) is finite and strong duality holds with the dual

maximize   inf𝑥 ( 𝑓0 (𝑥) + ∑𝑖 𝜆𝑖 ( 𝑓𝑖 (𝑥) − 𝑢ˆ𝑖 ) + 𝜈𝑇 ( 𝐴𝑥 − 𝑏 − 𝑣ˆ))
subject to 𝜆 ⪰ 0

if 𝜆ˆ, 𝜈ˆ are dual optimal (for right-hand sides 𝑢ˆ, 𝑣ˆ) then (−𝜆ˆ, −𝜈ˆ) ∈ 𝜕 𝑓 (𝑢ˆ, 𝑣ˆ)

Subgradients 2.22
Proof: by weak duality for the problem with right-hand sides 𝑢 , 𝑣

𝑓 (𝑢, 𝑣) ≥ inf𝑥 ( 𝑓0 (𝑥) + ∑𝑖 𝜆ˆ𝑖 ( 𝑓𝑖 (𝑥) − 𝑢𝑖 ) + 𝜈ˆ𝑇 ( 𝐴𝑥 − 𝑏 − 𝑣))
         = inf𝑥 ( 𝑓0 (𝑥) + ∑𝑖 𝜆ˆ𝑖 ( 𝑓𝑖 (𝑥) − 𝑢ˆ𝑖 ) + 𝜈ˆ𝑇 ( 𝐴𝑥 − 𝑏 − 𝑣ˆ)) − 𝜆ˆ𝑇 (𝑢 − 𝑢ˆ) − 𝜈ˆ𝑇 (𝑣 − 𝑣ˆ)
         = 𝑓 (𝑢ˆ, 𝑣ˆ) − 𝜆ˆ𝑇 (𝑢 − 𝑢ˆ) − 𝜈ˆ𝑇 (𝑣 − 𝑣ˆ)

Subgradients 2.23
Expectation

𝑓 (𝑥) = E ℎ(𝑥, 𝑢), 𝑢 random, ℎ convex in 𝑥 for every 𝑢

Weak result: to find a subgradient at 𝑥ˆ ,

• choose a function 𝑢 ↦→ 𝑔(𝑢) with 𝑔(𝑢) ∈ 𝜕𝑥 ℎ(𝑥ˆ, 𝑢)

• then 𝑔 = E𝑢 𝑔(𝑢) ∈ 𝜕 𝑓 (𝑥ˆ)

Proof: by convexity of ℎ and definition of 𝑔(𝑢) ,

𝑓 (𝑥) = E ℎ(𝑥, 𝑢)
      ≥ E ( ℎ(𝑥ˆ, 𝑢) + 𝑔(𝑢)𝑇 (𝑥 − 𝑥ˆ))
      = 𝑓 (𝑥ˆ) + 𝑔𝑇 (𝑥 − 𝑥ˆ)
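a Monte Carlo sketch for the scalar example ℎ(𝑥, 𝑢) = |𝑥 − 𝑢| with 𝑔(𝑢) = sign(𝑥 − 𝑢); with finitely many samples this only estimates E𝑢 𝑔(𝑢):

    import numpy as np

    def subgrad_expected_abs(x, samples):
        """Estimate E_u g(u) for h(x, u) = |x - u|, g(u) = sign(x - u)."""
        return np.mean(np.sign(x - samples))

    rng = np.random.default_rng(0)
    u = rng.standard_normal(100_000)
    print(subgrad_expected_abs(0.5, u))    # approx 2*Phi(0.5) - 1 = 0.38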

Subgradients 2.24
Outline

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative

Optimality conditions — unconstrained

𝑥★ minimizes 𝑓 (𝑥) if and only if


0 ∈ 𝜕 𝑓 (𝑥★)

[figure: graph of 𝑓 with a horizontal supporting line at the minimizer 𝑥★]

this follows directly from the definition of subgradient:

𝑓 (𝑦) ≥ 𝑓 (𝑥★) + 0𝑇 (𝑦 − 𝑥★) for all 𝑦 ⇐⇒ 0 ∈ 𝜕 𝑓 (𝑥★)

Subgradients 2.25
Example: piecewise-linear minimization

𝑓 (𝑥) = max𝑖=1,...,𝑚 (𝑎𝑇𝑖 𝑥 + 𝑏𝑖 )

Optimality condition

0 ∈ conv {𝑎𝑖 | 𝑖 ∈ 𝐼 (𝑥★)} where 𝐼 (𝑥) = {𝑖 | 𝑎𝑇𝑖 𝑥 + 𝑏𝑖 = 𝑓 (𝑥)}

• in other words, 𝑥★ is optimal if and only if there is a 𝜆 with

𝜆 ⪰ 0, 1𝑇 𝜆 = 1, ∑𝑖 𝜆𝑖 𝑎𝑖 = 0, 𝜆𝑖 = 0 for 𝑖 ∉ 𝐼 (𝑥★)

• these are the optimality conditions for the equivalent linear program and its dual

(primal) minimize 𝑡 subject to 𝐴𝑥 + 𝑏 ⪯ 𝑡1
(dual)   maximize 𝑏𝑇 𝜆 subject to 𝐴𝑇 𝜆 = 0, 𝜆 ⪰ 0, 1𝑇 𝜆 = 1
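a sketch that solves the primal LP with scipy.optimize.linprog, stacking the variables as (𝑥, 𝑡); it assumes the maximum is bounded below (check res.status in practice):

    import numpy as np
    from scipy.optimize import linprog

    def minimize_piecewise_linear(A, b):
        """Solve minimize_x max_i (a_i^T x + b_i) via the LP above:
        minimize t subject to Ax + b <= t*1, variables (x, t)."""
        m, n = A.shape
        c = np.r_[np.zeros(n), 1.0]                # objective: t
        A_ub = np.c_[A, -np.ones(m)]               # Ax - t*1 <= -b
        res = linprog(c, A_ub=A_ub, b_ub=-b,
                      bounds=[(None, None)] * (n + 1))  # free variables
        return res.x[:n], res.x[-1]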

Subgradients 2.26
Optimality conditions — constrained

minimize 𝑓0 (𝑥)
subject to 𝑓𝑖 (𝑥) ≤ 0, 𝑖 = 1, . . . , 𝑚

assume dom 𝑓𝑖 = R𝑛 , so functions 𝑓𝑖 are subdifferentiable everywhere

Karush–Kuhn–Tucker conditions

if strong duality holds, then 𝑥★, 𝜆★ are primal, dual optimal if and only if
1. 𝑥★ is primal feasible

2. 𝜆★ ⪰ 0

3. 𝜆𝑖★ 𝑓𝑖 (𝑥★) = 0 for 𝑖 = 1, . . . , 𝑚

4. 𝑥★ is a minimizer of 𝐿 (𝑥, 𝜆★) = 𝑓0 (𝑥) + ∑𝑖 𝜆𝑖★ 𝑓𝑖 (𝑥) :

0 ∈ 𝜕 𝑓0 (𝑥★) + ∑𝑖 𝜆𝑖★ 𝜕 𝑓𝑖 (𝑥★)

Subgradients 2.27
Outline

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative

Directional derivative

Definition (for general 𝑓 ): the directional derivative of 𝑓 at 𝑥 in the direction 𝑦 is

𝑓 ′(𝑥; 𝑦) = lim𝛼↓0 ( 𝑓 (𝑥 + 𝛼𝑦) − 𝑓 (𝑥))/𝛼
          = lim𝑡→∞ (𝑡 𝑓 (𝑥 + 𝑦/𝑡) − 𝑡 𝑓 (𝑥))

(if the limit exists)

• 𝑓 ′(𝑥; 𝑦) is the right derivative of 𝑔(𝛼) = 𝑓 (𝑥 + 𝛼𝑦) at 𝛼 = 0

• 𝑓 ′(𝑥; 𝑦) is homogeneous in 𝑦 :

𝑓 ′(𝑥; 𝜆𝑦) = 𝜆 𝑓 ′(𝑥; 𝑦) for 𝜆 ≥ 0
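a numeric sketch of the definition: difference quotients for 𝑓 = ‖ · ‖1 at shrinking 𝛼 (for convex 𝑓 the quotients are nondecreasing in 𝛼, as the next page shows, so shrinking 𝛼 tightens an upper estimate of 𝑓 ′(𝑥; 𝑦)):

    import numpy as np

    f = lambda v: np.abs(v).sum()              # f(x) = ||x||_1
    x = np.array([1.0, 0.0])
    y = np.array([-1.0, -2.0])
    print([(f(x + a * y) - f(x)) / a for a in (2.0, 1.0, 0.1, 0.01)])
    # [2.0, 1.0, 1.0, 1.0]: quotients decrease to f'(x; y) = 1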

Subgradients 2.28
Directional derivative of a convex function

Equivalent definition (for convex 𝑓 ): replace lim with inf

𝑓 ′(𝑥; 𝑦) = inf𝛼>0 ( 𝑓 (𝑥 + 𝛼𝑦) − 𝑓 (𝑥))/𝛼
          = inf𝑡>0 (𝑡 𝑓 (𝑥 + 𝑦/𝑡) − 𝑡 𝑓 (𝑥))

Proof

• the function ℎ(𝑦) = 𝑓 (𝑥 + 𝑦) − 𝑓 (𝑥) is convex in 𝑦 , with ℎ(0) = 0

• its perspective 𝑡ℎ(𝑦/𝑡) is nonincreasing in 𝑡 (ECE236B ex. A3.5); hence

𝑓 ′(𝑥; 𝑦) = lim𝑡→∞ 𝑡ℎ(𝑦/𝑡) = inf𝑡>0 𝑡ℎ(𝑦/𝑡)

Subgradients 2.29
Properties

consequences of the expressions (for convex 𝑓 )

𝑓 ′(𝑥; 𝑦) = inf𝛼>0 ( 𝑓 (𝑥 + 𝛼𝑦) − 𝑓 (𝑥))/𝛼
          = inf𝑡>0 (𝑡 𝑓 (𝑥 + 𝑦/𝑡) − 𝑡 𝑓 (𝑥))

• 𝑓 ′(𝑥; 𝑦) is convex in 𝑦 (partial minimization of a convex function in 𝑦 , 𝑡 )

• 𝑓 ′(𝑥; 𝑦) defines a lower bound on 𝑓 in the direction 𝑦 :

𝑓 (𝑥 + 𝛼𝑦) ≥ 𝑓 (𝑥) + 𝛼 𝑓 ′(𝑥; 𝑦) for all 𝛼 ≥ 0

Subgradients 2.30
Directional derivative and subgradients

for convex 𝑓 and 𝑥 ∈ int dom 𝑓

𝑓 ′(𝑥; 𝑦) = sup𝑔∈𝜕 𝑓 (𝑥) 𝑔𝑇 𝑦

𝑓 ′(𝑥; 𝑦) is the support function of 𝜕 𝑓 (𝑥)

[figure: the set 𝜕 𝑓 (𝑥), a direction 𝑦, and a maximizing subgradient 𝑔ˆ with 𝑓 ′(𝑥; 𝑦) = 𝑔ˆ𝑇 𝑦]

• generalizes 𝑓 ′(𝑥; 𝑦) = ∇ 𝑓 (𝑥)𝑇 𝑦 for differentiable functions

• implies that 𝑓 ′(𝑥; 𝑦) exists for all 𝑥 ∈ int dom 𝑓 , all 𝑦 (see page 2.4)

Subgradients 2.31
Proof: if 𝑔 ∈ 𝜕 𝑓 (𝑥) then from page 2.29

𝑓 ′(𝑥; 𝑦) ≥ inf𝛼>0 ( 𝑓 (𝑥) + 𝛼𝑔𝑇 𝑦 − 𝑓 (𝑥))/𝛼 = 𝑔𝑇 𝑦

it remains to show that 𝑓 ′(𝑥; 𝑦) = 𝑔ˆ𝑇 𝑦 for at least one 𝑔ˆ ∈ 𝜕 𝑓 (𝑥)

• 𝑓 ′(𝑥; 𝑦) is convex in 𝑦 with domain R𝑛 , hence subdifferentiable at all 𝑦

• let 𝑔ˆ be a subgradient of 𝑓 ′(𝑥; 𝑦) at 𝑦 : then for all 𝑣 , 𝜆 ≥ 0,

𝜆 𝑓 ′(𝑥; 𝑣) = 𝑓 ′(𝑥; 𝜆𝑣) ≥ 𝑓 ′(𝑥; 𝑦) + 𝑔ˆ𝑇 (𝜆𝑣 − 𝑦)

• taking 𝜆 → ∞ shows that 𝑓 ′(𝑥; 𝑣) ≥ 𝑔ˆ𝑇 𝑣; from the lower bound on page 2.30,

𝑓 (𝑥 + 𝑣) ≥ 𝑓 (𝑥) + 𝑓 ′(𝑥; 𝑣) ≥ 𝑓 (𝑥) + 𝑔ˆ𝑇 𝑣 for all 𝑣

hence 𝑔ˆ ∈ 𝜕 𝑓 (𝑥)

• taking 𝜆 = 0 we see that 𝑓 ′(𝑥; 𝑦) ≤ 𝑔ˆ𝑇 𝑦

Subgradients 2.32
Descent directions and subgradients

𝑦 is a descent direction of 𝑓 at 𝑥 if 𝑓 ′(𝑥; 𝑦) < 0

• the negative gradient of a differentiable 𝑓 is a descent direction (if ∇ 𝑓 (𝑥) ≠ 0)


• negative subgradient is not always a descent direction

Example: 𝑓 (𝑥 1, 𝑥 2) = |𝑥 1 | + 2|𝑥 2 |
[figure: contour lines of 𝑓 in the (𝑥1, 𝑥2)-plane, showing the point (1, 0) and the subgradient 𝑔 = (1, 2)]

𝑔 = (1, 2) ∈ 𝜕 𝑓 (1, 0) , but 𝑦 = (−1, −2) is not a descent direction at (1, 0)
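a quick numeric check of this example (a sketch): moving from (1, 0) along 𝑦 = −𝑔 increases 𝑓 , consistent with 𝑓 ′(𝑥; 𝑦) = −1 + 4 = 3 > 0:

    import numpy as np

    f = lambda v: abs(v[0]) + 2 * abs(v[1])

    x = np.array([1.0, 0.0])
    y = np.array([-1.0, -2.0])             # minus the subgradient g = (1, 2)
    for a in (0.5, 0.1, 0.01):
        print((f(x + a * y) - f(x)) / a)   # 3.0 each time: f increases along y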

Subgradients 2.33
Steepest descent direction

Definition: (normalized) steepest descent direction at 𝑥 ∈ int dom 𝑓 is

Δ𝑥nsd = argmin‖𝑦‖2≤1 𝑓 ′(𝑥; 𝑦)

Δ𝑥nsd is the primal solution 𝑦 of the pair of dual problems (BV §8.1.3)

(primal) minimize (over 𝑦 ) 𝑓 ′(𝑥; 𝑦) subject to ‖𝑦‖2 ≤ 1
(dual)   maximize (over 𝑔 ) −‖𝑔‖2 subject to 𝑔 ∈ 𝜕 𝑓 (𝑥)

• dual optimal 𝑔★ is subgradient with least norm


• 𝑓 ′(𝑥; Δ𝑥nsd) = −‖𝑔★‖2

• if 0 ∉ 𝜕 𝑓 (𝑥) , Δ𝑥nsd = −𝑔★/‖𝑔★‖2

• Δ𝑥nsd can be expensive to compute

[figure: 𝜕 𝑓 (𝑥) with its least-norm element 𝑔★; the hyperplane 𝑔𝑇 Δ𝑥nsd = 𝑓 ′(𝑥; Δ𝑥nsd) supports 𝜕 𝑓 (𝑥) at 𝑔★]
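in general computing 𝑔★ means solving a quadratic program over 𝜕 𝑓 (𝑥); for 𝑓 = ‖ · ‖1 the product-of-intervals form of page 2.16 reduces it to a componentwise projection of 0, sketched below:

    import numpy as np

    def least_norm_subgrad_l1(x):
        """Least-norm element g* of the subdifferential of f = ||.||_1:
        project 0 onto each interval J_k, giving 0 where x_k = 0 and
        sign(x_k) elsewhere."""
        return np.sign(x)

    def steepest_descent_dir_l1(x):
        """Delta x_nsd = -g*/||g*||_2, assuming 0 is not in the
        subdifferential (i.e., x != 0)."""
        g = least_norm_subgrad_l1(x)
        return -g / np.linalg.norm(g)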

Subgradients 2.34
Subgradients and distance to sublevel sets

if 𝑓 is convex, 𝑓 (𝑦) < 𝑓 (𝑥) , 𝑔 ∈ 𝜕 𝑓 (𝑥) , then for small 𝑡 > 0,

‖𝑥 − 𝑡𝑔 − 𝑦‖2² = ‖𝑥 − 𝑦‖2² − 2𝑡𝑔𝑇 (𝑥 − 𝑦) + 𝑡²‖𝑔‖2²
              ≤ ‖𝑥 − 𝑦‖2² − 2𝑡 ( 𝑓 (𝑥) − 𝑓 (𝑦)) + 𝑡²‖𝑔‖2²
              < ‖𝑥 − 𝑦‖2²

• −𝑔 is descent direction for ‖𝑥 − 𝑦‖2, for any 𝑦 with 𝑓 (𝑦) < 𝑓 (𝑥)

• in particular, −𝑔 is descent direction for distance to any minimizer of 𝑓
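this observation motivates the subgradient method; a minimal sketch, with a standard diminishing step-size rule (square-summable but not summable) chosen for illustration:

    import numpy as np

    def subgradient_method(oracle, x0, steps):
        """x_{k+1} = x_k - t_k g_k with g_k in subdiff f(x_k); not a descent
        method (see page 2.33), so track the best value seen."""
        x = np.asarray(x0, dtype=float)
        fbest = np.inf
        for t in steps:
            fx, g = oracle(x)
            fbest = min(fbest, fx)
            x = x - t * g
        return x, fbest

    # example: minimize f(x1, x2) = |x1| + 2|x2| from page 2.33
    oracle = lambda v: (abs(v[0]) + 2 * abs(v[1]),
                        np.array([np.sign(v[0]), 2 * np.sign(v[1])]))
    steps = [1.0 / (k + 1) for k in range(100)]
    x, fbest = subgradient_method(oracle, np.array([1.0, 1.0]), steps)
    print(fbest)    # approaches the optimal value 0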

Subgradients 2.35
References

• A. Beck, First-Order Methods in Optimization (2017), chapter 3.

• D. P. Bertsekas, A. Nedić, A. E. Ozdaglar, Convex Analysis and Optimization (2003), chapter 4.

• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms (1993), chapter VI.

• Yu. Nesterov, Lectures on Convex Optimization (2018), section 3.1.

• B. T. Polyak, Introduction to Optimization (1987), section 5.1.

Subgradients 2.36
