L. Vandenberghe ECE236C (Spring 2022)
2. Subgradients
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
2.1
Basic inequality
recall the basic inequality for differentiable convex functions:
𝑓 (𝑦) ≥ 𝑓 (𝑥) + ∇ 𝑓 (𝑥)𝑇 (𝑦 − 𝑥) for all 𝑦 ∈ dom 𝑓
[figure: graph of 𝑓 with its first-order approximation at 𝑥; the vector (∇𝑓(𝑥), −1) is normal to the supporting hyperplane at (𝑥, 𝑓(𝑥))]
• the first-order approximation of 𝑓 at 𝑥 is a global lower bound
• ∇𝑓(𝑥) defines a non-vertical supporting hyperplane to the epigraph of 𝑓 at (𝑥, 𝑓(𝑥)):

  (∇𝑓(𝑥), −1)𝑇 ((𝑦, 𝑡) − (𝑥, 𝑓(𝑥))) ≤ 0 for all (𝑦, 𝑡) ∈ epi 𝑓
Subgradients 2.2
Subgradient
𝑔 is a subgradient of a convex function 𝑓 at 𝑥 ∈ dom 𝑓 if
𝑓 (𝑦) ≥ 𝑓 (𝑥) + 𝑔𝑇 (𝑦 − 𝑥) for all 𝑦 ∈ dom 𝑓
[figure: graph of 𝑓(𝑦) with the affine lower bounds 𝑓(𝑥1) + 𝑔1𝑇(𝑦 − 𝑥1), 𝑓(𝑥1) + 𝑔2𝑇(𝑦 − 𝑥1), and 𝑓(𝑥2) + 𝑔3𝑇(𝑦 − 𝑥2)]
𝑔1, 𝑔2 are subgradients at 𝑥1; 𝑔3 is a subgradient at 𝑥2
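A quick numerical sanity check of the definition (a minimal sketch in Python; the choice 𝑓(𝑥) = |𝑥|, 𝑥 = 0, 𝑔 = 0.3 is ours):

```python
import numpy as np

# check the subgradient inequality f(y) >= f(x) + g*(y - x) on a grid,
# for the concrete instance f(x) = |x|, x = 0, g = 0.3 (any g in [-1, 1] works)
f = np.abs
x, g = 0.0, 0.3
y = np.linspace(-2.0, 2.0, 401)
assert np.all(f(y) >= f(x) + g * (y - x))
```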
Subgradients 2.3
Subdifferential
the subdifferential 𝜕 𝑓 (𝑥) of 𝑓 at 𝑥 is the set of all subgradients:
𝜕 𝑓 (𝑥) = {𝑔 | 𝑔𝑇 (𝑦 − 𝑥) ≤ 𝑓 (𝑦) − 𝑓 (𝑥), ∀𝑦 ∈ dom 𝑓 }
Properties
• 𝜕 𝑓 (𝑥) is a closed convex set (possibly empty)
this follows from the definition: 𝜕 𝑓 (𝑥) is an intersection of halfspaces
• if 𝑥 ∈ int dom 𝑓 then 𝜕 𝑓 (𝑥) is nonempty and bounded
proof on next two pages
Subgradients 2.4
Proof: we show that 𝜕 𝑓 (𝑥) is nonempty when 𝑥 ∈ int dom 𝑓
• (𝑥, 𝑓 (𝑥)) is in the boundary of the convex set epi 𝑓
• therefore there exists a supporting hyperplane to epi 𝑓 at (𝑥, 𝑓 (𝑥)) :
  ∃(𝑎, 𝑏) ≠ 0:   (𝑎, 𝑏)𝑇 ((𝑦, 𝑡) − (𝑥, 𝑓(𝑥))) ≤ 0 for all (𝑦, 𝑡) ∈ epi 𝑓
• 𝑏 > 0 gives a contradiction as 𝑡 → ∞
• 𝑏 = 0 gives a contradiction for 𝑦 = 𝑥 + 𝜖 𝑎 with small 𝜖 > 0
• therefore 𝑏 < 0, and 𝑔 = (1/|𝑏|)𝑎 is a subgradient of 𝑓 at 𝑥
Subgradients 2.5
Proof: 𝜕 𝑓 (𝑥) is bounded when 𝑥 ∈ int dom 𝑓
• for small 𝑟 > 0, define a set of 2𝑛 points
𝐵 = {𝑥 ± 𝑟𝑒 𝑘 | 𝑘 = 1, . . . , 𝑛} ⊂ dom 𝑓
and define 𝑀 = max_{𝑦∈𝐵} 𝑓(𝑦) < ∞
• for every 𝑔 ∈ 𝜕 𝑓 (𝑥) , there is a point 𝑦 ∈ 𝐵 with
  𝑟 ‖𝑔‖∞ = 𝑔𝑇 (𝑦 − 𝑥)
(choose an index 𝑘 with |𝑔𝑘| = ‖𝑔‖∞, and take 𝑦 = 𝑥 + 𝑟 sign(𝑔𝑘)𝑒𝑘)
• since 𝑔 is a subgradient, this implies that
  𝑓(𝑥) + 𝑟 ‖𝑔‖∞ = 𝑓(𝑥) + 𝑔𝑇 (𝑦 − 𝑥) ≤ 𝑓(𝑦) ≤ 𝑀
• we conclude that 𝜕 𝑓 (𝑥) is bounded:
  ‖𝑔‖∞ ≤ (𝑀 − 𝑓(𝑥))/𝑟 for all 𝑔 ∈ 𝜕𝑓(𝑥)
Subgradients 2.6
Example
𝑓 (𝑥) = max { 𝑓1 (𝑥), 𝑓2 (𝑥)} with 𝑓1, 𝑓2 convex and differentiable
[figure: graphs of 𝑓1(𝑦), 𝑓2(𝑦), and their pointwise maximum 𝑓(𝑦)]
• if 𝑓1(𝑥̂) = 𝑓2(𝑥̂), the subdifferential at 𝑥̂ is the line segment [∇𝑓1(𝑥̂), ∇𝑓2(𝑥̂)]
• if 𝑓1(𝑥̂) > 𝑓2(𝑥̂), the subdifferential at 𝑥̂ is {∇𝑓1(𝑥̂)}
• if 𝑓1(𝑥̂) < 𝑓2(𝑥̂), the subdifferential at 𝑥̂ is {∇𝑓2(𝑥̂)}
Subgradients 2.7
Examples
Absolute value: 𝑓(𝑥) = |𝑥|
[figure: 𝑓(𝑥) = |𝑥| and its subdifferential 𝜕𝑓(𝑥) as a function of 𝑥]
Euclidean norm: 𝑓(𝑥) = ‖𝑥‖2
  𝜕𝑓(𝑥) = {𝑥/‖𝑥‖2} if 𝑥 ≠ 0,   𝜕𝑓(𝑥) = {𝑔 | ‖𝑔‖2 ≤ 1} if 𝑥 = 0
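In code, the weak rule for the Euclidean norm is one line per case (a sketch; the name subgrad_l2norm is ours):

```python
import numpy as np

# one subgradient of f(x) = ||x||_2: x / ||x||_2 if x != 0;
# at x = 0 any g with ||g||_2 <= 1 is valid, and we return g = 0
def subgrad_l2norm(x):
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else np.zeros_like(x)
```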
Subgradients 2.8
Monotonicity
the subdifferential of a convex function is a monotone operator:
(𝑢 − 𝑣)𝑇 (𝑥 − 𝑦) ≥ 0 for all 𝑥 , 𝑦 , 𝑢 ∈ 𝜕 𝑓 (𝑥) , 𝑣 ∈ 𝜕 𝑓 (𝑦)
Proof: by definition
𝑓 (𝑦) ≥ 𝑓 (𝑥) + 𝑢𝑇 (𝑦 − 𝑥), 𝑓 (𝑥) ≥ 𝑓 (𝑦) + 𝑣𝑇 (𝑥 − 𝑦)
combining the two inequalities shows monotonicity
Subgradients 2.9
Examples of non-subdifferentiable functions
the following functions are not subdifferentiable at 𝑥 = 0
• 𝑓 : R → R, dom 𝑓 = R+
𝑓 (𝑥) = 1 if 𝑥 = 0, 𝑓 (𝑥) = 0 if 𝑥 > 0
• 𝑓 : R → R, dom 𝑓 = R+
𝑓(𝑥) = −√𝑥
the only supporting hyperplane to epi 𝑓 at (0, 𝑓 (0)) is vertical
Subgradients 2.10
Subgradients and sublevel sets
if 𝑔 is a subgradient of 𝑓 at 𝑥 , then
𝑓 (𝑦) ≤ 𝑓 (𝑥) =⇒ 𝑔𝑇 (𝑦 − 𝑥) ≤ 0
[figure: the sublevel set {𝑦 | 𝑓(𝑦) ≤ 𝑓(𝑥)} and a supporting hyperplane at 𝑥]
the nonzero subgradients at 𝑥 define supporting hyperplanes to the sublevel set
{𝑦 | 𝑓 (𝑦) ≤ 𝑓 (𝑥)}
Subgradients 2.11
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Subgradient calculus
Weak subgradient calculus: rules for finding one subgradient
• sufficient for most nondifferentiable convex optimization algorithms
• if you can evaluate 𝑓 (𝑥) , you can usually compute a subgradient
Strong subgradient calculus: rules for finding 𝜕 𝑓 (𝑥) (all subgradients)
• some algorithms, optimality conditions, etc., need entire subdifferential
• can be quite complicated
we will assume that 𝑥 ∈ int dom 𝑓
Subgradients 2.12
Basic rules
Differentiable functions: 𝜕 𝑓 (𝑥) = {∇ 𝑓 (𝑥)} if 𝑓 is differentiable at 𝑥
Nonnegative linear combination
if 𝑓 (𝑥) = 𝛼1 𝑓1 (𝑥) + 𝛼2 𝑓2 (𝑥) with 𝛼1, 𝛼2 ≥ 0, then
𝜕 𝑓 (𝑥) = 𝛼1 𝜕 𝑓1 (𝑥) + 𝛼2 𝜕 𝑓2 (𝑥)
(right-hand side is addition of sets)
Affine transformation of variables: if 𝑓 (𝑥) = ℎ( 𝐴𝑥 + 𝑏) , then
𝜕 𝑓 (𝑥) = 𝐴𝑇 𝜕ℎ( 𝐴𝑥 + 𝑏)
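A minimal sketch of the affine rule, with the concrete choice ℎ(𝑧) = ‖𝑧‖2 (the function name is ours):

```python
import numpy as np

# if f(x) = h(Ax + b) with h(z) = ||z||_2, then A^T g_h is a subgradient of f
# at x whenever g_h is a subgradient of h at z = Ax + b
def subgrad_affine_l2(A, b, x):
    z = A @ x + b
    nrm = np.linalg.norm(z)
    g_h = z / nrm if nrm > 0 else np.zeros_like(z)   # a subgradient of h at z
    return A.T @ g_h                                 # a subgradient of f at x
```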
Subgradients 2.13
Pointwise maximum
𝑓 (𝑥) = max { 𝑓1 (𝑥), . . . , 𝑓𝑚 (𝑥)}
define 𝐼 (𝑥) = {𝑖 | 𝑓𝑖 (𝑥) = 𝑓 (𝑥)}, the ‘active’ functions at 𝑥
Weak result
to compute a subgradient at 𝑥 , choose any 𝑘 ∈ 𝐼 (𝑥) , any subgradient of 𝑓 𝑘 at 𝑥
Strong result
  𝜕𝑓(𝑥) = conv ⋃_{𝑖∈𝐼(𝑥)} 𝜕𝑓𝑖(𝑥)
• the convex hull of the union of subdifferentials of ‘active’ functions at 𝑥
• if 𝑓𝑖 ’s are differentiable, 𝜕 𝑓 (𝑥) = conv {∇ 𝑓𝑖 (𝑥) | 𝑖 ∈ 𝐼 (𝑥)}
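The weak result translates directly into code (a sketch; fs and subgrads are hypothetical lists of callables returning 𝑓𝑖(𝑥) and a subgradient of 𝑓𝑖 at 𝑥):

```python
import numpy as np

# weak rule for f(x) = max{f_1(x), ..., f_m(x)}: pick any index attaining the
# maximum and return a subgradient of that active function at x
def subgrad_pointwise_max(fs, subgrads, x):
    vals = np.array([fi(x) for fi in fs])
    k = int(np.argmax(vals))          # some k in I(x)
    return subgrads[k](x)
```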
Subgradients 2.14
Example: piecewise-linear function
  𝑓(𝑥) = max_{𝑖=1,...,𝑚} (𝑎𝑖𝑇 𝑥 + 𝑏𝑖)
[figure: a piecewise-linear 𝑓(𝑥) as the maximum of affine functions 𝑎𝑖𝑇 𝑥 + 𝑏𝑖]
the subdifferential at 𝑥 is a polyhedron
𝜕 𝑓 (𝑥) = conv {𝑎𝑖 | 𝑖 ∈ 𝐼 (𝑥)}
with 𝐼(𝑥) = {𝑖 | 𝑎𝑖𝑇 𝑥 + 𝑏𝑖 = 𝑓(𝑥)}
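For the piecewise-linear case the weak rule is especially simple (a sketch; the function name is ours):

```python
import numpy as np

# for f(x) = max_i (a_i^T x + b_i), any row a_i attaining the maximum
# is a subgradient of f at x (a vertex of conv{a_i | i in I(x)})
def subgrad_piecewise_linear(A, b, x):
    return A[int(np.argmax(A @ x + b))]
```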
Subgradients 2.15
Example: ℓ1-norm
  𝑓(𝑥) = ‖𝑥‖1 = max_{𝑠∈{−1,1}𝑛} 𝑠𝑇 𝑥
the subdifferential is a product of intervals 𝜕𝑓(𝑥) = 𝐽1 × · · · × 𝐽𝑛, with
  𝐽𝑘 = [−1, 1] if 𝑥𝑘 = 0,   𝐽𝑘 = {1} if 𝑥𝑘 > 0,   𝐽𝑘 = {−1} if 𝑥𝑘 < 0
[figure: three examples in R2]
  𝜕𝑓(0, 0) = [−1, 1] × [−1, 1],   𝜕𝑓(1, 0) = {1} × [−1, 1],   𝜕𝑓(1, 1) = {(1, 1)}
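A one-line weak rule follows from the product structure (a sketch; np.sign picks the valid choice 𝑔𝑘 = 0 whenever 𝑥𝑘 = 0):

```python
import numpy as np

# one subgradient of f(x) = ||x||_1: g_k = sign(x_k) if x_k != 0,
# and any value in [-1, 1] if x_k = 0 (np.sign returns 0 there)
def subgrad_l1norm(x):
    return np.sign(x)
```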
Subgradients 2.16
Pointwise supremum
  𝑓(𝑥) = sup_{𝛼∈A} 𝑓𝛼(𝑥),   𝑓𝛼(𝑥) convex in 𝑥 for every 𝛼
Weak result: to find a subgradient at 𝑥̂,
• find any 𝛽 for which 𝑓(𝑥̂) = 𝑓𝛽(𝑥̂) (assuming the maximum is attained)
• choose any 𝑔 ∈ 𝜕𝑓𝛽(𝑥̂)
(Partial) strong result: define 𝐼 (𝑥) = {𝛼 ∈ A | 𝑓𝛼 (𝑥) = 𝑓 (𝑥)}
  conv ⋃_{𝛼∈𝐼(𝑥)} 𝜕𝑓𝛼(𝑥) ⊆ 𝜕𝑓(𝑥)
equality requires extra conditions (for example, A compact, 𝑓𝛼 continuous in 𝛼)
Subgradients 2.17
Exercise: maximum eigenvalue
Problem: explain how to find a subgradient of
  𝑓(𝑥) = 𝜆max(𝐴(𝑥)) = sup_{‖𝑦‖2=1} 𝑦𝑇 𝐴(𝑥)𝑦
where 𝐴(𝑥) = 𝐴0 + 𝑥 1 𝐴1 + · · · + 𝑥 𝑛 𝐴𝑛 with symmetric coefficients 𝐴𝑖
Solution: to find a subgradient at 𝑥̂,
• choose any unit eigenvector 𝑦 with eigenvalue 𝜆max(𝐴(𝑥̂))
• the gradient of 𝑦𝑇 𝐴(𝑥)𝑦 at 𝑥̂ is a subgradient of 𝑓:
  (𝑦𝑇 𝐴1𝑦, . . . , 𝑦𝑇 𝐴𝑛𝑦) ∈ 𝜕𝑓(𝑥̂)
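A sketch of this recipe using numpy.linalg.eigh (the function name and argument layout are ours):

```python
import numpy as np

# subgradient of f(x) = lambda_max(A0 + x_1 A_1 + ... + x_n A_n):
# take a unit eigenvector y for the largest eigenvalue of A(x),
# then (y^T A_1 y, ..., y^T A_n y) is a subgradient at x
def subgrad_lambda_max(A0, As, x):
    Ax = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    eigvals, eigvecs = np.linalg.eigh(Ax)   # eigenvalues in ascending order
    y = eigvecs[:, -1]                      # unit eigenvector for lambda_max
    return np.array([y @ Ai @ y for Ai in As])
```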
Subgradients 2.18
Minimization
  𝑓(𝑥) = inf_𝑦 ℎ(𝑥, 𝑦),   ℎ jointly convex in (𝑥, 𝑦)
Weak result: to find a subgradient at 𝑥̂,
• find 𝑦̂ that minimizes ℎ(𝑥̂, 𝑦) (assuming the minimum is attained)
• find a subgradient (𝑔, 0) ∈ 𝜕ℎ(𝑥̂, 𝑦̂)
Proof: for all 𝑥, 𝑦,
  ℎ(𝑥, 𝑦) ≥ ℎ(𝑥̂, 𝑦̂) + 𝑔𝑇 (𝑥 − 𝑥̂) + 0𝑇 (𝑦 − 𝑦̂) = 𝑓(𝑥̂) + 𝑔𝑇 (𝑥 − 𝑥̂)
therefore
  𝑓(𝑥) = inf_𝑦 ℎ(𝑥, 𝑦) ≥ 𝑓(𝑥̂) + 𝑔𝑇 (𝑥 − 𝑥̂)
Subgradients 2.19
Exercise: Euclidean distance to convex set
Problem: explain how to find a subgradient of
  𝑓(𝑥) = inf_{𝑦∈𝐶} ‖𝑥 − 𝑦‖2
where 𝐶 is a closed convex set
Solution: to find a subgradient at 𝑥̂,
• if 𝑓(𝑥̂) = 0 (that is, 𝑥̂ ∈ 𝐶), take 𝑔 = 0
• if 𝑓(𝑥̂) > 0, find the projection 𝑦̂ = 𝑃(𝑥̂) of 𝑥̂ on 𝐶 and take
  𝑔 = (1/‖𝑦̂ − 𝑥̂‖2) (𝑥̂ − 𝑦̂) = (1/‖𝑥̂ − 𝑃(𝑥̂)‖2) (𝑥̂ − 𝑃(𝑥̂))
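A sketch of this solution for a set with a closed-form projection; the choice of 𝐶 as the unit Euclidean ball is ours, purely to make 𝑃 explicit:

```python
import numpy as np

# f(x) = dist(x, C) with C the unit Euclidean ball (hypothetical choice)
def proj_unit_ball(x):
    nrm = np.linalg.norm(x)
    return x if nrm <= 1 else x / nrm

def subgrad_dist_to_ball(x):
    p = proj_unit_ball(x)                 # y_hat = P(x)
    d = np.linalg.norm(x - p)             # f(x)
    return np.zeros_like(x) if d == 0 else (x - p) / d
```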
Subgradients 2.20
Composition
𝑓 (𝑥) = ℎ( 𝑓1 (𝑥), . . . , 𝑓 𝑘 (𝑥)), ℎ convex and nondecreasing, 𝑓𝑖 convex
Weak result: to find a subgradient at 𝑥̂,
• find 𝑧 ∈ 𝜕ℎ(𝑓1(𝑥̂), . . . , 𝑓𝑘(𝑥̂)) and 𝑔𝑖 ∈ 𝜕𝑓𝑖(𝑥̂)
• then 𝑔 = 𝑧1𝑔1 + · · · + 𝑧𝑘𝑔𝑘 ∈ 𝜕𝑓(𝑥̂)
reduces to the standard formula for differentiable ℎ, 𝑓𝑖
Proof: using first the monotonicity of ℎ and the subgradient inequality for each 𝑓𝑖, then the subgradient inequality for ℎ at (𝑓1(𝑥̂), . . . , 𝑓𝑘(𝑥̂)),
  𝑓(𝑥) ≥ ℎ(𝑓1(𝑥̂) + 𝑔1𝑇 (𝑥 − 𝑥̂), . . . , 𝑓𝑘(𝑥̂) + 𝑔𝑘𝑇 (𝑥 − 𝑥̂))
       ≥ ℎ(𝑓1(𝑥̂), . . . , 𝑓𝑘(𝑥̂)) + 𝑧𝑇 (𝑔1𝑇 (𝑥 − 𝑥̂), . . . , 𝑔𝑘𝑇 (𝑥 − 𝑥̂))
       = ℎ(𝑓1(𝑥̂), . . . , 𝑓𝑘(𝑥̂)) + (𝑧1𝑔1 + · · · + 𝑧𝑘𝑔𝑘)𝑇 (𝑥 − 𝑥̂)
       = 𝑓(𝑥̂) + 𝑔𝑇 (𝑥 − 𝑥̂)
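A sketch of the composition rule for one concrete monotone outer function; the choice ℎ(𝑧) = log Σ𝑗 exp 𝑧𝑗 (convex, nondecreasing, with softmax gradient) and the function name are ours:

```python
import numpy as np

# weak composition rule with h(z) = log(sum_j exp(z_j));
# fs and subgrads are lists of callables giving f_i(x) and a subgradient of f_i at x
def subgrad_composition_logsumexp(fs, subgrads, x):
    z = np.array([fi(x) for fi in fs])
    w = np.exp(z - z.max())
    w /= w.sum()                          # w = grad h at (f_1(x), ..., f_k(x))
    return sum(wi * gi(x) for wi, gi in zip(w, subgrads))
```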
Subgradients 2.21
Optimal value function
define 𝑓 (𝑢, 𝑣) as the optimal value of convex problem
minimize 𝑓0 (𝑥)
subject to 𝑓𝑖 (𝑥) ≤ 𝑢𝑖 , 𝑖 = 1, . . . , 𝑚
𝐴𝑥 = 𝑏 + 𝑣
(functions 𝑓𝑖 are convex; optimization variable is 𝑥 )
Weak result: suppose 𝑓(𝑢̂, 𝑣̂) is finite and strong duality holds with the dual

  maximize   inf_𝑥 ( 𝑓0(𝑥) + Σ𝑖 𝜆𝑖 (𝑓𝑖(𝑥) − 𝑢̂𝑖) + 𝜈𝑇 (𝐴𝑥 − 𝑏 − 𝑣̂) )
  subject to 𝜆 ⪰ 0

if 𝜆̂, 𝜈̂ are dual optimal (for right-hand sides 𝑢̂, 𝑣̂) then (−𝜆̂, −𝜈̂) ∈ 𝜕𝑓(𝑢̂, 𝑣̂)
Subgradients 2.22
Proof: by weak duality for the problem with right-hand sides 𝑢, 𝑣,
  𝑓(𝑢, 𝑣) ≥ inf_𝑥 ( 𝑓0(𝑥) + Σ𝑖 𝜆̂𝑖 (𝑓𝑖(𝑥) − 𝑢𝑖) + 𝜈̂𝑇 (𝐴𝑥 − 𝑏 − 𝑣) )
          = inf_𝑥 ( 𝑓0(𝑥) + Σ𝑖 𝜆̂𝑖 (𝑓𝑖(𝑥) − 𝑢̂𝑖) + 𝜈̂𝑇 (𝐴𝑥 − 𝑏 − 𝑣̂) ) − 𝜆̂𝑇 (𝑢 − 𝑢̂) − 𝜈̂𝑇 (𝑣 − 𝑣̂)
          = 𝑓(𝑢̂, 𝑣̂) − 𝜆̂𝑇 (𝑢 − 𝑢̂) − 𝜈̂𝑇 (𝑣 − 𝑣̂)
Subgradients 2.23
Expectation
  𝑓(𝑥) = E ℎ(𝑥, 𝑢),   𝑢 random, ℎ convex in 𝑥 for every 𝑢
Weak result: to find a subgradient at 𝑥̂,
• choose a function 𝑢 ↦→ 𝑔(𝑢) with 𝑔(𝑢) ∈ 𝜕𝑥 ℎ(𝑥̂, 𝑢)
• then 𝑔 = E𝑢 𝑔(𝑢) ∈ 𝜕𝑓(𝑥̂)
Proof: by convexity of ℎ and the definition of 𝑔(𝑢),
  𝑓(𝑥) = E ℎ(𝑥, 𝑢) ≥ E [ ℎ(𝑥̂, 𝑢) + 𝑔(𝑢)𝑇 (𝑥 − 𝑥̂) ] = 𝑓(𝑥̂) + 𝑔𝑇 (𝑥 − 𝑥̂)
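In practice the expectation is often replaced by a sample average; a sketch with the hypothetical instance ℎ(𝑥, 𝑢) = |𝑥 − 𝑢|, 𝑢 uniform on [0, 1] (the resulting 𝑔 only approximates E𝑢 𝑔(𝑢)):

```python
import numpy as np

# sign(x - u) is a subgradient of h(., u) = |. - u| at x;
# averaging over samples estimates g = E_u g(u), a subgradient of f(x) = E h(x, u)
rng = np.random.default_rng(0)

def approx_subgrad_expectation(x, n_samples=10_000):
    u = rng.uniform(0.0, 1.0, size=n_samples)
    return np.mean(np.sign(x - u))
```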
Subgradients 2.24
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Optimality conditions — unconstrained
𝑥★ minimizes 𝑓(𝑥) if and only if
0 ∈ 𝜕 𝑓 (𝑥★)
this follows directly from the definition of subgradient:
𝑓 (𝑦) ≥ 𝑓 (𝑥★) + 0𝑇 (𝑦 − 𝑥★) for all 𝑦 ⇐⇒ 0 ∈ 𝜕 𝑓 (𝑥★)
Subgradients 2.25
Example: piecewise-linear minimization
  𝑓(𝑥) = max_{𝑖=1,...,𝑚} (𝑎𝑖𝑇 𝑥 + 𝑏𝑖)
Optimality condition
  0 ∈ conv {𝑎𝑖 | 𝑖 ∈ 𝐼(𝑥★)} where 𝐼(𝑥) = {𝑖 | 𝑎𝑖𝑇 𝑥 + 𝑏𝑖 = 𝑓(𝑥)}
• in other words, 𝑥★ is optimal if and only if there is a 𝜆 with
  𝜆 ⪰ 0,   1𝑇 𝜆 = 1,   Σ_{𝑖=1}^{𝑚} 𝜆𝑖 𝑎𝑖 = 0,   𝜆𝑖 = 0 for 𝑖 ∉ 𝐼(𝑥★)
• these are the optimality conditions for the equivalent linear program and its dual
  (primal)  minimize 𝑡   subject to 𝐴𝑥 + 𝑏 ⪯ 𝑡1
  (dual)    maximize 𝑏𝑇 𝜆   subject to 𝐴𝑇 𝜆 = 0, 𝜆 ⪰ 0, 1𝑇 𝜆 = 1
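The primal LP can be handed to any LP solver; a sketch using scipy.optimize.linprog (the function name is ours):

```python
import numpy as np
from scipy.optimize import linprog

# piecewise-linear minimization via the LP in variables (x, t):
#   minimize t   subject to   A x + b <= t 1
def minimize_piecewise_linear(A, b):
    m, n = A.shape
    c = np.r_[np.zeros(n), 1.0]                      # objective: t
    A_ub = np.c_[A, -np.ones(m)]                     # A x - t 1 <= -b
    res = linprog(c, A_ub=A_ub, b_ub=-b,
                  bounds=[(None, None)] * (n + 1))   # x and t are free
    return res.x[:n], res.x[n]
```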
Subgradients 2.26
Optimality conditions — constrained
minimize 𝑓0 (𝑥)
subject to 𝑓𝑖 (𝑥) ≤ 0, 𝑖 = 1, . . . , 𝑚
assume dom 𝑓𝑖 = R𝑛 , so functions 𝑓𝑖 are subdifferentiable everywhere
Karush–Kuhn–Tucker conditions
if strong duality holds, then 𝑥★, 𝜆★ are primal, dual optimal if and only if
1. 𝑥★ is primal feasible
2. 𝜆★ ⪰ 0
3. 𝜆𝑖★ 𝑓𝑖(𝑥★) = 0 for 𝑖 = 1, . . . , 𝑚
4. 𝑥★ is a minimizer of 𝐿(𝑥, 𝜆★) = 𝑓0(𝑥) + Σ_{𝑖=1}^{𝑚} 𝜆𝑖★ 𝑓𝑖(𝑥):
  0 ∈ 𝜕𝑓0(𝑥★) + Σ_{𝑖=1}^{𝑚} 𝜆𝑖★ 𝜕𝑓𝑖(𝑥★)
Subgradients 2.27
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Directional derivative
Definition (for general 𝑓 ): the directional derivative of 𝑓 at 𝑥 in the direction 𝑦 is
  𝑓′(𝑥; 𝑦) = lim_{𝛼↘0} ( 𝑓(𝑥 + 𝛼𝑦) − 𝑓(𝑥) ) / 𝛼
            = lim_{𝑡→∞} ( 𝑡 𝑓(𝑥 + (1/𝑡)𝑦) − 𝑡 𝑓(𝑥) )
(if the limit exists)
• 𝑓′(𝑥; 𝑦) is the right derivative of 𝑔(𝛼) = 𝑓(𝑥 + 𝛼𝑦) at 𝛼 = 0
• 𝑓′(𝑥; 𝑦) is homogeneous in 𝑦:
  𝑓′(𝑥; 𝜆𝑦) = 𝜆 𝑓′(𝑥; 𝑦) for 𝜆 ≥ 0
Subgradients 2.28
Directional derivative of a convex function
Equivalent definition (for convex 𝑓 ): replace lim with inf
  𝑓′(𝑥; 𝑦) = inf_{𝛼>0} ( 𝑓(𝑥 + 𝛼𝑦) − 𝑓(𝑥) ) / 𝛼
            = inf_{𝑡>0} ( 𝑡 𝑓(𝑥 + (1/𝑡)𝑦) − 𝑡 𝑓(𝑥) )
Proof
• the function ℎ(𝑦) = 𝑓 (𝑥 + 𝑦) − 𝑓 (𝑥) is convex in 𝑦 , with ℎ(0) = 0
• its perspective 𝑡ℎ(𝑦/𝑡) is nonincreasing in 𝑡 (ECE236B ex. A3.5); hence
  𝑓′(𝑥; 𝑦) = lim_{𝑡→∞} 𝑡ℎ(𝑦/𝑡) = inf_{𝑡>0} 𝑡ℎ(𝑦/𝑡)
Subgradients 2.29
Properties
consequences of the expressions (for convex 𝑓 )
  𝑓′(𝑥; 𝑦) = inf_{𝛼>0} ( 𝑓(𝑥 + 𝛼𝑦) − 𝑓(𝑥) ) / 𝛼
            = inf_{𝑡>0} ( 𝑡 𝑓(𝑥 + (1/𝑡)𝑦) − 𝑡 𝑓(𝑥) )
• 𝑓′(𝑥; 𝑦) is convex in 𝑦 (partial minimization of a convex function in 𝑦, 𝑡)
• 𝑓′(𝑥; 𝑦) defines a lower bound on 𝑓 in the direction 𝑦:
  𝑓(𝑥 + 𝛼𝑦) ≥ 𝑓(𝑥) + 𝛼 𝑓′(𝑥; 𝑦) for all 𝛼 ≥ 0
Subgradients 2.30
Directional derivative and subgradients
for convex 𝑓 and 𝑥 ∈ int dom 𝑓
  𝑓′(𝑥; 𝑦) = sup_{𝑔∈𝜕𝑓(𝑥)} 𝑔𝑇 𝑦
[figure: 𝜕𝑓(𝑥), the direction 𝑦, and the maximizer 𝑔̂ of 𝑔𝑇 𝑦 over 𝜕𝑓(𝑥)]
𝑓′(𝑥; 𝑦) is the support function of 𝜕𝑓(𝑥)
• generalizes 𝑓′(𝑥; 𝑦) = ∇𝑓(𝑥)𝑇 𝑦 for differentiable functions
• implies that 𝑓′(𝑥; 𝑦) exists for all 𝑥 ∈ int dom 𝑓, all 𝑦 (see page 2.4)
Subgradients 2.31
Proof: if 𝑔 ∈ 𝜕𝑓(𝑥) then from page 2.29
  𝑓′(𝑥; 𝑦) ≥ inf_{𝛼>0} ( 𝑓(𝑥) + 𝛼𝑔𝑇 𝑦 − 𝑓(𝑥) ) / 𝛼 = 𝑔𝑇 𝑦
it remains to show that 𝑓′(𝑥; 𝑦) = 𝑔̂𝑇 𝑦 for at least one 𝑔̂ ∈ 𝜕𝑓(𝑥)
• 𝑓′(𝑥; 𝑦) is convex in 𝑦 with domain R𝑛, hence subdifferentiable at all 𝑦
• let 𝑔̂ be a subgradient of 𝑓′(𝑥; 𝑦) at 𝑦: then for all 𝑣, 𝜆 ≥ 0,
  𝜆 𝑓′(𝑥; 𝑣) = 𝑓′(𝑥; 𝜆𝑣) ≥ 𝑓′(𝑥; 𝑦) + 𝑔̂𝑇 (𝜆𝑣 − 𝑦)
• taking 𝜆 → ∞ shows that 𝑓′(𝑥; 𝑣) ≥ 𝑔̂𝑇 𝑣; from the lower bound on page 2.30,
  𝑓(𝑥 + 𝑣) ≥ 𝑓(𝑥) + 𝑓′(𝑥; 𝑣) ≥ 𝑓(𝑥) + 𝑔̂𝑇 𝑣 for all 𝑣
  hence 𝑔̂ ∈ 𝜕𝑓(𝑥)
• taking 𝜆 = 0 we see that 𝑓′(𝑥; 𝑦) ≤ 𝑔̂𝑇 𝑦
Subgradients 2.32
Descent directions and subgradients
𝑦 is a descent direction of 𝑓 at 𝑥 if 𝑓′(𝑥; 𝑦) < 0
• the negative gradient of a differentiable 𝑓 is a descent direction (if ∇ 𝑓 (𝑥) ≠ 0)
• negative subgradient is not always a descent direction
Example: 𝑓 (𝑥 1, 𝑥 2) = |𝑥 1 | + 2|𝑥 2 |
[figure: contour lines of 𝑓, the point (1, 0), and the subgradient 𝑔 = (1, 2)]
𝑔 = (1, 2) ∈ 𝜕 𝑓 (1, 0) , but 𝑦 = (−1, −2) is not a descent direction at (1, 0)
Subgradients 2.33
Steepest descent direction
Definition: (normalized) steepest descent direction at 𝑥 ∈ int dom 𝑓 is
  Δ𝑥nsd = argmin_{‖𝑦‖2≤1} 𝑓′(𝑥; 𝑦)
Δ𝑥nsd is the primal solution 𝑦 of the pair of dual problems (BV §8.1.3)
  (primal)  minimize (over 𝑦) 𝑓′(𝑥; 𝑦)   subject to ‖𝑦‖2 ≤ 1
  (dual)    maximize (over 𝑔) −‖𝑔‖2   subject to 𝑔 ∈ 𝜕𝑓(𝑥)
• the dual optimal 𝑔★ is the subgradient with least norm
• 𝑓′(𝑥; Δ𝑥nsd) = −‖𝑔★‖2
• if 0 ∉ 𝜕𝑓(𝑥), Δ𝑥nsd = −𝑔★/‖𝑔★‖2
• Δ𝑥nsd can be expensive to compute
[figure: 𝜕𝑓(𝑥), the least-norm subgradient 𝑔★, the direction Δ𝑥nsd, and the hyperplane 𝑔𝑇 Δ𝑥nsd = 𝑓′(𝑥; Δ𝑥nsd)]
Subgradients 2.34
Subgradients and distance to sublevel sets
if 𝑓 is convex, 𝑓 (𝑦) < 𝑓 (𝑥) , 𝑔 ∈ 𝜕 𝑓 (𝑥) , then for small 𝑡 > 0,
  ‖𝑥 − 𝑡𝑔 − 𝑦‖2² = ‖𝑥 − 𝑦‖2² − 2𝑡 𝑔𝑇 (𝑥 − 𝑦) + 𝑡² ‖𝑔‖2²
                 ≤ ‖𝑥 − 𝑦‖2² − 2𝑡 ( 𝑓(𝑥) − 𝑓(𝑦)) + 𝑡² ‖𝑔‖2²
                 < ‖𝑥 − 𝑦‖2²
• −𝑔 is a descent direction for ‖𝑥 − 𝑦‖2, for any 𝑦 with 𝑓(𝑦) < 𝑓(𝑥)
• in particular, −𝑔 is descent direction for distance to any minimizer of 𝑓
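A tiny numerical illustration of this slide (our choice of instance: 𝑓(𝑥) = ‖𝑥‖1, 𝑦 = 0):

```python
import numpy as np

# a small step along -g from x decreases the distance to a point y with f(y) < f(x)
x = np.array([1.0, -2.0])
g = np.sign(x)                    # a subgradient of ||.||_1 at x
y = np.zeros(2)                   # f(y) = 0 < f(x) = 3
t = 0.1
assert np.linalg.norm(x - t * g - y) < np.linalg.norm(x - y)
```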
Subgradients 2.35
References
• A. Beck, First-Order Methods in Optimization (2017), chapter 3.
• D. P. Bertsekas, A. Nedić, A. E. Ozdaglar, Convex Analysis and Optimization
(2003), chapter 4.
• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms
(1993), chapter VI.
• Yu. Nesterov, Lectures on Convex Optimization (2018), section 3.1.
• B. T. Polyak, Introduction to Optimization (1987), section 5.1.
Subgradients 2.36