Gradient

The document summarizes key concepts related to gradient methods for minimizing convex functions. It discusses: 1) The gradient method which iteratively updates the current point by moving in the negative gradient direction with a step size determined by line search. 2) Properties of convex functions including Jensen's inequality and the gradient being a monotone mapping. 3) The definition of a Lipschitz continuous gradient, which bounds the change in gradient as a function of distance between points. This property implies a quadratic upper bound on the function.

Uploaded by

Sanjay Kumar

L. Vandenberghe, ECE236C (Spring 2019)

1. Gradient method

• gradient method, first-order methods

• convex functions

• Lipschitz continuity of gradient

• strong convexity

• analysis of gradient method

1.1
Gradient method

to minimize a convex differentiable function $f$: choose an initial point $x_0$ and repeat

$$x_{k+1} = x_k - t_k \nabla f(x_k), \quad k = 0, 1, \ldots$$

step size $t_k$ is constant or determined by line search
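The iteration above can be sketched in a few lines of Python. This is only an illustrative sketch with a constant step size; the function name `gradient_method`, the test function, and the step size 0.25 are assumptions chosen for the example, not part of the lecture.

```python
def gradient_method(grad, x0, step, num_iters):
    """Run x_{k+1} = x_k - t * grad(x_k) with a constant step size t."""
    x = list(x0)
    for _ in range(num_iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test problem: f(x) = 0.5 * (x1^2 + 4 * x2^2), minimized at x = 0.
def grad_f(x):
    return [x[0], 4.0 * x[1]]


x_final = gradient_method(grad_f, [4.0, 1.0], step=0.25, num_iters=100)
print(x_final)
```

Each iteration costs one gradient evaluation and one vector update, which is the sense in which every iteration is inexpensive.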

Advantages

• every iteration is inexpensive

• does not require second derivatives

Notation

• $x_k$ can refer to the $k$th element of a sequence, or to the $k$th component of the vector $x$

• to avoid confusion, we sometimes use $x^{(k)}$ to denote elements of a sequence

Gradient method 1.2

Quadratic example

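$$f(x) = \frac{1}{2}\left(x_1^2 + \gamma x_2^2\right) \quad (\text{with } \gamma > 1)$$

with exact line search and starting point $x^{(0)} = (\gamma, 1)$

$$\frac{\|x^{(k)} - x^\star\|_2}{\|x^{(0)} - x^\star\|_2} = \left(\frac{\gamma - 1}{\gamma + 1}\right)^k$$

where $x^\star = 0$

[Figure: plot in the $(x_1, x_2)$-plane; $x_1$ axis from −10 to 10, $x_2$ axis from −4 to 4]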

gradient method is often slow; convergence very dependent on scaling
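The stated convergence ratio can be checked numerically. The sketch below assumes $\gamma = 10$ (an arbitrary choice) and uses the closed-form exact line search step for a quadratic $\frac{1}{2}x^T A x$, namely $t = g^T g / (g^T A g)$ with $g = \nabla f(x)$; the function name is hypothetical.

```python
def exact_line_search_quadratic(gamma, num_iters):
    """Gradient method with exact line search on f(x) = 0.5*(x1^2 + gamma*x2^2)."""
    x = [gamma, 1.0]
    norms = [(x[0] ** 2 + x[1] ** 2) ** 0.5]
    for _ in range(num_iters):
        g = [x[0], gamma * x[1]]
        # For a quadratic 0.5*x^T A x the exact step is (g^T g) / (g^T A g).
        t = (g[0] ** 2 + g[1] ** 2) / (g[0] ** 2 + gamma * g[1] ** 2)
        x = [x[0] - t * g[0], x[1] - t * g[1]]
        norms.append((x[0] ** 2 + x[1] ** 2) ** 0.5)
    return norms


gamma = 10.0
norms = exact_line_search_quadratic(gamma, 20)
rate = (gamma - 1.0) / (gamma + 1.0)
ratios = [norms[k] / norms[0] for k in range(len(norms))]
```

As $\gamma$ grows, `rate` approaches 1, which illustrates the dependence on scaling.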

Gradient method 1.3

Nondifferentiable example

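$$f(x) = \sqrt{x_1^2 + \gamma x_2^2} \ \text{ if } |x_2| \le x_1, \qquad f(x) = \frac{x_1 + \gamma |x_2|}{\sqrt{1 + \gamma}} \ \text{ if } |x_2| > x_1$$

with exact line search, starting point $x^{(0)} = (\gamma, 1)$, converges to non-optimal point

[Figure: plot in the $(x_1, x_2)$-plane; $x_1$ axis from −2 to 4, $x_2$ axis from −2 to 2]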

gradient method does not handle nondifferentiable problems

Gradient method 1.4

First-order methods

address one or both shortcomings of the gradient method

Methods for nondifferentiable or constrained problems

• subgradient method
• proximal gradient method
• smoothing methods
• cutting-plane methods

Methods with improved convergence

• conjugate gradient method

• accelerated gradient method
• quasi-Newton methods

Gradient method 1.5

Convex function

a function $f$ is convex if $\operatorname{dom} f$ is a convex set and Jensen’s inequality holds:

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) \quad \text{for all } x, y \in \operatorname{dom} f,\ \theta \in [0, 1]$$

First-order condition

for (continuously) differentiable $f$, Jensen’s inequality can be replaced with

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f$$

Second-order condition

for twice differentiable $f$, Jensen’s inequality can be replaced with

$$\nabla^2 f(x) \succeq 0 \quad \text{for all } x \in \operatorname{dom} f$$
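As a sanity check, Jensen's inequality and the first-order condition can be tested on random points. The sketch below uses the log-sum-exp function in two variables, a standard convex function chosen here only as an assumed example; the sampling range and tolerance are arbitrary.

```python
import math
import random


def f(x):
    # log-sum-exp, a standard convex function (chosen only for illustration)
    return math.log(math.exp(x[0]) + math.exp(x[1]))


def grad_f(x):
    s = math.exp(x[0]) + math.exp(x[1])
    return [math.exp(x[0]) / s, math.exp(x[1]) / s]


random.seed(0)
jensen_ok = True
first_order_ok = True
for _ in range(1000):
    x = [random.uniform(-3, 3), random.uniform(-3, 3)]
    y = [random.uniform(-3, 3), random.uniform(-3, 3)]
    theta = random.random()
    z = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
    # Jensen's inequality at the convex combination z
    jensen_ok &= f(z) <= theta * f(x) + (1 - theta) * f(y) + 1e-12
    # First-order condition: f(y) lies above the linearization at x
    g = grad_f(x)
    linearization = f(x) + sum(gi * (b - a) for gi, a, b in zip(g, x, y))
    first_order_ok &= f(y) >= linearization - 1e-12
```

A random test like this cannot prove convexity; it can only fail to find a counterexample.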

Gradient method 1.6

Strictly convex function

$f$ is strictly convex if $\operatorname{dom} f$ is a convex set and

$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y) \quad \text{for all } x, y \in \operatorname{dom} f,\ x \ne y,\ \text{and } \theta \in (0, 1)$$

strict convexity implies that if a minimizer of $f$ exists, it is unique

First-order condition

for differentiable $f$, strict Jensen’s inequality can be replaced with

$$f(y) > f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f,\ x \ne y$$

Second-order condition

note that $\nabla^2 f(x) \succ 0$ is not necessary for strict convexity (cf., $f(x) = x^4$)

Gradient method 1.7

Monotonicity of gradient

a differentiable function $f$ is convex if and only if $\operatorname{dom} f$ is convex and

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge 0 \quad \text{for all } x, y \in \operatorname{dom} f$$

i.e., the gradient $\nabla f : \mathbf{R}^n \to \mathbf{R}^n$ is a monotone mapping

a differentiable function $f$ is strictly convex if and only if $\operatorname{dom} f$ is convex and

$$(\nabla f(x) - \nabla f(y))^T (x - y) > 0 \quad \text{for all } x, y \in \operatorname{dom} f,\ x \ne y$$

i.e., the gradient $\nabla f : \mathbf{R}^n \to \mathbf{R}^n$ is a strictly monotone mapping

Gradient method 1.8

Proof

• if $f$ is differentiable and convex, then

$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \qquad f(x) \ge f(y) + \nabla f(y)^T (x - y)$$

combining the inequalities gives $(\nabla f(x) - \nabla f(y))^T (x - y) \ge 0$

• if $\nabla f$ is monotone, then $g'(t) \ge g'(0)$ for $t \ge 0$ and $t \in \operatorname{dom} g$, where

$$g(t) = f(x + t(y - x)), \qquad g'(t) = \nabla f(x + t(y - x))^T (y - x)$$

hence

$$f(y) = g(1) = g(0) + \int_0^1 g'(t)\, dt \ge g(0) + g'(0) = f(x) + \nabla f(x)^T (y - x)$$

this is the first-order condition for convexity

Gradient method 1.9

Lipschitz continuous gradient

the gradient of $f$ is Lipschitz continuous with parameter $L > 0$ if

$$\|\nabla f(x) - \nabla f(y)\|_* \le L \|x - y\| \quad \text{for all } x, y \in \operatorname{dom} f$$

• functions $f$ with this property are also called $L$-smooth

• the definition does not assume convexity of $f$ (and holds for $-f$ if it holds for $f$)

• in the definition, $\|\cdot\|$ and $\|\cdot\|_*$ are a pair of dual norms:

$$\|u\|_* = \sup_{v \ne 0} \frac{u^T v}{\|v\|} = \sup_{\|v\| = 1} u^T v$$

this implies a generalized Cauchy–Schwarz inequality

$$|u^T v| \le \|u\|_* \|v\| \quad \text{for all } u, v$$
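A concrete instance of a dual pair is $\|\cdot\| = \|\cdot\|_1$ with $\|\cdot\|_* = \|\cdot\|_\infty$. The sketch below, with assumed random test vectors, checks that the supremum in the definition is attained at a signed unit vector and that the generalized Cauchy–Schwarz inequality holds on samples.

```python
import random


def norm1(v):
    return sum(abs(vi) for vi in v)


def norm_inf(v):
    return max(abs(vi) for vi in v)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


random.seed(1)
u = [random.uniform(-1, 1) for _ in range(5)]

# The supremum of u^T v over ||v||_1 = 1 is attained at a signed unit vector
# placed on the largest-magnitude entry of u.
i = max(range(len(u)), key=lambda j: abs(u[j]))
v_best = [0.0] * len(u)
v_best[i] = 1.0 if u[i] >= 0 else -1.0
attained = dot(u, v_best)

# Generalized Cauchy-Schwarz: |u^T v| <= ||u||_inf * ||v||_1
cs_ok = True
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in range(5)]
    cs_ok &= abs(dot(u, v)) <= norm_inf(u) * norm1(v) + 1e-12
```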

Gradient method 1.10

Choice of norm

Equivalence of norms

• for any two norms $\|\cdot\|_a$, $\|\cdot\|_b$, there exist positive constants $c_1$, $c_2$ such that

$$c_1 \|x\|_b \le \|x\|_a \le c_2 \|x\|_b \quad \text{for all } x$$

• constants depend on dimension; for example, for $x \in \mathbf{R}^n$,

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\, \|x\|_2, \qquad \frac{1}{\sqrt{n}} \|x\|_2 \le \|x\|_\infty \le \|x\|_2$$

Norm in definition of Lipschitz continuity

• without loss of generality we can use the Euclidean norm $\|\cdot\| = \|\cdot\|_* = \|\cdot\|_2$

• the parameter L depends on choice of norm
• in complexity bounds, choice of norm can simplify dependence on dimensions

Gradient method 1.11

Quadratic upper bound
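suppose $\nabla f$ is Lipschitz continuous with parameter $L$

• this implies (from the generalized Cauchy–Schwarz inequality) that

$$(\nabla f(x) - \nabla f(y))^T (x - y) \le L \|x - y\|^2 \quad \text{for all } x, y \in \operatorname{dom} f \tag{1}$$

• if $\operatorname{dom} f$ is convex, (1) is equivalent to

$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2 \quad \text{for all } x, y \in \operatorname{dom} f \tag{2}$$

[Figure: graph of $f(y)$ with the point $(x, f(x))$ marked]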
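The quadratic upper bound can be checked numerically on a simple smooth function. The sketch below assumes the scalar logistic loss $f(x) = \log(1 + e^x)$, whose second derivative is at most $1/4$, so $L = 1/4$ is a valid parameter for the Euclidean norm; the sampling range is arbitrary.

```python
import math
import random

L = 0.25  # f''(x) = s(x) * (1 - s(x)) <= 1/4 for the logistic loss below


def f(x):
    return math.log(1.0 + math.exp(x))


def df(x):
    # derivative of f: the sigmoid function s(x)
    return 1.0 / (1.0 + math.exp(-x))


random.seed(2)
upper_bound_ok = True
for _ in range(1000):
    x = random.uniform(-5, 5)
    y = random.uniform(-5, 5)
    # right-hand side of the quadratic upper bound at the pair (x, y)
    bound = f(x) + df(x) * (y - x) + 0.5 * L * (y - x) ** 2
    upper_bound_ok &= f(y) <= bound + 1e-12
```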

Gradient method 1.12

Proof (of the equivalence of (1) and (2) if dom f is convex)
• consider arbitrary $x, y \in \operatorname{dom} f$ and define $g(t) = f(x + t(y - x))$
• $g(t)$ is defined for $t \in [0, 1]$ because $\operatorname{dom} f$ is convex
• if (1) holds, then

$$g'(t) - g'(0) = (\nabla f(x + t(y - x)) - \nabla f(x))^T (y - x) \le t L \|x - y\|^2$$

integrating from $t = 0$ to $t = 1$ gives (2):

$$\begin{aligned}
f(y) = g(1) &= g(0) + \int_0^1 g'(t)\, dt \le g(0) + g'(0) + \frac{L}{2} \|x - y\|^2 \\
&= f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|x - y\|^2
\end{aligned}$$

• conversely, if (2) holds, then (2) and the same inequality with $x$, $y$ switched, i.e.,

$$f(x) \le f(y) + \nabla f(y)^T (x - y) + \frac{L}{2} \|x - y\|^2,$$

can be combined to give $(\nabla f(x) - \nabla f(y))^T (x - y) \le L \|x - y\|^2$

Gradient method 1.13

Consequence of quadratic upper bound

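if $\operatorname{dom} f = \mathbf{R}^n$ and $f$ has a minimizer $x^\star$, then

$$\frac{1}{2L} \|\nabla f(z)\|_*^2 \le f(z) - f(x^\star) \le \frac{L}{2} \|z - x^\star\|^2 \quad \text{for all } z$$

• right-hand inequality follows from upper bound property (2) at $x = x^\star$, $y = z$

• left-hand inequality follows by minimizing quadratic upper bound for $x = z$

$$\begin{aligned}
\inf_y f(y) &\le \inf_y \left( f(z) + \nabla f(z)^T (y - z) + \frac{L}{2} \|y - z\|^2 \right) \\
&= \inf_{\|v\| = 1} \inf_t \left( f(z) + t \nabla f(z)^T v + \frac{L t^2}{2} \right) \\
&= \inf_{\|v\| = 1} \left( f(z) - \frac{1}{2L} (\nabla f(z)^T v)^2 \right) \\
&= f(z) - \frac{1}{2L} \|\nabla f(z)\|_*^2
\end{aligned}$$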
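The two inner steps of this chain can be spelled out as follows. The shorthand $a = \nabla f(z)^T v$ is introduced here only for illustration and is not part of the original notation.

```latex
% Minimization over t for a fixed direction v with \|v\| = 1, writing a = \nabla f(z)^T v:
\[
  \frac{d}{dt}\left( t a + \frac{L t^2}{2} \right) = a + L t = 0
  \quad\Longrightarrow\quad
  t = -\frac{a}{L},
  \qquad
  \inf_t \left( t a + \frac{L t^2}{2} \right) = -\frac{a^2}{2L}.
\]
% Minimization over v then uses the definition of the dual norm:
\[
  \sup_{\|v\| = 1} \left( \nabla f(z)^T v \right)^2 = \|\nabla f(z)\|_*^2 .
\]
```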

Gradient method 1.14

Co-coercivity of gradient

if $f$ is convex with $\operatorname{dom} f = \mathbf{R}^n$ and $\nabla f$ is $L$-Lipschitz continuous, then

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|_*^2 \quad \text{for all } x, y$$

• this property is known as co-coercivity of ∇ f (with parameter 1/L )

• co-coercivity in turn implies Lipschitz continuity of ∇ f (by Cauchy–Schwarz)

• hence, for differentiable convex f with dom f = Rn

Lipschitz continuity of ∇ f ⇒ upper bound property (2) (equivalently, (1))

⇒ co-coercivity of ∇ f
⇒ Lipschitz continuity of ∇ f

therefore the three properties are equivalent
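For a convex quadratic $f(x) = \frac{1}{2} x^T A x$ with the Euclidean norm, the gradient difference is $A(x - y)$ and the smallest valid $L$ is $\lambda_{\max}(A)$, so co-coercivity can be checked directly. The $2 \times 2$ matrix below is a hypothetical example.

```python
import math
import random

A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical symmetric positive definite Hessian


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


# Eigenvalues of a symmetric 2x2 matrix in closed form.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr / 4.0 - det)
lam_min, lam_max = tr / 2.0 - disc, tr / 2.0 + disc
L = lam_max  # Lipschitz constant of the gradient in the Euclidean norm

random.seed(3)
cocoercive_ok = True
for _ in range(1000):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    d = [x[0] - y[0], x[1] - y[1]]
    gd = matvec(A, d)  # grad f(x) - grad f(y) = A (x - y)
    lhs = gd[0] * d[0] + gd[1] * d[1]
    rhs = (gd[0] ** 2 + gd[1] ** 2) / L
    cocoercive_ok &= lhs >= rhs - 1e-9
```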

Gradient method 1.15

Proof of co-coercivity: define two convex functions $f_x$, $f_y$ with domain $\mathbf{R}^n$

$$f_x(z) = f(z) - \nabla f(x)^T z, \qquad f_y(z) = f(z) - \nabla f(y)^T z$$

• the two functions have $L$-Lipschitz continuous gradients

• $z = x$ minimizes $f_x(z)$; from the left-hand inequality on page 1.14,

$$\begin{aligned}
f(y) - f(x) - \nabla f(x)^T (y - x) &= f_x(y) - f_x(x) \\
&\ge \frac{1}{2L} \|\nabla f_x(y)\|_*^2 \\
&= \frac{1}{2L} \|\nabla f(y) - \nabla f(x)\|_*^2
\end{aligned}$$

• similarly, $z = y$ minimizes $f_y(z)$; therefore

$$f(x) - f(y) - \nabla f(y)^T (x - y) \ge \frac{1}{2L} \|\nabla f(y) - \nabla f(x)\|_*^2$$

combining the two inequalities shows co-coercivity

Gradient method 1.16
Lipschitz continuity with respect to Euclidean norm

suppose $f$ is convex with $\operatorname{dom} f = \mathbf{R}^n$, and $L$-smooth for the Euclidean norm:

$$\|\nabla f(x) - \nabla f(y)\|_2 \le L \|x - y\|_2 \quad \text{for all } x, y$$

• the equivalent property (1) states that

$$(\nabla f(x) - \nabla f(y))^T (x - y) \le L (x - y)^T (x - y) \quad \text{for all } x, y$$

• this is monotonicity of $L x - \nabla f(x)$, i.e., equivalent to the property that

$$\frac{L}{2} \|x\|_2^2 - f(x) \quad \text{is a convex function}$$

• if $f$ is twice differentiable, the Hessian of this function is $L I - \nabla^2 f(x)$:

$$\lambda_{\max}(\nabla^2 f(x)) \le L \quad \text{for all } x$$

is an equivalent characterization of $L$-smoothness

Gradient method 1.17

Strongly convex function

$f$ is strongly convex with parameter $m > 0$ if $\operatorname{dom} f$ is convex and

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) - \frac{m}{2} \theta (1 - \theta) \|x - y\|^2$$

holds for all $x, y \in \operatorname{dom} f$, $\theta \in [0, 1]$

• this is a stronger version of Jensen’s inequality

• it holds if and only if it holds for $f$ restricted to arbitrary lines:

$$f(x + t(y - x)) - \frac{m}{2} t^2 \|x - y\|^2 \tag{3}$$

is a convex function of $t$, for all $x, y \in \operatorname{dom} f$

• without loss of generality, we can take $\|\cdot\| = \|\cdot\|_2$

• however, the strong convexity parameter m depends on the norm used

Gradient method 1.18

Quadratic lower bound
if $f$ is differentiable and $m$-strongly convex, then

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|^2 \quad \text{for all } x, y \in \operatorname{dom} f \tag{4}$$

[Figure: graph of $f(y)$ with the point $(x, f(x))$ marked]

• follows from the first-order condition of convexity of (3)

• this implies that the sublevel sets of $f$ are bounded
• if $f$ is closed (has closed sublevel sets), it has a unique minimizer $x^\star$ and

$$\frac{m}{2} \|z - x^\star\|^2 \le f(z) - f(x^\star) \le \frac{1}{2m} \|\nabla f(z)\|_*^2 \quad \text{for all } z \in \operatorname{dom} f$$

(proof as on page 1.14)
Gradient method 1.19
Strong monotonicity

differentiable $f$ is strongly convex if and only if $\operatorname{dom} f$ is convex and

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge m \|x - y\|^2 \quad \text{for all } x, y \in \operatorname{dom} f$$

this is called strong monotonicity (coercivity) of $\nabla f$

Proof

• one direction follows from (4) and the same inequality with $x$ and $y$ switched
• for the other direction, assume $\nabla f$ is strongly monotone and define

$$g(t) = f(x + t(y - x)) - \frac{m}{2} t^2 \|x - y\|^2$$

then $g'(t)$ is nondecreasing, so $g$ is convex

Gradient method 1.20

Strong convexity with respect to Euclidean norm

suppose $f$ is $m$-strongly convex for the Euclidean norm:

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) - \frac{m}{2} \theta (1 - \theta) \|x - y\|_2^2$$

for $x, y \in \operatorname{dom} f$, $\theta \in [0, 1]$

• this is Jensen’s inequality for the function

$$h(x) = f(x) - \frac{m}{2} \|x\|_2^2$$

• therefore $f$ is strongly convex if and only if $h$ is convex

• if $f$ is twice differentiable, $h$ is convex if and only if $\nabla^2 f(x) - m I \succeq 0$, or

$$\lambda_{\min}(\nabla^2 f(x)) \ge m \quad \text{for all } x \in \operatorname{dom} f$$

Gradient method 1.21

Extension of co-coercivity

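suppose $f$ is $m$-strongly convex and $L$-smooth for $\|\cdot\|_2$, and $\operatorname{dom} f = \mathbf{R}^n$

• then the function

$$h(x) = f(x) - \frac{m}{2} \|x\|_2^2$$

is convex and $(L - m)$-smooth:

$$\begin{aligned}
0 &\le (\nabla h(x) - \nabla h(y))^T (x - y) \\
&= (\nabla f(x) - \nabla f(y))^T (x - y) - m \|x - y\|_2^2 \\
&\le (L - m) \|x - y\|_2^2
\end{aligned}$$

• co-coercivity of $\nabla h$ can be written as

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \frac{mL}{m + L} \|x - y\|_2^2 + \frac{1}{m + L} \|\nabla f(x) - \nabla f(y)\|_2^2$$

for all $x, y \in \operatorname{dom} f$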
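The rewriting of co-coercivity of $\nabla h$ can be filled in as follows. This is a sketch assuming $L > m$; the shorthands $d = x - y$ and $g = \nabla f(x) - \nabla f(y)$ are introduced here only for illustration.

```latex
% Co-coercivity of \nabla h with parameter 1/(L - m), using \nabla h(x) = \nabla f(x) - m x:
\[
  g^T d - m \|d\|_2^2 \;\ge\; \frac{1}{L - m} \, \| g - m d \|_2^2
  = \frac{1}{L - m} \left( \|g\|_2^2 - 2 m\, g^T d + m^2 \|d\|_2^2 \right).
\]
% Multiply by (L - m) and collect the g^T d terms on the left:
\[
  (L + m)\, g^T d \;\ge\; \|g\|_2^2 + \left( m (L - m) + m^2 \right) \|d\|_2^2
  = \|g\|_2^2 + m L \|d\|_2^2 .
\]
% Dividing by (m + L) gives the stated inequality.
```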

Gradient method 1.22

Analysis of gradient method

$$x_{k+1} = x_k - t_k \nabla f(x_k), \quad k = 0, 1, \ldots$$

with fixed step size or backtracking line search

Assumptions

1. $f$ is convex and differentiable with $\operatorname{dom} f = \mathbf{R}^n$

2. $\nabla f(x)$ is $L$-Lipschitz continuous with respect to the Euclidean norm, with $L > 0$

3. optimal value $f^\star = \inf_x f(x)$ is finite and attained at $x^\star$

Gradient method 1.23

Basic gradient step

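• from quadratic upper bound (page 1.12) with $y = x - t \nabla f(x)$:

$$f(x - t \nabla f(x)) \le f(x) - t \left(1 - \frac{L t}{2}\right) \|\nabla f(x)\|_2^2$$

• therefore, if $x^+ = x - t \nabla f(x)$ and $0 < t \le 1/L$,

$$f(x^+) \le f(x) - \frac{t}{2} \|\nabla f(x)\|_2^2 \tag{5}$$

• from (5) and convexity of $f$,

$$\begin{aligned}
f(x^+) - f^\star &\le \nabla f(x)^T (x - x^\star) - \frac{t}{2} \|\nabla f(x)\|_2^2 \\
&= \frac{1}{2t} \left( \|x - x^\star\|_2^2 - \|x - x^\star - t \nabla f(x)\|_2^2 \right) \\
&= \frac{1}{2t} \left( \|x - x^\star\|_2^2 - \|x^+ - x^\star\|_2^2 \right)
\end{aligned} \tag{6}$$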
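The two steps behind (6) can be written out explicitly; this is a worked sketch using only the convexity assumption and inequality (5).

```latex
% Convexity of f (first-order condition at x, evaluated at x^\star) gives
\[
  f(x) \le f^\star + \nabla f(x)^T (x - x^\star),
\]
% which, substituted into (5), yields the first line of (6).
% The equality in (6) follows by expanding the square:
\[
  \| x - x^\star - t \nabla f(x) \|_2^2
  = \| x - x^\star \|_2^2 - 2 t\, \nabla f(x)^T (x - x^\star) + t^2 \| \nabla f(x) \|_2^2 ,
\]
\[
  \frac{1}{2t} \left( \| x - x^\star \|_2^2 - \| x - x^\star - t \nabla f(x) \|_2^2 \right)
  = \nabla f(x)^T (x - x^\star) - \frac{t}{2} \| \nabla f(x) \|_2^2 .
\]
```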

Gradient method 1.24

Descent properties

assume $\nabla f(x) \ne 0$

• the inequality (5) shows that

$$f(x^+) < f(x)$$

• the inequality (6) shows that

$$\|x^+ - x^\star\|_2 < \|x - x^\star\|_2$$

in the gradient method, function value and distance to the optimal set decrease

Gradient method 1.25

Gradient method with constant step size

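$$x_{k+1} = x_k - t \nabla f(x_k), \quad k = 0, 1, \ldots$$

• take $x = x_{i-1}$, $x^+ = x_i$ in (6), which requires $0 < t \le 1/L$, and add the bounds for $i = 1, \ldots, k$:

$$\begin{aligned}
\sum_{i=1}^k \left( f(x_i) - f^\star \right) &\le \frac{1}{2t} \sum_{i=1}^k \left( \|x_{i-1} - x^\star\|_2^2 - \|x_i - x^\star\|_2^2 \right) \\
&= \frac{1}{2t} \left( \|x_0 - x^\star\|_2^2 - \|x_k - x^\star\|_2^2 \right) \\
&\le \frac{1}{2t} \|x_0 - x^\star\|_2^2
\end{aligned}$$

• since $f(x_i)$ is non-increasing (see (5))

$$f(x_k) - f^\star \le \frac{1}{k} \sum_{i=1}^k \left( f(x_i) - f^\star \right) \le \frac{1}{2kt} \|x_0 - x^\star\|_2^2$$

Conclusion: number of iterations to reach $f(x_k) - f^\star \le \epsilon$ is $O(1/\epsilon)$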
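The $1/k$ bound can be observed on a small example. The sketch below assumes the quadratic $f(x) = \frac{1}{2}(x_1^2 + \gamma x_2^2)$ with $\gamma = 10$, for which $L = \gamma$, $x^\star = 0$ and $f^\star = 0$, and uses the constant step $t = 1/L$.

```python
def f(x, gamma):
    return 0.5 * (x[0] ** 2 + gamma * x[1] ** 2)


def grad_f(x, gamma):
    return [x[0], gamma * x[1]]


gamma = 10.0      # hypothetical test problem; here L = gamma, x* = 0, f* = 0
t = 1.0 / gamma   # constant step size t = 1/L
x0 = [gamma, 1.0]
dist0_sq = x0[0] ** 2 + x0[1] ** 2

x = list(x0)
gaps, bounds = [], []
for k in range(1, 51):
    g = grad_f(x, gamma)
    x = [x[0] - t * g[0], x[1] - t * g[1]]
    gaps.append(f(x, gamma))                 # f(x_k) - f*
    bounds.append(dist0_sq / (2.0 * k * t))  # ||x_0 - x*||^2 / (2 k t)
```

On this strongly convex example the gap shrinks much faster than the bound; the $1/k$ rate is a worst-case guarantee over the whole problem class.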

Gradient method 1.26
Backtracking line search

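initialize $t_k$ at $\hat{t} > 0$ (for example, $\hat{t} = 1$) and take $t_k := \beta t_k$ until

$$f(x_k - t_k \nabla f(x_k)) < f(x_k) - \alpha t_k \|\nabla f(x_k)\|_2^2$$

[Figure: $f(x_k - t \nabla f(x_k))$ as a function of $t$, together with the lines $f(x_k) - \alpha t \|\nabla f(x_k)\|_2^2$ and $f(x_k) - t \|\nabla f(x_k)\|_2^2$]

$0 < \beta < 1$; we will take $\alpha = 1/2$ (mostly to simplify proofs)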
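A minimal sketch of this backtracking rule is given below. The function name, the default $\beta = 0.5$, the `max_backtracks` safeguard, and the quadratic test problem are assumptions added for the example; only the acceptance test itself comes from the rule above.

```python
def backtracking_step(f, grad, x, t_hat=1.0, alpha=0.5, beta=0.5, max_backtracks=60):
    """Return (t, x_next), shrinking t by beta until the sufficient-decrease test holds."""
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    if g_sq == 0.0:
        return t_hat, list(x)  # already stationary; no step needed
    fx = f(x)
    t = t_hat
    for _ in range(max_backtracks):
        x_next = [xi - t * gi for xi, gi in zip(x, g)]
        if f(x_next) < fx - alpha * t * g_sq:
            return t, x_next
        t *= beta
    return 0.0, list(x)  # safeguard: no acceptable step found within the budget


gamma = 10.0  # hypothetical test problem with L = gamma


def f(x):
    return 0.5 * (x[0] ** 2 + gamma * x[1] ** 2)


def grad_f(x):
    return [x[0], gamma * x[1]]


x = [gamma, 1.0]
steps = []
for _ in range(500):
    t, x = backtracking_step(f, grad_f, x)
    steps.append(t)
```

With these settings the accepted steps stay at or above $\min\{\hat{t}, \beta / L\} = 0.05$ on this example.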

Gradient method 1.27

Analysis for backtracking line search

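line search with $\alpha = 1/2$, if $f$ has a Lipschitz continuous gradient

[Figure: as functions of $t$, the curves $f(x_k) - t\left(1 - \frac{tL}{2}\right) \|\nabla f(x_k)\|_2^2$, $f(x_k) - \frac{t}{2} \|\nabla f(x_k)\|_2^2$, and $f(x_k - t \nabla f(x_k))$; the point $t = 1/L$ is marked]

selected step size satisfies $t_k \ge t_{\min} = \min\{\hat{t}, \beta / L\}$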

Gradient method 1.28

Gradient method with backtracking line search

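• from line search condition and convexity of $f$,

$$\begin{aligned}
f(x_{i+1}) &\le f(x_i) - \frac{t_i}{2} \|\nabla f(x_i)\|_2^2 \\
&\le f^\star + \nabla f(x_i)^T (x_i - x^\star) - \frac{t_i}{2} \|\nabla f(x_i)\|_2^2 \\
&= f^\star + \frac{1}{2 t_i} \left( \|x_i - x^\star\|_2^2 - \|x_{i+1} - x^\star\|_2^2 \right)
\end{aligned}$$

• this implies $\|x_{i+1} - x^\star\|_2 \le \|x_i - x^\star\|_2$, so we can replace $t_i$ with $t_{\min} \le t_i$:

$$f(x_{i+1}) - f^\star \le \frac{1}{2 t_{\min}} \left( \|x_i - x^\star\|_2^2 - \|x_{i+1} - x^\star\|_2^2 \right)$$

• adding the upper bounds gives same $1/k$ bound as with constant step size

$$f(x_k) - f^\star \le \frac{1}{k} \sum_{i=1}^k \left( f(x_i) - f^\star \right) \le \frac{1}{2 k t_{\min}} \|x_0 - x^\star\|_2^2$$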

Gradient method 1.29

Gradient method for strongly convex functions

better results exist if we add strong convexity to the assumptions on p. 1.23

Analysis for constant step size

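if $x^+ = x - t \nabla f(x)$ and $0 < t \le 2/(m + L)$:

$$\begin{aligned}
\|x^+ - x^\star\|_2^2 &= \|x - t \nabla f(x) - x^\star\|_2^2 \\
&= \|x - x^\star\|_2^2 - 2t \nabla f(x)^T (x - x^\star) + t^2 \|\nabla f(x)\|_2^2 \\
&\le \left(1 - t \frac{2mL}{m + L}\right) \|x - x^\star\|_2^2 + t \left(t - \frac{2}{m + L}\right) \|\nabla f(x)\|_2^2 \\
&\le \left(1 - t \frac{2mL}{m + L}\right) \|x - x^\star\|_2^2
\end{aligned}$$

(step 3 follows from result on page 1.22)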

Gradient method 1.30

Distance to optimum

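$$\|x_k - x^\star\|_2^2 \le c^k \|x_0 - x^\star\|_2^2, \qquad c = 1 - t \frac{2mL}{m + L}$$

• implies (linear) convergence

• for $t = 2/(m + L)$, get $c = \left(\frac{\gamma - 1}{\gamma + 1}\right)^2$ with $\gamma = L/m$

Bound on function value (from page 1.14)

$$f(x_k) - f^\star \le \frac{L}{2} \|x_k - x^\star\|_2^2 \le \frac{c^k L}{2} \|x_0 - x^\star\|_2^2$$

Conclusion: number of iterations to reach $f(x_k) - f^\star \le \epsilon$ is $O(\log(1/\epsilon))$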
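The linear rate can be checked on the quadratic $f(x) = \frac{1}{2}(x_1^2 + \gamma x_2^2)$, for which $m = 1$, $L = \gamma$ and $x^\star = 0$. The choice $\gamma = 10$ and the small relative tolerance for rounding are assumptions of this sketch.

```python
gamma = 10.0           # hypothetical test problem: m = 1, L = gamma, x* = 0
m, L = 1.0, gamma
t = 2.0 / (m + L)      # step size t = 2 / (m + L)
c = 1.0 - t * 2.0 * m * L / (m + L)

x = [gamma, 1.0]
dist_sq = [x[0] ** 2 + x[1] ** 2]
for _ in range(30):
    g = [x[0], gamma * x[1]]
    x = [x[0] - t * g[0], x[1] - t * g[1]]
    dist_sq.append(x[0] ** 2 + x[1] ** 2)

# ||x_k - x*||^2 <= c^k ||x_0 - x*||^2, with a small allowance for rounding
bound_ok = all(dist_sq[k] <= (c ** k) * dist_sq[0] * (1.0 + 1e-9) for k in range(31))
rate_sq = ((gamma - 1.0) / (gamma + 1.0)) ** 2
```

On this example the bound holds with equality up to rounding, since both coordinates contract by the factor $(\gamma - 1)/(\gamma + 1)$ in magnitude at every step.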

Gradient method 1.31

Limits on convergence rate of first-order methods

First-order method: any iterative algorithm that selects $x_{k+1}$ in the set

$$x_0 + \operatorname{span}\{\nabla f(x_0), \nabla f(x_1), \ldots, \nabla f(x_k)\}$$

Problem class: any function that satisfies the assumptions on page 1.23

Theorem (Nesterov): for every integer $k \le (n - 1)/2$ and every $x_0$, there exist functions in the problem class such that for any first-order method

$$f(x_k) - f^\star \ge \frac{3}{32} \frac{L \|x_0 - x^\star\|_2^2}{(k + 1)^2}$$

• suggests $1/k$ rate for gradient method is not optimal

• more recent accelerated gradient methods have $1/k^2$ convergence (see later)

Gradient method 1.32

References

• A. Beck, First-Order Methods in Optimization (2017), chapter 5.

• Yu. Nesterov, Lectures on Convex Optimization (2018), section 2.1. (The result on page 1.32 is Theorem 2.1.7 in the book.)

• B. T. Polyak, Introduction to Optimization (1987), section 1.4.

• The example on page 1.4 is from N. Z. Shor, Nondifferentiable Optimization and Polynomial Problems (1998), page 37.

Gradient method 1.33

