Derivative Free Optimization

Optimization and AMS Masters - University Paris Saclay

Exercises - Class 2 and 3

Anne Auger
[email protected]
https://www.cmap.polytechnique.fr/~auger/teaching.html

I Order statistics - Effect of selection

We want to illustrate the effect of selection on the distribution of candidate solutions in a stochastic
algorithm. More precisely, we consider a (1, λ)-ES algorithm whose state is given by X^t ∈ R^n. At each
iteration t, λ candidate solutions are sampled according to

X_i^{t+1} = X^t + U_i^{t+1}

with (U_i^{t+1})_{1 ≤ i ≤ λ} i.i.d. and U_i^{t+1} ∼ N(0, I_d). Those candidate solutions are evaluated on the
function f : R^n → R to be minimized and then ranked according to their f-values:

f(X_{1:λ}^{t+1}) ≤ ... ≤ f(X_{λ:λ}^{t+1})

where i:λ denotes the index of the i-th best candidate solution. The best candidate solution is then
selected, that is

X^{t+1} = X_{1:λ}^{t+1}.

We will compute, for the linear function f(x) = x_1 to be minimized, the conditional distribution of
X_{1:λ}^{t+1} (i.e. after selection) and compare it to the distribution of X_i^{t+1} (i.e. before selection).

1. What is the distribution of X_i^{t+1} conditionally on X^t? Deduce the density of each coordinate of X_i^{t+1}.

Recall that given λ independent and identically distributed random variables Y_1, Y_2, ..., Y_λ, the order
statistics Y_{(1)}, Y_{(2)}, ..., Y_{(λ)} are the random variables defined by sorting the realizations of
Y_1, Y_2, ..., Y_λ in increasing order. We assume that each random variable Y_i admits a density f(x), and we
denote by F(x) the cumulative distribution function, that is F(x) = Pr(Y_1 ≤ x).

2. Compute the cumulative distribution function of Y_{(1)} and deduce the density of Y_{(1)}.
3. Let U_{1:λ}^{t+1} be the random vector such that

X_{1:λ}^{t+1} = X^t + U_{1:λ}^{t+1}.

Express, for the minimization of the linear function f(x) = x_1, the first coordinate of U_{1:λ}^{t+1} as an
order statistic.
4. Deduce the conditional distribution and conditional density of the random vector X_{1:λ}^{t+1}.
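
Although the questions above ask for an analytical derivation, the effect of selection can also be checked
empirically. The following Python sketch (NumPy and Matplotlib assumed; all names are illustrative and not
part of the exercise) compares the first coordinate of the step before and after selection on f(x) = x_1:

import numpy as np
import matplotlib.pyplot as plt

n, lam, n_trials = 5, 10, 100_000   # dimension, number of offspring, Monte Carlo samples

# Sample lam i.i.d. standard normal steps per trial and keep the one that
# minimizes f(x) = x_1, i.e. the one with the smallest first coordinate.
U = np.random.randn(n_trials, lam, n)
best = np.argmin(U[:, :, 0], axis=1)                 # index of the selected offspring
U_sel = U[np.arange(n_trials), best, 0]              # first coordinate after selection

plt.hist(U[:, 0, 0], bins=100, density=True, alpha=0.5, label="before selection")
plt.hist(U_sel, bins=100, density=True, alpha=0.5, label="after selection (first order statistic)")
plt.legend()
plt.show()
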
II Adaptive step-size algorithms

The implementations can be done in the programming language you prefer (Matlab, Python, ...).
We are going to test the convergence of several algorithms on some test functions, in particular on the
so-called sphere function
f_sphere(x) = \sum_{i=1}^{n} x_i^2

and the ellipsoid function


f_elli(x) = \sum_{i=1}^{n} (100^{(i-1)/(n-1)} x_i)^2.
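
A possible implementation of these two test functions (here in Python with NumPy; the exercise leaves the
language free, and the names are only suggestions) is:

import numpy as np

def f_sphere(x):
    """Sphere function: f(x) = sum_i x_i^2."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def f_elli(x):
    """Ellipsoid function: f(x) = sum_i (100^((i-1)/(n-1)) x_i)^2, assuming n >= 2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    scales = 100.0 ** (np.arange(n) / (n - 1))   # 100^((i-1)/(n-1)) for i = 1, ..., n
    return float(np.sum((scales * x) ** 2))
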

1. What is the condition number of the Hessian matrix of each of the functions above? Are the
functions ill-conditioned?

2. Implement the functions.

The (1+1)-ES algorithm is one of the simplest stochastic search methods for numerical optimization. We
will start by implementing a (1+1)-ES with constant step-size. The pseudo-code of the algorithm is
given by

Initialize x ∈ R^n and σ > 0
while not terminate
    x' = x + σ N(0, I)
    if f(x') ≤ f(x)
        x = x'

where N(0, I) denotes a Gaussian vector with mean 0 and covariance matrix equal to the identity.
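
A minimal Python sketch of this pseudo-code, recording the best objective value at each iteration (NumPy
assumed; the function and variable names are illustrative), could look as follows:

import numpy as np

def one_plus_one_es(f, x0, sigma, max_evals):
    """(1+1)-ES with constant step-size sigma.

    Returns the list of best objective values, one entry per iteration.
    """
    x = np.array(x0, dtype=float)
    fx = f(x)
    evals = 1
    history = [fx]
    while evals < max_evals:
        x_new = x + sigma * np.random.randn(x.size)   # x' = x + sigma * N(0, I)
        f_new = f(x_new)
        evals += 1
        if f_new <= fx:                               # accept the candidate if it is not worse
            x, fx = x_new, f_new
        history.append(fx)
    return history
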

1. Implement the algorithm. You can write a function that takes as input an initial vector x, an initial
step-size σ and a maximum number of function evaluations and returns a vector where you have
recorded at each iteration the best objective function value.
2. Use the algorithm to minimize the sphere function in dimension n = 5. We will take as initial search
point x_0 = (1, ..., 1) [x=ones(1,5)], initial step-size σ = 10^{-3} [sigma=1e-3], and as stopping
criterion a maximum number of function evaluations equal to 2 × 10^4.

3. Plot the evolution of the function value of the best solution versus the number of iterations (or
function evaluations). We will use a log scale for the y-axis (semilogy); a minimal plotting sketch is
given after this list.
4. Explain the three phases observed in the figure.
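
As an illustration of Questions 2 and 3, and assuming the one_plus_one_es and f_sphere sketches given
above (Matplotlib assumed), the run could be produced and plotted as:

import numpy as np
import matplotlib.pyplot as plt

history = one_plus_one_es(f_sphere, x0=np.ones(5), sigma=1e-3, max_evals=2 * 10**4)

plt.semilogy(history)                 # log scale on the y-axis
plt.xlabel("iterations")
plt.ylabel("best objective function value")
plt.show()
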
To accelerate the convergence, we will implement a step-size adaptive algorithm, i.e. σ is not fixed
once and for all. The method used to adapt the step-size is called the one-fifth success rule. The
pseudo-code of the (1+1)-ES with one-fifth success rule is given by:

Initialize x ∈ R^n and σ > 0
while not terminate
    x' = x + σ N(0, I)
    if f(x') ≤ f(x)
        x = x'
        σ = 1.5 σ
    else
        σ = 1.5^{-1/4} σ
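
A possible Python sketch of this adaptive variant, which additionally records the step-size for the plots
requested below (names are illustrative), is:

import numpy as np

def one_plus_one_es_one_fifth(f, x0, sigma, max_evals):
    """(1+1)-ES with the one-fifth success rule for step-size adaptation.

    Returns the best objective value and the step-size at each iteration.
    """
    x = np.array(x0, dtype=float)
    fx = f(x)
    evals = 1
    f_history, sigma_history = [fx], [sigma]
    while evals < max_evals:
        x_new = x + sigma * np.random.randn(x.size)
        f_new = f(x_new)
        evals += 1
        if f_new <= fx:                     # success: accept and increase the step-size
            x, fx = x_new, f_new
            sigma *= 1.5
        else:                               # failure: decrease the step-size
            sigma *= 1.5 ** (-1.0 / 4.0)
        f_history.append(fx)
        sigma_history.append(sigma)
    return f_history, sigma_history
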

5. Implement the (1+1)-ES with one-fifth success rule and test the algorithm on the sphere function
f_sphere(x) in dimension 5 (n = 5) using x_0 = (1, ..., 1), σ_0 = 10^{-3} and as stopping criterion a
maximum number of function evaluations equal to 6 × 10^2. Plot the evolution of the square root of
the best function value at each iteration versus the number of iterations. Use a logarithmic scale for
the y-axis. Compare to the plot obtained in Question 3. Plot also on the same graph the evolution
of the step-size.
6. Use the algorithm to minimize the function f_elli in dimension n = 5. Plot the evolution of the
objective function value of the best solution versus the number of iterations. Why is the (1+1)-ES
with one-fifth success rule much slower on f_elli than on f_sphere?

7. Same question with the function

f_Rosenbrock(x) = \sum_{i=1}^{n-1} (100 (x_i^2 - x_{i+1})^2 + (x_i - 1)^2).

8. We now consider the functions g(f_sphere) and g(f_elli), where g : R → R, y ↦ y^{1/4} (a Python
sketch of these functions, together with f_Rosenbrock, is given after this list). Modify your
implementation in Questions 5 and 6 so as to save at each iteration the distance between x and
the optimum. Plot the evolution of the distance to the optimum versus the number of function
evaluations on the functions f_sphere and g(f_sphere) as well as on the functions f_elli and g(f_elli).
What do you observe? Explain.
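
For Questions 7 and 8, a possible sketch of the Rosenbrock function and of the composed functions g(f)
(assuming the f_sphere and f_elli implementations given earlier) is:

import numpy as np

def f_rosenbrock(x):
    """Rosenbrock function: sum_{i=1}^{n-1} (100 (x_i^2 - x_{i+1})^2 + (x_i - 1)^2).

    Its global optimum is at x = (1, ..., 1) with value 0.
    """
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[:-1] ** 2 - x[1:]) ** 2 + (x[:-1] - 1.0) ** 2))

def g(y):
    """Strictly increasing transformation g(y) = y^(1/4)."""
    return y ** 0.25

def g_sphere(x):
    return g(f_sphere(x))

def g_elli(x):
    return g(f_elli(x))
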
