Numerical Optimal Control, August 2014
Exercise 5: Dynamic programming
Joel Andersson, Joris Gillis, Greg Horn, Rien Quirynen, Moritz Diehl
University of Freiburg – IMTEK, August 5th, 2014
Dynamic programming for a two-state OCP
Dynamic programming and its continuous-time counterpart – the Hamilton-Jacobi-Bellman
equation – can be used to calculate the global solution of an optimal control problem. Unfortunately, they suffer from Bellman's so-called "curse of dimensionality", meaning that they get
exponentially expensive with the number of states and controls. In practice, they can be used
for systems with 3-4 differential states, or for systems that have special properties.
Here we shall consider a simple OCP with two states (x1 , x2 ) and one control (u):
$$
\begin{aligned}
\underset{x,\,u}{\text{minimize}} \quad & \int_0^T x_1(t)^2 + x_2(t)^2 + u(t)^2 \,\mathrm{d}t \\
\text{subject to} \quad & \dot{x}_1 = (1 - x_2^2)\, x_1 - x_2 + u, \qquad x_1(0) = 0, \\
& \dot{x}_2 = x_1, \qquad x_2(0) = 1, \\
& -1 \le x_1(t) \le 1, \quad -1 \le x_2(t) \le 1, \quad -1 \le u(t) \le 1,
\end{aligned} \tag{1}
$$
with T = 10.
To be able to solve the problem using dynamic programming, we parameterize the control
trajectory as piecewise constant on N = 20 intervals. On each interval, we then take NK steps
of an RK4 integrator in order to get a discrete-time OCP of the form:
$$
\begin{aligned}
\underset{x,\,u}{\text{minimize}} \quad & \sum_{k=0}^{N-1} F_0(x_1^{(k)}, x_2^{(k)}, u^{(k)}) \\
\text{subject to} \quad & x_1^{(k+1)} = F_1(x_1^{(k)}, x_2^{(k)}, u^{(k)}), \quad k = 0, \ldots, N-1, \qquad x_1^{(0)} = 0, \\
& x_2^{(k+1)} = F_2(x_1^{(k)}, x_2^{(k)}, u^{(k)}), \quad k = 0, \ldots, N-1, \qquad x_2^{(0)} = 1, \\
& -1 \le x_1^{(k)} \le 1, \quad -1 \le x_2^{(k)} \le 1, \quad -1 \le u^{(k)} \le 1 \quad \forall k.
\end{aligned} \tag{2}
$$
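To make the discretization concrete, the following minimal Python sketch builds the transition maps $F_1$, $F_2$ and the stage cost $F_0$ by applying RK4 to the dynamics of (1), augmented with the cost integrand. The function names and the choice of NK = 4 integrator steps per interval are illustrative assumptions; the actual script on the gist may be organized differently.

```python
# Continuous-time dynamics of (1) plus the cost integrand (assumed names).
def f(x1, x2, u):
    dx1 = (1 - x2**2) * x1 - x2 + u   # controlled Van der Pol oscillator
    dx2 = x1
    L = x1**2 + x2**2 + u**2          # stage cost integrand
    return dx1, dx2, L

# One control interval: NK RK4 steps of length h = (T/N)/NK.
# Returns (F1, F2, F0) = (next x1, next x2, cost accumulated on the interval).
def F(x1, x2, u, T=10.0, N=20, NK=4):   # NK = 4 is an illustrative choice
    h = T / N / NK
    q = 0.0
    for _ in range(NK):
        k1 = f(x1, x2, u)
        k2 = f(x1 + h/2 * k1[0], x2 + h/2 * k1[1], u)
        k3 = f(x1 + h/2 * k2[0], x2 + h/2 * k2[1], u)
        k4 = f(x1 + h * k3[0], x2 + h * k3[1], u)
        x1 += h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        x2 += h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        q  += h/6 * (k1[2] + 2*k2[2] + 2*k3[2] + k4[2])
    return x1, x2, q
```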
Tasks:
5.1 On the course webpage and on Gist¹, you will find an incomplete implementation of
dynamic programming for problem (2). Add the missing calculation of the cost-to-go
function to get the script working (a sketch of the recursion appears after this list).
5.2 Add the additional end-point constraints $x_1(T) = -0.5$ and $x_2(T) = -0.5$. How does the
solution change?
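For reference, here is a minimal self-contained sketch of the backward dynamic-programming recursion that the script implements, using the discretized dynamics F from the sketch above. The grid resolutions NX and NU and the nearest-neighbor lookup of the cost-to-go are illustrative assumptions; the actual script may use different values and a different interpolation scheme.

```python
import numpy as np

NX, NU, N = 41, 21, 20          # grid resolutions (assumed) and horizon
INF = 1e10                      # stands in for an infinite (infeasible) cost
x1g = np.linspace(-1.0, 1.0, NX)
x2g = np.linspace(-1.0, 1.0, NX)
ug  = np.linspace(-1.0, 1.0, NU)

# Cost-to-go tables J[k] on the (x1, x2) grid; no terminal cost: J_N = 0.
J = [np.full((NX, NX), INF) for _ in range(N + 1)]
J[N][:, :] = 0.0

# Backward recursion: J_k(x) = min_u { F0(x, u) + J_{k+1}(F(x, u)) }.
for k in range(N - 1, -1, -1):
    for i, x1 in enumerate(x1g):
        for j, x2 in enumerate(x2g):
            for u in ug:
                x1n, x2n, q = F(x1, x2, u)
                # Reject transitions that violate the state bounds.
                if abs(x1n) > 1.0 or abs(x2n) > 1.0:
                    continue
                # Nearest grid point of the successor state.
                ii = int(round((x1n + 1.0) / 2.0 * (NX - 1)))
                jj = int(round((x2n + 1.0) / 2.0 * (NX - 1)))
                J[k][i, j] = min(J[k][i, j], q + J[k + 1][ii, jj])
```

The optimal trajectory is then recovered by a forward sweep from $(x_1^{(0)}, x_2^{(0)}) = (0, 1)$, at each stage picking the control that attains the minimum. For task 5.2, the end-point constraints can be imposed by changing the terminal cost-to-go before running the recursion: infinite everywhere except at the grid point closest to the target, e.g.

```python
# Task 5.2 (sketch): enforce x1(T) = x2(T) = -0.5 via the terminal cost-to-go.
J[N][:, :] = INF
i_t = int(round((-0.5 + 1.0) / 2.0 * (NX - 1)))  # grid index nearest -0.5
J[N][i_t, i_t] = 0.0                             # same index for both states
```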
¹ https://gist.github.com/jaeandersson/e37e796e094b3c6cad9e