Prof. L. Guzzella
Prof. R. D’Andrea

Midterm Examination – November 12th, 2008
Dynamic Programming & Optimal Control (151-0563-00)
Solutions

Exam Duration: 150 minutes
Number of Problems: 4 (25% each)
Permitted aids: Textbook Dynamic Programming and Optimal Control by Dimitri P. Bertsekas, Vol. I, 3rd edition, 2005, 558 pages.
Your written notes.
No calculators.
Important: Use only these prepared sheets for your solutions.
Problem 1 (25%)
[Figure 1: a directed graph with start node S, terminal node T, intermediate nodes 1–10, and the arc lengths labeled on the edges.]
Find the shortest path from node S to node T for the graph given in Figure 1. Apply the label
correcting method. Use best-first search to determine at each iteration which node to remove
from OPEN; that is, remove the node i with
$$d_i = \min_{j \in \text{OPEN}} d_j,$$
where the variable $d_i$ denotes the length of the shortest path from node S to node i that has been found so far.
Solve the problem by populating a table of the following form:
| Iteration | Node exiting OPEN | OPEN | $d_S$ | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $d_5$ | $d_6$ | $d_7$ | $d_8$ | $d_9$ | $d_{10}$ | $d_T$ = UPPER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ... | | | | | | | | | | | | | | |
Solution 1
| Iteration | Node exiting OPEN | OPEN | $d_S$ | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $d_5$ | $d_6$ | $d_7$ | $d_8$ | $d_9$ | $d_{10}$ | $d_T$ = UPPER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | – | S | 0 | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ |
| 1 | S | 1, 2, 3 | 0 | 3 | 1 | 3 | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ |
| 2 | 2 | 1, 3, 4 | 0 | 2 | 1 | 3 | 5 | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ |
| 3 | 1 | 3, 4 | 0 | 2 | 1 | 3 | 4 | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ |
| 4 | 3 | 4, 5 | 0 | 2 | 1 | 3 | 4 | 8 | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ |
| 5 | 4 | 5, 6 | 0 | 2 | 1 | 3 | 4 | 7 | 5 | ∞ | ∞ | ∞ | ∞ | ∞ |
| 6 | 6 | 5, 9 | 0 | 2 | 1 | 3 | 4 | 7 | 5 | ∞ | ∞ | 6 | ∞ | 9 |
| 7 | 9 | 5 | 0 | 2 | 1 | 3 | 4 | 7 | 5 | ∞ | ∞ | 6 | ∞ | 7 |
| 8 | 5 | – | 0 | 2 | 1 | 3 | 4 | 7 | 5 | ∞ | ∞ | 6 | ∞ | 7 |
The shortest path is S → 2 → 1 → 4 → 6 → 9 → T with a total length of 7.
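For reference, the best-first label correcting iteration can be reproduced in a few lines of Python. This is a sketch under an assumption: the edge list below is reconstructed from the iteration table above, and arcs of Figure 1 that never improve a label (e.g. those reaching nodes 7, 8, and 10) are omitted.

```python
import heapq

# Arc lengths reconstructed from the iteration table above; arcs that
# never improve a label (e.g. those reaching nodes 7, 8, and 10) are
# omitted, so this edge list is an assumption, not the complete graph.
edges = {
    'S': [('1', 3), ('2', 1), ('3', 3)],
    '1': [('4', 2)],
    '2': [('1', 1), ('4', 4)],
    '3': [('5', 5)],
    '4': [('5', 3), ('6', 1)],
    '6': [('9', 1), ('T', 4)],
    '9': [('T', 1)],
}

def label_correcting(start, target):
    d = {start: 0}            # d_i: shortest path length found so far
    upper = float('inf')      # UPPER: best start-to-target length found
    open_heap = [(0, start)]  # best-first: always pop the smallest d_i
    while open_heap:
        di, i = heapq.heappop(open_heap)
        if di > d.get(i, float('inf')):
            continue          # stale OPEN entry; node was re-labeled
        for j, cij in edges.get(i, []):
            dj = di + cij
            if dj < d.get(j, float('inf')) and dj < upper:
                if j == target:
                    upper = dj        # the target never enters OPEN
                else:
                    d[j] = dj
                    heapq.heappush(open_heap, (dj, j))
    return upper

print(label_correcting('S', 'T'))  # prints 7, matching the table
```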
Problem 2 (25%)
Consider the dynamic system
$$x_{k+1} = (1-a)\, w_k + a\, u_k, \qquad 0 \le a \le 1, \quad k = 0, 1,$$
with initial state $x_0 = -1$. The cost function, to be minimized, is given by
$$\mathop{\mathbb{E}}_{w_0, w_1}\left[\, x_2^2 + \sum_{k=0}^{1} \left( x_k^2 + u_k^2 + w_k^2 \right) \right].$$
The disturbance $w_k$ takes the values 0 and 1. If $x_k \ge 0$, both values have equal probability. If $x_k < 0$, the disturbance $w_k$ is 0 with probability 1. The control $u_k$ is constrained by
$$0 \le u_k \le 1, \qquad k = 0, 1.$$
Apply the Dynamic Programming algorithm to find the optimal control policy and the optimal final cost $J_0(-1)$.
Solution 2
The optimal control problem is considered over a time horizon $N = 2$, and the cost, to be minimized, is defined by
$$g_2(x_2) = x_2^2 \qquad \text{and} \qquad g_k(x_k, u_k, w_k) = x_k^2 + u_k^2 + w_k^2, \quad k = 0, 1.$$
The DP algorithm proceeds as follows:

2nd stage:
$$J_2(x_2) = x_2^2$$

1st stage:
$$\begin{aligned}
J_1(x_1) &= \min_{0 \le u_1 \le 1} \mathbb{E}\left[\, x_1^2 + u_1^2 + w_1^2 + J_2(x_2) \,\right] \\
&= \min_{0 \le u_1 \le 1} \mathop{\mathbb{E}}_{w_1}\left[\, x_1^2 + u_1^2 + w_1^2 + J_2\big((1-a)\, w_1 + a\, u_1\big) \,\right] \\
&= \min_{0 \le u_1 \le 1} \mathop{\mathbb{E}}_{w_1}\left[\, x_1^2 + u_1^2 + w_1^2 + \big((1-a)\, w_1 + a\, u_1\big)^2 \,\right]
\end{aligned}$$

Distinguish two cases: $x_1 \ge 0$ and $x_1 < 0$.
I) $x_1 \ge 0$:
$$J_1(x_1) = \min_{0 \le u_1 \le 1} \underbrace{\left\{ x_1^2 + u_1^2 + \frac{1}{2}\Big(1 + \big((1-a) + a u_1\big)^2\Big) + \frac{1}{2}\Big(0 + \big((1-a) \cdot 0 + a u_1\big)^2\Big) \right\}}_{L(x_1, u_1)}$$
Find the minimizing $\bar u_1$ from
$$\left.\frac{\partial L}{\partial u_1}\right|_{\bar u_1} = (1-a)\, a + 2\left(1 + a^2\right) \bar u_1 \overset{!}{=} 0 \quad \Leftrightarrow \quad \bar u_1 = \frac{-a\,(1-a)}{2\left(1 + a^2\right)} \le 0.$$
Recall that the feasible set of inputs $u_1$ is given by $0 \le u_1 \le 1$. Since $L(x_1, u_1)$ is convex in $u_1$; that is,
$$\frac{\partial^2 L}{\partial u_1^2} = 2\left(1 + a^2\right) > 0,$$
and its unconstrained minimizer $\bar u_1$ lies at or below zero, $L$ is nondecreasing on the feasible set, and the feasible optimal control $u_1^*$ is given by
$$u_1^* = \mu_1^*(x_1) = 0 \qquad \forall\, x_1 \ge 0.$$
II) $x_1 < 0$:
$$J_1(x_1) = \min_{0 \le u_1 \le 1} \underbrace{\left\{ x_1^2 + \left(1 + a^2\right) u_1^2 \right\}}_{L(x_1, u_1)}$$
Find the minimizing $\bar u_1$ from
$$\left.\frac{\partial L}{\partial u_1}\right|_{\bar u_1} = 2\left(1 + a^2\right) \bar u_1 \overset{!}{=} 0 \quad \Leftrightarrow \quad \bar u_1 = 0.$$
Since the sufficient condition for a local minimum, $\left.\frac{\partial^2 L}{\partial u_1^2}\right|_{\bar u_1} > 0$, holds, the optimal control is
$$u_1^* = \mu_1^*(x_1) = 0 \qquad \forall\, x_1 < 0.$$
0th stage:
$$J_0(-1) = \min_{0 \le u_0 \le 1} \mathop{\mathbb{E}}_{w_0}\left[\, (-1)^2 + u_0^2 + w_0^2 + J_1\big((1-a)\, w_0 + a\, u_0\big) \,\right]$$
Since $x_0 < 0$, the disturbance is $w_0 = 0$ with probability 1, and we get
$$J_0(-1) = \min_{0 \le u_0 \le 1} \underbrace{\left\{ 1 + u_0^2 + J_1(a u_0) \right\}}_{L(x_0, u_0)},$$
where $a u_0 \ge 0$. From the results above (evaluating case I at $u_1^* = 0$), the optimal cost-to-go function for $x_1 \ge 0$ is
$$J_1(x_1) = \frac{1}{2} + \frac{1}{2}\,(1-a)^2 + x_1^2.$$
Finally, the minimizing $\bar u_0$ results from
$$\left.\frac{\partial L}{\partial u_0}\right|_{\bar u_0} = 2 \bar u_0 + 2 a^2 \bar u_0 \overset{!}{=} 0 \quad \Leftrightarrow \quad \bar u_0 = 0.$$
Since $\left.\frac{\partial^2 L}{\partial u_0^2}\right|_{\bar u_0} > 0$, the optimal control $u_0^*$ is
$$u_0^* = \mu_0^*(-1) = 0.$$
With this, the optimal final cost reads as
$$J_0(-1) = \frac{3}{2} + \frac{1}{2}\,(1-a)^2.$$
In brief, the optimal control policy is to always set the input to zero, which can also be verified by inspecting the equations given in the problem statement.
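As a numerical sanity check, the DP recursion can be evaluated by brute force over a gridded control set. This is a minimal sketch; the function name optimal_cost and the grid resolution are arbitrary choices, not part of the solution.

```python
def optimal_cost(a, grid=101):
    """Brute-force DP for the two-stage problem, gridding u over [0, 1]."""
    controls = [k / (grid - 1) for k in range(grid)]

    def disturbances(x):
        # w is 0 or 1 with equal probability if x >= 0, else surely 0.
        return [(0.0, 0.5), (1.0, 0.5)] if x >= 0 else [(0.0, 1.0)]

    def J(k, x):
        if k == 2:
            return x ** 2                      # terminal cost g2(x2)
        best = float('inf')
        for u in controls:
            cost = 0.0
            for w, prob in disturbances(x):
                x_next = (1 - a) * w + a * u   # system dynamics
                cost += prob * (x ** 2 + u ** 2 + w ** 2 + J(k + 1, x_next))
            best = min(best, cost)
        return best

    return J(0, -1.0)

for a in (0.0, 0.25, 0.5, 1.0):
    # The two columns agree, confirming J0(-1) = 3/2 + (1 - a)^2 / 2.
    print(a, optimal_cost(a), 1.5 + 0.5 * (1 - a) ** 2)
```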
Problem 3 (25%)
[Figure 2: a unit mass on a frictionless surface, driven by a force u from position z = 0 at time t = 0 to position z = 1 at time t = 1.]
At time t = 0, a unit mass is at rest at location z = 0. The mass is on a frictionless surface and
it is desired to apply a force u(t), 0 ≤ t ≤ 1, such that at time t = 1, the mass is at location
z = 1 and again at rest. In particular,
$$\ddot z(t) = u(t), \qquad 0 \le t \le 1, \tag{1}$$
with initial and terminal conditions:
$$z(0) = 0, \quad \dot z(0) = 0, \qquad z(1) = 1, \quad \dot z(1) = 0.$$
Of all the functions $u(t)$ that achieve the above objective, find the one that minimizes
$$\frac{1}{2} \int_0^1 u^2(t)\, dt.$$
Hint: The state for this system is $x(t) = [\, x_1(t),\ x_2(t) \,]^T$, where $x_1(t) = z(t)$ and $x_2(t) = \dot z(t)$.
Solution 3
Introduce the state vector
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} z \\ \dot z \end{bmatrix}.$$
Using this notation, the dynamics read as
$$\begin{bmatrix} \dot x_1 \\ \dot x_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ u \end{bmatrix}$$
with initial and terminal conditions
$$x_1(0) = 0, \quad x_2(0) = 0, \qquad x_1(1) = 1, \quad x_2(1) = 0.$$
Apply the Minimum Principle.
• The Hamiltonian is given by
$$H(x, u, p) = g(x, u) + p^T f(x, u) = \frac{1}{2} u^2 + p_1 x_2 + p_2 u.$$
• The optimal input $u^*(t)$ is obtained by minimizing the Hamiltonian along the optimal trajectory. Differentiating the Hamiltonian with respect to $u$ yields
$$u^*(t) + p_2(t) = 0 \quad \Leftrightarrow \quad u^*(t) = -p_2(t).$$
Since the second derivative of $H$ with respect to $u$ is 1, $u^*(t)$ is indeed a minimum.
• The adjoint equations,
$$\dot p_1(t) = 0, \qquad \dot p_2(t) = -p_1(t),$$
are integrated and result in the following equations:
$$p_1(t) = c_1, \qquad p_2(t) = -c_1 t - c_2, \qquad c_1, c_2 \text{ constants}.$$
Using this result, the optimal input is given by
$$u^*(t) = c_1 t + c_2.$$
• Recalling the initial and terminal conditions on $x$, we can solve for $c_1$ and $c_2$.
With the results above, the optimal state trajectory $x_2^*(t)$ follows from
$$\dot x_2^*(t) = c_1 t + c_2 \quad \Rightarrow \quad x_2^*(t) = \frac{1}{2} c_1 t^2 + c_2 t + c_3, \qquad c_3 \text{ constant},$$
and, therefore,
$$x_2^*(0) = 0 \ \Rightarrow\ c_3 = 0, \qquad x_2^*(1) = 0 \ \Rightarrow\ \frac{1}{2} c_1 + c_2 = 0 \ \Rightarrow\ c_1 = -2 c_2,$$
yielding
$$x_2^*(t) = -c_2 t^2 + c_2 t.$$
The optimal state $x_1^*(t)$ is given by
$$\dot x_1^*(t) = x_2^*(t) = -c_2 t^2 + c_2 t \quad \Rightarrow \quad x_1^*(t) = -\frac{1}{3} c_2 t^3 + \frac{1}{2} c_2 t^2 + c_4, \qquad c_4 \text{ constant}.$$
With the conditions on $x_1$, we get
$$x_1^*(0) = 0 \ \Rightarrow\ c_4 = 0, \qquad x_1^*(1) = 1 \ \Rightarrow\ -\frac{1}{3} c_2 + \frac{1}{2} c_2 = 1 \ \Rightarrow\ c_2 = 6 \ \text{ and } \ c_1 = -12.$$
• Finally, we obtain the optimal control
$$u^*(t) = -12 t + 6,$$
and the optimal state trajectory
$$x_1^*(t) = z^*(t) = -2 t^3 + 3 t^2, \qquad x_2^*(t) = \dot z^*(t) = -6 t^2 + 6 t.$$
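As a quick numerical check (a sketch; the step count is an arbitrary choice), forward-Euler integration of the double integrator under $u^*(t) = -12t + 6$ should reproduce the boundary conditions and the total cost $\int_0^1 \frac{1}{2} u^{*2}(t)\, dt = 6$:

```python
# Forward-Euler check of u*(t) = -12t + 6 on the double integrator.
N = 100_000                 # number of integration steps (arbitrary)
dt = 1.0 / N
z, zdot, cost = 0.0, 0.0, 0.0
for k in range(N):
    t = k * dt
    u = -12.0 * t + 6.0     # the optimal control derived above
    cost += 0.5 * u ** 2 * dt
    z += zdot * dt          # z'(t)  = zdot
    zdot += u * dt          # z''(t) = u
print(z, zdot, cost)        # approaches z(1) = 1, zdot(1) = 0, cost = 6
```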
Problem 4 (25%)
Recall the Minimum Principle.
Under suitable technical assumptions, the following Proposition holds:
Given the dynamic system
$$\dot x = f\left(x(t), u(t)\right), \qquad x(0) = x_0, \qquad 0 \le t \le T,$$
and the cost function, to be minimized,
$$h\left(x(T)\right) + \int_0^T g\left(x(t), u(t)\right) dt,$$
define the Hamiltonian function
$$H(x, u, p) = g(x, u) + p^T f(x, u).$$
Let $u^*(t)$, $t \in [0, T]$, be an optimal control trajectory and $x^*(t)$ the resulting state trajectory. Then,
1. $\dot p(t) = -\frac{\partial H}{\partial x}\left(x^*(t), u^*(t), p(t)\right), \qquad p(T) = \frac{\partial h}{\partial x}\left(x^*(T)\right)$,
2. $u^*(t) = \arg\min_{u \in U} H\left(x^*(t), u, p(t)\right)$,
3. $H\left(x^*(t), u^*(t), p(t)\right)$ is constant.
Show that if the dynamics and the cost are time varying – that is, $f(x, u)$ is replaced by $f(x, u, t)$ and $g(x, u)$ is replaced by $g(x, u, t)$ – the Minimum Principle becomes:
1. $\dot p(t) = -\frac{\partial H}{\partial x}\left(x^*(t), u^*(t), p(t), t\right), \qquad p(T) = \frac{\partial h}{\partial x}\left(x^*(T)\right)$,
2. $u^*(t) = \arg\min_{u \in U} H\left(x^*(t), u, p(t), t\right)$,
3. $H\left(x^*(t), u^*(t), p(t), t\right)$ is not necessarily constant,
where the Hamiltonian function is now given by
$$H(x, u, p, t) = g(x, u, t) + p^T f(x, u, t).$$
Solution 4
General idea:
Convert the problem to a time-independent one, apply the standard Minimum Principle presented in class, and simplify the resulting equations.
Follow the subsequent steps:
• Introduce an extra state variable $y(t)$ representing the time:
$$y(t) = t, \qquad \text{with} \quad \dot y(t) = 1 \quad \text{and} \quad y(0) = 0.$$
• Convert the problem into standard form by introducing the extended state $\xi = [\, x,\ y \,]^T$. The dynamics now read as
$$\dot \xi(t) = \tilde f(\xi, u) = \left[\, f(x, u, y),\ 1 \,\right]^T$$
and the cost is defined by
$$\tilde h\left(\xi(T)\right) + \int_0^T \tilde g(\xi, u)\, dt,$$
where $\tilde g(\xi, u) = g(x, u, y)$ and $\tilde h(\xi) = h(x)$.
The Hamiltonian follows from the definitions above:
$$\tilde H(\xi, u, \tilde p) = \tilde g(\xi, u) + \tilde p^T \tilde f(\xi, u), \qquad \text{with} \quad \tilde p = [\, p,\ p_y \,]^T.$$
• Apply the Minimum Principle:
Denoting the optimal control by $u^*(t)$ and the corresponding optimal state by $\xi^*(t)$, we get the following:
1. The adjoint equation is given by
$$\dot{\tilde p}(t) = -\frac{\partial \tilde H}{\partial \xi}\left(\xi^*(t), u^*(t), \tilde p(t)\right), \qquad \tilde p(T) = \frac{\partial \tilde h}{\partial \xi}\left(\xi^*(T)\right). \tag{2}$$
However,
$$\tilde H(\xi, u, \tilde p) = g(x, u, y) + p^T f(x, u, y) + p_y = H(x, u, p, y) + p_y;$$
that is,
$$\frac{\partial \tilde H}{\partial x} = \frac{\partial H}{\partial x}, \qquad \frac{\partial \tilde H}{\partial y} = \frac{\partial H}{\partial y}.$$
Moreover,
$$\frac{\partial \tilde h}{\partial x} = \frac{\partial h}{\partial x} \qquad \text{and} \qquad \frac{\partial \tilde h}{\partial y} = 0.$$
From (2), we recover the first equation,
$$\dot p(t) = -\frac{\partial H}{\partial x}\left(x^*(t), u^*(t), p(t), t\right), \qquad p(T) = \frac{\partial h}{\partial x}\left(x^*(T)\right).$$
In addition, replacing $y(t)$ by $t$ again, we get
$$\dot p_y(t) = -\frac{\partial H}{\partial t}\left(x^*(t), u^*(t), p(t), t\right), \qquad p_y(T) = 0.$$
2. The optimal input $u^*(t)$ is obtained by
$$u^*(t) = \arg\min_{u \in U} \left\{ H\left(x^*(t), u, p(t), t\right) + p_y(t) \right\} = \arg\min_{u \in U} H\left(x^*(t), u, p(t), t\right),$$
since $p_y(t)$ does not depend on $u$.
3. Finally, the standard formulation gives us that
$$H\left(x^*(t), u^*(t), p(t), t\right) + p_y(t) \quad \text{is constant.}$$
However, $p_y(t)$ is constant only if $\frac{\partial H}{\partial t} = 0$, which, in general, holds only if $f$ and $g$ do not depend on time. Hence $H\left(x^*(t), u^*(t), p(t), t\right)$ itself is not necessarily constant.
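To make claim 3 concrete, consider a small hypothetical example (chosen for illustration; it is not part of the exam): dynamics $\dot x = t\, u$, running cost $\frac{1}{2} u^2$, and terminal cost $h(x(1)) = x(1)$. The sympy sketch below confirms that $H$ varies along the optimum while $H + p_y$ stays constant, exactly as the derivation predicts.

```python
import sympy as sp

t, u, C = sp.symbols('t u C')

# Hypothetical example: xdot = t*u, g = u**2/2, h(x(1)) = x(1).
# Adjoint: pdot = -dH/dx = 0 with p(1) = dh/dx = 1, hence p(t) = 1.
p = sp.Integer(1)
H = u**2 / 2 + p * t * u                 # H(x, u, p, t); no x-dependence here
u_star = sp.solve(sp.diff(H, u), u)[0]   # minimizer of H over u: u* = -t
H_star = sp.simplify(H.subs(u, u_star))  # H along the optimum: -t**2/2

# p_y solves pydot = -dH/dt along the optimum, with p_y(1) = 0.
pydot = -sp.diff(H, t).subs(u, u_star)   # = t
py = sp.integrate(pydot, t) + C
py = py.subs(C, sp.solve(py.subs(t, 1), C)[0])

print(H_star)                    # -t**2/2 : not constant in t
print(sp.simplify(H_star + py))  # -1/2    : constant, as claimed
```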