V. CONCLUSION

For the general LQG stochastic differential game with differing observations for the two players, the pair of Nash equilibrium solutions known to us is unimplementable. However, when one of the players is assumed to "spy" on the other, the resulting solutions are implementable by finite-dimensional systems. A similar situation holds when the cost criterion of one of the players is modeled by the exponential of a quadratic form. Also, using the "spy" situation as a worst case, a lower bound for the performance in the game of any one of the players may be obtained; finite-dimensionally implementable solutions guaranteeing this lower bound are also exhibited.

Manuscript received December 19, 1979; revised June 5, 1980. Paper recommended by A. J. Laub, Chairman of the Computational Methods and Discrete Systems Committee. The authors are with the Department of Control Engineering, Faculty of Engineering Science, Osaka University, Toyonaka, Osaka 560, Japan.

I. INTRODUCTION

When we solve optimal control problems numerically, we sometimes encounter convergence difficulties. That is, it is difficult to find a nominal solution such that an algorithm starting from it is stable. Since the differential dynamic programming (DDP) technique was proposed by Jacobson and Mayne [1], several computational methods for optimal control problems which ensure the convergence of the algorithm have been invented. Among others, Mayne and Polak [2] proposed a DDP-type algorithm and proved that every limit point generated by their algorithm satisfies the optimality condition. However, their procedure of successively constructing controls is complicated. Ohno [3] presented a new approach to discrete-time systems and proved local convergence of the algorithm. Järmark [4], [5] proposed a convergence control parameter technique for controlling the convergence. Although this technique seems to work well, its mathematical mechanism has not been clarified yet.

In this paper we present a simple algorithm for computing the optimal control, which is analogous to the first-order DDP but is based essentially upon the Pontryagin minimum (or maximum) principle rather than dynamic programming. We employ the convergence control technique in the algorithm, and we consider the global convergence conditions for the algorithm. An example is worked out by using our algorithm.

II. OPTIMAL CONTROL PROBLEM AND COMPUTATIONAL PROCEDURE

We consider a dynamical system defined on a fixed time interval T = [t_0, t_1] and described by

    dx(t)/dt = f(x(t), u(t), t),    x(t_0) = x_0,                                (1)

where x(t) is an n-dimensional state vector, u(t) is an r-dimensional control vector, and the vector-valued function f(x, u, t) should satisfy some differentiability and continuity conditions which will be stated later. The control vectors are required to satisfy the constraint

    u(t) ∈ U,    t ∈ T,                                                          (2)

where U is a compact and convex subset of the r-dimensional Euclidean space. The class Ω of admissible controls is defined as the set of all measurable functions u: T → U satisfying (2).

The problem is to find the optimal control u ∈ Ω that minimizes the cost functional

    J(u) = ∫_{t_0}^{t_1} L(x(t), u(t), t) dt.                                    (3)

Note that a cost functional of the form (4), which also contains a terminal cost term, can be represented as (3) by setting the integrand as in (5).
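One standard way to carry out such a reduction (a sketch; the symbols θ and L_0 are introduced only for illustration) is the following: if the cost consists of a running term L_0 plus a continuously differentiable terminal term θ(x(t_1)), then

\[
\theta(x(t_1)) + \int_{t_0}^{t_1} L_0(x(t),u(t),t)\,dt
 \;=\; \theta(x_0) + \int_{t_0}^{t_1} \Big[ L_0(x(t),u(t),t) + \theta_x(x(t))\,f(x(t),u(t),t) \Big]\,dt ,
\]

since dθ(x(t))/dt = θ_x(x(t)) f(x(t), u(t), t) along the trajectories of (1). The constant θ(x_0) does not depend on u, so minimizing this functional is equivalent to minimizing a functional of the form (3) with L = L_0 + θ_x f.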
… if (8) holds for any i > i_0, then we may conclude that the sequence {u^i(t)} of the controls has converged.

Step 2: Define the function

    K(x, u, λ, t; v, C) = H(x, u, λ, t) + (u − v)^T C (u − v),                   (9)

where C = diag(c_1, ..., c_r), c_1, ..., c_r ≥ 0. Select a nonnegative diagonal matrix C^i properly. Determine x^i(t) and u^i(t), t ∈ T, which satisfy both

    K(x^i(t), u^i(t), λ^{i−1}(t), t; u^{i−1}(t), C^i)
        = H(x^i(t), u^i(t), λ^{i−1}(t), t)
          + (u^i(t) − u^{i−1}(t))^T C^i (u^i(t) − u^{i−1}(t))
        = min_{u ∈ U} K(x^i(t), u, λ^{i−1}(t), t; u^{i−1}(t), C^i)              (10)

and the differential equation

    dx^i(t)/dt = f(x^i(t), u^i(t), t),    x^i(t_0) = x_0.                        (11)

This is possible by integrating (11) from t_0 to t_1 while seeking the u^i(t) that minimizes K.

Step 3: Calculate

    J(u^i) = ∫_{t_0}^{t_1} L(x^i(t), u^i(t), t) dt.                              (12)

If J(u^i) − J(u^{i−1}) > 0, make the elements of C^i larger and go to Step 2. Otherwise, set i := i + 1 and go to Step 1. Stop the computation if the sequence {C^i} is bounded for all i and the sequence {u^i(t)} of the controls converges.

In this algorithm we minimize K instead of H. Since the function K contains the quadratic penalty term (u − u^{i−1})^T C^i (u − u^{i−1}) penalizing a possibly large change of the control, instability of the algorithm at the first stage of the computation can be avoided by taking C^i large. This idea is due to Järmark [4], [5].

In Step 2 of the algorithm, we have to determine x^i(t) and u^i(t) satisfying both (10) and (11). For this purpose, we propose the following implementable algorithm. We replace the differential equation (11) by a difference equation with a uniform step length Δ > 0 and proceed as follows.

i) Set k = 0.
ii) Given x^i(t_0 + kΔ) ≡ x^i(k), determine u^i(t_0 + kΔ) = u^i(k) via (10).
iii) Using the approximation x^i(k + 1) = x^i(k) + Δ f(x^i(k), u^i(k), t_0 + kΔ), compute the next state; then set k := k + 1 and return to ii) until t_1 is reached.
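A sketch of one pass of this discretized procedure is given below. It is illustrative only: the Hamiltonian is taken to be H = L + λᵀf, the minimization of K in ii) is done by a crude search over a finite sample of U, and the cost is accumulated by the rectangle rule; none of these choices is prescribed by the text.

    import numpy as np

    def forward_pass(f, L, lam_prev, u_prev, x0, t0, dt, N, C, U_grid):
        """One execution of Steps 2 and 3 on a uniform grid (illustrative sketch).

        f(x, u, t), L(x, u, t): problem data;
        lam_prev[k], u_prev[k]: samples of lambda^{i-1} and u^{i-1} on the grid;
        C: nonnegative diagonal penalty matrix C^i;
        U_grid: finite sample of the admissible set U.
        """
        x = np.asarray(x0, dtype=float)
        xs, us, J = [x.copy()], [], 0.0
        for k in range(N):
            t = t0 + k * dt
            lam, v = lam_prev[k], u_prev[k]

            # Step 2, eq. (10): minimize K = H + (u - v)^T C (u - v) over U,
            # here by brute force over the finite sample U_grid.
            def K(u):
                H = L(x, u, t) + lam @ f(x, u, t)
                return H + (u - v) @ C @ (u - v)

            u = min(U_grid, key=K)

            # Step 3, eq. (12): accumulate the cost before stepping the state.
            J += dt * L(x, u, t)

            # Difference equation of step iii) (forward Euler for (11)).
            x = x + dt * f(x, u, t)
            us.append(u)
            xs.append(x.copy())
        return np.array(xs), np.array(us), J

An outer loop would alternate this pass with a backward integration of the adjoint equation (8) along the new trajectory and with the enlargement of C^i required in Step 3 whenever the cost increases.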
In Step 2 of the algorithm, the minimization of the function K with respect to u has to be performed for each grid point of T. It should be noted that in most cases the minimizing point u^i(t) ∈ U can be calculated analytically and expressed by simple equations.
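For instance (an illustrative special case, not the general setting of the paper), suppose u is scalar, U = [a, b], and H is quadratic in u, say H = h_0 + h_1 u + h_2 u^2 with h_2 ≥ 0. With a penalty weight c > 0, K is a strictly convex quadratic in u, so the constrained minimizer is the stationary point clipped to [a, b]:

\[
\frac{\partial K}{\partial u} = h_1 + 2h_2 u + 2c\,(u - v) = 0
\quad\Longrightarrow\quad
u^{*} = \min\Big\{\, b,\; \max\Big\{\, a,\; \frac{2cv - h_1}{2(h_2 + c)} \Big\}\Big\}.
\]

When H is separable in the components of u and C is diagonal, the same formula applies componentwise, which is why a box constraint set U keeps the per-grid-point minimization inexpensive.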
III. ASSUMPTIONS AND SOME PROPOSITIONS

In order to consider the convergence of the algorithm, we make the following assumptions throughout this paper.

Assumption 1: The functions f_i(x, u, t) (i = 1, ..., n) and L(x, u, t) and their partial derivatives f_{ix}, f_{iu}, f_{ixx}, f_{ixu}, f_{iuu}, L_x, L_u, L_{xx}, L_{xu}, L_{uu} are continuous on R^n × U × T.

Assumption 2: For any admissible control u(·) ∈ Ω, there exists a uniformly bounded solution x(t; u), t ∈ T, of (1). In other words, there is a solution x(t; u) of (1) that satisfies

    ||x(t; u)|| ≤ M_1                                                            (15)

for any t ∈ T and for any u ∈ Ω, where M_1 is a constant independent of t and u. We denote by X the convex set of R^n given by

    X = {x ∈ R^n : ||x|| ≤ M_1}.                                                 (16)

If there is a constant c such that the inequality

    ||f(x, u, t)|| ≤ c(||x|| + 1)                                                (17)

holds for any (x, u, t) ∈ R^n × U × T, then it is easily seen that Assumption 2 holds. In fact, integrating the inequality

    d||x(t; u)||/dt ≤ ||f(x(t; u), u(t), t)|| ≤ c(||x(t; u)|| + 1),

we have

    ||x(t; u)|| ≤ (||x(t_0)|| + 1) e^{c(t_1 − t_0)} − 1.
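In more detail, dividing the differential inequality by ||x(t; u)|| + 1 and integrating from t_0 to t gives

\[
\frac{d}{dt}\log\big(\|x(t;u)\|+1\big)\le c
\;\Longrightarrow\;
\|x(t;u)\|+1 \le \big(\|x(t_0)\|+1\big)e^{c(t-t_0)}
\le \big(\|x(t_0)\|+1\big)e^{c(t_1-t_0)} ,
\]

so Assumption 2 holds with M_1 = (||x_0|| + 1) e^{c(t_1 − t_0)} − 1.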
Proposition 1: The function λ^i(t) defined as the solution of (8) is uniformly bounded, i.e., there is a constant M_2 independent of t and i such that

    ||λ^i(t)|| ≤ M_2                                                             (18)

for any t ∈ T and for any i (i = 0, 1, ...).
Proof: From (8) we obtain

    −dλ^i(t)/dt = λ^i(t) f_x(x^i(t), u^i(t), t) + L_x(x^i(t), u^i(t), t),    λ^i(t_1) = 0.

Since f_x(x, u, t) and L_x(x, u, t) are continuous on the compact set X × U × T, there are constants a and b such that

    ||f_x(x^i(t), u^i(t), t)|| ≤ a,    ||L_x(x^i(t), u^i(t), t)|| ≤ b.

Let τ = t_1 − t and define …
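One way to complete the estimate from the bounds a and b is Gronwall's inequality; the function μ^i below is introduced only for this sketch (and a > 0 is assumed):

\[
\mu^i(\tau) = \lambda^i(t_1-\tau), \qquad
\Big\|\frac{d\mu^i(\tau)}{d\tau}\Big\| \le a\,\|\mu^i(\tau)\| + b, \qquad \mu^i(0)=0 ,
\]
\[
\|\lambda^i(t)\| = \|\mu^i(t_1-t)\| \le \frac{b}{a}\Big(e^{a(t_1-t_0)}-1\Big) =: M_2 ,
\]

which is independent of t and i, as claimed.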
Proposition 2: There is a constant M_3 independent of t and i such that

    c_i ||u^i(t) − u^{i−1}(t)|| ≤ M_3                                            (22)

for any t ∈ T and for any i, where c_i > 0 is the minimum element of the diagonal matrix C^i = diag(c_1, ..., c_r).

Proof: From (9) and (10) we obtain

    c_i ||u^i(t) − u^{i−1}(t)||^2 ≤ H_u(x^i(t), ū^i(t), λ^{i−1}(t), t)(u^{i−1}(t) − u^i(t)),   (24)

where ū^i(t) lies on the segment joining u^{i−1}(t) and u^i(t). Since H_u(x, u, λ, t) is continuous on the compact set X × U × Λ × T, where Λ = {λ ∈ R^n : ||λ|| ≤ M_2},

    ||H_u(x^i(t), ū^i(t), λ^{i−1}(t), t)|| ≤ M_3.                                (25)

From (24) and (25) we obtain (22).   Q.E.D.

By Proposition 2, if we select a large C^i, then the variation of the control, u^i(t) − u^{i−1}(t), is kept small and the stability of the algorithm is ensured.

The following assumption will usually hold in most optimal control problems.

Assumption 3: There is a nonnegative definite matrix R such that

    H_{uu}(x, u, λ, t) ≥ R ≥ 0                                                   (26)

for any x ∈ X, u ∈ U, λ ∈ Λ, and t ∈ T.

IV. REDUCTION OF THE COST

Proposition 3: There is a constant M > 0 independent of i such that the inequality (27) holds for any i, where r > 0 is the minimum eigenvalue of R and c_i is the minimum element of the nonnegative diagonal matrix C^i. If we choose C^i such that

    c_i > c_0    (i = 1, 2, ...),                                                (28)

where c_0 is a constant satisfying

    c_0 > (M − r)/4,                                                             (29)

then the sequence {J(u^i)} of the cost functionals decreases monotonically and converges.

Proof: First we prove (27). In view of (6) and (12), we see that

    J(u^i) − J(u^{i−1})
      = ∫_{t_0}^{t_1} [ H(x^i, u^i, λ^{i−1}, t) − H(x^i, u^{i−1}, λ^{i−1}, t)
          + H(x^i, u^{i−1}, λ^{i−1}, t) − H(x^{i−1}, u^{i−1}, λ^{i−1}, t) − λ^{i−1} δẋ^i(t) ] dt
      = ∫_{t_0}^{t_1} [ H_u(x^i, u^i, λ^{i−1}, t) δu^i − (1/2)(δu^i)^T H_{uu}(x^i, ū^i, λ^{i−1}, t) δu^i
          + H_x(x^{i−1}, u^{i−1}, λ^{i−1}, t) δx^i
          + (1/2)(δx^i)^T H_{xx}(x̄^i, u^{i−1}, λ^{i−1}, t) δx^i − λ^{i−1} δẋ^i(t) ] dt,        (31)

where δu^i = u^i − u^{i−1}, δx^i = x^i − x^{i−1}, and ū^i(t), x̄^i(t) denote intermediate points given by the mean value theorem. From (9) and (10) we obtain

    [ H_u(x^i(t), u^i(t), λ^{i−1}(t), t) + 2(u^i(t) − u^{i−1}(t))^T C^i ] (u − u^i(t)) ≥ 0      (32)

for any u ∈ U. Thus the first term of the right-hand side of (31) is nonpositive. In view of (8), we obtain

    ∫_{t_0}^{t_1} [ H_x(x^{i−1}, u^{i−1}, λ^{i−1}, t) δx^i − λ^{i−1} δẋ^i ] dt
      = −[ λ^{i−1}(t) δx^i(t) ]_{t=t_0}^{t=t_1} = 0.                             (33)

Furthermore, since H_{xx}(x, u, λ, t) is continuous on the compact set X × U × Λ × T, there is a constant M_4 independent of t and i such that

    ||H_{xx}(x̄^i(t), u^{i−1}(t), λ^{i−1}(t), t)|| ≤ M_4                          (34)

for any t ∈ T and for any i. Using relations (31)-(34) and Assumption 3, we obtain the estimate (27) on J(u^i) − J(u^{i−1}).
It is easily seen that there exist positive constants α_1 and α_2 such that

    ||dx^i(t)/dt − dx^{i−1}(t)/dt|| ≤ α_1 ||x^i(t) − x^{i−1}(t)|| + α_2 ||u^i(t) − u^{i−1}(t)||

for any t ∈ T and for any i, and the constant M in (27) is given by

    M = M_4 M_6 = M_4 M_5^2 (t_1 − t_0)^2 / 2.                                   (40)

Therefore, we obtain …
Fig. 2. Control functions for various iterations.
… from which we see that

    lim_{i→∞} J(u^i) = J(ū).                                                     (51)

Q.E.D.

VI. A NUMERICAL EXAMPLE

We consider the following control problem described by the Rayleigh equation:

    dx_1(t)/dt = x_2(t),
    dx_2(t)/dt = −x_1(t) + 1.4 x_2(t) − 0.14 x_2(t)^3 + 4u(t),                   (52)

with the initial condition

    x_1(0) = −5,    x_2(0) = −5.

The problem is to find the optimal control u(t), 0 ≤ t ≤ 2.5, that minimizes the cost functional

    …

under the constraint

    −1 ≤ u(t) ≤ 1.

This problem was solved in [1] by the second-order DDP. Note that H_{uu} = 2 in this case.
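For concreteness, the following sketch sets up this example numerically. The running cost x_1^2 + u^2 is an assumption made for this sketch (it is the usual choice for the Rayleigh benchmark and is consistent with H_{uu} = 2); the grid size and the forward-Euler rollout are likewise illustrative choices.

    import numpy as np

    # Rayleigh system (52): x1' = x2, x2' = -x1 + 1.4*x2 - 0.14*x2**3 + 4*u
    def f(x, u, t):
        return np.array([x[1], -x[0] + 1.4 * x[1] - 0.14 * x[1] ** 3 + 4.0 * u[0]])

    # Assumed running cost L = x1^2 + u^2 (so that H_uu = 2).
    def L(x, u, t):
        return x[0] ** 2 + u[0] ** 2

    t0, t1, N = 0.0, 2.5, 250            # illustrative grid
    dt = (t1 - t0) / N
    x0 = np.array([-5.0, -5.0])          # initial condition of (52)
    u_nominal = [np.array([-0.5])] * N   # nominal control u(t) = -0.5, as in [1]

    def rollout_cost(u_seq):
        """Integrate (52) by forward Euler and accumulate the cost."""
        x, J = x0.copy(), 0.0
        for k, u in enumerate(u_seq):
            t = t0 + k * dt
            J += dt * L(x, u, t)
            x = x + dt * f(x, u, t)
        return J

    print("cost of the nominal control:", rollout_cost(u_nominal))

The full procedure would combine this rollout with the backward adjoint pass and the minimization of K of Step 2, clipping the control to the interval [−1, 1] at every grid point.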
We solved this problem by our algorithm, starting from the same nominal control u(t) = −0.5 as in [1]. Fig. 1 shows the cost as a function of the iteration number. We set C^i = 0 for all i, and our algorithm found the optimal solution in several iterations, as in [1]. Compared with the result in [1], the rate of reduction of the cost by our algorithm is better than that by the second-order DDP. Fig. 2 shows the control function for various iterations. Note that each control function is continuous. Although the cost converges very fast, the convergence rate of the control functions appears to be slower. This indicates that even if the sequence of the cost functionals has already converged, the computation must be continued until the sequence of control functions converges.

VII. CONCLUDING REMARKS

Global convergence conditions have been investigated for our algorithm, which can be derived naturally from the Pontryagin minimum principle. For problems with constraints of the form g(x(t), u(t)) ≤ 0, we have not yet succeeded in proving the global convergence of the algorithm. Constraints on the terminal state can be taken into consideration by adding a penalty term θ(x(t_1)) for the terminal state constraints as in (4) and rewriting the cost functional as in (3).

In [2] it is proved that, if a successively constructed sequence {u^i} of controls has a limit point, then it satisfies the necessary conditions for optimality. In [10] it is further proved that, by extending the class of controls to the relaxed controls, at least one limit point exists that satisfies the necessary conditions for optimality. Although these results are stronger than our result, an implementable criterion for determining a limit point seems to be difficult to obtain.

The matrices C^i should be chosen adaptively, depending on the progress of the computation. In general, when the matrix C^i is smaller, the obtained variation of the control function is larger. Therefore, the C^i are desired to be as small as possible as long as the cost functionals decrease. According to our computational experience, the following way of choosing the matrices C^i is recommended. Choose the initial matrix C^1 properly. If J(u^i) > J(u^{i−1}), then enlarge C^i and repeat the iteration. If J(u^i) < J(u^{i−1}), then set C^{i+1} = aC^i, where a is a constant such that 0.5 ≤ a < 1; a = 0.8 to 0.9 seems to be a good choice.
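A minimal sketch of this adaptation rule, assuming for simplicity a scalar weight c with C^i = cI; the enlargement factor of 2 used when the cost increases is an assumption (the text only says to make C^i larger):

    def update_penalty(c, J_new, J_old, a=0.85, growth=2.0):
        """Adapt the convergence-control weight after an iteration.

        c:      current scalar penalty weight, C^i = c * I
        J_new:  J(u^i);  J_old: J(u^{i-1})
        a:      shrink factor, 0.5 <= a < 1 (0.8 to 0.9 recommended in the text)
        growth: assumed enlargement factor when the cost increases
        Returns (next weight, whether u^i is accepted).
        """
        if J_new > J_old:
            return growth * c, False   # cost increased: enlarge C^i and redo Step 2
        return a * c, True             # cost decreased: set C^{i+1} = a * C^i

    # e.g., c, accepted = update_penalty(c, J_i, J_prev)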
Computational results of applying our algorithm to much more complicated optimal control problems will be reported in a forthcoming paper.

REFERENCES

[1] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming. New York: Elsevier, 1970.
[2] D. Q. Mayne and E. Polak, "First-order strong variation algorithms for optimal control," J. Optimiz. Theory Appl., vol. 16, pp. 277-301, 1975.
[3] K. Ohno, "A new approach to differential dynamic programming for discrete time systems," IEEE Trans. Automat. Contr., vol. AC-23, pp. 37-47, 1978.
[4] B. Järmark, "On convergence control in differential dynamic programming applied to realistic aircraft and differential game problems," in Proc. 1977 IEEE Conf. Decision Contr., pp. 471-479.
[5] B. Järmark, "A new convergence control technique in differential dynamic programming," Royal Inst. Technol., Stockholm, Sweden, Rep. TRITA-REG-7332, 1975.
[6] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes. New York: Interscience, 1962.
[7] W. Walter, Differential and Integral Inequalities. Berlin: Springer, 1970.
[8] M. R. Hestenes, Optimization Theory: The Finite Dimensional Case. New York: Wiley, 1975.
[9] E. Asplund and L. Bungart, A First Course in Integration. New York: Holt, Rinehart and Winston, 1966.
[10] L. J. Williamson and E. Polak, "Relaxed controls and the convergence of optimal control algorithms," SIAM J. Contr. Optimiz., vol. 14, pp. 737-756, 1976.