Continuous Time Finance
Lisbon 2013
Tomas Björk
Stockholm School of Economics
Contents
• Stochastic Calculus (Ch 4-5).
• Black-Scholes (Ch 6-7).
• Completeness and hedging (Ch 8-9).
• The martingale approach (Ch 10-12).
• Incomplete markets (Ch 15).
• Dividends (Ch 16).
• Currency derivatives (Ch 17).
• Stochastic Control Theory (Ch 19).
• Martingale Methods for Optimal Investment (Ch 20).
Textbook:
Björk, T: “Arbitrage Theory in Continuous Time”
Oxford University Press, 2009 (3rd ed.).
Notation
Xt = any random process,
dt = small time step,
dXt = Xt+dt − Xt
• We often write X(t) instead of Xt .
• dXt is called the increment of X over the interval
[t, t + dt].
• For any fixed interval [t, t + dt], the increment dXt
is a stochastic variable.
• If the increments dXs and dXt over the disjoint
intervals [s, s + ds] and [t, t + dt] are independent,
then we say that X has independent increments.
• If every increment has a normal distribution we say
that X is a normal, or Gaussian process.
The Wiener Process
A stochastic process W is called a Wiener process if
it has the following properties:
• The increments are normally distributed: For s < t:
Wt − Ws ∼ N[0, t − s]

E[Wt − Ws] = 0,  Var[Wt − Ws] = t − s
• W has independent increments.
• W0 = 0
• W has continuous trajectories.
Continuous random walk
Note: In Hull, a Wiener process is typically denoted
by Z instead of W .
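These properties translate directly into a simulation: on a grid with step dt, the increments are independent N[0, dt] draws, and cumulative summation gives a trajectory. A minimal sketch (not part of the original slides; NumPy is assumed):

```python
import numpy as np

# Simulate one Wiener trajectory on [0, 2]: independent N(0, dt)
# increments, cumulatively summed, starting from W_0 = 0.
rng = np.random.default_rng(0)
T, n = 2.0, 1000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)      # dW_t ~ N(0, dt)
W = np.concatenate([[0.0], np.cumsum(dW)])     # W_0 = 0, continuous path
t = np.linspace(0.0, T, n + 1)
print(W[-1])  # a single draw from N(0, 2), since W_2 - W_0 ~ N(0, 2)
```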
A Wiener Trajectory
[Figure: a simulated Wiener trajectory for 0 ≤ t ≤ 2.]
Important Fact
Theorem:
A Wiener trajectory is, with probability one, a
continuous curve which is nowhere differentiable.
Proof. Hard.
Wiener Process with Drift
A stochastic process X is called a Wiener process
with drift µ and diffusion coefficient σ if it has the
following dynamics
dXt = µdt + σdWt,
where µ and σ are constants.
Summing all increments over the interval [0, t] gives us
Xt − X0 = µ · t + σ · (Wt − W0 ),
Xt = X0 + µt + σWt
Thus
Xt ∼ N[X0 + µt, σ²t]
Itô processes
We say, loosely speaking, that the process X is an Itô
process if it has dynamics of the form
dXt = µtdt + σt dWt,
where µt and σt are random processes.
Informally you can think of dWt as a random variable
of the form
dWt ∼ N [0, dt]
To handle expressions like the one above, we need
some mathematical theory.
First, however, we present an important example,
which we will discuss informally.
Example: The Black-Scholes model
Price dynamics: (Geometric Brownian Motion)
dSt = µStdt + σStdWt,
Simple analysis:
Assume that σ = 0. Then
dSt = µStdt
Divide by dt!

dSt/dt = µSt
This is a simple ordinary differential equation with
solution
St = s0 e^{µt}
Conjecture: The solution of the SDE above is a
randomly disturbed exponential function.
Intuitive Economic Interpretation
dSt/St = µ dt + σ dWt
Over a small time interval [t, t + dt] this means:
Return = (mean return)
+ σ × (Gaussian random disturbance)
• The asset return is a random walk (with drift).
• µ = mean rate of return per unit time
• σ = volatility
Large σ = large random fluctuations
Small σ = small random fluctuations
• The returns are normal.
• The stock price is lognormal.
A GBM Trajectory
[Figure: a simulated GBM trajectory for 0 ≤ t ≤ 2.]
Stochastic Differentials and Integrals
Consider an expression of the form
dXt = µtdt + σt dWt,
X0 = x0
Question: What exactly do we mean by this?
Answer: Write the equation in integrated form as

Xt = x0 + ∫₀ᵗ µs ds + ∫₀ᵗ σs dWs
How is this interpreted?
Recall:
Xt = x0 + ∫₀ᵗ µs ds + ∫₀ᵗ σs dWs
Two terms:
• ∫₀ᵗ µs ds

This is a standard Riemann integral for each µ-trajectory.

• ∫₀ᵗ σs dWs

Stochastic integral. This cannot be interpreted as a
Stieltjes integral for each trajectory. We need a new
theory for this Itô integral.
Information
Consider a Wiener process W .
Def:
FtW = “The information generated by W
over the interval [0, t]”
Def: Let Z be a stochastic variable. If the value of Z
is completely determined by FtW , we write
Z ∈ FtW
Ex:
For the stochastic variable Z, defined by
Z = ∫₀⁵ Ws ds,
we have Z ∈ F5W .
We do not have Z ∈ F4W .
Adapted Processes
Let W be a Wiener process.
Definition:
A process X is adapted to the filtration
{FtW : t ≥ 0} if

Xt ∈ FtW,  ∀t ≥ 0
“An adapted process does not look
into the future”
Adapted processes are nice integrands for stochastic
integrals.
• The process

Xt = ∫₀ᵗ Ws ds

is adapted.

• The process

Xt = sup_{s≤t} Ws

is adapted.

• The process

Xt = sup_{s≤t+1} Ws

is not adapted.
The Itô Integral
We will define the Itô integral
∫ₐᵇ gs dWs
for processes g satisfying
• The process g is adapted.
• The process g satisfies
E[∫ₐᵇ gs² ds] < ∞
This will be done in two steps.
Simple Integrands
Definition:
The process g is simple, if
• g is adapted.
• There exist deterministic points t0, . . . , tn with
a = t0 < t1 < . . . < tn = b such that g is piecewise
constant, i.e.

g(s) = g(tk),  s ∈ [tk, tk+1)

For simple g we define

∫ₐᵇ gs dWs = Σ_{k=0}^{n-1} g(tk) [W(tk+1) − W(tk)]
FORWARD INCREMENTS!
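A numerical version of this definition (a sketch, not from the slides; NumPy assumed): evaluate the integrand at the left endpoint of each interval and multiply by the forward increment. Averaging over many paths also previews the properties stated on the next slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 500, 1.0
dt = T / n

def ito_sum(g):
    """Forward-increment (Ito) sum: sum_k g(W(t_k)) [W(t_{k+1}) - W(t_k)],
    with the adapted integrand evaluated at the LEFT endpoint t_k."""
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    return np.sum(g(W[:-1]) * dW)              # W[:-1]: left endpoints only

# Monte Carlo check for g_s = W_s on [0, 1]:
samples = np.array([ito_sum(lambda w: w) for _ in range(5000)])
print(samples.mean())  # ~ 0    (E[int g dW] = 0)
print(samples.var())   # ~ 0.5  (isometry: E[int_0^1 W_s^2 ds] = 1/2)
```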
Properties of the Integral
Theorem: For simple g the following relations hold
• The expected value is given by

E[∫ₐᵇ gs dWs] = 0

• The second moment is given by

E[(∫ₐᵇ gs dWs)²] = E[∫ₐᵇ gs² ds]

• We have

∫ₐᵇ gs dWs ∈ FbW
General Case
For a general g we do as follows.
1. Approximate g with a sequence of simple gn such that

E[∫ₐᵇ (gn(s) − g(s))² ds] → 0.

2. For each n the integral

∫ₐᵇ gn(s) dW(s)

is a well defined stochastic variable Zn.

3. One can show that the Zn sequence converges to a
limiting stochastic variable.

4. We define ∫ₐᵇ g dW by

∫ₐᵇ g(s) dW(s) = lim_{n→∞} ∫ₐᵇ gn(s) dW(s).
Properties of the Integral
Theorem: For general g the following relations hold:

• The expected value is given by

E[∫ₐᵇ gs dWs] = 0

• We do in fact have

E[∫ₐᵇ gs dWs | Fa] = 0

• The second moment is given by

E[(∫ₐᵇ gs dWs)²] = E[∫ₐᵇ gs² ds]

• We have

∫ₐᵇ gs dWs ∈ FbW
Martingales
Definition: An adapted process X is a martingale if

E[Xt | Fs] = Xs,  ∀s ≤ t

“A martingale is a process without drift”

Proposition: For any g (sufficiently integrable) the
process

Xt = ∫₀ᵗ gs dWs

is a martingale.
Proposition: If X has dynamics
dXt = µtdt + σt dWt
then X is a martingale iff µ = 0.
Continuous Time Finance
Stochastic Calculus
(Ch 4-5)
Tomas Björk
Stochastic Calculus
General Model:
dXt = µtdt + σt dWt
Let the function f (t, x) be given, and define the
stochastic process Zt by
Zt = f (t, Xt)
Problem: What does df (t, Xt ) look like?
The answer is given by the Itô formula.
We provide an intuitive argument. The formal proof is
very hard.
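A numerical aside, not in the original slides: dynamics of this form are commonly simulated with the Euler–Maruyama scheme, which steps X forward by the drift times dt plus the diffusion times an N[0, dt] draw. A minimal sketch:

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T=1.0, n=1000, seed=None):
    """Simulate dX_t = mu(t, X_t) dt + sigma(t, X_t) dW_t on [0, T]
    with n Euler-Maruyama steps (a standard discretization; the
    slides themselves work with the exact continuous-time objects)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))      # dW ~ N(0, dt)
        x[k + 1] = x[k] + mu(k * dt, x[k]) * dt + sigma(k * dt, x[k]) * dW
    return x

# Example: the GBM dS = 0.1 S dt + 0.2 S dW from the Black-Scholes slide.
path = euler_maruyama(lambda t, s: 0.1 * s, lambda t, s: 0.2 * s, x0=1.0, seed=0)
print(path[-1])
```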
A close-up of the Wiener process
Consider an “infinitesimal” Wiener increment
dWt = Wt+dt − Wt
We know:
dWt ∼ N[0, dt]

E[dWt] = 0,  Var[dWt] = dt

From this one can show

E[(dWt)²] = dt,  Var[(dWt)²] = 2(dt)²
Recall
E[(dWt)²] = dt,  Var[(dWt)²] = 2(dt)²

Important observation:

1. Both E[(dWt)²] and Var[(dWt)²] are very small
when dt is small.

2. Var[(dWt)²] is negligible compared to E[(dWt)²].

3. Thus (dWt)² is deterministic.

We thus conclude, at least intuitively, that

(dWt)² = dt
This was only an intuitive argument, but it can be
proved rigorously.
Multiplication table.
Theorem: We have the following multiplication table
(dt)² = 0

dWt · dt = 0

(dWt)² = dt
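The key entry (dWt)² = dt can be sanity-checked by simulation: the summed squared increments over [0, 1] have mean 1 and variance 2·dt, so they concentrate at t = 1 as the grid is refined. A sketch (NumPy assumed):

```python
import numpy as np

# Quadratic variation check: sum of (dW)^2 over [0, 1] -> 1 as dt -> 0.
rng = np.random.default_rng(2)
for n in (10, 100, 1000):
    dt = 1.0 / n
    dW = rng.normal(0.0, np.sqrt(dt), size=(2000, n))  # 2000 paths
    qv = np.sum(dW**2, axis=1)
    # mean -> 1 (= t), variance = 2*dt -> 0: (dW)^2 behaves like dt
    print(n, qv.mean(), qv.var())
```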
Deriving the Itô formula
dXt = µtdt + σt dWt
Zt = f (t, Xt)
We want to compute df (t, Xt )
Make a Taylor expansion of f (t, Xt ) including second
order terms:
df = (∂f/∂t) dt + (∂f/∂x) dXt + ½ (∂²f/∂t²) (dt)²

   + ½ (∂²f/∂x²) (dXt)² + (∂²f/∂t∂x) dt · dXt
Plug in the expression for dX, expand, and use the
multiplication table!
df = (∂f/∂t) dt + (∂f/∂x) [µdt + σdW] + ½ (∂²f/∂t²) (dt)²

   + ½ (∂²f/∂x²) [µdt + σdW]² + (∂²f/∂t∂x) dt · [µdt + σdW]

 = (∂f/∂t) dt + µ (∂f/∂x) dt + σ (∂f/∂x) dW + ½ (∂²f/∂t²) (dt)²

   + ½ (∂²f/∂x²) [µ² (dt)² + σ² (dW)² + 2µσ dt · dW]

   + µ (∂²f/∂t∂x) (dt)² + σ (∂²f/∂t∂x) dt · dW

Using the multiplication table this reduces to:

df = (∂f/∂t + µ (∂f/∂x) + ½ σ² (∂²f/∂x²)) dt + σ (∂f/∂x) dW
The Itô Formula
Theorem: With X dynamics given by
dXt = µtdt + σt dWt
we have
df(t, Xt) = (∂f/∂t + µt (∂f/∂x) + ½ σt² (∂²f/∂x²)) dt

          + σt (∂f/∂x) dWt

Alternatively

df(t, Xt) = (∂f/∂t) dt + (∂f/∂x) dXt + ½ (∂²f/∂x²) (dXt)²,
where we use the multiplication table.
Example: GBM
dSt = µStdt + σSt dWt
We smell something exponential!
Natural Ansatz:

St = e^{Zt},  Zt = ln St

Itô on f(t, s) = ln(s) gives us

∂f/∂s = 1/s,  ∂f/∂t = 0,  ∂²f/∂s² = −1/s²

dZt = (1/St) dSt − ½ (1/St²) (dSt)²

    = (µ − ½σ²) dt + σ dWt
Recall
dZt = (µ − ½σ²) dt + σ dWt

Integrate!

Zt − Z0 = ∫₀ᵗ (µ − ½σ²) ds + σ ∫₀ᵗ dWs

        = (µ − ½σ²) t + σWt

Using St = e^{Zt} gives us

St = S0 e^{(µ − ½σ²)t + σWt}
Since Wt is N [0, t], we see that St has a lognormal
distribution.
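Since the solution is in closed form, St can be sampled exactly from a single N[0, t] draw. A quick Monte Carlo check against E[St] = S0 e^{µt} (a sketch; the parameter values are purely illustrative):

```python
import numpy as np

# Exact sampling of S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t).
rng = np.random.default_rng(3)
S0, mu, sigma, t = 1.0, 0.1, 0.2, 2.0
W_t = rng.normal(0.0, np.sqrt(t), size=100_000)        # W_t ~ N(0, t)
S_t = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W_t)

print(S_t.mean(), S0 * np.exp(mu * t))                 # both ~ 1.2214
# log S_t is Gaussian: the lognormal property from the slide.
print(np.log(S_t).mean(), (mu - 0.5 * sigma**2) * t)   # both ~ 0.16
```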
Changing Measures
Consider a probability measure P on (Ω, F ), and
assume that L ∈ F is a random variable with the
properties that
L ≥ 0

and

E^P[L] = 1.
For every event A ∈ F we now define the real number
Q(A) by the prescription
Q(A) = E^P[L · IA]
where the random variable IA is the indicator for A,
i.e.

IA = 1 if A occurs,  IA = 0 if Aᶜ occurs
Recall that

Q(A) = E^P[L · IA]

We now see that Q(A) ≥ 0 for all A, and that

Q(Ω) = E^P[L · IΩ] = E^P[L · 1] = 1

We also see that if A ∩ B = ∅ then

Q(A ∪ B) = E^P[L · IA∪B] = E^P[L · (IA + IB)]
         = E^P[L · IA] + E^P[L · IB]
         = Q(A) + Q(B)

Furthermore we see that

P(A) = 0 ⇒ Q(A) = 0
We have thus more or less proved the following
Proposition 2: If L ∈ F is a nonnegative random
variable with E^P[L] = 1 and Q is defined by

Q(A) = E^P[L · IA]
then Q will be a probability measure on F with the
property that
P (A) = 0 ⇒ Q(A) = 0.
It turns out that the property above is a very important
one, so we give it a name.
Absolute Continuity
Definition: Given two probability measures P and Q
on F we say that Q is absolutely continuous w.r.t.
P on F if, for all A ∈ F , we have
P (A) = 0 ⇒ Q(A) = 0
We write this as
Q << P.
If Q << P and P << Q then we say that P and Q
are equivalent and write
Q∼P
Equivalent measures
It is easy to see that P and Q are equivalent if and
only if
P (A) = 0 ⇔ Q(A) = 0
or, equivalently,
P (A) = 1 ⇔ Q(A) = 1
Two equivalent measures thus agree on all certain
events and on all impossible events, but can disagree
on all other events.
Simple examples:
• All nondegenerate Gaussian distributions on R are
equivalent.
• If P is Gaussian on R and Q is exponential then
Q << P but not the other way around.
Absolute Continuity cont’d
We have seen that if we are given P and define Q by

Q(A) = E^P[L · IA]

for L ≥ 0 with E^P[L] = 1, then Q is a probability
measure and Q << P.
A natural question is now if all measures Q << P
are obtained in this way. The answer is yes, and the
precise (quite deep) result is as follows. The proof is
difficult and therefore omitted.
The Radon Nikodym Theorem
Consider two probability measures P and Q on (Ω, F ),
and assume that Q << P on F . Then there exists a
unique random variable L with the following properties
1. Q(A) = E^P[L · IA],  ∀A ∈ F

2. L ≥ 0,  P-a.s.

3. E^P[L] = 1

4. L ∈ F

The random variable L is denoted as

L = dQ/dP,  on F
and it is called the Radon-Nikodym derivative of Q
w.r.t. P on F , or the likelihood ratio between Q and
P on F .
A simple example
The Radon-Nikodym derivative L is intuitively the local
scale factor between P and Q. If the sample space Ω
is finite, so that Ω = {ω1, . . . , ωn}, then P is determined by
the probabilities p1, . . . , pn where
pi = P(ωi),  i = 1, . . . , n
Now consider a measure Q with probabilities
qi = Q(ωi),  i = 1, . . . , n
If Q << P this simply says that
pi = 0 ⇒ qi = 0
and it is easy to see that the Radon-Nikodym derivative
L = dQ/dP is given by
L(ωi) = qi/pi,  i = 1, . . . , n
If pi = 0 then we also have qi = 0 and we can define
the ratio qi /pi arbitrarily.
If p1 , . . . , pn as well as q1, . . . , qn are all positive, then
we see that Q ∼ P and in fact
dP/dQ = 1/L = (dQ/dP)⁻¹
as could be expected.
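On a finite sample space all of this can be computed directly. A sketch with hypothetical probabilities pi and qi, checking Q(A) = E^P[L · IA] and dP/dQ = 1/L:

```python
import numpy as np

# Finite sample space Omega = {w1, w2, w3}; p and q are hypothetical.
p = np.array([0.2, 0.3, 0.5])          # p_i = P(w_i)
q = np.array([0.4, 0.4, 0.2])          # q_i = Q(w_i)
L = q / p                              # L(w_i) = q_i / p_i

print(np.sum(p * L))                   # E^P[L] = 1
A = np.array([True, False, True])      # an event A
print(np.sum(p * L * A), q[A].sum())   # Q(A) = E^P[L * I_A], both 0.6
print(np.allclose(1.0 / L, p / q))     # dP/dQ = 1/L, as stated above
```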
The likelihood process on a filtered space
We now consider the case when we have a probability
measure P on some space Ω and that instead of just
one σ-algebra F we have a filtration, i.e. an increasing
family of σ-algebras {Ft}t≥0 .
The interpretation is as usual that Ft is the information
available to us at time t, and that we have Fs ⊆ Ft
for s ≤ t.
Now assume that we also have another measure Q,
and that for some fixed T , we have Q << P on FT .
We define the random variable LT by
LT = dQ/dP  on FT
Since Q << P on FT we also have Q << P on Ft
for all t ≤ T and we define
Lt = dQ/dP  on Ft,  0 ≤ t ≤ T
For every t we have Lt ∈ Ft, so L is an adapted
process, known as the likelihood process.
The L process is a P martingale
We recall that
Lt = dQ/dP  on Ft,  0 ≤ t ≤ T
Since Fs ⊆ Ft for s ≤ t we can use Proposition 5 and
deduce that
Ls = E^P[Lt | Fs],  s ≤ t ≤ T
and we have thus proved the following result.
Proposition: Given the assumptions above, the
likelihood process L is a P -martingale.
Where are we heading?
We are now going to perform measure transformations
on Wiener spaces, where P will correspond to the
objective measure and Q will be the risk neutral
measure.
For this we need to define the proper likelihood process L
and, since L is a P -martingale, we have the following
natural questions.
• What does a martingale look like in a Wiener driven
framework?
• Suppose that we have a P -Wiener process W and
then change measure from P to Q. What are the
properties of W under the new measure Q?
These questions are handled by the Martingale
Representation Theorem, and the Girsanov Theorem
respectively.
4.
The Martingale Representation Theorem
Intuition
Suppose that we have a Wiener process W under
the measure P . We recall that if h is adapted (and
integrable enough) and if the process X is defined by
Xt = x0 + ∫₀ᵗ hs dWs

then X is a martingale. We now have the following
natural question:
Question: Assume that X is an arbitrary martingale.
Does it then follow that X has the form
Xt = x0 + ∫₀ᵗ hs dWs
for some adapted process h?
In other words: Are all martingales stochastic integrals
w.r.t. W ?
Answer
It is immediately clear that not all martingales can be
written as stochastic integrals w.r.t. W. Consider for
example the process X defined by

Xt = 0 for 0 ≤ t < 1,  Xt = Z for t ≥ 1

where Z is a random variable, independent of W,
with E[Z] = 0.
X is then a martingale (why?) but it is clear (how?)
that it cannot be written as
Xt = x0 + ∫₀ᵗ hs dWs
for any process h.
Intuition
The intuitive reason why we cannot write
Xt = x0 + ∫₀ᵗ hs dWs
in the example above is of course that the random
variable Z “has nothing to do with” the Wiener process
W . In order to exclude examples like this, we thus need
an assumption which guarantees that our probability
space only contains the Wiener process W and nothing
else.
This idea is formalized by assuming that the filtration
{Ft}t≥0 is the one generated by the Wiener
process W .
The Martingale Representation Theorem
Theorem. Let W be a P-Wiener process and assume
that the filtration is the internal one, i.e.

Ft = FtW = σ{Ws; 0 ≤ s ≤ t}
Then, for every (P, Ft)-martingale X, there exists a
real number x and an adapted process h such that
Xt = x + ∫₀ᵗ hs dWs,
i.e.
dXt = htdWt.
Proof: Hard. This is a very deep result.
Note
For a given martingale X, the Representation Theorem
above guarantees the existence of a process h such that
Xt = x + ∫₀ᵗ hs dWs,
The Theorem does not, however, tell us how to find
or construct the process h.
5.
The Girsanov Theorem
Setup
Let W be a P -Wiener process and fix a time horizon
T . Suppose that we want to change measure from P
to Q on FT . For this we need a P -martingale L with
L0 = 1 to use as a likelihood process, and a natural
way of constructing this is to choose a process g and
then define L by
dLt = gt dWt,  L0 = 1

This definition does not guarantee that L ≥ 0, so we
make a small adjustment. We choose a process ϕ and
define L by

dLt = Lt ϕt dWt,  L0 = 1

The process L will again be a martingale and we easily
obtain

Lt = e^{∫₀ᵗ ϕs dWs − ½ ∫₀ᵗ ϕs² ds}
Thus we are guaranteed that L ≥ 0. We now change
measure from P to Q by setting

dQ = Lt dP,  on Ft,  0 ≤ t ≤ T
The main problem is to find out what the properties
of W are, under the new measure Q. This problem is
resolved by the Girsanov Theorem.
The Girsanov Theorem
Let W be a P -Wiener process. Fix a time horizon T .
Theorem: Choose an adapted process ϕ, and define
the process L by
dLt = Lt ϕt dWt,  L0 = 1

Assume that E^P[LT] = 1, and define a new measure Q
on FT by

dQ = Lt dP,  on Ft,  0 ≤ t ≤ T

Then Q << P and the process W^Q, defined by

Wt^Q = Wt − ∫₀ᵗ ϕs ds

is Q-Wiener. We can also write this as

dWt = ϕt dt + dWt^Q
Changing the drift in an SDE
The single most common use of the Girsanov Theorem
is as follows.
Suppose that we have a process X with P dynamics
dXt = µtdt + σt dWt
where µ and σ are adapted and W is P -Wiener.
We now do a Girsanov Transformation as above, and
the question is what the Q-dynamics look like.
From the Girsanov Theorem we have

dWt = ϕt dt + dWt^Q

and substituting this into the P-dynamics we obtain
the Q-dynamics as

dXt = {µt + σt ϕt} dt + σt dWt^Q
Moral: The drift changes but the diffusion is
unaffected.
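A Monte Carlo illustration of the theorem (a sketch, not from the slides): take a constant kernel ϕ, so LT = e^{ϕWT − ½ϕ²T}, and reweight P-samples by LT. Under Q, W acquires the drift ϕ while W^Q stays driftless:

```python
import numpy as np

# Girsanov with constant kernel phi: L_T = exp(phi W_T - 0.5 phi^2 T).
rng = np.random.default_rng(4)
phi, T = 0.5, 1.0
W_T = rng.normal(0.0, np.sqrt(T), size=200_000)   # P-samples of W_T
L_T = np.exp(phi * W_T - 0.5 * phi**2 * T)        # likelihood ratio

print(L_T.mean())                      # E^P[L_T] = 1 (L is a P-martingale)
print(np.mean(L_T * W_T))              # E^Q[W_T] = phi*T = 0.5: drift under Q
print(np.mean(L_T * (W_T - phi * T)))  # E^Q[W_T^Q] = 0: W^Q is Q-Wiener
```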
1. Dynamic Programming
• The basic idea.
• Deriving the HJB equation.
• The verification theorem.
• The linear quadratic regulator.
Problem Formulation
"Z #
T
max E F (t, Xt, ut)dt + Φ(XT )
u 0
subject to
dXt = µ (t, Xt, ut) dt + σ (t, Xt , ut) dWt
X0 = x0,
ut ∈ U (t, Xt), ∀t.
We will only consider feedback control laws, i.e.
controls of the form
ut = u(t, Xt)
Terminology:
X = state variable
u = control variable
U = control constraint
Note: No state space constraints.
Main idea
• Embed the problem above in a family of problems
indexed by starting point in time and space.
• Tie all these problems together by a PDE: the
Hamilton Jacobi Bellman equation.
• The control problem is reduced to the problem of
solving the deterministic HJB equation.
Some notation
• For any fixed vector u ∈ Rᵏ, the functions µᵘ, σᵘ
and Cᵘ are defined by

µᵘ(t, x) = µ(t, x, u),
σᵘ(t, x) = σ(t, x, u),
Cᵘ(t, x) = σ(t, x, u)σ(t, x, u)′.

• For any control law u, the functions µᵘ, σᵘ, Cᵘ
and Fᵘ are defined by

µᵘ(t, x) = µ(t, x, u(t, x)),
σᵘ(t, x) = σ(t, x, u(t, x)),
Cᵘ(t, x) = σ(t, x, u(t, x))σ(t, x, u(t, x))′,
Fᵘ(t, x) = F(t, x, u(t, x)).
More notation
• For any fixed vector u ∈ Rᵏ, the partial differential
operator Aᵘ is defined by

Aᵘ = Σ_{i=1}^{n} µᵢᵘ(t, x) ∂/∂xᵢ + ½ Σ_{i,j=1}^{n} Cᵢⱼᵘ(t, x) ∂²/∂xᵢ∂xⱼ

• For any control law u, the partial differential
operator Aᵘ is defined by

Aᵘ = Σ_{i=1}^{n} µᵢᵘ(t, x) ∂/∂xᵢ + ½ Σ_{i,j=1}^{n} Cᵢⱼᵘ(t, x) ∂²/∂xᵢ∂xⱼ

• For any control law u, the process Xᵘ is the solution
of the SDE

dXtᵘ = µ(t, Xtᵘ, ut) dt + σ(t, Xtᵘ, ut) dWt,

where

ut = u(t, Xtᵘ)
Embedding the problem
For every fixed (t, x) the control problem Pt,x is defined
as the problem to maximize

Et,x[ ∫ₜᵀ F(s, Xsᵘ, us) ds + Φ(XTᵘ) ],

given the dynamics

dXsᵘ = µ(s, Xsᵘ, us) ds + σ(s, Xsᵘ, us) dWs,

Xt = x,

and the constraints

u(s, y) ∈ U,  ∀(s, y) ∈ [t, T] × Rⁿ.
The original problem was P0,x0 .
The optimal value function
• The value function

J: R₊ × Rⁿ × U → R

is defined by

J(t, x, u) = E[ ∫ₜᵀ F(s, Xsᵘ, us) ds + Φ(XTᵘ) ]
given the dynamics above.
• The optimal value function

V: R₊ × Rⁿ → R

is defined by

V(t, x) = sup_{u∈U} J(t, x, u).
• We want to derive a PDE for V .
Assumptions
We assume:
• There exists an optimal control law û.
• The optimal value function V is regular in the sense
that V ∈ C 1,2 .
• A number of limiting procedures in the following
arguments can be justified.
Bellman Optimality Principle
Theorem: If a control law û is optimal for the time
interval [t, T ] then it is also optimal for all smaller
intervals [s, T ] where s ≥ t.
Proof: Exercise.
Basic strategy
To derive the PDE do as follows:
• Fix (t, x) ∈ (0, T) × Rⁿ.
• Choose a real number h (interpreted as a “small”
time increment).
• Choose an arbitrary control law u on the time interval
[t, t + h].

Now define the control law u* by

u*(s, y) = u(s, y),  (s, y) ∈ [t, t + h] × Rⁿ
u*(s, y) = û(s, y),  (s, y) ∈ (t + h, T] × Rⁿ
In other words, if we use u* then we use the arbitrary
control u during the time interval [t, t + h], and then
we switch to the optimal control law during the rest of
the time period.
Basic idea
The whole idea of DynP boils down to the following
procedure.
• Given the point (t, x) above, we consider the
following two strategies over the time interval [t, T ]:
I: Use the optimal law û.
II: Use the control law u* defined above.

• Compute the expected utilities obtained by the
respective strategies.

• Using the obvious fact that û is at least as good
as u*, and letting h tend to zero, we obtain our
fundamental PDE.
Strategy values
Expected utility for û:
J (t, x, û) = V (t, x)
Expected utility for u*:

• The expected utility for [t, t + h) is given by

Et,x[ ∫ₜᵗ⁺ʰ F(s, Xsᵘ, us) ds ].

• Conditional expected utility over [t + h, T], given
(t, x):

Et,x[ V(t + h, Xᵘₜ₊ₕ) ].

• Total expected utility for Strategy II is

Et,x[ ∫ₜᵗ⁺ʰ F(s, Xsᵘ, us) ds + V(t + h, Xᵘₜ₊ₕ) ].
Comparing strategies
We have trivially
"Z #
t+h
V (t, x) ≥ Et,x F (s, Xsu, us) ds + V (t + h, Xt+h
u
) .
t
Remark
We have equality above if and only if the control law
u is the optimal law û.
Now use Itô to obtain

V(t + h, Xᵘₜ₊ₕ) = V(t, x)
  + ∫ₜᵗ⁺ʰ [ (∂V/∂t)(s, Xsᵘ) + AᵘV(s, Xsᵘ) ] ds
  + ∫ₜᵗ⁺ʰ ∇ₓV(s, Xsᵘ) σᵘ dWs,

and plug into the formula above.
We obtain

Et,x[ ∫ₜᵗ⁺ʰ { F(s, Xsᵘ, us) + (∂V/∂t)(s, Xsᵘ) + AᵘV(s, Xsᵘ) } ds ] ≤ 0.

Going to the limit:

Divide by h, move h within the expectation and let h
tend to zero. We get

F(t, x, u) + (∂V/∂t)(t, x) + AᵘV(t, x) ≤ 0,
Recall
F(t, x, u) + (∂V/∂t)(t, x) + AᵘV(t, x) ≤ 0,
This holds for all u = u(t, x), with equality if and only
if u = û.
We thus obtain the HJB equation

(∂V/∂t)(t, x) + sup_{u∈U} { F(t, x, u) + AᵘV(t, x) } = 0.
The HJB equation
Theorem:
Under suitable regularity assumptions the following hold:

I: V satisfies the Hamilton–Jacobi–Bellman equation

(∂V/∂t)(t, x) + sup_{u∈U} { F(t, x, u) + AᵘV(t, x) } = 0,

V(T, x) = Φ(x),
II: For each (t, x) ∈ [0, T] × Rⁿ the supremum in the
HJB equation above is attained by u = û(t, x), i.e. by
the optimal control.
Logic and problem
Note: We have shown that if V is the optimal value
function, and if V is regular enough, then V satisfies
the HJB equation. The HJB eqn is thus derived
as a necessary condition, and requires strong ad hoc
regularity assumptions, alternatively the use of viscosity
solutions techniques.
Problem: Suppose we have solved the HJB equation.
Have we then found the optimal value function and
the optimal control law? In other words, is HJB a
sufficient condition for optimality?
Answer: Yes! This follows from the Verification
Theorem.
The Verification Theorem
Suppose that we have two functions H(t, x) and g(t, x), such
that
• H is sufficiently integrable, and solves the HJB equation

(∂H/∂t)(t, x) + sup_{u∈U} { F(t, x, u) + AᵘH(t, x) } = 0,

H(T, x) = Φ(x),
• For each fixed (t, x), the supremum in the expression
sup_{u∈U} { F(t, x, u) + AᵘH(t, x) }
is attained by the choice u = g(t, x).
Then the following hold.
1. The optimal value function V to the control problem is given
by
V (t, x) = H(t, x).
2. There exists an optimal control law û, and in fact
û(t, x) = g(t, x)
Handling the HJB equation
1. Consider the HJB equation for V .
2. Fix (t, x) ∈ [0, T] × Rⁿ and solve the static optimization
problem

max_{u∈U} [ F(t, x, u) + AᵘV(t, x) ].
Here u is the only variable, whereas t and x are fixed
parameters. The functions F , µ, σ and V are considered as
given.
3. The optimal û will depend on t and x, and on the function
V and its partial derivatives. We thus write û as
û = û(t, x; V).   (4)
4. The function û (t, x; V ) is our candidate for the optimal
control law, but since we do not know V this description is
incomplete. Therefore we substitute the expression for û into
the PDE, giving us the highly nonlinear (why?) PDE

(∂V/∂t)(t, x) + F^û(t, x) + A^û V(t, x) = 0,

V(T, x) = Φ(x).
5. Now we solve the PDE above! Then we put the solution V
into expression (4). Using the verification theorem we can
identify V as the optimal value function, and û as the optimal
control law.
Making an Ansatz
• The hard work of dynamic programming consists in
solving the highly nonlinear HJB equation
• There are no general analytic methods available
for this, so the number of known optimal control
problems with an analytic solution is very small
indeed.
• In an actual case one usually tries to guess a
solution, i.e. we typically make a parameterized
Ansatz for V and then use the PDE in order to identify
the parameters.
• Hint: V often inherits some structural properties
from the boundary function Φ as well as from the
instantaneous utility function F .
• Most of the known solved control problems have,
to some extent, been “rigged” in order to be
analytically solvable.
The Linear Quadratic Regulator
"Z #
T
min E {Xt0QXt + u0tRut } dt + XT0 HXT ,
u∈Rk 0
with dynamics
dXt = {AXt + But } dt + CdWt.
We want to control a vehicle in such a way that it stays
close to the origin (the terms x′Qx and x′Hx) while
at the same time keeping the “energy” u′Ru small.
Here Xt ∈ Rⁿ and ut ∈ Rᵏ, and we impose no control
constraints on u.
The matrices Q, R, H, A, B and C are assumed to be
known. We may WLOG assume that Q, R and H are
symmetric, and we assume that R is positive definite
(and thus invertible).
Handling the Problem
The HJB equation becomes

(∂V/∂t)(t, x) + inf_{u∈Rᵏ} { x′Qx + u′Ru + [∇ₓV](t, x)[Ax + Bu] }
  + ½ Σ_{i,j} (∂²V/∂xᵢ∂xⱼ)(t, x) [CC′]ᵢⱼ = 0,

V(T, x) = x′Hx.

For each fixed choice of (t, x) we now have to solve the
static unconstrained optimization problem to minimize

x′Qx + u′Ru + [∇ₓV](t, x)[Ax + Bu].
The problem was:

min_u { x′Qx + u′Ru + [∇ₓV](t, x)[Ax + Bu] }.

Since R > 0 we set the gradient to zero and obtain

2u′R = −(∇ₓV)B,

which gives us the optimal u as

û = −½ R⁻¹B′(∇ₓV)′.
Note: This is our candidate for the optimal control law,
but it depends on the unknown function V.
We now make an educated guess about the structure
of V .
From the boundary function x′Hx and the term x′Qx
in the cost function we make the Ansatz

V(t, x) = x′P(t)x + q(t),

where P(t) is a symmetric matrix function, and q(t) is
a scalar function.

With this trial solution we have

(∂V/∂t)(t, x) = x′Ṗx + q̇,

∇ₓV(t, x) = 2x′P,

∇ₓₓV(t, x) = 2P,

û = −R⁻¹B′Px.

Inserting these expressions into the HJB equation we
get

x′{ Ṗ + Q − PBR⁻¹B′P + A′P + PA }x + q̇ + tr[C′PC] = 0.
We thus get the following matrix ODE for P:

Ṗ = PBR⁻¹B′P − A′P − PA − Q,

P(T) = H,

and we can integrate directly for q:

q̇ = −tr[C′PC],

q(T) = 0.

The matrix equation is a Riccati equation. The
equation for q can then be integrated directly.

Final Result for LQ:

V(t, x) = x′P(t)x + ∫ₜᵀ tr[C′P(s)C] ds,

û(t, x) = −R⁻¹B′P(t)x.
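Numerically one integrates the Riccati equation backwards from P(T) = H and reads off the feedback gain. A scalar (n = k = 1) sketch with hypothetical data, using SciPy's solve_ivp (any ODE solver would do):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Scalar LQ data (hypothetical): dX = (aX + bu) dt + c dW,
# running cost q_x X^2 + r u^2, terminal cost h X_T^2.
a, b, c, q_x, r, h, T = 1.0, 1.0, 0.3, 1.0, 0.5, 2.0, 1.0

def riccati(t, P):
    # Pdot = P B R^{-1} B' P - A'P - P A - Q, in scalar form
    return P * b * (1.0 / r) * b * P - 2.0 * a * P - q_x

# Integrate backwards in time from the terminal condition P(T) = h.
sol = solve_ivp(riccati, [T, 0.0], [h], dense_output=True, rtol=1e-8)
P = lambda t: sol.sol(t)[0]

print(P(0.0))                 # curvature of the value function at t = 0
print(-(b / r) * P(0.0))      # feedback gain: u_hat = -R^{-1} B' P(t) x
```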