Advanced Financial Models Guide
Michael R. Tehranchi
Financial mathematics as a subject is young (as compared to, say, number theory), but
it is mature enough now that there has emerged some consensus on the notation, vocabulary
and important results. These notes are an attempt to present many of the main ingredients
of this theory, mainly concerning the pricing and hedging of derivative securities.
But before launching into the story, we will begin by acknowledging some of the real-world
complications that will not be discussed at length hereafter.
1.1. Dividends. The total stock of a publicly traded firm is divided into a fixed number
N of shares. The owner of each share is then entitled to the fraction 1/N of the total profit
of the firm.1 A portion of the firm’s profit is usually reinvested by management, for instance
by building new factories, but the rest of the profit is paid out to the shareholders. In
particular, the owner of each share of stock will receive periodically a dividend payment.
However, in this course,
we will assume that there are no dividend payments.
Actually, this assumption is not as terrible as it sounds. Example sheet 1 will show how to
adapt the theory developed for assets that pay no dividends to incorporate assets that have
non-zero dividend payments.
1.2. Tick size. Financial markets usually have a smallest increment of price, the tick.
(The tick refers back to the days when prices were quoted on ticker tape.) Indeed, the tick
size can vary from market to market, and even for assets traded in the same market. There
seems to be an industry-wide effort to harmonise tick sizes, but a quick Google search found
this document
http://cdn.batstrading.com/resources/participant resources/BATSEuro Ticks.pdf
which highlights the complexity of the system in Europe.
However, in this course,
we will assume that the tick size is zero.
This is a convenient assumption for those who prefer continuous mathematics to discrete. It
is usually a harmless assumption, unless the prices of interest are very close to zero.
1Actually, things are even more complicated. For instance, stocks can be classified as either common or
preferred, with implications on dividends, voting rights and claims on the firm’s assets in case of bankruptcy.
Also, the number N of shares outstanding is not necessarily fixed.
1.3. Transactions costs. Financial transactions are processed by a string of middlemen,
each of whom charges a fee for their services. Usually the fee is nearly proportional to
the size of the transaction.
However, in this course,
we will assume that there are no transactions costs.
This assumption is justified by the fact that transactions costs are often very small relative
to the size of typical transactions. But one must always remember that in some applications,
it might not be wise to neglect these costs.
1.4. Short-selling constraints. In the real world, it is actually possible for someone
to sell an asset that he does not own. The essential mechanism is to borrow a share of that
asset from a broker, and then immediately to sell it to the market. This procedure is called
short selling.
Brokers, however, place constraints on this behaviour. Indeed, they usually require collateral
and charge a fee for their service. Furthermore, if the market price of the asset increases,
or if the price of the collateral decreases, the broker may ask the short seller to put up even
more collateral.
However, in this course,
we will assume that there are no short-selling constraints.
Indeed, the theory of discrete-time trading is cleaner without additional assumptions on the
sizes of trades. But we will see that to overcome some technical problems in the theory of
continuous-time trading, it will be natural to restrict trading to what are called admissible
strategies.
1.6. Bid-ask spread. Real-world trading is asymmetrical since the price to buy a share
is usually higher than the price to sell it. The reason is that there are two different ways to buy
or sell an asset listed on an exchange: the limit order and the market order.
A limit buy order is an offer to buy a certain number of shares of the asset at a certain
price. A limit sell order is defined similarly. The collection of unfilled limit orders is called
the limit order book.
At any time, there is the highest price for which there is an order to buy the asset.
This is called the bid price. The lowest price for which there is an order to sell is called
the ask price. The bid/ask spread is the difference. Figure 1 illustrates the evolution of a
hypothetical limit order book as various orders arrive and are filled.
A market order is an instruction to execute a transaction at the best available price.
In particular, if the market order is to buy, then the lowest limit sell order is filled first.
Therefore, for small market buy orders, the per-share price paid is the ask price. Similarly,
if a market sell order arrives, then the highest limit buy order is filled first, and hence the
per-share price received is the bid price.
However, in this course,
we will assume that there are no bid-ask spreads.
This assumption is justified by the observation that in many markets, the spread is very
small. However, in times of crisis, this assumption is not usually applicable, and hence the
theory breaks down dramatically.
Figure 1. Top left. The bid price is £8 and the ask is £11. Top right.
A limit sell order for three shares at £11 arrives. Bottom left. A limit buy
order for two shares at £8 is cancelled. Bottom right. A market order to
buy five shares arrives. Note that four shares are sold at £11 and one at £12.
After the transaction, the ask price is £12.
1.7. Market depth. As described above, there are only a finite number of limit orders
on the book at one time. If a large market buy order arrives, for instance, then the lowest
limit sell order is filled first. But if the market order is bigger than the total number of shares
available at the ask price, then the limit orders at the next-lowest price are filled, and the
process continues up the book until the market order is finally filled. In this way, the ask price
increases.
The market depth is the number of shares available to buy or sell at the ask or bid price
respectively. Equivalently, the depth of a market is a measure of the size of a market order
necessary to move quoted prices.
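The way a market buy order walks up the book can be sketched in a few lines of Python. This is only an illustration: the helper name fill_market_buy and the size of the order resting at £12 are assumptions made for the example, chosen to mimic the last panel of Figure 1.

```python
def fill_market_buy(asks, quantity):
    """Fill a market buy order against resting limit sell orders.

    asks: list of (price, size) pairs, sorted by increasing price.
    Returns (fills, remaining_asks), where fills lists the (price, size)
    actually traded. Hypothetical helper, for illustration only.
    """
    fills, remaining = [], []
    for price, size in asks:
        traded = min(size, quantity)
        if traded > 0:
            fills.append((price, traded))
            quantity -= traded
        if size > traded:
            remaining.append((price, size - traded))
    return fills, remaining


# Four shares offered at 11 and (hypothetically) three at 12.
book = [(11, 4), (12, 3)]
fills, book_after = fill_market_buy(book, 5)
new_ask = book_after[0][0]  # lowest remaining sell price
```

With these numbers the order consumes the whole level at £11 and one share at £12, so the ask moves up to £12. Under the infinite-depth assumption every order would instead be filled at the quoted ask.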
However, in this course,
we will assume that there is infinite market depth.
Equivalently, we will assume that investors are small relative to the limit order book, so they
are price takers, not price makers. However, the most recent financial crisis shows that this
assumption does not always approximate reality – just ask the traders at Lehman Brothers!
2. Prerequisite knowledge
The emphasis of this course is on some of the mathematical aspects of financial market
models. Very little is assumed of the reader’s knowledge of the workings of financial markets.
However, some mathematical background is needed.
Our starting point is the famous observation (sometimes attributed to Niels Bohr) that
it is difficult to make predictions, especially about the future. Indeed, anyone with even
a passing acquaintance with finance knows that most of us cannot predict with absolute
certainty how the price of an asset will fluctuate – otherwise we would be much richer!
Therefore, the proper language to formulate the models that we will study is the lan-
guage of probability theory. An attempt is made to keep this course self-contained, but you
should be familiar with the basics of the theory, including knowing the definition and key
properties of the following concepts: random variable, expected value, variance, conditional
probability/expectation, independence, Gaussian (normal) distribution, etc. Familiarity with
measure-theoretic probability is helpful, though a crash course on probability theory is given
in an appendix.
Please send all comments and corrections (including small typos and major blunders) to
me at [email protected].
CHAPTER 1
1. The set-up
The models we will encounter will be of the form $P = (P_t^1, \ldots, P_t^n)_{t \in T}$, where $P_t^i$ will model
the price of a financial asset (stock, bond, etc.) at time t ∈ T. In this course, the index set
T will be one of two sets
• Z+ = {0, 1, 2, . . .} when time is discrete, and
• R+ = [0, ∞) when time is continuous.
Usually, the context will be clear and we write ‘t ≥ 0’ for ‘t ∈ T’.
A modelling assumption that we will use throughout is that
of all $2^4 = 16$ subsets of Ω. The probability measure is just the one that assigns equal
probability P({ω}) = 1/4 to each elementary event.
The flow of information is modelled by the following sigma-fields
• F0 = {∅, Ω},
• F1 = {∅, {HH, HT }, {T H, T T }, Ω},
• F2 = F.
Now consider a stochastic process (Xt )t∈{0,1,2} that is adapted to the filtration (Ft )t∈{0,1,2} .
Intuitively, the value of the random variable Xt is known after t tosses of the coin.
For instance, X0 must be a constant,
since there is no information before the experiment. On the other hand, the random variable
X1 must be of the form
\[ X_1(\omega) = \begin{cases} b & \text{if } \omega \in \{HH, HT\} \\ c & \text{if } \omega \in \{TH, TT\} \end{cases} \]
since the only information known at time 1 is whether or not the first coin came up heads.
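Finally, X2 can be any function on Ω, that is, of the form
\[ X_2(\omega) = \begin{cases} d & \text{if } \omega = HH \\ e & \text{if } \omega = HT \\ f & \text{if } \omega = TH \\ g & \text{if } \omega = TT. \end{cases} \]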
Alternatively, on this particular filtered probability space, the adapted process X can be
visualised by the tree diagram:
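[Tree diagram: the root a branches with probability 1/2 each to b and c; the node b branches with probability 1/2 each to d and e; the node c branches with probability 1/2 each to f and g.]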
Notice that for all t ∈ {0, 1, 2} the event {Xt ≤ x} is in Ft for every real x.
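On a finite sample space, measurability with respect to a sigma-field generated by a partition simply means being constant on each block of the partition. The following minimal Python sketch checks adaptedness in this way for the two-toss example. The names FILTRATION, is_measurable and is_adapted, and the two sample processes, are hypothetical and introduced only for illustration.

```python
OMEGA = ["HH", "HT", "TH", "TT"]

# Each sigma-field of the filtration is described by the partition generating it.
FILTRATION = {
    0: [{"HH", "HT", "TH", "TT"}],
    1: [{"HH", "HT"}, {"TH", "TT"}],
    2: [{"HH"}, {"HT"}, {"TH"}, {"TT"}],
}

def is_measurable(X, partition):
    """X (a dict outcome -> value) is measurable iff it is constant on each block."""
    return all(len({X[w] for w in block}) == 1 for block in partition)

def is_adapted(process):
    """process: dict t -> random variable X_t (itself a dict outcome -> value)."""
    return all(is_measurable(X_t, FILTRATION[t]) for t, X_t in process.items())

# Number of heads seen after t tosses: adapted.
heads_so_far = {
    0: {w: 0 for w in OMEGA},
    1: {w: w[:1].count("H") for w in OMEGA},
    2: {w: w.count("H") for w in OMEGA},
}
# A process that reveals the second toss already at time 1: not adapted.
peeks_ahead = {1: {w: w[1] == "H" for w in OMEGA}}
```

The second process fails the check because its time-1 value differs between HH and HT, which lie in the same block of the partition generating F1.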
For this course, it will be convenient to assume that there is no randomness at time 0.
This can be made formal by assuming
the sigma-field F0 is trivial.
This means that if A is an element of F0 then either P(A) = 0 or P(A) = 1. In particular,
every F0-measurable random variable is almost surely constant. In the discrete-time theory,
nothing is lost by further assuming F0 = {∅, Ω}. However, it turns out that this further
assumption is technically inconvenient in the continuous-time theory.
where u is a function on [0, ∞). We will suppose that our investor prefers a consumption
stream c to c′ if and only if
U(c) > U(c′).
We will assume that u is strictly increasing, which models the assumption that the investor strictly
prefers more to less. (Usually we also assume that u is strictly concave, so that the investor
is risk-averse, strictly preferring to consume the non-random quantity E(C) to the random
quantity C, for any non-constant random variable C.)
We suppose that the investor’s initial wealth x ≥ 0 is given. We also suppose that he will
live exactly to age T , and since he derives no utility from wealth in the afterlife, chooses a
strategy H such that HT +1 = 0 a.s. Summing up, the investor faces the problem
maximise U (c) subject to H0 · P0 = x, (Ht − Ht+1 ) · Pt = ct , and HT +1 = 0.
With this problem in mind, we introduce an important definition:
Definition. An arbitrage is an investment-consumption strategy H such that there
exists a non-random time T > 0 with the properties
• H0 = 0 = HT +1 almost surely and
• P (ct > 0 for some 0 ≤ t ≤ T ) > 0.
where (Ht − Ht+1 ) · Pt = ct .
Note that if $H^f$ is a feasible investment strategy for the above investment problem and
if $H^a$ is an arbitrage, then $H^f + H^a$ is also feasible but has strictly higher expected utility
\[ U(c^f + c^a) > U(c^f). \]
Inductively, the strategy $H^f + kH^a$ is feasible for every k ≥ 0. In particular, if there is an
arbitrage then there cannot be an optimal investment strategy to the utility maximisation
problem.
Remark. There are several problems with market models with arbitrages. First, arbi-
trages are scalable: if H is an arbitrage, so is kH for all k > 0. In particular, if there is an
arbitrage, we can extract an arbitrary amount of consumption out of the market by choosing
k as large as we please. However, we have assumed that the investor is small relative to the
market (remember we have agreed to ignore the investor’s price impact). Clearly, for
large enough k the investor is no longer small relative to the order book, and price impact
becomes important.
There is also a more fundamental objection to admitting arbitrage. In this course, we
take the market price process (Pt )t≥0 as given. In reality, however, prices are set by market
clearing, so that supply equals demand. Typically we think that the supply of shares is
fixed, but the demand for shares arises from investors solving their own utility maximisation
problem. In particular, in equilibrium, prices are such that investors are holding their optimal
portfolio. But as mentioned above, if there is an arbitrage, optimal portfolios do not exist.
Hence, the notion of equilibrium is inconsistent with the existence of arbitrage strategies.
3.1. Motivation: Lagrangian duality. As usual in a constrained optimisation problem,
we apply the Lagrangian method. Recall that this involves replacing our given objective
function with the so-called Lagrangian which encodes the constraints on the processes H and
c. In this case the Lagrangian is
\[ L(H, c, Y) = \mathbb{E}\left[ \sum_{t=0}^{T} \Big( u(c_t) + Y_t (H_t \cdot P_t - H_{t+1} \cdot P_t - c_t) \Big) \right]. \]
To identify the dual problem, we seek to find conditions on the Lagrange multiplier process
Y such that the quantity
sup{L(H, c, Y ) : ct ≥ 0, H predictable }
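is finite. To this end, we employ the standard trick of linear programming: we rewrite the
Lagrangian as
\[ L(H, c, Y) = \mathbb{E}\left[ \sum_{t=0}^{T} \big( u(c_t) - Y_t c_t \big) \right] + \mathbb{E}\left[ \sum_{t=1}^{T} H_t \cdot (P_t Y_t - P_{t-1} Y_{t-1}) \right] + x Y_0, \]
where we have used the constraints $H_0 \cdot P_0 = x$ and $H_{T+1} = 0$.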
Now, looking at the first term, we see that if Yt ≤ 0, there would not exist a finite maximum
when we maximise over ct ≥ 0. So we see that the dual variable Y must satisfy
Yt > 0 almost surely for all t ≥ 0
Look at the second term: since Ht is an arbitrary Ft−1 -measurable random vector, the
requirement of a finite maximum leads us to
E(Pt Yt |Ft−1 ) = Pt−1 Yt−1 .
The notation E(X|G) denotes the conditional expectation of the random variable X with
respect to the sigma-field G. The precise definition will be recalled below.
Note that there is nothing rigorous to this argument. The intention of this section is just
to show that the definition of a martingale deflator, which we will present now, follows naturally
from the utility maximisation problem.
Definition. A martingale deflator is a strictly positive adapted process Y = (Yt )t≥0
such that the n-dimensional random variable Yt Pt is integrable for each t ≥ 0 and such that
E(Yt Pt |Ft−1 ) = Yt−1 Pt−1
for all t ≥ 1.
*****
We briefly recall some notions from probability.
Definition. Given a probability space (Ω, F, P), let G ⊆ F be a sub-sigma-field of
events. A random variable X : Ω → R is measurable with respect to G (or briefly, G-measurable)
if and only if the event {X ≤ x} is an element of G for all x ∈ R.
You already know that the conditional expectation of an integrable random variable X
given a non-null event G is
\[ \mathbb{E}(X|G) = \frac{\mathbb{E}(X 1_G)}{\mathbb{P}(G)}. \]
The next theorem leads to a definition of conditional expectation given a sigma-field:
Theorem (Existence and uniqueness of conditional expectations). Let X be an integrable
random variable defined on the probability space (Ω, F, P), and let G ⊆ F be a sub-sigma-field
of F. Then there exists an integrable G-measurable random variable Y such that
E(1G Y ) = E(1G X)
for all G ∈ G. Furthermore, if there exists another G-measurable random variable Y′ such
that E(1G Y′) = E(1G X) for all G ∈ G, then Y = Y′ almost surely.
Definition. Let X be an integrable random variable and let G ⊂ F be a sigma-field.
The conditional expectation of X given G, written E(X|G), is a G-measurable random variable
with the property that
E [1G E(X|G)] = E(1G X)
for all G ∈ G.
Example. (Sigma-field generated by a countable partition) Let X be a non-negative
random variable defined on (Ω, F, P). Let G1, G2, . . . be a sequence of disjoint events with
P(Gn) > 0 for all n and $\bigcup_{n \in \mathbb{N}} G_n = \Omega$.
Let G be the smallest sigma-field containing {G1, G2, . . .}. That is, every element of
G is of the form $\bigcup_{n \in I} G_n$ where I ⊆ N. Then
\[ \mathbb{E}(X|\mathcal{G})(\omega) = \mathbb{E}(X|G_n) = \frac{\mathbb{E}(X 1_{G_n})}{\mathbb{P}(G_n)} \quad \text{if } \omega \in G_n, \]
where the right-hand side denotes conditional expectation given the event Gn.
More concretely, suppose Ω = {HH, HT, T H, T T } consists of two tosses of a coin, and
let G = {∅, {HH, HT }, {T H, T T }, Ω} be the sigma-field containing the information revealed
by the first toss. Suppose the coin is fair, so that each outcome is equally likely. Consider
the random variable
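\[ X(\omega) = \begin{cases} a & \text{if } \omega = HH \\ b & \text{if } \omega = HT \\ c & \text{if } \omega = TH \\ d & \text{if } \omega = TT. \end{cases} \]
Then
\[ \mathbb{E}(X|\mathcal{G})(\omega) = \begin{cases} (a + b)/2 & \text{if } \omega \in \{HH, HT\} \\ (c + d)/2 & \text{if } \omega \in \{TH, TT\} \end{cases} \]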
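The partition formula can be checked numerically. Below is a minimal Python sketch for a finite sample space; the function name conditional_expectation and the particular values chosen for a, b, c, d are assumptions made for the example.

```python
from fractions import Fraction

def conditional_expectation(X, prob, partition):
    """Return E(X|G) as a dict outcome -> value, where G is generated by `partition`.

    On each block G_n the value is E(X 1_{G_n}) / P(G_n).
    """
    result = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        mean = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            result[w] = mean
    return result

prob = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}   # fair coin
X = {"HH": 4, "HT": 2, "TH": 1, "TT": -3}                      # a, b, c, d chosen arbitrarily
first_toss = [{"HH", "HT"}, {"TH", "TT"}]
cond = conditional_expectation(X, prob, first_toss)
```

Here cond takes the value (4 + 2)/2 = 3 on {HH, HT} and (1 − 3)/2 = −1 on {TH, TT}. Exact rational arithmetic is used so that the defining property E(1G E(X|G)) = E(1G X) can be verified without rounding.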
Example. We now construct one of the most important examples of a martingale. Let
X be an integrable random variable, and let
Mt = E(X|Ft ).
Then M = (Mt )t≥0 is a martingale.
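Integrability follows from the theorem on the existence and uniqueness of conditional
expectation. Indeed, note that by the conditional Jensen inequality
\[ \mathbb{E}(|M_t|) = \mathbb{E}(|\mathbb{E}(X|\mathcal{F}_t)|) \le \mathbb{E}(\mathbb{E}(|X| \,|\, \mathcal{F}_t)) = \mathbb{E}(|X|) < \infty. \]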
Now, for every 0 ≤ s ≤ t we have
E(Mt |Fs ) = E[E(X|Ft )|Fs ]
= E(X|Fs ) = Ms
by the tower property. Notice that this example also works in continuous time.
Sometimes we are given a process (Mt )0≤t≤T where T > 0 is a fixed, non-random time
horizon. To check that this process is a martingale, we need only check that
Mt = E(MT |Ft ) for all 0 ≤ t ≤ T,
because this corresponds to the construction above with X = MT .
Example. This last example shows how to take one martingale and build
another one. Let M be a martingale and let K be a bounded predictable process. Then the
process N defined by
\[ N_t = \sum_{s=1}^{t} K_s (M_s - M_{s-1}) \]
is a martingale.
Indeed, by assumption, we have E(|Mt|) < ∞ for all t since M is a martingale, and
there exists a constant C > 0 such that |Kt| ≤ C almost surely for all t ≥ 0. Hence
\[ \mathbb{E}(|N_t|) \le \sum_{s=1}^{t} \mathbb{E}(|K_s| |M_s - M_{s-1}|) \le \sum_{s=1}^{t} C[\mathbb{E}(|M_s|) + \mathbb{E}(|M_{s-1}|)] < \infty. \]
Using the predictability of K and the slot property of conditional expectation, we have
E(Nt+1 − Nt |Ft ) = E(Kt+1 (Mt+1 − Mt )|Ft )
= Kt+1 E(Mt+1 − Mt |Ft )
=0
and we’re done.
Remark. The martingale N above is often called a martingale transform or a discrete
time stochastic integral. As we will see, it is one of the key building blocks for the continuous
time theory to come.
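As a sanity check, the martingale transform can be computed exactly when M is a simple symmetric random walk. The Python sketch below enumerates all paths up to a short horizon and verifies that every conditional increment vanishes. The betting rule K and the helper names transform and conditional_increment are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def transform(K, steps):
    """N_t = sum_{s=1}^t K_s (M_s - M_{s-1}) along one path.

    steps: tuple of increments M_s - M_{s-1};
    K: function of the past increments steps[:s-1], hence predictable.
    """
    return sum(K(steps[:s - 1]) * steps[s - 1] for s in range(1, len(steps) + 1))

def K(past):
    # Hypothetical bounded rule: bet 2 units after a loss, 1 unit otherwise.
    return 2 if past and past[-1] == -1 else 1

def conditional_increment(K, prefix):
    """E(N_{t+1} - N_t | F_t) on the event that the first t steps equal `prefix`."""
    half = Fraction(1, 2)
    return sum(half * (transform(K, prefix + (x,)) - transform(K, prefix))
               for x in (1, -1))

T = 4
all_zero = all(conditional_increment(K, prefix) == 0
               for t in range(T) for prefix in product((1, -1), repeat=t))
```

The check succeeds because K at time t + 1 depends only on the first t steps, so it factors out of the conditional expectation exactly as in the proof.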
Example. Obviously, non-random times are stopping times. That is, if τ = t0 for some
fixed t0 ≥ 0, then {τ ≤ t} = Ω if t0 ≤ t and ∅ otherwise.
Example. Here is a typical example of a stopping time. Let (Yt )t≥0 be a discrete-time
adapted process and let A be a Borel set. Then the random variable
τ = inf{t ≥ 0 : Yt ∈ A}
(with the usual convention that inf ∅ = +∞) corresponding to the first time the process
enters the set A is a stopping time. Indeed,
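\[ \{\tau \le t\} = \bigcup_{s=0}^{t} \{Y_s \in A\}, \]
and each event $\{Y_s \in A\}$ with s ≤ t belongs to $\mathcal{F}_t$ because Y is adapted.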
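A first hitting time is easy to compute along an observed path, and the identity for {τ ≤ t} can be checked directly. The following is a minimal Python sketch; the function names, the sample path and the set A are assumptions made for the example, and on a finite path the value +∞ only means that A was not hit within the observed horizon.

```python
import math

def first_hitting_time(path, in_A):
    """tau = inf{t >= 0 : Y_t in A}, with inf of the empty set equal to +infinity."""
    for t, y in enumerate(path):
        if in_A(y):
            return t
    return math.inf

def hit_by(path, in_A, t):
    """Indicator of {tau <= t}, computed from Y_0, ..., Y_t only."""
    return any(in_A(y) for y in path[: t + 1])

path = [0, 1, 2, 1, 3]           # one hypothetical observed path of Y
in_A = lambda y: y >= 2          # A = [2, infinity)
tau = first_hitting_time(path, in_A)
```

The function hit_by looks only at the path up to time t, which is the computational counterpart of the statement that {τ ≤ t} is known at time t.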
Remark. Note that the local martingale property is also stable under stopping. Indeed,
let X be a local martingale and τ a stopping time. Then by definition, there exists a sequence
of stopping times $\sigma_N \uparrow \infty$ such that $X^{\sigma_N}$ is a martingale. Hence $(X^{\sigma_N})^{\tau} = X^{\sigma_N \wedge \tau}$ is again a
martingale since $\sigma_N \wedge \tau$ is a stopping time. But note that $X^{\sigma_N \wedge \tau} = (X^{\tau})^{\sigma_N}$, implying that
the sequence of stopping times $\sigma_N \uparrow \infty$ is such that $(X^{\tau})^{\sigma_N}$ is a martingale. This means $X^{\tau}$
is a local martingale.
Remark. This is the martingale transform as before, but now we do not insist that K is
bounded or that X is a true martingale. As a consequence, we cannot assert that Y is a
true martingale, merely a local martingale. The idea is that by localising, we can study the
algebraic and measurability structure of the martingale transform without worrying about
integrability issues.
Proof. Let τN = inf{t ≥ 0 : |Kt+1 | > N } with the convention inf ∅ = +∞. Note that
τN is a stopping time since K is predictable. Now writing
\[ Y_t^{\tau_N} = \sum_{s=1}^{t} K_s 1_{\{s \le \tau_N\}} (X_s - X_{s-1}) \]
we see that the stopped process is the martingale transform of the bounded predictable
process (Kt 1{t≤τN } )t≥1 with respect to the martingale X, and hence is a martingale.
The next theorem gives a sufficient condition that a local martingale is a true martingale.
Proof. Let (τN )N be a localising sequence of stopping times for X. Note that Xt∧τN →
Xt a.s. since τN ↑ ∞. Furthermore, by assumption |Xt∧τN | ≤ Yt which is integrable, so we
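may apply the conditional version of the dominated convergence theorem to conclude
\[ \begin{aligned} \mathbb{E}(X_t|\mathcal{F}_s) &= \mathbb{E}(\lim_N X_{t \wedge \tau_N}|\mathcal{F}_s) \\ &= \lim_N \mathbb{E}(X_{t \wedge \tau_N}|\mathcal{F}_s) \\ &= \lim_N X_{s \wedge \tau_N} \\ &= X_s \end{aligned} \]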
for 0 ≤ s ≤ t, where we have used the fact that the stopped process (Xt∧τN )t≥0 is a martingale.
The following corollary is useful:
Corollary. Suppose X is a DISCRETE-TIME local martingale such that E(|Xt |) < ∞
for all t ≥ 0. Then X is a true martingale.
Proof. Let Yt = |X0 | + . . . + |Xt |. The process Y is integrable by assumption and
|Xs | ≤ Yt for all 0 ≤ s ≤ t. The conclusion follows from the previous theorem.
In the absence of integrability, the next best property is non-negativity. First we need
some definitions.
Definition. A supermartingale relative to a filtration (Ft )t≥0 is an adapted stochastic
process (Ut )t≥0 with the following properties:
• E(|Ut |) < ∞ for all t ≥ 0
• E(Ut |Fs ) ≤ Us for all 0 ≤ s ≤ t.
A submartingale is an adapted process (Vt )t≥0 with the following properties:
• E(|Vt |) < ∞ for all t ≥ 0
• E(Vt |Fs ) ≥ Vs for all 0 ≤ s ≤ t.
Remark. Hence a supermartingale decreases on average, while a submartingale increases
on average. A martingale is a stochastic process that is both a supermartingale and a
submartingale.
As in the case of the definition of martingale, to show that an adapted, integrable process
U is a supermartingale in discrete time, it is enough to show that E(Ut+1 |Ft ) ≤ Ut for all
t ≥ 0.
Theorem. Suppose X is a local martingale in either continuous or discrete time. If
Xt ≥ 0 for all t ≥ 0, then X is a supermartingale.
Proof. In the general case, let (τN )N be the localising sequence for X. First we show
that Xt is integrable for each t ≥ 0. Fatou’s lemma yields
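\[ \begin{aligned} \mathbb{E}(|X_t|) &= \mathbb{E}(X_t) \\ &= \mathbb{E}(\lim_N X_{t \wedge \tau_N}) \\ &\le \liminf_N \mathbb{E}(X_{t \wedge \tau_N}) \\ &= X_0 < \infty. \end{aligned} \]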
Now that we have established integrability, we can discuss conditional expectations. The
conditional version of Fatou’s lemma yields
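\[ \begin{aligned} \mathbb{E}(X_t|\mathcal{F}_s) &= \mathbb{E}(\lim_N X_{t \wedge \tau_N}|\mathcal{F}_s) \\ &\le \liminf_N \mathbb{E}(X_{t \wedge \tau_N}|\mathcal{F}_s) \\ &= \liminf_N X_{s \wedge \tau_N} \\ &= X_s \end{aligned} \]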
for 0 ≤ s ≤ t, as claimed.
*****
The final step of the proof of the easier direction of the first fundamental theorem of
asset pricing is now complete.
5. Proof of the harder direction of 1FTAP
This is the one period case. The full multi-period proof is a little more difficult because
of some technicalities involving measurability.
Recall that an arbitrage with T = 1 is a process (Ht )0≤t≤2 such that H0 = H2 = 0 and
P(c0 > 0 or c1 > 0) > 0, where
c0 = −H1 · P0 ≥ 0 and c1 = H1 · P1 ≥ 0 a.s.
We now suppose that the market has no arbitrage, so that for any vector H ∈ Rn such that
H · P0 ≤ 0 ≤ H · P1 a.s. it must be the case that H · P0 = 0 = H · P1 a.s. We will show that
this implies there exists a random variable Z > 0 a.s. so that
E(P1 Z) = P0 .
Then the process (Yt )0≤t≤1 is a martingale deflator, where Y0 = 1 and Y1 = Z.
Define a function F : Rn → R by
\[ F(h) = e^{h \cdot P_0} + \mathbb{E}[e^{-h \cdot P_1} \zeta] \]
where the random variable $0 < \zeta \le e^{-\|P_1\|^2/2}$ is introduced to ensure integrability. Notice
that F is finite valued and smooth.
We will show that no investment-consumption arbitrage implies that the function F has
a minimiser $H^*$. By the first order condition for a minimum, we have
\[ 0 = \nabla F(H^*) = e^{H^* \cdot P_0} P_0 - \mathbb{E}[e^{-H^* \cdot P_1} \zeta P_1] \]
and hence we may take
\[ Z = e^{-H^* \cdot P_0 - H^* \cdot P_1} \zeta. \]
So let $(H_k)_k$ be a sequence such that $F(H_k) \to \inf_H F(H)$. If $(H_k)_k$ is bounded, we can
pass to a convergent subsequence, by the Bolzano–Weierstrass theorem, such that $H_k \to H^*$.
By the smoothness of F we have
\[ \inf_H F(H) = \lim_k F(H_k) = F(\lim_k H_k) = F(H^*) \]
so $H^*$ is our desired minimiser.
It remains to show that no arbitrage implies that the sequence (Hk )k is bounded. So for
the sake of finding a contradiction, suppose (Hk )k is unbounded.
We can pass to a subsequence such that $\|H_k\| \uparrow \infty$. Now let
\[ U = \{u \in \mathbb{R}^n : u \cdot P_0 = 0 = u \cdot P_1 \text{ a.s.}\} \subseteq \mathbb{R}^n \]
and let
\[ V = U^{\perp}. \]
Notice that if u ∈ U and v ∈ V then F(u + v) = F(v). Hence, we may assume $H_k \in V$ for
all k.
Now let
\[ \hat{H}_k = \frac{H_k}{\|H_k\|}. \]
Note that $\|\hat{H}_k\| = 1$ and that $\hat{H}_k \in V$. Since $(\hat{H}_k)_k$ is bounded, we can again pass to a
convergent subsequence such that $\hat{H}_k \to \hat{H}$. Notice once more that $\|\hat{H}\| = 1$ and that
$\hat{H} \in V$.
We know that the sequence $F(H_k)$ is bounded (since it is convergent) but we also have
\[ F(H_k) = (e^{\hat{H}_k \cdot P_0})^{\|H_k\|} + \mathbb{E}[(e^{-\hat{H}_k \cdot P_1})^{\|H_k\|} \zeta] \]
so we must conclude that $\hat{H} \cdot P_0 \le 0 \le \hat{H} \cdot P_1$ a.s. (since otherwise the right-hand side would
blow up).
By the assumption of no arbitrage we conclude that $\hat{H} \cdot P_0 = 0 = \hat{H} \cdot P_1$ a.s., which
means $\hat{H} \in U$. But we also know that $\hat{H} \in V$. Since the subspaces are orthogonal, we
have U ∩ V = {0}, and in particular, we have $\hat{H} = 0$. But this contradicts the fact that
$\|\hat{H}\| = 1$.
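The construction in the proof can be tried out numerically on a toy model. The Python sketch below assumes a hypothetical one-period market with two scenarios (cash plus one stock), takes ζ = exp(−‖P1‖²/2), and minimises F with a damped Newton iteration. The choice of Newton's method and all numbers are assumptions made for illustration; the proof itself only uses the existence of a minimiser.

```python
import numpy as np

# Hypothetical one-period market: asset 1 is cash, asset 2 is a stock.
P0 = np.array([1.0, 1.0])                    # prices at time 0
P1 = np.array([[1.0, 2.0], [1.0, 0.5]])      # prices at time 1, one row per scenario
probs = np.array([0.5, 0.5])                 # scenario probabilities under P
zeta = np.exp(-0.5 * np.sum(P1**2, axis=1))  # zeta = exp(-|P1|^2 / 2)

def F(h):
    return np.exp(h @ P0) + probs @ (np.exp(-P1 @ h) * zeta)

def minimise(h, tol=1e-10, max_iter=100):
    """Damped Newton iteration for the smooth convex function F."""
    for _ in range(max_iter):
        w = probs * np.exp(-P1 @ h) * zeta            # scenario weights
        grad = np.exp(h @ P0) * P0 - P1.T @ w
        if np.linalg.norm(grad) < tol:
            break
        hess = np.exp(h @ P0) * np.outer(P0, P0) + P1.T @ (w[:, None] * P1)
        step = np.linalg.solve(hess, grad)
        s = 1.0
        while F(h - s * step) > F(h) and s > 1e-8:    # halve until F does not increase
            s *= 0.5
        h = h - s * step
    return h

H_star = minimise(np.zeros(2))
Z = np.exp(-H_star @ P0 - P1 @ H_star) * zeta         # Z from the first order condition
deflated = P1.T @ (probs * Z)                         # E(P1 Z), should be close to P0
```

In this toy market the stock price 1 lies strictly between its two possible future values, so there is no arbitrage and the minimiser exists. The resulting Z is strictly positive and satisfies E(P1 Z) = P0 up to numerical tolerance.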
6. Numéraires and equivalent martingale measures
In this section, we introduce the concepts of numéraire assets and equivalent martingale
measures. The primary purpose of this section is to reconcile concepts and terminology used
by other authors to the theory developed so far. We will also find that equivalent martingale
measures can be used to simplify some calculations later in the course.
In most discussions of arbitrage theory, there is the assumption that at least one asset is
a numéraire:
Definition. An asset is a numéraire iff its price is strictly positive for all time, almost
surely.
Having a numéraire in the market simplifies the story in some ways. For instance, when
we discuss arbitrage theory, we no longer have to allow for intermediate consumption.
Definition. A pure investment arbitrage is a predictable process H such that for some
non-random time horizon T > 0 we have
• H0 · P0 = 0 ≤ HT · PT a.s.
• P(HT · PT > 0) > 0
where H satisfies the self-financing condition
(Ht − Ht+1 ) · Pt = 0 for all 0 ≤ t ≤ T − 1.
Proposition. Suppose the market model has a numéraire asset. There exists a pure-
investment arbitrage if and only if there exists an investment-consumption arbitrage.
Proof. First let (Kt )0≤t≤T be a pure-investment arbitrage for the time horizon T > 0.
By setting KT +1 = 0 and cT = KT · PT , we have an investment-consumption arbitrage
(Kt )0≤t≤T +1 .
So, suppose (Ht )0≤t≤T +1 is an investment-consumption arbitrage. We can order the assets
such that P = (N, S) where the first asset is the numéraire with positive price process N
and S is the n − 1 dimensional process of the remaining asset prices. Let η = (1, 0, . . . , 0) so
that N = η · P .
To find the pure-investment arbitrage, the idea is to let K be the strategy that consists of
holding at time t the portfolio Ht but instead of consuming the amount ct = (Ht − Ht+1 ) · Pt ,
this money instead is invested into the numéraire portfolio. In notation, K is defined by
\[ K_t = H_t + \eta \sum_{s=0}^{t-1} \frac{c_s}{N_s}. \]
Note that
\[ (K_t - K_{t+1}) \cdot P_t = (H_t - H_{t+1}) \cdot P_t - \frac{c_t}{N_t} \, \eta \cdot P_t = 0 \]
so K is a pure investment strategy. Finally, note that since HT +1 = 0, then
\[ K_T \cdot P_T = K_{T+1} \cdot P_T = N_T \sum_{s=0}^{T} \frac{c_s}{N_s} \ge 0. \]
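The construction of K from H can be illustrated along a single price path. The following Python sketch uses hypothetical numbers with T = 2, cash as the numéraire and one stock; the function name pure_investment_strategy is an assumption made for the example.

```python
import numpy as np

def pure_investment_strategy(H, P):
    """Build K_t = H_t + eta * sum_{s<t} c_s / N_s along one price path.

    H: array of shape (T+2, n) with rows H_0, ..., H_{T+1};
    P: array of shape (T+1, n) with rows P_0, ..., P_T; column 0 is the numeraire N.
    Returns (K, c), where c_t = (H_t - H_{t+1}) . P_t.
    """
    c = np.sum((H[:-1] - H[1:]) * P, axis=1)
    invested = np.concatenate([[0.0], np.cumsum(c / P[:, 0])])  # sum_{s<t} c_s / N_s
    K = H.astype(float).copy()
    K[:, 0] += invested          # reinvest past consumption in the numeraire
    return K, c

# Hypothetical path with T = 2: cash with N_t = 1 and one stock.
P = np.array([[1.0, 10.0], [1.0, 12.0], [1.0, 9.0]])
H = np.array([[0.0, 0.0], [-10.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
K, c = pure_investment_strategy(H, P)
self_financing_gaps = np.sum((K[:-1] - K[1:]) * P, axis=1)
```

In this example the consumption stream is c = (0, 1, 1), every self-financing gap of K vanishes, and the terminal wealth K_T · P_T equals the accumulated consumption measured in units of the numéraire.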
*****
Now let’s return to our financial model.
Definition. Let P be a market model defined on a probability space (Ω, F, P). The
measure P is called the objective (or historical or statistical ) measure for the model.
Suppose that we can write our asset price process as P = (N, S) where N is a positive
adapted process (the price of a numéraire) and S is an adapted d dimensional process.
An equivalent martingale measure relative to this numéraire is any probability measure Q
equivalent to P such that the discounted price process
\[ \left( \frac{S_t}{N_t} \right)_{t \ge 0} \]
is a martingale under Q.
Remark. In many accounts of arbitrage theory, the concept of an equivalent martingale
measure has taken centre stage. I believe that its importance has been overstressed. In
particular, it is a numéraire-dependent concept, unlike that of a martingale deflator. For
instance, if there are two assets that are both numéraires (for example from the point of view
of a British trader, both the euro and the US dollar are numéraires) then one must be very
careful to specify which one is the numéraire.
Theorem (First Fundamental Theorem of Asset Pricing when there is a numéraire). The
market model (Pt )0≤t≤T has no arbitrage if and only if there exists an equivalent martingale
measure relative to a fixed numéraire.
Proof. We already know that there is no arbitrage if and only if there exists a martin-
gale deflator. We now show that there is essentially a one-to-one correspondence between
martingale deflators and equivalent martingale measures once a finite horizon T > 0 is
specified.
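Let Y be a process such that $Y_T > 0$ $\mathbb{P}$-a.s. and such that $Y_T P_T$ is $\mathbb{P}$-integrable.
Define a new measure $\mathbb{Q}$ by the density
\[ \frac{d\mathbb{Q}}{d\mathbb{P}} = \frac{Y_T N_T}{\mathbb{E}^{\mathbb{P}}(Y_T N_T)}. \]
Our analysis turns on the Bayes formula
\[ \mathbb{E}^{\mathbb{Q}}\left( \frac{P_T}{N_T} \Big| \mathcal{F}_t \right) = \frac{\mathbb{E}^{\mathbb{P}}(P_T Y_T|\mathcal{F}_t)}{\mathbb{E}^{\mathbb{P}}(N_T Y_T|\mathcal{F}_t)}. \]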
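Suppose Y is a martingale deflator. In this case
\[ \mathbb{E}^{\mathbb{P}}(P_T Y_T|\mathcal{F}_t) = P_t Y_t \]
and in particular
\[ \mathbb{E}^{\mathbb{P}}(N_T Y_T|\mathcal{F}_t) = N_t Y_t. \]
By the Bayes formula we have
\[ \mathbb{E}^{\mathbb{Q}}\left( \frac{P_T}{N_T} \Big| \mathcal{F}_t \right) = \frac{P_t}{N_t} \]
and hence P/N is a Q-martingale, i.e. Q is an equivalent martingale measure.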
Conversely, suppose Q is an equivalent martingale measure. Let
\[ Z_t = \mathbb{E}^{\mathbb{P}}\left( \frac{d\mathbb{Q}}{d\mathbb{P}} \Big| \mathcal{F}_t \right). \]
Note that Z is a positive P-martingale. Let
Yt = Zt /Nt .
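Since the random variable $P_T/N_T$ is $\mathbb{Q}$-integrable by the definition of martingale, we can
conclude that $P_T Y_T$ is $\mathbb{P}$-integrable. Furthermore, the process Y is positive and satisfies
\[ \mathbb{E}^{\mathbb{P}}(N_T Y_T|\mathcal{F}_t) = \mathbb{E}^{\mathbb{P}}(Z_T|\mathcal{F}_t) = Z_t = N_t Y_t. \]
Hence by the Bayes formula
\[ \begin{aligned} \mathbb{E}^{\mathbb{P}}(P_T Y_T|\mathcal{F}_t) &= \mathbb{E}^{\mathbb{Q}}\left( \frac{P_T}{N_T} \Big| \mathcal{F}_t \right) \mathbb{E}^{\mathbb{P}}(N_T Y_T|\mathcal{F}_t) \\ &= \frac{P_t}{N_t} (N_t Y_t) \\ &= P_t Y_t \end{aligned} \]
so that PY is a $\mathbb{P}$-martingale and hence Y is a martingale deflator.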
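The correspondence between deflators and equivalent martingale measures can be checked by hand in a small model. The Python sketch below uses a hypothetical one-period market with two scenarios, cash as numéraire and one stock; all numbers are assumptions chosen so that the arithmetic is exact.

```python
from fractions import Fraction as Fr

# Hypothetical one-period model with two scenarios; N is cash, S is a stock.
p = {"up": Fr(1, 2), "down": Fr(1, 2)}       # objective measure P
N0, S0 = Fr(1), Fr(1)
N1 = {"up": Fr(1), "down": Fr(1)}
S1 = {"up": Fr(2), "down": Fr(1, 2)}
Y0 = Fr(1)
Y1 = {"up": Fr(2, 3), "down": Fr(4, 3)}      # a martingale deflator for this model

def expect(measure, X):
    return sum(measure[w] * X[w] for w in measure)

# Deflator property: E(Y1 N1) = Y0 N0 and E(Y1 S1) = Y0 S0.
deflator_ok = (expect(p, {w: Y1[w] * N1[w] for w in p}) == Y0 * N0
               and expect(p, {w: Y1[w] * S1[w] for w in p}) == Y0 * S0)

# Change of measure: dQ/dP = Y1 N1 / E(Y1 N1).
norm = expect(p, {w: Y1[w] * N1[w] for w in p})
q = {w: p[w] * Y1[w] * N1[w] / norm for w in p}
discounted_mean = expect(q, {w: S1[w] / N1[w] for w in q})   # E^Q(S1 / N1)
```

Here Q assigns probability 1/3 to the up scenario and 2/3 to the down scenario, and the Q-expectation of the discounted stock price equals S0/N0 = 1, as the theorem predicts.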
Remark. Notice that the statement of the version of the fundamental theorem above is
for a finite horizon model, as opposed to the version presented earlier. Here is an example
that shows that there might be no arbitrage but there does not exist an equivalent martingale
measure over the infinite horizon.
Let ξ1, ξ2, . . . be independent random variables with
\[ \mathbb{P}(\xi_i = 1) = p \quad \text{and} \quad \mathbb{P}(\xi_i = -1) = q = 1 - p \]
and let $S_t = \xi_1 + \ldots + \xi_t$ be a simple random walk, where we assume that it is not symmetric, that is,
p ≠ q.
Define a two-asset market model with respect to the natural filtration Ft =
σ(ξ1 , . . . , ξt ) by
Pt = (1, St ).
In particular, there is a numéraire with constant price Nt = 1, which can be interpreted as
cash.
First let us compute all martingale deflators for the model. Fix t and ξ1 , . . . , ξt and let
Zu = Yt+1 /Yt if ξt+1 = 1, and Zd = Yt+1 /Yt if ξt+1 = −1.
Since P Y is a martingale, we have
Yt Zu p + Yt Zd q = Yt
(St + 1)Yt Zu p + (St − 1)Yt Zd q = St Yt
so that Zu = 1/(2p) and Zd = 1/(2q). Hence, we have shown that all martingale deflators
satisfy
\[ Y_{t+1} = Y_t (4pq)^{-1/2} (q/p)^{\xi_{t+1}/2} \]
and hence
\[ Y_t = Y_0 (4pq)^{-t/2} (q/p)^{S_t/2}. \]
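The one-step equations and the closed form can be verified with a short computation. The Python sketch below uses a hypothetical up-probability p = 2/3; the function names one_step_ratios and deflator are assumptions made for the example.

```python
import math
from fractions import Fraction

def one_step_ratios(p):
    """Solution of p*Zu + q*Zd = 1 and (s+1)*p*Zu + (s-1)*q*Zd = s, with q = 1 - p."""
    q = 1 - p
    return 1 / (2 * p), 1 / (2 * q)

def deflator(t, s, p, y0=1.0):
    """Closed form Y_t = Y_0 (4pq)^(-t/2) (q/p)^(S_t/2), evaluated at S_t = s."""
    q = 1 - p
    return y0 * (4 * p * q) ** (-t / 2) * (q / p) ** (s / 2)

p = Fraction(2, 3)          # hypothetical up-probability, p != q
q = 1 - p
Zu, Zd = one_step_ratios(p)

# One up-step from Y_0 = 1 multiplies the deflator by Zu.
matches = math.isclose(deflator(1, 1, float(p)), float(Zu))
```

With p = 2/3 the ratios are Zu = 3/4 and Zd = 3/2, and the closed form with exponent S_t/2 reproduces the product of one-step ratios along any path.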
Now fix a horizon T > 0 and let PT be the restriction of P to FT . Let QT be the
equivalent measure on FT with density
\[ \frac{d\mathbb{Q}_T}{d\mathbb{P}_T} = Y_T / Y_0. \]
By the above discussion, QT is the equivalent martingale measure for the finite horizon
model (Pt )0≤t≤T . It is an easy computation to verify that under the measure QT , the random
variables ξ1 , . . . , ξT are independent with
\[ \mathbb{Q}_T(\xi_i = 1) = \frac{1}{2} = \mathbb{Q}_T(\xi_i = -1). \]
Let us consider the measure Q on F with the property that the random variables ξ1 , ξ2 , . . .
are independent with
\[ \mathbb{Q}(\xi_i = 1) = \frac{1}{2} = \mathbb{Q}(\xi_i = -1), \]
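so that $\mathbb{Q}_T$ is the restriction of $\mathbb{Q}$ to $\mathcal{F}_T$. Is this measure $\mathbb{Q}$ an equivalent martingale measure for the
infinite horizon model $(P_t)_{t \ge 0}$? While it is true that the price process P is a $\mathbb{Q}$-martingale, it is not true that
the measures $\mathbb{P}$ and $\mathbb{Q}$ are equivalent. Indeed, by the strong law of large numbers,
\[ \mathbb{P}\left( \frac{S_t}{t} \to p - q \right) = 1, \quad \text{but} \quad \mathbb{Q}\left( \frac{S_t}{t} \to 0 \right) = 1. \]
Since we have assumed p ≠ q, we see that these measures are inequivalent! Indeed, note that
\[ \mathbb{P}(Y_t \to 0) = 1, \quad \text{but} \quad \mathbb{Q}(Y_t \to \infty) = 1. \]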
As a parting shot, we introduce some definitions which are used in the financial mathe-
matics literature.
Definition. An asset in a discrete-time market model is risk-free (or riskless) if its price
process is predictable.
Definition. An equivalent martingale measure with respect to a risk-free numéraire is
called a risk-neutral measure.
CHAPTER 2
A contingent claim is any cash payment where the size of the payment is contingent on
the prices of other assets or any other variable (for instance, the weather). There are two
major types of contingent claims that we will study in these notes: European and American.
European: specified by a time horizon T > 0 and FT -measurable random variable
ξT modelling the payout at the maturity date T .
American: specified by a time horizon T > 0 and an adapted process (ξt )0≤t≤T where
ξt models the payout of the claim if the owner of the claim chooses to exercise at
time t.
Example (Call option). A European call option gives the owner of the option the right,
but not the obligation, to buy a given stock at some fixed time T at some fixed price K,
called the strike of the option. Let ST denote the price of the stock at the maturity date
T . There are two cases: If K ≥ ST , then the option is worthless to the owner since there
is no point paying a price above the market price for the underlying stock. On the other
hand, if K < ST , then the owner of the option can buy the stock for the price K from the
counterparty and immediately sell the stock for the price ST to the market, realising a profit
of ST − K. Hence, the payout of the call option is ξT = (ST − K)+ , where a+ = max{a, 0}
as usual. The ‘hockey-stick’ graph of the function g(x) = (x − K)+ is below.
An American call option gives the owner of the option the right, but not the obligation,
to buy a given stock at any time t ∈ [0, T ] at some fixed strike price K. By the argument
above, the payout of the call option exercised at time t is given by ξt = (St − K)+ .
where
V = {u ∈ Rn : u · P0 = 0 = u · P1 a.s.}⊥ .
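By the first order condition for a minimum $\nabla F_\gamma(H_\gamma) = 0$, we see that by setting
\[ Y_0^\gamma = e^{\gamma(H_\gamma \cdot P_0 - \xi_0)} \quad \text{and} \quad Y_1^\gamma = e^{\gamma(\xi_1 - H_\gamma \cdot P_1)} \zeta \]
we have found a martingale deflator. Note that
\[ \frac{\partial}{\partial \gamma} F_\gamma(h) \Big|_{h = H_\gamma} = Y_0^\gamma (H_\gamma \cdot P_0 - \xi_0) + \mathbb{E}[Y_1^\gamma (\xi_1 - H_\gamma \cdot P_1)] \]
\[ = H_\gamma \cdot (Y_0^\gamma P_0 - \mathbb{E}[Y_1^\gamma P_1]) + \mathbb{E}[Y_1^\gamma \xi_1] - Y_0^\gamma \xi_0 \]
\[ \le 0 \]
since $Y^\gamma$ is a martingale deflator and, by assumption, $Y^\gamma \xi$ is a supermartingale.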
Also note that $\gamma \mapsto H_\gamma$ is differentiable. (Indeed, recall that $H_\gamma$ is defined as the root
of the function $\nabla F_\gamma : V \to V$, and $D^2 F_\gamma$ is a strictly positive definite operator on $V$, so the
differentiability of $H_\gamma$ follows from the implicit function theorem.) Furthermore,
\[ F_\gamma(H_\gamma) \le F_\gamma(H_{\gamma \pm \varepsilon}) \]
since $H_\gamma$ is the minimiser of $F_\gamma$ and hence
\[ \frac{\partial}{\partial \gamma} F_g(H_\gamma) \Big|_{g = \gamma} = 0. \]
Putting this together implies $\gamma \mapsto F_\gamma(H_\gamma)$ is nonincreasing, and in particular
\[ \sup_{\gamma \ge 1} F_\gamma(H_\gamma) < \infty. \]
Now we consider the sequence $(H_k)_k$ where the risk-aversion parameter takes the values
$\gamma = k \in \mathbb{N}$.
If $(H_k)_k$ is bounded, then we can find a convergent subsequence such that $H_k \to H^*$.
Note that since
\[ F_k(H_k) = (e^{H_k \cdot P_0 - \xi_0})^k + \mathbb{E}[(e^{\xi_1 - H_k \cdot P_1})^k \zeta] \]
we have by the boundedness of the sequence $(F_k(H_k))_k$ that $\xi_0 \ge H^* \cdot P_0$ and $\xi_1 \le H^* \cdot P_1$ a.s.
So it remains to rule out the case that the sequence $(H_k)_k$ is unbounded. Suppose that
it were unbounded. Then we can pass to a subsequence such that $\|H_k\| \uparrow \infty$. Again, let
\[ \hat{H}_k = \frac{H_k}{\|H_k\|} \]
and pass to a subsequence such that $\hat{H}_k \to \hat{H}$. Note that we have that $\hat{H} \in V$ and that
$\|\hat{H}\| = 1$. But by the formula
\[ F_k(H_k) = \left( e^{\hat{H}_k \cdot P_0 - \xi_0/\|H_k\|} \right)^{k\|H_k\|} + \mathbb{E}\left[ \left( e^{\xi_1/\|H_k\| - \hat{H}_k \cdot P_1} \right)^{k\|H_k\|} \zeta \right] \]
we see that boundedness forces $\hat{H} \cdot P_0 \le 0 \le \hat{H} \cdot P_1$. By no arbitrage, we have $\hat{H} \cdot P_0 = 0 = \hat{H} \cdot P_1$ a.s.
Since $\hat{H} \in V$ we conclude that $\hat{H} = 0$, contradicting $\|\hat{H}\| = 1$.
With this motivation, we introduce an important class of claims that can be perfectly
hedged:
Definition. A European contingent claim with payout ξT is replicable or attainable iff
there exists a pure investment strategy H such that HT · PT = ξT almost surely.
One of the reasons to single out attainable claims is that there is an unambiguous way
to price them according to the no-arbitrage principle:
Theorem. Suppose that the market model with n-dimensional price process P has no
arbitrage. Let ξT be the payout of an attainable European contingent claim with maturity
date T > 0, and let H be the n-dimensional replicating strategy.
Suppose the claim has price ξt for 0 ≤ t ≤ T . If the augmented market with (n + 1)-
dimensional price process (P, ξ) has no arbitrage, then
ξt = Ht · Pt almost surely for all 0 ≤ t ≤ T
Proof. Let $X = H \cdot P$. The idea is that if $X_t \neq \xi_t$ for some $t$, then there would be
an arbitrage in the augmented market. To construct an arbitrage, wait until the first time
that the price of the replicating portfolio differs from the price of the claim, and then buy
the cheap one, sell the expensive one and pocket the difference.
In mathematical notation, fix a $T > 0$ and let $\tau = \inf\{0 \le t \le T : X_t \neq \xi_t\}$, with the
usual convention that $\inf \emptyset = +\infty$. Consider the $(n + 1)$-dimensional investment strategy
\[ \bar{H}_t = \mathrm{sign}(\xi_\tau - X_\tau) 1_{\{t > \tau\}} (H_t, -1) \]
and consumption $c_t = |\xi_\tau - X_\tau| 1_{\{t = \tau + 1\}}$.
Let X̄ = H̄ · (P, ξ) and note that X̄0 = X̄T = 0. If the augmented market has no
arbitrage, then ct = 0 a.s. for all t, implying τ = ∞ a.s. as claimed.
Example. (Put-call parity formula) Suppose we start with a market with three assets
with prices (Bt,T , St , Ct )0≤t≤T . The first asset is a bond with maturity date T and unit
principal value, so that in particular, BT,T = 1 almost surely. The next asset is a stock. The
last asset is a call option on that stock with strike K and maturity T , so that CT = (ST −K)+ .
Suppose that this market is free of arbitrage.
Now we introduce another claim, called a put option. A put option gives the owner of
the option the right, but not the obligation, to sell the stock for a fixed strike price at a fixed
maturity date. If the strike is $K$ and the maturity date is $T$, then, by an argument similar to
the one used for the call option, the payout of a put option is $P_T = (K - S_T)^+$.
It turns out that the put option is replicable in the market (B, S, C). Indeed, we have
the identity
PT = (K − ST )+
= K − ST + (ST − K)+
= KBT,T − ST + CT
= (K, −1, +1) · (BT,T , ST , CT ).
Hence Ht = (K, −1, +1) for all 1 ≤ t ≤ T is a replicating strategy.
Now, suppose we want to assign prices Pt to the put for 0 ≤ t < T . The above theorem
says there is no arbitrage in the augmented market (B, S, C, P ) if and only if
Pt − Ct = KBt,T − St .
This is the famous put-call parity formula.
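The replication identity behind put-call parity is easy to check numerically. The sketch below is illustrative only: the strike, the grid of terminal stock prices and the time-t prices fed to the parity formula are hypothetical numbers, and the function names are not from the notes.

```python
def call_payoff(s, k):
    """Payout (S_T - K)^+ of a European call."""
    return max(s - k, 0.0)

def put_payoff(s, k):
    """Payout (K - S_T)^+ of a European put."""
    return max(k - s, 0.0)

def replicating_payoff(s, k):
    """Terminal value of the portfolio (K, -1, +1) in (bond, stock, call),
    using that the bond is worth B_{T,T} = 1 at maturity."""
    return k * 1.0 - s + call_payoff(s, k)

def put_price_from_parity(call_price, bond_price, stock_price, k):
    """P_t = C_t + K * B_{t,T} - S_t."""
    return call_price + k * bond_price - stock_price

strike = 100.0  # hypothetical strike
terminal_prices = [0.0, 50.0, 99.5, 100.0, 100.5, 150.0]
max_gap = max(abs(put_payoff(s, strike) - replicating_payoff(s, strike))
              for s in terminal_prices)

# Hypothetical time-t prices: call 10, bond 0.95, stock 100.
example_put = put_price_from_parity(10.0, 0.95, 100.0, strike)
print(max_gap, example_put)
```

The replicating portfolio matches the put payout in every scenario, whether the option finishes in or out of the money, which is all the argument above uses.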
A difficulty in using the above theorem for pricing an attainable contingent claim is that
it requires knowing the replicating strategy. The following theorem gives a formula for the
no-arbitrage price of the claim which does not require knowledge of this strategy, just that
it exists.
Theorem. Suppose that the market model with n-dimensional price process $P$ has no
arbitrage, and let $\xi_T$ be the payout of a European contingent claim with maturity
date $T > 0$. The claim is attainable if and only if there exists an $x \in \mathbb{R}$ such that
\[ \mathbb{E}(Y_T \xi_T) = Y_0 x \]
for all martingale deflators $Y$ such that $Y_T \xi_T$ is integrable.
Proof. (‘only if’ direction) Since the claim is attainable there exists a pure investment
strategy $H$ such that $H_T \cdot P_T = \xi_T$ a.s. Note that $H \cdot P Y$ is a local martingale from our
calculation in the last chapter. And from a result in the example sheet, we see that the
assumption that $Y_T \xi_T$ is integrable is sufficient to conclude that $H \cdot P Y$ is a true martingale.
In particular, we have
\[ \mathbb{E}(Y_T \xi_T) = \mathbb{E}(H_T \cdot P_T Y_T) = x Y_0 \]
for any $Y$, where $x = H_0 \cdot P_0$ is the initial cost of replication.
(‘if’ direction) Define $\hat{\xi}$ by
\[ \hat{\xi}_t = \operatorname{ess\,sup} \left\{ \frac{1}{Y_t} \mathbb{E}(\xi_T Y_T | \mathcal{F}_t) : Y \text{ a martingale deflator} \right\} \]
and note that $\hat{\xi} Y$ is a supermartingale. Similarly, let
\[ \check{\xi}_t = \operatorname{ess\,inf} \left\{ \frac{1}{Y_t} \mathbb{E}(\xi_T Y_T | \mathcal{F}_t) : Y \text{ a martingale deflator} \right\} \]
and note that $\check{\xi} Y$ is a submartingale. Since $\hat{\xi}_T = \xi_T = \check{\xi}_T$ and $\hat{\xi}_0 = x = \check{\xi}_0$, we can
conclude that $\hat{\xi} = \check{\xi}$. Letting $\xi = \hat{\xi} = \check{\xi}$ we have proven that $\xi Y$ is a martingale for all $Y$.
Now, there exists an investment-consumption strategy $H$ such that $\xi_t \le H_t \cdot P_t$ almost
surely for all $t \ge 1$. Now fix one such $Y$ and let
\[ M_t = -Y_t \xi_t + H_{t+1} \cdot P_t Y_t + \sum_{s=1}^{t} (H_s - H_{s+1}) \cdot P_s Y_s \]
\[ = (H_t \cdot P_t - \xi_t) Y_t + \sum_{s=1}^{t-1} (H_s - H_{s+1}) \cdot P_s Y_s. \]
In particular, note that $M$ is a local martingale by our usual calculations such that $M_t \ge 0$
for $t \ge 1$, and hence $M$ is a true martingale. However,
\[ \mathbb{E}(M_t) = (H_1 \cdot P_0 - \xi_0) Y_0 \le 0, \]
since $H_{t+1} \cdot P_t \le \xi_t$, and hence $M_t = 0$ for all $t \ge 0$. The conclusion follows.
For the sake of comparison, consider the following result:
Theorem. Suppose that the market model with n-dimensional price process $P$ has no
arbitrage. Let $\xi_T$ be the payout of a (not necessarily attainable) contingent claim with maturity
date $T > 0$.
Suppose the claim has price $\xi_t$ for $0 \le t \le T$ and that the augmented market with $(n + 1)$-
dimensional price process $(P, \xi)$ has no arbitrage. Then there exists a martingale deflator $Y$
of the original market such that
\[ \xi_t = \frac{1}{Y_t} \mathbb{E}(\xi_T Y_T | \mathcal{F}_t) \]
for all $0 \le t \le T$.
Proof. This is just the first fundamental theorem of asset pricing applied to the
augmented market with prices $(P, \xi)$.
Remark. The message is this: if a claim is attainable it can be priced with any
martingale deflator. On the other hand, the most one can say for a general claim is that there
exists some martingale deflator that prices the claim.
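A one-period example makes the contrast in this remark concrete. The sketch below assumes a hypothetical trinomial model that does not appear in the notes: cash with constant price 1 and a stock with $S_0 = 100$ and $S_1 \in \{120, 100, 80\}$. With cash as numéraire, a martingale deflator with $Y_0 = 1$ corresponds to the density of an equivalent martingale measure, so it is enough to run through the risk-neutral probabilities, which here form a one-parameter family.

```python
def risk_neutral_probs(q_mid):
    """Risk-neutral probabilities (up, middle, down), indexed by the weight
    on the middle state. The martingale condition for the stock forces the
    up and down weights to be equal in this symmetric example."""
    q_side = (1.0 - q_mid) / 2.0
    return (q_side, q_mid, q_side)

def price(payoffs, probs):
    """Expected payout under the given probabilities (cash is the numéraire)."""
    return sum(x * q for x, q in zip(payoffs, probs))

S0 = 100.0
S1 = (120.0, 100.0, 80.0)                                # hypothetical states
linear_claim = tuple(3.0 + 0.5 * s for s in S1)          # attainable: 3 cash + 0.5 stock
call_claim = tuple(max(s - 100.0, 0.0) for s in S1)      # not attainable here

mids = [0.1, 0.3, 0.5, 0.7, 0.9]
stock_prices = [price(S1, risk_neutral_probs(m)) for m in mids]
linear_prices = [price(linear_claim, risk_neutral_probs(m)) for m in mids]
call_prices = [price(call_claim, risk_neutral_probs(m)) for m in mids]
print(stock_prices, linear_prices, call_prices)
```

Every member of the family reprices the stock and gives the attainable claim the replication cost $3 + 0.5 S_0$, whereas the call receives a different price from each member.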
Since attainable claims have unique no-arbitrage prices, we single out the markets for
which every claim is attainable:
Definition. A market is complete if and only if every European contingent claim is
attainable. A market is incomplete otherwise.
We can characterise complete markets:
Theorem (Second Fundamental Theorem of Asset Pricing). An arbitrage-free market
model is complete if and only if there exists a unique martingale deflator Y such that Y0 = 1.
Proof. Suppose that there is a unique martingale deflator such that Y0 = 1. Let ξT be
any FT -measurable random variable. By the flexibility of the proof of the first fundamental
theorem, we can choose the random variable ζ in such a way that we may suppose that ξT YT
is integrable. In particular, there is a number x such that
x = E(YT ξT )
for all (that is, for the unique) martingale deflators with $Y_0 = 1$. By the characterisation of attainable
claims, there exists a pure-investment strategy $H$ such that $H_T \cdot P_T = \xi_T$. Hence the market is
complete.
Conversely, suppose that the market is complete. Let $Y$ and $Y'$ be martingale deflators
such that $Y_0 = Y'_0 = 1$. Fix a $T > 0$. By completeness there exists a pure-investment
strategy $H$ such that
\[ H_T \cdot P_T = (Y_T - Y'_T) Z \]
where $Z = \frac{1}{(Y_T + Y'_T)^2}$. (The factor $Z$ will be used to ensure integrability later.)
Since $H \cdot P Y$ is a local martingale which is integrable at time $T$, it is a true martingale
by the example sheet. In particular,
\[ H_0 \cdot P_0 = \mathbb{E}[(Y_T - Y'_T) Z Y_T]. \]
By the same argument with $Y'$ we have
\[ H_0 \cdot P_0 = \mathbb{E}[(Y_T - Y'_T) Z Y'_T]. \]
Subtracting yields
\[ \mathbb{E}[(Y_T - Y'_T)^2 Z] = 0 \]
so by the pigeon-hole principle (a non-negative random variable with zero expectation vanishes
almost surely) we have $\mathbb{P}(Y_T = Y'_T) = 1$ as desired.
Complete markets are convenient for a variety of reasons. For instance, complete markets
have a riskless numéraire portfolio:
Proposition. Suppose the arbitrage-free market model P is complete. Then there exists
a pure-investment strategy η such that the process β = η·P is strictly positive and predictable.
Proof. By completeness, zero-coupon bonds can be attained. That is, for each $T > 0$
there exists a pure-investment strategy $H^T$ such that $H^T_T \cdot P_T = 1$ a.s. By no-arbitrage, the
bonds are numéraires: $B^T_t = H^T_t \cdot P_t > 0$ a.s. for all $0 \le t \le T$. Now define the bank account
process $\beta$ by
\[ \beta_t = \prod_{s=1}^{t} (1 + r_s) \]
where
\[ r_t = \frac{1}{B^t_{t-1}} - 1. \]
Note that $\beta$ is predictable and strictly positive. Furthermore, let $\eta_t = \beta_t H^t_t$. This portfolio
corresponds to holding $\beta_t$ units of the bond with maturity $t$ during the period $(t - 1, t]$
just before its maturity.
First note that $\beta_t = \eta_t \cdot P_t$ since $B^t_t = H^t_t \cdot P_t = 1$. Finally note that the predictable
process $\eta$ is a self-financing pure-investment strategy since
\[ \eta_{t+1} \cdot P_t = \beta_{t+1} H^{t+1}_{t+1} \cdot P_t \]
\[ = \beta_{t+1} H^{t+1}_t \cdot P_t \quad \text{(since $H^{t+1}$ is pure-investment)} \]
\[ = \beta_{t+1} B^{t+1}_t \]
\[ = \beta_t \]
as desired.
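The construction of the bank account is just a running product, as the following sketch shows along a single scenario. The bond prices are hypothetical inputs: entry t of the list stands for $B^{t+1}_t$, the time-t price of the bond maturing one period later, which in the model is known at time t.

```python
def bank_account(one_period_bond_prices):
    """Return [beta_0, ..., beta_T] where beta_t = prod_{s<=t} (1 + r_s)
    and r_s = 1 / B^s_{s-1} - 1 is the one-period interest rate."""
    beta = [1.0]
    for bond_price in one_period_bond_prices:
        rate = 1.0 / bond_price - 1.0
        beta.append(beta[-1] * (1.0 + rate))
    return beta

# Hypothetical prices B^1_0, B^2_1, B^3_2 along one scenario.
prices = [0.99, 0.98, 0.97]
beta = bank_account(prices)
print(beta)
```

The self-financing identity $\beta_{t+1} B^{t+1}_t = \beta_t$ from the proof can be read off directly from the output.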
In discrete time models complete markets have even more (arguably too much) structure:
Theorem. If the market model $P$ with $n$ assets is complete, then for each $t \ge 0$ the
probability space $\Omega$ can be partitioned into no more than $n^t$ $\mathcal{F}_t$-measurable events of positive
probability, and in particular, the n-dimensional random vector $P_t$ takes values in a set of at
most $n^t$ elements.
Proof. We first consider the $t = 1$ case. Suppose $A_1, \ldots, A_k$ are a collection of disjoint
$\mathcal{F}_1$-measurable events with $\mathbb{P}(A_i) > 0$ for all $i$. Claim: the set $\{1_{A_1}, \ldots, 1_{A_k}\}$ is linearly
independent, and in particular, the dimension of the span of $\{1_{A_1}, \ldots, 1_{A_k}\}$ is exactly $k$. To
prove this claim, we must show that if
\[ a_1 1_{A_1} + \ldots + a_k 1_{A_k} = 0 \text{ a.s.} \]
for some constants $a_1, \ldots, a_k$, then $a_1 = \cdots = a_k = 0$. To this end, note that if $i \neq j$ the
sets $A_i$ and $A_j$ are disjoint and hence $1_{A_i} 1_{A_j} = 0$. By multiplying both sides of the equation
by $1_{A_i}$ we get $a_i 1_{A_i} = 0$. But since $\mathbb{P}(A_i) > 0$ it must be the case that $a_i = 0$.
Now if the market is complete, each of the $1_{A_i}$ is replicable. Hence
\[ \mathrm{span}\{1_{A_1}, \ldots, 1_{A_k}\} \subseteq \{H \cdot P_1 : H \in \mathbb{R}^n\} = \mathrm{span}\{P_1^1, \ldots, P_1^n\}. \]
Looking at the dimensions of the spaces above, we must conclude $k \le n$.
The argument in the case $t > 1$ is similar. Let $B_1, \ldots, B_N$ be a maximal partition of
$\Omega$ into disjoint $\mathcal{F}_{t-1}$-measurable sets of positive measure. If a random vector $H_t$ is $\mathcal{F}_{t-1}$-
measurable, then it takes exactly one value on each of the $B_j$'s for a total of at most $N$
values $H_1, \ldots, H_N$. Hence
\[ \{H \cdot P_t : H \text{ is } \mathcal{F}_{t-1}\text{-meas.}\} = \{H_1 \cdot P_t 1_{B_1} + \ldots + H_N \cdot P_t 1_{B_N} : H_1, \ldots, H_N \in \mathbb{R}^n\} \]
\[ = \mathrm{span}\{P_t^i 1_{B_j} : 1 \le i \le n, 1 \le j \le N\} \]
and the dimension of the space above is at most $nN$. The argument above proves that there are at
most $nN$ disjoint $\mathcal{F}_t$-measurable sets of positive measure. Induction completes the
proof.
to be sure that he can hedge the option, where the supremum is taken over the set of stopping
times smaller than or equal to T . Indeed, this is the case.
Theorem. Suppose that the adapted process (ξt )0≤t≤T specifies the payout of an American
claim maturing at T > 0.
There exists a trading strategy $H$ such that
• $X_t(H) \ge \xi_t$ for all $0 \le t \le T$,
• $X_{\tau^*}(H) = \xi_{\tau^*}$ for some stopping time $\tau^*$, and
• $X_0(H) = \sup_{\tau \le T} \mathbb{E}(Y_\tau \xi_\tau)$.
Remark. The strategy H dominates the payout of the American claim at all times, but
is conservative in the sense that it exactly replicates the optimally exercised claim.
The rest of this subsection is dedicated to proving this theorem.
*****
We will need a result of general interest:
Theorem (Doob decomposition theorem). Let U be a discrete-time supermartingale.
Then there is a unique decomposition
Ut = U0 + Mt − At
where M is a martingale and A is a predictable non-decreasing process with M0 = A0 = 0.
Proof. Let $M_0 = 0 = A_0$ and define
\[ M_{t+1} = M_t + U_{t+1} - \mathbb{E}(U_{t+1} | \mathcal{F}_t) \]
\[ A_{t+1} = A_t + U_t - \mathbb{E}(U_{t+1} | \mathcal{F}_t) \]
for $t \ge 0$. Since $U$ is assumed to be a supermartingale, and hence integrable, the processes
$M$ and $A$ are integrable. It is straightforward to check that $M$ is a martingale, and since
$U$ is a supermartingale, that $A$ is non-decreasing. Also by induction, we see that $A_{t+1}$ is
$\mathcal{F}_t$-measurable.
Summing up,
\[ M_t - A_t = M_0 - A_0 + \sum_{s=1}^{t} (M_s - M_{s-1} - A_s + A_{s-1}) \]
\[ = \sum_{s=1}^{t} (U_s - U_{s-1}) \]
\[ = U_t - U_0. \]
To show uniqueness, assume that $U_t = U_0 + M_t - A_t = U_0 + M'_t - A'_t$. Then $M - M' = A - A'$ is a
predictable discrete-time martingale, that is, a constant. Since $M_0 = M'_0 = 0$, we conclude that
$M = M'$ and $A = A'$.
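The recursion in the proof can be carried out explicitly along a path. The sketch below uses an example chosen for illustration, not taken from the notes: $U_t = -S_t^2$ for a symmetric simple random walk $S$, for which $\mathbb{E}(U_{t+1} | \mathcal{F}_t) = -(S_t^2 + 1)$. The function name is hypothetical.

```python
import random

def doob_neg_square(steps):
    """Doob decomposition of U_t = -S_t^2 along one path of a symmetric
    simple random walk, following the recursion in the proof."""
    S = 0
    U, M, A = [0.0], [0.0], [0.0]
    for xi in steps:
        U_now = -float(S * S)
        # E(U_{t+1} | F_t) = -E((S_t + xi)^2 | F_t) = -(S_t^2 + 1)
        cond_exp_next = -float(S * S + 1)
        S += xi
        U_next = -float(S * S)
        M.append(M[-1] + U_next - cond_exp_next)
        A.append(A[-1] + U_now - cond_exp_next)
        U.append(U_next)
    return U, M, A

rng = random.Random(0)
steps = [rng.choice([1, -1]) for _ in range(20)]
U, M, A = doob_neg_square(steps)
print(A[:5], M[:5])
```

In this example the predictable part is deterministic, $A_t = t$, and the martingale part is $M_t = t - S_t^2$, recovering the familiar fact that $S_t^2 - t$ is a martingale.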
Now we introduce the key concept in optimal stopping theory:
Definition. Let $(Z_t)_{0 \le t \le T}$ be a given integrable adapted discrete-time process. Define
an adapted process $(U_t)_{0 \le t \le T}$ by the recursion
\[ U_T = Z_T \]
\[ U_t = \max\{Z_t, \mathbb{E}(U_{t+1} | \mathcal{F}_t)\} \text{ for } 0 \le t \le T - 1. \]
The process $(U_t)_{0 \le t \le T}$ is called the Snell envelope of $(Z_t)_{0 \le t \le T}$.
Remark. The Snell envelope clearly satisfies both
\[ U_t \ge Z_t \quad \text{and} \quad U_t \ge \mathbb{E}(U_{t+1} | \mathcal{F}_t) \]
almost surely. Thus, another way to describe the Snell envelope of a process is to say it is
the smallest supermartingale dominating that process.
In our application Z will be the process Y ξ, where Y is the martingale deflator and ξ is
the process specifying the payout of the American claim.
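The backward recursion defining the Snell envelope is straightforward to carry out on a recombining binomial tree. The following is a minimal sketch under assumptions that are not part of the notes: the process $Z$ depends only on the node (t, j), where j counts the up-moves, and the example is a discounted American put in a hypothetical binomial model with the parameters shown, evaluated under the risk-neutral up-probability q.

```python
import math

def snell_envelope(Z, q):
    """Snell envelope on a recombining tree. Z[t][j] is the value of Z at
    time t after j up-moves; q is the probability of an up-move."""
    T = len(Z) - 1
    U = [row[:] for row in Z]          # U_T = Z_T
    for t in range(T - 1, -1, -1):
        for j in range(t + 1):
            continuation = q * U[t + 1][j + 1] + (1.0 - q) * U[t + 1][j]
            U[t][j] = max(Z[t][j], continuation)
    return U

# Hypothetical model parameters.
S0, K, u, d, r, T = 100.0, 100.0, 1.1, 0.9, 0.02, 3
q = (1.0 + r - d) / (u - d)            # risk-neutral up-probability

# Discounted put payout at each node.
Z = [[max(K - S0 * u ** j * d ** (t - j), 0.0) / (1.0 + r) ** t
      for j in range(t + 1)] for t in range(T + 1)]
U = snell_envelope(Z, q)

# Value of exercising only at maturity, for comparison.
european = sum(math.comb(T, j) * q ** j * (1.0 - q) ** (T - j) * Z[T][j]
               for j in range(T + 1))
print(U[0][0], european)
```

By construction $U$ dominates $Z$ and is a supermartingale, so its initial value is at least the value obtained by waiting until maturity.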
Theorem. Let $(Z_t)_{0 \le t \le T}$ be an integrable adapted process, let $(U_t)_{0 \le t \le T}$ be its Snell
envelope with Doob decomposition $U_t = U_0 + M_t - A_t$. Let
\[ \tau^* = \min\{t \in \{0, \ldots, T\} : A_{t+1} > 0\} \]
with the convention $\tau^* = T$ on $\{A_t = 0 \text{ for all } t\}$. Then $\tau^*$ is a stopping time and
\[ U_{\tau^*} = U_0 + M_{\tau^*} = Z_{\tau^*}. \]
Proof. That $\tau^*$ is a stopping time follows from the fact that the non-decreasing process
$(A_t)_{0 \le t \le T}$ is predictable.
Now note that
\[ \mathbb{E}(U_{t+1} | \mathcal{F}_t) = \mathbb{E}(U_0 + M_{t+1} - A_{t+1} | \mathcal{F}_t) = U_0 + M_t - A_{t+1} \]
since $M$ is a martingale and $A$ is predictable, so that by the definition of the Snell envelope
\[ U_0 + M_t - A_t = \max\{Z_t, U_0 + M_t - A_{t+1}\}. \]
In particular,
\[ U_0 + M_{\tau^*} = \max\{Z_{\tau^*}, U_0 + M_{\tau^*} - A_{\tau^* + 1}\} \]
since $A_{\tau^*} = 0$. But since $A_{\tau^* + 1} > 0$ we must conclude
\[ U_{\tau^*} = U_0 + M_{\tau^*} = Z_{\tau^*}. \]
Theorem. Let $Z$ be an adapted integrable process and let $U$ be its Snell envelope. Then
\[ U_0 = \sup_{\tau \le T} \mathbb{E}(Z_\tau). \]
CHAPTER 3
Despite the elegance of discrete-time financial theory, there is at least one glaring problem:
explicit computations are difficult. For instance, the fundamental theorems are stated in
terms of state price densities, but it is very difficult to classify them except in a few simple
examples. The continuous-time theory has the convenient feature that explicit formulae are
easy to find. Indeed, one of our first results will be the general formula for a state price
density in a continuous-time market model.
Before we can describe the continuous-time financial theory, we need to first learn about
stochastic integration. Recall that in discrete time, the self-financing condition and budget
constraint imply that the wealth process $X$ corresponding to a pure investment strategy
$H$ satisfies
\[ X_t - X_{t-1} = H_t \cdot (P_t - P_{t-1}) \]
so that
\[ X_t = X_0 + \sum_{s=1}^{t} H_s \cdot (P_s - P_{s-1}). \]
The continuous time analogue ought to be something like
\[ X_t = X_0 + \int_0^t H_s \cdot dP_s. \]
What does the integral on the right mean? If we assume that the sample paths $t \mapsto P_t$ are
differentiable, we could interpret the integral as the Lebesgue integral
\[ \int_0^t H_s \cdot \frac{dP_s}{ds} \, ds. \]
Unfortunately, it turns out that life is not that simple. To see why, remember that in
discrete time we defined the state price density Y as a positive process such that Y P is
a martingale. We will adopt more-or-less the same definition in continuous time. Now, a
theorem of stochastic calculus says that a continuous martingale with differentiable sample
paths is necessarily constant. So if we insist that our price processes have differentiable
sample paths, we will have a very boring theory.
This chapter is concerned with an integration theory where we use the martingale property,
rather than the differentiability of the sample paths, as the key ingredient. This theory
is nice, and indeed something like the fundamental theorem of calculus holds. This means
we can do explicit computations.
The most basic example of a continuous martingale is Brownian motion. We will build
up our theory by first defining Brownian motion, then constructing the Brownian stochastic
integral, and finally learning the rules of the resulting calculus. This chapter provides only an
extremely brief introduction to this theory.
1. Brownian motion
In this section, we introduce one of the most fundamental continuous-time stochastic
processes, Brownian motion. As hinted above, our primary interest in this process is that
it will be the building block for all of the continuous-time market models studied in these
lectures.
It is not clear that Brownian motion exists. That is, does there exist a probability
space (Ω, F, P) on which the uncountable collection of random variables (Wt )t≥0 can be
simultaneously defined in such a way that the above definition holds? The answer, of course,
is yes, and the proof of this fact is due to Wiener in 1923. Therefore, the Brownian motion
is also often called the Wiener process, especially in the U.S.
Although the sample paths of Brownian motion are continuous, they are very irregular.
Below is a computer simulation of a one-dimensional Brownian motion:
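A sample path of this kind can be generated in a few lines. The sketch below assumes the standard characterisation of Brownian motion ($W_0 = 0$ with independent Gaussian increments of mean zero and variance equal to the time step) and approximates the path on a grid; the horizon, grid size and seed are arbitrary choices, and the function name is hypothetical.

```python
import math
import random

def brownian_path(T, N, rng):
    """Values W_0, W_{T/N}, ..., W_T of a simulated Brownian path,
    built by summing independent N(0, T/N) increments."""
    dt = T / N
    W = [0.0]
    for _ in range(N):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return W

rng = random.Random(0)               # fixed seed so the path is reproducible
W = brownian_path(10.0, 1000, rng)
print(min(W), max(W))
```

Only the grid values are simulated; joining them by straight lines gives a picture of the path, but the irregularity of the true path persists at every finer scale.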
2.1. The $L^2$ theory. To get things started, let $W$ be a scalar Brownian motion. We
will assume that $W$ is adapted to a filtration $(\mathcal{F}_t)_{t \ge 0}$. For the record, we will assume that the
filtration satisfies what are called the usual conditions: right-continuity, $\mathcal{F}_t = \bigcap_{\varepsilon > 0} \mathcal{F}_{t + \varepsilon}$, and
that $\mathcal{F}_0$ contains all $\mathbb{P}$-null events. These are technical assumptions that ensure the existence
of stopping times with the right properties. We also will assume that for each $0 \le s < t$ the
increment $W_t - W_s$ is independent of $\mathcal{F}_s$.
The first building blocks of the theory are the simple predictable integrands.
[Figure: sample path of Brownian motion; vertical axis W, horizontal axis t from 0 to 10.]
Definition. A simple predictable process is an adapted process $\alpha = (\alpha_t)_{t \ge 0}$ of the form
\[ \alpha_t(\omega) = \sum_{n=1}^{N} 1_{(t_{n-1}, t_n]}(t) \, a_n(\omega) \]
where each $a_n$ is bounded and $\mathcal{F}_{t_{n-1}}$-measurable, for some $0 \le t_0 < t_1 < \ldots < t_N < \infty$. For simple
predictable processes we define the stochastic integral by the formula
\[ \int_0^\infty \alpha_s \, dW_s = \sum_{n=1}^{N} a_n (W_{t_n} - W_{t_{n-1}}) \]
as claimed.
Now, the map defined by $I(\alpha) = \int_0^\infty \alpha_s \, dW_s$ is an isometry from the space of simple
predictable integrands to the space $L^2(\Omega, \mathcal{F}, \mathbb{P})$ of square-integrable random variables. The
fact that $L^2$ is complete is the key observation which allows us to build the stochastic integral
of more general integrands.
Definition. The predictable sigma-field P is the sigma-field on the product space R+ ×Ω
generated by sets of the form (s, t]×A where 0 ≤ s < t and A is Fs -measurable. Equivalently,
the predictable sigma-field is that generated by the simple, predictable integrands.
A predictable process α is a map α : R+ × Ω → R that is P-measurable. Equivalently,
predictable processes are limits of simple, predictable integrands.
Remark. Every left-continuous, adapted process is predictable. These are the examples
to keep in mind, since they are the ones that come up most in application.
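Now, suppose $(\alpha^{(k)})_{k \ge 1}$ is a sequence of simple predictable integrands converging in
$L^2(\mathbb{R}_+ \times \Omega, \mathcal{P}, \mathrm{Leb} \times \mathbb{P})$ to a predictable process $\alpha$ so that
\[ \mathbb{E} \int_0^\infty (\alpha_s^{(k)} - \alpha_s)^2 \, ds \to 0 \]
as $k \to \infty$. By Itô’s isometry the sequence $I(\alpha^{(k)})$ is a Cauchy sequence, which by the
completeness of $L^2$, converges to some random variable. This is what we take as the definition.
Definition. If $\alpha$ is predictable and
\[ \mathbb{E} \int_0^\infty \alpha_s^2 \, ds < \infty \]
then
\[ \int_0^\infty \alpha_s \, dW_s = \lim_k \int_0^\infty \alpha_s^{(k)} \, dW_s \]
where the limit is interpreted in the $L^2(\Omega)$ sense and $\alpha^{(k)}$ is any sequence of simple,
predictable integrands converging to $\alpha$ as above.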
Of course, we are not really interested in integrals over the whole interval [0, ∞) but
rather finite intervals [0, t]. This is easily handled.
Theorem. For every predictable $\alpha$ such that
\[ \mathbb{E} \int_0^t \alpha_s^2 \, ds < \infty \text{ for all } t \ge 0 \]
there exists a continuous martingale $X$ such that
\[ X_t = \int_0^\infty \alpha_s 1_{\{s \le t\}} \, dW_s. \]
In this case, we will use the notation
\[ X_t = \int_0^t \alpha_s \, dW_s. \]
2.2. Localisation. In this section, we show how to extend the definition of stochastic
integral to predictable processes $\alpha$ such that
\[ \int_0^t \alpha_s^2 \, ds < \infty \text{ almost surely} \]
for all $t \ge 0$. The technique is called localisation.
Define the stopping times
\[ \tau_n = \inf\left\{ t \ge 0 : \int_0^t \alpha_s^2 \, ds = n \right\} \]
for each $n \ge 1$, where $\inf \emptyset = \infty$ as usual, and let
\[ \alpha_t^{(n)} = \alpha_t 1_{\{t \le \tau_n\}}. \]
Note that since
\[ \mathbb{E} \int_0^t (\alpha_s^{(n)})^2 \, ds \le n, \]
the process $X^{(n)}$ defined by
\[ X_t^{(n)} = \int_0^t \alpha_s^{(n)} \, dW_s \]
is a martingale for each $n$ by the $L^2$ theory.
Now fix $t > 0$ and define the increasing sequence of events $A_n = \{\omega \in \Omega : \tau_n \ge t\}$. Since
$\int_0^t \alpha_s^2 \, ds < \infty$ almost surely for all $t \ge 0$, we have $\mathbb{P}\left( \bigcup_{n \in \mathbb{N}} A_n \right) = 1$. Hence we can define
the stochastic integral by the formula
\[ \int_0^t \alpha_s \, dW_s = \lim_{n \to \infty} \int_0^t \alpha_s^{(n)} \, dW_s \]
where the limit is in probability.
Note that the process $X$ defined by
\[ X_t = \int_0^t \alpha_s \, dW_s \]
is a continuous local martingale. Indeed, for each $n$ the stopped process
\[ X_{t \wedge \tau_n} = X_t^{(n)} \]
is a martingale.
To summarise the most frequently used aspects of this construction:
If $\alpha$ is an adapted continuous process then $X_t = \int_0^t \alpha_s \, dW_s$ defines a continuous
local martingale. If in addition we have $\mathbb{E} \int_0^t \alpha_s^2 \, ds < \infty$ for all $t \ge 0$, then
$X$ is a true martingale.
3. Itô’s formula
In the last section, we sketched very quickly the construction of a stochastic integral with
respect to a Wiener process. What makes the Itô stochastic integral useful is that there is a
corresponding stochastic calculus. The basic building block of this calculus is the chain rule,
called Itô’s formula.
3.1. The scalar version. Let $(W_t)_{t \ge 0}$ be a scalar Brownian motion adapted to a
filtration $(\mathcal{F}_t)_{t \ge 0}$ satisfying our usual conditions.
We can use our stochastic integration theory to define a useful class of stochastic processes:
Definition. An Itô process $X$ is an adapted process of the form
\[ X_t = X_0 + \int_0^t \alpha_s \, dW_s + \int_0^t \beta_s \, ds, \]
where $X_0$ is a fixed real number and $(\alpha_t)_{t \ge 0}$ and $(\beta_t)_{t \ge 0}$ are predictable real-valued processes
such that
\[ \int_0^t \alpha_s^2 \, ds < \infty \quad \text{and} \quad \int_0^t |\beta_s| \, ds < \infty \]
almost surely for all $t \ge 0$.
Note that the two integrals appearing in the above definition have different meanings: the
first is a stochastic integral and the second is a pathwise Lebesgue integral.
We are now ready for the first version of Itô’s formula:
Theorem (Itô’s formula, scalar version). Let $X$ be an Itô process and $f : \mathbb{R} \to \mathbb{R}$ twice
continuously differentiable. Then
\[ f(X_t) = f(X_0) + \int_0^t f'(X_s) \alpha_s \, dW_s + \int_0^t \left( f'(X_s) \beta_s + \frac{1}{2} f''(X_s) \alpha_s^2 \right) ds. \]
Let us highlight a difference between Itô and ordinary calculus, by noting the mysterious
appearance of the $f''$ term in Itô’s formula. This term would not appear in the chain rule of
ordinary calculus. But consider the example $f(x) = x^2$ so that
\[ W_t^2 = 2 \int_0^t W_s \, dW_s + t. \]
Note that since
\[ \mathbb{E} \int_0^t W_s^2 \, ds = \int_0^t s \, ds = t^2/2 < \infty, \]
the local martingale $X$ given by
\[ X_t = \int_0^t W_s \, dW_s \]
is actually a true martingale. Can you verify, directly from the definition of Brownian motion,
that the process $(W_t^2 - t)_{t \ge 0}$ is a martingale?
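The formula $W_t^2 = 2\int_0^t W_s \, dW_s + t$ has an exact discrete counterpart which can be checked on a simulated path. The sketch below is an illustration under stated assumptions: the stochastic integral is approximated by a sum with the integrand evaluated at the left endpoint of each interval (as for simple predictable integrands), and the horizon, grid size and seed are arbitrary.

```python
import math
import random

rng = random.Random(1)
T, N = 1.0, 20000                 # arbitrary horizon and number of grid steps
dt = T / N

W = 0.0
ito_sum = 0.0                     # sum of W_{t_{n-1}} * (W_{t_n} - W_{t_{n-1}})
quad_var = 0.0                    # sum of (W_{t_n} - W_{t_{n-1}})^2
for _ in range(N):
    dW = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += W * dW             # integrand taken at the left endpoint
    quad_var += dW * dW
    W += dW

print(W * W, 2.0 * ito_sum + quad_var, quad_var)
```

Algebraically $W_T^2 = 2 \sum W_{t_{n-1}} \Delta W_n + \sum (\Delta W_n)^2$ on any grid, so the extra term $t$ in Itô's formula is exactly the limit of the sum of squared increments.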
We now introduce a differential notation which cleans up some of the formulae. We will
use the notation
\[ dX_t = \alpha_t \, dW_t + \beta_t \, dt \]
to mean
\[ X_t = X_0 + \int_0^t \alpha_s \, dW_s + \int_0^t \beta_s \, ds. \]
Recall that the sample paths of the Brownian motion are nowhere differentiable, so the
notation $dW_t$ is only formal, and can only be interpreted via the stochastic integration
theory. But in this differential notation, Itô’s formula takes a nicer form
\[ df(X_t) = \left( f'(X_t) \beta_t + \frac{1}{2} f''(X_t) \alpha_t^2 \right) dt + f'(X_t) \alpha_t \, dW_t. \]
Example. Consider the Itô process given by
\[ X_t = X_0 + a W_t + b t \]
for some constants $a, b \in \mathbb{R}$. Letting
\[ Y_t = e^{X_t}, \]
we would like to show that the process $(Y_t)_{t \ge 0}$ is an Itô process, and write down its
decomposition in terms of ordinary and stochastic integrals.
Let $f(x) = e^x$. Then $f'(x) = e^x$ and $f''(x) = e^x$. Also,
\[ dX_t = a \, dW_t + b \, dt \quad \text{and} \quad d\langle X \rangle_t = a^2 \, dt. \]
So Itô’s formula says:
\[ df(X_t) = f'(X_t) \, dX_t + \frac{1}{2} f''(X_t) \, d\langle X \rangle_t \]
\[ \Rightarrow \quad dY_t = Y_t [(b + a^2/2) \, dt + a \, dW_t]. \]
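One way to gain confidence in the drift $b + a^2/2$ is to discretise the equation for $Y$ and compare with $e^{X_T}$ on the same simulated path. The sketch below uses an Euler discretisation, which is a technique swapped in here for illustration and not something the notes discuss; the constants, grid size and seed are arbitrary.

```python
import math
import random

def exact_and_euler(a, b, x0, T, N, rng):
    """Return (exp(X_T), Euler approximation of Y_T) driven by the same
    simulated Brownian increments, where dY = Y[(b + a^2/2) dt + a dW]."""
    dt = T / N
    W = 0.0
    Y_euler = math.exp(x0)
    for _ in range(N):
        dW = rng.gauss(0.0, math.sqrt(dt))
        Y_euler += Y_euler * ((b + 0.5 * a * a) * dt + a * dW)
        W += dW
    Y_exact = math.exp(x0 + a * W + b * T)
    return Y_exact, Y_euler

rng = random.Random(4)
Y_exact, Y_euler = exact_and_euler(a=0.2, b=0.05, x0=0.0, T=1.0, N=10000, rng=rng)
print(Y_exact, Y_euler)
```

Dropping the $a^2/2$ correction from the drift in the Euler step produces a visible mismatch, which is a quick way to see that the second-order term is not optional.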
We now introduce a notion which helps with computations involving Itô’s formula.
Theorem. Let $X$ be an Itô process. There exists a continuous non-decreasing process
$\langle X \rangle$, called the quadratic variation of $X$, such that
\[ \langle X \rangle_t = \lim_{N} \sum_{n=1}^{N} (X_{nt/N} - X_{(n-1)t/N})^2 \]
for each $t \ge 0$, where the limit is in probability. If
\[ dX_t = \alpha_t \, dW_t + \beta_t \, dt \]
then
\[ d\langle X \rangle_t = \alpha_t^2 \, dt. \]
Remark. Some people write $d\langle X \rangle_t = (dX_t)^2$ for obvious reasons.
This is not the appropriate place to prove this important result in full, but to get a feeling
for why it is true, we will prove it in the case where X is a Brownian motion:
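Proof. By definition, the increments of Brownian motion are Gaussian random variables so that
\[ \mathbb{E}[(W_t - W_s)^2] = t - s \]
and
\[ \mathrm{Var}[(W_t - W_s)^2] = 2(t - s)^2 \]
for every $0 \le s \le t$. Hence
\[ \mathbb{E}\left[ \sum_{n=1}^{N} (W_{t_n} - W_{t_{n-1}})^2 \right] = \sum_{n=1}^{N} (t_n - t_{n-1}) = t_N - t_0 \]
and, by the independence of the increments, the variance of the same sum is $2 \sum_{n=1}^{N} (t_n - t_{n-1})^2$.
With $t_n = nt/N$ the expectation equals $t$ and the variance equals $2t^2/N \to 0$, so the sum
converges to $t$ in $L^2$, and hence in probability.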
Since the quadratic variation of a Brownian motion is positive, the typical Brownian sample
path is not a continuously differentiable function of time.
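The contrast between Brownian paths and differentiable paths shows up clearly in the sums of squared increments. The following sketch, with arbitrary constants and a hypothetical function name, computes the sum over a uniform grid for $X_t = aW_t + bt$, once with $a \neq 0$ and once with $a = 0$ (a smooth path).

```python
import math
import random

def realised_quadratic_variation(a, b, t, N, rng):
    """Sum of squared increments of X_s = a W_s + b s over a uniform
    grid of N steps on [0, t]."""
    dt = t / N
    total = 0.0
    for _ in range(N):
        dX = a * rng.gauss(0.0, math.sqrt(dt)) + b * dt
        total += dX * dX
    return total

rng = random.Random(3)
t, N = 1.0, 20000
qv_ito = realised_quadratic_variation(2.0, 3.0, t, N, rng)     # close to a^2 t = 4
qv_smooth = realised_quadratic_variation(0.0, 3.0, t, N, rng)  # equals b^2 t^2 / N
print(qv_ito, qv_smooth)
```

For the smooth path the sum is of order $1/N$ and vanishes in the limit, while for the Itô process it settles near $a^2 t$ regardless of the drift $b$.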
With the notion of quadratic variation, we can rewrite Itô’s formula once more in a
particularly easy to remember form:
\[ df(X_t) = f'(X_t) \, dX_t + \frac{1}{2} f''(X_t) \, d\langle X \rangle_t. \]
3.2. The multi-dimensional version. We now introduce the vector version of Itô’s
formula. It is basically the same as before, but with worse notation.
An n-dimensional Itô process $(X_t)_{t \ge 0}$ is defined by
\[ X_t = X_0 + \int_0^t \alpha_s \, dW_s + \int_0^t \beta_s \, ds, \]
interpreted component-wise as
\[ X_t^{(i)} = X_0^{(i)} + \int_0^t \sum_{k=1}^{d} \alpha_s^{(i,k)} \, dW_s^{(k)} + \int_0^t \beta_s^{(i)} \, ds \]
where $(W_t)_{t \ge 0}$ is a d-dimensional Brownian motion, so that $W^{(1)}, \ldots, W^{(d)}$ are independent
scalar Brownian motions, the predictable process $(\alpha_t)_{t \ge 0}$ is valued in the space of $n \times d$
matrices, and the predictable process $(\beta_t)_{t \ge 0}$ is valued in $\mathbb{R}^n$. We insist that
\[ \int_0^t \sum_{i=1}^{n} \sum_{k=1}^{d} (\alpha_s^{(i,k)})^2 \, ds < \infty \quad \text{and} \quad \int_0^t \sum_{i=1}^{n} |\beta_s^{(i)}| \, ds < \infty \]
almost surely for all $t \ge 0$ so that all of the integrals are defined. The aim of this
section is to give a formula for the Itô decomposition of $f(t, X_t)$.
Now in the scalar case we needed a notion of quadratic variation $(dX_t)^2 = d\langle X \rangle_t$. In the
multi-dimensional case, we now introduce the notion of quadratic co-variation
$(dX_t^{(i)})(dX_t^{(j)}) = d\langle X^{(i)}, X^{(j)} \rangle_t$.
Theorem. There exists a continuous process of finite variation $\langle X^{(i)}, X^{(j)} \rangle$, called the
quadratic co-variation of $X^{(i)}$ and $X^{(j)}$, such that
\[ \langle X^{(i)}, X^{(j)} \rangle_t = \lim_{N} \sum_{n=1}^{N} (X^{(i)}_{nt/N} - X^{(i)}_{(n-1)t/N})(X^{(j)}_{nt/N} - X^{(j)}_{(n-1)t/N}) \]
The following multiplication table might help you remember how to compute quadratic
covariation, where $W$ and $W^\perp$ denote independent Brownian motions:
$(dt)^2 = 0$, $(dt)(dW_t) = 0$, $(dW_t)^2 = dt$, $(dW_t)(dW_t^\perp) = 0$.
4. Girsanov’s theorem
As we have seen in discrete time, the economic notion of an arbitrage-free market model
is tied to the existence of an equivalent measure for which the asset prices, when discounted
by a numéraire are martingales.
Recall that an equivalent measure is related to a positive random variable via the
Radon–Nikodym theorem. Indeed, let $(\Omega, \mathcal{F}, \mathbb{P})$ be our probability space and let $\mathbb{Q}$ be equivalent to
$\mathbb{P}$. Then, by the Radon–Nikodym theorem there exists a density
\[ Z = \frac{d\mathbb{Q}}{d\mathbb{P}} \]
such that $Z > 0$ has unit $\mathbb{P}$-expectation. Conversely, if $Z > 0$ and $\mathbb{E}^{\mathbb{P}}(Z) = 1$, we can define
an equivalent measure $\mathbb{Q}$ with density $Z$.
Motivated by the above discussion, we aim to understand how martingales arise within the
context of the Itô stochastic integration theory. Consider the stochastic process $(Z_t)_{t \ge 0}$ given
by
\[ Z_t = \exp\left( -\frac{1}{2} \int_0^t |\alpha_s|^2 \, ds + \int_0^t \alpha_s \cdot dW_s \right) \]
where $(W_t)_{t \ge 0}$ is an m-dimensional Brownian motion and $(\alpha_t)_{t \ge 0}$ is an m-dimensional
predictable process with $\int_0^t |\alpha_s|^2 \, ds < \infty$ a.s. for all $t \ge 0$.
This process is clearly positive. Furthermore, notice that by Itô’s formula we have
\[ dZ_t = Z_t \alpha_t \cdot dW_t \]
so that $(Z_t)_{t \ge 0}$ is a local martingale, as it is a stochastic integral with respect to a Brownian
motion. Recall that since $Z$ is a positive local martingale, it is automatically a supermartingale.
Hence, if
\[ \mathbb{E}(Z_T) = 1 \]
for some non-random $T > 0$, then $(Z_t)_{0 \le t \le T}$ is a true martingale. In this case, what happens
to the Brownian motion when we change to an equivalent measure with density $Z_T$?
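Theorem (Cameron–Martin–Girsanov Theorem). Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space on
which an m-dimensional Brownian motion $(W_t)_{t \ge 0}$ is defined, and let $(\mathcal{F}_t)_{t \ge 0}$ be a filtration
satisfying the usual conditions. Let
\[ Z_t = \exp\left( -\frac{1}{2} \int_0^t \|\alpha_s\|^2 \, ds + \int_0^t \alpha_s \cdot dW_s \right) \]
and suppose $(Z_t)_{0 \le t \le T}$ is a martingale. Define the equivalent measure $\mathbb{Q}$ on $(\Omega, \mathcal{F}_T)$ by the
density
\[ \frac{d\mathbb{Q}}{d\mathbb{P}} = Z_T. \]
Then the m-dimensional process $(\hat{W}_t)_{0 \le t \le T}$ defined by
\[ \hat{W}_t = W_t - \int_0^t \alpha_s \, ds \]
is a Brownian motion on $(\Omega, \mathcal{F}_T, \mathbb{Q})$.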
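In the simplest case of a constant scalar $\alpha$, the theorem can be tested by Monte Carlo, since expectations under the new measure are expectations under the original one weighted by $Z_T$. The sketch below is illustrative only: the values of $\alpha$ and $T$, the sample size and the seed are arbitrary choices, and only the time-T marginal is examined.

```python
import math
import random

rng = random.Random(2)
alpha, T, n = 0.5, 1.0, 100000    # arbitrary constant drift, horizon, sample size

sum_Z = 0.0
sum_ZW = 0.0
sum_ZWhat = 0.0
for _ in range(n):
    W_T = rng.gauss(0.0, math.sqrt(T))                    # W_T under the original measure
    Z_T = math.exp(-0.5 * alpha ** 2 * T + alpha * W_T)   # density of the new measure
    sum_Z += Z_T
    sum_ZW += Z_T * W_T
    sum_ZWhat += Z_T * (W_T - alpha * T)

mean_Z = sum_Z / n            # estimates E(Z_T), which should be 1
q_mean_W = sum_ZW / n         # estimates the new-measure mean of W_T, i.e. alpha * T
q_mean_What = sum_ZWhat / n   # estimates the new-measure mean of What_T, i.e. 0
print(mean_Z, q_mean_W, q_mean_What)
```

Under the new measure $W$ acquires the drift $\alpha t$, and subtracting that drift restores a centred process, in line with the statement that $\hat{W}$ is a Brownian motion.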
Now, you may be asking yourself: When is the process (Zt )t≥0 not just a local martingale,
but a true martingale?
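Theorem (Novikov’s criterion). If
\[ \mathbb{E}\left[ e^{\frac{1}{2} \int_0^T \|\alpha_s\|^2 \, ds} \right] < \infty \]
then
\[ \mathbb{E}\left[ e^{-\frac{1}{2} \int_0^T \|\alpha_s\|^2 \, ds + \int_0^T \alpha_s \cdot dW_s} \right] = 1. \]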
and so (Mt )t≥0 is a continuous local martingale, as it is the stochastic integral with re-
2
spect to a continuous local martingale. On the other hand, since |Mt | = e|θ| t/2 and hence
E(sups∈[0,t] |Ms |) < ∞ the process (Mt )t≥0 is a true martingale. Thus for all 0 ≤ s ≤ t we
have
E(Mt |Fs ) = Ms
which implies
2
E(ei θ·(Xt −Xs ) |Fs ) = e−|θ| (t−s)/2 .
The above equation implies that the increment Xt − Xs has the Nm (0, (t − s)I) distribution
and is independent of Fs .
CHAPTER 4
We now return to the main theme of these lectures, models of financial markets. We
now have the tools to discuss the continuous time case, at least when the asset prices are
continuous processes.
1. The set-up
As before, our market model consists of an n-dimensional stochastic process
$P = (P_t^1, \ldots, P_t^n)_{t\ge 0}$ representing the asset prices. This process will be defined on a probability
space (Ω, F, P) with a filtration F = (Ft )t≥0 satisfying the usual conditions. Furthermore, we
will make the following assumption to make use of the Itô calculus developed in the previous
chapter.
Assumption. The stochastic process P is assumed to be an Itô process adapted to F.
Since continuous-time theory has enough complications, we will make the following sim-
plification:
Assumption. There exists a numéraire asset.
In particular, when we discuss arbitrage theory, there is no need to allow the possibility
of intermediate consumption.
As before, the investor’s controls consist of the n-dimensional process $H = (H_t^1, \ldots, H_t^n)_{t\ge 0}$
where $H_t^i$ corresponds to the number of shares of asset i held at time t. We will assume
that H is self-financing in the continuous time sense:
Definition. An n-dimensional predictable process H such that H is P-integrable¹ is a
self-financing investment/consumption strategy iff
$$d(H_t \cdot P_t) = H_t \cdot dP_t.$$
WARNING: THIS DEFINITION IS INCOMPLETE in the sense that it does not give
rise to interesting arbitrage theory. The reason for the above warning is spelled out below.
2. Admissible strategies
In order to make sense of the stochastic integral defining the wealth, we need to impose
a technical integrability condition which holds automatically for continuous processes.
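1 ...this means the stochastic integral $\int_0^t H_s \cdot dP_s$ is well-defined, i.e. if $dP_t = b_t\,dt + \sigma_t\,dW_t$ then
$$\int_0^t |H_s \cdot b_s|\,ds < \infty \quad\text{and}\quad \int_0^t \|\sigma_s^\top H_s\|^2\,ds < \infty \quad\text{a.s. for all } t \ge 0.$$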
However, in moving from discrete to continuous time, we have to be careful. We will now
see that this condition isn’t strong enough to make our economic analysis interesting.
Example. Consider a discrete-time market model with two assets P = (1, S) where S
is a simple symmetric random walk:
St = ξ1 + . . . + ξt
where the random variables ξ1 , ξ2 , . . . are independent and
P(ξt = 1) = P(ξt = −1) = 1/2.
Obviously this market has no arbitrage as P is a martingale. Nevertheless, let’s explore how
to approximate an arbitrage in some sense. Given a predictable process π, let
$$\phi_t = -\sum_{s=1}^{t-1} (\pi_{s+1} - \pi_s) S_s.$$
Then the pair (φ, π) defines a self-financing pure investment strategy with associated wealth
process
$$X_t = \sum_{s=1}^{t} \pi_s (S_s - S_{s-1}).$$
In particular, X0 = 0.
A simple strategy that resembles an arbitrage is constructed as follows: first define the
stopping time
σ = inf{t ≥ 0 : St > 0}.
and consider the strategy with
πt = 1{t≤σ}
Note that the associated wealth process is Xt = St∧σ . Since σ < ∞ a.s., the conclusion is
that if you are willing to wait a while, investing in this strategy will result in an almost sure
gain Xσ = 1. But the amount of time you have to wait is very long: one can show that
E(σ) = +∞.
One can improve upon the above idea by taking larger and larger bets, effectively ‘speed-
ing up the clock’. Indeed, define the stopping time
τ = inf{t ≥ 0 : ξt = 1}
and consider the strategy
$$\pi_t = 2^{t-1}\,\mathbf{1}_{\{t \le \tau\}}.$$
In this case, the associated wealth process is
$$X_t = 1 - 2^{t}\,\mathbf{1}_{\{t \le \tau - 1\}}.$$
This is the classical ‘martingale’ or doubling strategy. Note that E(τ ) = 2, so an investor
following this strategy does not have to wait very long on average to realise the gain Xτ = 1.
But although τ is small on average, it is not bounded, and hence this strategy does not
qualify as an arbitrage.
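The doubling strategy is easy to simulate. The sketch below (function names `doubling_strategy` and `simulate` are hypothetical) plays the bet 2^(t−1) at each step until the first up-move and records the stopping time, the final wealth and the worst wealth reached along the way.

```python
import random


def doubling_strategy(rng, max_steps=10_000):
    """Bet pi_t = 2^(t-1) on each step until the first up-move xi_t = +1.

    Returns (tau, final wealth X_tau, lowest wealth reached before winning).
    """
    wealth, worst, t = 0, 0, 0
    while t < max_steps:
        t += 1
        bet = 2 ** (t - 1)                       # stake doubles every period
        xi = 1 if rng.random() < 0.5 else -1     # symmetric random walk increment
        wealth += bet * xi
        worst = min(worst, wealth)
        if xi == 1:
            return t, wealth, worst
    raise RuntimeError("no up-move within max_steps")


def simulate(n_paths=20_000, seed=1):
    """Run the doubling strategy on independent paths."""
    rng = random.Random(seed)
    return [doubling_strategy(rng) for _ in range(n_paths)]
```

On every path the final wealth is exactly 1, and the average of τ is close to 2, but the worst intermediate wealth 1 − 2^(τ−1) has no lower bound across paths. This is the debt that the strategy needs in order to work.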
Example. A technical problem with continuous time models is that events that will
happen eventually can be made to happen in bounded time by speeding up the clock.
Consider the market with prices P = (1, W ) where W is a Brownian motion. We will now
construct a pure investment trading strategy such that the corresponding wealth process has
X0 = 0 and XT = K a.s. where T > 0 is an arbitrary (non-random) time horizon and the
constant K > 0 is also arbitrary.
More concretely, by writing H = (φ, π), we will find a real-valued adapted process
$(\pi_t)_{t\in[0,T]}$ such that $\int_0^T \pi_s^2\,ds < \infty$ almost surely, but
$$\int_0^T \pi_s\,dW_s = K \quad\text{a.s.}$$
Let f : [0, T ] → [0, ∞] be a strictly increasing, continuous function such that f (0) = 0
and f (T ) = ∞. In particular we assume that $f'(t) > 0$ for all t ∈ [0, T ) and there exists an inverse
function $f^{-1} : [0, \infty] \to [0, T]$ such that $f \circ f^{-1}(u) = u$. For instance, to be explicit, we may
take $f(t) = \frac{t}{T-t}$ and $f^{-1}(u) = \frac{uT}{1+u}$.
Now define a local martingale (Zu )u≥0 by
$$Z_u = \int_0^{f^{-1}(u)} (f'(s))^{1/2}\,dW_s.$$
Note that the quadratic variation is
$$\langle Z\rangle_u = \int_0^{f^{-1}(u)} f'(s)\,ds = f(f^{-1}(u)) - f(0) = u$$
so by Lévy’s characterisation (Zu )u≥0 is a Brownian motion. Define the stopping time τ by
$$\tau = \inf\{u \ge 0 : Z_u = K\}.$$
Since (Zu )u≥0 is a Brownian motion, we have τ < ∞ almost surely since supu≥0 Zu = ∞
almost surely.
Now let
$$\pi_t = (f'(t))^{1/2}\,\mathbf{1}_{\{t \le f^{-1}(\tau)\}}$$
and
$$X_t = \int_0^t \pi_s\,dW_s$$
for 0 ≤ t ≤ T . Note that since
$$\int_0^T \pi_s^2\,ds = \int_0^{f^{-1}(\tau)} f'(s)\,ds = \tau < \infty$$
the stochastic integral is well-defined. The strange fact is that (Xt )t∈[0,T ] is a local martingale
with X0 = 0, but $X_T = Z_\tau = K$ almost surely.
We see that the integrand (πs )s∈[0,T ] roughly corresponds to a gambler starting at noon with
£0, employing a doubling strategy (with borrowed money) at a quicker and quicker pace,
until finally he gains £K almost surely before the clock strikes one o’clock. This situation is
rather unrealistic, particularly since the gambler must go arbitrarily far into debt in order to
secure the £K winning. Indeed, if such strategies were a good model for investor behaviour,
we all could be much richer by just spending some time trading over the internet.
The above discussion shows that the integrability necessary to define the stochastic in-
tegral is not really sufficient for our needs.
At this stage, there are several reasonable options. In this course we will insist that
the investor cannot go into debt.
Definition. A trading strategy H is admissible iff
Ht · Pt ≥ 0 for all t ≥ 0 almost surely .
Note that the doubling strategy is not admissible, since the investor now has only a finite
credit line. However, a suicide strategy, that is, a doubling strategy in which the object is to
lose a fixed amount K by time T , is admissible.
where the processes r, µi , σ ij are predictable and suitably integrable, and the W j are inde-
pendent Brownian motions.
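The first asset can be thought of as a bank account, and the random variable rt is the
spot interest rate at time t. The (random) ordinary differential equation can be solved:
$$B_t = B_0\, e^{\int_0^t r_s\,ds}.$$
The d assets can be thought of as risky stocks. The random variable $\mu_t^i$ is interpreted as the
mean instantaneous return of asset i, while the spot volatility is $\big(\sum_j (\sigma_t^{ij})^2\big)^{1/2}$. Note that
Itô’s formula yields
$$S_t^i = S_0^i \exp\left(\int_0^t \Big[\mu_s^i - \frac{1}{2}\sum_j (\sigma_s^{ij})^2\Big]\,ds + \int_0^t \sum_j \sigma_s^{ij}\,dW_s^j\right).$$
We will use the notation
$$\mu_t = \begin{pmatrix} \mu_t^1 \\ \vdots \\ \mu_t^d \end{pmatrix} \quad\text{and}\quad \sigma_t = \begin{pmatrix} \sigma_t^{11} & \cdots & \sigma_t^{1m} \\ \vdots & \ddots & \vdots \\ \sigma_t^{d1} & \cdots & \sigma_t^{dm} \end{pmatrix}.$$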
Conversely, if the filtration is generated by the Brownian motion, the martingale
representation theorem says that all positive local martingales M are of the form
$$M_t = M_0 \exp\left(-\frac{1}{2}\int_0^t \|\lambda_s\|^2\,ds - \int_0^t \lambda_s \cdot dW_s\right)$$
$$\begin{aligned} &= \frac{S_t^i}{B_t} \sum_j \sigma_t^{ij} (\lambda_t^j\,dt + dW_t^j) \\ &= \frac{S_t^i}{B_t} \sum_j \sigma_t^{ij}\,d\hat W_t^j. \end{aligned}$$
Now Girsanov’s theorem says that $\hat W_t = W_t + \int_0^t \lambda_s\,ds$ is a Q-Brownian motion. Therefore,
each S i /B is the stochastic integral with respect to a Q-Brownian motion, and hence is a
Q-local martingale as claimed.
CHAPTER 5
As before, given a market model P we can introduce a contingent claim. Recall that
a European contingent claim maturing at a time T > 0 is modelled as a random variable ξ
that is FT -measurable. We shall assume that there exists at least one martingale deflator,
so that, in particular, there are no absolute arbitrages.
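and hence
$$\begin{aligned} H_t \cdot P_t &= e^{-r(T-t)}\, E^Q\left[ g\left(S_0\, e^{(r-\sigma^2/2)T + \sigma \hat W_T}\right) \Big|\, \mathcal{F}_t \right] \\ &= e^{-r(T-t)}\, E^Q\left[ g\left(S_t\, e^{(r-\sigma^2/2)(T-t) + \sigma(\hat W_T - \hat W_t)}\right) \Big|\, \mathcal{F}_t \right] \\ &= e^{-r(T-t)} \int_{-\infty}^{\infty} g\left(S_t\, e^{(r-\sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,z}\right) \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz. \end{aligned}$$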
A famous example is the case of the European call option where the payout function is of
the form g(S) = (S − K)+ . In this case, we have the Nobel-prize-winning Black–Scholes
formula:
$$C_t(T, K) = S_t\, \Phi\left(-\frac{\log(K/S_t)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right) - K e^{-r(T-t)}\, \Phi\left(-\frac{\log(K/S_t)}{\sigma\sqrt{T-t}} + (r/\sigma - \sigma/2)\sqrt{T-t}\right)$$
where $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy$ is the standard normal distribution function. (You are asked
to derive this formula on Example Sheet 3.)
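As a sanity check, the closed formula can be compared with a direct numerical evaluation of the Gaussian pricing integral above. The following is a minimal sketch using only the standard library; the function names and the trapezoid-rule quadrature are illustrative choices, and `tau` stands for the time to maturity T − t.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price, written exactly as in the formula above."""
    root = math.sqrt(tau)
    d_plus = -math.log(K / S) / (sigma * root) + (r / sigma + sigma / 2) * root
    d_minus = -math.log(K / S) / (sigma * root) + (r / sigma - sigma / 2) * root
    return S * norm_cdf(d_plus) - K * math.exp(-r * tau) * norm_cdf(d_minus)


def price_by_quadrature(g, S, r, sigma, tau, z_max=10.0, n=20_000):
    """Trapezoid-rule approximation of the Gaussian integral for a payout g(S_T)."""
    h = 2.0 * z_max / n
    total = 0.0
    for k in range(n + 1):
        z = -z_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        s_T = S * math.exp((r - 0.5 * sigma ** 2) * tau + sigma * math.sqrt(tau) * z)
        total += weight * g(s_T) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return math.exp(-r * tau) * h * total
```

For instance, `price_by_quadrature(lambda s: max(s - 100.0, 0.0), 100.0, 0.05, 0.2, 1.0)` and `bs_call(100.0, 100.0, 0.05, 0.2, 1.0)` should agree to several decimal places.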
We have argued that the martingale representation theorem asserts the existence of a
replicating strategy H, but unfortunately, it gives us no information about how to compute
H. This problem will be tackled in the next section.
Now suppose that the d + 1 assets have Itô dynamics which can be expressed as
dBt = Bt r(t, St ) dt
dSt = diag(St )(µ(t, St )dt + σ(t, St )dWt )
rt (ω) = r(t, St (ω)), µt (ω) = µ(t, St (ω)), and σt (ω) = σ(t, St (ω)).
In this special situation, the asset prices (St )t≥0 are a d-dimensional Markov process.
The next theorem says how to find a replicating strategy for a contingent claim maturing
at time T with payout
ξT = g(ST )
$$= r\Big(V - \sum_i S^i \frac{\partial V}{\partial S^i}\Big)\,dt + \sum_i \frac{\partial V}{\partial S^i}\,dS_t^i$$
where we have used the assumption that V solves a certain PDE to go from the second to
third line above.
Now letting φ and π be as in the statement of the theorem we have that
V (t, St ) = φt Bt + πt · St
dV (t, St ) = φt dBt + πt · dSt .
1sometimes called the Feynman–Kac PDE. If r = 0, the PDE reduces to the (backward) Kolmogorov
equation.
Hence H = (φ, π) is a self-financing strategy with associated wealth process Xt (H) = V (t, St )
as claimed. It is 0-admissible since V ≥ 0 by assumption.
We have seen that there are two distinct ways to find replication costs for certain con-
tingent claims: by computing expectations or by solving a PDE. Furthermore, the PDE
method also gives the replicating portfolio. But how do you solve the PDE? In many cases,
the easiest way to solve the PDE is to compute the expectations.
This is illustrated by the Black–Scholes model:
Example (Black–Scholes continued). Let’s return to the Black–Scholes model
dBt = Bt rdt
dSt = St (µdt + σdWt )
with constant coefficients r, σ, µ. If we would like to replicate a claim with payout g(ST ), the
previous theorem says we should solve the Black–Scholes PDE
$$\frac{\partial V}{\partial t} + rS \frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV$$
$$V(T, S) = g(S).$$
Now, let’s specialise to the case of the call option where g(S) = (S − K)+ . From the last
section we have
$$V(t, S) = S\, \Phi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right) - K e^{-r(T-t)}\, \Phi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma - \sigma/2)\sqrt{T-t}\right).$$
The delta, i.e. the replicating portfolio, in this case is (by a miracle of algebra)
$$\frac{\partial V}{\partial S}(t, S) = \Phi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right).$$
Note that an agent attempting to replicate a call option using the Black–Scholes theory will
always hold a fraction of shares of the underlying stock between 0 and 1. Also note that
the gamma, the sensitivity of the delta to the price of the underlying, is given by the formula
$$\frac{\partial^2 V}{\partial S^2}(t, S) = \frac{1}{S\sigma\sqrt{T-t}}\, \phi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right)$$
where $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Since the gamma is always positive, the hedger will buy more
shares of the underlying if the price goes up.
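The "miracle of algebra" behind the delta can be checked numerically by differencing the call price. The sketch below is illustrative only: the function names are hypothetical, `tau` denotes T − t, and central finite differences stand in for the exact derivatives.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def d_plus(S, K, r, sigma, tau):
    """Argument of Phi in the delta formula."""
    return -math.log(K / S) / (sigma * math.sqrt(tau)) + (r / sigma + sigma / 2) * math.sqrt(tau)


def bs_call(S, K, r, sigma, tau):
    dp = d_plus(S, K, r, sigma, tau)
    dm = dp - sigma * math.sqrt(tau)
    return S * norm_cdf(dp) - K * math.exp(-r * tau) * norm_cdf(dm)


def bs_delta(S, K, r, sigma, tau):
    """Delta: number of shares in the replicating portfolio."""
    return norm_cdf(d_plus(S, K, r, sigma, tau))


def bs_gamma(S, K, r, sigma, tau):
    """Gamma: sensitivity of the delta to the underlying price."""
    return norm_pdf(d_plus(S, K, r, sigma, tau)) / (S * sigma * math.sqrt(tau))


def finite_difference_greeks(S, K, r, sigma, tau, h=0.01):
    """Central-difference approximations of delta and gamma."""
    up, mid, down = (bs_call(x, K, r, sigma, tau) for x in (S + h, S, S - h))
    return (up - down) / (2 * h), (up - 2 * mid + down) / h ** 2
```

The finite-difference values returned by `finite_difference_greeks` should match `bs_delta` and `bs_gamma` closely for any reasonable inputs.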
4. Black–Scholes volatility
What made the Black–Scholes formula so popular after its publication in 1973 is the
fact that the right-hand-side depends only on six quantities: the current calendar time t,
the option’s maturity time T , the option’s strike K, the spot interest rate r, the underlying
stock’s price St at time t, and a volatility parameter σ. Of these six numbers, only the
volatility parameter is neither specified by the option contract nor quoted in the market.
To use the Black–Scholes formula to find the price of real call options, one must first
estimate the volatility σ.
4.1. Estimation: statistics. In the Black–Scholes model, the drift µ and volatility σ
are not directly observable. Nevertheless, they can be estimated by appealing to standard
statistical theory. Suppose that we have observed the stock price $(S_t)_{-T \le t \le 0}$. If we sample at
times $t_i = (i/n - 1)T$, we see that the n random variables $Y_i = \log\big(S_{t_i}/S_{t_{i-1}}\big)$ are independent
with distribution
$$Y_i = (\mu - \sigma^2/2)(t_i - t_{i-1}) + \sigma(W_{t_i} - W_{t_{i-1}}) \sim N(aT/n,\ \sigma^2 T/n)$$
where a = µ − σ²/2. The maximum likelihood estimator of a is
$$\hat a = \frac{1}{T} \sum_{i=1}^{n} Y_i$$
and of σ² is
$$\hat\sigma^2 = \frac{1}{T} \sum_{i=1}^{n} (Y_i - \hat a T/n)^2.$$
Notice that this estimator â can be rewritten as
$$\hat a = \frac{1}{T} \log(S_0/S_{-T}),$$
and hence does not depend on n! That is to say, there is no advantage going to ever higher
and higher frequency data to estimate the drift µ. Fortunately, a careful reading of the
previous section shows that the drift parameter µ is not needed to find either replication
cost or the replicating strategy. This is good news for the Black–Scholes theory.2
On the other hand, the variance of σ̂² is 2(n − 1)σ⁴/n² ≈ 2σ⁴/n → 0 as n → ∞. Hence, there is some
hope of accurately estimating the volatility parameter by sampling the historical stock prices
regularly enough.
If one were to truly believe that the stock price was a geometric Brownian motion, that
is, of the form $S_t = S_0 e^{at + \sigma W_t}$, then one could insert the value σ̂² into the Black–Scholes
formula to obtain the price of a call option. Notice that we have done the statistics under
the objective measure P, not the equivalent martingale measure Q.
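The contrast between the two estimators is easy to see in a simulation. The following sketch (function names and parameter values are hypothetical) simulates one log-price path on a fine grid and then computes â and σ̂² from n equally spaced returns taken from that same path. It assumes n divides the number of fine steps.

```python
import math
import random


def simulate_log_prices(mu, sigma, T, n_fine, seed=0):
    """Log-price of a geometric Brownian motion on n_fine equal steps over [-T, 0]."""
    rng = random.Random(seed)
    dt = T / n_fine
    a = mu - 0.5 * sigma ** 2
    path = [0.0]
    for _ in range(n_fine):
        path.append(path[-1] + a * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path


def estimators(log_prices, T, n):
    """Maximum likelihood estimators of a and sigma^2 from n equally spaced returns."""
    step = (len(log_prices) - 1) // n          # assumes n divides the fine grid size
    y = [log_prices[(i + 1) * step] - log_prices[i * step] for i in range(n)]
    a_hat = sum(y) / T
    var_hat = sum((yi - a_hat * T / n) ** 2 for yi in y) / T
    return a_hat, var_hat
```

Calling `estimators` with n = 10, 100, 1000 on the same path returns the same â each time (the sum of returns telescopes), while σ̂² settles down near the true σ² only as n grows.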
4.3. Robustness of Black–Scholes. As argued above, since real markets tend to ex-
hibit implied volatility smiles, the Black–Scholes model cannot be considered an adequate
description of how stock prices fluctuate. However, it should be considered an approxi-
mation of reality, and we will now do a calculation to see how to quantify how good this
approximation is.
Suppose a banker wants to sell a contingent claim with payout ξT = g(ST ). The banker
believes that the underlying stock price is given by the Black–Scholes model, so that the
initial price of the claim is given by V (0, S0 , σ) where
$$V(t, S, \sigma) = e^{-r(T-t)} \int_{-\infty}^{\infty} g\left(S\, e^{(r-\sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,z}\right) \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz,$$
for some σ to be determined. Now, the claim is already traded on the market with initial
price ξ0 , so the banker chooses a σ = σ̂ such that V (0, S0 , σ̂) = ξ0 , i.e. σ̂ is the initial implied
volatility for the claim.
Now, the banker wants to hedge away the liability associated with the payout of the
claim, so again believing the Black–Scholes theory, he puts the initial wealth of X0 = ξ0 in
his account and holds a portfolio of
$$\pi_t = \frac{\partial V}{\partial S}(t, S_t, \hat\sigma)$$
shares of the stock at all times. His wealth then evolves as
$$dX_t = r(X_t - \pi_t S_t)\,dt + \pi_t\,dS_t. \tag{*}$$
3 ...but be careful: for large K, the graph can grow no faster than $\sqrt{2 \log K/(T-t)}$. See example sheet 4.
The banker knows that according to the Black–Scholes theory his strategy should replicate
the claim XT = g(ST ) a.s.
Suppose that the true dynamics of the market are given by
dBt = Bt rdt
dSt = St (µdt + σt dWt )
where r and µ are the same constants as before, but now (σt )t≥0 is some predictable process.
How big is the hedging error XT − g(ST )?
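First note that V solves the Black–Scholes PDE:
$$\frac{\partial V}{\partial t} + rS \frac{\partial V}{\partial S} + \frac{1}{2}\hat\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV, \qquad V(T, S) = g(S).$$
Now note that by Itô’s formula and the Black–Scholes PDE
$$\begin{aligned} dV(t, S_t, \hat\sigma) &= \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial S}\,dS_t + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\,d\langle S\rangle_t \\ &= rV\,dt + \pi_t(dS_t - rS_t\,dt) + \frac{1}{2} S_t^2 (\sigma_t^2 - \hat\sigma^2) \frac{\partial^2 V}{\partial S^2}\,dt. \end{aligned}$$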
Subtracting this equation from equation (*) and solving yields
$$\begin{aligned} \frac{1}{2}\int_0^T e^{r(T-t)} (\hat\sigma^2 - \sigma_t^2) S_t^2 \frac{\partial^2 V}{\partial S^2}(t, S_t, \hat\sigma)\,dt &= X_T - V(T, S_T, \hat\sigma) - e^{rT}(X_0 - V(0, S_0, \hat\sigma)) \\ &= X_T - g(S_T) \end{aligned}$$
since X0 = ξ0 by assumption and ξ0 = V (0, S0 , σ̂) by the definition of σ̂.
The above formula shows that the naive Black–Scholes hedger does reasonably well in a
world where the implied volatility is close to the actual spot volatility. For many claims,
such as call options, the gamma $\frac{\partial^2 V}{\partial S^2}$ is positive. Therefore, the naive hedger’s strategy may
fall short if the implied volatility is smaller than the realised spot volatility.
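The hedging-error formula can be illustrated by simulation. The sketch below makes several simplifying assumptions that are not in the text: the true volatility is a constant `sigma_true` different from the hedger's `sigma_hat`, the claim is a call option, and hedging happens on a discrete time grid, so the two sides of the identity agree only up to discretisation error. All function names and parameter values are hypothetical.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def d_plus(S, K, r, sigma, tau):
    return -math.log(K / S) / (sigma * math.sqrt(tau)) + (r / sigma + sigma / 2) * math.sqrt(tau)


def bs_call(S, K, r, sigma, tau):
    dp = d_plus(S, K, r, sigma, tau)
    return S * norm_cdf(dp) - K * math.exp(-r * tau) * norm_cdf(dp - sigma * math.sqrt(tau))


def hedge_one_path(rng, S0=100.0, K=100.0, r=0.02, mu=0.05,
                   sigma_true=0.3, sigma_hat=0.2, T=1.0, n_steps=1000):
    """Return (X_T - g(S_T), Riemann sum of the gamma-weighted integral) on one path."""
    dt = T / n_steps
    S = S0
    X = bs_call(S0, K, r, sigma_hat, T)          # X_0 = V(0, S_0, sigma_hat)
    integral = 0.0
    for k in range(n_steps):
        tau = T - k * dt                          # time to maturity, always > 0 here
        dp = d_plus(S, K, r, sigma_hat, tau)
        pi = norm_cdf(dp)                         # Black-Scholes delta at sigma_hat
        gamma = norm_pdf(dp) / (S * sigma_hat * math.sqrt(tau))
        integral += 0.5 * math.exp(r * tau) * (sigma_hat ** 2 - sigma_true ** 2) * S ** 2 * gamma * dt
        S_next = S * math.exp((mu - 0.5 * sigma_true ** 2) * dt
                              + sigma_true * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        X += r * (X - pi * S) * dt + pi * (S_next - S)   # discretised equation (*)
        S = S_next
    return X - max(S - K, 0.0), integral


def average_hedging_error(n_paths=200, seed=0):
    """Average both sides of the hedging-error identity over simulated paths."""
    rng = random.Random(seed)
    results = [hedge_one_path(rng) for _ in range(n_paths)]
    mean_error = sum(e for e, _ in results) / n_paths
    mean_integral = sum(i for _, i in results) / n_paths
    return mean_error, mean_integral
```

With the default values the hedger uses a volatility below the true one, so the average hedging error should come out negative and close to the average of the integral term.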
where we have appealed to Itô’s formula⁴ with $g(x) = (x - K)^+$, $g'(x) = \mathbf{1}_{[K,\infty)}(x)$, and
$g''(x) = \delta_K(x)$, the Dirac delta ‘function’.
Now, by the assumption of smoothness and the bounds on the volatility function, the
Q-law of the random variable ST has a density function $f_{S_T}$. Computing expected values of
both sides
$$e^{rT} C_0(T, K) = (S_0 - K)^+ + \int_0^T \int_K^\infty f_{S_t}(y)\, y\, r\,dy\,dt + \frac{1}{2}\int_0^T f_{S_t}(K) K^2 \sigma(t, K)^2\,dt \tag{1}$$
and then differentiating both sides with respect to T yields
$$e^{rT}\left[\frac{\partial C_0}{\partial T}(T, K) + r C_0(T, K)\right] = \int_K^\infty f_{S_T}(y)\, y\, r\,dy + \frac{1}{2} f_{S_T}(K) K^2 \sigma(T, K)^2$$
and the result follows from noting
$$\int_K^\infty f_{S_T}(y)\, y\,dy = \int_0^\infty f_{S_T}(y)(y - K)^+\,dy + K \int_K^\infty f_{S_T}(y)\,dy$$
and applying the Breeden–Litzenberger identities.
4A version of Itô’s formula for non-smooth convex functions, called Tanaka’s formula, can actually be
rigorously stated in terms of a quantity called local time.
6. Computing marginal laws
We begin this section with some comments on the Breeden–Litzenberger formula. First
note that the collection of call prices {C0 (T, K) : K > 0}, determines the Q-law of the
random variable ST , even if a density doesn’t exist. Indeed, note that
$$\lim_{\varepsilon \downarrow 0} \frac{C_0(T, K + \varepsilon) - C_0(T, K)}{\varepsilon} = -e^{-rT}\, Q(S_T > K).$$
Also, if the put prices are given by the put-call parity formula
P0 (T, K) = C0 (T, K) − S0 + e−rT K
then the put prices also determine the marginal laws of S.
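The limit formula for the call prices can be tested in the Black–Scholes model, where Q(S_T > K) is known in closed form. The following is a minimal sketch with hypothetical function names; the choice of the Black–Scholes model as the test case and the step size `eps` are assumptions made for illustration.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d_minus(S0, K, r, sigma, T):
    return -math.log(K / S0) / (sigma * math.sqrt(T)) + (r / sigma - sigma / 2) * math.sqrt(T)


def bs_call(S0, K, r, sigma, T):
    dm = d_minus(S0, K, r, sigma, T)
    return S0 * norm_cdf(dm + sigma * math.sqrt(T)) - K * math.exp(-r * T) * norm_cdf(dm)


def strike_difference_quotient(S0, K, r, sigma, T, eps=1e-4):
    """One-sided difference quotient of the call price in the strike."""
    return (bs_call(S0, K + eps, r, sigma, T) - bs_call(S0, K, r, sigma, T)) / eps


def discounted_tail_probability(S0, K, r, sigma, T):
    """-exp(-rT) * Q(S_T > K), using Q(S_T > K) = Phi(d_minus) in Black-Scholes."""
    return -math.exp(-r * T) * norm_cdf(d_minus(S0, K, r, sigma, T))
```

The two functions `strike_difference_quotient` and `discounted_tail_probability` should return nearly the same number for any strike.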
Now, if we know the distribution of ST , we can compute the expectation
$$E^Q\left[e^{-rT} g(S_T)\right]$$
for any non-negative function g. Of course, the above quantity is the replication cost of a
contingent claim with payout ξT = g(ST ). Is there a way to deduce this replication cost
directly from the prices of calls and puts?
The answer is yes. Indeed, suppose g is C² and convex. Then, the following formula
holds identically
$$g(S) = g(a) + g'(a)(S - a) + \int_0^a g''(K)(K - S)^+\,dK + \int_a^\infty g''(K)(S - K)^+\,dK$$
for any a > 0. This identity can be verified by integration by parts. Approximating the
integrals by Riemann sums gives
$$g(S) \approx g(a) - a g'(a) + g'(a) S + \sum_{K_i < a} \Delta K_i\, g''(K_i)(K_i - S)^+ + \sum_{K_i \ge a} \Delta K_i\, g''(K_i)(S - K_i)^+$$
and we see that the financial significance is that the payout g(ST ) of the claim can be
approximated by holding a portfolio consisting of a bond with principal value $g(a) - a g'(a)$,
$g'(a)$ shares of the stock, $\Delta K_i\, g''(K_i)$ puts of strike Ki < a and $\Delta K_i\, g''(K_i)$ calls of strike
Ki ≥ a. And by integration, we get
$$E^Q[e^{-rT} g(S_T)] = [g(a) - a g'(a)]e^{-rT} + g'(a) S_0 + \int_0^a g''(K) P_0(T, K)\,dK + \int_a^\infty g''(K) C_0(T, K)\,dK.$$
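The Riemann-sum decomposition can be tried out directly. The sketch below is illustrative: the function name `replicate_payout`, the midpoint rule, and the truncation of the call strikes at `k_max` (which must exceed S) are all assumptions made for the example.

```python
import math


def replicate_payout(g, g1, g2, S, a, k_max, n=20_000):
    """Approximate g(S) by a bond, the stock, and strips of puts and calls.

    g, g1, g2 are the payout function and its first two derivatives.
    Strikes are placed at cell midpoints; k_max truncates the call strip.
    """
    bond = g(a) - a * g1(a)                      # principal of the bond position
    stock = g1(a) * S                            # g'(a) shares of the stock
    dk_put = a / n
    puts = 0.0
    for i in range(n):
        strike = (i + 0.5) * dk_put              # put strikes below a
        puts += dk_put * g2(strike) * max(strike - S, 0.0)
    dk_call = (k_max - a) / n
    calls = 0.0
    for i in range(n):
        strike = a + (i + 0.5) * dk_call         # call strikes above a
        calls += dk_call * g2(strike) * max(S - strike, 0.0)
    return bond + stock + puts + calls
```

For example, with g(S) = S², so that g″ = 2, the portfolio value at S = 80 with a = 100 should be close to 6400.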
6.1. Call prices from moment generating functions. Since a portfolio of calls and
puts on a stock can essentially replicate any European contingent claim, it is important to
have models where the call prices can be computed easily. Unfortunately, there are few
models where there exist nice, elementary formulae for the call prices. However, there
are many models where the moment generating functions can be computed explicitly, and
we will now see that given the moment generating function we can compute call prices by
integration:
Consider a market model (B, S) where Bt = B0 ert and S/B is a positive Q-martingale.
For complex θ in the vertical strip
$$\Theta = \{\theta = p + iq : 0 \le p \le 1,\ q \in \mathbb{R}\}$$
define the moment generating function of the log stock price by
$$M_t(\theta) = E^Q(e^{\theta \log S_t}).$$
Note that since S/B is a martingale we have for θ = p + iq ∈ Θ,
$$E^Q(|e^{\theta \log S_t}|) = E^Q(S_t^p) \le E^Q(S_t)^p = e^{prt} S_0^p < \infty$$
by Jensen’s inequality, so the moment generating function is well-defined. The following
result shows how to recover call prices from the moment generating function.
Theorem. For any 0 < p < 1 the identity
$$E^Q[e^{-rT}(S_T - K)^+] = S_0 - \frac{e^{-rT} K^{1-p}}{2\pi} \int_{-\infty}^{\infty} \frac{M_T(p + ix)\, e^{-ix \log K}}{(x - ip)(x + i(1-p))}\,dx$$
holds.
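The identity can be tested in the Black–Scholes model, where log S_T is Gaussian under Q and both sides are available in closed form. The following sketch is a numerical check under that assumption; the function names, the truncation level `x_max` and the trapezoid rule are illustrative choices.

```python
import cmath
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S0, K, r, sigma, T):
    dp = -math.log(K / S0) / (sigma * math.sqrt(T)) + (r / sigma + sigma / 2) * math.sqrt(T)
    return S0 * norm_cdf(dp) - K * math.exp(-r * T) * norm_cdf(dp - sigma * math.sqrt(T))


def bs_mgf(theta, S0, r, sigma, T):
    """M_T(theta) when log S_T ~ N(log S0 + (r - sigma^2/2) T, sigma^2 T) under Q."""
    mean = math.log(S0) + (r - 0.5 * sigma ** 2) * T
    return cmath.exp(theta * mean + 0.5 * theta ** 2 * sigma ** 2 * T)


def call_from_mgf(mgf, S0, K, r, T, p=0.5, x_max=100.0, n=20_000):
    """Evaluate the integral in the theorem by the trapezoid rule on [-x_max, x_max]."""
    h = 2.0 * x_max / n
    log_k = math.log(K)
    total = 0j
    for j in range(n + 1):
        x = -x_max + j * h
        weight = 0.5 if j in (0, n) else 1.0
        numerator = mgf(complex(p, x)) * cmath.exp(-1j * x * log_k)
        denominator = (x - 1j * p) * (x + 1j * (1.0 - p))
        total += weight * numerator / denominator
    integral = (h * total).real                  # the imaginary part cancels by symmetry
    return S0 - math.exp(-r * T) * K ** (1.0 - p) / (2.0 * math.pi) * integral
```

Passing `lambda th: bs_mgf(th, 100.0, 0.05, 0.2, 1.0)` as `mgf` should reproduce `bs_call(100.0, K, 0.05, 0.2, 1.0)` for any strike K and any 0 < p < 1, which also illustrates that the result does not depend on the choice of p.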
Essentially, we are inverting the moment generating function via a complex integral.
Variants of this procedure are often called a Bromwich, Fourier or Mellin transform. To
prove this formula, we begin with a lemma:
Lemma. For any 0 < p < 1 the identity
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{iax}}{(x - ip)(x + i(1-p))}\,dx = \begin{cases} e^{-ap} & \text{if } a \ge 0 \\ e^{a(1-p)} & \text{if } a < 0 \end{cases}$$
holds.
Proof. This is a standard application of the Cauchy residue theorem. Consider the case
a ≥ 0. Define the semi-circular contour
$$\Gamma_R = \{x + i0 : -R \le x \le R\} \cup \{Re^{i\phi} : 0 \le \phi \le \pi\}$$
in the upper half-plane. Cauchy’s residue theorem gives
$$\int_{\Gamma_R} \frac{e^{iaz}}{(z - ip)(z + i(1-p))}\,dz = 2\pi i\, \frac{e^{iaz}}{z + i(1-p)}\bigg|_{z = ip} = 2\pi e^{-ap}$$
since the integrand is meromorphic with a simple pole at z = ip inside the contour, and the
contour integral is evaluated in the anticlockwise sense.
On the other hand,
$$\int_{\Gamma_R} \frac{e^{iaz}}{(z - ip)(z + i(1-p))}\,dz = \int_{-R}^{R} \frac{e^{iax}}{(x - ip)(x + i(1-p))}\,dx + \int_0^{\pi} \frac{iR\, e^{-aR\sin\phi}\, e^{i(aR\cos\phi + \phi)}}{(Re^{i\phi} - ip)(Re^{i\phi} + i(1-p))}\,d\phi$$
Now multiply by $e^{-rT}$ and compute expectations. The result follows upon interchanging
expectation and integration on the right-hand side. This is justified by Fubini’s theorem
since
$$E^Q \int_{-\infty}^{\infty} \left| \frac{e^{p \log S_T + ix \log(S_T/K)}}{(x - ip)(x + i(1-p))} \right| dx = M_T(p) \int_{-\infty}^{\infty} \frac{dx}{\sqrt{(x^2 + p^2)(x^2 + (1-p)^2)}} < \infty.$$
Remark. Here is one application of the representation of call prices in terms of the
moment generating function. Let ΛT (p) = log MT (p) be the cumulant generating function.
By standard arguments, the function ΛT is convex and smooth on p ∈ [0, 1]. Note that
ΛT (0) = 0 and ΛT (1) = rT + log S0 . By the mean value theorem there exists a p∗ ∈ (0, 1)
such that $\Lambda_T'(p^*) = rT + \log S_0$. (Alternatively, let p∗ be the minimiser of
$p \mapsto \Lambda_T(p) - p(rT + \log S_0)$.) By Taylor’s formula
$$\Lambda_T(p) \approx \Lambda_T(p^*) + (rT + \log S_0)(p - p^*) + \frac{1}{2}\Lambda_T''(p^*)(p - p^*)^2$$
where $\Lambda_T''(p^*) > 0$. Hence
$$M_T(p^* + ix) \approx M_T(p^*)\, e^{i(rT + \log S_0)x - \frac{1}{2}\Lambda_T''(p^*) x^2}$$
where $k = \log(Ke^{-rT}/S_0)$ is the log-moneyness, and the second approximation is appropriate
when $\Lambda_T''(p) \to \infty$.
6.2. Computing moment generating functions. In order to make use of the pre-
vious section, we need to be able to compute the moment generating function for some
interesting models. We first consider a general stochastic volatility model:
$$\begin{aligned} dB_t &= B_t\, r\,dt \\ dS_t &= S_t\left(r\,dt + \sqrt{v_t}\,dW_t^S\right) \\ dv_t &= A(v_t)\,dt + B(v_t)\,dW_t^v \end{aligned}$$
Here $W^S$ and $W^v$ are assumed to be correlated Brownian motions in a fixed equivalent
martingale measure Q, with correlation ρ. Correlated Brownian motions can be constructed,
for instance, by letting $W^v$ and $W^\perp$ be independent Brownian motions and setting
$$W_t^S = \rho W_t^v + \sqrt{1 - \rho^2}\, W_t^\perp.$$
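The construction of correlated Brownian increments translates directly into code. The following is a minimal sketch with hypothetical function names, using independent Gaussian increments over a time step `dt`.

```python
import math
import random


def correlated_increments(rho, dt, n, seed=0):
    """Increments of (W^S, W^v) built from independent increments of W^v and W^perp."""
    rng = random.Random(seed)
    dw_s, dw_v = [], []
    for _ in range(n):
        z_v = rng.gauss(0.0, math.sqrt(dt))      # increment of W^v
        z_perp = rng.gauss(0.0, math.sqrt(dt))   # increment of the independent W^perp
        dw_v.append(z_v)
        dw_s.append(rho * z_v + math.sqrt(1.0 - rho ** 2) * z_perp)
    return dw_s, dw_v


def sample_correlation(xs, ys):
    """Plain sample correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

The sample correlation of the two increment series should be close to ρ, and each series should have variance close to `dt`, as required of Brownian increments.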
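Theorem. For each θ ∈ Θ, let F (·, ·; θ) solve the PDE
$$\frac{\partial F}{\partial t} + [\theta r + (\theta^2 - \theta)v/2]F + (A + \theta\sqrt{v}\,B\rho)\frac{\partial F}{\partial v} + \frac{1}{2}B^2 \frac{\partial^2 F}{\partial v^2} = 0$$
with boundary condition
$$F(T, v; \theta) = 1.$$
Then (Mt )0≤t≤T is a local martingale where
$$M_t = e^{\theta \log S_t} F(t, v_t; \theta).$$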
Proof. This is just another application of Itô’s formula.
The significance of this result is that if we can prove the local martingale is a true
martingale, then
$$E^Q[e^{\theta \log S_T}] = e^{\theta \log S_0} F(0, v_0; \theta)$$
and hence we have found the moment generating function.
To use this result, we need to solve a PDE in one spatial variable. Since the PDE for the
option prices would involve two spatial variables (v, S), we are in a better position finding
the moment generating function first via the above theorem, though we still need to evaluate
the Bromwich integral.
6.3. The Heston model. We now explore a model where the moment generating func-
tion can be computed explicitly. It was introduced by Heston in 1993:
$$\begin{aligned} dB_t &= B_t\, r\,dt \\ dS_t &= S_t\left(r\,dt + \sqrt{v_t}\,dW_t^S\right) \\ dv_t &= \lambda(\bar v - v_t)\,dt + c\sqrt{v_t}\,dW_t^v \end{aligned}$$
with $\langle W^S, W^v\rangle_t = \rho t$. This is just a special case of the stochastic volatility model in the
previous subsection with A(v) = λ(v̄ − v) and B(v) = c√v for some positive constants
λ, v̄, c. In this model the squared volatility v is a mean-reverting process, i.e. an ergodic
Markov process, at least under Q. The interpretation of v̄ is the level of mean reversion,
while λ is the speed of mean reversion. We will come across this stochastic process again in the
context of the Cox–Ingersoll–Ross rate model. It was first studied by Feller in the 1950s.
The Heston PDE is then
$$\frac{\partial F}{\partial t} + [\theta r + (\theta^2 - \theta)v/2]F + [\lambda\bar v + (\theta c\rho - \lambda)v]\frac{\partial F}{\partial v} + \frac{1}{2}c^2 v \frac{\partial^2 F}{\partial v^2} = 0.$$
It turns out that this PDE can be solved explicitly. The trick is to make the ansatz
$$F(t, v; \theta) = e^{R(T-t;\theta)v + Q(T-t;\theta)}.$$
Note that the boundary condition F (T, v; θ) = 1 forces R(0; θ) = Q(0; θ) = 0. The PDE
becomes
$$-\dot R v - \dot Q + [\theta r + (\theta^2 - \theta)v/2] + [\lambda\bar v + (\theta c\rho - \lambda)v]R + \frac{1}{2}c^2 v R^2 = 0,$$
where the dot indicates differentiation with respect to the time variable. Notice that the
equation can be written in the form
$$\alpha(T - t; \theta)v + \beta(T - t; \theta) = 0.$$
Now, the above equation should hold for all v so α(T − t; θ) = 0 = β(T − t; θ), i.e.
$$\begin{aligned} \dot R &= (\theta^2 - \theta)/2 + (\theta c\rho - \lambda)R + \frac{1}{2}c^2 R^2 \\ \dot Q &= \theta r + \lambda\bar v R. \end{aligned}$$
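The equation for R is a Riccati equation which can be solved explicitly. In fact, we do not
even have to make any tricky substitutions; separation of variables and partial fractions work
well enough:
$$\begin{aligned} & \dot R = \frac{1}{2}c^2 (R - R_+)(R - R_-) \\ \Rightarrow\ & \frac{\dot R}{(R - R_+)(R - R_-)} = \frac{1}{2}c^2 \\ \Rightarrow\ & \frac{1}{R_+ - R_-}\left(\frac{1}{R - R_+} - \frac{1}{R - R_-}\right)\dot R = \frac{1}{2}c^2 \\ \Rightarrow\ & \log\frac{1 - R(\tau)/R_+}{1 - R(\tau)/R_-} = \gamma\tau \\ \Rightarrow\ & R(\tau; \theta) = (\theta^2 - \theta)\, \frac{e^{\gamma(\theta)\tau} - 1}{(\gamma(\theta) - \theta c\rho + \lambda)e^{\gamma(\theta)\tau} + (\gamma(\theta) + \theta c\rho - \lambda)} \end{aligned}$$
where $\gamma(\theta) = \sqrt{(\lambda - \theta c\rho)^2 - (\theta^2 - \theta)c^2}$ and $R_\pm(\theta) = [(\lambda - \theta c\rho) \pm \gamma(\theta)]/c^2$. And the second
equation can be solved
$$\begin{aligned} Q(\tau; \theta) &= \theta r\tau + \int_0^\tau \lambda\bar v R(s; \theta)\,ds \\ &= \left(\theta r - \frac{(\theta^2 - \theta)\lambda\bar v}{\gamma(\theta) + \theta c\rho - \lambda}\right)\tau - \frac{2\lambda\bar v}{c^2} \log\left(\frac{(\gamma(\theta) - \theta c\rho + \lambda)e^{\gamma(\theta)\tau} + (\gamma(\theta) + \theta c\rho - \lambda)}{2\gamma(\theta)}\right). \end{aligned}$$
It can be shown that for θ ∈ Θ
$$E^Q(e^{\theta \log S_T}) = e^{\theta \log S_0 + R(T;\theta)v_0 + Q(T;\theta)}.$$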
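Closed-form solutions of this kind are easy to get wrong by a sign, so it is worth checking them against a direct numerical integration of the two ordinary differential equations. The sketch below does this with a classical Runge–Kutta scheme; the function names and parameter values are hypothetical. It assumes θ lies strictly inside the strip, since the expression for Q written in this form is 0/0 at θ = 0, and the principal branch of the complex logarithm is assumed adequate for moderate θ and τ.

```python
import cmath


def heston_gamma(theta, lam, c, rho):
    return cmath.sqrt((lam - theta * c * rho) ** 2 - (theta ** 2 - theta) * c ** 2)


def heston_R(tau, theta, lam, c, rho):
    """Closed-form solution of the Riccati equation with R(0) = 0."""
    g = heston_gamma(theta, lam, c, rho)
    e = cmath.exp(g * tau)
    return (theta ** 2 - theta) * (e - 1) / ((g - theta * c * rho + lam) * e + (g + theta * c * rho - lam))


def heston_Q(tau, theta, r, lam, v_bar, c, rho):
    """Closed-form Q; not usable at theta = 0 where this form is 0/0."""
    g = heston_gamma(theta, lam, c, rho)
    a = g - theta * c * rho + lam
    b = g + theta * c * rho - lam
    drift = theta * r - (theta ** 2 - theta) * lam * v_bar / b
    return drift * tau - 2 * lam * v_bar / c ** 2 * cmath.log((a * cmath.exp(g * tau) + b) / (2 * g))


def riccati_rk4(tau, theta, r, lam, v_bar, c, rho, n=2000):
    """Integrate the ODEs for (R, Q) from (0, 0) with the classical RK4 scheme."""
    def rhs(R):
        dR = 0.5 * (theta ** 2 - theta) + (theta * c * rho - lam) * R + 0.5 * c ** 2 * R ** 2
        dQ = theta * r + lam * v_bar * R
        return dR, dQ

    h = tau / n
    R = Q = 0j
    for _ in range(n):
        k1 = rhs(R)
        k2 = rhs(R + 0.5 * h * k1[0])
        k3 = rhs(R + 0.5 * h * k2[0])
        k4 = rhs(R + h * k3[0])
        Q += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        R += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    return R, Q
```

The moment generating function is then `cmath.exp(theta * math.log(S0) + heston_R(...) * v0 + heston_Q(...))`, which is the quantity that gets fed into the Bromwich integral for call prices.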
What is the point of this calculation? Although the formula for the moment generating
function is hard to call beautiful, it is very explicit. In particular, given the set of model
parameters (v0 , λ, v̄, c), the function can be evaluated very quickly on a computer, and hence
the Bromwich integral for call prices can be computed numerically quickly. Hence, it is
possible to calibrate the Heston model to market data in a reasonable amount of time. This
is one of the main reasons for its popularity.
where $a = \sigma\sigma^\top$.
Suppose V : [0, T ] × Rd → [0, ∞) solves the variational inequality
max {LV, g − V } = 0
V (T, S) = g(S).
Let X be the wealth process started with X0 = V (0, S0 ) and with πt = gradV (t, St ) shares
of stock at time t. Then Xt ≥ g(St ) for all t ∈ [0, T ], and there exists a stopping time τ∗
such that Xτ∗ = g(Sτ∗ ).
Remark. To see where this variational inequality comes from, let’s consider heuristically
the Snell envelope of Y ξ which should satisfy an equation like
$$Z_t = \max\{Y_t \xi_t,\ E[Z_{t+\delta} \mid \mathcal{F}_t]\}$$
where δ > 0 is a small increment of time, the process ξ specifies the payout of the claim and
Y is the state price density. First, let U = Z/Y and let Q be the equivalent martingale measure
corresponding to Y and the numéraire B. Then U should satisfy
$$U_t = \max\{\xi_t,\ E^Q[e^{-r\delta} U_{t+\delta} \mid \mathcal{F}_t]\}.$$
Now, since ξt = g(St ) and S is a Markov process under Q, we suspect that Ut = V (t, St ) for
some function V . By Itô’s formula
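$$e^{-r\delta} V(t + \delta, S_{t+\delta}) = V(t, S_t) + \int_t^{t+\delta} e^{-rs} LV(s, S_s)\,ds + \int_t^{t+\delta} e^{-rs} \frac{\partial V}{\partial S}(s, S_s)\,d\hat W_s$$
where Ŵ is a Q-Brownian motion. Assuming the stochastic integral is mean-zero, we have
$$V(t, S) = \max\left\{ g(S),\ V(t, S) + E^Q\left[\int_t^{t+\delta} e^{-rs} LV(s, S_s)\,ds \,\Big|\, S_t = S\right] \right\}.$$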
Subtracting V (t, S) from both sides and sending δ ↓ 0 yields the variational inequality
appearing in the theorem.
Remark. It should not be too surprising that the differential operator L also appeared
in our discussion of hedging European contingent claims. The following proof will proceed
in a similar way to the proof of the robustness of Black–Scholes implied volatility.
Proof. Let X be the wealth process. As usual we have
dXt = rXt dt + πt · (dSt − rSt dt).
Also by Itô’s formula we have
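$$dV(t, S_t) = \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{d} a^{i,j} S^i S^j \frac{\partial^2 V}{\partial S^i \partial S^j}\right)dt + \pi_t \cdot dS_t$$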
5 Going back at least to H. McKean. Appendix: A free boundary problem for the heat equation arising
from a problem in mathematical economics. Industrial Management Review 6: 32-39. (1965)
CHAPTER 6
Rather than speak of bond prices, it is often easier to speak of interest rates. A popular
interest rate is the yield y(t, T ) at time t of a bond maturing at time T defined by the
formula
$$y(t, T) = -\frac{1}{T - t} \log P(t, T).$$
1 We assume that the bond issuer is absolutely credit worthy, and there is exactly zero probability of
default. Therefore, we are not discussing corporate bonds, mortgage-backed securities or the debt of some
countries (for instance, Russia famously defaulted in 1998). In fact, there is probably no real-world example
of a perfectly risk-free bond. Nevertheless, many practitioners probably still regard U.S. Treasury bonds,
which are backed by the ‘full faith and credit’ of the U.S. government, as virtually risk-free. Though with
the current political situation in Washington, this may well change.
For us, a more useful interest rate is the forward rate f (t, T ) at time t for maturity T ,
defined by
$$f(t, T) = -\frac{\partial}{\partial T} \log P(t, T).$$
The yield curve, the forward rate curve and the bond price curve contain the same
information, since
$$P(t, T) = e^{-(T-t)\, y(t,T)} = e^{-\int_t^T f(t,s)\,ds}.$$
The term structure of interest rates refers to the function T ↦ P (t, T ), or equivalently, the
price data encoded in either of the functions T ↦ y(t, T ) or T ↦ f (t, T ).
There are at least two perspectives to bond price modelling. One is to assume that bonds
are derivative securities, where the underlying asset is a bank or money market account. A
complementary perspective is to consider the bonds as fundamental and the bank account
as a derivative asset (see example sheet 1). We will mostly explore the first perspective in
this chapter, but will return to the second perspective in our study of HJM models.
so the short rate is the left-hand end point of the forward rate curve. (The long rate
limT ↑∞ f (t, T ) is the far right-hand end of the curve.)
From common experience, it seems that we should like to model the interest rate (rt )t≥0
as a non-negative process. Indeed, if rt ≥ 0 for all t ≥ 0 then the map T ↦ P (t, T ) is
decreasing. However, for the sake of tractability, this modelling requirement is frequently
dropped.
3.1. Vasicek model. In 1977, Vasicek proposed the following model for the short rate:
$$dr_t = \lambda(\bar r - r_t)\,dt + \sigma\,d\hat W_t$$
for a parameter r̄ > 0 interpreted as a mean short rate, a mean-reversion parameter λ > 0,
and a volatility parameter σ > 0. This stochastic differential equation can be solved explicitly
to yield
$$r_t = e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar r + \int_0^t e^{-\lambda(t-s)} \sigma\,d\hat W_s.$$
Note that the short interest rate in the Vasicek model follows an Ornstein–Uhlenbeck process,
and in particular, that for each t ≥ 0 the random variable rt is Gaussian under the measure
Q with
$$E^Q(r_t) = e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar r \quad\text{and}\quad \mathrm{Var}^Q(r_t) = \int_0^t e^{-2\lambda(t-s)} \sigma^2\,ds = \frac{\sigma^2}{2\lambda}(1 - e^{-2\lambda t}).$$
Moreover, one can show that the process is ergodic and converges to the invariant
distribution $N\big(\bar r, \frac{\sigma^2}{2\lambda}\big)$. In particular, we have
$$\frac{1}{T}\int_0^T r_s\,ds \to \bar r \quad Q\text{-almost surely.}$$
Please note, however, that in the present framework we can say absolutely nothing about the
distribution of rt for the objective measure P, unless we have a model for the market price
of risk.
Since the short rate rt is Gaussian, the advantage of this type of model is that it is
relatively easy to compute prices, for instance of bonds, explicitly. A disadvantage of this
model is that there is a chance that rt < 0 for some time t > 0. Recall that a normal random
variable can take any real value, both positive and negative. However, for sensible parameter
values, the Q-probability of the event {rt < 0} is pretty small.
We have learned from example sheet 3 that
Z T Z T Z TZ t
−λt −λt
rt dt = [e r0 + (1 − e )r̄]dt + e−λ(t−s) σdŴs dt
0 0 0 0
Z T Z T Z T
−λt −λt −λ(t−s)
= [e r0 + (1 − e )r̄]dt + e dt σdŴs
0 0 s
Z T
σ2 T
Z
−λt −λt −λt 2
∼N [e r0 + (1 − e )r̄]dt, 2 (1 − e ) dt
0 λ 0
under Q, so that, using the moment generating function of a Gaussian random variable we
have
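\[
\begin{aligned}
P(0, T) &= E^{Q}\big[e^{-\int_0^T r_t\,dt}\big] \\
&= \exp\Big( -\int_0^T \Big[ e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar r - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda t})^2 \Big] dt \Big)
\end{aligned}
\]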
so that
\[ f(0, T) = e^{-\lambda T} r_0 + (1 - e^{-\lambda T})\bar r - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda T})^2. \]
By the time-homogeneity of the Vasicek model, we can actually deduce the formula
\[ f(t, t + x) = r_t e^{-\lambda x} + \bar r(1 - e^{-\lambda x}) - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda x})^2. \]
This formula says that for the Vasicek model, the forward rates at time t are an affine
function of the short rate at time t. (An affine function is of the form g(x) = ax + b, that is,
its graph is a line.)
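The forward rate formula can be turned into a small calculator. The sketch below evaluates the Vasicek forward curve for a given short rate and recovers the bond price by integrating the forward rates numerically. The function names, the midpoint rule and the parameter values are illustrative assumptions, not part of the notes.

```python
import math

def vasicek_forward(x, r, rbar, lam, sigma):
    """Forward rate f(t, t+x) when the current short rate is r_t = r."""
    g = 1.0 - math.exp(-lam * x)
    return r * (1.0 - g) + rbar * g - sigma ** 2 / (2.0 * lam ** 2) * g ** 2

def vasicek_bond(x, r, rbar, lam, sigma, n=2000):
    """P(t, t+x) = exp(-int_0^x f(t, t+u) du), using the midpoint rule."""
    h = x / n
    integral = sum(
        vasicek_forward((k + 0.5) * h, r, rbar, lam, sigma) for k in range(n)
    ) * h
    return math.exp(-integral)

# Hypothetical parameters.
RBAR, LAM, SIGMA = 0.03, 0.5, 0.02

# The forward rate is affine in the short rate, with slope e^{-lam x}.
slope = (vasicek_forward(2.0, 0.07, RBAR, LAM, SIGMA)
         - vasicek_forward(2.0, 0.05, RBAR, LAM, SIGMA)) / 0.02
print(slope, math.exp(-LAM * 2.0))
print(vasicek_bond(5.0, 0.05, RBAR, LAM, SIGMA))
```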
The insight of Heath, Jarrow, and Morton in 1992 was that we can change perspectives
by modelling the bond prices directly.
Motivation. Indeed, suppose we start out with just the bond market, but without the
bank account. We can construct the bank account by considering an investor holding his
wealth in just-maturing bonds. More concretely, suppose at time 0 the investor has B0 units
of wealth. Fix a sequence 0 ≤ t0 < t1 < . . . of times and suppose that during the interval
(ti−1 , ti ] the investor holds all of his wealth in the bond which matures at time ti . If the
investor’s wealth at time t is denoted by Bt , and the number of shares of the just-maturing
bond by πt , the budget constraint is
\[ B_{t_{i-1}} = \pi_{t_i} P(t_{i-1}, t_i) \]
and the self-financing condition is
\[ B_{t_i} = \pi_{t_i} \]
since $P(t, t) = 1$ for all $t$. Hence, the rate of change of the wealth is given by
\[ \frac{B_{t_i} - B_{t_{i-1}}}{t_i - t_{i-1}} = \frac{B_{t_{i-1}}}{P(t_{i-1}, t_i)} \cdot \frac{1 - P(t_{i-1}, t_i)}{t_i - t_{i-1}}. \]
By taking the limit as $t_i - t_{i-1} \to 0$, we can define the spot rate by
\[ r_t = -\frac{\partial}{\partial T} P(t, T)\Big|_{T=t} \]
so that dBt = Bt rt dt as before.
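The rollover strategy is easy to mimic in code. The following sketch uses a hypothetical deterministic bond curve (an assumption made so that the answer is known in closed form), applies the budget constraint and self-financing condition on a grid, and recovers the spot rate by a finite difference in the maturity.

```python
import math

def rollover_wealth(bond_price, b0, times):
    """Wealth from holding, on each (t_{i-1}, t_i], only the bond maturing at t_i."""
    wealth = b0
    for t_prev, t_next in zip(times, times[1:]):
        units = wealth / bond_price(t_prev, t_next)  # budget constraint
        wealth = units * 1.0                          # self-financing, P(t_i, t_i) = 1
    return wealth

def spot_rate(bond_price, t, h=1e-6):
    """Finite-difference version of r_t = -dP(t, T)/dT at T = t."""
    return (1.0 - bond_price(t, t + h)) / h

# Hypothetical deterministic curve with short rate r(t) = 0.02 + 0.01 t.
def bond_price(t, T):
    return math.exp(-(0.02 * (T - t) + 0.005 * (T ** 2 - t ** 2)))

HORIZON, STEPS = 2.0, 8
grid = [HORIZON * i / STEPS for i in range(STEPS + 1)]
print(rollover_wealth(bond_price, 1.0, grid))
print(math.exp(0.02 * HORIZON + 0.005 * HORIZON ** 2))  # exp(int_0^T r_s ds)
print(spot_rate(bond_price, 1.0))
```

With a deterministic curve the product of the one-period returns telescopes, so the rollover wealth equals the bank account value on any grid. With random bond prices only the limit of a vanishing mesh gives the bank account.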
The usual formulation of the HJM idea is in terms of the forward rates. As usual,
we put ourselves in the context of a probability space (Ω, F, Q) on which we can define a
d-dimensional Brownian motion (Ŵt )t≥0 .
Theorem. Suppose for each $T$, the forward rate process $(f(t, T))_{t\in[0,T]}$ has dynamics
\[ df(t, T) = a(t, T)\,dt + \sum_{i=1}^{d} \sigma^{(i)}(t, T)\, d\hat W^{(i)}_t \]
for some suitably regular adapted processes (a(t, T ))t∈[0,T ] and (σ (i) (t, T ))t∈[0,T ] . Let the short
rate be given by rt = f (t, t) and the bank account dynamics by
dBt = Bt rt dt.
Finally, let the bond prices be given by
\[ P(t, T) = e^{-\int_t^T f(t,s)\,ds}. \]
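If
\[ a(t, T) = \sum_{i=1}^{d} \sigma^{(i)}(t, T) \int_t^T \sigma^{(i)}(t, s)\,ds \]
for all $0 \le t \le T$, then for each $T > 0$ the discounted bond price process $(P(t, T)/B_t)_{t\in[0,T]}$
is a local martingale under $Q$.
The displayed condition is usually called the HJM drift condition. Notice that this drift/volatility constraint is not
present in the factor models from the previous sections.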
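The drift condition is mechanical enough to compute numerically. The sketch below evaluates the right-hand side for a list of deterministic volatility functions using a midpoint rule. The function names and parameter values are hypothetical, and the two volatility choices (constant, and exponentially decaying in the time to maturity) are used only as test cases with known closed forms.

```python
import math

def hjm_drift(vols, t, T, n=2000):
    """a(t, T) = sum_i vol_i(t, T) * int_t^T vol_i(t, s) ds, by the midpoint rule."""
    h = (T - t) / n
    drift = 0.0
    for vol in vols:
        integral = sum(vol(t, t + (k + 0.5) * h) for k in range(n)) * h
        drift += vol(t, T) * integral
    return drift

# Hypothetical parameters.
SIGMA0, LAM = 0.01, 0.3

def constant_vol(t, T):
    return SIGMA0

def decaying_vol(t, T):
    return SIGMA0 * math.exp(-LAM * (T - t))

print(hjm_drift([constant_vol], 1.0, 4.0))   # compare with SIGMA0**2 * (T - t)
print(hjm_drift([decaying_vol], 1.0, 4.0))
print(hjm_drift([constant_vol, decaying_vol], 1.0, 4.0))  # a two-factor example
```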
The difference with the short rate models is that we are now trying to model the dynamics
of the whole term structure. Indeed, in the HJM framework, we can initialize the model with
any initial forward rate curve $T \mapsto f(0, T)$. Nevertheless, note that any of the short rate or
factor models can be put into the HJM framework, just by choosing the initial forward rate
curve to match the one predicted by the model.
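Proof. We must show that for each $T > 0$, the discounted bond price process $e^{-\int_0^t r_s\,ds - \int_t^T f(t,s)\,ds}$
is a local martingale. Now applying some formal manipulations (we assume enough regularity
that we can appeal to a stochastic Fubini theorem)
\[
\begin{aligned}
d\Big( \int_0^t r_s\,ds + \int_t^T f(t, s)\,ds \Big) &= (r_t - f(t, t))\,dt + \int_t^T df(t, s)\,ds \\
&= \frac{1}{2} \Big| \int_t^T \sigma(t, s)\,ds \Big|^2 dt + \int_t^T \sigma(t, s)\,ds \cdot d\hat W_t,
\end{aligned}
\]
where $\sigma = (\sigma^{(1)}, \ldots, \sigma^{(d)})$. The first term vanishes since $r_t = f(t, t)$, and the second line uses the
drift condition in the integrated form $\int_t^T a(t, s)\,ds = \frac{1}{2} | \int_t^T \sigma(t, s)\,ds |^2$. Writing $X_t$ for the
process on the left-hand side, Itô's formula gives
\[ d(e^{-X_t}) = e^{-X_t}\Big( -dX_t + \frac{1}{2}\,d\langle X \rangle_t \Big) = -e^{-X_t} \int_t^T \sigma(t, s)\,ds \cdot d\hat W_t, \]
which has no $dt$ term, so $e^{-X_t}$ is a local martingale.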
5.1. Ho–Lee. (1986) This is the simplest possible HJM model. Let $d = 1$
and $\sigma(t, T) = \sigma_0$ be constant. Then
\[ df(t, T) = \sigma_0^2 (T - t)\,dt + \sigma_0\,d\hat W_t \]
or
\[ f(t, T) = f(0, T) + \sigma_0^2 (Tt - t^2/2) + \sigma_0 \hat W_t. \]
Here is an unusual feature of this model: if the initial forward rate curve $T \mapsto f(0, T)$ is
bounded from below, then for positive times $t$ the forward rates $f(t, T) \to \infty$ as $T \to \infty$.
The short rate is then given by
\[ r_t = f(0, t) + \sigma_0^2 t^2/2 + \sigma_0 \hat W_t. \]
Hence the Ho–Lee model corresponds to the following short rate model, writing $f_0(t) = f(0, t)$:
\[ dr_t = (f_0'(t) + \sigma_0^2 t)\,dt + \sigma_0\,d\hat W_t. \]
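One way to check these formulas is to verify that the model reprices the initial bonds. In the Ho–Lee model the integrated short rate is Gaussian, and the sketch below uses the standard fact (not stated in the notes) that the time integral of a Brownian motion over [0, T] has variance T^3/3. The function names, the flat initial curve and the parameter values are hypothetical.

```python
import math

def ho_lee_forward(t, T, w_t, f0, sigma0):
    """f(t, T) = f(0, T) + sigma0^2 (T t - t^2 / 2) + sigma0 * W_t."""
    return f0(T) + sigma0 ** 2 * (T * t - 0.5 * t ** 2) + sigma0 * w_t

def ho_lee_price_check(T, f0, sigma0, n=2000):
    """Return (E^Q[exp(-int_0^T r_t dt)], exp(-int_0^T f(0, t) dt))."""
    h = T / n
    initial = sum(f0((k + 0.5) * h) for k in range(n)) * h  # int_0^T f(0, t) dt
    mean = initial + sigma0 ** 2 * T ** 3 / 6.0              # mean of int_0^T r_t dt
    var = sigma0 ** 2 * T ** 3 / 3.0                         # variance of sigma0 * int_0^T W_t dt
    # Gaussian moment generating function: E[e^{-X}] = exp(-mean + var / 2).
    return math.exp(-mean + 0.5 * var), math.exp(-initial)

def flat_curve(T):
    """Hypothetical flat initial forward curve at 3%."""
    return 0.03

print(ho_lee_price_check(5.0, flat_curve, 0.01))
print(ho_lee_forward(2.0, 2.0, 0.5, flat_curve, 0.01))  # the short rate r_2 when W_2 = 0.5
```

The convexity term sigma0^2 t^2 / 2 in the short rate is exactly what offsets the variance of the integrated Brownian motion, so the two prices agree.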
5.2. Vasicek–Hull–White. (1990) Again let $d = 1$ but now $\sigma(t, T) = \sigma_0 e^{-\lambda(T-t)}$ for
positive constants $\sigma_0$ and $\lambda$. Then
\[ df(t, T) = \frac{\sigma_0^2}{\lambda} e^{-\lambda(T-t)}(1 - e^{-\lambda(T-t)})\,dt + \sigma_0 e^{-\lambda(T-t)}\,d\hat W_t. \]
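The short rates are given by
\[
\begin{aligned}
r_t &= f(0, t) + \int_0^t \frac{\sigma_0^2}{\lambda} e^{-\lambda(t-s)}(1 - e^{-\lambda(t-s)})\,ds + \sigma_0 \int_0^t e^{-\lambda(t-s)}\,d\hat W_s \\
&= f(0, t) + \frac{\sigma_0^2}{2\lambda^2}(1 - e^{-\lambda t})^2 + \sigma_0 \int_0^t e^{-\lambda(t-s)}\,d\hat W_s.
\end{aligned}
\]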
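The short rate dynamics are given by
\[
\begin{aligned}
dr_t &= \Big( f_0'(t) + \frac{\sigma_0^2}{\lambda} e^{-\lambda t}(1 - e^{-\lambda t}) \Big) dt + \sigma_0\,d\hat W_t - \lambda \Big( \sigma_0 \int_0^t e^{-\lambda(t-s)}\,d\hat W_s \Big) dt \\
&= \lambda \Big( \frac{f_0'(t)}{\lambda} + f_0(t) + \frac{\sigma_0^2}{2\lambda^2}(1 - e^{-2\lambda t}) - r_t \Big) dt + \sigma_0\,d\hat W_t,
\end{aligned}
\]
where $f_0(t) = f(0, t)$.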
Hence, the Hull–White extension of the Vasicek model essentially replaces the mean interest rate $\bar r$
with a time-varying, but non-random, mean rate $\bar r(t) = f_0'(t)/\lambda + f_0(t) + \frac{\sigma_0^2}{2\lambda^2}(1 - e^{-2\lambda t})$.
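As a consistency check, the time-varying mean level read off from the drift above should collapse to a constant when the initial forward curve is the one produced by the Vasicek model itself. The sketch below tests this numerically. The function names, the central-difference derivative and the parameter values are illustrative assumptions.

```python
import math

def hull_white_mean_level(t, f0, lam, sigma0, h=1e-5):
    """Mean level f0'(t)/lam + f0(t) + sigma0^2 (1 - e^{-2 lam t}) / (2 lam^2)."""
    slope = (f0(t + h) - f0(t - h)) / (2.0 * h)  # central difference for f0'(t)
    convexity = sigma0 ** 2 * (1.0 - math.exp(-2.0 * lam * t)) / (2.0 * lam ** 2)
    return slope / lam + f0(t) + convexity

def vasicek_initial_curve(r0, rbar, lam, sigma):
    """Return the Vasicek forward curve T -> f(0, T) as a function."""
    def f0(T):
        g = 1.0 - math.exp(-lam * T)
        return r0 * (1.0 - g) + rbar * g - sigma ** 2 / (2.0 * lam ** 2) * g ** 2
    return f0

# Hypothetical parameters.
R0, RBAR, LAM, SIGMA = 0.01, 0.03, 0.5, 0.02
curve = vasicek_initial_curve(R0, RBAR, LAM, SIGMA)
for t in (0.5, 1.0, 5.0):
    print(t, hull_white_mean_level(t, curve, LAM, SIGMA))  # each close to RBAR
```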
5.3. Kennedy. (1994) Note that for the HJM models discussed above, the forward rates
are given by
\[ f(t, T) = f(0, T) + \int_0^t \sigma(u, T) \cdot \int_u^T \sigma(u, s)\,ds\,du + \int_0^t \sigma(u, T) \cdot d\hat W_u. \]
If $\sigma$ is not random, then the distribution of $f(t, T)$ under the risk-neutral measure $Q$ is
Gaussian with mean
\[ E^{Q}[f(t, T)] = f_0(T) + \int_0^t \sigma(u, T) \cdot \int_u^T \sigma(u, s)\,ds\,du \]
and covariance
\[ \mathrm{Cov}^{Q}[f(s, S), f(t, T)] = \int_0^{s \wedge t} \sigma(u, S) \cdot \sigma(u, T)\,du. \]
Kennedy reversed this logic, and considered a Gaussian random field $\{f(t, T) : 0 \le t \le T\}$
with mean $\mu(t, T)$ and covariance $C(s, t; S, T)$. Suppose that covariance has the special
form
\[ C(s, t; S, T) = c_{s \wedge t}(S, T) \]
so that, for each fixed $T > 0$, the increments of $(f(t, T))_{t\in[0,T]}$ are independent. Then the
discounted bond prices are local martingales (actually true martingales since everything is
Gaussian and we can compute the conditional expectations by hand) when the mean is given
by
\[ \mu(t, T) = f(0, T) + \int_0^T c_{t \wedge s}(s, T)\,ds. \]
An advantage of this formulation of the Gaussian HJM model is that one is no longer
restricted to finite dimensional Brownian motions, and, therefore, there is much more flexibility
to specify the correlation of the increments. For instance, one choice is to have the
correlation of the increments decay exponentially in the difference of the maturities:
\[ \mathrm{corr}(df(t, t + x), df(t, t + y)) = e^{-\beta|x-y|}. \]
Since the operator on $L^2(\mathbb{R}_+)$ with kernel $e^{-\beta|x-y|}$ is not of finite rank, the above correlation
could not be realised by a finite rank HJM model. However, since the operator is positive
definite, it can be the correlation of a Gaussian random field. Actually, this model can
be realised as an HJM model driven by an infinite dimensional Brownian motion. See the
book Interest Rate Models: an Infinite Dimensional Stochastic Analysis Perspective by René
Carmona and me for details.
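A finite-dimensional shadow of this statement can be seen numerically. The sketch below builds the matrix with entries exp(-beta |x - y|) on a hypothetical grid of times to maturity and runs a plain Cholesky factorisation, which succeeds only for positive definite matrices. The grid, the value of beta and the function names are assumptions. Full rank on every finite grid is consistent with, but of course does not prove, the claim about the operator.

```python
import math

def exp_kernel_matrix(maturities, beta):
    """Matrix of correlations exp(-beta |x - y|) on a grid of times to maturity."""
    return [[math.exp(-beta * abs(x - y)) for y in maturities] for x in maturities]

def cholesky(a):
    """Lower-triangular L with L L^T = a; raises ValueError if a is not positive definite."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return lower

# Hypothetical grid and decay parameter.
GRID = [0.5 * i for i in range(10)]
BETA = 0.5
kernel = exp_kernel_matrix(GRID, BETA)
factor = cholesky(kernel)
print([round(factor[i][i], 6) for i in range(len(GRID))])  # strictly positive pivots
```

Each strictly positive pivot corresponds to one more independent Brownian driver needed to reproduce the correlation on the grid, so refining the grid keeps increasing the number of factors.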
CHAPTER 7
These notes are a list of many of the definitions and results of probability theory needed
to follow the Advanced Financial Models course. Since they are free from any motivating
exposition or examples, and since no proofs are given for any of the theorems, these notes
should be used only as a reference. A table of notation is in the appendix.
1. Measures
Definition. Let Ω be a set. A sigma-field on Ω is a non-empty set F of subsets of Ω
such that
(1) if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$,
(2) if $A_1, A_2, \ldots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
The terms sigma-field and sigma-algebra are interchangeable.
The Borel sigma-field B on R is the smallest sigma-field containing every open interval.
More generally, if Ω is a topological space, for instance Rn , the Borel sigma-field on Ω is the
smallest sigma-field containing every open set.
Definition. Let Ω be a set and let F be a sigma-field on Ω. A measure µ on the
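measurable space $(\Omega, \mathcal{F})$ is a function $\mu : \mathcal{F} \to [0, \infty]$ such that
(1) $\mu(\emptyset) = 0$,
(2) if $A_1, A_2, \ldots \in \mathcal{F}$ are disjoint then $\mu\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} \mu(A_i)$.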
2. Random variables
Definition. Let (Ω, F, P) be a probability space. A random variable is a function
X : Ω → R such that the set {ω ∈ Ω : X(ω) ≤ t} is an element of F for all t ∈ R.
Let A be a subset of R, and let X be a random variable. We use the notation {X ∈ A}
to denote the set {ω ∈ Ω : X(ω) ∈ A}. For instance, the event {X ≤ t} denotes {ω ∈ Ω :
X(ω) ≤ t}.
The distribution function of X is the function FX : R → [0, 1] defined by
FX (t) = P(X ≤ t)
for all t ∈ R.
We also use the term random variable to refer to measurable functions X from Ω to more
general spaces. In particular, we call a function X : Ω → Rn a random variable or random
vector if X(ω) = (X1 (ω), . . . , Xn (ω)) and Xi is a random variable for each i ∈ {1, . . . , n}.
Definition. Let A be an event in Ω. The indicator function of the event A is the
random variable 1A : Ω → {0, 1} defined by
\[ 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \in A^c \end{cases} \]
for all ω ∈ Ω.
4. Special distributions
Definition. Let X be a discrete random variable taking values in Z+ with mass function
pX .
The random variable X is called
• Bernoulli with parameter p if
\[ p_X(0) = 1 - p \quad\text{and}\quad p_X(1) = p \]
where 0 < p < 1. Then E(X) = p and Var(X) = p(1 − p).
• binomial with parameters n and p, written X ∼ bin(n, p), if
\[ p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k} \quad\text{for all } k \in \{0, 1, \ldots, n\} \]
where n ∈ N and 0 < p < 1. Then E(X) = np and Var(X) = np(1 − p).
• Poisson with parameter λ if
\[ p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda} \quad\text{for all } k = 0, 1, 2, \ldots \]
where λ > 0. Then E(X) = λ.
• geometric with parameter p if
\[ p_X(k) = p(1 - p)^{k-1} \quad\text{for all } k = 1, 2, 3, \ldots \]
where 0 < p < 1. Then E(X) = 1/p.
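The stated means and variances can be checked directly from the mass functions. The sketch below sums over the support, truncating the infinite supports at a point where the remaining mass is negligible. The function names, truncation points and parameter values are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

def geometric_pmf(k, p):
    return p * (1.0 - p) ** (k - 1)

def mean_and_var(pmf, support):
    """Mean and variance of a mass function over a finite (or truncated) support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var

print(mean_and_var(lambda k: binomial_pmf(k, 10, 0.3), range(11)))    # n p and n p (1 - p)
print(mean_and_var(lambda k: poisson_pmf(k, 3.0), range(100)))        # truncated support
print(mean_and_var(lambda k: geometric_pmf(k, 0.25), range(1, 2000))) # truncated support
```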
Definition. Let X be a continuous random variable with density function fX .
The random variable X is called
• uniform on the interval (a, b), written X ∼ unif(a, b), if
\[ f_X(t) = \frac{1}{b - a} \quad\text{for all } a < t < b \]
for some a < b. Then E(X) = (a + b)/2.
• normal or Gaussian with mean µ and variance σ², written X ∼ N(µ, σ²), if
\[ f_X(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big( -\frac{(t - \mu)^2}{2\sigma^2} \Big) \quad\text{for all } t \in \mathbb{R} \]
for some µ ∈ R and σ² > 0. Then E(X) = µ and Var(X) = σ².
• exponential with rate λ, if
\[ f_X(t) = \lambda e^{-\lambda t} \quad\text{for all } t \ge 0 \]
for some λ > 0. Then E(X) = 1/λ.
If X is a random vector valued in Rn with density
\[ f_X(x) = (2\pi)^{-n/2} \det(V)^{-1/2} \exp\Big( -\frac{1}{2} (x - \mu) \cdot V^{-1} (x - \mu) \Big) \]
for a positive definite n × n matrix V and vector µ ∈ Rn , then X is said to have the n-
dimensional normal (or Gaussian) distribution with mean µ and variance V , written X ∼
Nn (µ, V ). Then E(Xi ) = µi and Cov(Xi , Xj ) = Vij .
5. Conditional probability and expectation, independence
Definition. Let B be an event with P(B) > 0. The conditional probability of an event
A given B, written P(A|B), is
\[ P(A|B) = \frac{P(A \cap B)}{P(B)}. \]
The conditional expectation of X given B, written E(X|B), is
\[ E(X|B) = \frac{E(X 1_B)}{P(B)}. \]
Theorem (The law of total probability). Let $B_1, B_2, \ldots$ be disjoint, non-null events such
that $\bigcup_{i=1}^{\infty} B_i = \Omega$. Then
\[ P(A) = \sum_{i=1}^{\infty} P(A|B_i) P(B_i) \]
for all events A.
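For a concrete instance of these definitions, the sketch below works on a hypothetical fair six-sided die, with events represented as sets of outcomes and exact arithmetic via fractions. The event A, the partition and the function names are illustrative choices.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # outcomes of a fair die (hypothetical example)

def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), for P(B) > 0."""
    return prob(a & b) / prob(b)

def cond_expectation(x, b):
    """E(X | B) = E(X 1_B) / P(B) for a function x on OMEGA."""
    return sum(Fraction(x(w), len(OMEGA)) for w in b) / prob(b)

A = {2, 4, 6}
PARTITION = [{1, 2}, {3, 4}, {5, 6}]  # disjoint, non-null, union is OMEGA

total = sum(cond_prob(A, b) * prob(b) for b in PARTITION)
print(total, prob(A))                            # both sides of the law of total probability
print(cond_expectation(lambda w: w, {5, 6}))     # expected face value given a 5 or a 6
```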
Definition. Let $A_1, A_2, \ldots$ be events. If
\[ P\Big( \bigcap_{i \in I} A_i \Big) = \prod_{i \in I} P(A_i) \]
for every finite subset $I \subset \mathbb{N}$ then the events are said to be independent.
Random variables $X_1, X_2, \ldots$ are called independent if the events $\{X_1 \le t_1\}, \{X_2 \le t_2\}, \ldots$
are independent for every choice of $t_1, t_2, \ldots \in \mathbb{R}$. The phrase ‘independent and identically
distributed’ is often abbreviated i.i.d.
Theorem. If X and Y are independent and integrable, then
E(XY ) = E(X)E(Y ).
6. Probability inequalities
Theorem (Markov’s inequality). Let X be a positive random variable. Then
\[ P(X \ge \varepsilon) \le \frac{E(X)}{\varepsilon} \]
for all $\varepsilon > 0$.
Corollary (Chebychev’s inequality). Let X be a random variable with E(X) = µ and
Var(X) = σ². Then
\[ P(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2} \]
for all $\varepsilon > 0$.
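Both inequalities can be checked against exact probabilities for a simple distribution. The sketch below uses a binomial random variable with hypothetical parameters and compares the exact tail probabilities with the two bounds. The names and the range of thresholds are illustrative.

```python
import math

# Hypothetical example: X ~ bin(20, 0.4).
N, P = 20, 0.4
MEAN, VAR = N * P, N * P * (1.0 - P)

def pmf(k):
    return math.comb(N, k) * P ** k * (1.0 - P) ** (N - k)

def upper_tail(eps):
    """Exact P(X >= eps)."""
    return sum(pmf(k) for k in range(N + 1) if k >= eps)

def deviation_prob(eps):
    """Exact P(|X - mean| >= eps)."""
    return sum(pmf(k) for k in range(N + 1) if abs(k - MEAN) >= eps)

for eps in (2, 4, 8):
    print(eps, upper_tail(eps), MEAN / eps, deviation_prob(eps), VAR / eps ** 2)
```

The bounds are often far from sharp. They are useful because they hold with no assumption beyond the existence of the mean or variance.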
7. Characteristic functions
Definition. The characteristic function of a real-valued random variable X is the function
$\varphi_X : \mathbb{R} \to \mathbb{C}$ defined by
\[ \varphi_X(t) = E(e^{itX}) \]
for all $t \in \mathbb{R}$, where $i = \sqrt{-1}$. More generally, if X is a random vector valued in $\mathbb{R}^n$ then
$\varphi_X : \mathbb{R}^n \to \mathbb{C}$ defined by
\[ \varphi_X(t) = E(e^{it \cdot X}) \]
is the characteristic function of X.
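For a discrete random variable the expectation in this definition is a finite or countable sum, so it can be evaluated directly. The sketch below does this for a binomial random variable and compares the result with the closed form (1 - p + p e^{it})^n, a standard fact not stated in the notes, which follows from writing the binomial as a sum of n independent Bernoulli variables. The function names and parameter values are hypothetical.

```python
import cmath
import math

def char_fn(pmf, support, t):
    """E(e^{itX}) for a discrete random variable with the given mass function."""
    return sum(cmath.exp(1j * t * k) * pmf(k) for k in support)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def binomial_char_fn_closed(t, n, p):
    """Closed form (1 - p + p e^{it})^n, assumed here as a known result."""
    return (1.0 - p + p * cmath.exp(1j * t)) ** n

# Hypothetical parameters.
N, P = 12, 0.3
for t in (0.0, 0.7, 2.5):
    direct = char_fn(lambda k: binomial_pmf(k, N, P), range(N + 1), t)
    print(t, direct, binomial_char_fn_closed(t, N, P))
```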
Theorem (Uniqueness of characteristic functions). Let X and Y be real-valued random
variables with distribution functions $F_X$ and $F_Y$. Let $\varphi_X$ and $\varphi_Y$ be the characteristic
functions of X and Y. Then
φX (t) = φY (t) for all t ∈ R
if and only if
FX (t) = FY (t) for all t ∈ R.
Furthermore, if $r \ge p \ge 1$ then $X_n \to X$ in $L^r$ $\Rightarrow$ $X_n \to X$ in $L^p$.
Definition. Let $A_1, A_2, \ldots$ be events. The term eventually is defined by
\[ \{A_n \text{ eventually}\} = \bigcup_{N \in \mathbb{N}} \bigcap_{n \ge N} A_n \]
R                     the set of real numbers
R+                    the set of non-negative real numbers [0, ∞)
N                     the set of natural numbers {1, 2, . . .}
C                     the set of complex numbers
Z                     the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Z+                    the set of non-negative integers {0, 1, 2, . . .}
A^c                   the complement of a set A, A^c = {ω ∈ Ω : ω ∉ A}
a ∧ b                 min{a, b}
a ∨ b                 max{a, b}
a^+                   max{a, 0}
lim sup_{n↑∞} x_n     the limit superior of the sequence x_1, x_2, . . .
lim inf_{n↑∞} x_n     the limit inferior of the sequence x_1, x_2, . . .
a · b                 Euclidean inner (or dot) product in R^n, a · b = \sum_{i=1}^n a_i b_i
|a|                   Euclidean norm in R^n, |a| = (a · a)^{1/2}