Advanced Financial Models Guide

This document discusses advanced financial models and provides an overview of the topics that will be covered in the notes, including arbitrage theory, pricing contingent claims, Brownian motion, stochastic calculus, interest rate models, and probability theory. It also acknowledges simplifying assumptions that are made for theoretical models, such as no dividends, transaction costs, or short selling constraints.

Advanced Financial Models

Michael R. Tehranchi
Contents

1. Standing assumptions: complications we ignore
2. Prerequisite knowledge

Chapter 1. Arbitrage theory for discrete-time models
1. The set-up
2. Investment and consumption
3. A motivating utility maximisation problem and arbitrage
4. Arbitrage and the first fundamental theorem
5. Proof of the harder direction of 1FTAP
6. Numéraires and equivalent martingale measures

Chapter 2. Pricing and hedging contingent claims in discrete-time models
1. European claims and the second fundamental theorem of asset pricing
2. Super-replication of American claims

Chapter 3. Brownian motion and stochastic calculus
1. Brownian motion
2. Itô stochastic integration
3. Itô’s formula
4. Girsanov’s theorem
5. A martingale representation theorem

Chapter 4. Arbitrage theory for continuous-time models
1. The set-up
2. Admissible strategies
3. Arbitrage and local martingale deflators
4. The structure of local martingale deflators

Chapter 5. Hedging contingent claims in continuous time models
1. Replication and super-replication
2. The Black–Scholes model and formula
3. Markovian markets and the Black–Scholes PDE
4. Black–Scholes volatility
5. Local volatility models
6. Computing marginal laws
7. American claims in local volatility models

Chapter 6. Interest rate models
1. Bond prices and interest rates
2. Bank accounts to bond prices and interest rates
3. Short rate models
4. Markovian short rate models
5. The Heath–Jarrow–Morton framework

Chapter 7. Crashcourse on probability theory
1. Measures
2. Random variables
3. Expectations and variances
4. Special distributions
5. Conditional probability and expectation, independence
6. Probability inequalities
7. Characteristic functions
8. Fundamental probability results

Index

Financial mathematics as a subject is young (as compared to, say, number theory), but
it is mature enough now that there has emerged some consensus on the notation, vocabulary
and important results. These notes are an attempt to present many of the main ingredients
of this theory, mainly concerning the pricing and hedging of derivative securities.
But before launching into the story, we will begin by acknowledging some of the real-world
complications that will not be discussed at length hereafter.

1. Standing assumptions: complications we ignore


Unfortunately, actual financial markets are very complicated. Of course, in order to
develop a systematic financial theory, it is prudent to concentrate on the essential features
of these markets and ignore the less essential complications. Therefore, the theory that will
be presented in these notes is concerned with the analysis of market models that have plenty
of simplifying assumptions.
That is not to say that these complications are not important. Indeed, there is active
ongoing research attempting to remove these simplifying assumptions from the canonical
theory. Below is a list of these assumptions.

1.1. Dividends. The total stock of a publicly traded firm is divided into a fixed number
N of shares. The owner of each share is then entitled to the fraction 1/N of the total profit
of the firm.1 A portion of the firm’s profit is usually reinvested by management, for instance
by building new factories, but the rest of the profit is paid out to the shareholders. In
particular, the owner of each share of stock will receive periodically a dividend payment.
However, in this course,
we will assume that there are no dividend payments.
Actually, this assumption is not as terrible as it sounds. Example sheet 1 will show how to
adapt the theory developed for assets that pay no dividends to incorporate assets that have
non-zero dividend payments.

1.2. Tick size. Financial markets usually have a smallest increment of price, the tick.
(The tick refers back to the days when prices were quoted on ticker tape.) Indeed, the tick
size can vary from market to market, and even for assets traded in the same market. There
seems to be an industry-wide effort to harmonise tick sizes, but a quick Google search found
this document
http://cdn.batstrading.com/resources/participant resources/BATSEuro Ticks.pdf
which highlights the complexity of the system in Europe.
However, in this course,
we will assume that the tick size is zero.
This is a convenient assumption for those who prefer continuous mathematics to discrete. It
is usually a harmless assumption, unless the prices of interest are very close to zero.
1Actually, things are even more complicated. For instance, stocks can be classified as either common or
preferred, with implications on dividends, voting rights and claims on the firm’s assets in case of bankruptcy.
Also, the number N of shares outstanding is not necessarily fixed.
1.3. Transactions costs. Financial transactions are processed by a string of middle
men, each of whom charge a fee for their services. Usually the fee is nearly proportional to
the size of the transaction.
However, in this course,
we will assume that there are no transactions costs.
This assumption is justified by the fact that transactions costs are often very small relative
to the size of typical transactions. But one must always remember that in some applications,
it might not be wise to neglect these costs.

1.4. Short-selling constraints. In the real world, it is actually possible for someone
to sell an asset that he does not own. The essential mechanism is to borrow a share of that
asset from a broker, and then immediately to sell it to the market. This procedure is called
short selling.
Brokers, however, place constraints on this behaviour. Indeed, they usually require
collateral and charge a fee for their service. Furthermore, if the market price of the asset increases,
or if the price of the collateral decreases, the broker may ask the short seller to put up even
more collateral.
However, in this course,
we will assume that there are no short-selling constraints.
Indeed, the theory of discrete-time trading is cleaner without additional assumptions on the
sizes of trades. But we will see that to overcome some technical problems in the theory of
continuous-time trading, it will be natural to restrict trading to what are called admissible
strategies.

1.5. Divisibility of assets. There is another real-world trading constraint of a rather
technical nature. The smallest unit of stock is the share. A share cannot be further divided
– it is generally impossible to buy half a share of a particular stock.
However, in this course,
we will assume that assets are infinitely divisible.

1.6. Bid-ask spread. Real-world trading is asymmetrical since the price to buy a share
is usually higher than the price to sell it. The reason is that there are two different ways to buy
or sell an asset listed on an exchange: the limit order and the market order.
A limit buy order is an offer to buy a certain number of shares of the asset at a certain
price. A limit sell order is defined similarly. The collection of unfilled limit orders is called
the limit order book.
At any time, there is the highest price for which there is an order to buy the asset.
This is called the bid price. The lowest price for which there is an order to sell is called
the ask price. The bid/ask spread is the difference. Figure 1 illustrates the evolution of a
hypothetical limit order book as various orders arrive and are filled.
A market order is an instruction to execute a transaction at the best available price.
In particular, if the market order is to buy, then the lowest limit sell order is filled first.
Therefore, for small market buy orders, the per share price paid is the ask price. Similarly,
if a market sell order arrives, then the highest limit buy order is filled first, and hence the
per share price received is the bid price.
However, in this course,
we will assume that there are no bid-ask spreads.
This assumption is justified by the observation that in many markets, the spread is very
small. However, in times of crisis, this assumption is not usually applicable, and hence the
theory breaks down dramatically.

Figure 1. Top left. The bid price is £8 and the ask is £11. Top right.
A limit sell order for three shares at £11 arrives. Bottom left. A limit buy
order for two shares at £8 is cancelled. Bottom right. A market order to
buy five shares arrives. Note that four shares are sold at £11 and one at £12.
After the transaction, the ask price is £12.

1.7. Market depth. As described above, there are only a finite number of limit orders
on the book at one time. If a large market buy order arrives, for instance, then the lowest
limit sell order is filled first. But if the market order is bigger than the total shares available
to buy at the ask price, then the limit orders at the next-to-lowest price are filled, and
the process continues up the book until the market order is finally filled. In this way, the ask price
increases.
The market depth is the number of shares available to buy or sell at the ask or bid price
respectively. Equivalently, the depth of a market is a measure of the size of a market order
necessary to move quoted prices.
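The way a market buy order walks up the book can be sketched in a few lines of Python. This is only an illustration: the function name `fill_market_buy` and the share counts in the hypothetical book are assumptions, chosen so that the outcome agrees with the bottom-right panel of Figure 1 (four shares filled at £11, one at £12, new ask price £12).

```python
def fill_market_buy(asks, quantity):
    """Fill a market buy order against resting limit sell orders.

    asks     -- dict mapping price -> number of shares offered at that price
                (a hypothetical snapshot of the sell side of the book)
    quantity -- number of shares the market order wants to buy
    Returns (fills, remaining_book), where fills is a list of (price, shares).
    """
    book = dict(asks)                 # work on a copy, leave the input untouched
    fills = []
    for price in sorted(book):        # the lowest limit sell order is filled first
        if quantity == 0:
            break
        take = min(quantity, book[price])
        fills.append((price, take))
        book[price] -= take
        quantity -= take
    if quantity > 0:
        raise ValueError("market order exceeds the total depth of the book")
    remaining = {p: q for p, q in book.items() if q > 0}
    return fills, remaining


# Assumed book: four shares offered at 11 and three shares offered at 12.
asks = {11: 4, 12: 3}
fills, book_after = fill_market_buy(asks, 5)
best_ask_after = min(book_after)

print(fills)           # [(11, 4), (12, 1)]
print(best_ask_after)  # 12
```

In this sketch the depth at the ask is four shares, so any market buy order for more than four shares moves the quoted ask price.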
However, in this course,
we will assume that there is infinite market depth.
Equivalently, we will assume that investors are small relative to the limit order book, so they
are price takers, not price makers. However, the most recent financial crisis shows that this
assumption does not always approximate reality – just ask the traders at Lehman Brothers!
2. Prerequisite knowledge
The emphasis of this course is on some of the mathematical aspects of financial market
models. Very little is assumed of the reader’s knowledge of the workings of financial markets.
However, some mathematical background is needed.
Our starting point is the famous observation (sometimes attributed to Niels Bohr) that
it is difficult to make predictions, especially about the future. Indeed, anyone with even
a passing acquaintance with finance knows that most of us cannot predict with absolute
certainty how the price of an asset will fluctuate – otherwise we would be much richer!
Therefore, the proper language to formulate the models that we will study is the lan-
guage of probability theory. An attempt is made to keep this course self-contained, but you
should be familiar with the basics of the theory, including knowing the definition and key
properties of the following concepts: random variable, expected value, variance, conditional
probability/expectation, independence, Gaussian (normal) distribution, etc. Familiarity with
measure-theoretic probability is helpful, though a crash course on probability theory is given
in an appendix.

Please send all comments and corrections (including small typos and major blunders) to
me at [email protected].

CHAPTER 1

Arbitrage theory for discrete-time models

1. The set-up
The models we will encounter will be of the form $P = (P_t^1, \ldots, P_t^n)_{t \in T}$, where $P_t^i$ will model
the price of financial asset $i$ (stock, bond, etc.) at time $t \in T$. In this course, the index set
$T$ will be one of two sets
• $\mathbb{Z}_+ = \{0, 1, 2, \ldots\}$ when time is discrete, and
• $\mathbb{R}_+ = [0, \infty)$ when time is continuous.
Usually, the context will be clear and we write ‘$t \ge 0$’ for ‘$t \in T$’.
A modelling assumption that we will use throughout is that

the n-dimensional stochastic process P is adapted to a filtration F.

We now briefly describe what this means.


*****
Recall that a probability space is a triple (Ω, F, P) where
• Ω is a set, called the sample space, whose elements are interpreted as the possible
outcomes of an experiment;
• F is a sigma-field1 on Ω, whose elements are interpreted as measurable events;
• P is a probability measure on (Ω, F), a countably additive function on F such that
P(Ω) = 1.
A random variable X is simply an F-measurable2 function from Ω to R. We say that
Y = (Y1 , . . . , YN ) is an N -dimensional random vector if each Yi is a random variable. A
stochastic process (Zt )t≥0 is just a collection of random variables (or vectors) indexed by the
parameter t, interpreted as time, either discrete or continuous.
We now formalise the concept of information being revealed as time marches forward.
The correct notions are that of a filtration and adaptedness.
Definition. A filtration F = (Ft )t≥0 on the probability space (Ω, F, P) is a collection
of sigma-fields such that Fs ⊆ Ft ⊆ F for all 0 ≤ s ≤ t.
Definition. A process X = (Xt )t≥0 is adapted to F iff the random variable Xt is Ft -
measurable for all t ≥ 0.
To gain some intuition about these definitions, consider this example.

1a non-empty collection of subsets of Ω closed under complementation and countable unions


2that is, the set {ω : X(ω) ≤ x} belongs to F for all real x
Example. Consider the experiment of tossing a coin two times. We can model this
experiment on the sample space Ω = {HH, HT, T H, T T }. The set of all events is the set

F = {∅, {HH}, . . . , {HH, HT }, . . . , {HH, HT, T H}, . . . , {HH, HT, T H, T T }}

of all $2^4 = 16$ subsets of Ω. The probability measure is just the one that assigns
equal probability P({ω}) = 1/4 to each elementary event.
The flow of information is modelled by the following sigma-fields
• F0 = {∅, Ω},
• F1 = {∅, {HH, HT }, {T H, T T }, Ω},
• F2 = F.
Now consider a stochastic process (Xt )t∈{0,1,2} that is adapted to the filtration (Ft )t∈{0,1,2} .
Intuitively, the value of the random variable Xt is known after t tosses of the coin.
For instance, X0 must be a constant,

X0 (ω) = a for all ω ∈ Ω,

since there is no information before the experiment. On the other hand, the random variable
X1 must be of the form
$$X_1(\omega) = \begin{cases} b & \text{if } \omega \in \{HH, HT\} \\ c & \text{if } \omega \in \{TH, TT\} \end{cases}$$
since the only information known at time 1 is whether or not the first coin came up heads.
Finally, X2 can be any function on Ω, that is, of the form
$$X_2(\omega) = \begin{cases} d & \text{if } \omega = HH \\ e & \text{if } \omega = HT \\ f & \text{if } \omega = TH \\ g & \text{if } \omega = TT. \end{cases}$$

Alternatively, on this particular filtered probability space, the adapted process X can be
visualised by the tree diagram, in which every branch is taken with probability 1/2:

    a --1/2--> b --1/2--> d
               b --1/2--> e
    a --1/2--> c --1/2--> f
               c --1/2--> g

Notice that for all t ∈ {0, 1, 2} the event {Xt ≤ x} is in Ft for every real x.
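On a finite sample space, a function is measurable with respect to the sigma-field generated by a partition exactly when it is constant on each block of the partition. Under that observation, adaptedness in the coin-tossing example can be checked mechanically. The following sketch uses hypothetical numerical values for a, b, ..., g; the helper names `is_measurable` and `is_adapted` are ours.

```python
OMEGA = ["HH", "HT", "TH", "TT"]

# Partitions generating F_0, F_1 and F_2 in the two-toss example.
PARTITIONS = {
    0: [{"HH", "HT", "TH", "TT"}],
    1: [{"HH", "HT"}, {"TH", "TT"}],
    2: [{"HH"}, {"HT"}, {"TH"}, {"TT"}],
}


def is_measurable(X, partition):
    """X (a dict omega -> value) is measurable iff it is constant on every block."""
    return all(len({X[w] for w in block}) == 1 for block in partition)


def is_adapted(process):
    """process maps each time t to a random variable (dict omega -> value)."""
    return all(is_measurable(process[t], PARTITIONS[t]) for t in process)


# Hypothetical values: a = 10, b = 12, c = 9, d = 15, e = 11, f = 10, g = 8.
X = {
    0: {w: 10 for w in OMEGA},
    1: {"HH": 12, "HT": 12, "TH": 9, "TT": 9},
    2: {"HH": 15, "HT": 11, "TH": 10, "TT": 8},
}

# A 'clairvoyant' process that already reveals the time-2 value at time 1.
Z = {0: X[0], 1: X[2], 2: X[2]}

print(is_adapted(X))  # True
print(is_adapted(Z))  # False
```

The process Z fails the check because its time-1 value distinguishes HH from HT, which cannot be known after only one toss.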
For this course, it will be convenient to assume that there is no randomness at time 0.
This can be made formal by assuming
the sigma-field F0 is trivial.
This means that if A is an element of F0 then either P(A) = 0 or P(A) = 1. In particular,
every F0 -measurable random variable is almost surely constant. In the discrete-time theory,
there is no loss in further assuming F0 = {∅, Ω}. However, it turns out that this further
assumption is technically inconvenient in the continuous-time theory.

2. Investment and consumption


To the market described by the adapted process P , we now introduce an investor. Suppose
that $H_t^i$ is the number of shares of asset i held during the interval (t − 1, t]. We will
allow $H_t^i$ to be either positive, negative, or zero with the interpretation that if $H_t^i > 0$ the
investor is ‘long’ asset i and if $H_t^i < 0$ the investor is ‘short’ the asset. Also, we do not
demand that the $H_t^i$ are integers.
We also assume that the investor consumes $c_t$ units of money during the interval (t − 1, t].
As is natural, we will insist that $c_t$ is non-negative.
To have an economically meaningful theory, we will make some restrictions on the possible
dynamics of the (n + 1)-dimensional process $H = (H_t^1, \ldots, H_t^n, c_t)_{t>0}$.
Definition. An investment-consumption strategy is an n-dimensional predictable
process H satisfying the self-financing condition (with respect to the market P )
$$H_t \cdot P_t \ge H_{t+1} \cdot P_t.$$
Given the strategy H we define the corresponding consumption process c via
$$c_t = (H_t - H_{t+1}) \cdot P_t.$$
A strategy H is a pure investment strategy if the corresponding consumption
process is identically zero.
(We are using the notation $a \cdot b = \sum_{i=1}^n a^i b^i$ to denote the usual Euclidean inner (or
dot) product in $\mathbb{R}^n$.) Note that the above definition only makes financial sense thanks to
our assumptions of no market frictions, no dividends, etc. We now recall the definition of a
predictable process:
*****
This definition is purely mathematical but is useful for us because it is the right way of
eliminating clairvoyant investors.
Definition. A stochastic process X = (Xt )t≥1 is predictable 3 (with respect to a filtration
F) if Xt is Ft−1 -measurable for all t ≥ 1.
Remark. Note that the time index set for a predictable process (Xt )t≥1 is (usually)
{1, 2, . . .}, not {0, 1, . . .}. Hence X0 is not necessarily defined.
3The term ‘predictable’ is used in the US, while the synonym ‘previsible’ is more common in the UK. I
am American, so I will use ‘predictable’ out of habit. I hope this will not cause too much confusion.
Remark. In discrete time, a process X is predictable if and only if the process Y is
adapted, where Xt = Yt−1 . That is to say, the notion of predictability can be dispensed with
by simply changing notation. However, in continuous time, there is a much deeper difference
between the notions of predictability and adaptedness. Therefore, for the sake of a unified
treatment of the discrete and continuous time cases, we keep it in.
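The self-financing condition of this section is a pathwise statement, so it can be illustrated along a single scenario. The following sketch computes the consumption c_t = (H_t − H_{t+1}) · P_t and checks that it is non-negative. The two-asset market (cash with constant price 1 and a stock), the price path and the holdings are all hypothetical numbers chosen for illustration.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def consumption(H, P):
    """Return [c_0, ..., c_T] with c_t = (H_t - H_{t+1}) . P_t.

    P[t] is the price vector at time t (t = 0, ..., T) and H[t] is the
    portfolio held over (t - 1, t]; H must contain H_0, ..., H_{T+1}.
    """
    return [dot(H[t], P[t]) - dot(H[t + 1], P[t]) for t in range(len(P))]


def is_self_financing(H, P):
    """Check the condition H_t . P_t >= H_{t+1} . P_t at every time t."""
    return all(c >= 0 for c in consumption(H, P))


# One assumed scenario: asset 1 is cash (price 1), asset 2 is a stock.
P = [(1, 100), (1, 110), (1, 99)]
H = [(50, 1), (40, 1), (150, 0), (0, 0)]      # H_0, H_1, H_2, H_3 = 0

c = consumption(H, P)
print(c)                                      # [10, 0, 150]
print(is_self_financing(H, P))                # True

# Raising the cash holding at time 0 without selling anything needs outside money.
H_bad = [(50, 1), (60, 1), (150, 0), (0, 0)]
print(is_self_financing(H_bad, P))            # False
```

In the first strategy the investor consumes 10 at time 0, swaps the stock for cash at time 1 without consuming, and consumes everything at the horizon.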

3. A motivating utility maximisation problem and arbitrage


Now that we have our market model and we’ve introduced an investor into this market,
our first challenge is to find out how to invest optimally. We consider one such optimal
investment problem. Obviously, the following set-up can be generalised in many ways, but
since the main motivation for studying this problem is to introduce the very important notion
of a martingale deflator (also called a state price density), we try to keep it simple.
Let T > 0 be some non-random time horizon, and let
$$U(c) = E\left[\sum_{t=0}^{T} u(c_t)\right],$$

where u is a function on [0, ∞). We will suppose that our investor prefers a consumption
stream $c$ to $c'$ if and only if
$$U(c) > U(c').$$
We will assume that u is strictly increasing, which models the assumption that the investor strictly
prefers more to less. (Usually we also assume that u is strictly concave, so that the investor
is risk-averse, strictly preferring to consume the non-random quantity E(C) to the random
quantity C, for any non-constant random variable C.)
We suppose that the investor’s initial wealth x ≥ 0 is given. We also suppose that he will
live exactly to age T , and since he derives no utility from wealth in the afterlife, chooses a
strategy H such that $H_{T+1} = 0$ a.s. Summing up, the investor faces the problem
$$\text{maximise } U(c) \text{ subject to } H_0 \cdot P_0 = x, \quad (H_t - H_{t+1}) \cdot P_t = c_t, \quad \text{and } H_{T+1} = 0.$$
With this problem in mind, we introduce an important definition:
Definition. An arbitrage is an investment-consumption strategy H such that there
exists a non-random time T > 0 with the properties
• $H_0 = 0 = H_{T+1}$ almost surely and
• $\mathbb{P}(c_t > 0 \text{ for some } 0 \le t \le T) > 0$,
where $c_t = (H_t - H_{t+1}) \cdot P_t$.
Note that if $H^f$ is a feasible investment strategy for the above investment problem and
if $H^a$ is an arbitrage, then $H^f + H^a$ is also feasible but has strictly higher expected utility
$$U(c^f + c^a) > U(c^f).$$
Inductively, the strategy $H^f + kH^a$ is feasible for every k ≥ 0. In particular, if there is an
arbitrage then there cannot be an optimal investment strategy to the utility maximisation
problem.
Remark. There are several problems with market models with arbitrages. First, arbi-
trages are scalable: if H is an arbitrage, so is kH for all k > 0. In particular, if there is an
arbitrage, we can extract an arbitrary amount of consumption out of the market by choosing
k as large as we please. However, we have assumed that the investor is small relative to the
market – remember we have agreed to ignore the investor’s price impact. Clearly,
for large enough k the investor is no longer small relative to the order book, and price impact
becomes important.
However, there is a more fundamental objection to admitting arbitrage. In this course, we
take the market price process (Pt )t≥0 as given. However, in reality, prices are set by market
clearing, so that supply equals demand. Typically we think that the supply of shares is
fixed, but the demand for shares arises from investors solving their own utility maximisation
problem. In particular, in equilibrium, prices are such that investors are holding their optimal
portfolio. But as mentioned above, if there is an arbitrage, optimal portfolios do not exist.
Hence, the notion of equilibrium is inconsistent with the existence of arbitrage strategies.
3.1. Motivation: Lagrangian duality. As usual in a constrained optimisation problem,
we apply the Lagrangian method. Recall that this involves replacing our given objective
function with the so-called Lagrangian which encodes the constraints on the processes H and
c. In this case the Lagrangian is
$$L(H, c, Y) = E\left[\sum_{t=0}^{T} \Big( u(c_t) + Y_t (H_t \cdot P_t - H_{t+1} \cdot P_t - c_t) \Big)\right].$$

To identify the dual problem, we seek to find conditions on the Lagrange multiplier process
Y such that the quantity
sup{L(H, c, Y ) : ct ≥ 0, H predictable }
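is finite. To this end, we employ the standard trick of linear programming: using the
constraints $H_0 \cdot P_0 = x$ and $H_{T+1} = 0$, we rewrite the Lagrangian as
$$L(H, c, Y) = E\left[\sum_{t=0}^{T} \big(u(c_t) - Y_t c_t\big)\right] + E\left[\sum_{t=1}^{T} H_t \cdot (P_t Y_t - P_{t-1} Y_{t-1})\right] + x Y_0.$$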

Now, looking at the first term, we see that if Yt ≤ 0, there would not exist a finite maximum
when we maximise over ct ≥ 0. So we see that the dual variable Y must satisfy
Yt > 0 almost surely for all t ≥ 0
Look at the second term: since Ht is an arbitrary Ft−1 -measurable random vector, the
requirement of a finite maximum leads us to
E(Pt Yt |Ft−1 ) = Pt−1 Yt−1 .
The notation E(X|G) denotes the conditional expectation of the random variable X with
respect to the sigma-field G. The precise definition will be recalled below.
Note that there is nothing rigorous to this argument. The intention of this section is just
to show that the definition of a martingale deflator, which we will present now, follows naturally
from the utility maximisation problem.
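The rearrangement of the Lagrangian above is a summation by parts that holds scenario by scenario, before any expectation is taken. As a sanity check, the following sketch verifies the identity for the constraint term along one randomly generated integer-valued scenario. The horizon, dimension and random values are arbitrary assumptions; the identity itself only uses H_0 · P_0 = x and H_{T+1} = 0.

```python
import random


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
T, n = 5, 3                                    # assumed horizon and number of assets

# One arbitrary scenario with integer values so that the comparison is exact.
P = [[random.randint(1, 20) for _ in range(n)] for _ in range(T + 1)]   # P_0..P_T
Y = [random.randint(1, 9) for _ in range(T + 1)]                        # Y_0..Y_T > 0
H = [[random.randint(-5, 5) for _ in range(n)] for _ in range(T + 1)]   # H_0..H_T
H.append([0] * n)                                                       # H_{T+1} = 0

x = dot(H[0], P[0])                            # the constraint H_0 . P_0 = x

# Left-hand side: sum over t = 0..T of Y_t (H_t . P_t - H_{t+1} . P_t).
lhs = sum(Y[t] * (dot(H[t], P[t]) - dot(H[t + 1], P[t])) for t in range(T + 1))

# Right-hand side: x Y_0 + sum over t = 1..T of H_t . (P_t Y_t - P_{t-1} Y_{t-1}).
rhs = x * Y[0] + sum(
    dot(H[t], [P[t][i] * Y[t] - P[t - 1][i] * Y[t - 1] for i in range(n)])
    for t in range(1, T + 1)
)

print(lhs == rhs)
```

The value of x produced here may be negative, which does not matter for the algebraic identity even though the text assumes x ≥ 0.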
Definition. A martingale deflator is a strictly positive adapted process Y = (Yt )t≥0
such that the n-dimensional random variable Yt Pt is integrable for each t ≥ 0 and such that
E(Yt Pt |Ft−1 ) = Yt−1 Pt−1
for all t ≥ 1.

*****
We briefly recall some notions from probability.
Definition. Given a probability space (Ω, F, P), let G ⊆ F be a sub-sigma-field of
events. A random variable X : Ω → R is measurable with respect to G ( or briefly, G-
measurable) if and only if the event {X ≤ x} is an element of G for all x ∈ R.
You know that the conditional expectation of an integrable random variable X
given a non-null event G is
$$E(X|G) = \frac{E(X 1_G)}{P(G)}.$$
The next theorem leads to a definition of conditional expectation given a sigma-field:
Theorem (Existence and uniqueness of conditional expectations). Let X be an integrable
random variable defined on the probability space (Ω, F, P), and let G ⊆ F be a sub-sigma-field
of F. Then there exists an integrable G-measurable random variable Y such that
$$E(1_G Y) = E(1_G X)$$
for all $G \in \mathcal{G}$. Furthermore, if there exists another $\mathcal{G}$-measurable random variable $Y'$ such
that $E(1_G Y') = E(1_G X)$ for all $G \in \mathcal{G}$, then $Y = Y'$ almost surely.
Definition. Let X be an integrable random variable and let G ⊂ F be a sigma-field.
The conditional expectation of X given G, written E(X|G), is a G-measurable random variable
with the property that
$$E[1_G \, E(X|\mathcal{G})] = E(1_G X)$$
for all $G \in \mathcal{G}$.
Example. (Sigma-field generated by a countable partition) Let X be a non-negative
random variable defined on (Ω, F, P). Let $G_1, G_2, \ldots$ be a sequence of disjoint events with
$P(G_n) > 0$ for all n and $\bigcup_{n \in \mathbb{N}} G_n = \Omega$.
Let $\mathcal{G}$ be the smallest sigma-field containing $\{G_1, G_2, \ldots\}$. That is, every element of
$\mathcal{G}$ is of the form $\bigcup_{n \in I} G_n$ where $I \subseteq \mathbb{N}$. Then
$$E(X|\mathcal{G})(\omega) = E(X|G_n) = \frac{E(X 1_{G_n})}{P(G_n)} \quad \text{if } \omega \in G_n,$$
where the right-hand side denotes conditional expectation given the event $G_n$.
More concretely, suppose Ω = {HH, HT, T H, T T } consists of two tosses of a coin, and
let $\mathcal{G}$ = {∅, {HH, HT }, {T H, T T }, Ω} be the sigma-field containing the information revealed
by the first toss. Suppose the coin is fair, so that each outcome is equally likely. Consider
the random variable
$$X(\omega) = \begin{cases} a & \text{if } \omega = HH \\ b & \text{if } \omega = HT \\ c & \text{if } \omega = TH \\ d & \text{if } \omega = TT. \end{cases}$$
Then
$$E(X|\mathcal{G})(\omega) = \begin{cases} (a + b)/2 & \text{if } \omega \in \{HH, HT\} \\ (c + d)/2 & \text{if } \omega \in \{TH, TT\}. \end{cases}$$
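The partition formula can be turned directly into a small computation. The sketch below evaluates E(X|G) for the two-toss example using exact rational arithmetic and then checks the defining property E(1_G E(X|G)) = E(1_G X) on each block. The numerical values standing in for a, b, c, d and the function name `conditional_expectation` are assumptions made for illustration.

```python
from fractions import Fraction


def conditional_expectation(X, prob, partition):
    """Return E(X | G) as a dict omega -> value, where G is generated by `partition`.

    On each block G_n the value is E(X 1_{G_n}) / P(G_n).
    """
    result = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        average = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            result[w] = average
    return result


prob = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}   # fair coin
X = {"HH": 4, "HT": 2, "TH": 1, "TT": 7}        # hypothetical a, b, c, d
first_toss = [{"HH", "HT"}, {"TH", "TT"}]       # partition generating G

Y = conditional_expectation(X, prob, first_toss)
print(Y["HH"], Y["HT"], Y["TH"], Y["TT"])       # 3 3 4 4

# Defining property: E(1_G Y) = E(1_G X) for each generating event G.
defining_property = all(
    sum(prob[w] * Y[w] for w in block) == sum(prob[w] * X[w] for w in block)
    for block in first_toss
)
print(defining_property)                        # True
```

With these values (a + b)/2 = 3 and (c + d)/2 = 4, in agreement with the formula in the example.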

The important properties of conditional expectations are collected below:


Theorem. Let all random variables appearing below be such that the relevant conditional
expectations are defined, and let G be a sub-sigma-field of the sigma-field F of all events.
• linearity: E(aX + bY |G) = aE(X|G) + bE(Y |G) for all constants a and b
• positivity: If X ≥ 0 almost surely, then E(X|G) ≥ 0 almost surely, with almost sure
equality if and only if X = 0 almost surely.
• Jensen’s inequality: If f is convex, then E[f (X)|G] ≥ f [E(X|G)]
• monotone convergence theorem: If 0 ≤ Xn ↑ X a.s. then E(Xn |G) ↑ E(X|G) a.s.
• Fatou’s lemma: If Xn ≥ 0 a.s. for all n, then E(lim inf n Xn |G) ≤ lim inf n E(Xn |G)
• dominated convergence theorem: If supn |Xn | is integrable and Xn → X a.s. then
E(Xn |G) → E(X|G) a.s.
• If X is independent of G (the events {X ≤ x} and G are independent for each x ∈ R
and G ∈ G) then E(X|G) = E(X). In particular, E(X|G) = E(X) if G is trivial.
• ‘slot property’: If X is G-measurable, then E(XY |G) = XE(Y |G). In particular, if
X is G-measurable, then E(X|G) = X.
• tower property or law of iterated expectations: If H ⊆ G then
E[E(X|G)|H] = E[E(X|H)|G] = E(X|H)
Now we come to one of the most important concepts in financial mathematics, the mar-
tingale. A martingale is simply an adapted stochastic process that is constant on average in
the following sense:
Definition. A martingale relative to a filtration F is an adapted stochastic process
M = (Mt )t≥0 with the following properties:
• E(|Mt |) < ∞ for all t ≥ 0
• E(Mt |Fs ) = Ms for all 0 ≤ s ≤ t.
Remark. The above definition of martingale is the same for both discrete- and continuous-time
processes. However, if the time index set is discrete, T = Z+ , it is an exercise to show
that an integrable process M is a martingale if and only if E(Mt+1 |Ft ) = Mt for all t ≥ 0. That is,
it is sufficient to verify the conditional expectations of the process one period ahead.
Below are some examples of martingales. Before listing them, it is convenient to introduce
a definition:
Definition. Given a stochastic process Y = (Yt )t≥0 , the natural filtration of Y is the
smallest filtration for which Y is adapted. That is, it is the filtration (Ft )t≥0 where
Ft = σ(Ys , 0 ≤ s ≤ t).
In what follows, if a stochastic process is given but a filtration is not explicitly mentioned,
then we are implicitly working with the natural filtration of the process.

Example. Let $\xi_1, \xi_2, \xi_3, \ldots$ be independent integrable random variables such that
$E(\xi_i) = 0$ for all i. The process $(S_t)_{t \ge 0}$ given by $S_0 = 0$ and
St = ξ1 + . . . + ξt
is a martingale relative to its natural filtration. Indeed, the random variable St is integrable
since
E(|St |) ≤ E(|ξ1 |) + . . . + E(|ξt |)
by the triangle inequality and all the terms in this finite sum are finite by assumption.
Also,
E(St+1 |Ft ) = E(St + ξt+1 |Ft )
= E(St |Ft ) + E(ξt+1 |Ft )
= St + E(ξt+1 ) = St ,
where the conditional expectation E(ξt+1 |Ft ) is replaced by the unconditional expectation
E(ξt+1 ) by the assumption that ξt+1 is independent of Ft = σ(S1 , . . . , St ) = σ(ξ1 , . . . , ξt ).
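For a concrete instance of this example, take the steps to be fair ±1 coin flips (an assumption made here for illustration; the example itself allows any independent integrable mean-zero steps). The sketch below enumerates all equally likely paths, groups them by their first t steps, and checks that the conditional mean of S_{t+1} given the history equals S_t.

```python
from fractions import Fraction
from itertools import product


def conditional_mean_next(t):
    """Return {history: E(S_{t+1} | first t steps = history)} for a fair +/-1 walk.

    All paths of length t + 1 are equally likely; paths are grouped by their
    first t steps, which is the information contained in F_t.
    """
    paths = list(product((-1, 1), repeat=t + 1))
    prob = Fraction(1, len(paths))
    numerator, denominator = {}, {}
    for path in paths:
        history = path[:t]
        numerator[history] = numerator.get(history, 0) + prob * sum(path)
        denominator[history] = denominator.get(history, 0) + prob
    return {h: numerator[h] / denominator[h] for h in numerator}


# One-step martingale property: E(S_{t+1} | F_t) = S_t for every history.
is_martingale = all(
    mean == sum(history)
    for t in range(5)
    for history, mean in conditional_mean_next(t).items()
)
print(is_martingale)   # True
```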

Example. We now construct one of the most important examples of a martingale. Let
X be an integrable random variable, and let
Mt = E(X|Ft ).
Then M = (Mt )t≥0 is a martingale.
Integrability follows from the theorem on the existence and uniqueness of conditional
expectation. Indeed, note that by Jensen’s inequality
$$E(|M_t|) = E(|E(X|\mathcal{F}_t)|) \le E(E(|X| \,|\, \mathcal{F}_t)) = E(|X|).$$
Now, for every 0 ≤ s ≤ t we have
E(Mt |Fs ) = E[E(X|Ft )|Fs ]
= E(X|Fs ) = Ms
by the tower property. Notice that this example also works in continuous time.
Sometimes we are given a process (Mt )0≤t≤T where T > 0 is a fixed, non-random time
horizon. To check that this process is a martingale, we need only check that
Mt = E(MT |Ft ) for all 0 ≤ t ≤ T,
because this corresponds to the construction above with X = MT .
Example. This last example shows how to take one martingale and build
another one. Let M be a martingale and let K be a bounded predictable process. Then the
process N defined by
$$N_t = \sum_{s=1}^{t} K_s (M_s - M_{s-1})$$
is a martingale.
Indeed, by assumption, we have E(|Mt |) < ∞ for all t since M is a martingale, and
there exists a constant C > 0 such that |Kt | ≤ C almost surely for all t ≥ 0. Hence
$$E(|N_t|) \le \sum_{s=1}^{t} E(|K_s||M_s - M_{s-1}|) \le \sum_{s=1}^{t} C\,[E(|M_s|) + E(|M_{s-1}|)] < \infty.$$

Using the predictability of K and the slot property of conditional expectation, we have
E(Nt+1 − Nt |Ft ) = E(Kt+1 (Mt+1 − Mt )|Ft )
= Kt+1 E(Mt+1 − Mt |Ft )
=0
and we’re done.
Remark. The martingale N above is often called a martingale transform or a discrete
time stochastic integral. As we will see, it is one of the key building blocks for the continuous
time theory to come.
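A martingale transform can be checked by brute force on a small example. In the sketch below M is a fair ±1 random walk and the stake K_s is a hypothetical bounded predictable rule ("bet 2 after a losing step, otherwise bet 1"); both choices are assumptions made for illustration. The code verifies that the one-step conditional drift of N vanishes for every history and that E(N_4) = 0.

```python
from fractions import Fraction
from itertools import product


def K(history):
    """Stake for the next step; it depends only on steps already observed.

    Hypothetical rule: bet 2 if the previous step was a loss, otherwise bet 1.
    """
    return 2 if history and history[-1] == -1 else 1


def transform(path):
    """N_t = sum_{s=1}^t K_s (M_s - M_{s-1}) along one path of +/-1 increments."""
    return sum(K(path[:s]) * path[s] for s in range(len(path)))


def max_conditional_drift(t):
    """Largest |E(N_{t+1} - N_t | first t steps)| over all histories of length t."""
    half = Fraction(1, 2)
    worst = Fraction(0)
    for history in product((-1, 1), repeat=t):
        drift = sum(
            half * (transform(history + (step,)) - transform(history))
            for step in (-1, 1)
        )
        worst = max(worst, abs(drift))
    return worst


no_drift = all(max_conditional_drift(t) == 0 for t in range(5))
expected_N4 = sum(transform(p) for p in product((-1, 1), repeat=4)) * Fraction(1, 16)

print(no_drift)      # True
print(expected_N4)   # 0
```

The drift vanishes because K is fixed by the history, so it can be taken out of the conditional expectation exactly as in the slot-property step of the proof.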

4. Arbitrage and the first fundamental theorem


Markets with many arbitrage opportunities would be nice: we would all be a lot richer.
But for the sake of building realistic models, we usually assume that markets are free of
arbitrages. Indeed, recall that we have argued that existence of arbitrage opportunities is
not so good from the point of economic theory: for instance, no agent could possibly hold
his optimal portfolio and hence the market would not be in equilibrium.
The first theorem of the course is the mathematical classification of such market models.
We put ourselves in the context of a market model with n-dimensional price process P . We
begin with a definition, motivated by the heuristic analysis of the dual of a typical optimal
investment problem.
Definition. A martingale deflator is an adapted process Y such that Yt > 0 for all t ≥ 0
almost surely, and such that the n-dimensional process P Y = (Pt Yt )t≥0 is a martingale.
Remark. A martingale deflator is also known as a state price density, a stochastic dis-
count factor or a pricing kernel.
Theorem (First fundamental theorem of asset pricing). A market model has no arbitrage
if and only if there exists a martingale deflator.
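To make the statement concrete, here is a sketch of a martingale deflator in a hypothetical one-period market with a bank account and a stock that moves up or down. This toy market and all its parameters (r, u, d, the up-probability) are assumptions introduced purely for illustration. When d < 1 + r < u, the candidate Y_1 built below is strictly positive and satisfies E(Y_1 P_1) = Y_0 P_0 with Y_0 = 1.

```python
from fractions import Fraction

# Assumed one-period market: asset 1 is a bank account, asset 2 is a stock.
r = Fraction(1, 20)          # interest rate
u = Fraction(6, 5)           # stock up factor
d = Fraction(9, 10)          # stock down factor
S0 = 100
prob = {"up": Fraction(3, 5), "down": Fraction(2, 5)}   # real-world probabilities

P0 = (Fraction(1), Fraction(S0))
P1 = {"up": (1 + r, S0 * u), "down": (1 + r, S0 * d)}

# Candidate deflator: Y_1 = q / (p (1 + r)), where q_up = (1 + r - d) / (u - d).
q_up = (1 + r - d) / (u - d)
q = {"up": q_up, "down": 1 - q_up}
Y0 = Fraction(1)
Y1 = {w: q[w] / (prob[w] * (1 + r)) for w in prob}

# Martingale property of P Y over the single period: E(Y_1 P_1) = Y_0 P_0.
deflated_mean = tuple(
    sum(prob[w] * Y1[w] * P1[w][i] for w in prob) for i in range(2)
)

print(all(Y1[w] > 0 for w in Y1))                    # True
print(deflated_mean == tuple(Y0 * p for p in P0))    # True
```

If instead 1 + r ≤ d or u ≤ 1 + r, one of the weights q becomes non-positive and this construction no longer yields a strictly positive Y_1.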
Proof of the 1FTAP, easier direction. First we suppose that there is a martingale deflator
Y . Let H be an n-dimensional predictable process such that H0 = HT +1 = 0 almost surely
for some non-random T > 0 and such that
ct = (Ht − Ht+1 ) · Pt ≥ 0 almost surely for all 0 ≤ t ≤ T
We need to show that
ct = 0 almost surely for all 0 ≤ t ≤ T.
To this end, let
$$M_t = H_{t+1} \cdot P_t Y_t + \sum_{s=0}^{t} c_s Y_s.$$
Note that
$$M_{T+1} = \sum_{s=0}^{T} c_s Y_s.$$
Since Ys > 0 for all s, we need only show that MT +1 = 0 almost surely.
Since MT +1 ≥ 0 almost surely, we need only show
E(MT +1 ) = 0
by the pigeon-hole principle.
Now, we rewrite M as
$$M_t = \sum_{s=1}^{t} H_s \cdot (P_s Y_s - P_{s-1} Y_{s-1}).$$

We need only show that M is a martingale, because if so, we have


E(MT +1 ) = M0 = 0.
...... (to be continued)
*****
In the proof above, we note that M is a martingale transform of the predictable process
H with respect to the martingale P Y . However, since H may be unbounded in general, we
cannot yet assert that M is a martingale.
One way out of this problem is to simply assume the sample space Ω is finite. Indeed,
if Ω is finite then H can only take a finite number of values and, in particular, H would be
uniformly bounded. One might argue that assuming the sample space is finite is not
such a problem. Indeed, our cartoon model of the financial market is ignoring plenty of other
complications of reality. In particular, prices really move on a discrete grid (because
tick sizes are positive) and one could assume that prices are bounded, say, by £10^100, so a
large finite sample space might be enough for our modelling needs.
There are a couple of reasons why we will strive for results that hold on general probability
set-ups. Firstly, many popular models are based on random variables with continuous
distributions, such as normal random variables. It would be a shame if our theory could not
handle such models. Secondly, often it really is possible to prove general results, and since
these notes are aimed at a mathematical audience, we are trying to state and prove results
with the minimum of assumptions. The downside, of course, is extra technical work.
With that introduction, we begin our study of local martingales. First we start with a
definition.
Definition. A stopping time for a filtration (Ft )t∈T is a random variable τ taking values
in T ∪ {∞} such that the event {τ ≤ t} is Ft -measurable for all t ∈ T.

Example. Obviously, non-random times are stopping times. That is, if τ = t0 for some
fixed t0 ≥ 0, then {τ ≤ t} = Ω if t0 ≤ t and ∅ otherwise.
Example. Here is a typical example of a stopping time. Let (Yt )t≥0 be a discrete-time
adapted process and let A be a Borel set. Then the random variable
τ = inf{t ≥ 0 : Yt ∈ A}
(with the usual convention that inf ∅ = +∞) corresponding to the first time the process
enters the set A is a stopping time. Indeed,
$$\{\tau \le t\} = \bigcup_{s=0}^{t} \{Y_s \in A\}$$

is Ft -measurable because each {Ys ∈ A} is Fs -measurable by the adaptedness of Y , and


Fs ⊆ Ft by the definition of filtration.
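The first-entry example can be sketched in Python as follows. The path, the set A = [3, ∞) and the function names are illustrative assumptions. The point of `tau_le_t` is that the event {τ ≤ t} is decided using only the values Y_0, ..., Y_t.

```python
import math

def hitting_time(path, in_A):
    """tau = inf{t >= 0 : Y_t in A}, with inf(empty set) = +infinity."""
    for t, y in enumerate(path):
        if in_A(y):
            return t
    return math.inf

def tau_le_t(path, in_A, t):
    """Indicator of {tau <= t}, computed from Y_0, ..., Y_t only."""
    return any(in_A(y) for y in path[: t + 1])

path = [0, 1, 2, 1, 3, 4]            # one sample path Y_0, ..., Y_5
in_A = lambda y: y >= 3              # A = [3, infinity)
tau = hitting_time(path, in_A)
print(tau, [tau_le_t(path, in_A, t) for t in range(len(path))])
```

A path that never enters A gets τ = +∞, in line with the convention inf ∅ = +∞.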
Stopping times can be used to stop processes.
Definition. For an adapted process X (in discrete or continuous time; see footnote 4) and a stopping
time τ, the process $X^\tau$ defined by $X^\tau_t = X_{t \wedge \tau}$ is said to be X stopped at τ.
Stopping times interact well with martingales: stopped martingales are still martingales.
Proposition. Let X be a discrete-time martingale and let τ be a stopping time. Then
X τ is a martingale.
Remark. A version of this theorem also holds for continuous-time martingales with
continuous sample paths.
Proof. Note that
$$X^\tau_t = X_0 + \sum_{s=1}^{t} 1_{\{s \le \tau\}} (X_s - X_{s-1}).$$
Since the event $\{t \le \tau\} = \{\tau \le t-1\}^c$ is $\mathcal{F}_{t-1}$-measurable by the definition of stopping
time, the process $K_t = 1_{\{t \le \tau\}}$ is predictable. Since $X^\tau$ is the martingale transform of the
bounded predictable process K with respect to the martingale X, it is a martingale.
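The representation of the stopped process as a martingale transform can be checked numerically. The sketch below assumes X is a symmetric ±1 random walk started at 0 and τ is the first time the walk reaches the level 2; both choices and all names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

LEVEL = 2  # tau = first time the walk hits +2 (an arbitrary illustrative choice)

def stopped_path(steps):
    """Build X_{t ^ tau} as X_0 + sum_s 1{s <= tau} (X_s - X_{s-1})."""
    X, stopped = 0, [0]
    tau_reached = False               # True once tau <= s - 1, i.e. 1{s <= tau} = 0
    for xi in steps:
        K = 0 if tau_reached else 1   # predictable: decided from the path up to s - 1
        X += K * xi
        stopped.append(X)
        if X == LEVEL:
            tau_reached = True
    return stopped

def mean_stopped(t):
    """Exact E(X_{t ^ tau}) by enumerating all 2^t equally likely paths."""
    w = Fraction(1, 2 ** t)
    return sum(w * stopped_path(steps)[-1] for steps in product((1, -1), repeat=t))

print(stopped_path((1, 1, 1, -1)), [mean_stopped(t) for t in range(1, 7)])
```

The expectation stays equal to X_0 = 0 at every fixed time t, as the proposition asserts.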
The above result says that the martingale property is stable under stopping. We use this
property as motivation for the following definition.
Definition. A local martingale is an adapted process X = (Xt )t≥0 , in either discrete or
continuous time, such that there exists an increasing sequence of stopping times (τN ) with
τN ↑ ∞ such that the stopped process X τN is a martingale for each N .
Footnote 4: If time is continuous, we also need the extra technical assumption that X is progressively measurable
in order that the map $\omega \mapsto X_{\tau(\omega)}(\omega)$ is measurable. Fortunately, it is sufficient to assume that sample paths
of X are continuous, which will be enough for this course.
Remark. Note that martingales are local martingales. Indeed, given a martingale X
and any sequence of stopping times τN ↑ ∞, the stopped process X τN is a martingale.

Remark. Note that the local martingale property is also stable under stopping. Indeed,
let X be a local martingale and τ a stopping time. Then by definition, there exists a sequence
of stopping times $\sigma_N \uparrow \infty$ such that $X^{\sigma_N}$ is a martingale. Hence $(X^{\sigma_N})^\tau = X^{\sigma_N \wedge \tau}$ is again a
martingale since $\sigma_N \wedge \tau$ is a stopping time. But note that $X^{\sigma_N \wedge \tau} = (X^\tau)^{\sigma_N}$, implying that
the sequence of stopping times $\sigma_N \uparrow \infty$ is such that $(X^\tau)^{\sigma_N}$ is a martingale. This means $X^\tau$
is a local martingale.

The notion of a local martingale allows us to use martingale techniques on processes
that ought to behave like martingales. The following theorem provides a context where local
martingales arise naturally.

Theorem. Suppose X is a discrete-time martingale and K is a predictable process. Let
$$Y_t = \sum_{s=1}^{t} K_s (X_s - X_{s-1})$$
for t ≥ 1. Then Y is a local martingale.

Remark. This is the martingale transform as before, but now we do not insist that K is
bounded or that X is a true martingale. As a consequence, we cannot assert that Y is a
true martingale, merely a local martingale. The idea is that by localising, we can study the
algebraic and measurability structure of the martingale transform without worrying about
integrability issues.

Proof. Let τN = inf{t ≥ 0 : |Kt+1 | > N } with the convention inf ∅ = +∞. Note that
τN is a stopping time since K is predictable. Now writing
$$Y^{\tau_N}_t = \sum_{s=1}^{t} K_s 1_{\{s \le \tau_N\}} (X_s - X_{s-1})$$

we see that the stopped process is the martingale transform of the bounded predictable
process (Kt 1{t≤τN } )t≥1 with respect to the martingale X, and hence is a martingale. 
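The localising times used in this proof can be illustrated along a single sample path. The sketch below takes a hypothetical list of values K_1, ..., K_5 and computes τ_N together with the truncated integrand K_t 1{t ≤ τ_N}; the numbers and function names are assumptions made for illustration.

```python
import math

def tau_N(K, N):
    """tau_N = inf{t >= 0 : |K_{t+1}| > N}, where K[t] = K_t for t >= 1 (K[0] unused)."""
    for t in range(len(K) - 1):
        if abs(K[t + 1]) > N:
            return t
    return math.inf

def localised(K, N):
    """The integrand K_t 1{t <= tau_N} for t = 1, 2, ..., which is bounded by N."""
    tau = tau_N(K, N)
    return [K[t] if t <= tau else 0 for t in range(1, len(K))]

K = [None, 1, 3, 10, 2, 50]     # K_1, ..., K_5 on one path (illustrative numbers)
print(tau_N(K, 5), localised(K, 5))
print(tau_N(K, 100), localised(K, 100))
```

Note that deciding whether τ_N ≤ t only requires K_1, ..., K_{t+1}, which are known at time t because K is predictable.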

Remark. It is an exercise to show that if we assume that X is a local martingale in the
above theorem (and not necessarily a true martingale) we can still conclude (by essentially
the same argument) that the process Y is a local martingale.

The next theorem gives a sufficient condition that a local martingale is a true martingale.

Theorem. Let X be a local martingale in either discrete or continuous time. Let Y be
a process such that $|X_s| \le Y_t$ almost surely for all 0 ≤ s ≤ t. If $\mathbb{E}(Y_t) < \infty$ for all t ≥ 0,
then X is a true martingale.

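Proof. Let $(\tau_N)_N$ be a localising sequence of stopping times for X. Note that $X_{t \wedge \tau_N} \to X_t$
a.s. since $\tau_N \uparrow \infty$. Furthermore, by assumption $|X_{t \wedge \tau_N}| \le Y_t$, which is integrable, so we
may apply the conditional version of the dominated convergence theorem to conclude
$$\begin{aligned}
\mathbb{E}(X_t | \mathcal{F}_s) &= \mathbb{E}(\lim_N X_{t \wedge \tau_N} | \mathcal{F}_s) \\
&= \lim_N \mathbb{E}(X_{t \wedge \tau_N} | \mathcal{F}_s) \\
&= \lim_N X_{s \wedge \tau_N} \\
&= X_s
\end{aligned}$$
for 0 ≤ s ≤ t, where we have used the fact that the stopped process $(X_{t \wedge \tau_N})_{t \ge 0}$ is a martingale.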

The following corollary is useful:
Corollary. Suppose X is a DISCRETE-TIME local martingale such that E(|Xt |) < ∞
for all t ≥ 0. Then X is a true martingale.
Proof. Let Yt = |X0 | + . . . + |Xt |. The process Y is integrable by assumption and
|Xs | ≤ Yt for all 0 ≤ s ≤ t. The conclusion follows from the previous theorem. 
In the absence of integrability, the next best property is non-negativity. First we need
some definitions.
Definition. A supermartingale relative to a filtration (Ft )t≥0 is an adapted stochastic
process (Ut )t≥0 with the following properties:
• E(|Ut |) < ∞ for all t ≥ 0
• E(Ut |Fs ) ≤ Us for all 0 ≤ s ≤ t.
A submartingale is an adapted process (Vt )t≥0 with the following properties:
• E(|Vt |) < ∞ for all t ≥ 0
• E(Vt |Fs ) ≥ Vs for all 0 ≤ s ≤ t.
Remark. Hence a supermartingale decreases on average, while a submartingale increases
on average. A martingale is a stochastic process that is both a supermartingale and a
submartingale.
As in the case of the definition of martingale, to show that an adapted, integrable process
U is a supermartingale in discrete time, it is enough to show that E(Ut+1 |Ft ) ≤ Ut for all
t ≥ 0.
Theorem. Suppose X is a local martingale in either continuous or discrete time. If
Xt ≥ 0 for all t ≥ 0, then X is a supermartingale.
Proof. In the general case, let (τN )N be the localising sequence for X. First we show
that Xt is integrable for each t ≥ 0. Fatou’s lemma yields
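$$\begin{aligned}
\mathbb{E}(|X_t|) &= \mathbb{E}(X_t) \\
&= \mathbb{E}(\lim_N X_{t \wedge \tau_N}) \\
&\le \liminf_N \mathbb{E}(X_{t \wedge \tau_N}) \\
&= X_0 < \infty.
\end{aligned}$$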
Now that we have established integrability, we can discuss conditional expectations. The
conditional version of Fatou’s lemma yields
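$$\begin{aligned}
\mathbb{E}(X_t | \mathcal{F}_s) &= \mathbb{E}(\lim_N X_{t \wedge \tau_N} | \mathcal{F}_s) \\
&\le \liminf_N \mathbb{E}(X_{t \wedge \tau_N} | \mathcal{F}_s) \\
&= \liminf_N X_{s \wedge \tau_N} \\
&= X_s
\end{aligned}$$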
for 0 ≤ s ≤ t, as claimed. 

As before, discrete time local martingales are particularly nice:


Corollary. If X is a DISCRETE-TIME local martingale such that Xt ≥ 0 a.s. for all
t ≥ 0, then X is a martingale.
Proof. By the above theorem, we have that E(|Xt |) = E(Xt ) ≤ X0 < ∞. Since X is
integrable, the previous corollary implies X is a martingale. 
Theorem. Suppose that
$$Y_t = Y_0 + \sum_{s=1}^{t} K_s (X_s - X_{s-1})$$
where K is predictable and X is a martingale. If $Y_T \ge 0$ a.s. for some non-random T > 0,
then $(Y_t)_{0 \le t \le T}$ is a true martingale.
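Proof. Just as before, let $\tau_N = \inf\{t \ge 0 : |K_{t+1}| > N\}$. Note $Y_s 1_{\{t \le \tau_N\}}$ is integrable
for all 0 ≤ s ≤ t, since X is integrable by definition of martingale, and $K_s$ is bounded on
$\{t \le \tau_N\}$. Hence we have
$$\begin{aligned}
0 &\le \mathbb{E}[Y_T 1_{\{T \le \tau_N\}} | \mathcal{F}_{T-1}] \\
&= \mathbb{E}[Y_{T-1} 1_{\{T \le \tau_N\}} + K_T 1_{\{T \le \tau_N\}} (X_T - X_{T-1}) | \mathcal{F}_{T-1}] \\
&= Y_{T-1} 1_{\{T \le \tau_N\}} + K_T 1_{\{T \le \tau_N\}} \mathbb{E}[X_T - X_{T-1} | \mathcal{F}_{T-1}] \\
&= Y_{T-1} 1_{\{T \le \tau_N\}}.
\end{aligned}$$
If $Y_T \ge 0$ we have
$$Y_{T-1} 1_{\{T \le \tau_N\}} = \mathbb{E}[Y_T 1_{\{T \le \tau_N\}} | \mathcal{F}_{T-1}] \ge 0.$$
Taking N → ∞ shows $Y_{T-1} \ge 0$, and induction shows that $Y_t \ge 0$ for all 0 ≤ t ≤ T. Therefore
$(Y_t)_{0 \le t \le T}$ is a non-negative local martingale in discrete time and hence a true martingale.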

*****
The final step of the proof of the easier direction of the first fundamental theorem of
asset pricing is now complete.
5. Proof of the harder direction of 1FTAP
This is the one period case. The full multi-period proof is a little more difficult because
of some technicalities involving measurability.
Recall that an arbitrage with T = 1 is a process $(H_t)_{0 \le t \le 2}$ such that $H_0 = H_2 = 0$ and
$$c_0 = -H_1 \cdot P_0 \ge 0 \quad \text{and} \quad c_1 = H_1 \cdot P_1 \ge 0 \quad \text{a.s.},$$
with $\mathbb{P}(c_0 > 0 \text{ or } c_1 > 0) > 0$.
We now suppose that the market has no arbitrage, so that for any vector H ∈ Rn such that
H · P0 ≤ 0 ≤ H · P1 a.s. it must be the case that H · P0 = 0 = H · P1 a.s. We will show that
this implies there exists a random variable Z > 0 a.s. so that
E(P1 Z) = P0 .
Then the process (Yt )0≤t≤1 is a martingale deflator, where Y0 = 1 and Y1 = Z.
Define a function $F : \mathbb{R}^n \to \mathbb{R}$ by
$$F(h) = e^{h \cdot P_0} + \mathbb{E}[e^{-h \cdot P_1} \zeta]$$
where the random variable $0 < \zeta \le e^{-\|P_1\|^2/2}$ is introduced to ensure integrability. Notice
that F is finite valued and smooth.
We will show that no investment-consumption arbitrage implies that the function F has
a minimiser $H^*$. By the first order condition for a minimum, we have
$$0 = \nabla F(H^*) = e^{H^* \cdot P_0} P_0 - \mathbb{E}[e^{-H^* \cdot P_1} \zeta P_1]$$
and hence we may take
$$Z = e^{-H^* \cdot P_0 - H^* \cdot P_1} \zeta.$$
So let $(H_k)_k$ be a sequence such that $F(H_k) \to \inf_H F(H)$. If $(H_k)_k$ is bounded, we can
pass to a convergent subsequence, by the Bolzano–Weierstrass theorem, such that $H_k \to H^*$.
By the smoothness (in particular, the continuity) of F we have
$$\inf_H F(H) = \lim_k F(H_k) = F(\lim_k H_k) = F(H^*)$$
so $H^*$ is our desired minimiser.
It remains to show that no arbitrage implies that the sequence (Hk )k is bounded. So for
the sake of finding a contradiction, suppose (Hk )k is unbounded.
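We can pass to a subsequence such that $\|H_k\| \uparrow \infty$. Now let
$$\mathcal{U} = \{u \in \mathbb{R}^n : u \cdot P_0 = 0 = u \cdot P_1 \text{ a.s.}\} \subseteq \mathbb{R}^n$$
and let
$$\mathcal{V} = \mathcal{U}^\perp.$$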
Notice that if u ∈ U and v ∈ V then F (u + v) = F (v). Hence, we may assume Hk ∈ V for
all k.
Now let
$$\hat{H}_k = \frac{H_k}{\|H_k\|}.$$
Note that $\|\hat{H}_k\| = 1$ and that $\hat{H}_k \in \mathcal{V}$. Since $(\hat{H}_k)_k$ is bounded, we can again pass to a
convergent subsequence such that $\hat{H}_k \to \hat{H}$. Notice once more that $\|\hat{H}\| = 1$ and that
$\hat{H} \in \mathcal{V}$.
We know that the sequence $F(H_k)$ is bounded (since it is convergent) but we also have
$$F(H_k) = (e^{\hat{H}_k \cdot P_0})^{\|H_k\|} + \mathbb{E}[(e^{-\hat{H}_k \cdot P_1})^{\|H_k\|} \zeta]$$
so we must conclude that $\hat{H} \cdot P_0 \le 0 \le \hat{H} \cdot P_1$ a.s. (since otherwise the right-hand side would
blow up).
By the assumption of no arbitrage we conclude that $\hat{H} \cdot P_0 = 0 = \hat{H} \cdot P_1$ a.s., which
means $\hat{H} \in \mathcal{U}$. But we also know that $\hat{H} \in \mathcal{V}$. Since the subspaces are orthogonal, we
have $\mathcal{U} \cap \mathcal{V} = \{0\}$, and in particular, we have $\hat{H} = 0$. But this contradicts the fact that
$\|\hat{H}\| = 1$.
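The construction in this proof can be tried out numerically. The sketch below assumes a hypothetical one-period market with two assets (cash and a stock) and three equally likely states, takes ζ = exp(−‖P_1‖²/2), and minimises F with a damped Newton iteration. The Newton method is only a convenient numerical stand-in and is not part of the argument above; all names and numbers are illustrative assumptions.

```python
import math

P0 = (1.0, 1.0)                                    # (cash, stock) at time 0
STATES = [(1 / 3, (1.0, 0.5)), (1 / 3, (1.0, 1.0)), (1 / 3, (1.0, 2.0))]  # (prob, P_1)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def zeta(p1):
    return math.exp(-dot(p1, p1) / 2)

def F(h):
    return math.exp(dot(h, P0)) + sum(pr * math.exp(-dot(h, p1)) * zeta(p1) for pr, p1 in STATES)

def grad_hess(h):
    """Gradient and Hessian of F at h."""
    a = math.exp(dot(h, P0))
    g = [a * P0[0], a * P0[1]]
    H = [[a * P0[i] * P0[j] for j in range(2)] for i in range(2)]
    for pr, p1 in STATES:
        w = pr * math.exp(-dot(h, p1)) * zeta(p1)
        for i in range(2):
            g[i] -= w * p1[i]
            for j in range(2):
                H[i][j] += w * p1[i] * p1[j]
    return g, H

def minimise(iters=100):
    """Damped Newton iteration with a backtracking line search."""
    h = [0.0, 0.0]
    for _ in range(iters):
        g, H = grad_hess(h)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        d = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
             -(H[0][0] * g[1] - H[1][0] * g[0]) / det]
        slope, step = dot(g, d), 1.0
        while step > 1e-12 and F([h[0] + step * d[0], h[1] + step * d[1]]) > F(h) + 1e-4 * step * slope:
            step /= 2
        h = [h[0] + step * d[0], h[1] + step * d[1]]
    return h

H_star = minimise()
# Z = exp(-H*.P_0 - H*.P_1) zeta, state by state
Z = [math.exp(-dot(H_star, P0) - dot(H_star, p1)) * zeta(p1) for _, p1 in STATES]
EZP1 = [sum(pr * z * p1[i] for (pr, p1), z in zip(STATES, Z)) for i in range(2)]
print(H_star, Z, EZP1)
```

If the iteration has converged, `EZP1` agrees with `P0` up to rounding, so Y_0 = 1, Y_1 = Z is a martingale deflator for this toy market.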
6. Numéraires and equivalent martingale measures
In this section, we introduce the concepts of numéraire assets and equivalent martingale
measures. The primary purpose of this section is to reconcile concepts and terminology used
by other authors to the theory developed so far. We will also find that equivalent martingale
measures can be used to simplify some calculations later in the course.
In most discussions of arbitrage theory, there is the assumption that at least one asset is
a numéraire:
Definition. An asset is a numéraire iff its price is strictly positive for all time, almost
surely.
Having a numéraire in the market simplifies the story in some ways. For instance, when
we discuss arbitrage theory, we no longer have to allow for intermediate consumption.
Definition. A pure investment arbitrage is a predictable process H such that for some
non-random time horizon T > 0 we have
• H0 · P0 = 0 ≤ HT · PT a.s.
• P(HT · PT > 0) > 0
where H satisfies the self-financing condition
(Ht − Ht+1 ) · Pt = 0 for all 0 ≤ t ≤ T − 1.
Proposition. Suppose the market model has a numéraire asset. There exists a pure-
investment arbitrage if and only if there exists an investment-consumption arbitrage.
Proof. First let (Kt )0≤t≤T be a pure-investment arbitrage for the time horizon T > 0.
By setting KT +1 = 0 and cT = KT · PT , we have an investment-consumption arbitrage
(Kt )0≤t≤T +1 .
So, suppose (Ht )0≤t≤T +1 is an investment-consumption arbitrage. We can order the assets
such that P = (N, S) where the first asset is the numéraire with positive price process N
and S is the n − 1 dimensional process of the remaining asset prices. Let η = (1, 0, . . . , 0) so
that N = η · P .
To find the pure-investment arbitrage, the idea is to let K be the strategy that consists of
holding at time t the portfolio Ht but instead of consuming the amount ct = (Ht − Ht+1 ) · Pt ,
this money instead is invested into the numéraire portfolio. In notation, K is defined by
$$K_t = H_t + \eta \sum_{s=0}^{t-1} \frac{c_s}{N_s}.$$
Note that
$$(K_t - K_{t+1}) \cdot P_t = (H_t - H_{t+1}) \cdot P_t - \frac{c_t}{N_t} \eta \cdot P_t = 0$$
so K is a pure investment strategy. Finally, note that since $H_{T+1} = 0$, then
$$K_T \cdot P_T = K_{T+1} \cdot P_T = N_T \sum_{s=0}^{T} \frac{c_s}{N_s} \ge 0.$$

In particular, (Kt )0≤t≤T is a pure-investment arbitrage. 
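The passage from H to K is purely algebraic, so it can be illustrated along a single price scenario. In the sketch below the prices, the holdings and the function name `reinvest_consumption` are hypothetical; the numéraire is the first asset, as in the proof.

```python
from fractions import Fraction as Fr

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reinvest_consumption(H, P):
    """Given holdings H_0..H_{T+1} and prices P_0..P_T (numeraire first), return the
    consumption c_0..c_T and the strategy K_t = H_t + eta * sum_{s<t} c_s / N_s."""
    T = len(P) - 1
    c = [dot([h - h_next for h, h_next in zip(H[t], H[t + 1])], P[t]) for t in range(T + 1)]
    K, units = [], Fr(0)          # units = sum_{s<t} c_s / N_s held in the numeraire
    for t in range(T + 2):
        K.append((H[t][0] + units,) + tuple(H[t][1:]))
        if t <= T:
            units += c[t] / P[t][0]
    return c, K

# One scenario with T = 2: P_t = (N_t, S_t)
P = [(Fr(1), Fr(10)), (Fr(2), Fr(12)), (Fr(2), Fr(15))]
H = [(0, 0), (8, -1), (8, -1), (0, 0)]      # H_0 = H_3 = 0
c, K = reinvest_consumption(H, P)
print(c, K)
```

Along this scenario the strategy K is self-financing and its terminal wealth K_T · P_T equals N_T times the accumulated discounted consumption.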


To define an equivalent martingale measure, we begin with another definition:
Definition. Let (Ω, F) be a measurable space and let P and Q be two probability
measures on (Ω, F). The measures P and Q are equivalent, written P ∼ Q, iff
$$\mathbb{P}(A) = 1 \Leftrightarrow \mathbb{Q}(A) = 1,$$
or equivalently,
$$\mathbb{P}(A) = 0 \Leftrightarrow \mathbb{Q}(A) = 0.$$
The above definition says that equivalent probability measures have the same almost sure
events. Complementarily, equivalent probability measures have the same null sets.
It turns out that equivalent measures can be characterised by the following theorem.
When there is more than one probability measure floating around, we use the notation $\mathbb{E}^{\mathbb{P}}$
to denote expected value with respect to $\mathbb{P}$, etc.
Theorem (Radon–Nikodym theorem). The probability measure Q is equivalent to the
probability measure P if and only if there exists a P-a.s. and Q-a.s. positive random variable
Z such that
Q(A) = EP (Z 1A )
for each A ∈ F.
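The random variable Z is called the density, or the Radon–Nikodym derivative, of $\mathbb{Q}$ with
respect to $\mathbb{P}$, and is often denoted
$$Z = \frac{d\mathbb{Q}}{d\mathbb{P}}.$$
In fact, $\mathbb{P}$ also has a density with respect to $\mathbb{Q}$ given by
$$\frac{d\mathbb{P}}{d\mathbb{Q}} = \frac{1}{Z}.$$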
We only need the easy direction of the theorem, that the existence of a positive density
implies equivalence, for this course. Here is a proof. The proof of the harder direction is
omitted since we do not need it.
Proof. Suppose P(Z > 0) = 1 and that EP (Z) = 1. Define a set function Q by
Q(A) = EP (Z 1A ).
Note that Q is countably additive by the monotone convergence theorem. Also, Q(Ω) =
EP (Z) = 1, so Q is a probability measure. If P(A) = 0, then the event {1A = 0} is P-almost
sure and hence
Q(A) = EP (Z 1A ) = 0.
Conversely, if Q(A) = 0 we can conclude that {Z 1A = 0} is P-a.s. by the pigeon-hole
principle since {Z 1A ≥ 0} is P-a.s. But since {Z > 0} is P-a.s., we must conclude that
{1A = 0} is P-a.s., i.e. P(A) = 0. Thus Q and P are equivalent. 
Example. Consider the sample space Ω = {1, 2, 3} with the set F of events all subsets
of Ω. Consider probability measures P and Q defined by
• P{1} = 1/2, P{2} = 1/2, and P{3} = 0
• Q{1} = 1/1000, Q{2} = 999/1000, and Q{3} = 0.
Then P and Q are equivalent. We may take their density Z = dQ/dP to be
$$Z(1) = \frac{1}{500}, \quad Z(2) = \frac{999}{500}, \quad Z(3) = 0.$$
(Since both measures don’t ‘see’ the event {3}, we can let Z(3) be any value.)
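The example above is small enough to check exhaustively. The following sketch, with illustrative names, verifies over all eight events A that Q(A) equals the P-expectation of Z 1_A and that P and Q have the same null sets.

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

omega = (1, 2, 3)
P = {1: Fr(1, 2), 2: Fr(1, 2), 3: Fr(0)}
Q = {1: Fr(1, 1000), 2: Fr(999, 1000), 3: Fr(0)}
Z = {1: Fr(1, 500), 2: Fr(999, 500), 3: Fr(0)}   # Z(3) is arbitrary since P{3} = 0

def events(points):
    """All subsets of the sample space."""
    return chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))

def prob(measure, A):
    return sum(measure[w] for w in A)

def density_integral(A):
    """E^P(Z 1_A)."""
    return sum(P[w] * Z[w] for w in A)

# For each event: (A, does Q(A) = E^P(Z 1_A)?, do P and Q agree on whether A is null?)
checks = [
    (A, prob(Q, A) == density_integral(A), (prob(P, A) == 0) == (prob(Q, A) == 0))
    for A in events(omega)
]
print(all(ok and same for _, ok, same in checks))
```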

*****
Now let’s return to our financial model.
Definition. Let P be a market model defined on a probability space (Ω, F, P). The
measure P is called the objective (or historical or statistical ) measure for the model.
Suppose that we can write our asset price process as P = (N, S) where N is a positive
adapted process (the price of a numéraire) and S is an adapted d dimensional process.
An equivalent martingale measure relative to this numéraire is any probability measure Q
equivalent to $\mathbb{P}$ such that the discounted price process
$$\left( \frac{S_t}{N_t} \right)_{t \ge 0}$$
is a martingale under $\mathbb{Q}$.
Remark. In many accounts of arbitrage theory, the concept of an equivalent martingale
measure has taken centre stage. I believe that its importance has been overstressed. In
particular, it is a numéraire-dependent concept, unlike that of a martingale deflator. For
instance, if there are two assets that are both numéraires (for example from the point of view
of a British trader, both the euro and the US dollar are numéraires) then one must be very
careful to specify which one is the numéraire.
Theorem (First Fundamental Theorem of Asset Pricing when there is a numéraire). The
market model (Pt )0≤t≤T has no arbitrage if and only if there exists an equivalent martingale
measure relative to a fixed numéraire.
Proof. We already know that there is no arbitrage if and only if there exists a martin-
gale deflator. We now show that there is essentially a one-to-one correspondence between
martingale deflators and equivalent martingale measures once a finite horizon T > 0 is
specified.
Let Y be a process such that {YT > 0} is P-a.s. and such that YT PT is P-integrable.
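Define a new measure $\mathbb{Q}$ by the density
$$\frac{d\mathbb{Q}}{d\mathbb{P}} = \frac{Y_T N_T}{\mathbb{E}^{\mathbb{P}}(Y_T N_T)}.$$
Our analysis turns on the Bayes formula
$$\mathbb{E}^{\mathbb{Q}}\left( \frac{P_T}{N_T} \Big| \mathcal{F}_t \right) = \frac{\mathbb{E}^{\mathbb{P}}(P_T Y_T | \mathcal{F}_t)}{\mathbb{E}^{\mathbb{P}}(N_T Y_T | \mathcal{F}_t)}.$$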
Suppose Y is a martingale deflator. In this case
EP (PT YT |Ft ) = Pt Yt
and in particular
EP (NT YT |Ft ) = Nt Yt .
By the Bayes formula we have
$$\mathbb{E}^{\mathbb{Q}}\left( \frac{P_T}{N_T} \Big| \mathcal{F}_t \right) = \frac{P_t}{N_t}$$
and hence P/N is a Q-martingale, i.e. Q is an equivalent martingale measure.
Conversely, suppose $\mathbb{Q}$ is an equivalent martingale measure. Let
$$Z_t = \mathbb{E}^{\mathbb{P}}\left( \frac{d\mathbb{Q}}{d\mathbb{P}} \Big| \mathcal{F}_t \right).$$
Note that Z is a positive P-martingale. Let
Yt = Zt /Nt .
Since the random variable PT /NT is Q-integrable by the definition of martingale, we can
conclude that PT YT is P-integrable. Furthermore, the process Y is positive and satisfies
EP (NT YT |Ft ) = EP (ZT |Ft )
= Zt
= Nt Yt .
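Hence by the Bayes formula
$$\begin{aligned}
\mathbb{E}^{\mathbb{P}}(P_T Y_T | \mathcal{F}_t) &= \mathbb{E}^{\mathbb{Q}}\left( \frac{P_T}{N_T} \Big| \mathcal{F}_t \right) \mathbb{E}^{\mathbb{P}}(N_T Y_T | \mathcal{F}_t) \\
&= \frac{P_t}{N_t} (N_t Y_t) \\
&= P_t Y_t
\end{aligned}$$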
so that P Y is a P-martingale and hence Y is a martingale deflator. 
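The correspondence between deflators and equivalent martingale measures can be seen in a toy one-period model. The sketch below assumes a bond B and a stock S with two equally likely states, a hypothetical martingale deflator Y, and takes the stock as numéraire to stress that the resulting measure depends on that choice. All numbers are illustrative.

```python
from fractions import Fraction as Fr

# Time-0 values, and (P-probability, B_1, S_1, Y_1) in each state
B0, S0, Y0 = Fr(1), Fr(10), Fr(1)
states = {
    "up": (Fr(1, 2), Fr(11, 10), Fr(12), Fr(40, 33)),
    "down": (Fr(1, 2), Fr(11, 10), Fr(9), Fr(20, 33)),
}

# Y is a martingale deflator: E^P(P_1 Y_1) = P_0 Y_0 for both assets
assert sum(p * B1 * Y1 for p, B1, S1, Y1 in states.values()) == B0 * Y0
assert sum(p * S1 * Y1 for p, B1, S1, Y1 in states.values()) == S0 * Y0

# Take the stock as numeraire N = S: dQ/dP = Y_1 N_1 / E^P(Y_1 N_1)
norm = sum(p * Y1 * S1 for p, B1, S1, Y1 in states.values())
Q = {w: p * Y1 * S1 / norm for w, (p, B1, S1, Y1) in states.items()}

# The discounted bond price B/S should be a Q-martingale
EQ_discounted_bond = sum(Q[w] * states[w][1] / states[w][2] for w in states)
print(Q, EQ_discounted_bond, B0 / S0)
```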
Remark. Notice that the statement of the version of the fundamental theorem above is
for a finite horizon model, as opposed to the version presented earlier. Here is an example
that shows that there might be no arbitrage but there does not exist an equivalent martingale
measure over the infinite horizon.
Let ξ1 , ξ2 , . . . be independent random variables with
P(ξi = 1) = p = 1 − q = P(ξi = −1)
and let $S_t = \xi_1 + \ldots + \xi_t$ be a simple random walk, where we assume that it is not symmetric, i.e.
$p \ne q$.
Define a two-asset market model with respect to the natural filtration $\mathcal{F}_t =
\sigma(\xi_1, \ldots, \xi_t)$ by
$$P_t = (1, S_t).$$
In particular, there is a numéraire with constant price Nt = 1, which can be interpreted as
cash.
First let us compute all martingale deflators for the model. Fix t and ξ1 , . . . , ξt and let
Zu = Yt+1 /Yt if ξt+1 = 1, and Zd = Yt+1 /Yt if ξt+1 = −1.
Since P Y is a martingale, we have
Yt Zu p + Yt Zd q = Yt
(St + 1)Yt Zu p + (St − 1)Yt Zd q = St Yt
so that Zu = 1/(2p) and Zd = 1/(2q). Hence, we have shown that all martingale deflators
satisfy
$$Y_{t+1} = Y_t (4pq)^{-1/2} (q/p)^{\xi_{t+1}/2}$$
and hence
$$Y_t = Y_0 (4pq)^{-t/2} (q/p)^{S_t/2}.$$
Now fix a horizon T > 0 and let PT be the restriction of P to FT . Let QT be the
equivalent measure on FT with density
$$\frac{d\mathbb{Q}_T}{d\mathbb{P}_T} = Y_T / Y_0.$$
By the above discussion, QT is the equivalent martingale measure for the finite horizon
model (Pt )0≤t≤T . It is an easy computation to verify that under the measure QT , the random
variables ξ1 , . . . , ξT are independent with
$$\mathbb{Q}_T(\xi_i = 1) = \frac{1}{2} = \mathbb{Q}_T(\xi_i = -1).$$
Let us consider the measure Q on F with the property that the random variables ξ1 , ξ2 , . . .
are independent with
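$$\mathbb{Q}(\xi_i = 1) = \frac{1}{2} = \mathbb{Q}(\xi_i = -1),$$
so that $\mathbb{Q}_T$ is the restriction of $\mathbb{Q}$ to $\mathcal{F}_T$. Is this measure $\mathbb{Q}$ an equivalent martingale measure for the
infinite horizon model $(P_t)_{t \ge 0}$? While it is true that the price process $P$ is a $\mathbb{Q}$-martingale, it is not true that
$\mathbb{P}$ and $\mathbb{Q}$ are equivalent. Indeed, by the strong law of large numbers,
$$\mathbb{P}\left( \frac{S_t}{t} \to p - q \right) = 1, \quad \text{but} \quad \mathbb{Q}\left( \frac{S_t}{t} \to 0 \right) = 1.$$
Since we have assumed $p \ne q$, we see that these measures are inequivalent! Indeed, note that
$$\mathbb{P}(Y_t \to 0) = 1, \quad \text{but} \quad \mathbb{Q}(Y_t \to \infty) = 1.$$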
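The finite-horizon part of this example can be verified by enumeration. The sketch below assumes p = 2/3 and T = 4 (illustrative values), builds Y_T/Y_0 as a product of the one-step ratios Z_u = 1/(2p) and Z_d = 1/(2q), compares it with the closed form (4pq)^{-T/2} (q/p)^{S_T/2}, and computes the Q_T-probability of every path.

```python
import math
from fractions import Fraction as Fr
from itertools import product

p = Fr(2, 3)          # illustrative choice with p != q
q = 1 - p

def density(steps):
    """Y_T / Y_0 as a product of one-step ratios Z_u = 1/(2p), Z_d = 1/(2q)."""
    out = Fr(1)
    for xi in steps:
        out *= 1 / (2 * p) if xi == 1 else 1 / (2 * q)
    return out

def path_prob(steps):
    """P-probability of observing the given sequence of +-1 steps."""
    out = Fr(1)
    for xi in steps:
        out *= p if xi == 1 else q
    return out

def closed_form(steps):
    """(4pq)^(-T/2) (q/p)^(S_T/2), evaluated in floating point."""
    T, S = len(steps), sum(steps)
    return float(4 * p * q) ** (-T / 2) * float(q / p) ** (S / 2)

T = 4
paths = list(product((1, -1), repeat=T))
Q_T = {w: path_prob(w) * density(w) for w in paths}   # Q_T of each path
print(set(Q_T.values()))
```

Every path receives Q_T-probability 2^{-T}, which is another way of saying that the steps are independent and symmetric under Q_T.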
As a parting shot, we introduce some definitions which are used in the financial mathe-
matics literature.
Definition. An asset in a discrete-time market model is risk-free if its price process is
predictable.
Definition. An equivalent martingale measure with respect to a risk-free numéraire is
called a risk-neutral measure.

CHAPTER 2

Pricing and hedging contingent claims in discrete-time models

A contingent claim is any cash payment where the size of the payment is contingent on
the prices of other assets or any other variable (for instance, the weather). There are two
major types of contingent claims that we will study in these notes: European and American.
European: specified by a time horizon T > 0 and FT -measurable random variable
ξT modelling the payout at the maturity date T .
American: specified by a time horizon T > 0 and an adapted process (ξt )0≤t≤T where
ξt models the payout of the claim if the owner of the claim chooses to exercise at
time t.

Example (Call option). A European call option gives the owner of the option the right,
but not the obligation, to buy a given stock at some fixed time T at some fixed price K,
called the strike of the option. Let ST denote the price of the stock at the maturity date
T . There are two cases: If K ≥ ST , then the option is worthless to the owner since there
is no point paying a price above the market price for the underlying stock. On the other
hand, if K < ST , then the owner of the option can buy the stock for the price K from the
counterparty and immediately sell the stock for the price ST to the market, realising a profit
of ST − K. Hence, the payout of the call option is ξT = (ST − K)+ , where a+ = max{a, 0}
as usual. The ‘hockey-stick’ graph of the function g(x) = (x − K)+ is below.

An American call option gives the owner of the option the right, but not the obligation,
to buy a given stock at any time t ∈ [0, T ] at some fixed strike price K. By the argument
above, the payout of the call option exercised at time t is given by ξt = (St − K)+ .
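The two payout specifications can be written down directly. In the sketch below the strike and the stock prices are hypothetical numbers; the European claim is a single payout at maturity, while the American claim is a whole process of payouts, one for each possible exercise time.

```python
def call_payoff(S, K):
    """(S - K)^+ = max(S - K, 0)."""
    return max(S - K, 0)

K = 100
# European call: payout xi_T at the maturity date only
european = call_payoff(120, K)
# American call: payout xi_t if exercised at time t, along one illustrative price path
american = [call_payoff(S_t, K) for S_t in (90, 105, 130, 95)]
print(european, american)
```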

1. European claims and the second fundamental theorem of asset pricing


Imagine that you find yourself in a market with prices P = (Pt )t≥0 , and you would like
to sell a European contingent claim with payout ξT . What price should you ask at time 0
to off-set this liability at time-T ? One criterion would be to ask for enough money to offset
the cost of hedging away the liability by trading in the market.
Definition. An investment-consumption strategy H super-replicates a European contingent
claim with payout $\xi_T$ if
$H_T \cdot P_T \ge \xi_T$ a.s.
The next theorem says that in an arbitrage-free market we can compute the amount of
initial capital needed to super-replicate a given claim:
Theorem. Suppose the market is free of arbitrage, and suppose the process (ξt )0≤t≤T
has the property that ξY is a supermartingale for each martingale deflator Y such that ξY is
integrable. Then there exists an investment-consumption strategy H such that
Ht+1 · Pt ≤ ξt ≤ Ht · Pt a.s. for all 0 ≤ t ≤ T.
In particular, the initial capital needed to super-replicate the claim with payout ξT is at most
ξ0 .
Remark. Given an $\mathcal{F}_T$-measurable random variable $\xi_T$, we can find the minimal process
$(\xi_t)_{0 \le t \le T}$ such that ξY is a supermartingale for each martingale deflator Y as follows: let
$$\xi_t = \operatorname{ess\,sup} \left\{ \frac{1}{Y_t} \mathbb{E}(\xi_T Y_T | \mathcal{F}_t) : Y \text{ a martingale deflator such that } \xi_T Y_T \text{ is integrable} \right\}.$$
The notation ess sup denotes the essential supremum and will be explained below. In the
meantime, assuming that $(\xi_t)_{0 \le t \le T}$ is finite-valued we can see that ξY is a supermartingale
for each Y by first noting that we can express a martingale deflator as a product Yt =
Y0 Z1 · · · Zt where
E(Zt Pt |Ft−1 ) = Pt−1 .
Hence we can apply the dynamic programming principle to assert that
ξt = ess sup {E(Zt+1 · · · ZT ξT |Ft ) : Zt+1 , . . . , ZT }
= ess sup {E(Zt+1 ξt+1 |Ft ) : Zt+1 }
Remark. We now explain the notion of essential supremum used above. Let X be a
collection of random variables, and let
X̄(ω) = sup{X(ω) : X ∈ X } for all ω ∈ Ω.
Note that
• X̄ ≥ X everywhere for all X ∈ X .
• If Y ≥ X everywhere for all X ∈ X then Y ≥ X̄ everywhere.
But there is a problem: if the collection X is uncountable, then X̄ may not be a random
variable, i.e. a measurable function on Ω. (For example, let Ω = [0, 1] with P Lebesgue
measure. Let A be a subset of [0, 1], and let
Z = {1{t} (ω) : t ∈ A}.
Then
Z̄(ω) = sup{Z(ω) : Z ∈ Z}
= sup{1{t} (ω) : t ∈ A} = 1A (ω).
Then Z̄ is a random variable if and only if A ⊂ [0, 1] is measurable.)
But there is a measure-theoretic workaround:
Theorem. Let X be a collection of random variables. There exists a random variable X̂
which is valued in R ∪ {+∞} and such that
• X̂ ≥ X almost surely for all X ∈ X .
• If Y ≥ X almost surely for all X ∈ X then Y ≥ X̂ almost surely.
(The proof will be outlined in the second example sheet.)
Definition. We let X̂ = ess sup X , the essential supremum.
(Returning to the example, we see that Z = 0 almost surely for all Z ∈ Z, so it follows
that ess sup Z = 0.)
Proof. (The T = 1 case) Suppose E(Y1 ξ1 ) ≤ Y0 ξ0 for every martingale deflator such
that Y1 ξ1 is integrable. We need to show that there exists a H ∗ ∈ Rn such that H ∗ · P0 ≤ ξ0
and H ∗ · P1 ≥ ξ1 a.s.
To that end, let
$$F_\gamma(H) = e^{-\gamma(\xi_0 - H \cdot P_0)} + \mathbb{E}[e^{-\gamma(H \cdot P_1 - \xi_1)} \zeta]$$
where the factor $0 < \zeta \le e^{-\|P_1\|^2 - \xi_1^2}$ is introduced to ensure integrability. (This function
is motivated by the utility maximisation problem introduced in the last chapter, where
$u(x) = -e^{-\gamma x}$. The parameter γ > 0 is the investor's risk aversion. We plan to send
γ → +∞, which corresponds to the limit where the investor can tolerate no losses.)
For each γ > 0, by the proof of the first fundamental theorem of asset pricing, there
exists a unique Hγ ∈ V such that
$$F_\gamma(H_\gamma) = \inf_H F_\gamma(H),$$

where
V = {u ∈ Rn : u · P0 = 0 = u · P1 a.s.}⊥ .
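By the first order condition for a minimum $\nabla F_\gamma(H_\gamma) = 0$, we see that by setting
$$Y_0^\gamma = e^{\gamma(H_\gamma \cdot P_0 - \xi_0)} \quad \text{and} \quad Y_1^\gamma = e^{\gamma(\xi_1 - H_\gamma \cdot P_1)} \zeta$$
we have found a martingale deflator. Note that
$$\begin{aligned}
\frac{\partial}{\partial \gamma} F_\gamma(h) \Big|_{h = H_\gamma} &= Y_0^\gamma (H_\gamma \cdot P_0 - \xi_0) + \mathbb{E}[Y_1^\gamma (\xi_1 - H_\gamma \cdot P_1)] \\
&= H_\gamma \cdot (Y_0^\gamma P_0 - \mathbb{E}[Y_1^\gamma P_1]) + \mathbb{E}[Y_1^\gamma \xi_1] - Y_0^\gamma \xi_0 \\
&\le 0
\end{aligned}$$
since $Y^\gamma$ is a martingale deflator and by the assumption that $Y^\gamma \xi$ is a supermartingale.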
Also note that γ 7→ Hγ is differentiable. (Indeed, recall that Hγ is defined as the root
of the function ∇Fγ : V → V, and D2 Fγ is a strictly positive definite operator on V, so the
differentiability of Hγ follows from the implicit function theorem.) Furthermore,
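$$F_\gamma(H_\gamma) \le F_\gamma(H_{\gamma \pm \varepsilon})$$
since $H_\gamma$ is the minimiser of $F_\gamma$ and hence
$$\frac{\partial}{\partial \gamma} F_g(H_\gamma) \Big|_{g = \gamma} = 0.$$
Putting this together implies $\gamma \mapsto F_\gamma(H_\gamma)$ is nonincreasing, and in particular
$$\sup_{\gamma \ge 1} F_\gamma(H_\gamma) < \infty.$$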

Now we consider the sequence (Hk )k where the risk-aversion parameter takes the values
γ = k ∈ N.
If (Hk )k is bounded, then we can find a convergent subsequence such that Hk → H ∗ .
Note that since
$$F_k(H_k) = (e^{H_k \cdot P_0 - \xi_0})^k + \mathbb{E}[(e^{\xi_1 - H_k \cdot P_1})^k \zeta]$$
we have by the boundedness of the sequence that ξ0 ≥ H ∗ · P0 and ξ1 ≤ H ∗ · P1 a.s.
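So it remains to rule out the case that the sequence $(H_k)_k$ is unbounded. Suppose that
it was unbounded. Then we can pass to a subsequence such that $\|H_k\| \uparrow \infty$. Again, let
$$\hat{H}_k = \frac{H_k}{\|H_k\|}$$
and pass to a subsequence such that $\hat{H}_k \to \hat{H}$. Note that we have that $\hat{H} \in \mathcal{V}$ and that
$\|\hat{H}\| = 1$. But by the formula
$$F_k(H_k) = \left( e^{\hat{H}_k \cdot P_0 - \xi_0/\|H_k\|} \right)^{k \|H_k\|} + \mathbb{E}\left[ \left( e^{\xi_1/\|H_k\| - \hat{H}_k \cdot P_1} \right)^{k \|H_k\|} \zeta \right]$$
we see that boundedness forces $\hat{H} \cdot P_0 \le 0 \le \hat{H} \cdot P_1$. By no arbitrage, we have $\hat{H} \cdot P_0 = 0 =
\hat{H} \cdot P_1$ a.s. Since $\hat{H} \in \mathcal{V}$ we conclude that $\hat{H} = 0$, contradicting $\|\hat{H}\| = 1$.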
With this motivation, we introduce an important class of claims that can be perfectly
hedged:
Definition. A European contingent claim with payout ξT is replicable or attainable iff
there exists a pure investment strategy H such that HT · PT = ξT almost surely.
One of the reasons to single out attainable claims is that there is an unambiguous way
to price them according to the no-arbitrage principle:
Theorem. Suppose that the market model with n-dimensional price process P has no
arbitrage. Let ξT be the payout of an attainable European contingent claim with maturity
date T > 0, and let H be the n-dimensional replicating strategy.
Suppose the claim has price ξt for 0 ≤ t ≤ T . If the augmented market with (n + 1)-
dimensional price process (P, ξ) has no arbitrage, then
ξt = Ht · Pt almost surely for all 0 ≤ t ≤ T
Proof. Let X = H · P. The idea is that if $X_t \ne \xi_t$ for some t, then there would be
an arbitrage in the augmented market. To construct an arbitrage wait until the first time
that the price of the replicating portfolio differs from the price of the claim, and then buy
the cheap one, sell the expensive one and pocket the difference.
In mathematical notation, fix a T > 0 and let $\tau = \inf\{0 \le t \le T : X_t \ne \xi_t\}$, with the
usual convention that inf ∅ = +∞. Consider the (n + 1)-dimensional investment strategy
$$\bar{H}_t = \operatorname{sign}(\xi_\tau - X_\tau) 1_{\{t > \tau\}} (H_t, -1)$$
and consumption $c_t = |\xi_\tau - X_\tau| 1_{\{t = \tau + 1\}}$.
Let X̄ = H̄ · (P, ξ) and note that X̄0 = X̄T = 0. If the augmented market has no
arbitrage, then ct = 0 a.s. for all t, implying τ = ∞ a.s. as claimed. 

Example. (Put-call parity formula) Suppose we start with a market with three assets
with prices (Bt,T , St , Ct )0≤t≤T . The first asset is a bond with maturity date T and unit
principal value, so that in particular, BT,T = 1 almost surely. The next asset is a stock. The
last asset is a call option on that stock with strike K and maturity T , so that CT = (ST −K)+ .
Suppose that this market is free of arbitrage.
Now we introduce another claim, called a put option. A put option gives the owner of
the option the right, but not the obligation, to sell the stock for a fixed strike price at a fixed
maturity date. If the strike is K and maturity date is T, then, by an argument similar to the one we used
for the call option, the payout of a put option is $P_T = (K - S_T)^+$.
It turns out that the put option is replicable in the market (B, S, C). Indeed, we have
the identity
PT = (K − ST )+
= K − ST + (ST − K)+
= KBT,T − ST + CT
= (K, −1, +1) · (BT,T , ST , CT ).
Hence Ht = (K, −1, +1) for all 1 ≤ t ≤ T is a replicating strategy.
Now, suppose we want to assign prices Pt to the put for 0 ≤ t < T . The above theorem
says there is no arbitrage in the augmented market (B, S, C, P ) if and only if
Pt − Ct = KBt,T − St .
This is the famous put-call parity formula.
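The replication argument above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the sample numbers (strike 100, bond price 0.95, call price 8) are illustrative assumptions. It verifies that the portfolio (K, −1, +1) in (bond, stock, call) reproduces the put payout at maturity, and evaluates the parity price Pt = Ct + K Bt,T − St.

```python
def call_payoff(s_T, strike):
    """Payout (S_T - K)^+ of a call option."""
    return max(s_T - strike, 0.0)


def put_payoff(s_T, strike):
    """Payout (K - S_T)^+ of a put option."""
    return max(strike - s_T, 0.0)


def portfolio_value(holdings, prices):
    """Value H . P of a portfolio with the given holdings at the given prices."""
    return sum(h * p for h, p in zip(holdings, prices))


def parity_gap(s_T, strike):
    """Replicating portfolio value minus put payout at maturity (should be 0)."""
    holdings = (strike, -1.0, 1.0)                   # (bond, stock, call)
    prices_T = (1.0, s_T, call_payoff(s_T, strike))  # B_{T,T} = 1
    return portfolio_value(holdings, prices_T) - put_payoff(s_T, strike)


def put_price_from_parity(bond, stock, call, strike):
    """Put price implied by P_t - C_t = K B_{t,T} - S_t."""
    return call + strike * bond - stock


# Hypothetical market quotes, chosen only for illustration.
example_put = put_price_from_parity(bond=0.95, stock=100.0, call=8.0, strike=100.0)
```

The identity holds for every terminal stock price, which is why the replicating strategy is constant in time.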
A difficulty in using the above theorem for pricing an attainable contingent claim is that
it requires knowing the replicating strategy. The following theorem gives a formula for the
no-arbitrage price of the claim which does not require knowledge of this strategy, just that
it exists.
Theorem. Suppose that the market model with n-dimensional price process P has no
arbitrage, and let ξT be the payout of a European contingent claim with maturity
date T > 0. The claim is attainable if and only if there exists an x ∈ R such that
E(YT ξT ) = Y0 x
for all martingale deflators Y such that YT ξT is integrable.
Proof. (‘only if’ direction) Since the claim is attainable there exists a pure investment
strategy such that HT · PT = ξT a.s. Note that H · P Y is a local martingale from our
calculation in the last chapter. And from a result on the example sheet, we see that the
assumption that YT ξT is integrable is sufficient to conclude that H · P Y is a true martingale.
In particular, we have
E(YT ξT ) = E(HT · PT YT ) = xY0
for any Y , where x = H0 · P0 is the initial cost of replication.
(‘if’ direction) Define ξ̂ by
\[ \hat\xi_t = \operatorname{ess\,sup}\left\{ \frac{1}{Y_t}\, E(\xi_T Y_T \mid \mathcal{F}_t) : Y \text{ a martingale deflator} \right\} \]
and note that ξ̂Y is a supermartingale. Similarly, let
\[ \check\xi_t = \operatorname{ess\,inf}\left\{ \frac{1}{Y_t}\, E(\xi_T Y_T \mid \mathcal{F}_t) : Y \text{ a martingale deflator} \right\} \]
and note that ξ̌Y is a submartingale. Since ξ̂T = ξT = ξ̌T and ξ̂0 = x = ξ̌0 , we can
conclude that ξ̂ = ξ̌. Letting ξ = ξ̂ = ξ̌ we have proven that ξY is a martingale for all Y .
Now, there exists an investment-consumption strategy H such that ξt ≤ Ht · Pt almost
surely for all t ≥ 1. Now fix one such Y and let
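\[
\begin{aligned}
M_t &= -Y_t \xi_t + H_{t+1}\cdot P_t\, Y_t + \sum_{s=1}^{t} (H_s - H_{s+1})\cdot P_s\, Y_s \\
    &= (H_t \cdot P_t - \xi_t)\, Y_t + \sum_{s=1}^{t-1} (H_s - H_{s+1})\cdot P_s\, Y_s .
\end{aligned}
\]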
In particular, note that M is a local martingale by our usual calculations such that Mt ≥ 0
for t ≥ 1, and hence M is a true martingale. However,
E(Mt ) = (H1 · P0 − ξ0 )Y0 ≤ 0,
since Ht+1 · Pt ≤ ξt , and hence Mt = 0 for all t ≥ 0. The conclusion follows.

For the sake of comparison, consider the following result:
Theorem. Suppose that the market model with n-dimensional price process P has no
arbitrage. Let ξT be the payout of a (not necessarily attainable) contingent claim with maturity
date T > 0.
Suppose the claim has price ξt for 0 ≤ t ≤ T and that the augmented market with (n + 1)-
dimensional price process (P, ξ) has no arbitrage. Then there exists a martingale deflator Y
of the original market such that
1
ξt = E(ξT YT |Ft )
Yt
for all 0 ≤ t ≤ T .
Proof. This is just the first fundamental theorem of asset pricing applied to the aug-
mented market with prices (P, ξ). 
Remark. The message is this: if a claim is attainable it can be priced with any mar-
tingale deflator. On the other hand, the most one can say for a general claim is that there
exists some martingale deflator that prices the claim.
Since attainable claims have unique no-arbitrage prices, we single out the markets for
which every claim is attainable:
Definition. A market is complete if and only if every European contingent claim is
attainable. A market is incomplete otherwise.
We can characterise complete markets:
Theorem (Second Fundamental Theorem of Asset Pricing). An arbitrage-free market
model is complete if and only if there exists a unique martingale deflator Y such that Y0 = 1.
Proof. Suppose that there is a unique martingale deflator such that Y0 = 1. Let ξT be
any FT -measurable random variable. By the flexibility of the proof of the first fundamental
theorem, we can choose the random variable ζ in such a way that we may suppose that ξT YT
is integrable. In particular, there is a number x such that
x = E(YT ξT )
for all (the unique) martingale deflators with Y0 = 1. By the characterisation of attainable
claims, there exists a pure-investment strategy such that H · P = ξ. Hence the market is
complete.
Conversely, suppose that the market is complete. Let Y and Y′ be martingale deflators
such that Y0 = Y′0 = 1. Fix a T > 0. By completeness there exists a pure-investment
strategy H such that
HT · PT = (YT − Y′T )Z
where Z = 1/(YT + Y′T )². (The factor Z will be used to ensure integrability later.)
Since H · P Y is a local martingale which is integrable at time T , it is a true martingale
by the example sheet. In particular,
H0 · P0 = E[(YT − Y′T )ZYT ].
By the same argument with Y′ we have
H0 · P0 = E[(YT − Y′T )ZY′T ].
Subtracting yields
E[(YT − Y′T )² Z] = 0
so by the pigeon-hole principle we have P(YT = Y′T ) = 1 as desired. □

This box summarises the fundamental theorems:


1FTAP: No arbitrage ⇔ Existence of martingale deflator
2FTAP: Completeness ⇔ Uniqueness of martingale deflator

Complete markets are convenient for a variety of reasons. For instance, complete markets
have a riskless numéraire portfolio:
Proposition. Suppose the arbitrage-free market model P is complete. Then there exists
a pure-investment strategy η such that the process β = η·P is strictly positive and predictable.
Proof. By completeness, zero-coupon bonds can be attained. That is, for each T > 0
there exists a pure-investment strategy H^T such that H^T_T · PT = 1 a.s. By no-arbitrage, the
bonds are numéraires: B^T_t = H^T_t · Pt > 0 a.s. for all 0 ≤ t ≤ T . Now define the bank account
process β by
\[ \beta_t = \prod_{s=1}^{t} (1 + r_s) \]
where
\[ r_t = \frac{1}{B^{t}_{t-1}} - 1. \]
Note that β is predictable and strictly positive. Furthermore, let ηt = βt H^t_t . This portfolio
corresponds to holding βt units of the bond with maturity t during the period (t − 1, t]
just before its maturity.
First note that βt = ηt · Pt since B^t_t = H^t_t · Pt = 1. Finally note that the predictable
process η is a self-financing pure-investment strategy since
\[
\begin{aligned}
\eta_{t+1} \cdot P_t &= \beta_{t+1} H^{t+1}_{t+1} \cdot P_t \\
 &= \beta_{t+1} H^{t+1}_{t} \cdot P_t \quad (\text{since } H^{t+1} \text{ is pure-investment}) \\
 &= \beta_{t+1} B^{t+1}_{t} \\
 &= \beta_t
\end{aligned}
\]
as desired. □
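The construction of the bank account in this proof can be illustrated along a single scenario. The sketch below is an illustration only: it assumes we are given the one-period bond prices B^t_{t−1} observed along one path (the function name and the sample prices are hypothetical), and it builds β by rolling over the bond that matures next.

```python
def bank_account(one_period_bond_prices):
    """Roll over one-period bonds along a single scenario.

    one_period_bond_prices[t - 1] is B^t_{t-1}, the time-(t-1) price of the
    bond maturing at time t. Returns [beta_0, beta_1, ..., beta_T].
    """
    beta = [1.0]  # empty product: beta_0 = 1
    for price in one_period_bond_prices:
        if price <= 0.0:
            raise ValueError("bond prices must be strictly positive")
        rate = 1.0 / price - 1.0           # r_t = 1 / B^t_{t-1} - 1
        beta.append(beta[-1] * (1.0 + rate))
    return beta


# Hypothetical one-period bond prices along one path.
bond_prices = [0.5, 0.8, 1.0]
beta_path = bank_account(bond_prices)
```

Note that beta_path[t] depends only on bond prices observed up to time t − 1, which is the predictability claimed in the proposition, and that beta_path[t + 1] times B^{t+1}_t equals beta_path[t], which is the self-financing identity in the last display.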
In discrete time models complete markets have even more (arguably too much) structure:
Theorem. If the market model P with n assets is complete, then for each t ≥ 0 the
probability space Ω can be partitioned into no more than n^t Ft -measurable events of positive
probability, and in particular, the n-dimensional random vector Pt takes values in a set of at
most n^t elements.
Proof. We first consider the t = 1 case. Suppose A1 , . . . , Ak are a collection of disjoint
F1 -measurable events with P(Ai ) > 0 for all i. Claim: the set {1_{A1} , . . . , 1_{Ak} } is linearly
independent, and in particular, the dimension of the span of {1_{A1} , . . . , 1_{Ak} } is exactly k. To
prove this claim, we must show that if
a1 1_{A1} + . . . + ak 1_{Ak} = 0 a.s.
for some constants a1 , . . . , ak , then a1 = · · · = ak = 0. To this end, note that if i ≠ j the
sets Ai and Aj are disjoint and hence 1_{Ai} 1_{Aj} = 0. By multiplying both sides of the equation
by 1_{Ai} we get ai 1_{Ai} = 0. But since P(Ai ) > 0 it must be the case that ai = 0.
Now if the market is complete, each of the 1_{Ai} is replicable. Hence
span{1_{A1} , . . . , 1_{Ak} } ⊆ {H · P1 : H ∈ R^n }
= span{P_1^1 , . . . , P_1^n }
Looking at the dimensions of the spaces above, we must conclude k ≤ n.
The argument in the case t > 1 is similar. Let B1 , . . . , BN be a maximal partition of
Ω into disjoint Ft−1 -measurable sets of positive measure. If a random vector Ht is
Ft−1 -measurable, then it takes exactly one value on each of the Bj ’s for a total of at most N
values H1 , . . . , HN . Hence
{H · Pt : H is Ft−1 -meas. } = {H1 · Pt 1_{B1} + . . . + HN · Pt 1_{BN} : H1 , . . . , HN ∈ R^n }
= span{P_t^i 1_{Bj} : 1 ≤ i ≤ n, 1 ≤ j ≤ N }
and the dimension of the space above is at most nN . The argument above proves that there are at
most nN disjoint Ft -measurable sets of positive measure. Induction completes the
proof. □

2. Super-replication of American claims


We now discuss American claims. Here, things are quite different. The canonical example
of an American claim is the American put option, a contract which gives the buyer the right
(but not the obligation) to sell the underlying stock at a fixed strike price K > 0 at any time
between time 0 and a fixed maturity date T . Hence, the payout of the option is (K − Sτ )+
where τ ∈ {0, . . . , T } is a time chosen by the holder of the put to exercise the option.
The payout of an American claim is specified by two ingredients:
• a maturity date T > 0,
• an adapted process (ξt )0≤t≤T .
For instance, in the case of an American put, we may take ξt = (K − St )+ . Unlike the
European claim, the holder of an American claim can choose to exercise the option at any
time τ before or at maturity. However, to rule out clairvoyance, we insist that τ is a stopping
time.
Now, if an American claim matures at T > 0 and is specified by the payout process
(ξt )0≤t≤T , then the actual payout of the claim is modelled by the random variable ξτ , where
τ is any stopping time for the filtration taking values in {0, . . . , T }.
We can think of the American claim then as a family, indexed by the stopping time τ , of
European claims with payouts ξτ . To simplify matters, we make the following assumption
in this subsection:
The market model P = (Pt )0≤t≤T is complete.
Let Y = (Yt )0≤t≤T be the unique martingale deflator such that Y0 = 1.
Intuitively, the seller of such a claim should at time 0 charge at least the amount
sup_{τ ≤ T} E(Yτ ξτ )
to be sure that he can hedge the option, where the supremum is taken over the set of stopping
times smaller than or equal to T . Indeed, this is the case.
Theorem. Suppose that the adapted process (ξt )0≤t≤T specifies the payout of an American
claim maturing at T > 0.
There exists a trading strategy H such that
• Xt (H) ≥ ξt for all 0 ≤ t ≤ T ,
• Xτ ∗ (H) = ξτ ∗ for some stopping time τ ∗ , and
• X0 (H) = sup_{τ ≤ T} E(Yτ ξτ ).
Remark. The strategy H dominates the payout of the American claim at all times, but
is conservative in the sense that it exactly replicates the optimally exercised claim.
The rest of this subsection is dedicated to proving this theorem.

*****
We will need a result of general interest:
Theorem (Doob decomposition theorem). Let U be a discrete-time supermartingale.
Then there is a unique decomposition
Ut = U0 + Mt − At
where M is a martingale and A is a predictable non-decreasing process with M0 = A0 = 0.
Proof. Let M0 = 0 = A0 and define
Mt+1 = Mt + Ut+1 − E(Ut+1 |Ft )
At+1 = At + Ut − E(Ut+1 |Ft )
for t ≥ 0. Since U is assumed to be a supermartingale, and hence integrable, the processes
M and A are integrable. It is straightforward to check that M is a martingale, and since
U is a supermartingale, that A is non-decreasing. Also by induction, we see that At+1 is
Ft -measurable.
Summing up,
\[
\begin{aligned}
M_t - A_t &= M_0 - A_0 + \sum_{s=1}^{t} (M_s - M_{s-1} - A_s + A_{s-1}) \\
 &= \sum_{s=1}^{t} (U_s - U_{s-1}) \\
 &= U_t - U_0 .
\end{aligned}
\]
To show uniqueness, assume that Ut = U0 + Mt − At = U0 + M′t − A′t . Then M − M′ = A − A′ is a
predictable discrete-time martingale, that is, a constant. Since M0 = M′0 = 0, we conclude
M = M′ and A = A′. □
Now we introduce the key concept in optimal stopping theory:
Definition. Let (Zt )0≤t≤T be a given integrable adapted discrete-time process. Define
an adapted process (Ut )0≤t≤T by the recursion
UT = ZT
Ut = max{Zt , E(Ut+1 |Ft )} for 0 ≤ t ≤ T − 1.
The process (Ut )0≤t≤T is called the Snell envelope of (Zt )0≤t≤T .
Remark. The Snell envelope clearly satisfies both
Ut ≥ Zt and Ut ≥ E(Ut+1 |Ft )
almost surely. Thus, another way to describe the Snell envelope of a process is to say it is
the smallest supermartingale dominating that process.
In our application Z will be the process Y ξ, where Y is the martingale deflator and ξ is
the process specifying the payout of the American claim.
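The backward recursion defining the Snell envelope is easy to carry out on a recombining binomial tree. The sketch below is an illustration under stated assumptions, not the model of these notes: it assumes a one-stock binomial market with constant interest rate in which the reference measure is already risk-neutral, so that Yt = (1 + r)^(−t) is the martingale deflator and Zt = Yt (K − St )+ . All names and parameter values are hypothetical.

```python
from math import comb


def snell_envelope(z, p_up):
    """Snell envelope on a recombining binomial tree.

    z[t][j] is the value of Z_t at the node with j up-moves (0 <= j <= t),
    and p_up is the conditional probability of an up-move.
    """
    horizon = len(z) - 1
    u = [list(row) for row in z]  # U_T = Z_T
    for t in range(horizon - 1, -1, -1):
        for j in range(t + 1):
            # E(U_{t+1} | F_t) at node (t, j)
            continuation = p_up * u[t + 1][j + 1] + (1.0 - p_up) * u[t + 1][j]
            u[t][j] = max(z[t][j], continuation)
    return u


def deflated_put_payouts(s0, up, down, rate, strike, horizon):
    """Z_t = Y_t (K - S_t)^+ with the assumed deflator Y_t = (1 + rate)^(-t)."""
    return [
        [(1.0 + rate) ** (-t) * max(strike - s0 * up ** j * down ** (t - j), 0.0)
         for j in range(t + 1)]
        for t in range(horizon + 1)
    ]


def expected_terminal_value(z, p_up):
    """E(Z_T): the time-0 value if exercise is only allowed at maturity."""
    horizon = len(z) - 1
    return sum(
        comb(horizon, j) * p_up ** j * (1.0 - p_up) ** (horizon - j) * z[horizon][j]
        for j in range(horizon + 1)
    )


# Hypothetical parameters, chosen only for illustration.
S0, UP, DOWN, RATE, STRIKE, HORIZON = 100.0, 1.2, 0.8, 0.05, 100.0, 3
P_UP = (1.0 + RATE - DOWN) / (UP - DOWN)  # makes the deflated stock a martingale
Z = deflated_put_payouts(S0, UP, DOWN, RATE, STRIKE, HORIZON)
U = snell_envelope(Z, P_UP)
american_value = U[0][0]
european_value = expected_terminal_value(Z, P_UP)
```

By construction U dominates Z at every node and is a supermartingale, so american_value is at least european_value; the gap, if any, is the value of the right to exercise early.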
Theorem. Let (Zt )0≤t≤T be an integrable adapted process, let (Ut )0≤t≤T be its Snell en-
velope with Doob decomposition Ut = U0 + Mt − At . Let
τ ∗ = min{t ∈ {0, . . . , T } : At+1 > 0}
with the convention τ ∗ = T on {At = 0 for all t}. Then τ ∗ is a stopping time and
Uτ ∗ = U0 + Mτ ∗ = Zτ ∗ .
Proof. That τ ∗ is a stopping time follows from the fact that the non-decreasing process
(At )0≤t≤T is predictable.
Now note that
E(Ut+1 |Ft ) = E(U0 + Mt+1 − At+1 |Ft ) = U0 + Mt − At+1
since M is a martingale and A is predictable so that by the definition of Snell envelope
U0 + Mt − At = max{Zt , U0 + Mt − At+1 }.
In particular,
U0 + Mτ ∗ = max{Zτ ∗ , U0 + Mτ ∗ − Aτ ∗ +1 }
since Aτ ∗ = 0. But since Aτ ∗ +1 > 0 we must conclude
Uτ ∗ = U0 + Mτ ∗ = Zτ ∗ .

Theorem. Let Z be an adapted integrable process and let U be its Snell envelope. Then
U0 = sup_{τ ≤ T} E(Zτ ).

Proof. Since U is a supermartingale,


U0 ≥ E(Uτ )
for any stopping time τ by the optional sampling theorem. (See example sheet 2.) But since
Ut ≥ Zt by construction,
U0 ≥ E(Zτ )
for any stopping time τ . But letting τ ∗ = min{t ∈ {0, . . . , T } : At+1 > 0} where U =
U0 + M − A is the Doob decomposition of U , we have
U0 = U0 + E(Mτ ∗ ) = E(Zτ ∗ ),
again by the optional sampling theorem and the previous result. □
Remark. By a similar argument, one can show that
Ut = ess sup_{t ≤ τ ≤ T} E(Zτ |Ft )
for all 0 ≤ t ≤ T . This formula allows us to define the Snell envelope for the infinite horizon
case T = ∞ and also in the continuous time case.
Definition. If Z is an integrable adapted process, a stopping time σ such that E(Zσ ) =
sup_{τ ≤ T} E(Zτ ) is called an optimal stopping time. Obviously the stopping time τ ∗ defined
above is an optimal stopping time. Example sheet 2 shows how to find another one.
*****
Returning to finance, let (ξt )0≤t≤T be the process specifying the payout of an American
option, and let (Ut )0≤t≤T be the Snell envelope of Y ξ with Doob decomposition Ut = U0 +
Mt − At .
We now use the assumption that the market is complete: let H be a strategy such
that XT = (U0 + MT )/YT , where Xt = Ht · Pt . The process XY is a local martingale
from before and, since the market is complete, it is also bounded; hence XY is a true
martingale. By the martingale property, we have
Xt Yt = U0 + Mt
for all 0 ≤ t ≤ T . In particular,
• Xt = (U0 + Mt )/Yt ≥ Ut /Yt ≥ ξt for all 0 ≤ t ≤ T ,
• Xτ ∗ = ξτ ∗ , and
• X0 = sup_{τ ≤ T} E(Yτ ξτ ),
completing the proof of the theorem.

CHAPTER 3

Brownian motion and stochastic calculus

Despite the elegance of discrete-time financial theory, there is at least one glaring problem:
explicit computations are difficult. For instance, the fundamental theorems are stated in
terms of state price densities, but it is very difficult to classify them except in a few simple
examples. The continuous-time theory has the convenient feature that explicit formulae are
easy to find. Indeed, one of our first results will be the general formula for a state price
density in a continuous-time market model.
Before we can describe the continuous-time financial theory, we need to first learn about
stochastic integration. Recall that in discrete time, the self-financing condition and budget
constraint imply that the wealth process X corresponding to a pure investment strategy
H satisfies
Xt − Xt−1 = Ht · (Pt − Pt−1 )
so that
\[ X_t = X_0 + \sum_{s=1}^{t} H_s \cdot (P_s - P_{s-1}). \]
The continuous time analogue ought to be something like
\[ X_t = X_0 + \int_0^t H_s \cdot dP_s . \]
What does the integral on the right mean? If we assume that the sample paths t ↦ Pt are
differentiable, we could interpret the integral as the Lebesgue integral
\[ \int_0^t H_s \cdot \frac{dP_s}{ds}\, ds. \]
Unfortunately, it turns out that life is not that simple. To see why, remember that in
discrete time we defined the state price density Y as a positive process such that Y P is
a martingale. We will adopt more-or-less the same definition in continuous time. Now, a
theorem of stochastic calculus says that a continuous martingale with differentiable sample
paths is necessarily constant. So if we insist that our price processes have differentiable
sample paths, we will have a very boring theory.
This chapter is concerned with an integration theory where we use the martingale property,
rather than the differentiability of the sample paths, as the key ingredient. This theory
is nice, and indeed something like the fundamental theorem of calculus holds. This means
we can do explicit computations.
The most basic example of a continuous martingale is Brownian motion. We will build
up our theory by first defining Brownian motion, then constructing the Brownian stochastic
integral, and finally learning the rules of the resulting calculus. This chapter provides an
extremely brief introduction to this theory.
1. Brownian motion
In this section, we introduce one of the most fundamental continuous-time stochastic
processes, Brownian motion. As hinted above, our primary interest in this process is that
it will be the building block for all of the continuous-time market models studied in these
lectures.

Definition. A Brownian motion W = (Wt )t≥0 is a collection of random variables such
that
• W0 (ω) = 0 for all ω ∈ Ω,
• for all 0 ≤ t0 < t1 < ... < tn the increments Wti+1 − Wti are independent, and the
distribution of Wt − Ws is N (0, |t − s|),
• the sample path t ↦ Wt (ω) is continuous for all ω ∈ Ω.

It is not clear that Brownian motion exists. That is, does there exist a probability
space (Ω, F, P) on which the uncountable collection of random variables (Wt )t≥0 can be
simultaneously defined in such a way that the above definition holds? The answer, of course,
is yes, and the proof of this fact is due to Wiener in 1923. Therefore, the Brownian motion
is also often called the Wiener process, especially in the U.S.

Although the sample paths of Brownian motion are continuous, they are very irregular.
Below is a computer simulation of a one-dimensional Brownian motion:
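A path of this kind can be generated directly from the definition, since the increments over a grid are independent Gaussian random variables. The following is a minimal sketch using only the Python standard library; the function name, grid size and seed are illustrative assumptions, and the simulation only samples the path at grid points rather than producing a continuous path.

```python
import math
import random


def brownian_path(t_max, n_steps, rng):
    """Sample a Brownian motion at the grid points k * t_max / n_steps.

    Each increment is an independent N(0, dt) draw, as in the definition.
    Returns (times, values) with values[0] = W_0 = 0.
    """
    dt = t_max / n_steps
    times = [0.0]
    values = [0.0]
    for k in range(1, n_steps + 1):
        increment = rng.gauss(0.0, math.sqrt(dt))  # standard deviation sqrt(dt)
        values.append(values[-1] + increment)
        times.append(k * dt)
    return times, values


# Fixed seed so that the illustration is reproducible.
times, path = brownian_path(t_max=10.0, n_steps=1000, rng=random.Random(0))
```

Plotting path against times with any plotting tool gives a picture of the irregular behaviour described above.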

2. Itô stochastic integration


We now have sufficient motivation to construct a stochastic integral with respect to a
Wiener process. What follows is the briefest of sketches of the theory. There are now plenty
of places to turn for a proper treatment of the subject. For instance, please consult one of
the following references:
• L.C.G. Rogers and D. Williams, Diffusions, Markov Processes, and Martingales:
Volume 2
• I. Karatzas and S.E. Shreve, Brownian Motion and Stochastic Calculus.

2.1. The L² theory. To get things started, let W be a scalar Brownian motion. We
will assume that W is adapted to a filtration (Ft )t≥0 . For the record, we will assume that the
filtration satisfies what are called the usual conditions of right-continuity Ft = ∩_{ε>0} Ft+ε and
that F0 contains all P-null events. These are technical assumptions that ensure the existence
of stopping times with the right properties. We also will assume that for each 0 ≤ s < t the
increment Wt − Ws is independent of Fs .
The first building blocks of the theory are the simple predictable integrands.

[Figure: Sample path of Brownian motion, showing W against t for 0 ≤ t ≤ 10.]

Definition. A simple predictable process is an adapted process α = (αt )t≥0 of the form
\[ \alpha_t(\omega) = \sum_{n=1}^{N} 1_{(t_{n-1}, t_n]}(t)\, a_n(\omega) \]
where a_n is bounded and F_{t_{n−1}} -measurable for some 0 ≤ t0 < t1 < ... < tN < ∞. For simple
predictable processes we define the stochastic integral by the formula
\[ \int_0^\infty \alpha_s\, dW_s = \sum_{n=1}^{N} a_n (W_{t_n} - W_{t_{n-1}}) \]

Theorem (Itô’s isometry). For a simple predictable integrand α, we have
\[ E\left[ \left( \int_0^\infty \alpha_s\, dW_s \right)^2 \right] = E\left[ \int_0^\infty \alpha_s^2\, ds \right] \]

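Proof. Note that
\[ \left( \int_0^\infty \alpha_s\, dW_s \right)^2 = \sum_{n} a_n^2 (W_{t_n} - W_{t_{n-1}})^2 + 2 \sum_{m<n} a_m a_n (W_{t_m} - W_{t_{m-1}})(W_{t_n} - W_{t_{n-1}}). \]
Now, if m < n we know that a_m , a_n and W_{t_m} − W_{t_{m−1}} are F_{t_{n−1}} -measurable and hence
\[
\begin{aligned}
E[a_m a_n (W_{t_m} - W_{t_{m-1}})(W_{t_n} - W_{t_{n-1}})]
 &= E\{ E[a_m a_n (W_{t_m} - W_{t_{m-1}})(W_{t_n} - W_{t_{n-1}}) \mid \mathcal{F}_{t_{n-1}}] \} \\
 &= E\{ a_m a_n (W_{t_m} - W_{t_{m-1}})\, E[W_{t_n} - W_{t_{n-1}} \mid \mathcal{F}_{t_{n-1}}] \} \\
 &= 0
\end{aligned}
\]
by the tower and slot properties, where in the last line we used the fact that the increment
W_{t_n} − W_{t_{n−1}} has mean zero and is independent of F_{t_{n−1}} .
Similarly,
\[
\begin{aligned}
E[a_n^2 (W_{t_n} - W_{t_{n-1}})^2] &= E\{ a_n^2\, E[(W_{t_n} - W_{t_{n-1}})^2 \mid \mathcal{F}_{t_{n-1}}] \} \\
 &= E[a_n^2 (t_n - t_{n-1})]
\end{aligned}
\]
where we have used the fact that the increment W_{t_n} − W_{t_{n−1}} has variance t_n − t_{n−1} .
Putting this together implies
\[
\begin{aligned}
E\left[ \left( \int_0^\infty \alpha_s\, dW_s \right)^2 \right] &= E\left[ \sum_{n=1}^{N} a_n^2 (t_n - t_{n-1}) \right] \\
 &= E\left[ \int_0^\infty \alpha_s^2\, ds \right]
\end{aligned}
\]
as claimed. □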
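Itô's isometry can be checked by simulation for a concrete simple predictable integrand. The sketch below is an illustration only: it assumes the grid t0 = 0, t1 = 1, t2 = 2 with a1 = 1 and a2 = sign(W1), which is bounded and F_{t1} -measurable as the definition requires. The function name, sample size and seed are hypothetical choices.

```python
import random


def isometry_check(n_samples, seed=0):
    """Monte Carlo estimates of both sides of Ito's isometry.

    Integrand: a_1 = 1 on (0, 1] and a_2 = sign(W_1) on (1, 2].
    Returns (estimate of E[(int alpha dW)^2], estimate of E[int alpha^2 ds]).
    """
    rng = random.Random(seed)
    lhs_total = 0.0
    rhs_total = 0.0
    for _ in range(n_samples):
        dw1 = rng.gauss(0.0, 1.0)  # W_1 - W_0, variance t_1 - t_0 = 1
        dw2 = rng.gauss(0.0, 1.0)  # W_2 - W_1, independent of F_1
        a1 = 1.0                            # known at time t_0
        a2 = 1.0 if dw1 >= 0.0 else -1.0    # depends only on W_1, known at t_1
        stochastic_integral = a1 * dw1 + a2 * dw2
        lhs_total += stochastic_integral ** 2
        rhs_total += a1 ** 2 * 1.0 + a2 ** 2 * 1.0  # sum of a_n^2 (t_n - t_{n-1})
    return lhs_total / n_samples, rhs_total / n_samples


lhs_estimate, rhs_estimate = isometry_check(100_000)
```

Here the right-hand side equals 2 exactly, and the left-hand estimate should be close to 2 up to Monte Carlo error. Replacing a2 by a function of the later increment dw2 would violate predictability, and the two sides would then generally disagree.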
Now, the map defined by I(α) = ∫_0^∞ αs dWs is an isometry from the space of simple
predictable integrands to the space L²(Ω, F, P) of square-integrable random variables. The
fact that L² is complete is the key observation which allows us to build the stochastic integral
of more general integrands.
Definition. The predictable sigma-field P is the sigma-field on the product space R+ ×Ω
generated by sets of the form (s, t]×A where 0 ≤ s < t and A is Fs -measurable. Equivalently,
the predictable sigma-field is that generated by the simple, predictable integrands.
A predictable process α is a map α : R+ × Ω → R that is P-measurable. Equivalently,
predictable processes are limits of simple, predictable integrands.
Remark. Every left-continuous, adapted process is predictable. These are the examples
to keep in mind, since they are the ones that come up most in application.
Now, suppose (α(k) )k≥1 is a sequence of simple predictable integrands converging in
L²(R+ × Ω, P, Leb × P) to a predictable process α so that
\[ E\left[ \int_0^\infty (\alpha^{(k)}_s - \alpha_s)^2\, ds \right] \to 0 \]
as k → ∞. By Itô’s isometry the sequence I(α(k) ) is a Cauchy sequence, which by the
completeness of L², converges to some random variable. This is what we take as the definition.
Definition. If α is predictable and
\[ E\left[ \int_0^\infty \alpha_s^2\, ds \right] < \infty \]
then
\[ \int_0^\infty \alpha_s\, dW_s = \lim_{k} \int_0^\infty \alpha^{(k)}_s\, dW_s \]
where the limit is interpreted in the L²(Ω) sense and α(k) is any sequence of simple,
predictable processes converging to α in L²(R+ × Ω).


The inspired idea of the above definition of the integral is that it compensates for the
roughness of a typical Brownian sample path by using instead the many cancellations that
occur on average from the uncorrelated Brownian increments.

Of course, we are not really interested in integrals over the whole interval [0, ∞) but
rather finite intervals [0, t]. This is easily handled.
Theorem. For every predictable α such that
\[ E\left[ \int_0^t \alpha_s^2\, ds \right] < \infty \quad \text{for all } t \ge 0 \]
there exists a continuous martingale X such that
\[ X_t = \int_0^\infty \alpha_s 1_{\{s \le t\}}\, dW_s . \]
In this case, we will use the notation
\[ X_t = \int_0^t \alpha_s\, dW_s . \]

2.2. Localisation. In this section, we show how to extend the definition of stochastic
integral to predictable processes α such that
\[ \int_0^t \alpha_s^2\, ds < \infty \quad \text{almost surely} \]
for all t ≥ 0. The technique is called localisation.
Define the stopping times
\[ \tau_n = \inf\left\{ t \ge 0 : \int_0^t \alpha_s^2\, ds = n \right\} \]
for each n ≥ 1, where inf ∅ = ∞ as usual, and let
\[ \alpha^{(n)}_t = \alpha_t 1_{\{t \le \tau_n\}} . \]
Note that since E[∫_0^t (α^{(n)}_s )² ds] ≤ n, the process X^{(n)} defined by
\[ X^{(n)}_t = \int_0^t \alpha^{(n)}_s\, dW_s \]
is a martingale for each n by the L² theory.
Now fix t > 0 and define the increasing sequence of events An = {ω ∈ Ω : τn ≥ t}. Since
∫_0^t αs² ds < ∞ almost surely for all t ≥ 0, we have P(∪_{n∈N} An ) = 1. Hence we can define
the stochastic integral by the formula
\[ \int_0^t \alpha_s\, dW_s = \lim_{n \to \infty} \int_0^t \alpha^{(n)}_s\, dW_s \]
where the limit is in probability.
Note that the X defined by
\[ X_t = \int_0^t \alpha_s\, dW_s \]
is a continuous local martingale. Indeed, for each n the stopped process
\[ X_{t \wedge \tau_n} = X^{(n)}_t \]
is a martingale.
To summarise the most frequently used aspects of this construction:
If α is an adapted continuous process then Xt = ∫_0^t αs dWs defines a continuous
local martingale. If in addition we have E[∫_0^t αs² ds] < ∞ for all t ≥ 0, then
X is a true martingale.

3. Itô’s formula
In the last section, we sketched very quickly the construction of a stochastic integral with
respect to a Wiener process. What makes the Itô stochastic integral useful is that there is a
corresponding stochastic calculus. The basic building block of this calculus is the chain rule,
called Itô’s formula.
3.1. The scalar version. Let (Wt )t≥0 be a scalar Brownian motion adapted to a fil-
tration (Ft )t≥0 satisfying our usual conditions.
We can use our stochastic integration theory to define a useful class of stochastic processes:
Definition. An Itô process X is an adapted process of the form
\[ X_t = X_0 + \int_0^t \alpha_s\, dW_s + \int_0^t \beta_s\, ds \]
where X0 is a fixed real number and (αt )t≥0 and (βt )t≥0 are predictable real-valued processes
such that
\[ \int_0^t \alpha_s^2\, ds < \infty \quad \text{and} \quad \int_0^t |\beta_s|\, ds < \infty \]
almost surely for all t ≥ 0.
Note that the two integrals appearing in the above definition have different meanings: the
first is a stochastic integral and the second is a pathwise Lebesgue integral.
We are now ready for the first version of Itô’s formula:
Theorem (Itô’s formula, scalar version). Let X be an Itô process and f : R → R twice
continuously differentiable. Then
\[ f(X_t) = f(X_0) + \int_0^t f'(X_s) \alpha_s\, dW_s + \int_0^t \left[ f'(X_s) \beta_s + \tfrac{1}{2} f''(X_s) \alpha_s^2 \right] ds. \]
Let us highlight a difference between Itô and ordinary calculus, by noting the mysterious
appearance of the f ″ term in Itô’s formula. This term would not appear in the chain rule of
ordinary calculus. But consider the example f (x) = x² so that
\[ W_t^2 = 2 \int_0^t W_s\, dW_s + t. \]
Note that since
\[ E\left[ \int_0^t W_s^2\, ds \right] = \int_0^t s\, ds = t^2/2 < \infty, \]
the local martingale X given by
\[ X_t = \int_0^t W_s\, dW_s \]
is actually a true martingale. Can you verify, directly from the definition of Brownian motion,
that the process (Wt² − t)t≥0 is a martingale?

We now introduce a differential notation which cleans up some of the formulae. We will
use the notation
dXt = αt dWt + βt dt
to mean
\[ X_t = X_0 + \int_0^t \alpha_s\, dW_s + \int_0^t \beta_s\, ds. \]
Recall that the sample paths of the Brownian motion are nowhere differentiable, so the
notation dWt is only formal, and can only be interpreted via the stochastic integration
theory. But in this differential notation, Itô’s formula takes a nicer form
\[ df(X_t) = \left[ f'(X_t) \beta_t + \tfrac{1}{2} f''(X_t) \alpha_t^2 \right] dt + f'(X_t) \alpha_t\, dW_t . \]
Example. Consider the Itô process given by
Xt = X0 + aWt + bt
for some constants a, b ∈ R. Letting
Yt = e^{Xt} ,
we would like to show that the process (Yt )t≥0 is an Itô process, and write down its
decomposition in terms of ordinary and stochastic integrals.
Let f (x) = e^x . Then f ′(x) = e^x and f ″(x) = e^x . Also,
dXt = a dWt + b dt and d⟨X⟩t = a² dt
So Itô’s formula says:
\[
\begin{aligned}
df(X_t) &= f'(X_t)\, dX_t + \tfrac{1}{2} f''(X_t)\, d\langle X \rangle_t \\
\Rightarrow \quad dY_t &= Y_t \left[ (b + a^2/2)\, dt + a\, dW_t \right]
\end{aligned}
\]
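One consequence of the decomposition in the example can be tested by simulation. If the stochastic integral part is a true martingale (which holds here since Y has finite second moments, a fact used below as an assumption rather than proved), taking expectations gives E(Yt ) = Y0 exp((b + a²/2)t). The sketch below compares this prediction with a Monte Carlo average; the function names, parameter values and seed are illustrative.

```python
import math
import random


def simulated_mean(a, b, t, x0, n_samples, seed=0):
    """Monte Carlo estimate of E[exp(X_t)] for X_t = x0 + a W_t + b t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w_t = rng.gauss(0.0, math.sqrt(t))  # W_t ~ N(0, t)
        total += math.exp(x0 + a * w_t + b * t)
    return total / n_samples


def predicted_mean(a, b, t, x0):
    """Mean implied by dY = Y[(b + a^2/2) dt + a dW] when the dW part is a martingale."""
    return math.exp(x0) * math.exp((b + 0.5 * a * a) * t)


# Hypothetical parameter values.
A, B, T_END, X0 = 0.5, 0.1, 1.0, 0.0
estimate = simulated_mean(A, B, T_END, X0, n_samples=100_000)
prediction = predicted_mean(A, B, T_END, X0)
```

The naive guess exp(b t), which ignores the second-order term, would underestimate the mean; the extra a²/2 in the drift is exactly the Itô correction.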
We now introduce a notion which helps with computations involving Itô’s formula.
Theorem. Let X be an Itô process. There exists a continuous non-decreasing process
⟨X⟩, called the quadratic variation of X, such that
\[ \langle X \rangle_t = \lim_{N \to \infty} \sum_{n=1}^{N} (X_{nt/N} - X_{(n-1)t/N})^2 \]
for each t ≥ 0, where the limit is in probability. If
dXt = αt dWt + βt dt
then
d⟨X⟩t = αt² dt.
Remark. Some people write d⟨X⟩t = (dXt )² for obvious reasons.
This is not the appropriate place to prove this important result in full, but to get a feeling
for why it is true, we will prove it in the case where X is a Brownian motion:

Proof. By definition, the increments of Brownian motion are Gaussian random variables so that
E[(Wt − Ws )²] = t − s
and
Var[(Wt − Ws )²] = 2(t − s)²
for every 0 ≤ s ≤ t. Hence
\[ E\left[ \sum_{n=1}^{N} (W_{t_n} - W_{t_{n-1}})^2 \right] = \sum_{n=1}^{N} (t_n - t_{n-1}) = t_N - t_0 \]
and, by the independence of the increments of Brownian motion,
\[ \operatorname{Var}\left[ \sum_{n=1}^{N} (W_{t_n} - W_{t_{n-1}})^2 \right] = 2 \sum_{n=1}^{N} (t_n - t_{n-1})^2 . \]
Letting tn = nt/N , Chebychev’s inequality implies
\[ P\left( \left| \sum_{n=1}^{N} (W_{nt/N} - W_{(n-1)t/N})^2 - t \right| > \varepsilon \right) \le \frac{2 t^2}{N \varepsilon^2} \to 0. \]
□
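The convergence of the sum of squared increments to t can be observed numerically, and contrasted with the behaviour of a smooth function. The following is a minimal sketch; the function names, the choice f(s) = s², the grid sizes and the seed are illustrative assumptions.

```python
import math
import random


def brownian_squared_increments(t, n_steps, rng):
    """Sum of squared Brownian increments over the grid n t / N, n = 1, ..., N."""
    dt = t / n_steps
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n_steps))


def smooth_squared_increments(f, n_steps):
    """Sum of squared increments of a function f on [0, 1] over the grid n / N."""
    return sum(
        (f(n / n_steps) - f((n - 1) / n_steps)) ** 2
        for n in range(1, n_steps + 1)
    )


# Brownian motion on [0, 2]: the sum should be close to t = 2.
qv_brownian = brownian_squared_increments(2.0, 10_000, random.Random(1))

# A continuously differentiable function: the sum should be close to 0.
qv_smooth = smooth_squared_increments(lambda s: s * s, 10_000)
```

With N = 10 000 the variance of the Brownian sum is 2t²/N, so the estimate typically lands within a few hundredths of t, while the smooth sum is of order 1/N.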

Remark. For comparison, consider a continuously differentiable function f : [0, 1] → R.


Recall that for such functions there exists a constant C > 0 such that |f (t) − f (s)| ≤ C|t − s|
for all s, t ∈ [0, 1]. Hence we have
\[ \sum_{n=1}^{N} [f(n/N) - f((n-1)/N)]^2 \le \sum_{n=1}^{N} C^2/N^2 = C^2/N \to 0. \]

Since the quadratic variation of a Brownian motion is positive, the typical Brownian sample
path is not a continuously differentiable function of time.

With the notion of quadratic variation, we can rewrite Itô’s formula once more in a
particularly easy to remember form:

\[ df(X_t) = f'(X_t)\, dX_t + \tfrac{1}{2} f''(X_t)\, d\langle X \rangle_t \]

In this form, the idea of the proof becomes clear:


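Idea of proof of Itô’s formula. Fix a partition 0 = t0 < t1 < ... < tN = t of [0, t]. By
telescoping the sum and using a second order Taylor approximation, we have:
\[
\begin{aligned}
f(X_t) - f(X_0) &= \sum_{n=1}^{N} \left[ f(X_{t_n}) - f(X_{t_{n-1}}) \right] \\
 &\approx \sum_{n=1}^{N} \left[ f'(X_{t_{n-1}})(X_{t_n} - X_{t_{n-1}}) + \tfrac{1}{2} f''(X_{t_{n-1}})(X_{t_n} - X_{t_{n-1}})^2 \right] \\
 &\approx \int_0^t f'(X_s)\, dX_s + \int_0^t \tfrac{1}{2} f''(X_s)\, d\langle X \rangle_s .
\end{aligned}
\]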


3.2. The multi-dimensional version. We now introduce the vector version of Itô’s
formula. It is basically the same as before, but with worse notation.
An n-dimensional Itô process (Xt )t≥0 is defined by
\[ X_t = X_0 + \int_0^t \alpha_s\, dW_s + \int_0^t \beta_s\, ds, \]
interpreted component-wise as
\[ X^{(i)}_t = X^{(i)}_0 + \int_0^t \sum_{k=1}^{d} \alpha^{(i,k)}_s\, dW^{(k)}_s + \int_0^t \beta^{(i)}_s\, ds \]
where (Wt )t≥0 is a d-dimensional Brownian motion so that W (1) , . . . , W (d) are independent
scalar Brownian motions, and the predictable process (αt )t≥0 is valued in the space of n × d
matrices, and the predictable process (βt )t≥0 is valued in R^n . We insist that
\[ \int_0^t \sum_{i=1}^{n} \sum_{k=1}^{d} (\alpha^{(i,k)}_s)^2\, ds < \infty \quad \text{and} \quad \int_0^t \sum_{i=1}^{n} |\beta^{(i)}_s|\, ds < \infty \]
almost surely for all t ≥ 0 so that all of the integrals are defined. The aim of this
section is to give a formula for the Itô decomposition of f (t, Xt ).
Now in the scalar case we needed a notion of quadratic variation (dXt )² = d⟨X⟩t . In the
multi-dimensional case, we now introduce the notion of quadratic co-variation
(dX^{(i)}_t )(dX^{(j)}_t ) = d⟨X (i) , X (j) ⟩t .
Theorem. There exists a continuous process of finite variation ⟨X (i) , X (j) ⟩, called the
quadratic co-variation of X (i) and X (j) , such that
\[ \langle X^{(i)}, X^{(j)} \rangle_t = \lim_{N \to \infty} \sum_{n=1}^{N} (X^{(i)}_{nt/N} - X^{(i)}_{(n-1)t/N})(X^{(j)}_{nt/N} - X^{(j)}_{(n-1)t/N}) \]
for each t ≥ 0, where the limit is in probability, given by
\[ d\langle X^{(i)}, X^{(j)} \rangle_t = \sum_{k=1}^{d} \alpha^{(i,k)}_t \alpha^{(j,k)}_t\, dt. \]
The following multiplication table might help you remember how to compute quadratic
co-variation, where W and W ⊥ denote independent Brownian motions:

(dt)2 = 0 (dt)(dWt ) = 0

(dWt )2 = dt (dWt )(dWt⊥ ) = 0

Now we are ready for the statement of the theorem:


Theorem (Itô’s formula, multi-dimensional version). Let f : R+ × R^n → R where
(t, x) ↦ f (t, x) be continuously differentiable in the t variable and twice-continuously
differentiable in the x variable. Then
\[ df(t, X_t) = \frac{\partial f}{\partial t}(t, X_t)\, dt + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(t, X_t)\, dX^{(i)}_t + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}(t, X_t)\, d\langle X^{(i)}, X^{(j)} \rangle_t \]

4. Girsanov’s theorem
As we have seen in discrete time, the economic notion of an arbitrage-free market model
is tied to the existence of an equivalent measure for which the asset prices, when discounted
by a numéraire are martingales.
Recall that an equivalent measure is related to a positive random variable via the
Radon–Nikodym theorem. Indeed, let (Ω, F, P) be our probability space and let Q be equivalent to
P. Then, by the Radon–Nikodym theorem there exists a density
\[ Z = \frac{dQ}{dP} \]
such that Z > 0 has unit P-expectation. Conversely, if Z > 0 and E^P (Z) = 1, we can define
an equivalent measure Q with density Z.

Motivated by the above discussion, we aim to understand how martingales arise within the
context of the Itô stochastic integration theory. Consider the stochastic process (Zt )t≥0 given
by
\[ Z_t = \exp\left( -\frac{1}{2} \int_0^t |\alpha_s|^2\, ds + \int_0^t \alpha_s \cdot dW_s \right) \]
where (Wt )t≥0 is an m-dimensional Brownian motion and (αt )t≥0 is an m-dimensional
predictable process with ∫_0^t |αs |² ds < ∞ a.s. for all t ≥ 0.
This process is clearly positive. Furthermore, notice that by Itô’s formula we have
dZt = Zt αt · dWt
so that (Zt )t≥0 is a local martingale, as it is a stochastic integral with respect to a Brownian
motion. Recall that since Z is a positive local martingale, it is automatically a supermartin-
gale. Hence, if
E(ZT ) = 1
for some non-random T > 0, then (Zt )0≤t≤T is a true martingale. In this case, what happens
to the Brownian motion when we change to an equivalent measure with density ZT ?
Theorem (Cameron–Martin–Girsanov Theorem). Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space on
which an $m$-dimensional Brownian motion $(W_t)_{t\ge 0}$ is defined, and let $(\mathcal F_t)_{t\ge 0}$ be a filtration
satisfying the usual conditions. Let
$$Z_t = \exp\left(-\frac{1}{2}\int_0^t \|\alpha_s\|^2\,ds + \int_0^t \alpha_s\cdot dW_s\right)$$
and suppose $(Z_t)_{0\le t\le T}$ is a martingale. Define the equivalent measure $\mathbb Q$ on $(\Omega, \mathcal F_T)$ by the
density
$$\frac{d\mathbb Q}{d\mathbb P} = Z_T.$$
Then the $m$-dimensional process $(\hat W_t)_{0\le t\le T}$ defined by
$$\hat W_t = W_t - \int_0^t \alpha_s\,ds$$
is a Brownian motion on $(\Omega, \mathcal F_T, \mathbb Q)$.
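The change of measure in the theorem can be checked numerically in the simplest case. The following is a minimal sketch, assuming a scalar Brownian motion and a constant $\alpha$ (the values of `alpha`, `T`, the sample size and the seed are illustrative choices, not part of the notes). Expectations under $\mathbb Q$ are computed as $\mathbb P$-expectations weighted by $Z_T$; if the theorem holds, $Z_T$ should have mean 1 and $\hat W_T = W_T - \alpha T$ should have $\mathbb Q$-mean 0 and $\mathbb Q$-variance $T$.

```python
import math
import random

def girsanov_check(alpha=0.5, T=1.0, n_paths=200_000, seed=0):
    """Monte Carlo check of Girsanov's theorem for a constant scalar alpha.

    Returns estimates of E_P[Z_T], E_Q[W_hat_T] and E_Q[W_hat_T^2], where
    expectations under Q are computed as P-expectations weighted by Z_T.
    """
    rng = random.Random(seed)
    sum_z = sum_zw = sum_zw2 = 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, math.sqrt(T))               # W_T under P
        z = math.exp(-0.5 * alpha**2 * T + alpha * w)  # density Z_T
        w_hat = w - alpha * T                          # shifted process at time T
        sum_z += z
        sum_zw += z * w_hat
        sum_zw2 += z * w_hat**2
    return sum_z / n_paths, sum_zw / n_paths, sum_zw2 / n_paths

mean_z, mean_w_hat, var_w_hat = girsanov_check()
print(mean_z, mean_w_hat, var_w_hat)  # should be close to 1, 0 and T
```

This only tests the time-$T$ marginal, not the full path property, but it makes the role of the density as a reweighting of scenarios concrete.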
Now, you may be asking yourself: When is the process (Zt )t≥0 not just a local martingale,
but a true martingale?
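Theorem (Novikov’s criterion). If
$$\mathbb E\left[\exp\left(\frac{1}{2}\int_0^T \|\alpha_s\|^2\,ds\right)\right] < \infty$$
then
$$\mathbb E\left[\exp\left(-\frac{1}{2}\int_0^T \|\alpha_s\|^2\,ds + \int_0^T \alpha_s\cdot dW_s\right)\right] = 1.$$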

5. A martingale representation theorem


In this section we will see that all continuous martingales are essentially stochastic inte-
grals with respect to Brownian motion. This will have applications to our continuous-time
financial models in the next chapter.
Theorem (Itô’s Martingale Representation Theorem). Let $(\Omega, \mathcal F, \mathbb P)$ be a probability
space on which an $m$-dimensional Brownian motion $W = (W_t)_{t\ge 0}$ is defined, and let the
filtration $(\mathcal F_t)_{t\ge 0}$ be the filtration generated by $W$.
Let $X = (X_t)_{t\ge 0}$ be a continuous local martingale. Then there exists a unique predictable
$m$-dimensional process $(\alpha_t)_{t\ge 0}$ such that $\int_0^t \|\alpha_s\|^2\,ds < \infty$ almost surely for all $t \ge 0$ and
$$X_t = X_0 + \int_0^t \alpha_s\cdot dW_s.$$
Furthermore, if $X_t > 0$ for all $t \ge 0$ then there exists a predictable $\beta$ such that
$\int_0^t \|\beta_s\|^2\,ds < \infty$ almost surely for all $t \ge 0$ and
$$X_t = X_0 \exp\left(-\frac{1}{2}\int_0^t \|\beta_s\|^2\,ds + \int_0^t \beta_s\cdot dW_s\right).$$
Proof of the second claim. Assuming that
$$dX_t = \alpha_t\cdot dW_t$$
and the positivity of $X$, apply Itô’s formula to get
$$d\log X_t = \frac{\alpha_t}{X_t}\cdot dW_t - \frac{\|\alpha_t\|^2}{2X_t^2}\,dt$$
so the conclusion follows with $\beta_t = \alpha_t/X_t$. □
We conclude this section with a useful result. It is not directly applicable to finance, but
simplifies several arguments.
Theorem (Lévy’s Characterisation of Brownian Motion). Let $(X_t)_{t\ge 0}$ be a continuous
$m$-dimensional local martingale such that
$$\langle X^{(i)}, X^{(j)}\rangle_t = \begin{cases} t & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$
Then $(X_t)_{t\ge 0}$ is a standard $m$-dimensional Brownian motion.

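Proof. Fix a constant vector $\theta \in \mathbb R^m$ and let $i = \sqrt{-1}$. Consider
$$M_t = e^{i\theta\cdot X_t + |\theta|^2 t/2}.$$
By Itô’s formula,
$$\begin{aligned}
dM_t &= M_t\left(i\theta\cdot dX_t + \frac{|\theta|^2}{2}\,dt\right) - \frac{1}{2} M_t \sum_{i=1}^m\sum_{j=1}^m \theta^{(i)}\theta^{(j)}\,d\langle X^{(i)}, X^{(j)}\rangle_t \\
&= iM_t\,\theta\cdot dX_t
\end{aligned}$$
and so $(M_t)_{t\ge 0}$ is a continuous local martingale, as it is the stochastic integral with respect
to a continuous local martingale. On the other hand, since $|M_t| = e^{|\theta|^2 t/2}$ and hence
$\mathbb E(\sup_{s\in[0,t]} |M_s|) < \infty$, the process $(M_t)_{t\ge 0}$ is a true martingale. Thus for all $0 \le s \le t$ we
have
$$\mathbb E(M_t \mid \mathcal F_s) = M_s$$
which implies
$$\mathbb E\left(e^{i\theta\cdot(X_t - X_s)} \mid \mathcal F_s\right) = e^{-|\theta|^2 (t-s)/2}.$$
The above equation implies that the increment $X_t - X_s$ has the $N_m(0, (t-s)I)$ distribution
and is independent of $\mathcal F_s$. □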

CHAPTER 4

Arbitrage theory for continuous-time models

We now return to the main theme of these lectures: models of financial markets. We
now have the tools to discuss the continuous-time case, at least when the asset prices are
continuous processes.

1. The set-up
As before, our market model consists of an $n$-dimensional stochastic process
$P = (P_t^1, \ldots, P_t^n)_{t\ge 0}$ representing the asset prices. This process will be defined on a probability
space $(\Omega, \mathcal F, \mathbb P)$ with a filtration $\mathbb F = (\mathcal F_t)_{t\ge 0}$ satisfying the usual conditions. Furthermore, we
will make the following assumption to make use of the Itô calculus developed in the previous
chapter.
Assumption. The stochastic process $P$ is assumed to be an Itô process adapted to $\mathbb F$.
Since continuous-time theory has enough complications, we will make the following
simplification:
Assumption. There exists a numéraire asset.
In particular, when we discuss arbitrage theory, there is no need to allow the possibility
of intermediate consumption.
As before, the investor’s controls consist of the $n$-dimensional process $H = (H_t^1, \ldots, H_t^n)_{t\ge 0}$
where $H_t^i$ corresponds to the number of shares of asset $i$ held at time $t$. We will assume
that $H$ is self-financing in the continuous-time sense:
Definition. An $n$-dimensional predictable process $H$ such that $H$ is $P$-integrable¹ is a
self-financing investment/consumption strategy iff
$$d(H_t\cdot P_t) = H_t\cdot dP_t.$$
WARNING: THIS DEFINITION IS INCOMPLETE in the sense that it does not give
rise to interesting arbitrage theory. The reason for the above warning is spelled out below.

2. Admissible strategies
In order to make sense of the stochastic integral defining the wealth, we need to impose
a technical integrability condition which holds automatically for continuous processes.
However, in moving from discrete to continuous time, we have to be careful. We will now
see that this condition isn’t strong enough to make our economic analysis interesting.

¹ ...this means the stochastic integral $\int_0^t H_s\cdot dP_s$ is well-defined, i.e. if $dP_t = b_t\,dt + \sigma_t\,dW_t$ then
$$\int_0^t |H_s\cdot b_s|\,ds < \infty \quad\text{and}\quad \int_0^t \|\sigma_s^\top H_s\|^2\,ds < \infty \quad\text{a.s. for all } t \ge 0.$$

Example. Consider a discrete-time market model with two assets $P = (1, S)$ where $S$
is a simple symmetric random walk:
$$S_t = \xi_1 + \ldots + \xi_t$$
where the random variables $\xi_1, \xi_2, \ldots$ are independent and
$$\mathbb P(\xi_t = 1) = \mathbb P(\xi_t = -1) = 1/2.$$
Obviously this market has no arbitrage as $P$ is a martingale. Nevertheless, let’s explore how
to approximate an arbitrage in some sense. Given a predictable process $\pi$, let
$$\phi_t = \sum_{s=1}^{t-1} (\pi_{s+1} - \pi_s) S_s.$$
Then the pair $(\phi, \pi)$ defines a self-financing pure investment strategy with associated wealth
process
$$X_t = \sum_{s=1}^{t} \pi_s (S_s - S_{s-1}).$$
In particular, $X_0 = 0$.
A simple strategy that resembles an arbitrage is constructed as follows: first define the
stopping time
$$\sigma = \inf\{t \ge 0 : S_t > 0\}$$
and consider the strategy with
$$\pi_t = 1_{\{t \le \sigma\}}.$$
Note that the associated wealth process is $X_t = S_{t\wedge\sigma}$. Since $\sigma < \infty$ a.s., the conclusion is
that if you are willing to wait a while, investing in this strategy will result in an almost sure
gain $X_\sigma = 1$. But the amount of time you have to wait is very long: one can show that
$\mathbb E(\sigma) = +\infty$.
One can improve upon the above idea by taking larger and larger bets, effectively ‘speeding
up the clock’. Indeed, define the stopping time
$$\tau = \inf\{t \ge 1 : \xi_t = 1\}$$
and consider the strategy
$$\pi_t = 2^{t-1}\,1_{\{t \le \tau\}}.$$
In this case, the associated wealth process is
$$X_t = 1 - 2^t\,1_{\{t \le \tau - 1\}}.$$
This is the classical ‘martingale’ or doubling strategy. Note that $\mathbb E(\tau) = 2$, so an investor
following this strategy does not have to wait very long on average to realise the gain $X_\tau = 1$.
But although $\tau$ is small on average, it is not bounded, and hence this strategy does not
qualify as an arbitrage.
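The behaviour of the doubling strategy is easy to see in simulation. The sketch below (the function name `doubling_path`, the seed and the number of runs are illustrative assumptions) plays the strategy $\pi_t = 2^{t-1} 1_{\{t \le \tau\}}$ against a simple symmetric random walk and records the stopping time, the final wealth and the deepest debt reached along the way.

```python
import random

def doubling_path(rng, max_steps=10_000):
    """Simulate one run of the doubling strategy pi_t = 2^(t-1) 1_{t <= tau}.

    Returns (tau, wealth at tau, lowest wealth reached before tau).
    """
    wealth, lowest = 0, 0
    for t in range(1, max_steps + 1):
        bet = 2 ** (t - 1)            # number of shares held over (t-1, t]
        xi = rng.choice((1, -1))      # increment of the random walk
        wealth += bet * xi
        if xi == 1:                   # first up-move: stop trading
            return t, wealth, lowest
        lowest = min(lowest, wealth)
    raise RuntimeError("no up-move within max_steps")

rng = random.Random(1)
runs = [doubling_path(rng) for _ in range(100_000)]
mean_tau = sum(tau for tau, _, _ in runs) / len(runs)
worst_debt = min(lowest for _, _, lowest in runs)
print(mean_tau, worst_debt)
```

Every run ends with wealth exactly 1 and the average waiting time is close to 2, but the debt just before the winning bet is $2^{\tau-1} - 1$, which is unbounded across runs.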
Example. A technical problem with continuous-time models is that events that will
happen eventually can be made to happen in bounded time by speeding up the clock.
Consider the market with prices $P = (1, W)$ where $W$ is a Brownian motion. We will now
construct a pure investment trading strategy such that the corresponding wealth process has
$X_0 = 0$ and $X_T = K$ a.s., where $T > 0$ is an arbitrary (non-random) time horizon and the
constant $K > 0$ is also arbitrary.
More concretely, by writing $H = (\phi, \pi)$, we will find a real-valued adapted process
$(\pi_t)_{t\in[0,T]}$ such that $\int_0^T \pi_s^2\,ds < \infty$ almost surely, but
$$\int_0^T \pi_s\,dW_s = K \quad\text{a.s.}$$
Let $f : [0, T] \to [0, \infty]$ be a strictly increasing, continuous function such that $f(0) = 0$
and $f(T) = \infty$. In particular we assume that $f'(t) > 0$ for $0 \le t < T$ and there exists an inverse
function $f^{-1} : [0, \infty] \to [0, T]$ such that $f \circ f^{-1}(u) = u$. For instance, to be explicit, we may
take $f(t) = \frac{t}{T-t}$ and $f^{-1}(u) = \frac{uT}{1+u}$.
Now define a local martingale $(Z_u)_{u\ge 0}$ by
$$Z_u = \int_0^{f^{-1}(u)} (f'(s))^{1/2}\,dW_s.$$
Note that the quadratic variation is
$$\langle Z\rangle_u = \int_0^{f^{-1}(u)} f'(s)\,ds = f(f^{-1}(u)) - f(0) = u$$
so by Lévy’s characterisation $(Z_u)_{u\ge 0}$ is a Brownian motion. Define the stopping time $\tau$ by
$$\tau = \inf\{u \ge 0 : Z_u = K\}.$$
Since $(Z_u)_{u\ge 0}$ is a Brownian motion, we have $\tau < \infty$ almost surely since $\sup_{u\ge 0} Z_u = \infty$
almost surely.
Now let
$$\pi_t = (f'(t))^{1/2}\,1_{\{t \le f^{-1}(\tau)\}}$$
and
$$X_t = \int_0^t \pi_s\,dW_s$$
for $0 \le t \le T$. Note that since
$$\int_0^T \pi_s^2\,ds = \int_0^{f^{-1}(\tau)} f'(s)\,ds = \tau < \infty$$
the stochastic integral is well-defined. The strange fact is that $(X_t)_{t\in[0,T]}$ is a local martingale
with $X_0 = 0$, but $X_T = Z_\tau = K$ almost surely.
We see that the integrand $(\pi_s)_{s\in[0,T]}$ roughly corresponds to a gambler starting at noon with
£0, employing a doubling strategy (with borrowed money) at a quicker and quicker pace,
until finally he gains £K almost surely before the clock strikes one o’clock. This situation is
rather unrealistic, particularly since the gambler must go arbitrarily far into debt in order to
secure the £K winning. Indeed, if such strategies were a good model for investor behaviour,
we all could be much richer by just spending some time trading over the internet.
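The clock change in this example can be illustrated with a rough simulation. The sketch below assumes the explicit choice $f(t) = t/(T-t)$, approximates the Brownian motion $Z$ on a grid of step `du`, and caps the search at `u_max`; the grid, the cap, the seed and the number of paths are all illustrative assumptions. It only shows that any finite hitting time $\tau$ on the $u$-clock is mapped to a real time $f^{-1}(\tau)$ strictly before $T$.

```python
import math
import random

T, K = 1.0, 1.0

def f(t):
    """Clock change f(t) = t / (T - t), mapping [0, T) onto [0, infinity)."""
    return t / (T - t)

def f_inv(u):
    """Inverse clock change f^{-1}(u) = u T / (1 + u)."""
    return u * T / (1.0 + u)

def hitting_time(rng, du=0.01, u_max=500.0):
    """Approximate tau = inf{u : Z_u = K} for a Brownian motion Z on a grid.

    Returns None if the level K is not reached before u_max (tau is finite
    almost surely, but its law is heavy-tailed).
    """
    z, u, step_sd = 0.0, 0.0, math.sqrt(du)
    while u < u_max:
        z += rng.gauss(0.0, step_sd)
        u += du
        if z >= K:
            return u
    return None

rng = random.Random(2)
taus = [hitting_time(rng) for _ in range(50)]
hit = [tau for tau in taus if tau is not None]
stop_times = [f_inv(tau) for tau in hit]   # real time at which trading stops
print(len(hit), max(stop_times))
```

Paths that take a long time to reach $K$ on the $u$-clock correspond to trading that stops very close to $T$, with the position size $(f'(t))^{1/2}$ growing without bound as $t \to T$.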

The above discussion shows that the integrability necessary to define the stochastic in-
tegral is not really sufficient for our needs.
At this stage, there are several reasonable options. In this course we will insist that
the investor cannot go into debt.
Definition. A trading strategy H is admissible iff
Ht · Pt ≥ 0 for all t ≥ 0 almost surely .
Note that the doubling strategy is not admissible, since the investor now has only a finite
credit line. However, a suicide strategy, that is, a doubling strategy in which the object is to
lose a fixed amount K by time T , is admissible.

3. Arbitrage and local martingale deflators


To see that our restriction to admissible strategies is reasonable, let’s now consider
continuous-time arbitrage theory.
Definition. An admissible strategy H is called an absolute arbitrage iff there is a non-
random time T such that
H0 · P0 = 0 ≤ HT · PT a.s.
and
P (HT · PT > 0) > 0.
An admissible strategy H is called an arbitrage relative to an admissible strategy K iff there
is a non-random time T such that
H0 · P0 = K0 · P0 ,
and
HT · PT ≥ KT · PT a.s., P (HT · PT > KT · PT ) > 0.
Remark. Note that if $H$ is an absolute arbitrage and $K$ is admissible, then the strategy
$H + K$ is an arbitrage relative to $K$. On the other hand, if $H'$ is an arbitrage relative to $K$,
then $H' - K$ is an absolute arbitrage only if $H' - K$ is admissible. In particular, an absolute
arbitrage is an arbitrage relative to the strategy $K = 0$ of holding no assets.
In discrete time, the notions of absolute arbitrage and relative arbitrage are essentially
equivalent. In continuous time, we will soon find examples of the surprising fact that
there exist continuous-time markets that have relative arbitrage but no absolute arbitrage.
Such market models are sometimes considered models of price bubbles.
The point of all of this is to warn you to be careful when making arbitrage arguments in
continuous time, since reasonable people can disagree on what kind of strategies should be
called arbitrages.
As in the discrete-time theory, we now introduce martingale deflators.
Definition. A (local) martingale deflator is a positive Itô process $Y$ such that
$YP = (Y_t P_t)_{t\ge 0}$ is an $n$-dimensional (local) martingale.
Our continuous-time version of the first fundamental theorem follows. Unfortunately, to
get a clean statement of this result we need to up the technical ante.
Theorem. Suppose there exists a local martingale deflator $Y$ for the market model $P$. If
$K$ is an admissible strategy such that the process $(K\cdot P)Y$ is a true martingale, then there is
no arbitrage relative to $K$. In particular, there is no absolute arbitrage.
The proof of this fact is based on an important lemma:
Lemma. Suppose $H$ is a self-financing pure investment strategy and let
$$X_t = H_t\cdot P_t = X_0 + \int_0^t H_s\cdot dP_s.$$
Then
$$d(X_t Y_t) = H_t\cdot d(Y_t P_t)$$
for any Itô process $Y$. In particular, if $Y$ is a local martingale deflator and $H$ is admissible
then $XY$ is a supermartingale.
Proof of lemma. First note
$$Y_t\,dX_t = Y_t (H_t\cdot dP_t)$$
and
$$X_t\,dY_t = (H_t\cdot P_t)\,dY_t.$$
Finally, note that
$$d\langle X, Y\rangle_t = (H_t\cdot dP_t)(dY_t) = \sum_{i=1}^n H_t^i\,d\langle P^i, Y\rangle_t.$$
Putting this together with Itô’s formula yields
$$\begin{aligned}
d(X_t Y_t) &= Y_t\,dX_t + X_t\,dY_t + d\langle X, Y\rangle_t \\
&= \sum_i H_t^i \left(Y_t\,dP_t^i + P_t^i\,dY_t + d\langle Y, P^i\rangle_t\right) \\
&= H_t\cdot d(Y_t P_t)
\end{aligned}$$
as claimed. Now if $Y$ is a local martingale deflator, then $PY$ is a local martingale. In
particular the process $XY$ can be expressed as the stochastic integral with respect to a
continuous local martingale, and hence is itself a local martingale. Finally, if $H$ is admissible,
then $XY$ is a non-negative local martingale. Non-negative local martingales are
supermartingales by Fatou’s lemma. □
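Proof that existence of a local martingale deflator implies no arbitrage.
Let $Y$ be a local martingale deflator, and let $H$ and $K$ be admissible strategies such that
$$H_0\cdot P_0 = K_0\cdot P_0 \quad\text{and}\quad H_T\cdot P_T \ge K_T\cdot P_T \text{ a.s.}$$
Furthermore, suppose that $(K\cdot P)Y$ is a martingale. We must show that $H_T\cdot P_T = K_T\cdot P_T$ a.s.
By the above lemma, since $H$ is admissible and $Y$ is positive, the process $(H\cdot P)Y$ is a
supermartingale. Hence
$$\begin{aligned}
K_0\cdot P_0\,Y_0 &= H_0\cdot P_0\,Y_0 \\
&\ge \mathbb E(H_T\cdot P_T\,Y_T) \\
&\ge \mathbb E(K_T\cdot P_T\,Y_T) \\
&= K_0\cdot P_0\,Y_0,
\end{aligned}$$
where the last equality uses the martingale property of $(K\cdot P)Y$.
This shows that $H_T\cdot P_T\,Y_T = K_T\cdot P_T\,Y_T$ a.s. Since $Y$ is strictly positive, the conclusion now
follows. □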
Remark. Note that the above theorem doesn’t say that no relative arbitrage implies
the existence of a local martingale deflator. A weaker notion of arbitrage,
called ‘free lunch with vanishing risk’, is needed to have the converse implication. See the
book of Delbaen and Schachermayer, The Mathematics of Arbitrage, for an account of
the modern theory.

4. The structure of local martingale deflators


In this section we will parametrise a fairly general Itô market with $n = d + 1$ assets. All
assets in this market are numéraires, and we use the notation $P = (B, S)$. We will assume
the dynamics of the prices are given by the following equations
$$dB_t = B_t r_t\,dt$$
$$dS_t^i = S_t^i\left(\mu_t^i\,dt + \sum_{j=1}^m \sigma_t^{ij}\,dW_t^j\right) \quad\text{for } i = 1, \ldots, d$$
where the processes $r$, $\mu^i$, $\sigma^{ij}$ are predictable and suitably integrable, and the $W^j$ are
independent Brownian motions.
The first asset can be thought of as a bank account, and the random variable $r_t$ is the
spot interest rate at time $t$. The (random) ordinary differential equation can be solved:
$$B_t = B_0\,e^{\int_0^t r_s\,ds}.$$
The $d$ assets can be thought of as risky stocks. The random variable $\mu_t^i$ is interpreted as the
mean instantaneous return of asset $i$, while the spot volatility is $\left(\sum_j (\sigma_t^{ij})^2\right)^{1/2}$. Note that
Itô’s formula yields
$$S_t^i = S_0^i \exp\left(\int_0^t \Big[\mu_s^i - \frac{1}{2}\sum_j (\sigma_s^{ij})^2\Big]\,ds + \int_0^t \sum_j \sigma_s^{ij}\,dW_s^j\right).$$
We will use the notation
$$\mu_t = \begin{pmatrix} \mu_t^1 \\ \vdots \\ \mu_t^d \end{pmatrix} \quad\text{and}\quad \sigma_t = \begin{pmatrix} \sigma_t^{11} & \cdots & \sigma_t^{1m} \\ \vdots & \ddots & \vdots \\ \sigma_t^{d1} & \cdots & \sigma_t^{dm} \end{pmatrix}$$
for the $d \times 1$ vector of means and $d \times m$ matrix of volatilities, respectively.


With this more explicit parametrisation, we can describe the structure of state price
densities:
Theorem. Let $\lambda$ be a predictable $m$-dimensional process such that $\int_0^t \|\lambda_s\|^2\,ds < \infty$ a.s.
for all $t \ge 0$ and such that
$$\sigma_t \lambda_t = \mu_t - r_t \mathbf 1 \quad\text{for almost all } (t, \omega)$$
where $\mathbf 1 = (1, \ldots, 1)^\top$ is the $d \times 1$ vector with the constant 1 in each component.
Let
$$Y_t = Y_0 \exp\left(-\int_0^t \left(r_s + \|\lambda_s\|^2/2\right) ds - \int_0^t \lambda_s\cdot dW_s\right)$$
for a constant $Y_0 > 0$, or in equivalent differential form
$$dY_t = Y_t(-r_t\,dt - \lambda_t\cdot dW_t).$$
Then $Y$ is a state price density.
Furthermore, if the filtration is generated by the $m$-dimensional Brownian motion $W$, all
state price densities have this form.
Remark. The $m$-dimensional random vector $\lambda_t$ appearing in the theorem is a generalisation
of the Sharpe ratio. The process $\lambda = (\lambda_t)_{t\ge 0}$ is often called the market price of risk for the
state price density, since it measures in some sense the excess return of the stocks per unit
of volatility.
Proof. We need to show that $YB$ and $YS$ are local martingales. Note that by Itô’s
formula
$$d(Y_t B_t) = -Y_t B_t\,\lambda_t\cdot dW_t$$
so $YB$ is a local martingale since it is a stochastic integral with respect to the Brownian
motion $W$.
Also, by Itô’s formula
$$\begin{aligned}
d(Y_t S_t^i) &= Y_t S_t^i\left[-r_t + \mu_t^i - (\sigma_t\lambda_t)^i\right] dt + Y_t S_t^i\,(\sigma_t^{i\cdot} - \lambda_t)\cdot dW_t \\
&= Y_t S_t^i\,(\sigma_t^{i\cdot} - \lambda_t)\cdot dW_t
\end{aligned}$$
where $\sigma_t^{i\cdot}$ denotes the $i$-th row of $\sigma_t$ and we have used the identity $\sigma_t\lambda_t = \mu_t - r_t\mathbf 1$ to cancel the $dt$ term.

Conversely, if the filtration is generated by the Brownian motion, the martingale
representation theorem says that all positive local martingales $M$ are of the form
$$M_t = M_0 \exp\left(-\frac{1}{2}\int_0^t \|\lambda_s\|^2\,ds - \int_0^t \lambda_s\cdot dW_s\right)$$
for some predictable $\lambda$, or in differential form
$$dM_t = -M_t\,\lambda_t\cdot dW_t.$$
Hence, if $YB = M$ is a positive local martingale then
$$dY_t = -Y_t(r_t\,dt + \lambda_t\cdot dW_t)$$
by Itô’s formula. Furthermore, if $YS$ is a local martingale, then Itô’s formula shows that in
order to cancel the drift we must have the identity $\sigma_t\lambda_t = \mu_t - r_t\mathbf 1$. □
In the discrete-time case, corresponding to a martingale deflator $Y$, there is an equivalent
martingale measure (with respect to the bank account) with the Radon–Nikodym density
$$\frac{d\mathbb Q}{d\mathbb P} = \frac{B_T Y_T}{B_0 Y_0}$$
for some time horizon $T > 0$. Recall that, in the discrete-time case, the definition of martingale
deflator implies that $BY$ is a true martingale.
What if $Y$ is a local martingale deflator, so that the product $BY$ is only a local martingale?
Then we must check that $BY$ is a true martingale in order to claim that the density above
defines an equivalent probability measure. In discrete time, there is no problem since
positive local martingales are true martingales. However, in continuous time, we must be
more careful.
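Theorem. Suppose that $\lambda$ is a predictable process such that
$$\sigma_t\lambda_t = \mu_t - r_t\mathbf 1.$$
If
$$M_t = \exp\left(-\frac{1}{2}\int_0^t \|\lambda_s\|^2\,ds - \int_0^t \lambda_s\cdot dW_s\right)$$
is a true martingale, then the measure $\mathbb Q$ defined by
$$\frac{d\mathbb Q}{d\mathbb P} = M_T$$
is an equivalent martingale measure (with respect to the bank account). In particular, the
dynamics of the stock prices are given by
$$dS_t^i = S_t^i\left(r_t\,dt + \sum_j \sigma_t^{ij}\,d\hat W_t^j\right)$$
where $\hat W_t = W_t + \int_0^t \lambda_s\,ds$ is a $\mathbb Q$-Brownian motion.
Proof. Note that by Itô’s formula
$$\begin{aligned}
d\left(\frac{S_t^i}{B_t}\right) &= \frac{S_t^i}{B_t}\left([\mu_t^i - r_t]\,dt + \sum_j \sigma_t^{ij}\,dW_t^j\right) \\
&= \frac{S_t^i}{B_t}\sum_j \sigma_t^{ij}\left(\lambda_t^j\,dt + dW_t^j\right) \\
&= \frac{S_t^i}{B_t}\sum_j \sigma_t^{ij}\,d\hat W_t^j.
\end{aligned}$$
Now Girsanov’s theorem says that $\hat W_t = W_t + \int_0^t \lambda_s\,ds$ is a $\mathbb Q$-Brownian motion. Therefore,
each $S^i/B$ is the stochastic integral with respect to a $\mathbb Q$-Brownian motion, and hence is a
$\mathbb Q$-local martingale as claimed. □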

CHAPTER 5

Hedging contingent claims in continuous time models

As before, given a market model $P$ we can introduce a contingent claim. Recall that
a European contingent claim maturing at a time $T > 0$ is modelled as a random variable $\xi_T$
that is $\mathcal F_T$-measurable. We shall assume that there exists at least one martingale deflator,
so that, in particular, there are no absolute arbitrages.

1. Replication and super-replication


First a simple result:
Theorem. Suppose $H$ is an admissible super-replication strategy of $\xi_T$ and $Y$ a local
martingale deflator. Then
$$H_t\cdot P_t \ge \frac{1}{Y_t}\,\mathbb E(\xi_T Y_T \mid \mathcal F_t).$$
Proof. This is the same as the proof that the existence of a local martingale deflator
implies no arbitrage:
$$\begin{aligned}
\mathbb E(\xi_T Y_T \mid \mathcal F_t) &\le \mathbb E(H_T\cdot P_T\,Y_T \mid \mathcal F_t) \\
&\le H_t\cdot P_t\,Y_t
\end{aligned}$$
since $(H\cdot P)Y$ is a supermartingale. □
Now we will impose more structure, by assuming that the market model $P = (B, S)$ has
dynamics
$$dB_t = B_t r_t\,dt$$
$$dS_t^i = S_t^i\left(\mu_t^i\,dt + \sum_{j=1}^m \sigma_t^{ij}\,dW_t^j\right) \quad\text{for } i = 1, \ldots, d$$
as before. In vector notation, these equations can be written as
$$dS_t = \mathrm{diag}(S_t)(\mu_t\,dt + \sigma_t\,dW_t)$$
where
$$\mathrm{diag}(s_1, \ldots, s_d) = \begin{pmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & s_d \end{pmatrix}.$$
We will work in the filtration generated by $W$, so that all state price densities $Y$ are of
the form
$$dY_t = Y_t(-r_t\,dt - \lambda_t\cdot dW_t)$$
where
$$\sigma_t\lambda_t = \mu_t - r_t\mathbf 1.$$
The following will serve as a version of the second fundamental theorem of asset pricing in
continuous time.
Theorem. Suppose $m = d$ and that the $d \times d$ matrix $\sigma_t$ is invertible for all $(t, \omega)$, so
that in particular, there is a unique (up to scaling) martingale deflator $Y$ of the form
$$dY_t = Y_t(-r_t\,dt - \lambda_t\cdot dW_t)$$
where
$$\lambda_t = \sigma_t^{-1}(\mu_t - r_t\mathbf 1).$$
Let $\xi_T$ be non-negative, $\mathcal F_T$-measurable and such that $\xi_T Y_T$ is integrable. Then there exists
an admissible strategy $H$ such that
$$H_t\cdot P_t = \frac{1}{Y_t}\,\mathbb E(Y_T \xi_T \mid \mathcal F_t).$$
In particular, the strategy replicates the payout $\xi_T$.
Remark. That is to say, the quantity E(YT ξT )/Y0 is the minimal amount of money
needed to replicate the claim among reasonable trading strategies. Of course, if you could
employ a doubling strategy, you could replicate the claim with strictly less money. Of course,
you could also replicate the claim with more initial capital by running a suicide strategy on
top of the replication strategy.
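Proof. Let
$$M_t = \mathbb E(Y_T \xi_T \mid \mathcal F_t).$$
Then $M$ is a martingale, and since the filtration is generated by the Brownian motion $W$
the martingale representation theorem tells us that there exists a $d$-dimensional predictable
process $\alpha$ such that
$$dM_t = \alpha_t\cdot dW_t.$$
By Itô’s formula we have
$$d\left(\frac{M_t}{Y_t}\right) = \frac{M_t}{Y_t}\,r_t\,dt + \frac{(M_t\lambda_t + \alpha_t)}{Y_t}\cdot(dW_t + \lambda_t\,dt).$$
Now let
$$\pi_t = \mathrm{diag}(S_t)^{-1}(\sigma_t^\top)^{-1}\,\frac{(M_t\lambda_t + \alpha_t)}{Y_t} \quad\text{and}\quad \phi_t = \frac{1}{B_t}\left(\frac{M_t}{Y_t} - \pi_t\cdot S_t\right).$$
Note that $\phi_t B_t + \pi_t\cdot S_t = M_t/Y_t$ and that (after some tedious algebra)
$$\phi_t\,dB_t + \pi_t\cdot dS_t = d\left(\frac{M_t}{Y_t}\right).$$
This means $H = (\phi, \pi)$ is a self-financing strategy and
$$H_t\cdot P_t = \frac{M_t}{Y_t} \quad\text{for all } 0 \le t \le T.$$
It is admissible since $M_t \ge 0$, and it satisfies $X_T = M_T/Y_T = \xi_T$ and $X_0 = \mathbb E(\xi_T Y_T)/Y_0$ as
desired. □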
If we consider the equation $\sigma_t\lambda_t = \mu_t - r_t\mathbf 1$ where $\sigma_t$ is a $d \times m$ matrix, one expects
from the rules of linear algebra for there to be no solution if $m < d$, exactly one solution if
$m = d$, and many solutions if $m > d$. Of course, this is not a theorem, just a rule of thumb.
Financially, the rule of thumb becomes:
m < d ‘⇒’ The market has arbitrage.
m = d ‘⇒’ The market has no arbitrage and is complete.
m > d ‘⇒’ The market has no arbitrage and is incomplete.
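In the complete case $m = d$ the market price of risk is obtained by solving a square linear system. The following is a minimal sketch for $d = m = 2$ with constant coefficients; the numerical values of `sigma`, `mu` and `r` are hypothetical, and Cramer's rule is used only to keep the example free of external libraries.

```python
def market_price_of_risk(sigma, mu, r):
    """Solve sigma @ lam = mu - r * 1 for a 2 x 2 volatility matrix via Cramer's rule."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("sigma is singular: no unique market price of risk")
    e1, e2 = mu[0] - r, mu[1] - r          # excess returns mu - r 1
    return ((e1 * d - b * e2) / det, (a * e2 - c * e1) / det)

# Hypothetical constant coefficients for two stocks driven by two Brownian motions.
sigma = ((0.20, 0.00),
         (0.06, 0.24))
mu = (0.08, 0.10)
r = 0.02

lam = market_price_of_risk(sigma, mu, r)
residual = [sigma[i][0] * lam[0] + sigma[i][1] * lam[1] - (mu[i] - r) for i in range(2)]
print(lam, residual)
```

When `sigma` is singular the function refuses to return an answer, which corresponds to the cases in the rule of thumb where the equation has either no solution or many.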

2. The Black–Scholes model and formula


We will consider the simplest possible model of the type introduced above. Consider
the case of a market with two assets. We will assume that all coefficients are constant,
so the price dynamics are given by the pair of equations
$$dB_t = B_t r\,dt$$
$$dS_t = S_t(\mu\,dt + \sigma\,dW_t)$$
for real constants $r, \mu, \sigma$ where $\sigma > 0$. We will assume that the filtration is generated by the
scalar Brownian motion $W$. This is often called the Black–Scholes model.
We are interested in finding the replication cost of a European contingent claim with
payout $\xi_T = g(S_T)$, where $g$ is a given function which we assume to be non-negative and
suitably integrable. We know from before that the unique state price density with $Y_0 = 1$ is
given by
$$Y_t = e^{-(r + \lambda^2/2)t - \lambda W_t}$$
where $\lambda = (\mu - r)/\sigma$.
Hence, from our existential result there is a trading strategy $H$ which replicates the
payout with time-$t$ cost
$$X_t = \frac{1}{Y_t}\,\mathbb E[Y_T\,g(S_T) \mid \mathcal F_t].$$
This is where we see the advantage of working with equivalent martingale measures rather
than state price densities. Indeed, define the equivalent martingale measure $\mathbb Q$ by the density
$$\frac{d\mathbb Q}{d\mathbb P} = e^{-\lambda^2 T/2 - \lambda W_T}$$
and recall that by the Cameron–Martin–Girsanov theorem the process $\hat W_t = W_t + \lambda t$ is a
$\mathbb Q$-Brownian motion.
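The price of the stock can be written explicitly:
$$S_t = S_0\,e^{(\mu - \sigma^2/2)t + \sigma W_t} = S_0\,e^{(r - \sigma^2/2)t + \sigma \hat W_t}$$
and hence
$$\begin{aligned}
H_t\cdot P_t &= e^{-r(T-t)}\,\mathbb E^{\mathbb Q}\left[g\left(S_0\,e^{(r - \sigma^2/2)T + \sigma \hat W_T}\right) \mid \mathcal F_t\right] \\
&= e^{-r(T-t)}\,\mathbb E^{\mathbb Q}\left[g\left(S_t\,e^{(r - \sigma^2/2)(T-t) + \sigma(\hat W_T - \hat W_t)}\right) \mid \mathcal F_t\right] \\
&= e^{-r(T-t)}\int_{-\infty}^{\infty} g\left(S_t\,e^{(r - \sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,z}\right)\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz.
\end{aligned}$$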
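The integral representation of the replication cost can be evaluated numerically for any payout function $g$. Here is a minimal sketch using the trapezoidal rule on a truncated range; the function name `price_by_integration`, the grid size `n`, the truncation `z_max` and the parameter values are illustrative assumptions. Two payouts with known replication costs serve as sanity checks: the claim paying $S_T$ should cost $S_t$, and the claim paying 1 should cost $e^{-r(T-t)}$.

```python
import math

def price_by_integration(g, s, r, sigma, tau, n=20_000, z_max=10.0):
    """Approximate e^{-r tau} * E[g(s * exp((r - sigma^2/2) tau + sigma sqrt(tau) Z))]
    for Z ~ N(0, 1) with the trapezoidal rule on [-z_max, z_max]."""
    h = 2.0 * z_max / n
    total = 0.0
    for k in range(n + 1):
        z = -z_max + k * h
        weight = 0.5 if k in (0, n) else 1.0      # trapezoidal end-point weights
        s_T = s * math.exp((r - 0.5 * sigma**2) * tau + sigma * math.sqrt(tau) * z)
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * g(s_T) * density
    return math.exp(-r * tau) * total * h

# Two sanity checks with known answers (illustrative parameter values):
# the claim paying S_T costs S_t, and the claim paying 1 costs e^{-r tau}.
stock_claim = price_by_integration(lambda x: x, s=100.0, r=0.05, sigma=0.2, tau=1.0)
bond_claim = price_by_integration(lambda x: 1.0, s=100.0, r=0.05, sigma=0.2, tau=1.0)
print(stock_claim, bond_claim)
```

Note that the drift $\mu$ does not appear anywhere in the function, in line with the formula above.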
A famous example is the case of the European call option where the payout function is of
the form $g(S) = (S - K)^+$. In this case, we have the Nobel-prize-winning Black–Scholes
formula:
$$\begin{aligned}
C_t(T, K) = {} & S_t\,\Phi\left(-\frac{\log(K/S_t)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right) \\
& - K e^{-r(T-t)}\,\Phi\left(-\frac{\log(K/S_t)}{\sigma\sqrt{T-t}} + (r/\sigma - \sigma/2)\sqrt{T-t}\right)
\end{aligned}$$
where $\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy$ is the standard normal distribution function. (You are asked
to derive this formula on Example Sheet 3.)
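The formula translates directly into a few lines of code. The sketch below uses only the standard library, with $\Phi$ expressed through the error function; the function names and the sample inputs are illustrative, and `tau` stands for the time to maturity $T - t$.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau = T - t > 0."""
    root = sigma * math.sqrt(tau)
    d_plus = -math.log(k / s) / root + (r / sigma + sigma / 2.0) * math.sqrt(tau)
    d_minus = d_plus - root
    return s * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)

price = bs_call(s=100.0, k=100.0, r=0.05, sigma=0.2, tau=1.0)
print(round(price, 4))
```

The second argument of $\Phi$ differs from the first only by $\sigma\sqrt{T-t}$, which is why `d_minus` is computed from `d_plus` rather than from scratch.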
We have argued that the martingale representation theorem asserts the existence of
a replicating strategy $H$, but unfortunately, it gives us no information about how to compute
$H$. This problem will be tackled in the next section.

3. Markovian markets and the Black–Scholes PDE


We now have a sufficient condition that a contingent claim can be replicated. However,
at this stage we can only assert the existence of a replicating strategy for a given claim, but
we do not yet know how to actually compute it. This problem is the subject of this section.
The first step is to pose a model for the asset prices (Bt , St )t≥0 . A good model should
give a reasonable statistical fit to the actual market data. Furthermore, a useful model is
one in which the prices and hedges of contingent claims can be computed reasonably easily.
In this section, we will study models in which the asset prices are Markov processes. These
models are useful in the above sense, though there seems to be some controversy over how
well they fit actual market data.

Now suppose that the $d + 1$ assets have Itô dynamics which can be expressed as
$$dB_t = B_t\,r(t, S_t)\,dt$$
$$dS_t = \mathrm{diag}(S_t)\left(\mu(t, S_t)\,dt + \sigma(t, S_t)\,dW_t\right)$$
where the non-random functions $r : [0, \infty) \times \mathbb R^d \to \mathbb R$, $\mu : [0, \infty) \times \mathbb R^d \to \mathbb R^d$ and
$\sigma : [0, \infty) \times \mathbb R^d \to \mathbb R^{d\times m}$ are given. Notice that this is a special case of the set-up of the last
section, as now (with an abuse of notation)
$$r_t(\omega) = r(t, S_t(\omega)), \quad \mu_t(\omega) = \mu(t, S_t(\omega)), \quad\text{and}\quad \sigma_t(\omega) = \sigma(t, S_t(\omega)).$$
In this special situation, the asset prices $(S_t)_{t\ge 0}$ are a $d$-dimensional Markov process.
The next theorem says how to find a replicating strategy for a contingent claim maturing
at time $T$ with payout
$$\xi_T = g(S_T)$$
for some non-random function $g : \mathbb R^d \to [0, \infty)$.
Theorem. Suppose the function $V : [0, T] \times \mathbb R^d \to [0, \infty)$ satisfies the partial differential
equation
$$\frac{\partial V}{\partial t} + \sum_{i=1}^d r S^i \frac{\partial V}{\partial S^i} + \frac{1}{2}\sum_{i=1}^d\sum_{j=1}^d a^{ij} S^i S^j \frac{\partial^2 V}{\partial S^i\,\partial S^j} = rV$$
$$V(T, S) = g(S)$$
where $a = \sigma\sigma^\top$, and where all functions in the PDE are evaluated at the same point
$(t, S) \in [0, T) \times \mathbb R^d$.
Then there exists a 0-admissible strategy $H$ such that $X_t(H) = V(t, S_t)$. In particular,
this strategy replicates the contingent claim with payout $g(S_T)$.
Furthermore, if $H = (\phi, \pi)$ then the strategy can be calculated as
$$\pi_t = \mathrm{grad}\,V(t, S_t) = \left(\frac{\partial V}{\partial S^1}(t, S_t), \ldots, \frac{\partial V}{\partial S^d}(t, S_t)\right)$$
and
$$\phi_t = \frac{V(t, S_t) - \pi_t\cdot S_t}{B_t}.$$
The above theorem says that if the market model is Markovian, the price (i.e. replication
cost) of a claim contingent on the future risky asset prices can be written as a deterministic
function $V$ of the current market prices. Furthermore, the pricing function $V$ can be found by
solving a certain linear parabolic partial differential equation¹ with terminal data to match
the payout of the claim. Solving this equation may be difficult to do by hand, but it can
usually be done by computer if the dimension $d$ is reasonably small. And most importantly
for the banker selling such a contingent claim: the replicating portfolio $\pi_t$ can be calculated
as the gradient of the pricing function $V$ with respect to the spatial variables, evaluated at
time $t$ and current price $S_t$.
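Proof. By Itô’s formula we have
$$\begin{aligned}
dV(t, S_t) &= \frac{\partial V}{\partial t}\,dt + \sum_i \frac{\partial V}{\partial S^i}\,dS_t^i + \frac{1}{2}\sum_{i,j} \frac{\partial^2 V}{\partial S^i\,\partial S^j}\,d\langle S^i, S^j\rangle_t \\
&= \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sum_{i,j} \frac{\partial^2 V}{\partial S^i\,\partial S^j}\,S^i S^j a^{ij}\right) dt + \sum_i \frac{\partial V}{\partial S^i}\,dS_t^i \\
&= r\left(V - \sum_i S^i \frac{\partial V}{\partial S^i}\right) dt + \sum_i \frac{\partial V}{\partial S^i}\,dS_t^i
\end{aligned}$$
where we have used the assumption that $V$ solves the PDE to go from the second to
third line above.
Now letting $\phi$ and $\pi$ be as in the statement of the theorem we have that
$$V(t, S_t) = \phi_t B_t + \pi_t\cdot S_t$$
$$dV(t, S_t) = \phi_t\,dB_t + \pi_t\cdot dS_t.$$
Hence $H = (\phi, \pi)$ is a self-financing strategy with associated wealth process $X_t(H) = V(t, S_t)$
as claimed. It is 0-admissible since $V \ge 0$ by assumption. □

¹ sometimes called the Feynman–Kac PDE. If $r = 0$, the PDE reduces to the (backward) Kolmogorov
equation.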
We have seen that there are two distinct ways to find replication costs for certain con-
tingent claims: by computing expectations or by solving a PDE. Furthermore, the PDE
method also gives the replicating portfolio. But how do you solve the PDE? In many cases,
the easiest way to solve the PDE is to compute the expectations.
This is illustrated by the Black–Scholes model:
Example (Black–Scholes continued). Let’s return to the Black–Scholes model
$$dB_t = B_t r\,dt$$
$$dS_t = S_t(\mu\,dt + \sigma\,dW_t)$$
with constant coefficients $r, \sigma, \mu$. If we would like to replicate a claim with payout $g(S_T)$, the
previous theorem says we should solve the Black–Scholes PDE
$$\frac{\partial V}{\partial t} + rS\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV$$
$$V(T, S) = g(S).$$
Now, let’s specialise to the case of the call option where $g(S) = (S - K)^+$. From the last
section we have
$$\begin{aligned}
V(t, S) = {} & S\,\Phi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right) \\
& - K e^{-r(T-t)}\,\Phi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma - \sigma/2)\sqrt{T-t}\right).
\end{aligned}$$
The delta, i.e. the replicating portfolio, in this case is (by a miracle of algebra)
$$\frac{\partial V}{\partial S}(t, S) = \Phi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right).$$
Note that an agent attempting to replicate a call option using the Black–Scholes theory will
always hold a fraction of shares of the underlying stock between 0 and 1. Also note that
the gamma, i.e. the sensitivity of the replicating portfolio to the price of the underlying, is given by the formula
$$\frac{\partial^2 V}{\partial S^2}(t, S) = \frac{1}{S\sigma\sqrt{T-t}}\,\varphi\left(-\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t}\right)$$
where $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$. Since the gamma is always positive, the hedger will buy more
shares of the underlying if the price goes up.
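The closed-form delta and gamma, and the claim that the call price solves the Black–Scholes PDE, can be checked with finite differences. This is a rough numerical sketch rather than a proof: the parameter values and the step size `h` are illustrative choices, and `tau` denotes the time to maturity $T - t$, so that $\partial V/\partial t = -\partial V/\partial\tau$.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d_plus(s, k, r, sigma, tau):
    return -math.log(k / s) / (sigma * math.sqrt(tau)) + (r / sigma + sigma / 2.0) * math.sqrt(tau)

def call_value(s, k, r, sigma, tau):
    dp = d_plus(s, k, r, sigma, tau)
    return s * norm_cdf(dp) - k * math.exp(-r * tau) * norm_cdf(dp - sigma * math.sqrt(tau))

def call_delta(s, k, r, sigma, tau):
    return norm_cdf(d_plus(s, k, r, sigma, tau))

def call_gamma(s, k, r, sigma, tau):
    return norm_pdf(d_plus(s, k, r, sigma, tau)) / (s * sigma * math.sqrt(tau))

# Illustrative parameter values; tau = T - t is the time to maturity.
s, k, r, sigma, tau, h = 100.0, 95.0, 0.03, 0.25, 0.5, 1e-3

up, mid, down = (call_value(x, k, r, sigma, tau) for x in (s + h, s, s - h))
fd_delta = (up - down) / (2.0 * h)
fd_gamma = (up - 2.0 * mid + down) / h**2
# dV/dt = -dV/dtau, since V depends on t only through tau = T - t.
fd_theta = -(call_value(s, k, r, sigma, tau + h) - call_value(s, k, r, sigma, tau - h)) / (2.0 * h)

pde_residual = (fd_theta + r * s * fd_delta
                + 0.5 * sigma**2 * s**2 * fd_gamma - r * mid)
print(fd_delta - call_delta(s, k, r, sigma, tau),
      fd_gamma - call_gamma(s, k, r, sigma, tau),
      pde_residual)
```

All three printed numbers should be small, limited only by the finite-difference and rounding error.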

4. Black–Scholes volatility
What made the Black–Scholes formula so popular after its publication in 1973 is the
fact that the right-hand-side depends only on six quantities: the current calendar time t,
the option’s maturity time T , the option’s strike K, the spot interest rate r, the underlying
stock’s price St at time t, and a volatility parameter σ. Of these six numbers, only the
volatility parameter is neither specified by the option contract nor quoted in the market.
To use the Black–Scholes formula to find the price of real call options, one must first
estimate the volatility σ.

4.1. Estimation: statistics. In the Black–Scholes model, the drift $\mu$ and volatility $\sigma$
are not directly observable. Nevertheless, they can be estimated by appealing to standard
statistical theory. Suppose that we have observed the stock price $(S_t)_{-T\le t\le 0}$. If we sample at
times $t_i = (i/n - 1)T$, we see that the $n$ random variables $Y_i = \log\left(S_{t_i}/S_{t_{i-1}}\right)$ are independent
with distribution
$$Y_i = (\mu - \sigma^2/2)(t_i - t_{i-1}) + \sigma(W_{t_i} - W_{t_{i-1}}) \sim N(aT/n,\ \sigma^2 T/n)$$
where $a = \mu - \sigma^2/2$. The maximum likelihood estimator of $a$ is
$$\hat a = \frac{1}{T}\sum_{i=1}^n Y_i$$
and of $\sigma^2$ is
$$\hat\sigma^2 = \frac{1}{T}\sum_{i=1}^n (Y_i - \hat a T/n)^2.$$
Notice that the estimator $\hat a$ can be rewritten as
$$\hat a = \frac{1}{T}\log(S_0/S_{-T}),$$
and hence does not depend on $n$! That is to say, there is no advantage in going to ever higher
frequency data to estimate the drift $\mu$. Fortunately, a careful reading of the
previous section shows that the drift parameter $\mu$ is not needed to find either the replication
cost or the replicating strategy. This is good news for the Black–Scholes theory.²
On the other hand, the variance of $\hat\sigma^2$ is approximately $2\sigma^4/n \to 0$ as $n \to \infty$. Hence, there is some
hope of accurately estimating the volatility parameter by sampling the historical stock prices
regularly enough.
If one were to truly believe that the stock price is a geometric Brownian motion, that
is, of the form $S_t = S_0\,e^{at + \sigma W_t}$, then one could insert the value $\hat\sigma^2$ into the Black–Scholes
formula to obtain the price of a call option. Notice that we have done the statistics under
the objective measure $\mathbb P$, not the equivalent martingale measure $\mathbb Q$.
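The contrast between the two estimators shows up clearly in simulation. The sketch below draws the log-returns $Y_i$ directly from their normal distribution for several sampling frequencies and computes the two maximum likelihood estimates; the "true" parameter values, the seed and the function names are hypothetical, and each value of `n` uses a freshly simulated history rather than refinements of a single path.

```python
import math
import random

def simulate_log_returns(mu, sigma, T, n, rng):
    """Draw the n independent log-returns Y_i ~ N(a T/n, sigma^2 T/n), a = mu - sigma^2/2."""
    dt = T / n
    a = mu - 0.5 * sigma**2
    return [a * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(n)]

def mle(log_returns, T):
    """Maximum likelihood estimates (a_hat, sigma2_hat) from equally spaced log-returns."""
    n = len(log_returns)
    a_hat = sum(log_returns) / T
    sigma2_hat = sum((y - a_hat * T / n) ** 2 for y in log_returns) / T
    return a_hat, sigma2_hat

rng = random.Random(3)
mu, sigma, T = 0.08, 0.2, 1.0            # hypothetical "true" parameters
for n in (50, 5_000, 50_000):
    ys = simulate_log_returns(mu, sigma, T, n, rng)
    a_hat, sigma2_hat = mle(ys, T)
    print(n, round(a_hat, 4), round(sigma2_hat, 5))
```

As `n` grows the estimate of $\sigma^2$ settles near the true value, while the estimate of $a$ keeps a standard deviation of $\sigma/\sqrt{T}$ regardless of `n`.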

4.2. Calibration: implied volatility. A completely different approach to finding the
volatility parameter is to observe the prices of contingent claims from the market, and then
try to work out which $\sigma$ to put into the Black–Scholes formula to get the right price.

² However, this is very bad news for optimal investment. For instance, consider the problem of maximising
$\mathbb E \log(X_T)$ over all 0-admissible trading strategies. It turns out that the optimal fraction of wealth to hold
in the stock is given by
$$\frac{\pi_t^* S_t}{X_t^*} = \frac{\mu - r}{\sigma^2}.$$
However, this formula is useless unless the parameters on the right-hand side can be estimated accurately.
The Black–Scholes formula says that in the context of a Black–Scholes model the call
price is given by
$$(*)\qquad C_t(T, K) = C^{BS}(t, T, K, S_t, r, \sigma)$$
for an explicit function $C^{BS}$ written out in the previous section.
But in reality we do not know $\sigma$ but can observe the call prices. Therefore, rather than
compute the call price from the parameters, we turn the story around by defining the implied
volatility of the option to be the unique number $\sigma$ such that equation (*) holds. We denote by
$\Sigma_t(T, K)$ the implied volatility at time $t$ of an option with maturity $T$ and strike $K$.
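Since the Black–Scholes call price is strictly increasing in $\sigma$, the implied volatility can be found by a one-dimensional root search. The following is a minimal sketch using bisection; the bracket `[lo, hi]`, the tolerance and the sample inputs are assumptions made for illustration, and a production implementation would typically use a faster root-finder.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, r, sigma, tau):
    root = sigma * math.sqrt(tau)
    d_plus = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / root
    return s * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_plus - root)

def implied_vol(price, s, k, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Invert the Black-Scholes call formula in sigma by bisection.

    Relies on the call price being strictly increasing in sigma; the bracket
    [lo, hi] is an assumption and must contain the answer.
    """
    if not bs_call(s, k, r, lo, tau) <= price <= bs_call(s, k, r, hi, tau):
        raise ValueError("price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, r, mid, tau) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip with illustrative inputs: price a call at sigma = 0.3, then recover it.
quoted = bs_call(s=100.0, k=110.0, r=0.02, sigma=0.3, tau=0.75)
print(implied_vol(quoted, s=100.0, k=110.0, r=0.02, tau=0.75))
```

Applying `implied_vol` to market quotes across a range of strikes, instead of to a model price, is how one would trace out the function $K \mapsto \Sigma_t(T, K)$.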
If the market were pricing call options by the Black–Scholes formula, then there would exist one parameter σ such that $\Sigma_t(T, K) = \sigma$ for all 0 ≤ t < T and K > 0. However, in real-world markets, it is usually the case that the implied volatility surface $(T, K) \mapsto \Sigma_t(T, K)$ is not flat. Indeed, for fixed T, the graph of the function $K \mapsto \Sigma_t(T, K)$ often resembles a convex parabola [3], at least for strikes K close to the money, i.e. such that $K e^{-r(T-t)}/S_t \approx 1$. That is why practitioners refer to the function $K \mapsto \Sigma_t(T, K)$ as the implied volatility smile or smirk.
One could either conclude that the Black–Scholes model is the true model of the stock price and that the market is mispricing options, or that the Black–Scholes model does not quite match reality. The second conclusion is more prudent. Then, why even consider implied volatility? As Rebonato famously put it:
Implied volatility is the wrong number to put into the wrong formula to obtain the correct price.
However, thanks to the enormous influence of the Black–Scholes theory, the implied volatility is now used as a common language to quote option prices.

4.3. Robustness of Black–Scholes. As argued above, since real markets tend to ex-
hibit implied volatility smiles, the Black–Scholes model cannot be considered an adequate
description of how stock prices fluctuate. However, it should be considered an approxi-
mation of reality, and we will now do a calculation to see how to quantify how good this
approximation is.
Suppose a banker wants to sell a contingent claim with payout $\xi_T = g(S_T)$. The banker believes that the underlying stock price is given by the Black–Scholes model, so that the initial price of the claim is given by $V(0, S_0, \sigma)$ where
\[ V(t, S, \sigma) = e^{-r(T-t)} \int_{-\infty}^{\infty} g\big(S e^{(r - \sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,z}\big) \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz, \]
for some σ to be determined. Now, the claim is already traded on the market with initial price $\xi_0$, so the banker chooses a $\sigma = \hat\sigma$ such that $V(0, S_0, \hat\sigma) = \xi_0$, i.e. $\hat\sigma$ is the initial implied volatility for the claim.
Now, the banker wants to hedge away the liability associated with the payout of the claim, so again believing the Black–Scholes theory, he puts the initial wealth of $X_0 = \xi_0$ in his account and holds a portfolio of
\[ \pi_t = \frac{\partial V}{\partial S}(t, S_t, \hat\sigma) \]
shares of the stock at all times. His wealth then evolves as
\[ dX_t = r(X_t - \pi_t S_t)\,dt + \pi_t\,dS_t. \tag{*} \]
The banker knows that according to the Black–Scholes theory his strategy should replicate the claim: $X_T = g(S_T)$ a.s.
[3] ...but be careful: for large K, the graph can grow no faster than $\sqrt{2 \log K/(T - t)}$. See example sheet 4.
Suppose that the true dynamics of the market are given by
\[ dB_t = B_t r\,dt, \qquad dS_t = S_t(\mu\,dt + \sigma_t\,dW_t) \]
where r and µ are the same constants as before, but now $(\sigma_t)_{t \ge 0}$ is some predictable process. How big is the hedging error $X_T - g(S_T)$?
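First note that $V = V(\cdot, \cdot, \hat\sigma)$ solves the Black–Scholes PDE:
\[ \frac{\partial V}{\partial t} + rS\frac{\partial V}{\partial S} + \frac{1}{2}\hat\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV, \qquad V(T, S) = g(S). \]
Now note that by Itô's formula and the Black–Scholes PDE
\[ \begin{aligned} dV(t, S_t, \hat\sigma) &= \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial S}\,dS_t + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\,d\langle S\rangle_t \\ &= rV\,dt + \pi_t(dS_t - rS_t\,dt) + \frac{1}{2}S_t^2(\sigma_t^2 - \hat\sigma^2)\frac{\partial^2 V}{\partial S^2}\,dt. \end{aligned} \]
Subtracting this equation from equation (*) and solving yields
\[ \begin{aligned} \frac{1}{2}\int_0^T e^{r(T-t)}(\hat\sigma^2 - \sigma_t^2)S_t^2\frac{\partial^2 V}{\partial S^2}(t, S_t, \hat\sigma)\,dt &= X_T - V(T, S_T, \hat\sigma) - e^{rT}(X_0 - V(0, S_0, \hat\sigma)) \\ &= X_T - g(S_T) \end{aligned} \]
since $X_0 = \xi_0$ by assumption and $\xi_0 = V(0, S_0, \hat\sigma)$ by the definition of $\hat\sigma$.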
The above formula shows that the naive Black–Scholes hedger does reasonably well in a world where the implied volatility is close to the actual spot volatility. For many claims, such as call options, the gamma $\partial^2 V/\partial S^2$ is positive. Therefore, the naive hedger's strategy may fall short if the implied volatility is smaller than the realised spot volatility.

5. Local volatility models


In the previous section we have considered the Black–Scholes model, a two-asset market model in which the risky asset price is a geometric Brownian motion. The Black–Scholes formula gives an explicit representation of the prices $C_t(T, K)$ of call options in this model in terms of the calendar time t, the current stock price $S_t$, spot interest rate r, the option maturity T and strike K, and a volatility parameter σ.
However, since the implied volatility surface $\Sigma_t(T, K)$ of real-world option prices is usually not flat, practitioners and researchers have proposed various generalisations of the Black–Scholes model to better match the observed implied volatility surface. We now consider another Markovian model which can match a given implied volatility surface exactly.
We consider a model given by
\[ dB_t = B_t r\,dt, \qquad dS_t = S_t(\mu\,dt + \sigma(t, S_t)\,dW_t). \]
That is, the idea is to replace the constant volatility parameter in the Black–Scholes model with a local volatility function σ : [0, ∞) × (0, ∞) → (0, ∞). We will assume that σ is smooth and bounded from below and above. As always, let Q be the equivalent martingale measure with density
\[ \frac{dQ}{dP} = e^{-\frac{1}{2}\int_0^T \lambda_s^2\,ds - \int_0^T \lambda_s\,dW_s} \]
where $\lambda_t = (\mu - r)/\sigma(t, S_t)$. Recall that by Girsanov's theorem $d\hat W_t = dW_t + \lambda_t\,dt$ defines a Q-Brownian motion, so that $dS_t = S_t(r\,dt + \sigma(t, S_t)\,d\hat W_t)$.
The next theorem in the present context is usually attributed to Dupire's 1994 paper.
Theorem. Suppose that
\[ C_0(T, K) = E^Q[e^{-rT}(S_T - K)^+]. \]
Then
\[ \frac{\partial C_0}{\partial T}(T, K) + rK\frac{\partial C_0}{\partial K}(T, K) = \frac{\sigma(T, K)^2}{2}K^2\frac{\partial^2 C_0}{\partial K^2}(T, K). \]
Remark. We have already seen a PDE for the replication cost of options in Markovian models. In that PDE, the solution $V(t, S_t)$ was the time-t value of a replication strategy for the given claim, and the derivatives were with respect to the calendar time t and the current price of the underlying asset $S_t$. In contrast, Dupire's PDE is for the initial replication cost of a call option, and the derivatives are with respect to the maturity date T and the strike K.
Remark. The point of the above theorem is this: Suppose you believe that the stock price is generated by a local volatility model, but you do not know what the local volatility function is. If you can observe today's call price surface $\{C_0(T, K) : T > 0, K > 0\}$ then you can solve for the local volatility in Dupire's PDE to arrive at Dupire's formula
\[ \sigma(T, K) = \left( \frac{2\left[\frac{\partial C_0}{\partial T}(T, K) + rK\frac{\partial C_0}{\partial K}(T, K)\right]}{K^2\frac{\partial^2 C_0}{\partial K^2}(T, K)} \right)^{1/2}. \]
Furthermore, assuming Dupire's PDE has a unique solution (it will if σ is smooth and bounded as assumed) then we have found a model that can reproduce the observed call prices.
Of course, plugging the call surface $C_0(T, K) = C^{BS}(t = 0, T, K, S_0, r, \sigma_0)$ into Dupire's formula yields
\[ \left( \frac{2\left[\frac{\partial C_0}{\partial T}(T, K) + rK\frac{\partial C_0}{\partial K}(T, K)\right]}{K^2\frac{\partial^2 C_0}{\partial K^2}(T, K)} \right)^{1/2} = \sigma_0, \]
as it should. In general, however, the local volatility surface need not be flat.
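The claim that a Black–Scholes call surface gives back a flat local volatility can be checked numerically. The following is a minimal sketch, assuming central finite differences for the partial derivatives in Dupire's formula; the function names, the step sizes `hT` and `hK`, and the parameter values are choices made for this illustration only.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_surface(T, K, S0=100.0, r=0.03, sigma0=0.2):
    """Initial Black-Scholes call price C_0(T, K) with constant volatility sigma0."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma0**2) * T) / (sigma0 * math.sqrt(T))
    d2 = d1 - sigma0 * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def dupire_local_vol(call, T, K, r, hT=1e-3, hK=0.1):
    """Dupire's formula with the derivatives replaced by central differences."""
    dC_dT = (call(T + hT, K) - call(T - hT, K)) / (2.0 * hT)
    dC_dK = (call(T, K + hK) - call(T, K - hK)) / (2.0 * hK)
    d2C_dK2 = (call(T, K + hK) - 2.0 * call(T, K) + call(T, K - hK)) / hK**2
    return math.sqrt(2.0 * (dC_dT + r * K * dC_dK) / (K**2 * d2C_dK2))

# With a Black-Scholes surface the recovered local volatility should be
# close to sigma0 = 0.2 at every (T, K), up to finite-difference error.
local_vol = dupire_local_vol(bs_call_surface, 1.0, 110.0, 0.03)
```

With real market data the call surface is only known on a discrete grid of strikes and maturities, so some interpolation or smoothing is needed before differentiating; that step is outside the scope of this sketch.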
Before we sketch the proof, we will need a simple result observed by Breeden and Litzenberger in 1978. The proof just involves calculus, so there is no need to spell it out.
Lemma. Suppose the random variable $S_T$ has a continuous density with respect to Lebesgue measure; that is, there exists a continuous function $f_{S_T} : [0, \infty) \to [0, \infty)$ such that
\[ Q(S_T \le x) = \int_0^x f_{S_T}(y)\,dy. \]
Then
\[ C_0(T, K) = e^{-rT}\int_0^\infty f_{S_T}(y)(y - K)^+\,dy = e^{-rT}\int_K^\infty f_{S_T}(y)(y - K)\,dy \]
and
\[ \frac{\partial C_0}{\partial K}(T, K) = -e^{-rT}\int_K^\infty f_{S_T}(y)\,dy, \qquad \frac{\partial^2 C_0}{\partial K^2}(T, K) = e^{-rT} f_{S_T}(K). \]
Sketch of proof of Dupire's formula. To outline the argument, we proceed formally:
\[ \begin{aligned} (S_T - K)^+ &= (S_0 - K)^+ + \int_0^T 1_{\{S_t \ge K\}}\,dS_t + \frac{1}{2}\int_0^T \delta_K(S_t)\,d\langle S\rangle_t \\ &= (S_0 - K)^+ + \int_0^T \Big(1_{\{S_t \ge K\}} S_t r + \frac{1}{2}\delta_K(S_t) S_t^2 \sigma(t, S_t)^2\Big)\,dt + \int_0^T 1_{\{S_t \ge K\}} S_t \sigma(t, S_t)\,d\hat W_t \end{aligned} \]
where we have appealed to Itô's formula [4] with $g(x) = (x - K)^+$, $g'(x) = 1_{[K,\infty)}(x)$, and $g''(x) = \delta_K(x)$, the Dirac delta 'function'.
Now, by the assumption of smoothness and the bounds on the volatility function, the Q-law of the random variable $S_t$ has a density function $f_{S_t}$ for each t > 0. Computing expected values of both sides (the stochastic integral being treated as mean-zero) gives
\[ e^{rT}C_0(T, K) = (S_0 - K)^+ + \int_0^T\int_K^\infty f_{S_t}(y)\,y\,r\,dy\,dt + \frac{1}{2}\int_0^T f_{S_t}(K)K^2\sigma(t, K)^2\,dt \tag{1} \]
and then differentiating both sides with respect to T yields
\[ e^{rT}\Big[\frac{\partial C_0}{\partial T}(T, K) + rC_0(T, K)\Big] = \int_K^\infty f_{S_T}(y)\,y\,r\,dy + \frac{1}{2}f_{S_T}(K)K^2\sigma(T, K)^2 \]
and the result follows from noting
\[ \int_K^\infty f_{S_T}(y)\,y\,dy = \int_0^\infty f_{S_T}(y)(y - K)^+\,dy + K\int_K^\infty f_{S_T}(y)\,dy \]
and applying the Breeden–Litzenberger identities.
[4] A version of Itô's formula for non-smooth convex functions, called Tanaka's formula, can actually be rigorously stated in terms of a quantity called local time.
6. Computing marginal laws
We begin this section with some comments on the Breeden–Litzenberger formula. First note that the collection of call prices $\{C_0(T, K) : K > 0\}$ determines the Q-law of the random variable $S_T$, even if a density doesn't exist. Indeed, note that
\[ \lim_{\varepsilon \downarrow 0}\frac{C_0(T, K + \varepsilon) - C_0(T, K)}{\varepsilon} = -e^{-rT}Q(S_T > K). \]
Also, if the put prices are given by the put-call parity formula
\[ P_0(T, K) = C_0(T, K) - S_0 + e^{-rT}K \]
then the put prices also determine the marginal laws of S.
Now, if we know the distribution of $S_T$, we can compute the expectation
\[ E^Q[e^{-rT}g(S_T)] \]
for any non-negative function g. Of course, the above quantity is the replication cost of a contingent claim with payout $\xi_T = g(S_T)$. Is there a way to deduce this replication cost directly from the prices of calls and puts?
The answer is yes. Indeed, suppose g is $C^2$ and convex. Then, the following formula holds identically
\[ g(S) = g(a) + g'(a)(S - a) + \int_0^a g''(K)(K - S)^+\,dK + \int_a^\infty g''(K)(S - K)^+\,dK \]
for any a > 0. This identity can be verified by integration by parts. Approximating the integrals by Riemann sums gives
\[ g(S) \approx g(a) - a g'(a) + g'(a)S + \sum_{K_i < a}\Delta K_i\,g''(K_i)(K_i - S)^+ + \sum_{K_i \ge a}\Delta K_i\,g''(K_i)(S - K_i)^+ \]
and we see that the financial significance is that the payout $g(S_T)$ of the claim can be approximated by holding a portfolio consisting of a bond with principal value $g(a) - a g'(a)$, $g'(a)$ shares of the stock, $\Delta K_i\,g''(K_i)$ puts of strike $K_i < a$ and $\Delta K_i\,g''(K_i)$ calls of strike $K_i \ge a$. And by integration, we get
\[ E^Q[e^{-rT}g(S_T)] = [g(a) - a g'(a)]e^{-rT} + g'(a)S_0 + \int_0^a g''(K)P_0(T, K)\,dK + \int_a^\infty g''(K)C_0(T, K)\,dK. \]
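The Riemann-sum form of the replication identity is easy to test numerically. The sketch below is illustrative only: the payout $g(S) = S^2$, the strike grids and the function name `replicate` are assumptions of this example. It builds the portfolio of a bond, the stock, puts and calls on a midpoint grid of strikes and compares its terminal value with $g(S)$.

```python
import numpy as np

def replicate(g, dg, d2g, S, a, K_max, n_put=1000, n_call=2000):
    """Terminal value of the static portfolio approximating g(S).

    Puts use strikes in (0, a), calls use strikes in (a, K_max); both grids
    consist of cell midpoints. K_max must be at least S, since calls with
    strike above S pay nothing anyway.
    """
    dK_put = a / n_put
    dK_call = (K_max - a) / n_call
    put_strikes = (np.arange(n_put) + 0.5) * dK_put
    call_strikes = a + (np.arange(n_call) + 0.5) * dK_call
    bond = g(a) - a * dg(a)                      # principal value of the bond
    stock = dg(a) * S                            # g'(a) shares
    puts = np.sum(dK_put * d2g(put_strikes) * np.maximum(put_strikes - S, 0.0))
    calls = np.sum(dK_call * d2g(call_strikes) * np.maximum(S - call_strikes, 0.0))
    return bond + stock + puts + calls

# Example payout g(S) = S^2, which is smooth and convex with g'' = 2.
g = lambda S: S**2
dg = lambda S: 2.0 * S
d2g = lambda K: 2.0 + 0.0 * K   # constant second derivative, broadcast over strikes

approx = replicate(g, dg, d2g, S=130.0, a=100.0, K_max=300.0)
```

Replacing the terminal payoffs of the puts and calls by their initial prices $P_0(T, K_i)$ and $C_0(T, K_i)$, and discounting the bond, would give the corresponding approximation of the replication cost.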

6.1. Call prices from moment generating functions. Since a portfolio of calls and puts on a stock can essentially replicate any European contingent claim, it is important to have models where the call prices can be computed easily. Unfortunately, there are few models where there exist nice, elementary formulae for the call prices. However, there are many models where the moment generating functions can be computed explicitly, and we will now see that given the moment generating function we can compute call prices by integration.
Consider a market model (B, S) where $B_t = B_0 e^{rt}$ and S/B is a positive Q-martingale. For complex θ in the vertical strip
\[ \Theta = \{\theta = p + iq : 0 \le p \le 1,\ q \in \mathbb{R}\} \]
define the moment generating function of the log stock price by
\[ M_t(\theta) = E^Q(e^{\theta \log S_t}). \]
Note that since S/B is a martingale we have for $\theta = p + iq \in \Theta$,
\[ E^Q(|e^{\theta \log S_t}|) = E^Q(S_t^p) \le E^Q(S_t)^p = e^{prt}S_0^p < \infty \]
by Jensen's inequality, so the moment generating function is well-defined. The following result shows how to recover call prices from the moment generating function.
Theorem. For any 0 < p < 1 the identity
\[ E^Q[e^{-rT}(S_T - K)^+] = S_0 - \frac{e^{-rT}K^{1-p}}{2\pi}\int_{-\infty}^\infty \frac{M_T(p + ix)e^{-ix\log K}}{(x - ip)(x + i(1 - p))}\,dx \]
holds.
Essentially, we are inverting the moment generating function via a complex integral. Variants of this procedure are often called a Bromwich, Fourier or Mellin transform. To prove this formula, we begin with a lemma:
Lemma. For any 0 < p < 1 the identity
\[ \frac{1}{2\pi}\int_{-\infty}^\infty \frac{e^{iax}}{(x - ip)(x + i(1 - p))}\,dx = \begin{cases} e^{-ap} & \text{if } a \ge 0 \\ e^{a(1-p)} & \text{if } a < 0 \end{cases} \]
holds.

Proof. This is a standard application of the Cauchy residue theorem. Consider the case a ≥ 0. Define the semi-circular contour
\[ \Gamma_R = \{x + i0 : -R \le x \le R\} \cup \{Re^{i\varphi} : 0 \le \varphi \le \pi\} \]
in the upper half-plane. By the residue theorem,
\[ \int_{\Gamma_R}\frac{e^{iaz}}{(z - ip)(z + i(1 - p))}\,dz = 2\pi i\,\frac{e^{iaz}}{z + i(1 - p)}\bigg|_{z = ip} = 2\pi e^{-ap} \]
since the integrand is meromorphic with a simple pole at z = ip inside the contour (for R > p), and the contour integral is evaluated in the anticlockwise sense.
On the other hand,
\[ \int_{\Gamma_R}\frac{e^{iaz}}{(z - ip)(z + i(1 - p))}\,dz = \int_{-R}^R \frac{e^{iax}}{(x - ip)(x + i(1 - p))}\,dx + \int_0^\pi \frac{iRe^{-aR\sin\varphi}e^{i(aR\cos\varphi + \varphi)}}{(Re^{i\varphi} - ip)(Re^{i\varphi} + i(1 - p))}\,d\varphi \]
and the second integral vanishes as R → ∞ since a ≥ 0.
The case a < 0 is handled in exactly the same way; just integrate around a semi-circular contour in the lower half-plane enclosing the other pole at −i(1 − p).

Proof of theorem. From the lemma, applied with $a = \log(S_T/K)$, we have the identity
\[ (S_T - K)^+ = S_T - \frac{K^{1-p}}{2\pi}\int_{-\infty}^\infty \frac{e^{p\log S_T + ix\log(S_T/K)}}{(x - ip)(x + i(1 - p))}\,dx. \]
Now multiply by $e^{-rT}$ and compute expectations. The result follows upon interchanging expectation and integration on the right-hand side. This is justified by Fubini's theorem since
\[ E^Q\int_{-\infty}^\infty \left|\frac{e^{p\log S_T + ix\log(S_T/K)}}{(x - ip)(x + i(1 - p))}\right| dx = M_T(p)\int_{-\infty}^\infty \frac{dx}{\sqrt{(x^2 + p^2)(x^2 + (1 - p)^2)}} < \infty. \]
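As a sanity check of the integral representation, one can take a model where both the moment generating function and the call price are known in closed form. The sketch below assumes geometric Brownian motion under Q, for which $M_T(\theta) = \exp(\theta m + \theta^2\sigma^2 T/2)$ with $m = \log S_0 + (r - \sigma^2/2)T$, and evaluates the integral by a trapezoidal rule on a truncated grid. The truncation `x_max`, the grid size `n` and all function names are assumptions of this illustration.

```python
import math
import numpy as np

def bs_mgf(theta, T, S0, r, sigma):
    """M_T(theta) = E^Q[exp(theta log S_T)] for geometric Brownian motion."""
    m = math.log(S0) + (r - 0.5 * sigma**2) * T
    return np.exp(theta * m + 0.5 * theta**2 * sigma**2 * T)

def call_from_mgf(mgf, T, K, S0, r, p=0.5, x_max=60.0, n=20001):
    """Call price from the moment generating function, for any 0 < p < 1.

    The integral over the real line is truncated to [-x_max, x_max]; this is
    adequate here because |M_T(p + ix)| decays like exp(-sigma^2 T x^2 / 2).
    """
    x = np.linspace(-x_max, x_max, n)
    integrand = (mgf(p + 1j * x) * np.exp(-1j * x * math.log(K))
                 / ((x - 1j * p) * (x + 1j * (1.0 - p))))
    dx = x[1] - x[0]
    integral = dx * np.sum(0.5 * (integrand[1:] + integrand[:-1]))  # trapezoidal rule
    return S0 - math.exp(-r * T) * K**(1.0 - p) * integral.real / (2.0 * math.pi)

def bs_call(T, K, S0, r, sigma):
    """Closed-form Black-Scholes call price at time 0, for comparison."""
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * cdf(d1) - K * math.exp(-r * T) * cdf(d2)

S0, r, sigma, T, K = 100.0, 0.02, 0.2, 1.0, 105.0
mgf = lambda theta: bs_mgf(theta, T, S0, r, sigma)
fourier_price = call_from_mgf(mgf, T, K, S0, r)
closed_form = bs_call(T, K, S0, r, sigma)
```

The two prices should agree to high accuracy, and the result should not depend on the choice of p in (0, 1), which is a useful consistency check when the same routine is applied to a model without a closed-form call price.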


Remark. Here is one application of the representation of call prices in terms of the moment generating function. Let $\Lambda_T(p) = \log M_T(p)$ be the cumulant generating function. By standard arguments, the function $\Lambda_T$ is convex and smooth on p ∈ [0, 1]. Note that $\Lambda_T(0) = 0$ and $\Lambda_T(1) = rT + \log S_0$. By the mean value theorem there exists a $p^* \in (0, 1)$ such that $\Lambda_T'(p^*) = rT + \log S_0$. (Alternatively, let $p^*$ be the minimiser of $p \mapsto \Lambda_T(p) - p(rT + \log S_0)$.) By Taylor's formula
\[ \Lambda_T(p) \approx \Lambda_T(p^*) + (rT + \log S_0)(p - p^*) + \frac{1}{2}\Lambda_T''(p^*)(p - p^*)^2 \]
where $\Lambda_T''(p^*) > 0$. Hence
\[ M_T(p^* + ix) \approx M_T(p^*)\,e^{i(rT + \log S_0)x - \frac{1}{2}\Lambda_T''(p^*)x^2}. \]
If we take $p = p^*$ in the theorem and approximate the Bromwich integral by a Gaussian integral, we have the approximation
\[ \begin{aligned} C_0(T, K) &\approx S_0 - \frac{e^{-rT}K^{1-p^*}M_T(p^*)}{2\pi}\int_{-\infty}^\infty \frac{e^{-ixk - \frac{1}{2}\Lambda_T''(p^*)x^2}}{(x - ip^*)(x + i(1 - p^*))}\,dx \\ &\approx S_0 - \frac{K^{1-p^*}e^{-rT}M_T(p^*)}{p^*(1 - p^*)\sqrt{2\pi\Lambda_T''(p^*)}} \end{aligned} \]
where $k = \log(Ke^{-rT}/S_0)$ is the log-moneyness, and the second approximation is appropriate when $\Lambda_T''(p^*) \to \infty$.
6.2. Computing moment generating functions. In order to make use of the previous section, we need to be able to compute the moment generating function for some interesting models. We first consider a general stochastic volatility model:
\[ \begin{aligned} dB_t &= B_t r\,dt \\ dS_t &= S_t(r\,dt + \sqrt{v_t}\,dW_t^S) \\ dv_t &= A(v_t)\,dt + B(v_t)\,dW_t^v \end{aligned} \]
Here $W^S$ and $W^v$ are assumed to be correlated Brownian motions in a fixed equivalent martingale measure Q, with correlation ρ. Correlated Brownian motions can be constructed, for instance, by letting $W^v$ and $W^\perp$ be independent Brownian motions and setting
\[ W_t^S = \rho W_t^v + \sqrt{1 - \rho^2}\,W_t^\perp. \]
Theorem. For each θ ∈ Θ, let F(·, ·; θ) solve the PDE
\[ \frac{\partial F}{\partial t} + [\theta r + (\theta^2 - \theta)v/2]F + (A + \theta\sqrt{v}B\rho)\frac{\partial F}{\partial v} + \frac{1}{2}B^2\frac{\partial^2 F}{\partial v^2} = 0 \]
with boundary condition
\[ F(T, v; \theta) = 1. \]
Then $(M_t)_{0 \le t \le T}$ is a local martingale, where
\[ M_t = e^{\theta\log S_t}F(t, v_t; \theta). \]
Proof. This is just another application of Itô's formula.
The significance of this result is that if we can prove the local martingale is a true martingale, then
\[ E^Q[e^{\theta\log S_T}] = e^{\theta\log S_0}F(0, v_0; \theta) \]
and hence we have found the moment generating function.
To use this result, we need to solve a PDE in one spatial variable. Since the PDE for the option prices would involve two spatial variables (v, S), we are in a better position finding the moment generating function first via the above theorem, though we still need to evaluate the Bromwich integral.

6.3. The Heston model. We now explore a model where the moment generating function can be computed explicitly. It was introduced by Heston in 1993:
\[ \begin{aligned} dB_t &= B_t r\,dt \\ dS_t &= S_t(r\,dt + \sqrt{v_t}\,dW_t^S) \\ dv_t &= \lambda(\bar v - v_t)\,dt + c\sqrt{v_t}\,dW_t^v \end{aligned} \]
with $\langle W^S, W^v\rangle_t = \rho t$. This is just a special case of the stochastic volatility model in the previous subsection with $A(v) = \lambda(\bar v - v)$ and $B(v) = c\sqrt{v}$ for some positive constants λ, v̄, c. In this model the squared volatility v is a mean-reverting process, i.e. an ergodic Markov process, at least under Q. The interpretation of v̄ is the level of mean reversion, while λ is the speed of mean reversion. We will come across this stochastic process again in the context of the Cox–Ingersoll–Ross interest rate model. It was first studied by Feller in the 1950s.
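The Heston PDE is then
\[ \frac{\partial F}{\partial t} + [\theta r + (\theta^2 - \theta)v/2]F + [\lambda\bar v + (\theta c\rho - \lambda)v]\frac{\partial F}{\partial v} + \frac{1}{2}c^2 v\frac{\partial^2 F}{\partial v^2} = 0. \]
It turns out that this PDE can be solved explicitly. The trick is to make the ansatz
\[ F(t, v; \theta) = e^{R(T-t;\theta)v + Q(T-t;\theta)}. \]
Note that the boundary condition F(T, v; θ) = 1 forces R(0; θ) = Q(0; θ) = 0. The PDE becomes
\[ -\dot R v - \dot Q + [\theta r + (\theta^2 - \theta)v/2] + [\lambda\bar v + (\theta c\rho - \lambda)v]R + \frac{1}{2}c^2 v R^2 = 0, \]
where the dot indicates differentiation with respect to the time variable. Notice that the equation can be written in the form
\[ \alpha(T - t; \theta)v + \beta(T - t; \theta) = 0. \]
Now, the above equation should hold for all v, so α(T − t; θ) = 0 = β(T − t; θ), i.e.
\[ \begin{aligned} \dot R &= (\theta^2 - \theta)/2 + (\theta c\rho - \lambda)R + \frac{1}{2}c^2 R^2 \\ \dot Q &= \theta r + \lambda\bar v R. \end{aligned} \]
The equation for R is a Riccati equation which can be solved explicitly. In fact, we do not even have to make any tricky substitutions; separation of variables and partial fractions work well enough:
\[ \begin{aligned} & \dot R = \frac{1}{2}c^2(R - R_+)(R - R_-) \\ \Rightarrow\ & \frac{\dot R}{(R - R_+)(R - R_-)} = \frac{1}{2}c^2 \\ \Rightarrow\ & \frac{1}{R_+ - R_-}\left(\frac{1}{R - R_+} - \frac{1}{R - R_-}\right)\dot R = \frac{1}{2}c^2 \\ \Rightarrow\ & \log\left(\frac{1 - R(\tau)/R_+}{1 - R(\tau)/R_-}\right) = \gamma\tau \\ \Rightarrow\ & R(\tau; \theta) = (\theta^2 - \theta)\frac{e^{\gamma(\theta)\tau} - 1}{(\gamma(\theta) - \theta c\rho + \lambda)e^{\gamma(\theta)\tau} + (\gamma(\theta) + \theta c\rho - \lambda)} \end{aligned} \]
where $\gamma(\theta) = \sqrt{(\lambda - \theta c\rho)^2 - (\theta^2 - \theta)c^2}$ and $R_\pm(\theta) = [(\lambda - \theta c\rho) \pm \gamma(\theta)]/c^2$. And the second equation can be solved:
\[ \begin{aligned} Q(\tau; \theta) &= \theta r\tau + \int_0^\tau \lambda\bar v R(s; \theta)\,ds \\ &= \left(\theta r - \frac{(\theta^2 - \theta)\lambda\bar v}{\gamma(\theta) + \theta c\rho - \lambda}\right)\tau - \frac{2\lambda\bar v}{c^2}\log\left(\frac{(\gamma(\theta) - \theta c\rho + \lambda)e^{\gamma(\theta)\tau} + (\gamma(\theta) + \theta c\rho - \lambda)}{2\gamma(\theta)}\right). \end{aligned} \]
It can be shown that for θ ∈ Θ,
\[ E^Q(e^{\theta\log S_T}) = e^{\theta\log S_0 + R(T;\theta)v_0 + Q(T;\theta)}. \]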
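Closed-form solutions of Riccati systems are easy to get wrong by a sign, so it is worth checking them against a direct numerical integration of the ODEs $\dot R = (\theta^2 - \theta)/2 + (\theta c\rho - \lambda)R + \frac{1}{2}c^2R^2$ and $\dot Q = \theta r + \lambda\bar v R$ with $R(0) = Q(0) = 0$. The sketch below does this for a real θ in (0, 1) with a hand-written fourth-order Runge–Kutta scheme. The function names and parameter values are assumptions of this illustration, and the linear term of Q is written as $\theta r - (\theta^2 - \theta)\lambda\bar v/(\gamma + \theta c\rho - \lambda)$, which is the form that satisfies the ODE.

```python
import math

def heston_RQ_closed(tau, theta, r, lam, vbar, c, rho):
    """Closed-form R(tau; theta) and Q(tau; theta), for real theta in (0, 1)."""
    gamma = math.sqrt((lam - theta * c * rho)**2 - (theta**2 - theta) * c**2)
    a = gamma - theta * c * rho + lam
    b = gamma + theta * c * rho - lam      # strictly positive for theta in (0, 1)
    e = math.exp(gamma * tau)
    R = (theta**2 - theta) * (e - 1.0) / (a * e + b)
    Q = ((theta * r - (theta**2 - theta) * lam * vbar / b) * tau
         - (2.0 * lam * vbar / c**2) * math.log((a * e + b) / (2.0 * gamma)))
    return R, Q

def heston_RQ_ode(tau, theta, r, lam, vbar, c, rho, n=2000):
    """Integrate the Riccati system from R(0) = Q(0) = 0 with classical RK4."""
    def rhs(R):
        dR = 0.5 * (theta**2 - theta) + (theta * c * rho - lam) * R + 0.5 * c**2 * R**2
        dQ = theta * r + lam * vbar * R
        return dR, dQ
    h = tau / n
    R = Q = 0.0
    for _ in range(n):
        k1R, k1Q = rhs(R)
        k2R, k2Q = rhs(R + 0.5 * h * k1R)
        k3R, k3Q = rhs(R + 0.5 * h * k2R)
        k4R, k4Q = rhs(R + h * k3R)
        R += h * (k1R + 2.0 * k2R + 2.0 * k3R + k4R) / 6.0
        Q += h * (k1Q + 2.0 * k2Q + 2.0 * k3Q + k4Q) / 6.0
    return R, Q

params = dict(theta=0.5, r=0.02, lam=1.5, vbar=0.04, c=0.5, rho=-0.7)
R_cf, Q_cf = heston_RQ_closed(1.0, **params)
R_num, Q_num = heston_RQ_ode(1.0, **params)

# Moment generating function E^Q[exp(theta log S_T)] for S0 = 100, v0 = 0.04.
mgf_value = math.exp(params["theta"] * math.log(100.0) + R_cf * 0.04 + Q_cf)
```

For complex θ in the strip Θ the same formulas apply with `cmath` in place of `math`, though care is then needed with the branch of the complex logarithm; that is not attempted here.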
What is the point of this calculation? Although the formula for the moment generating
function is hard to call beautiful, it is very explicit. In particular, given the set of model
parameters (v0 , λ, v̄, c), the function can be evaluated very quickly on a computer, and hence
the Bromwich integral for call prices can be computed numerically quickly. Hence, it is
possible to calibrate the Heston model to market data in a reasonable amount of time. This
is one of the main reasons for its popularity.

7. American claims in local volatility models


We have previously considered American claims in a general complete market in discrete time. Our main tool was the Snell envelope. In this section we will consider American claims in a continuous-time model. We will make a Markovian assumption so that PDE techniques are available.
We work in a market with d + 1 assets, a bank account with dynamics
\[ dB_t = B_t r\,dt \]
and d stocks with dynamics
\[ dS_t = \mathrm{diag}(S_t)(\mu\,dt + \sigma(t, S_t)\,dW_t) \]
where r ≥ 0 and $\mu \in \mathbb{R}^d$ are constants, $\sigma : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^{d \times m}$ is a given function and W is an m-dimensional Brownian motion. Consider the problem faced by a banker who has sold an American contingent claim with maturity T which pays $g(S_t)$ if the claim is exercised at time t ∈ [0, T], and who wishes to trade in the underlying market to hedge his exposure to the optimally exercised claim. The main result is this:
Theorem. Let L be the differential operator defined by
\[ LV = \frac{\partial V}{\partial t} + \sum_{i=1}^d rS^i\frac{\partial V}{\partial S^i} + \frac{1}{2}\sum_{i=1}^d\sum_{j=1}^d a_{i,j}S^iS^j\frac{\partial^2 V}{\partial S^i\partial S^j} - rV \]
where $a = \sigma\sigma^\top$.
Suppose $V : [0, T] \times \mathbb{R}^d \to [0, \infty)$ solves the variational inequality
\[ \max\{LV, g - V\} = 0, \qquad V(T, S) = g(S). \]
Let X be the wealth process started with $X_0 = V(0, S_0)$ and with $\pi_t = \mathrm{grad}\,V(t, S_t)$ shares of stock at time t. Then $X_t \ge g(S_t)$ for all t ∈ [0, T], and there exists a stopping time $\tau_*$ such that $X_{\tau_*} = g(S_{\tau_*})$.
Remark. To see where this variational inequality comes from, let's consider heuristically the Snell envelope of the process Yξ, which should satisfy an equation like
\[ Z_t = \max\{Y_t\xi_t, E[Z_{t+\delta}|\mathcal{F}_t]\} \]
where δ > 0 is a small increment of time, the process ξ specifies the payout of the claim and Y is the state price density. First, let U = Z/Y and let Q be the equivalent martingale measure corresponding to Y and the numéraire B. Then U should satisfy
\[ U_t = \max\{\xi_t, E^Q[e^{-r\delta}U_{t+\delta}|\mathcal{F}_t]\}. \]
Now, since $\xi_t = g(S_t)$ and S is a Markov process under Q, we suspect that $U_t = V(t, S_t)$ for some function V. By Itô's formula
\[ e^{-r\delta}V(t + \delta, S_{t+\delta}) = V(t, S_t) + \int_t^{t+\delta} e^{-r(s-t)}LV(s, S_s)\,ds + \int_t^{t+\delta} e^{-r(s-t)}\,\mathrm{grad}\,V(s, S_s) \cdot \mathrm{diag}(S_s)\sigma(s, S_s)\,d\hat W_s \]
where Ŵ is a Q-Brownian motion. Assuming the stochastic integral is mean-zero, we have
\[ V(t, S) = \max\left\{g(S),\ V(t, S) + E^Q\left[\int_t^{t+\delta} e^{-r(s-t)}LV(s, S_s)\,ds\ \Big|\ S_t = S\right]\right\}. \]
Subtracting V(t, S) from both sides, dividing the second argument by δ and sending δ ↓ 0 yields the variational inequality appearing in the theorem.
Remark. It should not be too surprising that the differential operator L also appeared
in our discussion of hedging European contingent claims. The following proof will proceed
in a similar way to the proof of the robustness of Black–Scholes implied volatility.
Proof. Let X be the wealth process. As usual we have
\[ dX_t = rX_t\,dt + \pi_t \cdot (dS_t - rS_t\,dt). \]
Also by Itô's formula we have
\[ dV(t, S_t) = \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sum_{i=1}^d\sum_{j=1}^d a_{i,j}S^iS^j\frac{\partial^2 V}{\partial S^i\partial S^j}\right)dt + \pi_t \cdot dS_t \]
and hence by subtraction
\[ d(X_t - V(t, S_t)) = [r(X_t - V(t, S_t)) - LV(t, S_t)]\,dt. \tag{*} \]
Since $X_0 = V(0, S_0)$ and LV ≤ 0 by assumption, equation (*) implies $X_t \ge V(t, S_t)$ for all t ∈ [0, T]. The fact that $X_t \ge g(S_t)$ follows from the assumption that g − V ≤ 0.
Finally, let $\tau_* = \inf\{t \ge 0 : V(t, S_t) = g(S_t)\}$. We will show that
\[ X_{\tau_*} = V(\tau_*, S_{\tau_*}) = g(S_{\tau_*}). \]
Note that on $\{\tau_* = 0\}$ we have
\[ X_{\tau_*} = X_0 = V(0, S_0) = V(\tau_*, S_{\tau_*}) \]
by assumption. Also, on $\{\tau_* > 0\}$ note that on the interval $t \in [0, \tau_*]$, we have $LV(t, S_t) = 0$ and hence by equation (*) we have $X_t = V(t, S_t)$.

Remark. We can identify two subsets of $[0, T] \times \mathbb{R}^d$ as follows:
\[ \begin{aligned} \mathcal{S} &= \{(t, S) : g(S) - V(t, S) = 0\} \\ \mathcal{C} &= \{(t, S) : g(S) - V(t, S) < 0\} \end{aligned} \]
The set $\mathcal{S}$ is called the stopping region since if $(t, S_t) \in \mathcal{S}$ it is optimal (from the point of view of the buyer) to exercise the American claim. Similarly, the set $\mathcal{C}$ is called the continuation region, since if $(t, S_t) \in \mathcal{C}$ it is optimal to wait.
In general, it is impossible to find an explicit solution to the American options PDE. However, there is one case [5] where all the calculations can be done.
Example (Infinite horizon put in the Black–Scholes model). Let d = 1 and assume σ > 0 is constant. That is, the stock price is given by the Black–Scholes model. We will study the American option PDE in the case of the infinite horizon put, where $g(S) = (K - S)^+$ and T = ∞.
Since $(S_t)_{t \ge 0}$ is a time-homogeneous process, we can restrict ourselves to functions V that depend on S but not on t. Also, since the payout function is decreasing, we guess that there exists some q ∈ (0, K) such that we should continue if S > q and stop if S < q.
Hence we guess that
\[ \begin{aligned} & V(S) = K - S && \text{if } S < q && \text{(S)} \\ & \frac{1}{2}\sigma^2S^2V'' + rSV' - rV = 0 && \text{if } S > q && \text{(C)} \end{aligned} \]
By the usual techniques of ODEs, the general solution to equation (C) is given by
\[ V(S) = A_0 S + A_1 S^{-a} \]
for some constants $A_0$ and $A_1$, where $a = 2r/\sigma^2$. Since we expect V(S) → 0 as S → ∞, we can conclude that $A_0 = 0$. It remains to solve for the constants q and $A_1$.
Since we expect V to be continuous at S = q, we have
\[ A_1 q^{-a} = K - q. \]
To find another equation, we assume the smooth pasting condition that the derivative V′ is continuous at S = q:
\[ -aA_1 q^{-(a+1)} = -1. \]
From this, we have
\[ q = \frac{aK}{a + 1} \quad\text{and}\quad A_1 = \frac{K^{a+1}a^a}{(a + 1)^{a+1}}. \]
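The formulas for the exercise boundary q and the constant $A_1$ can be verified directly. The sketch below is an illustration with assumed parameter values and an assumed function name (`perpetual_put`); it computes q and $A_1$, builds the candidate value function, and leaves the continuity and smooth pasting conditions to be checked numerically.

```python
def perpetual_put(K, r, sigma):
    """Exercise boundary q, constant A1 and value function V of the perpetual put."""
    a = 2.0 * r / sigma**2
    q = a * K / (a + 1.0)
    A1 = K**(a + 1.0) * a**a / (a + 1.0)**(a + 1.0)

    def V(S):
        # Stopping region S <= q: exercise immediately and receive K - S.
        # Continuation region S > q: decaying solution of the ODE (C).
        return K - S if S <= q else A1 * S**(-a)

    return a, q, A1, V

a, q, A1, V = perpetual_put(K=100.0, r=0.05, sigma=0.3)

# Continuity at the boundary: A1 * q^(-a) should equal K - q.
continuity_gap = A1 * q**(-a) - (100.0 - q)
# Smooth pasting: the derivative of A1 * S^(-a) at S = q should equal -1.
slope_at_q = -a * A1 * q**(-(a + 1.0))
```

One can also confirm on a grid of stock prices that V dominates the payout $(K - S)^+$, as the variational inequality requires.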

[5] Going back at least to H. McKean. Appendix: A free boundary problem for the heat equation arising from a problem in mathematical economics. Industrial Management Review 6: 32-39. (1965)
CHAPTER 6

Interest rate models

1. Bond prices and interest rates


In this last chapter, we explore models for the interest rate term structure. The basic financial instruments in this setting are the zero-coupon bonds.
Definition. A (zero-coupon) bond with maturity T is a European contingent claim that pays exactly [1] one unit of currency at time T. We denote by P(t, T) the price at time t ∈ [0, T] of the bond.
To get a feel for how we should model the bond prices, note that a typical sample path $t \mapsto P(t, T)$ of a zero-coupon bond price will look similar to the sample path of any other asset price. However, note that at maturity the bond is worth its principal value, so P(T, T) = 1. On the other hand, since people prefer to be paid sooner rather than later, the map $T \mapsto P(t, T)$ is usually decreasing. Of course, there are only a finite number of maturities of bonds traded on the fixed income market. But since this number is very large, it is common practice to represent the zero-coupon bond prices as a continuous curve, rather than a discrete set of points.

Rather than speak of bond prices, it is often easier to speak of interest rates. A popular interest rate is the yield y(t, T) at time t of a bond maturing at time T defined by the formula
\[ y(t, T) = -\frac{1}{T - t}\log P(t, T). \]
For us, a more useful interest rate is the forward rate f(t, T) at time t for maturity T, defined by
\[ f(t, T) = -\frac{\partial}{\partial T}\log P(t, T). \]
The yield curve, the forward rate curve and the bond price curve contain the same information, since
\[ P(t, T) = e^{-(T-t)\,y(t,T)} = e^{-\int_t^T f(t,s)\,ds}. \]
The term structure of interest rates refers to the function $T \mapsto P(t, T)$, or equivalently, the price data encoded in either of the functions $T \mapsto y(t, T)$ or $T \mapsto f(t, T)$.
[1] We assume that the bond issuer is absolutely credit worthy, and there is exactly zero probability of default. Therefore, we are not discussing corporate bonds, mortgage-backed securities or the debt of some countries (for instance, Russia famously defaulted in 1998). In fact, there is probably no real-world example of a perfectly risk-free bond. Nevertheless, many practitioners probably still regard U.S. Treasury bonds, which are backed by the 'full faith and credit' of the U.S. government, as virtually risk-free. Though with the current political situation in Washington, this may well change.

There are at least two perspectives to bond price modelling. One is to assume that bonds
are derivative securities, where the underlying asset is a bank or money market account. A
complementary perspective is to consider the bonds as fundamental and the bank account
as a derivative asset (see example sheet 1). We will mostly explore the first perspective in
this chapter, but will return to the second perspective in our study of HJM models.

2. Bank accounts to bond prices and interest rates


Adopting the first perspective mentioned above, we assume that there is a numéraire asset, the bank account, with price dynamics
\[ dB_t = B_t r_t\,dt \]
where the process $r = (r_t)_{t \ge 0}$ is called the spot interest rate or the short interest rate. Of course, the above differential equation has the solution
\[ B_t = B_0 e^{\int_0^t r_s\,ds}. \]
Now, we formulate a condition so that for any collection of maturities $T_1 < \ldots < T_d$, the market $(B_t, P(t, T_1), \ldots, P(t, T_d))_{t \in [0, T_1]}$ has no arbitrage.
Theorem. There is no arbitrage relative to the numéraire if there exists an equivalent measure Q such that the discounted bond price process $(P(t, T)/B_t)_{t \in [0, T]}$ is a local martingale for all T > 0. In particular, there is no arbitrage if
\[ P(t, T) = E^Q(e^{-\int_t^T r_s\,ds}|\mathcal{F}_t) \]
for all 0 ≤ t ≤ T.
Notice that if
\[ P(t, T) = E^Q(e^{-\int_t^T r_s\,ds}|\mathcal{F}_t), \]
and r is suitably well-behaved, we can differentiate the bond price with respect to maturity to recover the forward rate:
\[ f(t, T) = \frac{E^Q(r_T e^{-\int_t^T r_s\,ds}|\mathcal{F}_t)}{E^Q(e^{-\int_t^T r_s\,ds}|\mathcal{F}_t)}. \]
Notice that
\[ \lim_{T \downarrow t} f(t, T) = r_t \]
so the short rate is the left-hand end point of the forward rate curve. (The long rate $\lim_{T \uparrow \infty} f(t, T)$ is the far right-hand end of the curve.)
From common experience, it seems that we should like to model the interest rate $(r_t)_{t \ge 0}$ as a non-negative process. Indeed, if $r_t \ge 0$ for all t ≥ 0 then the map $T \mapsto P(t, T)$ is decreasing. However, for the sake of tractability, this modelling requirement is frequently dropped.

3. Short rate models


We begin with a market that has just the bank account B. We will consider an Itô process short interest rate model of the form
\[ dr_t = a_t\,dt + \beta_t\,dW_t \]
for adapted processes $(a_t)_{t \ge 0}$ and $(\beta_t)_{t \ge 0}$, and a Brownian motion $(W_t)_{t \ge 0}$ for P.
Note that while in a complete stock market model there was only one equivalent martingale measure, here the equivalent martingale measure is not uniquely determined, since the short rate is not traded. However, we know that there is no arbitrage if the market somehow picks an equivalent martingale measure Q to price the bonds. We will assume that the market price of risk is given by the process $(\lambda_t)_{t \ge 0}$ so that
\[ dr_t = \alpha_t\,dt + \beta_t\,d\hat W_t \]
where $d\hat W_t = dW_t + \lambda_t\,dt$ defines a Brownian motion for the measure Q whose martingale density process M is given by $dM_t = -M_t\lambda_t\,dW_t$, and where $\alpha_t = a_t - \beta_t\lambda_t$ defines the risk-neutral drift.
Since we are interested in pricing and hedging, there is no need to model the processes $(a_t)_{t \ge 0}$ and $(\lambda_t)_{t \ge 0}$ separately. However, we must be careful to realise that it is impossible to estimate the distribution of the random variable $\alpha_t$ directly from a time series $r_{t_1}, \ldots, r_{t_n}$.

3.1. Vasicek model. In 1977, Vasicek proposed the following model for the short rate:
\[ dr_t = \lambda(\bar r - r_t)\,dt + \sigma\,d\hat W_t \]
for a parameter r̄ > 0 interpreted as a mean short rate, a mean-reversion parameter λ > 0, and a volatility parameter σ > 0. This stochastic differential equation can be solved explicitly to yield
\[ r_t = e^{-\lambda t}r_0 + (1 - e^{-\lambda t})\bar r + \int_0^t e^{-\lambda(t-s)}\sigma\,d\hat W_s. \]
Note that the short interest rate in the Vasicek model follows an Ornstein–Uhlenbeck process, and in particular, that for each t ≥ 0 the random variable $r_t$ is Gaussian under the measure Q with
\[ E^Q(r_t) = e^{-\lambda t}r_0 + (1 - e^{-\lambda t})\bar r \quad\text{and}\quad \mathrm{Var}^Q(r_t) = \int_0^t e^{-2\lambda(t-s)}\sigma^2\,ds = \frac{\sigma^2}{2\lambda}(1 - e^{-2\lambda t}). \]
Moreover, one can show that the process is ergodic and converges to the invariant distribution $N\big(\bar r, \frac{\sigma^2}{2\lambda}\big)$. In particular, we have
\[ \frac{1}{T}\int_0^T r_s\,ds \to \bar r \quad Q\text{-almost surely.} \]
Please note, however, that in the present framework we can say absolutely nothing about the distribution of $r_t$ for the objective measure P, unless we have a model for the market price of risk.
Since the short rate $r_t$ is Gaussian, the advantage of this type of model is that it is relatively easy to compute prices, for instance of bonds, explicitly. A disadvantage of this model is that there is a chance that $r_t < 0$ for some time t > 0. Recall that a normal random variable can take any real value, both positive and negative. However, for sensible parameter values, the Q-probability of the event $\{r_t < 0\}$ is pretty small.
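We have learned from example sheet 3 that
\[ \begin{aligned} \int_0^T r_t\,dt &= \int_0^T [e^{-\lambda t}r_0 + (1 - e^{-\lambda t})\bar r]\,dt + \int_0^T\int_0^t e^{-\lambda(t-s)}\sigma\,d\hat W_s\,dt \\ &= \int_0^T [e^{-\lambda t}r_0 + (1 - e^{-\lambda t})\bar r]\,dt + \int_0^T\left(\int_s^T e^{-\lambda(t-s)}\,dt\right)\sigma\,d\hat W_s \\ &\sim N\left(\int_0^T [e^{-\lambda t}r_0 + (1 - e^{-\lambda t})\bar r]\,dt,\ \frac{\sigma^2}{\lambda^2}\int_0^T (1 - e^{-\lambda t})^2\,dt\right) \end{aligned} \]
under Q, so that, using the moment generating function of a Gaussian random variable we have
\[ \begin{aligned} P(0, T) &= E^Q[e^{-\int_0^T r_t\,dt}] \\ &= \exp\left(-\int_0^T \left[e^{-\lambda t}r_0 + (1 - e^{-\lambda t})\bar r - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda t})^2\right]dt\right) \end{aligned} \]
so that
\[ f(0, T) = e^{-\lambda T}r_0 + (1 - e^{-\lambda T})\bar r - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda T})^2. \]
By the time-homogeneity of the Vasicek model, we can actually deduce the formula
\[ f(t, t + x) = r_t e^{-\lambda x} + \bar r(1 - e^{-\lambda x}) - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda x})^2. \]
This formula says that for the Vasicek model, the forward rates at time t are an affine function of the short rate at time t. (An affine function is of the form g(x) = ax + b, that is, its graph is a line.)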
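The Vasicek bond price can be cross-checked by simulation. The sketch below is illustrative only: the parameter values, path counts and function names are assumptions. It evaluates the closed-form $P(0, T)$ obtained by carrying out the time integral in the exponent, and compares it with a Monte Carlo estimate of $E^Q[e^{-\int_0^T r_t\,dt}]$ that uses the exact Gaussian transition of the Ornstein–Uhlenbeck process and a trapezoidal rule for the time integral.

```python
import math
import numpy as np

def vasicek_bond(T, r0, rbar, lam, sigma):
    """Closed-form P(0, T): exp(-mean + variance / 2) of the integrated short rate."""
    B = (1.0 - math.exp(-lam * T)) / lam
    mean_int = rbar * T + (r0 - rbar) * B
    var_int = sigma**2 / lam**2 * (T - 2.0 * B + (1.0 - math.exp(-2.0 * lam * T)) / (2.0 * lam))
    return math.exp(-mean_int + 0.5 * var_int)

def vasicek_forward(x, r, rbar, lam, sigma):
    """Forward rate f(t, t + x) as an affine function of the short rate r."""
    e = math.exp(-lam * x)
    return r * e + rbar * (1.0 - e) - 0.5 * sigma**2 / lam**2 * (1.0 - e)**2

def vasicek_bond_mc(T, r0, rbar, lam, sigma, n_paths=20000, n_steps=200, seed=0):
    """Monte Carlo estimate of P(0, T) under Q, with exact OU transitions."""
    rng = np.random.default_rng(seed)
    h = T / n_steps
    decay = math.exp(-lam * h)
    step_sd = sigma * math.sqrt((1.0 - decay**2) / (2.0 * lam))
    r = np.full(n_paths, r0)
    integral = np.zeros(n_paths)
    for _ in range(n_steps):
        r_new = rbar + (r - rbar) * decay + step_sd * rng.standard_normal(n_paths)
        integral += 0.5 * h * (r + r_new)   # trapezoidal rule for the time integral
        r = r_new
    return float(np.mean(np.exp(-integral)))

args = dict(r0=0.03, rbar=0.05, lam=0.5, sigma=0.01)
closed = vasicek_bond(1.0, **args)
simulated = vasicek_bond_mc(1.0, **args)
```

The forward rate formula can be checked in the same spirit by differencing `-log(vasicek_bond(T, ...))` in T and comparing with `vasicek_forward`.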

4. Markovian short rate models


We now study the case when the short rate is Markovian. Assume that
drt = α(t, rt )dt + β(t, rt )dŴt
for some non-random functions α : R+ × R → R and β : R+ × R → R.
As we have learned for Markovian stock models, the price of contingent claims can be expressed in terms of the solution of a PDE:
Theorem. Fix T > 0 and suppose V : [0, T] × R → R satisfies the PDE
\[ \frac{\partial V}{\partial t}(t, r) + \alpha(t, r)\frac{\partial V}{\partial r}(t, r) + \frac{1}{2}\beta(t, r)^2\frac{\partial^2 V}{\partial r^2}(t, r) = rV(t, r), \qquad V(T, r) = 1. \]
Suppose $P(t, T) = V(t, r_t)$. Then the discounted price process
\[ e^{-\int_0^t r_s\,ds}P(t, T) \]
is a Q-local martingale.
Proof. Itô's formula implies
\[ \begin{aligned} d\left(e^{-\int_0^t r_s\,ds}V(t, r_t)\right) &= -r_t e^{-\int_0^t r_s\,ds}V(t, r_t)\,dt + e^{-\int_0^t r_s\,ds}\left(\frac{\partial V}{\partial t}(t, r_t)\,dt + \frac{\partial V}{\partial r}(t, r_t)\,dr_t + \frac{1}{2}\frac{\partial^2 V}{\partial r^2}(t, r_t)\,d\langle r\rangle_t\right) \\ &= e^{-\int_0^t r_s\,ds}\left(\frac{\partial V}{\partial t}(t, r_t) + \alpha(t, r_t)\frac{\partial V}{\partial r}(t, r_t) + \frac{1}{2}\beta(t, r_t)^2\frac{\partial^2 V}{\partial r^2}(t, r_t) - r_tV(t, r_t)\right)dt \\ &\quad + e^{-\int_0^t r_s\,ds}\frac{\partial V}{\partial r}(t, r_t)\beta(t, r_t)\,d\hat W_t. \end{aligned} \]
Since the drift vanishes by assumption, $(P(t, T)/B_t)_{t \in [0, T]}$ is a local martingale.
Remark. In the proof of the preceding theorem, notice that we can only conclude that $M_t = e^{-\int_0^t r_s\,ds}P(t, T)$ is a local martingale since we are using Itô's formula. When is it a true martingale?
Here is a sufficient condition. Suppose that we can show that $r_t \ge 0$ and that 0 ≤ P(t, T) ≤ 1 for all t ≥ 0. In this case, we would have $0 \le M_t \le 1$ and hence M is a true martingale (recall that bounded local martingales are true martingales). In particular, we have the formula
\[ P(t, T) = E^Q\left(e^{-\int_t^T r_s\,ds}\,\Big|\,\mathcal{F}_t\right). \]
4.1. Cox–Ingersoll–Ross model. In 1985, Cox, Ingersoll, and Ross proposed the following
model for the short rate:
$$dr_t = \lambda(\bar r - r_t)\, dt + \sigma \sqrt{r_t}\, d\hat W_t$$
for a parameter r̄ > 0 interpreted as a mean short rate, a mean-reversion parameter λ > 0,
and a volatility parameter σ > 0. The process (rt )t∈R+ satisfying the above stochastic differ-
ential equation is often called a square-root diffusion or CIR process, though this stochastic
process was studied as early as 1951 by Feller. This process was also used by Heston to
model the spot volatility process in an equity market.
Although the CIR stochastic differential equation cannot be solved explicitly, one can
say quite a lot about this process. For instance, one can show that the process is ergodic
and its invariant distribution is a gamma distribution with mean r̄.
An advantage of this model over the Vasicek model is that the short rate rt is non-negative
for all t ≥ 0. Furthermore, explicit formulae are still available for the bond prices.
87
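We can also use the above theorem to compute bond prices. Indeed, fix T > 0 and
consider the PDE
$$\frac{\partial V}{\partial t}(t,r) + \lambda(\bar r - r) \frac{\partial V}{\partial r}(t,r) + \frac{1}{2} \sigma^2 r \frac{\partial^2 V}{\partial r^2}(t,r) = r V(t,r)$$
$$V(T, r) = 1.$$
As we did in the Heston model, we can make the ansatz
$$V(t,r) = e^{r R(T-t) + Q(T-t)}$$
for some functions R and Q which satisfy the boundary conditions R(0) = Q(0) = 0.
Substituting this into the PDE and dividing by V yields
$$(-\dot R r - \dot Q) + \lambda(\bar r - r) R + \frac{\sigma^2}{2} r R^2 = r.$$
Matching the coefficients of r and the constant terms, this time we have
$$\dot R = -\lambda R + \frac{\sigma^2}{2} R^2 - 1$$
$$\dot Q = \lambda \bar r R.$$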
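The equation for R is a Riccati equation, whose solution is
$$R(\tau) = -\frac{2(e^{\gamma\tau} - 1)}{(\gamma + \lambda) e^{\gamma\tau} + (\gamma - \lambda)}$$
$$Q(\tau) = \int_0^\tau \lambda \bar r R(s)\, ds$$
where $\gamma = \sqrt{\lambda^2 + 2\sigma^2}$. The bond prices are too messy to write down, but the forward rates
are given by
$$f(t, t+x) = \frac{4\gamma^2 e^{\gamma x}}{[(\gamma + \lambda) e^{\gamma x} + (\gamma - \lambda)]^2}\, r_t + \frac{2\lambda \bar r (e^{\gamma x} - 1)}{(\gamma + \lambda) e^{\gamma x} + (\gamma - \lambda)}.$$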
In particular, the forward rates for the CIR model are again given by an affine function of
the short rate.
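The Riccati solution quoted above can be sanity-checked by integrating the ODE for R numerically. The sketch below is an illustration only: the function names, the classical fourth-order Runge–Kutta integrator, and the parameter values are assumptions, not taken from the notes. It compares the numerical solution with the closed form and evaluates the forward rate formula.

```python
import math

def cir_R(tau, lam, sigma):
    """Closed-form solution of the Riccati equation with R(0) = 0."""
    gamma = math.sqrt(lam**2 + 2 * sigma**2)
    e = math.exp(gamma * tau)
    return -2 * (e - 1) / ((gamma + lam) * e + (gamma - lam))

def riccati_rhs(R, lam, sigma):
    """Right-hand side of dR/dtau = -lam R + (sigma^2 / 2) R^2 - 1."""
    return -lam * R + 0.5 * sigma**2 * R**2 - 1

def cir_R_numeric(tau, lam, sigma, steps=1000):
    """Integrate the Riccati equation from 0 to tau with the classical RK4 scheme."""
    h = tau / steps
    R = 0.0
    for _ in range(steps):
        k1 = riccati_rhs(R, lam, sigma)
        k2 = riccati_rhs(R + 0.5 * h * k1, lam, sigma)
        k3 = riccati_rhs(R + 0.5 * h * k2, lam, sigma)
        k4 = riccati_rhs(R + h * k3, lam, sigma)
        R += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return R

def cir_forward(r, x, lam, rbar, sigma):
    """Forward rate f(t, t+x) in the CIR model, affine in the short rate r."""
    gamma = math.sqrt(lam**2 + 2 * sigma**2)
    e = math.exp(gamma * x)
    denom = (gamma + lam) * e + (gamma - lam)
    return 4 * gamma**2 * e / denom**2 * r + 2 * lam * rbar * (e - 1) / denom

# Illustrative (hypothetical) parameter values.
lam, rbar, sigma = 0.5, 0.04, 0.2
R_closed = cir_R(5.0, lam, sigma)
R_rk4 = cir_R_numeric(5.0, lam, sigma)
f_short = cir_forward(0.03, 0.0, lam, rbar, sigma)  # at x = 0 the forward rate is the short rate
```

At x = 0 the coefficient of the short rate is 1 and the constant term is 0, consistent with r_t = f(t, t).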

5. The Heath–Jarrow–Morton framework


Starting from a short-rate model, the derived bond prices are necessarily Itô processes.
There is no arbitrage in a factor model since, by construction, there exists an equivalent
martingale measure Q such that all discounted bond prices (P (t, T )/Bt )t∈[0,T ] are local mar-
tingales.

The insight of Heath, Jarrow, and Morton in 1992 was that we can change perspectives
by modelling the bond prices directly.
Motivation. Indeed, suppose we start out with just the bond market, but without the
bank account. We can construct the bank account by considering an investor holding his
wealth in just-maturing bonds. More concretely, suppose at time 0 the investor has B0 units
of wealth. Fix a sequence 0 ≤ t0 < t1 < . . . of times and suppose that during the interval
(ti−1 , ti ] the investor holds all of his wealth in the bond which matures at time ti . If the
88
investor's wealth at time t is denoted by $B_t$, and the number of shares of the just-maturing
bond by $\pi_t$, the budget constraint is
$$B_{t_{i-1}} = \pi_{t_i} P(t_{i-1}, t_i)$$
and the self-financing condition is
$$B_{t_i} = \pi_{t_i}$$
since P(t, t) = 1 for all t. Hence, the rate of change of the wealth is given by
$$\frac{B_{t_i} - B_{t_{i-1}}}{t_i - t_{i-1}} = \frac{B_{t_{i-1}}}{P(t_{i-1}, t_i)} \cdot \frac{1 - P(t_{i-1}, t_i)}{t_i - t_{i-1}}.$$
By taking the limit as $t_i - t_{i-1} \to 0$, we can define the spot rate by
$$r_t = -\frac{\partial}{\partial T} P(t,T) \Big|_{T=t}$$
so that $dB_t = B_t r_t\, dt$ as before.

The usual formulation of the HJM idea is in terms of the forward rates. As usual,
we put ourselves in the context of a probability space (Ω, F, Q) on which we can define a
d-dimensional Brownian motion (Ŵt )t≥0 .
Theorem. Suppose for each T, the forward rate process $(f(t,T))_{t \in [0,T]}$ has dynamics
$$df(t,T) = a(t,T)\, dt + \sum_{i=1}^d \sigma^{(i)}(t,T)\, d\hat W_t^{(i)}$$
for some suitably regular adapted processes $(a(t,T))_{t \in [0,T]}$ and $(\sigma^{(i)}(t,T))_{t \in [0,T]}$. Let the short
rate be given by $r_t = f(t,t)$ and the bank account dynamics by
$$dB_t = B_t r_t\, dt.$$
Finally, let the bond prices be given by
$$P(t,T) = e^{-\int_t^T f(t,s)\, ds}.$$
If
$$a(t,T) = \sum_{i=1}^d \sigma^{(i)}(t,T) \int_t^T \sigma^{(i)}(t,s)\, ds,$$
then the discounted bond prices
$$e^{-\int_0^t r_s\, ds} P(t,T)$$
are local martingales.
Remark. The upshot of the HJM result is that the drift and the volatility of the forward
rate dynamics cannot be prescribed independently. Indeed, they must be related by the
famous formula
$$a(t,T) = \sigma(t,T) \cdot \int_t^T \sigma(t,s)\, ds,$$
usually called the HJM drift condition. Notice that this drift/volatility constraint is not
present in the factor models from the previous sections.
89
The difference with the short rate models is that we are now trying to model the dynamics
of the whole term structure. Indeed, in the HJM framework, we can initialize the model with
any initial forward rate curve T ↦ f (0, T ). Nevertheless, note that any of the short rate or
factor models can be put into the HJM framework, just by choosing the initial forward rate
curve to match the one predicted by the model.
Proof. We must show that for each T > 0, the discounted bond price process
$e^{-\int_0^t r_s ds - \int_t^T f(t,s) ds}$ is a local martingale. Now applying some formal manipulations (we assume enough regularity
that we can appeal to a stochastic Fubini theorem)
$$\begin{aligned}
d\left( \int_0^t r_s\, ds + \int_t^T f(t,s)\, ds \right) &= (r_t - f(t,t))\, dt + \int_t^T df(t,s)\, ds \\
&= \frac{1}{2} \left| \int_t^T \sigma(t,s)\, ds \right|^2 dt + \int_t^T \sigma(t,s)\, ds \cdot d\hat W_t.
\end{aligned}$$
Hence, by Itô's formula, we have
$$dM_t = -M_t \int_t^T \sigma(t,s)\, ds \cdot d\hat W_t$$
where $M_t = e^{-\int_0^t r_s ds} P(t,T)$. Note that M is a stochastic integral with respect to a Brownian
motion, and hence a local martingale. □
We conclude this section with some examples. In these examples, the forward rates are
Gaussian under the measure Q, and hence are vulnerable to the criticism that there is a
positive probability that the rates become negative.

5.1. Ho–Lee. (1986) This model is the simplest possible HJM model. Let d = 1
and $\sigma(t,T) = \sigma_0$ be constant. Then
$$df(t,T) = \sigma_0^2 (T - t)\, dt + \sigma_0\, d\hat W_t$$
or
$$f(t,T) = f(0,T) + \sigma_0^2 (Tt - t^2/2) + \sigma_0 \hat W_t.$$
Here is an unusual feature of this model: if the initial forward rate curve T ↦ f(0, T) is
bounded from below, then for positive times t the forward rates f(t, T) → ∞ as T → ∞.
The short rate is then given by
$$r_t = f(0,t) + \sigma_0^2 t^2/2 + \sigma_0 \hat W_t.$$
Hence, writing $f_0(t) = f(0,t)$, the Ho–Lee model corresponds to the following short rate model:
$$dr_t = (f_0'(t) + \sigma_0^2 t)\, dt + \sigma_0\, d\hat W_t.$$

5.2. Vasicek–Hull–White. (1990) Again let d = 1 but now $\sigma(t,T) = \sigma_0 e^{-\lambda(T-t)}$ for
positive constants $\sigma_0$ and $\lambda$. Then
$$df(t,T) = \frac{\sigma_0^2}{\lambda} e^{-\lambda(T-t)} (1 - e^{-\lambda(T-t)})\, dt + \sigma_0 e^{-\lambda(T-t)}\, d\hat W_t.$$
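The drifts in the Ho–Lee and Vasicek–Hull–White examples both come from the HJM drift condition with d = 1. As a rough numerical check (a sketch only: the helper `hjm_drift`, the midpoint rule, and the parameter values are assumptions made for illustration), one can integrate the volatility over maturities and compare with the two closed-form drifts.

```python
import math

def hjm_drift(sigma, t, T, steps=2000):
    """HJM drift a(t,T) = sigma(t,T) * integral_t^T sigma(t,s) ds for d = 1 (midpoint rule)."""
    h = (T - t) / steps
    integral = h * sum(sigma(t, t + (k + 0.5) * h) for k in range(steps))
    return sigma(t, T) * integral

# Illustrative (hypothetical) parameter values.
sigma0, lam = 0.01, 0.3

def ho_lee_vol(t, T):
    """Constant forward rate volatility."""
    return sigma0

def hull_white_vol(t, T):
    """Exponentially decaying forward rate volatility."""
    return sigma0 * math.exp(-lam * (T - t))

t, T = 1.0, 6.0
a_ho_lee = hjm_drift(ho_lee_vol, t, T)
a_hull_white = hjm_drift(hull_white_vol, t, T)

# Closed-form drifts from the text.
a_ho_lee_exact = sigma0**2 * (T - t)
decay = math.exp(-lam * (T - t))
a_hull_white_exact = sigma0**2 / lam * decay * (1 - decay)
```

Any other deterministic volatility function with the same `(t, T)` signature can be passed to `hjm_drift` in the same way.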
90
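The short rates are given by
$$\begin{aligned}
r_t &= f(0,t) + \int_0^t \frac{\sigma_0^2}{\lambda} e^{-\lambda(t-s)} (1 - e^{-\lambda(t-s)})\, ds + \sigma_0 \int_0^t e^{-\lambda(t-s)}\, d\hat W_s \\
&= f(0,t) + \frac{\sigma_0^2}{2\lambda^2} (1 - e^{-\lambda t})^2 + \sigma_0 \int_0^t e^{-\lambda(t-s)}\, d\hat W_s.
\end{aligned}$$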
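The short rate dynamics are given by
$$\begin{aligned}
dr_t &= \left( f_0'(t) + \frac{\sigma_0^2}{\lambda} e^{-\lambda t} (1 - e^{-\lambda t}) \right) dt + \sigma_0\, d\hat W_t - \lambda \left( \int_0^t \sigma_0 e^{-\lambda(t-s)}\, d\hat W_s \right) dt \\
&= \lambda \left( \frac{f_0'(t)}{\lambda} + f_0(t) + \frac{\sigma_0^2}{2\lambda^2} (1 - e^{-2\lambda t}) - r_t \right) dt + \sigma_0\, d\hat W_t
\end{aligned}$$
where $f_0(t) = f(0,t)$. Hence, the Hull–White extension of the Vasicek model essentially replaces the mean interest rate r̄
with a time-varying, but non-random, mean rate r̄(t).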
5.3. Kennedy. (1994) Note that for the HJM models discussed above, the forward rates
are given by
$$f(t,T) = f(0,T) + \int_0^t \sigma(u,T) \cdot \int_u^T \sigma(u,s)\, ds\, du + \int_0^t \sigma(u,T) \cdot d\hat W_u.$$
If σ is not random, then the distribution of f(t, T) under the risk-neutral measure Q is
Gaussian with mean
$$E^Q[f(t,T)] = f(0,T) + \int_0^t \sigma(u,T) \cdot \int_u^T \sigma(u,s)\, ds\, du$$
and covariance
$$\mathrm{Cov}^Q[f(s,S), f(t,T)] = \int_0^{s \wedge t} \sigma(u,S) \cdot \sigma(u,T)\, du.$$
Kennedy reversed this logic, and considered a Gaussian random field {f(t, T) : 0 ≤ t ≤ T}
with mean µ(t, T) and covariance C(s, t; S, T). Suppose that the covariance has the special
form
$$C(s, t; S, T) = c_{s \wedge t}(S, T)$$
so that, for each fixed T > 0, the increments of $(f(t,T))_{t \in [0,T]}$ are independent. Then the
discounted bond prices are local martingales (actually true martingales since everything is
Gaussian and we can compute the conditional expectations by hand) when the mean is given
by
$$\mu(t,T) = f(0,T) + \int_0^T c_{t \wedge s}(s, T)\, ds.$$
An advantage of this formulation of the Gaussian HJM model is that one is no longer
restricted to finite dimensional Brownian motions, and, therefore, there is much more flex-
ibility to specify the correlation of the increments. For instance, one choice is to have the
correlation of the increments decay exponentially in the difference of the maturities:
$$\mathrm{corr}\big(df(t, t+x),\, df(t, t+y)\big) = e^{-\beta|x-y|}.$$
Since the operator on $L^2(\mathbb R_+)$ with kernel $e^{-\beta|x-y|}$ is not of finite rank, the above correlation
could not be realised by a finite rank HJM model. However, since the operator is positive
91
definite, it can be the correlation of a Gaussian random field. Actually, this model can
be realised as an HJM model driven by an infinite dimensional Brownian motion. See the
book Interest Rate Models: an Infinite Dimensional Stochastic Analysis Perspective by René
Carmona and me for details.
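A finite-dimensional illustration of the positive definiteness claim: restricted to a finite grid of maturities, the kernel gives a correlation matrix that admits a Cholesky factorisation. This is a minimal sketch, assuming an arbitrary grid and an arbitrary value of β; the helper names are hypothetical, and the hand-written Cholesky routine succeeds only when the matrix is positive definite.

```python
import math

def exp_kernel(maturities, beta):
    """Matrix with entries exp(-beta * |x - y|) over a grid of times to maturity."""
    return [[math.exp(-beta * abs(x - y)) for y in maturities] for x in maturities]

def cholesky(a):
    """Lower-triangular L with L L^T = a; raises ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

# Illustrative (hypothetical) grid and decay rate.
grid = [0.5 * k for k in range(1, 9)]
corr = exp_kernel(grid, beta=0.4)
factor = cholesky(corr)

# Reconstruction error of L L^T against the original matrix.
n = len(grid)
max_error = max(
    abs(sum(factor[i][k] * factor[j][k] for k in range(n)) - corr[i][j])
    for i in range(n) for j in range(n)
)
```

Multiplying a vector of independent standard normals by `factor` would give Gaussian increments with the prescribed correlation on this grid; refining the grid raises the rank of the matrix without bound, which is the finite-dimensional shadow of the infinite-rank statement.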

92
CHAPTER 7

Crashcourse on probability theory

These notes are a list of many of the definitions and results of probability theory needed
to follow the Advanced Financial Models course. Since they are free from any motivating
exposition or examples, and since no proofs are given for any of the theorems, these notes
should be used only as a reference. A table of notation is in the appendix.

1. Measures
Definition. Let Ω be a set. A sigma-field on Ω is a non-empty set F of subsets of Ω
such that
(1) if $A \in \mathcal F$ then $A^c \in \mathcal F$,
(2) if $A_1, A_2, \ldots \in \mathcal F$ then $\bigcup_{i=1}^\infty A_i \in \mathcal F$.
The terms sigma-field and sigma-algebra are interchangeable.
The Borel sigma-field B on R is the smallest sigma-field containing every open interval.
More generally, if Ω is a topological space, for instance Rn , the Borel sigma-field on Ω is the
smallest sigma-field containing every open set.
Definition. Let Ω be a set and let F be a sigma-field on Ω. A measure µ on the
measurable space (Ω, F) is a function µ : F → [0, ∞] such that
(1) $\mu(\emptyset) = 0$,
(2) if $A_1, A_2, \ldots \in \mathcal F$ are disjoint then $\mu\left( \bigcup_{i=1}^\infty A_i \right) = \sum_{i=1}^\infty \mu(A_i)$.

Theorem. There exists a unique measure Leb on (R, B) such that


Leb(a, b] = b − a
for every b > a. This measure is called Lebesgue measure.
Definition. A probability measure P on (Ω, F) is a measure such that P(Ω) = 1.
Let Ω be a set, F a sigma-field on Ω, and P a probability measure on (Ω, F). The triple
(Ω, F, P) is called a probability space.
The set Ω is called the sample space, and an element of Ω is called an outcome. A subset
of Ω which is an element of F is called an event.
Let A ∈ F be an event. If P(A) = 1 then A is called an almost sure event, and if
P(A) = 0 then A is called a null event. The phrase ‘almost surely’ is often abbreviated a.s.
A sigma-field is called trivial if each of its elements is either almost sure or null.

2. Random variables
Definition. Let (Ω, F, P) be a probability space. A random variable is a function
X : Ω → R such that the set {ω ∈ Ω : X(ω) ≤ t} is an element of F for all t ∈ R.
93
Let A be a subset of R, and let X be a random variable. We use the notation {X ∈ A}
to denote the set {ω ∈ Ω : X(ω) ∈ A}. For instance, the event {X ≤ t} denotes {ω ∈ Ω :
X(ω) ≤ t}.
The distribution function of X is the function FX : R → [0, 1] defined by
FX (t) = P(X ≤ t)
for all t ∈ R.
We also use the term random variable to refer to measurable functions X from Ω to more
general spaces. In particular, we call a function X : Ω → Rn a random variable or random
vector if X(ω) = (X1 (ω), . . . , Xn (ω)) and Xi is a random variable for each i ∈ {1, . . . , n}.
Definition. Let A be an event in Ω. The indicator function of the event A is the
random variable $1_A : \Omega \to \{0, 1\}$ defined by
$$1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \in A^c \end{cases}$$
for all ω ∈ Ω.

3. Expectations and variances


Definition. Let X be a random variable on (Ω, F, P). The expected value of X is
denoted by E(X) and is defined as follows
• X is simple, i.e. takes only a finite number of values $x_1, \ldots, x_n$.
$$E(X) = \sum_{i=1}^n x_i P(X = x_i).$$
• X ≥ 0 almost surely.
$$E(X) = \sup\{E(Y) : Y \text{ simple and } 0 \le Y \le X \text{ a.s.}\}$$
Note that the expected value of a non-negative random variable may take the value
∞.
• Either $E(X^+)$ or $E(X^-)$ is finite.
$$E(X) = E(X^+) - E(X^-)$$
• X is vector valued and E(|X|) < ∞.
$$E[(X_1, \ldots, X_d)] = (E[X_1], \ldots, E[X_d])$$
A random variable X is integrable iff E(|X|) < ∞ and is square-integrable iff E(X 2 ) < ∞.
The terms expected value, expectation, and mean are interchangeable.
The variance of an integrable random variable X, written Var(X), is
Var(X) = E{[X − E(X)]2 } = E(X 2 ) − E(X)2 .
The covariance of square-integrable random variables X and Y, written Cov(X, Y), is
$$\mathrm{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\} = E(XY) - E(X)E(Y).$$
If neither X nor Y is almost surely constant, then their correlation, written ρ(X, Y), is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)^{1/2}\, \mathrm{Var}(Y)^{1/2}}.$$
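As a small worked example of these definitions, the sketch below computes expectations, variances, the covariance and the correlation for a pair of simple random variables. The joint mass function is made up for illustration, and exact rational arithmetic is used so that the identities hold without rounding.

```python
import math
from fractions import Fraction

# Hypothetical joint mass function of (X, Y) on {0, 1} x {0, 1}.
joint_pmf = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(4, 10),
}

def expect(g):
    """E[g(X, Y)] for simple random variables: sum of values times probabilities."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - mean_x**2
var_y = expect(lambda x, y: y * y) - mean_y**2
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y
# The correlation involves square roots, so it is computed in floating point.
rho = float(cov_xy) / math.sqrt(float(var_x) * float(var_y))
```

Here Cov(X, Y) = 1/10, Var(X) = 1/4 and Var(Y) = 6/25, so the correlation equals 1/√6.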
94
Random variables X and Y are called uncorrelated if Cov(X, Y ) = 0.
Theorem. Let X and Y be integrable random variables.
• linearity: E(aX + bY ) = aE(X) + bE(Y ) for constants a, b.
• positivity: Suppose X ≥ 0 almost surely. Then E(X) ≥ 0 with equality if and only
if X = 0 almost surely.
Definition. For p ≥ 1, the space $L^p$ is the collection of random variables X such that
$E(|X|^p) < \infty$. The space $L^\infty$ is the collection of random variables which are bounded almost
surely.
Theorem (Jensen’s inequality). Let X be a random variable and g : R → R be a convex
function. Then
E[g(X)] ≥ g(E[X])
whenever the expectations exist. If g is strictly convex, the above inequality is strict unless
X is constant.
Theorem (Hölder's inequality). Let X and Y be random variables and let p, q > 1 with
$\frac{1}{p} + \frac{1}{q} = 1$. If $X \in L^p$ and $Y \in L^q$ then
$$E(XY) \le E(|X|^p)^{1/p}\, E(|Y|^q)^{1/q}$$
with equality if and only if either X = 0 almost surely or X and Y have the same sign and
$|Y| = a|X|^{p-1}$ almost surely for some constant a ≥ 0. The case when p = q = 2 is called the
Cauchy–Schwarz inequality.
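The inequalities of Jensen, Hölder and Cauchy–Schwarz can be verified directly on a discrete example. The following sketch uses made-up values and probabilities, with the convex function g(x) = e^x for Jensen and the conjugate pair p = 3, q = 3/2 for Hölder; none of these choices come from the notes.

```python
import math

# Hypothetical discrete distribution: outcome k has probability probs[k],
# on which X takes the value xs[k] and Y takes the value ys[k].
xs = [-1.0, 0.0, 2.0]
ys = [3.0, 1.0, -2.0]
probs = [0.2, 0.5, 0.3]

def expect(g):
    """E[g(X, Y)] as a finite sum over outcomes."""
    return sum(p * g(x, y) for x, y, p in zip(xs, ys, probs))

# Jensen: E[g(X)] >= g(E[X]) for the convex function g = exp.
jensen_lhs = expect(lambda x, y: math.exp(x))
jensen_rhs = math.exp(expect(lambda x, y: x))

# Hoelder with 1/p + 1/q = 1, applied to |XY| (which dominates XY).
p, q = 3.0, 1.5
holder_lhs = expect(lambda x, y: abs(x * y))
holder_rhs = (expect(lambda x, y: abs(x) ** p) ** (1 / p)
              * expect(lambda x, y: abs(y) ** q) ** (1 / q))

# Cauchy-Schwarz is the case p = q = 2.
cs_rhs = math.sqrt(expect(lambda x, y: x * x) * expect(lambda x, y: y * y))
```

Since X is not constant and exp is strictly convex, the Jensen inequality is strict in this example.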
Definition. A random variable X is called discrete if X takes values in a countable set;
i.e. there is a countable set S such that X ∈ S almost surely. If X is discrete, the function
pX : R → [0, 1] defined by pX (t) = P(X = t) is called the mass function of X.
The random variable X is absolutely continuous (with respect to Lebesgue measure) if
and only if there exists a function $f_X : \mathbb R \to [0, \infty)$ such that
$$P(X \le t) = \int_{-\infty}^t f_X(x)\, dx$$
for all t ∈ R, in which case the function $f_X$ is called the density function of X.
If X is a random vector taking values in $\mathbb R^n$, then the density of X, if it exists, is the
function $f_X : \mathbb R^n \to [0, \infty)$ such that
$$P(X \in A) = \int_A f_X(x)\, dx$$
for all Borel subsets $A \subseteq \mathbb R^n$.
Theorem. Let the function g : R → R be such that g(X) is integrable.
If X is a discrete random variable with probability mass function $p_X$ taking values in a
countable set S then
$$E(g(X)) = \sum_{t \in S} g(t)\, p_X(t).$$
If X is an absolutely continuous random variable with density function $f_X$ then
$$E(g(X)) = \int_{-\infty}^\infty g(x)\, f_X(x)\, dx.$$
95
More generally, if X is a random vector valued in $\mathbb R^n$ with density $f_X$ and $g : \mathbb R^n \to \mathbb R$ then
$$E(g(X)) = \int_{\mathbb R^n} g(x)\, f_X(x)\, dx.$$

4. Special distributions
Definition. Let X be a discrete random variable taking values in Z+ with mass function
pX .
The random variable X is called
• Bernoulli with parameter p if
$$p_X(0) = 1 - p \quad \text{and} \quad p_X(1) = p$$
where 0 < p < 1. Then E(X) = p and Var(X) = p(1 − p).
• binomial with parameters n and p, written X ∼ bin(n, p), if
$$p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k} \quad \text{for all } k \in \{0, 1, \ldots, n\}$$
where n ∈ N and 0 < p < 1. Then E(X) = np and Var(X) = np(1 − p).
• Poisson with parameter λ if
$$p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda} \quad \text{for all } k = 0, 1, 2, \ldots$$
where λ > 0. Then E(X) = λ.
• geometric with parameter p if
$$p_X(k) = p(1 - p)^{k-1} \quad \text{for all } k = 1, 2, 3, \ldots$$
where 0 < p < 1. Then E(X) = 1/p.
Definition. Let X be an absolutely continuous random variable with density function $f_X$.
The random variable X is called
• uniform on the interval (a, b), written X ∼ unif(a, b), if
$$f_X(t) = \frac{1}{b - a} \quad \text{for all } a < t < b$$
for some a < b. Then $E(X) = \frac{a + b}{2}$.
• normal or Gaussian with mean µ and variance $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, if
$$f_X(t) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left( -\frac{(t - \mu)^2}{2\sigma^2} \right) \quad \text{for all } t \in \mathbb R$$
for some µ ∈ R and $\sigma^2 > 0$. Then E(X) = µ and Var(X) = $\sigma^2$.
• exponential with rate λ, if
$$f_X(t) = \lambda e^{-\lambda t} \quad \text{for all } t \ge 0$$
for some λ > 0. Then E(X) = 1/λ.
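The stated means and variances of the discrete distributions can be recovered by summing against the mass functions. The sketch below is illustrative only: the parameter values are arbitrary, and the infinite sums for the Poisson and geometric cases are truncated at points where the neglected tail is far below floating-point precision.

```python
import math

def binomial_pmf(k, n, p):
    """Mass function of bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Mass function of the Poisson distribution with parameter lam."""
    return lam**k / math.factorial(k) * math.exp(-lam)

def geometric_pmf(k, p):
    """Mass function of the geometric distribution on {1, 2, 3, ...}."""
    return p * (1 - p) ** (k - 1)

def mean_and_variance(pmf, support):
    """E(X) and Var(X) = E(X^2) - E(X)^2, summing over the given (truncated) support."""
    mean = sum(k * pmf(k) for k in support)
    second_moment = sum(k * k * pmf(k) for k in support)
    return mean, second_moment - mean**2

# Illustrative (hypothetical) parameter values.
bin_mean, bin_var = mean_and_variance(lambda k: binomial_pmf(k, 10, 0.3), range(11))
poi_mean, poi_var = mean_and_variance(lambda k: poisson_pmf(k, 2.5), range(60))
geo_mean, geo_var = mean_and_variance(lambda k: geometric_pmf(k, 0.25), range(1, 400))
```

With these parameters the binomial mean and variance come out as np = 3 and np(1 − p) = 2.1, the Poisson mean as λ = 2.5, and the geometric mean as 1/p = 4.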
96
If X is a random vector valued in $\mathbb R^n$ with density
$$f_X(x) = (2\pi)^{-n/2} \det(V)^{-1/2} \exp\left( -\frac{1}{2} (x - \mu) \cdot V^{-1} (x - \mu) \right)$$
for a positive definite n × n matrix V and vector $\mu \in \mathbb R^n$, then X is said to have the
n-dimensional normal (or Gaussian) distribution with mean µ and variance V, written
$X \sim N_n(\mu, V)$. Then $E(X_i) = \mu_i$ and $\mathrm{Cov}(X_i, X_j) = V_{ij}$.
5. Conditional probability and expectation, independence
Definition. Let B be an event with P(B) > 0. The conditional probability of an event
A given B, written P(A|B), is
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
The conditional expectation of X given B, written E(X|B), is
$$E(X|B) = \frac{E(X 1_B)}{P(B)}.$$
Theorem (The law of total probability). Let $B_1, B_2, \ldots$ be disjoint, non-null events such
that $\bigcup_{i=1}^\infty B_i = \Omega$. Then
$$P(A) = \sum_{i=1}^\infty P(A|B_i) P(B_i)$$
for all events A.
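A concrete instance of conditional probability, conditional expectation given an event, and the law of total probability is a single roll of a fair die. The sketch below is an illustration with a hypothetical partition of the sample space; it uses exact fractions so that the identity holds exactly.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """P(event) under the uniform probability measure on the six outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), for P(B) > 0."""
    return prob(a & b) / prob(b)

def cond_expect(x, b):
    """E(X | B) = E(X 1_B) / P(B), where x maps an outcome to the value of X."""
    return sum(Fraction(x(w), len(SAMPLE_SPACE)) for w in b) / prob(b)

even = {2, 4, 6}
partition = [{1, 2}, {3, 4, 5}, {6}]  # disjoint, non-null, covering the sample space

total = sum(cond_prob(even, b) * prob(b) for b in partition)
mean_given_middle = cond_expect(lambda w: w, {3, 4, 5})
```

Each term P(A|B_i)P(B_i) equals 1/6 here, so the sum recovers P(A) = 1/2.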
Definition. Let $A_1, A_2, \ldots$ be events. If
$$P\left( \bigcap_{i \in I} A_i \right) = \prod_{i \in I} P(A_i)$$
for every finite subset I ⊂ N then the events are said to be independent.
Random variables $X_1, X_2, \ldots$ are called independent if the events $\{X_1 \le t_1\}, \{X_2 \le t_2\}, \ldots$
are independent for every choice of real numbers $t_1, t_2, \ldots$. The phrase 'independent and
identically distributed' is often abbreviated i.i.d.
Theorem. If X and Y are independent and integrable, then
E(XY ) = E(X)E(Y ).
6. Probability inequalities
Theorem (Markov's inequality). Let X be a positive random variable. Then
$$P(X \ge \varepsilon) \le \frac{E(X)}{\varepsilon}$$
for all ε > 0.
Corollary (Chebychev's inequality). Let X be a random variable with E(X) = µ and
Var(X) = $\sigma^2$. Then
$$P(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}$$
for all ε > 0.
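To see how conservative these bounds can be, the sketch below compares them with exact tail probabilities for a binomial random variable. The choice of bin(20, 1/2) and of the thresholds is an assumption made for illustration; the arithmetic is exact.

```python
from fractions import Fraction
from math import comb

n, p = 20, Fraction(1, 2)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

mu = sum(k * q for k, q in pmf.items())
var = sum((k - mu) ** 2 * q for k, q in pmf.items())

# Chebychev: P(|X - mu| >= eps) <= var / eps^2.
eps = 5
chebychev_exact = sum(q for k, q in pmf.items() if abs(k - mu) >= eps)
chebychev_bound = var / eps**2

# Markov: P(X >= level) <= E(X) / level for the non-negative variable X.
level = 15
markov_exact = sum(q for k, q in pmf.items() if k >= level)
markov_bound = mu / level
```

Here the exact two-sided tail is 5425/131072 (about 0.041) against a Chebychev bound of 1/5, so the bound holds with plenty of room.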
97
7. Characteristic functions
Definition. The characteristic function of a real-valued random variable X is the function
$\varphi_X : \mathbb R \to \mathbb C$ defined by
$$\varphi_X(t) = E(e^{itX})$$
for all t ∈ R, where $i = \sqrt{-1}$. More generally, if X is a random vector valued in $\mathbb R^n$ then
$\varphi_X : \mathbb R^n \to \mathbb C$ defined by
$$\varphi_X(t) = E(e^{it \cdot X})$$
is the characteristic function of X.
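For a discrete random variable the expectation in this definition is a finite or countable sum, so the characteristic function can be evaluated directly. The sketch below does this for a binomial random variable and compares the result with the standard closed form (1 − p + p e^{it})^n, which is not stated in these notes; the function names and parameter values are illustrative assumptions.

```python
import cmath
from math import comb

def binomial_cf_by_definition(t, n, p):
    """E(exp(itX)) for X ~ bin(n, p), summing exp(itk) against the mass function."""
    return sum(cmath.exp(1j * t * k) * comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1))

def binomial_cf_closed_form(t, n, p):
    """Standard closed form (1 - p + p exp(it))^n of the same characteristic function."""
    return (1 - p + p * cmath.exp(1j * t)) ** n

# Illustrative (hypothetical) parameters and evaluation points.
n, p = 12, 0.3
points = [-2.0, -0.5, 0.0, 0.7, 3.1]
max_gap = max(abs(binomial_cf_by_definition(t, n, p) - binomial_cf_closed_form(t, n, p))
              for t in points)
value_at_zero = binomial_cf_by_definition(0.0, n, p)
```

Every characteristic function takes the value 1 at t = 0 and has modulus at most 1, which gives two quick sanity checks on any implementation.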
Theorem (Uniqueness of characteristic functions). Let X and Y be real-valued ran-
dom variables with distribution functions FX and FY . Let φX and φY be the characteristic
functions of X and Y . Then
φX (t) = φY (t) for all t ∈ R
if and only if
FX (t) = FY (t) for all t ∈ R.

8. Fundamental probability results


Definition (Modes of convergence). Let $X_1, X_2, \ldots$ and X be random variables.
• $X_n \to X$ almost surely if $P(X_n \to X) = 1$
• $X_n \to X$ in $L^p$, for p ≥ 1, if $E|X|^p < \infty$ and $E|X_n - X|^p \to 0$
• $X_n \to X$ in probability if $P(|X_n - X| > \varepsilon) \to 0$ for all ε > 0
• $X_n \to X$ in distribution if $F_{X_n}(t) \to F_X(t)$ for all points t ∈ R of continuity of $F_X$
Theorem. The following implications hold:
$$\left. \begin{array}{c} X_n \to X \text{ almost surely} \\ \text{or} \\ X_n \to X \text{ in } L^p,\ p \ge 1 \end{array} \right\} \Rightarrow X_n \to X \text{ in probability} \Rightarrow X_n \to X \text{ in distribution}$$
Furthermore, if r ≥ p ≥ 1 then $X_n \to X$ in $L^r$ ⇒ $X_n \to X$ in $L^p$.
Definition. Let $A_1, A_2, \ldots$ be events. The term eventually is defined by
$$\{A_n \text{ eventually}\} = \bigcup_{N \in \mathbb N} \bigcap_{n \ge N} A_n$$
and infinitely often by
$$\{A_n \text{ infinitely often}\} = \bigcap_{N \in \mathbb N} \bigcup_{n \ge N} A_n.$$
[The phrase 'infinitely often' is often abbreviated i.o.]
Theorem (The first Borel–Cantelli lemma). Let $A_1, A_2, \ldots$ be a sequence of events. If
$$\sum_{n=1}^\infty P(A_n) < \infty$$
then $P(A_n \text{ infinitely often}) = 0$.


98
Theorem (The second Borel–Cantelli lemma). Let $A_1, A_2, \ldots$ be a sequence of independent
events. If
$$\sum_{n=1}^\infty P(A_n) = \infty$$
then $P(A_n \text{ infinitely often}) = 1$.
Theorem (Monotone convergence theorem). Let $X_1, X_2, \ldots$ be positive random variables
with $X_n \le X_{n+1}$ almost surely for all n ≥ 1, and let $X = \sup_{n \in \mathbb N} X_n$. Then $X_n \to X$ almost
surely and
$$E(X_n) \to E(X).$$
Theorem (Fatou's lemma). Let $X_1, X_2, \ldots$ be positive random variables. Then
$$E\left( \liminf_{n \uparrow \infty} X_n \right) \le \liminf_{n \uparrow \infty} E(X_n).$$
Theorem (Dominated convergence theorem). Let $X_1, X_2, \ldots$ and X be random variables
such that $X_n \to X$ almost surely. If $E(\sup_{n \ge 1} |X_n|) < \infty$ then
$$E(X_n) \to E(X).$$
Theorem (A strong law of large numbers). Let $X_1, X_2, \ldots$ be independent and identically
distributed integrable random variables with common mean $E(X_i) = \mu$. Then
$$\frac{X_1 + \ldots + X_n}{n} \to \mu \quad \text{almost surely.}$$
Theorem (Central limit theorem). Let $X_1, X_2, \ldots$ be independent and identically distributed
with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$ for each i = 1, 2, . . ., and let
$$Z_n = \frac{X_1 + \ldots + X_n - n\mu}{\sigma \sqrt{n}}.$$
Then $Z_n \to Z$ in distribution, where Z ∼ N(0, 1).
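The central limit theorem can be illustrated by simulation. The sketch below draws i.i.d. exponential random variables with rate 1 (so µ = 1 and σ = 1), forms the standardised sums Z_n, and looks at their empirical mean, variance and the fraction falling in [−1.96, 1.96]. The seed, the sample sizes and the choice of the exponential distribution are assumptions made for illustration; the output is random and only approximately matches the N(0, 1) values.

```python
import math
import random

def standardized_sum(rng, n, mu=1.0, sigma=1.0):
    """Z_n = (X_1 + ... + X_n - n mu) / (sigma sqrt(n)) for i.i.d. exponential(1) samples."""
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(2024)   # fixed seed so that the run is reproducible
n, trials = 200, 2000
z = [standardized_sum(rng, n) for _ in range(trials)]

mean_z = sum(z) / trials
var_z = sum((v - mean_z) ** 2 for v in z) / trials
# For a standard normal, about 95% of the mass lies in [-1.96, 1.96].
coverage = sum(abs(v) <= 1.96 for v in z) / trials
```

Increasing `n` improves the normal approximation of each Z_n, while increasing `trials` only reduces the Monte Carlo noise in the three summary statistics.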

99
R the set of real numbers
R+ the set of non-negative real numbers [0, ∞)
N the set of natural numbers {1, 2, . . .}
C the set of complex numbers
Z the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Z+ the set of non-negative integers {0, 1, 2, . . .}
A^c the complement of a set A, A^c = {ω ∈ Ω : ω ∉ A}

FX the distribution function of a random variable X


pX the mass function of a discrete random variable X
fX the density function of an absolutely continuous random variable X
φX the characteristic function of X

E(X) the expected value of the random variable X


Var(X) the variance of X
Cov(X, Y ) the covariance of X and Y
E(X|B) the conditional expectation of X given the event B

a∧b min{a, b}
a∨b max{a, b}
a+ max{a, 0}
lim sup_{n↑∞} x_n the limit superior of the sequence x1 , x2 , . . .
lim inf_{n↑∞} x_n the limit inferior of the sequence x1 , x2 , . . .
a·b Euclidean inner (or dot) product in R^n, a · b = Σ_{i=1}^n a_i b_i
|a| Euclidean norm in R^n, |a| = (a · a)^{1/2}

X∼ν the random variable X is distributed as the probability measure ν


1A the indicator function of the event A
N (µ, σ 2 ) the normal distribution with mean µ and variance σ 2
Nn (µ, V ) the n-dimensional normal distribution with mean µ ∈ Rn
and variance V ∈ Rn×n
bin(n, p) the binomial distribution with parameters n and p
unif(a, b) the uniform distribution on the interval (a, b)

L^p the set of random variables X with E|X|^p < ∞


Table 1. Notation

100
Index

1FTAP
  continuous time, 59
  discrete time, 17
2FTAP
  continuous time, 64
  discrete time, 37
a.s., 93
absolutely continuous random variable, 95
adapted process, 9
admissible trading strategy, 58
almost sure event, 93
American contingent claims, 39
arbitrage
  absolute
    continuous time, 58
    discrete time, 12
  relative
    continuous time, 58
attainable
  discrete-time, 34
Bernoulli random variable, 96
binomial random variable, 96
Black–Scholes formula, 66
Black–Scholes model, 65
Black–Scholes PDE, 68
bond, 83
Borel sigma-field, 93
Borel–Cantelli lemmas, 98, 99
Breeden–Litzenberger formula, 73
Bromwich integral, 75
Brownian motion, 44
call option
  American, 31
  European, 31
Cameron–Martin–Girsanov theorem, 53
Cauchy residue theorem, 75
Cauchy–Schwarz inequality, 95
central limit theorem, 99
Chebychev’s inequality, 97
CIR model, 87
complete market
  discrete time, 37
conditional expectation
  existence and uniqueness, 14
  given a sigma-field, 14
  given an event, 97
continuation region, 80
contour integration, 75
Cox–Ingersoll–Ross model, 87
density function, 95
discounted price relative to a numéraire, 26
discrete random variable, 95
dominated convergence theorem, 99
Doob decomposition, 40
Dupire’s formula, 72
equivalent martingale measure
  discrete time, 26
equivalent measures, 25
exponential random variable, 96
Fatou’s lemma, 99
Feynman–Kac PDE, 67
filtration, 9
forward rate, 84
fundamental theorem of asset pricing
  first
    continuous time, 59
    discrete time, 17
  second
    continuous time, 64
    discrete time, 37
Gaussian random variable, 96
Gaussian random vector, 97
geometric random variable, 96
Girsanov’s theorem, 53
Hölder’s inequality, 95
101
Heath–Jarrow–Morton drift condition, 89
Heston model, 77
historical probability measure, 26
HJM drift condition, 89
Ho–Lee model, 90
Hull–White extension of Vasicek, 90
i.i.d., 97
implied volatility, 70
incomplete market
  discrete time, 37
independent events, 97
independent random variables, 97
indicator function, 94
integrable random variable, 94
interest rate term structure, 84
Itô process, 48
Itô’s formula
  multi-dimensional version, 52
  scalar version, 48
Itô’s isometry, 45
Jensen’s inequality, 95
Kennedy model, 91
Kolmogorov equation, 67
law of iterated expectations, 15
Lebesgue measure, 93
local martingale, 19
local volatility, 72
long interest rate, 85
market price of risk, 61
Markov’s inequality, 97
martingale, 15
martingale deflator
  discrete-time, 14
martingale representation theorem, 53
martingale transform, 17
mass function, 95
mean-reverting process, 77
measurable with respect to a sigma-field, 14
measure, 93
  probability, 93
monotone convergence theorem, 99
multivariate Gaussian, 97
multivariate normal distribution, 97
natural filtration of a process, 16
normal random variable, 96
normal random vector, 97
Novikov’s criterion, 53
null event, 93
numéraire, 24
objective probability measure, 26
optimal stopping time, 41
Poisson random variable, 96
predictable
  discrete time, 11
predictable process
  continuous time, 46
predictable sigma-field, 46
previsible
  discrete time, 11
probability density function, 95
probability mass function, 95
put option, 35
put-call parity, 35
quadratic co-variation, 51
quadratic variation, 49
Radon–Nikodym derivative, 25
Radon–Nikodym theorem, 25
replicable
  discrete-time, 34
Riccati equation, 78
risk-less asset
  discrete-time, 29
risk-neutral measure, 29
self-financing
  continuous time, 55
  discrete time, 11
short interest rate, 84
sigma-algebra, 93
sigma-field, 93
simple predictable process, 44
simple random variable, 94
smile, implied volatility, 70
smirk, implied volatility, 70
smooth pasting, 81
Snell envelope
  discrete time, 40
spot interest rate, 84
square-integrable random variable, 94
statistical probability measure, 26
stochastic integral
  discrete time, 17
stopping region, 80
stopping time, 19
strike price, 31
strong law of large numbers, 99
submartingale, 21
suicide strategy, 58
102
super-replication, 32
supermartingale, 21

term structure of interest rates, 84


tower property, 15
trivial sigma-field, 11, 93

uniform random variable, 96


usual conditions, 44

Vasicek model, 85

Wiener process, 44

yield curve, 83

zero-coupon bond, 83

103
