
Chapter 7

Martingales

7.1 Conditional probability with respect to a σ-algebra


Given a probability space (Ω, F, P) and an event B ∈ F such that P(B) ≠ 0, for any A ∈ F it is
possible to define the conditional probability of A given B, denoted P(A|B), as

P(A|B) = P(A ∩ B) / P(B),   A ∈ F.

In fact, the Markov property for stochastic processes (Xt)_{t∈T}, where T ⊂ R⁺, with discrete state
space S is formulated in terms of an equality between conditional probabilities, namely:

P(X_{t_{k+1}} = i_{k+1} | X_{t_1} = i_1, ..., X_{t_k} = i_k) = P(X_{t_{k+1}} = i_{k+1} | X_{t_k} = i_k),

for any choice of k ∈ N, 0 ≤ t_1 < t_2 < ... < t_k < t_{k+1}, i_1, ..., i_{k+1} ∈ S.
However, if the random variables have an absolutely continuous distribution, then the formula
above is no longer meaningful, since events of the form {X_{t_k} = i_k} have probability 0!
The aim of the present chapter is to provide a generalization of the concept of conditional proba-
bility that:
• allows one to deal with the notion of conditional probability even with respect to events B such
that P(B) = 0,
• can be applied to a definition of the Markov property for stochastic processes (Xt)_{t∈T} such
that Xt is continuously distributed.

This problem can be solved by introducing the notion of conditional probability with respect to a
sub-σ-algebra.
Theorem 41. Let (Ω, F, P) be a probability space, G ⊂ F a sub-σ-algebra of F and A ∈ F a fixed
event.
There exists a map P(A|G) : Ω → R called conditional probability of A with respect to G with the
following properties:
i. P(A|G) is G-measurable and integrable.

ii. For any G ∈ G the following holds:

∫_G P(A|G)(ω) dP(ω) = P(A ∩ G).

Such a random variable is unique up to probability-0 sets, i.e. if f : Ω → R is another map
satisfying i. and ii. then P(f ≠ P(A|G)) = 0. In this case f is called a version of P(A|G).
The proof relies on the Radon-Nikodym theorem, which we recall below.
Theorem 42 (Radon-Nikodym). Let µ, ν be two σ-finite measures on some measurable space
(Ω, F). Let us assume that ν is absolutely continuous with respect to µ¹.
Then there exists a function f : Ω → R such that f is F-measurable and for any E ∈ F

ν(E) = ∫_E f(ω) dµ(ω).

Such a function is called a density.


If f̃ : Ω → R is another map with these properties then µ(f̃ ≠ f) = 0.
For a proof of the Radon-Nikodym theorem see, e.g., [9].

Proof: (of Theorem 41)

Let us apply the Radon-Nikodym theorem to the particular case where the measurable space is
(Ω, G), the measure µ is the probability measure P restricted to the σ-algebra G, while the measure
ν is defined as ν(G) := P(A ∩ G). It is easy to verify that ν is absolutely continuous with respect
to µ, hence, by the Radon-Nikodym theorem, there exists a G-measurable map P(A|G) such that
ν(G) = ∫_G P(A|G)(ω) dµ(ω), i.e.:

P(A ∩ G) = ∫_G P(A|G)(ω) dP(ω).

Furthermore, any other density f̃ with these properties differs from P(A|G) on sets which have
zero probability, i.e. P(f̃ ≠ P(A|G)) = 0.
Let us consider now some particular examples.
Example 23. Let us consider a partition {B_i}_i of Ω, i.e. a finite or countable family of disjoint
sets such that Ω = ∪_i B_i, and let G = σ(B_i, i) be the σ-algebra generated by the partition. A
G-measurable map must be constant on the sets B_i. In particular, for any event A ∈ F, the
conditional probability of A given G will have the following form:

P(A|G)(ω) = P(A|B_i)   if ω ∈ B_i and P(B_i) ≠ 0,
P(A|G)(ω) = c_i        if ω ∈ B_i and P(B_i) = 0,

where the constants c_i can be arbitrarily chosen in R. Two different choices correspond to different
versions of P(A|G)(ω).
¹ This means that if E ∈ F is such that µ(E) = 0 then ν(E) = 0.

Example 24. Let us consider a particular case of the previous example. Let (Nt )t∈R+ be a
Poisson process with rate α > 0. Let 0 ≤ s ≤ t and consider the sub-σ-algebra G generated by
the events Bi = {Nt = i}, i ∈ N, i.e. the sub-σ-algebra generated by the random variable Nt . Let
A ∈ F be the event A = {Ns = 0}. As shown in the previous example, the conditional probability
P(Ns = 0|G) is given by

P(Ns = 0|G)(ω) = Σ_i P(Ns = 0 | Nt = i) 1_{Nt = i}(ω),

where

P(Ns = 0 | Nt = i) = P(Ns = 0, Nt = i) / P(Nt = i) = P(Ns = 0, Nt − Ns = i) / P(Nt = i) = (1 − s/t)^i,

hence

P(Ns = 0|G)(ω) = Σ_i (1 − s/t)^i 1_{Nt = i}(ω) = (1 − s/t)^{Nt(ω)}.
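As a sanity check, the identity P(Ns = 0|G) = (1 − s/t)^{Nt} can be verified by Monte Carlo simulation, exploiting the independence of Poisson increments. This is only a sketch; the parameter values are arbitrary choices, not taken from the text:

```python
import math
import random

random.seed(0)

def poisson(lam):
    # Knuth's product-of-uniforms sampler for a Poisson random variable
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

alpha, s, t = 2.0, 1.0, 3.0
counts = {}  # i -> (number of paths with Nt = i, number of those with Ns = 0)
for _ in range(200_000):
    ns = poisson(alpha * s)             # N_s ~ Poisson(alpha*s)
    nt = ns + poisson(alpha * (t - s))  # independent increment N_t - N_s
    tot, zero = counts.get(nt, (0, 0))
    counts[nt] = (tot + 1, zero + (ns == 0))

for i in range(1, 5):
    tot, zero = counts[i]
    # empirical P(Ns = 0 | Nt = i) vs the formula (1 - s/t)^i
    print(i, round(zero / tot, 3), round((1 - s / t) ** i, 3))
```

The empirical conditional frequencies agree with (1 − s/t)^i up to Monte Carlo error.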

Example 25. If G = {∅, Ω} then P(A|G) = P(A).


Example 26. If G = F then P(A|G) = 1A .
Example 27. If A is independent of G then P(A|G) = P(A).
In the particular case where the sub-σ-algebra G is generated by a random variable X : Ω → R,
i.e. if G = σ(X), then the conditional probability P(A|G) will be denoted by

P(A|G) ≡ P(A|X).

By Doob’s theorem the G-measurable map P(A|G) will be of the form P(A|G)(ω) = g(X(ω)) for
some Borel measurable function g : R → R. For instance, in Example 24 the σ-algebra G is
generated by the random variable Nt and the map P(A|G) has the form P(A|G)(ω) = g(Nt(ω)),
where g(x) = (1 − s/t)^x.
The notion of conditional probability with respect to a σ-algebra allows one to generalize the
definition of the Markov property to the case of stochastic processes (Xt)_t such that the random
variables Xt are continuously distributed.
Definition 42. A stochastic process (Xt )t≥0 has the Markov property if ∀t ≤ u, I ∈ B(R) the
following holds
P(Xu ∈ I|Xt ) = P(Xu ∈ I|Xs , s ≤ t) a.s. (7.1)

7.2 Conditional expectation with respect to a σ-algebra


The notion of conditional probability with respect to a σ-algebra can be extended to the notion of
conditional expectation with respect to a σ-algebra.
Theorem 43. Let X be a random variable on a probability space (Ω, F, P) such that E[|X|] < +∞.
Let G ⊂ F be a sub-σ-algebra. Then there exists a map E[X|G] : Ω → R such that
i. E[X|G] is G-measurable.
ii. ∫_G E[X|G](ω) dP(ω) = ∫_G X(ω) dP(ω) for all G ∈ G.

E[X|G] is called the conditional expectation of X with respect to G. It is unique up to P-null sets, i.e.
for any Z : Ω → R satisfying i. and ii., P(Z ≠ E[X|G]) = 0. In this case Z is called a version of
E[X|G].

Proof:
1. 1st step. If X ≥ 0, let us consider the positive finite measure ν on (Ω, G) defined as

ν(G) = ∫_G X(ω) dP(ω),   G ∈ G.

It is absolutely continuous with respect to the measure µ on (Ω, G) defined as the restriction
of P to G. By the Radon-Nikodym theorem there exists a G-measurable map f such that
ν(G) = ∫_G f dµ. It is sufficient to set E[X|G] = f.
2. 2nd step. In the general case it is sufficient to write X = X⁺ − X⁻, where X⁺ = max(0, X)
and X⁻ = max(0, −X), and set E[X|G] = E[X⁺|G] − E[X⁻|G], where E[X⁺|G] and E[X⁻|G]
are constructed according to the procedure described in the first step.

Example 28. If X = 1A then E[X|G] = P(A|G).


Example 29. If X is G-measurable, then E[X|G] = X a.s.
Example 30. If the sub-σ-algebra G is generated by a partition {B_i}_i, then any version of the
conditional expectation of a random variable X with respect to G has the following form:

E[X|G](ω) = Σ_i c_i 1_{B_i}(ω),   ω ∈ Ω,

where c_i = (1/P(B_i)) ∫_{B_i} X(ω) dP(ω) if P(B_i) ≠ 0, while c_i can have an arbitrary real value in the case
P(B_i) = 0.
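On a finite sample space this formula can be implemented directly. Below is a small sketch with a hypothetical uniform six-point space and a two-block partition (all values are made up for illustration):

```python
from fractions import Fraction

# Hypothetical finite sample space with uniform probability
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}           # the random variable X(w) = w

partition = [{0, 1, 2}, {3, 4, 5}]  # generates the sub-sigma-algebra G

def cond_exp(X, partition):
    # E[X|G] is constant on each block B_i, equal to
    # c_i = (1/P(B_i)) * (integral of X over B_i)
    out = {}
    for B in partition:
        pB = sum(P[w] for w in B)
        c = sum(X[w] * P[w] for w in B) / pB
        for w in B:
            out[w] = c
    return out

Y = cond_exp(X, partition)
print(Y[0], Y[3])                   # block averages: 1 and 4
# defining property ii.: integrals over each block agree
for B in partition:
    assert sum(Y[w] * P[w] for w in B) == sum(X[w] * P[w] for w in B)
```

Exact rational arithmetic (`Fraction`) makes the defining property ii. hold as an identity rather than up to floating-point error.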
In the case where G is generated by a random variable Y , i.e. if G = σ(Y ) then the conditional ex-
pectation of a random variable X with respect to σ(Y ) is denoted E[X|Y ]. By Doob’s theorem we
have that there exists a Borel measurable function g : R → R such that E[X|σ(Y )](ω) = g(Y (ω)).
In particular for any y ∈ R, we shall adopt the heuristic notation g(y) ≡ E[X|Y = y].

Let X, X1, X2 be integrable random variables on a probability space (Ω, F, P), G ⊂ F a sub-σ-algebra
and α, β two real constants. The conditional expectation has the following properties:

1. E[αX1 + βX2 |G] = αE[X1 |G] + βE[X2 |G] a.s.


2. if X ≥ 0 then E[X|G] ≥ 0 a.s.
3. E[E[X|G]] = E[X]

4. if Z : Ω → R is a bounded G− measurable function then E[ZX|G] = ZE[X|G] a.s.


5. if G′ ⊂ G is a sub-σ-algebra of G then E[E[X|G]|G′] = E[X|G′] a.s.

105
6. if X is independent of G then E[X|G] = E[X] a.s.
Furthermore, if X, {Xn}_n are real integrable random variables then
7. if Xn ↗ X a.s. then E[Xn|G] ↗ E[X|G] a.s.
8. if Xn → X a.s. and |Xn| ≤ Y for all n, for some integrable function Y, then E[Xn|G] → E[X|G]
a.s.
Moreover, Jensen's inequality holds: if φ : R → R is a convex function², then

φ(E[X|G]) ≤ E[φ(X)|G] a.s.   (7.2)

Example 31. Let (Nt)_{t≥0} be a Poisson process with rate α > 0, let s, t ∈ R⁺, with s < t, and let
G = σ(Ns). The conditional expectation of Nt with respect to σ(Ns) can be computed according to
the following procedure:

E[Nt|Ns] = E[Nt − Ns + Ns|Ns]        (7.3)
         = E[Nt − Ns|Ns] + E[Ns|Ns]  (7.4)
         = E[Nt − Ns] + Ns           (7.5)
         = α(t − s) + Ns             (7.6)

where (7.4) follows from the linearity of conditional expectation (property 1.), (7.5) follows from the
independence of Nt − Ns and σ(Ns) (property 6.) and from the fact that Ns is σ(Ns)-measurable,
while (7.6) comes from the properties of the Poisson distribution.
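The result E[Nt|Ns] = α(t − s) + Ns can be checked numerically by grouping simulated paths according to the value of Ns. A sketch, with arbitrary parameter values:

```python
import math
import random

random.seed(1)

def poisson(lam):
    # Knuth's product-of-uniforms Poisson sampler
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

alpha, s, t = 1.5, 2.0, 5.0
sums, counts = {}, {}
for _ in range(200_000):
    ns = poisson(alpha * s)
    nt = ns + poisson(alpha * (t - s))  # independent increment
    sums[ns] = sums.get(ns, 0) + nt
    counts[ns] = counts.get(ns, 0) + 1

for k in range(5):
    # empirical E[Nt | Ns = k] vs alpha*(t - s) + k
    print(k, round(sums[k] / counts[k], 2), alpha * (t - s) + k)
```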

7.2.1 An alternative interpretation of conditional expectation


Let us now restrict to the case of random variables X on a probability space (Ω, F, P) such that
E[X²] < +∞. Let us consider the real Hilbert space H = L²(Ω, F, P), i.e. the vector space of
(equivalence classes³ of) F-measurable maps X : Ω → R such that E[X²] < +∞. The inner
product in H is defined as

⟨X, Y⟩ := E[XY] = ∫_Ω X(ω)Y(ω) dP(ω),   X, Y ∈ H,

and the norm ‖X‖ is induced by the inner product in the natural way, i.e. ‖X‖² = ⟨X, X⟩.
It is well known (see, e.g., [9]) that H endowed with the distance induced by the norm, i.e.
d(X, Y) = ‖X − Y‖, is a complete metric space.
If G ⊂ F is a sub-σ-algebra of F, let us also introduce the Hilbert space K = L2 (Ω, G, P). Clearly
K is a closed subspace of H.
Let us consider the linear operator P : H → K defined as

P (X) := E[X|G], X∈H

Clearly, by property 1 of conditional expectation, P is linear. Furthermore, by definition of conditional
expectation, the random variable P(X) = E[X|G] is G-measurable. In order to prove that
² φ(tx + (1 − t)y) ≤ tφ(x) + (1 − t)φ(y) for any t ∈ [0, 1] and x, y ∈ R. In particular this property yields the
continuity of φ on R, hence its Borel measurability.
³ X ∼ X′ if P(X ≠ X′) = 0.

it is an element of K, we have to prove that E[(P(X))²] < ∞. By Jensen's inequality we obtain
(E[X|G])² ≤ E[X²|G]. By taking the expectation of both sides,

E[(E[X|G])²] ≤ E[E[X²|G]] = E[X²] < +∞,   (7.7)

where in the second step we used property 3.
Equation (7.7) shows that the operator P : H → H is bounded, hence continuous. The image of H
under the action of P is the space K. In fact, P is the orthogonal projection operator from H to
K. Indeed, for any W ∈ K we have:

⟨W, X − P(X)⟩ = E[W(X − E[X|G])] = E[WX] − E[W E[X|G]] = E[WX] − E[E[WX|G]]   (7.8)
              = E[WX] − E[WX] = 0,                                             (7.9)

where the third equality follows from property 4, W being G-measurable.

Further, if we pick G′ ⊂ G and consider the set J = L²(Ω, G′, P), we have that J ⊂ K is a closed
subspace. If we define the projection P_J^K : K → J as the restriction to K of the projection
P_J^H : H → J, we have P_J^H = P_J^K ∘ P_K^H, i.e. E[X|G′] = E[E[X|G]|G′], X ∈ H.
Eq. (7.8) also allows one to prove that E[X|G] is the element of K which minimizes the distance from
X. Indeed, for any W ∈ K, the square of the distance d(W, X) is given by

(d(W, X))² = ‖W − X‖² = E[|W − X|²]
           = E[(W − E[X|G])²] + E[(E[X|G] − X)²] + 2E[(W − E[X|G])(E[X|G] − X)]
           = E[(W − E[X|G])²] + E[(E[X|G] − X)²] ≥ E[(E[X|G] − X)²],

where the cross term vanishes by (7.8), since W − E[X|G] ∈ K.
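This minimization property can be illustrated on a finite sample space: among all G-measurable candidates (constant on each block of a partition), the block averages of X minimize the mean squared distance. A toy sketch with hypothetical values:

```python
# 4-point sample space with uniform probability; all numbers are made up.
X = [1.0, 3.0, 2.0, 6.0]
blocks = [[0, 1], [2, 3]]  # partition generating G
p = 0.25                   # uniform probability of each point

def mse(c1, c2):
    # E[(W - X)^2] for the G-measurable map W = c1 on block 0, c2 on block 1
    W = [c1, c1, c2, c2]
    return sum(p * (W[w] - X[w]) ** 2 for w in range(4))

# brute-force search over a grid of G-measurable candidates
best = min(((mse(a / 10, b / 10), a / 10, b / 10)
            for a in range(81) for b in range(81)), key=lambda r: r[0])
print(best[1], best[2])    # the minimizer: the block averages 2.0 and 4.0
```

The grid search recovers the block averages (2.0 and 4.0), i.e. exactly the conditional expectation of Example 30.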

7.3 Martingales
Definition 43. A real valued stochastic process (Ω, F, (Ft )t∈T , (Xt )t∈T , P), T ⊂ R+ , is a martin-
gale if
i. E[|Xt |] < +∞ for all t ∈ T
ii. E[Xt |Fs ] = Xs , a.s. for all s, t ∈ T such that s ≤ t.
If the filtration (Ft )t∈T is not specified it is understood that it is the natural one, i.e. Ft =
σ(Xs , s ≤ t).
Example 32. Let us consider the case where T = N. Let {ξ_j}_{j≥1} be a sequence of inde-
pendent centered integrable real random variables. Let Xn := Σ_{j=1}^n ξ_j and (Fn)_{n≥1} be the
natural filtration. By the particular construction of the random variables (Xn)_{n≥1} we have
Fn = σ(X1, ..., Xn) = σ(ξ1, ..., ξn).
The stochastic process (Ω, F, (Fn)_{n≥1}, (Xn)_{n≥1}, P) is a martingale; indeed, for n ≥ m:

E[Xn|Fm] = E[Xm|Fm] + Σ_{j=m+1}^n E[ξj|Fm] = Xm + Σ_{j=m+1}^n E[ξj] = Xm   a.s.
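For the simple symmetric random walk (ξ_j = ±1 with probability 1/2) the martingale property E[Xn|Fm] = Xm can be seen empirically by grouping simulated paths on the value of Xm (for this walk, by the Markov property, conditioning on Xm is equivalent to conditioning on Fm). A sketch:

```python
import random

random.seed(2)

m, n = 5, 12
sums, counts = {}, {}
for _ in range(100_000):
    xm = sum(random.choice((-1, 1)) for _ in range(m))           # X_m
    xn = xm + sum(random.choice((-1, 1)) for _ in range(n - m))  # X_n
    sums[xm] = sums.get(xm, 0) + xn
    counts[xm] = counts.get(xm, 0) + 1

for v in sorted(counts):
    # the conditional average of X_n given X_m = v should be close to v
    print(v, round(sums[v] / counts[v], 2))
```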

Actually, the case where T = N has a gambling interpretation. If Xn represents the "fortune"
(the amount of money) of the gambler after the n-th bet, condition ii. of Definition 43, namely
E[Xn+1|Fn] = Xn, is interpreted as the (almost sure) equality between the present fortune and the
expected fortune after the next play. A martingale is considered a "fair game".

Remark 14. If (Ω, F, (Ft )t∈T , (Xt )t∈T , P) is a martingale then E[Xt ] is a constant function of
the time variable t. Indeed for any s ≤ t:

E[Xs ] = E[E[Xt |Fs ]] = E[Xt ]

Example 33. Let X be an integrable real random variable on (Ω, F, P) and (Ft )t∈T a filtration.
Define Xt := E[X|Ft ], t ∈ T . Then the process (Ω, F, (Ft )t∈T , (Xt )t∈T , P) is a martingale. Indeed
for any t ∈ T the random variable Xt is integrable, since the function φ : R → R given by
φ(x) := |x| is convex, and by Jensen’s inequality

|Xt | = |E[X|Ft ]| ≤ E[|X||Ft ]

hence
E[|Xt |] ≤ E[E[|X||Ft ]] = E[|X|] < +∞.
Further, for s ≤ t:
E[Xt |Fs ] = E[E[X|Ft ]|Fs ] = E[X|Fs ] = Xs a.s.
Example 34. If (Xt)_{t≥0} is an integrable process with independent and centered increments, then
it is a martingale. Indeed, for s ≤ t:

E[Xt|Fs] = E[Xt − Xs|Fs] + E[Xs|Fs] = E[Xt − Xs] + Xs = 0 + Xs   a.s.

Definition 44. A real valued stochastic process (Ω, F, (Ft)_{t∈T}, (Xt)_{t∈T}, P), T ⊂ R⁺, is a sub-
martingale if
i. E[|Xt|] < +∞ for all t ∈ T
ii. E[Xt|Fs] ≥ Xs, a.s. for all s, t ∈ T such that s ≤ t.
Definition 45. A real valued stochastic process (Ω, F, (Ft)_{t∈T}, (Xt)_{t∈T}, P), T ⊂ R⁺, is a super-
martingale if
i. E[|Xt|] < +∞ for all t ∈ T
ii. E[Xt|Fs] ≤ Xs, a.s. for all s, t ∈ T such that s ≤ t.
Example 35. Let us consider the case where T = N. Let {ξ_j}_{j≥1} be a sequence of independent
integrable real random variables such that E[ξ_j] ≥ 0. Let Xn := Σ_{j=1}^n ξ_j and (Fn)_{n≥1} be the
natural filtration. Then (Ω, F, (Fn)_{n≥1}, (Xn)_{n≥1}, P) is a submartingale.
If E[ξ_j] ≤ 0 then the process (Ω, F, (Fn)_{n≥1}, (Xn)_{n≥1}, P) is a supermartingale.
Remark 15. If (Ω, F, (Ft )t∈T , (Xt )t∈T , P) is a submartingale then E[Xt ] is an increasing function
of the time variable t. Indeed for any s ≤ t:

E[Xs ] ≤ E[E[Xt |Fs ]] = E[Xt ],

while if it is a supermartingale then E[Xt ] is a decreasing function of the time variable t

E[Xs ] ≥ E[E[Xt |Fs ]] = E[Xt ], s ≤ t.

In a gambling interpretation, a submartingale represents a play favorable to the gambler, while
a supermartingale represents an unfavorable one.

Example 36. Let φ : R → R be a convex function, (Ω, F, (Ft)_{t∈T}, (Xt)_{t∈T}, P) be a mar-
tingale, and assume that for all t ∈ T the random variable Yt := φ(Xt) is integrable. Then
(Ω, F, (Ft)_{t∈T}, (Yt)_{t∈T}, P) is a submartingale. Indeed, by Jensen's inequality (7.2):

E[Yt|Fs] = E[φ(Xt)|Fs] ≥ φ(E[Xt|Fs]) = φ(Xs) = Ys   a.s.
Example 37. Let Nt be a Poisson process with rate α. Nt is a submartingale, while Nt − αt is a
martingale.

7.3.1 Gambling interpretation. Fair and unfair games


From now on we shall restrict ourselves to the case of a discrete time parameter, i.e. T = N. Let
(Ω, F, P) be a probability space and (Fn)_{n≥0} a filtration. Let (Xn)_{n≥0} be a supermartingale
(resp. martingale) with respect to the filtration (Fn)_{n≥0}.
Note that in this setting the condition defining a supermartingale or a martingale can be equiva-
lently written as:
1. E[Xn − Xn−1|Fn−1] ≤ 0 (supermartingale)
2. E[Xn − Xn−1|Fn−1] = 0 (martingale)
Interpreting Xn − Xn−1 as the net winnings per unit stake at game n and E[Xn − Xn−1|Fn−1] as
the best estimate of this value given Fn−1, i.e. given the information on the history of the game up
to time n − 1, relation 1 can be interpreted as an unfair game, and relation 2 as a fair game.
A process {Cn}_{n≥1} is said to be previsible with respect to the filtration {Fn}_{n≥0} if for any n ≥ 1
the random variable Cn is Fn−1-measurable. This means that the value of Cn can be computed
given the information encoded in Fn−1.
Now we want to construct a gambling strategy. Thinking of Cn as the stake at game n, the
winnings on game n are given by Cn(Xn − Xn−1), while the total winnings up to time n are given by

Yn := Σ_{k=1}^n Ck(Xk − Xk−1).

The following result shows that it is impossible to transform an unfair game into a favourable one
by means of a simple gambling strategy consisting in the choice of a suitable stake at any time n.
Theorem 44 (You cannot beat the system). Let {Fn}_{n≥0} be a filtration and {Xn}_{n≥0} an adapted
process with respect to {Fn}_{n≥0} such that E[|Xn|] < ∞ for all n. Let {Cn}_{n≥1} be a bounded non-negative
previsible process with respect to {Fn}_{n≥0} (i.e. there exists a K ∈ R such that Cn(ω) ≤ K for all
ω ∈ Ω and n ≥ 1). Let Yn := Σ_{k=1}^n Ck(Xk − Xk−1). Then:
• if {Xn}_{n≥0} is a martingale then {Yn}_{n≥0} is a martingale;
• if {Xn}_{n≥0} is a submartingale then {Yn}_{n≥0} is a submartingale;
• if {Xn}_{n≥0} is a supermartingale then {Yn}_{n≥0} is a supermartingale.

Proof: Since Cn is bounded, Yn is integrable. In order to check whether {Yn}_{n≥0} is a mar-
tingale (resp. submartingale, resp. supermartingale) it is sufficient to compute E[Yn − Yn−1|Fn−1].
Indeed, since Cn is bounded and Fn−1-measurable (property 4):

E[Yn − Yn−1|Fn−1] = E[Cn(Xn − Xn−1)|Fn−1] = Cn E[Xn − Xn−1|Fn−1]   a.s.

Since Cn ≥ 0, we can conclude that the sign of E[Yn − Yn−1|Fn−1] coincides with the sign of
E[Xn − Xn−1|Fn−1]. In particular, if E[Xn − Xn−1|Fn−1] = 0 then E[Yn − Yn−1|Fn−1] = 0.
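Theorem 44 can be illustrated by simulation: with a fair game (X a simple symmetric random walk) and any bounded non-negative previsible stake, the average total winnings stay at zero. A sketch using a hypothetical "double the stake after a loss, capped" rule:

```python
import random

random.seed(3)

K = 8  # bound on the stake

def total_winnings(n):
    # Double the stake after each loss, cap at K, reset to 1 after a win.
    # The stake at game k depends only on the first k-1 outcomes: previsible.
    y, stake = 0.0, 1
    for _ in range(n):
        inc = random.choice((-1, 1))   # fair game: X_k - X_{k-1}
        y += stake * inc               # C_k (X_k - X_{k-1})
        stake = min(2 * stake, K) if inc < 0 else 1
    return y

trials = 200_000
avg = sum(total_winnings(20) for _ in range(trials)) / trials
print(round(avg, 3))   # close to 0: (Y_n) is still a martingale
```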

7.3.2 Martingales and stopping times
Let us consider again the definition 23 of stopping time introduced in the framework of Markov
chain.
Given a probability space (Ω, F, P) and a filtration (Fn )n∈N , a discrete random variable τ with
values in the set N ∪ {∞} is called a stopping time (with respect to the filtration (Fn )n∈N ) if

∀n ∈ N {τ = n} ∈ Fn

Remark 16. If τ is a stopping time, also the events {τ ≤ n} and {τ > n} belong to Fn . Indeed

{τ ≤ n} = ∪k≤n {τ = k}

and {τ > n} = {τ ≤ n}c .


As we have already discussed above (see example 15), it is not difficult to construct interesting
examples of stopping times. An important case is the first hitting time of a set, or the time of
first entry. Given a real-valued stochastic process (Ω, F, P, (Fn)_{n∈N}, (Xn)_{n∈N}) and a Borel set
B ∈ B(R), let τ : Ω → N ∪ {∞} be defined as τ(ω) := min{n ∈ N : Xn(ω) ∈ B}, with τ(ω) := ∞ if
{n ∈ N : Xn(ω) ∈ B} = ∅. It is not difficult to show that τ is a stopping time; indeed,

{τ = n} = (∩_{k<n} {Xk ∈ B^c}) ∩ {Xn ∈ B}

and the event on the right hand side of the identity above belongs to Fn, since it involves only the
random variables Xk with k ≤ n, which are Fn-measurable.
Let us now consider a sequence (Xn)_{n∈N} of random variables on (Ω, F, P) adapted to a filtration
(Fn)_{n∈N}, and let τ be a stopping time (with respect to the same filtration). We shall call the new
sequence (Yn)_n defined as
Yn := X_{τ∧n}
the sequence (Xn)_n stopped at τ.
It is not difficult to prove that the sequence (Yn )n = (Xτ ∧n )n is adapted to the filtration. Indeed,
for any set I ∈ B(R) and for any n ∈ N we have

{Yn ∈ I} = {Xτ ∧n ∈ I}
= ({τ ≤ n} ∩ {Xτ ∧n ∈ I}) ∪ ({τ > n} ∩ {Xτ ∧n ∈ I})
= ({τ ≤ n} ∩ {Xτ ∈ I}) ∪ ({τ > n} ∩ {Xn ∈ I})
= ∪m≤n ({τ = m} ∩ {Xm ∈ I}) ∪ ({τ > n} ∩ {Xn ∈ I})

and all the sets appearing in the last line belong to Fn.


Example 38. Let us consider again the simple gambling example described in Section 7.3.1. Let
us assume that the gambler decides to stop gambling if either his total winnings reach a level b > 0
or his losses reach a level a < 0. Let τ be the stopping time defined as the first hitting time
of the set B = {a, b}. In this case the stopped process (X_{τ∧n}) gives the winnings or losses of the gambler.
In Section 7.3.1 we proved that it is impossible to provide a gambling strategy that can
turn an unfair game into a favourable one. The same is true if the gambling strategy relies on a
suitable stopping time.
Theorem 45. Let (Xn)_{n∈N} be a sequence of random variables on (Ω, F, P) adapted to a filtration
(Fn)_{n∈N} and let τ be a stopping time (with respect to the same filtration). Then:

• if (Xn )n∈N is a martingale then (Xτ ∧n )n∈N is a martingale;
• if (Xn )n∈N is a submartingale then (Xτ ∧n )n∈N is a submartingale;
• if (Xn )n∈N is a supermartingale then (Xτ ∧n )n∈N is a supermartingale;

Proof: The proof relies on the fact that the introduction of a stopping time can be cast in the
form of the gambling strategy described in Section 7.3.1. Indeed, let (Cn)_{n≥1} be the sequence of
random variables on (Ω, F, P) defined as

Cn(ω) := 1 if τ(ω) ≥ n,   Cn(ω) := 0 if τ(ω) < n.

It is easy to see that the process (Cn)_n is previsible; indeed, for any n ≥ 1 the random variable Cn
is Fn−1-measurable, since Cn = 1_{τ≥n} is the indicator function of the event {τ(ω) ≥ n} = {τ(ω) >
n − 1}, which belongs to Fn−1 by Remark 16. Further, the following identity holds:

X_{τ∧n} = X0 + C1(X1 − X0) + C2(X2 − X1) + · · · + Cn(Xn − Xn−1)

and by Theorem 44 we obtain the final result.
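The identity X_{τ∧n} = X0 + Σ_{k≤n} Ck(Xk − Xk−1), with Ck = 1_{τ≥k}, is a telescoping sum and can be checked path by path on simulated walks. A sketch with X0 = 0 and a hypothetical hitting set:

```python
import random

random.seed(4)

# Check X_{tau ^ n} = sum_{k<=n} C_k (X_k - X_{k-1}) with C_k = 1_{tau >= k}
# on symmetric random walks started at X_0 = 0, hitting set B = {-2, 3}.
n = 15
for _ in range(1_000):
    path = [0]
    for _ in range(n):
        path.append(path[-1] + random.choice((-1, 1)))
    # first hitting index within the horizon; if B is never hit before n,
    # using tau = n gives the same C_k for all k <= n
    tau = next((k for k, v in enumerate(path) if v in (-2, 3)), n)
    stopped = path[min(tau, n)]
    total = sum((tau >= k) * (path[k] - path[k - 1]) for k in range(1, n + 1))
    assert stopped == total   # the telescoping identity, path by path
print("identity verified on 1000 paths")
```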

7.4 Additional results on martingales


7.4.1 The optimal stopping theorem
Theorem 45 shows that if we start from a martingale (Xn)_n and a stopping time τ, the stopped
martingale X_{τ∧n} is a martingale as well; in particular, its expected value is constant:

E[X_{τ∧n}] = E[X1].

A stronger and more useful result can be proved under a suitable set of assumptions.
Theorem 46 (Optimal stopping theorem). Let (Ω, F, P, (Fn)_{n∈N}, (Xn)_{n∈N}) be a martingale and
τ a stopping time with respect to the filtration (Fn)_{n∈N}. If the following conditions hold:
i. P(τ < +∞) = 1,
ii. E[|Xτ|] < ∞,
iii. E[Xn 1_{τ>n}] → 0 as n → ∞,
then
E[Xτ] = E[X1].

Proof:
It is easy to check the following identity, valid for any n ≥ 1:

Xτ = X_{τ∧n} + (Xτ − Xn) 1_{τ>n},

hence
E[Xτ] = E[X_{τ∧n}] + E[Xτ 1_{τ>n}] − E[Xn 1_{τ>n}].
Since by Theorem 45 the process (X_{τ∧n})_{n∈N} is a martingale, we have E[X_{τ∧n}] = E[X1] and we get

E[Xτ] = E[X1] + E[Xτ 1_{τ>n}] − E[Xn 1_{τ>n}].

We can now take the limit for n → ∞ of both sides. By assumption iii., E[Xn 1_{τ>n}] → 0 as
n → ∞. Concerning the term E[Xτ 1_{τ>n}], we can write:

E[Xτ 1_{τ>n}] = Σ_{k>n} E[Xτ 1_{τ=k}] = Σ_{k>n} E[Xk 1_{τ=k}].

Now it is sufficient to observe that Σ_{k>n} E[Xk 1_{τ=k}] → 0 as n → ∞, since the series

Σ_k E[Xk 1_{τ=k}] = E[Xτ]

is convergent by assumption ii.

Remark 17. Condition iii. holds if, for instance, condition i. is fulfilled and there exists a
constant K such that |Xn(ω)| ≤ K if n < τ(ω). Indeed, in this case

|E[Xn 1_{τ>n}]| = |∫_{τ>n} Xn(ω) dP(ω)| ≤ K P(τ > n).

Since the sequence of events An = {τ > n} is decreasing, we have

lim_{n→∞} P(τ > n) = P(∩_n {τ > n}) = P(τ = ∞),

but, by condition i., P(τ = ∞) = 0.

7.4.2 Martingales and Markov chains


A useful tool for establishing condition i. in the theorem above is provided by the following result
on recurrent irreducible Markov chains. Let us recall that for i ∈ S the random variable Ti : Ω →
N ∪ {+∞} denotes the first return time to the state i:

Ti(ω) := inf{n ≥ 1 : Xn(ω) = i}

(with Ti(ω) = +∞ if {n ≥ 1 : Xn(ω) = i} = ∅). Clearly Ti is a stopping time with respect to the
natural filtration associated to (Xn)_n. We know that if i is a recurrent state then Pi(Ti < +∞) = 1.
The following theorem shows that if the chain is irreducible then Pi(Tj < +∞) = 1 for all i, j ∈ S.

Theorem 47. Let (Xn )n be an irreducible recurrent Markov chain. Then Pi (Tj < +∞) = 1 for
all i, j ∈ S.

Proof: Let us consider the case where i ≠ j, otherwise there is nothing to prove.
Let us assume "ad absurdum" that for some i, j ∈ S, Pi(Tj < +∞) = f_ij < 1. This means that
starting from i we have positive probability (1 − f_ij) of never reaching j in the future. On the
other hand, since by assumption the chain is irreducible, we have positive probability of reaching
i starting from j; in other words, there exists at least one finite time n ≥ 1 such that Pj(Ti = n) > 0.
Let us consider the minimum of these numbers, namely let n0 ∈ N be defined as

n0 := min{n ≥ 1 : Pj(Ti = n) > 0}.   (7.10)

Since Pj(Ti = n0) > 0, there exists a finite sequence of states j ≡ i_0, i_1, ..., i_{n0−1}, i_{n0} ≡ i, with
consecutive states distinct, such that

p_{i_0 i_1} p_{i_1 i_2} · · · p_{i_{n0−1} i_{n0}} = p_{j i_1} p_{i_1 i_2} · · · p_{i_{n0−1} i} > 0,

and none of the states i_1, ..., i_{n0−1} is equal to i or j (otherwise n0 wouldn't be the minimum
of the set in (7.10)). We can now prove that the probability of starting at j and never returning to j in
the future is strictly positive, since it is at least equal to

p_{j i_1} p_{i_1 i_2} · · · p_{i_{n0−1} i} (1 − f_ij) > 0;

but, on the other hand, we know that this probability is equal to 0, since j is a recurrent state.

Corollary 2. Let (Xn)_n be an irreducible recurrent Markov chain. For any j ∈ S

P(Tj < +∞) = 1.

Proof: Let λ be the initial distribution of the Markov chain. Then

P(Tj < +∞) = Σ_{i∈S} Pi(Tj < +∞) P(X0 = i) = Σ_{i∈S} λ_i = 1.

Let us consider again Example 38. Let us assume that the gambler decides to stop gambling when
either his total winnings reach a level b > 0 or his losses reach a level −a < 0. Let τ be
the stopping time defined as the first hitting time of the set B = {−a, b} and let Xτ be the winnings (or
losses) at the end of the game. The optimal stopping theorem can be applied. Indeed, the sequence of
random variables (Xn)_n, besides being a martingale, is an irreducible recurrent Markov chain. We
can then apply Theorem 47 and Corollary 2 and conclude P(τ < +∞) = P0(τ < +∞) = 1, hence
condition i. of Theorem 46 is satisfied. Concerning condition ii., we have |Xτ| ≤ max{a, b}, hence
E[|Xτ|] ≤ max{a, b} < +∞. Moreover, if n < τ(ω), then |Xn| < max{a, b} with P0-probability
1, hence by Remark 17 condition iii. is also fulfilled. By the optimal stopping theorem we have
E[Xτ] = E[X1] = 0, hence

0 = bP(Xτ = b) − aP(Xτ = −a) = bP(Xτ = b) − a(1 − P(Xτ = b)),

which gives P(Xτ = b) = a/(b + a) and P(Xτ = −a) = b/(b + a).
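These hitting probabilities are easy to confirm by Monte Carlo for a simple symmetric random walk. A sketch with hypothetical barriers a = 2 and b = 3:

```python
import random

random.seed(5)

a, b, trials = 2, 3, 50_000   # stop at -a or b (arbitrary illustrative values)
wins = 0
for _ in range(trials):
    x = 0
    while -a < x < b:
        x += random.choice((-1, 1))   # fair bet of one unit
    wins += (x == b)

print(round(wins / trials, 3), a / (a + b))   # empirical vs a/(a+b)
```

With a = 2 and b = 3 the theoretical value a/(a + b) is 0.4, and the empirical frequency agrees up to Monte Carlo error.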

Let us now consider a Markov chain (Xn)_{n≥0} on a probability space (Ω, F, P) and let (Fn)_n be
the natural filtration. We are going to investigate under what conditions the sequence of random
variables (Xn)_{n≥0} is a martingale.
Since (Fn)_n is the natural filtration, we have Fn = σ(X0, ..., Xn). Furthermore, since the ran-
dom variables X0, ..., Xn are discrete, the σ-algebra σ(X0, ..., Xn) is generated by a particular
partition of Ω: the one made of the sets of the form {X0 = i0, ..., Xn = in} with i0, ..., in ∈ S ⊂ R.
For any random variable Y on (Ω, F, P), its conditional expectation with respect to Fn will be of
the form

E[Y|Fn](ω) = E[Y|σ(X0, ..., Xn)](ω) = Σ_{i0,...,in∈S} c_{i0,...,in} 1_{X0=i0,...,Xn=in}(ω),

where

c_{i0,...,in} = (1/P(X0 = i0, ..., Xn = in)) ∫_{{X0=i0,...,Xn=in}} Y(ω) dP(ω).

In particular, if we consider the case where Y = Xn+1, by writing Xn+1 = Σ_{j∈S} j 1_{Xn+1=j}, we
obtain

c_{i0,...,in} = Σ_{j∈S} j ∫_{{X0=i0,...,Xn=in}} 1_{Xn+1=j}(ω) dP(ω) / P(X0 = i0, ..., Xn = in)
             = Σ_{j∈S} j ∫_Ω 1_{X0=i0,...,Xn=in}(ω) 1_{Xn+1=j}(ω) dP(ω) / P(X0 = i0, ..., Xn = in)
             = Σ_{j∈S} j P(X0 = i0, ..., Xn = in, Xn+1 = j) / P(X0 = i0, ..., Xn = in)
             = Σ_{j∈S} j P[Xn+1 = j | X0 = i0, ..., Xn = in],

and by applying the Markov property we get

c_{i0,...,in} = Σ_{j∈S} j P[Xn+1 = j | Xn = in] = E[Xn+1 | Xn = in],

hence

E[Xn+1|Fn](ω) = Σ_{i0,...,in∈S} E[Xn+1|Xn = in] 1_{X0=i0,...,Xn=in}(ω)
             = Σ_{in∈S} E[Xn+1|Xn = in] 1_{Xn=in}(ω).

Hence, a Markov chain is a martingale if it is integrable and for any n

E[Xn+1 − Xn|Fn](ω) = Σ_{in∈S} E[Xn+1 − Xn|Xn = in] 1_{Xn=in}(ω) = 0,   (7.11)

which means that for any n ∈ N and any state i ∈ S, the conditional expectation of the increments
vanishes:
E[Xn+1 − Xn|Xn = i] = 0.   (7.12)

The Wright-Fisher model
We now introduce a simple Markov chain modelling the inheritance of a particular gene with two
alleles A and a.
We assume that in each generation there are m alleles, some of type A and some of type a. The
random variable Xn denotes the number of alleles of type A in the n-th generation. We assume
that the alleles of type A in generation n + 1 are obtained by choosing randomly,
with replacement, from those in generation n.
For instance, if in generation n we have the following alleles

AAaAaaa

then we have probability 3/7 of choosing an allele of type A and probability 4/7 of choosing an allele
of type a for each slot of the next generation. The transition probabilities are

p_ij = (m choose j) (i/m)^j ((m − i)/m)^{m−j},   i, j ∈ {0, ..., m},

where (m choose j) denotes the binomial coefficient.
This can be regarded as a simple model of genetic inheritance by taking m even (m = 2k), assuming
that each of the k individuals has two genes (hence the possibilities are AA, Aa and aa), and that in
the next generation each individual is obtained by mating individuals randomly chosen from the
current generation. The offspring inherit one allele from each parent. In this simple model we
allow the two parents to coincide and we do not distinguish the gender of the parents.
It is simple to see that 0 and m are absorbing states, and that there are three classes {0},
{m} and {1, ..., m − 1}, the latter being transient. We can compute the probability of extinction
of a particular allele, i.e. the probability that, starting at the initial time with i alleles of type A
(i ≠ 0, m), we reach the state j = 0 in the future. First of all, we can see that (Xn) is a martingale.
Indeed, for any i ∈ S:

E[Xn+1 − Xn|Xn = i] = Σ_{j=0}^m j p_ij − i
                    = Σ_{j=0}^m j (m choose j) (i/m)^j ((m − i)/m)^{m−j} − i
                    = m · (i/m) − i = 0,

since the number of type-A alleles in the next generation is binomial with parameters m and i/m.
Further, let τ be the stopping time defined as the first hitting time of the absorbing set {0, m}.
We can apply Theorem 46: for any i ∈ S we have Pi(τ < +∞) = 1, since all the states different
from 0 and m are transient, hence during the whole history of the Markov chain they will be visited
only a finite number of times and, sooner or later, we will reach either the state 0 or the state m.
Moreover, |Xτ| ≤ m and |Xn| ≤ m for n < τ, so conditions ii. and iii. are also satisfied (by
Remark 17). We can then conclude that

Ei[Xτ] = Ei[X0] = i,

which means
0 · Pi[Xτ = 0] + m · Pi[Xτ = m] = i,
hence
Pi[Xτ = m] = i/m,   Pi[Xτ = 0] = (m − i)/m.
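The fixation probability i/m can be checked by simulating the chain until absorption. A sketch; the values of m and i are arbitrary:

```python
import random

random.seed(6)

m, i, trials = 10, 3, 20_000   # hypothetical population size and initial count
fixed = 0
for _ in range(trials):
    x = i
    while 0 < x < m:
        # next generation: m independent draws, each of type A with prob x/m
        x = sum(random.random() < x / m for _ in range(m))
    fixed += (x == m)

print(round(fixed / trials, 3), i / m)   # empirical fixation prob vs i/m
```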

The Moran model
The Moran model is an alternative simple stochastic model for genetic inheritance. As in the
Wright-Fisher model, we assume that in each generation there are m alleles, some of type A and
some of type a, and the random variable Xn denotes the number of alleles of type A in the n-th
generation.
While in the Wright-Fisher model at each step the whole population changes, the Moran model
is a birth-and-death chain. The only one-step transitions allowed are those where the number of
individuals of type A changes by at most one unit. More precisely, the population at time
n + 1 is obtained by choosing randomly an individual out of the population at time n and adding
an individual of the same type. In addition, another individual is chosen randomly out of the
population at time n and is removed. In this way the total number of alleles is constant and the
non-vanishing transition probabilities are given by:

p_{i,i+1} = (i/m)·((m − i)/m),   p_{i,i−1} = ((m − i)/m)·(i/m),   p_{i,i} = (i/m)·(i/m) + ((m − i)/m)·((m − i)/m).
The Moran model is a martingale. The simplest way to check this property is to verify that
E[Xn+1 − Xn|Xn = i] = 0 for all i ∈ S. Indeed, we have:

E[Xn+1 − Xn|Xn = i] = E[Xn+1 − i|Xn = i]
                    = (i + 1 − i)·p_{i,i+1} + (i − 1 − i)·p_{i,i−1} + (i − i)·p_{i,i}
                    = p_{i,i+1} − p_{i,i−1} = 0.

As in the Wright-Fisher model, the states 0 and m are absorbing and there are three classes
{0}, {m} and {1, ..., m−1}, the latter being transient. The probability of extinction of a particular
allele can be computed by applying the optimal stopping theorem, similarly to the Wright-Fisher
model, obtaining
i m−i
Pi [Xτ = m] = , Pi [Xτ = 0] =
m m
where τ denotes the stopping time defined as the first hitting time of the absorbing set {0, m}.
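The zero-drift computation above can be verified exactly with rational arithmetic; a small sketch (the population size m is an arbitrary choice):

```python
from fractions import Fraction

# Exact zero-drift check for the Moran chain:
# E[X_{n+1} - X_n | X_n = i] = 0 for every interior state i.
m = 8
for i in range(1, m):
    up = Fraction(i, m) * Fraction(m - i, m)    # p_{i,i+1}
    down = Fraction(m - i, m) * Fraction(i, m)  # p_{i,i-1}
    stay = 1 - up - down                        # p_{i,i}
    drift = up - down + 0 * stay                # (+1)p_up + (-1)p_down + 0*p_stay
    assert drift == 0
print("zero conditional drift at every interior state")
```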
