Continuous-Time Markov Chains
A Markov chain in discrete time, {X_n : n ≥ 0}, remains in any state for exactly one unit of time before making a transition (change of state). We proceed now to relax this restriction by allowing a chain to spend a continuous amount of time in any state, but in such a way as to retain the Markov property. As motivation, suppose we consider the rat-in-the-maze Markov chain. Clearly it is more realistic to be able to keep track of where the rat is at any continuous time t ≥ 0 as opposed to only where the rat is after n steps.
Assume throughout that our state space is S = Z = {. . . , −2, −1, 0, 1, 2, . . .} (or some subset thereof). Suppose now that whenever a chain enters state i ∈ S, independent of the past, the length of time spent in state i is a continuous, strictly positive (and proper) random variable H_i called the holding time in state i. When the holding time ends, the process then makes a transition into state j according to transition probability P_ij, independent of the past, and so on.1 Letting X(t) denote the state at time t, we end up with a continuous-time stochastic process {X(t) : t ≥ 0} with state space S.
Our objective is to place conditions on the holding times to ensure that the continuous-time process satisfies the Markov property: the future, {X(s + t) : t ≥ 0}, given the present state, X(s), is independent of the past, {X(u) : 0 ≤ u < s}. Such a process will be called a continuous-time Markov chain (CTMC), and as we will conclude shortly, the holding times will have to be exponentially distributed.
The formal definition is given by
Definition 1.1 A stochastic process {X(t) : t ≥ 0} with discrete state space S is called a continuous-time Markov chain (CTMC) if for all t ≥ 0, s ≥ 0, i ∈ S, j ∈ S,

P(X(s + t) = j | X(s) = i, {X(u) : 0 ≤ u < s}) = P(X(s + t) = j | X(s) = i) = P_ij(t).

P_ij(t) is the probability that the chain will be in state j, t time units from now, given it is in state i now.
For each t ≥ 0 there is a transition matrix

P(t) = (P_ij(t)), i, j ∈ S,
and P (0) = I, the identity matrix.
As for discrete-time Markov chains, we are assuming here that the distribution of the
future, given the present state X(s), does not depend on the present time s, but only on
[Footnote 1: P_ii > 0 is allowed, meaning that a transition back into state i from state i can occur. Each time this happens though, a new H_i, independent of the past, determines the new length of time spent in state i. See Section 1.14 for details.]
the present state X(s) = i, whatever it is, and the amount of time that has elapsed, t,
since time s. In particular, Pij (t) = P (X(t) = j|X(0) = i).
But unlike the discrete-time case, there is no smallest "next time" until the next transition; there is a continuum of such possible times t. For each fixed i and j, P_ij(t), t ≥ 0, defines a function which in principle can be studied by use of calculus and differential equations. Although this makes the analysis of CTMCs more difficult/technical than for discrete-time chains, we will, nonetheless, find that many similarities with discrete-time chains follow, and many useful results can be obtained.
A little thought reveals that the holding times must have the memoryless property
and thus are exponentially distributed. To see this, suppose that X(t) = i. Time t lies
somewhere in the middle of the holding time Hi for state i. The future after time t tells
us, in particular, the remaining holding time in state i, whereas the past before time t,
tells us, in particular, the age of the holding time (how long the process has been in state
i). In order for the future to be independent of the past given X(t) = i, we deduce that
the remaining holding time must only depend (in distribution) on i and be independent of
its age; the memoryless property follows. Since an exponential distribution is completely
determined by its rate, we conclude that for each i ∈ S there exists a constant (rate) a_i > 0 such that the chain, when entering state i, remains there, independent of the past, for an amount of time H_i ∼ exp(a_i):
A CTMC makes transitions from state to state, independent of the past, according to a discrete-time Markov chain, but once entering a state remains in
that state, independent of the past, for an exponentially distributed amount of
time before changing state again.
Thus a CTMC can simply be described by a transition matrix P = (Pij ), describing
how the chain changes state step-by-step at transition epochs, together with a set of rates
{a_i : i ∈ S}, the holding-time rates. Each time state i is visited, the chain spends, on average, E(H_i) = 1/a_i units of time there before moving on.
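This two-ingredient description (embedded matrix P plus holding rates {a_i}) translates directly into simulation: alternate an exponential hold with a P-driven jump. A minimal sketch, using only the standard library (the function name `simulate_ctmc` is ours, not from the text):

```python
import random

def simulate_ctmc(P, a, i0, n_jumps, rng):
    """Simulate a CTMC: hold in state i for H_i ~ exp(a_i), then jump
    to state j with probability P[i][j], independently of the past."""
    states, holds = [], []
    i = i0
    for _ in range(n_jumps):
        states.append(i)
        holds.append(rng.expovariate(a[i]))                  # H_i ~ exp(a_i)
        i = rng.choices(range(len(P[i])), weights=P[i])[0]   # embedded-chain step
    return states, holds

# Two-state illustration: rates a_0 = 2, a_1 = 4; the chain alternates 0 -> 1 -> 0.
rng = random.Random(1)
states, holds = simulate_ctmc([[0, 1], [1, 0]], [2.0, 4.0], 0, 10_000, rng)
mean_hold_0 = sum(h for s, h in zip(states, holds) if s == 0) / states.count(0)
# mean_hold_0 should be close to E(H_0) = 1/a_0 = 0.5
```

The simulated average holding time in a state estimates 1/a_i, matching E(H_i) = 1/a_i above.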
1.1 The embedded discrete-time Markov chain
Letting τ_n denote the time at which the nth change of state (transition) occurs, we see that X_n = X(τ_n+), the state right after the nth transition, defines the underlying discrete-time Markov chain, called the embedded Markov chain. {X_n} keeps track, consecutively, of the states visited right after each transition, and moves from state to state according to the one-step transition probabilities P_ij = P(X_{n+1} = j | X_n = i). This transition matrix (P_ij), together with the holding-time rates {a_i}, completely determines the CTMC.
1.2
Chapman-Kolmogorov equations
Lemma 1.1 (Chapman-Kolmogorov equations) For all s ≥ 0, t ≥ 0,

P_ij(s + t) = Σ_{k∈S} P_ik(s) P_kj(t), i, j ∈ S;

in matrix form, P(s + t) = P(s)P(t): transition matrices multiply.
As for discrete-time chains, the (easy) proof involves first conditioning on what state
k the chain is in at time s given that X(0) = i, yielding Pik (s), and then using the
Markov property to conclude that the probability that the chain, now in state k, would
then be in state j after an additional t time units is, independent of the past, Pkj (t).
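The identity P(s + t) = P(s)P(t) can be checked numerically on the two-state chain (rate a for 0 → 1, rate b for 1 → 0), whose transition matrix has the well-known closed form P(t) = Π + e^{−(a+b)t}(I − Π), Π the matrix with both rows equal to (b/(a+b), a/(a+b)). A small illustrative sketch (the helper names are ours):

```python
import math

def P_two_state(t, a, b):
    """Closed-form P(t) for the two-state CTMC with rate a (0 -> 1)
    and rate b (1 -> 0)."""
    c = a + b
    e = math.exp(-c * t)
    return [[b / c + (a / c) * e, (a / c) * (1 - e)],
            [(b / c) * (1 - e), a / c + (b / c) * e]]

def matmul2(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Chapman-Kolmogorov check: P(s + t) = P(s) P(t)
a, b, s, t = 2.0, 3.0, 0.4, 1.1
lhs = P_two_state(s + t, a, b)
rhs = matmul2(P_two_state(s, a, b), P_two_state(t, a, b))
err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The error is at the level of floating-point roundoff, and P(0) comes out as the identity matrix, as required.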
1.3
Examples of CTMCs
1. Poisson counting process: Let {N(t) : t ≥ 0} be the counting process for a Poisson process {t_n} at rate λ. Then {N(t)} forms a CTMC with S = {0, 1, 2, . . .}, P_{i,i+1} = 1, a_i = λ, i ≥ 0: if N(t) = i then, by the memoryless property, the next arrival, arrival i + 1, will, independent of the past, occur after an exponentially distributed amount of time at rate λ. The holding time in state i is simply the interarrival time, t_{i+1} − t_i, and τ_n = t_n since N(t) only changes state at an arrival time. Assuming that N(0) = 0 we conclude that X_n = N(t_n+) = n, n ≥ 0; the embedded chain is deterministic. This is a very special kind of CTMC for several reasons: (1) all holding times H_i have the same rate a_i = λ, and (2) N(t) is a non-decreasing process; it increases by one at each arrival time, and remains constant otherwise. As t → ∞, N(t) → ∞ step by step.
2. Consider the rat in the closed maze, in which at each transition the rat is equally likely to move to one of the two neighboring cells, but where now we assume that the holding time, H_i, in cell i is exponential at rate a_i = i, i = 1, 2, 3, 4. Time is in minutes (say). Let X(t) denote the cell that the rat is in at time t. Given the rat is now in cell 2 (say), he will remain there, independent of the past, for an exponential amount of time with mean 1/2, and then move, independent of the past, to either cell 1 or 4 w.p. 1/2. The other transitions are similarly explained. {X(t)} forms a CTMC. Note how cell 4 has the shortest holding time (mean 1/4 minutes), and cell 1 has the longest (mean 1 minute). Of intrinsic interest is to calculate the long-run proportion of time (continuous time now) that the rat spends in each cell;
P_i def= lim_{t→∞} (1/t) ∫_0^t I{X(s) = i} ds, i = 1, 2, 3, 4.
We will learn how to compute these later; they serve as the continuous-time analog to the stationary probabilities π_i for discrete-time Markov chains.
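These time proportions can already be estimated by simulation. A sketch, assuming the standard 2×2 maze layout (cells 1, 2 on top and 3, 4 below, so that cell 2's neighbors are cells 1 and 4, matching the text); with this layout the embedded chain is doubly stochastic, so the time proportions work out to P_i ∝ (1/4)(1/a_i), i.e. (0.48, 0.24, 0.16, 0.12):

```python
import random

# Assumed layout: cells 1 2 / 3 4, so cell 2 neighbors cells 1 and 4.
NEIGHBORS = {1: (2, 3), 2: (1, 4), 3: (1, 4), 4: (2, 3)}

def rat_time_fractions(n_jumps, seed=0):
    """Estimate the long-run proportion of (continuous) time spent in each
    cell, with holding rate a_i = i in cell i."""
    rng = random.Random(seed)
    time_in = {i: 0.0 for i in NEIGHBORS}
    cell = 1
    for _ in range(n_jumps):
        time_in[cell] += rng.expovariate(cell)   # holding time ~ exp(a_i), a_i = i
        cell = rng.choice(NEIGHBORS[cell])       # each neighbor equally likely
    total = sum(time_in.values())
    return {i: t / total for i, t in time_in.items()}

fractions = rat_time_fractions(100_000)
# fractions should be near {1: 0.48, 2: 0.24, 3: 0.16, 4: 0.12}
```

Note that the slow cell (cell 1, mean 1 minute) accumulates the largest time share even though all cells are visited equally often by the embedded chain.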
4. M/M/c multi-server queue: X(t) denotes the number of customers in the system at time t. For illustration, let's
assume c = 2. Then, for example, X(t) = 4 means that two customers are in
service (each with their own server) and two others are waiting in line. When
X(t) = i ∈ {0, 1}, the holding times are the same as for the M/M/1 model; a_0 = λ, a_1 = λ + μ. But when X(t) = i ≥ 2, both remaining service times, denoted by S_r1 and S_r2, compete to determine the next departure. Since they are independent exponentials at rate μ, we deduce that the time until the next departure is given by min{S_r1, S_r2} ∼ exp(2μ). The time until the next arrival is given by X ∼ exp(λ) and is independent of both remaining service times. We conclude that the holding time in any state i ≥ 2 is given by H_i = min{X, S_r1, S_r2} ∼ exp(λ + 2μ).
For the general case of c ≥ 2, the rates are determined analogously: a_i = λ + iμ, 0 ≤ i ≤ c, and a_i = λ + cμ, i > c.
For the embedded chain: P_{0,1} = 1 and, for 1 ≤ i ≤ c − 1, P_{i,i+1} = λ/(λ + iμ), P_{i,i−1} = iμ/(λ + iμ). Then for i ≥ c, P_{i,i+1} = λ/(λ + cμ), P_{i,i−1} = cμ/(λ + cμ). This is an example of a simple random walk with state-dependent up, down probabilities: at each step, the probabilities for the next increment depend on i, the current state, until i = c, at which point the probabilities remain constant.
5. M/M/∞ infinite-server queue: Here we have an M/M/c queue with c = ∞; a special case of the M/G/∞ queue. Letting X(t) denote the number of customers in the system at time t, we see that a_i = λ + iμ, i ≥ 0, since there is no limit on the number of busy servers.
For the embedded chain: P_{0,1} = 1 and P_{i,i+1} = λ/(λ + iμ), P_{i,i−1} = iμ/(λ + iμ), i ≥ 1. This simple random walk thus has state-dependent up, down probabilities that continue to depend on i, the current state. Note how, as i increases, the down probability, P_{i,i−1}, increases, and approaches 1 as i → ∞: when the system is heavily congested, departures occur rapidly; this model is always stable.
1.4 Birth and Death processes
Except for Example 2 (rat in the closed maze), all of the CTMC examples in the previous section were Birth and Death (B&D) processes: CTMCs that can only change state by increasing by one or decreasing by one; P_{i,i+1} + P_{i,i−1} = 1, i ∈ S. (In Example 2, P_{1,3} > 0, for example, so it is not B&D.) Here we study B&D processes more formally, since they tend to be a very useful type of CTMC. Whenever the state increases by one, we say there is a birth, and whenever it decreases by one we say there is a death. We shall focus on the case when S = {0, 1, 2, . . .}, in which case X(t) can be thought of as the population size at time t.
For each state i ≥ 0 we have a birth rate λ_i and a death rate μ_i: whenever X(t) = i, independent of the past, the time until the next birth is a r.v. X ∼ exp(λ_i) and, independently, the time until the next death is a r.v. Y ∼ exp(μ_i). Thus the holding
time rates are given by a_i = λ_i + μ_i, because the time until the next transition (change of state) is given by the holding time H_i = min{X, Y} ∼ exp(λ_i + μ_i). The idea here is that at any given time the next birth is competing with the next death to be the next transition. (We always assume here that μ_0 = 0 since there can be no deaths without a population.)
This means that whenever X(t) = i ≥ 1, the next transition will be a birth w.p. P_{i,i+1} = P(X < Y) = λ_i/(λ_i + μ_i), and a death w.p. P_{i,i−1} = P(Y < X) = μ_i/(λ_i + μ_i). Thus the embedded chain for a B&D process is a simple random walk with state-dependent up, down probabilities.
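The competing-exponentials step can be checked by Monte Carlo; a small sketch with illustrative rates λ = 2, μ = 3 (so the birth probability should be 2/5 and the mean holding time 1/(λ + μ) = 1/5):

```python
import random

def birth_vs_death(lam, mu, n, seed=0):
    """Estimate P(next transition is a birth) = P(X < Y) and E[min(X, Y)]
    for independent X ~ exp(lam), Y ~ exp(mu)."""
    rng = random.Random(seed)
    births, hold_sum = 0, 0.0
    for _ in range(n):
        x = rng.expovariate(lam)   # time until the next birth
        y = rng.expovariate(mu)    # time until the next death
        births += x < y
        hold_sum += min(x, y)      # holding time ~ exp(lam + mu)
    return births / n, hold_sum / n

p_birth, mean_hold = birth_vs_death(2.0, 3.0, 100_000)
# p_birth should be near lam/(lam+mu) = 0.4; mean_hold near 1/(lam+mu) = 0.2
```

Both estimates agree with P(X < Y) = λ/(λ + μ) and min{X, Y} ∼ exp(λ + μ) stated above.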
When μ_i = 0, i ≥ 0, and λ_i > 0, i ≥ 0, we call the process a pure birth process; the population continues to increase by one at each transition. The main example is the Poisson counting process (Example 1 in the previous section), but this can be generalized by allowing each λ_i to be different. The reader is encouraged at this point to go back
over the B&D Examples in the previous Section.
1.5
Explosive CTMCs
Consider, for example, a pure birth process with rates a_i = 2^i, i ≥ 0, so that E(H_i) = 2^{−i}. The total amount of time needed to visit all of the states is

T = Σ_{i=0}^∞ H_i,

with expected value

E(T) = Σ_{i=0}^∞ 2^{−i} = 2 < ∞,
and we conclude that on average all states i ≥ 0 have been visited by time t = 2, a finite amount of time! But this implies that, w.p.1, all states will be visited in a finite amount of time; P(T < ∞) = 1. Consequently, w.p.1, X(T + t) = ∞, t ≥ 0. This is an example of an explosive Markov chain: the number of transitions in a finite interval of time is infinite.
We shall rule out this kind of behavior in the rest of our study, and assume from now
on that all CTMCs considered are non-explosive, by which we mean that the number of
transitions in any finite interval of time is finite. This will always hold for any CTMC with a finite state space, or any CTMC for which there are only a finite number of distinct values for the rates a_i, and more generally whenever sup{a_i : i ∈ S} < ∞. Every example given in the previous section was non-explosive. Only the M/M/∞ queue needs
1.6
1.7
State i is called positive recurrent if, in addition to being recurrent, E(T_ii) < ∞; the expected amount of time to return is finite. State i is called null recurrent if, in addition to being recurrent, E(T_ii) = ∞; the expected amount of time to return is infinite. Unlike
recurrence, positive (or null) recurrence is not equivalent to that for the embedded chain:
It is possible for a CTMC to be positive recurrent while its embedded chain is null
recurrent (and vice versa). But positive and null recurrence are still class properties, so
in particular:
For an irreducible CTMC, all states together are transient, positive recurrent,
or null recurrent.
A CTMC is called positive recurrent if it is irreducible and all states are positive
recurrent. We define (when they exist, independent of initial condition X(0) = i) the
limiting probabilities {Pj } for the CTMC as the long-run proportion of time the chain
spends in each state j ∈ S:

P_j = lim_{t→∞} (1/t) ∫_0^t I{X(s) = j | X(0) = i} ds, w.p.1,  (1)
which after taking expected values yields

P_j = lim_{t→∞} (1/t) ∫_0^t P_ij(s) ds.  (2)
When each P_j exists and Σ_{j∈S} P_j = 1, then P~ = {P_j} (as a row vector) is called the limiting (or stationary) distribution for the Markov chain. Letting

P* = ( P~ )
     ( P~ )
     ( ⋮  )        (3)
denote the matrix in which each row is the limiting probability distribution P~ , (2) can
be expressed nicely in matrix form as
lim_{t→∞} (1/t) ∫_0^t P(s) ds = P*.  (4)
As for discrete-time Markov chains, positive recurrence implies the existence of limiting probabilities by use of the SLLN. The basic idea is that for fixed state j, we can
break up the evolution of the CTMC into i.i.d. cycles, where a cycle begins every time
the chain makes a transition into state j. This yields an example of what is called a
regenerative process because we say it regenerates every time a cycle begins. The cycle
lengths are i.i.d. distributed as Tjj , and during a cycle, the chain spends an amount of
time in state j equal in distribution to the holding time Hj . This leads to
Proposition 1.1 If {X(t)} is a positive recurrent CTMC, then the limiting probability distribution P~ = {P_j} as defined by Equation (1) exists, is unique, and is given by

P_j = E(H_j)/E(T_jj) = (1/a_j)/E(T_jj) > 0, j ∈ S.
In words: The long-run proportion of time the chain spends in state j equals the expected
amount of time spent in state j during a cycle divided by the expected cycle length (between
visits to state j).
Moreover, the stronger mode of convergence (weak convergence) holds:
P_j = lim_{t→∞} P_ij(t), i, j ∈ S.  (5)
Proof : Fix j and let N_j(t) denote the number of cycles (returns to state j) completed by time t. From the SLLN applied to the i.i.d. cycle lengths (each distributed as T_jj),

lim_{t→∞} N_j(t)/t = 1/E(T_jj).  (6)
Letting

J_n = ∫_{τ_{n−1}(j)}^{τ_n(j)} I{X(s) = j} ds, n ≥ 1,
(the amount of time spent in state j during the nth cycle) we conclude that {Jn } forms
an i.i.d. sequence of r.v.s. distributed as the holding time Hj ; E(J) = E(Hj ). Thus
∫_0^t I{X(s) = j} ds ≈ Σ_{n=1}^{N_j(t)} J_n,

and hence

(1/t) ∫_0^t I{X(s) = j} ds ≈ (N_j(t)/t) × (1/N_j(t)) Σ_{n=1}^{N_j(t)} J_n.
Letting t → ∞ yields

P_j = E(H_j)/E(T_jj),
where the denominator is from (6) and the numerator is from the SLLN applied to {J_n}. P_j > 0 if E(T_jj) < ∞ (positive recurrence), whereas P_j = 0 if E(T_jj) = ∞ (null recurrence). And if transient, then I{X(s) = j | X(0) = i} → 0 as s → ∞, w.p.1, yielding P_j = 0 as well from (1).
Uniqueness of the P_j follows from the unique representation P_j = (1/a_j)/E(T_jj).
The weak convergence in (5) holds in addition to the already established time-average
convergence because the cycle-length distribution (the distribution of Tjj for any fixed
j) is non-lattice.2 T_jj has a non-lattice distribution because it is of phase type, hence
a continuous distribution. In general, a positive recurrent regenerative process with a
non-lattice cycle-length distribution converges weakly. The details of this will be dealt
with later when we return to a more rigorous study of renewal theory.
1.8
1.9
[Footnote 2: The distribution of a non-negative r.v. X is said to be non-lattice if there does not exist a d > 0 such that P(X ∈ {nd : n ≥ 0}) = 1. Any continuous distribution, in particular, is non-lattice.]
In fact this limiting distribution P~ is the only distribution (it is unique) that is stationary, that is, for which P~P(t) = P~, t ≥ 0. Moreover, letting {X*(t) : t ≥ 0} denote the chain when X*(0) ∼ P~, it forms a stationary stochastic process: {X*(s + t) : t ≥ 0} has the same distribution for all s ≥ 0.
Proof : From the definition of P* (each row is P~) we must equivalently show that P*P(t) = P* for any t ≥ 0. (Intuitively we are simply asserting that P(∞)P(t) = P(∞) because ∞ + t = ∞.)
Recalling the Chapman-Kolmogorov equations, P (s + t) = P (s)P (t), and using (4),
we get
P*P(t) = ( lim_{u→∞} (1/u) ∫_0^u P(s) ds ) P(t)
       = lim_{u→∞} (1/u) ∫_0^u P(s)P(t) ds
       = lim_{u→∞} (1/u) ∫_0^u P(s + t) ds
       = lim_{u→∞} (1/u) ∫_0^u P(s) ds
       = P*.
The second-to-last equality follows due to the fact that adding the fixed t is asymptotically negligible:

∫_0^u P(s + t) ds = ∫_t^{u+t} P(s) ds = ∫_0^u P(s) ds + ∫_u^{u+t} P(s) ds − ∫_0^t P(s) ds.

All elements of P(s) are bounded by 1, and so the last two integrals, when divided by u, tend to 0 as u → ∞.
If a probability distribution ν~ satisfies ν~P(t) = ν~, t ≥ 0, then on the one hand, since the chain is assumed positive recurrent, we have Equation (7) and hence

lim_{t→∞} (1/t) ∫_0^t ν~P(s) ds = ν~P* = P~.  (8)
But on the other hand, ν~P(t) = ν~ implies that

(1/t) ∫_0^t ν~P(s) ds = (1/t) ∫_0^t ν~ ds = ν~,

and we conclude that ν~ = P~; the stationary distribution is unique.
By the Markov property, a Markov process is completely determined (in distribution) by its initial state. Thus {X*(s + t) : t ≥ 0} has the same distribution for all s ≥ 0 because for all s, its initial state has the same distribution, X*(s) ∼ P~.
1.10 Transition rates
Assume here that P_i,i = 0 for all i ∈ S. a_i can be interpreted as the transition rate out of state i given that X(t) = i; the intuitive idea being that the exponential holding time will end, independent of the past, in the next dt units of time with probability a_i dt. This can be made rigorous. It can be shown that for i ≠ j,

P′_ij(0) = lim_{h→0} P_ij(h)/h = a_i P_ij.  (9)

a_i P_ij can thus be interpreted as the transition rate from state i to state j given that the chain is currently in state i.
When i = j, P_ii(h) = 1 − P(X(h) ≠ i | X(0) = i), and it can be shown that

P′_ii(0) = lim_{h→0} (P_ii(h) − 1)/h = −a_i.  (10)
Definition 1.2 The matrix Q = P′(0), given explicitly by (9) and (10), is called the transition rate matrix, or infinitesimal generator, of the Markov chain.
For example, if S = {0, 1, 2, 3, 4}, then

      ( −a_0       a_0 P_01   a_0 P_02   a_0 P_03   a_0 P_04 )
      ( a_1 P_10   −a_1       a_1 P_12   a_1 P_13   a_1 P_14 )
Q =   ( a_2 P_20   a_2 P_21   −a_2       a_2 P_23   a_2 P_24 )
      ( a_3 P_30   a_3 P_31   a_3 P_32   −a_3       a_3 P_34 )
      ( a_4 P_40   a_4 P_41   a_4 P_42   a_4 P_43   −a_4     )
Note in passing that, since we assume that P_i,i = 0, i ∈ S, each row of Q sums to 0 (the off-diagonal entries in row i sum to a_i, canceling the diagonal entry −a_i).
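Building Q from the rates {a_i} and the embedded matrix P is mechanical; a minimal sketch (the function name is ours) that also makes the zero-row-sum property easy to verify:

```python
def rate_matrix(a, P):
    """Generator Q = P'(0): off-diagonal entries Q[i][j] = a_i * P_ij,
    diagonal entries Q[i][i] = -a_i (assuming P_ii = 0)."""
    n = len(a)
    return [[-a[i] if i == j else a[i] * P[i][j] for j in range(n)]
            for i in range(n)]

# Illustrative 3-state chain (rates and P chosen arbitrarily for the example).
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.3, 0.7, 0.0]]
a = [1.0, 2.0, 4.0]
Q = rate_matrix(a, P)
row_sums = [sum(row) for row in Q]   # each row should sum to 0
```

Each row of Q indeed sums to 0, reflecting conservation of probability flow.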
1.11 The Kolmogorov backward equations
We have yet to show how to compute the transition probabilities for a CTMC, P_ij(t) = P(X(t) = j | X(0) = i), t ≥ 0. For discrete-time Markov chains this was not a problem, since P^(n)_ij = P(X_n = j | X_0 = i), n ≥ 1, could be computed by using the fact that the matrix (P^(n)_ij) is simply the transition matrix P multiplied together n times, P^n. In continuous time, however, the problem is a bit more complex; it involves setting up linear differential equations for P_ij(t), known as the Kolmogorov backward equations, and then solving. We present this derivation now.
Proposition 1.3 (Kolmogorov Backward Equations) For a (non-explosive) CTMC with transition rate matrix Q = P′(0) as in Definition 1.2, the following set of linear differential equations is satisfied by {P(t)}:

P′(t) = QP(t), t ≥ 0, P(0) = I,  (11)

that is,

P′_ij(t) = −a_i P_ij(t) + Σ_{k≠i} a_i P_ik P_kj(t), i, j ∈ S.  (12)

The unique solution is given by

P(t) = e^{Qt}, t ≥ 0,  (13)

where for any square matrix M the matrix exponential is defined by

e^M = Σ_{n=0}^∞ M^n/n!.  (14)

Proof : By the Chapman-Kolmogorov equations,

P(t + h) − P(t) = P(h)P(t) − P(t) = (P(h) − I)P(t);  (15)
dividing by h and letting h → 0 then yields P′(t) = P′(0)P(t) = QP(t). (Technically, this involves justifying the interchange of a limit and an infinite sum, which indeed can be justified here even when the state space is infinite.)
The word backward refers to the fact that in our use of the Chapman-Kolmogorov
equations, we chose to place the h on the right-hand side in back, P (t + h) = P (h)P (t)
as opposed to in front, P (t + h) = P (t)P (h). The derivation above can be rigorously
justified for any non-explosive CTMC.
It turns out, however, that the derivation of the analogous forward equations, P′(t) = P(t)Q, t ≥ 0, P(0) = I, that one would expect to get by using P(t + h) = P(t)P(h), cannot be rigorously justified for all non-explosive CTMCs; there are examples (with infinite state space) that cause trouble, for which the interchange of a limit and an infinite sum cannot be justified.
But it does not matter, since the unique solution P(t) = e^{Qt} to the backward equations is also the unique solution to the forward equations, and thus both equations are valid.
For a (non-explosive) CTMC, the transition probabilities Pi,j (t) are the unique
solution to both the Kolmogorov backward and forward equations.
Remark 1.1 It is rare that we can explicitly compute the infinite sum in the solution

P(t) = e^{Qt} = Σ_{n=0}^∞ (Qt)^n/n!.
But there are various numerical recipes for estimating e^{Qt} to any desired level of accuracy. For example, since e^M = lim_{n→∞} (I + M/n)^n for any square matrix M, one can choose n large and use e^{Qt} ≈ (I + (Qt)/n)^n.
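A sketch of that recipe, taking n = 2^k so that the matrix power can be computed by k repeated squarings, and checked against the two-state chain whose e^{Qt} is known in closed form (the helper names are ours):

```python
import math

def expm_euler(Q, t, k=20):
    """Approximate e^{Qt} by (I + Qt/n)^n with n = 2^k, via k squarings."""
    m = len(Q)
    n = 2 ** k
    M = [[(1.0 if i == j else 0.0) + Q[i][j] * t / n for j in range(m)]
         for i in range(m)]
    for _ in range(k):   # repeatedly square: M <- M @ M, giving M^(2^k)
        M = [[sum(M[i][r] * M[r][j] for r in range(m)) for j in range(m)]
             for i in range(m)]
    return M

# Two-state chain with a_0 = 2, a_1 = 3: Q = [[-2, 2], [3, -3]].
Q = [[-2.0, 2.0], [3.0, -3.0]]
t = 0.5
P_t = expm_euler(Q, t)
# Closed form for this chain: P_00(t) = 3/5 + (2/5) e^{-5t}
exact = 0.6 + 0.4 * math.exp(-5 * t)
```

The approximation agrees with the closed form to several decimal places, and each row of the computed P(t) sums to 1, as a transition matrix must.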
1.12 Balance equations
Consider any deterministic function x(t), t ≥ 0, with values in S. Clearly, every time x(t) enters a state j, it must first leave that state in order to enter it again. Thus the
number of times during the interval (0, t] that it enters state j differs by at most one,
from the number of times during the interval (0, t] that it leaves state j. We conclude
(by dividing by t and letting t ) that the long-run rate at which the function leaves
state j equals the long-run rate at which the function enters state j. In words, the rate
out of state j is equal to the rate into state j, for each state j. We can apply this kind
of result to each sample-path of a stochastic process. For a positive recurrent CTMC with limiting distribution P~ = {P_j}, the rate out of state j is given by a_j P_j, while the rate into state j is given by Σ_{i≠j} P_i a_i P_ij, j ∈ S, by interpreting the limiting probability P_j as a proportion of time and recalling Section 1.10 on transition rates.
Definition 1.3 The balance equations for a positive recurrent CTMC are given by

a_j P_j = Σ_{i≠j} P_i a_i P_ij, j ∈ S.  (16)

(In matrix form, recalling the definition of Q, the balance equations assert that P~Q = 0.)

Theorem 1.1 An irreducible CTMC {X(t)} is positive recurrent if and only if the balance equations (16) admit a probability solution P~ = {P_j},

Σ_{j∈S} P_j = 1,  (17)

in which case this solution is unique and is precisely the limiting (stationary) distribution of the chain.
Proof : Suppose first that P~ is a probability solution to (16); equivalently, P~Q = 0, so that by the backward equations (d/dt) P~P(t) = P~QP(t) = 0. But this implies that P~P(t) is a constant in t and hence that P~P(t) = P~P(0) = P~I = P~; P~ is indeed a stationary distribution. Now suppose that the chain is not positive recurrent. For an irreducible CTMC, all states together are transient, positive recurrent, or null recurrent, so the chain must be either null recurrent or transient, and hence by Proposition 1.1 we have
lim_{t→∞} (1/t) ∫_0^t P(s) ds = 0.  (18)
Multiplying both sides on the left by P~ yields

lim_{t→∞} (1/t) ∫_0^t P~P(s) ds = 0.  (19)
But using the already established P~P(t) = P~, we have (1/t) ∫_0^t P~P(s) ds = (1/t) ∫_0^t P~ ds = P~, and we end with the contradiction P~ = 0 (P~ is a probability distribution by assumption).
Finally, from Proposition 1.2 we know that there can only be one stationary distribution
for a positive recurrent chain, the limiting distribution as defined in Equations (1)-(4),
so we conclude that P~ here is indeed the limiting distribution.
As for discrete-time Markov chains, when the state space is finite, we obtain a useful
and simple special case:
Theorem 1.2 An irreducible CTMC with a finite state space is positive recurrent; there
is always a unique probability solution to the balance equations.
Proof : Suppose (without loss of generality) the state space is S = {1, 2, . . . , b} for some integer b ≥ 1. We already know that the chain must be recurrent because the embedded chain is so. We also know that the embedded chain is positive recurrent because for finite-state discrete-time chains irreducibility implies positive recurrence. Let τ_{1,1} denote the discrete return time to state 1, and let T_{1,1} denote the corresponding continuous return time. We know that E(τ_{1,1}) < ∞. Also, T_{1,1} is a random sum of τ_{1,1} holding times, starting with H_1. Let a* = min{a_1, . . . , a_b}. Then a* > 0 and every holding time H_i satisfies E(H_i) ≤ 1/a* < ∞, i ∈ {1, 2, . . . , b}. Letting {Y_n} denote i.i.d. exponential r.v.s. at rate a*, independent of τ_{1,1}, we conclude (Wald's equation) that

E(T_{1,1}) ≤ E( Σ_{n=1}^{τ_{1,1}} Y_n ) = E(τ_{1,1}) / a* < ∞.
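For a finite state space, the unique probability solution to the balance equations (16) can be computed by linear algebra: the balance equations amount to P~Q = 0, one of those equations is redundant, so we replace one with the normalization Σ_j P_j = 1. A minimal pure-Python sketch (the function name is ours):

```python
def stationary_from_Q(Q):
    """Solve P~ Q = 0 together with sum_j P_j = 1.  The n balance equations
    are linearly dependent, so the last one is replaced by normalization."""
    n = len(Q)
    # Equation j (column j of Q): sum_i x_i * Q[i][j] = 0.
    A = [[Q[i][j] for i in range(n)] for j in range(n)]
    b = [0.0] * n
    A[n - 1] = [1.0] * n          # replace last equation with sum_i x_i = 1
    b[n - 1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

# Two-state chain with Q = [[-2, 2], [3, -3]]: stationary distribution (3/5, 2/5).
pi = stationary_from_Q([[-2.0, 2.0], [3.0, -3.0]])
```

For larger chains one would use a linear-algebra library, but the structure of the computation is exactly this.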
1.13 Examples
Here we apply Theorems 1.1 and 1.2 to a variety of models. In most cases, solving the resulting balance equations involves recursively expressing all the P_j in terms of one particular one, P_0 (say), then solving for P_0 by using the fact that Σ_{j∈S} P_j = 1. In the case when the state space is infinite, the sum is an infinite sum that might diverge unless further restrictions on the system parameters (rates) are enforced.
1. FIFO M/M/1 queue: X(t) denotes the number of customers in the system at time
t. Here, irreducibility is immediate since as pointed out earlier, the embedded chain
is a simple random walk (hence irreducible), so, from Theorem 1.1, we will have
positive recurrence if and only if we can solve the balance equations (16):
λP_0 = μP_1
(λ + μ)P_1 = λP_0 + μP_2
(λ + μ)P_2 = λP_1 + μP_3
. . .
(λ + μ)P_j = λP_{j−1} + μP_{j+1}, j ≥ 1.
These equations can also be derived from scratch as follows: given X(t) = 0, the rate out of state 0 is the arrival rate a_0 = λ, and the only way to enter state 0 is from state i = 1, from which a departure must occur (rate μ). This yields the first equation. Given X(t) = j ≥ 1, the rate out of state j is a_j = λ + μ (either an arrival or a departure can occur), but there are two ways to enter such a state j: either from state i = j − 1 (an arrival occurs (rate λ) when X(t) = j − 1, causing the transition j − 1 → j), or from state i = j + 1 (a departure occurs (rate μ) when X(t) = j + 1, causing the transition j + 1 → j). This yields the other equations.
Note that since λP_0 = μP_1 (first equation), the second equation reduces to λP_1 = μP_2, which in turn causes the third equation to reduce to λP_2 = μP_3, and in general the balance equations reduce to

λP_j = μP_{j+1}, j ≥ 0.  (20)

Letting ρ = λ/μ, this says that P_{j+1} = ρP_j, hence P_j = ρ^j P_0, j ≥ 0, and summing to one yields

1 = P_0 Σ_{j=0}^∞ ρ^j,

from which we conclude that there is a solution if and only if the geometric series converges, that is, if and only if ρ < 1, equivalently λ < μ, the arrival rate is less than the service rate, in which case 1 = P_0(1 − ρ)^{−1}, or P_0 = 1 − ρ.
Thus P_j = ρ^j(1 − ρ), j ≥ 0, and we obtain a geometric stationary distribution. Summarizing:

The FIFO M/M/1 queue is positive recurrent if and only if ρ < 1, in which case its stationary distribution is geometric with parameter ρ; P_j = ρ^j(1 − ρ), j ≥ 0. (If ρ = 1 it can be shown that the chain is null recurrent, and transient if ρ > 1.)
When ρ < 1 we say that the M/M/1 queue is stable, unstable otherwise. Stability intuitively means that the queue length doesn't grow without bound over time. When the queue is stable, we can take the mean of the stationary distribution to obtain the average number of customers in the system,

l = lim_{t→∞} (1/t) ∫_0^t X(s) ds  (21)
  = Σ_{j=0}^∞ j P_j  (22)
  = Σ_{j=0}^∞ j (1 − ρ) ρ^j  (23)
  = ρ/(1 − ρ).  (24)
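A quick numerical sanity check of the geometric solution (names ours): the probabilities sum to one, satisfy the reduced balance equations (20), and have mean ρ/(1 − ρ).

```python
def mm1_stationary(lam, mu, jmax):
    """P_j = (1 - rho) rho^j for the stable M/M/1 queue, truncated at jmax."""
    rho = lam / mu
    assert rho < 1, "stability requires lam < mu"
    return [(1 - rho) * rho ** j for j in range(jmax + 1)]

lam, mu = 1.0, 2.0                       # rho = 1/2
P = mm1_stationary(lam, mu, 400)
mean = sum(j * p for j, p in enumerate(P))
# balance check (20): lam * P_j should equal mu * P_{j+1} for every j
balance_err = max(abs(lam * P[j] - mu * P[j + 1]) for j in range(100))
# mean should be close to rho/(1 - rho) = 1 (the truncation error is tiny)
```

With ρ = 1/2 the truncated mean is 1 to within numerical precision, matching (24).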
2. Birth and Death processes: The fact that the balance equations for the FIFO M/M/1 queue reduced to "for each state j, the rate from j to j + 1 equals the rate from j + 1 to j" is not a coincidence, and in fact this reduction holds for any Birth and Death process. For in a Birth and Death process, the balance equations are:

λ_0 P_0 = μ_1 P_1
(λ_1 + μ_1)P_1 = λ_0 P_0 + μ_2 P_2
(λ_2 + μ_2)P_2 = λ_1 P_1 + μ_3 P_3
. . .
(λ_j + μ_j)P_j = λ_{j−1} P_{j−1} + μ_{j+1} P_{j+1}, j ≥ 1.
Plugging the first equation into the second yields λ_1 P_1 = μ_2 P_2, which in turn can be plugged into the third yielding λ_2 P_2 = μ_3 P_3, and so on. We conclude that for any Birth and Death process the balance equations reduce to

λ_j P_j = μ_{j+1} P_{j+1}, j ≥ 0.  (25)
Using the fact that the probabilities must sum to one then yields:

An irreducible Birth and Death process is positive recurrent if and only if

Σ_{j=1}^∞ Π_{i=1}^j (λ_{i−1}/μ_i) < ∞,

in which case

P_0 = 1 / ( 1 + Σ_{j=1}^∞ Π_{i=1}^j (λ_{i−1}/μ_i) ),

and

P_j = Π_{i=1}^j (λ_{i−1}/μ_i) / ( 1 + Σ_{j=1}^∞ Π_{i=1}^j (λ_{i−1}/μ_i) ), j ≥ 1.  (26)

(For the FIFO M/M/1 queue, λ_{i−1}/μ_i = ρ for every i, so Π_{i=1}^j (λ_{i−1}/μ_i) = ρ^j and the positive-recurrence condition reduces, as before, to the convergence of Σ_{j=0}^∞ ρ^j.)
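Formula (26) translates directly into code; a sketch (names ours) that truncates the state space at jmax, which is appropriate when the products decay fast enough that the tail is negligible. With constant rates it recovers the geometric M/M/1 answer:

```python
def bd_stationary(birth, death, jmax):
    """Stationary distribution of a birth-death process via (26):
    w_j = prod_{i=1}^{j} birth[i-1]/death[i], then P_j = w_j / sum_k w_k."""
    w = [1.0]                                 # w_0 = 1 (empty product)
    for j in range(1, jmax + 1):
        w.append(w[-1] * birth[j - 1] / death[j])
    total = sum(w)
    return [x / total for x in w]

# Constant rates: birth rate 1, death rate 2 (an M/M/1 queue with rho = 1/2).
jmax = 60
p = bd_stationary([1.0] * jmax, [2.0] * (jmax + 1), jmax)
# p should be close to the geometric distribution (1/2)^(j+1)
```

The same routine handles state-dependent rates, e.g. death rate j·μ for the M/M/∞ model below, with no change to the code.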
3. M/M/1 loss system: This is the M/M/1 queueing model, except there is no waiting room; any customer arriving when the server is busy is "lost," that is, departs without being served. In this case S = {0, 1}, and X(t) = 1 if the server is busy while X(t) = 0 if the server is free. P_01 = 1 = P_10; the chain is irreducible. Since the state space is finite, we conclude from Theorem 1.2 that the chain is positive recurrent for any λ > 0 and μ > 0. We next solve for P_0 and P_1. We let ρ = λ/μ. There is only one balance equation, λP_0 = μP_1. So P_1 = ρP_0, and since P_0 + P_1 = 1, we conclude that P_0 = 1/(1 + ρ), P_1 = ρ/(1 + ρ). So the long-run proportion of time that the server is busy is ρ/(1 + ρ), and the long-run proportion of time that the server is free (idle) is 1/(1 + ρ).
4. M/M/∞ queue: X(t) denotes the number of customers (busy servers) in the system at time t. Being a Birth and Death process, we need only consider the Birth and Death balance equations (25), which take the form

λP_j = (j + 1)μP_{j+1}, j ≥ 0.

Irreducibility follows from the fact that the embedded chain is an irreducible simple random walk, so positive recurrence will follow if we can solve the above equations. As is easily seen by recursion, P_j = (ρ^j/j!)P_0, where ρ = λ/μ. Forcing these to sum to one (via the Taylor series expansion for e^x), we obtain 1 = e^ρ P_0, or P_0 = e^{−ρ}. Thus P_j = e^{−ρ} ρ^j/j!, and we end up with the Poisson distribution with mean ρ:

The M/M/∞ queue is always positive recurrent for any λ > 0, μ > 0; its stationary distribution is Poisson with mean ρ = λ/μ.
The above result should not be surprising, for we already studied (earlier in this course) the more general M/G/∞ queue and obtained the same stationary distribution. But because we now assume exponential service times, we are able to obtain the result using CTMC methods. (For a general service-time distribution we could not do so, because then X(t) does not form a CTMC; so we had to use other, more general, methods.)
5. M/M/c loss queue: This is the M/M/c model except there is no waiting room; any arrival finding all c servers busy is lost. This is the c-server analog of Example 3. With X(t) denoting the number of busy servers at time t, we have, for any λ > 0 and μ > 0, an irreducible B&D process with a finite state space S = {0, . . . , c}, so positive recurrence follows from Theorem 1.2. The B&D balance equations (25) are

λP_j = (j + 1)μP_{j+1}, 0 ≤ j ≤ c − 1,

or P_{j+1} = ρP_j/(j + 1), 0 ≤ j ≤ c − 1; the first c equations for the M/M/∞ queue. Solving, we get P_j = (ρ^j/j!)P_0, 0 ≤ j ≤ c, and summing to one yields

1 = P_0 ( 1 + Σ_{j=1}^c ρ^j/j! ),
yielding

P_0 = ( 1 + Σ_{j=1}^c ρ^j/j! )^{−1}.

Thus

P_j = (ρ^j/j!) ( 1 + Σ_{n=1}^c ρ^n/n! )^{−1}, 0 ≤ j ≤ c.  (27)

In particular,

P_c = (ρ^c/c!) ( 1 + Σ_{n=1}^c ρ^n/n! )^{−1},  (28)
the proportion of time that all servers are busy. Later we will see, from a result called PASTA, that P_c is also the proportion of lost customers, that is, the proportion of arrivals who find all c servers busy. This turns out to be a very famous/celebrated queueing-theory result because the solution in (27), in particular the formula for P_c in (28), turns out to hold even if the service times are not exponential (the M/G/c-loss queue), a result called Erlang's Loss Formula.
6. Population model with family immigration: Here we start with a general B&D process (birth rates λ_i, death rates μ_i), but allow another source of population growth, in addition to the births. Suppose that at each of the times from a Poisson process at rate γ, independently, a family of random size B joins the population (immigrates). Let b_i = P(B = i), i ≥ 1, denote the corresponding family-size probabilities. Letting X(t) denote the population size at time t, we no longer have a B&D process, since the arrival of a family can cause a jump larger than size one. The balance equations (the rate out of state j equals the rate into state j) are:

(λ_0 + γ)P_0 = μ_1 P_1
(λ_1 + μ_1 + γ)P_1 = (λ_0 + γb_1)P_0 + μ_2 P_2
(λ_2 + μ_2 + γ)P_2 = γb_2 P_0 + (λ_1 + γb_1)P_1 + μ_3 P_3
. . .
(λ_j + μ_j + γ)P_j = λ_{j−1} P_{j−1} + μ_{j+1} P_{j+1} + γ Σ_{i=0}^{j−1} b_{j−i} P_i, j ≥ 1.
The derivation is as follows: When X(t) = j, any one of three events can happen
next: A death (rate j ), a birth (rate j ) or a family immigration (rate ). This
yields the rate out of state j. There are j additional ways to enter state j, besides
a birth from state j 1 or a death from state j + 1, namely from each state i < j a
family of size j i could immigrate (rate bji ). This yields the rate into state j.
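These balance equations can be checked numerically by truncating the state space. The sketch below (not from the notes) picks a concrete instance of the general rates: constant birth rate λ_j = λ, linear death rate μ_j = jμ, immigration rate γ, and families of size 1 or 2; N is a truncation level chosen so the omitted mass is negligible.

```python
import numpy as np

lam, mu, gam = 1.0, 2.0, 0.5          # invented birth, per-capita death, immigration rates
b = {1: 0.6, 2: 0.4}                  # invented family-size probabilities b_1, b_2
N = 200                               # truncation level for the state space

# Build the generator Q of the truncated chain.
Q = np.zeros((N + 1, N + 1))
for j in range(N + 1):
    if j + 1 <= N:
        Q[j, j + 1] += lam                 # a birth: j -> j + 1
    if j >= 1:
        Q[j, j - 1] += j * mu              # a death: j -> j - 1
    for size, prob in b.items():
        if j + size <= N:
            Q[j, j + size] += gam * prob   # a family of this size immigrates
    Q[j, j] = -Q[j].sum()                  # diagonal: minus the total rate out

# Stationary distribution: solve pi Q = 0 together with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(N + 1)])
rhs = np.zeros(N + 2)
rhs[-1] = 1.0
pi = np.linalg.lstsq(A, rhs, rcond=None)[0]

# Check the balance equation at j = 3: rate out of state j equals rate in.
j = 3
out_rate = (lam + j * mu + gam) * pi[j]
in_rate = (lam * pi[j - 1] + (j + 1) * mu * pi[j + 1]
           + sum(gam * b[s] * pi[j - s] for s in b if j - s >= 0))
```

The two rates agree to machine precision, exactly as the j ≥ 1 balance equation asserts.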
1.14
In our study of CTMCs we have inherently been assuming that P_{i,i} = 0 for each i ∈ S,
but this is not necessary, as we illustrate here.
Suppose that 0 < P_{i,i} < 1. Assume X_0 = i and let K denote the total number
of transitions (visits) to state i before making a transition out to another state. Since
X_0 = i, we count this initial visit as one such visit. Then P(K = n) = (1 − p)^{n−1} p, n ≥ 1,
where p = 1 − P_{i,i}. Letting Y_n denote iid exponential rvs at rate a_i (the holding-time
rate), we can represent the total holding time H_T in state i as an independent geometric
sum

H_T = Σ_{n=1}^{K} Y_n.
But an independent geometric sum of iid exponentials is again exponentially distributed,
here at rate p a_i. So, it makes no difference as far as {X(t)} is concerned3 : we may as
well reset P_{i,i} = 0 and take the holding-time rate in state i to be p a_i. This is how it
works out for any CTMC.
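A small simulation sketch (invented values a_i = 2, p = 1/4) illustrates the fact being used here, that a geometric sum of iid exponentials at rate a_i is exponential at rate p a_i:

```python
import random

random.seed(42)
a_i = 2.0   # holding-time rate in state i (invented value)
p = 0.25    # p = 1 - P_{i,i}: each visit to i is the last with probability p

def total_holding_time():
    """One sample of H_T = Y_1 + ... + Y_K, K geometric(p), Y_n ~ Exp(a_i)."""
    t = random.expovariate(a_i)
    while random.random() > p:      # with probability 1 - p, another round in state i
        t += random.expovariate(a_i)
    return t

samples = [total_holding_time() for _ in range(200_000)]
mean_ht = sum(samples) / len(samples)
# H_T should be Exp(p * a_i) = Exp(0.5), so the sample mean should be near 2.0.
```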
1.15
For a stable M/M/1 queue, let a_j denote the long-run proportion of arrivals who, upon
arrival, find j customers already in the system. If X(t) denotes the number in system at
time t, and t_n denotes the time of the nth Poisson arrival, then

a_j def= lim_{N→∞} (1/N) Σ_{n=1}^{N} I{X(t_n −) = j},

where X(t_n −) denotes the number in system found by the nth arrival.
On the one hand, λ a_j is the long-run rate (number of times per unit time) at which X(t)
makes a transition j → j + 1. After all, arrivals occur at rate λ, and such transitions can
only happen when arrivals find j customers in the system. On the other hand, from the
B&D balance equations (20), λ P_j is also the rate in question. Thus λ a_j = λ P_j, or

a_j = P_j,   j ≥ 0,
which asserts that
the proportion of Poisson arrivals who find j customers in the system is equal
to the proportion of time there are j customers in the system.
This is an example of Poisson Arrivals See Time Averages (PASTA), and it turns out
that PASTA holds for any queueing model in which arrivals are Poisson, no matter how
complex, as long as a certain (easy to verify) condition, called LAC, holds. (Service
times do not need to have an exponential distribution, they can be general, as in the
M/G/ queue.) Moreover, PASTA holds for more general quantities of interest besides
number in system. For example, the proportion of Poisson arrivals to a queue who, upon
arrival, find a particular server busy serving a customer with a remaining service time
exceeding x (time units) is equal to the proportion of time that this server is busy serving
a customer with a remaining service time exceeding x. In general, PASTA will not hold
if the arrival process is not Poisson.
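As an illustration (not part of the notes), one can simulate an M/M/1 queue and compare the arrival average a_0 with the time average P_0; with the invented rates λ = 1, μ = 2, PASTA says both should be near 1 − ρ = 1/2:

```python
import random

random.seed(1)
lam, mu = 1.0, 2.0          # invented arrival and service rates, so rho = 1/2
T = 200_000.0               # simulation time horizon

t, n = 0.0, 0               # current time, number in system
time_empty = 0.0            # total time spent with n = 0
arrivals = 0
arrivals_seeing_empty = 0   # Poisson arrivals finding the system empty

next_arrival = random.expovariate(lam)
next_departure = float("inf")

while True:
    t_next = min(next_arrival, next_departure)
    if t_next >= T:
        if n == 0:
            time_empty += T - t
        break
    if n == 0:
        time_empty += t_next - t
    t = t_next
    if next_arrival <= next_departure:        # arrival event
        arrivals += 1
        if n == 0:
            arrivals_seeing_empty += 1
            next_departure = t + random.expovariate(mu)  # service begins now
        n += 1
        next_arrival = t + random.expovariate(lam)
    else:                                     # departure event
        n -= 1
        next_departure = (t + random.expovariate(mu)) if n > 0 else float("inf")

P0 = time_empty / T                     # proportion of time the system is empty
a0 = arrivals_seeing_empty / arrivals   # proportion of arrivals who find it empty
```

The two estimates agree (up to simulation noise), which is exactly the statement a_0 = P_0.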
To state PASTA more precisely, let {X(t) : t ≥ 0} be any stochastic process, and
{t_n : n ≥ 0} a Poisson process, both assumed to be on the same probability
space. We have in mind that X(t) denotes the state of some queueing process with which
the Poisson arriving customers are interacting/participating. The state space S can
3
But there might be other associated stochastic processes that will become different by making this
change. For example, in queueing models, allowing Pi,i > 0 might refer to allowing customers to return
to the end of the queue for another round after completing service. By resetting Pi,i = 0, we are forcing
the customer to re-enter service immediately for the extra round instead of waiting at the end of the
queue. This of course would affect quantities of interest such as average waiting time.
1.16 Multi-dimensional CTMCs
So far we have assumed that a CTMC is a one-dimensional process, but that is not
necessary. All of the CTMC theory we have developed in one dimension applies here as
well (except for the Birth and Death theory). We illustrate with some two-dimensional
examples; higher dimensions are analogous.
1. Tandem queue: Consider a queueing model with two servers in tandem: Each
customer, after waiting in line and completing service at the first single-server
facility, immediately waits in line at a second single-server facility. Upon completion
4 It has left-hand limits if, for each t > 0, x(t−) def= lim_{h↓0} x(t − h) exists (but need not equal x(t)). If
x(t−) ≠ x(t+), then the function is said to be discontinuous at t, or to have a jump at t. Queueing processes
typically have jumps at arrival times and departure times.
of the second service, the customer finally departs. In what follows we assume that
the first facility is a FIFO M/M/1, and the second server has exponential service
times and also serves under FIFO, in which case this system is denoted by

FIFO M/M/1/∞/M/1.

Besides the Poisson arrival rate λ, we now have two service-time rates (one for each
server), μ_1 and μ_2. Service times at each server are assumed i.i.d. and independent
of each other and of the arrival process.
Letting X(t) = (X1 (t), X2 (t)), where Xi (t) denotes the number of customers in
the ith facility, i = 1, 2, it is easily seen that {X(t)} satisfies the Markov property.
This is an example of an irreducible two-dimensional CTMC. Balance equations
(rate out of a state equals rate into the state) can be set up and used to solve for
stationary probabilities. Letting Pn,m denote the long-run proportion of time there
are n customers at the first facility and m at the second (a joint probability),
λ P_{0,0} = μ_2 P_{0,1},

because the only way the chain can make a transition into state (0, 0) is from (0, 1)
(no one is at the first facility, exactly one customer is at the second facility, and
this one customer departs (rate μ_2)). Similarly, when n ≥ 1, m ≥ 1,

(λ + μ_1 + μ_2)P_{n,m} = λ P_{n−1,m} + μ_1 P_{n+1,m−1} + μ_2 P_{n,m+1},

because either a customer arrives, a customer completes service at the first facility
and thus goes to the second, or a customer completes service at the second facility
and leaves the system. The remaining balance equations are also easily derived.
Letting ρ_i = λ/μ_i, i = 1, 2, it turns out that the solution is

P_{n,m} = (1 − ρ_1)ρ_1^n (1 − ρ_2)ρ_2^m,   n ≥ 0, m ≥ 0,
provided that ρ_i < 1, i = 1, 2. This means that as t → ∞, X_1(t) and X_2(t)
become independent r.v.s., each with a geometric distribution. This result is quite
surprising because, after all, the two facilities are certainly dependent at any time t,
and why should the second facility have a stationary distribution as if it were itself
an M/M/1 queue? (For example, why should departures from the first facility be
treated as a Poisson process at rate λ?) The proof is merely a plug-in-and-check
proof using Theorem 1.2: Plug the given solution (e.g., treat it as a guess)
into the balance equations and verify that they work. Since they do work, it is
the unique probability solution, and the chain is positive recurrent.
It turns out that there is a nice way of understanding part of this result. The
first facility is an M/M/1 queue, so we know that X_1(t) by itself is a CTMC with
stationary distribution P_n = (1 − ρ_1)ρ_1^n, n ≥ 0. If we start off X_1(0) with this
stationary distribution (P(X_1(0) = n) = P_n, n ≥ 0), then we know that X_1(t) will
have this same distribution for all t ≥ 0; that is, {X_1(t)} is stationary. It turns out
that when stationary, the departure process is itself a Poisson process at rate λ,
and so the second facility (in isolation) can be treated itself as an M/M/1 queue
when {X_1(t)} is stationary. This at least explains why X_2(t) has the geometric
stationary distribution (1 − ρ_2)ρ_2^m, m ≥ 0, but more analysis is required to prove
the independence part.
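The plug-in-and-check step is mechanical enough to automate. This sketch (with invented rates λ = 1, μ_1 = 3, μ_2 = 2) verifies, in exact arithmetic, that the product form satisfies the interior balance equations and the boundary equation at (0, 0):

```python
from fractions import Fraction

# Invented rates; rho_i = lam/mu_i < 1, so the chain is positive recurrent.
lam, mu1, mu2 = Fraction(1), Fraction(3), Fraction(2)
rho1, rho2 = lam / mu1, lam / mu2

def P(n, m):
    """Candidate product-form solution P_{n,m}."""
    return (1 - rho1) * rho1**n * (1 - rho2) * rho2**m

# Interior balance equations, n >= 1, m >= 1, checked exactly on a grid.
interior_ok = all(
    (lam + mu1 + mu2) * P(n, m)
    == lam * P(n - 1, m) + mu1 * P(n + 1, m - 1) + mu2 * P(n, m + 1)
    for n in range(1, 6) for m in range(1, 6)
)
# Boundary equation at (0, 0): lam * P_{0,0} = mu2 * P_{0,1}.
boundary_ok = lam * P(0, 0) == mu2 * P(0, 1)
```

Using `Fraction` makes the check exact rather than subject to floating-point error; the equalities hold identically in n and m, which is the point of the plug-in-and-check argument.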
2. Jackson network:
Consider two FIFO single-server facilities (indexed by 1 and 2), each with exponential service at rates μ_1 and μ_2 respectively. For simplicity we refer to each facility as
a node. Each node has its own queue with its own independent Poisson arrival
process, at rates λ_1 and λ_2 respectively. Whenever a customer completes service
at node i = 1, 2, they next go to the queue at node j = 1, 2 with probability Q_{i,j},
independent of the past, or depart the system with probability Q_{i,0}, where the state
0 refers to departing the system, and we require that Q_{0,0} = 1, an absorbing state.
We always assume that states 1 and 2 are transient, and state 0 is absorbing. So
typically, a customer gets served a couple of times, back and forth between the
two nodes before finally departing. In general, we allow feedback, which means
that a customer can return to a given node (perhaps many times) before departing
the system. The tandem queue does not have feedback; it is the special case when
Q_{1,2} = 1 and Q_{2,0} = 1 and λ_2 = 0, an example of a feedforward network. In general,
Q = (Qij ) is called the routing transition matrix, because it represents the transition matrix of a Markov chain. Letting X(t) = (X1 (t), X2 (t)), where Xi (t) denotes
the number of customers in the ith node, i = 1, 2, {X(t)} yields an irreducible
CTMC. Like the tandem queue, it turns out that the stationary distribution for
the Jackson network is of the product form
P_{n,m} = (1 − ρ_1)ρ_1^n (1 − ρ_2)ρ_2^m,   n ≥ 0, m ≥ 0,
provided that ρ_i < 1, i = 1, 2. Here

ρ_i = (λ_i / μ_i) E(N_i),

where E(N_i) is the expected number of times that a customer attends the ith
facility. E(N_i) is completely determined by the routing matrix Q: Each customer,
independently, is routed according to the discrete-time Markov chain with transition
matrix Q, and since 0 is absorbing (and states 1 and 2 transient), the chain will
visit each state i = 1, 2 only a finite number of times before getting absorbed.
Notice that α_i = λ_i E(N_i) represents the total arrival rate to the ith node. So
ρ_i < 1, i = 1, 2, just means that the total arrival rate must be smaller than the
service rate at each node. As with the tandem queue, the proof can be carried out
by the plug-in-and-check method. The α_i can be computed equivalently as the
solution to the flow equations:

α_i = λ_i + Σ_{j=1}^{2} α_j Q_{j,i},   i = 1, 2.
Letting QT = (Qj,i ), i, j {1, 2}, denote the 2 2 matrix without the absorbing
state 0 included, the flow equations in matrix form are
~ = ~ +
~ QT ,
with solution
~ = ~(I QT )1 .
We recognize that (I QT )1 = S = (si,j ), where si,j denotes the expected number
of times the discrete-time chain visits transient state j given it started in transient
state i, i = 1, 2.
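For concreteness (all numbers invented), here is the flow-equation computation for a small example, solving α = λ + αQ_T by inverting I − Q_T by hand:

```python
from fractions import Fraction as F

# Invented data: external Poisson rates, service rates, and the routing matrix
# restricted to the transient nodes {1, 2} (absorbing state 0 dropped).
lam = [F(1), F(2)]                   # lam_1, lam_2
mu = [F(5), F(6)]                    # mu_1, mu_2
Q_T = [[F(0), F(1, 2)],              # row 1: Q_{1,1}, Q_{1,2}
       [F(1, 4), F(0)]]              # row 2: Q_{2,1}, Q_{2,2}

# Invert the 2 x 2 matrix I - Q_T, then alpha = lam (I - Q_T)^{-1}.
a, b = 1 - Q_T[0][0], -Q_T[0][1]
c, d = -Q_T[1][0], 1 - Q_T[1][1]
det = a * d - b * c
inv = [[d / det, -b / det],
       [-c / det, a / det]]          # this is S, the expected-visit-count matrix
alpha = [lam[0] * inv[0][0] + lam[1] * inv[1][0],
         lam[0] * inv[0][1] + lam[1] * inv[1][1]]
rho = [alpha[i] / mu[i] for i in range(2)]   # total load at each node
```

One can check directly that α_i = λ_i + Σ_j α_j Q_{j,i} holds for the computed α, and that ρ_i < 1 for both nodes, so this example network is stable.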