KEMBAR78
Stoch Bio Chapter 3 | PDF | Markov Chain | Stochastic Process
0% found this document useful (0 votes)
22 views46 pages

Stoch Bio Chapter 3

Uploaded by

lalamrayane81
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views46 pages

Stoch Bio Chapter 3

Uploaded by

lalamrayane81
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 46

Chapter 3

Discrete Time Markov Chains

In this chapter we introduce discrete time Markov chains. For these models both time
and space are discrete. We will begin by introducing the basic model, and provide
some examples. Next, we will construct a Markov chain using only independent
uniformly distributed random variables. Such a construction will demonstrate how
to simulate a discrete time Markov chain, which will also be helpful in the continuous
time setting of later chapters. Finally, we will develop some of the basic theory of
discrete time Markov chains.

3.1 The Basic Model


Let Xn , n = 0, 1, 2 . . . , be a discrete time stochastic process with a discrete state
space S. Recall that S is said to be discrete if it is either finite or countably infinite.
Without loss of generality, we will nearly always assume that S is either {1, . . . , N }
or {0, . . . , N − 1} in the finite case, and either {0, 1, . . . } or {1, 2, . . . } in the infinite
setting.
To understand the behavior of such a process, we would like to know the values
of
P {X0 = i0 , X1 = i1 , · · · , Xn = in }, (3.1)
for every n and every finite sequence of states i0 , . . . , in ∈ S. Note that having such
finite dimensional distributions allows for the calculation of any path probability. For
example, by the axioms of probability

P {X0 = i0 , X3 = i3 } = P {X0 = i0 , X1 ∈ S, X2 ∈ S, X3 = i3 }

= P {X0 = i0 , X1 = j1 , X2 = j2 , X3 = i3 }, (3.2)
j1 ∈S j2 ∈S

where the second equality holds as the events are mutually exclusive.

0 c 2011 by David F. Anderson.


Copyright 

39
Example 3.1.1. Recall Example 1.1.3, where we let Zk be the outcome of the kth
roll of a fair die and we let n

Xn = Zk .
k=1
Then, assuming the rolls are indpendent,
 3
1
P {X1 = 2, X2 = 4, X3 = 6} = P {X1 = 2}P {X2 = 4}P {X3 = 6} = .
6

Example 3.1.2. Suppose a frog can jump between three lily pads, labeled 1, 2, and
3. We suppose that if the frog is on lily pad number 1, it will jump to lily pad number
2 with a probability of one. Similarly, if the frog is on lily pad number 3, it will jump
to lily pad number 2. However, when the frog is on lily pad number 2, it will jump
to lily pad 1 with probability 1/4, and to lily pad three with probability 3/4. We can
depict this process graphically via
1/4 1
1  2  3.
1 3/4

We let Xn denote the position of the frog after the nth jump, and assume that X0 = 1.
We then intuitively have (this will be made precise shortly)
P {X0 = 1, X1 = 2, X2 = 3} = 1 × 1 × 3/4 = 3/4,
whereas
P {X0 = 1, X1 = 3} = 0.

Actually computing values like (3.2) can be challenging even when the values
(3.1) are known, and it is useful to assume the process has some added structure. A
common choice for such structure is the assumption that the processes satisfies the
Markov property:
P {Xn = in | X0 = i0 , . . . , Xn−1 = in−1 } = P {Xn = in | Xn−1 = in−1 }, (3.3)
which says that the probabilities associated with future states only depends upon
the current state, and not on the full history of the process. Any process Xn , n ≥ 0,
satisfying the Markov property (3.3) is called a discrete time Markov chain. Note that
the processes described in Examples 3.1.1 and 3.1.2 are both discrete time Markov
chains.
Definition 3.1.3. The one-step transition probability of a Markov chain from state
i to state j, denoted by pij (n), is
def
pij (n) = P {Xn+1 = j | Xn = i}.
If the transition probabilities do not depend upon n, then the processes is said to
be time homogeneous, or simply homogeneous, and we will use the notation pij as
opposed to pij (n).

40
All discrete time Markov chain models considered in these notes will be time
homogeneous, unless explicitly stated otherwise. It is a straightforward use of con-
ditional probabilities to show that any process satisfying the Markov property (3.3)
satisfies the more general condition

P {Xn+m = in+m , . . . , Xn = in | X0 = i0 , . . . , Xn−1 = in−1 }


(3.4)
= P {Xn+m = in+m , . . . , Xn = in | Xn−1 = in−1 },

for any choice of n, m ≥ 1, and states ij ∈ S, with j ∈ 0, . . . , n + m. Similarly, any


Markov chain satisfies the intuitively pleasing identities such as

P {Xn = in | Xn−1 = in−1 , X0 = i0 } = P {Xn = in | Xn−1 = in−1 }.

We will denote the initial probability distribution of the process by α (which we


think of as a column vector):

α(j) = P {X0 = j}, j ∈ S.

Returning to (3.1), we have

P {X0 = i0 , · · · , Xn = in }
= P {Xn = in | X0 = i0 , · · · , Xn−1 = in−1 }P {X0 = i0 , · · · , Xn−1 = in−1 }
= pin−1 in P {X0 = i0 , · · · , Xn−1 = in−1 } (3.5)
..
.
= α0 pi0 i1 · · · pin−1 in ,

and the problem of computing probabilities has been converted to one of simple
multiplication. For example, returning to Example 3.1.2, we have

P {X0 = 1, X1 = 2, X2 = 3} = α1 p12 p23 = 1 × 1 × 3/4 = 3/4.

The one-step transition probabilities are most conveniently expressed in matrix


form.

Definition 3.1.4. The transition matrix P for a Markov chain with state space
S = {1, 2, . . . , N } and one-step transition probabilities pij is the N × N matrix
 
p11 p12 · · · p1N
 p21 p22 · · · p2N 
def  
P =  .. .. . . ..  .
 . . . . 
pN 1 pN 2 · · · pN N

If the state space S is infinite, then P is formally defined to be the infinite matrix
with i, jth component pij .

41
Note that the matrix P satisfies

0 ≤ Pij ≤ 1, 1 ≤ i, j, ≤ N, (3.6)
N

Pij = 1, 1 ≤ i ≤ N. (3.7)
j=1

Any matrix satisfying the two conditions (3.6) and (3.7) is called a Markov or stochas-
tic matrix, and can be the transition matrix for a Markov chain. If P also satisfies
the condition
 N
Pij = 1, 1 ≤ j ≤ N,
i=1

so that the column sums are also equal to 1, then P is termed doubly stochastic.

3.1.1 Examples
We list examples that will be returned to throughout these notes.

Example 3.1.5. This example, termed the deterministically monotone Markov chain,
is quite simple but will serve as a building block for more important models in the
continuous time setting.
Consider Xn with state space {1, 2, . . . , }, and with transition probabilities pi,i+1 =
1, and all others are zero. Thus, if α is the initial distribution and α1 = 1, then the
process simply starts at 1 and proceeds deterministically up the integers towards
positive infinity.

Example 3.1.6. Suppose that Xn are independent and identically distributed with

P {X0 = k} = ak , k = 0, 1, . . . , N,

where ak ≥ 0 and k ak = 1. Then,

P {Xn+1 = in+1 | X0 = i0 , . . . , Xn = in } = P {Xn+1 = in+1 } = ain+1


= P {Xn+1 = in+1 | Xn = in },

and the process is Markovian. Here


 
a0 a1 · · · aN
 
P =  ... ..
. 
a0 a1 · · · aN

Example 3.1.7. Consider a gene that can be repressed by a protein. By Xn = 0, we


mean the gene is free at time n, and by Xn = 1 we mean that the gene is repressed.
We make the following assumptions:

42
1. If the gene is free at time n, there will be a probability of p ≥ 0 that it is
repressed at time n + 1.

2. If the gene is represses at time n, there will be a probability of q ≥ 0 that it is


free at time n + 1.

In this setting Xn can be modeled as a discrete time Markov chain with finite state
space S = {0, 1}. The transition matrix is
 
1−p p
P = , (3.8)
q 1−q

where the first row/column is associated with state 0. Note that any two state discrete
time Markov chain has a transition matrix of the form (3.8). .

Example 3.1.8 (Random walk with finite state space). A “random walk” is a model
used to describe the motion of an entity, the walker, on some discrete space. Taking
our state space to be {0, . . . , N }, for some N > 0, we think of the walker flipping
a coin to decide whether or not to move to the right or left during the next move.
That is, at each time-step the walker moves one step to the right with probability p
(she flipped a heads) and to the left with probability 1 − p (she flipped a tails). If
p = 1/2, the walk is termed symmetric or unbiased, whereas if p = 1/2, the walk is
biased. The one step transition intensities for i ∈ {1, . . . , N − 1} are,

pi,i+1 = p, pi,i−1 = 1 − p, 0 < i < N,

though we must still give the transition intensities at the boundaries. One choice for
the boundary conditions would be to assume that with probability one, the walker
transitions away from the boundary during the next time step. That is, we could
have

p01 = 1, pN,N −1 = 1.

We say such a process has reflecting boundaries. Note that Example 3.1.2 was a
model of a random walk on {1, 2, 3} with reflecting boundaries. Another option
for the boundary conditions is to assume there is absorption, yielding the boundary
conditions

p00 = 1, pN N = 1,

in which case the chain is often called the Gambler’s Ruin, which can be understood
by assuming p < 1/2. Finally, we could have a partial type of reflection

p00 = 1 − p, p01 = p, pN,N −1 = 1 − p, pN N = p.

Of course, we could also have any combination of the above conditions at the different
boundary points. We could also generalize the model to allow for the possibility of
the walker choosing to stay at a given site i ∈ {1, . . . , N − 1} during a time interval.

43
In the most general case, we could let qi , pi and ri be the probabilities that the walker
moves to the left, right, and stays put given that she is in state i. Assuming absorbing
boundary conditions, the transition matrix for this model is
 
1 0 0 0 ··· 0 0
 q1 r1 p 1 0 ··· 0 0 
 
 0 q2 r2 p2 · · · 0 0 
 
P =  .. . . .. .. ..  ,
 . . . . . 
 
 0 · · · 0 0 qN −1 rN −1 pN −1 
0 0 0 0 ··· 0 1
where it is understood that qi , pi , ri ≥ 0, and qi + pi + ri = 1 for all i ∈ {1, . . . , N − 1}.

Example 3.1.9 (Axonal transport). One method of transport used in living cells is
axonal transport in which certain (motor) proteins carry cargo such as mitochondria,
other proteins, and other cell parts, on long microtubules. These microtubule can be
thought of as the “tracks” of the transportation mechanism, with the motor protein
as the random walker. One natural, and simple, model for such transport would begin
by breaking the microtubule into N equally sized intervals, and then letting Xn be
the position of the motor protein on the state space {1, . . . , N }. We could then let
the transition probabilities satisfy
pi,i+1 = pi , pi,i−1 = qi , pi,i = ri , i ∈ {2, . . . , N − 1},
where pi + qi + ri = 1 with pi , qi , ri ≥ 0, and with boundary conditions
p1,1 = p1 + q1 , p1,2 = r1 , pN,N = 1,
where we think of the end of the microtubule associated with state N as the destina-
tion of the cargo. In this case, it would be natural to expect pi > qi . 
Example 3.1.10 (Random walk on the integers). This Markov chain is like that
of Example 3.1.8, except now we assume that the state space is all the integers
S = Z = {. . . , −1, 0, 1, . . . }. That is, Xn is the position of the walker at time n,
where for some 0 ≤ p ≤ 1 the transition probabilities are given by
pi,i+1 = p, pi,i−1 = 1 − p,
for all i ∈ S. This model is one of the most studied stochastic processes and will be
returned to frequently as a canonical example. 
Example 3.1.11 (Random walk on Zd ). We let Zd be the d-dimensional integer
lattice:
Zd = {(x1 , . . . , xd ) : xi ∈ Z}.
Note that for each x ∈ Zd there are exactly 2d values y with |x − y| = 1 (as there are
precisely d components that can be changed by a value of ±1). We may let

1/2d if |x − y| = 1
pxy = .
0 else

44
3.2 Constructing a Discrete Time Markov Chain
We turn to the problem of constructing a discrete time Markov chain with a given
initial distribution, α, and transition matrix, P . More explicitly, for the discrete set
S = {1, 2, . . . } (the finite state space is handled similarly), we assume the existence
of:
(i) An initial distribution α = {αk } giving the associated probabilities for the
random variable X0 . That is, for k ∈ S,

αk = P {X0 = k}.

(ii) Transition probabilities, pij , giving the desired probability of transitioning from
state i ∈ S to state j ∈ S:

pij = P {Xn+1 = j | Xn = i}.

Note that we will require that α is a probability vector in that αk ≥ 0 for each k and

αk = 1.
k∈S

We further require that for all i ∈ S



pij = 1,
j∈S

which simply says that the chain will transition somewhere from state i (including
the possibility that the chain transitions back to state i). The problem is to now
construct a discrete time Markov chain for a given choice of α and {pij } using more
elementary building blocks: uniform random variables. Implicit in the construction
is a natural simulation method.
We let {U0 , U1 , . . . } be independent random variables that are uniformly dis-
tributed on the interval (0, 1). We will use the initial distribution to produce X0
from U0 , and then for n ≥ 1, we will use the transition matrix to produce Xn from
the pair (Xn−1 , Un ). Note, therefore, that each choice of sequence of uniform ran-
dom variables {U0 , U1 , . . . } will correspond with a unique path of the process Xn ,
n ≥ 0. We therefore have a simulation strategy: produce uniform random variables
and transform them into a path of the Markov chain.
To begin the construction, we generate X0 from U0 using the transformation
method detailed in Theorem 2.3.22. Next, we note that,

P {X1 = j | X0 = i} = pij .

Therefore, conditioned upon X0 , X1 is a discrete random variable with probability


mass function determined by the ith row of the transition matrix P . We may then
use Theorem 2.3.22 again to generate X1 using only U1 . Continuing in this manner
constructs the Markov chain Xn .

45
It is straightforward to verify that the constructed model is the desired Markov
chain. Using that Xn+1 is simply a function of Xn and Un+1 , that is Xn+1 =
f (Xn , Un+1 ), and that by construction X0 , . . . , Xn are independent of Un+1 , we have

P {Xn+1 = j | X0 = i0 , . . . , Xn−1 = in−1 , Xn = i}


= P {f (Xn , Un+1 ) = j | X0 = i0 , . . . , Xn−1 = in−1 , Xn = i}
= P {f (i, Un+1 ) = j | X0 = i0 , . . . , Xn−1 = in−1 , Xn = i}
= P {f (i, Un+1 ) = j}
= pij .

The above construction provides an algorithm for the exact simulation of sample
paths of the Markov chain. In fact, the algorithm implicit in the construction above
is already one half of the well known “Gillespie Algorithm” used in the generation
of sample paths in the continuous time Markov chain setting that will be studied in
later chapters [6, 7].

3.3 Higher Order Transition Probabilities


We begin by asking one of the most basic questions possible of a stochastic process:
given an initial distribution α, and a transition matrix P , what is the probability
that the Markov chain will be in state i ∈ S at time n ≥ 0? To begin answering this
question we have the following definition.
(n)
Definition 3.3.1. The n-step transition probability, denoted pij , is the probability
of moving from state i to state j in n steps,
(n) def
pij = P {Xn = j | X0 = i} = P {Xn+k = j | Xk = i},

where the final equality is a consequence of time homogeneity.

Let Pijn denote the i, jth entry of the matrix P n . We note that if the state space
is infinite, then we formally have that

Pij2 = pik pkj ,
k∈S
 
which converges since k pik pkj ≤ k pik = 1, with similar expressions for Pijn . The
following is one of the most useful results in the study of discrete time Markov chains,
and is the reason much of there study reduces to linear algebra.

Proposition 3.3.2. For all n ≥ 0 and i, j ∈ S,


(n)
pij = Pijn .

46
Proof. We will show the result by induction on n. First, note that the cases n = 0
and n = 1 follow by definition. Next, assuming the result is true for a given n ≥ 1,
we have

P {Xn+1 = j | X0 = i} = P {Xn+1 = j, Xn = k | X0 = i}
k∈S

= P {Xn+1 = j | Xn = k}P {Xn = k | X0 = i}
k∈S
 (n)
= pik pkj
k∈S

= Pikn Pkj ,
k∈S

where the final equality is our inductive hypothesis. The last term is the i, jth entry
of P n+1 .
We note that a slight generalization of the above computation yields
 (m) (n)
pm+n
ij = pik pkj , (3.9)
k∈S

for all i, j ∈ S, and m, n ≥ 0. These are usually called the Chapman-Kolmogorov


equations, and they have a quite intuitive interpretation: the chain must be some-
where after m steps, and we are simply summing over the associated probabilities.
Note that the Chapman-Kolmogorov equations is the probabilistic version of the well
known matrix identity
P m+n = P m P n .
We may now answer our original question pertaining to the probability that the
Markov chain will be in state i ∈ S at time n ≥ 0 for a given initial distribution α:
 
P {Xn = i} = P {Xn = i | X0 = k}α(k) = α(k)Pkin = (αT P n )i . (3.10)
k∈S k∈S

Thus, calculating probabilities is computationally equivalent to computing powers of


the transition matrix.

Example 3.3.3. Consider again Example 3.1.7 pertaining to the gene that can be
repressed. Suppose that p = 1/3 and q = 1/8 and we know that the gene is unbound
at time 0, and so  
1
α= .
0
Suppose we want to know the probability that the gene is unbound at time n = 4.
We have  
2/3 1/3
P = ,
1/8 7/8

47
and so  
4 .33533 .66467
P = ,
.24925 .75075
and
αT P 4 = [.33533, .66467].
Thus, the desired probability is .33533. 

A natural question, and the focus of Section 3.5, is the following: for large n,
what are the values P {Xn = i}, for i ∈ S. That is, after a very long time what are
the probabilities of being in different states. By Proposition 3.3.2, we see that this
question, at least in the case of a finite state space, can be understood simply through
matrix multiplication.
For example, suppose that Xn is a two-state Markov chain with transition matrix
 
2/3 1/3
P = .
1/8 7/8

It is easy to check with a computer, or linear algebra, that for very large n,
 
n 3/11 8/11 def
P ≈ = Π.
3/11 8/11

Note that the rows of Π are identical and equal to π T = [3/11, 8/11]. Therefore, if v
is a probability vector (that is, a row vector with non-negative elements that sum to
one, think of it as an initial distribution), we see that

lim v T P n = v T Π = π T .
n→∞

Therefore, for this example we may conclude that


3 8
lim P {Xn = 1} = , and lim P {Xn = 2} = ,
n→∞ 11 n→∞ 11
no matter the initial distribution.
Such a vector π will eventually be termed a stationary, or invariant, distribution
of the process, and is usually of great interest to anyone wishing to understand the
underlying model. Natural questions now include: does every process Xn have such
a stationary distribution? If so, is it unique? Can we quantify how long it takes
to converge to a stationary distribution? To answer these questions1 we need more
terminology and mathematical machinery that will be developed in the next section.
We will return to them in Section 3.5.
1
The answers are: no, sometimes, yes.

48
3.4 Classification of States
3.4.1 Reducibility
Suppose that Xn is a Markov chain with state space S = {1, 2, 3, 4} and transition
matrix  
1/2 1/2 0 0
 1/3 2/3 0 0 
P = 0
. (3.11)
0 1/3 2/3 
0 0 3/4 1/4
Note that if the chain starts in either state 1 or 2, then it will remain in {1, 2} for all
time, whereas if the chain starts in state 3 or 4, it will remain in {3, 4} for all time.
It seems natural to study this chain by analyzing the “reduced chains,” consisting of
states S1 = {1, 2} and S2 = {3, 4}, separately.
If instead the transition matrix is
 
1/2 1/4 1/4 0
 1/3 2/3 0 0 
P =  0
, (3.12)
0 1/3 2/3 
0 0 3/4 1/4

then it should be at least intuitively clear that even if X0 ∈ {1, 2}, the chain will
eventually move to the states {3, 4} as every time the chain enters state 1, it has a
probability of 0.25 of next transitioning to state 3. Once such a transition occurs,
the chain remains in the states {3, 4} for all time. This intuition will be shown to
be true later in the notes. For this example, if only the probabilities associated with
very large n are desired, then it seems natural to only consider the “reduced chain”
consisting of states {3, 4}.
The following definitions describe when chains can be so reduced.

Definition 3.4.1. The state j ∈ S is accessible from i ∈ S, and we write i → j, if


there is an n ≥ 0 such that
(n)
pij > 0.
That is, j is accessible from i if there is a positive probability of the chain hitting j
if it starts in i.

For example, for the chain with transition matrix (3.11) we have the relations
1 → 2, 2 → 1, 3 → 4, and 4 → 3, together with all the relations i → i. However, for
the chain with transition matrix (3.12), we have all the relations i → i and

• 1 → 2, 1 → 3, 1 → 4,

• 2 → 1, 2 → 3, 2 → 4,

• 3 → 4,

• 4 → 3,

49
which can be seen from the fact that
 19 5 5 13 
72 18 18 72
 10 97 1 1 
 
P = ,
4 27 216 8 18
 0 0 107 109 
 216 216 
109 83
0 0 192 192

combined with the fact that the bottom left 2×2 sub-matrix of P n will always consist
entirely of zeros.
Definition 3.4.2. States i, j ∈ S of a Markov chain communicate with each other,
and we write i ↔ j, if i → j and j → i.
It is straightforward to verify that the relation ↔ is
1. Reflexive: i ↔ i.
2. Symmetric: i ↔ j implies j ↔ i.
3. Transitive: i ↔ j and j ↔ k implies i ↔ k.
Only the third condition need be checked, and it essentially follows from the
Chapman-Kolmogorov equations (3.9): Since i → j, there is an n ≥ 0 such that
(n) (m)
pij > 0. Since j → k, there is an m ≥ 0 such that pjk > 0. Therefore, by (3.9)
 (n) (m) (n) (m)
pn+m
ik = pi pk ≥ pij pjk > 0,

and i → k.
We may now decompose the state space using the relation ↔ into disjoint equiv-
alence classes called communication classes. For example, the Markov chain with
transition matrix (3.11) has two communication classes {1, 2} and {3, 4}. Also, the
Markov chain with transition matrix (3.12) has the same communication classes:
{1, 2} and {3, 4}. For the deterministically monotone process of Example 3.1.5, each
singleton {i}, i ≥ 0, is its own communication class. For the symmetric random
walk of Example 3.1.8 with absorbing boundaries (the Gambler’s Ruin problem) the
communication classes are {0}, {N }, and {1, . . . , N − 1}, whereas for the symmetric
random walk with reflecting boundaries the only communication class is the entire
state space {0, . . . , N }. For the random walk on the integer lattice Zd described in
Example 3.1.11, the only communication class is all of Zd .
Definition 3.4.3. A Markov chain is irreducible if there is only one communication
class. That is, if i ↔ j for all i, j ∈ S. Otherwise, it is called reducible.
Consider again the Markov chains with transition matrices (3.11) and (3.12). For
both, the set of states {1, 2} is a communication class. However, it should be clear
that the behavior of the chains on {1, 2} are quite different as the chain with transition
matrix (3.12) will eventually leave those states (assuming it starts there), and never
return.

50
Definition 3.4.4. A subset of the state space C ⊂ S, is said to be closed if it is
impossible to reach any state outside of C from any state in C via one-step transitions.
That is, C is closed if pij = 0 for all i ∈ C and j ∈/ C. We say that the state j is
absorbing if {j} is closed.

The set {1, 2} is closed for the chain with transition matrix (3.11), whereas it is
not for that with transition matrix (3.12). However, the set {3, 4} is closed for both.
For the deterministically monotone system, the subset {n, n + 1, n + 2, . . . } is closed
for any n ≥ 0. For the Gambler’s ruin problem of random walk on {0, . . . , N } with
absorbing boundary conditions, only {0} and {N } are closed.
We point out that if C ⊂ S is closed, then the matrix with elements pij for i, j ∈ C
is a stochastic matrix because for any i ∈ C,
 
pij = 1, and pij = 0.
j∈C j∈C c

Therefore, if we restrict our attention to any closed subset of the state space, we can
treat the resulting model as a discrete time Markov chain itself. The most interesting
subsets will be those that are both closed and irreducible: for example the subset
{3, 4} of the Markov chain with transition matrix (3.11) or (3.12), which for either
model is a two-state Markov chain with transition matrix
 
1/3 2/3
P = .
3/4 1/4

3.4.2 Periodicity
Periodicity helps us understand the possible motion of a discrete time Markov chain.
As a canonical example, consider the random walker of Example 3.1.8 with state
space S = {0, 1, 2, 3, 4} and reflecting boundary conditions. Note that if this chain
starts in state i ∈ S, it can only return to state i on even times.
For another example, consider the Markov chain on {0, 1, 2} with

p01 = p12 = p20 = 1.

Thus, the chain deterministically moves from state 0 to state 1, then to state 2, then
back to 0, etc. Here, if the chain starts in state i, it can (and will) only return to
state i at times that are multiples of 3.
On the other hand, consider the random walk on S = {0, 1, 2, 3, 4} with boundary
conditions
p0,0 = 1/2, p0,1 = 1/2, and p4,3 = 1.
In this case, if the chain starts at state 0, there is no condition similar to those above
on the times that the chain can return to state 0.

Definition 3.4.5. The period of state i ∈ S is


(n)
d(i) = gcd{n ≥ 1 : pii > 0},

51
(n)
where gcd stands for greatest common divisor. If {n ≥ 1 : pii > 0} = ∅,2 we take
d(i) = 1. If d(i) = 1, we say that i is aperiodic, and if d(i) > 1, we say that i is
periodic with a period of d(i).
The proof of the following theorem can be found in either [10, Chapter 1] or [13,
Chapter 2].
Theorem 3.4.6. Let Xn be a Markov chain with state space S. If i, j ∈ S are in the
same communication class, then d(i) = d(j). That is, they have the same period.
Therefore, we may speak of the period of a communication class, and if the chain
is irreducible, we may speak of the period of the Markov chain itself. Any property
which necessarily holds for all states in a communication class is called a class property.
Periodicity is, therefore, the first class property we have seen, though recurrence and
transience, which are discussed in the next section, are also class properties.
Periodicity is often obvious when powers of the transition matrix are taken.
Example 3.4.7. Consider a random walk on {0, 1, 2, 3} with reflecting boundary
conditions. This chain is periodic with a period of two. Further, we have
 
0 1 0 0
 
 1/2 0 1/2 0 
P =  0 1/2 0 1/2  ,

 
0 0 1 0
and for any n ≥ 1,
   
∗ 0 ∗ 0 0 ∗ 0 ∗
   
 0 ∗ 0 ∗   ∗ 0 ∗ 0 
P 2n = 
 ∗ 0 ∗ 0 , and P 2n+1 = 
 0 ∗ 0 ∗ ,
   
0 ∗ 0 ∗ ∗ 0 ∗ 0
where ∗ is a generic placeholder for a positive number. 
Example 3.4.8. Consider the random walk on S = {0, 1, 2, 3, 4} with boundary
conditions
p0,0 = 1/2, p0,1 = 1/2, and p4,3 = 1.
The transition matrix is
 
1/2 1/2 0 0 0
 
 1/2 0 1/2 0 0 
 
 
P =  0 1/2 0 1/2 0 ,
 
 0 0 1/2 0 1/2 
 
0 0 0 1 0
2
This happens, for example, for the deterministically monotone chain of Example 3.1.5.

52
and  
71 57 1 9 7
 256 256 4 64 64 
 57 39 29 21 1 
 
 256 128 256 64 32 
 1 29 49 9 7

P8 = 
 4 256 128 256 32
,

 
 9 21 9 63 1 
 64 64 256 128 256 
 
7 1 7 1 35
32 16 16 128 128
showing that d(i) = 1 for each i ∈ S. 
In the previous example, we used the basic fact that if each element of P n is
positive for some n ≥ 1, then P n+k has strictly positive elements for all k ≥ 0. This
follows because (i) each element of P is nonnegative, (ii) the rows of P sum to one,
and (iii) P n+k = P P n+k−1 .

3.4.3 Recurrence and Transience


A state i ∈ S of a Markov chain will be called recurrent if after every visit to state i,
the chain will eventually return for another visit with a probability of one. Otherwise,
we will call the state transient. More formally, we begin by fixing a state i ∈ S and
then defining the probability measure Pi by
def
Pi {A} = P {A|X0 = i}, A ∈ F.

We let Ei be the expected value associated with the probability measure Pi . Let τi
denote the first return time to state i,
def
τi = min{n ≥ 1 : Xn = i},

where we take τi = ∞ if the chain never returns.


Definition 3.4.9. The state i ∈ S is recurrent if

Pi {τi < ∞} = 1,

and transient if Pi {τi < ∞} < 1, or equivalently if Pi {τi = ∞} > 0.


To study the difference between a recurrent and transient state we let


R= 1{Xn =i}
n=0

denote the random variable giving the number of times the chain returns to state i.
Computing the expectation of R we see that

 ∞
 (n)
Ei R = Pi {Xn = i} = pii .
n=0 n=0

53
Suppose that the chain is transient and let
def
p = Pi {τi < ∞} < 1.

The random variable R is geometric with parameter 1 − p > 0. That is, for k ≥ 1

Pi {R = 1} = 1 − p, Pi {R = 2} = p(1 − p), ... Pi {R = k} = pk−1 (1 − p).

Therefore,

 1
Ei R = kpk−1 (1 − p) = < ∞. (3.13)
k=1
1−p
Note that equation (3.13) also shows that if the chain is transient, then

Pi {R = ∞} = 0

and there is, with a probability of one, a last time the chain visits the site i. Similarly,
if state i is recurrent, then Pi {R = ∞} = 1 and Ei R = ∞. Combining the above
yields the following.
Theorem 3.4.10. A state i is transient if and only if the expected number of returns
is finite, which occurs if and only if

 (n)
pii < ∞.
n=0

Further, if i is recurrent, then with a probability of one, Xn returns to i infinitely


often, whereas if i is transient, there is a last time a visit occurs.
The set of recurrent states can be subdivided further. We say that the state i is
positive recurrent if we also have

E[τi ] < ∞.

Otherwise, we say that the state i is null recurrent. The different types of recurrence
will be explored further in Section 3.5, where we will show why positive recurrence is
a much stronger form of recurrence than null recurrence. In fact, in many important
ways positive recurrent chains with an infinite state space behave like finite state
space chains.
The following theorem shows that recurrence, and hence transience, is a class
property. Thus, when the chain is irreducible, we typically say the chain is recurrent.
Theorem 3.4.11. Suppose that i ↔ j. Then state i is recurrent if and only if state
j is recurrent.
Proof. The following argument is the intuition needed to understand the result (which
is also the basis of the proof): because state i is recurrent, we return to it an infinite
number of times with a probability of one. We also know that there is an n > 0 for
(n)
which pij > 0. Thus, every time we are in state i, which happens an infinite number

54
of times, there is a positive probability that we get to state j in n steps. Thus, we
will enter state j an infinite number of times. The formal proof is below.
Suppose that i is recurrent. We must show that j is recurrent. Because i ↔ j,
(n) (m)
there are nonnegative integers n and m that satisfy pij , pji > 0. Let k be a non-
negative integer. It is an exercise in the use of conditional probabilities to show
that
(m+n+k) (m) (k) (n)
pjj ≥ pji pii pij ,
which says that one way to get from j to j in m + n + k steps is to first go to i in m
steps, then return to i in k steps, then return to j in n steps. Therefore,

 ∞
 ∞

(k) (m+n+k) (m) (k) (n)
pjj ≥ pjj ≥ pji pii pij
k=0 k=0 k=0


(m) (n) (k)
= pji pij pii .
k=0

Because i is recurrent, Theorem 3.4.10 shows that the sum is infinite, and hence that
state j is recurrent.
Note that Theorems 3.4.10 and 3.4.11 together guarantee the following:

Fact: All states of an irreducible, finite state space Markov chain are recurrent.

The above fact holds by the following logic: if the states were not recurrent, they
are each transient. Hence, there is a time, call it Ti , that a particular realization of
the chain visits state i. Therefore, maxi {Ti } is the last time the realization visits any
state, which can not be. Things are significantly less clear in the infinite state space
setting as the next few examples demonstrate.

Example 3.4.12. Consider a one dimensional random walk on the integer lattice
S = Z = {. . . , −2, −1, 0, 1, 2, . . . } where for some 0 < p < 1 we have
def
pi,i+1 = p, pi,i−1 = q, with q = 1 − p.

This chain is irreducible and has a period of 2. We will show that it is recurrent if
p = 1/2, and transient otherwise. To do so, we will verify the result at the origin
using Theorem 3.4.10, and then use Theorem 3.4.11 to extend the result to the entire
state space.
Notice that because of the periodicity of the system, we have
(2n+1)
p00 = 0,

for all n ≥ 0. Therefore,



 ∞

(n) (2n)
p00 = p00 .
n=0 n=0

55
Given that X0 = 0, if X2n = 0 the chain must have moved to the right n times and to
the left n times. Each such sequence
 of steps has a probability of pn q n of occurring.
2n
Because there are exactly n such paths, we see
 
(2n) 2n (2n)!
p00 = (pq)n = (pq)n .
n n!n!
Therefore,

 ∞

(2n) (2n)!
p00 = (pq)n .
n=0 n=0
n!n!
Recall that Stirling’s formula states that for m  1,

m! ∼ mm e−m 2πm,

where by f (m) ∼ g(m) we mean

f (m)
lim = 1.
m→∞ g(m)

Verification of Stirling’s formula can be found in a number of places, for example in


[5]. Stirling’s formula yields

(2n) (2n)! n 4πn(2n)2n e−2n 1
p00 = (pq) ∼ 2n −2n
(pq)n = √ (4pq)n .
n!n! 2πnn e πn
Therefore, there is an N > 0 such that n ≥ N implies
1 2
√ (4pq)n < p(2n)
00 < √ (4pq)n .
2 πn πn

The function 4pq = 4p(1 − p) is strictly less than one for all p ∈ [0, 1] with p = 1/2.
However, when p = 1/2, we have that 4p(1 − p) = 1. Therefore, in the case that
p = 1/2 we have
∞ ∞ ∞
(2n) (2n) 1
p00 > p00 > √ = ∞,
n=0 n=N n=N
2 πn
and by Theorem 3.4.10, the chain is recurrent. When p = 1/2, let ρ = 4pq < 1. We
have ∞ ∞
 (2n)
 2
p00 < N + √ ρn < ∞,
n=0 n=N
πn
and by Theorem 3.4.10, the chain is transient. 
Example 3.4.13. We consider now the symmetric random walk on the integer lattice
Zd introduced in Example 3.1.11. Recall that for this example,

1/2d if |i − j| = 1
pij = .
0 else

56
We again consider starting the walk at the origin 0 = (0, 0, . . . , 0). The chain has a
(2n+1)
period of 2, and so p0,0 = 0 for all n ≥ 0. Thus, to apply Theorem 3.4.10 we only
(2n)
need an expression for p0,0 . We will not give a rigorous derivation of the main results
here as the combinatorics for this example are substantially more cumbersome than
the last. Instead, we will make use of the following facts, which are intuitive:

(i) For large value of n, approximately 2n/d of these steps will be taken in each of
the d dimensions.

(ii) In each of the d dimensions, the analysis of the previous example implies that
theprobability that that component is at zero at time 2n/d is asymptotic to
1/ π(n/d).

Therefore, as there are d dimensions, we have


 d/2
(2n) d
p0,0 ∼C ,

∞
for some C > 0 (that depends upon d, of course). Recalling that n=1 n−a < ∞ if
and only if a > 1, we see that

 
(2n) = ∞, d = 1, 2
p0,0 .
< ∞, d ≥ 3
n=1

Thus, simple random walk in Zd is recurrent if d = 1 or 2 and is transient if d ≥ 3. This


points out the general phenomenon that dynamics, in general, are quite different in
dimensions greater than or equal to three than in dimensions one and two. Essentially,
a path restricted to a line or a plane is much more restricted than one in space.3 

The following should, at this point, be intuitive.

Theorem 3.4.14. Every recurrent class of a Markov chain is a closed set.

Proof. Suppose C is a recurrent class that is not closed. Then, there exists i ∈ C and
j∈/ C such that pij > 0, but it is impossible to return to state i (otherwise, i ↔ j).
Therefore, the probability of starting in i and never returning is at least pij > 0, a
contradiction with the class being recurrent.
Note that the converse of the above theorem is, in general, false. For example, for
the deterministic monotone chain, each set {n, n + 1, . . . } is closed, though no state
is recurrent.
Suppose that P is a transition matrix for a Markov chain and that R1 , . . . , Rr are
the recurrent communication classes and T1 , . . . , Ts are the transient classes. Then,
3
The video game “Tron” points this out well. Imagine how the game would play in three dimen-
sions.

57
after potentially reordering the indices of the state, we can write P in the following
form:  
P1
 P2 0 
 
 P3 0 
 
P = . . , (3.14)
 0 . 
 
 Pr 
S Q
where Pk is the transition matrix for the Markov chain restricted to Rk . Raising P
to powers of n ≥ 1 yields
 n 
P1
 P2n 0 
 
 P n
0 
 3 
Pn =  .. ,
 0 . 
 
 n
Pr 
n
Sn Q

and to understand the behavior of the chain on Rk , we need only study Pk . The
Matrix Q is sub-stochastic in that the row sums are all less than or equal to one,
and at least one of the row sums is strictly less than one. In this case each of the
eigenvalues has an absolute value that is strictly less than one, and it can be shown
that Qn → 0, as n → ∞.

3.5 Stationary Distributions


Just as stable fixed points characterize the long time behavior of solutions to differen-
tial equations, stationary distributions characterize the long time behavior of Markov
chains.

Definition 3.5.1. Consider a Markov chain with transition matrix P . A non-negative


vector π is said to be an invariant measure if

πT P = πT , (3.15)

which in component form is



πi = πj pji , for all i ∈ S. (3.16)
j

If π also satisfies k πk = 1, then π is called a stationary, equilibrium or steady state
probability distribution.

Thus, a stationary distribution is a left eigenvector of the transition matrix with


associated eigenvalue equal to one. Note that if one views pji as a “flow rate” of

58
probability from state j to state i, then (3.16) can be interpreted in the following
manner: for each state i, the probability of being in state i is equal to the sum of the
probability of being in state j times the “flow rate” from state j to i.
A stationary distribution can be interpreted as a fixed point for the Markov chain
because if the initial distribution of the chain is given by π, then the distribution at
all times n ≥ 0 is also π,

π T P n = π T P P n−1 = π T P n−1 = · · · = π T ,

where we are using equation (3.10). Of course, in the theory of dynamical systems it
is well known that simply knowing a fixed point exists does not guarantee that the
system will converge to it, or that it is unique. Similar questions exist in the Markov
chain setting:

1. Under what conditions on a Markov chain will a stationary distribution exist?

2. When a stationary distribution exists, when is it unique?

3. Under what conditions can we guarantee convergence to a unique stationary


distribution?

We recall that we have already seen an example in which all of the above questions
where answered. Recall that in Section 3.3, we showed that if the two-state Markov
chain has transition matrix  
2/3 1/3
P = , (3.17)
1/8 7/8
then for very large n,  
n 3/11 8/11
P ≈ = Π.
3/11 8/11
The important point was that the rows of Π are identical and equal to π T = [3/11, 8/11],
and therefore, if v is an arbitrary probability vector,

lim v T P n = v T Π = π T ,
n→∞

and so no matter the initial distribution we have


3 8
lim P {Xn = 1} = , and lim P {Xn = 2} = .
n→∞ 11 n→∞ 11
It is straightforward to check that [3/11, 8/11] is the unique left eigenvector of P with
an eigenvalue of 1.
Let us consider at least one more example.

Example 3.5.2. Suppose that Xn is a three state Markov chain with transition
matrix  
2/3 1/3 0
P =  1/12 5/8 7/24  . (3.18)
0 1/8 7/8

59
Then, for large n  
3/43 12/43 28/43
P n ≈  3/43 12/43 28/43  = Π,
3/43 12/43 28/43
where we again note that each row of Π is identical. Therefore, regardless of the
initial distribution, we have
3 12 28
lim P {Xn = 1} = , lim P {Xn = 2} = , and lim P {Xn = 3} = .
n→∞ 43 n→∞ 43 n→∞ 43
We again note that it is straightforward to check that [3/43, 12/43, 28/43] is the
unique left eigenvalue of P with an eigenvalue of 1. 
Interestingly, we were able to find stationary distributions for the above transition
matrices without actually computing the left eigenvectors. Instead, we just found the
large n probabilities. Question 3 above asks when such a link between stationary
distributions and large n probabilities holds (similar to convergence to a fixed point
for a dynamical system). This question will be explored in detail in the current
section, however we begin my making the observation that if
π T = lim v T P n ,
n→∞

for all probability vectors v (which should be interpreted as an initial distribution),


then  
π T = lim v T P n+1 = lim v T P n P = π T P.
n→∞ n→∞
n
Therefore, if P converges to a matrix with a common row, π, then that common row
is, in fact, a stationary distribution.
The logic of the preceding paragraph is actually backwards in how one typically
studies Markov chains. Most often, the modeler has a Markov chain describing some-
thing of interest to him or her. If this person would like to study the behavior of
their process for very large n, then it would be reasonable to consider the limiting
probabilities, assuming they exist. To get at these probabilities, they would need
to compute π as the left-eigenvector of their transition matrix and verify that this
is the unique stationary distribution, and hence all probabilities converge to it, see
Theorems 3.5.6 and 3.5.16 below.
We will answer the three questions posed above first in the finite state space
setting, where many of the technical details reduce to linear algebra. We then extend
all the results to the infinite state space setting.

3.5.1 Finite Markov chains


Irreducible, aperiodic chains
For a finite Markov chain with transition matrix P , we wish to understand the long
term behavior of P n and, relatedly, to find conditions that guarantee a unique sta-
tionary distribution exists. However, we first provide a few examples showing when
such a unique limiting distribution does not exist.

60
Example 3.5.3. Consider simple random walk on {0, 1, 2} with reflecting boundaries.
In this case we have  
0 1 0
P =  1/2 0 1/2  .
0 1 0
It is simple to see that for n ≥ 1,
 
1/2 0 1/2
P 2n =  0 1 0 ,
1/2 0 1/2
and,  
0 1 0
P 2n+1 =  1/2 0 1/2  .
0 1 0
It is easy to see why this happens. If the walker starts at 1, then she must be at one
after an even number of steps, etc. This chain is therefore periodic. Clearly, P n does
not converge in this example. 
Example 3.5.4. Consider simple random walk on {0, 1, 2, 3} with absorbing bound-
aries. That is  
1 0 0 0
 1/2 0 1/2 0 
P =  0 1/2 0 1/2  .

0 0 0 1
For n large we have  
1 0 0 0
 2/3 0 0 1/3 
Pn ≈ 
 1/3
.
0 0 2/3 
0 0 0 1
Again, this is believable, as you are assured that you will end up at 0 or 3 after enough
time has passed. We see the problem here is that the states {1, 2} are transient. 
Example 3.5.5. Suppose that S = {1, 2, 3, 4, 5} and
 
1/3 2/3 0 0 0
 3/4 1/4 0 0 0 
 
P =  0 0 1/8 1/4 5/8 .

 0 0 0 1/2 1/2 
0 0 1/3 0 2/3
For n  1, we have
 
9/17 8/17 0 0 0
 9/17 8/17 0 0 0 
 
P ≈
n
 0 0 8/33 4/33 7/11 .

 0 0 8/33 4/33 7/11 
0 0 8/33 4/33 7/11

61
In this case, the Markov chain really consists of two smaller, noninteracting chains:
one on {1, 2} and another on {3, 4, 5}. Each subchain will converge to its equilibrium
distribution, but there is no way to move from one subchain to the other. Here the
problem is that the chain is reducible. 

These examples actually demonstrate everything that can go wrong. The following
theorem is the main result of this section.

Theorem 3.5.6. Suppose that P is the transition matrix for a finite Markov chain
that is irreducible and aperiodic. Then, there is a unique stationary distribution π,

πT P = πT ,

for which πi > 0 for each i. Further, if v is any probability vector, then

lim v T P n = π T .
n→∞

The remainder of this sub-section consists of verifying Theorem 3.5.6. However,


before proceeding with the general theory, we attempt to better understand why the
processes at the beginning of this section converged to a limiting distribution π. Note
that the eigenvalues of the matrix (3.17) are

λ1 = 1 and λ2 = 13/24 < 1.

If we let π1 and π2 denote the respective left eigenvectors, and let v be an arbitrary
probability vector, then because π1 and π2 are necessarily linearly independent

v T P n = (c1 π1T P n + c2 π2T P n ) = c1 π1T + c2 (13/24)n π2T → c1 π1T , as n → ∞.

The normalization constant c1 is then chosen so that c1 π1 is a probability vector.


Similarly, the eigenvalues of the √
transition matrix for the Markov chain of Example
3.5.2 are λ1 = 1 and λ2 , λ3 = (14 ± 14)/24. Thus, |λi | < 1 for i ∈ {1, 2}, and λ1 = 1
is again the dominant eigenvalue. Therefore, by the same reasoning as in the 2 × 2
case, we again have that for any probability vector v,

v T P n → c1 π1T , as n → ∞,

where c1 is chosen so that c1 π1 is a probability vetor.


The above considerations suggest the following plan of attack for proving Theorem
3.5.6, which we write in terms of a claim.
Claim: Suppose that a stochastic matrix, P , satisfies the following three conditions:

(i) P has an eigenvalue of 1, which is simple (has a multiplicity of one).

(ii) All other eigenvalues have absolute value less than 1.

(iii) The left eigenvector associated with the eigenvalue 1 has strictly positive entries.

62
Then
vT P n → πT , as n → ∞, (3.19)
for any probability vector v, where π is the unique left eigenvector normalized to sum
to one, and πi > 0 for each i.
Note that condition (iii) above is not strictly necessary as it just guarantees
convergence to a vector giving non-zero probability to each state. However, it is
included for completeness (since this will be the case for irreducible chains), and we
will consider the possibility of πi = 0 for some i (which will occur if there are transient
states) later.
It turns out the above claim follows from a straightforward use of Jordan canonical
forms. We point the interested reader to [10, Chapter 1] for full details. However, it
is probably more instructive to show the result in a slightly less general setting by
also assuming that there is a full set of distinct eigenvalues for P (though we stress
that the claim holds even without this added assumption). Thus, let λ1 , λ2 , . . . , λN ,
be the eigenvalues of P with λ1 = 1 and |λi | < 1, for i > 1. Let the corresponding
left eigenvectors be denoted by πi , where π1 is normalized to sum to one (that is, it
is a probability vector). The eigenvectors are necessarily linearly independent and so
we can write our initial distribution as

v = c1 π 1 + c2 π 2 + · · · + cN π N ,

for some choice of ci , which depend upon our choice of v. Thus, letting π (n) denote
the distribution at time n we see

π (n) = v T P n
= (c1 π1T + c2 π2T + · · · + cN πN
T
)P n
= c1 λn1 π1T + c2 λn2 π2T + · · · + cN λnN πN
T

→ c1 π 1 ,

as n → ∞. Note that as both π (n) and π1 are probability vectors, we see that c1 = 1,
which agrees with our examples above. We further note the useful fact that the rate
of convergence to the stationary distribution is dictated by the size of the second
largest (in absolute value) eigenvalue.
Returning to Theorem 3.5.6, we see that the theorem will be proved if we can
verify that the transition matrix of an aperiodic, irreducible chain satisfies the three
conditions above. By the Perron-Frobenius theorem, any stochastic matrix, Q, that
has all strictly positive entries satisfies the following:

(i) 1 is a simple eigenvalue of Q,

(ii) the left eigenvector associated with 1 can be chosen to have strictly positive
entries,

(iii) all other eigenvalues have absolute value less than 1.

63
Therefore, the Perron-Frobenius theorem almost gives us what we want. However,
the transition matrix for aperiodic, irreducible chains do not necessarily have strictly
positive entries, see (3.18) of Example 3.5.2, and so the above can not be applied
directly.
However, suppose instead that P n has strictly positive entries for some n ≥ 1.
Then the Perron-Frobenius theorem can be applied to P n , and conditions (i), (ii),
and (iii) directly above hold for P n . However, by the spectral mapping theorem
the eigenvalues of P n are simply the nth powers of the eigenvalues of P , and the
eigenvectors of P are the eigenvectors of P n . We can now conclude that P also
satisfies the conclusions of the Perron-Frobenius theorem by the following arguments:

1. The vector consisting of all ones is a right eigenvector of P with eigenvalue 1,


showing P always has such an eigenvalue.

2. If λ = 1 were an eigenvalue of P with |λ| = 1 and eigenvector v, then v T P n =


λn v T , showing v is a left eigenvector of P n with eigenvalue of absolute value
equal to one. This is impossible as the eigenvalue 1 is simple for P n . Thus, 1 is
a simple eigenvalue of P and all others have absolute value value less than one.

3. The left eigenvector of P associated with eigenvalue 1 has strictly positive com-
ponents since this is the eigenvector with eigenvalue 1 for P n .

Therefore, Theorem 3.5.6 will be shown if the following claim holds:


Claim: Suppose that P is the transition matrix for an aperiodic, irreducible Markov
chain. Then, there is an n ≥ 1 for which P n has strictly positive entries.

Proof. The proof of the claim is relatively straightforward, and the following is taken
from [10, Chapter 1]. We take the following fact for granted, which follows from a
result in number theory: if the chain is aperiodic, then for each state i, there is an
(n)
M (i) for which pii > 0 for all n ≥ M (i).
Returning to the proof of the claim, we need to show that there is an M > 0 so
that if n ≥ M , then we have that P n has strictly positive entries. Let i, j ∈ S. By
the irreducibility of the chain, there is an m(i, j) for which
(m(i,j))
pij > 0.

Thus, for all n ≥ M (i),


(n+m(i,j)) (n) (m(i,j))
pij ≥ pii pij > 0.

Now, simply let M be the maximum over M (i) + m(i, j), which exists since the state
(n)
space is finite. Thus, pij > 0 for all n ≥ M and all i, j ∈ S.
We pause to reflect upon what we have shown. We have concluded that for an
irreducible, aperiodic Markov chain, if we wish to understand the large time prob-
abilities associated with the chain, then it is sufficient to calculate the unique left

64
eigenvector of the transition matrix with eigenvalue equal to one. Such computations
can be carried out by hand for small examples, though are usually performed with
software (such as Maple or Mathematica) for larger systems. In the next sub-section
we consider what changes when we drop the irreducible assumption. We will consider
the periodic case when we turn to infinite state space Markov chains in Section 3.5.2.
Example 3.5.7. Consider a Markov chain with state space {0, 1, 2, 3} and transition
matrix  
0 1/5 3/5 1/5
 
 1/4 1/4 1/4 1/4 
P = 1
.

 0 0 0 
0 1/2 1/2 0
Find limn→∞ P {Xn = 2}.
Solution. It is easy to verify that
 
3/16 77/400 18/400 67/400
 
 127/320 57/320 97/320 39/320 
3
P =  ,
13/20 3/20 3/20 1/20 
 
5/32 7/32 15/32 5/32

showing that Theorem 3.5.6 applies. The eigenvector of P (normalized to be a prob-


ability distribution) associated with eigenvalue 1 is

π = [22/59, 12/59, 22/59, 8/59].

Therefore,
22
lim P {Xn = 2} = .
n→∞ 59

Reducible chains
We turn to the case of a reducible chain and begin with some examples.
Example 3.5.8. Consider the gambler’s ruin problem on the state space {0, 1, 2, . . . , N }.
Setting
πα = (α, 0, 0, . . . , 1 − α),
for any 0 ≤ α ≤ 1, it is straightforward to show that παT P = παT . Thus, there are
uncountably many stationary distributions for this example, though it is important
to note that they are all linear combinations of (1, 0, . . . , 0) and (0, . . . , 0, 1), which
are the stationary distributions on the recurrent classes {0} and {N }. 
Example 3.5.9. Consider the Markov chain on {1, 2, 3, 4} with
 
P1 0
P = ,
0 P2

65
with  
1/2 1/2
Pi = , for i ∈ {1, 2}.
1/2 1/2
Then, the communication classes {1, 2} and {3, 4} are each irreducible and aperiodic,
and have stationary distribution (1/2, 1/2). Also for any 0 ≤ α ≤ 1,

α(1/2, 1/2, 0, 0) + (1 − α)(0, 0, 1/2, 1/2) = (α/2, α/2, (1 − α)/2, (1 − α)/2)

is a stationary distribution for the transition matrix P . 

The above examples essentially show what happens in this case of a reducible
Markov chain with a finite state space. All of the mass of a limiting distribution will
end up on the recurrent classes, and the form of the stationary distribution on the
recurrent classes can be found by the results in the previous section.
Consider now a general finite state space Markov chain with reducible state space,
S, that is restricted to any recurrent communication class R1 ⊂ S. If the Markov
chain is aperiodic on R1 , then by Theorem 3.5.6 a unique stationary distribution,
π (1) , exists with support only on R1 . Clearly, the previous argument works for each
recurrent communication class Rk ⊂ S. Therefore, we have the existence of a family
of stationary distributions, π (k) , which are limiting stationary distributions for the
Markov chain restricted to the different Rk . We note the following (some of which
are left as homework exercises to verify):

1. Each such π (k) is a stationary distribution for the original, unrestricted Markov
chain.

2. Assuming there are m recurrent communication classes, each linear combination

a1 π (1) + · · · + am π (m) (3.20)



with ai ≥ 0 and i ai = 1, is a stationary distribution for the unrestricted
Markov chain, Xn .

3. All stationary distributions of the Markov chain Xn can be written as a linear


combination of the form (3.20).

Thus, in the case that the Markov chain is reducible, the limiting probabilities will
depend on the initial condition. That is, if αk (i) is the probability that the chain
ends up in recurrent class Rk given it starts in state i, then for j ∈ Rk ,
(n) (k)
lim pij = αk (i)πj (3.21)
n→∞

where we will discuss how to calculate αk (i) in later sections. Note, however, that
αk (i) will be one if i, j are in the same recurrent class, zero if they are in different
recurrent classes, and between zero and one if i is transient and i → j. We conclude
that if v is an initial distribution for a reducible finite state space Markov chain, then
the limit limn→∞ v T P n will always exists, though it will depend upon v.

66
3.5.2 Countable Markov chains
We now extend the results of the previous section to the setting of a countably infinite
state space. We note that every result stated in this section also holds for the finite
state space case, and these are the most general results. We begin with an example
demonstrating a major difference between the finite and countable state space setting.
Example 3.5.10. Consider symmetric random walk on the integers. That is, the
state space is S = Z and pi,i+1 = pi,i−1 ≡ 1/2 for all i. We know from Example 3.4.12
that this chain is recurrent, and we search for a stationary distribution π satisfying
π T = π T P , where P is the transition matrix. This yields
 1
πj = πk pkj = πj−1 pj−1,j + πj+1 pj+1,j = πj−1 (1/2) + πj+1 (1/2) = (πj−1 + πj+1 ),
k
2

for all j ∈ Z. These can be solved by taking πj ≡ 1. Note, however, that in this case
we can not scale the solution to get a stationary distribution, and so such a π is an
invariant measure, though not a stationary distribution. 
While the Markov chain of the previous example was recurrent, and therefore one
might expect a stationary distribution to exist, it turns out the chain “is not recurrent
enough.” We recall that we define τi to be the first return time to state i,
def
τi = min{n ≥ 1 : Xn = i},

where we take τi = ∞ if the chain never returns. We further recall that the state i is
called recurrent if Pi (τi < ∞) = 1 and transient otherwise. In the infinite state space
setting it is useful to subdivide the set of recurrent states even further.
Definition 3.5.11. The value


def
µ i = E i τi = nPi {τi = n}
n=1

is called the mean recurrence time or mean first return time for state i. We say that
the chain is positive recurrent if Ei τi < ∞, and null recurrent otherwise.
Note that we have we have µi = ∞ for a transient state as in this case Pi {τi =
∞} > 0.
The following is stated without proof. However, for those that are interested, the
result follows directly from basic renewal theory, see [13, Chapter 3]. Theorem 3.5.12
captures the main difference between positive recurrent and other (null recurrent and
transient) chains.
Theorem 3.5.12. Consider a recurrent, irreducible, aperiodic Markov chain. Then,
for any i, j ∈ S
(n) 1
lim pji = ,
n→∞ µi
where if µi = ∞ (null recurrence), we interpret the right hand side as zero.

67
The similar theorem for periodic chains is the following.

Theorem 3.5.13. Let Xn be a recurrent, irreducible, d-periodic Markov chain. Then,


for any i ∈ S
(nd) d
lim pii = ,
n→∞ µi
where if µi = ∞ (null recurrence), then we interpret the right hand side as zero.

Recurrence has already been shown to be a class property. The following theorem
shows that positive recurrence is also a class property.

Theorem 3.5.14. Suppose that i ↔ j belong to the same class and that state i is
positive recurrent. Then state j is positive recurrent.

Proof. We will prove the result in the aperiodic case so that we may make use of
Theorem 3.5.12. We know from Theorem 3.5.12 that
(n) 1
lim pkj = ,
n→∞ µj

for any k in the same class as j. Because j is positive recurrent if and only if µj < ∞,
we see it is sufficient to show that
(n)
lim pij > 0.
n→∞

(m)
Because i ↔ j, there is an m > 0 for which pij > 0. Therefore,

(n) (n+m) (n) (m) (m) (n) (m) 1


lim pij = lim pij ≥ lim pii pij = pij lim pii = pij > 0,
n→∞ n→∞ n→∞ n→∞ µi
where the final equality hods from Theorem 3.5.12 applied to state i.
Therefore, we can speak of positive recurrent chains or null recurrent chains.

Example 3.5.15. Consider again the symmetric (p = 1/2) random walk on the
integer lattice. We previously showed that

(2n) 1
p00 ∼ √ .
πn
(2n)
Therefore, limn→∞ p00 ∼ √1πn = 0, and by Theorem 3.5.13 we have that µ0 = ∞,
and the chain is null recurrent. Thus, when p = 1/2, the chain is periodic and null
recurrent, and when p = 1/2, the chain is periodic and transient. 

Theorem 3.5.12 also gives a strong candidate for a limiting stationary distribution
for a positive recurrent, irreducible, aperiodic Markov chain.

68
Theorem 3.5.16. If a Markov chain is irreducible and recurrent, then there is an
invariant measure π, unique up to multiplicative constants, that satisfies 0 < πj < ∞
for all j ∈ S. Further, if the Markov chain is positive recurrent then
1
πi = ,
µi

where µi is the mean recurrence time of state i, i πi = 1, and π is a stationary
(n)
distribution of the Markov chain. If the Markov chain is also aperiodic, then pji → πi ,
as n → ∞, for all i, j ∈ S.
Proof. We will verify the result in the positive recurrent case only, and
 direct the
reader to [13, Chapter 2.12] for the full details. We first show that i∈S πi = 1.
Choosing any k ∈ S, we see
 (n)  1
1 = lim pkj = ,
n→∞
j∈S j∈S
µ j

where the final equality follows from Theorem 3.5.12. Next, for any k ∈ S,
1 (n+1)

= lim pki = lim P {Xn+1 = i | Xn = j}P {Xn = j | X0 = k}
µi n→∞ j∈S
n→∞
 (n)
= pji lim pkj
n→∞
j∈S
 1
= pji .
j∈S
µj

Thus, the result is shown.


Note that Theorem 3.5.16 guarantees the existence of a stationary distribution
even in the chain is periodic.
Example 3.5.17. Consider reflecting random walk on {1, 2, 3, 4}. That is, the
Markov chain with transition matrix
 
0 1 0 0
 1/2 0 1/2 0 
P = 0 1/2 0 1/2
.

0 0 1 0
This chain has period two, and for large n we have
   
1/3 0 2/3 0 0 2/3 0 1/3
 0 2/3 0 1/3   1/3 0 2/3 0 
P 2n ≈ 
 1/3 0
 , P 2n+1 ≈  .
2/3 0   0 2/3 0 1/3 
0 2/3 0 1/3 1/3 0 2/3 0
The unique stationary distribution of the chain can be calculate, however, and is
π = [1/6, 1/3, 1/3, 1/6]. While π does not, in this case, give the long run probabilities
of the associated chain, we will see in Theorem 3.5.22 a useful interpretation of π as
giving the average amount of time spent in each state. 

69
A question still remains: can the invariant measure of a null recurrent chain be
normalized to give a stationary distribution? The answer, given in the following
theorem, is no.

Theorem 3.5.18. Suppose a Markov chain is irreducible and that a stationary dis-
tribution π exists: 
π  = π  P, πj = 1, πj > 0.
j∈S

Then, the Markov chain is positive recurrent.

Thus, a necessary and sufficient condition for determining positive recurrence is


simply demonstrating the existence or non-existence of a stationary distribution. Note
also that the above results provides an effective algorithm for computing the mean
return times: compute the invariant distribution using

π T = π T P,

and invert the component of interest.

Example 3.5.19 (Random walk with partially reflecting boundaries, [10]). Consider
again a random walker on S = {0, 1, 2, . . . }. Suppose that for j ∈ S the transition
probabilities are given by

pj,j+1 = p, pj,j−1 = 1 − p, if j ≥ 1,
p01 = p, p00 = 1 − p.

This Markov chain is irreducible and aperiodic. We want to determine when this
model will have a limiting stationary distribution, and, hence, when it is positive
recurrent.
A stationary distribution for this system must satisfy

πj+1 (1 − p) + πj−1 p = πj , j > 0 (3.22)


π1 (1 − p) + π0 (1 − p) = π0 , (3.23)

with the condition that πj ≥ 0 and ∞ j=0 πj = 1. Solving the difference equations,
the general solution to equation (3.22) is
  j
p
c1 + c2 1−p , p = 1/2
πj = .
c1 + c2 j, p = 1/2

However, equation (3.23) shows


1−p
π0 = π1 .
p

70
Plugging this into the above equation shows c1 = 0 in the p = 1/2 case, and that
c2 = 0 in the p = 1/2 case. Therefore,
  j
p
c2 1−p , p = 1/2
πj = .
c1 , p = 1/2

Because we need ∞ j=0 πj = 1 for a distribution to exist, we see that if p = 1/2, no
choice of c1 could satisfy this condition.
Now just consider the case p = 1/2. We obviously require that c2 > 0. If p > 1/2,
then p/(1 − p) > 1 and the sum

  j
p
c2 = ∞.
j=0
1−p

If, on the other hand, p < 1/2, then



  j
p 1−p
c2 = c2 .
j=0
1−p 1 − 2p

Therefore, taking c2 = (1 − 2p)/(1 − p) gives us a stationary distribution of


 j
1 − 2p p
πj = .
1−p 1−p

Thus, the chain is positive recurrent when p < 1/2, which is believable. We also know
that the chain is either null recurrent or transient if p ≥ 1/2. 

Suppose that we want to figure out when the chain of the previous example is
either null recurrent or transient. We will make use of the following non-trivial fact,
which is stated without proof. We will make use of this fact again in later sections.

Theorem 3.5.20. Let Xn be an irreducible Markov chain with state space S, and let
i ∈ S be arbitrary. Then Xn is transient if and only if there is a unique solution,
α : S → R, to the following set of equations

0 ≤ αj ≤ 1 (3.24)
αi = 1, inf{αj : j ∈ S} = 0 (3.25)

αj = pjk αk , i = j. (3.26)
k∈S

It is reasonable to ask why these conditions are at least believable. Suppose we


define
αj = P {Xn = i for some n ≥ 0 | X0 = j},

71
and we assume our chain is transient. Then, αi = 1 by constructions and we should
have αi → 0 by transience (though we are not going to prove this fact). Finally, for
j = i, we have
αj = P {Xn = i for some n ≥ 0 | X0 = j}
= P {Xn = i for some n ≥ 1 | X0 = j}

= P {Xn = i for some n ≥ 1 | X1 = k}P {X1 = k | X0 = j}
k

= pjk αk .
k

In the recurrent case, we know αj ≡ 1, and so there should be no solution satisfying


(3.25).
Example 3.5.21. We return to the previous example and try to figure out when the
chain is transient. Take i = 0. We will try to find a solution to the above equations.
Equation (3.26) states that we must have
αj = (1 − p)αj−1 + pαj+1 , j > 0.
The solution to this difference equation is
  j
c1 + c2 1−p , if p = 1/2
αj = p .
c1 + c2 j, if p = 1/2
We must have that α0 = 1. Therefore, we have
  j
(1 − c2 ) + c2 1−p , if p = 1/2
αj = p .
1 + c2 j, if p = 1/2
If c2 = 0 in either, then αj ≡ 1, and we can not satisfy our decay condition. Also, if
p = 1/2 and c2 = 0, then the solution is not bounded. Thus, there can be no solution
in the case p = 1/2, and the chain is recurrent in this case. If p < 1/2, we see that the
solution will explode if c2 = 0. Thus, there is no solution for p < 1/2. Of course we
knew this already because we already showed it was posetive recurrent in this case!
For the case p > 1/2, we have that 1 − p < p, we see we can take c2 = 1 and find that
 j
1−p
αj = ,
p
is a solution. Thus, when p > 1/2, the chain is transient. 
We end this section with a theorem that shows that the time averages of a single
path of an irreducible and positive recurrent Markov chain is equal to the chains space
average. This is incredibly useful and shows that one way to compute statistics of the
stationary distribution is to compute one very long path and average over that path.
For a proof of the Theorem below, we point the interested reader to [13, Chapter
2.12].

72
Theorem 3.5.22. Consider an irreducible, positive recurrent Markov chain with
unique stationary distribution π. If we let
n−1

Ni (n) = 1{Xk =i} ,
k=0

denote the number of visits to state i before time n. Then,


 
Ni (n)
P → πi , as n → ∞ = 1.
n
Moreover, for any bounded function f : S → R,
 n−1 
1 
P f (Xk ) → f (i)πi , as n → ∞ = 1.
n k=0 i∈S

The final result says that the time averages of a single realization of the Markov
chain converge (with probability one) to the “space averages” obtained by simply
taking expectations with respect to the distribution π. More explicitly, think of a
random variable X∞ having probability mass function P {X∞ = i} = πi . Then, by
definition, 
f (i)πi = Ef (X∞ ).
i∈S

Therefore, another, more suggestive, way to write the last result is


 n−1 
1
P f (Xk ) → Ef (X∞ ), as n → ∞ = 1.
n k=0

Example 3.5.23. Consider the Markov chain with state space {1, 2, 3} and transition
matrix  
1/3 2/3 0
P =  1/4 1/2 1/4  . (3.27)
1 0 0
It is simply to check that the unique stationary distribution of this chain is π =
[3/8, 1/2, 1/8]. Therefore, for example, limn→∞ P {Xn = 3} = 1/8. However,  we can
also approximate this value using Theorem 3.5.22. Figure 3.5.23 plots (1/n) n−1
k=0 1{Xk =3}
versus n for one realization of the chain. We see it appears to converges to 1/8. 

3.6 Transition probabilities


In this section we ask the following questions for Markov chains with finite state
spaces.

1. How many steps do we expect the chain to make before being absorbed by a
recurrent class if X0 = i is a transient state?

73
0.25

(1/n)N3(n)

0.2

0.15

0.125

0.1

0.05

0
0 500 1000 1500 2000
n


Figure 3.5.1: (1/n) n−1
k=0 1{Xk =3} versus n for one realization of the Markov chain with
transition matrix (3.27). A line of height 0.125 = 1/8 has been added for reference.

2. For given states i, j ∈ S of an irreducible chain, what is the expected number


of needed steps to go from state i to state j?

3. If X0 = j is a transient state, and the recurrent classes are denoted R1 , R2 , . . . ,


what is the probability that the chain eventually ends up in recurrent class Rk ?

We answer these questions sequentially and note that much of the treatment
presented here follows Section 1.5 in Greg Lawler’s book [10].
Question 1. We let P be the transition matrix for some finite Markov chain Xn .
We recall that after a possible reordering of the indices, we can write P as
 
P̃ 0
P = , (3.28)
S Q

where P̃ is the transition matrix for only those states associated with recurrent states,
Q is the submatrix of P giving the transition probabilities from the transient states
to the transient states, and S is the submatrix of P giving the transition probabilities
from the transient states to the recurrent states. Raising powers of P in the form
(3.28) yields,  n 
n P̃ 0
P = .
Sn Q n

74
For example, consider the the Markov chain with state space {1, 2, 3, 4} and tran-
sition matrix given by (3.12),
 
1/2 1/4 1/4 0
 1/3 2/3 0 0 
P =  0
.
0 1/3 2/3 
0 0 3/4 1/4

After reordering the elements of the state space as {3, 4, 1, 2} the new transition
matrix is  
1/3 2/3 0 0
 3/4 1/4 0 0 
 
 1/4 0 1/2 1/4  , (3.29)
0 0 1/3 2/3
and for this example
     
1/3 2/3 1/2 1/4 1/4 0
P̃ = , Q= , and S = .
3/4 1/4 1/3 2/3 0 0

Note that, in general, S will not be a square matrix.


The matrix Q will always be a substochastic matrix, meaning the row sums are
less than or equal to one, with at least one row summing to a value that is strictly
less than one.

Proposition 3.6.1. Let Q be a substochastic matrix. Then the eigenvalues of Q all


have absolute values strictly less than one.

The above proposition can be proved in a number of ways using basic linear algebra
techniques. However, for our purposes it may be best to understand it in the following
probabilistic way. Because each of the states represented by Q are transient, we know
that Qn , which gives the n step transition probabilities between the transient states,
converges to zero, implying the result.
Because the eigenvalues of Q have absolute value strictly less than one, the equa-
tion (Id − Q)v = 0, where Id is the identity matrix with the same dimensions as Q,
has no solutions. Thus, Id − Q is invertible and we define
def
M = (Id − Q)−1 = Id + Q + Q2 + · · · , (3.30)

where the second equality follows from the identity

(Id + Q + Q2 + · · · )(Id − Q) = Id .

Now consider a transient state j. We let Rj denote the total number of visits to
j,


Rj = 1{Xn =j} ,
n=0

75
where we explicitly note that if the chain starts in state j, then we count that as one
visit. Note that Rj < ∞ with a probability of one no matter the initial condition
since j is transient.
Suppose that X0 = i, where i is also transient. Then,

 ∞
 (n)
E[Rj | X0 = i] = P {Xn = j | X0 = i} = pij .
n=0 n=0

Therefore, we have shown that E[Rj | X0 = i] is the i, j th entry of

Id + P + P 2 + · · · ,

which, because both i and j are transient, is the same as the i, j th entry of

Id + Q + Q2 + · · · = (I − Q)−1 .

Therefore, we can conlude that the expected number of visits to state j, given that
the chain starts in state i, is Mij , defined in (3.30).
For example, consider the Markov chain with transition matrix (3.29). For this
example, the matrix M for the transient states {1, 2} is
 
−1 4 3
M = (Id − Q) = .
4 6

We see that starting in state 1, for example, the expected number of visits to state 2
before being absorbed to the recurrent states is equal to M12 = 3. Starting in state 2,
the expected number of visits to state 2 (including the first) is M22 = 6. Now suppose
we want to know the total number of visits to any recurrent state given that X0 = 2.
This value is give by
E2 R1 + E2 R2 = M21 + M22 = 10,
and we see that we simply need to sum the second row of M . We also see that the
expected total number of steps needed to transition from state 2 to a recurrent state
is 10.
More generally, we have shown the following.

Proposition 3.6.2. Consider a Markov chain with transition matrix P given by


(3.29). Then, with M defined via (3.30), and states i, j both transient, Mij gives
the expected number of visits to the transient state j given that X0 = i. Further, if
we define 1 to be the vector consisting of all ones, then M 1 is a vector whose ith
component gives the total expected number of visits to transient states, given that
X0 = i, before the chain is absorbed by the recurrent states.

Question 2. We now turn to the second question posed at the beginning of this
section: for given states i, j ∈ S of an irreducible chain, what is the expected number
of needed steps to go from state i to state j?

76
With the machinery just developed, this problem is actually quite simple now.
We begin by reordering the state space so that j is the first element. Hence, the
transition matrix can be written as
 
pjj U
P = ,
S Q

where Q is a substochastic matrix and the row vector U has the transition probabilities
from j to the other states. Next, simply note that the answer to the question of how
many steps are required to move from i to j would be unchanged if we made j an
absorbing state. Thus, we can consider the problem on the system with transition
matrix  
1 0
P̃ = ,
S Q
where all notation is as before. However, this is now exactly the same problem solved
above and we see the answer is M 1i , where all notation is as before.

Example 3.6.3 (Taken from Lawler, [10]). Suppose that P is the transition matrix
for random walk on {0, 1, 2, 3, 4} with reflecting boundary:
 
0 0 1 0 0 0
1   1/2 0 1/2 0 0  
P = 2   0 1/2 0 1/2 0 .

3  0 0 1/2 0 1/2 
4 0 0 0 1 0

If we let i = 0, then
   
0 1/2 0 0 2 2 2 1
 1/2 0 1/2 0   2 4 4 2 
Q=  0
, M = (I − Q)−1 = .
1/2 0 1/2   2 4 6 3 
0 0 1 0 2 4 6 4

Thus,
M 1 = (7, 12, 15, 16),
Therefore, the expected number of steps needed to get from state 3 to state 0 15. 

Example 3.6.4. Consider the Jukes-Cantor model of DNA mutation. The transition
matrix for this model is
 
1 − ρ ρ/3 ρ/3 ρ/3
 ρ/3 1 − ρ ρ/3 ρ/3 
P =
 ρ/3
,
ρ/3 1 − ρ ρ/3 
ρ/3 ρ/3 ρ/3 1 − ρ

If at time zero the nucleotide is in state 1, how many steps do we expect to take place
before it enters states 3 or 4? Recalling that the different states are A, G, C, and T,

77
we note that A (adenine) and G (guanine) are purines and that C (cytosine) and T
(thymine) are pyrimidines. Thus, this question is asking for the expected time until
a given purine converts to a pyrimidine.
We make {3, 4} absorbing states, reorder the state space as {3, 4, 1, 2} and note
that the new transition matrix is
 
1 0 0 0
 0 1 0 0 
 
 ρ/3 ρ/3 1 − ρ ρ/3  ,
ρ/3 ρ/3 ρ/3 1 − ρ

with Q and M = (Id − Q)−1 given via


   9 3 
1 − ρ ρ/3 8ρ 8ρ
Q= , and M = 3 9 .
ρ/3 1 − ρ 8ρ 8ρ

Therefore, the expected number of transitions needed to go from state 1 (A) to states
3 or 4 (C or T) is
9 3 31
M11 + M12 = + = .
8ρ 8ρ 2ρ
Note that this value goes to ∞ as ρ → 0, which is reasonable. 

Question 3. We turn now to the third question laid out at the beginning of this
section: if X0 = j is a transient state, and the recurrent classes are denoted R1 , R2 , . . . ,
what is the probability that the chain eventually ends up in recurrent class Rk ? Note
that this question was asked first in and around equation (3.21).
We begin by noting that we can assume that each recurrent class consists of a
single point (just group all the states of a class together). Therefore, we denote
the recurrent classes as r1 , r2 , . . . , with pri ,ri = 1. Next, we let t1 , t2 , . . . denote the
transient states. We may now write the transition matrix as
 
I 0
P = ,
S Q

where we put the recurrent states first. For any transient state ti and recurrent class
k, we define
def
αk (ti ) = P {Xn = rk for some n ≥ 0 | X0 = ti }.

78
For a recurrent states rk , ri we set αk (rk ) ≡ 1, and αk (ri ) = 0, if i = k. Then, for any
transient state ti we have
αk (ti ) = P {Xn = rk for some n ≥ 0 | X0 = ti }

= P {X1 = j | X0 = ti }P {Xn = rk for some n ≥ 0 | X1 = j}
j∈S

= pti ,j αk (j)
j∈S
 
= pti ,rj αk (rj ) + pti ,tj αk (tj )
rj tj

= pti ,rk + pti ,tj αk (tj ),
tj

where the first sum was over the recurrent states and the second (and remaining)
sum is over the transient states. If A is the matrix whose i, k th entry is αk (ti ), then
the above can be written in matrix form:
A = S + QA.
Again letting M = (I − Q)−1 we have
A = (I − Q)−1 S = M S.
Example 3.6.5. Consider again the Markov chain with state space {3, 4, 1, 2} and
transition matrix  
1/3 2/3 0 0
 3/4 1/4 0 0 
 .
 1/4 0 1/2 1/4 
0 0 1/3 2/3
Note that for this example, we know that we must enter state 3 before state 4, so it
is a good reality check on our analysis above. We again have
   
4 3 1/4 0
M= , and S = ,
4 6 0 0
and so  
1 0
MS = ,
1 0
as expected. 
Example 3.6.6 (Taken from Lawler, [10]). As an example, consider random walk
with absorbing boundaries. We order the states S = {0, 4, 1, 2, 3, } and have
 
1 0 0 0 0
 0 1 0 0 0 
 
P =  1/2 0 0 1/2 0 ,

 0 0 1/2 0 1/2 
0 1/2 0 1/2 0

79
Then,
     
1/2 0 3/2 1 1/2 3/4 1/4
S= 0 0 , M =  1 2 1 , M S =  1/2 1/2 
0 1/2 1/2 1 3/2 1/4 3/4

Thus, starting at state 1, the probability that the walk is eventually absorbed at state
0 is 3/4. 

3.7 Exercises
1. Suppose there are three white and three black balls in two urns distributed so
that each urn contains three balls. We say the system is in state i, i = 0, 1, 2, 3,
if there are i white balls in urn one. At each stage one ball is drawn at random
from each urn and interchanged. Let Xn denote the state of the system after the
nth draw. What is the transition matrix for the Markov chain {Xn : n ≥ 0}.

2. (Success run chain.) Suppose that Jake is shooting baskets in the school
gym and is very interested in the number of baskets he is able to make in a
row. Suppose that every shot will go in with a probability of p ∈ (0, 1), and
the success or failure of each shot is independent of all other shots. Let Xn
be the number of shots he has currently made in a row after n shots (so, for
example, X0 = 0 and X1 ∈ {0, 1}, depending upon whether or not he hit the
first shot). Is it reasonable to model Xn as a Markov chain? What is the state
space? What is the transition matrix?

3. (Jukes-Cantor model of DNA mutations) Consider a single nucleotide on a


strand of DNA. We are interested in modeling possible mutations to this single
spot on the DNA. We say that Xn is in state 1, 2, 3, or 4, if the nucleotide
is the base A, G, C, or T, respectively. We assume that there is a probability,
ρ ∈ (0, 1), that between one time period and the next, we will observe a change
in this base. If it does change, we make the simple assumption that each of the
other three bases are equally likely.

(a) What is the transition matrix for this Markov chain?


(b) What are the eigenvalues, and associated left eigenvectors? To compute the
eigenvectors, trial and error is possible, and so is a rather long calculation.
It will also be okay if you simply use software to find the eigenvectors.
(10) (100) (1,000) (10,000)
(c) If ρ = 0.01, what are (approximately): p13 , p13 , p13 , p13 ?

4. Suppose that whether or not it rains tomorrow depends on previous weather


conditions only through whether or not it is raining today. Assume that the
probability it will rain tomorrow given it rains today is α and the probability
it will rain tomorrow given it is not raining today is β. If the state space is
S = {0, 1} where state 0 means it rains and state 1 means it does not rain on

80
a given day. What is the transition matrix when we model this situation with
a Markov chain. If we assume there is a 40% chance of rain today, what is the
probability it will rain three days from now if α = 7/10 and β = 3/10.

5. Verify the condition (3.4). Hint, use an argument like equation (3.5).

6. (a) Show that the product of two stochastic matrices is stochastic.

 matrix P , and any row vector π, we have πP 1 ≤


(b) Show that for stochastic
π1 , where v1 = i |vi |. Deduce that all eigenvalues, λ, of P must
satisfy |λ| ≤ 1.

7. Let Xn denote a discrete time Markov chain with state space S = {1, 2, 3, 4}
and with transition Matrix
 
1/4 0 1/5 11/20
 0 0 0 1 
P =  1/6 1/7 0 29/42  .

1/4 1/4 1/2 0

(a) Suppose that X0 = 1, and that

(U1 , U2 , . . . , U10 )
= (0.7943, 0.3112, 0.5285, 0.1656, 0.6020, 0.2630, 0.6541, 0.6892, 0.7482, 0.4505)

is a sequence of 10 independent uniform(0, 1) random variables. Using


these random variables (in the order presented above) and the construction
of Section 3.2, what are Xn , n ∈ {0, 1, . . . , 10}? Note, you are supposed to
do this problem by hand.
(b) Using Matlab, simulate a path of Xn up to time n =100 using the con-
struction of Section 3.2. A helpful sample Matlab code has been provided
on the course website. Play around with your script. Try different values
of n and see the behavior of the chain.

8. Consider a chain with state space {0, 1, 2, 3, 4, 5} and transition matrix


 
1/2 0 0 0 1/2 0
 0 3/4 1/4 0 0 0 
 
 
 0 1/8 7/8 0 0 0 
 
P = 
 1/2 1/4 1/4 0 0 0 
 
 
 1/3 0 0 0 2/3 0 
0 0 0 1/2 0 1/2

What are the communication classes? Which classes are closed? Which classes
are recurrent and which are transient?

81
9. Consider a finite state space Markov chain, Xn . Suppose that the recurrent
communication classes are R1 , R2 , . . . , Rm . Suppose that restricted to Rk , the
Markov chain is irreducible and aperiodic, and let π̃ (k) be the unique limiting
stationary distribution for the Markov chain restricted to Rk . Now, for each Rk ,
let πk (note the lack of a tilde) be the vector with components equal to those
of π̃ (k) for those states in Rk , and zero otherwise. For example, assuming there
are three states in R1 ,
(1) (1) (1)
π (1) = (π̃1 , π̃2 , π̃3 , 0, 0, . . . , 0),

and if there are two states in R2 , then


(2) (2)
π (2) = (0, 0, 0, π̃1 , π̃2 , 0, . . . , 0),

Prove both the following:

(a) Each linear combination

a1 π (1) + · · · + am π (m)

with ai ≥ 0 and i ai = 1, is a stationary distribution for the unrestricted
Markov chain, Xn .
(b) All stationary distributions of the Markov chain Xn can be written as such
a linear combination. (Hint: use the general form of the transition matrix
given by equation (3.14). Now, break up an arbitrary stationary distribu-
tion, π, into the different components associated with each communication
class. What can be concluded about each piece of π?)

10. Consider the Markov chain described in Problem 3 above. What is the sta-
tionary distribution for this Markov chain. Interpret this result in terms of the
probabilities of the nucleotide being the different possible values for large times.
Does this result make sense intuitively?

11. Show that the success run chain of Problem 2 above is positive recurrent. What
is the stationary distribution of this chain? Using the stationary distribution,
what is the expected number of shots Jake will hit in a row.

12. Let Xn be the number of customers in line for some service at time n. During
each time interval, we assume that there is a probability of p that a new customer
arrives. Also, with probability q, the service for the first customer is completed
and that customer leaves the queue. Assuming at most one arrival and at most
one departure can happen per time interval, the transition probabilities are

pi,i−1 = q(1 − p), pi,i+1 = p(1 − q)

pii = 1 − q(1 − p) − p(1 − q), i>0

p00 = 1 − p, p01 = p.

82
(a) Argue why the above transition probabilities are the correct ones for this
model.
(b) For which values of p and q is the chain null recurrent, positive recurrent,
transient?
(c) For the positive recurrent case, give the limiting probability distribution
π. (Hint: note that the equation for π0 and π1 are both different than the
general nth term.)
(d) Again in the positive recurrent case, using the stationary distribution you
just calculated, what is the expected length of the queue in equilibrium?
What happens to this average length as p → q. Does this make sense?

13. This problem has you redo the computation of Example 3.27, though with a
different Markov chain. Suppose our state space is {1, 2, 3, 4} and the transition
matrix is  
1/4 0 1/5 11/20
 0 0 0 1 
P = 1/6 1/7 0 29/42  ,

1/4 1/4 1/2 0


which was the transition matrix of problem 7 above. Using Theorem 3.5.22,
estimate limn→∞ P {Xn = 2}. Make sure you choose a long enough path, and
that you plot your output (to turn in). Compare your solution with the actual
answer computed via the left eigenvector (feel free to use a computer for that
part).

14. (Taken from Lawler, [10]) You will need software for this problem to deal with
the matrix manipulations. Suppose that we flip a fair coin repeatedly until
we flip four consecutive heads. What is the expected number of flips that are
needed? (Hint: consider a Markov chain with state space {0, 1, 2, 3, 4}.)

15. You will need software for this problem to deal with the matrix manipulations.
Consider a Markov chain Xn with state space {0, 1, 2, 3, 4, 5} and transition
matrix  
1/2 0 0 0 1/2 0
 0 3/4 1/4 0 0 0 
 
 
 0 1/8 7/8 0 0 0 
 
P = 
 1/2 1/4 1/4 0 0 0 
 
 
 1/3 0 0 0 1/3 1/3 
0 0 1/4 1/4 0 1/2
Here the only recurrent class is {1, 2}. Suppose that X0 = 0 and let

T = inf{n : Xn ∈ {1, 2}}.

(a) What is ET ?

83
(b) What is P {XT = 1}? P {XT = 2}? (Note that this is asking for the
probabilities that when the chain enters the recurrent class, it enters into
state 1 or 2.)

16. (Taken from Lawler, [10]) You will need software for this problem to deal with
the matrix manipulations. Let Xn and Yn be independent Markov chains with
state space {0, 1, 2} and transition matrix
 
1/2 1/4 1/4
P =  1/4 1/4 1/2  .
0 1/2 1/2

Suppose that X0 = 0 and Y0 = 2 and let

T = inf{n : Xn = Yn }.

A hint for all parts of this problem: consider the nine-state Markov chain Zn =
(Xn , Yn ).

(a) Find E(T ).


(b) What is P {XT = 2}?
(c) In the long run, what percentage of the time are both chains in the same
state?

84

You might also like