
Markov chain in discrete time

Math 478

April 1, 2019

1 Transition probability
Let Xk , k = 0, 1, · · · be a stochastic process in discrete time. An important ingredient
in understanding the evolution of Xk is its conditional transition probability given its
past history:

P (Xn+1 ∈ En+1 | Xn ∈ En , Xn−1 ∈ En−1 , · · · , X0 ∈ E0 ),

where Ei , i = 0, 1, · · · , n + 1 are subsets of the state space (the space where Xk takes
values). If this transition probability only depends on the most recent history:

P (Xn+1 ∈ En+1 | Xn ∈ En , Xn−1 ∈ En−1 , · · · , X0 ∈ E0 ) = P (Xn+1 ∈ En+1 | Xn ∈ En ),

we say Xk is Markov. In general, a Markov process Xt (with respect to a filtration FtY
generated by a process Y , possibly different from X) can be defined succinctly as any
stochastic process that satisfies, for any s < t and any function f ,

E(f (Xt ) | FsY ) = E(f (Xt ) | Ys ),

where FsY represents all the information given by Yu , 0 ≤ u ≤ s.


In the discrete time setting, one may potentially have two definitions for the Markov
property:

E(f (Xn+1 ) | Xn , Xn−1 , · · · , X0 ) = E(f (Xn+1 ) | Xn ) ∀n,

or

E(f (Xn ) | Xm , Xm−1 , · · · , X0 ) = E(f (Xn ) | Xm ) ∀m < n.

The second definition is obviously more general than the first. It turns out that
the two are equivalent, as we shall see.

Example 1.1. Random walk as a discrete time Markov process
Let Xi be iid and Sk = \sum_{i=1}^{k} Xi . Then for m < n

E(Sn |Sm , Sm−1 , · · · , S0 ) = E(Sm + \sum_{i=m+1}^{n} Xi |Sm , Sm−1 , · · · , S0 )
                              = Sm + E(\sum_{i=m+1}^{n} Xi |Sm , Sm−1 , · · · , S0 )
                              = Sm + E(\sum_{i=m+1}^{n} Xi )
                              = E(Sn |Sm ),

where the third equality holds because Xm+1 , · · · , Xn are independent of Sm , · · · , S0 , and the
last equality holds because the same computation conditioning only on Sm gives the same value.

A Markov chain is a stochastic process whose state space is discrete. In this case,
we can capture its dynamics if we know, for any n, i, j,

P (Xn+1 = j | Xn = i, Xn−1 = in−1 , · · · , X0 = i0 ) = P (Xn+1 = j | Xn = i).

The RHS of this equation is a quantity that only depends on i, j, n. We make the
additional assumption that it is independent of n (such a chain is referred to as a time
homogeneous Markov chain). We assume all the Markov chains we discuss are time
homogeneous unless otherwise mentioned. Then we can denote

P (Xn+1 = j | Xn = i) = Pij .

If we make one last assumption that the state space is finite with size n, then the Pij can
be organized into an n × n matrix, referred to as a transition matrix. Such a matrix
obviously has non-negative entries. Since

\sum_j Pij = \sum_j P (Xn+1 = j | Xn = i) = 1,

the sum of the entries in each row of a transition matrix is 1. Additionally, since Pij ≠ Pji
in general, a transition matrix is NOT necessarily symmetric (as opposed to, say, a
correlation matrix). Finally, if a state i is such that Pii = 1 (and thus Pij = 0 for any
j ≠ i) then it is referred to as an absorbing state.
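
For concreteness, here is a minimal Python/NumPy sketch (the matrix entries are made-up
illustrative values) of how a transition matrix can be stored and checked: the entries are
non-negative, each row is a probability distribution, and the absorbing states are exactly
the states i with Pii = 1.

    import numpy as np

    # Hypothetical 3-state transition matrix (illustrative values).
    P = np.array([[0.8, 0.2, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.0, 1.0]])     # state 2 is absorbing: P_22 = 1

    # Every row of a transition matrix is a probability distribution.
    assert np.all(P >= 0)
    assert np.allclose(P.sum(axis=1), 1.0)

    # A state i is absorbing exactly when P_ii = 1.
    print([i for i in range(len(P)) if np.isclose(P[i, i], 1.0)])   # -> [2]
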
Remark 1.2. If we’re willing to entertain the notion of a (countably) infinite dimen-
sional matrix, then the same organization of information can be applied to a Markov
chain with an infinite state space. In fact, the matrix notation is only used to facilitate
computation (see the next section for the Chapman-Kolmogorov result) and to apply
linear algebra results where appropriate. If we take care to be precise, the analogues of
those results apply in the infinite dimensional case as well.

Example 1.3. Random walk as a time homogeneous Markov chain
Let Sk be a random walk. Then

P (Sk+1 = j|Sk = i) = P (Xk+1 = j − i) = P (Xk = j − i) = P (Sk = j|Sk−1 = i),

since the Xk ’s are identically distributed. Thus we can denote Pij = P (Sk+1 = j|Sk = i) without
worrying about the dependence on k.

Example 1.4. Transforming the state space to make a process Markov


Some processes don’t seem to be Markov at first glance. For example, if the
weather tomorrow depends on both the weather today and the weather yesterday, then the
day-to-day weather status is not a Markov process. On the other hand, we can
transform our state space so that a state records the weather of 2 consecutive days. Then
this new process is Markov, since the weather of the pair (today, tomorrow) clearly depends
only on the weather of the pair (yesterday, today).
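
To make the state-space transformation concrete, here is a small Python/NumPy sketch
(the conditional rain probabilities are made-up illustrative values) that builds the transition
matrix of the lifted chain whose state is the pair (yesterday's weather, today's weather).

    import numpy as np
    from itertools import product

    # Hypothetical P(rain tomorrow | yesterday, today); weather coded 0 = dry, 1 = rain.
    p_rain = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7}

    # Lifted state = (yesterday, today); this chain IS Markov.
    states = list(product([0, 1], repeat=2))
    P = np.zeros((4, 4))
    for a, (y, t) in enumerate(states):
        for b, (t2, tomorrow) in enumerate(states):
            if t2 != t:                  # tomorrow's pair must begin with today's weather
                continue
            P[a, b] = p_rain[(y, t)] if tomorrow == 1 else 1 - p_rain[(y, t)]

    print(P.sum(axis=1))                 # each row sums to 1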

Example 1.5. Transition probability of a birth-death process


Suppose the numbers of families that check into a hotel on successive days are
independent Poisson RVs with mean λ. Also suppose that the number of days that
a family stays in the hotel is a Geometric(p) RV (that is, a family checks out the
next day with probability p, independently of how long they have stayed). Suppose
all families act independently of one another. Show that Xn , the number of families
staying at the hotel on day n, is a Markov process. Find its transition probability.
Ans: Let Yn be the number of newly arrived families on day n and Zn be the
number of families that check out on day n (e.g. on day 0 no family checks out and
some families check in). Then it is clear that, given Xn , the pair (Yn+1 , Zn+1 ) is
independent of the earlier history Xi , i < n. Moreover,

Xn+1 = Xn + Yn+1 − Zn+1 .

Thus Xn evolves like a random walk whose step Yn+1 − Zn+1 has a distribution that depends
only on the current value Xn ; this is why it is a Markov chain. To find the transition probability
Pij = P (Xn+1 = j|Xn = i) we need to understand the distribution of (Yn+1 , Zn+1 ) given Xn .
It is clear that Yn+1 is a Poisson(λ) RV independent of Xn . On the other hand, given Xn , the
count Zn+1 is Binomial(Xn , p). Therefore,

P (Xn+1 = j|Xn = i) = P (Y − Z = j − i),

where Y, Z are independent, Y is Poisson(λ) and Z is Binomial(i, p). We have

P (Y − Z = j − i) = \sum_k P (Y − Z = j − i|Z = k) P (Z = k)
                  = \sum_{k=max(i−j,0)}^{i} P (Y = j − i + k) P (Z = k)
                  = \sum_{k=max(i−j,0)}^{i} e^{−λ} λ^{j−i+k}/(j − i + k)! · \binom{i}{k} p^k (1 − p)^{i−k} ,

where we sum k from max(i − j, 0) to guarantee that j − i + k ≥ 0.
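
As a sanity check of this formula, here is a short Python sketch (λ and p below are
illustrative values) that evaluates Pij by the sum above and verifies that a row of the
transition matrix sums to 1.

    import math

    def hotel_transition_prob(i, j, lam, p):
        # P(X_{n+1} = j | X_n = i): Y ~ Poisson(lam) arrivals and, given X_n = i,
        # Z ~ Binomial(i, p) departures, independent of each other.
        total = 0.0
        for k in range(max(i - j, 0), i + 1):        # k = number of departing families
            y = j - i + k                            # arrivals needed so that i + y - k = j
            total += (math.exp(-lam) * lam**y / math.factorial(y)
                      * math.comb(i, k) * p**k * (1 - p)**(i - k))
        return total

    # Sanity check with illustrative values: the row i = 5 sums to (almost) 1.
    lam, p = 2.0, 0.3
    print(sum(hotel_transition_prob(5, j, lam, p) for j in range(60)))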

2 Chapman-Kolmogorov Equations
Let Xk be a Markov chain with transition matrix P. We want to compute

P (Xk = j|Xk−2 = i)

in terms of P. From the law of total probability:


P (Xk = j, Xk−2 = i) = \sum_n P (Xk = j, Xk−1 = n, Xk−2 = i)
                     = \sum_n P (Xk = j|Xk−1 = n, Xk−2 = i) P (Xk−1 = n, Xk−2 = i)
                     = \sum_n P (Xk = j|Xk−1 = n, Xk−2 = i) P (Xk−1 = n|Xk−2 = i) P (Xk−2 = i).

Thus, dividing by P (Xk−2 = i) and using the Markov property, it is clear that

P (Xk = j|Xk−2 = i) = \sum_n P_{in} P_{nj} ,

which is an entry of a matrix product. Note that the above can be an infinite sum (why
must it converge in this case?). This justifies us to denote

P_{ij}^{2} ≜ P (Xk = j|Xk−2 = i) = (P × P )_{ij} .

One can easily see by induction how one can extend the above procedure to conclude
that

P_{ij}^{n} ≜ P (Xk+n = j|Xk = i) = (P^n)_{ij} ,

where the RHS is understood as the matrix P raised to the nth power and the LHS
is a definition. Finally, for any s, t such that s + m + n = t we have

P (Xt = j, Xs = i) = \sum_k P (Xt = j, Xt−n = k, Xs = i)
                   = \sum_k P (Xt = j|Xt−n = k, Xs = i) P (Xt−n = k|Xs = i) P (Xs = i).

Thus

P_{ij}^{m+n} = \sum_k P_{ik}^{m} P_{kj}^{n}
             = \sum_k P_{ik}^{n} P_{kj}^{m}     (by a similar reasoning as above),

which reflects the fact that powers of the same matrix commute: P^m P^n = P^n P^m = P^{m+n} .
This is referred to as the Chapman-Kolmogorov equation.
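
A quick way to see the Chapman-Kolmogorov equation in action is to compare matrix powers
with a simulation. The following Python/NumPy sketch (the 3-state matrix is an assumed
example) checks both P^{m+n} = P^m P^n and that the empirical two-step frequencies of a
simulated chain match the first row of P^2.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative 3-state transition matrix (assumed values).
    P = np.array([[0.8, 0.2, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.2, 0.8]])

    # Chapman-Kolmogorov as linear algebra: P^{m+n} = P^m P^n.
    m, n = 2, 3
    assert np.allclose(np.linalg.matrix_power(P, m + n),
                       np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n))

    # Monte Carlo check that P(X_2 = j | X_0 = 0) matches the first row of P^2.
    trials, counts = 100_000, np.zeros(3)
    for _ in range(trials):
        x = 0
        for _ in range(2):
            x = rng.choice(3, p=P[x])
        counts[x] += 1
    print(counts / trials)
    print(np.linalg.matrix_power(P, 2)[0])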

Example 2.1. Unconditional distribution of a Markov chain


The entry P_{ij}^{n} gives the conditional probability P (Xn = j|X0 = i). On the other
hand, suppose we are given an initial distribution vector π such that πi = P (X0 = i).
Then we have

P (Xn = j) = \sum_i P (Xn = j|X0 = i) P (X0 = i) = \sum_i πi P_{ij}^{n} .

In terms of matrix notation, if we denote π^n as the distribution vector of Xn , then
(π^n)^T = π^T P^n . That is, we multiply the transition matrix on the left by the (row) initial
distribution vector to get the unconditional distribution of the chain at a future
time. The transpose appears because we conventionally think of a vector as a column vector.

Example 2.2. Markov chain distribution at any time point


An urn always contains 2 balls with possible colors red and blue. At each turn, a
ball is selected at random and replaced with a ball of the other color with probability
0.2 and kept the same with probability 0.8. Initially both balls are red. Find the
probability that the fifth ball selected is red.
Ans: There are several ways to approach this problem. It may seem natural to
model the color of the ball selected. But this is not the best idea, nor is it necessarily a
Markov chain; see Ross problem 4.9! A better idea is to model the color composition
of the balls in the urn at any time. This gives us 3 possibilities: RR, RB, BB,
with corresponding states 0, 1, 2 respectively. Some transition probabilities are obvious:
P02 = P20 = 0. On the other hand,

P01 = 0.2, P00 = 0.8,
P10 = 0.1, P11 = 0.8, P12 = 0.1,
P21 = 0.2, P22 = 0.8.

This gives us the transition matrix

P = [ 0.8  0.2  0
      0.1  0.8  0.1
      0    0.2  0.8 ].

“Initially both balls are red” means we start with the initial distribution

v0 = [ 1, 0, 0 ]^T.

The comment here is that we can start with any initial distribution, such as

v0 = [ 1/3, 1/3, 1/3 ]^T.
Using v0^T = [ 1 0 0 ], the distribution of the urn at time 4 is

v0^T P^4 = [ 0.4872, 0.4352, 0.0776 ],

which is the first row of P^4 . Note how P_{02}^{4} ≠ 0 while P02 = 0. Note
also that this is the unconditional distribution of the states of the urn at step 4, since
we’ve already left multiplied with the given initial distribution.
Picking a ball at random in state 0 gives a red ball with probability 1, while picking
in state 1 gives a red ball with probability 0.5. Thus the answer is

0.4872 + 0.4352 × 0.5 = 0.7048.
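
The computation above is easy to reproduce numerically; the short Python/NumPy sketch
below recovers the distribution at time 4 and the answer 0.7048.

    import numpy as np

    # States 0, 1, 2 = RR, RB, BB; matrix and initial distribution from this example.
    P = np.array([[0.8, 0.2, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.2, 0.8]])
    v0 = np.array([1.0, 0.0, 0.0])            # both balls red initially

    dist4 = v0 @ np.linalg.matrix_power(P, 4)
    print(dist4)                              # [0.4872 0.4352 0.0776]
    print(dist4[0] + 0.5 * dist4[1])          # P(5th ball selected is red) = 0.7048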

Example 2.3. A multidimensional Markov chain
Consider a Markov chain X with r possible states and assume an individual
changes states in accordance with this chain, with transition probabilities Pij . The
individuals are assumed to move through the system independently. Suppose the
numbers of people initially in states 1, 2, · · · , r are independent Poisson RVs with means
λ1 , λ2 , · · · , λr respectively. Find the joint distribution of the numbers of individuals
in states 1, 2, · · · , r at some time n.
Ans: First, note that this problem does not give an initial distribution of the
Markov chain X. We are in fact interested in another multidimensional Markov
chain Y (whose dimension is r) that keeps track of the number of individuals at each
state of X (Why is Y Markov and what is its transition probability?). The problem
thus gives the initial distribution of Y (what is it?) and asks us to find the
distribution of Y at a general time n.
Let’s analyze the first step, n = 1. From the point of view of a fixed state j, the
number of potential individuals coming from state i is Poisson(λi ). Of those, there
are two types: the ones that actually make it to j and the ones that don’t (think of the
yoga studio problem with male and female customers again). For any individual from
state i, the probability that they make it to state j is Pij . Therefore the number of
individuals coming to state j from state i is Poisson(λi Pij ). A similar argument
applies to arrivals from every other state, and since independent Poisson RVs sum to a
Poisson RV, the number of individuals at state j after one step is Poisson(\sum_i λi Pij ).
Moreover, by the splitting property of the Poisson distribution, the counts at the different
states are independent, so the joint distribution is the product of these Poissons.
A similar argument applies after n time steps. Thus Ynj , the number of individuals at
state j after n steps, is Poisson(\sum_i λi P_{ij}^{n}), and the counts Yn1 , · · · , Ynr are independent.
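
The Poisson claim can be checked by simulation. In the Python/NumPy sketch below (the chain
P and the means λi are illustrative assumptions), the empirical mean and variance of the count
at each state after n steps both come out close to \sum_i λi P^n_{ij}, as they should for a Poisson.

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative chain with r = 3 states and assumed initial Poisson means.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.4, 0.5]])
    lam = np.array([4.0, 2.0, 1.0])
    n = 3
    predicted = lam @ np.linalg.matrix_power(P, n)   # sum_i lambda_i P^n_{ij}, for each j

    trials = 10_000
    counts = np.zeros((trials, 3))
    for t in range(trials):
        for i in range(3):                           # individuals starting in state i
            for _ in range(rng.poisson(lam[i])):
                x = i
                for _ in range(n):
                    x = rng.choice(3, p=P[x])
                counts[t, x] += 1

    print(predicted)                 # theoretical Poisson means
    print(counts.mean(axis=0))       # empirical means (close to the above)
    print(counts.var(axis=0))        # for a Poisson the variance equals the mean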

Example 2.4. Birth-death process example revisited


In the hotel example above, find the expected number of families in the hotel on
day n given that there were i families on the first day.
Ans : The question asks for E(Xn |X0 = i). We have

E(Xn |X0 ) = E(E(Xn |Xn−1 )|X0 )
           = E(Xn−1 + λ − pXn−1 |X0 )      (since E(Yn |Xn−1 ) = λ and E(Zn |Xn−1 ) = pXn−1 )
           = E((1 − p)Xn−1 + λ|X0 ).

Repeating this relation one more time gives

E(Xn |X0 ) = E((1 − p)((1 − p)Xn−2 + λ) + λ|X0 )
           = E((1 − p)^2 Xn−2 + λ(1 + (1 − p))|X0 ).

Thus we can see that

E(Xn |X0 ) = λ \sum_{i=0}^{n−1} (1 − p)^i + (1 − p)^n X0
           = λ (1 − (1 − p)^n)/p + (1 − p)^n X0 .
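
A two-line numerical check of this closed form against the recursion
E(Xk |X0 ) = (1 − p)E(Xk−1 |X0 ) + λ, with illustrative values of λ, p and X0 :

    # Check of the closed form against the one-step recursion (illustrative values).
    lam, p, x0, n = 2.0, 0.3, 5, 10

    m = float(x0)
    for _ in range(n):
        m = (1 - p) * m + lam                # E(X_k | X_0) = (1-p) E(X_{k-1} | X_0) + lam

    closed = lam * (1 - (1 - p) ** n) / p + (1 - p) ** n * x0
    print(m, closed)                         # the two agree
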
Example 2.5. Transforming a transition matrix using absorbing state
Balls are successively distributed, independently and uniformly at random, among 8 urns. What is the
probability that there are exactly 3 nonempty urns after 9 balls have been distributed?
Ans: This should remind us of the key collision problem. It is clear that we want
to let Xk be the number of nonempty urns after k balls, with state space i = 1, · · · , 8.
Then we see that

P (Xk+1 = i|Xk = i) = i/8,
P (Xk+1 = i + 1|Xk = i) = (8 − i)/8.

We can write down the 8 × 8 transition matrix and take it to the 9th power. On
the other hand, this is a non-decreasing Markov chain, and so we can treat states 4
and above as a single absorbing state (since we’re only interested in the state 3 probability). The
modified transition matrix with 4 states is

P = [ 1/8  7/8  0    0
      0    2/8  6/8  0
      0    0    3/8  5/8
      0    0    0    1   ],

with initial distribution

v0 = [ 1, 0, 0, 0 ]^T.

Raising the transition matrix to the 8th power (instead of the 9th: after the first ball the chain
is already in state 1, and we do not include a state i = 0 in the transition matrix) and looking at
the first row, we have

v0^T P^8 ≈ [ 0, 0, 0.00757, 0.9923 ].

Thus the probability that there are exactly 3 non-empty urns after 9 balls have been
distributed is 0.00757. Note that it is almost guaranteed that 4 or more urns are
non-empty after 9 balls have been distributed.
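
The following Python/NumPy sketch reproduces 0.00757 both from the 4-state matrix and from a
direct combinatorial count of the ways 9 balls can occupy exactly 3 of the 8 urns.

    import numpy as np

    # Number of nonempty urns among 8; states {1, 2, 3, "4 or more" (absorbing)}.
    P = np.array([[1/8, 7/8, 0,   0  ],
                  [0,   2/8, 6/8, 0  ],
                  [0,   0,   3/8, 5/8],
                  [0,   0,   0,   1  ]])
    v0 = np.array([1.0, 0.0, 0.0, 0.0])       # after the first ball, exactly 1 urn is nonempty

    print(v0 @ np.linalg.matrix_power(P, 8))  # ~ [0, 0.0001, 0.00757, 0.9923]

    # Direct cross-check: choose the 3 urns, count surjections of 9 balls onto them.
    print(56 * (3**9 - 3 * 2**9 + 3) / 8**9)  # ~ 0.00757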

3 Absorbing probability before time N


The last example in the previous section gives a general technique to approach the
following question: given a sub-collection of states A and a starting state i ∉ A, we
want to find the probability that Xk reaches A before a time horizon N. That is,

P (Xk ∈ A, for some k = 1, 2, · · · , N |X0 = i) = ?

This question is not easy to keep track of in terms of the original transition matrix.
On the other hand, we can transform the original Markov chain Xk into a more
suitable one Yk , whose state space consists of the states of X not in A, plus
one absorbing “state” A. Let P^X , P^Y be the transition matrices of X, Y respectively.
Then

P^Y_{ij} = P^X_{ij},               i, j ∉ A,
P^Y_{iA} = \sum_{j∈A} P^X_{ij},    i ∉ A,
P^Y_{AA} = 1.

The original probability in question can be answered as

P (Xk ∈ A, some 1 ≤ k ≤ N |X0 = i) = P (YN = A|Y0 = i) = ((P^Y)^N)_{iA} .
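
This construction is mechanical enough to code once and reuse. Below is a Python/NumPy
sketch (the 4-state matrix at the end is an assumed illustration) that collapses a set A into a
single absorbing state and returns ((P^Y)^N)_{iA}.

    import numpy as np

    def absorb_before_N(P, A, i, N):
        # P(X_k in A for some k = 1..N | X_0 = i), for a starting state i not in A:
        # collapse A into one absorbing state and read off the absorption probability.
        A = set(A)
        keep = [s for s in range(len(P)) if s not in A]
        m = len(keep)
        Q = np.zeros((m + 1, m + 1))
        Q[:m, :m] = P[np.ix_(keep, keep)]                     # transitions among states outside A
        Q[:m, m] = P[np.ix_(keep, sorted(A))].sum(axis=1)     # lump all of A into one state
        Q[m, m] = 1.0                                         # the lumped state is absorbing
        return np.linalg.matrix_power(Q, N)[keep.index(i), m]

    # Illustrative 4-state chain (assumed values), absorption in A = {3} within N = 5 steps.
    P = np.array([[0.5, 0.2, 0.2, 0.1],
                  [0.1, 0.6, 0.2, 0.1],
                  [0.0, 0.3, 0.6, 0.1],
                  [0.2, 0.2, 0.2, 0.4]])
    print(absorb_before_N(P, A=[3], i=0, N=5))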

Example 3.1. Understanding the absorbing probability


Suppose a Markov chain Xk with 5 possible states {1, 2, 3, 4, 5} has the following
transition probability matrix
 
1/5 1/5 1/5 1/5 1/5
 0 1/3 1/6 1/3 1/6 
 
 
 0 1/2 1/2 0 0 
 
 0 1/2 0 1/2 0 
 
1/2 0 0 0 1/2
We are interested in the questions of the type

P (Xk ≥ 3, some k = 1, 2, · · · , n|X0 = 1),

for certain values of n. From the discussion above, we view states {3, 4, 5} as absorbing
and for convenience just relabel this lumped state as state 3. We’ll also refer to this ”new”
chain as Yk . The new transition matrix becomes

Q = [ 1/5  1/5  3/5
      0    1/3  2/3
      0    0    1   ],

and the answer to the above question in general is just (Q^n)13 . That is, we also assert that

P (Xk ≥ 3, some k = 1, 2, · · · , n|X0 = 1) = P (Yn = 3|Y0 = 1)
                                           = P (Yk = 3, some k = 1, 2, · · · , n|Y0 = 1).

Let’s take a closer look into why this is true.


For n = 1,

P (X1 ≥ 3|X0 = 1) = \sum_{j=3}^{5} P1j = Q13 = P (Y1 = 3|Y0 = 1).

For n = 2,

P (X1 ≥ 3 or X2 ≥ 3|X0 = 1) = P (X1 ≥ 3|X0 = 1) + P (X2 ≥ 3, X1 < 3|X0 = 1)
  = \sum_{j=3}^{5} P1j + \sum_{i=1}^{2} \sum_{j=3}^{5} P1i Pij
  = Q13 + \sum_{i=1}^{2} Q1i Qi3
  = \sum_{i=1}^{3} Q1i Qi3             (because Q33 = 1)
  = (Q^2)13 = P (Y2 = 3|Y0 = 1).

Suppose this is true up to case n. For case n + 1

P (Xk ≥ 3, some k = 1, 2, · · · , n + 1|X0 = 1) = P (Xn+1 ≥ 3; Xi < 3, i = 1, 2, · · · , n|X0 = 1)
                                               + P (Xk ≥ 3, some k = 1, · · · , n|X0 = 1).

Now

P (Xn+1 ≥ 3; Xi < 3, i = 1, 2, · · · , n|X0 = 1) = P (Yn+1 = 3; Yi < 3, i = 1, 2, · · · , n|Y0 = 1),
P (Xk ≥ 3, some k = 1, · · · , n|X0 = 1) = P (Yn = 3|Y0 = 1) = (Q^n)13 ,

the second equality being the induction hypothesis. Because state 3 is absorbing for Y,

P (Yn+1 = 3; Yi < 3, i = 1, 2, · · · , n|Y0 = 1) = P (Yn+1 = 3, Yn < 3|Y0 = 1)
  = P (Yn+1 = 3|Yn = 2)P (Yn = 2|Y0 = 1) + P (Yn+1 = 3|Yn = 1)P (Yn = 1|Y0 = 1)
  = (Q^n)11 Q13 + (Q^n)12 Q23 .

Putting it together, we have

P (Xk ≥ 3, some k = 1, 2, · · · , n + 1|X0 = 1) = (Q^n)13 Q33 + (Q^n)11 Q13 + (Q^n)12 Q23
                                               = (Q^{n+1})13            (again because Q33 = 1).

Finally we want to address the question

P (Xk ≥ 3, some k = 1, 2, · · · , n|X0 = j), j = 3, 4, 5.

Clearly the previous technique described in this example does not work, because
states {3, 4, 5} are not actually absorbing for Xk . The solution is to compute what
happens to X1 and condition on that. For simplicity suppose j = 3. We have

P (Xk ≥ 3, some k = 1, · · · , n|X0 = j) = P (Xk ≥ 3, some k = 2, · · · , n, X1 < 3|X0 = j)
                                         + P (X1 ≥ 3|X0 = j).

Now

P (Xk ≥ 3, some k = 2, · · · , n, X1 < 3|X0 = j) = \sum_{i=1}^{2} P (Xk ≥ 3, some k = 2, · · · , n|X1 = i) P (X1 = i|X0 = j),

and we can use our previous technique to deal with this computation.
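
The identity proved above can also be checked by brute force. The Python/NumPy sketch below
simulates the original 5-state chain of this example and compares the fraction of paths that hit
{3, 4, 5} within n steps against (Q^n)13.

    import numpy as np

    rng = np.random.default_rng(2)

    # Original 5-state chain from this example (states labeled 0..4 here).
    P = np.array([[1/5, 1/5, 1/5, 1/5, 1/5],
                  [0,   1/3, 1/6, 1/3, 1/6],
                  [0,   1/2, 1/2, 0,   0  ],
                  [0,   1/2, 0,   1/2, 0  ],
                  [1/2, 0,   0,   0,   1/2]])
    # Collapsed chain: original states 3, 4, 5 lumped into one absorbing state.
    Q = np.array([[1/5, 1/5, 3/5],
                  [0,   1/3, 2/3],
                  [0,   0,   1  ]])

    n, trials, hits = 4, 100_000, 0
    for _ in range(trials):
        x = 0                                  # start at state 1 (index 0)
        for _ in range(n):
            x = rng.choice(5, p=P[x])
            if x >= 2:                         # entered one of the original states 3, 4, 5
                hits += 1
                break
    print(hits / trials)                             # Monte Carlo estimate
    print(np.linalg.matrix_power(Q, n)[0, 2])        # (Q^n)_{13} in the notes' labeling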

Example 3.2. A pensioner receives 2 (thousand) dollars at the beginning of each


month. He spends an amount i = 1, 2, 3, 4 during the month with probability pi ,
independently of how much he currently has. Also, if he has more than 3 at the end of a
month he gives the excess amount to his son. Suppose he has 5 at the beginning of a
month. What is the probability that he has 1 or less within the following 4 months?
Ans: Let Xk be the amount he has at the end of month k, after any excess has been given
away (equivalently, the amount at the beginning of the next month minus the 2 he just
received). Xk is like a random walk with step sizes −2, −1, 0, 1, with the difference that
Xk ≤ 3. In fact we have P33 = p1 + p2 . We treat state 1 (meaning “1 or less”) as an
absorbing state. Then X has only i = 1, 2, 3 as possible states and we have the following
transition matrix

P = [ 1        0    0
      p3 + p4  p2   p1
      p4       p3   p1 + p2 ].
Finally, since he starts the first month with 5 in hand, he is exactly in the position of
someone who ended the previous month in state 3 (3 plus the 2 just received). Thus we take

v0 = [ 0, 0, 1 ]^T.

The answer is the first (state 1) entry of either v0^T P^3 or v0^T P^4 , depending on how one
counts “the following 4 months.”
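
Since the problem leaves p1 , · · · , p4 unspecified, here is a Python/NumPy sketch with made-up
spending probabilities (purely illustrative values) that evaluates both candidate answers.

    import numpy as np

    # Illustrative spending probabilities p1..p4 (the problem leaves them unspecified).
    p1, p2, p3, p4 = 0.2, 0.3, 0.3, 0.2

    # States 1, 2, 3 = capped end-of-month amount; state 1 ("1 or less") made absorbing.
    P = np.array([[1.0,     0.0, 0.0    ],
                  [p3 + p4, p2,  p1     ],
                  [p4,      p3,  p1 + p2]])
    v0 = np.array([0.0, 0.0, 1.0])        # starting with 5 in hand = state 3 of the chain

    for n in (3, 4):
        print(n, (v0 @ np.linalg.matrix_power(P, n))[0])   # P(he has reached "1 or less")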

Example 3.3. Absorbing probability with constraint on the final step


A Markov chain has 5 states i = 1, 2, 3, 4, 5. What is

P (X4 = 2, X3 ≤ 2, X2 ≤ 2, X1 ≤ 2|X0 = 1)?

Ans: This question effectively treats states {3, 4, 5} as a single absorbing state, with an
additional specification that the chain ends up at state 2 at step 4. First, it seems
easier to compute

P (X4 ≤ 2, X3 ≤ 2, X2 ≤ 2, X1 ≤ 2|X0 = 1)

because the complement of that is

P (Xk > 2, for some k = 1, · · · , 4|X0 = 1).

The verbal interpretation of the above quantity is the probability that the chain ever
enters states 3, 4, 5 in steps 1 to 4, starting at state 1. The new transition matrix is

Q = [ p11  p12  \sum_{i=3}^{5} p1i
      p21  p22  \sum_{i=3}^{5} p2i
      0    0    1                 ].

Then

P (X4 ≤ 2, X3 ≤ 2, X2 ≤ 2, X1 ≤ 2|X0 = 1) = 1 − (Q^4)13 .

On the other hand, since X4 = 2 is the last step in consideration, we actually can use
the same set up as above, treating {3, 4, 5} as an absorbing state. In fact, it is just

P (X4 = 2, X3 ≤ 2, X2 ≤ 2, X1 ≤ 2|X0 = 1) = (Q^4)12 .

The verbal interpretation of the above quantity is the probability that the chain never
enters states 3, 4, 5 in steps 1 to 4 AND also ends up at step 4 in state 2. Note
how the question will be very different if we were to ask

P (X4 ≤ 2, X3 = 2, X2 ≤ 2, X1 ≤ 2|X0 = 1) =?

Example 3.4. Absorbing probability with general constraint on the last step
We see from the previous example that if A is an absorbing state then for i, j ∉ A

P (Xm = j, Xk ∉ A, k = 1, · · · , m − 1|X0 = i) = (Q^m)ij .

What about the case when j ∈ A or i ∈ A ? (remember here that A is only formally
an absorbing state so if i ∈ A the original chain can “escape” from A)
First, suppose j ∈ A and i ∉ A. Then Q^{m−1} represents all possible outcomes of
the chain at step m − 1 without getting absorbed in A. All we need to do is, from
step m − 1, compute the probability of arriving at state j from each state outside A
using the original transition matrix P. That is,

P (Xm = j, Xk ∉ A, k = 1, · · · , m − 1|X0 = i) = \sum_{r ∉ A} (Q^{m−1})ir Prj .

Now suppose i ∈ A. In this case we can move to an initial distribution vector v1 ,
where v1 records the probabilities of moving from state i to each state outside of A in one
step. Then we can repeat the above procedure with m − 1 steps. Namely, the answer
is

\sum_{r ∉ A} P (Xm = j, Xk ∉ A, k = 2, · · · , m|X1 = r) Pir
  = \sum_{r ∉ A} P (Xm−1 = j, Xk ∉ A, k = 1, · · · , m − 1|X0 = r) Pir ,

where each probability in the last sum can be computed as in the previous cases (by time
homogeneity).
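
The case analysis above can be packaged into a small function. The Python/NumPy sketch below
uses the fact that, for states outside A, powers of Q agree with powers of the sub-matrix of P
restricted to the complement of A; it is then applied to the 5-state chain of Example 3.1 for
illustration.

    import numpy as np

    def reach_j_avoiding_A(P, A, i, j, m):
        # P(X_m = j, X_k not in A for k = 1..m-1 | X_0 = i), for i not in A, following
        # Example 3.4: (Q^m)_{ij} if j is outside A, and
        # sum_{r not in A} (Q^{m-1})_{ir} P_{rj} if j is in A.
        A = set(A)
        keep = [s for s in range(len(P)) if s not in A]
        sub = P[np.ix_(keep, keep)]          # powers of this agree with (Q^n)_{ir} outside A
        idx = {s: t for t, s in enumerate(keep)}
        if j not in A:
            return np.linalg.matrix_power(sub, m)[idx[i], idx[j]]
        row = np.linalg.matrix_power(sub, m - 1)[idx[i]]
        return sum(row[idx[r]] * P[r, j] for r in keep)

    # Illustration with the 5-state chain of Example 3.1 (0-based labels), A = {2, 3, 4}:
    P = np.array([[1/5, 1/5, 1/5, 1/5, 1/5],
                  [0,   1/3, 1/6, 1/3, 1/6],
                  [0,   1/2, 1/2, 0,   0  ],
                  [0,   1/2, 0,   1/2, 0  ],
                  [1/2, 0,   0,   0,   1/2]])
    print(reach_j_avoiding_A(P, A={2, 3, 4}, i=0, j=1, m=4))   # avoids A, ends at state 2 (index 1)
    print(reach_j_avoiding_A(P, A={2, 3, 4}, i=0, j=2, m=4))   # first enters A exactly at step 4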
