KEMBAR78
Math 183 Spring 2012 Notes | PDF | Equations | Mathematics
0% found this document useful (0 votes)
41 views67 pages

Math 183 Spring 2012 Notes

Uploaded by

rexonjs10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views67 pages

Math 183 Spring 2012 Notes

Uploaded by

rexonjs10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 67

Mathematical Modeling

Preliminary Lecture Notes

Adolfo J. Rumbos
c Draft date March 23, 2012
2
Contents

1 Preface 5

2 Modeling Process 7
2.1 Constructing Models . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Conservation Principles . . . . . . . . . . . . . . . . . . . 8
2.1.2 Constitutive Equations . . . . . . . . . . . . . . . . . . . . 8
2.2 Example: Bacterial Growth in a Chemostat . . . . . . . . . . . . 8
2.3 Analysis of Models . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Nondimensionalization . . . . . . . . . . . . . . . . . . . . 11

3 Continuous Deterministic Models 17


3.1 Example: Modeling Traffic Flow . . . . . . . . . . . . . . . . . . 17
3.2 Analysis of the Traffic Flow Equation . . . . . . . . . . . . . . . 20

4 Stochastic Models 35
4.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.1 A Brief Excursion into Probability . . . . . . . . . . . . . 37
4.1.2 Discrete Random Variables . . . . . . . . . . . . . . . . . 40
4.1.3 The Binomial Distribution . . . . . . . . . . . . . . . . . . 42
4.1.4 Expected Value . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.5 The Poisson Distribution . . . . . . . . . . . . . . . . . . 49
4.1.6 Estimating Mutation Rates in Bacterial Populations . . . 52
4.1.7 Another Brief Excursion into Probability . . . . . . . . . 53
4.1.8 Continuous Random Variables . . . . . . . . . . . . . . . 56
4.2 Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3
4 CONTENTS
Chapter 1

Preface

The main goal of this course is to provide opportunities for students to con-
struct and analyze mathematical models that arise in the physical, biological
and social sciences. Mathematical models are usually created in order to obtain
understanding of problems and situations arising in the real world; other times,
the main goal is to make predictions or to control certain processes; finally, some
models are created in order to aid in decision making.
Construction of a mathematical model consists of translating a real world
problem into a mathematical problem involving parameters, variables, functions,
equations and/or inequalities. Analysis of the model involves the solution (if
possible) of the mathematical problem through logical, algebraic, analytical or
computational means, and assessing what the solutions imply about the real
situation under study. If an analytical or computational solution is not possible,
computer simulations can sometimes be used in order to study various scenarios
implied or predicted by the model.
Analysis techniques can be drawn from many areas of mathematics. In
this course, it is assumed that students have a good working knowledge of
Calculus, Linear Algebra and Ordinary Differential Equations. These areas are
adequate for the analysis of some models. However, many modeling situations
require the use of some probability theory and linear programming. These
mathematical topics will be covered in the course. In calculus and differential
equations courses, students have been exposed to some continuous models. In
this course, we will also introduce students to discrete models as well.

5
6 CHAPTER 1. PREFACE
Chapter 2

The Process of Modeling

In this course we identify three stages in the process of mathematical modeling:

1. Construction of the model.

2. Analysis of the model.

3. Testing of the model.

We will get a chance to go through at least the first two stages listed above in
a variety of case studies or examples.
Of course, the modeling process always begins with a question that we want
to answer, or a problem we have to solve. Often, asking the right questions and
posing the right problems can be the hardest part in the modeling process. This
part of the process involves getting acquainted with the intricacies of the science
involved in the particular question at hand. It is unrealistic to expect that a
mathematical modeling course will teach students to do this in a systematic way.
The best we can do is to present many case studies and examples of real life
modeling situations that mathematicians have analyzed in various situations.
One of the goals of the course is to have students grapple with this issue when
working with a specific problem in a term project that will involve a large portion
of the course.

2.1 Constructing Models


Model construction involves the translation of a scientific question or problem
into a mathematical one. The hope here is that answering the mathematical
question, or solving the mathematical problem, if possible, might shed some light
in the understanding of the situation being studied. In a physical science, this
process is usually attained through the use of established scientific principles or
laws that can be stated in mathematical terms. In general, though, we might not
have the advantage of having at our disposal a large body of scientific principles.
This is particularly the case if scientific principles have not been discovered yet

7
8 CHAPTER 2. MODELING PROCESS

(in fact, the reason we might be resorting to mathematical modeling is that,


perhaps, mathematics can aid in the discovery of those principles).

2.1.1 Conservation Principles


There are, however, a few general and simple principles that can be applied in a
variety of situations. For instance, in this course we’ll have several opportunities
to apply conservation principles. These are rather general principles that
can be applied in situations in which the evolution in time of the quantity of
a certain entity within a certain system is studied. For instance, suppose the
quantity of a certain substance confined within a system is given by a continuous
function of time, t, and is denoted by Q(t) (the assumption of continuity is one
that needs to be justified by the situation at hand). A conservation principle
states that the rate at which a the quantity Q(t) changes has to be accounted
by how much of the substance goes into the system and how much of it goes
out of the system. For the case in which Q is also assumed to be differentiable
(again, this is a mathematical assumption that would need some justification),
the conservation principle can be succinctly stated as

dQ
= Rate of Q in − Rate of Q out. (2.1)
dt
In this case, the conservation principle might lead to a differential equation, or
a system of differential equations, and so the theory of differential equations can
be used to help in the analysis of the model.

2.1.2 Constitutive Equations


The right–hand side of the equation in (2.1) requires further modeling; in other
words, we need to postulate a kind of functional form for the rates in the right–
hand side of (2.1). This might take the general form of rewriting the equation
in (2.1) as
dQ
= f (t, Q, λ1 , λ2 , . . . , λp ), (2.2)
dt
where {λ1 , λ2 , . . . , λp } is a collection of parameters that are relevant to the real–
life problem being modeled. The functional form of the right–hand side in (2.2)
may be obtained from empirical or theoretical of relations between variables
usually called constitutive equations.
In the next subsection we present the first case study of the course in which
we see the two elements outlined above in the construction of models.

2.2 Example: Bacterial Growth in a Chemostat


This example presented in this subsection is discussed on page 121 of [EK88].
The diagram in Figure 2.2.1 shows schematically what goes on in a chemostat
that is used to harvest bacteria at a constant rate. The box in the top–left
2.2. EXAMPLE: BACTERIAL GROWTH IN A CHEMOSTAT 9

co

F -
N (t)
Q(t)
F
-

Figure 2.2.1: Chemostat

portion of the diagram in Figure 2.2.1 represents a stock of nutrient at a fixed


concentration co , in units of mass per volume. Nutrient flows into the bacterial
culture chamber at a constant rate F , in units of volume per time. The chamber
contains N (t) bacteria at time t. The chamber also contains an amount Q(t) of
nutrient, in units of mass, at time t. If we assume that the culture in the chamber
is kept well–stirred, so that there are no spatial variations in the concentration
of nutrient and bacteria, we have that the nutrient concentration is a function
of time given by
Q(t)
c(t) = , (2.3)
V
where V is the volume of the culture chamber. If we assume that the culture in
the chamber is harvested at a constant rate F , as depicted in the bottom–right
portion of the diagram in Figure 2.2.1, the volume, V , of the culture in equation
(2.3) is fixed.
We will later make use of the bacterial density,
N (t)
n(t) = , (2.4)
V
in the culture at time t.
The parameters, co , F and V , introduced so far can be chosen or adjusted.
The problem at hand, then, is to design a chemostat system so that
1. The flow rate, F , will not be so high that the bacteria in the culture will
be washed out, and
2. the nutrient replenishment, co , is sufficient to maintain the growth of the
colony.
In addition to assuming that the culture in the chamber is kept well–stirred
and that the rate of flow into and out of the chamber are the same, we will also
make the following assumptions:
10 CHAPTER 2. MODELING PROCESS

1. The bacterial colony depends on only one nutrient for growth;


2. The growth rate of the bacterial population is a function of the nutri-
ent concentration; in other words, the per–capita growth rate, K(c), is a
function of c.
We will apply a conservation principles to the quantities N (t) and Q(t) in
the growth chamber. For the number of bacteria in the culture, the conservation
principle in (2.1) reads:

dN
= Rate of N in − Rate of N out. (2.5)
dt
We are assuming here that N is a differentiable function of time. This assump-
tion is justified if
(i) we are dealing with populations of very large size so that the addition (or
removal) of a few individuals is not very significant; for example, in the
case of a bacterial colony, N is of the order of 106 cells per milliliter;
(ii) ”there are no distinct population changes that occur at timed intervals,”
see [EK88, pg. 117].
Using the constitutive assumption stated previously, we have that

Rate of N in = K(c)N, (2.6)

since K(c) is the per–capita growth rate of the bacterial population.


Since culture is taken out of the chamber at a rate F , we have that

Rate of N out = F n, (2.7)

where n is the bacterial density defined in (2.4). We can therefore re–write (2.5)
as
dN F
= K(c)N − N. (2.8)
dt V
Next, apply the conservation principle (2.1) to the amount of nutrient, Q(t),
in the chamber, where
Rate of Q in = F co , (2.9)
and
Rate of Q out = F c + αK(c)N, (2.10)
where we have introduced another parameter α, which measures the fraction of
nutrient that is being consumed as a result of bacterial growth. The reciprocal
of the parameter α,
1
Y = , (2.11)
α
measures the number of cells produced due to consumption of one unit of nu-
trient, and is usually referred to as the yield.
2.3. ANALYSIS OF MODELS 11

Combining (2.10), (2.9) and (2.1) we see that the conservation principle for
Q takes the form
dQ
= F co − F c − αK(c)N. (2.12)
dt
Using the definition of c in (2.3) we can re–write (2.12) as
dQ F
= F co − Q − αK(c)N. (2.13)
dt V
The differential equations in (2.8) and (2.13) yield the system of differential
equations 
dN F
 dt = K(c)N − V N ;



(2.14)

 dQ F

 = F co − Q − αK(c)N.
dt V
Thus, application of conservation principles and a few constitutive assumptions
has yielded a system of ordinary differential equations (2.14) for the variables N
and Q in the chemostat system. We have therefore constructed a preliminary
mathematical model for bacterial growth in a chemostat.
Dividing the equations in (2.14) by the fixed volume, V , of the culture in
the chamber, we obtain the following system of ordinary differential equations
for the bacterial population density, n(t), and the nutrient concentration, c(t).

dn F
 dt = K(c)n − V n;



(2.15)

 dc F F

 = co − c − αK(c)n.
dt V V
Thus, we have arrived at a mathematical model that describes the evolution
in time of the population density and nutrient concentration in a chemostat
system. We will analyze the system in (2.15) in subsequent sections.

2.3 Analysis of Models


In the process of constructing the differential equations model expressed in the
system in (2.15) we made several simplifying assumptions; for instance, we as-
sumed that the mixture in the culture is well-stirred and that the volume V
is fixed, so that the bacterial density and nutrient concentration are functions
of a single variable, t. We also assumed that these are differentiable functions.
Simplification is an important part of the modeling process; otherwise the math-
ematical problem might be intractable.

2.3.1 Nondimensionalization
In this section we illustrate yet another way to simplify the problem which
consists of introducing dimensionless variables (variables without units). This
12 CHAPTER 2. MODELING PROCESS

process is known as nondimensionalization, and it has the added benefit of de-


creasing the number of parameters in the system, reducing thereby the complex-
ity of the problem. We illustrate this procedure in the analysis of the chemostat
system in (2.15).
Note that the system in (2.15) has four parameters; namely, co , F , V and α.
Before proceeding with the analysis of system (2.15), we will use the constitutive
equation
bc
K(c) = , (2.16)
a+c
where a and b are two additional positive parameters. Thus, the system in
(2.15) becomes 
dn bnc F
 dt = a + c − V n;



(2.17)

 dc F F αbnc

 = co − c − ,
dt V V a+c
with six parameters. A procedure that consolidates the set of parameters into
a smaller set will greatly simplify the analysis.
The constitutive equation in (2.16) is borrowed from the Michaelis–Menten
theory of enzyme kinetics. It models the per–capita growth rate, K(c), of the
bacterial population as an increasing function of the nutrient concentration with
a limiting value b; hence, b has the units of a per–capita growth rate, namely,
1/time. The parameter a has units of concentration (mass/volume), and it
represents the value of the nutrient concentration at which the per–capita growth
rate, K(c), is half of its limiting value. Figure 2.3.2 shows a sketch of the graph
of K as a function of c as given by (2.16).

b
2

a c

Figure 2.3.2: Sketch of Graph of K(c)

Nondimensionalizing the system in (2.17) consists of introducing new vari-


ables, n
b, b
c and τ , to replace n, c and t, respectively, in such a way that n
b, b
c and
τ have no units, or dimensions. This can be achieved by scaling the variables
n, c and t by appropriate scaling factors so that the units will cancel out. For
instance, we can scale c by the parameter a, since they have the same units, to
2.3. ANALYSIS OF MODELS 13

get
c
c=
b . (2.18)
a
It is not clear at the moment what the scaling factor for n and t should be, so
we shall denote them by µ and λ, respectively. We then have that,
n
n
b= , (2.19)
µ
and
t
τ= , (2.20)
λ
where µ has units of bacterial density (cells/volume), and λ has units of time.
Next, we find expressions for the derivatives
db
n db
c
and . (2.21)
dτ dτ
To find the expressions in (2.21) we need to apply the Chain Rule; for instance,
db
n db
n dt
= · (2.22)
dτ dt dτ
To compute the right–hand side of (2.22), we use (2.19) and (2.20) to obtain
from (2.22) that
db
n λ dn
= . (2.23)
dτ µ dt
dn
Next, substitute the expression for in the first equation in (2.17) into the
dt
right–hand side of (2.23) to obtain
 
db
n λ bnb
c F
= − n , (2.24)
dτ µ 1+bc V
where we have also used the expression for b
c in (2.18). Distributing on the
right–hand side of (2.24) we obtain
db
n n c λF

bb
= λb n
b, (2.25)
dτ 1+b c V
where we have used (2.19).
We will now choose λ so that
λF
= 1, (2.26)
V
from which we get that
V
λ= (2.27)
F
is our scaling factor for t; observe that the parameter λ in (2.27) has units of
time.
14 CHAPTER 2. MODELING PROCESS

Next, we consolidate the parameters b and λ into a single parameter, which


will call α1 , by the formula
λb = α1 , (2.28)
which yields
bV
α1 = , (2.29)
F
by the use the definition of λ in (2.27). Note that α1 in (2.32) is dimensionless
since b has units of 1/time. Combining (2.25), (2.26) and (2.28), we obtain the
dimensionless differential equation

db
n n c
−n
bb
= α1 b. (2.30)
dτ 1+b c
Similar calculations (see Problem 1 in Assignment 2) show that

db
c n c
= α2 − −b
bb
c, (2.31)
dτ 1+b c
where we have set
co
α2 = (2.32)
a
and
αbλµ
= 1,
a
so that
a
µ= . (2.33)
αbλ
Note that the parameter α2 in (2.32) is dimensionless and that that the units
of µ defined in (2.33) are cells/volume.
Putting together the equations in (2.30) and (2.33) we obtain the system

dbn n c
−n
bb
 dτ = α1 1 + b b;


c

(2.34)

 dbc n c
= α2 − −b
b b

 c,
dτ 1+b c
in the nondimensional variables n
b, b
c and τ defined in (2.19), (2.18) and (2.20),
respectively. Observe that the system in (2.34) contains two dimensionless pa-
rameters, α1 and α2 , as opposed to the six parameters in the original system
in (2.17). This reduction in the number of parameters greatly simplifies the
problem in two aspects:

1. the mathematical calculations are simpler to perform;

2. the dimensionless parameters, α1 and α2 , consolidate the information con-


tained in the six original parameters, and this makes the analysis easier
to carry out.
2.3. ANALYSIS OF MODELS 15

For instance, the equilibrium points of the system in (2.34) are expressed in
terms of the parameters α1 and α2 as follows
   
1 1
(0, α2 ) and α1 α2 − , . (2.35)
α1 − 1 α1 − 1

In order to obtain biologically feasible equilibrium solutions, we must require


that
α1 > 1 (2.36)
and
1
α2 > . (2.37)
α1 − 1
In terms of the original parameters, conditions (2.36) and (2.37) translate into

F < bV

and
aF
co > ,
bV − F
respectively.
The equilibrium solution (0, α2 ) in (2.35) is referred to as the “washout”
solution, since all the bacteria washed out because of the flow; while the second
solution in (2.35) is the “survival” solution.
Stability analysis of the dimensionless system in (2.34) will reveal further
conditions that determine whether the chemostat system will yield a sustainable
crop of bacteria. Some of this analysis is carried out in Assignment 2.
16 CHAPTER 2. MODELING PROCESS
Chapter 3

Continuous Deterministic
Models

The chemostat model discussed in the previous chapter is an example of a


continuous model—all the variables in question, N , Q and t, are assumed to
be continuous variables (in fact, we assumed that N and Q are differentiable
functions of t). Application of conservation principles, in that case, led to a
system of ordinary differential equations, which also makes the chemostat model
discussed in the previous chapter a deterministic model—the solutions to the
model are completely determined by the parameters in the model and by the
initial conditions; in particular, a given set of initial conditions and parameters
give rise to the same predictions every time the model is run.
In this chapter we present another kind of deterministic model that involves
variables that depend continuously on more than one variable. We will again
apply a conservation principle; however, in this case we will get a partial differ-
ential equation model. In subsequent sections in this chapter we will present an
approach to analysis of the equations that result from this process. In particular,
we will see an application to modeling traffic flow.

3.1 Example: Modeling Traffic Flow


Consider the unidirectional flow of traffic in a one–lane, straight road depicted
in Figure 3.1.1. In this idealized road, vehicles are modeled by moving points.
The location, x, of a point–vehicle is measured from some reference point along
an axis parallel to the road. We postulate a traffic density, ρ(x, t), measured in
units of number of cars per unit length of road at location x and time t. We
interpret ρ(x, t) as follows: Consider a section of the road from x to x + ∆x
at time t. Let ∆N ([x, x + ∆x], t) denote the number of cars in the section
[x, x + ∆x] at time t. We define ρ(x, t) by the expression
∆N ([x, x + ∆x], t)
ρ(x, t) = lim , (3.1)
∆x→0 ∆x

17
18 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

v1 r - v2 r -

Figure 3.1.1: One–lane unidirectional flow

provided that the limit on the right–hand side of (3.1) exists.


Next, consider a section of the road from x = a to x = b pictured in Figure
3.1.2. Using the traffic density, ρ(x, t), we calculate the number of vehicles in

x=a x=b

Figure 3.1.2: Section of road from x = a to x = b

the section [a, b] at time t to be given by the integral


Z b
N (t) = ρ(x, t) dx. (3.2)
a

If we assume that density, ρ(x, t), is a differentiable function, we can state a


conservation principle for the number of cars in section [a, b] as follows:

dN
= Number of cars entering at a − Number of cars leaving at b. (3.3)
dt
We can re–write the conservation principle in (3.3) more succinctly by postulat-
ing a traffic flux function, q(x, t), which measures the number of cars crossing
location x per unit of time at time t. Using the traffic flux function we can then
re–write (3.3) as
dN
= q(a, t) − q(b, t),
dt
or
dN
= −[q(b, t) − q(a, t)]. (3.4)
dt
Assuming that the flux function is differentiable and that its partial derivatives
are continuous, we can invoke the Fundamental Theorem of Calculus to re–write
(3.4) as
Z b
dN ∂
=− [q(x, t)] dx. (3.5)
dt a ∂x
3.1. EXAMPLE: MODELING TRAFFIC FLOW 19

Next, assume that the traffic density, ρ, has continuous partial derivatives
to obtain from (3.2) that
Z b
dN ∂
= [ρ(x, t)] dx. (3.6)
dt a ∂t

Combining (3.6) and (3.5) we then see that the conservation principle in (3.4)
now takes the form
Z b Z b
∂ ∂
[ρ(x, t)] dx = − [q(x, t)] dx. (3.7)
a ∂t a ∂x
Rewrite the equation in (3.7) as
Z b Z b
∂ ∂
[ρ(x, t)] dx + [q(x, t)] dx = 0,
a ∂t a ∂x
or Z b  
∂ ∂
[ρ(x, t)] + [q(x, t)] dx = 0, (3.8)
a ∂t ∂x
and observe that the points a and b are arbitrary. Since, we are assuming that
the partial derivatives of ρ and q are continuous, we can show (see Assignment
4) that, given that (3.8) holds true for all intervals [a, b], then we must have
that
∂ ∂
[ρ(x, t)] + [q(x, t)] = 0. (3.9)
∂t ∂x
The equation in (3.9) is an example of a partial differential equation or
PDE. It is usually written in a more compact form
∂ρ ∂q
+ = 0, (3.10)
∂t ∂x
or
ρt + qx = 0, (3.11)
where the subscripts in (3.11) indicate partial derivatives with respect to the
subscripted variables.
Ideally, we would like to find a solution, ρ, to (3.10) subject to some initial
condition
ρ(x, 0) = ρo (x), (3.12)
for some initial traffic density profile, ρo , along the road.
Before we proceed any further, we need to model the traffic flux, q(x, t).
Imagine a vehicle at x is moving with a velocity v. Then, for a short time
interval, [t, t + ∆t] the car moves a distance approximately give by

∆x ≈ v∆t.

The number of cars in the section [x, x + ∆x] is then, approximately, given by

∆N ≈ ρ(x, t)v∆t. (3.13)


20 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

Dividing (3.13) by ∆t and letting ∆t → 0, we obtain from (3.13) the rate of


cars crossing the location x at time t; in other words, the traffic flux, q(x, t).
We therefore obtain the constitutive equation

q = vρ. (3.14)

Substituting the expression for q into (3.10) we obtain


∂ρ ∂
+ [vρ] = 0, (3.15)
∂t ∂x
The next step is to model the velocity v in (3.15). It is reasonable to assume
that v is a function of traffic density—the higher the density, the lower the traffic
speed. We may therefore write

v = f (ρ, Λ), (3.16)

where f is a continuous function of ρ and a set of parameters, Λ. Some of the


parameters might be a maximum density, ρmax , dictated by bumper to bumper
traffic, and a maximum speed, vmax ; for instance, vmax is a speed limit. Given
the parameters ρmax and vmax , the simplest model for the relationship between
v and ρ is the constitutive equation
 
ρ
v = vmax 1 − . (3.17)
ρmax

Later in this course we shall shall see how to derive expressions like (3.17)
relating traffic velocity to traffic density from theoretical considerations; for
now, we simply note that it models the intuitive notion that we stated above
regarding the traffic velocity being low for high traffic density.
The partial differential equation model for traffic flow presented in this sec-
tion, based on the conservation equation in (3.15) and a constitutive relation
for the traffic velocity, v, and the traffic density ρ (of which (3.17) is just an
example), was first introduced by Lighthill and Whitman in 1955 (see [LW55]);
it was also treated by Richards in 1956, [Ric56].

3.2 Analysis of the Traffic Flow Equation


In this section we outline an analysis of the equation in (3.15) with v as given
in (3.17); namely,
   
∂ρ ∂ ρ
+ vmax 1 − ρ = 0. (3.18)
∂t ∂x ρmax
We are interested in answering the question of whether the partial differential
equation in (3.18) has a solution satisfying an initial condition of the form in
(3.12); namely,
ρ(x, 0) = ρo (x), (3.19)
3.2. ANALYSIS OF THE TRAFFIC FLOW EQUATION 21

where ρo is some given continuous function of x.


Before we proceed with the analysis, we nondimensionalize the equation in
(3.18) by introducing dimensionless variables
ρ x t
u= , x
b= , and t= ,
b (3.20)
ρmax L τ
where L and τ are some characteristic lengths and time scaling factors, respec-
tively. It follows from the Chain Rule and the last two equations in (3.20)
that
∂ρ ∂ρ ∂b x
= ,
∂x ∂bx ∂x
or
∂ρ 1 ∂ρ
= , (3.21)
∂x L ∂b x
and
∂ρ 1 ∂ρ
= . (3.22)
∂t τ ∂b t
Substituting the equations in (3.21) and (3.22) into (3.18) and using the first
equation in (3.20), we obtain

1 ∂ρ vmax ∂
+ [(1 − u) ρ] = 0. (3.23)
τ ∂b
t L ∂bx
Next, divide the equation in (3.23) and use the first equation in (3.20) to get

∂u vmax τ ∂
+ [(1 − u) u] = 0, (3.24)
∂b
t L ∂b x
where we have also multiplied by τ . Setting
vmax τ
= 1,
L
or, equivalently, choosing the time scale τ to be L/vmax , we see that we can
re–write (3.24) as
∂u ∂
+ [(1 − u) u] = 0. (3.25)
∂tb ∂b
x
The equation in (3.25) is now in dimensionless form. If the original time and
space variables, t and x, are assumed to be in units of τ and L, respectively, we
can then rewrite the equation in (3.25) as

∂u ∂
+ [(1 − u) u] = 0, (3.26)
∂t ∂x
where u, x and t represent real (dimensionless) variables. The equation in (3.26)
is the one we’ll be analyzing for the remainder of this section.
Set
g(u) = u(1 − u); (3.27)
22 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

then, the partial differential equation in (3.26) can be written in the form

∂u ∂
+ [g(u)] = 0, (3.28)
∂t ∂x
or
∂u ∂u
+ g 0 (u) = 0. (3.29)
∂t ∂x
We will describe here a general procedure for analyzing the equation in (3.28)
for the case in which g is a differentiable function. We begin by presenting the
simplest example of a linear function

g(u) = cu,

for some constant c. The equation in (3.29) then becomes the linear first order
partial differential equation
∂u ∂u
+c = 0. (3.30)
∂t ∂x
We will show how to find a solution to the differential equation in (3.30) subject
to the initial condition
u(x, 0) = f (x), (3.31)
where f is some differentiable function of a single variable.
The problem of determining a solution of the differential equation in (3.30)
subject to the condition in (3.31) is an example of initial value problem and is
usually written in a the more compact form
∂u ∂u


 +c = 0, x ∈ R, t > 0;
∂t ∂x (3.32)

x ∈ R.

u(x, 0) = f (x),

Example 3.2.1 (The Method of Characteristic Curves). We can arrive at a so-


lution to the initial value problem in (3.32) by using the method of characteristic
curves.
A characteristic curve in the xt–plane, parametrized by

t 7→ (x(t), t), for t ∈ R, (3.33)

is obtained as follows. First, evaluate the function u on the curve in (3.33) to


obtain a real–valued function of a single variable

t 7→ u(x(t), t), for t ∈ R. (3.34)

Differentiate with respect to t the function defined in (3.34), using the Chain
Rule, to obtain
d ∂u dx ∂u dt
[u(x(t), t)] = + ,
dt ∂x dt ∂t dt
3.2. ANALYSIS OF THE TRAFFIC FLOW EQUATION 23

which we can re–write as


du ∂u dx ∂u
= + . (3.35)
dt ∂t dt ∂x
Comparing the right–hand side of the equation in (3.35) with the left–hand side
of the partial differential equation in (3.32), we see that if we choose the curve
in (3.33) so tat
dx
= c, (3.36)
dt
then the equation in (3.35) turns into

du
= 0. (3.37)
dt
The ordinary differential equation in (3.36) defines a family of curves in the
xt–plane give by
x = ct + k, (3.38)
where k is a real parameter. The curves in (3.38) are straight lines of slope 1/c
in the xt–plane; some of the curves for the case c > 0 are pictured in Figure
3.2.3.
t

 
     
      
      
      
      
       x
      
      
      
      
      

Figure 3.2.3: Characteristic Curves for (3.32)

The curves in (3.38) are called the characteristic curves of the partial dif-
ferential equation in (3.32) and are defined by the ordinary differential equation
in (3.36).
Since the equation in (3.37) was derived by differentiating u along a char-
acteristic curve, it implies that u is constant along characteristics. We can
therefore conclude from (3.37) that

u = constant along characteristics. (3.39)

The equation in (3.39) allows us to obtain a formula for u(x, t), where (x, t) lies
along the characteristic indexed by k in (3.38) as follows

u(x, t) = ϕ(k), (3.40)


24 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

where ϕ(k) denotes the constant value of u along the characteristic in (3.38)
indexed by k.
Solving for k in (3.38) and substituting into (3.40) yields a general formula
for computing a solution to the partial differential equation in (3.32):

u(x, t) = ϕ(x − ct). (3.41)

Substituting the initial condition (3.32) into (3.41) yields

ϕ(x) = f (x), for all x ∈ R,

so that
u(x, t) = f (x − ct), for x ∈ R, t ∈ R, (3.42)
solves the initial value problem (3.32). The expression in (3.42) says that the
solution to (3.32) is a traveling wave, which moves to the right with speed c
if c > 0, or to the left if c < 0. In other words, the initial profile, f (x), moves
propagates without distortion with velocity c. The solution in (3.42) is also
known as an advection wave and the partial differential equation in (3.32) is
known as the advection equation.

The method of characteristic curves illustrated in Example 3.2.1 also applies


to nonlinear equations. The analysis is more involved, but in some cases it yields
a lot of information about solutions to the equation. In the next example we
consider the case in which
1
g(u) = u2 (3.43)
2
in (3.28). Thus, in view of (3.29) and (3.43), the partial differential equation in
(3.28) becomes
∂u ∂u
+u = 0. (3.44)
∂t ∂x
The partial differential equation in (3.44) is known as the inviscid Burgers’
equation.

Example 3.2.2 (Inviscid Burgers’ Equation). Solve the equation in (3.44) sub-
ject to the initial condition
u(x, 0) = f (x),
where f is some given continuous function; in other words, solve the initial value
problem
∂u ∂u


 +u = 0, x ∈ R, t > 0;
∂t ∂x (3.45)

x ∈ R.

u(x, 0) = f (x),
We proceed as in Example 3.2.1 by first obtaining the equation for the charac-
teristic curves
dx
= u. (3.46)
dt
3.2. ANALYSIS OF THE TRAFFIC FLOW EQUATION 25

In this case we cannot solve directly for the characteristic curves. However, as
in Example 3.2.1, the a solution, u, to the partial differential equation in (3.45)
must solve the ordinary differential equation
du
= 0; (3.47)
dt
Thus, u must be constant along the characteristic curves given by (3.46). Thus,
we can solve (3.47) to yield
u(x, t) = ϕ(k), (3.48)
where ϕ(k) denotes the constant value of u along the characteristic curve given
by (3.46) indexed by k. We can then re–write (3.46) as

dx
= ϕ(k),
dt
which can be solved to yield the characteristic curves

x = ϕ(k)t + k. (3.49)

Hence, according to (3.49), the characteristic curves of the partial differential


equation in (3.45) are straight lines in the xt–plane whose slopes depend on the
value of u on the them.
Solving for the right–most k in (3.49) and substituting into (3.48), we obtain
an expression that defines u implicitly

u(x, t) = ϕ(x − u(x, t)t), (3.50)

where we have used the expression for ϕ(k) in (3.48).


Finally, employing the initial condition in (3.45), we obtain from (3.50) that

u(x, t) = f (x − u(x, t)t). (3.51)

In subsequent examples we will see realizations of the implicit formula in


(3.51) for specific initial conditions f .

Example 3.2.3 (Inviscid Burgers’ Equation, Continued). Solve the initial value
problem in (3.45) where the initial condition is given by

f (x) = x, for all x ∈ R. (3.52)

In this case the expression in (3.51) defining u implicitly reads

u(x, t) = x − u(x, t)t, (3.53)

in view of (3.52). Solving the expression in (3.53) for u(x, t) yields


x
u(x, t) = , for all x ∈ R and t > 0. (3.54)
1+t
26 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

Example 3.2.4 (Inviscid Burgers’ Equation, Continued). Solve the initial value
problem in (3.45) where the initial condition is given by

f (x) = 1 − x, for all x ∈ R. (3.55)

In this case the expression in (3.51) defining u implicitly reads

u(x, t) = 1 − (x − u(x, t)t), (3.56)

in view of (3.55). Solving the expression in (3.56) for u(x, t) yields


1−x
u(x, t) = , for all x ∈ R and 0 6 t < 1. (3.57)
1−t
It is interesting to notice the strikingly different results obtained in Exam-
ples 3.2.3 and 3.2.4. In Example (3.2.4), the solutions, u(x, t) in (3.57) ceases
to exist at t = 1; while the solution in Example 3.2.3 exists for all positive
values of t, according to (3.54). In subsequent examples we try to explain why
the two examples display vastly different results by examining the method of
characteristic curves more carefully.
Example 3.2.5 (Inviscid Burgers’ Equation, Continued). Solve the initial value
problem in (3.45) where the initial condition is given by

0, if x < 0;

f (x) = x, if 0 6 x < 1; (3.58)

1, if x > 1.

Figure 3.2.4 shows a sketch of the initial condition, f . The characteristic curves
f

Figure 3.2.4: Sketch of the graph of f in (3.58)

for the differential equation in (3.45) are solutions to the ordinary differential
equation
dx
= u. (3.59)
dt
Along the characteristic curves, u solves the ordinary differential equation
du
= 0, (3.60)
dt
3.2. ANALYSIS OF THE TRAFFIC FLOW EQUATION 27

which implies that u is constant along characteristics so that

u(x, t) = ϕ(k), (3.61)

for some parameter k, depending on the particular characteristic curve, and ϕ


is some arbitrary real–valued function. Solving (3.59) for u given by (3.61) we
obtain the equation for the characteristic curves:

x = ϕ(k)t + k. (3.62)

Next, solve for k in (3.62) and substitute in (3.61) to obtain that u is given
implicitly by
u(x, t) = ϕ(x − u(x, t)t). (3.63)

Using the initial condition in (3.45), we obtain from (3.63) that

ϕ(x) = f (x), for all x ∈ R, (3.64)

so that
u(x, t) = f (x − u(x, t)t). (3.65)

It follows from (3.62) and (3.64) that the characteristic curves are given by the
equations
x = f (k)t + k; (3.66)

thus, according to (3.66), the characteristic curves of the partial differential


equation in (3.45) are straight lines of slope 1/f (k) going through the point
(k, 0) on the x–axis. In particular, for k 6 0 the characteristic curves are the
vertical lines
x = k,

since f (k) = 0 for k 6 0; for 0 < k < 1, the characteristic curves are the straight
lines
x = k(t + 1);

and, for k > 1, the characteristic lines are the straight lines of slope 1

x = t + k.

A sketch of The characteristic curves is shown in Figure 3.2.5. Notice that the
characteristic curves for 0 6 k 6 1 fan out from the t–axis to the line x = t + 1.
Since the solution, u, to the initial value problem in (3.45) is constant along
characteristic curves (see the equations in (3.60) and (3.61)), the sketch in Figure
3.2.5 shows that u can be computed by traveling back along the characteristic
curves to the initial time, t = 0, and reading off value of f (k) for the particular
value point k on the x–axis. Thus, in theory, the initial value problem in (3.45)
with initial condition given in (3.58) can be solved, and the solution is unique.
28 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

t

   
   
   
   
   
  
x

Figure 3.2.5: Sketch of the characteristic curves for (3.45) with f given in (3.58)

In this case we can write down a formula for u(x, t):



0,


if x 6 0, t > 0;



 x
u(x, t) = , if 0 < x < t + 1, t > 0; (3.67)

 1+t





1, if x > t + 1, t > 0.

The fanning our of the characteristic curves pictured in Figure 3.2.5 has the
effect of stretching the initial profile for u in the x direction. This is shown in
Figure 3.2.6, where sketch of u(x, t) as given in (3.67) is shown for t = 0 and
t = 1; the initial profile is shown in dashed lines.
u


 

 
 


x

Figure 3.2.6: Sketch of the graph of u(x, t) for t = 0 and t = 1

The nature of the solutions to the initial value problem in (3.45) changes
dramatically when the following initial profile is used.

1,
 if x < 0;
f (x) = 1 − x, if 0 6 x < 1; (3.68)

0, if x > 1.

A sketch of the graph of f in (3.68) is shown in Figure 3.2.7.


3.2. ANALYSIS OF THE TRAFFIC FLOW EQUATION 29
f

@
@
@
@
@
@
x

Figure 3.2.7: Sketch of the graph of f in (3.68)

Example 3.2.6 (Inviscid Burgers’ Equation, Continued). Solve the initial value
problem in (3.45) where f is as given in (3.68).
Proceeding as in Example 3.2.6, we sketch the characteristic curves in Figure
3.2.8. In Figure 3.2.8 we see that the characteristic curves,
t


  
  
 
 



 
 
  
  
x

Figure 3.2.8: Sketch of the characteristic curves for (3.45) with f given in (3.68)

x = f (k)t + k, (3.69)
for 0 < k < 1, instead of fanning out, bunch in and all meet at the single point
with coordinates (1, 1) in the xt–plane. To see why this is the case, take two of
the characteristic curves in (3.69) with equations
x = f (k1 )t + k1 , (3.70)
and
x = f (k2 )t + k2 , (3.71)
with 0 < k1 < k2 6 1. To find the intersection of the lines in (3.70) and (3.71),
set the equations equal to each other and use the definition of f in (3.68), so
that f (k) = 1 − k, for 0 < k 6 1, to get that
(1 − k1 )t + k1 = (1 − k2 )t + k2 ,
30 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

from which we get that t = 1. Thus, u(x, t) ceases to exist in the usual sense at
t = 1.
As in Example 3.2.5, we can obtain a formula for computing u(x, t), at least
for t < 1: 
1,


if x < t < 1;




1 − x
u(x, t) = , if t < x 6 1; (3.72)
1−t






0, if x > 1, t < 1.
Figure 3.2.9 shows a picture of the graph of u(x, t), for t = 1/2, as a function
x. As t approaches 1 from the left, we see from (3.72) that the profile of u(x, t)
u

A
A
A
A
A
A
x

Figure 3.2.9: Sketch of the graph of u(x, t) in (3.72) for t = 1/2

approaches that shown in Figure 3.2.10. Thus, as t → 1− , u(x, t) develops a


u

Figure 3.2.10: Sketch of the graph of u(x, t) in (3.72) as t → 1−

discontinuity.

As seen in Example 3.2.6, a solution to the initial value problem in (3.45),


where f is as given in (3.68), ceases to exist in the usual sense. However, some
sort of solution, known as a shock wave, does exist in a generalized sense.
This generalized solution, or weak solutions as it is also called, will not solve the
partial differential equation, but it will solve the integral equation formulation
of the conservation principle that led to the partial differential equation. For
3.2. ANALYSIS OF THE TRAFFIC FLOW EQUATION 31

the case of the inviscid Burgers’ equation, the conservation principle is

d b
Z
u(x, t) dx = Flux at a − Flux at b, (3.73)
dt a
for all a, b ∈ R with a < b. In the case of Burgers’ equation, the flux is given by
1
F (x, t) = [u(x, t)]2 (3.74)
2
Combining (3.73) and (3.74), the equation in (3.73) can be written as
Z b
d 1 1
u(x, t) dx = [u(a, t)]2 − [u(b, t)]2 , (3.75)
dt a 2 2
for all a, b ∈ R with a < b.
The conservation principle in (3.75) can be used to describe what happens
to the solution after a shock forms; for instance, after t = 1 in Example 3.2.6.
We saw in that example that a discontinuity develops. The discontinuity will
continue to travel along some curve in the xt–plane parametrized by a path of
the form
t 7→ (σ(t), t). (3.76)
We would like to describe the path in (3.76). In order to do this, we assume that
the path in (3.76) is differentiable with continuous derivative (in other words,
the path in (3.76) is a C 1 path). We also assume that u has a jump discontinuity
along the path in (3.76), so that the one sided limits

u− (t) = lim u(x, t) and u+ (t) = lim u(x, t) (3.77)


x→σ(t)− x→σ(t)+

For instance, in Example 3.2.6 we saw that the solution to the inviscid Burgers’
equation with the initial condition in (3.68) has a jump discontinuity at t = 1
with
u− (1) = 1 and u+ (1) = 0. (3.78)
The conservation expression in (3.73) states that the quantity
Z b
Q(a, b, t) = u(x, t) dx, (3.79)
a

is conserved. Using (3.79), we can re–write (3.75)


d 1 1
[Q(a, b, t)] = [u(a, t)]2 − [u(b, t)]2 . (3.80)
dt 2 2
Consider the quantity Q(a, b, t) over a short time interval [t, t + ∆t], where
a = σ(t) and b = σ(t + ∆t). We have by the definition of Q in (3.79) that
Z σ(t+∆t)
Q(σ(t), σ(t + ∆t), t + ∆t) = u(x, t + ∆t) dx (3.81)
σ(t)
32 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS

Using (3.77) we can approximate (3.81) as follows

Q(σ(t), σ(t + ∆t), t + ∆t) ≈ u− (t)(σ(t + ∆t) − σ(t)) (3.82)

since u(x, t + ∆t) is to the left of the shock for ∆t > 0 (see Figure 3.2.11).
Similarly, we get from (3.79) and (3.77) that

Q(σ(t), σ(t + ∆t), t) ≈ u+ (t)(σ(t + ∆t) − σ(t)) (3.83)

It follows from (3.82) and (3.83) that

Q(σ(t), σ(t + ∆t), t + ∆t) − Q(σ(t), σ(t + ∆t), t)


∆t
(3.84)
− + σ(t + ∆t) − σ(t)
≈ [u (t) − u (t)] .
∆t
Letting ∆t → 0 in (3.84) we obtain
dQ dσ
= [u− (t) − u+ (t)] (3.85)
dt dt
across the shock.
t

t + ∆t

u−
t
u+

x
σ(t) σ(t + ∆t)

Figure 3.2.11: Path of a shock solution for (3.45) with f given in (3.68)

On the other hand, for any τ ∈ (t, t + ∆t) we have that


 
1 1 1 1
lim [u(σ(t), τ )] − [u(σ(t + ∆t), τ )] = [u− (t)]2 − [u− (t)]2
2 2
(3.86)
∆t→0 2 2 2 2

(see Figure 3.2.11).


Next, combine (3.85), (3.86) and the conservation principle in (3.75) to get
dσ 1 1
[u− (t) − u+ (t)] = [u− (t)]2 − [u− (t)]2 ,
dt 2 2
3.2. ANALYSIS OF THE TRAFFIC FLOW EQUATION 33

from which we get that


dσ u− (t) + u+ (t)
= (3.87)
dt 2
The differential equation in (3.87) gives the path that the jump discontinuity
will follow after the shock forms.

Example 3.2.7 (Inviscid Burgers’ Equation, Continued). We saw in Example


3.2.6 that the solution to the initial value problem in (3.45) where f is as given
in (3.68) develops a shock at t = 1. The solution after t = 1 has a jump
discontinuity that travels along the path

t 7→ (σ(t), t)

given by (3.87) where u+ and u− are given in (3.78). Consequently,

dσ 1
= ,
dt 2
so that
t
σ(t) = + c. (3.88)
2
Since σ(1) = 1, we get from (3.88) that
1
c=
2
in (3.88). We therefore get the following formula for the weak solution to initial
value problem discussed in Example 3.2.6:
t+1

1, if x <
 , t > 1;

 2
u(x, t) = (3.89)


0, if x > t + 1
, t > 1.

2
Figure 3.2.12 shows a picture of some of the characteristic curves for (3.45)
with f given in (3.68), which also incorporates the shock wave solution that we
derived in (3.89).
34 CHAPTER 3. CONTINUOUS DETERMINISTIC MODELS








 
 
  
  
x

Figure 3.2.12: Sketch of the characteristic curves for (3.45) with f given in
(3.68) with shock wave solution
Chapter 4

Stochastic Models

The models discussed in the previous chapter have been deterministic—the vari-
ables used in the models are completely determined by the values of a set of
parameters and the values of the variables at some initial point or curve. For
instance, when modeling bacterial growth, the number of bacteria in a culture
at time, t, might be modeled by a continuous variable, N (t), which satisfies the
initial value problem
  
dN N
= rN 1 − ;


dt K

(4.1)


N (0) = No ,

known as the logistic model. The model in (4.1) was derived in Assignment
#1 as a special case of the bacterial growth in a chemostat model presented
in Section 2.2 of these notes. The parameters in the equation in (4.1) are
the intrinsic growth rate, r, and the carrying capacity of the growth medium,
K. Given values for the parameters r and K, and the size of the bacterial
population, No , at time t = 0, the values of population size, N (t), for later
values of t, are completely determined by the formula

No K
N (t) = , for t > 0. (4.2)
No + (K − No )e−rt

Thus, if an experiment is performed in which No bacteria, whose intrinsic growth


rate is r, are placed in a growth medium with carrying capacity, K, at time
t = 0, then the number of bacteria, N (t), in the culture at time t > 0 will
be given by (4.2). The experiment may be repeated many times; if the same
initial condition holds, and the values of the parameters are the same, the same
population value, N (t), at some time later will be that given by (4.2).
The Logistic growth scenario described in the previous paragraphs is to be
contrasted with the situation in which, in addition to finding out how many
bacteria, N (t), are present at time t, we want to know how many of those

35
36 CHAPTER 4. STOCHASTIC MODELS

bacteria develop a certain type of mutation. Here, an element of chance needs


to be introduced in the model. Each of the N (t) bacteria present at time t
might or might not develop of a mutation in a short time interval [t, t + ∆t]; it
is not possible to predict with certainty whether a given bacterium will develop
a mutation. Thus, if we let M (t) be the number of bacteria that develop a
mutation during the time interval [0, t], then every time we run the experiment
of placing No bacteria in a culture at time t = 0, under the same conditions,
and count the number of bacteria that have developed the mutation at time t,
we will not get the same value for M (t). Thus, M (t) is not a function in the
usual sense that we understand that term. After a time interval of length t,
M (t) can take on a range of values, and each value has a certain likelihood or
probability of occurring. This notion of a “function” M (t) whose values cannot
be predicted, but for which we can obtain a measure of their likelihood is what
is known as a random variable.

4.1 Random Variables


If M (t) denotes the number of bacteria that develop a mutation from and initial
number No in a time interval [0, t], it is reasonable to model it as a random
variable. Roughly speaking, random variables are quantities that are determined
from outcomes of a random experiment. A random experiment is a process
which can be repeated indefinitely under the same set of conditions, but whose
outcome cannot be predicted with certainty before the experiment is performed.
For instance, suppose you start with one bacterium in a medium conducive to
growth; t units of time later we count how many out of the N (t) bacteria have
developed a mutation. The number of bacteria, M (t), that have developed the
mutation is a random variable.
Even though we are not able to predict with certainty what specific value
the random variable, M (t), will take on at time t, in many modeling situations
we are able to ascertain the likelihood, or probability, that M (t) will take on a
range of values.

Example 4.1.1. Suppose that two bacteria, a and b, can randomly develop a
mutation in a unit of time. Assume that each bacterium can mutate at most
once in the unit of time of the experiment. Let M denote the number of bacteria
out of the two that develop mutations after one unit of time. Then M can take
on the values 0, 1, or 2. We cannot predict precisely what value M will take on.
Any time we run the experiment of placing the two bacteria under observation
and counting the number of mutations we may get any of the possible values.
M is thus an example of a random variable. The best we can hope for is an
estimate of the probabilities that M can take on any of the possible values; in
symbols, we want to estimate

Pr[M = k], for k = 0, 1, 2, (4.3)


4.1. RANDOM VARIABLES 37

4.1.1 A Brief Excursion into Probability


The expression Pr[M = k] in (4.3) denotes the probability of the event [M = k];
that is, Pr[M = k] gives a a measure of the likelihood that k bacteria out of the
two in Example 4.1.1 will develop a mutation.
An event is a possible outcome, or set of outcomes, of a random experiment.
In Example 4.1.1, the event denoted by [M = k] represents a set of outcomes
in which k of the two bacteria have developed a mutation.
A probability function assigns a real value between 0 and 1 to an event. A
probability of 0 means an impossible event, and a probability of 1 means that
the event will surely happen. The assignments of probability between 0 and 1
will depend on assumptions made about the experiment at hand.1
In order to compute the probabilities of the events [M = k], for k = 0, 1, 2,
in Example 4.1.1, we need to make some assumptions regarding how mutations
occur. Let A denote the event that bacterium a develops a mutation and b the
event that bacterium B develops a mutation in one unit of time. Suppose we are
told that the probability that a bacterium will develop a mutation is p, where
0 < p < 1 (p is called the mutation rate). We then have that

Pr(A) = p and Pr(B) = p. (4.4)

We assume that the event that A occurs will not affect the probability of event
B. We say that A and B are stochastically independent.

Definition 4.1.2 (Stochastic Independence). We say that events A and B are


stochastically independent if the probability of the joint occurrence of A and B
is the product of the individual probabilities. In symbols,

Pr(A ∩ B) = Pr(A) · Pr(B), (4.5)

where A ∩ B denotes the event that both A and B happen jointly.

In Example 4.1.1, A ∩ B corresponds to the event that both bacteria develop


a mutation in a unit of time. Thus,

A ∩ B = [M = 2].

Thus, the independence assumption implies that

Pr[M = 2] = Pr(A) · Pr(B) = p · p = p2 , (4.6)

where we have used (4.5) and (4.4).


We next see how to compute Pr[M = 0] and Pr[M = 1] in Example 4.1.1.

Definition 4.1.3 (Complement of an Event). Given and event, A, the event


that A does not occur is called the complement of A and is denoted by Ac .
1 For example, in the experiment of tossing a “fair die,” it is assumed that all faces of the

die are equally likely; thus, the probability of any given face is 1/6.
38 CHAPTER 4. STOCHASTIC MODELS

Thus, in Example 4.1.1, Ac is the event that bacterium a does not develop
a mutation in one unit of time. Observe that A and Ac are mutually exclusive;
that is, if A occurs then Ac cannot occur.
Definition 4.1.4 (Mutually Exclusive Events). Events A and B are said to be
mutually exclusive if and only if A ∩ B = ∅.

Definition 4.1.5. Given events A and B, the symbol A ∪ B denotes the event
that either A or B occurs.
Definition 4.1.6 (Probability of Mutually Exclusive Events). If A and B are
mutually exclusive, then

Pr(A ∪ B) = Pr(A) + Pr(B) (4.7)

For example, since A and Ac are mutually exclusive, it follows that

Pr(A ∪ Ac ) = Pr(A) + Pr(Ac ).

On the other hand, Pr(A ∪ Ac ) = 1, since A ∪ Ac is a sure event. It then follows


that
Pr(A) + P (Ac ) = 1,
from which we get the following property of the probability function.
Proposition 4.1.7 (Probability of Complementary Event).

Pr(Ac ) = 1 − Pr(A). (4.8)

Definition 4.1.8 (Set Difference). Given any events A and B, we define the
set
A\B = {x ∈ A | x 6∈ B}.
Note that A\B and B are mutually exclusive, by Definition 4.1.8. Further-
more,
A = B ∪ (A\B). (4.9)
We therefore have the following proposition.

Proposition 4.1.9 (Probability of Difference of Events). Given events A and


B, with B ⊆ A,
Pr(A\B) = Pr(A) − Pr(B). (4.10)
Proof: It follows from (4.9) and Definition 4.1.6 that

Pr(A) = Pr(B) + Pr(A\B),

which implies (4.10). 


As a consequence of the properties of probability that we have discussed so
far, we have the following proposition.
4.1. RANDOM VARIABLES 39

Proposition 4.1.10 (Probability of Union of Events). For any two events, A


and B,
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B). (4.11)
Proof: Note that A ∩ B ⊆ A. Thus, applying Proposition 4.1.9,

Pr(A\(A ∩ B)) = Pr(A) − Pr(A ∩ B), (4.12)

where
x ∈ A\(A ∩ B) iff x ∈ A and x 6∈ A ∩ B
iff x ∈ A and x 6∈ B
iff x ∈ A ∩ Bc,
so that
A\(A ∩ B) = A ∩ B c . (4.13)
Substituting (4.13) into (4.12) then yields

Pr(A ∩ B c ) = Pr(A) − Pr(A ∩ B). (4.14)

Similar calculations show that

Pr(B ∩ Ac ) = Pr(B) − Pr(A ∩ B). (4.15)

Observing that

A ∪ B = (A ∩ B c ) ∪ (B ∩ Ac ) ∪ (A ∩ B), (4.16)

where A ∩ B c , B ∩ B c and A ∩ B are mutually exclusive, we get from (4.16) and


Definition 4.1.6 that

Pr(A ∪ B) = Pr(A ∩ B c ) + Pr(B ∩ Ac ) + Pr(A ∩ B). (4.17)

Combining (4.14), (4.15) and (4.17) yields (4.11). 


Example 4.1.11 (Continuation of Example 4.1.1.). Let A, B and M be as in
Example 4.1.1. In this example we compute Pr[M = 0] and Pr[M = 1].
The probability that bacterium a will not mutate is

Pr(Ac ) = 1 − p, (4.18)

where we have used (4.8) and (4.4). Likewise, the probability that bacterium b
will not mutate is
Pr(B c ) = 1 − p. (4.19)
Since Ac and B c are independent, it follows from (4.5) and (4.19) that

Pr(Ac ∩ B c ) = P (Ac ) · P (B c ) = (1 − p) · (1 − p) = (1 − p)2 .

In other words, the probability that no mutation occurs is (1−p)2 . We therefore


have that
Pr[M = 0] = P (Ac ∩ B c ) = (1 − p)2 . (4.20)
40 CHAPTER 4. STOCHASTIC MODELS

To compute P [M = 1], first observe that

[M = 1] = (A ∩ B c ) ∪ (Ac ∩ B), (4.21)

where the events A ∩ B c and Ac ∩ B are mutually exclusive. It then follows from
(4.7) and (4.21) that

P [M = 1] = P (A ∩ B c ) + P (Ac ∩ B). (4.22)

Next, use the independence of the events A and B c (see Problem 1 in Assignment
#7) to compute
P (A ∩ B c ) = Pr(A) · Pr(B c )

= Pr(A)(1 − Pr(B)) (4.23)

= p(1 − p),
where we have used Definition 4.1.2, Proposition 4.1.7 and (4.4). Similarly,

P (Ac ∩ B) = (1 − p)p. (4.24)

Combining (4.22), (4.23) and (4.24) then yields

Pr[M = 1] = 2(1 − p)p. (4.25)

In order to compute Pr[M = 0], first observe that

[M = 0] = Ac ∩ B c ;

so that, by virtue of the independence of Ac and B c (see Problem 1 in Assign-


ment #7),
Pr[M = 0] = Pr(Ac ) · Pr(B c )
(4.26)
= (1 − p)2 .
where we have used (4.18) and (4.19).

Putting together the results in (4.6), (4.25) and (4.26) we obtain




 (1 − p)2 if k = 0;

2p(1 − p) if k = 1;
Pr[M = k] = (4.27)


 p2 if k = 2;
0 elsewhere.

4.1.2 Discrete Random Variables


The random variable, M , in Example 4.1.1 is an instance of a discrete random
variable.
4.1. RANDOM VARIABLES 41

Definition 4.1.12 (Discrete Random Variable). A random variable, X, which


takes on a finite, or countable set of values, x1 , x2 , x3 , . . ., is said to be discrete.
The set of probability values
pX (xk ) = Pr[X = xk ], for k = 1, 2, 3, . . .
is called the probability mass function of X, or simply the probability distribu-
tion of X. Observe that
X∞
pX (xk ) = 1. (4.28)
k=1

Example 4.1.13 (Continuation of Example 4.1.1.). The expression in (4.27)


is called the probability distribution of the random variable, M , defined in
Example 4.1.1. We therefore have that


 (1 − p)2 if k = 0;

2(1 − p)p if k = 1;
pM (k) = (4.29)


 p2 if k = 2;
0 elsewhere.

Observe that
2
X
pM (k) = (1 − p)2 + 2(1 − p)p + p2
k=0

= [(1 − p) + p]2

= 1,
so that the function defined in (4.29) satisfies the condition (4.28) in the defini-
tion of a the distribution of a random variable (Definition 4.1.12).
Definition 4.1.14 (Bernoulli Trials). A random experiment with two mutually
exclusive outcomes, one called a “success” and the other a “failure,” is called a
Bernoulli trial. We associate a random variable, X, with a Bernoulli as follows:
X = 1 if the outcome is a success, and X = 0 if the outcome is a failure. If the
probability of a success is p, then then distribution of X is

1 − p if k = 0;

pX (k) = p if k = 1; (4.30)

0 elsewhere.

The random variable X is said to have a Bernoulli distribution with parameter


p. We write X ∼ Bernoulli(p).
Example 4.1.15 (Continuation of Example 4.1.1.). The bacterial mutation
situation described in Example 4.1.1 may be modeled by using two Bernoulli
trials, X1 and X2 , with parameter p, where p is the probability that a bacterium
will develop a mutation in the time interval [0, 1]. Event A is then [X1 = 1],
while event B is the event [X2 = 1].
42 CHAPTER 4. STOCHASTIC MODELS

Definition 4.1.16 (Independent Discrete Random Variables). Two discrete


random variables, X and Y , are said to be stochastically independent if and
only if
Pr(X = a, Y = b) = Pr(X = a) · Pr(Y = b), (4.31)
for all values of a and b.
Example 4.1.17 (Continuation of Example 4.1.1.). If we assume that the
Bernoulli random variables, X1 and X2 , postulated in Example 4.1.17 as a
model for the bacterial mutation situation described in Example 4.1.1, are also
stochastically independent, then events A = [X1 = 1] and B = [X2 = 1] are
independent by virtue of (4.31). We also see that the random variable M , the
number of mutations in one unit of time, in the two–bacteria culture, is given
by
M = X1 + X2 .
Thus, the calculations leading to (4.1.13) in Example 4.1.13 show that, if X1
and X2 are independent Bernoulli(p) random variables and Y = X1 + X2 , then
the random variable Y has the distribution function


 (1 − p)2 if k = 0;

2(1 − p)p if k = 1;
pY (k) = (4.32)


 p2 if k = 2;
0 elsewhere.

4.1.3 The Binomial Distribution


Consider now three bacteria, labeled 1, 2 and 3, and ask the question: How
many mutations will there be in a unit of time? As we did in Example 4.1.17,
we may postulate three Bernoulli(p) random variables, X1 , X2 , and X3 , where
p is the mutation rate. Thus, the event [Xi = 1] is the event that bacterium i
will develop a mutation, for i = 1, 2, 3. This time, in addition to assuming that
X1 , X2 and X3 are pairwise independent, we also need to assume that

Pr(X1 = a, X2 = b, X3 = c) = Pr(X1 = a) · Pr(X2 = b) · Pr(X3 = c),

for all values a, b and c. This is the concept of mutual independence.


Definition 4.1.18 (Mutual Independence). Three discrete random variables,
X1 , X2 and X3 , are said to be mutually independent if

Pr(Xi = a, Xj = b) = Pr(Xi = a) · Pr(Xj = b), for i 6= j, (4.33)

for all values of a and b; that is, X1 , X2 and X3 are pairwise stochastically
independent, and

Pr(X1 = a, X2 = b, X3 = c) = Pr(X1 = a) · Pr(X2 = b) · Pr(X3 = c), (4.34)

for all values of a, b and c.


4.1. RANDOM VARIABLES 43

Lemma 4.1.19. Let X1 , X2 and X3 be mutually independent, discrete ran-


dom variables and define Y2 = X1 + X2 . Then, Y2 and X3 are stochastically
independent.

Proof: Compute

Pr(Y2 = w, X3 = z) = Pr(X1 + X2 = w, X3 = z)
X
= Pr(X1 = x, X2 = w − x, X3 = z),
x

where the summation is taken over all possible values of X1 . Thus, using (4.34)
in Definition 4.1.18,
X
Pr(Y2 = w, X3 = z) = Pr(X1 = x) · Pr(X2 = w − x) · Pr(X3 = z)
x
!
X
= Pr(X1 = x) · Pr(X2 = w − x) · Pr(X3 = z)
x

Hence, by pairwise independence, (see (4.33) in Definition 4.1.18),


!
X
Pr(Y2 = w, X3 = z) = Pr(X1 = x, X2 = w − x) · Pr(X3 = z)
x

= Pr(X1 + X2 = w) · Pr(X3 = z)

= Pr(Y2 = w) · Pr(X3 = z),

which shows the independence of Y2 and X3 . 

Example 4.1.20. Suppose X1 , X2 and X3 be three mutually independent


Bernoulli random variables with parameter p, where 0 < p < 1. Define Y3 =
X1 + X2 + X3 . Find the probability distribution for Y3 .
Solution: Observe that Y3 takes on the values 0, 1, 2 and 3, and that

Y3 = Y2 + X3 ,

where the probability distribution for Y2 is given in (4.32).


We compute

Pr(Y3 = 0) = Pr(Y2 = 0, X3 = 0)
= Pr(Y2 = 0) · Pr(X3 = 0), by independence (Lemma 4.1.19),
= (1 − p)2 · (1 − p)
= (1 − p)3 .
44 CHAPTER 4. STOCHASTIC MODELS

Next, since the event (Y3 = 1) consists of the disjoint union of the events
(Y2 = 1, X3 = 0) and (Y2 = 0, X3 = 1),

Pr(Y3 = 1) = Pr(Y2 = 1, X3 = 0) + Pr(Y2 = 0, X3 = 1)


= Pr(Y2 = 1) · Pr(X3 = 0) + Pr(Y2 = 0) · Pr(X3 = 1)
= 2p(1 − p)(1 − p) + (1 − p)2 p
= 3p(1 − p)2 ,

where we have used Lemma 4.1.19 and the definition of the probability distri-
bution of Y2 in (4.32). Similarly,

Pr(Y3 = 2) = Pr(Y2 = 2, X3 = 0) + Pr(Y2 = 1, X3 = 1)


= Pr(Y2 = 2) · Pr(X3 = 0) + Pr(Y2 = 1) · Pr(X3 = 1)
= p2 (1 − p) + 2p(1 − p)p
= 3p2 (1 − p),

and
Pr(Y3 = 3) = Pr(Y2 = 2, X3 = 1)
= Pr(Y2 = 0) · Pr(X3 = 0)
= p2 · p
= p3 .
We then have that the probability distribution of Y3 is given by



 (1 − p)3 if k = 0,
2
3p(1 − p) if k = 1,




2
pY3 (k) = 3p (1 − p) if k = 2, (4.35)

p3 if k = 3,





0 elsewhere.

If we go through the calculations in Examples 4.1.11 and 4.1.20 for the case of
four mutually independent2 Bernoulli trials with parameter p, where 0 < p < 1,
X1 , X2 , X3 and X4 , we obtain that for Y4 = X1 + X2 + X3 + X4 ,


 (1 − p)4 if k = 0,

4p(1 − p)3 if k = 1,




6p2 (1 − p)2

if k = 2,
pY4 (k) = (4.36)


 4p3 (1 − p) if k = 3,
p 4 if y = 4,




0 elsewhere.

2 Here, not only do we require that the random variable be pairwise independent, but also

that for any group of k ≥ 2 events (Xj = xj ), the probability of their intersection is the
product of their probabilities.
4.1. RANDOM VARIABLES 45

Observe that the terms in the expressions for pY2 (y), pY3 (y) and pY4 (y) in (4.32),
(4.35) and (4.36), respectively, are the terms in the expansion of [(1 − p) + p]n
for n = 2, 3 and 4, respectively. By the Binomial Expansion Theorem,
n  
n
X n k n−k
(a + b) = a b , for a, b ∈ R, n ∈ N, (4.37)
k
k=0

where  
n n!
= , k = 0, 1, 2 . . . , n, (4.38)
k k!(n − k)!
are the called the binomial coefficients, we obtain that
n  
X n k
[(1 − p) + p]n = p (1 − p)n−k .
k
k=0

This suggests that if


Yn = X1 + X2 + · · · + Xn ,
where X1 , X2 , . . . , Xn are n mutually independent Bernoulli trials with param-
eter p, for 0 < p < 1, then
 
n k
pYn (k) = p (1 − p)n−k for k = 0, 1, 2, . . . , n,
k
and pYn (k) = 0 elsewhere. In fact, this statement is true, and will be proved as
the following theorem. We shall establish this as a the following theorem:
Theorem 4.1.21. Assume that X1 , X2 , . . . , Xn are mutually independent Bernoulli
trials with parameter p, where 0 < p < 1. Define

Yn = X1 + X2 + · · · + Xn .

Then the probability distribution of Yn is


 
n k

 p (1 − p)n−k for k = 0, 1, 2, . . . , n;
 k
pYn (k) = (4.39)



0 elsewhere.

Proof: We prove this result by induction on n.


For n = 1 we have that Y1 = X1 , and therefore

pY1 (0) = Pr(X1 = 0) = 1 − p

and
pY1 (1) = Pr(X1 = 1) = p.
Thus, (
1−p if k = 0,
pY1 (k) =
p if k = 1.
46 CHAPTER 4. STOCHASTIC MODELS
   
1 1
Observe that = = 1 and therefore the result in (4.39) holds true for
0 1
n = 1.
Next, assume the theorem is true for n; that is, suppose that
 
n k
pYn (k) = p (1 − p)n−k for k = 0, 1, 2, . . . , n, (4.40)
k
and pYn (k) = 0 elsewhere. We show then show that the result also holds true
for n + 1. In other words, we show that if X1 , X2 , . . . , Xn , Xn+1 are mutually
independent Bernoulli trials with parameter p, with 0 < p < 1, and

Yn+1 = X1 + X2 + · · · + Xn + Xn+1 , (4.41)

then, the pmf of Yn+1 is


 
n+1 k
pYn+1 (k) = p (1 − p)n+1−k for k = 0, 1, 2, . . . , n, n + 1, (4.42)
k
and pYn+1 (k) = 0 elsewhere.
From (4.41) we see that

Yn+1 = Yn + Xn+1 ,

where Yn and Xn+1 are independent random variables, by an argument similar


to the one in the proof of Lemma 4.1.19 since the Xk ’s are mutually independent.
Therefore, the following calculations are justified:
(i) for k 6 n, or k < n + 1,

Pr(Yn+1 = k) = Pr(Yn = k, Xn+1 = 0) + Pr(Yn = k − 1, Xn+1 = 1)

= Pr(Yn = k) · Pr(Xn+1 = 0)
+ Pr(Yn = k − 1) · Pr(Xn−1 = 1)
 
n k
= p (1 − p)n−k (1 − p)
k  
n
+ pk−1 (1 − p)n−k+1 p,
k−1
where we have used the inductive hypothesis (4.40). Thus,
   
n n
Pr(Yn+1 = k) = + pk (1 − p)n+1−k . (4.43)
k k−1
The expression in (4.42) will following from (4.43) the fact that
     
n n n+1
+ = , (4.44)
k k−1 k
which can be established by the following counting argument:
4.1. RANDOM VARIABLES 47

Imagine n + 1 balls in a bag, n of which are blue and one is


red. We consider the collection of all groups of k balls that can
be formed out of the n + 1 balls in the bag. This collection is
made up of two disjoint sub–collections: the ones with the red
ball and the ones without the red ball. The number of elements
in the collection with the one red ball is
     
n 1 n
· = , (4.45)
k−1 1 k−1
while the number of elements in the collection of groups without
the red ball are  
n
. (4.46)
k
 
n+1
Adding the amounts in (4.45) and (4.46) must yield ,
k
which proves (4.44).
Thus, combining (4.43) and (4.44), we obtain (4.42) for the case k < n + 1.
(ii) If k = n + 1, then, using again the independence of Yn and Xn+1 ,
Pr(Yn+1 = k) = Pr(Yn = n, Xn+1 = 1)

= Pr(Yn = n) · Pr(Xn+1 = 1)

= pn p

= pn+1
 
n+1 k
= p (1 − p)n+1−k ,
k
since k = n + 1, and so (4.42) is established for k = n + 1.
The proof is now complete. 
Definition 4.1.22 (Binomial Distribution). A discrete random variable, Y ,
which counts the number of successes in n independent Bernoulli(p) trials, and
having the distribution
n!

k n−k
 k!(n − k)! p (1 − p) for k = 0, 1, 2, . . . , n,


pY (k) = (4.47)



0 elsewhere.
is called a binomial random variable with parameters n and p, where p is the
provability of each success. We write
Y ∼ Binomial(n, p).
48 CHAPTER 4. STOCHASTIC MODELS

Remark 4.1.23. Theorem 4.1.21 shows that a Binomial(n, p) distribution is


the sum of n independent Bernoulli(p) trials.

Example 4.1.24 (Bacterial Mutations). Consider a culture containing N bac-


teria. Let M denote the number of mutations in the culture that develop in the
culture in a unit of time. Then, assuming a mutation rate of p (the probability
that a given bacterium with develop a mutation in a unit of time), M can be
modeled by a binomial random variable with parameters N and p; that is, M
has a probability distribution given by (4.47), where n = N :
 
N k

 p (1 − p)n−k for k = 0, 1, 2, . . . , N ;
 k
pM (k) = (4.48)



0 elsewhere.

4.1.4 Expected Value


Definition 4.1.25 (Expected Value of a Discrete Random Variable). Given a
discrete random variable, X, with values x1 , x2 , . . . , xn , and distribution pX ,
the weighted average of the values of X,

x1 pX (x1 ) + x2 pX (x2 ) + · · · + xn pX (xn ),

is called the expected value of X and is denoted by E(X). We then have that
n
X
E(X) = xk pX (xk ) (4.49)
k=1

Example 4.1.26 (Expected Value of a Bernoulli Trial). Let X denote a Bernoulli


random variable with parameter p. Then, the expected value of X is

E(X) = 0 · pX (0) + 1 · pX (1) = p. (4.50)

Example 4.1.27 (Expected Value of a Binomial Random Variable). Let

X1 , X2 , · · · , Xn

denote n independent Bernoulli trials with parameter p, and put Y = X1 +X2 +


· · · + Xn . Then, using the result of Problem 1 in Assignment #8, we have that

E(Y ) = E(X1 ) + E(X2 ) + · · · + E(Xn ) = np,

where we have used (4.50). Thus, the expected value of a binomial random
variable, Y , with parameters n and p is

E(Y ) = np. (4.51)


4.1. RANDOM VARIABLES 49

Alternatively, we could have used the definition of pY in (4.47) and the


formula for the expected value in (4.49) to get
n
X n!
E(Y ) = pk (1 − p)n−k (4.52)
(k − 1)!(n − k)!
k=1

Next, make the change of variables m = k − 1 in (4.52) to get


n−1
X n(n − 1)!
E(Y ) = pm+1 (1 − p)n−1−m
m=0
m!(n − 1 − m)!
(4.53)
n−1  
X n−1
= np · pm (1 − p)n−1−m
m=0
m

Thus, applying the formula for the binomial theorem in (4.37) for n − 1 in place
of n and a = p, b = 1 − p, we obtain from the result in (4.53) that

E(Y ) = np · (p + 1 − p)n−1 ,

which yields (4.51).

4.1.5 The Poisson Distribution


We have seen that, if Y ∼ Binomial(n, p), then Yn has a probability distribution
given by

n!

k n−k
 k!(n − k)! p (1 − p) for k = 0, 1, 2, . . . , n,


pYn (k) = (4.54)



0 elsewhere,

and the expected value of Yn is given by

E(Yn ) = np, for all n. (4.55)

In this section we explore what happens to the distribution of Yn as n → ∞,


while E(Yn ) is kept at a constant value, λ. In other words, we would like explore
the limit
lim pYn (k) while np = λ, for all n, (4.56)
n→∞

where λ is a constant.
Note that from
np = λ,
we get that
λ
p= . (4.57)
n
50 CHAPTER 4. STOCHASTIC MODELS

It follows from (4.57) that

p→0 as n → ∞,

so that the limiting setting described in (4.56) is relevant in modeling situations


in which there are large number of independent trials with a very small proba-
bility of success. This is precisely the situation of a bacterial culture of size N
in the order of millions, and mutation rates typically of the order of 10−8 . Thus,
the limiting distribution in (4.56), if it exists, can be used too approximate the
distribution of mutations in a large colony of bacteria when the mutation rate
is very small.
Fix k in (4.54). Then, for n > k we may write
n−k
λk

n! λ
pYn (k) = 1− , (4.58)
k!(n − k)! nk n

λ
where we have used (4.57) to replace p by . Next, rewrite (4.58) as
n
−k  n
λk n(n − 1)(n − 2) · · · (n − k + 1)

λ λ
pYn (k) = 1 − 1 − ,
k! nk n n

which can in turn be rewritten as


    
1 2 k−1
1− 1− ··· 1 − n
λk

n n n λ
pYn (k) = k 1− . (4.59)
k! n

λ
1−
n

Now, since k and λ are fixed,


    
1 2 k−1
1− 1− ··· 1 −
n n n
lim k = 1. (4.60)
n→∞

λ
1−
n

Next, use the well–known limit


 x n
lim 1 + = ex , for all x ∈ R, (4.61)
n→∞ n
to obtain that  n
λ
lim 1 − = e−λ . (4.62)
n→∞ n
Thus, using the limits in (4.62) and (4.60), we obtain from (4.59) that

λk −λ
lim pYn (k) = e . (4.63)
n→∞ k!
4.1. RANDOM VARIABLES 51

The limit expression in (4.63) shows that the sequence of random variables
 
λ
Yn ∼ Binomial n, , for n = 1, 2, 3, . . . ,
n
has a limiting distribution given by
λk −λ
pY (k) = e , for k = 0, 1, 2, 3, . . . (4.64)
k!
To see that the expression in (4.64) does indeed define a probability distribution
observe that
∞ ∞
X X λk −λ
pY (k) = e
k!
k=0 k=0
(4.65)
∞ k
X λ
= e−λ .
k!
k=0

Thus, using the well known series expansion result



X xk
= ex , for all x ∈ R, (4.66)
k!
k=0

we obtain from (4.65) that



X
pY (k) = e−λ eλ = 1.
k=0

Note that the expressions in (4.61) and (4.66) are well known realizations of
the exponential function x 7→ ex .
Definition 4.1.28 (Poisson Distribution). A discrete random variable, Y , which
can take on the values k = 0, 1, 2, . . ., is said to have a Poisson distribution with
parameter λ, if  k
 λ e−λ for k = 0, 1, 2, . . . ;

k!

pY (k) = (4.67)


0 elsewhere.

We write
Y ∼ Poisson(λ).
Example 4.1.29 (Expected Value of the Poisson Distribution). Let ∼ Poisson(λ).
Compute E(Y ).
Solution: Since Y takes on a countable number of values, the expected
value of Y is given by the series

X
E(Y ) = mpY (m), (4.68)
m=0
52 CHAPTER 4. STOCHASTIC MODELS

where pY is given in (4.67). Thus, noting that the first term in the series in
(4.68) is zero, we obtain from (4.68) and (4.67) that

X λm −λ
E(Y ) = m· e
m=1
m!
(4.69)

Xλm
= e−λ .
m=1
(m − 1)!

Next, make the change of variables k = m − 1 in (4.69) to obtain



X λk+1
E(Y ) = e−λ
k!
k=0


X λk · λ
= e−λ ,
k!
k=0

so that

X λk
E(Y ) = λ e−λ . (4.70)
k!
k=0

Finally, use the series expansion for ex in (4.66) to obtain from (4.70) that

E(Y ) = λe−λ eλ = λ.


Thus, we have shown that

Y ∼ Poisson(λ) ⇒ E(Y ) = λ; (4.71)

in other words, the expected value of a Poisson random variable with parameter
λ is λ. In Assignment #9 you’ll be asked to show that the variance, Var(Y ) of
of Y ∼ Poisson(λ) is also λ; where,

Var(Y ) = E(Y 2 ) − [E(Y )]2 .

4.1.6 Estimating Mutation Rates in Bacterial Populations


In the early 1940s, Luria and Delbrück [LD43] devised the following procedure
(known as the fluctuation test) to estimate the mutation rate, p, for certain
bacteria:
Imagine that you start with a single normal bacterium (with no mutations)
and allow it to grow to produce several bacteria. Place each of these bacteria
in test–tubes each with a medium conducive to growth. Suppose the bacteria
in the test–tubes are allowed to reproduce for n division cycles. After the
nth division cycle, the content of each test–tube is placed onto a agar plate
4.1. RANDOM VARIABLES 53

containing a virus population which is lethal to the bacteria which have not
developed resistance. Those bacteria which have mutated into resistant strains
will continue to replicate, while those that are sensitive to the virus will die.
After certain time, the resistant bacteria will develop visible colonies on the
plates. The number of these colonies will then correspond to the number of
resistant cells in each test tube at the time they were exposed to the virus. This
number corresponds to the number of bacteria in the colony that developed a
mutation which led to resistance. We denote this number by YN , where N is
the size of the colony after the nth division cycle. Assuming that the bacteria
may develop mutation to resistance after exposure to the virus, if N is very
large, according to the result in Section 4.1.5, the distribution of YN can be
approximated by a Poisson distribution with parameter λ = pN , where p is the
mutation rate and N is the size of the colony. It then follows that the probability
of no mutations occurring in one division cycle is
Pr(YN = 0) ≈ e−λ , (4.72)
according to (4.67). This probability can also be estimated experimentally as
Luria and nd Delbrück showed in their 1943 paper. In one of the experiments
described in that paper, out of 87 cultures of 2.4 × 108 bacteria, 29 showed not
resistant bacteria (i.e., none of the bacteria in the culture mutated to resistance
and therefore all perished after exposure to the virus). We therefore have that
29
Pr(YN = 0) ≈ .
87
Comparing this to the expression in Equation (4.72), we obtain that
29
e−λ ≈ ,
87
which can be solved for λ to obtain
 
29
λ ≈ − ln
87
or
λ ≈ 1.12.
The mutation rate, p, can then be estimated from λ = pN :
λ 1.12
p= ≈ ≈ 4.7 × 10−9 .
N 2.4 × 108

4.1.7 Another Brief Excursion into Probability


We have seen that, if A and B are independent events, then
Pr(A ∩ B) = Pr(A) · Pr(B); (4.73)
in other words, the probability of the joint occurrence of two independent events
is the product of there probabilities. In many situations, however, the occurrence
of an event will have an effect on the probability of the occurrence o the other
event. Here is a simple example that illustrates how this can happen.
54 CHAPTER 4. STOCHASTIC MODELS

Example 4.1.30. Put four marbles in a bag. Two of the marbles are red and
the other two are blue. Pick two marbles at random and without replacement.
Labeling the marbles R1 , R2 , B1 and B2 , for the two red marbles and the two
blue marbles, respectively, we see that the sample space for the experiment (the
set of all possible outcomes of the experiment) can be written as

C = {R1 R2 , R1 B1 , R1 B2 , R2 B1 , R2 B2 , B1 B2 }.

The assumption of randomness in the picking of the two marbles implies that
each of the outcomes in C is equally likely; so that
1
Pr(c) = , for all c ∈ C. (4.74)
6
Let A denote the event that at least one of the marbles is red and B denote
the event that the two marbles have the same color; then,

A = {R1 R2 , R1 B1 , R1 B2 , R2 B1 , R2 B2 },

so that, in view of (4.74),


5
Pr(A) = . (4.75)
6
Similarly, since
B = {R1 R2 , B1 B2 },
it follows from (4.74) that Then,
2 1
Pr(B) = = . (4.76)
6 3
On the other hand, since
A ∩ B = {R1 R2 },
that is, the joint event A ∩ B is the event that both marbles are red,
1
Pr(A ∩ B) = . (4.77)
6
It follows from (4.75) and (4.76) that
5 1 5
Pr(A) · Pr(B) = · = . (4.78)
6 3 18
Comparing (4.77) and (4.78) we see that (4.73) does not hold true in this ex-
ample. Thus, A and B are not stochastically independent.

Definition 4.1.31 (Conditional Probability). Given two events, A and B, with


Pr(B) > 0, we define the conditional probability of A given B, denoted Pr(A |
B), by
Pr(A ∩ B)
Pr(A | B) = . (4.79)
Pr(B)
4.1. RANDOM VARIABLES 55

Similarly, if Pr(A) > 0, he conditional probability of B given A, denoted Pr(B |


A), by
Pr(A ∩ B)
Pr(B | A) = . (4.80)
Pr(A)
Remark 4.1.32. It follows from (4.80) that, if Pr(A) > 0, then

Pr(A ∩ B) = Pr(A) · Pr(B | A); (4.81)

thus, by introducing the concept of conditional probability, in some sense we


recover (4.73), which, as seen in Example 4.1.30, does not hold in general.
Observe that (4.81) holds true for all events A and B such that Pr(A) > 0.
Proposition 4.1.33. Assume that Pr(A) > 0 and Pr(B) > 0. Then, events A
and B are independent if and only if

Pr(A | B) = Pr(A), and Pr(B | A) = Pr(B). (4.82)

Proof: Assume that A and B are independent. Then, (4.73) holds true. Since
Pr(B) > 0, it follows from (4.73) and (4.79) that
Pr(A ∩ B) Pr(A) · Pr(B)
Pr(A | B) = = = Pr(A).
Pr(B) Pr(B)
Similarly, since Pr(A) > 0, it follows from (4.73) and (4.80) that

Pr(B | A) = Pr(B).

Thus, (4.82) holds true if A and B are independent.


Conversely, suppose that (4.82) holds true. It then follows from (4.81) that

Pr(A ∩ B) = Pr(A) · Pr(B | A) = Pr(A) · Pr(B),

which shows that A and B are independent. 


Proposition 4.1.34 (Properties of Conditional Probabilities). Let A, B, E1 , E2 , . . . , En
denote subsets of a sample space C on which a probability function, Pr, is de-
fined.
1. If Pr(B) > 0, then Pr(Ac | B) = 1 − Pr(A | B).
2. (Law of Total Probability). Suppose E1 , E2 , E3 , . . . , En are mutually ex-
clusive such that P (Ei ) > 0 for i = 1, 2, 3...., n, and
n
[
C= Ek .
k=1

Then,
n
X
P (B) = P (Ek ) · P (B | Ek ) (4.83)
k=1
56 CHAPTER 4. STOCHASTIC MODELS

4.1.8 Continuous Random Variables


We begin with an example regarding modeling the time, T , that people spend at
a check–out counter in a supermarket, for instance. In this case, T is a random
variable that can take on a continuum of values; that is, any real value within
an interval of real numbers; more specifically, in this case, T could take on any
value in the interval (0, ∞). Thus, T is an example of a continuous random
variable.
For a continuous random variable, T , we are interested in its probability
density function, fT , or pdf. Once the pdf is known, we can compute the
probability that T will take on a certain range of values by integration; for
example,
Z b
Pr(a < T 6 b) = fT (t) dt. (4.84)
a
It the next example, we model the service time T and show how to compute fT
by first computing the cumulative distribution function, FT , defined by

FT (t) = Pr(T 6 t), for all t ∈ R. (4.85)

It follows from (4.84) and (4.85) that


Z t
FT (t) = fT (τ ) dτ, for all t ∈ R,
−∞

so that, by the Fundamental Theorem of Calculus,

fT (t) = FT0 (t), (4.86)

for values of t at which fT is continuous.


Example 4.1.35 (Service time at a checkout counter). Suppose you sit by a
checkout counter at a supermarket and measure the time, T , it takes for each
customer to be served. This is a continuous random variable that takes on values
in a time continuum. We would like to compute the cumulative distribution
function given by (4.85), FT (t) = Pr(T 6 t), for t > 0.
Let N (t) denote the number of customers being served at a checkout counter
(not in line) at time t, and assume that either N (t) = 1 or N (t) = 0; in other
words, N (t) is a Bernoulli random variable for each t. Here, we are also assuming
that, once service is completed, no new costumer will walk up to the checkout
counter.
Set
p(t) = Pr[N (t) = 1], for t > 0, (4.87)
and assume that p(0) = 1; that is, at the start of the observation, one person is
being served.
We consider what happens to the probability p(t) for t in a short time interval
[t, t + ∆t]. We would like to estimate p(t + ∆t), where ∆t is very small; i.e., the
probability that a person is being served at time t + ∆t.
4.1. RANDOM VARIABLES 57

We assume that the the probability that service will be completed in the
short time interval [t, t + ∆t] is proportional to ∆t; say µ∆t, where µ > 0
is a proportionality constant. Then, the probability that service will not be
completed at t + ∆t is 1 − µ∆t. This situation is illustrated in the state diagram
pictured in Figure 4.1.1: The circles in the state diagram represent the possible

1 − µ∆t

 ?
# #

1 0
"!1"!
µ∆t

Figure 4.1.1: State diagram for N (t)

values of N (t), or states. In this case, the states are 1 or 0, corresponding to


one person being served and no person being served, respectively. The arrows
represent transition probabilities from one state to another (or the same) in the
interval from t to t + ∆t. Thus the probability of going from sate N (t) = 1 to
state N (t) = 0 in that interval (that is, service is completed) is approximately
µ∆t, while the probability that the person will still be at the counter at the end
of the interval is 1 − µ∆t.
By the law of total probability in (4.83)

Pr(N (t + ∆t) = 1) = Pr(N (t) = 1) · Pr(N (t + ∆t) = 1 | N (t) = 1)

+Pr(N (t) = 0) · Pr(N (t + ∆t) = 1 | N (t) = 0),

where Pr(N (t + ∆t) = 1 | N (t) = 1) is the probability that service is not


completed in the time interval [t, t + ∆t]; so that

Pr(N (t + ∆t) = 1 | N (t) = 1) ≈ 1 − µ∆t,

by the previous consideration. We can therefore write

p(t + ∆t) ≈ p(t)(1 − µ∆t)


(4.88)
+(1 − p(t)) · Pr(N (t + ∆t) = 1 | N (t) = 0),

Since we are also assuming that

Pr(N (t + ∆t) = 1 | N (t) = 0) = 0,

for ∆t small enough (see also the diagram in Figure 4.1.1), we therefore get
from (4.88) that
p(t + ∆t) ≈ p(t)(1 − µ∆t),
58 CHAPTER 4. STOCHASTIC MODELS

or
p(t + ∆t) − p(t) ≈ −µ∆t. (4.89)
Dividing both sides of (4.89) by ∆t 6= 0 and taking the limit as ∆t → 0 we
obtain that
p(t + ∆t) − p(t)
lim = −µp(t) (4.90)
∆t→0 ∆t
It follows from (4.90) that p is differentiable and satisfies the differential equa-
tion.
dp
= −µp(t). (4.91)
dt
The first order differential equation in (4.91) can be solved subject to the initial
condition p(0) = 1 to yield

p(t) = e−µt , for t > 0. (4.92)

Recall that T denotes the time it takes for service to be completed, or the
service time at the checkout counter. Thus, it is the case that

Pr[T > t] = Pr[N (t) = 1], for t > 0; (4.93)

that is, T > t if and only if at time t the person is still at the checkout counter.
It follows from (4.93) and (4.87) that

Pr[T > t] = p(t), for t > 0,

which can be written as

Pr[T > t] = e−µt , for t > 0,

in view of (4.92). We then get that

Pr[T 6 t] = 1 − Pr[T > t] = 1 − e−µt , for t > 0,

so that the cumulative distribution function (cdf) of T is


(
1 − e−µt , for t > 0;
FT (t) = (4.94)
0 for t < 0.

A portion of the graph of this cumulative distribution function is shown in Figure


4.1.2. It follows from (4.86) and (4.94) that the probability density function for
the service time, T , is given by
(
µe−µt , for t > 0;
fT (t) = (4.95)
0 for t < 0.

To see that the function in (4.95) indeed defines a probability density func-
tion, compute
Z ∞ Z b
fT (t) dt = lim µe−µt dt, (4.96)
−∞ b→∞ 0
4.1. RANDOM VARIABLES 59

1.0

0.75

0.5

0.25

0.0

0 1 2 3 4 5
x

Figure 4.1.2: Cumulative Distribution Function of T for t > 0

where Z b  −µt b
µe−µt = −e 0
= 1 − e−µb (4.97)
0

It follows from (4.97) and (4.96) that


Z ∞
fT (t) dt = lim [1 − e−µb ] = 1, (4.98)
−∞ b→∞

since µ > 0.

Definition 4.1.36 (Probability Density Function). An integrable function,


f : R → R, is said to be a probability density function if

(i) f (x) > 0 for all x ∈ R, and


Z ∞
(ii) f (x) dx = 1.
−∞

Remark 4.1.37. It follows from (4.98) in Example 4.1.35 that the function
fT defined in (4.95) is a probability density function. We say that fT is the
probability density function of the random variable T .

Definition 4.1.38 (Cumulative Distribution Function). Let X be a random


variable with probability density function fX . Then, for any real numbers, a
and b, with a < b;
Z b
Pr[a < X 6 b] = fX (x) dx.
a

The function, FX : R → R, defined by


Z x
FX (x) = Pr[X 6 x] = fX (t) dt, for all x ∈ R,
−∞

is called the cumulative distribution function of X.


60 CHAPTER 4. STOCHASTIC MODELS

Remark 4.1.39. The function FT defined in (4.94) in Example 4.1.35 is the


cumulative distribution function of the service time T .
Definition 4.1.40 (Expected Value of a Continuous Random Variable). Let
X be a continuous random variable with probability density function fX . If
Z ∞
|x|fX (x) dx < ∞,
−∞

we define the expected value of X, denoted E(X), by


Z ∞
E(X) = xfX (x) dx.
−∞

Example 4.1.41 (Average Service Time). In the service time example, Exam-
ple 4.1.35, we showed that the time, T , that it takes for service to be completed
at a checkout counter has an exponential distribution with probability density
function given in (4.95),
(
µe−µt for t > 0,
fT (t) = (4.99)
0 otherwise,

where µ is a positive parameter. Note that for the expression in (4.99) to make
sense, the parameter µ has to have units of 1/time.
Observe that
Z ∞ Z ∞
|t|fT (t) dt = tµe−µt dt
−∞ 0

Z b
= lim t µe−µt dt
b→∞ 0

 b
−µt 1 −µt
= lim −te − e
b→∞ µ 0
 
1 1
= lim − be−µb − e−µb
b→∞ µ µ

1
= ,
µ
where we have used integration by parts and L’Hospital’s rule. It then follows
that Z ∞
1
|t|fT (t) dt = < ∞
−∞ µ
and therefore the expected value of T exists and
Z ∞ Z ∞
1
E(T ) = tfT (t) dt = tµe−µt dt = .
−∞ 0 µ
4.2. RANDOM PROCESSES 61

Thus, the parameter µ is the reciprocal of the expected service time, or average
service time, at the checkout counter.
Example 4.1.42. Suppose the average service time, or mean service time,
at a checkout counter is 5 minutes. Compute the probability that a given person
will spend at least 6 minutes at the checkout counter.
Solution: By the result of Example 4.1.35, we assume that the service time,
T , has a probability density function given in (4.95) with µ = 1/5. We then
have that
Z ∞ Z ∞
1 −t/5
Pr(T > 6) = fT (t) dt = e dt = e−6/5 ≈ 0.30.
6 6 5
Thus, there is a 30% chance that a person will spend 6 minutes or more at the
checkout counter. 
Definition 4.1.43 (Exponential Distribution). A continuous random variable,
X, is said to be exponentially distributed with parameter β > 0, written
X ∼ Exponential(β),
if it has a probability density function given by
1 −x/β

β e for x > 0,


fX (x) =


0 otherwise.

The expected value of X ∼ Exponential(β), for β > 0, is E(X) = β.

4.2 Random Processes


In this section we come back to the problem of determining the distribution of
the number of mutations of a certain type that occur in a bacterial colony. We
analyzed this problem in Section 4.1 by considering the limit of Binomial(N, p)
distributions when N is very large while N p = λ is a constant. This approach
led to a Poisson(λ) distribution. Here we take into account the fact that the
bacterial population size is changing with time, t. Accordingly, we define M (t)
to be the number of mutations of a certain type that occur in the time interval
[0, t], for t > 0. Note that, for each t, M (t) is a random variable that can take
on any of the values
0, 1, 2, 3, . . . .
We are interested in computing the probability that M (t) attains each of those
values for each time t. In symbols, we would like to compute
Pr[M (t) = m] for m = 0, 1, 3, . . . and t > 0.
We shall denote Pr[M (t) = m] by Pm (t).
We would like to compute Pm (t), for each m = 1, 2, 3, . . . and t > 0, under
the following assumptions:
62 CHAPTER 4. STOCHASTIC MODELS

(i) P0 (0) = Pr[M (0) = 0] = 1; that is, initially no bacterium has mutated
into a strain of the particular type under study. It then follows that

Pm (0) = 0, for m ≥ 1 (4.100)

(ii) The probability that any bacterium develops a mutation in a short time
interval [t, t + ∆t] depends only on ∆t and not on the number of mutant
bacteria at previous times.
(iii) The probability of a new mutation in the short interval [t, t + ∆t] is pro-
portional to ∆t; in symbols

Pr(new mutation in [t, t + ∆t]) ≈ λ∆t,

where λ > 0 is a constant of proportionality.


(iv) ∆t is so small that the probability of two or more mutations occurring in
the short time interval [t, t + ∆t] is zero.

In order to determine Pm (t) for each m = 1, 2, 3, . . . and t > 0, first we need


to estimate Pm (t+∆t) for ∆t small enough. Thus, we need to model the process
of going from time t to the time t + ∆t. As in Example 4.1.35, examination of a
state diagram for M (t) can be used to aid us in this process. The state diagram
is pictured in Figure 4.1.1. Each of the circles in the state diagram represents
the number of mutations at any given stage.

1 − λ∆t 1 − λ∆t 1 − λ∆t 1 − λ∆t 1 − λ∆t


    
 ?  ?  ?  ?  ?
0j - 1j - 2j - 3j - 4j - . . .
λ∆t λ∆t λ∆t λ∆t λ∆t

Figure 4.2.3: State diagram for M (t)

The arrows in Figure 4.2.3 indicate the transition probabilities of going from
one state to the next, or those of remaining in the same state, in a short time
interval [t, t + ∆t]. For instance, if at time t there are no mutants in the colony
(i.e., the system is in state 0 at that time), then at time t + ∆t there might a
bacterium that has developed a mutation. The system would then go from state
0 to state 1 in the time interval [t, t + ∆t]; the probability of this occurrence
is approximately λ∆t by assumption (iii); this is indicated by the arrow in the
diagram that goes from state 0 to state 1. On the other hand, there might
not be a new mutation in the time interval [t, t + ∆t]; the probability of this
occurring is approximately 1 − λ∆t (why?), and this is shown by the arrow that
starts at state 0 and which winds back again to 0. Observe that assumption (iv)
is implicit in the state diagram in Figure 4.2.3 since the states can only increase
by 1 and not by 2 or more; thus, arrows from a given state either return to that
state or go to the next one.
4.2. RANDOM PROCESSES 63

The state diagram in Figure 4.2.3 can be used to estimate Pm (t + ∆t), given
that we know Pm (t), for very small values of ∆t. We start out with the case
m = 0 as follows:

P0 (t + ∆t) = P0 (t) · Pr(no new mutations in [t, t + ∆t] | M (t) = 0).

Using assumption (ii) we have

P0 (t + ∆t) = P0 (t) · P (no new mutations in [t, t + ∆t]), (4.101)

since

Pr(no new mutations in [t, t+∆t] | M (t) = 0) = Pr(no new mutations in [t, t+∆t])

by stochastic independence. It then follows from (4.101) and assumption (iii)


that
P0 (t + ∆t) ≈ P0 (t) · (1 − λ∆t),
or
P0 (t + ∆t) ≈ P0 (t) − λ∆tP0 (t). (4.102)
Rearranging equation (4.102) and dividing by ∆t 6 0, we obtain

P0 (t + ∆t) − P0 (t)
≈ −λP0 (t). (4.103)
∆t
Next, let ∆t → 0 in (4.103) to conclude that P0 (t) is differentiable and

dP0
= −λP0 ; (4.104)
dt
that is, P0 (t) satisfies a first order differential equation. The differential equation
in (4.104) can be solved by separation of variables to yield

P0 (t) = Ce−λt , (4.105)

for some constant C. Since P0 (0) = 1 by assumption (i), it follows that C = 1


in (4.105), and so the probability of no mutations in the colony at time t is given
by
P0 (t) = e−λt , for t > 0. (4.106)
We next proceed to compute P1 (t). Using the state diagram in Figure 4.2.3
we obtain that

P1 (t + ∆t) ≈ P0 (t) · λ∆t + P1 (t) · (1 − λ∆t), (4.107)

since, according to the state diagram in Figure 4.2.3, the system can get to
state 1 at t + ∆t via two routes: (i) from state 0 through a new mutation
which occurs with probability λ∆t, approximately, or (ii) from state 1 if no new
mutation occurs in the time interval [t, t + ∆t], and the approximate probability
64 CHAPTER 4. STOCHASTIC MODELS

of this occurrence is 1 −λ∆t. Here we have also used the law of total probability
in (4.83) and the independence assumption (ii).
Rearranging equation (4.107) and dividing by ∆t 6= 0, we obtain
P1 (t + ∆t) − P1 (t)
≈ −λP1 (t) + λP0 (t). (4.108)
∆t
Next, let ∆t → 0 in (4.108) to conclude that P1 is differentiable and satisfies
the differential equation
dP1
= −λP1 + λP0 (t)
dt
or, using (4.106),
dP1
= −λP1 + λe−λt . (4.109)
dt
The differential equation (4.109) can be solved as follows: Rewrite the equa-
tion as
dP1
+ λP1 = λe−λt (4.110)
dt
and multiply both sides of (4.110) by eλt to get
dP1
eλt + λeλt P1 = λ (4.111)
dt
Observe that, by the Product Rule,
d λt dP1
(e P1 ) = eλt + λeλt P1 ,
dt dt
and so the differential equation in (4.111) reduces to
d λt
(e P1 ) = λ. (4.112)
dt
The equation in (4.112) can be integrated to yield

eλt P1 = λt + C,

for some arbitrary constant C, and therefore

P1 (t) = λt e−λt + Ce−λt (4.113)

for t > 0.
Next, use the initial condition P1 (0) = 0 in (4.100), which follows from
assumption (i), to get that C = 0, and therefore

P1 (t) = λt e−λt , for t > 0 (4.114)

In order to compute P2 (t), we proceed in a way similar to that used to


compute P1 (t). From the state diagram in Figure (4.2.3) we get that

P2 (t + ∆t) = P1 (t) · λ∆t + P2 (t) · (1 − λ∆t),


4.2. RANDOM PROCESSES 65

from which we are led to the differential equation

dP2
= −λP2 + λP1 (t)
dt
or, using (4.114),
dP2
= −λP2 + λ2 te−λt . (4.115)
dt
We can solve this differential equation as we solved (4.115), by first rearranging
and multiplying by eλt to get

dP2
eλt + λeλt P2 = λ2 t, (4.116)
dt
and then re–writing the left–hand side of (4.116), so that

d λt
(e P2 ) = λ2 t. (4.117)
dt
Next, integrate the equation in (4.117) and use the initial condition P2 (0) = 0
in (4.100) to get
(λt)2 −λt
P2 (t) = e , for t > 0. (4.118)
2
One can go through the same procedure leading to (4.118) to obtain the
formula
(λt)3 −λt
P3 (t) = e
3!
for P3 (t), and this suggests the general formula for Pm (t), m = 0, 1, 2, . . ., to be

(λt)m −λt
Pm (t) = e , for t > 0. (4.119)
m!
We will establish the formula in (4.119) by induction on m. Observe that we
have already established the basic case m = 0 in (4.106). Next, for the inductive
step, assume that the formula (4.119) holds for m, and we seek to show that it
also holds for m + 1. Using the state diagram 4.2.3 we see that

Pm+1 (t + ∆t) ≈ Pm (t) · λ∆t + Pm+1 (t) · (1 − λ∆t),

from which we are led to the differential equation

d
(Pm+1 ) = −λPm+1 + λPm (t)
dt
or, using the inductive hypothesis (4.119),

d λm+1 tm −λt
(Pm+1 ) = −λPm+1 + e . (4.120)
dt m!
66 CHAPTER 4. STOCHASTIC MODELS

We can solve the differential equation in (4.120) as we solved (4.115); that is,
first rearrange the equation and multiply by eλt to get
d λm+1 tm
eλt (Pm+1 ) + λeλt Pm+1 = ; (4.121)
dt m!
then, re–write the left–hand side of the equation in (4.121) to get
d λt λm+1 tm
(e Pm+1 ) = . (4.122)
dt m!
Integrating (4.122) and using the initial condition Pm+1 (0) = 0 in (4.100), we
obtain
(λt)m+1 −λt
Pm+1 (t) = e
(m + 1)!
for all t ≥ 0, since (m+1)! = (m+1)m!. This establishes the formula (4.119) for
the case m+1, and therefore formula (4.119) is now proved for all m = 0, 1, 2, . . .
by induction on m.
Note that the formula in (4.119),
(λt)m −λt
Pm (t) = e , for m = 0, 1, 2, 3, . . . and t > 0, (4.123)
m!
is the probability distribution of a Poisson(λt) random variable. We have there-
fore demonstrated that assumptions (i)–(iv) imply that, for each t > 0, M (t) is
a Poisson random variable with parameter λt,

M (t) ∼ Poisson(λt), for t > 0. (4.124)

We say that M (t) is a Poisson random process. This particular random process
is characterized by the assumptions (i)–(iv) that we made on M (t).
In general, a random process is a collection, {X(t) | t ∈ I}, of random
variables, X(t), for t in some indexing set I. If I is countable, for instance,
I = N or I = Z, the random process {X(t) | t ∈ I} is called a discrete–time
random process. If I is some interval of real numbers, then {X(t) | t ∈ I} is
called a continuous–time random process. For the case of the Poisson random
process defined in (4.123), I = [0, ∞). Therefore, the Poisson random process
is a continuous–time random process.
Given a random process, {X(t) | t ∈ I}, the mean of the random process is
given by
E[X(t)], for t ∈ I.

It follows from (4.124), or (4.123), and the calculations in Section 4.1.5 that

E[M (t)] = λt, for t > 0. (4.125)

Thus, assumptions (i)–(iv) imply that the expected number of mutations in the
interval [0, t] is proportional to the length of the time interval. The constant of
proportionality, λ, represents the average number of mutations per unit time.
Bibliography

[EK88] L. Edelstein-Keshet. Mathematical Models in Biology. Random


House/Birkhäuser, 1988.

[LD43] S. E. Luria and M. Delbrück. Mutations of bacteria from virus sensi-


tivity to virus resistence. Genetics, 28:491–511, 1943.
[LW55] M. J. Lighthill and G. B. Whitham. On kinematic waves II: A theory
of traffic flow on long crowded roads. Proc. R. Soc. Lond., 229:317–345,
1955.

[Ric56] P. Richards. Shock waves on the highways. Operations Research,


4(1):42–51, 1956.

67

You might also like