Part 3 Lecture Notes
© Eva Kisdi. Any part of this material may be copied or re-used only with the
explicit permission of the author.
Contents

1 Introduction: A need for matrices
  1.1 Allele frequencies under mutation
  1.2 Conformational states of molecules
  1.3 Growth of a structured population
  1.4 A metapopulation model
  1.5 Age-structured populations

2 Matrix operations
  2.1 Multiplication
  2.2 Transposed vectors and vector products
  2.3 The identity matrix and the inverse matrix
  2.4 Sums of matrices and multiplication with a number

3 Modelling discrete life cycles and the order of matrices in matrix multiplication
1 Introduction: A need for matrices
Learning matrix algebra is a major step towards biological modelling. Most biological
models involve several variables: concentrations of various substances, frequencies of mul-
tiple alleles, numbers of individuals of different ages, states, or species. There are countless
other examples where several variables change simultaneously. Multivariate
models are much richer and more interesting than models of a single variable, and to
analyse models with several variables beyond the elementary insights discussed in Part 1,
matrix algebra is indispensable.
There is however a learning difficulty at this step. To start analysing multivariate mod-
els, one needs to master matrix algebra; and matrix algebra needs some time to learn.
Matrix algebra is not particularly difficult, but at some intermediate stages it is not im-
mediately obvious how particular bits of matrix algebra will be useful. In other words,
the student sometimes needs to take it on good faith that we will soon arrive at something of
practical value. As before, we shall make every effort to learn the necessary mathematics
via biological applications wherever it is possible.
Consider a population with discrete generations where two alleles (A1 and A2 ) segre-
gate in a locus. In every generation, each copy of allele A1 mutates into A2 with probability
µ, whereas A2 mutates into A1 with probability ν:
  A1 → A2 with probability µ,    A2 → A1 with probability ν
We assume that population size is constant, and therefore the total number of alleles, N ,
is the same in every generation. Denote the frequency of allele A1 in generation t with pt ,
such that the population harbours pt N copies of A1 and (1 − pt )N copies of A2 at time t.
Assume that N is large enough so that stochastic variation in the number of alleles in the
next generation (often referred to as genetic drift) can be neglected. How does pt change
from generation to generation under the force of mutation?
To calculate pt+1 , notice that an allele can be of type A1 in the next generation for two
reasons: either it was originally (in generation t) A1 and did not mutate (i.e., remained
A1 ), or it was originally A2 and mutated to become A1 . The former route produces
pt N (1 − µ) copies of A1 , where pt N is the number of original A1 alleles and a fraction
1 − µ of those will not mutate. The latter route gives (1 − pt )N · ν copies of A1 because of
the original (1 − pt )N alleles of type A2 , a fraction ν mutates and becomes A1 . Hence the
number of A1 alleles in generation t+1, pt+1 N , is given by pt+1 N = (1−µ)pt N +ν(1−pt )N ;
and after dividing by the constant N, we obtain

  pt+1 = (1 − µ)pt + ν(1 − pt)        (1)
In the probabilistic terms used in Part 2, µ and ν are conditional probabilities. µ is the
probability that an allele will be A2 in the next generation given that it is A1 now; i.e.,
µ = P (A2 |A1 ). The conditional probability that the allele will be A1 if it is now A1 is the
probability that A1 does not mutate, i.e., 1 − µ = P (A1 |A1 ). Similarly, ν = P (A1 |A2 ) and
1 − ν = P (A2 |A2 ). Rewriting equation (1) with the notation of conditional probabilities
gives pt+1 = P (A1 |A1 )pt + P (A1 |A2 )(1 − pt ), which the reader will recognise as the law of
total probability for having A1 in the next generation.
To find the equilibrium of the mutational process, we set pt+1 = pt = p in equation (1),

  p = (1 − µ)p + ν(1 − p)

Solving this equation for p gives the equilibrium allele frequency

  p = ν / (µ + ν)        (2)
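As a quick numerical check of this result, one can iterate equation (1) in, say, Python; the values of µ and ν below are arbitrary illustrations, not taken from the text.

    # Iterate p(t+1) = (1 - mu) p(t) + nu (1 - p(t)) and compare with the
    # predicted equilibrium p* = nu / (mu + nu).  Mutation probabilities
    # are arbitrary example values.
    mu, nu = 0.01, 0.002
    p = 0.9                      # arbitrary initial frequency of allele A1
    for t in range(2000):
        p = (1 - mu) * p + nu * (1 - p)
    print(p, nu / (mu + nu))     # both approximately 0.1667 = 1/6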
How do we generalize the above model for more than two alleles? Many loci have a
number of phenotypically distinguishable alleles (for example, in the AB0 blood group
system there are three alleles, A, B and 0), and DNA sequencing uncovers a huge num-
ber of allele varieties. Crudely grouping the different sequences into classes such as "functional" and "non-functional" is often not accurate; for example, there are many possible errors that destroy functionality, and some are easier to restore than others, i.e., the back mutation rate ν is not the same for each non-functional allele.
Suppose thus that there are n alleles segregating in a locus (A1 , ..., An ), and each can
mutate into each other (or if not, then we set some of the mutation probabilities to 0).
Let pi denote the frequency of allele Ai (where i can be any of the numbers 1, ..., n). As
before, the sum of all allele frequencies must add up to 1, i.e., we have p1 +p2 +...+pn = 1,
or, in more concise notation,

  ∑i pi = 1
This means that we could express one of the frequencies with all the others; for example,
we could substitute pn with pn = 1 − (p1 + ... + pn−1 ), and therefore we could work with
only p1 , ..., pn−1 . This was useful in the case of only two alleles, where we obtained the
frequency of A2 as 1 − p and therefore could work with only one variable, p. With three
or more alleles, however, the reduction by one variable is no real help in practice, and it is
actually better to avoid singling out pn from among the variables (in a more mathematical
language, don’t break the symmetry of the problem by treating some allele(s) differently
from others).
We need new notation also for the mutation probabilities. As the number of alleles
n increases, the number of possible mutations explodes [it equals n² − n], and it is not
practical to assign a separate letter to each of them. Instead, let qij denote the probability
that Aj mutates into Ai . Notice the order of indices: the second index j shows the
original allele and the first index i is the new allele (the indices show the allele’s identity
”backwards” in time). qii (where the two indices are the same) is the probability that
Ai produces Ai , i.e., that there was no mutation. More generally (i.e. also outside the
context of genetics and mutations), the probabilities qij are called transition probabilities.
In the two-allele example above, the probability that A1 mutates into A2 is µ and hence
in the new notation q21 = µ. q11 is the probability that A1 remains A1 , i.e., q11 = 1 − µ.
Similarly, q12 = ν and q22 = 1 − ν. The transition probabilities qij are precisely the same as
the conditional probabilities P(i|j): qij = P(i|j). The sum ∑i qij is the sum of the probabilities
that Aj becomes A1 or A2 etc. up to An. Because Aj certainly becomes one of the possible
alleles A1, ..., An, we have ∑i qij = 1.
We can now write down the frequencies of all n alleles in the next generation analogously to equation (1). Denoting the next generation with a prime (because the subscript is now used to denote the identity of the allele), the equations for three alleles (n = 3) are

  p′1 = q11 p1 + q12 p2 + q13 p3
  p′2 = q21 p1 + q22 p2 + q23 p3        (3)
  p′3 = q31 p1 + q32 p2 + q33 p3
When n is larger than just a few, the equations analogous to (3) become unwieldy. When these equations are used, e.g., to find the equilibrium allele frequencies, the results are not transparent, simple formulas like equation (2) above, but complicated expressions that depend on the many transition probabilities. It is therefore imperative to simplify notation, find the structure of the problem, and study all problems with the same structure in one go, independently of particulars such as the exact value of n.
We therefore introduce a vector as a ”list” of all variables. With three alleles we have
the three variables p1 , p2 , p3 and the corresponding vector is
      ( p1 )
  p = ( p2 )
      ( p3 )
The boldface notation¹ p reminds us that the symbol denotes a vector, not just a number (although the boldface is often omitted in more mathematical texts).
The transition probabilities we collect into a matrix, which is a "table" of n rows and n columns; the ith row of the jth column contains the transition probability qij. For three alleles, the matrix of transition probabilities is

      ( q11 q12 q13 )
  Q = ( q21 q22 q23 )
      ( q31 q32 q33 )
The boldface² capital letter is a reminder that the symbol denotes a matrix. If the number of rows and columns is the
same (as here), the matrix is a square matrix. Square matrices are the most common, and
in this course we focus on them (even though one might note that an n-vector is actually
a matrix with n rows and a single column, i.e., an n × 1 matrix).
¹ In handwriting, one can underline the symbol of a vector.
² In handwriting, the symbol of a matrix is twice underlined.
As noted above, our matrix Q has the property that ∑i qij = 1 for each choice of j, i.e.,
that each column of the matrix sums to 1. Such matrices are called stochastic matrices;
they usually contain transition probabilities as in the present example.
Next, we need to know how to multiply a matrix and a vector. The result of such a
multiplication is a new vector. To calculate the first element of the result vector, do the
following using the first row of the matrix: multiply the 1st element in the row with the
1st element of the vector; to this, add the product of the 2nd element in the row and the
2nd element of the vector; to this, add the product of the 3rd element in the row and the
3rd element of the vector; and so on till the end of the row and of the vector. (If they
do not have the same length, then the multiplication is not possible.) This sum of the
products is the first element of the result vector. To obtain the second element of the
result vector, do the same but using the second row of the matrix; and so on (see Figure
1). The result vector has as many elements as the matrix has rows. An n×n square
matrix can thus be multiplied with a vector of length n, and the result is again an n-vector.
Figure 1: Multiplication of a matrix with a vector. The result is the vector on the right
hand side. To obtain its first element, go through the first row of the matrix and through
the vector; multiply the matching elements of the matrix and of the vector; add up these
products. To obtain the second element of the result vector, do the same using the second
row of the matrix; etc.
Notice that the vector on the right hand side contains p′1, p′2, p′3 as given in equations (3)! Hence we can calculate the allele frequencies of the next generation as

  ( p′1 )   ( q11 q12 q13 ) ( p1 )
  ( p′2 ) = ( q21 q22 q23 ) ( p2 )
  ( p′3 )   ( q31 q32 q33 ) ( p3 )
or more concisely,
  p′ = Qp    or    pt+1 = Q pt        (4)
The rules of matrix multiplication may at first seem somewhat arcane, but as this example illustrates, matrix multiplication is defined precisely in the way that proves useful.
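A sketch of equation (4) in Python with NumPy; the 3×3 stochastic matrix Q and the initial frequency vector are arbitrary example values, not taken from the text.

    import numpy as np

    # Columns of Q sum to 1 (a stochastic matrix); the entries are arbitrary examples.
    Q = np.array([[0.90, 0.05, 0.10],
                  [0.08, 0.90, 0.10],
                  [0.02, 0.05, 0.80]])
    p = np.array([0.5, 0.3, 0.2])   # current allele frequencies

    p_next = Q @ p                  # equation (4): p' = Qp
    print(p_next, p_next.sum())     # the new frequencies still sum to 1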
In this model, the elements of matrix Q are constants. The transition probabilities
(or conditional probabilities of having Ai in the next generation if we have Aj now) do
not depend on the history of the alleles; Aj mutates into Ai with a fixed probability no
matter from which other allele Aj might have arisen in the past. The vector pt therefore
gives a complete description of the system at time t, sufficient to predict its future pt+1
without reference to the past.
1.2 Conformational states of molecules

Consider a molecule, such as a membrane channel, that switches between two conformational states: an open channel closes at rate α, and a closed channel opens at rate β. This back-and-forth switching is analogous to the forward-backward mutations between two alleles in section (1.1), except that this molecular dynamics plays out in continuous time (as opposed to the discrete generations we assumed in the mutation model). Accordingly, α and β are rates and not probabilities (cf. Part 1).
Denote the number of open channels by n1 and the number of closed channels by n2 .
The dynamics of open and closed channels is given by
  dn1/dt = −αn1 + βn2
  dn2/dt = αn1 − βn2
The sum of these two equations is obviously zero:
  dn1/dt + dn2/dt = −αn1 + βn2 + αn1 − βn2 = 0
and this is because the total number of membrane channels, n1 + n2 = N , does not
change with time. One of the two differential equations is superfluous; the two equations
don’t give independent information and if we know n1 at some time, we immediately have
n2 = N − n1 at the same time. We can thus continue with using only the first equation
and substituting n2 = N − n1 into it:
  dn1/dt = −αn1 + β(N − n1)
If we like, we can also divide this equation by N. If we define p = n1/N to be the fraction of open channels, then we obtain

  dp/dt = −αp + β(1 − p)        (5)
To compare with the mutation model, note that the above equation gives the change of p, whereas equation (1) gives its value in the next generation. In continuous time, there is no "next"
time step. Expressing the change of p also from equation (1), we obtain pt+1 − pt =
−µpt + ν(1 − pt ), which is directly analogous to equation (5). To find the equilibrium of
the opening-closing process, we set the change in equation (5) to zero and solve for the
equilibrium fraction of open channels:
  −αp + β(1 − p) = 0

  p = β / (α + β)
This is again analogous to the equilibrium allele frequency of the mutation model in equa-
tion (2).
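A minimal numerical sketch of equation (5), integrating with small Euler steps; the values of α and β and the step size are arbitrary choices.

    # Euler integration of dp/dt = -alpha*p + beta*(1 - p).
    # alpha, beta, the step size and the initial value are arbitrary examples.
    alpha, beta = 2.0, 0.5
    p, dt = 1.0, 0.001            # start with all channels open
    for _ in range(20000):        # integrate up to t = 20
        p += dt * (-alpha * p + beta * (1 - p))
    print(p, beta / (alpha + beta))   # both approximately 0.2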
equations can be written as

  dp/dt = Kp        (6)

where the derivative on the left hand side is the vector of the derivatives,

  dp/dt = ( dp1/dt )
          ( dp2/dt )
          ( dp3/dt )
1.3 Growth of a structured population

For an unstructured population in which every individual gives birth at rate b and dies at rate d, the number of individuals N changes according to

  dN/dt = bN − dN = (b − d)N        (7)

and the population grows whenever b − d > 0. Equation (7) assumes that all individuals are equivalent, i.e., they all have the same
birth and death rates; such an unstructured population can be described by a single
variable N , the number of equivalent individuals. Suppose now that the population
contains juveniles and adults, and let respectively N1 and N2 denote their numbers. Adults
produce juveniles at rate b and die at rate d. Juveniles mature into adults at rate m and
die at rate δ. The dynamics of this structured population is thus given by
  dN1/dt = bN2 − mN1 − δN1
  dN2/dt = mN1 − dN2        (8)
Notice that the only positive term in dN1 /dt is bN2 , because the only way to get new
juveniles added to N1 is the reproduction of adults. The term mN1 appears as a negative
term in the first equation but a positive term in the second equation: by maturation,
juveniles cease to be juveniles and are removed from N1 ; they however do not disappear
from the population but become adults and are added to N2 . Death affects each stage
separately.
Exercise: By rewriting equations (8) into matrix-vector form, show that the model is equivalent to

  ( dN1/dt )   ( −(m + δ)    b  ) ( N1 )
  ( dN2/dt ) = (     m      −d  ) ( N2 )
For which parameter values will this structured population grow? The answer is
much less straightforward than for equation (7)! Some intuition helps us to appreciate
the significance of the juvenile stage. If m is very large, then juveniles mature almost
instantaneously after birth; in this case, the model is very close to equation (7) and
the approximate condition for population growth is again b − d > 0. If however m is
very small, then the offspring spend an enormously long time as juveniles and (assuming
δ > 0) almost all of them die before they would mature; hence the population goes
extinct (unless b is enormously large to compensate for juvenile mortality).
In this particular example, we can find a shortcut to evaluating the condition of population
growth, using what we have learnt about exponential decay in Part 1. Maturation and
death are two exponential decay processes that remove juveniles; hence the probability that
a juvenile matures rather than dies is m/(m+δ). Adults die as an exponential decay process
at rate d, so that the average adult lifetime is 1/d. During this lifetime, an adult produces
b · (1/d) juveniles. Putting these together, an adult produces on average R0 = [m/(m + δ)] · (b/d) adults
before it dies. R0 is called the basic reproduction number or lifetime reproductive success. If
R0 exceeds 1, then an adult replaces herself with more than one adult; hence the population
grows. If R0 is less than 1, then adults on average die before they would have produced
one replacement; hence the population declines. If m is very large, then R0 ≈ b/d as in the
unstructured population model, whereas if m is very small, then so is R0 .
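The two limits discussed above can be checked with a few lines of code; the parameter values are arbitrary illustrations.

    # R0 = [m/(m+delta)] * (b/d): probability of surviving the juvenile stage
    # times the number of juveniles produced during an average adult lifetime.
    def R0(b, d, m, delta):
        return m / (m + delta) * b / d

    print(R0(b=2.0, d=1.0, m=100.0, delta=0.5))  # fast maturation: close to b/d = 2
    print(R0(b=2.0, d=1.0, m=0.01, delta=0.5))   # slow maturation: close to 0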
We must admit that in reality, maturation is often very much different from an exponential
decay process. Maturation often takes an approximately fixed length of time. A tadpole
just hatched from its egg has no chance to mature in the next short time interval dt; but a
large tadpole with legs has a rather high probability of maturing soon. m is therefore not
constant. Yet in much theoretical work, it is assumed to be constant (in the hope that it
does not make a qualitative difference in the end) because the mathematical analysis of an
exponential process is much simpler.
Note that the above shortcut was facilitated by the fact that the juvenile and adult stages
are consecutive, and therefore we could separately calculate the probability of surviving
the juvenile stage and then the number of offspring produced during the adult stage. As
our next example will illustrate, the states of many structured population models are not
consecutive.
1.4 A metapopulation model

Consider a metapopulation of two local populations connected by the dispersal of offspring. Adults in population i produce Fi offspring per year and survive to the next year with probability Pi; the offspring disperse, a fraction mij of the offspring produced in population j settling in population i, whereas adults do not move. If N1 and N2 denote the numbers of individuals censused just before reproduction, then the numbers in the next year are

  N′1 = m11 F1 N1 + m12 F2 N2 + P1 N1
  N′2 = m21 F1 N1 + m22 F2 N2 + P2 N2

where prime denotes the next generation. The m11 F1 N1 term of the first equation gives
the number of offspring produced in population 1 that also stayed in population 1. In
contrast, the term m12 F2 N2 is the number of offspring produced in population 2 that
moved over to population 1. P1 N1 is the number of adults of population 1 that are
still alive in the next year. Because adults do not move, only local adults contribute to
population size. The second equation is constructed analogously.
In matrix-vector notation, this is

  N′ = AN

with

  A = ( m11 F1 + P1     m12 F2      )
      ( m21 F1          m22 F2 + P2 )
Matrix A is called the projection matrix of the model because it "projects" the current population vector N into the next population vector N′. The transition probability matrix Q in section (1.1) is also a projection matrix, albeit a special one, because it is also a stochastic matrix (its columns sum to 1). The matrices of rates in the continuous-time models are however not projection matrices, because they describe change (dN/dt) rather than project to the next value N′.
Note that even if the model distinguishes offspring from their parents (because only
the offspring disperse), there is no population structure within a local population (N1 , the
total number of individuals in population 1, is sufficient to describe the first population,
and the same holds for N2 ). This is because the model assumes that offspring produced
this year will become adults by the next year. At the time of census (when we count N1
and N2 ), just before reproduction, all individuals are adults.
The modeller is of course free to choose the census point. We might for example decide that
we count the populations when the offspring are already born but have not yet dispersed.
With this choice, however, we should distinguish offspring who may disperse and adults who
may not; hence each local population would have a stage structure in addition to the spatial
structure of the metapopulation. The resulting model would have a 4x4 matrix, which is
unnecessarily more complicated than the model above. It is worthwhile to choose the census
point such that the resulting model is as simple as possible. Naturally, a more complicated
model alternative would eventually predict the same behaviour (after all, populations do
not ”know” when we count them, so their behaviour cannot depend on the choice of census),
but the analysis can be much harder.
In this model, it is not at all obvious whether the projection matrix describes a popu-
lation that grows or declines. In fact, it might happen that the population declines for a
while but then starts to grow (despite that we assume constant parameters!). To see this,
suppose that population 1 lives in a good environment so that it has high fecundity and
survival, whereas in population 2 these parameters are so small that without immigration
from population 1, population 2 would quickly go extinct. If most of the initial individuals
happen to be in population 2, then the dynamics of the first few generations will be dom-
inated by their death. Population 1 will however grow and eventually total population
size will be dominated by its growth; via dispersal, it will also maintain population 2 that
otherwise would be extinct.
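A short simulation illustrates this transient decline followed by growth; the parameter values below are invented for illustration (population 1 a good patch, population 2 a poor one, and most initial individuals in population 2).

    import numpy as np

    # Projection matrix A = [[m11*F1 + P1, m12*F2], [m21*F1, m22*F2 + P2]]
    # with arbitrary example parameters: population 1 is a good patch,
    # population 2 a poor one that persists only through immigration.
    F1, F2, P1, P2 = 3.0, 0.1, 0.6, 0.1
    m11, m12, m21, m22 = 0.8, 0.8, 0.2, 0.2
    A = np.array([[m11 * F1 + P1, m12 * F2],
                  [m21 * F1,      m22 * F2 + P2]])

    N = np.array([1.0, 100.0])        # most initial individuals in the poor patch
    for t in range(10):
        print(t, N.sum())             # total size first declines, then grows
        N = A @ N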
1.5 Age-structured populations
The best known ecological example of matrix models addresses the discrete-time dynamics
of age-structured populations. This model applies to organisms with seasonal reproduc-
tion (for simplicity we assume that reproduction occurs once a year, e.g. in spring) whose
survival and fecundity depend on age. The population is divided into age classes. The
most convenient is to count (or ”census”) the individuals immediately before reproduc-
tion; then the youngest ones are 1 year old (they were born in the previous reproductive
season) and the population consists of N1 , N2 , ..., Nω individuals of age 1, 2, ..., ω. To have
only a finite number of age classes, we assume that there is a maximum age ω. The
maximum age can however be set so high that neglecting individuals older than ω does
not make an appreciable difference. The population is thus given by the vector

      ( N1 )
  N = ( N2 )
      ( ⋮  )
      ( Nω )
Let Pi denote the probability that an i-year old individual survives the coming year
such that next year we shall census it as an i + 1-year old individual, and let Fi be the
number of offspring produced by an i-year old parent. By the next year, the offspring will
be 1-year old. Notice, however, that we census only those individuals who are present at
the beginning of a year (immediately before reproduction); hence if an offspring does not
survive its first year of life, we shall never count it. Hence Fi is not simply the number of
newborns produced, but the number of offspring who are alive at the next census (”effec-
tive fecundity"); it is often much smaller than the number of newborns. It is also customary to count
only females (and hence female offspring), assuming that there are enough males for each
female to be mated.
To construct the projection matrix, let us first draw the so-called life cycle graph that
depicts the life history of this population (Figure 2). Because we want to obtain a matrix
that projects over 1 year, it is very important that each arrow in this graph represents
a change over one year. In an age-structured population, all surviving individuals enter
the next age class and each age class produces offspring that are 1-year old in one year’s
time. The members of the last age class are removed.
One can set up the projection matrix directly from the life cycle graph, keeping in
mind that the ij element of the matrix gives the contribution of the jth class to next
year’s ith class. The fecundity Fj of the jth class contributes to the 1st class of the next
year; hence the Fj 's occupy the first row of the matrix. Pi gives the transition probability
from class i to class i + 1, and hence it is placed into the (i + 1)st row and ith column. All
other elements of the matrix are zero, because no other transitions are possible between
the age classes. As a result, we obtain the projection matrix
      ( F1    F2    ...   Fω−1    Fω )
      ( P1    0     ...   0       0  )
  L = ( 0     P2    ...   0       0  )        (10)
      ( ⋮     ⋮           ⋮       ⋮  )
      ( 0     0     ...   Pω−1    0  )
Exercise: Write down the individual equations to calculate the size of various
age classes next year (N′i) analogously to equations (3), and verify that the
projection matrix given in (10) is correct.
Exercise: Draw the life cycle graph of the metapopulation model in section
1.4.
We can investigate whether the population described by the Leslie matrix in equation (10)
is able to grow by calculating its basic reproduction number (also called the lifetime repro-
ductive success, R0 ) analogously to section 1.3. In an age-structured population, R0 is the
number of offspring a 1-year old individual can expect to produce throughout its life. Let li
denote the probability that a 1-year old individual will survive till age i; if it does, then it
produces Fi offspring at that age. Summing the offspring over every possible age, we obtain
the expected lifetime number of offspring as R0 = ∑i li Fi. To calculate li, notice that,
obviously, l1 =1 (a 1-year old is alive now). The probability of surviving from age 1 to age 2
is given by P1 (cf. Figure 2), so that l2 = P1 . To survive from age 1 to age 3, the individual
must survive from age 1 to age 2 (with probability P1 ) and then from age 2 to age 3 (with
probability P2 ), and hence l3 = P1 P2 . By the same logic, we have li = P1 P2 ...Pi−1 ; this
completes the calculation of R0 directly from the parameters given in the Leslie matrix.
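As a numerical sketch, li and R0 can be accumulated in a short loop; here we use the effective fecundities and survival probabilities of the Leslie matrix (26) that appears later in section 5.1.

    # Effective fecundities F_i and survival probabilities P_i of the Leslie
    # matrix in equation (26); l_i = P_1 * ... * P_{i-1} and R0 = sum_i l_i F_i.
    F = [0.7, 1.0, 2.0, 2.0, 1.0]
    P = [0.25, 0.4, 0.6, 0.5]        # P_i: survival from age i to age i+1

    R0, l = 0.0, 1.0                 # l_1 = 1
    for i, Fi in enumerate(F):
        R0 += l * Fi
        if i < len(P):
            l *= P[i]                # l_{i+1} = l_i * P_i
    print(R0)                        # 1.3 > 1: the population can grow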
If R0 > 1, then the population has the capacity to grow, because every 1-year old will
eventually replace itself by more than one offspring. This does not however imply that the
population will grow in every year. Suppose that maturation occurs at age 2 such that
1-year old individuals do not reproduce yet (F1 = 0). If the initial population consists of
only 1-year old individuals, then in the first year there is no reproduction and all that
happens is that some of the individuals die.
R0 > 1 does not guarantee population growth even in the long term if there are post-
reproductive age classes. Suppose, for example, that Fω−1 = 0 and Fω = 0, i.e., the last
two age classes do not reproduce. If the initial population contains only these two age
classes, then there will be no reproduction ever, and the population will go extinct when
the initial individuals die³. R0 > 1 implies long-term population growth if 1-year old individuals are either present in the initial population or will be produced before the initial individuals die; afterwards, the 1-year old individuals will replace themselves by more than one offspring each.

³ Sometimes it is convenient to set ω to be the last reproductive age, and simply not to count the post-reproductive individuals in the model. If however survival or fecundity depends on population size (or explicitly on some limiting resource, etc.; this we do not assume in the current model), then the post-reproductive individuals also influence the growth of the population by contributing to population size (or by exhausting the resources).
2 Matrix operations
In section 1.1, we already introduced matrix-vector multiplication. In this chapter, we
review all basic operations and their important properties.
2.1 Multiplication
When a matrix and a vector are multiplied, the result is a new vector. The ith element
of this vector is calculated by multiplying the elements of the ith row of the matrix with
the corresponding elements of the vector, and adding up the products (see Figure 1). For
a 2×2 matrix, the multiplication is

  Ax = ( a11 a12 ) ( x1 ) = ( a11 x1 + a12 x2 )
       ( a21 a22 ) ( x2 )   ( a21 x1 + a22 x2 )
If the matrix has a different number of elements in its rows than the number of elements
in the vector, then the procedure of multiplication cannot be carried out (one series of
numbers runs out before the other); in these cases, the product does not exist. Hence an
n-vector can be multiplied only with matrices that have n columns. If the matrix is a
square matrix (i.e., it has also n rows), then the result will be an n-vector.
When seen for the first time, matrix-vector multiplication may appear haphazard (”why is
this the way it’s done and why not some other?”). The examples of the previous chapter
should convince the reader that matrix multiplication is defined in the way that proves useful.
To multiply two matrices, think of the columns of the second matrix as if they were
separate vectors. For example, in the matrix product

  AB = ( a11 a12 ) ( b11 b12 )
       ( a21 a22 ) ( b21 b22 )

think of B as a "collection" of two column vectors, its first column and its second column. The first column of the result matrix is A times the first column of B; the second column of the result is A times the second column of B. Hence we have

  AB = ( a11 a12 ) ( b11 b12 ) = ( a11 b11 + a12 b21    a11 b12 + a12 b22 )
       ( a21 a22 ) ( b21 b22 )   ( a21 b11 + a22 b21    a21 b12 + a22 b22 )
In general, the element in the ith row, jth column of the result matrix is

  [AB]ij = ∑k aik bkj
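The formula [AB]ij = ∑k aik bkj written out as explicit loops, checked against NumPy's built-in matrix product; the matrices are arbitrary examples.

    import numpy as np

    # [AB]_ij = sum_k a_ik * b_kj, computed with explicit loops and compared
    # with NumPy's matrix product.  A and B are arbitrary example matrices.
    A = np.array([[2.0, 1.0], [0.0, 1.0]])
    B = np.array([[1.0, 3.0], [4.0, 5.0]])

    C = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                C[i, j] += A[i, k] * B[k, j]

    print(C)
    print(A @ B)          # same result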
Once again, the product does not exist if the number of columns of the first matrix
does not match the number of rows of the second matrix. For example, the matrix-vector
product

  ( a11 a12 ) ( x1 )
  ( a21 a22 ) ( x2 )

exists because the matrix has 2 columns and the vector (think of it as a 2×1 matrix) has 2 rows. The product

  ( x1 ) ( a11 a12 )
  ( x2 ) ( a21 a22 )

however does not exist, because the vector has only 1 column but the matrix has 2 rows.
The most important fact about matrix multiplication is that the order of matrices
matters (multiplication is not commutative). This means that
  AB ≠ BA
Exercise: Let

  A = ( 1 2 )    and    B = ( 5 0 )
      ( 0 3 )               ( 6 1 )

Calculate both AB and BA to show that they are different.
All that matters is that the order must be kept; hence AB + AC = A(B + C) is correct
(matrix A is factored out to the front) but AB + AC is not the same as (B + C)A
(because the order has changed). Nothing can be factored out of AB + CA (because A
is once in the front and once in the back). To emphasise the order, we say that in AB,
A pre-multiplies B whereas B post-multiplies A.
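A quick numerical illustration of these rules with arbitrary example matrices:

    import numpy as np

    # The order must be kept when factoring: A(B + C) equals AB + AC,
    # but (B + C)A generally does not.  Arbitrary example matrices.
    A = np.array([[1, 2], [3, 4]])
    B = np.array([[0, 1], [1, 0]])
    C = np.array([[2, 0], [0, 2]])

    print(np.array_equal(A @ (B + C), A @ B + A @ C))   # True
    print(np.array_equal((B + C) @ A, A @ B + A @ C))   # False (in general)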
2.2 Transposed vectors and vector products

The transpose of a vector x, denoted xᵀ, turns a column vector into a row vector and vice versa. (When it is not specified, a "vector" is always taken to be a column vector.) A row vector can be post-multiplied with a matrix as in

  ( x1 x2 ) ( a11 a12 ) = ( a11 x1 + a21 x2    a12 x1 + a22 x2 )
            ( a21 a22 )
but the opposite order is not possible. The result of xᵀA is a row vector, and its elements are different from the elements of the column vector Ax.
A row vector and a column vector can be multiplied in both orders, but the results
are qualitatively different. The product
  ( x1 x2 ) ( y1 ) = x1 y1 + x2 y2
             ( y2 )

yields a scalar number (x1 y1 + x2 y2) and is therefore called the scalar (or inner) product of the two vectors. The opposite order

  ( y1 ) ( x1 x2 ) = ( x1 y1    x2 y1 )
  ( y2 )             ( x1 y2    x2 y2 )

yields a 2 × 2 matrix, the so-called outer (or dyadic) product of the two vectors.
2.3 The identity matrix and the inverse matrix
In multiplication of numbers, number ”1” plays a special role: a number multiplied with
1 remains the same. In matrix multiplication, the identity matrix plays the same role.
The 3×3 identity matrix is

      ( 1 0 0 )
  I = ( 0 1 0 )
      ( 0 0 1 )
and the pattern is the same for any size of the matrix; the diagonal of the matrix contains
1’s and all other elements are 0.
Exercise: Verify that Ix = x for any vector x (assuming of course that the
size of the identity matrix corresponds to the vector).
The inverse of a matrix A, denoted A⁻¹, is the matrix for which

  A⁻¹A = AA⁻¹ = I
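A numerical sketch with an arbitrary invertible 2×2 matrix:

    import numpy as np

    # The inverse matrix undoes multiplication by A: A^{-1} A = A A^{-1} = I.
    A = np.array([[2.0, 1.0], [1.0, 1.0]])
    A_inv = np.linalg.inv(A)
    print(A_inv)             # [[ 1. -1.], [-1.  2.]]
    print(A_inv @ A)         # the identity matrix (up to rounding)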
2.4 Sums of matrices and multiplication with a number

Two matrices are added element by element; in general,

  [A + B]ij = aij + bij
Matrices can be added to each other only if they have the same size. The zero matrix is
the matrix all elements of which are zeros; adding the zero matrix makes no difference.
Because 2A = A + A, we must have

  2A = ( a11 a12 ) + ( a11 a12 ) = ( 2a11 2a12 )
       ( a21 a22 )   ( a21 a22 )   ( 2a21 2a22 )

By analogy, we can multiply a matrix with any number by multiplying each element of the matrix with the number. Hence for a 2×2 matrix and any number k we have

  kA = ( ka11 ka12 )
       ( ka21 ka22 )

and in general,

  [kA]ij = k aij
3 Modelling discrete life cycles and the order of matrices in matrix multiplication

In discrete-time models, one has to specify the order of events within a year. Sup-
pose thus that after censusing the population (counting the numbers in the population
vector Nt ), first there is reproduction and then there is dispersal (Figure 3a). Before con-
structing the full projection matrix, consider the effect of reproduction separately. After
reproduction, the original N1 individuals in population 1 are replaced by F1 N1 offspring;
and similarly the original N2 individuals in population 2 are replaced by F2 N2 offspring.
The population after reproduction is thus described by the vector

  Ñt = ( F1  0 ) ( N1 ) = F Nt
       ( 0  F2 ) ( N2 )

where the tilde denotes "after reproduction" (but still within year t) and F is the diagonal matrix of fecundities.
Next, we turn to the effect of dispersal. After dispersal, a fraction m11 of the Ñ1 indi-
viduals in population 1 are still in population 1, plus a fraction m12 of the Ñ2 individuals
in population 2 have moved to population 1. Population 1 thus contains m11 Ñ1 + m12 Ñ2
individuals; and an analogous reasoning holds for population 2. The population numbers
after dispersal are the numbers at the next census point, Nt+1 (cf. Figure 3a), hence we
have

  Nt+1 = ( m11 m12 ) ( Ñ1 ) = M Ñt
         ( m21 m22 ) ( Ñ2 )
Putting the two equations together,
Nt+1 = MÑt = MFNt
Hence the 1-year projection matrix (which projects Nt directly into Nt+1) is the product

  MF = ( m11 m12 ) ( F1  0 ) = ( F1 m11   F2 m12 )        (11)
       ( m21 m22 ) ( 0  F2 )   ( F1 m21   F2 m22 )
Notice that because reproduction precedes dispersal, we have to apply the fecundity
matrix F to the population vector N before applying the dispersal matrix M. Applying
a matrix to a (column) vector means pre-multiplying the vector with the matrix. Hence
we must first form the product FN, and then pre-multiply this with M to obtain MFN.
The ”literal” order ”M”, ”F” in MFN is just the opposite of the real order of operations!
The product MFN means that the vector N is first (pre-)multiplied with F and then the
result of this is (pre-)multiplied with M.
Let us now compare the above model with the same except that the order of events
is the opposite: we now assume that after census, there is first dispersal and then repro-
duction (Figure 3b). In this case, the population vector N is first multiplied with the
dispersal matrix M to obtain MN; and then this product is (pre-)multiplied with the
fecundity matrix F. This results in
Nt+1 = FMNt
i.e., the 1-year projection matrix is

  FM = ( F1  0 ) ( m11 m12 ) = ( F1 m11   F1 m12 )        (12)
       ( 0  F2 ) ( m21 m22 )   ( F2 m21   F2 m22 )
Compare the results in equations (11) and (12): the different order of events resulted in
different projection matrices. The order of events matters because the order of matrices
matters in matrix multiplication.
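The difference between equations (11) and (12) is easy to see numerically; the fecundities and dispersal fractions below are arbitrary examples.

    import numpy as np

    # Reproduction before dispersal (MF) versus dispersal before reproduction (FM)
    # give different 1-year projection matrices.  Parameter values are arbitrary.
    F = np.diag([3.0, 0.5])                       # fecundities F1, F2
    M = np.array([[0.8, 0.3],                     # dispersal fractions m_ij
                  [0.2, 0.7]])

    print(M @ F)    # equation (11): [[2.4 0.15], [0.6 0.35]]
    print(F @ M)    # equation (12): [[2.4 0.9 ], [0.1 0.35]]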
It might be noticed that with only two events in the life cycle (reproduction, dispersal),
we could change the point of census such that the above two models become identical. For
example in Figure 3b, we could move the census from before dispersal to after dispersal,
and then the life cycle would be identical to Figure 3a. The choice of the census point is
of course arbitrary. The above two models are therefore models of the same population
censused at different points in the life cycle. The numbers in the population vectors of the
two models are different because they are counted at different points, but all qualitative
properties of the two models are the same (such as the speed of population growth or the
convergence to a stable state). With three or more events within a year, however, moving
the census point will not account for the differences in the order of events!
Nt = ANt−1 (13)
But can we figure out the past if we know the present, i.e., can we find Nt−1 if we know Nt ?
pt+1 = Qpt
To find the equilibrium allele frequencies, we must find the frequency vector p such that
if the current frequencies are given by this vector (pt = p), then there is no change after
mutation such that also the next frequencies are given by the same vector (pt+1 = p). In
other words, we must solve the equation
p = Qp
for the unknown equilibrium vector p. As with numbers, first we collect all terms with
the unknown to one side of the equation:
0 = Qp − p
where 0 is a vector of zeros. To factor out p from the two terms on the right hand side,
recall that p = Ip (the identity matrix does not change the vector) and therefore we can
write Qp − p = Qp − Ip = (Q − I)p, where the difference is formed between two matrices
(Q − I) and the order of multiplication is unchanged (p is factored out to the right). With
this, we can write the equilibrium equation as
(Q − I)p = 0 (14)
The two equations in (13) and (14) have the same structure: a known matrix (A or
Q − I) is multiplied with an unknown vector (Nt−1 or p) to yield a known vector (Nt or
0). In general, problems of this sort can be written as
Ax = r (15)
where A is an arbitrary known matrix, r is a known vector, and x is the unknown vector.
Such a matrix equation represents a system of linear equations. Indeed, if we write out
the matrix-vector multiplication, we arrive at the equations
  a11 x1 + a12 x2 + ... + a1n xn = r1
  a21 x1 + a22 x2 + ... + a2n xn = r2
  ...
  an1 x1 + an2 x2 + ... + ann xn = rn
where x1 , . . . , xn are the unknowns.
Formally, we can solve the system of linear equations by pre-multiplying both sides of
equation (15) with the inverse of A:
A−1 Ax = A−1 r
x = A−1 r (16)
(because A−1 A = I and the identity matrix does not change x), but to calculate x, we
must know the inverse matrix A−1 . In this chapter, we study how to obtain the solution
in practice; how to obtain the inverse matrix (if we must); and most importantly, whether
there exists a solution in the first place.
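In numerical practice one rarely forms A⁻¹ explicitly; a linear solver is used instead. A sketch with arbitrary numbers:

    import numpy as np

    # Solving Ax = r numerically.  np.linalg.solve does not form A^{-1}
    # explicitly, but the result is the same as x = A^{-1} r in equation (16).
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    r = np.array([5.0, 10.0])

    x = np.linalg.solve(A, r)
    print(x)                       # [1. 3.]
    print(np.linalg.inv(A) @ r)    # same solution via the inverse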
Consider a system of two linear equations in two unknowns,

  a11 x1 + a12 x2 = r1        (17a)
  a21 x1 + a22 x2 = r2        (17b)

First, multiply equation (17a) with a22 and equation (17b) with a12, so that the x2 terms of the two equations become the same (a12 a22 x2).
Secondly, subtract the second equation from the first; because the x2 terms are the same,
they cancel:
a11 a22 x1 − a12 a21 x1 = a22 r1 − a12 r2
Finally, solve the above single equation for x1:

  x1 = (a22 r1 − a12 r2) / (a11 a22 − a12 a21)        (18)
Once we have x1 , we can substitute it into one of the original equations and solve it for
the remaining unknown x2 .
Exercise: Show that x2 = (a11 r2 − a21 r1) / (a11 a22 − a12 a21).
This derivation implies that the solution can always be obtained, except in one case: if a11 a22 − a12 a21 happens to be zero (because then we would divide by zero). As we shall see later, the quantity a11 a22 − a12 a21 is the determinant of the 2×2 matrix A.
It is helpful to illustrate graphically when equations (17) have a solution and when
they have not. In Figure 4(a), the two straight lines correspond to the two equations in
(17) as follows. Rearranging equation (17a), we obtain x2 = (r1 − a11 x1 )/a12 , which gives
x2 as a linear function of x1 ; this linear function is plotted as the straight line correspond-
ing to equation 1. The line corresponding to equation (17b) is plotted analogously. The
first equation is satisfied by all combinations of x1 and x2 on the first line and the second
equation is satisfied by all combinations of x1 and x2 on the second line. To satisfy both
equations simultaneously, the solution must be on both lines. There is only one point that
is on both lines, the intersection point of the lines. Hence the intersection point gives the
solution (x1 , x2 ) graphically.
Figure 4: Graphical representation of a system of two linear equations. On each line, one
of the equations holds. The solution to both equations is the intersection point of the
lines in (a). There is no solution if the lines do not intersect as in (b); and all points on
the line are solutions if the lines coincide as in (c).
Figure 4(b) and (c) show how things can go wrong. There is no solution if the lines
are parallel (so that they never intersect); and there are infinitely many solutions if the
two lines coincide (all points on the line are on both lines). It follows immediately that
equations (17) have either one solution, or no solution, or infinitely many solutions; no
other cases are possible.
In case of a 3x3 matrix, the unknown vector has 3 elements and the analogue of Figure 4 is
three dimensional. Each of the three equations correspond to a plane in three-dimensional
space. Two of the planes intersect in a line; and this line intersects the third plane in a
point, which is the solution. There is no solution if the line is parallel to the third plane
or if all three planes are parallel (and no line exists). There are infinitely many solutions if
the line is in the third plane or if all three planes are the same. It remains true also in all
higher dimensions that a system of linear equations has either one solution, no solution,
or infinitely many solutions.
If the equations have exactly one solution (as in Figure 4a), we say that there is a
unique solution. In practice, that is what we hope to have: a unique solution, which is
then ”the solution” of the problem. To explore whether equations (17) have a unique
solution or happen to represent one of the ”problem cases” in Figure 4b,c where there is
no unique solution, a numerical example will be most useful. Compare the left hand sides
of the following equations:
x1 + 2x2 = 5 (19a)
3x1 + 6x2 = 7 (19b)
Obviously, the left hand side of the second equation is just 3 times that of the first
equation. The right hand sides however do not agree: 7 is not 3 times 5. It is simply not
possible that a quantity (x1 + 2x2 ) equals 5 and its triple, 3(x1 + 2x2 ) = 3x1 + 6x2 , equals
7. These two equations contradict each other and therefore the system of equations has
no solution. This is the case of Figure 4b.
Indeed, the two lines corresponding to equations (19) are given by x2 = (5 − x1)/2 = 5/2 − (1/2)x1 and x2 = (7 − 3x1)/6 = 7/6 − (1/2)x1. These two lines have the same slope (−1/2) but different intercepts (5/2 and 7/6, respectively). Hence the two equations correspond to two parallel lines as shown in Figure 4b.
If we change the right hand sides of equations (19) such that now also the right hand
side of the second equation is 3 times that of the first equation,
x1 + 2x2 = 5 (20a)
3x1 + 6x2 = 15 (20b)
then the two equations say precisely the same; dividing both sides of 3x1 + 6x2 = 15 with
3, it simplifies to x1 + 2x2 = 5. The second equation thus contains no new information.
These equations correspond to the coinciding lines of Figure 4c.
Comparing the two examples in equations (19) and (20) it should be clear that the
problem in both cases is that the left hand side of the second equation is just 3 times that
of the first equation. This means that in the corresponding matrix,
1 2
A=
3 6
the second row is 3 times the first row. Hence we can see directly from the matrix that
this is a ”problem case” that has no unique solution. Whether the equations Ax = r have
infinitely many solutions or have no solution depends on whether the right hand sides of
the equations (given in vector r) agree with the matrix or not (in our example, whether
r2 is 3 times r1 or not). But the upshot is, the matrix alone determines whether there is a
unique solution. With 2x2 matrices, the equations have no unique solution if one row of
the matrix can be obtained by multiplying the other row with a certain number (in our
example, the second row is 3 times the first, or the first row is 1/3 times the second).
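Numerically, such a "problem case" shows up as a solver failure; here is a sketch using the matrix of equations (19).

    import numpy as np

    # The matrix of equations (19)/(20): its second row is 3 times the first,
    # so the system cannot have a unique solution for any right hand side.
    A = np.array([[1.0, 2.0],
                  [3.0, 6.0]])

    try:
        np.linalg.solve(A, np.array([5.0, 7.0]))   # equations (19): no solution
    except np.linalg.LinAlgError as err:
        print("solver fails:", err)                # the matrix is singular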
Larger matrices offer more possibilities for linear dependence among the rows. Consider for example the 3×3 matrix

      ( 1    2    6 )
  A = ( 1   −1    4 )
      ( 2    1   10 )

No row can be expressed as a number times another row of A, yet one of the rows can
be expressed using the other two rows: as the reader will notice, the last row is just the
sum of the first two. If we write out the product Ax (which is the left hand side of the
equations), then the last expression is the sum of the first two:
  ( x1 + 2x2 + 6x3  )
  ( x1 − x2 + 4x3   )
  ( 2x1 + x2 + 10x3 )
If the right hand sides agree with this pattern (the last is the sum of the first two), then
one of the equations will contain no new information. For example, in the system
x1 + 2x2 + 6x3 = 8
x1 − x2 + 4x3 = 5
2x1 + x2 + 10x3 = 13
the last equation does not say anything new; if the first two equations hold, then the
last one holds automatically because it is just the sum of the first two. This system of
equations has three unknowns but only two useful equations. Hence one of the unknowns
(e.g. x3 ) can be chosen arbitrarily; the first two equations can be solved for the remaining
unknowns (x1 and x2 ) and so a valid solution is obtained for any value of the arbitrary
unknown (x3 ). This system has infinitely many solutions. To the contrary, if the right
hand sides do not agree with the pattern of the left hand sides (in this example, the last
is not the sum of the first two), then the last equation will contradict the first two and
the system has no solution. It is for example impossible to satisfy the system
x1 + 2x2 + 6x3 = 8
x1 − x2 + 4x3 = 5
2x1 + x2 + 10x3 = 10
because whenever the first two equations hold, the last one certainly does not.
We can summarize the examples in the following statement: the system of equations
has a unique solution if and only if the rows of the matrix are linearly independent.
”Linearly independent” means that a row cannot be expressed with other rows using
multiplication with a number and addition/subtraction. For example, the rows of the
matrix
      ( 1 2 3 )
  B = ( 1 1 1 )
      ( 3 4 5 )
are not linearly independent because the last row is the first row plus twice the second
row.
Denote the ith row of an n × n matrix A with aiᵀ (where I use the transpose because aiᵀ is a row vector). anᵀ can be expressed using the other rows if we can find appropriate numbers c1, ..., cn−1 such that anᵀ = c1 a1ᵀ + c2 a2ᵀ + ... + cn−1 an−1ᵀ. With cn = −1, this is equivalent to

  c1 a1ᵀ + c2 a2ᵀ + ... + cn anᵀ = 0

The expression c1 a1 + c2 a2 + ... + cn an is called a linear combination of the vectors ai. The vectors are not linearly independent if there is a linear combination of them, with not all coefficients zero, that equals the zero vector.
Exercise: Take matrix B as given above, and give examples for the vector r
such that the system Bx = r has (i) no solution; (ii) infinitely many solutions.
If the rows (columns) of a matrix are not linearly independent, we say that the matrix is singular. This is of course an exceptional case (cf. Figure 4), but this excep-
tional case will be of great importance.
If matrix A is singular, then it cannot be inverted (A−1 does not exist). Therefore
the solution to Ax = r cannot be obtained as x = A−1 r (cf. equation (16)); a unique
solution does not exist.
Consider for example the matrix

  ( 1 0 )
  ( 0 0 )

This matrix is obviously singular; the second row is 0 times the first row (and notice that for
the same reason, any matrix that contains a row (or column) of zeros is singular). Applying
this matrix to any vector,
  ( 1 0 ) ( x1 ) = ( x1 )
  ( 0 0 ) ( x2 )   ( 0  )
it retains the first element of the vector but replaces the second element with zero. Graph-
ically, this means that the matrix projects every vector to the x1 -axis (Figure 5a). The
second element of the original vector, x2 , is ”forgotten”: there is no way one could recon-
struct the value of x2 from the result vector.
Figure 5: What a singular matrix does to a vector. In (a), any original vector with coordinates x1 and
x2 (bold arrow) is projected onto the x1 -axis, i.e., its x1 -coordinate is retained but its x2 coordinate is
replaced with 0 (dashed arrow). In (b), any original vector (bold arrow) is stretched and rotated into a
result vector (dashed arrow) that is on the line x2 = 2x1 (thin line).
As a second example, consider the matrix

  ( 1 2 )
  ( 2 4 )

the second row of which is twice the first row. Applying this matrix to an arbitrary vector,
  ( 1 2 ) ( x1 ) = ( x1 + 2x2  )
  ( 2 4 ) ( x2 )   ( 2x1 + 4x2 )
the second element of the result is always twice the first element. Graphically, the x2 coor-
dinate of the result vector is always twice its x1 coordinate; hence this matrix rotates every
vector such that it lands on the x2 = 2x1 line (see Figure 5b).
In both examples, we can start with any vector in a plane (of the coordinate system in
Figure 5), and the multiplication with the singular matrix results in a vector on a line (on
the x1 axis or on the line x2 = 2x1 ). A singular matrix therefore collapses the vectors
into less dimensions than the original (a plane is collapsed onto a line). From knowing the
vector Ax, we cannot figure out the original vector x, because information is lost when the
dimensions are collapsed. Saying that from knowing the vector Ax we cannot figure out the
original vector x is the same as saying that the equation Ax = r cannot be solved. If the
vector r is in the dimensions retained by the matrix (e.g. on the line onto which the matrix
collapses every vector) then many vectors x are collapsed into this same vector r and there
are infinitely many solutions to the equation; and if r is not in the dimensions retained by
the matrix, then there is no solution.
4.4.1 2 × 2 matrices
The determinant of the 2×2 matrix

  A = ( a11 a12 )
      ( a21 a22 )
is given by the formula
|A| = a11 a22 − a12 a21 (21)
The vertical lines in |A| denote the determinant. The determinant of A can also be
denoted as det(A).
Notice that the determinant of the 2 × 2 matrix appears as the denominator in the solution
of a system of two equations, given in (18). If the determinant is zero, then the solution
cannot be obtained; this already hints that a zero determinant signals a singular matrix.
4.4.2 n × n matrices
For larger matrices, the calculation of the determinant can be done through the calculation
of smaller determinants as follows. Suppose we want to calculate the determinant of a
3 × 3 matrix,

  ( a11 a12 a13 )
  ( a21 a22 a23 )
  ( a31 a32 a33 )
To start, assign a ”+” or a ”−” sign to each element of the determinant as on a chessboard
(starting with ”+” in the first element of the first row):
+ − +
− + −
+ − +
Then take the first element of a row (e.g. take a11 ); keep its original sign if the chessboard
pattern assigns ”+” to this element and change its sign if the chessboard pattern assigns
”−”; and multiply this with the determinant of the matrix after deleting the row and the
column where the element is. For the element a11, this means

  a11 · | a22 a23 |
        | a32 a33 |
(note the ”+” sign and that the first row and first column are left out from the determi-
nant). The 2 × 2 determinant in this expression can be calculated as shown above. Do
this same procedure while going through the entire row of the matrix, and add up the
results to obtain the determinant of the 3 × 3 matrix:
  |A| = a11 · | a22 a23 | − a12 · | a21 a23 | + a13 · | a21 a22 |
              | a32 a33 |         | a31 a33 |         | a31 a32 |

In this example, we expanded the determinant using its first row; but it can be done
(and gives the same result) using any row or any column. The important things are to
follow the chessboard pattern of signs; to go fully through one row or one column; and at
every step, to delete the row and the column where the current element is. So the same
determinant may also be calculated using the second row of the matrix:
  |A| = −a21 · | a12 a13 | + a22 · | a11 a13 | − a23 · | a11 a12 |
               | a32 a33 |         | a31 a33 |         | a31 a32 |
The determinants of larger matrices are expanded analogously. The expansion of a
4 × 4 matrix results in an expression containing four 3 × 3 determinants, each of which is
reduced to three 2 × 2 determinants as above. The procedure described here is known as
the Laplace expansion of determinants.
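A sketch of the Laplace expansion as a recursive Python function (expanding always along the first row); it is checked here on the 3×3 matrix with linearly dependent rows from section 4.3, whose determinant should be zero.

    import numpy as np

    # Recursive Laplace expansion along the first row, as described above.
    # (For large matrices this is very inefficient; np.linalg.det is preferred.)
    def det(A):
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 0, column j
            total += (-1) ** j * A[0, j] * det(minor)               # chessboard sign
        return total

    A = [[1.0, 2.0, 6.0], [1.0, -1.0, 4.0], [2.0, 1.0, 10.0]]
    print(det(A), np.linalg.det(A))   # both 0: the rows are linearly dependent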
(i) If we exchange two rows (or two columns) of a matrix, then its determinant changes
sign.
Denote the original matrix with A and let à be the same matrix except that the first and the
second rows are exchanged. For example, with 3 × 3 matrices we have

      ( a11 a12 a13 )            ( a21 a22 a23 )
  A = ( a21 a22 a23 )   and  Ã = ( a11 a12 a13 )
      ( a31 a32 a33 )            ( a31 a32 a33 )
Expand |A| using its first row and |Ã| using its second row. The numbers obtained during the ex-
pansion are exactly the same, except that due to the chessboard rule, all numbers in the expansion
of |Ã| get a negative sign. Hence |Ã| = −|A| as stated. The same applies if we exchange any two
consecutive rows (e.g. the second and the third rows).
Suppose now that the first and the third rows are to be exchanged. This we can achieve by
exchanging consecutive rows three times. Label the original rows of the matrix with 1,2,3 and
exchange them as follows: (1,2,3) → (2,1,3) → (2,3,1) → (3,2,1). At each of these exchanges,
the determinant changes sign; and because we have had an odd number of exchanges, the final
determinant has the opposite sign to the original. It can be seen that exchanging two arbitrary
rows can be achieved by an odd number of exchanges between consecutive rows. Hence in every
case, the determinant changes sign.
(ii) If we multiply one row (or column) of a matrix with a number c, then the determinant
is also multiplied with c.
Expand the determinant using the row (column) that has been multiplied with c; each term of the
expansion is then multiplied with c.
(iii) If a row (or column) of a matrix contains sums of two terms, then the determinant
is the sum of two determinants, where one determinant contains only the first terms
and the other contains only the second terms in that row (or column).
The statement is far easier to understand if we write an example:
  | b1+c1  b2+c2  b3+c3 |   | b1   b2   b3  |   | c1   c2   c3  |
  | a21    a22    a23   | = | a21  a22  a23 | + | a21  a22  a23 |
  | a31    a32    a33   |   | a31  a32  a33 |   | a31  a32  a33 |
To see why this is true, expand the determinant using its row (column) where the sums are.
To make the above statement briefer, we can say that a matrix is singular if and only if its determinant
is zero. Here ”if” means that the zero determinant is sufficient to make sure that the matrix is singular
(i.e., all matrices with zero determinant are singular); and ”only if” means that the zero determinant is
necessary to make sure that the matrix is singular (i.e., only those matrices are singular that have zero
determinant). The relationship ”if and only if” between two properties mean that the two are equivalent;
here a matrix being singular is equivalent to its determinant being zero. ”If and only if” is sometimes
abbreviated as ”iff”.
Proof. First we prove the first half of the theorem: if a matrix is singular, then its de-
terminant is zero. In a singular matrix, the rows are not linearly independent, i.e., one
of the rows can be written as a linear combination of the others (see section 4.3). The
simplest case is if two rows of the matrix are the same. In this case, we exchange these two
rows, which makes no difference to the matrix and hence to its determinant. Exchanging
two rows, however, changes the sign of the determinant by property (i) discussed in the
previous section. There is only one number that does not change if its sign is changed to
the opposite: zero. Hence the determinant of a matrix with two identical rows must be zero.
Now we can consider the general case when no two rows are identical, but one of them is a
linear combination of the others. Without loss of generality, we take the last row as a linear
combination of the others. I write the reasoning for a 3 × 3 matrix, but it should be clear
that the same argument holds for any size. Thus we start with the matrix

      ( a11            a12            a13          )
  A = ( a21            a22            a23          )
      ( αa11 + βa21    αa12 + βa22    αa13 + βa23  )
where the last row is written as α times the first row plus β times the second row (a linear
combination of the first two rows). The last row of the matrix contains sums, hence use
property (iii) in the preceding section to write the determinant of A as
         | a11    a12    a13  |   | a11    a12    a13  |
  |A| =  | a21    a22    a23  | + | a21    a22    a23  |
         | αa11   αa12   αa13 |   | βa21   βa22   βa23 |
Now both determinants on the right hand side have their last row multiplied with a number.
Applying property (ii) in the previous section we obtain
          | a11  a12  a13 |       | a11  a12  a13 |
  |A| = α | a21  a22  a23 | + β   | a21  a22  a23 |
          | a11  a12  a13 |       | a21  a22  a23 |

Notice now that in the first determinant, the last row is the same as the first; and in the
second determinant, the last row is the same as the second. Hence both determinants are
zero because they contain identical rows, and we arrive at
|A| = α · 0 + β · 0 = 0
If the matrix is of size n × n, then the same logic applies except that one of the rows is
written as the sum of n − 1 other rows, and hence in the first step of the expansion (using
property (iii)) we get a sum of n − 1 determinants. Each determinant is then shown to be
zero as above.
We still have to prove the converse theorem, i.e., that every matrix that has zero determinant
is singular. This we shall do at the end of the next section.
Let Ai denote the matrix A with its ith column replaced by the right hand side vector r; for a 3 × 3 matrix, for example, A2 = (a1, r, a3), where aj denotes the jth column of A. Because r = Ax = x1 a1 + x2 a2 + x3 a3, we can calculate the determinant of A2 by applying property (iii) of determinants (see section 4.4.3) to the second column:
  |A2| = | a1   x2 a2   a3 | + | a1   ∑j≠2 xj aj   a3 |
By property (ii), the first determinant on the right hand side is x2 |A|. In the second
determinant, the second column is a linear combination of the other columns and therefore
the determinant is zero. Therefore we arrive at |A2 | = x2 |A|, or
  x2 = |A2| / |A|
We used the example of A2 throughout, but it should be clear that the same logic applies
to any column of A; and hence we can obtain any element of the solution vector x by the
formula
  xi = |Ai| / |A|        (22)
This formula is called Cramer’s rule.
Exercise: Show that the solution in equation (18) that we obtained for a 2 × 2 matrix is
an example for Cramer’s rule.
Cramer’s rule gives the unique solution of the system Ax = r, provided that the formula
in (22) is valid, i.e., if its denominator, |A|, is not zero. If the determinant of matrix A is
zero, then according to Cramer’s rule, the unique solution of Ax = r does not exist. As we
have seen (section 4.3), a unique solution does not exist when the rows (columns) of A are
not linearly independent, or, in other words, if the matrix is singular.
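A sketch of Cramer's rule in Python; the matrix and right hand side are arbitrary examples, and in practice np.linalg.solve is preferred for anything but tiny systems.

    import numpy as np

    # Cramer's rule, equation (22): x_i = |A_i| / |A|, where A_i is A with its
    # ith column replaced by the right hand side r.
    def cramer(A, r):
        A = np.asarray(A, dtype=float)
        d = np.linalg.det(A)
        x = np.empty(len(r))
        for i in range(len(r)):
            Ai = A.copy()
            Ai[:, i] = r                 # replace the ith column with r
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    r = np.array([5.0, 10.0])
    print(cramer(A, r))                  # [1. 3.]
    print(np.linalg.solve(A, r))         # same solution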
To obtain the inverse of a 2 × 2 matrix A, we can solve the two systems of linear equations

  A ( b11 ) = ( 1 )        (23)            A ( b12 ) = ( 0 )        (24)
    ( b21 )   ( 0 )                          ( b22 )   ( 1 )

i.e., Ax = r with the columns of the identity matrix as the right hand sides; the solutions are the columns of the inverse matrix. Hence we can assemble the inverse matrix from the columns obtained from equations (23) and (24):
  A⁻¹ = ( b11 b12 )
        ( b21 b22 )
and by the construction, we have that
  AA⁻¹ = ( a11 a12 ) ( b11 b12 ) = ( 1 0 )
         ( a21 a22 ) ( b21 b22 )   ( 0 1 )
The method is the same for larger matrices.
If A is singular, then equations (23) and (24) don’t have unique solutions and hence A−1
cannot be constructed; the inverse of a singular matrix does not exist.
To project the population from an arbitrary initial vector N0 for an arbitrarily long
period of time t, we can write
  N(1) = A N(0)
  N(2) = A N(1) = AA N(0) ≡ A² N(0)
  N(3) = A N(2) = A³ N(0)
  ...
  N(t) = Aᵗ N(0)        (25)
The last of these equations lets us calculate N(t) directly from any initial population vec-
tor N(0), but at a huge computational cost: obtaining the matrix A^t involves t − 1 matrix
multiplications^5 (where t could be thousands). Even worse, such large computations offer
little insight into what happens in the model and why.
^5 If only one specific initial vector is of interest, then N(t) can be obtained by t matrix-vector multiplications as N(1) = AN(0), ..., N(t) = AN(t − 1). This amounts to fewer computations than calculating A^t. However, the entire calculation has to be done again if we wish to try a different initial vector.
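To make the computational remark concrete, the sketch below (assuming numpy; the 2 × 2 matrix and initial vector are arbitrary) projects a population both by repeated matrix-vector products and by forming the matrix power first; the two routes give the same N(t), but only the second produces a matrix that can be re-used for any initial vector.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.75, 0.0]])          # an arbitrary projection matrix
N0 = np.array([10.0, 0.0])           # an arbitrary initial vector
t = 50

# t matrix-vector multiplications, tied to this particular N0
N = N0.copy()
for _ in range(t):
    N = A @ N

# forming the matrix power A^t first; the result can be re-used for any initial vector
N_direct = np.linalg.matrix_power(A, t) @ N0

print(N)
print(N_direct)                      # the two results agree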
5.1 Numerical experiments
5.1.1 An age-structured population
Figure 6 shows the behaviour of an age-structured population described by the Leslie
matrix
L = \begin{pmatrix} 0.7 & 1 & 2 & 2 & 1 \\ 0.25 & 0 & 0 & 0 & 0 \\ 0 & 0.4 & 0 & 0 & 0 \\ 0 & 0 & 0.6 & 0 & 0 \\ 0 & 0 & 0 & 0.5 & 0 \end{pmatrix}        (26)
Each histogram shows the fractions of individuals who belong to age classes 1-5. The first
panel is the initial population, which was chosen arbitrarily. The subsequent histograms
were obtained by projecting the population to the next year using N(t) = LN(t − 1) and
calculating the fraction Ni(t)/Σj Nj(t) of individuals in age class i. The bottom panel
of the figure shows how the population growth factor changed in time. The population
growth factor is the total number of individuals in a given year divided by the total
number in the previous year (Σj Nj(t)/Σj Nj(t − 1)). It is commonly called the annual
growth ”rate” (note, however, that this quantity is not a rate!).
The histograms of Figure 6 change in the first few years; for example, the initial
population had no 1-year olds, and consequently the population in year t = 1 has no
2-year olds. However, the changes diminish with time. The population converges to a
stable age structure; after the transient has died out, the population consists of a fixed
fraction of 1-year olds, 2-year olds, etc. Simultaneously with the stabilization of the age
structure, also the population growth factor converges to a constant value. (Note that
this constant value is not 1, which means that the population grows exponentially. The
elements of N(t) hence increase with time, but the ratios or relative sizes of the elements,
shown by the histograms in Figure 6, become constant.)
The latter fact is easy to understand. Once the age structure has stabilized, each year the
same fraction of individuals produces F1 (and F2 , ..., Fn ) offspring; and the same fraction
of individuals is subject to mortality 1 − P1 (and 1 − P2 , ..., 1 − Pn−1 ). Hence the per capita
population growth is the same in each year.
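The numerical experiment behind Figure 6 is easy to repeat. The sketch below (assuming numpy; the initial vector is arbitrary) iterates the Leslie matrix (26) and prints the age distribution together with the annual growth factor, both of which settle down after a transient.

import numpy as np

L = np.array([[0.7,  1.0, 2.0, 2.0, 1.0],
              [0.25, 0.0, 0.0, 0.0, 0.0],
              [0.0,  0.4, 0.0, 0.0, 0.0],
              [0.0,  0.0, 0.6, 0.0, 0.0],
              [0.0,  0.0, 0.0, 0.5, 0.0]])

N = np.array([0.0, 10.0, 5.0, 3.0, 2.0])     # arbitrary initial population
for t in range(1, 31):
    N_new = L @ N
    growth_factor = N_new.sum() / N.sum()    # annual growth factor
    N = N_new
    if t % 10 == 0:
        print(t, np.round(N / N.sum(), 3), round(growth_factor, 4))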
The behaviour observed in Figure 6 is typical for almost all matrix models. The
following exercise however shows that there are some exceptions.
Figure 6: Changes of the age distribution and of the annual growth ”rate” under the Leslie
matrix given in (26). The initial distribution is shown in the top first panel. Initially the
age distribution exhibits transient changes (top two rows) but later it stabilizes (last row
of panels). The bottom panel shows the annual growth ”rate” as a function of time.
L = \begin{pmatrix} 0 & F \\ P & 0 \end{pmatrix}        (27)
where F is the fecundity of reproductive plants and P is the probability that a
vegetative plant survives to be reproductive (choose the numbers arbitrarily).
Monitor the age structure and the annual growth factor while projecting the
population forward in time; show that these do not converge to fixed values but
maintain 2-year oscillations. To interpret the results, it is helpful to consider
an initial population of only 1-year old plants; and an initial population of
only 2-year old plants. Any initial population is the mixture of these two.
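A minimal numerical sketch of this exercise (assuming numpy; F and P are chosen arbitrarily, as the exercise suggests) shows that the age distribution and the growth factor do not converge but oscillate with period 2.

import numpy as np

F, P = 3.0, 0.4                      # arbitrary fecundity and survival probability
L = np.array([[0.0, F],
              [P,   0.0]])

N = np.array([10.0, 1.0])            # arbitrary mixture of 1- and 2-year old plants
for t in range(1, 11):
    N_new = L @ N
    print(t, np.round(N_new / N_new.sum(), 3),    # age distribution
          round(N_new.sum() / N.sum(), 3))        # annual growth factor
    N = N_new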
p(t + 1) = Qp(t)
This is typical of all models where frequencies are projected using a matrix of
transition probabilities. Figure 7 shows the long-term behaviour of this model with the
transition probability matrix
Q = \begin{pmatrix} 0.8 & 0.2 & 0.15 \\ 0.06 & 0.7 & 0.1 \\ 0.14 & 0.1 & 0.75 \end{pmatrix}        (28)
Once again, the frequency vector p(t) converges to a stable distribution as t becomes
large. The only difference from Figure 6 is in the annual growth factor. Because the
elements of p(t) must add up to 1 in every year t, the total remains the same each year;
and hence the annual growth factor is constant at value 1.
This is a direct consequence of the matrix being a stochastic matrix. Stochastic matrices
represent transitions between states with no loss or gain of individuals; this is evident from
the matrix as each of its columns sums to 1 (cf. section 1.1) and implies that the growth
factor is always 1. Stochastic projection matrices are thus a special case of all projection
matrices.
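The same experiment can be run for the stochastic matrix (28); in the sketch below (assuming numpy, with an arbitrary initial frequency vector summing to 1), the frequencies converge to a stable distribution while the growth factor is exactly 1 in every year.

import numpy as np

Q = np.array([[0.8,  0.2, 0.15],
              [0.06, 0.7, 0.1 ],
              [0.14, 0.1, 0.75]])

p = np.array([1.0, 0.0, 0.0])        # arbitrary initial frequencies (sum to 1)
for t in range(1, 31):
    p_new = Q @ p
    if t % 10 == 0:
        print(t, np.round(p_new, 4), round(p_new.sum() / p.sum(), 6))
    p = p_new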
Figure 7: Changes of the state distribution and of the annual growth ”rate” under the
stochastic matrix given in (28). The initial distribution is shown in the top first panel.
5.2 Eigenvalues and eigenvectors
We begin the analysis of matrix models by seeking the equilibrium population structure
observed in typical numerical experiments as in Figures 6 and 7. Once the equilibrium
structure is established, the population grows each year by a constant factor λ, the annual
growth factor, while preserving the ratios between the elements of the population vector.
Hence each element of the population vector must grow by the same factor λ, and we
have
N(t + 1) = λN(t)
As always, N(t+1) = AN(t). When the population has reached its equilibrium structure,
then we have
AN(t) = λN(t)
In general, any vector u that satisfies the equation

Au = λu        (29)

for some number λ is called an eigenvector of the matrix A, and λ is called the corresponding eigenvalue.
Pretend for a moment that we know the eigenvalue λ; we could then solve equation
(29) for the unknown vector u. Rearranging equation (29), we obtain
Au − λu = (A − λI)u = 0 (30)
But this equation has an obvious solution: if u is the zero vector, then this equation holds
for any λ. (Write out the system of linear equations from (30) to see that every term
in the left hand sides of the equations contains an element of u. If all elements of u are
zero, then the left hand side of each equation equals zero, precisely as required by (30)).
u = 0 is a trivial solution of equation (30), but it is also a useless solution because it
corresponds to a nonexisting population.
Equation (30) represents a system of linear equations, and therefore has generically
only one solution: the trivial solution. To find any other solution, the matrix A − λI must
be singular (cf. sections 4.1 - 4.3).
This fact gives us the key to determine the eigenvalue λ. Because A − λI must be
singular, its determinant must be zero; hence we have the equation
|A − λI| = 0 (31)
that we can solve for λ. This equation is called the characteristic equation.
Finding eigenvalues and eigenvectors is a two-step process: first solve the characteristic
equation for λ; then solve equation (29) (or equivalently equation (30)) for u. I detail these
two steps below, and in the next section I show a worked example for the calculations.
1. Solve the characteristic equation. To start with, one has to expand the determinant
in the characteristic equation; the expansion is described in section 4.4. In case of
a 2 × 2 matrix A, the characteristic equation becomes

|A − λI| = \begin{vmatrix} a_{11}-\lambda & a_{12} \\ a_{21} & a_{22}-\lambda \end{vmatrix} = \lambda^2 - (a_{11}+a_{22})\lambda + a_{11}a_{22} - a_{12}a_{21} = 0
The elements of A are known; the characteristic equation is thus a quadratic equation
for λ. The quadratic equation has two solutions (λ1, λ2)^6, both of which are eigenvalues
of matrix A.
2. Find the eigenvector for each eigenvalue. To find the eigenvector, we need to solve
the equation that defines the eigenvector, Au = λu (or, equivalently, (A−λI)u = 0).
Each eigenvalue has its own eigenvector, hence we must do this step n times: First
use λ1 for λ to find its corresponding eigenvector u1; then use λ2 to find u2; etc.
When we write out (A − λI)u = 0 as a system of linear equations for the elements of u, we find that
one of the equations is not independent of the others and thus gives no information.
This is no surprise; it is because the matrix A − λI is singular (cf. step 1). We
simply throw away one of the equations, and solve the rest. But because we have
^6 Recall that the solutions of the quadratic equation ax^2 + bx + c = 0 are x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. The
two solutions may be complex (when b^2 − 4ac < 0 so that the square root gives a complex number) or
exceptionally the two solutions may coincide yielding a multiple root (when b^2 − 4ac = 0). Accordingly,
a matrix may have complex eigenvalues or multiple eigenvalues; we deal with these complications later.
^7 These approximate values can be made much more precise with numerical algorithms such as the
bisection method or the Newton-Raphson method, which are implemented in many software packages.
Complex eigenvalues cannot be obtained graphically, but we shall actually not need them here.
now one more unknown element of u than the number of useful equations, we can-
not obtain a full solution. Instead, we can choose one of the elements arbitrarily,
and solve the above equations for the rest. For example, if we choose a value for u1 ,
then all other elements of u follow. But if we increase the chosen value of u1 k-fold,
then also all other elements of u increase k-fold.
In other words, if u is an eigenvector, then ku is also an eigenvector. This is easily seen also from
the definition of an eigenvector: if the equation Au = λu holds, then also A(ku) = λ(ku) holds.
Eigenvectors are determined only up to a constant.
Because (A − λI)u = 0, the singular matrix A − λI collapses the eigenvector u into the origin.
But if a matrix collapses a vector into the origin, then it also collapses every other vector that lies
on the same line; hence all vectors on this line are eigenvectors.
By determining the eigenvalues and eigenvectors, we have candidate solutions for the
annual growth factor and the equilibrium population structure. We shall investigate in
section 5.4 which of the eigenvalue-eigenvector pairs represent the annual growth factor
and the stable structure.
and write this out as a system of equations for the unknown elements of the eigenvector,
u1 , u2 :
u1 + u2 = 1.5u1
0.75u1 = 1.5u2
From the second equation, we get u2 = (1/2)u1. We do not expect the first equation to
give any more information; indeed, solving the first equation for u2 gives the same result,
u2 = (1/2)u1. We can thus neglect one of the two equations. As always, the eigenvector is
determined only up to a constant, so that we can choose any vector where the second
element is half of the first,

u_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 1 \\ 1/2 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 2/3 \\ 1/3 \end{pmatrix} \quad\text{or}\ \dots
In any case, the eigenvector shows that there are twice as many 1-year old individuals as
2-year olds.
In this example, the second eigenvalue, λ2 = −0.5, clearly cannot be the annual growth
factor of the population, because the number of individuals cannot become negative. If
nevertheless necessary, one can determine u2, the eigenvector corresponding to λ2 = −0.5,
in the same way as above. The equations

u1 + u2 = −0.5 u1
0.75 u1 = −0.5 u2

both give u2 = −(3/2)u1, so that for example

\begin{pmatrix} 2 \\ -3 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 1 \\ -3/2 \end{pmatrix}

are all eigenvectors, as are infinitely many more vectors where the second component is
(−3/2) times the first.
The lifetime reproductive success of this population is R0 = 1 + 0.75 = 1.75, because all 1-
year old individuals produce one offspring and the fraction 0.75 who survive till age 2 produce
another offspring. Both λ1 = 1.5 > 1 and R0 = 1.75 > 1 indicate that the population is
growing, but λ1 is less than R0. This is because the lifetime number of offspring is realized
in 2 years, whereas λ1 shows growth over 1 year. The offspring produced at age 1 start
reproducing a year earlier than the offspring produced at age 2. Because the latter lose a
year of reproduction, they contribute less to the long-term growth of the population.
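The calculations of this example are easy to check numerically. The sketch below assumes (as the eigenvector equations above imply) that the Leslie matrix of the example, matrix (32), is L = [[1, 1], [0.75, 0]], i.e. F1 = F2 = 1 and P1 = 0.75, and uses numpy; note that numpy scales eigenvectors to unit length, so the vector is rescaled before comparing it with the vectors above.

import numpy as np

L = np.array([[1.0,  1.0],
              [0.75, 0.0]])

eigenvalues, eigenvectors = np.linalg.eig(L)
print(eigenvalues)                    # 1.5 and -0.5

i = np.argmax(eigenvalues.real)       # index of the dominant eigenvalue 1.5
u1 = eigenvectors[:, i]
u1 = u1 * (2.0 / u1[0])               # rescale so that the first element is 2
print(u1)                             # (2, 1): twice as many 1-year olds as 2-year olds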
Exercise: Determine the annual growth factor and the equilibrium age dis-
tribution of a population with a Leslie matrix similar to (32), except that each
age class produces 2 offspring rather than 1. Compare the results with those
above.
How to put the initial vector together from components of the eigenvectors is most easily
understood via a numerical example. Suppose that the eigenvectors are

u_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad\text{and}\quad u_2 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}

Any initial vector can then be written as a combination α1 u1 + α2 u2 of these two eigenvectors. In general, with n eigenvectors, we write

N(0) = α1 u1 + α2 u2 + ... + αn un        (36)
Since the initial vector N(0) and the eigenvectors u1 , ..., un are known, (36) represents a
system of n linear equations for the n unknown numbers α1 , ..., αn . These equations can be
solved to obtain the values of α1 , ..., αn ; but we shall not need it in this section.
In equation (36), we have the initial vector built up from components of eigenvectors.
Now we substitute this into the long-term projection (34) and use (35):
N(t) = A^t N(0) =
     = A^t (α1 u1 + α2 u2 + ... + αn un) =
     = α1 A^t u1 + α2 A^t u2 + ... + αn A^t un =
     = α1 λ1^t u1 + α2 λ2^t u2 + ... + αn λn^t un =
     = λ1^t [ α1 u1 + α2 (λ2/λ1)^t u2 + ... + αn (λn/λ1)^t un ]        (37)
Because λ1 is greater in absolute value than any of the other eigenvalues (cf. (33)),
the ratios λ2/λ1, ..., λn/λ1 are all less than 1 in absolute value. Raising these numbers to high
powers gives very small values; for example, λ2/λ1 = 0.95 raised to power t = 200 is only
0.000035. This means that all but the first term in (37) become negligibly small as t
becomes large. We thus have

N(t) ≈ λ1^t α1 u1        (38)
where the approximation is excellent for large t and improving ever more as t increases.
The result in (38) says that N(t) is a number times the dominant eigenvector u1 (the
product itself is the dominant eigenvector, because eigenvectors are determined only up
to a constant); hence in the long term, the population structure becomes the structure of
u1 . Each time t increases by one, N(t) grows λ1 -fold; hence λ1 is the population growth
factor. By (38), we have shown that the population eventually grows with the dominant
eigenvalue of its projection matrix and its structure converges to the dominant eigenvector.
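The whole argument can be traced numerically. The sketch below (assuming numpy, and re-using the 2 × 2 Leslie matrix of the earlier example with an arbitrary initial vector) computes the coefficients α1, α2 of the eigenvector decomposition and compares the exact N(t) with the approximation (38).

import numpy as np

A = np.array([[1.0,  1.0],
              [0.75, 0.0]])
N0 = np.array([3.0, 4.0])              # arbitrary initial vector

lam, U = np.linalg.eig(A)              # eigenvalues, and eigenvectors as the columns of U
alpha = np.linalg.solve(U, N0)         # coefficients of N(0) in the eigenvector basis

i = np.argmax(np.abs(lam))             # index of the dominant eigenvalue
t = 30
N_exact = np.linalg.matrix_power(A, t) @ N0
N_approx = lam[i] ** t * alpha[i] * U[:, i]    # approximation (38)

print(N_exact)
print(N_approx)                        # nearly identical for large t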
5.5 Diagonalization
In this section, we formulate the above ideas in a somewhat more technical way that
eventually leads to more biological insight. This is however not indispensable, and hence
the reader may skip this section and proceed directly to the next.
This section will make use of diagonal matrices. A diagonal matrix is a matrix where the
only non-zero elements are in the diagonal:
Λ = \begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{pmatrix}        (39)
The next three exercises explore the properties of diagonal matrices, which will be used at
crucial points in this section.
Exercise: Show that a diagonal matrix can easily be raised to any power as

Λ^t = \begin{pmatrix} \lambda_1^t & 0 & \dots & 0 \\ 0 & \lambda_2^t & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n^t \end{pmatrix}        (40)
Exercise: Show that pre-multiplying any matrix A with a diagonal matrix Λ multiplies
the rows of A with the numbers λi:

ΛA = \begin{pmatrix} \lambda_1 a_{11} & \lambda_1 a_{12} & \dots & \lambda_1 a_{1n} \\ \lambda_2 a_{21} & \lambda_2 a_{22} & \dots & \lambda_2 a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_n a_{n1} & \lambda_n a_{n2} & \dots & \lambda_n a_{nn} \end{pmatrix}        (41)
Exercise: Show that post-multiplying any matrix A with a diagonal matrix Λ multiplies
the columns of A with the numbers λi:

AΛ = \begin{pmatrix} \lambda_1 a_{11} & \lambda_2 a_{12} & \dots & \lambda_n a_{1n} \\ \lambda_1 a_{21} & \lambda_2 a_{22} & \dots & \lambda_n a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1 a_{n1} & \lambda_2 a_{n2} & \dots & \lambda_n a_{nn} \end{pmatrix}        (42)
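All three exercises are easy to verify numerically; the sketch below (assuming numpy, with arbitrarily chosen numbers) checks properties (40), (41) and (42) for a 3 × 3 example.

import numpy as np

lam = np.array([2.0, 0.5, -1.0])
Lam = np.diag(lam)                          # a diagonal matrix
A = np.arange(1.0, 10.0).reshape(3, 3)      # an arbitrary 3 x 3 matrix
t = 4

print(np.allclose(np.linalg.matrix_power(Lam, t), np.diag(lam ** t)))   # (40)
print(np.allclose(Lam @ A, lam[:, None] * A))    # (41): rows of A scaled by lambda_i
print(np.allclose(A @ Lam, A * lam[None, :]))    # (42): columns of A scaled by lambda_i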
The number of eigenvectors. The characteristic equation of an n × n matrix is an nth de-
gree equation for the eigenvalue λ. Such an equation has n solutions, and therefore the
matrix has n eigenvalues. There are however two possible complications. First, some of
the eigenvalues may be complex numbers. One can do all necessary calculations also with
complex eigenvalues and eigenvectors; and in biologically meaningful models, the predicted
population vectors will (obviously) turn out to be real, not complex. The second complica-
tion is more serious: some of the solutions of the characteristic equation may coincide with
each other. Such coinciding solutions are called multiple roots. Figure 8 shows the roots
of the simple quadratic equation x^2 − a = 0. If a is positive (Figure 8a), then there are
two roots, x_{1,2} = ±√a, as shown in the figure. If a is negative, then there are again two
different roots, x_{1,2} = ±√a, which are complex numbers (because we are taking the square
root of a negative number). These roots cannot be seen in the figure, but can nevertheless
be calculated and used. The critical case is when a = 0. In this case, both solutions are the
same number, x_{1,2} = ±0 = 0. As Figure 8 suggests, the two roots x_{1,2} get closer and closer
to each other as a approaches 0, and coincide only when a is exactly 0. Multiple roots are
therefore exceptional (the technical term is ”degenerate”).
Figure 8: The graph of x^2 − a. In (a), the two roots x_{1,2} = ±√a are marked with dots. In (b), the two
roots coincide, whereas in (c), the roots are complex and therefore cannot be seen.
Different eigenvalues always have their own eigenvectors, so if the matrix has n different
eigenvalues (which is almost always the case) then it has n different eigenvectors that are
linearly independent of each other. In case of a multiple root, however, we may or may
not find as many different eigenvectors corresponding to the multiple root as the number of
coinciding eigenvalues; and if there are fewer eigenvectors, then the total number of eigen-
vectors is less than n. In the remainder of this section, we shall assume that the matrix does
have n different eigenvectors. Such matrices are called diagonalizable, i.e., we assume that
the projection matrix we are concerned with is diagonalizable. All matrices that are not
diagonalizable have multiple eigenvalues, and therefore represent exceptional (degenerate)
cases that we do not pursue in this course.
Transforming the projection matrix. Assume thus that matrix A has n different eigenvectors.
Collect these n eigenvectors into a matrix

C = [u_1, u_2, \dots, u_n]

such that the eigenvectors are the columns of matrix C. The product AC is easy to compute:
treating C column-wise (cf. section 2.1) and using Au = λu, we obtain
AC = [Au1 , Au2 , . . . , Aun ] = [λ1 u1 , λ2 u2 , . . . , λn un ]
In the product AC, the columns of C are multiplied with the eigenvalues. As we have seen
in equation (42), the same result can also be obtained from the multiplication
CΛ = [λ1 u1 , λ2 u2 , . . . , λn un ]
where Λ is the diagonal matrix containing the eigenvalues. Hence we can write
AC = CΛ
Because matrix C has n linearly independent columns (the eigenvectors are all different),
it can be inverted (C−1 exists). By post-multiplying both sides of the above equation with
C−1, we can express A as
A = CΛC−1 (43)
Fast matrix powers. For projecting the population vector according to the matrix model
N(t) = A^t N(0), we need to calculate high powers of matrix A. The transformation in (43)
is the key to do this. With (43), we obtain

A^t = (CΛC−1)(CΛC−1) · · · (CΛC−1) = CΛ^t C−1        (44)

because the products C−1C give the identity matrix and therefore are cancelled. Raising
a diagonal matrix to a high power is easy (see (40)), so that we can calculate A^t efficiently
if we know all its eigenvalues (for matrix Λ) and eigenvectors (for matrix C). The only
remaining hurdle is to calculate the inverse matrix C−1 . As we show in the next step, the
rows of C−1 also have an intimate relationship with the original matrix A.
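A minimal numerical sketch of (43) and (44) (assuming numpy, with an arbitrary example matrix): build C from the eigenvectors returned by the eigenvalue routine and check that CΛ^tC−1 reproduces A^t.

import numpy as np

A = np.array([[1.0,  1.0],
              [0.75, 0.0]])

lam, C = np.linalg.eig(A)       # eigenvalues, and right eigenvectors as the columns of C
Lam = np.diag(lam)
C_inv = np.linalg.inv(C)

print(np.allclose(A, C @ Lam @ C_inv))                       # equation (43)

t = 20
At_fast = C @ np.diag(lam ** t) @ C_inv                      # equation (44)
print(np.allclose(At_fast, np.linalg.matrix_power(A, t)))    # True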
Left eigenvectors. To figure out what matrix C−1 contains, pre-multiply both sides of
equation (43) with C−1. This gives

C−1 A = C−1 CΛC−1 = ΛC−1        (45)

Recall from (41) that in ΛC−1, pre-multiplication with the diagonal matrix multiplies the
rows of matrix C−1 with the eigenvalues in Λ. Let us thus consider matrix C−1 as a
collection of rows:

C^{-1} = \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{pmatrix}
The superscript ”T” denotes the transpose (see section 2.2), i.e., makes the vectors row
vectors that can be placed in the rows of the matrix. Now we look at the two sides of
equation (45). By the rules of matrix multiplication, the ith row of the product C−1 A is
computed by multiplying the ith row of C−1 with matrix A; hence the ith row of the left
hand side is given by the vector-matrix product viT A. On the right hand side of (45) a
diagonal matrix pre-multiplies C−1 and hence the ith row of the result is λi times the ith
row of C−1, which is λi viT. Combining these results, we have

viT A = λi viT        (46)
viT is thus a row vector such that if we multiply it with matrix A, the result is the ith eigen-
value times the original vector viT . This is very similar to the definition of an eigenvector
given in equation (29), the only difference is that viT is a row vector and therefore must
be placed before the matrix in order to be able to multiply. viT is called the left eigenvec-
tor belonging to eigenvalue λi . The ”normal” eigenvectors u1 , . . . , un are also called right
eigenvectors (although if nothing is said, an eigenvector is taken to be the right eigenvector).
The set of left eigenvectors, v1T , . . . , vnT are similar to the right eigenvectors except that they
are row vectors and hence the order of matrix-vector multiplication is the opposite. Each
eigenvalue λi has a (right) eigenvector ui and a left eigenvector viT .
Scaling of the left eigenvectors. Like any eigenvector, also the left eigenvectors are unde-
termined up to a constant, i.e., any number times an eigenvector viT is also an eigenvector
belonging to λi . However, we are not as free choosing the left eigenvectors as we are to
choose the right eigenvectors. If we want the left eigenvectors to be the rows of C−1 , we
must make sure that the product C−1 C will indeed be the identity matrix as it should. The
off-diagonal elements will be 0 automatically, but for the diagonal elements of the product
to be 1, we must have that the ith row of C−1 times the ith column of C is 1:
viT ui = 1    or, equivalently,    \sum_{j=1}^{n} v_{i,j} u_{i,j} = 1        (47)
where vi,j and ui,j denote respectively the jth elements of the ith left and right eigenvec-
tors. In practice, we first choose the scaling of the right eigenvectors as we wish, and then
compute the left eigenvectors to obey the scaling given in (47). The scaling equation gives
one more equation to those determining the elements of the left eigenvector, so that we have
as many independent equations as vector elements and can obtain all elements of the left
eigenvector unambiguously.
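Numerically, the left eigenvectors can simply be read off as the rows of C−1; the sketch below (assuming numpy, and the same example matrix as before) checks that each row satisfies viT A = λi viT and that the scaling (47) holds automatically.

import numpy as np

A = np.array([[1.0,  1.0],
              [0.75, 0.0]])

lam, C = np.linalg.eig(A)
C_inv = np.linalg.inv(C)

for i in range(len(lam)):
    v_i = C_inv[i, :]          # ith left eigenvector: a row of C^{-1}
    u_i = C[:, i]              # ith right eigenvector: a column of C
    print(np.allclose(v_i @ A, lam[i] * v_i))    # v_i^T A = lambda_i v_i^T, cf. (46)
    print(np.isclose(v_i @ u_i, 1.0))            # scaling (47) holds by construction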
Long-term population growth and the reproductive value. Now we return to the population
model N(t) = A^t N(0). Substituting A^t from equation (44), the population vector after t
years of growth is given by

N(t) = CΛ^t C−1 N(0)        (48)
where all ingredients (C, Λ, C−1 ) are given by the eigenvalues and eigenvectors of the pro-
jection matrix A. This equation gives an exact projection of the population for arbitrarily
long time t, at the cost of calculating all eigenvalues and (left and right) eigenvectors.
If we are interested in only what happens after sufficiently long time, then we can obtain
a good approximation by looking into the matrix Λ^t for large t. Let λ1 be the dominant
eigenvalue (i.e., the one largest in absolute value) and rewrite (40) such that λ1^t is factored
out:

Λ^t = \begin{pmatrix} \lambda_1^t & 0 & \dots & 0 \\ 0 & \lambda_2^t & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n^t \end{pmatrix} = \lambda_1^t \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & (\lambda_2/\lambda_1)^t & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & (\lambda_n/\lambda_1)^t \end{pmatrix}
Because the numbers λ2/λ1, ..., λn/λ1 are all less than 1 in absolute value, their t-powers become
very close to zero when t is large. Hence in the long term, all but the first diagonal element
of Λ^t are negligibly small, and we have

Λ^t ≈ \lambda_1^t \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{pmatrix}
Substituting this approximation into the exact projection in (48) greatly simplifies the
matrix multiplications because Λ^t now contains many zeros. In fact, all but the first column
of C are multiplied with only zeros; and all but the first row of C−1 are multiplied with only
zeros. The projection thus simplifies to
N(t) ≈ λ1^t u1 v1T N(0)        (49)
This equation is exactly the same as the approximation we got in equation (38), with the
additional information that the number α1 in (38) is given by v1T N(0), the scalar product
of the dominant left eigenvector and the initial population vector.
Equation (49) gives an important biological interpretation to the dominant left eigenvector.
The initial population vector N(0) influences the long-term projection N(t) only through a single
number, v1T N(0) = Σj v1,j Nj(0). This number is a weighted sum of the initial numbers
Nj(0), and the weights are the elements of the dominant left eigenvector. For example,
if the first element of the left eigenvector is much greater than the second, then starting
a population with individuals in the first state (age group etc.) will give much higher
population numbers after t years than starting a population with individuals in the second
state. The jth element of v1T hence shows how ”valuable” a j-state individual is for the
population far in the future. For example, in age-structured populations, pre-reproductive
population far in the future. For example in age-structured populations, pre-reproductive
individuals give a lower contribution to population growth because it takes time before they
start reproducing (this leaves less time for exponential growth) and may also die before
they reach reproduction. As a consequence, the first few elements of v1T , which correspond
to the pre-reproductive age classes, are smaller than those of the reproductive age classes.
Post-reproductive individuals do not contribute to future population growth at all, and the
corresponding elements of v1T are zero. In other matrix models, the contribution of various
states to long-term population growth may be less intuitively obvious. All these differences
are however quantitatively captured in the dominant left eigenvector, the elements of which
are called the reproductive values of the corresponding states.
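The long-term approximation (49) and the reproductive values are easy to explore numerically; the sketch below (assuming numpy) uses the Leslie matrix (26) and an arbitrary initial vector, and drops the numerically zero imaginary parts that arise because some of the other eigenvalues of this matrix are complex.

import numpy as np

L = np.array([[0.7,  1.0, 2.0, 2.0, 1.0],
              [0.25, 0.0, 0.0, 0.0, 0.0],
              [0.0,  0.4, 0.0, 0.0, 0.0],
              [0.0,  0.0, 0.6, 0.0, 0.0],
              [0.0,  0.0, 0.0, 0.5, 0.0]])

lam, C = np.linalg.eig(L)
C_inv = np.linalg.inv(C)
i = np.argmax(np.abs(lam))           # dominant eigenvalue (real and positive)
lam1 = lam[i].real
u1 = C[:, i].real                    # dominant right eigenvector: stable age structure
v1 = C_inv[i, :].real                # dominant left eigenvector: reproductive values

N0 = np.array([5.0, 0.0, 2.0, 0.0, 1.0])     # arbitrary initial population
t = 40
N_exact = np.linalg.matrix_power(L, t) @ N0
N_approx = lam1 ** t * u1 * (v1 @ N0)        # equation (49)

print(N_exact)
print(N_approx)                      # nearly identical to N_exact for large t
print(v1 / v1[0])                    # reproductive values relative to age class 1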
Equation (38) in section 5.4 predicts that the population will eventually grow expo-
nentially at a speed given by the dominant eigenvalue λ1 and its structure will be given
by the dominant eigenvector u1 (the same is obtained also in equation (49) of section 5.5).
There are two steps in the derivation where exceptions may appear.
1. There may be no ”dominant” eigenvalue, i.e., no single eigenvalue λ1 that is greatest
in absolute value. If there are several eigenvalues of the same absolute value, then
all terms corresponding to these in equation (37) are of the same magnitude and
cannot be neglected to arrive at the single term in equation (38). This is the
case if the largest eigenvalue is a multiple root, or if there are different eigenvalues
that equal in absolute value (such as 1 and -1, for example). Both possibilities are
degenerate (i.e., exceptional in a mathematical sense), but the latter does show up in
interesting biological systems; this will explain the oscillating dynamics of biennial
plants, numerically investigated in an exercise of section 5.1.1.
2. The number α1 in equation (38) may be zero. In this case, the whole term is of
course zero rather than being the largest; and therefore some other terms of equa-
tion (37) that have been neglected in (38) can in fact not be neglected. Note
however that even a small (but non-zero) α1 is sufficient for the approximation (38)
to be valid, because the powers (λ2/λ1)^t etc. in equation (37) become overwhelmingly
small when t is large.
The number α1 depends on the initial population vector N(0). In some models, α1 is
positive for all possible initial vectors; but in other systems, there are initial vectors
that make α1 zero. An example for the latter is an age-structured population where
there are post-reproductive ages, and the initial population contains only individuals
who will no longer reproduce. Obviously, this initial population must go extinct even
if other initial populations can grow with the same projection matrix. In section
5.5, we saw that α1 is the weighted sum of the elements of the initial vector N(0),
and the weights are given by the dominant left eigenvector. If the dominant left
eigenvector contains only positive numbers, then the weighted sum must also be
positive, and therefore α1 is positive (assuming that at least one element of N(0)
is positive, but otherwise there is no population to start with). If however the
dominant left eigenvector contains some zeros, then we can initiate the population
with individuals in only those states that have zero elements in the dominant left
eigenvector and then α1 is zero.
To see when we can exclude the above two exceptions from exponential growth with
stable population structure, we need to classify the matrix population models into three
categories. This classification can be done just by inspecting two properties of the life
cycle graph.
• We say that the life cycle graph and the corresponding projection matrix is irre-
ducible if it is possible to get from any state to any state following the arrows of the
life cycle graph. For example, the life cycle graph of an age-structured population
in Figure 9 is not irreducible (i.e., it is reducible) because from state 3, it is not
possible to get back to state 1 or to state 2. If however we delete the grey part of
the figure (by assuming P2 = 0), then the remaining life cycle graph is irreducible.
Figure 9: With a post-reproductive age class (age 3), the life cycle graph is reducible. If
the grey part of the figure is deleted, then the remainder is irreducible.
• For irreducible graphs, we need to find all ”loops” (sequences of arrows along which
we get back to the state we started from) and record their length (the number of
arrows followed within the loop). Let I denote the greatest common divisor of all
loop lengths (I is called the index of imprimitivity). If I = 1, then the life cycle
graph and the projection matrix is primitive; otherwise it is imprimitive (”not prim-
itive”). In the life cycle graph shown in Figure 10, there are three loops: 1 → 2 → 1
(length 2), 1 → 2 → 3 → 4 → 1 (length 4), and, in grey, 1 → 2 → 3 → 1 (length
3). The greatest common divisor of the loop lengths 2, 3, 4 is I = 1, hence the life
cycle graph is primitive. If however we delete the grey arrow from Figure 10 (by
assuming F3 = 0), then the remaining loops are of lengths 2 and 4, the greatest
common divisor of which is I = 2. Hence without the grey arrow, the life cycle
graph is imprimitive.
Figure 10: With all arrows present, this age-structured population has a primitive life
cycle graph. If however the grey arrow is absent, the life cycle graph is imprimitive.
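Primitivity can also be checked numerically without listing the loops, using the fact that a non-negative n × n matrix is primitive exactly when some power of it has all entries strictly positive; by Wielandt's classical bound it suffices to check the power (n − 1)^2 + 1. The sketch below assumes numpy; the fecundities and survival probabilities (F2 = F3 = F4 = 1, P1 = P2 = P3 = 0.5) are chosen arbitrarily so that the life cycle has the loop structure of Figure 10.

import numpy as np

def is_primitive(A):
    # A non-negative matrix is primitive iff A**((n-1)**2 + 1) is strictly positive
    n = A.shape[0]
    k = (n - 1) ** 2 + 1
    return bool(np.all(np.linalg.matrix_power(A, k) > 0))

# life cycle of Figure 10 with all arrows present (loops of lengths 2, 3 and 4)
A_full = np.array([[0.0, 1.0, 1.0, 1.0],
                   [0.5, 0.0, 0.0, 0.0],
                   [0.0, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.0]])
print(is_primitive(A_full))      # True: greatest common divisor of loop lengths is 1

# deleting the grey arrow (F3 = 0) leaves loops of lengths 2 and 4 only
A_imprim = A_full.copy()
A_imprim[0, 2] = 0.0
print(is_primitive(A_imprim))    # False: the life cycle graph is imprimitive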
The above two properties classify all projection matrices into three categories: (i)
primitive matrices that are irreducible and have I = 1; (ii) irreducible but imprimitive
matrices (irreducible but I > 1); and (iii) reducible matrices (here I is irrelevant). The
Perron-Frobenius theorem, which we describe in the remainder of this section, character-
izes the eigenvalues and eigenvectors of these three categories of projection matrices. As
we have seen above, the eigenvalues and eigenvectors determine in turn the long-term
dynamics of matrix population models. The Perron-Frobenius theorem is valid for non-
negative matrices, i.e., for matrices all elements of which are positive or zero. Projection
matrices of discrete-time matrix models contain life history parameters such as fecun-
dities and probabilities of survival, transition probabilities between various states and
similar quantities that are non-negative by their nature. Hence projection matrices are
non-negative matrices and fall under the Perron-Frobenius theorem.
(i) Primitive matrices. If the projection matrix is primitive, then it has a dominant
eigenvalue which is a strictly positive real number, a simple (not multiple) root
of the characteristic equation and strictly larger in absolute value than any other
eigenvalue. All elements of the corresponding dominant eigenvector are strictly
positive (no zeros), and the same holds also for the dominant left eigenvector.
These facts imply that neither of the two problems described at the beginning of
this section can occur. For primitive matrices, there is a single dominant eigenvalue,
hence the first exception is excluded; and because the dominant left eigenvector has
no zeros but only positive elements, the possibility that α1 could be zero is also
excluded (see the details above).
In summary, if the projection matrix is primitive, then the population vector con-
verges to the stable structure given by the dominant eigenvector and afterwards
grows exponentially with the dominant eigenvalue. The convergence occurs for any
possible initial population vector that has at least one positive element (otherwise
the initial population does not exist). At the stable structure, no state is empty
(because all elements of the dominant eigenvector are positive)8 .
(ii) Irreducible but imprimitive matrices. Irreducible but imprimitive matrices have a
strictly positive eigenvalue such that no eigenvalue is greater in absolute value, but
have another I − 1 eigenvalues that have the same absolute value (with I > 2, some
of these are complex eigenvalues). The dynamics of these matrix models oscillate
with period I; the simplest example is the dynamics of biennial plants, where I = 2
and the population exhibits cycles of period 2.
The best way to analyze these models is to follow the dynamics not every year but
every I years (i.e., to use N(t + I) = AI N(t)). In this way, we don’t see the oscil-
lations (for example if the population has cycles over I = 2 years, then odd years
are the same and even years are the same; if we census the population in only even
years, then we see only one type of year). A^I, the matrix projecting over I years,
is primitive, so that we can analyze the long-term trends as above, and can ”fill
in” the oscillations afterwards.
(iii) Reducible matrices. With reducible matrices, both exceptions detailed at the begin-
ning of this section can occur. One can try to isolate a part of the life cycle graph
^8 This is sometimes called the strong ergodic theorem of matrix models. ”Ergodic” means that the
dynamics becomes independent of the initial vector. ”Strong ergodic” means that it converges to a definite
structure, given by the dominant eigenvector; in temporally fluctuating environments, for example, the
dynamics still forgets its initial state but because of the fluctuations, there is no single definite structure
to which the dynamics converges.
that is irreducible and there are no arrows pointing inwards from other parts. (Out-
going arrows are no problem because they correspond to losses similar to mortality,
but incoming arrows represent an unknown amount of inflow from other parts of the
graph.) Such an isolated part can be analyzed separately. For example in Figure 9,
we can cut the post-reproductive age 3 and analyze the rest of the graph. Once the
dynamics of age classes 1-2 are known, we can calculate the number of 3-year olds
simply as the number of survivors from the previous year’s 2-year olds.