
Lecture Notes

in
Population Genetics

Kent E. Holsinger
Department of Ecology & Evolutionary Biology, U-3043
University of Connecticut
Storrs, CT 06269-3043
© 2001-2015 Kent E. Holsinger

Creative Commons License


These notes are licensed under the Creative Commons Attribution License. To view a copy of this
license, visit https://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons,
559 Nathan Abbott Way, Stanford, California 94305, USA.

Contents

Preface

I The genetic structure of populations

1 Genetic transmission in populations
2 The Hardy-Weinberg Principle and estimating allele frequencies
3 Inbreeding and self-fertilization
4 Testing Hardy-Weinberg
5 Analyzing the genetic structure of populations
6 Analyzing the genetic structure of populations: a Bayesian approach
7 Analyzing the genetic structure of populations: individual assignment
8 Two-locus population genetics

II The genetics of natural selection

9 The Genetics of Natural Selection
10 Estimating viability
11 Selection at one locus with many alleles, fertility selection, and sexual selection

III Genetic drift

12 Genetic Drift
13 Mutation, Migration, and Genetic Drift
14 Selection and genetic drift
15 The Coalescent

IV Quantitative genetics

16 Introduction to quantitative genetics
17 Resemblance among relatives
18 Evolution of quantitative traits
19 Selection on multiple characters
20 Association mapping: the background from two-locus genetics

V Molecular evolution

21 Introduction to molecular population genetics
22 Patterns of nucleotide and amino acid substitution

VI Phylogeography

23 AMOVA and Statistical phylogeography
24 Population genomics
25 Genetic structure of human populations in Great Britain
Preface

Acknowledgments
I’ve used various versions of these notes in my graduate course on population genetics
http://darwin.eeb.uconn.edu/eeb348 since 2001. Some of them date back even earlier than
that. Several generations of students and teaching assistants have found errors and helped
me to find better ways of explaining arcane concepts. In addition, the following people have
found various errors and helped me to correct them.

Brian Cady
Nora Mitchell
Rachel Prunier
Uzay Sezen
Robynn Shannon
Jennifer Steinbachs
Kathryn Theiss
Yufeng Wu

I am indebted to everyone who has found errors or suggested better ways of explaining
concepts, but don’t blame them for any errors that are left. Those are all mine.

Part I

The genetic structure of populations

Chapter 1

Genetic transmission in populations

Mendel’s rules describe how genetic transmission happens between parents and offspring.
Consider a monohybrid cross:
A1 A2 × A1 A2

1/4 A1A1 : 1/2 A1A2 : 1/4 A2A2

Population genetics describes how genetic transmission happens between a population of
parents and a population of offspring. Consider the following data from the Est-3 locus of
Zoarces viviparus:1
                        Genotype of offspring
Maternal genotype    A1A1    A1A2    A2A2
A1A1                  305     516       -
A1A2                  459    1360     877
A2A2                    -     877    1541

This table describes, empirically, the relationship between the genotypes of mothers and the
genotypes of their offspring. We can also make some inferences about the genotypes of the
fathers in this population, even though we didn’t see them.

1. 305 out of 821 male gametes that fertilized eggs from A1 A1 mothers carried the A1
allele (37%).
2. 877 out of 2418 male gametes that fertilized eggs from A2 A2 mothers carried the A1
allele (36%).
1. from [10]
Question How many of the 2,696 male gametes that fertilized eggs from A1 A2 mothers
carried the A1 allele?
Recall We don’t know the paternal genotypes or we wouldn’t be asking this question.

• There is no way to tell which of the 1360 A1 A2 offspring received A1 from their
mother and which from their father.
• Regardless of what the genotype of the father is, half of the offspring of a het-
erozygous mother will be heterozygous.2
• Heterozygous offspring of heterozygous mothers contain no information about
the frequency of A1 among fathers, so we don’t bother to include them in our
calculations.

Rephrase How many of the 1336 homozygous progeny of heterozygous mothers received
an A1 allele from their father?
Answer 459 out of 1336 (34%)
New question How many of the offspring where the paternal contribution can be identified
received an A1 allele from their father?
Answer (305 + 459 + 877) out of (305 + 459 + 877 + 516 + 877 + 1541) or 1641 out of
4575 (36%)
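The tallies above are easy to reproduce programmatically. Here is a short sketch (in Python, purely for illustration; the course software is R and JAGS) that recovers the paternal allele frequency from the offspring table:

```python
# Offspring counts by maternal genotype, from the Est-3 table above.
# Keys: (maternal genotype, offspring genotype) -> count
counts = {
    ("A1A1", "A1A1"): 305, ("A1A1", "A1A2"): 516,
    ("A1A2", "A1A1"): 459, ("A1A2", "A1A2"): 1360, ("A1A2", "A2A2"): 877,
    ("A2A2", "A1A2"): 877, ("A2A2", "A2A2"): 1541,
}

# Offspring in which the paternal gamete is identifiable; A1A2 offspring
# of A1A2 mothers are excluded because the paternal allele is ambiguous.
informative = {k: v for k, v in counts.items() if k != ("A1A2", "A1A2")}

# The paternal gamete carried A1 when an A1A1 mother had an A1A1 offspring,
# an A1A2 mother had an A1A1 offspring, or an A2A2 mother had an A1A2 offspring.
paternal_a1 = (counts[("A1A1", "A1A1")] + counts[("A1A2", "A1A1")]
               + counts[("A2A2", "A1A2")])

total = sum(informative.values())
print(paternal_a1, total, round(paternal_a1 / total, 3))  # 1641 4575 0.359
```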

An algebraic formulation of the problem


The above calculations tell us what’s happening for this particular data set, but those of you
who know me know that there has to be a little math coming to describe the situation more
generally. Here it is:
Genotype   Number   Sex
A1A1       F11      female
A1A2       F12      female
A2A2       F22      female
A1A1       M11      male
A1A2       M12      male
A2A2       M22      male
2. Assuming we’re looking at data from a locus that has only two alleles. If there were four alleles at a locus, for example, all of the offspring might be heterozygous.
then
pf = (2F11 + F12) / (2F11 + 2F12 + 2F22)        qf = (2F22 + F12) / (2F11 + 2F12 + 2F22)
pm = (2M11 + M12) / (2M11 + 2M12 + 2M22)        qm = (2M22 + M12) / (2M11 + 2M12 + 2M22) ,
where pf is the frequency of A1 in mothers and pm is the frequency of A1 in fathers.3
Since every individual in the population must have one father and one mother, the
frequency of A1 among offspring is the same in both sexes, namely
p = (1/2)(pf + pm) ,
assuming that all matings have the same average fecundity and that the locus we’re studying
is autosomal.4
Question: Why do those assumptions matter?
Answer: If pf = pm , then the allele frequency among offspring is equal to the allele
frequency in their parents, i.e., the allele frequency doesn’t change from one generation to
the next. This might be considered the First Law of Population Genetics: If no forces act to
change allele frequencies between zygote formation and breeding, allele frequencies will not
change.
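Translated into code, the bookkeeping looks like this (a Python sketch with made-up genotype counts, just to make the formulas concrete):

```python
# Hypothetical genotype counts in females (F) and males (M)
F11, F12, F22 = 30, 50, 20   # A1A1, A1A2, A2A2 females
M11, M12, M22 = 40, 40, 20   # A1A1, A1A2, A2A2 males

# Each individual carries two alleles, so divide by twice the number of individuals
pf = (2 * F11 + F12) / (2 * (F11 + F12 + F22))   # frequency of A1 in mothers
pm = (2 * M11 + M12) / (2 * (M11 + M12 + M22))   # frequency of A1 in fathers

# Every offspring receives one maternal and one paternal gamete
p = (pf + pm) / 2
print(pf, pm, p)
```

With these counts pf = 0.55, pm = 0.60, and the offspring frequency is p = 0.575 in both sexes.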

Zero force laws


This is an example of what philosophers call a zero force law. Zero force laws play a very
important role in scientific theories, because we can’t begin to understand what a force does
until we understand what would happen in the absence of any forces. Consider Newton’s
famous dictum:

An object in motion tends to remain in motion in a straight line. An object at
rest tends to remain at rest.

or (as you may remember from introductory physics)5

F = ma .
3. qf = 1 − pf and qm = 1 − pm as usual.
4. And that there are enough offspring produced that we can ignore genetic drift. Have you noticed that I have a fondness for footnotes? You’ll see a lot more before the semester is through, and you’ll soon discover that most of my weak attempts at humor are buried in them.
5. Don’t worry if you’re not good at physics. I’m probably worse. What I’m about to tell you is almost the only thing about physics I can remember.
If we observe an object accelerating, we can immediately infer that a force is acting on it,
and we can infer something about the magnitude of that force. However, if an object is
not accelerating, we cannot conclude that no forces are acting. It might be that opposing
forces act on the object in such a way that there is no net force. Acceleration is a
sufficient condition to infer that a force is operating on an object, but it is not necessary.
What we might call the “First Law of Population Genetics” is analogous to Newton’s
First Law of Motion:

If all genotypes at a particular locus have the same average fecundity and the
same average chance of being included in the breeding population, allele frequen-
cies in the population will remain constant.

For the rest of the semester we’ll be learning about the forces that cause allele frequencies to
change and learning how to infer the properties of those forces from the changes that they
induce. But you must always remember that while we can infer that some evolutionary force
is present if allele frequencies change from one generation to the next, we cannot infer the
absence of a force from a lack of allele frequency change.

Chapter 2

The Hardy-Weinberg Principle and


estimating allele frequencies

To keep things relatively simple, we’ll spend much of our time in this course talking about
variation at a single genetic locus, even though alleles at many different loci are involved in
expression of most morphological or physiological traits. We’ll spend about three weeks in
mid-October studying the genetics of quantitative variation, but until then you can assume
that I’m talking about variation at a single locus unless I specifically say otherwise.

The genetic composition of populations


When I talk about the genetic composition of a population, I’m referring to three aspects of
variation within that population:1
1. The number of alleles at a locus.
2. The frequency of alleles at the locus.
3. The frequency of genotypes at the locus.
It may not be immediately obvious why we need both (2) and (3) to describe the genetic
composition of a population, so let me illustrate with two hypothetical populations:
               A1A1   A1A2   A2A2
Population 1     50      0     50
Population 2     25     50     25
1. At each locus I’m talking about. Remember, I’m only talking about one locus at a time, unless I specifically say otherwise. We’ll see why this matters when we get to two-locus genetics in a few weeks.
It’s easy to see that the frequency of A1 is 0.5 in both populations,2 but the genotype
frequencies are very different. In point of fact, we don’t need both genotype and allele
frequencies. We can always calculate allele frequencies from genotype frequencies, but we
can’t do the reverse unless . . .
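A quick check (in Python, purely for illustration) that the two hypothetical populations share the same allele frequency despite their very different genotype frequencies:

```python
# Genotype counts (A1A1, A1A2, A2A2) for the two hypothetical populations above
pops = {"Population 1": (50, 0, 50), "Population 2": (25, 50, 25)}

freqs = {}
for name, (n11, n12, n22) in pops.items():
    n_alleles = 2 * (n11 + n12 + n22)        # every individual carries two alleles
    freqs[name] = (2 * n11 + n12) / n_alleles  # frequency of A1
print(freqs)   # both populations: 0.5
```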

Derivation of the Hardy-Weinberg principle


We saw last time using the data from Zoarces viviparus that we can describe empirically and
algebraically how genotype frequencies in one generation are related to genotype frequencies
in the next. Let’s explore that a bit further. To do so we’re going to use a technique that is
broadly useful in population genetics, i.e., we’re going to construct a mating table. A mating
table consists of three components:

1. A list of all possible genotype pairings.


2. The frequency with which each genotype pairing occurs.
3. The genotypes produced by each pairing.

                                Offspring genotype
Mating            Frequency     A1A1    A1A2    A2A2
A1A1 × A1A1       x11^2            1       0       0
A1A1 × A1A2       x11 x12        1/2     1/2       0
A1A1 × A2A2       x11 x22          0       1       0
A1A2 × A1A1       x12 x11        1/2     1/2       0
A1A2 × A1A2       x12^2          1/4     1/2     1/4
A1A2 × A2A2       x12 x22          0     1/2     1/2
A2A2 × A1A1       x22 x11          0       1       0
A2A2 × A1A2       x22 x12          0     1/2     1/2
A2A2 × A2A2       x22^2            0       0       1

Believe it or not, in constructing this table we’ve already made three assumptions about the
transmission of genetic variation from one generation to the next:

Assumption #1 Genotype frequencies are the same in males and females, e.g., x11 is the
frequency of the A1 A1 genotype in both males and females.3
2. p1 = 2(50)/200 = 0.5, p2 = (2(25) + 50)/200 = 0.5.
3. It would be easy enough to relax this assumption, but it makes the algebra more complicated without providing any new insight, so we won’t bother with relaxing it unless someone asks.
Assumption #2 Genotypes mate at random with respect to their genotype at this partic-
ular locus.

Assumption #3 Meiosis is fair. More specifically, we assume that there is no segregation
distortion; no gamete competition; no differences in the developmental ability of eggs,
or the fertilization ability of sperm.4 It may come as a surprise to you, but there are
alleles at some loci in some organisms that subvert the Mendelian rules, e.g., the t
allele in house mice, segregation distorter in Drosophila melanogaster, and spore killer
in Neurospora crassa. A pair of papers describing work in Neurospora just appeared a
couple of years ago [29, 72].

Now that we have this table we can use it to calculate the frequency of each genotype in
newly formed zygotes in the population,5 provided that we’re willing to make three additional
assumptions:

Assumption #4 There is no input of new genetic material, i.e., gametes are produced
without mutation, and all offspring are produced from the union of gametes within
this population, i.e., no migration from outside the population.

Assumption #5 The population is of infinite size so that the actual frequency of matings
is equal to their expected frequency and the actual frequency of offspring from each
mating is equal to the Mendelian expectations.

Assumption #6 All matings produce the same number of offspring, on average.

Taking these three assumptions together allows us to conclude that the frequency of a par-
ticular genotype in the pool of newly formed zygotes is
Σ (frequency of mating)(frequency of genotype produced by the mating) .

So

freq.(A1A1 in zygotes) = x11^2 + (1/2) x11 x12 + (1/2) x12 x11 + (1/4) x12^2
                       = x11^2 + x11 x12 + (1/4) x12^2
                       = (x11 + x12/2)^2
                       = p^2
freq.(A1A2 in zygotes) = 2pq
freq.(A2A2 in zygotes) = q^2

4. We are also assuming that we’re looking at offspring genotypes at the zygote stage, so that there hasn’t been any opportunity for differential survival.
5. Not just the offspring from these matings.

Those frequencies probably look pretty familiar to you. They are, of course, the familiar
Hardy-Weinberg proportions. But we’re not done yet. In order to say that these proportions
will also be the genotype proportions of adults in the progeny generation, we have to make
two more assumptions:

Assumption #7 Generations do not overlap.

Assumption #8 There are no differences among genotypes in the probability of survival.

The Hardy-Weinberg principle


After a single generation in which all eight of the above assumptions are satisfied

freq.(A1A1 in zygotes) = p^2     (2.1)
freq.(A1A2 in zygotes) = 2pq     (2.2)
freq.(A2A2 in zygotes) = q^2     (2.3)

It’s vital to understand the logic here.

1. If Assumptions #1–#8 are true, then equations 2.1–2.3 must be true.

2. If genotypes are in Hardy-Weinberg proportions, one or more of Assumptions #1–#8
may still be violated.

3. If genotypes are not in Hardy-Weinberg proportions, one or more of Assumptions #1–#8
must be false.

4. Assumptions #1–#8 are sufficient for Hardy-Weinberg to hold, but they are not nec-
essary for Hardy-Weinberg to hold.

Point (3) is why the Hardy-Weinberg principle is so important. There isn’t a population
of any organism anywhere in the world that satisfies all 8 assumptions, even for a single
generation.6 But all possible evolutionary forces within populations cause a violation of at
least one of these assumptions. Departures from Hardy-Weinberg are one way in which we
can detect those forces and estimate their magnitude.7
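The mating-table algebra can also be checked numerically. Here is a sketch (Python, purely illustrative; the starting genotype frequencies are arbitrary made-up numbers) that forms all nine random matings, applies fair Mendelian segregation, and confirms that zygotes land in Hardy-Weinberg proportions after one generation:

```python
from itertools import product

# Arbitrary (made-up) parental genotype frequencies, keyed by number of A1 alleles:
# x[2] = freq(A1A1), x[1] = freq(A1A2), x[0] = freq(A2A2)
x = {2: 0.2, 1: 0.5, 0: 0.3}

def gamete_dist(g):
    """Fair meiosis: probability a parent with g copies of A1 transmits A1 (or A2)."""
    return {1: g / 2, 0: 1 - g / 2}

# Random mating: every mother-father genotype pair occurs in proportion
# to the product of the two genotype frequencies
zygote = {0: 0.0, 1: 0.0, 2: 0.0}
for (gm, fm), (gf, ff) in product(x.items(), x.items()):
    for am, wm in gamete_dist(gm).items():
        for af, wf in gamete_dist(gf).items():
            zygote[am + af] += fm * ff * wm * wf

p = x[2] + x[1] / 2          # frequency of A1 among parents
q = 1 - p
print(zygote[2], zygote[1], zygote[0])   # ≈ p^2, 2pq, q^2 (0.2025, 0.495, 0.3025)
```

Changing the starting frequencies changes p, but the zygotes always come out as p^2, 2pq, q^2, which is the point of the derivation.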

Estimating allele frequencies


Before we can determine whether genotypes in a population are in Hardy-Weinberg propor-
tions, we need to be able to estimate the frequency of both genotypes and alleles. This is
easy when you can identify all of the alleles within genotypes, but suppose that we’re trying
to estimate allele frequencies in the ABO blood group system in humans. Then we have a
situation that looks like this:

Phenotype       A        AB     B        O
Genotype(s)     aa, ao   ab     bb, bo   oo
No. in sample   NA       NAB    NB       NO

Now we can’t directly count the number of a, b, and o alleles. What do we do? Well,
more than 50 years ago, some geneticists figured out how with a method they called “gene
counting” [9] and that statisticians later generalized for a wide variety of purposes and called
the EM algorithm [14]. It uses a trick you’ll see repeatedly through this course. When we
don’t know something we want to know, we pretend that we know it and do some calculations
with it. If we’re lucky, we can fiddle with our calculations a bit to relate the thing that we
pretended to know to something we actually do know so we can figure out what we wanted
to know. Make sense? Probably not. But let’s try an example.
If we knew pa , pb , and po , we could figure out how many individuals with the A phenotype
have the aa genotype and how many have the ao genotype, namely
Naa = nA [pa^2 / (pa^2 + 2 pa po)]
Nao = nA [2 pa po / (pa^2 + 2 pa po)] .
6. There may be some that come reasonably close, but none that fulfill them exactly. There aren’t any populations of infinite size, for example.
7. Actually, there’s a ninth assumption that I didn’t mention. Everything I said here depends on the assumption that the locus we’re dealing with is autosomal. We can talk about what happens with sex-linked loci, if you want. But again, mostly what we get is algebraic complications without a lot of new insight.
Obviously we could do the same thing for the B phenotype:
Nbb = nB [pb^2 / (pb^2 + 2 pb po)]
Nbo = nB [2 pb po / (pb^2 + 2 pb po)] .
Notice that Nab = NAB and Noo = NO (lowercase subscripts refer to genotypes, uppercase
to phenotypes). If we knew all this, then we could calculate pa , pb , and po from
pa = (2Naa + Nao + Nab) / 2N
pb = (2Nbb + Nbo + Nab) / 2N
po = (2Noo + Nao + Nbo) / 2N ,
where N is the total sample size.
Surprisingly enough we can actually estimate the allele frequencies by using this trick.
Just take a guess at the allele frequencies. Any guess will do. Then calculate Naa , Nao ,
Nbb , Nbo , Nab , and Noo as described in the preceding paragraph.8 That’s the Expectation
part the EM algorithm. Now take the values for Naa , Nao , Nbb , Nbo , Nab , and Noo that
you’ve calculated and use them to calculate new values for the allele frequencies. That’s
the Maximization part of the EM algorithm. It’s called “maximization” because what
you’re doing is calculating maximum-likelihood estimates of the allele frequencies, given the
observed (and made up) genotype counts.9 Chances are your new values for pa , pb , and po
won’t match your initial guesses, but10 if you take these new values and start the process
over and repeat the whole sequence several times, eventually the allele frequencies you get
out at the end match those you started with. These are maximum-likelihood estimates of
the allele frequencies.11
Consider the following example:12
Phenotype       A    AB    B    O
No. in sample   25   50   25   15
8. Chances are Naa, Nao, Nbb, and Nbo won’t be integers. That’s OK. Pretend that there really are fractional animals or plants in your sample and proceed.
9. If you don’t know what maximum-likelihood estimates are, don’t worry. We’ll get to that in a moment.
10. Yes, truth is sometimes stranger than fiction.
11. I should point out that this method assumes that genotypes are found in Hardy-Weinberg proportions.
12. This is the default example available in the Java applet at http://darwin.eeb.uconn.edu/simulations/em-abo.html.
We’ll start with the guess that pa = 0.33, pb = 0.33, and po = 0.34. With that assumption
we would calculate that 25(0.33^2/(0.33^2 + 2(0.33)(0.34))) = 8.168 of the A phenotypes in
the sample have genotype aa, and the remaining 16.832 have genotype ao. Similarly, we can
calculate that 8.168 of the B phenotypes in the population sample have genotype bb, and the
remaining 16.832 have genotype bo. Now that we have a guess about how many individuals
of each genotype we have,13 we can calculate a new guess for the allele frequencies, namely
pa = 0.362, pb = 0.362, and po = 0.277. By the time we’ve repeated this process four more
times, the allele frequencies aren’t changing anymore. So the maximum likelihood estimate
of the allele frequencies is pa = 0.372, pb = 0.372, and po = 0.256.
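The gene-counting iteration is short enough to write out in full. Here is a sketch in Python (for illustration only; the course software is R and JAGS, but the algorithm is the EM scheme described above, with the data and starting values of the example):

```python
def em_abo(n_a, n_ab, n_b, n_o, n_iter=50):
    """EM ('gene counting') estimates of ABO allele frequencies."""
    n = n_a + n_ab + n_b + n_o
    pa, pb, po = 0.33, 0.33, 0.34          # initial guess; any guess will do
    for _ in range(n_iter):
        # E step: expected genotype counts, given current allele frequencies
        naa = n_a * pa**2 / (pa**2 + 2 * pa * po)
        nao = n_a - naa
        nbb = n_b * pb**2 / (pb**2 + 2 * pb * po)
        nbo = n_b - nbb
        # M step: allele frequencies implied by those genotype counts
        pa = (2 * naa + nao + n_ab) / (2 * n)
        pb = (2 * nbb + nbo + n_ab) / (2 * n)
        po = (2 * n_o + nao + nbo) / (2 * n)
    return pa, pb, po

pa, pb, po = em_abo(25, 50, 25, 15)        # the example data above
print(round(pa, 3), round(pb, 3), round(po, 3))  # 0.372 0.372 0.256
```

Fifty iterations is overkill here; as noted in the text, the frequencies stop changing (to three decimals) after about five.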

What is a maximum-likelihood estimate?


I just told you that the method I described produces “maximum-likelihood estimates” for
the allele frequencies, but I haven’t told you what a maximum-likelihood estimate is. The
good news is that you’ve been using maximum-likelihood estimates for as long as you’ve been
estimating anything, without even knowing it. Although it will take me awhile to explain
it, the idea is actually pretty simple.
Suppose we had a sock drawer with two colors of socks, red and green. And suppose
we were interested in estimating the proportion of red socks in the drawer. One way of
approaching the problem would be to mix the socks well, close our eyes, take one sock from
the drawer, record its color and replace it. Suppose we do this N times. We know that the
number of red socks we’ll get might be different the next time, so the number of red socks
we get is a random variable. Let’s call it K. Now suppose in our actual experiment we find
k red socks, i.e., K = k. If we knew p, the proportion of red socks in the drawer, we could
calculate the probability of getting the data we observed, namely
P(K = k|p) = (N choose k) p^k (1 − p)^(N − k) .    (2.4)

This is the binomial probability distribution. The part on the left side of the equation is
read as “The probability that we get k red socks in our sample given the value of p.” The
word “given” means that we’re calculating the probability of our data conditional on the
(unknown) value p.
Of course we don’t know p, so what good does writing (2.4) do? Well, suppose we
reverse the question to which equation (2.4) is an answer and call the expression in (2.4)
the “likelihood of the data.” Suppose further that we find the value of p that makes the
13. Since we’re making these genotype counts up, we can also pretend that it makes sense to have fractional numbers of genotypes.
likelihood bigger than any other value we could pick.14 Then p̂ is the maximum-likelihood
estimate of p.15
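The maximization can also be done numerically, which makes the idea concrete. A Python sketch (for illustration; the sock counts N = 20 and k = 7 are made up) evaluates the binomial likelihood on a grid of candidate values for p and picks the biggest:

```python
from math import comb

N, k = 20, 7                      # hypothetical: 7 red socks in 20 draws

def likelihood(p):
    """Binomial probability of k red socks out of N, given p."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

grid = [i / 1000 for i in range(1001)]         # candidate values of p
p_hat = max(grid, key=likelihood)
print(p_hat)                                    # 0.35, i.e., k/N
```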
In the case of the ABO blood group that we just talked about, the likelihood is a bit
more complicated
(N choose NA, NAB, NB, NO) (pa^2 + 2 pa po)^NA (2 pa pb)^NAB (pb^2 + 2 pb po)^NB (po^2)^NO    (2.5)

This is a multinomial probability distribution. It turns out that one way to find the values
of pa , pb , and po is to use the EM algorithm I just described.16

An introduction to Bayesian inference


Maximum-likelihood estimates have a lot of nice features, but likelihood is a slightly back-
wards way of looking at the world. The likelihood of the data is the probability of the data,
x, given parameters that we don’t know, φ, i.e, P(x|φ). It seems a lot more natural to think
about the probability that the unknown parameter takes on some value, given the data, i.e.,
P(φ|x). Surprisingly, these two quantities are closely related. Bayes’ Theorem tells us that

P(φ|x) = P(x|φ)P(φ) / P(x) .    (2.6)

We refer to P(φ|x) as the posterior distribution of φ, i.e., the probability that φ takes on
a particular value given the data we’ve observed, and to P(φ) as the prior distribution of
φ, i.e., the probability that φ takes on a particular value before we’ve looked at any data.
Notice how the relationship in (2.6) mimics the logic we use to learn about the world in
everyday life. We start with some prior beliefs, P(φ), and modify them on the basis of data
or experience, P(x|φ), to reach a conclusion, P(φ|x). That’s the underlying logic of Bayesian
inference.17
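A toy example may help make the mechanics of (2.6) concrete. Everything in this Python sketch is made up: we entertain just two candidate values for a binomial parameter p, give them equal prior weight, and update on 7 successes in 20 trials:

```python
from math import comb

N, k = 20, 7                       # hypothetical data
prior = {0.3: 0.5, 0.6: 0.5}       # two candidate values of p, equal prior belief

def likelihood(p):                 # P(x|p): the binomial likelihood, as in (2.4)
    return comb(N, k) * p**k * (1 - p)**(N - k)

# Bayes' theorem: posterior ∝ likelihood × prior; P(x) is the normalizing sum
unnorm = {p: likelihood(p) * w for p, w in prior.items()}
p_x = sum(unnorm.values())
posterior = {p: u / p_x for p, u in unnorm.items()}
print(posterior)   # most of the posterior weight shifts onto p = 0.3
```

Observing 7/20 ≈ 0.35 successes moves almost all of the belief onto the candidate value closer to the data, which is exactly the prior-to-posterior updating the text describes.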

14. Technically, we treat P(K = k|p) as a function of p, find the value of p that maximizes it, and call that value p̂.
15. You’ll be relieved to know that in this case, p̂ = k/N.
16. There’s another way I’d be happy to describe if you’re interested, but it’s a lot more complicated.
17. If you’d like a little more information on why a Bayesian approach makes sense, you might want to take a look at my lecture notes from the Summer Institute in Statistical Genetics.
Estimating allele frequencies with two alleles
Let’s suppose we’ve collected data from a population of Protea repens18 and have found 7
copies of the fast allele at an enzyme locus encoding glucose-phosphate isomerase in a
sample of 20 alleles. We want to estimate the frequency of the fast allele. The maximum-likelihood
estimate is 7/20 = 0.35, which we got by finding the value of p that maximizes
P(k|N, p) = (N choose k) p^k (1 − p)^(N − k) ,

where N = 20 and k = 7. A Bayesian uses the same likelihood, but has to specify a prior
distribution for p. If we didn’t know anything about the allele frequency at this locus in P.
repens before starting the study, it makes sense to express that ignorance by choosing P(p)
to be a uniform random variable on the interval [0, 1]. That means we regarded all values of
p as equally likely prior to collecting the data.19
Until about 25 years ago20 it was necessary to do a bunch of complicated calculus to
combine the prior with the likelihood to get a posterior. Since the early 1990s statisticians
have used a simulation approach, Monte Carlo Markov Chain sampling, to construct numer-
ical samples from the posterior. For the problems encountered in this course, we’ll mostly
be using the freely available software package JAGS to implement Bayesian analyses. For the
problem we just encountered, here’s the code that’s needed to get our results:21

model {

# likelihood
k ~ dbin(p, N)

# prior
p ~ dunif(0,1)

}
18. A few of you may recognize that I didn’t choose that species entirely at random, even though the “data” I’m presenting here are entirely fanciful.
19. If we had prior information about the likely values of p, we’d pick a different prior distribution to reflect our prior information. See the Summer Institute notes for more information, if you’re interested.
20. OK, I realize that 25 years ago was before most of you were born, but I was already teaching population genetics then. Cut me a little slack.
21. This code and other JAGS code used in the course can be found on the course web site by following the links associated with the corresponding lecture.
Running this in JAGS with k = 7 and n = 20 produces these results:22

> source("binomial.R")
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph Size: 5

Initializing model

|**************************************************| 100%
Inference for Bugs model at "binomial.txt", fit using jags,
5 chains, each with 2000 iterations (first 1000 discarded)
n.sims = 5000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
p 0.363 0.099 0.187 0.290 0.358 0.431 0.567 1.001 3800
deviance 4.289 1.264 3.382 3.487 3.817 4.579 7.909 1.001 3100

For each parameter, n.eff is a crude measure of effective sample size,


and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)


pD = 0.8 and DIC = 5.1
DIC is an estimate of expected predictive error (lower deviance is better).
>

The column headings should be fairly self-explanatory, except for the one labeled MC
error.23 mu.vect is the posterior mean. It’s our best guess of the value for the frequency of the
fast allele. sd.vect is the posterior standard deviation. It’s our best guess of the uncertainty
associated with our estimate of the frequency of the fast allele. The 2.5%, 50%, and 97.5%
columns are the percentiles of the posterior distribution. The [2.5%, 97.5%] interval is the
95% credible interval, which is analogous to the 95% confidence interval in classical statistics,
except that we can say that there’s a 95% chance that the frequency of the fast allele lies
within this interval.24 Since the results are from a simulation, different runs will produce
slightly different results. In this case, we have a posterior mean of about 0.36 (as opposed
22. Nora will show you how to run JAGS through R in lab.
23. If you’re interested in what MC error means, ask. Otherwise, I don’t plan to say anything about it.
24. If you don’t understand why that’s different from a standard confidence interval, ask me about it.
to the maximum-likelihood estimate of 0.35), and there is a 95% chance that p lies in the
interval [0.19, 0.57].25
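This particular model also has a closed-form answer, which makes a useful cross-check on the simulation: a uniform prior combined with a binomial likelihood gives a Beta(k + 1, N − k + 1) posterior. (That conjugacy is a standard result, not anything JAGS-specific.) A Python sketch:

```python
N, k = 20, 7                 # the Protea repens sample above

# Uniform prior + binomial likelihood  =>  posterior is Beta(k+1, N-k+1)
a, b = k + 1, N - k + 1      # here Beta(8, 14)
post_mean = a / (a + b)                           # 8/22 ≈ 0.364
post_var = a * b / ((a + b)**2 * (a + b + 1))     # Beta variance
post_sd = post_var ** 0.5                         # ≈ 0.100
print(round(post_mean, 3), round(post_sd, 3))
```

The analytical posterior mean (≈ 0.364) and standard deviation (≈ 0.100) agree with the simulated 0.363 and 0.099 up to Monte Carlo noise.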

Returning to the ABO example


Here’s data from the ABO blood group:26

Phenotype A AB B O Total
Observed 862 131 365 702 2060

To estimate the underlying allele frequencies, pA , pB , and pO , we have to remember how the
allele frequencies map to phenotype frequencies:27

Freq(A)  = pA^2 + 2 pA pO
Freq(AB) = 2 pA pB
Freq(B)  = pB^2 + 2 pB pO
Freq(O)  = pO^2 .

Here’s the JAGS code we use to estimate the allele frequencies:

model {
# likelihood
pi[1] <- p.a*p.a + 2*p.a*p.o
pi[2] <- 2*p.a*p.b
pi[3] <- p.b*p.b + 2*p.b*p.o
pi[4] <- p.o*p.o
x[1:4] ~ dmulti(pi[],n)

# priors
a1 ~ dexp(1)
b1 ~ dexp(1)
o1 ~ dexp(1)
p.a <- a1/(a1 + b1 + o1)
p.b <- b1/(a1 + b1 + o1)
p.o <- o1/(a1 + b1 + o1)

n <- sum(x[])
}

25. See the Summer Institute notes for more details on why the Bayesian estimate of p is different from the maximum-likelihood estimate. Suffice it to say that when you have a reasonable amount of data, the estimates are barely distinguishable. Also, don’t worry about what deviance is or what DIC means for the moment. We’ll get to that later.
26. This is almost the last time! I promise.
27. Assuming genotypes are in Hardy-Weinberg proportions. We’ll relax that assumption later.

The dmulti() is a multinomial probability, a simple generalization of the binomial probability
to samples when there are more than two categories. The priors are some mumbo
jumbo necessary to produce the rough equivalent of uniform [0,1] priors with more than two
alleles.28 sum() is a built-in function that saves me the trouble of calculating the sample size
and ensures that the n in dmulti() is consistent with the individual sample components.
The x=c() produces a vector of counts arranged in the same order as the frequencies in
pi[]. Here are the results:

> source("multinomial.R")
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph Size: 20

Initializing model

|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|**************************************************| 100%
Inference for Bugs model at "multinomial.txt", fit using jags,
5 chains, each with 2000 iterations (first 1000 discarded)
n.sims = 5000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
p.a 0.282 0.008 0.266 0.276 0.282 0.287 0.297 1.001 5000
p.b 0.129 0.005 0.118 0.125 0.129 0.133 0.140 1.001 5000
p.o 0.589 0.008 0.573 0.584 0.589 0.595 0.606 1.001 5000
deviance 27.811 2.007 25.830 26.363 27.229 28.577 33.245 1.001 4400

For each parameter, n.eff is a crude measure of effective sample size,


and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

28. It produces a Dirichlet(1,1,1), if you really want to know.
DIC info (using the rule, pD = var(deviance)/2)
pD = 2.0 and DIC = 29.8
DIC is an estimate of expected predictive error (lower deviance is better).
>

Notice that the posterior means are very close to the maximum-likelihood estimates, but
that we also have 95% credible intervals so that we have an assessment of how reliable the
Bayesian estimates are. Getting them from a likelihood analysis is possible, but it takes a
fair amount of additional work.

Chapter 3

Inbreeding and self-fertilization

Remember that long list of assumptions associated with derivation of the Hardy-Weinberg
principle that I went over a couple of lectures ago? Well, we’re about to begin violating
assumptions to explore the consequences, but we’re not going to violate them in order.
We’re first going to violate Assumption #2:

Genotypes mate at random with respect to their genotype at this particular locus.

There are many ways in which this assumption might be violated:

• Some genotypes may be more successful in mating than others — sexual selection.

• Genotypes that are different from one another may mate more often than expected —
disassortative mating, e.g., self-incompatibility alleles in flowering plants, MHC loci in
humans (the smelly t-shirt experiment) [82].

• Genotypes that are similar to one another may mate more often than expected —
assortative mating.

• Some fraction of the offspring produced may be produced asexually.

• Individuals may mate with relatives — inbreeding.

– self-fertilization
– sib-mating
– first-cousin mating
– parent-offspring mating

– etc.

When there is sexual selection or disassortative mating genotypes differ in their chances
of being included in the breeding population. As a result, allele and genotype frequencies
will tend to change from one generation to the next. We’ll talk a little about these types of
departures from random mating when we discuss the genetics of natural selection in a few
weeks, but we’ll ignore them for now. In fact, we’ll also ignore assortative mating, since its
properties are fairly similar to those of inbreeding, and inbreeding is easier to understand.

Self-fertilization
Self-fertilization is the most extreme form of inbreeding possible, and it is characteristic of
many flowering plants and some hermaphroditic animals, including freshwater snails and
that darling of developmental genetics, Caenorhabditis elegans.1 It’s not too hard to figure
out what the consequences of self-fertilization will be without doing any algebra.

• All progeny of homozygotes are themselves homozygous.

• Half of the progeny of heterozygotes are heterozygous and half are homozygous.

So you might expect that the frequency of heterozygotes would be halved every generation,
and you’d be right. To see why, consider the following mating table:

                                      Offspring genotype
Mating             frequency     A1 A1     A1 A2     A2 A2
A1 A1 × A1 A1      x11           1         0         0
A1 A2 × A1 A2      x12           1/4       1/2       1/4
A2 A2 × A2 A2      x22           0         0         1

Using the same technique we used to derive the Hardy-Weinberg principle, we can calculate
the frequency of the different offspring genotypes from the above table.
1 It could be that it is characteristic of many hermaphroditic animal parasites, but I’m a plant biologist. I know next to nothing about animal mating systems, so I don’t have a good feel for how extensively self-fertilization has been looked for in hermaphroditic animals. You should also know that I lied when I wrote that “self-fertilization is the most extreme form of inbreeding.” The form of self-fertilization I’m going to describe actually isn’t the most extreme form of self-fertilization possible. That honor belongs to gametophytic self-fertilization in homosporous plants. The offspring of gametophytic self-fertilization are uniformly homozygous at every locus in the genome. For more information see [36].

x′11 = x11 + x12 /4          (3.1)
x′12 = x12 /2                (3.2)
x′22 = x22 + x12 /4          (3.3)

I use the ′ to indicate the next generation. Notice that in making this calculation I assume
that all other conditions associated with Hardy-Weinberg apply (meiosis is fair, no differences
among genotypes in probability of survival, no input of new genetic material, etc.). We can
also calculate the frequency of the A1 allele among offspring, namely

p′ = x′11 + x′12 /2               (3.4)
   = x11 + x12 /4 + x12 /4        (3.5)
   = x11 + x12 /2                 (3.6)
   = p                            (3.7)
These equations illustrate two very important principles that are true with any system
of strict inbreeding:
1. Inbreeding does not cause allele frequencies to change, but it will generally cause
genotype frequencies to change.
2. Inbreeding reduces the frequency of heterozygotes relative to Hardy-Weinberg expec-
tations. It need not eliminate heterozygotes entirely, but it is guaranteed to reduce
their frequency.

• Suppose we have a population of hermaphrodites in which x12 = 0.5 and we
  subject it to strict self-fertilization. Assuming that inbred progeny are as likely
  to survive and reproduce as outbred progeny, x12 < 0.01 in six generations and
  x12 < 0.0005 in ten generations.
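Those numbers are easy to verify by iterating equations (3.1)–(3.3). Here is a minimal sketch in Python (the notes use R and JAGS elsewhere, so treat this purely as a numerical illustration):

```python
# Iterate the complete-selfing recursions of equations (3.1)-(3.3),
# starting from a population with x12 = 0.5.
x11, x12, x22 = 0.25, 0.5, 0.25

def self_one_generation(x11, x12, x22):
    """One generation of strict self-fertilization."""
    return (x11 + x12 / 4, x12 / 2, x22 + x12 / 4)

p0 = x11 + x12 / 2  # initial frequency of A1
for gen in range(1, 11):
    x11, x12, x22 = self_one_generation(x11, x12, x22)
    if gen == 6:
        assert x12 < 0.01     # 0.5 / 2^6 ~ 0.0078
    if gen == 10:
        assert x12 < 0.0005   # 0.5 / 2^10 ~ 0.00049

# The allele frequency is untouched even though heterozygosity collapses.
assert abs((x11 + x12 / 2) - p0) < 1e-12
```

Heterozygosity is halved each generation, exactly as the mating table predicts, while p stays fixed.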

Partial self-fertilization
Many plants reproduce by a mixture of outcrossing and self-fertilization. To a population
geneticist that means that they reproduce by a mixture of selfing and random mating.2 Now
2 It would be more accurate to write: “Population geneticists usually model this mixture as a mixture of self-fertilization and random mating. That simple model ignores a lot of complexity in how self-fertilization happens, but it’s a useful approximation for most purposes.”

I’m going to pull a fast one and derive the equations that determine how allele frequencies
change from one generation to the next without using a mating table. To do so, I’m going
to imagine that our population consists of a mixture of two populations. In one part of the
population all of the reproduction occurs through self-fertilization and in the other part all
of the reproduction occurs through random mating. If you think about it for a while, you’ll
realize that this is equivalent to imagining that each plant reproduces some fraction of the
time through self-fertilization and some fraction of the time through random mating.3 Let
σ be the fraction of progeny produced through self-fertilization, then

x′11 = p²(1 − σ) + (x11 + x12 /4)σ        (3.8)
x′12 = 2pq(1 − σ) + (x12 /2)σ             (3.9)
x′22 = q²(1 − σ) + (x22 + x12 /4)σ        (3.10)

Notice that I use p², 2pq, and q² for the genotype frequencies in the part of the population
that’s mating at random. Question: Why can I get away with that?4
It takes a little more algebra than it did before, but it’s not difficult to verify that the
allele frequencies don’t change between parents and offspring.

p′ = {p²(1 − σ) + (x11 + x12 /4)σ} + {pq(1 − σ) + (x12 /4)σ}    (3.11)
   = p(p + q)(1 − σ) + (x11 + x12 /2)σ                          (3.12)
   = p(1 − σ) + pσ                                              (3.13)
   = p                                                          (3.14)

Because homozygous parents can always have heterozygous offspring (when they out-
cross), heterozygotes are never completely eliminated from the population as they are with
complete self-fertilization. In fact, we can solve for the equilibrium frequency of heterozy-
gotes, i.e., the frequency of heterozygotes reached when genotype frequencies stop changing.5
By definition, an equilibrium for x12 is a value such that if we put it in on the right side of
equation (3.9) we get it back on the left side, or in equations
3 Again, it would be more accurate to write: “If you think about it for a while, you’ll realize that for purposes of understanding how genotype frequencies change through time this is equivalent to assuming that each plant produces some fraction of its progeny through self-fertilization and some fraction through outcrossing.”
4 If you’re being good little boys and girls and looking over these notes before you get to class, when you see Question in the notes, you’ll know to think about that a bit, because I’m not going to give you the answer in the notes, I’m going to help you discover it during lecture.
5 This is analogous to stopping the calculation and re-calculation of allele frequencies in the EM algorithm when the allele frequency estimates stop changing.

x̂12 = 2pq(1 − σ) + (x̂12 /2)σ           (3.15)
x̂12 (1 − σ/2) = 2pq(1 − σ)              (3.16)
x̂12 = 2pq(1 − σ)/(1 − σ/2)              (3.17)

It’s worth noting several things about this set of equations:

1. I’m using x̂12 to refer to the equilibrium frequency of heterozygotes. I’ll be using hats
over variables to denote equilibrium properties throughout the course.6

2. I can solve for x̂12 in terms of p because I know that p doesn’t change. If p changed,
the calculations wouldn’t be nearly this simple.

3. The equilibrium is approached gradually (or asymptotically as mathematicians would
say). A single generation of random mating will put genotypes in Hardy-Weinberg
proportions (assuming all the other conditions are satisfied), but many generations
may be required for genotypes to approach their equilibrium frequency with partial
self-fertilization.
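The asymptotic approach is easy to see numerically. The sketch below (Python, for illustration only) iterates equations (3.8)–(3.10) with σ = 0.5 and compares the heterozygote frequency to the equilibrium value from equation (3.17):

```python
# Iterate the mixed-mating recursions of equations (3.8)-(3.10).
sigma = 0.5                      # fraction of progeny produced by selfing
x11, x12, x22 = 0.25, 0.5, 0.25
for _ in range(100):
    p = x11 + x12 / 2            # recomputed each generation, but constant
    q = 1 - p
    x11, x12, x22 = (p**2 * (1 - sigma) + (x11 + x12 / 4) * sigma,
                     2 * p * q * (1 - sigma) + (x12 / 2) * sigma,
                     q**2 * (1 - sigma) + (x22 + x12 / 4) * sigma)

# Equilibrium heterozygosity from equation (3.17)
x12_hat = 2 * p * q * (1 - sigma) / (1 - sigma / 2)
assert abs(x12 - x12_hat) < 1e-12   # converged (to 1/3 here, since p = q = 0.5)
```

With σ = 0.5 and p = 0.5 the heterozygote frequency settles at 1/3 rather than going to zero, unlike the complete-selfing case.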

Inbreeding coefficients
Now that we’ve found an expression for x̂12 we can also find expressions for x̂11 and x̂22 . The
complete set of equations for the genotype frequencies with partial selfing are:
x̂11 = p² + σpq/(2(1 − σ/2))             (3.18)
x̂12 = 2pq − 2(σpq/(2(1 − σ/2)))         (3.19)
x̂22 = q² + σpq/(2(1 − σ/2))             (3.20)
6 Unfortunately, I’ll also be using hats to denote estimates of unknown parameters, as I did when discussing maximum-likelihood estimates of allele frequencies. I apologize for using the same notation to mean different things, but I’m afraid you’ll have to get used to figuring out the meaning from the context. Believe me. Things are about to get a lot worse. Wait until I tell you how many different ways population geneticists use a parameter f that is commonly called the inbreeding coefficient.

Notice that all of those equations have a term σ/(2(1 − σ/2)). Let’s call that f . Then we
can save ourselves a little hassle by rewriting the above equations as:
x̂11 = p² + f pq            (3.21)
x̂12 = 2pq(1 − f )          (3.22)
x̂22 = q² + f pq            (3.23)
Now you’re going to have to stare at this a little longer, but notice that x̂12 is the frequency
of heterozygotes that we observe and 2pq is the frequency of heterozygotes we’d expect
under Hardy-Weinberg in this population if we were able to observe the genotype and allele
frequencies without error. So

1 − f = x̂12 /(2pq)                                                (3.24)
    f = 1 − x̂12 /(2pq)                                            (3.25)
      = 1 − (observed heterozygosity)/(expected heterozygosity)    (3.26)
f is the inbreeding coefficient. When defined as 1 - (observed heterozygosity)/(expected
heterozygosity) it can be used to measure the extent to which a particular population departs
from Hardy-Weinberg expectations.7 When f is defined in this way, I refer to it as the
population inbreeding coefficient.8
But f can also be regarded as a function of a particular system of mating. With par-
tial self-fertilization the population inbreeding coefficient when the population has reached
equilibrium is σ/(2(1 − σ/2)). When regarded as the inbreeding coefficient predicted by a
particular system of mating, I refer to it as the equilibrium inbreeding coefficient.
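As a quick consistency check (a Python sketch, for illustration), the equilibrium inbreeding coefficient σ/(2(1 − σ/2)) is exactly what equation (3.25) returns when the observed heterozygosity is the equilibrium value x̂12:

```python
# Compare the two routes to f under partial selfing with sigma = 0.5.
sigma = 0.5
p = 0.5
q = 1 - p
x12_hat = 2 * p * q * (1 - sigma) / (1 - sigma / 2)   # equation (3.17)
f_equilibrium = sigma / (2 * (1 - sigma / 2))         # system-of-mating definition
f_from_het = 1 - x12_hat / (2 * p * q)                # equation (3.25)
assert abs(f_equilibrium - f_from_het) < 1e-12        # both equal 1/3 here
```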
We’ll encounter at least two more definitions for f once I’ve introduced the idea of identity
by descent.

Identity by descent
Self-fertilization is, of course, only one example of the general phenomenon of inbreeding —
non-random mating in which individuals mate with close relatives more often than expected
7 f can be negative if there are more heterozygotes than expected, as might be the case if cross-homozygote matings are more frequent than expected at random.
8 To be honest, I’ll try to remember to refer to it this way. Chances are that I’ll forget sometimes and just call it the inbreeding coefficient. If I do, you’ll either have to figure out what I mean from the context or ask me to be more explicit.

at random. We’ve already seen that the consequences of inbreeding can be described in
terms of the inbreeding coefficient, f, and I’ve introduced you to two ways in which f can be
defined.9 I’m about to introduce you to one more, but first I have to tell you about identity
by descent.

Two alleles at a single locus are identical by descent if they are identical copies of
the same allele in some earlier generation, i.e., both are copies that arose by DNA
replication from the same ancestral sequence without any intervening mutation.

We’re more used to classifying alleles by type than by descent. Although we don’t
usually say it explicitly, we regard two alleles as the “same,” i.e., identical by type, if they
have the same phenotypic effects. Whether or not two alleles are identical by descent,
however, is a property of their genealogical history. Consider the following two scenarios:

Identity by descent

          A1 → A1
        ↗
A1
        ↘
          A1 → A1

Identity by type

          A1 → A1
        ↗
A1
        ↘
          A2 → A1
          ↑    ↑
   mutation   mutation

In both scenarios, the alleles at the end of the process are identical in type, i.e., they’re
both A1 alleles. In the second scenario, however, they are identical in type only because
one of the alleles has two mutations in its history.10 So alleles that are identical by descent
will also be identical by type, but alleles that are identical by type need not be identical by
descent.11
9 See paragraphs above describing the population and equilibrium inbreeding coefficient.
10 Notice that we could have had each allele mutate independently to A2.
11 Systematists in the audience will recognize this as the problem of homoplasy.

A third definition for f is the probability that two alleles chosen at random are identical
by descent.12 Of course, there are several aspects to this definition that need to be spelled
out more explicitly.13

• In what sense are the alleles chosen at random, within an individual, within a particular
population, within a particular set of populations?

• How far back do we trace the ancestry of alleles to determine whether they’re identical
by descent? Two alleles that are identical by type may not share a common ancestor
if we trace their ancestry only 20 generations, but they may share a common ancestor
if we trace their ancestry back 1000 generations and neither may have undergone any
mutations since they diverged from one another.

Let’s imagine for a moment, however, that we’ve traced back the ancestry of all alleles
in a particular population to what we call a reference population, i.e., a population in which
we regard all alleles as unrelated. That’s equivalent to saying that alleles chosen at random
from this population have zero probability of being identical by descent. Let’s also make the
further assumption that every allele in our reference population is distinguishable from every
other allele. That means that in descendant populations two alleles that are identical by
type will also be identical by descent. Given all of these assumptions we can write down the
genotype frequencies in a descendant population once we know f , where we define f as the
probability that two alleles chosen at random in the descendant population are identical by
descent:

x11 = p²(1 − f ) + f p          (3.27)
x12 = 2pq(1 − f )               (3.28)
x22 = q²(1 − f ) + f q .        (3.29)

It may not be immediately apparent, but you’ve actually seen these equations before in a
different form. Since p − p² = p(1 − p) = pq and q − q² = q(1 − q) = pq these equations can
be rewritten as

x11 = p² + f pq           (3.30)
x12 = 2pq(1 − f )         (3.31)
x22 = q² + f pq .         (3.32)

12 Notice that if we adopt this definition for f it can only take on values between 0 and 1. When used in the sense of a population or equilibrium inbreeding coefficient, however, f can be negative.
13 OK, maybe “of course” is overstating it. It isn’t really obvious that more clarity is needed until I point out the ambiguities in the bullet points that follow.
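The equivalence of the two forms is easy to confirm numerically (a Python sketch, checking p − p² = pq across a grid of values):

```python
# Equations (3.27)-(3.29) and (3.30)-(3.32) give identical genotype
# frequencies for any allele frequency p and any f between 0 and 1.
for i in range(1, 10):
    p = i / 10
    q = 1 - p
    for j in range(0, 11):
        f = j / 10
        assert abs((p**2 * (1 - f) + f * p) - (p**2 + f * p * q)) < 1e-12
        assert abs((q**2 * (1 - f) + f * q) - (q**2 + f * p * q)) < 1e-12
```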

You can probably see why population geneticists tend to play fast and loose with the
definitions. If we ignore the distinction between identity by type and identity by descent,
then the equations we used earlier to show the relationship between genotype frequencies,
allele frequencies, and f (defined as a measure of departure from Hardy-Weinberg expec-
tations) are identical to those used to show the relationship between genotype frequencies,
allele frequencies, and f (defined as the probability that two randomly chosen alleles in
the population are identical by descent).

Chapter 4

Testing Hardy-Weinberg

Because the Hardy-Weinberg principle tells us what to expect concerning the genetic com-
position of a sample when no evolutionary forces are operating, one of the first questions
population geneticists often ask is “Are the genotypes in this sample present in the expected,
i.e., Hardy-Weinberg, proportions?” We ask that question because we know that if the geno-
types are not in Hardy-Weinberg proportions, at least one of the assumptions underlying
derivation of the principle has been violated, i.e., that there is some evolutionary force op-
erating on the population, and we know that we can use the magnitude and direction of the
departure to say something about what those forces might be. In particular, we now know
that inbreeding leads to a deficiency of heterozygotes, and we know that the extent of that
deficiency can be measured by f .1
What we haven’t talked about is (a) how to estimate f from data and (b) how to tell
whether we have good evidence that the estimate is positive (meaning that there’s a defi-
ciency of heterozygotes in the population) or negative. Both (a) and (b) pose more of a
challenge than you might initially think. After all we also know that the numbers in our
sample may differ from expectation just because of random sampling error. For example,
Table 4.1 presents data from a sample of 1000 English blood donors scored for MN phe-
notype. M and N are co-dominant, so that heterozygotes can be distinguished from the
two homozygotes. Clearly the observed and expected numbers don’t look very different.2
1 Quiz question: Which definition of f is relevant for determining whether there is a deficiency of heterozygotes?
2 For the time being, I simply calculated the expected numbers in the way you’d tell your students in introductory biology to do it: (1) Use the sample frequency of M to estimate its population frequency. (This is a maximum-likelihood estimate, by the way.) (2) Calculate the expected frequency of each genotype from the Hardy-Weinberg proportions. (3) Calculate the expected numbers of each genotype by multiplying the expected frequency of each by the total sample size.

Observed Expected
Phenotype Genotype Number Number
M mm 298 294.3
MN mn 489 496.3
N nn 213 209.3

Table 4.1: Adapted from Table 2.4 in [32] (from [11])

The differences seem likely to be attributable purely to chance, but we need some way of
assessing that “likeliness.”
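One way to make that “likeliness” concrete is the χ² goodness-of-fit statistic discussed in the next section. Here is a quick Python sketch for the data in Table 4.1 (the numeric value below is computed here, not quoted from the notes):

```python
# Chi-square goodness of fit for the MN data in Table 4.1.
obs = {"MM": 298, "MN": 489, "NN": 213}
n = sum(obs.values())
p = (2 * obs["MM"] + obs["MN"]) / (2 * n)   # ML estimate of freq(M) = 0.5425
q = 1 - p
exp = {"MM": p**2 * n, "MN": 2 * p * q * n, "NN": q**2 * n}
chisq = sum((obs[g] - exp[g])**2 / exp[g] for g in obs)
# chisq is about 0.22 on 1 d.f., far below the 5% critical value of 3.84,
# so the departure is easily attributable to sampling error.
assert chisq < 3.84
```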

Testing Hardy-Weinberg
One approach to testing the hypothesis that genotypes are in Hardy-Weinberg proportions
is quite simple. We can simply do a χ2 or G-test for goodness of fit between observed and
predicted genotype (or phenotype) frequencies, where the predicted genotype frequencies
are derived from our estimates of the allele frequencies in the population.3 There’s only one
problem. To do either of these tests we have to know how many degrees of freedom are
associated with the test. How do we figure that out? In general, the formula is

d.f. = (# of categories in the data − 1)
       − (# of parameters estimated from the data) .

For this problem we have

d.f. = (# of phenotype categories in the data − 1)
       − (# of allele frequencies estimated from the data)
     = (3 − 1) − 1
     = 1 .

In the ABO blood group we have 4 phenotype categories, and 3 allele frequencies. That
means that a test of whether a particular data set has genotypes in Hardy-Weinberg pro-
portions will have (4 − 1) − (3 − 1) = 1 degree of freedom for the test. Notice that this
3 If you’re not familiar with the χ² or G-test for goodness of fit, consult any introductory statistics or biostatistics book, and you’ll find a description. In fact, you probably don’t have to go that far. You can probably find one in your old genetics textbook. Or you can just boot up your browser and head to Wikipedia: http://en.wikipedia.org/wiki/Goodness_of_fit.

Phenotype A AB B O Total
Observed 862 131 365 702 2060

Table 4.2: Data on variation in ABO blood type.

also means that if you have completely dominant markers, like RAPDs or AFLPs, you can’t
determine whether genotypes are in Hardy-Weinberg proportions because you have 0 degrees
of freedom available for the test.

An example
Table 4.2 exhibits data drawn from a study of phenotypic variation among individuals at
the ABO blood locus:
The maximum-likelihood estimate of allele frequencies, assuming Hardy-Weinberg, is:4

pa = 0.281
pb = 0.129
po = 0.590 ,

giving expected numbers of 846, 150, 348, and 716 for the four phenotypes. χ² = 3.8 with
1 d.f., 0.05 < p < 0.1.
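That statistic can be reproduced from the estimates above (a Python sketch; small discrepancies from the rounded expected numbers reflect the allele frequencies being quoted to three decimals):

```python
# Chi-square for the ABO data, using the quoted allele-frequency estimates.
p_a, p_b, p_o = 0.281, 0.129, 0.590
n = 2060
obs = [862, 131, 365, 702]            # phenotypes A, AB, B, O
exp = [(p_a**2 + 2 * p_a * p_o) * n,  # A = AA + AO
       2 * p_a * p_b * n,             # AB
       (p_b**2 + 2 * p_b * p_o) * n,  # B = BB + BO
       p_o**2 * n]                    # O
chisq = sum((o - e)**2 / e for o, e in zip(obs, exp))
assert 3.5 < chisq < 4.0              # ~3.8 on 1 d.f., as in the text
```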

A Bayesian approach
We’ve already seen how to use JAGS to provide allele frequency estimates from phenotypic
data at the ABO locus.

model {
# likelihood
pi[1] <- p.a*p.a + 2*p.a*p.o
pi[2] <- 2*p.a*p.b
pi[3] <- p.b*p.b + 2*p.b*p.o
pi[4] <- p.o*p.o
x[1:4] ~ dmulti(pi[],n)
# priors
a1 ~ dexp(1)
b1 ~ dexp(1)
o1 ~ dexp(1)
p.a <- a1/(a1 + b1 + o1)
p.b <- b1/(a1 + b1 + o1)
p.o <- o1/(a1 + b1 + o1)

n <- sum(x[])
}
list(x=c(862, 131, 365, 702))

4 Take my word for it, or run the EM algorithm on these data yourself.
As you may recall, this produced the following results:
> source("multinomial.R")
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph Size: 20

Initializing model

|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|**************************************************| 100%
Inference for Bugs model at "multinomial.txt", fit using jags,
5 chains, each with 2000 iterations (first 1000 discarded)
n.sims = 5000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
p.a 0.282 0.008 0.266 0.276 0.282 0.287 0.297 1.001 5000
p.b 0.129 0.005 0.118 0.125 0.129 0.133 0.140 1.001 5000
p.o 0.589 0.008 0.573 0.584 0.589 0.595 0.606 1.001 5000
deviance 27.811 2.007 25.830 26.363 27.229 28.577 33.245 1.001 4400

For each parameter, n.eff is a crude measure of effective sample size,


and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)


pD = 2.0 and DIC = 29.8

DIC is an estimate of expected predictive error (lower deviance is better).
>

Now that we know about inbreeding coefficients and that they allow us to measure the
departure of genotype frequencies from Hardy-Weinberg proportions, we can modify this a
bit and estimate allele frequencies without assuming that genotypes are in Hardy-Weinberg
proportions.

model {
# likelihood
pi[1] <- p.a*p.a + f*p.a*(1-p.a) + 2*p.a*p.o*(1-f)
pi[2] <- 2*p.a*p.b*(1-f)
pi[3] <- p.b*p.b + f*p.b*(1-p.b) + 2*p.b*p.o*(1-f)
pi[4] <- p.o*p.o + f*p.o*(1-p.o)
x[1:4] ~ dmulti(pi[],n)

# priors
a1 ~ dexp(1)
b1 ~ dexp(1)
o1 ~ dexp(1)
p.a <- a1/(a1 + b1 + o1)
p.b <- b1/(a1 + b1 + o1)
p.o <- o1/(a1 + b1 + o1)

f ~ dunif(0,1)

n <- sum(x[])
}

This simple change produces the following results:

> source("abo-inbreeding.R")
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph Size: 30

Initializing model

|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|**************************************************| 100%
Inference for Bugs model at "abo-inbreeding.txt", fit using jags,
5 chains, each with 2000 iterations (first 1000 discarded)
n.sims = 5000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
f 0.403 0.139 0.059 0.326 0.429 0.505 0.599 1.013 550
p.a 0.349 0.027 0.290 0.332 0.352 0.368 0.392 1.006 960
p.b 0.161 0.014 0.132 0.152 0.162 0.171 0.186 1.006 840
p.o 0.490 0.039 0.429 0.461 0.485 0.514 0.577 1.006 1000
deviance 25.200 2.416 22.249 23.411 24.716 26.342 31.206 1.007 470

For each parameter, n.eff is a crude measure of effective sample size,


and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)


pD = 2.9 and DIC = 28.1
DIC is an estimate of expected predictive error (lower deviance is better).
>

Notice that the allele frequency estimates have changed quite a bit and that the posterior
mean of f is about 0.40. On first appearance, that would seem to indicate that we have lots
of inbreeding in this sample. BUT it’s a human population. It doesn’t seem very likely that
a human population is really that highly inbred.
Indeed, take a closer look at all of the information we have about that estimate of f . The
95% credible interval for f is between 0.06 and 0.60. That suggests that we don’t have much
information at all about f from these data.5 How can we tell if the model with inbreeding
is better than the model that assumes genotypes are in Hardy-Weinberg proportions?

The Deviance Information Criterion


A widely used statistic for comparing models in a Bayesian framework is the Deviance
Information Criterion. R2jags calculates an estimate of it for us automatically, but you
need to know that if you’re serious about model comparison, you shouldn’t rely on the DIC

5 That shouldn’t be too surprising, since any information we have about f comes indirectly through our allele frequency estimates.

Model deviance pD DIC
f >0 25.2 2.9 28.1
f =0 27.8 2.0 29.9

Table 4.3: DIC calculations for the ABO example.

calculation from R2jags unless you’ve verified it.6 Fortunately, in this case, the results are
fairly reliable.7 The results of the DIC calculations for our two models are summarized in
Table 4.3.
The deviance is a measure of how well the model fits the data, specifically -2 times the
average of the log likelihood values calculated from the parameters in each sample from the
posterior. pD is a measure of model complexity, roughly speaking the number of parameters
in the model.8 DIC is a composite measure of how well the model does. It’s a compromise
between fit and complexity, and smaller DICs are preferred. A difference of more than 7-10
units is regarded as strong evidence in favor of the model with the smaller DIC.
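In other words, DIC is just the posterior mean deviance plus pD, which you can confirm against Table 4.3 (a Python one-liner; the 29.9 in the table versus 29.8 here presumably reflects rounding of the unrounded values):

```python
def dic(mean_deviance, pD):
    """DIC = posterior mean deviance + effective number of parameters."""
    return mean_deviance + pD

assert abs(dic(25.2, 2.9) - 28.1) < 1e-9   # f > 0 model
assert abs(dic(27.8, 2.0) - 29.8) < 1e-9   # f = 0 model (29.9 in the table after rounding)
```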
In this case the difference in DIC values is only about 1.8, so we have very little evidence
for the f > 0 model for these data. This is consistent with the weak evidence for a departure
from Hardy-Weinberg that was revealed in the χ² analysis.

6 If you’re interested in learning more, feel free to ask, but I’m afraid both the explanation and the solution are a little complicated.
7 You’ll just have to trust me on this unless you asked the last question.
8 Notice that we estimated 2 parameters in the f = 0 model (2 allele frequencies) and 3 parameters in the f > 0 model (2 allele frequencies plus the inbreeding coefficient).

Chapter 5

Analyzing the genetic structure of populations

So far we’ve focused on inbreeding as one important way that populations may fail to mate
at random, but there’s another way in which virtually all populations and species fail to mate
at random. Individuals tend to mate with those that are nearby. Even within a fairly small
area, phenomena like nearest neighbor pollination in flowering plants or home-site fidelity in
animals can cause mates to be selected in a geographically non-random way. What are the
population genetic consequences of this form of non-random mating?
Well, if you think about it a little, you can probably figure it out. Since individuals that
occur close to one another tend to be more genetically similar than those that occur far
apart, the impacts of local mating will mimic those of inbreeding within a single, well-mixed
population.

A numerical example
For example, suppose we have two subpopulations of green lacewings, one of which occurs
in forests, the other of which occurs in adjacent meadows. Suppose further that within each
subpopulation mating occurs completely at random, but that there is no mating between
forest and meadow individuals. Suppose we’ve determined allele frequencies in each pop-
ulation at a locus coding for phosphoglucoisomerase (PGI ), which conveniently has only two
alleles. The frequency of A1 in the forest is 0.4 and in the meadow is 0.7. We can easily
calculate the expected genotype frequencies within each population, namely

A1 A1 A1 A2 A2 A2
Forest 0.16 0.48 0.36
Meadow 0.49 0.42 0.09

Suppose, however, we were to consider a combined population consisting of 100 individuals
from the forest subpopulation and 100 individuals from the meadow subpopulation.
Then we’d get the following:1

A1 A1 A1 A2 A2 A2
From forest 16 48 36
From meadow 49 42 9
Total 65 90 45

So the frequency of A1 is (2(65) + 90)/(2(65 + 90 + 45)) = 0.55. Notice that this is just
the average allele frequency in the two subpopulations, i.e., (0.4 + 0.7)/2. Since each sub-
population has genotypes in Hardy-Weinberg proportions, you might expect the combined
population to have genotypes in Hardy-Weinberg proportions, but if you did you’d be wrong.
Just look.

A1 A1 A1 A2 A2 A2
Expected (from p = 0.55) (0.3025)200 (0.4950)200 (0.2025)200
60.5 99.0 40.5
Observed (from table above) 65 90 45

The expected and observed don’t match, even though there is random mating within both
subpopulations. They don’t match because there isn’t random mating in the combined
population. Forest lacewings choose mates at random from other forest lacewings, but they
never mate with a meadow lacewing (and vice versa). Our sample includes two populations
that don’t mix. This is an example of what’s known as the Wahlund effect [81].
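The arithmetic of the example is summarized in this small Python sketch (for illustration only):

```python
# Pool the two lacewing subpopulations and check for the
# Wahlund deficit of heterozygotes.
forest = {"A1A1": 16, "A1A2": 48, "A2A2": 36}
meadow = {"A1A1": 49, "A1A2": 42, "A2A2": 9}
total = {g: forest[g] + meadow[g] for g in forest}
n = sum(total.values())                              # 200 individuals
p = (2 * total["A1A1"] + total["A1A2"]) / (2 * n)    # 0.55, the mean frequency
exp_het = 2 * p * (1 - p) * n                        # 99 expected heterozygotes
assert abs(p - 0.55) < 1e-12
assert total["A1A2"] < exp_het   # only 90 observed: a heterozygote deficit
```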

The algebraic development
You should know by now that I’m not going to be satisfied with a numerical example. I now
feel the need to do some algebra to describe this situation a little more generally.
1 If we ignore sampling error.

Suppose we know allele frequencies in k subpopulations.2 Let pi be the frequency of A1
in the ith subpopulation. Then if we assume that all subpopulations contribute equally to
combined population,3 we can calculate expected and observed genotype frequencies the way
we did above:

             A1 A1           A1 A2            A2 A2
Expected     p̄²              2p̄q̄              q̄²
Observed     (1/k) Σ pᵢ²     (1/k) Σ 2pᵢqᵢ    (1/k) Σ qᵢ²

where p̄ = Σ pᵢ/k and q̄ = 1 − p̄. Now

(1/k) Σ pᵢ² = (1/k) Σ (pᵢ − p̄ + p̄)²                            (5.1)
            = (1/k) Σ [(pᵢ − p̄)² + 2p̄(pᵢ − p̄) + p̄²]            (5.2)
            = (1/k) Σ (pᵢ − p̄)² + p̄²                            (5.3)
            = Var(p) + p̄²                                       (5.4)

Similarly,

(1/k) Σ 2pᵢqᵢ = 2p̄q̄ − 2Var(p)         (5.5)
(1/k) Σ qᵢ²   = q̄² + Var(p)            (5.6)

Since Var(p) ≥ 0 by definition, with equality holding only when all subpopulations have
the same allele frequency, we can conclude that

• Homozygotes will be more frequent and heterozygotes will be less frequent than ex-
pected based on the allele frequency in the combined population.

• The magnitude of the departure from expectations is directly related to the magnitude
of the variance in allele frequencies across populations, Var(p).
2 For the time being, I’m going to assume that we know the allele frequencies without error, i.e., that we didn’t have to estimate them from data. We’ll deal with real life, i.e., how we can detect the Wahlund effect when we have to estimate allele frequencies from data, a little later.
3 We’d get the same result by relaxing this assumption, but the algebra gets messier, so why bother?

• The effect will apply to any mixing of samples in which the subpopulations combined
have different allele frequencies.4

• The same general phenomenon will occur if there are multiple alleles at a locus, al-
though it is possible for one or a few heterozygotes to be more frequent than expected
if there is positive covariance in the constituent allele frequencies across populations.5

• The effect is analogous to inbreeding. Homozygotes are more frequent and heterozy-
gotes are less frequent than expected.6

To return to our earlier numerical example:

Var(p) = [(0.4 − 0.55)² + (0.7 − 0.55)²]/2        (5.7)
       = 0.0225                                    (5.8)

          Expected                  Observed
A1 A1     0.3025 + 0.0225         = 0.3250
A1 A2     0.4950 − 2(0.0225)      = 0.4500
A2 A2     0.2025 + 0.0225         = 0.2250
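The same numbers fall out of equations (5.4)–(5.6) directly (a Python sketch, for illustration):

```python
# Verify that the pooled genotype frequencies equal the Hardy-Weinberg
# expectations from the mean allele frequency, corrected by Var(p).
p_subs = [0.4, 0.7]
k = len(p_subs)
p_bar = sum(p_subs) / k                                    # 0.55
q_bar = 1 - p_bar
var_p = sum((p - p_bar)**2 for p in p_subs) / k            # 0.0225
x11 = sum(p**2 for p in p_subs) / k
x12 = sum(2 * p * (1 - p) for p in p_subs) / k
x22 = sum((1 - p)**2 for p in p_subs) / k
assert abs(x11 - (p_bar**2 + var_p)) < 1e-12               # 0.3250
assert abs(x12 - (2 * p_bar * q_bar - 2 * var_p)) < 1e-12  # 0.4500
assert abs(x22 - (q_bar**2 + var_p)) < 1e-12               # 0.2250
```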

Wright’s F -statistics
One limitation of the way I’ve described things so far is that Var(p) doesn’t provide a
convenient way to compare population structure from different samples. Var(p) can be
much larger if both alleles are about equally common in the whole sample than if one occurs
at a mean frequency of 0.99 and the other at a frequency of 0.01. Moreover, if you stare at
equations (5.4)–(5.6) for a while, you begin to realize that they look a lot like some equations
we’ve already encountered. Namely, if we were to define Fst7 as Var(p)/p̄q̄, then we could
4 For example, if we combine samples from different years or across age classes of long-lived organisms, we may see a deficiency of heterozygotes in the sample purely as a result of allele frequency differences across years.
5 If you’re curious about this, feel free to ask, but I’ll have to dig out my copy of Li [56] to answer. I don’t carry those details around in my head.
6 And this is what we predicted when we started.
7 The reason for the subscript will become apparent later. It’s also very important to notice that I’m defining FST here in terms of the population parameters p and Var(p). Again, we’ll return to the problem of how to estimate FST from data a little later.

rewrite equations (5.4)–(5.6) as
(1/k) Σ pi² = p̄² + Fst p̄q̄                                   (5.9)

(1/k) Σ 2pi qi = 2p̄q̄(1 − Fst)                                (5.10)

(1/k) Σ qi² = q̄² + Fst p̄q̄                                   (5.11)
And it’s not even completely artificial to define Fst the way I did. After all, the effect of
geographic structure is to cause matings to occur among genetically similar individuals. It’s
rather like inbreeding. Moreover, the extent to which this local mating matters depends on
the extent to which populations differ from one another. p̄q̄ is the maximum allele frequency
variance possible, given the observed mean frequency. So one way of thinking about Fst is
that it measures the amount of allele frequency variance in a sample relative to the maximum
possible.8
There may, of course, be inbreeding within populations. But it’s easy to incorporate
this into the framework, too.9 Let Hi be the actual heterozygosity in individuals
within subpopulations, Hs be the expected heterozygosity within subpopulations assuming
Hardy-Weinberg within populations, and Ht be the expected heterozygosity in the com-
bined population assuming Hardy-Weinberg over the whole sample.10 Then thinking of f
as a measure of departure from Hardy-Weinberg and assuming that all populations depart
from Hardy-Weinberg to the same degree, i.e., that they all have the same f , we can define
Fit = 1 − Hi/Ht
Let’s fiddle with that a bit.
1 − Fit = Hi/Ht
        = (Hi/Hs)(Hs/Ht)
        = (1 − Fis)(1 − Fst) ,
8 I say “one way”, because there are several other ways to talk about Fst, too. But we won’t talk about them until later.
9 At least it’s easy once you’ve been shown how.
10 Please remember that we’re assuming we know those frequencies exactly. In real applications, of course, we’ll estimate those frequencies from data, so we’ll have to account for sampling error when we actually try to estimate these things. If you’re getting the impression that I think the distinction between allele frequencies as parameters, i.e., the real allele frequency in the population, and allele frequencies as estimates, i.e., the sample frequencies from which we hope to estimate the parameters, is really important, you’re getting the right impression.

where Fis is the inbreeding coefficient within populations, i.e., f , and Fst has the same
definition as before.11 Ht is often referred to as the genetic diversity in a population. So
another way of thinking about Fst = (Ht − Hs )/Ht is that it’s the proportion of the diversity
in the sample that’s due to allele frequency differences among populations.

F statistics
We’ve now seen the principles underlying Wright’s F -statistics. I should point out that
Gustave Malécot developed very similar ideas at about the same time as Wright, but since
Wright’s notation stuck,12 population geneticists generally refer to statistics like those we’ve
discussed as Wright’s F -statistics.13
Neither Wright nor Malécot worried too much about the problem of estimating F -
statistics from data. Both realized that any inferences about population structure are based
on a sample and that the characteristics of the sample may differ from those of the popula-
tion from which it was drawn, but neither developed any explicit way of dealing with those
differences. Wright develops some very ad hoc approaches in his book [88], but they have
been forgotten, which is good because they aren’t very satisfactory and they shouldn’t be
used. There are now three reasonable approaches available:14
1. Nei’s G-statistics,
2. Weir and Cockerham’s θ-statistics, and
3. A Bayesian analog of θ.15

An example from Isotoma petraea


To make the differences in implementation and calculation clear, I’m going to use data
from 12 populations of Isotoma petraea in southwestern Australia surveyed for genotype at
GOT-1 [41] as an example throughout these discussions (Table 5.1).
11 It takes a fair amount of algebra to show that this definition of Fst is equivalent to the one I showed you before, so you’ll just have to take my word for it.
12 Probably because he published in English and Malécot published in French.
13 The Hardy-Weinberg proportions should probably be referred to as the Hardy-Weinberg-Castle proportions too, since Castle pointed out the same principle. For some reason, though, his demonstration didn’t have the impact that Hardy’s and Weinberg’s did. So we generally talk about the Hardy-Weinberg principle.
14 And as we’ll soon see, I’m not too crazy about one of these three. To my mind, there are really only two approaches that anyone should consider.
15 This is, as you have probably already guessed, my personal favorite. We’ll talk about it next time.

                            Genotype
Population                  A1A1   A1A2   A2A2      p̂
Yackeyackine Soak 29 0 0 1.0000
Gnarlbine Rock 14 3 3 0.7750
Boorabbin 15 2 3 0.8000
Bullabulling 9 0 0 1.0000
Mt. Caudan 9 0 0 1.0000
Victoria Rock 23 5 2 0.8500
Yellowdine 23 3 4 0.8167
Wargangering 29 3 1 0.9242
Wagga Rock 5 0 0 1.0000
“Iron Knob Major” 1 0 0 1.0000
Rainy Rocks 0 1 0 0.5000
“Rainy Rocks Major” 1 0 0 1.0000

Table 5.1: Genotype counts at the GOT-1 locus in Isotoma petraea (from [41]).

Let’s ignore the sampling problem for a moment and calculate the F -statistics as if we
had observed the population allele frequencies without error. They’ll serve as our baseline
for comparison.

p̄ = 0.8888
Var(p) = 0.02118
Fst = 0.2143
Individual heterozygosity = (0.0000 + 0.1500 + 0.1000 + 0.0000 + 0.0000 + 0.1667 + 0.1000
+0.0909 + 0.0000 + 0.0000 + 1.0000 + 0.0000)/12
= 0.1340
Expected heterozygosity = 2(0.8888)(1 − 0.8888)
= 0.1976
Fit = 1 − (Individual heterozygosity / Expected heterozygosity)
    = 1 − 0.1340/0.1976
    = 0.3221
1 − Fit = (1 − Fis )(1 − Fst )

Fis = (Fit − Fst)/(1 − Fst)
    = (0.3221 − 0.2143)/(1 − 0.2143)
    = 0.1372

Summary
Correlation of gametes due to inbreeding within subpopulations (Fis ): 0.1372
Correlation of gametes within subpopulations (Fst ): 0.2143
Correlation of gametes in sample (Fit ): 0.3221
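If you want to reproduce the direct calculations without doing the arithmetic by hand, here is a Python sketch (Python isn’t used elsewhere in these notes, and the variable names are mine). It starts from the genotype counts in Table 5.1 and treats the sample allele frequencies as if they were the population frequencies, just as the text does:

```python
# Genotype counts (A1A1, A1A2, A2A2) for the 12 Isotoma petraea
# populations in Table 5.1.
counts = [(29, 0, 0), (14, 3, 3), (15, 2, 3), (9, 0, 0), (9, 0, 0),
          (23, 5, 2), (23, 3, 4), (29, 3, 1), (5, 0, 0), (1, 0, 0),
          (0, 1, 0), (1, 0, 0)]
k = len(counts)

# Frequency of A1 and observed heterozygosity in each population.
p = [(2 * aa + ab) / (2 * (aa + ab + bb)) for aa, ab, bb in counts]
het = [ab / (aa + ab + bb) for aa, ab, bb in counts]

p_bar = sum(p) / k
var_p = sum((pi - p_bar) ** 2 for pi in p) / k

F_st = var_p / (p_bar * (1 - p_bar))
H_i = sum(het) / k               # individual heterozygosity, ≈ 0.1340
H_t = 2 * p_bar * (1 - p_bar)    # expected heterozygosity, ≈ 0.1976
F_it = 1 - H_i / H_t
F_is = (F_it - F_st) / (1 - F_st)

print(F_is, F_st, F_it)  # ≈ 0.1372, 0.2143, 0.3221
```

Notice that every population gets equal weight here, including the one represented by a single heterozygous individual at Rainy Rocks.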

Why do I refer to them as the “correlation of gametes . . .”? There are two reasons:

1. That’s the way Wright always referred to and interpreted them.

2. We can define indicator variables xijk = 1 if the ith allele in the jth individual of
population k is A1 and xijk = 0 if that allele is not A1 . This may seem like a strange
thing to do, but the Weir and Cockerham approach to F -statistics described below
uses just such an approach. If we do this, then the definitions for Fis , Fst , and Fit
follow directly.16

Notice that Fis could be negative, i.e., there could be an excess of heterozygotes within
populations (Fis < 0). Notice also that we’re implicitly assuming that the extent of departure
from Hardy-Weinberg proportions is the same in all populations. Equivalently, we can regard
Fis as the average departure from Hardy-Weinberg proportions across all populations.

Statistical expectation and unbiased estimates


So far I’ve assumed that we know the allele frequencies without error, but of course that’s
never the case unless we’ve created experimental populations. We are always taking a sample
from a population and inferring — estimating — allele frequencies from our sample. Similarly,
we are estimating FST and our estimate of FST needs to take account of the imprecision in
the allele frequency estimates on which it was based. To understand one approach to dealing
with this uncertainty I need to introduce two new concepts: statistical expectation and
unbiased estimates.
16 See [83] for details.

The concept of statistical expectation is actually quite an easy one. It is an arithmetic
average, just one calculated from probabilities instead of being calculated from samples. So,
for example, if P(k) is the probability that we find k A1 alleles in our sample the expected
number of A1 alleles in our sample is just
E(k) = Σ k P(k)
     = np ,

where n is the total number of alleles in our sample and p is the frequency of A1 in the
population.17
Now consider the expected value of our sample estimate of the population allele frequency,
p̂ = k/n, where k now refers to the number of A1 alleles we actually found.
E(p̂) = E(k/n)
     = Σ (k/n) P(k)
     = (1/n) Σ k P(k)
     = (1/n)(np)
     = p .

Because E(p̂) = p, p̂ is said to be an unbiased estimate of p.18 When an estimate is unbiased,
it means that if we were to repeat the sampling experiment an infinite number of times
and to take the average of the estimates, the average of those values would be equal to the
(unknown) parameter value.
What about estimating the frequency of heterozygotes within a population? The obvious
estimator is H̃ = 2p̂(1 − p̂). Well,

E(H̃) = E(2p̂(1 − p̂))
     = 2(E(p̂) − E(p̂²))
     = ((n − 1)/n) 2p(1 − p) .

17 P(k) = C(N, k) p^k (1 − p)^(N−k), where C(N, k) is the binomial coefficient. The algebra in getting from the first line to the second is a little complicated, but feel free to ask me about it if you’re interested.
18 Notice that I’m using a hat here to refer to a statistical estimate. Remember when I told you I’d be using hats for a couple of different purposes? Well, this is the second one.

Because E(H̃) ≠ 2p(1 − p), H̃ is a biased estimate of 2p(1 − p). If we set Ĥ = (n/(n − 1))H̃,
however, Ĥ is an unbiased estimator of 2p(1 − p).19
If you’ve ever wondered why you typically divide the sum of squared deviations about the
mean by n − 1 instead of n when estimating the variance of a sample, this is why. Dividing
by n gives you a (slightly) biased estimator.
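You can verify the bias in H̃ exactly, without simulation, by averaging over the binomial distribution of k. A quick Python check (the particular values of n and p are arbitrary choices of mine):

```python
from math import comb

n, p = 10, 0.3

# E(2*p_hat*(1 - p_hat)), computed exactly by summing over the binomial
# distribution of k, the number of A1 alleles in the sample.
e_h = sum(comb(n, k) * p**k * (1 - p)**(n - k) * 2 * (k / n) * (1 - k / n)
          for k in range(n + 1))

print(e_h)                             # ≈ 0.378, not 2p(1 − p) = 0.42
print(((n - 1) / n) * 2 * p * (1 - p)) # ≈ 0.378, exactly the predicted bias
```

The expectation comes out to (9/10) · 0.42 = 0.378, which is ((n − 1)/n) 2p(1 − p), just as the derivation above says.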

The gory details20


Starting where we left off above:
 
E(H̃) = 2(E(p̂) − E(p̂²))
     = 2(p − E((k/n)²)) ,

where k is the number of A1 alleles in our sample and n is the sample size.
 
E((k/n)²) = Σ (k/n)² P(k)
          = (1/n²) Σ k² P(k)
          = (1/n²)(Var(k) + k̄²)
          = (1/n²)(np(1 − p) + n²p²)
          = p(1 − p)/n + p² .

Substituting this back into the equation above yields the following:
  
E(H̃) = 2(p − (p(1 − p)/n + p²))
     = 2(p(1 − p) − p(1 − p)/n)
     = (1 − 1/n) 2p(1 − p)
     = ((n − 1)/n) 2p(1 − p) .

19 If you’re wondering how I got from the second equation for Ĥ to the last one, ask me about it or read the gory details section that follows.
20 Skip this part unless you are really, really interested in how I got from the second equation to the third equation in the last paragraph. This is more likely to confuse you than help unless you know that the variance of a binomial sample is np(1 − p) and that E(k²) = Var(k) + k̄².

Corrections for sampling error
There are two sources of allele frequency difference among subpopulations in our sample: (1)
real differences in the allele frequencies among our sampled subpopulations and (2) differences
that arise because allele frequencies in our samples differ from those in the subpopulations
from which they were taken.21

Nei’s Gst
Nei and Chesser [60] described one approach to accounting for sampling error. So far as I’ve
been able to determine, there aren’t any currently supported programs22 that calculate the
bias-corrected versions of Gst .23 I calculated the results in Table 5.2 by hand.
The calculations are tedious, which is why you’ll want to find some way of automating
them if you plan to do them yourself.24
Hi = 1 − (1/N) Σ(k=1..N) Σ(i=1..m) Xkii

Hs = (ñ/(ñ − 1)) [ 1 − Σ(i=1..m) x̂̄²i − Hi/(2ñ) ]

Ht = 1 − Σ(i=1..m) x̄²i + Hs/ñ − Hi/(2ñN)

where we have N subpopulations, x̂̄²i = Σ(k=1..N) x²ki/N, x̄i = Σ(k=1..N) xki/N, ñ is the
harmonic mean of the population sample sizes, i.e., ñ = 1/((1/N) Σ(k=1..N) (1/nk)), Xkii is
the frequency of genotype Ai Ai in population k, xki is the frequency of allele Ai in
population k, and nk is the sample size from population k. Recall that

Fis = 1 − Hi/Hs
21 There’s actually a third source of error that we’ll get to in a moment. The populations we’re sampling from are the product of an evolutionary process, and since the populations aren’t of infinite size, drift has played a role in determining allele frequencies in them. As a result, if we were to go back in time and re-run the evolutionary process, we’d end up with a different set of real allele frequency differences. We’ll talk about this more in just a moment when we get to Weir and Cockerham’s statistics.
22 Popgene estimates Gst, but I don’t think it’s been updated since 2000. FSTAT also estimates gene diversities, but the most recent version is from 2002.
23 There’s a reason for this that we’ll get to in a moment. It’s alluded to in the last footnote.
24 It is also one big reason why most people use Weir and Cockerham’s θ. There’s readily available software that calculates it for you.

Fst = 1 − Hs/Ht
Fit = 1 − Hi/Ht .
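Since the bias-corrected Gst calculations don’t seem to be implemented in currently supported software, here is one way they might be scripted. This Python sketch (the function name and data layout are my own, not from any package) implements the three equations above for a single locus with two alleles, and it recovers the Nei row of Table 5.2 from the Table 5.1 counts:

```python
def nei_chesser(counts):
    """Bias-corrected Hi, Hs, and Ht for one locus with two alleles.
    counts is a list of (n11, n12, n22) genotype counts, one tuple
    per subpopulation."""
    N = len(counts)                              # number of subpopulations
    sizes = [sum(c) for c in counts]
    n_tilde = N / sum(1 / s for s in sizes)      # harmonic mean sample size
    p = [(2 * c[0] + c[1]) / (2 * s) for c, s in zip(counts, sizes)]

    H_i = sum(c[1] / s for c, s in zip(counts, sizes)) / N
    x2_within = sum(pi ** 2 + (1 - pi) ** 2 for pi in p) / N
    p_bar = sum(p) / N
    x2_total = p_bar ** 2 + (1 - p_bar) ** 2

    H_s = (n_tilde / (n_tilde - 1)) * (1 - x2_within - H_i / (2 * n_tilde))
    H_t = 1 - x2_total + H_s / n_tilde - H_i / (2 * n_tilde * N)
    return H_i, H_s, H_t

# Table 5.1 genotype counts (A1A1, A1A2, A2A2).
counts = [(29, 0, 0), (14, 3, 3), (15, 2, 3), (9, 0, 0), (9, 0, 0),
          (23, 5, 2), (23, 3, 4), (29, 3, 1), (5, 0, 0), (1, 0, 0),
          (0, 1, 0), (1, 0, 0)]
H_i, H_s, H_t = nei_chesser(counts)
print(1 - H_i / H_s)  # Fis, ≈ 0.3092
print(1 - H_s / H_t)  # Fst, ≈ 0.2395
print(1 - H_i / H_t)  # Fit, ≈ 0.4746
```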

Weir and Cockerham’s θ


Weir and Cockerham [84] describe the fundamental ideas behind this approach. Weir and
Hill [85] bring things up to date. Holsinger and Weir [39] provide a less technical overview.25
Most, if not all, packages available now that estimate FST provide estimates of θ. The most
important difference between θ and Gst and the reason why Gst has fallen into disuse is that
Gst ignores an important source of sampling error that θ incorporates.
In many applications, especially in evolutionary biology, the subpopulations included
in our sample are not an exhaustive sample of all populations. Moreover, even if we have
sampled from every population there is now, we may not have sampled from every population
there ever was. And even if we’ve sampled from every population there ever was, we know
that there are random elements in any evolutionary process. Thus, if we could run the clock
back and start it over again, the genetic composition of the populations we have might be
rather different from that of the populations we sampled. In other words, our populations
are, in many cases, best regarded as a random sample from a much larger set of populations
that could have been sampled.

Even more gory details26

Let xmn,i be an indicator variable such that xmn,i = 1 if allele m from individual n is of type i
and is 0 otherwise. Clearly, the sample frequency p̂i = (1/2N) Σ(m=1..2) Σ(n=1..N) xmn,i, and
E(p̂i) = pi, i = 1 . . . A. Assuming that alleles are sampled independently from the population

E(x²mn,i) = pi
E(xmn,i xmn′,i) = E(xmn,i xm′n′,i) = p²i + σxmn,i xm′n′,i
               = p²i + pi(1 − pi)θ

25 We also talk a bit more about how F -statistics can be used. If you just can’t get enough of this, I suggest you take a look at Verity and Nichols [80]. They provide a really solid analysis of FST, GST, and some related statistics.
26 This is even worse than the last time. I include it for completeness only. I really don’t expect anyone (unless they happen to be a statistician) to be able to understand these details.

Method Fis Fst Fit
Direct 0.1372 0.2143 0.3221
Nei 0.3092 0.2395 0.4746
Weir & Cockerham 0.5398 0.0387 0.5577

Table 5.2: Comparison of Wright’s F -statistics when ignoring sampling effects with Nei’s
GST and Weir and Cockerham’s θ.

where σxmn,i xm′n′,i is the intraclass covariance for the indicator variables and

θ = σ²pi / (pi(1 − pi))                                       (5.12)

is the scaled among-population variance in allele frequency in the populations from which
this population was sampled. Using (5.12) we find after some algebra

σ²p̂i = pi(1 − pi)θ + pi(1 − pi)(1 − θ)/(2N) .

The hat on σ²p̂i indicates the sample variance of allele frequencies among populations. A
natural estimate for θ emerges using the method of moments when an analysis of variance is
applied to indicator variables derived from samples representing more than one population.

Applying Gst and θ


If we return to the data that motivated this discussion, these are the results we get from
analyses of the GOT-1 data from Isotoma petraea (Table 5.1). But first a note on how
you’ll see statistics like this reported in the literature. It can get a little confusing, because of
the different symbols that are used. Sometimes you’ll see Fis , Fst , and Fit . Sometimes you’ll
see f , θ, and F . And it will seem as if they’re referring to similar things. That’s because
they are. They’re really just different symbols for the same thing (see Table 5.3). Strictly
speaking the symbols in Table 5.3 are the parameters, i.e., values in the population that we
try to estimate. We should put hats over any values estimated from data to indicate that
they are estimates of the parameters, not the parameters themselves. But we’re usually a
bit sloppy, and everyone knows that we’re presenting estimates, so we usually leave off the
hats.

Notation
Fit F
Fis f
Fst θ

Table 5.3: Equivalent notations often encountered in descriptions of population genetic structure.

An example from Wright


Hierarchical analysis of variation in the frequency of the Standard chromosome arrangement
of Drosophila pseudoobscura in the western United States (data from [15], analysis from [89]).
Wright uses his rather peculiar method of accounting for sampling error. I haven’t gone back
to the original data and used a more modern method of analysis.27
66 populations (demes) studied. Demes are grouped into eight regions. The regions are
grouped into four primary subdivisions.

Results
Correlation of gametes within individuals relative to regions (FIR ): 0.0444
Correlation of gametes within regions relative to subdivisions (FRS ): 0.0373
Correlation of gametes within subdivisions relative to total (FST ): 0.1478
Correlation of gametes in sample (FIT ): 0.2160

1 − FIT = (1 − FIR )(1 − FRS )(1 − FST )
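It’s worth plugging the numbers in to see that the hierarchical partition holds; a quick Python check using nothing beyond the four correlations reported above:

```python
# Wright's hierarchical F-statistics for D. pseudoobscura reported above.
F_ir, F_rs, F_st = 0.0444, 0.0373, 0.1478

# 1 - F_IT = (1 - F_IR)(1 - F_RS)(1 - F_ST)
F_it = 1 - (1 - F_ir) * (1 - F_rs) * (1 - F_st)
print(F_it)  # ≈ 0.2160, matching the reported F_IT
```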

Interpretation
There is relatively little inbreeding within regions (FIR = 0.04) and relatively little genetic
differentiation among regions within subdivisions (FRS = 0.04). There is, however, substan-
tial genetic differentiation among the subdivisions (FST = 0.15).
Thus, an explanation for the chromosomal diversity that predicted great local differentiation
and little or no differentiation at a large scale would be inconsistent with these
observations.

27 Sounds like it might be a good project, doesn’t it? We’ll see.

Chapter 6

Analyzing the genetic structure of populations: a Bayesian approach

Our review of Nei’s Gst and Weir and Cockerham’s θ illustrated two important principles:

1. It’s essential to distinguish parameters from estimates. Parameters are the things we’re
really interested in, but since we always have to make inferences about the things we’re
really interested in from limited data, we have to rely on estimates of those parameters.

2. This means that we have to identify the possible sources of sampling error in our
estimates and to find ways of accounting for them. In the particular case of Wright’s F -
statistics, we saw that there are two sources of sampling error: the error associated with
sampling only some individuals from a larger universe of individuals within populations
(statistical sampling) and the error associated with sampling only some populations
from a larger universe of populations (genetic sampling).1

It shouldn’t come as any surprise that there is a Bayesian way to do what I’ve just described.
As I hope to convince you, there are some real advantages associated with doing so.

The Bayesian model


I’m not going to provide all of the gory details on the Bayesian model. If you’re interested
you can find most of them in my lecture notes from the Summer Institute in Statistical
1 The terms “statistical sampling” and “genetic sampling” are due to Weir [83].

Genetics last summer.2 In fact, I’m only going to describe two pieces of the model.3 First,
a little notation:
n11,i = # of A1 A1 genotypes
n12,i = # of A1 A2 genotypes
n22,i = # of A2 A2 genotypes
i = population index
I = number of populations

These are the data we have to work with. The corresponding genotype frequencies are
x11,i = pi² + f pi(1 − pi)
x12,i = 2pi(1 − pi)(1 − f)
x22,i = (1 − pi)² + f pi(1 − pi)
So we can express the likelihood of our sample as a product of multinomial probabilities
P(n|p, f) ∝ Π(i=1..I) x11,i^(n11,i) x12,i^(n12,i) x22,i^(n22,i) .

To complete the Bayesian model, all we need are some appropriate priors. Specifically, we
so far haven’t done anything to describe the variation in allele frequency among populations.
Suppose that the distribution of allele frequencies among populations is well-approximated
by a Beta distribution. A Beta distribution is convenient for many reasons, and it is quite
flexible. Don’t worry about what the formula for a Beta distribution looks like. All you need
to know is that it has two parameters and that if these parameters are π and θ, we can set
things up so that
E(pik ) = π
Var(pik ) = π(1 − π)θ
Thus π corresponds to p̄ and θ corresponds to Fst .4 Figure 6.1 illustrates the shape of the
Beta distribution for different choices of π and θ. To complete the Bayesian model we need
2 Or you can read Holsinger and Wallace [38], which I’ve linked to from the course web site.
3 The good news is that to do the Bayesian analyses in this case, you don’t have to write any JAGS code. I’ll provide the code. Alternatively, you can use a standalone program, Hickory, that will do the analysis for you, provided you’re willing to get your data into a format that Hickory recognizes.
4 For any of you who happen to be familiar with the usual parameterization of a Beta distribution, this parameterization corresponds to setting ν = ((1 − θ)/θ)π and ω = ((1 − θ)/θ)(1 − π).

54
only to specify priors on π, f , and θ. In the absence of any prior knowledge about the
parameters, a uniform prior on [0,1]5 is a natural choice.
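It’s easy to check that this parameterization of the Beta distribution really does have mean π and variance π(1 − π)θ, using the standard Beta moments. A small Python check (the particular values of π and θ are arbitrary):

```python
pi_, theta = 0.25, 0.1

# The parameterization from the text (nu and omega in footnote 4).
alpha = ((1 - theta) / theta) * pi_
beta = ((1 - theta) / theta) * (1 - pi_)

# Standard moments of a Beta(alpha, beta) distribution.
mean = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

print(mean)                           # ≈ pi_ = 0.25
print(var, pi_ * (1 - pi_) * theta)   # both ≈ 0.01875
```

The trick is that α + β = (1 − θ)/θ, so the Beta variance π(1 − π)/(α + β + 1) collapses to π(1 − π)θ.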

The Isotoma petraea example


Here’s the JAGS code to estimate f and θ:

model {
    ## genotype frequencies
    ##
    for (i in 1:n.pops) {
        for (j in 1:n.loci) {
            x[i,j,1] <- p[i,j]*p[i,j] + f*p[i,j]*(1-p[i,j])
            x[i,j,2] <- 2*(1-f)*p[i,j]*(1 - p[i,j])
            x[i,j,3] <- (1-p[i,j])*(1-p[i,j]) + f*p[i,j]*(1-p[i,j])
        }
    }

    ## likelihood
    ##
    for (i in 1:n.pops) {
        for (j in 1:n.loci) {
            n[i,j,1:3] ~ dmulti(x[i,j,], N[i,j])
        }
    }

    ## priors
    ##
    ## allele frequencies within populations
    ## (alpha and beta are defined per locus below)
    ##
    for (i in 1:n.pops) {
        for (j in 1:n.loci) {
            p[i,j] ~ dbeta(alpha[j], beta[j])
        }
    }
    ## inbreeding coefficient within populations
    ##
5 dunif(0,1) in JAGS notation

[Figure 6.1 appears here: panels showing Beta densities for the combinations of π and θ given in the caption.]
Figure 6.1: Shapes of the Beta distribution for different choices of π and θ. In the figure
captions “p” corresponds to π, and “theta” corresponds to θ.

    f ~ dunif(0, 1)

    ## theta (Fst)
    ##
    theta ~ dunif(0,1)

    ## pi
    ##
    for (i in 1:n.loci) {
        pi[i] ~ dunif(0,1)
    }

    ## parameters of the beta distribution
    ##
    ## the weird constraints are to ensure that both of them
    ## lie in [1, 1.0e4]
    ##
    for (i in 1:n.loci) {
        alpha[i] <- max(1, min(((1-theta)/theta)*pi[i], 1.0e4))
        beta[i] <- max(1, min(((1-theta)/theta)*(1-pi[i]), 1.0e4))
    }
}
You’ll also find an R script that calls this code. The relevant function has the very creative
name of analyze.data(). It requires data in a very particular format, namely a list that
consists of four named elements:

1. n.pops: The number of populations in the sample.


2. n.loci: The number of loci scored in the sample.
3. n: A n.pops×n.loci×3 matrix of genotype counts where in the final dimension the
first entry corresponds to the number of A1A1 homozygotes, the second entry corresponds
to the number of A1A2 heterozygotes, and the third entry corresponds to the
number of A2A2 homozygotes.
4. N: An n.pops×n.loci matrix of sample sizes at each locus. This could be calculated
automatically by analyze.data(), but I haven’t written that code yet.

It’s not too hard to get data into that format. f-statistics.R also provides a set
of functions to construct that list from a CSV file in which each line corresponds with an

57
individual, the first column (pop) is the population from which that individual was collected,
and the remaining columns are the genotype (scored as 0, 1, 2) of the individual at a
particular locus.
The Isotoma petraea data come to us in a somewhat different format, so there’s also a
script that constructs the necessary input list and calls analyze.data(). If you look at the
code, you’ll see that I’ve specified n.chains=5. That allows me to check convergence by
looking at Rhat. If you run the code, here’s what you’ll get (except for MCMC error):

> print(fit)
Inference for Bugs model at "f-statistics.txt", fit using jags,
5 chains, each with 30000 iterations (first 25000 discarded), n.thin = 5
n.sims = 5000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
f 0.527 0.097 0.327 0.464 0.533 0.595 0.698 1.001 5000
theta 0.112 0.051 0.024 0.076 0.108 0.143 0.223 1.003 3000
deviance 46.679 4.848 38.924 43.096 46.095 49.594 57.610 1.001 3900

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 11.8 and DIC = 58.4
DIC is an estimate of expected predictive error (lower deviance is better).

It’s easy to modify the code to consider two special cases:

• f = 0: This corresponds to the case when genotypes within populations are in Hardy-
Weinberg proportions. Implemented in f-statistics-f0.txt.

• θ = 0: This corresponds to the case when population allele frequencies are iden-
tical across populations, i.e., there is no genetic differentiation. Implemented in
f-statistics-t0.txt

Before we go any further though and start comparing these models, remember when I said
that you need to be careful about using the DIC reported from R2jags? This is one of those
cases where I don’t trust it. Why? Because I calculated it from scratch and got a very
different result:6
6 Just include DIC=TRUE in the arguments to analyze.data() and you’ll get a printout of these results.

Model Dbar Dhat pD DIC
Full 46.5 40.7 5.8 52.3
f =0 73.0 67.6 5.3 73.8
θ=0 61.6 59.8 1.8 63.5

Table 6.1: DIC statistics for the Isotoma petraea data.

Dbar: 46.5
Dhat: 40.7
pD: 5.8
DIC: 52.3
If you compare these results to what’s reported from R2jags, you’ll see that Dbar in my
calculation corresponds to the average deviance in the R2jags output.7 That’s because they’re
both calculated as -2.0 times the log likelihood of the data, averaged across all posterior
samples. The difference is in the estimates for pD. My version calculates it according to the
original definition [76] as the difference between Dbar and Dhat, -2.0 times the log likelihood
of the data at the posterior mean of the parameters. R2jags calculates it differently. In both
cases, DIC is just Dbar + pD, but since the estimates of pD are different so are the estimates
of DIC.
In any case, it’s easy to compare DIC from the three models simply by adding model="f0"
(for the f = 0 model) or model="t0" (for the θ = 0 model) to the argument list of
analyze.data(). Table 6.1 summarizes the results.
The f = 0 model has a much larger DIC than the full model, a difference of more than 20 units.
Thus, we have strong evidence for inbreeding in these populations of Isotoma petraea.8 The
θ = 0 model also has a DIC substantially larger than the DIC for the full model, a difference
of more than 10 units. Thus, we also have good evidence for genetic differentiation among
these populations.9
It’s useful to look back and think about the different ways we’ve used the data from
Isotoma petraea (Table 6.2). Several things become apparent from looking at this table:

• The direct calculation is very misleading. A population that has only one individual
sampled carries as much weight in determining Fst and Fis as populations with samples
of 20-30 individuals.
7 Except for rounding error.
8 Much stronger than the evidence we had for inbreeding in the ABO blood group data, by the way.
9 It’s important to remember that this conclusion applies only to the locus that we analyzed. Strong differentiation at this locus need not imply that there is strong differentiation at other loci.

Method Fis Fst Fit
Direct 0.14 0.21 0.32
Nei 0.31 0.24 0.47
Weir & Cockerham 0.54 0.04 0.56
Bayesian 0.53 (0.33, 0.70) 0.11 (0.02, 0.22)

Table 6.2: Comparison of Fis and Fst estimates calculated in different ways.

• By failing to account for genetic sampling, Nei’s statistics significantly underestimate
  Fis, while Weir & Cockerham’s estimate is indistinguishable from the Bayesian estimates.

• It’s not illustrated here, but when a reasonable number of loci are sampled, say more
than 8-10, the Weir & Cockerham estimates and the Bayesian estimates are quite
similar. But the Bayesian estimates allow for more convenient comparisons of different
estimates, and the credible intervals don’t depend either on asymptotic approximations
or on bootstrapping across a limited collection of loci. The Bayesian approach can also
be extended more easily to complex situations. We’ll see one example of this later in
the semester when we discuss FST outliers.

Chapter 7

Analyzing the genetic structure of populations: individual assignment

About 15 years ago a different approach to the analysis of genetic structure began to emerge:
analysis of individual assignment. Although the implementation details get a little hairy, the
basic idea is fairly simple. Suppose we have genetic data on a series of individuals. Label
the data we have for each individual xi . Suppose that all individuals belong to one of K
populations and let the genotype frequencies in population k be represented by γk . Then
the likelihood that individual i comes from population k is just
P(i|k) = P(xi|γk) / Σk P(xi|γk) .

So if we can specify prior probabilities for γk , we can use Bayesian methods to estimate the
posterior probability that individual i belongs to population k, and we can associate that
assignment with some measure of its reliability.1
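Here’s a toy version of the idea in Python, for a single locus and K = 2 populations with made-up allele frequencies. Real programs like Structure estimate the γk at the same time, which is where the implementation gets hairy:

```python
# Frequency of allele A1 in each of two hypothetical populations.
freqs = [0.9, 0.2]

def genotype_prob(p, n_a1):
    """Hardy-Weinberg probability of a genotype, coded as the number
    of A1 alleles (0, 1, or 2), given allele frequency p."""
    return {2: p * p, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}[n_a1]

# An individual observed to be an A1A1 homozygote.
likelihoods = [genotype_prob(p, 2) for p in freqs]
total = sum(likelihoods)
posteriors = [lik / total for lik in likelihoods]

print(posteriors)  # ≈ [0.953, 0.047]: population 1 heavily favored
```

With uniform priors over populations, the posterior assignment probabilities are just the normalized likelihoods, as in the equation above.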

Applying assignment to understand invasions


We’ll use Structure to assess whether cultivated genotypes of Berberis thunbergii contribute
to ongoing invasions in Connecticut and Massachusetts [58]. The first problem is to determine
what K to use, because K doesn’t necessarily have to equal the number of populations we
sample from. Some populations may not be distinct from one another. There are a couple
of ways to estimate K. The most straightforward is to run the analysis for a range of
plausible values, repeat it 10-20 times for each value, calculate the mean “log probability of
1 You can find details in [69].

61
K Mean L(K)
2 -2553.2
3 -2331.9
4 -2402.9
5 -2476.3

Table 7.1: Mean log probability of the data for K = 2, 3, 4, 5 in the Berberis thunbergii
data (adapted from [58]).

the data” for each value of K, and pick the value of K that is the biggest, i.e., the least
negative (Table 7.1). For the barberry data, K = 3 is the obvious choice.
Having determined that the data support K = 3, the results of the analysis are displayed
in Figure 7.1. Each vertical bar corresponds to an individual in the sample, and the pro-
portion of each bar that is of a particular color tells us the posterior probability that the
individual belongs to the cluster with that color.
Figure 7.1 may not look terribly informative, but actually it is. Look at the labels beneath
the figure. You’ll see that with the exception of individual 17 from Beaver Brook Park, all
of the individuals that are solid blue are members of the cultivated Berberis thunbergii var.
atropurpurea. The solid red bar corresponds to Berberis thunbergii ’Atropurpurea’, another
modern cultivar. You’ll notice that individuals 1, 2, 18, and 19 from Beaver Brook Park and
individual 1 from Bluff Point State Park fall into the same genotypic cluster as this cultivar.
Berberis ×ottawensis is a hybrid cultivar whose parents are Berberis thunbergii and Berberis
vulgaris, so it makes sense that individuals of this cultivar would be half blue and half red.
The solid green bars are feral individuals from long-established populations. Notice that
the cultivars are distinct from all but a few of the individuals in the long-established feral
populations, suggesting that contemporary cultivars are doing relatively little to maintain
the invasion in areas where it is already established.

Genetic diversity in human populations


A much more interesting application of Structure appeared a little over a decade ago. The
Human Genome Diversity Cell Line Panel (HGDP-CEPH) consisted at the time of data
from 1056 individuals in 52 geographic populations. Each individual was genotyped at 377
autosomal loci. If those populations are grouped into 5 broad geographical regions (Africa,
[Europe, the Middle East, and Central/South Asia], East Asia, Oceania, and the Americas),
we find that about 93% of genetic variation is found within local populations and only about

Figure 7.1: Analysis of AFLP data from Berberis thunbergii [58].

4% is a result of allele frequency differences between regions [70]. You might wonder why Eu-
rope, the Middle East, and Central/South Asia were grouped together for that analysis. The
reason becomes clearer when you look at a Structure analysis of the same data (Figure 7.2).

A non-Bayesian look at individual-based analysis of genetic structure

Structure has a lot of nice features, but you’ll discover a couple of things about it if you
begin to use it seriously: (1) It often isn’t obvious what the “right” K is.2 (2) It requires a
lot of computational resources, especially with datasets that include a few thousand SNPs,
as is becoming increasingly common. An alternative is to use principal component analysis
directly on genotypes. There are technical details associated with estimating the principal
components and interpreting them that we won’t discuss,3 but the results can be pretty
striking. Figure 7.3 shows the results of a PCA on data derived from 3192 Europeans at
500,568 SNP loci. The correspondence between the position of individuals in PCA space
and geographical space is remarkable.
2 In fact, it’s not clear that there is such a thing as the “right” K. If you’re interested in hearing more about that, feel free to ask.
3 See [64] for details.

Figure 7.2: Structure analysis of microsatellite diversity in the Human Genome Diversity
Cell Line Panel (from [70]).

Figure 7.3: Principal components analysis of genetic diversity in Europe corresponds with
geography (from [63]). Panel b is a close-up view of the area around Switzerland (CH).
Chapter 8

Two-locus population genetics

So far in this course we’ve dealt only with variation at a single locus. There are obviously
many traits that are governed by more than a single locus in whose evolution we might
be interested. And for those who are concerned with the use of genetic data for forensic
purposes, you’ll know that forensic use of genetic data involves genotype information from
multiple loci. I won’t be discussing quantitative genetic variation for a few weeks, and I’m
not going to say anything about how population genetics gets applied to forensic analyses,
but I do want to introduce some basic principles of multilocus population genetics that are
relevant to our discussions of the genetic structure of populations before moving on to the
next topic. To keep things relatively simple multilocus population genetics will, for purposes
of this lecture, mean two-locus population genetics.

Gametic disequilibrium
One of the most important properties of a two-locus system is that it is no longer sufficient to
talk about allele frequencies alone, even in a population that satisfies all of the assumptions
necessary for genotypes to be in Hardy-Weinberg proportions at each locus. To see why
consider this. With two loci and two alleles there are four possible gametes:1

Gamete A1 B1 A1 B2 A2 B1 A2 B2
Frequency x11 x12 x21 x22

If alleles are arranged randomly into gametes then,

x11 = p1 p2
x12 = p1 q2
x21 = q1 p2
x22 = q1 q2 ,

1 Think of drawing the Punnett square for a dihybrid cross, if you want.


where p1 = freq(A1 ) and p2 = freq(B1 ). But alleles need not be arranged randomly into
gametes. They may covary so that when a gamete contains A1 it is more likely to contain
B1 than a randomly chosen gamete, or they may covary so that a gamete containing A1 is
less likely to contain B1 than a randomly chosen gamete. This covariance could be the result
of the two loci being in close physical association, but it doesn’t have to be. Whenever the
alleles covary within gametes

x11 = p1 p2 + D
x12 = p1 q2 − D
x21 = q1 p2 − D
x22 = q1 q2 + D ,

where D = x11 x22 − x12 x21 is known as the gametic disequilibrium.2 When D ≠ 0 the alleles
within gametes covary, and D measures the statistical association between them. It does not
(directly) measure the physical association. Similarly, D = 0 does not imply that the loci
are unlinked, only that the alleles at the two loci are arranged into gametes independently
of one another.
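A quick numerical check of these definitions (the gamete frequencies here are invented for illustration):

```python
# D from the four gamete frequencies: D = x11*x22 - x12*x21.
# Equivalently, D = x11 - p1*p2, where p1 = freq(A1) and p2 = freq(B1).

def gametic_D(x11, x12, x21, x22):
    return x11 * x22 - x12 * x21

x11, x12, x21, x22 = 0.4, 0.3, 0.1, 0.2   # hypothetical gamete frequencies
p1 = x11 + x12                             # freq(A1) = 0.7
p2 = x11 + x21                             # freq(B1) = 0.5

D = gametic_D(x11, x12, x21, x22)          # 0.4*0.2 - 0.3*0.1 = 0.05
```

Both forms of the definition agree: x11 − p1 p2 = 0.4 − 0.35 = 0.05.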

A little diversion
It probably isn’t obvious why we can get away with only one D for all of the gamete fre-
quencies. The short answer is:

There are four gametes. That means we need three parameters to describe the
four frequencies. p1 and p2 are two. D is the third.

Another way is to do a little algebra to verify that the definition is self-consistent.

D = x11 x22 − x12 x21
  = (p1 p2 + D)(q1 q2 + D) − (p1 q2 − D)(q1 p2 − D)
  = [p1 q1 p2 q2 + D(p1 p2 + q1 q2 ) + D²] − [p1 q1 p2 q2 − D(p1 q2 + q1 p2 ) + D²]
  = D(p1 p2 + q1 q2 + p1 q2 + q1 p2 )
  = D(p1 (p2 + q2 ) + q1 (q2 + p2 ))
  = D(p1 + q1 )
  = D .

2 You will sometimes see D referred to as the linkage disequilibrium, but that’s misleading. Alleles at different loci may be non-randomly associated even when they are not linked.

Transmission genetics with two loci


I’m going to construct a reduced version of a mating table to see how gamete frequencies
change from one generation to the next. There are ten different two-locus genotypes (if
we distinguish coupling, A1 B1 /A2 B2 , from repulsion, A1 B2 /A2 B1 , heterozygotes as we must
for these purposes). So a full mating table would have 100 rows. If we assume all the
conditions necessary for genotypes to be in Hardy-Weinberg proportions apply, however, we
can get away with just calculating the frequency with which any one genotype will produce
a particular gamete.3

                                        Gametes
Genotype         Frequency    A1 B1      A1 B2      A2 B1      A2 B2
A1 B1 /A1 B1     x11²         1          0          0          0
A1 B1 /A1 B2     2x11 x12     1/2        1/2        0          0
A1 B1 /A2 B1     2x11 x21     1/2        0          1/2        0
A1 B1 /A2 B2     2x11 x22     (1 − r)/2  r/2        r/2        (1 − r)/2
A1 B2 /A1 B2     x12²         0          1          0          0
A1 B2 /A2 B1     2x12 x21     r/2        (1 − r)/2  (1 − r)/2  r/2
A1 B2 /A2 B2     2x12 x22     0          1/2        0          1/2
A2 B1 /A2 B1     x21²         0          0          1          0
A2 B1 /A2 B2     2x21 x22     0          0          1/2        1/2
A2 B2 /A2 B2     x22²         0          0          0          1

3 We’re assuming random union of gametes rather than random mating of genotypes.

Where do (1 − r)/2 and r/2 come from?

Consider the coupling double heterozygote, A1 B1 /A2 B2 . When recombination doesn’t hap-
pen, A1 B1 and A2 B2 occur in equal frequency (1/2), and A1 B2 and A2 B1 don’t occur at all.
When recombination happens, the four possible gametes occur in equal frequency (1/4). So
the recombination frequency,4 r, is half the crossover frequency,5 c, i.e., r = c/2. Now the
results of crossing over can be expressed in this table:

Frequency    A1 B1        A1 B2    A2 B1    A2 B2
1 − c        1/2          0        0        1/2
c            1/4          1/4      1/4      1/4
Total        (2 − c)/4    c/4      c/4      (2 − c)/4
           = (1 − r)/2    r/2      r/2      (1 − r)/2

Changes in gamete frequency


We can use this table as we did earlier to calculate the frequency of each gamete in the next
generation. Specifically,

x′11 = x11² + x11 x12 + x11 x21 + (1 − r)x11 x22 + r x12 x21
     = x11 (x11 + x12 + x21 + x22 ) − r(x11 x22 − x12 x21 )
     = x11 − rD
x′12 = x12 + rD
x′21 = x21 + rD
x′22 = x22 − rD .
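The recursion is easy to check numerically. A one-generation update (starting gamete frequencies invented for illustration) preserves the allele frequencies while shrinking D:

```python
# One generation of random union of gametes: each gamete frequency moves
# toward its equilibrium value by r*D, as in the recursion derived above.

def next_gametes(x11, x12, x21, x22, r):
    D = x11 * x22 - x12 * x21
    return (x11 - r * D, x12 + r * D, x21 + r * D, x22 - r * D)

x = (0.4, 0.3, 0.1, 0.2)                 # D = 0.05
xp = next_gametes(*x, r=0.5)             # unlinked loci
p1_new = xp[0] + xp[1]                   # freq(A1), unchanged at 0.7
D_new = xp[0] * xp[3] - xp[1] * xp[2]    # 0.05 * (1 - 0.5) = 0.025
```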

No changes in allele frequency


We can also calculate the frequencies of A1 and B1 after this whole process:

p′1 = x′11 + x′12
    = x11 − rD + x12 + rD
    = x11 + x12
    = p1
p′2 = p2 .

Since each locus is subject to all of the conditions necessary for Hardy-Weinberg to apply
at a single locus, allele frequencies don’t change at either locus. Furthermore, genotype
frequencies at each locus will be in Hardy-Weinberg proportions. But the two-locus gamete
frequencies change from one generation to the next.
4 The frequency of recombinant gametes in double heterozygotes.
5 The frequency of cytological crossover during meiosis.

                  Gamete frequencies                    Allele frequencies
Population    A1 B1    A1 B2    A2 B1    A2 B2          p1      p2        D
1             0.24     0.36     0.16     0.24           0.60    0.40      0.00
2             0.14     0.56     0.06     0.24           0.70    0.20      0.00
Combined      0.19     0.46     0.11     0.24           0.65    0.30     −0.005

Table 8.1: Gametic disequilibrium in a combined population sample.

Changes in D
You can probably figure out that D will eventually become zero, and you can probably even
guess that how quickly it becomes zero depends on how frequent recombination is. But I’d
be astonished if you could guess exactly how rapidly D decays as a function of r. It takes a
little more algebra, but we can say precisely how rapid the decay will be.

D′ = x′11 x′22 − x′12 x′21
   = (x11 − rD)(x22 − rD) − (x12 + rD)(x21 + rD)
   = x11 x22 − rD(x11 + x22 ) + r²D² − (x12 x21 + rD(x12 + x21 ) + r²D²)
   = x11 x22 − x12 x21 − rD(x11 + x12 + x21 + x22 )
   = D − rD
   = D(1 − r)

Notice that even if loci are unlinked, meaning that r = 1/2, D does not reach 0 immediately.
That state is reached only asymptotically. The two-locus analogue of Hardy-Weinberg is
that gamete frequencies will eventually be equal to the product of their constituent allele
frequencies.
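The geometric decay is easy to see in a short simulation: iterating the one-generation recursion matches the closed form D_t = D_0 (1 − r)^t, and even with r = 1/2 the disequilibrium is only halved each generation rather than eliminated:

```python
# Decay of gametic disequilibrium: D_t = D_0 * (1 - r)**t.

def iterate_D(D0, r, generations):
    D = D0
    for _ in range(generations):
        D *= (1 - r)
    return D

D0, r, t = 0.25, 0.5, 10
closed_form = D0 * (1 - r) ** t   # still positive after 10 generations
iterated = iterate_D(D0, r, t)
```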

Population structure with two loci


You can probably guess where this is going. With one locus I showed you that there’s
a deficiency of heterozygotes in a combined sample even if there’s random mating within
all populations of which the sample is composed. The two-locus analog is that you can
have gametic disequilibrium in your combined sample even if the gametic disequilibrium is
zero in all of your constituent populations. Table 8.1 provides a simple numerical example
involving just two populations in which the combined sample has equal proportions from
each population.

The gory details
You knew that I wouldn’t be satisfied with a numerical example, didn’t you? You knew
there had to be some algebra coming, right? Well, here it is. Let

Di = x11,i − p1i p2i
Dt = x̄11 − p̄1 p̄2 ,

where x̄11 = (1/K) Σ x11,k , p̄1 = (1/K) Σ p1k , and p̄2 = (1/K) Σ p2k , with each sum running
over the K populations k = 1, . . . , K. Given these definitions, we can now calculate Dt .

Dt = x̄11 − p̄1 p̄2
   = (1/K) Σ x11,k − p̄1 p̄2
   = (1/K) Σ (p1k p2k + Dk ) − p̄1 p̄2
   = (1/K) Σ (p1k p2k − p̄1 p̄2 ) + D̄
   = Cov(p1 , p2 ) + D̄ ,

where Cov(p1 , p2 ) is the covariance in allele frequencies across populations and D̄ is the
mean within-population gametic disequilibrium. Suppose Di = 0 for all subpopulations.
Then D̄ = 0, too (obviously). But that means that

Dt = Cov(p1 , p2 ) .

So if allele frequencies covary across populations, i.e., Cov(p1 , p2 ) ≠ 0, then there will be
non-random association of alleles into gametes in the sample, i.e., Dt ≠ 0, even if there is
random association of alleles into gametes within each population.6
Returning to the example in Table 8.1,

Cov(p1 , p2 ) = 0.5(0.6 − 0.65)(0.4 − 0.3) + 0.5(0.7 − 0.65)(0.2 − 0.3)
             = −0.005
x̄11 = (0.65)(0.30) − 0.005
    = 0.19
x̄12 = (0.65)(0.70) + 0.005
    = 0.46
x̄21 = (0.35)(0.30) + 0.005
    = 0.11
x̄22 = (0.35)(0.70) − 0.005
    = 0.24 .

6 Well, duh! Covariation of allele frequencies across populations means that alleles are non-randomly associated across populations. What other result could you possibly expect?
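The same bookkeeping in code, using the numbers from Table 8.1:

```python
# Two populations, each with D = 0 internally, pooled in equal proportions
# (the Table 8.1 example). The pooled sample shows Dt = Cov(p1, p2) + Dbar.

p1 = [0.60, 0.70]            # freq(A1) in populations 1 and 2
p2 = [0.40, 0.20]            # freq(B1) in populations 1 and 2
K = len(p1)

p1bar = sum(p1) / K          # 0.65
p2bar = sum(p2) / K          # 0.30
cov = sum((a - p1bar) * (b - p2bar) for a, b in zip(p1, p2)) / K
Dbar = 0.0                   # no disequilibrium within either population
Dt = cov + Dbar              # -0.005, matching the table
x11bar = p1bar * p2bar + Dt  # 0.19, the pooled A1 B1 frequency
```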

Part II

The genetics of natural selection

Chapter 9

The Genetics of Natural Selection

So far in this course, we’ve focused on describing the pattern of variation within and among
populations. We’ve talked about inbreeding, which causes genotype frequencies to change,
although it leaves allele frequencies the same, and we’ve talked about how to describe varia-
tion among populations. But we haven’t yet discussed any evolutionary processes that could
lead to a change in allele frequencies within populations.1
Let’s return for a moment to the list of assumptions we developed when we derived the
Hardy-Weinberg principle and see what we’ve done so far.
Assumption #1 Genotype frequencies are the same in males and females, e.g., x11 is the
frequency of the A1 A1 genotype in both males and females.
Assumption #2 Genotypes mate at random with respect to their genotype at this partic-
ular locus.
Assumption #3 Meiosis is fair. More specifically, we assume that there is no segregation
distortion, no gamete competition, no differences in the developmental ability of eggs,
or the fertilization ability of sperm.
Assumption #4 There is no input of new genetic material, i.e., gametes are produced
without mutation, and all offspring are produced from the union of gametes within
this population.
Assumption #5 The population is of infinite size so that the actual frequency of matings
is equal to their expected frequency and the actual frequency of offspring from each
mating is equal to the Mendelian expectations.
1 We mentioned migration and drift in passing, and I’m sure you all understand the rudiments of them, but we haven’t yet discussed them in detail.

Assumption #6 All matings produce the same number of offspring, on average.

Assumption #7 Generations do not overlap.

Assumption #8 There are no differences among genotypes in the probability of survival.

The only assumption we’ve violated so far is Assumption #2, the random-mating as-
sumption. We’re going to spend the next several lectures talking about what happens when
you violate Assumptions #3, #6, and #8. When any one of those assumptions is violated
we have some form of natural selection going on.2

Components of selection
Depending on which of those three assumptions is violated and how it’s violated we recognize
that selection may happen in different ways and at different life-cycle stages.3

Assumption #3: Meiosis is fair. There are at least two ways in which this assumption
may be violated.

• Segregation distortion: The two alleles are not equally frequent in gametes pro-
duced by heterozygotes. The t-allele in house mice, for example, is found in 95%
of fertile sperm produced by heterozygous males.
• Gamete competition: Gametes may be produced in equal frequency in heterozy-
gotes, but there may be competition among them to produce fertilized zygotes,
e.g., sperm competition in animals, pollen competition in seed plants.4

Assumption #6: All matings produce the same number of progeny.

• Fertility selection: The number of offspring produced may depend on maternal


genotype (fecundity selection), paternal genotype (virility selection), or on both.
2 As I alluded to when we first started talking about inbreeding, we can also have natural selection as a result of certain types of violations of assumption #2, e.g., sexual selection or disassortative mating. See below.
3 To keep things relatively simple we’re not even going to discuss differences in fitness that may be associated with different ages. We’ll assume a really simple life-cycle in which there are non-overlapping generations. So we don’t need to distinguish between fitness components that differ among age categories.
4 Strictly speaking pollen competition isn’t gamete competition, although the evolutionary dynamics are the same. I’ll leave it to the botanists among you to explain to the zoologists why pollen competition would be more properly called gametophytic competition.

Assumption #8: Survival does not depend on genotype.

• Viability selection: The probability of survival from zygote to adult may depend
on genotype, and it may differ between sexes.

At this point you’re probably thinking that I’ve covered all the possibilities. But by now
you should also know me well enough to guess from the way I wrote that last sentence that
if that’s what you were thinking, you’d be wrong. There’s one more way in which selection
can happen that corresponds to violating

Assumption #2: Individuals mate at random.

• Sexual selection: Some individuals may be more successful at finding mates than
others. Since females are typically the limiting sex (Bateman’s principle), the
differences typically arise either as a result of male-male competition or female
choice.
• Disassortative mating: When individuals preferentially choose mates different
from themselves, rare genotypes are favored relative to common genotypes. This
leads to a form of frequency-dependent selection.

The genetics of viability selection


That’s a pretty exhaustive (and exhausting) list of the ways in which selection can happen.
Although we’re going to focus our study of natural selection just on viability selection, it’s
important to remember that any or all of the other forms of selection may be operating
simultaneously on the genes or the traits that we’re studying, and the direction of selection
due to these other components may be the same or different as the direction of viability
selection. We’re going to focus on viability selection for two reasons:

1. The most basic properties of natural selection acting on other components of the life
history are similar to those of viability selection. A good understanding of viability
selection provides a solid foundation for understanding other types of selection.5

2. The algebra associated with understanding viability selection is a lot simpler than the
algebra associated with understanding the other types of selection, and the dynamics
are simpler and easier to understand.6
5 There are some important differences, however, and I hope we have time to discuss a couple of them.
6 Once you’ve seen what you’re in for you may think I’ve lied about this. But if you really think I have,

The basic framework
To understand the basics, we’ll start with a numerical example using some data on Drosophila
pseudoobscura that Theodosius Dobzhansky collected more than 50 years ago. You may re-
member that this species has chromosome inversion polymorphisms. Although these inver-
sions involve many genes, they are inherited as if they were single Mendelian loci, so we can
treat the karyotypes as single-locus genotypes and study their evolutionary dynamics. We’ll
be considering two inversion types: the Standard inversion type, ST , and the Chiricahua
inversion type, CH. We’ll use the following notation throughout our discussion:

Symbol Definition
N number of individuals in the population
x11 frequency of ST /ST genotype
x12 frequency of ST /CH genotype
x22 frequency of CH/CH genotype
w11 fitness of ST /ST genotype, probability of surviving from egg to adult
w12 fitness of ST /CH genotype
w22 fitness of CH/CH genotype

The data look like this:7


Genotype ST /ST ST /CH CH/CH
Number in eggs 41 82 27
x11 N x12 N x22 N
viability 0.6 0.9 0.45
w11 w12 w22
Number in adults 25 74 12
w11 x11 N w12 x12 N w22 x22 N

Genotype and allele frequencies


It should be trivial for you by this time to calculate the genotype frequencies in eggs and
adults. We’ll be using the convention that genotype frequencies in eggs (or newly-formed
zygotes) are the genotype frequencies before selection and that genotype frequencies in adults
are the genotype frequencies after selection.
just ask me to illustrate some of the algebra necessary for understanding viability selection when males and
females differ in fitness. That’s about as simple an extension as you can imagine, and things start to get
pretty complicated even then.
7 Don’t worry for the moment about how the viabilities were estimated.

freq(ST /ST ) before selection = 41/(41 + 82 + 27)
                              = 0.27
freq(ST /ST ) before selection = N x11 /(N x11 + N x12 + N x22 )
                              = x11
freq(ST /ST ) after selection = 25/(25 + 74 + 12)
                             = 0.23
freq(ST /ST ) after selection = w11 x11 N/(w11 x11 N + w12 x12 N + w22 x22 N )
                             = w11 x11 /(w11 x11 + w12 x12 + w22 x22 )
                             = w11 x11 /w̄
w̄ = (w11 x11 N + w12 x12 N + w22 x22 N )/N
   = w11 x11 + w12 x12 + w22 x22 ,

where w̄ is the mean fitness, i.e., the average probability of survival in the population.
It is also trivial to calculate the allele frequencies before and after selection:

freq(ST ) before selection = (2(41) + 82)/(2(41 + 82 + 27))
                           = 0.55
freq(ST ) before selection = (2(N x11 ) + N x12 )/(2(N x11 + N x12 + N x22 ))
                           = x11 + x12 /2
freq(ST ) after selection = (2(25) + 74)/(2(25 + 74 + 12))
                          = 0.56
freq(ST ) after selection = (2w11 x11 N + w12 x12 N )/(2(w11 x11 N + w12 x12 N + w22 x22 N ))
                          = (2w11 x11 + w12 x12 )/(2(w11 x11 + w12 x12 + w22 x22 ))

p′ = (w11 x11 + w12 x12 /2)/(w11 x11 + w12 x12 + w22 x22 )

Substituting x11 = p², x12 = 2pq, and x22 = q²,

p′ = (w11 p² + w12 pq)/(w11 p² + 2w12 pq + w22 q²)
w̄ = w11 x11 + w12 x12 + w22 x22
   = p²w11 + 2pq w12 + q²w22 .
If you’re still awake, you’re probably wondering8 why I was able to substitute p², 2pq, and
q² for x11 , x12 , and x22 . Remember what I said earlier about what we’re doing here. The only
Hardy-Weinberg assumption we’re violating is the one saying that all genotypes are equally
likely to survive from zygote to adult. Remember also that a single generation in which all of
the conditions for Hardy-Weinberg is enough to establish the Hardy-Weinberg proportions.
Putting those two observations together, it’s not too hard to see that genotypes will be
in Hardy-Weinberg proportions in newly formed zygotes. Viability selection will change
that later in the life-cycle, but we restart every generation with genotypes in the familiar
Hardy-Weinberg proportions, p², 2pq, and q², where p is the frequency of ST in the parental
generation.
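We can check all of this numerically with Dobzhansky's egg counts and viabilities from the table above (41, 82, and 27 eggs with viabilities 0.6, 0.9, and 0.45):

```python
# Allele-frequency change under viability selection, using the genotype
# frequencies before selection and the viabilities from the table above.

def after_selection(x11, x12, x22, w11, w12, w22):
    wbar = w11 * x11 + w12 * x12 + w22 * x22
    return (w11 * x11 + w12 * x12 / 2) / wbar

n = (41, 82, 27)                       # ST/ST, ST/CH, CH/CH eggs
total = sum(n)
x11, x12, x22 = (ni / total for ni in n)

p_before = x11 + x12 / 2               # ~0.55
p_after = after_selection(x11, x12, x22, 0.6, 0.9, 0.45)   # ~0.56
```

The result matches the counts of surviving adults: the frequency of ST rises from about 0.55 to about 0.56 in a single generation.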

Selection acts on relative viability


Let’s stare at the selection equation for awhile and see what it means.
p′ = (w11 p² + w12 pq)/w̄ .   (9.1)

Suppose, for example, that we were to divide the numerator and denominator of (9.1) by
w11 .9 We’d then have

p′ = (p² + (w12 /w11 )pq)/(w̄/w11 ) .   (9.2)
Why did I bother to do that? Well, notice that we start with the same allele frequency, p, in
the parental generation in both equations and that we end up with the same allele frequency
in the offspring generation, p′, in both equations, but the fitnesses are different:
Fitnesses
Equation A1 A1 A1 A2 A2 A2
9.1 w11 w12 w22
9.2 1 w12 /w11 w22 /w11
8 Okay, “probably” is an overstatement. “May be” would have been a better guess.
9 I’m dividing by 1, in case you hadn’t noticed.

I could have, of course, divided the numerator and denominator by w12 or w22 instead and
ended up with yet other sets of fitnesses that produce exactly the same change in allele
frequency. This illustrates the following general principle:

The consequences of natural selection (in an infinite population) depend only on


the relative magnitude of fitnesses, not on their absolute magnitude.

That means, for example, that in order to predict the outcome of viability selection, we don’t
have to know the probability that each genotype will survive, their absolute viabilities. We
only need to know the probability that each genotype will survive relative to the probability
that other genotypes will survive, their relative viabilities. As we’ll see later, it’s sometimes
easier to estimate the relative viabilities than to estimate absolute viabilities.10
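A quick numerical illustration of the principle, using the viabilities from the Drosophila example: rescaling all three fitnesses by the same constant leaves p′ untouched.

```python
# Only relative fitnesses matter: multiplying all fitnesses by a constant
# cancels out of p' = (w11*p^2 + w12*p*q) / wbar.

def p_next(p, w11, w12, w22):
    q = 1 - p
    wbar = w11 * p**2 + 2 * w12 * p * q + w22 * q**2
    return (w11 * p**2 + w12 * p * q) / wbar

p = 0.55
absolute = p_next(p, 0.6, 0.9, 0.45)
relative = p_next(p, 1.0, 0.9 / 0.6, 0.45 / 0.6)   # divided through by w11
```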

Marginal fitnesses
In case you haven’t already noticed, there’s almost always more than one way to write
an equation.11 They’re all mathematically equivalent, but they emphasize different things.
In this case, it can be instructive to look at the difference in allele frequencies from one
generation to the next, ∆p:

∆p = p′ − p
   = (w11 p² + w12 pq)/w̄ − p
   = (w11 p² + w12 pq − w̄p)/w̄
   = p(w11 p + w12 q − w̄)/w̄
   = p(w1 − w̄)/w̄ ,

where w1 = p w11 + q w12 is the marginal fitness of allele A1 . To explain why it’s called a marginal fitness,
I’d have to teach you some probability theory that you probably don’t want to learn.12
10 We’ll also see when we get to studying the interaction between natural selection and drift that this statement is no longer true. To understand how drift and selection interact we have to know something about absolute viabilities.
11 And you won’t have noticed this and may not believe me when I tell you, but I’m not showing you every possible way to write these equations.
12 But remember this definition of marginal viability anyway. You’ll see it return in a few weeks when we talk about the additive effect of an allele and about Fisher’s Fundamental Theorem of Natural Selection.

Pattern        Description                 Figure
Directional    w11 > w12 > w22             Figure 9.1(a)
               or w11 < w12 < w22          Figure 9.1(b)
Disruptive     w11 > w12 , w22 > w12       Figure 9.1(c)
Stabilizing    w11 < w12 , w22 < w12       Figure 9.1(d)

Table 9.1: Patterns of viability selection at one locus with two alleles.

Fortunately, all you really need to know is that it corresponds to the probability that a
randomly chosen A1 allele in a newly formed zygote will survive into a reproductive adult.
Why do we care? Because it provides some (obvious) intuition on how allele frequencies
will change from one generation to the next. If w1 > w̄, i.e., if the chances of a zygote carrying
an A1 allele of surviving to make an adult are greater than the chances of a randomly chosen
zygote, then A1 will increase in frequency. If w1 < w̄, A1 will decrease in frequency. Only
if p = 0, p = 1, or w1 = w̄ will the allele frequency not change from one generation to the
next.
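The marginal-fitness form and the direct recursion give identical answers, which is easy to verify numerically (the viabilities are again the ones from the Drosophila example):

```python
# Marginal fitness of A1: w1 = p*w11 + q*w12. The change in allele
# frequency is delta_p = p*(w1 - wbar)/wbar, identical to p' - p.

def delta_p_marginal(p, w11, w12, w22):
    q = 1 - p
    wbar = w11 * p**2 + 2 * w12 * p * q + w22 * q**2
    w1 = p * w11 + q * w12
    return p * (w1 - wbar) / wbar

def delta_p_direct(p, w11, w12, w22):
    q = 1 - p
    wbar = w11 * p**2 + 2 * w12 * p * q + w22 * q**2
    return (w11 * p**2 + w12 * p * q) / wbar - p

diffs = [abs(delta_p_marginal(p, 0.6, 0.9, 0.45) -
             delta_p_direct(p, 0.6, 0.9, 0.45))
         for p in (0.1, 0.5, 0.9)]
```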

Patterns of natural selection


Well, all that algebra was lots of fun,13 but what good did it do us? Not an enormous
amount, except that it shows us (not surprisingly), that allele frequencies are likely to change
as a result of viability selection, and it gives us a nice little formula we could plug into a
computer to figure out exactly how. One of the reasons that it’s useful14 to go through all
of that algebra is that it’s possible to make predictions about the consequences of natural
selection simply by knowing the pattern of viaiblity differences. What do I mean by pattern?
Funny you should ask (Table 9.1).
Before exploring the consequences of these different patterns of natural selection, I need
to introduce you to a very important result: Fisher’s Fundamental Theorem of Natural
Selection. We’ll go through the details later when we get to quantitative genetics. For
now all you need to know is that viability selection causes the mean fitness of the progeny
generation to be greater than or equal to the mean fitness of the parental generation, with
equality only at equilibrium, i.e.,
w̄′ ≥ w̄ .
13 I’m kidding, in case you couldn’t tell.
14 If not exactly fun.

How does this help us? Well, the best way to understand that is to illustrate how we can use
Fisher’s theorem to predict the outcome of natural selection when we know only the pattern
of viability differences. Let’s take each pattern in turn.

Directional selection
To use the Fundamental Theorem we plot w̄ as a function of p (Figure 9.1(a) and 9.1(b)).
The Fundamental Theorem now tells us that allele frequencies have to change from one
generation to the next in such a way that w̄′ > w̄. In panel (a) that can only happen if
p′ > p; in panel (b) only if p′ < p. So viability selection will cause the frequency of the A1
allele to increase in panel (a) and decrease in panel (b). Ultimately, the population will be monomorphic for the homozygous
genotype with the highest fitness.15

Disruptive selection
If we plot w̄ as a function of p when w11 > w12 and w22 > w12 , we see a very different
pattern (Figure 9.1(c)). Since the Fundamental Theorem tells us that w̄′ ≥ w̄, we know that
if the population starts with an allele on one side of the bowl A1 , will be lost. If it starts on
the other side of the bowl, A2 will be lost.16
Let’s explore this example a little further. To do so, I’m going to set w11 = 1 + s1 ,
w12 = 1, and w22 = 1 + s2 .17 When fitnesses are written this way s1 and s2 are referred to as
selection coefficients. Notice also with these definitions that the fitnesses of the homozygotes
are greater than 1.18 Using these definitions and plugging them into (9.1),

p′ = (p²(1 + s1 ) + pq)/(p²(1 + s1 ) + 2pq + q²(1 + s2 ))
   = p(1 + s1 p)/(1 + p²s1 + q²s2 ) .   (9.3)

We can use equation (9.3) to find the equilibria of this system, i.e., the values of p such that
15 A population is monomorphic at a particular locus when only one allele is present. If a population is monomorphic for allele A1 , I might also say that allele A1 is fixed in the population or that the population is fixed for allele A1 .
16 Strictly speaking, we need to know more than w̄′ ≥ w̄, but we do know the other things we need to know in this case. Trust me. Have I ever lied to you? (Don’t answer that.)
17 Why can I get away with this? Hint: Think about relative fitnesses.
18 Which is why I gave you the relative fitness hint in the last footnote.

[Figure 9.1 shows four panels plotting mean fitness w̄(p) against allele frequency p.]

Figure 9.1: With directional selection (panel (a) w11 > w12 > w22 , panel (b) w11 < w12 < w22 )
viability selection leads to an ever increasing frequency of the favored allele. Ultimately, the
population will be monomorphic for the homozygous genotype with the highest fitness. With
disruptive selection (panel (c) w11 > w12 and w22 > w12 ) viability selection may lead either
to an increasing frequency of the A1 allele or to a decreasing frequency. Ultimately, the
population will be monomorphic for one of the homozygous genotypes. Which homozygous
genotype comes to predominate, however, depends on the initial allele frequencies in the
population. With stabilizing selection (panel (d) w11 < w12 > w22 ; also called balancing
selection or heterozygote advantage) viability selection will lead to a stable polymorphism.
All three genotypes will be present at equilibrium.
p′ = p.

p = p(1 + s1 p)/(1 + p²s1 + q²s2 )
p(1 + p²s1 + q²s2 ) = p(1 + s1 p)
p[(1 + p²s1 + q²s2 ) − (1 + s1 p)] = 0
p[ps1 (p − 1) + q²s2 ] = 0
p(−pqs1 + q²s2 ) = 0
pq(−ps1 + qs2 ) = 0 .

So p′ = p if p̂ = 0, q̂ = 0, or p̂s1 = q̂s2 .19 We can simplify that last one a little further, too.

p̂s1 = q̂s2
p̂s1 = (1 − p̂)s2
p̂(s1 + s2 ) = s2
p̂ = s2 /(s1 + s2 ) .
Fisher’s Fundamental Theorem tells us which of these equilibria matter. I’ve already
mentioned that depending on which side of the bowl you start, you’ll either lose the A1 allele
or the A2 allele. But suppose you happen to start exactly at the bottom of the bowl. That
corresponds to the equilibrium with p̂ = s2 /(s1 + s2 ). What happens then?
Well, if you start exactly there, you’ll stay there forever (in an infinite population). But if
you start ever so slightly off the equilibrium, you’ll move farther and farther away. It’s what
mathematicians call an unstable equilibrium. Any departure from that equilibrium gets larger
and larger. For evolutionary purposes, we don’t have to worry about a population getting to
an unstable equilibrium. It never will. Unstable equilibria are ones that populations evolve
away from.
When a population has only one allele present it is said to be fixed for that allele. Since
having only one allele is also an equilibrium (in the absence of mutation), we can also call
it a monomorphic equilibrium. When a population has more than one allele present, it is
said to be polymorphic. If two or more alleles are present at an equilibrium, we can call it a
polymorphic equilibrium. Thus, another way to describe the results of disruptive selection is
to say that the monomorphic equilibria are stable, but the polymorphic equilibrium is not.20
19
Remember that the “hats” can mean either the estimate of an unknown parameter or an equilibrium.
The context will normally make it clear which meaning applies. In this case it should be pretty obvious that
I’m talking about equilibria.
20
Notice that a polymorphic equilibrium doesn’t even exist when selection is directional.

Stabilizing selection
If we plot w̄ as a function of p when w11 < w12 and w22 < w12 , we see a third pattern. The
plot is shaped like an upside down bowl (Figure 9.1).
In this case we can see that no matter what allele frequency the population starts with, the
only way that w̄0 ≥ w̄ can hold is if the allele frequency changes in such a way that it gets close
to the value where w̄ is maximized every generation. Unlike directional selection or disruptive
selection, in which natural selection tends to eliminate one allele or the other, stabilizing
selection tends to keep both alleles in the population. You’ll also see this pattern of selection
referred to as balancing selection, because the selection on each allele is “balanced” at the
polymorphic equilibria.21 We can summarize the results by saying that the monomorphic
equilibria are unstable and that the polymorphic equilibrium is stable. By the way, if we
write the fitness as w11 = 1 − s1 , w12 = 1, and w22 = 1 − s2 , then the allele frequency at the
polymorphic equilibrium is p̂ = s2 /(s1 + s2 ).22

21
In fact, the marginal fitnesses are equal, i.e., w1 = w2 .
22
I’m not showing the algebra that justifies this conclusion on the off chance that you may want to test
your understanding by verifying it yourself.

Chapter 10

Estimating viability

Being able to make predictions with known (or estimated) viabilities doesn’t do us a heck of
a lot of good unless we can figure out what those viabilities are. Fortunately, figuring them
out isn’t too hard.1 If we know the number of individuals of each genotype before selection,
it’s really easy as a matter of fact. Consider that our data looks like this:

Genotype             A1A1                    A1A2                    A2A2
Number in zygotes    n11^(z)                 n12^(z)                 n22^(z)
Viability            w11                     w12                     w22
Number in adults     n11^(a) = w11 n11^(z)   n12^(a) = w12 n12^(z)   n22^(a) = w22 n22^(z)

In other words, estimating the absolute viability simply consists of estimating the probability that an individual of each genotype survives from zygote to adult. The maximum-likelihood estimate is, of course, just what you would probably guess:

    wij = nij^(a) / nij^(z) ,

Since wij is a probability and the outcome is binary (survive or die), you should be able to
guess what kind of likelihood relates the observed data to the unseen parameter, namely, a
binomial likelihood. In JAGS notation:2
n.11.adult ~ dbin(w.11, n.11)
n.12.adult ~ dbin(w.12, n.12)
n.22.adult ~ dbin(w.22, n.22)
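If you prefer to stay outside JAGS, the same model can be handled in a few lines of Python, because a binomial likelihood with a uniform Beta(1, 1) prior gives a Beta posterior directly. This is just an illustrative sketch with made-up counts, not data from the notes:

```python
# ML and conjugate Bayesian estimates of absolute viabilities.
# Counts are hypothetical: n_zygote before selection, n_adult survivors.

n_zygote = {"A1A1": 100, "A1A2": 100, "A2A2": 100}
n_adult = {"A1A1": 80, "A1A2": 90, "A2A2": 50}

# Maximum-likelihood estimate: observed survival proportion.
w_ml = {g: n_adult[g] / n_zygote[g] for g in n_zygote}

# Posterior mean under a Beta(1, 1) prior: Beta(1 + survivors, 1 + deaths).
w_post_mean = {g: (1 + n_adult[g]) / (2 + n_zygote[g]) for g in n_zygote}
```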
1
I almost said that it was easy, but that would be going a bit too far.
2
You knew you were going to see this again, didn’t you?

Estimating relative viability
To estimate absolute viabilities, we have to be able to identify genotypes non-destructively,
because we have to know what their genotype was both before the selection event and after
the selection event. That’s fine if we happen to be dealing with an experimental situation
where we can do controlled crosses to establish known genotypes or if we happen to be
studying an organism and a trait where we can identify the genotype from the phenotype of
a zygote (or at least a very young individual) and from surviving adults.3 What do we do
when we can’t follow the survival of individuals with known genotype? Give up?4
Remember that to make inferences about how selection will act, we only need to know
relative viabilities, not absolute viabilities.5 We still need to know something about the
genotypic composition of the population before selection, but it turns out that if we’re only
interested in relative viabilities, we don’t need to follow individuals. All we need to be able
to do is to score genotypes and estimate genotype frequencies before and after selection. Our
data looks like this:

Genotype               A1A1       A1A2       A2A2
Frequency in zygotes   x11^(z)    x12^(z)    x22^(z)
Frequency in adults    x11^(a)    x12^(a)    x22^(a)

We also know that

    x11^(a) = w11 x11^(z) / w̄
    x12^(a) = w12 x12^(z) / w̄
    x22^(a) = w22 x22^(z) / w̄ .

Suppose we now divide all three equations by the middle one:

    x11^(a)/x12^(a) = (w11 x11^(z))/(w12 x12^(z))
                  1 = 1
    x22^(a)/x12^(a) = (w22 x22^(z))/(w12 x12^(z)) ,
3
How many organisms and traits can you think of that satisfy this criterion? Any? There is one other
possibility: If we can identify an individual’s genotype after it’s dead and if we can construct a random
sample that includes both living and dead individuals and if the probability of including an individual
in the sample doesn’t depend on whether that individual is dead or alive, then we can sample a population
after the selection event and score genotypes both before and after the event from one set of observations.
4
Would I be asking the question if the answer were “Yes”?
5
At least that’s true until we start worrying about how selection and drift interact.

or, rearranging a bit,

    w11/w12 = (x11^(a)/x12^(a)) (x12^(z)/x11^(z))
    w22/w12 = (x22^(a)/x12^(a)) (x12^(z)/x22^(z)) .

This gives us a complete set of relative viabilities.

Genotype             A1A1      A1A2   A2A2
Relative viability   w11/w12   1      w22/w12

If we use the maximum-likelihood estimates for genotype frequencies before and after
selection, we obtain maximum likelihood estimates for the relative viabilities.6 If we use
Bayesian methods to estimate genotype frequencies (including the uncertainty around those
estimates), we can use these formulas to get Bayesian estimates of the relative viabilities
(and the uncertainty around them).

An example
Let’s see how this works with some real data from Dobzhansky’s work on chromosome
inversion polymorphisms in Drosophila pseudoobscura.7

Genotype           ST/ST   ST/CH   CH/CH   Total
Number in larvae   41      82      27      150
Number in adults   57      169     29      255

You may be wondering how the sample of adults can be larger than the sample of larvae.
That’s because to score an individual’s inversion type, Dobzhansky had to kill it. The
numbers in larvae are based on a sample of the population, and the adults that survived
6
If anyone cares, it’s because of the invariance property of maximum-likelihood estimates. If you don’t
understand what that is, don’t worry about it, just trust me.
7
Taken from [16].

were not genotyped as larvae. As a result, all we can do is to estimate the relative viabilities.

    w11/w12 = (x11^(a)/x12^(a)) (x12^(z)/x11^(z)) = [(57/255)/(169/255)] [(82/150)/(41/150)] = 0.67
    w22/w12 = (x22^(a)/x12^(a)) (x12^(z)/x22^(z)) = [(29/255)/(169/255)] [(82/150)/(27/150)] = 0.52 .

So it looks as if we have balancing selection, i.e., the fitness of the heterozygote exceeds that
of either homozygote.
We can check to see whether this conclusion is statistically justified by comparing the
observed number of individuals in each genotype category in adults with what we’d expect
if all genotypes were equally likely to survive.

Genotype   ST/ST                 ST/CH                  CH/CH
Expected   (41/150)255 = 69.7    (82/150)255 = 139.4    (27/150)255 = 45.9
Observed   57                    169                    29

    χ² (2 d.f.) = 14.82, P < 0.001
So we have strong evidence that genotypes differ in their probability of survival.
We can also use our knowledge of how selection works to predict the genotype frequencies
at equilibrium:
    w11/w12 = 1 - s1
    w22/w12 = 1 - s2 .
So s1 = 0.33, s2 = 0.48, and the predicted equilibrium frequency of the ST chromosome is
s2 /(s1 + s2 ) = 0.59.
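The whole calculation is easy to reproduce. Here's a Python sketch (mine, not from the notes) that recovers the relative viabilities, the χ² statistic, and the predicted equilibrium from Dobzhansky's counts:

```python
# Relative viabilities, chi-square test, and predicted equilibrium for the
# D. pseudoobscura inversion data above.

larvae = {"ST/ST": 41, "ST/CH": 82, "CH/CH": 27}
adults = {"ST/ST": 57, "ST/CH": 169, "CH/CH": 29}
n_larvae, n_adults = sum(larvae.values()), sum(adults.values())

def rel_viability(hom, het="ST/CH"):
    """(x_hom^(a) / x_het^(a)) * (x_het^(z) / x_hom^(z)); sample sizes cancel."""
    return (adults[hom] / adults[het]) * (larvae[het] / larvae[hom])

w11 = rel_viability("ST/ST")    # about 0.67
w22 = rel_viability("CH/CH")    # about 0.52
s1, s2 = 1 - w11, 1 - w22
p_hat = s2 / (s1 + s2)          # predicted equilibrium frequency of ST

# Chi-square test of equal survival across genotypes (2 d.f.).
expected = {g: n_adults * larvae[g] / n_larvae for g in larvae}
chi_sq = sum((adults[g] - expected[g]) ** 2 / expected[g] for g in larvae)
```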
Now all of those estimates are maximum-likelihood estimates. Doing these estimates in
a Bayesian context is relatively straightforward and the details will be left as an exercise.8
In outline we simply

1. Estimate the genotype frequencies before and after selection as samples from a multinomial.
8
In past years Problem #3 has consisted of making Bayesian estimates of viabilities from data like these
and predicting the outcome of viability selection. This year Nora will illustrate the approach (unless you’d
rather have her spend more time helping you with Problem #2).

2. Apply the formulas above to calculate relative viabilities and selection coefficients.

3. Determine whether the 95% credible intervals for s1 or s2 overlap 0.⁹

4. Calculate the equilibrium frequency from s2 /(s1 + s2 ), if s1 > 0 and s2 > 0. Otherwise,
determine which fixation state will be approached.

In the end you then have not only viability estimates and their associated uncertainties, but
a prediction about the ultimate composition of the population, associated with an accom-
panying level of uncertainty.

9
Meaning that we don’t have good evidence for selection either for or against the associated homozygotes,
relative to the heterozygote.

Chapter 11

Selection at one locus with many alleles, fertility selection, and sexual selection

It’s easy to extend the Hardy-Weinberg principle to multiple alleles at a single locus. In fact,
we already did this when we were discussing the ABO blood group polymorphism. Just to
get some notation out of the way, though, let’s define xij as the frequency of genotype Ai Aj
and pi as the frequency of allele Ai . Then
    xij = pi^2      if i = j
        = 2 pi pj   if i ≠ j
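As a quick sanity check, here is a small Python sketch (mine; the allele frequencies are made up) of the multi-allele Hardy-Weinberg formula:

```python
# Genotype frequencies x_ij from allele frequencies p_i at one locus.

def hw_genotype_freqs(p):
    """Return {(i, j): x_ij} for i <= j, with x_ii = p_i^2 and x_ij = 2 p_i p_j."""
    k = len(p)
    return {(i, j): (p[i] ** 2 if i == j else 2 * p[i] * p[j])
            for i in range(k) for j in range(i, k)}

x = hw_genotype_freqs([0.5, 0.3, 0.2])   # hypothetical three-allele locus
```

The genotype frequencies always sum to one, whatever the number of alleles.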

Unfortunately, the simple principles we’ve learned for understanding selection at one locus
with two alleles don’t generalize completely to selection at one locus with many alleles (or
even three).

• For one locus with two alleles, heterozygote advantage guarantees maintenance of a
polymorphism.

• For one locus with multiple alleles, there are many different heterozygote genotypes.
As a result, there is not a unique pattern identifiable as “heterozygote advantage,”
and selection may eliminate one or more alleles at equilibrium even if all heterozygotes
have a higher fitness than all homozygotes.

Selection at one locus with multiple alleles
When we discussed selection at one locus with two alleles, I used the following set of viabil-
ities:
A1 A1 A1 A2 A2 A2
w11 w12 w22

You can probably guess where this is going. Namely, I’m going to use wij to denote the
viability of genotype Ai Aj . What you probably wouldn’t have thought of doing is writing it as a
matrix
A1 A2
A1 w11 w12
A2 w12 w22
Clearly we can extend an array like this to as many rows and columns as we have alleles so
that we can summarize any pattern of viability selection with such a matrix. Notice that I
didn’t write both w12 and w21 , because (normally) an individual’s fitness doesn’t depend on
whether it inherited a particular allele from its mom or its dad.1

Marginal fitnesses and equilibria


After a little algebra it’s possible to write down how allele frequencies change in response to viability selection:2

    pi' = pi wi / w̄ ,

where wi = Σj pj wij is the marginal fitness of allele i and w̄ = Σi pi^2 wii + Σi Σj>i 2 pi pj wij is the mean fitness in the population.
It’s easy to see3 that if the marginal fitness of an allele is less than the mean fitness of
the population it will decrease in frequency. If its marginal fitness is greater than the mean
fitness, it will increase in frequency. If its marginal fitness is equal to the mean fitness it
won’t change in frequency. So if there’s a stable polymorphism, all alleles present at that
equilibrium will have marginal fitnesses equal to the population mean fitness. And, since
they’re all equal to the same thing, they’re also all equal to one another.
That’s the only thing easy to say about selection with multiple alleles. To say anything
more complete would require a lot of linear algebra. The only general conclusion I can
1
If it’s a locus that’s subject to genomic imprinting, it may be necessary to distinguish A1 A2 from A2 A1 .
Isn’t genetics fun?
2
If you’re ambitious (or a little weird), you might want to try to see if you can derive this yourself.
3
At least it’s easy to see if you’ve stared a lot at these things in the past.

mention, and I’ll have to leave it pretty vague, is that for a complete polymorphism4 to
be stable, none of the fitnesses can be too different from one another. Let’s play with an
example to illustrate what I mean.

An example
The way we always teach about sickle-cell anemia isn’t entirely accurate. We talk as if
there is a wild-type allele and the sickle-cell allele. In fact, there are at least three alleles at
this locus in many populations where there is a high frequency of the sickle-cell allele. In the
wild-type, A, allele there is a glutamic acid at position six of the β chain of hemoglobin. In
the most common sickle-cell allele, S, there is a valine in this position. In a rarer sickle-cell
allele, C, there is a lysine in this position. The fitness matrix looks like this:
A S C
A 0.976 1.138 1.103
S 0.192 0.407
C 0.550
There is a stable, complete polymorphism with these allele frequencies:5
pA = 0.83
pS = 0.07
pC = 0.10 .
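You can check this numerically. The sketch below (mine, not from the notes) iterates multi-allele viability selection on the fitness matrix above, starting from equal allele frequencies, and settles at the stable complete polymorphism:

```python
# Iterated viability selection at one locus with three alleles (A, S, C),
# using the sickle-cell fitness matrix from the text.

W = [[0.976, 1.138, 1.103],
     [1.138, 0.192, 0.407],
     [1.103, 0.407, 0.550]]

def step(p):
    """One generation: p_i' = p_i * w_i / w_bar."""
    w_marg = [sum(W[i][j] * p[j] for j in range(3)) for i in range(3)]
    w_bar = sum(p[i] * w_marg[i] for i in range(3))
    return [p[i] * w_marg[i] / w_bar for i in range(3)]

p = [1 / 3, 1 / 3, 1 / 3]
for _ in range(20000):
    p = step(p)
# p is now close to (0.83, 0.07, 0.10), and the marginal fitnesses of the
# three alleles are (nearly) equal, as they must be at an interior equilibrium.
```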
If allele C were absent, A and S would remain in a stable polymorphism:
pA = 0.85
pS = 0.15
If allele A were absent, however, the population would fix on allele C.6
Weird property #1: The existence of a stable, complete polymorphism does
not imply that all subsets of alleles could exist in stable polymorphisms. Loss
of one allele as a result of random chance could result in a cascading loss of
diversity.7
4
A complete polymorphism is one in which all alleles are present.
5
If you’re wondering how I know that, feel free to ask. Otherwise, just take my word for it. Would I lie
to you? (Don’t answer that.)
6
Can you explain why? Take a close look at the fitnesses, and it should be fairly obvious.
7
The same thing can happen in ecological communities. Loss of a single species from a stable community
may lead to a cascading loss of several more.

If the fitness of AS were 1.6 rather than 1.138, C would be lost from the population, although
the A − S polymorphism would remain.

Weird property #2: Increasing the selection in favor of a heterozygous genotype may cause selection to eliminate one or more of the alleles not in that
heterozygous genotype. This also means that if a genotype with a very high fit-
ness in heterozygous form is introduced into a population, the resulting selection
may eliminate one or more of the alleles already present.

Fertility selection
So far we’ve been talking about natural selection that occurs as a result of differences in
the probability of survival, i.e., viability selection. There are, of course, other ways in which
natural selection can occur:

• Heterozygotes may produce gametes in unequal frequencies, segregation distortion, or gametes may differ in their ability to participate in fertilization, gametic selection.8

• Some genotypes may be more successful in finding mates than others, sexual selection.

• The number of offspring produced by a mating may depend on maternal and paternal
genotypes, fertility selection.

In fact, most studies that have measured components of selection have identified far larger
differences due to fertility than to viability. Thus, fertility selection is a very important
component of natural selection in most populations of plants and animals. As we’ll see a
little later, it turns out that sexual selection is mathematically equivalent to a particular
type of fertility selection. But before we get to that, let’s look carefully at the mechanics of
fertility selection.

Formulation of fertility selection


I introduced the idea of a fitness matrix earlier when we were discussing selection at one
locus with more than two alleles. Even if we have only two alleles, it becomes useful to
describe patterns of fertility selection in terms of a fitness matrix. Describing the matrix is
easy. Writing it down gets messy. Each element in the table is simply the average number of
8
For the botanists in the room, I should point out that selection on the gametophyte stage of the life
cycle (in plants with alternation of generations) is mathematically equivalent to gametic selection.

offspring produced by a given mated pair. We write down the table with paternal genotypes
in columns and maternal genotypes in rows:

Paternal genotype
Maternal genotype A1 A1 A1 A2 A2 A2
A1 A1 F11,11 F11,12 F11,22
A1 A2 F12,11 F12,12 F12,22
A2 A2 F22,11 F22,12 F22,22

Then the frequency of genotype A1A1 after one generation of fertility selection is:9

    x11' = [x11^2 F11,11 + x11 x12 (F11,12 + F12,11)/2 + (x12^2/4) F12,12] / F̄ ,    (11.1)

where F̄ is the mean fecundity of all matings in the population.10
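One way to keep the bookkeeping straight is to enumerate all nine mating types and let the computer do the averaging. This Python sketch (mine; any fecundity values plugged in are hypothetical) reproduces equation (11.1) and its analogues for the other genotypes:

```python
# One generation of fertility selection, by enumerating mating types.
# Genotypes are coded by their alleles: (0, 0) = A1A1, (0, 1) = A1A2, (1, 1) = A2A2.

import itertools

GENOTYPES = [(0, 0), (0, 1), (1, 1)]

def offspring_dist(mom, dad):
    """Mendelian offspring genotype distribution for one mated pair."""
    dist = {g: 0.0 for g in GENOTYPES}
    for a, b in itertools.product(mom, dad):
        dist[tuple(sorted((a, b)))] += 0.25
    return dist

def fertility_step(x, F):
    """x: genotype -> frequency; F: (maternal, paternal) -> fecundity."""
    F_bar = sum(x[m] * x[d] * F[(m, d)] for m in GENOTYPES for d in GENOTYPES)
    new = {g: 0.0 for g in GENOTYPES}
    for m in GENOTYPES:
        for d in GENOTYPES:
            weight = x[m] * x[d] * F[(m, d)] / F_bar
            for g, frac in offspring_dist(m, d).items():
                new[g] += weight * frac
    return new
```

With all fecundities equal this reduces to ordinary random mating, and with unequal fecundities the A1A1 entry matches equation (11.1) term for term.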
It probably won’t surprise you to learn that it’s very difficult to say anything very general
about how genotype frequencies will change when there’s fertility selection. Not only are
there nine different fitness parameters to worry about, but since genotypes are never guar-
anteed to be in Hardy-Weinberg proportion, all of the algebra has to be done on a system
of three simultaneous equations.11 There are three weird properties that I’ll mention:

1. F̄ 0 may be smaller than F̄ . Unlike selection on viabilities in which fitness evolved to


the maximum possible value, there are situations in which fitness will evolve to the
minimum possible value when there’s selection on fertilities.12

2. A high fertility of heterozygote × heterozygote matings is not sufficient to guarantee that the population will remain polymorphic.

3. Selection may prevent loss of either allele, but there may be no stable equilibria.

Conditions for protected polymorphism


There is one case in which it’s fairly easy to understand the consequences of selection, and
that’s when one of the two alleles is very rare. Suppose, for example, that A1 is very rare,
9
I didn’t say it, but you can probably guess that I’m assuming that all of the conditions for Hardy-
Weinberg apply, except for the assumption that all matings leave the same number of offspring, on average.
10
As an exercise you might want to see if you can derive the corresponding equations for x012 and x022 .
11
And you thought that dealing with one was bad enough!
12
Fortunately, it takes rather weird fertility schemes to produce such a result.

then a little algebraic trickery13 shows that

    x11' ≈ 0
    x12' ≈ [x12 (F12,22 + F22,12)/2] / F22,22

So A1 will become more frequent if

(F12,22 + F22,12 )/2 > F22,22 (11.2)

Similarly, A2 will become more frequent when it’s very rare when

(F11,12 + F12,11 )/2 > F11,11 . (11.3)

If both equation (11.2) and (11.3) are satisfied, natural selection will tend to prevent either
allele from being eliminated. We have what’s known as a protected polymorphism.
Conditions (11.2) and (11.3) are fairly easy to interpret intuitively: There is a protected
polymorphism if the average fecundity of matings involving a heterozygote and the “resident”
homozygote exceeds that of matings of the resident homozygote with itself.14
NOTE: It’s entirely possible for neither inequality to be satisfied and for there to be
a stable polymorphism. In other words, depending on where a population starts, selection
may eliminate one allele or the other or keep both segregating in the population in a stable
polymorphism.15
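Conditions (11.2) and (11.3) are easy to check mechanically. A small Python sketch (mine; the fecundity matrices are made up for illustration):

```python
# Test the protected-polymorphism conditions for a fertility matrix
# F[maternal][paternal], with genotypes indexed 0 = A1A1, 1 = A1A2, 2 = A2A2.

def protected_polymorphism(F):
    a1_increases_when_rare = (F[1][2] + F[2][1]) / 2 > F[2][2]   # condition (11.2)
    a2_increases_when_rare = (F[0][1] + F[1][0]) / 2 > F[0][0]   # condition (11.3)
    return a1_increases_when_rare and a2_increases_when_rare

F_protected = [[1.0, 1.2, 1.0],
               [1.2, 1.1, 1.3],
               [1.0, 1.3, 1.0]]

# Here heterozygote x A1A1 matings do worse than A1A1 x A1A1 matings,
# so A2 cannot increase when rare: no protected polymorphism.
F_unprotected = [[1.5, 1.0, 1.0],
                 [1.0, 1.1, 1.3],
                 [1.0, 1.3, 1.0]]
```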

Sexual selection
A classic example of sexual selection is the peacock’s “tail” feathers.16 The long, elaborate
feathers do nothing to promote survival of male peacocks, but they are very important in
determining which males attract mates and which don’t. If you’ll recall, when we originally
derived the Hardy-Weinberg principle we said that the matings occurred randomly. Sexual
selection is clearly an instance of non-random mating. Let’s go back to our original mating
table and see how we need to modify it to accommodate sexual selection.
13
The trickery isn’t hard, just tedious. Justifying the trickery is a little more involved, but not too bad.
If you’re interested, drop by my office and I’ll show you.
14
A “resident” homozygote is the one of which the populations is almost entirely composed when all but
one allele is rare.
15
Can you guess what pattern of fertilities is consistent with both a stable polymorphism and the lack of
a protected polymorphism?
16
The brightly colored “tail” is actually the upper tail covert.

                                  Offspring genotype
Mating              Frequency     A1A1   A1A2   A2A2
A1A1 × A1A1         x11^f x11^m   1      0      0
A1A1 × A1A2         x11^f x12^m   1/2    1/2    0
A1A1 × A2A2         x11^f x22^m   0      1      0
A1A2 × A1A1         x12^f x11^m   1/2    1/2    0
A1A2 × A1A2         x12^f x12^m   1/4    1/2    1/4
A1A2 × A2A2         x12^f x22^m   0      1/2    1/2
A2A2 × A1A1         x22^f x11^m   0      1      0
A2A2 × A1A2         x22^f x12^m   0      1/2    1/2
A2A2 × A2A2         x22^f x22^m   0      0      1

What I’ve done is to assume that there is random mating in the populations among those
individuals that are included in the mating pool. We’ll assume that all females are mated so
that xij^f = xij .17 We’ll let the relative attractiveness of the male genotypes be a11 , a12 , and
a22 . Then it’s not too hard to convince yourself that
    x11^m = x11 a11 / ā
    x12^m = x12 a12 / ā
    x22^m = x22 a22 / ā ,
where ā = x11 a11 + x12 a12 + x22 a22 . A little more algebra and you can see that
    x11' = [x11^2 a11 + x11 x12 (a12 + a11)/2 + x12^2 a12/4] / ā    (11.4)
And we could derive similar equations for x12' and x22' . Now you’re not likely to remember
this, but equation (11.4) bears a striking resemblance to one you saw earlier, equation (11.1).
In fact, sexual selection is equivalent to a particular type of fertility selection, in terms of how
genotype frequencies will change from one generation to the next. Specifically, the fertility
matrix corresponding to sexual selection on a male trait is:
A1 A1 A1 A2 A2 A2
A1 A1 a11 a12 a22
A1 A2 a11 a12 a22
A2 A2 a11 a12 a22
17
There’s a reason for doing this called Bateman’s principle that we can discuss, if you’d like.

There are, of course, a couple of other things that make sexual selection interesting.
First, traits that are sexually selected in males often come at a cost in viability, so there’s
a tradeoff between survival and reproduction that can make the dynamics complicated and
interesting. Second, the evolution of a sexually selected trait involves two traits: the male
characteristic that is being selected and a female preference for that trait. In fact the two
tend to become associated so that the female preference evokes a sexually selected response
in males, which evokes a stronger preference in females, and so on and so on. This is a
process Fisher referred to as “runaway sexual selection.”
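The claimed equivalence between sexual selection and fertility selection is easy to verify numerically for the A1A1 class. In this sketch (mine; the genotype frequencies and attractiveness values are hypothetical) the closed form in equation (11.4) matches a brute-force enumeration of the mating table:

```python
# x'_11 under sexual selection, computed two ways.

def x11_next(x11, x12, x22, a11, a12, a22):
    """Closed form, equation (11.4)."""
    a_bar = x11 * a11 + x12 * a12 + x22 * a22
    return (x11 ** 2 * a11 + x11 * x12 * (a12 + a11) / 2
            + x12 ** 2 * a12 / 4) / a_bar

def x11_next_enum(x11, x12, x22, a11, a12, a22):
    """Enumerate the mating table: females mate at random, males in
    proportion to frequency times attractiveness."""
    a_bar = x11 * a11 + x12 * a12 + x22 * a22
    xf = {0: x11, 1: x12, 2: x22}                       # female frequencies
    xm = {0: x11 * a11 / a_bar, 1: x12 * a12 / a_bar,   # male mating pool
          2: x22 * a22 / a_bar}
    p_a1 = {0: 1.0, 1: 0.5, 2: 0.0}                     # P(parent transmits A1)
    return sum(xf[f] * xm[m] * p_a1[f] * p_a1[m] for f in xf for m in xm)
```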

Part III

Genetic drift

Chapter 12

Genetic Drift

So far in this course we’ve talked about changes in genotype and allele frequencies as if
they were completely deterministic. Given the current allele frequencies and viabilities, for
example, we wrote down an equation describing how they will change from one generation
to the next:
    p' = (p^2 w11 + pq w12)/w̄ .
Notice that in writing this equation, we’re claiming that we can predict the allele frequency
in the next generation without error. But suppose the population is small, say 10 diploid
individuals, and our prediction is that p' = 0.5. Then just as we wouldn’t be surprised if we flipped a coin 20 times and got 12 heads, we shouldn’t be surprised if we found that p' = 0.6. The difference between what we expect (p' = 0.5) and what we observe (p' = 0.6) can be
chalked up to statistical sampling error. That sampling error is the cause of (or just another
name for) genetic drift — the tendency for allele frequencies to change from one generation
to the next in a finite population even if there is no selection.

A simple example
To understand in more detail what happens when there is genetic drift, let’s consider the
simplest possible example: a haploid population consisting of 2 individuals.1 Suppose that
we are studying a locus with only two alleles in this population A1 and A2 . This implies
that p = q = 0.5, but we’ll ignore that numerical fact for now and simply imagine that the
frequency of the A1 allele is p.
We imagine the following scenario:
1
Notice that once we start talking about genetic drift, we have to specify the size of the population.

• Each individual in the population produces a very large number of haploid gametes
that develop directly into adult offspring.

• The allele in each offspring is an identical copy of the allele in its parent, i.e., A1 begets
A1 and A2 begets A2 . In other words, there’s no mutation.

• The next generation is constructed by picking two offspring at random from the very
large number of offspring produced by these two individuals.

Then it’s not too hard to see that

    Probability that both offspring are A1             = p^2
    Probability that one offspring is A1 and one is A2 = 2pq
    Probability that both offspring are A2             = q^2

Of course p' = 1 if both offspring sampled are A1, p' = 1/2 if one is A1 and one is A2, and p' = 0 if both are A2, so that set of equations is equivalent to this one:

    P(p' = 1)   = p^2    (12.1)
    P(p' = 1/2) = 2pq    (12.2)
    P(p' = 0)   = q^2    (12.3)

In other words, we can no longer predict with certainty what allele frequencies in the next
generation will be. We can only assign probabilities to each of the three possible outcomes.
Of course, in a larger population the amount of uncertainty about the allele frequencies
will be smaller,2 but there will be some uncertainty associated with the predicted allele
frequencies unless the population is infinite.
The probability of ending up in any of the three possible states obviously depends on
the current allele frequency. In probability theory we express this dependence by writing
equations (12.1)–(12.3) as conditional probabilities:

    P(p1 = 1 | p0)   = p0^2      (12.4)
    P(p1 = 1/2 | p0) = 2 p0 q0   (12.5)
    P(p1 = 0 | p0)   = q0^2      (12.6)

I’ve introduced the subscripts so that we can distinguish among various generations in the
process. Why? Because if we can write equations (12.4)–(12.6), we can also write the
2
More about that later.

following equations:3

    P(p2 = 1 | p1)   = p1^2
    P(p2 = 1/2 | p1) = 2 p1 q1
    P(p2 = 0 | p1)   = q1^2

Now if we stare at those a little while, we4 begin to see some interesting possibilities.
Namely,

    P(p2 = 1 | p0)   = P(p2 = 1 | p1 = 1) P(p1 = 1 | p0) + P(p2 = 1 | p1 = 1/2) P(p1 = 1/2 | p0)
                     = (1)(p0^2) + (1/4)(2 p0 q0)
                     = p0^2 + (1/2) p0 q0
    P(p2 = 1/2 | p0) = P(p2 = 1/2 | p1 = 1/2) P(p1 = 1/2 | p0)
                     = (1/2)(2 p0 q0)
                     = p0 q0
    P(p2 = 0 | p0)   = P(p2 = 0 | p1 = 0) P(p1 = 0 | p0) + P(p2 = 0 | p1 = 1/2) P(p1 = 1/2 | p0)
                     = (1)(q0^2) + (1/4)(2 p0 q0)
                     = q0^2 + (1/2) p0 q0

It takes more algebra than I care to show,5 but these equations can be extended to an arbitrary number of generations.

    P(pt = 1 | p0)   = p0^2 + (1 - (1/2)^(t-1)) p0 q0
    P(pt = 1/2 | p0) = p0 q0 (1/2)^(t-2)
    P(pt = 0 | p0)   = q0^2 + (1 - (1/2)^(t-1)) p0 q0
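If the algebra behind these formulas looks mysterious, you can check them by brute force: iterate the Markov chain for the two-individual population and compare with the closed forms. A Python sketch (mine, not part of the notes):

```python
# Probabilities of p_t = 0, 1/2, 1 in the two-individual haploid population.

def state_dist(p0, t):
    """Return [P(p_t = 0), P(p_t = 1/2), P(p_t = 1)] for t >= 1."""
    q0 = 1.0 - p0
    dist = [q0 ** 2, 2 * p0 * q0, p0 ** 2]      # after one round of sampling
    for _ in range(t - 1):
        d0, d_half, d1 = dist
        # From p = 1/2 the next pair is A2A2, mixed, or A1A1 w.p. 1/4, 1/2, 1/4;
        # p = 0 and p = 1 are absorbing states.
        dist = [d0 + d_half / 4, d_half / 2, d1 + d_half / 4]
    return dist

p0, t = 0.5, 6
d = state_dist(p0, t)
```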

Why do I bother to show you these equations?6 Because you can see pretty quickly that
as t gets big, i.e., the longer our population evolves, the smaller the probability that pt = 1/2
becomes. In fact, it’s not hard to verify two facts about genetic drift in this simple situation:

1. One of the two alleles originally present in the population is certain to be lost eventually.
3
I know. I’m weird. I actually get a kick out of writing equations!
4
Or at least the weird ones among us
5
Ask me, if you’re really interested.
6
It’s not just that I’m crazy.

2. The probability that A1 is fixed is equal to its initial frequency, p0 , and the probability
that A2 is fixed is equal to its initial frequency, q0 .

Both of these properties are true in general for any finite population and any number of
alleles.

1. Genetic drift will eventually lead to loss of all alleles in the population except one.7

2. The probability that any allele will eventually become fixed in the population is equal
to its current frequency.

General properties of genetic drift


What I’ve shown you so far applies only to a haploid population with two individuals. Even
I will admit that it isn’t a very interesting situation. Suppose, however, we now consider
a populaton with N diploid individuals. We can treat it as if it were a population of 2N
haploid individuals using a direct analogy to the process I described earlier, and then things
start to get a little more interesting.

• Each individual in the population produces a large number of gametes.

• The allele in each gamete is an identical copy of the allele in the individual that
produced it, i.e., A1 begets A1 and A2 begets A2 .

• The next generation is constructed by picking 2N gametes at random from the large
number originally produced.

We can then write a general expression for how allele frequencies will change between
generations. Specifically, the distribution describing the probability that there will be j
copies of A1 in the next generation given that there are i copies in this generation is
    P(j A1 in offspring | i A1 in parents) = (2N choose j) (i/2N)^j (1 - i/2N)^(2N-j) ,

i.e., a binomial distribution. I’ll be astonished if any of what I’m about to say is apparent
to any of you, but this equation implies three really important things. We’ve encountered
two already:
7
You obviously can’t lose all of them unless the population becomes extinct.

• Allele frequencies will tend to change from one generation to the next purely as a result
of sampling error. As a consequence, genetic drift will eventually lead to loss of all
alleles in the population except one.

• The probability that any allele will eventually become fixed in the population is equal
to its current frequency.

• The population has no memory.8 The probability that the offspring generation will
have a particular allele frequency depends only on the allele frequency in the parental
generation. It does not depend on how the parental generation came to have that allele
frequency. This is exactly analogous to coin-tossing. The probability that you get a
heads on the next toss of a fair coin is 1/2. It doesn’t matter whether you’ve never
tossed it before or if you’ve just tossed 25 heads in a row.9
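The second property — fixation probability equals current frequency — is easy to check by simulation. This Wright-Fisher sketch (mine; the population size and frequency are arbitrary) runs many replicate populations to fixation or loss:

```python
# Wright-Fisher drift: the fixation probability of A1 should equal its
# initial frequency, here p0 = 0.2 in a pool of 2N = 20 gene copies.

import random

random.seed(42)

def runs_to_fixation(i, two_n):
    """Drift until A1 is fixed (return 1) or lost (return 0)."""
    while 0 < i < two_n:
        p = i / two_n
        # binomial sample of 2N gene copies from the gamete pool
        i = sum(1 for _ in range(two_n) if random.random() < p)
    return 1 if i == two_n else 0

two_n, p0, reps = 20, 0.2, 5000
fixed = sum(runs_to_fixation(int(p0 * two_n), two_n) for _ in range(reps))
prop_fixed = fixed / reps   # should be close to p0 = 0.2
```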

Variance of allele frequencies between generations


For a binomial distribution
    P(K = k) = (N choose k) p^k (1 - p)^(N-k)
    Var(K)   = N p(1 - p)
    Var(p)   = Var(K/N)
             = (1/N^2) Var(K)
             = p(1 - p)/N
Applying this to our situation,
    Var(pt+1) = pt(1 - pt)/(2N)
Var(pt+1 ) measures the amount of uncertainty about allele frequencies in the next gener-
ation, given the current allele frequency. As you probably guessed long ago, the amount
of uncertainty is inversely proportional to population size. The larger the population, the
smaller the uncertainty.
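A quick simulation check of this formula (mine; the population size and frequency are chosen arbitrarily):

```python
# Empirical variance of p' over one generation of binomial sampling,
# compared with the prediction p(1 - p) / (2N).

import random

random.seed(1)
two_n, p, reps = 100, 0.3, 20000   # 2N gene copies, current frequency

samples = [sum(1 for _ in range(two_n) if random.random() < p) / two_n
           for _ in range(reps)]
mean_p = sum(samples) / reps
var_p = sum((s - mean_p) ** 2 for s in samples) / (reps - 1)
predicted = p * (1 - p) / two_n    # 0.0021 with these numbers
```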
8
Technically, we’ve described a Markov chain with a finite state space, but I doubt that you really care
about that.
9
Of course, if you’ve just tossed 25 heads in a row, you could be forgiven for having your doubts about
whether the coin is actually fair.

If you think about this a bit, you might expect that a smaller variance would “slow
down” the process of genetic drift — and you’d be right. It takes some pretty advanced
mathematics to say how much the process slows down as a function of population size,10 but
we can summarize the result in the following equation:

t̄ ≈ −4N (p log p + (1 − p) log(1 − p)) ,

where t̄ is the average time to fixation of one allele or the other and p is the current allele
frequency.11 So the average time to fixation of one allele or the other increases approximately
linearly with increases in the population size.

Analogy to inbreeding
You may have noticed some similarities between drift and inbreeding. Specifically, both
processes lead to a loss of heterozygosity and an increase in homozygosity. This analogy
leads to a useful heuristic for helping us to understand the dynamics of genetic drift.
Remember our old friend f , the inbreeding coefficient? I’m going to re-introduce you
to it in the form of the population inbreeding coefficient, the probability that two alleles
chosen at random from a population are identical by descent. We’re going to study how
the population inbreeding coefficient changes from one generation to the next as a result of
reproduction in a finite population.12

$$f_{t+1} = \text{Prob. ibd from preceding generation} + (\text{Prob. not ibd from prec. gen.}) \times (\text{Prob. ibd from earlier gen.})$$
$$= \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t$$
or, in general,

$$f_t = 1 - \left(1 - \frac{1}{2N}\right)^t (1 - f_0).$$
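The recursion and its closed form are easy to check against each other numerically. This is my sketch, not part of the notes:

```python
def f_recursive(f0, n_diploid, t):
    """Iterate f' = 1/(2N) + (1 - 1/(2N)) f for t generations."""
    f = f0
    for _ in range(t):
        f = 1 / (2 * n_diploid) + (1 - 1 / (2 * n_diploid)) * f
    return f

def f_closed(f0, n_diploid, t):
    """Closed form: f_t = 1 - (1 - 1/(2N))**t * (1 - f0)."""
    return 1 - (1 - 1 / (2 * n_diploid)) ** t * (1 - f0)

print(f_recursive(0.0, 50, 25))  # the two agree
print(f_closed(0.0, 50, 25))
```

With N = 50 and f starting at zero, f climbs to about 0.22 after 25 generations; a smaller population climbs faster.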

Summary
There are four characteristics of genetic drift that I think are particularly important for you
to remember:
10 Actually, we'll encounter a way that isn't quite so hard in a few lectures when we get to the coalescent.
11 Notice that this equation only applies to the case of one locus with two alleles, although the principle applies to any number of alleles.
12 Remember that I use the abbreviation ibd to mean identical by descent.
1. Allele frequencies tend to change from one generation to the next simply as a result
of sampling error. We can specify a probability distribution for the allele frequency in
the next generation, but we cannot predict the actual frequency with certainty.

2. There is no systematic bias to changes in allele frequency. The allele frequency is as likely to increase from one generation to the next as it is to decrease.

3. If the process is allowed to continue long enough without input of new genetic material
through migration or mutation, the population will eventually become fixed for only
one of the alleles originally present.13

4. The time to fixation on a single allele is directly proportional to population size, and
the amount of uncertainty associated with allele frequencies from one generation to the
next is inversely related to population size.

Effective population size


I didn’t make a big point of it, but in our discussion of genetic drift so far we’ve assumed
everything about populations that we assumed to derive the Hardy-Weinberg principle, and
we’ve assumed that:

• We can model drift in a finite population as a result of sampling among haploid gametes
rather than as a result of sampling among diploid genotypes. Since we’re dealing with
a finite population, this effectively means that the two gametes incorporated into an
individual could have come from the same parent, i.e., self-fertilization occurs when
there’s random union of gametes in a finite, diploid population.

• Since we’re sampling gametes rather than individuals, we’re also implictly assuming
that there aren’t separate sexes.14

• The number of gametes any individual has represented in the next generation is a
binomial random variable.15

• The population size is constant.


13 This will hold true even if there is strong selection for keeping alleles in the population. Selection can't prevent loss of diversity, only slow it down.
14 How could there be separate sexes if there can be self-fertilization?
15 More about this later.
How do we deal with the fact that one or more of these conditions will be violated in
just about any case we’re interested in?16 One way would be to develop all the probability
models that incorporate that complexity and try to solve them. That’s nearly impossible,
except through computer simulations. Another, and by far the most common approach, is to
come up with a conversion formula that makes our actual population seem like the “ideal”
population that we’ve been studying. That’s exactly what effective population size is.

The effective size of a population is the size of an ideal population that has the
same properties with respect to genetic drift as our actual population does.

What does that phrase “same properties with respect to genetic drift” mean? Well there are
two ways it can be defined.17

Variance effective size


You may remember18 that the variance in allele frequency in an ideal population is

$$\mathrm{Var}(p_{t+1}) = \frac{p_t(1 - p_t)}{2N}.$$

So one way we can make our actual population equivalent to an ideal population is to make their allele frequency variances the same. We do this by calculating the variance in allele frequency for our actual population, figuring out what size of ideal population would produce the same variance, and pretending that our actual population is the same as an ideal population of that size. To put that into an equation,19 let $\widehat{\mathrm{Var}}(p)$ be the variance we calculate for our actual population. Then

$$N_e^{(v)} = \frac{p(1 - p)}{2\widehat{\mathrm{Var}}(p)}$$

is the variance effective population size, i.e., the size of an ideal population that has the same properties with respect to allele frequency variance as our actual population.

Inbreeding effective size


You may also remember that we can think of genetic drift as analogous to inbreeding. The
probability of identity by descent within populations changes in a predictable way in relation
16 OK, OK. They will probably be violated in every case we're interested in.
17 There are actually more than two ways, but we're only going to talk about two.
18 You probably won't, so I'll remind you.
19 As if that will make it any clearer. Does anyone actually read these footnotes?
to population size, namely

$$f_{t+1} = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t.$$

So another way we can make our actual population equivalent to an ideal population is to make them equivalent with respect to how f changes from generation to generation. We do this by calculating how the inbreeding coefficient changes from one generation to the next in our actual population, figuring out what size an ideal population would have to be to show the same change between generations, and pretending that our actual population is the same size as the ideal one. So suppose $\hat{f}_t$ and $\hat{f}_{t+1}$ are the actual inbreeding coefficients we'd have in our population at generations t and t + 1, respectively. Then

$$\hat{f}_{t+1} = \frac{1}{2N_e^{(f)}} + \left(1 - \frac{1}{2N_e^{(f)}}\right)\hat{f}_t$$
$$= \frac{1}{2N_e^{(f)}}\left(1 - \hat{f}_t\right) + \hat{f}_t$$
$$\hat{f}_{t+1} - \hat{f}_t = \frac{1}{2N_e^{(f)}}\left(1 - \hat{f}_t\right)$$
$$N_e^{(f)} = \frac{1 - \hat{f}_t}{2\left(\hat{f}_{t+1} - \hat{f}_t\right)}.$$

In many applications it's convenient to assume that $\hat{f}_t = 0$. In that case the calculation gets a lot simpler:

$$N_e^{(f)} = \frac{1}{2\hat{f}_{t+1}}.$$
We also don’t lose anything by doing so, because Ne(f ) depends only on how much f changes
from one generation to the next, not on its actual magnitude.

Comments on effective population sizes


Those are nice tricks, but there are some limitations. The biggest is that $N_e^{(v)} \neq N_e^{(f)}$ if the population size is changing from one generation to the next.20 So you have to decide which of these two measures is more appropriate for the question you're studying.
20 It's even worse than that. When the population size is changing, it's not clear that any of the available adjustments to produce an effective population size are entirely satisfactory. Well, that's not entirely true either. Fu et al. [22] show that there is a reasonable definition in one simple case when the population size varies, and it happens to correspond to the solution presented below.
• Ne(f ) is naturally related to the number of individuals in the parental populations. It
tells you something about how the probability of identity by descent within a single
population will change over time.

• Ne(v) is naturally related to the number of individuals in the offspring generation. It tells you something about how much allele frequencies in isolated populations will diverge from one another.

Examples
This is all pretty abstract. Let’s work through some examples to see how this all plays out.21
In the case of separate sexes and variable population size, I’ll provide a derivation of Ne(f ) .
In the case of differences in the number of offspring left by individuals, I’ll just give you the
formula and we’ll discuss some of the implications.

Separate sexes
We’ll start by assuming that fˆt = 0 to make the calculations simple. So we know that
1
Ne(f ) = .
2fˆt+1

The first thing to do is to calculate fˆt+1 . To do this we have to break the problem down into
pieces.22

• We assumed that fˆt = 0, so the only way for two alleles to be identical by descent is if
they are identical copies of the same allele in the immediately preceding generation.

• Even if the numbers of reproductive males and reproductive females are different, every
new offspring has exactly one father and one mother. Thus, the probability that the
first gamete selected at random is female is just 1/2, and the probability that the first
gamete selected is male is just 1/2.

• The probability that the second gamete selected is female given that the first one we
selected was female is (N − 1)/(2N − 1), because N out of the 2N alleles represented
21 If you're interested in a comprehensive list of formulas relating various demographic parameters to effective population size, take a look at [13, p. 362]. They provide a pretty comprehensive summary and a number of derivations.
22 Remembering, of course, that $\hat{f}_{t+1}$ is the probability that two alleles drawn at random are identical by descent.
among offspring came from females, and there are only N − 1 out of 2N − 1 left after
we’ve already picked one. The same logic applies for male gametes.
• The probability that one particular female gamete was chosen is 1/2Nf , where Nf is
the number of females in the population. Similarly the probability that one particular
male gamete was chosen is 1/2Nm , where Nm is the number of males in the population.
With those facts in hand, we're ready to calculate $\hat{f}_{t+1}$:

$$\hat{f}_{t+1} = \left(\frac{1}{2}\right)\left(\frac{N-1}{2N-1}\right)\left(\frac{1}{2N_f}\right) + \left(\frac{1}{2}\right)\left(\frac{N-1}{2N-1}\right)\left(\frac{1}{2N_m}\right)$$
$$= \left(\frac{1}{2}\right)\left(\frac{N-1}{2N-1}\right)\left(\frac{1}{2N_f} + \frac{1}{2N_m}\right)$$
$$\approx \left(\frac{1}{4}\right)\left(\frac{2N_m + 2N_f}{4N_f N_m}\right)$$
$$= \left(\frac{1}{2}\right)\left(\frac{N_m + N_f}{4N_f N_m}\right)$$

So,

$$N_e^{(f)} \approx \frac{4N_f N_m}{N_f + N_m}.$$
What does this all mean? Well, consider a couple of important examples. Suppose the
numbers of females and males in a population are equal, $N_f = N_m = N/2$. Then

$$N_e^{(f)} = \frac{4(N/2)(N/2)}{N/2 + N/2} = \frac{4N^2/4}{N} = N.$$
The effective population size is equal to the actual population size if the sex ratio is 50:50. If
it departs from 50:50, the effective population size will be smaller than the actual population
size. Consider the extreme case where there’s only one reproductive male in the population.
Then

$$N_e^{(f)} = \frac{4N_f}{N_f + 1}. \tag{12.7}$$
Notice what this equation implies: The effective size of a population with only one reproduc-
tive male (or female) can never be bigger than 4, no matter how many mates that individual
has and no matter how many offspring are produced.
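Here's a short sketch (mine, not from the notes) that makes the effect of sex ratio concrete:

```python
def ne_two_sexes(n_females, n_males):
    """Inbreeding effective size with separate sexes:
    Ne ~= 4*Nf*Nm / (Nf + Nm)."""
    return 4 * n_females * n_males / (n_females + n_males)

print(ne_two_sexes(50, 50))   # 100.0: an even sex ratio gives Ne = N
print(ne_two_sexes(99, 1))    # 3.96: one male caps Ne below 4
print(ne_two_sexes(999, 1))   # 3.996: adding more females barely helps
```

Notice how flat the curve is once one sex is rare: going from 99 to 999 females moves the effective size by less than 0.04.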

Variable population size
The notation for this one gets a little more complicated, but the ideas are simpler than those
you just survived. Since the population size is changing we need to specify the population
size at each time step. Let $N_t$ be the population size in generation t. Then

$$f_{t+1} = \left(1 - \frac{1}{2N_t}\right) f_t + \frac{1}{2N_t}$$
$$1 - f_{t+1} = \left(1 - \frac{1}{2N_t}\right)(1 - f_t)$$
$$1 - f_{t+K} = \left(\prod_{i=1}^{K}\left(1 - \frac{1}{2N_{t+i}}\right)\right)(1 - f_t).$$

Now if the population size were constant,

$$\prod_{i=1}^{K}\left(1 - \frac{1}{2N_{t+i}}\right) = \left(1 - \frac{1}{2N_e^{(f)}}\right)^K.$$

Dealing with products and powers is inconvenient, but if we take the logarithm of both sides of the equation we get something simpler:

$$\sum_{i=1}^{K}\log\left(1 - \frac{1}{2N_{t+i}}\right) = K\log\left(1 - \frac{1}{2N_e^{(f)}}\right).$$
It’s a well-known fact23 that log(1 − x) ≈ −x when x is small. So if we assume that Ne and
all of the Nt are large,24 then
K
!
1 X 1
K − (f )
= −
2Ne i=1 2Nt+i
K
K X 1
(f )
=
Ne i=1 Nt+i
 K !−1
1 X 1

Ne(f ) =
K i=1 Nt+i

The quantity on the right side of that last equation is a well-known quantity. It’s the
harmonic mean of the Nt . It’s another well-known fact25 that the harmonic mean of a series
23 Well known to some of us at least.
24 So that their reciprocals are small.
25 Are we ever going to run out of well-known facts? Probably not.
of numbers is always less than its arithmetic mean. This means that genetic drift may play a much more important role than we might have imagined, since the effective size of a population will be more influenced by times when it is small than by times when it is large.
Consider, for example, a population in which $N_1$ through $N_9$ are 1000, and $N_{10}$ is 10.

$$N_e = \left(\frac{1}{10}\left(9\left(\frac{1}{1000}\right) + \frac{1}{10}\right)\right)^{-1} \approx 92$$

versus an arithmetic average of 901. So the population will behave with respect to the inbreeding associated with drift like a population a tenth of its arithmetic average size.
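The harmonic-mean calculation in this example is one line of Python. A quick sketch (mine, not from the notes):

```python
def ne_variable_size(sizes):
    """Effective size over K generations of varying census size:
    the harmonic mean of the N_t."""
    return len(sizes) / sum(1 / n for n in sizes)

sizes = [1000] * 9 + [10]       # nine big generations, one bottleneck
print(ne_variable_size(sizes))  # about 91.7
print(sum(sizes) / len(sizes))  # arithmetic mean: 901.0
```

A single one-generation bottleneck drags the effective size down by an order of magnitude.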

Variation in offspring number


I’m just going to give you this formula. I’m not going to derive it for you.26
2N − 1
Ne(f ) = ,
1 + V2k

where Vk is the variance in number of offspring among individuals in the population. Re-
member I told you that the number of gametes any individual has represented in the next
generation is a binomial random variable in an ideal population? Well, if the population size
isn’t changing, that means that Vk = 2(1 − 1/N ) in an ideal population.27 A little algebra
should convince you that in this case Ne(f ) = N . It can also be shown (with more algebra)
that

• Ne(f ) < N if Vk > 2(1 − 1/N ) and

• Ne(f ) > N if Vk < 2(1 − 1/N ).

That last fact is pretty remarkable. Conservation biologists try to take advantage of it to
decrease the loss of genetic variation in small populations, especially those that are captive
bred. If you can reduce the variance in reproductive success, you can substantially increase
the effective size of the population. In fact, if you could reduce Vk to zero, then

$$N_e^{(f)} = 2N - 1.$$

The effective size of the population would then be almost twice its actual size.
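Here's a small sketch (mine) of the offspring-variance formula, checking both the ideal case and the zero-variance case:

```python
def ne_offspring_variance(n_diploid, v_k):
    """Ne^(f) = (2N - 1)/(1 + Vk/2), where Vk is the variance
    in offspring number among individuals."""
    return (2 * n_diploid - 1) / (1 + v_k / 2)

n = 100
v_ideal = 2 * (1 - 1 / n)   # binomial variance in an ideal population
print(ne_offspring_variance(n, v_ideal))  # essentially 100: Ne = N
print(ne_offspring_variance(n, 0))        # 199.0: Ne = 2N - 1
print(ne_offspring_variance(n, 10))       # ~33: high variance shrinks Ne
```

This is the calculation behind the conservation-biology trick: equalizing family sizes (pushing Vk toward zero) nearly doubles the effective size.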
26 The details are in [13], if you're interested.
27 The calculation is really easy, and I'd be happy to show it to you if you're interested.
Chapter 13

Mutation, Migration, and Genetic Drift

So far in this course we've focused on single, isolated populations, and we've completely ignored the ultimate source of all genetic variation: mutation.1 We're now going to study what happens when we consider multiple populations simultaneously and when we allow mutation to happen. Let's consider mutation first, because it's the easiest to understand.

Drift and mutation


Remember that in the absence of mutation,

$$f_{t+1} = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t. \tag{13.1}$$
One way of modeling mutation is to assume that every time a mutation occurs it introduces
a new allele into the population. This model is referred to as the infinite alleles model,
because it implicitly assumes that there is potentially an infinite number of alleles. Under
this model we need to make only one simple modification to equation (13.1). We have to
multiply the expression on the right by the probability that neither allele mutated:

$$f_{t+1} = \left(\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t\right)(1 - \mu)^2, \tag{13.2}$$
1 Well, that's not quite true. We talked about multiple populations when we talked about the Wahlund effect and Wright's FST, but we didn't talk explicitly about any of the evolutionary processes associated with multiple populations.
where µ is the mutation rate, i.e., the probability that an allele in an offspring is different
from the allele it was derived from in a parent. In writing down this expression, the reason
this is referred to as an infinite alleles model becomes apparent: we are assuming that every
time a mutation occurs it produces a new allele. The only way in which two alleles can be
identical is if neither mutated.2
So where do we go from here? Well, if you think about it, mutation is always introducing
new alleles that, by definition, are different from any of the alleles currently in the population.
It stands to reason, therefore, that we’ll never be in a situation where all of the alleles in a
population are identical by descent as they would be in the absence of mutation. In other
words we expect there to be an equilibrium between loss of diversity through genetic drift
and the introduction of diversity through mutation.3 From the definition of an equilibrium,

$$\hat{f} = \left(\frac{1}{2N} + \left(1 - \frac{1}{2N}\right)\hat{f}\right)(1 - \mu)^2$$
$$\hat{f}\left(1 - \left(1 - \frac{1}{2N}\right)(1 - \mu)^2\right) = \left(\frac{1}{2N}\right)(1 - \mu)^2$$
$$\hat{f} = \frac{\left(\frac{1}{2N}\right)(1 - \mu)^2}{1 - \left(1 - \frac{1}{2N}\right)(1 - \mu)^2}$$
$$\approx \frac{1 - 2\mu}{2N\left(1 - \left(1 - \frac{1}{2N}\right)(1 - 2\mu)\right)}$$
$$= \frac{1 - 2\mu}{2N\left(1 - 1 + \frac{1}{2N} + 2\mu - \frac{2\mu}{2N}\right)}$$
$$= \frac{1 - 2\mu}{1 + 4N\mu - 2\mu}$$
$$\approx \frac{1}{4N\mu + 1}$$

Since f is the probability that two alleles chosen at random are identical by descent
within our population, 1 − f is the probability that two alleles chosen at random are not
2 Notice that we're also playing a little fast and loose with definitions here, since I've just described this in terms of identity by type when the equation is written in terms of identity by descent. Can you see why it is that I can get away with this?
3 Technically what the population reaches is not an equilibrium. It reaches a stationary distribution. At any point in time there is some probability that the population has a particular allele frequency. After long enough the probability distribution stops changing. That's when the population is at its stationary distribution. We often say that it's "reached stationarity." This is an example of a place where the inbreeding analogy breaks down a little.
identical by descent in our population. So 1−f = 4N µ/(4N µ+1) is a reasonable measure of
the genetic diversity within the population. Notice that as N increases, the genetic diversity
maintained in the population also increases. This shouldn’t be too surprising. The rate
at which diversity is lost declines as population size increases so larger populations should
retain more diversity than small ones.4
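Since the equilibrium diversity 1 − f̂ = 4Nµ/(4Nµ + 1) depends on N and µ only through their product, it's easy to tabulate. A tiny sketch (mine, not from the notes):

```python
def equilibrium_diversity(n_diploid, mu):
    """1 - f_hat = 4*N*mu/(4*N*mu + 1): diversity maintained at
    drift-mutation equilibrium under the infinite alleles model."""
    theta = 4 * n_diploid * mu
    return theta / (theta + 1)

mu = 1e-5  # a typical per-locus rate for a protein-coding gene
for n in (1_000, 25_000, 1_000_000):
    print(n, equilibrium_diversity(n, mu))
```

Right at N = 25,000 (where 4Nµ = 1), half of all random pairs of alleles are non-identical; well below that diversity is nearly zero, and well above it diversity approaches one.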

A two-allele model with recurrent mutation


There’s another way of looking at the interaction between drift and mutation. Suppose we
have a set of populations with two alleles, A1 and A2 . Suppose further that the rate of
mutation from A1 to A2 is equal to the rate of mutation from A2 to A1 .5 Call that rate µ. In
the absence of mutation a fraction p0 of the populations would fix on A1 and the rest would
fix on A2 , where p0 is the original frequency of A1 . With recurrent mutation, no population
will ever be permanently fixed for one allele or the other. Instead we see the following:

[Figure: stationary distributions of allele frequencies under drift and recurrent mutation; x-axis: allele frequency from 0 to 1.]

When 4N µ < 1 the stationary distribution of allele frequencies is bowl-shaped, i.e, most
populations have allele frequencies near 0 or 1. When 4N µ > 1, the stationary distribution of
4 Remember that if we're dealing with a non-ideal population, as we always are, you'll need to substitute Ne for N in this equation and others like it.
5 We don't have to make this assumption, but relaxing it makes an already fairly complicated scenario even more complicated. If you're really interested, ask me about it.
allele frequencies is hump-shaped, i.e., most populations have allele frequencies near 0.5. In
other words if the population is “small,” drift dominates the distribution of allele frequencies
and causes populations to become differentiated. If the population is “large,” mutation
dominates and keeps the allele frequencies in the different populations similar to one another.
That’s what we mean when we say that a population is “large” or “small”. A population
is “large” if evolutionary processes other than drift have a predominant influence on the
outcom. It’s “small” if drift has a predominant role on the outcome.
A population is large with respect to the drift-mutation process if 4N µ > 1, and it is
small if 4N µ < 1. Notice that calling a population large or small is really just a convenient
shorthand. There isn’t much of a difference between the allele frequency distributions when
$4N\mu = 0.9$ and when $4N\mu = 1.1$. Notice also that because mutation is typically rare, on the order of $10^{-5}$ or less per locus per generation for a protein-coding gene and on the order of $10^{-3}$ or less per locus for a microsatellite, a population must be pretty large (> 25,000 or > 250, respectively) to be considered large with respect to the drift-mutation process. Notice also that whether the population is "large" or "small" will depend on the loci that you're studying.

Drift and migration


I just pointed out that if populations are isolated from one another they will tend to diverge
from one another as a result of genetic drift. Recurrent mutation, which “pushes” all popu-
lations towards the same allele frequency, is one way in which that tendency can be opposed.
If populations are not isolated, but exchange migrants with one another, then migration will
also oppose the tendency for populations to become different from one another. It should
be obvious that there will be a tradeoff similar to the one with mutation: the larger the
populations, the less the tendency for them to diverge from one another and, therefore, the
more migration will tend to make them similar. To explore how drift and migration interact
we can use an approach exactly analogous to what we used for mutation.
The model of migration we’ll consider is an extremely oversimplified one. It imagines that
every allele brought into a population is different from any of the resident alleles.6 It also
imagines that all populations receive the same fraction of migrants. Because any immigrant
allele is different, by assumption, from any resident allele we don’t even have to keep track
of how far apart populations are from one another, since populations close by will be no
more similar to one another than populations far apart. This is Wright’s island model of

6 Sounds a lot like the infinite alleles model of mutation, doesn't it? Just you wait. The parallel gets even more striking.
migration. Given these assumptions, we can write the following:

$$f_{t+1} = \left(\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t\right)(1 - m)^2. \tag{13.3}$$
That might look fairly familiar. In fact, it’s identical to equation (13.2) except that
there’s an m in (13.3) instead of a µ. m is the migration rate, the fraction of individuals in
a population that is composed of immigrants. More precisely, m is the backward migration
rate. It’s the probability that a randomly chosen individual in this generation came from a
population different from the one in which it is currently found in the preceding generation.
Normally we’d think about the forward migration rate, i.e., the probability that a randomly
chosen individual with go to a different population in the next generation, but backwards
migration rates turn out to be more convenient to work with in most population genetic
models.7
It shouldn’t surprise you that if equations (13.2) and (13.3) are so similar the equilibrium
f under drift and migration is
1
fˆ ≈
4N m + 1
In fact, the two allele analog to the mutation model I presented earlier turns out to be pretty
similar, too.

• If 2Nm > 1, the stationary distribution of allele frequencies is hump-shaped, i.e., the populations tend not to diverge from one another.8
• If 2N m < 1, the stationary distribution of allele frequencies is bowl-shaped, i.e., the
populations tend to diverge from one another.

Now there’s a consequence of these relationships that’s both surprising and odd. N is
the population size. m is the fraction of individuals in the population that are immigrants.
So N m is the number of individuals in the population that are new immigrants in any
generation. That means that if populations receive more than one new immigrant every other
generation, on average, they’ll tend not to diverge in allele frequency from one another.9 It
doesn’t make any difference if the populations have a million individuals apiece or ten. One
new immigrant every other generation is enough to keep them from diverging.
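The independence from population size drops right out of f̂ ≈ 1/(4Nm + 1): the equilibrium depends on N and m only through the number of immigrants Nm. A sketch (mine, not from the notes):

```python
def equilibrium_f_migration(n_diploid, m):
    """f_hat ~= 1/(4*N*m + 1) under Wright's island model."""
    return 1 / (4 * n_diploid * m + 1)

# Fix the immigrant count at Nm = 0.5 per generation and vary N:
for n in (10, 1_000, 1_000_000):
    m = 0.5 / n
    print(n, equilibrium_f_migration(n, m))  # 1/3 every time
```

One new immigrant every other generation pins f̂ at 1/3 whether the population holds ten individuals or a million.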
With a little more reflection, this result is less surprising than it initially seems. After all
in populations of a million individuals, drift will be operating very slowly, so it doesn’t take
7 I warned you weeks ago that population geneticists tend to think backwards.
8 You read that right: it's 2Nm, not 4Nm as you might have expected from the mutation model. If you're really interested why there's a difference, I can show you. But the explanation isn't simple.
9 In the sense that the stationary distribution of allele frequencies is hump-shaped.
a large proportion of immigrants to keep populations from diverging.10 In populations with
only ten individuals, drift will be operating much more quickly, so it takes a large proportion
of immigrants to keep populations from diverging.11

10 And one immigrant every other generation corresponds to a backward migration rate of only 5 × 10^{-7}.
11 And one immigrant every other generation corresponds to a backward migration rate of 5 × 10^{-2}.
Chapter 14

Selection and genetic drift

There are three basic facts about genetic drift that I really want you to remember, even if
you forget everything else I’ve told you about it:

1. Allele frequencies tend to change from one generation to the next purely as a result
of random sampling error. We can specify a probability distribution for the allele
frequency in the next generation, but we cannot specify the numerical value exactly.
2. There is no systematic bias to the change in allele frequency, i.e., allele frequencies are
as likely to increase from one generation to the next as to decrease.
3. Populations will eventually fix for one of the alleles that is initially present unless
mutation or migration introduces new alleles.

Natural selection introduces a systematic bias in allele frequency changes. Alleles favored
by natural selection tend to increase in frequency. Notice that word “tend.” It’s critical. Be-
cause there is a random component to allele frequency change when genetic drift is involved,
we can’t say for sure that a selectively favored allele will increase in frequency. In fact, we
can say that there’s a chance that a selectively favored allele won’t increase in frequency.
There’s also a chance that a selectively disfavored allele will increase in frequency in spite
of natural selection.

Loss of beneficial alleles


We’re going to confine our studies to our usual simple case: one locus, two alleles. We’re
also going to consider a very simple form of directional viability selection in which the
heterozygous genotype is exactly intermediate in fitness.

Genotype:   A1A1    A1A2      A2A2
Fitness:    1 + s   1 + s/2   1

After solving a reasonably complex partial differential equation, it can be shown that1 the probability that allele A1 2 is fixed, given that its current frequency is p, is

$$P_1(p) = \frac{1 - e^{-2N_e s p}}{1 - e^{-2N_e s}}. \tag{14.1}$$
Now it won’t be immediately evident to you, but this equation actually confirms our intuition
that even selectively favored alleles may sometimes be lost as a result of genetic drift. How
does it do that? Well, it’s not too hard to verify that P1 (p) < 1.3 The probability that the
beneficial allele is fixed is less than one meaning that the probability it is lost is greater than
zero, i.e., there’s some chance it will be lost.
How big is the chance that a favorable allele will be lost? Well, consider the case of a
newly arisen allele with a beneficial effect. If it’s newly arisen, there is only one copy by
definition. In a diploid population of N individuals that means that the frequency of this
allele is 1/2N. Plugging this into equation (14.1) above we find

$$P_1(p) = \frac{1 - e^{-2N_e s(1/2N)}}{1 - e^{-2N_e s}}$$
$$\approx 1 - e^{-(N_e/N)s} \quad \text{if } 2N_e s \text{ is "large"}$$
$$\approx \left(\frac{N_e}{N}\right)s \quad \text{if } s \text{ is "small."}$$
In other words, most beneficial mutations are lost from populations unless they are very
beneficial. If s = 0.2 in an ideal population, for example, a beneficial mutation will be lost
about 80% of the time.4 Remember that in a strict harem breeding system with a single
male Ne ≈ 4 if the number of females with which the male breeds is large enough. Suppose
that there are 99 females in the population. Then Ne /N = 0.04 and the probability that
this beneficial mutation will be fixed is only 0.8%.
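Equation (14.1) is easy to evaluate directly. This sketch (mine, not part of the notes) reproduces the two numbers in this paragraph:

```python
import math

def p_fix(p, n_e, s):
    """Fixation probability from equation (14.1), for fitnesses
    1 + s, 1 + s/2, 1."""
    return (1 - math.exp(-2 * n_e * s * p)) / (1 - math.exp(-2 * n_e * s))

s = 0.2
# Ideal population of N = 100 (Ne = N), one new copy (p = 1/2N):
print(p_fix(1 / 200, 100, s))   # ~0.18: lost about 82% of the time
# Harem breeder with Ne ~= 4 but N = 100: use the (Ne/N)*s approximation
print((4 / 100) * s)            # 0.008: fixed only about 0.8% of the time
```

Even a strongly favored mutation is usually lost, and a skewed breeding system makes its prospects far worse.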
Notice that unlike what we saw with natural selection when we were ignoring genetic
drift, the strength of selection5 affects the outcome of the interaction. The stronger selection
is the more likely it is that the favored allele will be fixed. But it’s also the case that the
larger the population is, the more likely the favored allele will be fixed.6 Size does matter.
1 Remember, I told you that "it can be shown that" hides a lot of work.
2 The beneficial allele.
3 Unless p = 1.
4 The exact calculation from equation (14.1) gives 82% for this probability.
5 i.e., the magnitude of differences in relative viabilities
6 Because the larger the population, the smaller the effect of drift.
  s        Ne = 4        Ne = 100
  0.001    1 × 10^{-2}   9 × 10^{-3}
  0.01     1 × 10^{-2}   3 × 10^{-3}
  0.1      7 × 10^{-3}   5 × 10^{-10}

Table 14.1: Fixation probabilities for a deleterious mutation as a function of effective population size and selection coefficient for a newly arisen mutant (p = 0.01).

Fixation of detrimental alleles


If drift can lead to the loss of beneficial alleles, it should come as no surprise that it can
also lead to fixation of deleterious ones. In fact, we can use the same formula we’ve been
using (equation (14.1)) if we simply remember that for an allele to be deleterious s will be
negative. So we end up with
$$P_1(p) = \frac{1 - e^{2N_e s p}}{1 - e^{2N_e s}}. \tag{14.2}$$

One implication of equation (14.2) that should not be surprising by now is that even a deleterious allele can become fixed. Consider our two example populations again, an ideal population of size 100 (Ne = 100) and a population with 1 male and 99 females (Ne = 4). Remember, the probability of fixation for a newly arisen allele with no effect on fitness is 1/2N = 5 × 10^{-3} (Table 14.1).7
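Equation (14.2) is itself just equation (14.1) with a negative s, so a few lines of Python reproduce Table 14.1 (my sketch, not part of the notes):

```python
import math

def p_fix(p, n_e, s):
    """Fixation probability; pass s < 0 for a deleterious allele,
    which turns equation (14.1) into equation (14.2)."""
    return (1 - math.exp(-2 * n_e * s * p)) / (1 - math.exp(-2 * n_e * s))

p = 0.01  # a newly arisen mutant, as in Table 14.1
for s in (0.001, 0.01, 0.1):
    print(s, ["%.0e" % p_fix(p, n_e, -s) for n_e in (4, 100)])
```

With Ne = 4, even strong selection (s = 0.1) barely pushes the fixation probability below the neutral value of 0.01; with Ne = 100 the same selection drives it down to about 5 × 10^{-10}.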

Conclusions
I’m not going to try to show you the formulas, but it shouldn’t surprise you to learn that
heterozygote advantage won’t maintain a polymorphism indefinitely in a finite population.
At best what it will do is to retard its loss.8 There are four properties of the interaction of
drift and selection that I think you should take away from this brief discussion:

1. Most mutations, whether beneficial, deleterious, or neutral, are lost from the population
in which they occurred.
7 Because its probability of fixation is equal to its current frequency, i.e., 1/2N. We'll return to this observation in a few weeks when we talk about the neutral theory of molecular evolution.
8 In some cases it can actually accelerate its loss, but we won't discuss that unless you are really interested.
2. If selection against a deleterious mutation is weak or Ne is small,9 a deleterious mutation
is almost as likely to be fixed as neutral mutants. They are “effectively neutral.”

3. If Ne is large, deleterious mutations are much less likely to be fixed than neutral
mutations.

4. Even if Ne is large, most favorable mutations are lost.

9 As with mutation and migration, what counts as large or small is determined by the product of Ne and s. If it's bigger than one the population is regarded as large, because selective forces predominate. If it's smaller than one, it's regarded as small, because drift predominates.
Chapter 15

The Coalescent

I’ve mentioned many times by now that population geneticists often look at the world back-
wards. Sometimes when they do, the result is very useful. Consider genetic drift, for example.
So far we’ve been trying to predict what will happen in a population given a particular effec-
tive population size. But when we collect data we are often more interested in understanding
the processes that produced the pattern we find than in predicting what will happen in the
future. So let’s take a backward look at drift and see what we find.

Reconstructing the genealogy of a sample of alleles


Specifically, let’s keep track of the genealogy of alleles. In a finite population, two randomly
chosen alleles will be identical by descent with respect to the immediately preceding genera-
tion with probability 1/2Ne . That means that there’s a chance that two alleles in generation
t are copies of the same allele in generation t − 1. If the population size is constant, mean-
ing that the number of alleles in the population is remaining constant, then there’s also a
chance that some alleles present in generation t − 1 will not have descendants in generation t.
Looking backward, then, the number of alleles in generation t − 1 that have descendants in
generation t is always less than or equal to the number of alleles in generation t. That means
if we trace the ancestry of alleles in a sample back far enough, all of them will be descended
from a single common ancestor. Figure 15.1 provides a simple schematic illustrating how
this might happen.
Now take a look at Figure 15.1. Time runs from the top of the figure to the bottom,
i.e., the current generation is represented by the circles in the bottom row of the figure.
Each circle represents an allele. The eighteen alleles in our current sample are descended
from only four alleles that were present in the population ten generations ago. The other

Figure 15.1: A schematic depiction of one possible realization of the coalescent process in
a population with 18 haploid gametes. There are four coalescent events in the generation
immediately preceding the last one illustrated, one involving three alleles.

fourteen alleles present in the population ten generations ago left no descendants. How far
back in time we’d have to go before all alleles are descended from a single common ancestor
depends on the effective size of the population, and how frequently two (or more) alleles are
descended from the same allele in the preceding generation depends on the effective size of
the population, too. But in any finite population the pattern will look something like the
one I’ve illustrated here.

Mathematics of the coalescent: two alleles


J. F. C. Kingman developed a convenient and powerful way to describe how the time to
common ancestry is related to effective population size [45, 46]. The process he describes is
referred to as the coalescent, because it is based on describing the probability of coalescent
events, i.e., those points in the genealogy of a sample of alleles where two alleles are descended
from the same allele in the immediately preceding generation.1 Let’s consider a simple case,
one that we’ve already seen: two alleles drawn at random from a single population.
The probability that two alleles drawn at random from a population are copies of the same
allele in the preceding generation is also the probability that two alleles drawn at random
1
An important assumption of the coalescent is that populations are large enough that we can ignore the
possibility that there is more than one coalescent event in a single generation. We also only allow coalescence
between a pair of alleles, not three or more. In both ways the mathematical model of the process differs
from the diagram in Figure 15.1.

from that population are identical by descent with respect to the immediately preceding
generation. We know what that probability is,2 namely

1/(2Ne^(f)) .

I’ll just use Ne from here on out, but keep in mind that the appropriate population size
for use with the coalescent is the inbreeding effective size. Of course, this means that the
probability that two alleles drawn at random from a population are not copies of the same
allele in the preceding generation is
1 − 1/(2Ne) .

We’d like to calculate the probability that a coalescent event happened at a particular time
t, in order to figure out how far back in the ancestry of these two alleles we have to go before
they have a common ancestor. How do we do that?
Well, in order for a coalescent event to occur at time t, the two alleles must not have
coalesced in the generations preceding that.3 The probability that they did not coalesce in
the first t − 1 generations is simply

(1 − 1/(2Ne))^(t−1) .

Then after having remained distinct for t − 1 generations, they have to coalesce in generation
t, which they do with probability 1/2Ne . So the probability that two alleles chosen at random
coalesced t generations ago is

P(T = t) = (1 − 1/(2Ne))^(t−1) (1/(2Ne)) . (15.1)

It’s not too hard to show, once we know the probability distribution in equation (15.1), that
the average time to coalescence for two randomly chosen alleles is 2Ne .4
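If you would like to convince yourself of that result numerically, here is a little Python sketch. It is purely illustrative (the function name and the particular values of Ne and the seed are mine, not from any standard package): it draws many coalescence times from the process described by equation (15.1) and checks that the average is close to 2Ne.

```python
import random

def pair_coalescence_time(n_e, rng):
    """Generations until two randomly chosen alleles coalesce, when in
    each generation they are copies of the same parental allele with
    probability 1/(2*Ne)."""
    p = 1.0 / (2 * n_e)
    t = 1
    while rng.random() >= p:
        t += 1
    return t

rng = random.Random(42)
n_e = 100
times = [pair_coalescence_time(n_e, rng) for _ in range(20000)]
# The geometric distribution in equation (15.1) has mean 2*Ne = 200,
# so the simulated average should come out close to that.
print(sum(times) / len(times))
```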

2
Though you may not remember it.
3
Remember that we’re counting generations backward in time, so when I say that a coalescent event
occurred at time t I mean that it occurred t generations ago.
4
If you’ve had a little bit of probability theory, you’ll notice that equation 15.1 shows that the coalescence
time is a geometric random variable.

Mathematics of the coalescent: multiple alleles
It’s quite easy to extend this approach to multiple alleles.5 We’re interested in seeing how far
back in time we have to go before all alleles are descended from a single common ancestor.
We’ll assume that we have m alleles in our sample. The first thing we have to calculate
is the probability that any two of the alleles in our sample are identical by descent from
the immediately preceding generation. To make the calculation easier, we assume that
the effective size of the population is large enough that the probability of two coalescent
events in a single generation is vanishingly small. We already know that the probability
of a coalescence in the immediately preceding generation for two randomly chosen alleles is
1/2Ne . But there are m(m − 1)/2 different pairs of alleles in our sample. So the probability
that one pair of these alleles is involved in a coalescent event in the immediately preceding
generation is

(1/(2Ne)) (m(m − 1)/2) .
From this it follows6 that the probability that the first coalescent event involving this sample
of alleles occurred t generations ago is
P(T = t) = (1 − (1/(2Ne))(m(m − 1)/2))^(t−1) ((1/(2Ne))(m(m − 1)/2)) . (15.2)
So the mean time back to the first coalescent event is
2Ne/(m(m − 1)/2) = 4Ne/(m(m − 1)) generations .
But this is, of course, only the first coalescent event. We were interested in how long
we have to wait until all alleles are descended from a single common ancestor. Now this
is where Kingman’s sneaky trick comes in. After the first coalescent event, we have m − 1
alleles in our sample, instead of m. So the whole process starts over again with m − 1 alleles
instead of m. Since the time to the first coalescence depends only on the number of alleles
in the sample and not on how long the first coalescence event took, we can calculate the
average time until all coalescences have happened as
t̄ = Σ_{k=2}^m t̄_k

5
Okay, okay. What I should really have said is “It’s not too hard to extend this approach to multiple
alleles.”
6
Using logic just like what we used in the two allele case.

= Σ_{k=2}^m 4Ne/(k(k − 1))
= 4Ne (1 − 1/m)
≈ 4Ne
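The telescoping sum is easy to verify exactly with rational arithmetic. Here is a quick check in Python (the function name is mine):

```python
from fractions import Fraction

def mean_total_coalescence(m, n_e):
    """Expected number of generations until m sampled alleles reach a
    single common ancestor: the sum of 4*Ne/(k*(k-1)) for k = 2..m."""
    return sum(Fraction(4 * n_e, k * (k - 1)) for k in range(2, m + 1))

# The sum telescopes to 4*Ne*(1 - 1/m), which approaches 4*Ne as the
# sample size m grows:
assert mean_total_coalescence(18, 1000) == 4 * 1000 * (1 - Fraction(1, 18))
print(float(mean_total_coalescence(18, 1000)))  # about 3777.8 generations
```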

An example: Mitochondrial Eve


Cann et al. [8] sampled mitochondrial DNA from 147 humans of diverse racial and geographic
origins. Based on the amount of sequence divergence they found among genomes in their
sample and independent estimates of the rate of sequence evolution, they inferred that the
mitochondria in their sample had their most recent common ancestor about 200,000 years
ago. Because all of the most ancient lineages in their sample were from individuals of
African ancestry, they also suggested that mitochondrial Eve lived in Africa. They used
these arguments as evidence for the “Out of Africa” hypothesis for modern human origins,
i.e., the hypothesis that anatomically modern humans arose in Africa about 200,000 years
ago and displaced other members of the genus Homo in Europe and Asia as they spread.
What does the coalescent tell us about their conclusion?
Well, we expect all mitochondrial genomes in the sample to have had a common ancestor
about 2Ne generations ago. Why 2Ne rather than 4Ne ? Because mitochondrial genomes are
haploid. Furthermore, since we all got our mitochondria from our mothers, Ne in this case
refers to the effective number of females.
Given that a human generation is about 20 years, a coalescence time of 200,000 years
implies that the mitochondrial genomes in the Cann et al. sample have their most recent
common ancestor about 10,000 generations ago. If the effective number of females in the
human population is 5000, that’s exactly what we’d expect. While 5000 may sound awfully
small, given that there are more than 3 billion women on the planet now, remember that
until the recent historical past (no more than 500 generations ago) the human population was
small and humans lived in small hunter-gatherer groups, so an effective number of females
of 5000 and a total effective size of 10,000 may not be unreasonable. If that’s true, then
the geographical location of mitochondrial Eve need not tell us anything about the origin
of modern human populations, because there had to be a coalescence somewhere. There’s
no guarantee, from this evidence alone, that the Y-chromosome Adam would have lived
in Africa, too. Having said that, my limited reading of the literature suggests that other
data are consistent with the “Out of Africa” scenario. Y-chromosome polymorphisms, for
example, are also consistent with the “Out of Africa” hypothesis [79]. Interestingly, dating of
those polymorphisms suggests that Y-chromosome Adam left Africa 35,000 – 89,000 years
ago.
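The arithmetic behind the Ne estimate in this example is worth making explicit. In this sketch the function name and the 20-year generation time are assumptions taken from the discussion above, not a standard API:

```python
def implied_effective_females(coalescence_years, generation_time=20):
    """For a haploid, maternally inherited genome the expected
    coalescence time is 2*Ne generations, so Ne = generations / 2."""
    generations = coalescence_years / generation_time
    return generations / 2

print(implied_effective_females(200_000))  # 5000.0 females
```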

The coalescent and F -statistics


Suppose we have a sample of alleles from a structured population. For alleles chosen ran-
domly within populations let the average time to coalescence be t̄0 . For alleles chosen
randomly from different populations let the average time to coalescence be t̄1 . If there are k
populations in our sample, the average time to coalescence for two alleles drawn at random
without respect to population is7

t̄ = (k(k − 1)t̄1 + k t̄0)/k^2 .
Slatkin [74] pointed out that Fst bears a simple relationship to average coalescence times
within and among populations. Given these definitions of t̄ and t̄0 ,

Fst = (t̄ − t̄0)/t̄ .

So another way to think about Fst is as a measure of the proportional increase in coalescence
time that is due to populations being separate from one another. One way to think about
that relationship is this: the longer it has been, on average, since alleles in different popu-
lations diverged from a common ancestor, the greater the chances that they have become
different. An implication of this relationship is that F -statistics, by themselves, can tell
us something about how recently populations have been connected, relative to the within-
population coalescence time, but they can’t distinguish between recent common ancestry
that is due to lots of migration among populations and recent common ancestry that is due
to a recent split between populations.
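Slatkin’s relationship is easy to turn into code. In this sketch (the function name is mine) t̄ is computed as an average over all k² ordered pairs of source populations, which is the weighting used above:

```python
def fst_from_coalescence_times(t_within, t_between, k):
    """Slatkin's Fst = (t_bar - t_within) / t_bar, where t_bar is the
    mean coalescence time for two alleles drawn at random, averaging
    over all k*k ordered pairs of populations."""
    t_bar = (k * (k - 1) * t_between + k * t_within) / k**2
    return (t_bar - t_within) / t_bar

# e.g. 5 populations, mean within-population coalescence 2000
# generations, mean between-population coalescence 10000 generations:
print(round(fst_from_coalescence_times(2000, 10000, 5), 3))  # 0.762
```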
A given pattern of among-population relationships might reflect a migration-drift equi-
librium, a sequence of population splits followed by genetic isolation, or any combination of
the two. If we are willing to assume that populations in our sample have been exchanging
genes long enough to reach stationarity in the drift-migration process, then Fst may tell us
something about migration. If we are willing to assume that there’s been no gene exchange
among our populations, we can infer something about how recently they’ve diverged from
one another. But unless we’re willing to make one of those assumptions, we can’t really say
anything.

7
If you don’t see why, don’t worry about it. You can ask if you really care. We only care about t̄ for
what follows anyway.

Part IV

Quantitative genetics

Chapter 16

Introduction to quantitative genetics

Woltereck’s ideas force us to realize that when we see a phenotypic difference between two
individuals in a population there are three possible explanations for that difference:

1. The individuals have different genotypes.

2. The individuals developed in different environments.

3. The individuals have different genotypes and they developed in different environments.

This leads us naturally to think that phenotypic variation consists of two separable compo-
nents, namely genotypic and environmental components.1 Putting that into an equation

Var(P ) = Var(G) + Var(E) ,

where Var(P ) is the phenotypic variance, Var(G) is the genetic variance, and Var(E) is
the environmental variance.2 As we’ll see in just a moment, we can also partition the
genetic variance into components, the additive genetic variance, Var(A), and the dominance
variance, Var(D).3
There’s a surprisingly subtle and important insight buried in that very simple equation:
Because the expression of a quantitative trait is a result both of genes involved in that
trait’s expression and the environment in which it is expressed, it doesn’t make sense to say
of a particular individual’s phenotype that genes are more important than environment in
1
We’ll soon see that separating genotypic and environmental components is far from trivial.
2
Strictly speaking we should also include a term for the interaction between genotype and environment,
but we’ll ignore that for the time being.
3
We could even partition it further into additive by additive, additive by dominance, and dominance by
dominance epistatic variance, but let’s not go there.

determining it. You wouldn’t have a phenotype without both. What we might be able to say
is that when we look at a particular population of organisms some fraction of the phenotypic
differences among them is due to differences in the genes they carry and that some fraction
is due to differences in the environment they have experienced.4
One important implication of this insight is that much of the “nature vs. nurture” debate
concerning human intelligence or human personality characteristics is misguided. The intel-
ligence and personality that you have is a product of both the genes you happened to inherit
and the environment that you happened to experience. Any differences between you and the
person next to you probably reflect both differences in genes and differences in environment.
Moreover, just because you have a genetic pre-disposition for a particular condition doesn’t
mean you’re stuck with it.
Take phenylketonuria, for example. It’s a condition in which individuals are homozygous
for a deficiency that prevents them from metabolizing phenylalanine (http://www.nlm.
nih.gov/medlineplus/phenylketonuria.html). If individuals with phenylketonuria eat a
normal diet, severe mental retardation can result by the time an infant is one year old. But if they eat
a diet that is very low in phenylalanine, their development is completely normal.
It’s often useful to talk about how much of the phenotypic variance is a result of additive
genetic variance or of genetic variance.

h_n^2 = Var(A)/Var(P)

is what’s known as the narrow-sense heritability. It’s the proportion of phenotypic variance
that’s attributable to differences among individuals in their additive genotype,5 much as Fst
can be thought of as the proportion of genotypic diversity that is attributable to differences
among populations. Similarly,
h_b^2 = Var(G)/Var(P)
is the broad-sense heritability. It’s the proportion of phenotypic variance that’s attributable
to differences among individuals in their genotype. It is not, repeat NOT, a measure of how
important genes are in determining phenotype. Every individual’s phenotype is determined
both by its genes and by its environment. It measures how much of the difference among
individuals is attributable to differences in their genes.6 Why bother to make the distinction
between narrow- and broad-sense heritability? Because, as we’ll see, it’s only the additive
4
When I put it this way, I hope it’s obvious that I’m neglecting genotype-environment interactions, and
that I’m oversimplifying quite a bit.
5
Don’t worry about what I mean by additive genotype — yet. We’ll get to it soon enough.
6
As we’ll see later it can do this only for the range of environments in which it was measured.

Genotype A1 A1 A1 A2 A2 A2
Frequency p2 2pq q2
Genotypic value x11 x12 x22
Additive genotypic value 2α1 α1 + α2 2α2

Table 16.1: Fundamental parameter definitions for quantitative genetics with one locus and
two alleles.

genetic variance that responds to natural selection.7 In fact,

R = h2n S ,

where R is the response to selection and S is the selection differential.
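The breeder’s equation is simple enough to state as a one-line function. This is just an illustration (the function name and the numbers are mine):

```python
def response_to_selection(h2_narrow, s):
    """Breeder's equation R = h_n^2 * S: the offspring-generation mean
    shifts by the narrow-sense heritability times the selection
    differential (mean of selected parents minus the population mean)."""
    return h2_narrow * s

# e.g. a heritability of 0.3 and selected parents 2.0 units above the
# population mean give an expected response of:
print(response_to_selection(0.3, 2.0))  # 0.6
```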


As you’ll see in the coming weeks, there’s a lot of stuff hidden behind these simple
equations, including a lot of assumptions. But quantitative genetics is very useful. Its
principles have been widely applied in plant and animal breeding for almost a century, and
they have been increasingly applied in evolutionary investigations in the last forty years.
Nonetheless, it’s useful to remember that quantitative genetics is a lot like a bikini. What
it reveals is interesting, but what it conceals is crucial.

Partitioning the phenotypic variance


Before we worry about how to estimate any of those variance components I just mentioned,
we first have to understand what they are. So let’s start with some definitions (Table 16.1).8
You should notice something rather strange about Table 16.1 when you look at it. I
motivated the entire discussion of quantitative genetics by talking about the need to deal
with variation at many loci, and what I’ve presented involves only two alleles at a single
locus. I do this for two reasons:

1. It’s not too difficult to do the algebra with multiple alleles at one locus instead of only
two, but it gets messy, doesn’t add any insight, and I’d rather avoid the mess.
7
Or at least only the additive genetic variance responds to natural selection when zygotes are found in
Hardy-Weinberg proportions.
8
Warning! There’s a lot of algebra between here and the end. It’s unavoidable. You can’t possibly
understand what additive genetic variance is without it. I’ll try to focus on principles, and I’ll do my best
to keep reminding us all why we’re slogging through the algebra, but a lot of the algebra that follows is
necessary. Sorry about that.

2. Doing the algebra with multiple loci involves a lot of assumptions, which I’ll mention
when we get to applications, and the algebra is even worse than with multiple alleles.

Fortunately, the basic principles extend with little modification to multiple loci, so we can
see all of the underlying logic by focusing on one locus with two alleles where we have a
chance of understanding what the different variance components mean.
Two terms in Table 16.1 will almost certainly be unfamiliar to you: genotypic value and
additive genotypic value. Of the two, genotypic value is the easiest to understand (Fig-
ure 16.1). It simply refers to the average phenotype associated with a given genotype.9 The
additive genotypic value refers to the average phenotype associated with a given genotype, as
would be inferred from the additive effect of the alleles of which it is composed. That didn’t
help much, did it? That’s because I now need to tell you what we mean by the additive
effect of an allele.10

The additive effect of an allele


In constructing Table 16.1 I used the quantities α1 and α2 , but I didn’t tell you where
they came from. Obviously, the idea should be to pick values of α1 and α2 that give additive
genotypic values that are reasonably close to the genotypic values. A good way to do that is to
minimize the squared deviation between the two, weighted by the frequency of the genotypes.
So our first big assumption is that genotypes are in Hardy-Weinberg proportions.11
The objective is to find values for α1 and α2 that minimize:

a = p^2 [x11 − 2α1]^2 + 2pq[x12 − (α1 + α2)]^2 + q^2 [x22 − 2α2]^2 .

To do this we take the partial derivative of a with respect to both α1 and α2 , set the resulting
pair of equations equal to zero, and solve for α1 and α2 .12
∂a/∂α1 = p^2 {2[x11 − 2α1][−2]} + 2pq{2[x12 − (α1 + α2)][−1]}
= −4p^2 [x11 − 2α1] − 4pq[x12 − (α1 + α2)]
9
Remember. We’re now considering traits in which the environment influences the phenotypic expression,
so the same genotype can produce different phenotypes, depending on the environment in which it develops.
10
Hold on. Things get even more interesting from here.
11
As you should have noticed in Table 16.1.
12
We won’t bother with proving that the resulting estimates produce the minimum possible value of a.
Just take my word for it. Or if you don’t believe me and know a little calculus, take the second partials
of a and evaluate it with the values of α1 and α2 substituted in. You’ll find that the resulting matrix of
partial derivatives, the Hessian matrix, is positive definite, meaning that we’ve found values that minimize
the value of a.

[Figure 16.1 appears here: a histogram of phenotypes (x axis, 0 to 4; y axis, relative frequency, 0 to 0.8). See the caption below.]

Figure 16.1: The phenotype distribution in a population in which the three genotypes at a
single locus with two alleles occur in Hardy-Weinberg proportions and the alleles occur in
equal frequency. The A1 A1 genotype has a mean trait value of 1, the A1 A2 genotype has a
mean trait value of 2, and the A2 A2 genotype has a mean trait value of 3, but each genotype
can produce a range of phenotypes with the standard deviation of the distribution being
0.25 in each case.

∂a/∂α2 = q^2 {2[x22 − 2α2][−2]} + 2pq{2[x12 − (α1 + α2)][−1]}
= −4q^2 [x22 − 2α2] − 4pq[x12 − (α1 + α2)]

Thus, ∂a/∂α1 = ∂a/∂α2 = 0 if and only if

p2 (x11 − 2α1 ) + pq(x12 − α1 − α2 ) = 0


q 2 (x22 − 2α2 ) + pq(x12 − α1 − α2 ) = 0 (16.1)
Adding the equations in (16.1) we obtain (after a little bit of rearrangement)
[p2 x11 + 2pqx12 + q 2 x22 ] − [p2 (2α1 ) + 2pq(α1 + α2 ) + q 2 (2α2 )] = 0 . (16.2)
Now the first term in square brackets is just the mean phenotype in the population, x̄.
Thus, we can rewrite equation (16.2) as:
x̄ = 2p2 α1 + 2pq(α1 + α2 ) + 2q 2 α2
= 2pα1 (p + q) + 2qα2 (p + q)
= 2(pα1 + qα2 ) . (16.3)
Now divide the first equation in (16.1) by p and the second by q.
p(x11 − 2α1 ) + q(x12 − α1 − α2 ) = 0 (16.4)
q(x22 − 2α2 ) + p(x12 − α1 − α2 ) = 0 . (16.5)
Thus,
px11 + qx12 = 2pα1 + qα1 + qα2
= α1 (p + q) + pα1 + qα2
= α1 + pα1 + qα2
= α1 + x̄/2
α1 = px11 + qx12 − x̄/2 .
Similarly,
px12 + qx22 = 2qα2 + pα1 + pα2
= α2 (p + q) + pα1 + qα2
= α2 + pα1 + qα2
= α2 + x̄/2
α2 = px12 + qx22 − x̄/2 .

α1 is the additive effect of allele A1 , and α2 is the additive effect of allele A2 . If we use these
expressions, the additive genotypic values are as close to the genotypic values as possible,
given the particular allele frequencies in the population.13
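These formulas are easy to put into code. The following sketch (the function name is mine) computes the additive effects for one locus with two alleles, assuming Hardy-Weinberg proportions as we did above:

```python
def additive_effects(p, x11, x12, x22):
    """alpha1 = p*x11 + q*x12 - xbar/2 and alpha2 = p*x12 + q*x22 - xbar/2,
    where xbar is the mean phenotype under Hardy-Weinberg proportions."""
    q = 1 - p
    xbar = p**2 * x11 + 2 * p * q * x12 + q**2 * x22
    alpha1 = p * x11 + q * x12 - xbar / 2
    alpha2 = p * x12 + q * x22 - xbar / 2
    return alpha1, alpha2

# With perfectly additive genotypic values (0, 1, 2) and p = 0.4 this
# returns approximately (0.0, 1.0), matching the worked example below.
print(additive_effects(0.4, 0.0, 1.0, 2.0))
```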

Components of the genetic variance


Let’s assume for the moment that we can actually measure the genotypic values. Later, we’ll
relax that assumption and see how to use the resemblance among relatives to estimate the
genetic components of variance. But it’s easiest to see where they come from if we assume
that the genotypic value of each genotype is known. If it is then, writing Vg for Var(G)

Vg = p^2 [x11 − x̄]^2 + 2pq[x12 − x̄]^2 + q^2 [x22 − x̄]^2 (16.6)
   = p^2 [x11 − 2α1 + 2α1 − x̄]^2 + 2pq[x12 − (α1 + α2) + (α1 + α2) − x̄]^2
     + q^2 [x22 − 2α2 + 2α2 − x̄]^2
   = p^2 [x11 − 2α1]^2 + 2pq[x12 − (α1 + α2)]^2 + q^2 [x22 − 2α2]^2
     + p^2 [2α1 − x̄]^2 + 2pq[(α1 + α2) − x̄]^2 + q^2 [2α2 − x̄]^2
     + p^2 [2(x11 − 2α1)(2α1 − x̄)] + 2pq[2(x12 − {α1 + α2})({α1 + α2} − x̄)]
     + q^2 [2(x22 − 2α2)(2α2 − x̄)] . (16.7)

There are two terms in (16.7) that have a biological (or at least a quantitative genetic)
interpretation. The term on the first line is the average squared deviation between the
genotypic value and the additive genotypic value. It will be zero only if the effects of the
alleles can be decomposed into strictly additive components, i.e., only if the phenotype of the
heterozygote is exactly intermediate between the phenotype of the two homozygotes. Thus,
it is a measure of how much variation is due to non-additivity (dominance) of allelic effects.
In short, the dominance genetic variance, Vd , is

Vd = p2 [x11 − 2α1 ]2 + 2pq[x12 − (α1 + α2 )]2 + q 2 [x22 − 2α2 ]2 . (16.8)

Similarly, the term on the second line of (16.7) is the average squared deviation between
the additive genotypic value and the mean genotypic value in the population. Thus, it is
a measure of how much variation is due to differences between genotypes in their additive
genotype. In short, the additive genetic variance, Va , is

Va = p2 [2α1 − x̄]2 + 2pq[(α1 + α2 ) − x̄]2 + q 2 [2α2 − x̄]2 . (16.9)


13
If you’ve been paying close attention and you have a good memory, the expressions for α1 and α2 may
look vaguely familiar. They look a lot like the expressions for marginal fitnesses we encountered when
studying viability selection.

What about the terms on the third and fourth lines of the last equation in 16.7? Well, they
can be rearranged as follows:

p^2 [2(x11 − 2α1)(2α1 − x̄)] + 2pq[2(x12 − {α1 + α2})({α1 + α2} − x̄)] + q^2 [2(x22 − 2α2)(2α2 − x̄)]
= 2p^2 (x11 − 2α1)(2α1 − x̄) + 4pq[x12 − (α1 + α2)][(α1 + α2) − x̄] + 2q^2 (x22 − 2α2)(2α2 − x̄)
= 4p^2 (x11 − 2α1)[α1 − (pα1 + qα2)] + 4pq[x12 − (α1 + α2)][(α1 + α2) − 2(pα1 + qα2)]
  + 4q^2 (x22 − 2α2)[α2 − (pα1 + qα2)]
= 4p[α1 − (pα1 + qα2)][p(x11 − 2α1) + q(x12 − {α1 + α2})]
  + 4q[α2 − (pα1 + qα2)][p(x12 − {α1 + α2}) + q(x22 − 2α2)]
= 0

Where we have used the identities x̄ = 2(pα1 + qα2 ) [see equation (16.3)] and

p(x11 − 2α1 ) + q(x12 − α1 − α2 ) = 0


q(x22 − 2α2 ) + p(x12 − α1 − α2 ) = 0

[see equations (16.4) and (16.5)]. In short, we have now shown that the total genotypic
variance in the population, Vg , can be subdivided into two components — the additive genetic
variance, Va , and the dominance genetic variance, Vd . Specifically,

Vg = Va + Vd ,

where Vg is given by the first line of (16.6), Va by (16.9), and Vd by (16.8).

An alternative expression for Va


There’s another way to write the expression for Va when there are only two alleles at a locus.
I show it here because it comes in handy some times.

Va = p^2 (2α1)^2 + 2pq(α1 + α2)^2 + q^2 (2α2)^2 − 4(pα1 + qα2)^2
   = 4p^2 α1^2 + 2pq(α1 + α2)^2 + 4q^2 α2^2 − 4(p^2 α1^2 + 2pqα1α2 + q^2 α2^2)
   = 2pq[(α1 + α2)^2 − 4α1α2]
   = 2pq[(α1^2 + 2α1α2 + α2^2) − 4α1α2]
   = 2pq[α1^2 − 2α1α2 + α2^2]

Genotype A1 A1 A1 A2 A2 A2
Genotypic value 0 1 2

Table 16.2: A set of perfectly additive genotypic values. Note that the genotypic value of
the heterozygote is exactly halfway between the genotypic values of the two homozygotes.

= 2pq[α1 − α2]^2
= 2pqα^2 ,

where α = α1 − α2.

An example: the genetic variance with known genotypes


We’ve been through a lot of algebra by now. Let’s run through a couple of numerical
examples to see how it all works. For the first one, we’ll use the set of genotypic values in
Table 16.2
For p = 0.4
x̄ = (0.4)^2 (0) + 2(0.4)(0.6)(1) + (0.6)^2 (2)
   = 1.20

α1 = (0.4)(0) + (0.6)(1) − (1.20)/2
   = 0.0
α2 = (0.4)(1) + (0.6)(2) − (1.20)/2
   = 1.0

Vg = (0.4)^2 (0 − 1.20)^2 + 2(0.4)(0.6)(1 − 1.20)^2 + (0.6)^2 (2 − 1.20)^2
   = 0.48
Va = (0.4)^2 [2(0.0) − 1.20]^2 + 2(0.4)(0.6)[(0.0 + 1.0) − 1.20]^2 + (0.6)^2 [2(1.0) − 1.20]^2
   = 0.48
Vd = (0.4)^2 [0 − 2(0.0)]^2 + 2(0.4)(0.6)[1 − (0.0 + 1.0)]^2 + (0.6)^2 [2 − 2(1.0)]^2
   = 0.00 .
For p = 0.2, x̄ = 1.60, Vg = Va = 0.32, Vd = 0.00. You should verify for yourself that α1 = 0
and α2 = 1 for p = 0.2. If you are ambitious, you could try to prove that α1 = 0 and α2 = 1
for any allele frequency.
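The whole calculation can also be automated. Here is a sketch (names mine) that applies the definitions of Vg, Va, and Vd from equations (16.6), (16.9), and (16.8):

```python
def variance_components(p, x11, x12, x22):
    """Return (Vg, Va, Vd) for one locus with two alleles, assuming
    genotypes in Hardy-Weinberg proportions."""
    q = 1 - p
    freqs = (p**2, 2 * p * q, q**2)
    genotypic = (x11, x12, x22)
    xbar = sum(f * x for f, x in zip(freqs, genotypic))
    a1 = p * x11 + q * x12 - xbar / 2
    a2 = p * x12 + q * x22 - xbar / 2
    additive = (2 * a1, a1 + a2, 2 * a2)   # additive genotypic values
    vg = sum(f * (x - xbar) ** 2 for f, x in zip(freqs, genotypic))
    va = sum(f * (ax - xbar) ** 2 for f, ax in zip(freqs, additive))
    vd = sum(f * (x - ax) ** 2 for f, x, ax in zip(freqs, genotypic, additive))
    return vg, va, vd

# Reproduces the worked example: Vg = Va = 0.48 and Vd = 0 at p = 0.4
# for the additive genotypic values (0, 1, 2).
print(variance_components(0.4, 0, 1, 2))
```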

Genotype A1 A1 A1 A2 A2 A2
Genotypic value 0 0.8 2

Table 16.3: A set of non-additive genotypic values. Note that the genotypic value of the
heterozygote is closer to the genotypic value of A1 A1 than it is to the genotypic value of
A2 A2 .

For the second example we’ll use the set of genotypic values in Table 16.3.
For p = 0.4

x̄ = (0.4)^2 (0) + 2(0.4)(0.6)(0.8) + (0.6)^2 (2)
   = 1.104

α1 = (0.4)(0) + (0.6)(0.8) − (1.104)/2
   = −0.072
α2 = (0.4)(0.8) + (0.6)(2) − (1.104)/2
   = 0.968

Vg = (0.4)^2 (0 − 1.104)^2 + 2(0.4)(0.6)(0.8 − 1.104)^2 + (0.6)^2 (2 − 1.104)^2
   = 0.5284
Va = (0.4)^2 [2(−0.072) − 1.104]^2 + 2(0.4)(0.6)[(−0.072 + 0.968) − 1.104]^2
   + (0.6)^2 [2(0.968) − 1.104]^2
   = 0.5192
Vd = (0.4)^2 [0 − 2(−0.072)]^2 + 2(0.4)(0.6)[0.8 − (−0.072 + 0.968)]^2
   + (0.6)^2 [2 − 2(0.968)]^2
   = 0.0092 .

To test your understanding, it would probably be useful to calculate x̄, α1 , α2 , Vg , Va ,


and Vd for one or two other allele frequencies, say p = 0.2 and p = 0.8. Is it still true that
α1 and α2 are independent of allele frequencies? If you are really ambitious you could try to
prove that α1 and α2 are independent of allele frequencies if and only if x12 = (x11 + x22 )/2,
i.e., when heterozygotes are exactly intermediate.
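That last claim is easy to explore numerically. Here is a sketch (the function name is mine) comparing the additive effects across allele frequencies for the additive and non-additive examples above:

```python
def additive_effects(p, x11, x12, x22):
    """alpha1 and alpha2 for one locus, two alleles, assuming
    Hardy-Weinberg proportions."""
    q = 1 - p
    xbar = p**2 * x11 + 2 * p * q * x12 + q**2 * x22
    return p * x11 + q * x12 - xbar / 2, p * x12 + q * x22 - xbar / 2

# With the additive values (0, 1, 2) the effects stay at (0, 1) for
# every p; with the non-additive values (0, 0.8, 2) they shift with p.
for p in (0.2, 0.4, 0.8):
    print(p, additive_effects(p, 0, 1, 2), additive_effects(p, 0, 0.8, 2))
```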

Chapter 17

Resemblance among relatives

Just as individuals may differ from one another in phenotype because they have different
genotypes, because they developed in different environments, or both, relatives may resemble
one another more than they resemble other members of the population because they have
similar genotypes, because they developed in similar environments, or both. In an experi-
mental situation, we may be able to randomize individuals across environments. Under those
circumstances any tendency for relatives to resemble one another more than non-relatives
must be due to similarities in their genotypes.
Using this insight, we can develop a statistical technique that allows us to determine how
much of the variance among individuals in phenotype is a result of genetic variance and how
much is due to environmental variance. Remember, we can only ask about how much of the
variability is due to genetic differences, and we can only do so in a particular environment
and with a particular set of genotypes, and we can only do it when we randomize genotypes
across environments.

An outline of the approach


The basic approach to the analysis is either to use a linear regression of offspring phenotype on
parental phenotype, which as we’ll see estimates h2n , or to use a nested analysis of variance.
One of the most complete designs is a full-sib, half-sib design in which each male sires
offspring from several dams but each dam mates with only one sire.
The offspring of a single dam are full-sibs (they are nested within dams). Differences
among the offspring of dams indicate that there are differences in maternal “genotype” in
the trait being measured.1
1
Assuming that we’ve randomized siblings across environments. If we haven’t, siblings may resemble one

Maternal Offspring genotype
genotype Frequency A1 A1 A1 A2 A2 A2
A1 A1     p^2     p      q      0
A1 A2     2pq     p/2    1/2    q/2
A2 A2     q^2     0      p      q

Table 17.1: Half-sib family structure in a population with genotypes in Hardy-Weinberg


proportions.

The offspring of different dams mated to a single sire are half-sibs. Differences among
the offspring of sires indicate that there are differences in paternal “genotype” in the trait
being measured.2
As we’ll see, this design has the advantage that it allows both additive and dominance
components of the genetic variance to be estimated. It has the additional advantage that
we don’t have to assume that the distribution of environments in the offspring generation is
the same as it was in the parental generation.

The gory details


OK, so I’ve given you the basic idea. Where does it come from, and how does it work?
Funny you should ask. The whole approach is based on calculations of the degree to which
different relatives resemble one another. For these purposes we’re going to continue our focus
on phenotypes influenced by one locus with two alleles, and we’ll do the calculations in detail
only for half sib families. We start with something that may look vaguely familiar.3 Take a
look at Table 17.1.
Note also that the probabilities in Table 17.1 are appropriate only if the progeny are from
half-sib families. If the progeny are from full-sib families, we must specify the frequency of
each of the nine possible matings (keeping track of the genotype of both mother and father)
and the offspring that each will produce.4

another because of similarities in the environment they experienced, too.


2
You’ll see the reason for the quotes around genotype in this paragraph and the last a little later. It’s a
little more complex than what I’ve suggested.
3
Remember our mother-offspring combinations with Zoarces viviparus?
4
To check your understanding of all of this, you might want to try to produce the appropriate table.

Covariance of two random variables
Let pxy be the probability that random variable X takes the value x and random variable Y
takes the value y. Then the covariance between X and Y is:
Cov(X, Y) = Σ p_xy (x − µx)(y − µy) ,

where µx is the mean of X and µy is the mean of Y .
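A tiny numerical illustration of this definition may help. The joint distribution here is invented purely for illustration:

```python
def covariance(joint):
    """Covariance of a discrete joint distribution given as a dict
    mapping (x, y) pairs to probabilities p_xy."""
    mu_x = sum(p * x for (x, _), p in joint.items())
    mu_y = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in joint.items())

# X and Y tend to take the same value, so the covariance is positive:
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(covariance(joint))  # ≈ 0.15
```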

Covariance between half-siblings


Here’s how we can calculate the covariance between half-siblings: First, imagine selecting
a huge number of half-sib pairs at random. The phenotype of the first half-sib in the pair
is a random variable (call it S1 ), as is the phenotype of the second (call it S2 ). The mean
of S1 is just the mean phenotype in all the progeny taken together, x̄. Similarly, the mean
of S2 is just x̄. Now with one locus, two alleles we have three possible phenotypes: x11
(corresponding to the genotype A1 A1 ), x12 (corresponding to the genotype A1 A2 ), and x22
(corresponding to the genotype A2 A2 ). So all we need to do to calculate the covariance
between half-sibs is to write down all possible pairs of phenotypes and the frequency with
which they will occur in our sample of randomly chosen half-sibs based on the frequenices in
Table 17.1 above and the frequency of maternal genotypes. It’s actually a bit easier to keep
track of it all if we write down the frequency of each maternal genotype and the frequency
with which each possible phenotypic combination will occur in her progeny.

Cov(S1, S2) = p²[p²(x11 − x̄)² + 2pq(x11 − x̄)(x12 − x̄) + q²(x12 − x̄)²]
  + 2pq[(1/4)p²(x11 − x̄)² + (1/2)p(x11 − x̄)(x12 − x̄) + (1/2)pq(x11 − x̄)(x22 − x̄)
    + (1/4)(x12 − x̄)² + (1/2)q(x12 − x̄)(x22 − x̄) + (1/4)q²(x22 − x̄)²]
  + q²[p²(x12 − x̄)² + 2pq(x12 − x̄)(x22 − x̄) + q²(x22 − x̄)²]
= p²[p(x11 − x̄) + q(x12 − x̄)]²
  + 2pq[(1/2)p(x11 − x̄) + (1/2)q(x12 − x̄) + (1/2)p(x12 − x̄) + (1/2)q(x22 − x̄)]²
  + q²[p(x12 − x̄) + q(x22 − x̄)]²
= p²[px11 + qx12 − x̄]²
  + 2pq[(1/2)(px11 + qx12 − x̄) + (1/2)(px12 + qx22 − x̄)]²
  + q²[px12 + qx22 − x̄]²

Genotype A1 A1 A1 A2 A2 A2
Phenotype 0 0.8 2

Table 17.2: An example of a non-additive relationship between genotypes and phenotypes.

Maternal                         Offspring genotype
genotype     Frequency     A1 A1     A1 A2     A2 A2
A1 A1 0.16 0.4 0.6 0.0
A1 A2 0.48 0.2 0.5 0.3
A2 A2 0.36 0.0 0.4 0.6

Table 17.3: Mother-offspring combinations (half-sib) when the frequency of A1 is 0.4.

= p²(α1 − x̄/2)² + 2pq[(1/2)(α1 − x̄/2) + (1/2)(α2 − x̄/2)]² + q²(α2 − x̄/2)²
= p²[(1/2)(2α1 − x̄)]² + 2pq[(1/2)(α1 + α2 − x̄)]² + q²[(1/2)(2α2 − x̄)]²
= (1/4)[p²(2α1 − x̄)² + 2pq(α1 + α2 − x̄)² + q²(2α2 − x̄)²]
= (1/4)Va

A numerical example
Now we’ll return to an example we saw earlier (Table 17.2). This set of genotypes and
phenotypes may look familiar. It is the same one we encountered earlier when we calculated
additive and dominance components of variance. Let’s assume that p = 0.4. Then we know
that
x̄ = 1.104
Va = 0.5192
Vd = 0.0092 .
We can also calculate the numerical version of Table 17.1, which you’ll find in Table 17.3.
So now we can follow the same approach we did before and calculate the numerical value
of the covariance between half-sibs in this example:
Cov(S1 , S2 ) = [(0.4)2 (0.16) + (0.2)2 (0.48)](0 − 1.104)2

MZ twins (CovMZ)                Va + Vd
Parent-offspring (CovPO)1       (1/2)Va
Full sibs (CovFS)               (1/2)Va + (1/4)Vd
Half sibs (CovHS)               (1/4)Va
1
One parent or mid-parent.

Table 17.4: Genetic covariances among relatives.

+ [(0.6)²(0.16) + (0.5)²(0.48) + (0.4)²(0.36)](0.8 − 1.104)²
+ [(0.3)²(0.48) + (0.6)²(0.36)](2 − 1.104)²
+ 2[(0.4)(0.6)(0.16) + (0.2)(0.5)(0.48)](0 − 1.104)(0.8 − 1.104)
+ 2(0.2)(0.3)(0.48)(0 − 1.104)(2 − 1.104)
+ 2[(0.5)(0.3)(0.48) + (0.4)(0.6)(0.36)](0.8 − 1.104)(2.0 − 1.104)
= 0.1298
= (1/4)(0.5192) .
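The sum above is tedious to do by hand, so here is a Python sketch (variable names are my own) that rebuilds Table 17.3 from p = 0.4 and computes the half-sib covariance by brute force; it recovers 0.1298 = (1/4)Va:

```python
# Half-sib covariance for one locus, two alleles, computed by brute force.
# Genotypic values follow Table 17.2; the allele frequency of A1 is p = 0.4.
p, q = 0.4, 0.6
x = {"11": 0.0, "12": 0.8, "22": 2.0}            # genotype -> phenotype
mom_freq = {"11": p * p, "12": 2 * p * q, "22": q * q}
# Offspring genotype distribution within each half-sib family (random sires),
# i.e., the numerical version of Table 17.3.
off = {
    "11": {"11": p, "12": q, "22": 0.0},
    "12": {"11": p / 2, "12": 0.5, "22": q / 2},
    "22": {"11": 0.0, "12": p, "22": q},
}

xbar = sum(mom_freq[g] * x[g] for g in x)        # population mean, 1.104
cov = 0.0
for mom, fm in mom_freq.items():
    for g1, f1 in off[mom].items():
        for g2, f2 in off[mom].items():
            cov += fm * f1 * f2 * (x[g1] - xbar) * (x[g2] - xbar)
print(round(xbar, 3), round(cov, 4))             # 1.104 0.1298
```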

Covariances among relatives


Well, if we can do this sort of calculation for half-sibs, you can probably guess that it’s also
possible to do it for other relatives. I won’t go through all of the calculations, but the results
are summarized in Table 17.4.

Estimating heritability
Galton introduced the term regression to describe the inheritance of height in humans. He
noted that there is a tendency for adult offspring of tall parents to be tall and of short parents
to be short, but he also noted that offspring tended to be less extreme than the parents. He
described this as a “regression to mediocrity,” and statisticians adopted the term to describe
a standard technique for describing the functional relationship between two variables.

Regression analysis
Measure the parents. Regress the offspring phenotype on: (1) the phenotype of one parent or (2) the mean of the parental phenotypes. In either case, the covariance between

 
the parental phenotype and the offspring phenotype is (1/2)Va. Now the regression coefficient between one parent and offspring, bP→O, is

bP→O = CovPO / Var(P)
     = [(1/2)Va] / Vp
     = (1/2)h²N .
In short, the slope of the regression line is equal to one-half the narrow sense heritability. In
the regression of offspring on mid-parent value,
Var(MP) = Var[(M + F)/2]
        = (1/4)Var(M + F)
        = (1/4)[Var(M) + Var(F)]
        = (1/4)(2Vp)
        = (1/2)Vp .
Thus, bMP→O = [(1/2)Va] / [(1/2)Vp] = h²N. In short, the slope of the regression line is equal to the narrow-sense heritability.
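A quick Monte Carlo sketch can illustrate this result; everything here (the purely additive model and the variance components) is invented for illustration, and the estimated midparent-offspring slope should come out close to h²N:

```python
import random

# Monte Carlo sketch of heritability estimation by midparent-offspring
# regression. The model is purely additive: an offspring breeding value is
# the midparent breeding value plus a segregation deviation with variance
# Va/2. All parameter values here are invented for illustration.
random.seed(1)
Va, Ve = 1.0, 1.0
h2 = Va / (Va + Ve)                        # true narrow-sense heritability, 0.5

pairs = []
for _ in range(100_000):
    am = random.gauss(0.0, Va ** 0.5)      # maternal breeding value
    af = random.gauss(0.0, Va ** 0.5)      # paternal breeding value
    zm = am + random.gauss(0.0, Ve ** 0.5)   # maternal phenotype
    zf = af + random.gauss(0.0, Ve ** 0.5)   # paternal phenotype
    ao = (am + af) / 2 + random.gauss(0.0, (Va / 2) ** 0.5)
    zo = ao + random.gauss(0.0, Ve ** 0.5)   # offspring phenotype
    pairs.append(((zm + zf) / 2, zo))

mp_mean = sum(mp for mp, _ in pairs) / len(pairs)
off_mean = sum(z for _, z in pairs) / len(pairs)
cov = sum((mp - mp_mean) * (z - off_mean) for mp, z in pairs) / len(pairs)
var_mp = sum((mp - mp_mean) ** 2 for mp, _ in pairs) / len(pairs)
slope = cov / var_mp                       # should be close to h2
print(round(slope, 2))
```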

Sib analysis
Mate a number of males (sires) with a number of females (dams). Each sire is mated to
more than one dam, but each dam mates only with one sire. Do an analysis of variance on
the phenotype in the progeny, treating sire and dam as main effects. The result is shown in
Table 17.5.
Now we need some way to relate the variance components (σ²W, σ²D, and σ²S) to Va, Vd, and Ve.5 How do we do that? Well,

Vp = σ²T = σ²S + σ²D + σ²W .
5
σ²W, σ²D, and σ²S are often referred to as the observational components of variance, because they are estimated from observations we make on phenotypic variation. Va, Vd, and Ve are often referred to as the causal components of variance, because they represent the genetic and environmental influences on trait expression.

Source                       d.f.        Mean square   Composition of mean square
Among sires                  s − 1       MSS           σ²W + kσ²D + dkσ²S
Among dams (within sires)    s(d − 1)    MSD           σ²W + kσ²D
Within progenies             sd(k − 1)   MSW           σ²W

s = number of sires
d = number of dams per sire
k = number of offspring per dam

Table 17.5: Analysis of variance table for a full-sib analysis of quantitative genetic variation.

σ²S estimates the variance among the means of the half-sib families fathered by each of the different sires or, equivalently, the covariance among half-sibs.6

σ²S = CovHS = (1/4)Va .
Now consider the within-progeny component of the variance, σ²W. In general, it can be shown that any among-group variance component is equal to the covariance among the members within the groups.7 Thus, a within-group component of the variance is equal to the total variance minus the covariance within groups. In this case,

σ²W = Vp − CovFS
    = (Va + Vd + Ve) − [(1/2)Va + (1/4)Vd]
    = (1/2)Va + (3/4)Vd + Ve .
There remains only σ²D. Now σ²W = Vp − CovFS, σ²S = CovHS, and σ²T = Vp. Thus,

σ²D = σ²T − σ²S − σ²W

6
To see why this is so, consider the following: The mean genotypic value of half-sib families with an A1A1 mother is px11 + qx12; with an A1A2 mother, px11/2 + qx12/2 + px12/2 + qx22/2; with an A2A2 mother, px12 + qx22. The equation for the variance of these means is identical to the equation for the covariance among half-sibs.
7
With xij = ai + εij, where ai is the mean effect of group i and εij is a random effect on individual j in group i (with mean 0, independent of the group effects and of one another), Cov(xij, xik) = E[(ai + εij − µ)(ai + εik − µ)] = E[(ai − µ)² + (ai − µ)(εij + εik) + εijεik] = Var(A).

= Vp − CovHS − (Vp − CovFS)
= CovFS − CovHS
= [(1/2)Va + (1/4)Vd] − (1/4)Va
= (1/4)Va + (1/4)Vd .
So if we rearrange these equations, we can express the genetic components of the pheno-
typic variance, the causal components of variance, as simple functions of the observational
components of variance:
Va = 4σ²S
Vd = 4(σ²D − σ²S)
Ve = σ²W − 3σ²D + σ²S .

Furthermore, the narrow-sense heritability is given by

h²N = 4σ²S / (σ²S + σ²D + σ²W) .
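These rearrangements are easy to get backwards, so here is a small round-trip check in Python (the causal components are invented): build the observational components from the covariances among relatives, then recover Va, Vd, and Ve with the formulas above:

```python
# Round-trip check of the mapping between causal (Va, Vd, Ve) and
# observational (s2_S, s2_D, s2_W) variance components.
# The causal values below are invented for illustration.
Va, Vd, Ve = 2.0, 0.5, 1.0

# Forward: expected observational components under the full-sib design.
s2_S = Va / 4                          # Cov(HS)
s2_D = Va / 4 + Vd / 4                 # Cov(FS) - Cov(HS)
s2_W = Va / 2 + 3 * Vd / 4 + Ve        # Vp - Cov(FS)

# Backward: the rearranged formulas from the text.
Va_hat = 4 * s2_S
Vd_hat = 4 * (s2_D - s2_S)
Ve_hat = s2_W - 3 * s2_D + s2_S
h2 = Va_hat / (s2_S + s2_D + s2_W)
print(Va_hat, Vd_hat, Ve_hat, round(h2, 3))
```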

An example: body weight in female mice


The analysis involves 719 offspring from 74 sires and 192 dams, each with one litter. The
offspring were spread over 4 generations, and the analysis is performed as a nested ANOVA
with the genetic analysis nested within generations. An additional complication is that the
design was unbalanced, i.e., unequal numbers of progeny were measured in each sibship. As
a result the degrees of freedom don’t work out to be quite as simple as what I showed you.8
The results are summarized in Table 17.6.
Using the expressions for the composition of the mean square we obtain
σ²W = MSW = 2.19
σ²D = (1/k)(MSD − σ²W) = 2.47
σ²S = (1/dk′)(MSS − σ²W − k′σ²D) = 0.48 .
8
What did you expect from real data? This example is extracted from Falconer and Mackay, pp. 169–170.
See the book for details.

Source                       d.f.   Mean square   Composition of mean square
Among sires                  70     17.10         σ²W + k′σ²D + dk′σ²S
Among dams (within sires)    118    10.79         σ²W + kσ²D
Within progenies             527    2.19          σ²W

d = 2.33
k = 3.48
k′ = 4.16

Table 17.6: Quantitative genetic analysis of the inheritance of body weight in female mice
(from Falconer and Mackay, pp. 169–170.)

Thus,

Vp = 5.14
Va = 1.92
Vd + Ve = 3.22
Vd = (0.00–1.64)
Ve = (1.58–3.22)

Why didn’t I give a definite number for Vd after my big spiel above about how we can
estimate it from a full-sib crossing design? Two reasons. First, if you plug the estimates for σ²D and σ²S into the formulas above you get Vd = 7.96 and Ve = −4.74, which is clearly impossible, since Vd has to be less than Vp and Ve has to be greater than zero. It’s a variance.
Second, the experimental design confounds two sources of resemblance among full siblings:
(1) genetic covariance and (2) environmental covariance. The full-sib families were all raised
by the same mother in the same pen. Hence, we don’t know to what extent their resemblance
is due to a common natal environment.9 If we assume Vd = 0, we can estimate the amount
of variance accounted for by exposure to a common natal environment, VEc = 1.99, and by
environmental variation within sibships, VEw = 1.23.10 Similarly, if we assume VEw = 0,
then Vd = 1.64 and VEc = 1.58. In any case, we can estimate the narrow sense heritability
9
Notice that this doesn’t affect our analysis of half-sib families, i.e., the progeny of different sires, since
each father was bred with several females.
10
See Falconer for details.

as

h²N = 1.92/5.14
    = 0.37 .
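The arithmetic from mean squares to variance components and heritability is easy to script; this Python sketch simply replays the Falconer and Mackay numbers above (k1 stands in for k′):

```python
# Observational components from the mouse ANOVA (Table 17.6), then the
# causal components and narrow-sense heritability for body weight.
MSS, MSD, MSW = 17.10, 10.79, 2.19     # mean squares from Table 17.6
d, k, k1 = 2.33, 3.48, 4.16            # k1 plays the role of k' in the text

s2_W = MSW
s2_D = (MSD - s2_W) / k
s2_S = (MSS - s2_W - k1 * s2_D) / (d * k1)

Vp = s2_S + s2_D + s2_W
Va = 4 * s2_S
h2 = Va / Vp
print(round(s2_D, 2), round(s2_S, 2), round(Vp, 2), round(h2, 2))
# 2.47 0.48 5.14 0.37
```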

Chapter 18

Evolution of quantitative traits

Let’s stop and review quickly where we’ve come and where we’re going. We started our
survey of quantitative genetics by pointing out that our objective was to develop a way to
describe the patterns of phenotypic resemblance among relatives. The challenge was that we wanted to do this for phenotypic traits whose expression is influenced both by many genes and by the environment in which those genes are expressed. Beyond the technical,
algebraic challenges associated with many genes, we have the problem that we can’t directly
associate particular genotypes with particular phenotypes. We have to rely on patterns of
phenotypic resemblance to tell us something about how genetic variation is transmitted.
Surprisingly, we’ve managed to do that. We now know that it’s possible to:

• Estimate the additive effect of an allele.1

• Partition the phenotypic variance into genotypic and environmental components and
to partition the genotypic variance into additive and dominance components.2

• Estimate all of the variance components from a combination of appropriate crossing designs and appropriate statistical analyses.
1
Actually, we don’t know this. You’ll have to take my word for it that in certain breeding designs it’s possible to estimate not only the additive genetic variance and the dominance genetic variance, but also the
actual additive effect of “alleles” that we haven’t even identified. We’ll see a more direct approach soon,
when we get to quantitative trait locus analysis.
2
I should point out that this is an oversimplification. I’ve mentioned that we typically assume that we can
simply add the effects of alleles across loci, but if you think about how genes actually work in organisms, you
realize that such additivity across loci isn’t likely to be very common. Strictly speaking there are epistatic
components to the genetic variance too, i.e., components of the genetic variance that have to do not with
the interaction among alleles at a single locus (the dominance variance that we’ve already encountered), but
with the interaction of alleles at different loci.

Now we’re ready for the next step: applying all of these ideas to the evolution of a
quantitative trait.

Evolution of the mean phenotype


We’re going to focus on how the mean phenotype in a population changes in response to nat-
ural selection, specifically in response to viability selection. Before we can do this, however,
we need to think a bit more carefully about the relationship between genotype, phenotype,
and fitness. Let Fij (x) be the probability that genotype Ai Aj has a phenotype smaller than
x.3 Then xij , the genotypic value of Ai Aj is
xij = ∫_{−∞}^{∞} x dFij(x)

and the population mean phenotype is p2 x11 +2pqx12 +q 2 x22 . If an individual with phenotype
x has fitness w(x), then the fitness of an individual with genotype Ai Aj is
wij = ∫_{−∞}^{∞} w(x) dFij(x)

and the mean fitness in the population is w̄ = p2 w11 + 2pqw12 + q 2 w22 .


Now, there’s a well known theorem from calculus known as Taylor’s theorem. It says
that for any function4 f (x)

f(x) = f(a) + Σ_{k=1}^{∞} [(x − a)^k / k!] f^(k)(a) .

Using this theorem we can produce an approximate expression describing how the mean
phenotype in a population will change in response to selection. Remember that the mean
phenotype, x̄, depends both on the underlying genotypic values and on the allele frequency.
So I’m going to write the mean phenotype as x̄(p) to remind us of that dependency.
x̄(p′) = x̄(p) + (p′ − p)(dx̄/dp) + O((p′ − p)²)

x̄(p) = p2 x11 + 2pqx12 + q 2 x22


3
For those of you who have had probability theory, Fij(x) is the cumulative distribution function of the probability density for the phenotype associated with AiAj.
4
Actually there are restrictions on the functions to which it applies, but we can ignore those restrictions
for our purposes.

dx̄(p)/dp = 2px11 + 2qx12 − 2px12 − 2qx22
         = 2[(px11 + qx12 − x̄/2) − (px12 + qx22 − x̄/2)]
         = 2(α1 − α2)

x̄(p′) ≈ x̄(p) + (p′ − p)[2(α1 − α2)]

∆x̄ = (∆p)[2(α1 − α2)]

Now you need to remember (from lo those many weeks ago) that

p′ = (p²w11 + pqw12)/w̄ .

Thus,

∆p = p′ − p
   = (p²w11 + pqw12)/w̄ − p
   = (p²w11 + pqw12 − pw̄)/w̄
   = p[(pw11 + qw12 − w̄)/w̄] .

Now,5 let’s do a linear regression of fitness on phenotype. After all, to make any further
progress, we need to relate phenotype to fitness, so that we can use the relationship between
phenotype and genotype to infer the change in allele frequencies, from which we will infer
the change in mean phenotype.6 From our vast statistical knowledge, we know that the slope
of this regression line is
β1 = Cov(w, x)/Var(x)
and its intercept is
β0 = w̄ − β1 x̄ .
5
Since we’re having so much fun with mathematics why should we stop here?
6
Whew! That was a mouthful.

159
Let’s use this regression equation to determine the fitness of each genotype. This is only an
approximation to the true fitness,7 but it is adequate for many purposes.
wij = ∫_{−∞}^{∞} w(x) dFij(x)
    ≈ ∫_{−∞}^{∞} (β0 + β1x) dFij(x)
    = β0 + β1xij
w̄ = β0 + β1x̄ .

If we substitute this into our expression for ∆p above, we get


∆p = p[(pw11 + qw12 − w̄)/w̄]
   = p{[p(β0 + β1x11) + q(β0 + β1x12) − (β0 + β1x̄)]/w̄}
   = pβ1[(px11 + qx12 − x̄)/w̄]
   = pβ1[(α1 − x̄/2)/w̄]
   = pβ1{[α1 − (pα1 + qα2)]/w̄}
   = pqβ1(α1 − α2)/w̄ .

So where are we now?8 Let’s substitute this result back into the equation for ∆x̄. When we
do we get

∆x̄ = (∆p)[2(α1 − α2)]
   = [pqβ1(α1 − α2)/w̄][2(α1 − α2)]
   = 2pqα²(β1/w̄)
   = Va(β1/w̄) ,

where α = α1 − α2 and the last step uses the fact that Va = 2pqα².

7
Specifically, we are implicitly assuming that the fitnesses are adequately approximated by a linear func-
tion of our phenotypic measure.
8
You don’t have to tell me where you wish you were. I can reliably guess that it’s not here.

This is great if we’ve done the regression between fitness and phenotype, but what if we
haven’t?9 Let’s look at Cov(w, x) in a little more detail.
Cov(w, x) = p² ∫_{−∞}^{∞} xw(x) dF11(x) + 2pq ∫_{−∞}^{∞} xw(x) dF12(x) + q² ∫_{−∞}^{∞} xw(x) dF22(x) − x̄w̄
 = p²[∫_{−∞}^{∞} xw(x) dF11(x) − x11w̄ + x11w̄]
   + 2pq[∫_{−∞}^{∞} xw(x) dF12(x) − x12w̄ + x12w̄]
   + q²[∫_{−∞}^{∞} xw(x) dF22(x) − x22w̄ + x22w̄]
   − x̄w̄
 = p²[∫_{−∞}^{∞} xw(x) dF11(x) − x11w̄]
   + 2pq[∫_{−∞}^{∞} xw(x) dF12(x) − x12w̄]
   + q²[∫_{−∞}^{∞} xw(x) dF22(x) − x22w̄] ,

since p²x11w̄ + 2pq x12w̄ + q²x22w̄ = x̄w̄.

Now

∫_{−∞}^{∞} xw(x) dFij(x) − xijw̄ = w̄[∫_{−∞}^{∞} (xw(x)/w̄) dFij(x) − xij]
 = w̄(x∗ij − xij) ,

where x∗ij refers to the mean phenotype of AiAj after selection. So

Cov(w, x) = p²w̄(x∗11 − x11) + 2pq w̄(x∗12 − x12) + q²w̄(x∗22 − x22)
 = w̄(x̄∗ − x̄) ,
where x̄∗ is the population mean phenotype after selection. In short,10 combining our equa-
tions for the change in mean phenotype and for the covariance of fitness and phenotype and
remembering that β1 = Cov(w, x)/V ar(x)11
∆x̄ = Va{[w̄(x̄∗ − x̄)/Vp]/w̄}
9
Hang on just a little while longer. We’re almost there.
10
We finally made it.
11
You also need to remember that Var(x) = Vp , since they’re the same thing, the phenotypic variance.

Genotype A1 A1 A1 A2 A2 A2
Phenotype 1.303 1.249 0.948

Table 18.1: A simple example to illustrate response to selection in a quantitative trait.

= h²N(x̄∗ − x̄)

∆x̄ = x̄′ − x̄ is referred to as the response to selection and is often given the symbol R. It is the change in population mean between the parental generation (before selection) and the offspring generation (before selection). x̄∗ − x̄ is referred to as the selection differential and is often given the symbol S. It is the difference between the mean phenotype in the parental generation before selection and the mean phenotype in the parental generation after selection. Thus, we can rewrite our final equation as

R = h²N S .

This equation is often referred to as the breeder’s equation.

A Numerical Example
To illustrate how this works, let’s examine the simple example in Table 18.1.
Given these phenotypes, p = 0.25, and Vp = 0.16, it follows that x̄ = 1.08 and h2N =
0.1342. Suppose the mean phenotype after selection is 1.544. What will the phenotype be
among the newly born progeny?

S = x̄∗ − x̄
  = 1.544 − 1.08
  = 0.464
∆x̄ = h²N S
   = (0.1342)(0.464)
   = 0.06
x̄′ = x̄ + ∆x̄
   = 1.08 + 0.06
   = 1.14
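The same arithmetic in Python, computing x̄ directly from the genotype frequencies in Table 18.1 (note that keeping full precision for x̄ gives S = 0.461 rather than the rounded 0.464):

```python
# Response to selection via the breeder's equation, R = h2 * S,
# using the numbers from Table 18.1 with p = 0.25 and h2 = 0.1342.
p = 0.25
q = 1 - p
pheno = {"A1A1": 1.303, "A1A2": 1.249, "A2A2": 0.948}
freq = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}

xbar = sum(freq[g] * pheno[g] for g in pheno)    # about 1.08
h2 = 0.1342                                      # narrow-sense heritability
xbar_star = 1.544                                # mean after selection

S = xbar_star - xbar                             # selection differential
R = h2 * S                                       # response to selection
print(round(xbar, 2), round(S, 3), round(R, 2), round(xbar + R, 2))
# 1.08 0.461 0.06 1.14
```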

Genotype A1 A1 A1 A2 A2 A2
Frequency p2 2pq q2
Fitness w11 w12 w22
Additive fitness value 2α1 α1 + α2 2α2

Table 18.2: Fitnesses and additive fitness values used in deriving Fisher’s Fundamental
Theorem of Natural Selection.

Fisher’s Fundamental Theorem of Natural Selection


Suppose the phenotype whose evolution we’re interested in following is fitness itself.12 Then
we can summarize the fitnesses as illustrated in Table 18.2.
Although I didn’t tell you this, a well-known fact about viability selection at one locus
is that the change in allele frequency from one generation to the next can be written as
∆p = (pq/2w̄)(dw̄/dp) .

Using our new friend, Taylor’s theorem, it follows immediately that

w̄′ = w̄ + (∆p)(dw̄/dp) + [(∆p)²/2](d²w̄/dp²) .

Or, equivalently,

∆w̄ = (∆p)(dw̄/dp) + [(∆p)²/2](d²w̄/dp²) .

Recalling that w̄ = p2 w11 + 2p(1 − p)w12 + (1 − p)2 w22 we find that

dw̄/dp = 2pw11 + 2(1 − p)w12 − 2pw12 − 2(1 − p)w22
  = 2[(pw11 + qw12) − (pw12 + qw22)]
  = 2[(pw11 + qw12 − w̄/2) − (pw12 + qw22 − w̄/2)]
  = 2[α1 − α2]
  = 2α ,
12
The proof of the fundamental theorem that follows is due to C. C. Li [56]

where the last two steps use the definitions for α1 and α2 , and we set α = α1 − α2 . Similarly,

d²w̄/dp² = 2w11 − 2w12 − 2w12 + 2w22
  = 2(w11 − 2w12 + w22)

Now we can plug these back into the equation for ∆w̄:
∆w̄ = [(pq/2w̄)(dw̄/dp)](dw̄/dp) + {[(pq/2w̄)(dw̄/dp)]²/2}[2(w11 − 2w12 + w22)]
  = [(pq/2w̄)(2α)](2α) + [(pq/2w̄)(2α)]²(w11 − 2w12 + w22)
  = 2pqα²/w̄ + (p²q²α²/w̄²)(w11 − 2w12 + w22)
  = (Va/w̄)[1 + (pq/2w̄)(w11 − 2w12 + w22)] ,
where the last step follows from the observation that Va = 2pqα². The quantity (pq/2w̄)(w11 − 2w12 + w22) is usually quite small, especially if selection is not too intense.13 So we are left with

∆w̄ ≈ Va/w̄ .
13
Notice that it’s exactly equal to 0 if the fitness of the heterozygote is exactly intermediate. In that case,
all of the variance in fitness is additive.
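A numerical check may help here. When the heterozygote is exactly intermediate, the correction term vanishes and ∆w̄ = Va/w̄ holds exactly; this Python sketch (fitness values invented) iterates one generation of viability selection and compares the two quantities:

```python
# One-locus check of Fisher's fundamental theorem. With additive fitnesses
# (w12 midway between w11 and w22) the correction term is zero, so the
# one-generation change in mean fitness equals Va / wbar exactly.
# The fitness values below are invented for illustration.
p = 0.6
w11, w12, w22 = 1.0, 0.9, 0.8

def wbar(p):
    q = 1 - p
    return p * p * w11 + 2 * p * q * w12 + q * q * w22

q = 1 - p
wb = wbar(p)
p_next = (p * p * w11 + p * q * w12) / wb       # allele frequency after selection

alpha1 = p * w11 + q * w12 - wb / 2
alpha2 = p * w12 + q * w22 - wb / 2
Va = 2 * p * q * (alpha1 - alpha2) ** 2         # additive variance in fitness

delta_wbar = wbar(p_next) - wb
print(delta_wbar, Va / wb)                      # the two quantities agree
```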

Chapter 19

Selection on multiple characters

So far we’ve studied only the evolution of a single trait, e.g., height or weight. But organ-
isms have many traits, and they evolve at the same time. How can we understand their
simultaneous evolution? The basic framework of the quantitative genetic approach was first
outlined by Russ Lande and Steve Arnold [52].
Let z1 , z2 , . . . , zn be the phenotype of each character that we are studying. We’ll use
z̄ to denote the vector of these characters before selection and z̄∗ to denote the vector after
selection. The selection differential, s, is also a vector given by
s = z̄∗ − z̄ .
Suppose p(z) is the probability that any individual has phenotype z, and let W (z) be the
fitness (absolute viability) of an individual with phenotype z. Then the mean absolute
fitness is

W̄ = ∫ W(z)p(z) dz .
The relative fitness of phenotype z can be written as
w(z) = W(z)/W̄ .

Using relative fitnesses the mean relative fitness, w̄, is 1. Now

z̄∗ = ∫ z w(z)p(z) dz .

Recall that Cov(X, Y ) = E(X − µx )(Y − µy ) = E(XY ) − µx µy . Consider


s = z̄∗ − z̄
  = ∫ z w(z)p(z) dz − z̄
  = E(wz) − w̄z̄ ,

where the last step follows since w̄ = 1 meaning that w̄z̄ = z̄. In short,
s = Cov(w, z) .
That should look familiar from our analysis of the evolution of a single phenotype.
If we assume that all genetic effects are additive, then the phenotype of an individual
can be written as
z=x+e ,
where x is the additive genotype and e is the environmental effect. We’ll denote by G the
matrix of genetic variances and covariances and by E the matrix of environmental variances
and covariances. The matrix of phenotype variances and covariances, P, is then given by1
P=G+E .
Now, if we’re willing to assume that the regression of additive genetic effects on phenotype
is linear2 and that the environmental variance is the same for every genotype, then we can
predict how phenotypes will change from one generation to the next
x̄∗ − x̄ = GP⁻¹(z̄∗ − z̄)
z̄′ − z̄ = GP⁻¹(z̄∗ − z̄)
∆z̄ = GP⁻¹s

GP⁻¹ is the multivariate version of h²N. This equation is also the multivariate version of the breeder’s equation.
But we have already seen that s = Cov(w, z). Thus,
β = P⁻¹s
is a set of partial regression coefficients of relative fitness on the characters, i.e., the depen-
dence of relative fitness on that character alone holding all others constant.
Note:

si = Σ_{j=1}^{n} βj Pij = β1Pi1 + · · · + βiPii + · · · + βnPin
is the total selective differential in character i, including the indirect effects of selection on
other characters.
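In matrix terms this is a short computation; the sketch below (P and s are invented for illustration) solves β = P⁻¹s and then recovers the total differentials si = Σj βjPij:

```python
import numpy as np

# Selection gradients from selection differentials, beta = P^{-1} s,
# for two correlated characters. P and s are invented for illustration.
P = np.array([[1.0, 0.6],
              [0.6, 1.0]])      # phenotypic variance-covariance matrix
s = np.array([0.2, -0.1])       # selection differentials

beta = np.linalg.solve(P, s)    # direct effects of selection on each trait
s_back = P @ beta               # total differentials, direct + indirect effects
print(beta, s_back)             # s_back recovers s
```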
1
Assuming that there are no genotype × environment interactions.
2
And we were willing to do this when we were studying the evolution of only one trait, so why not do it
now?

Character    Mean before selection    Standard deviation
head 0.880 0.034
thorax 2.038 0.049
scutellum 1.526 0.057
wing 2.337 0.043
head thorax scutellum wing
head 1.00 0.72 0.50 0.60
thorax 1.00 0.59 0.71
scutellum 1.00 0.62
wing 1.00
Character s s0 β β0
head -0.004 -0.11 -0.7 ± 4.9 -0.03 ± 0.17
thorax -0.003 -0.06 11.6 ± 3.9∗∗ 0.58 ± 0.19∗∗
scutellum -0.16∗ -0.28∗ -2.8 ± 2.7 -0.17 ± 0.15
wing -0.019∗∗ -0.43∗∗ -16.6 ± 4.0∗∗ -0.74 ± 0.18∗∗

Table 19.1: Selection analysis of pentatomid bugs on the shores of Lake Michigan.

An example: selection in a pentatomid bug


94 individuals were collected along the shoreline of Lake Michigan in Parker County, Indiana after a storm; 39 were alive and 55 were dead. The means of several characters before selection, the trait correlations, and the selection analysis are presented in Table 19.1.
The column labeled s is the selective differential for each character. The column labeled s′ is the standardized selective differential, i.e., the change measured in units of standard deviations rather than on the original scale.3 A multiple regression analysis of fitness versus
phenotype on the original scale gives estimates of β, the direct effect of selection on that
trait. A multiple regression analysis of fitness versus phenotype on the transformed scale
gives the standardized direct effect of selection, β 0 , on that trait.
Notice that the selective differential4 for the thorax measurement is negative, i.e., individuals that survived had smaller thoraces than those that died. But the direct effect of selection on thorax is strongly positive, i.e., all other things being equal, an individual with a large thorax
3
To measure on this scale the data is simply transformed by setting yi = (xi − x̄)/s, where xi is the raw
score for the ith individual, x̄ is the sample mean for the trait, and s is its standard deviation.
4
The cumulative effect of selection on the change in mean phenotype.

body tail
body 35.4606 11.3530
tail 11.3530 37.2973

Table 19.2: Genetic variance-covariance matrix for vertebral number in central Californian
garter snakes.

was more likely to survive than one with a small thorax. Why the apparent contradiction?
Because the thorax measurement is positively correlated with the wing measurement, and
there’s strong selection for decreased values of the wing measurement.

Cumulative selection gradients


Arnold [1] suggested an extension of this approach to longer evolutionary time scales. Specif-
ically, he studied variation in the number of body vertebrae and the number of tail vertebrae
in populations of Thamnophis elegans from two regions of central California. He found rel-
atively little vertebral variation within populations, but there were considerable differences
in vertebral number between populations on the coast side of the Coast Ranges and popu-
lations on the Central Valley side of the Coast Ranges. The consistent difference suggested
that selection might have produced these differences, and Arnold attempted to determine
the amount of selection necessary to produce these differences.

The data
Arnold collected pregnant females from two local populations in each of two sites in northern
California 282 km apart from one another. Females were collected over a ten-year period
and returned to the University of Chicago. Dam-offspring regressions were used to estimate
additive genetic variances and covariances of vertebral number.5 Mark-release-recapture ex-
periments in the California populations showed that females with intermediate numbers of
vertebrae grow at the fastest rate, at least at the inland site, although no such relation-
ship was found in males. The genetic variance-covariance matrix he obtained is shown in
Table 19.2.
5
1000 progeny from 100 dams.

The method
We know from Lande and Arnold’s results that the change in multivariate phenotype from
one generation to the next, ∆z̄, can be written as

∆z̄ = Gβ ,

where G is the genotypic variance-covariance matrix, β = P−1 s is the set of partial regression
coefficients describing the direct effect of each character on relative fitness.6 If we are willing
to assume that G remains constant, then the total change in a character subject to selection
for n generations is
Σ_{k=1}^{n} ∆z̄ = G Σ_{k=1}^{n} β .

Thus, Σ_{k=1}^{n} β can be regarded as the cumulative selection gradient associated with a particular observed change, and it can be estimated as

Σ_{k=1}^{n} β = G⁻¹ Σ_{k=1}^{n} ∆z̄ .

The results
The overall difference in vertebral number between inland and coastal populations can be
summarized as:

bodyinland − bodycoastal = 16.21


tailinland − tailcoastal = 9.69

Given the estimate of G already obtained, this corresponds to a cumulative selection gradient
between inland and coastal populations of

βbody = 0.414
βtail = 0.134
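These gradients are easy to reproduce: solving β = G⁻¹∆z̄ with the G matrix of Table 19.2 and the inland-coastal differences gives the numbers above. The sketch also replays the direct-versus-indirect partitioning of the response used later in the chapter:

```python
import numpy as np

# Cumulative selection gradient for the garter snake example:
# beta = G^{-1} (change in means), using Table 19.2 and the
# inland-coastal differences in vertebral number.
G = np.array([[35.4606, 11.3530],
              [11.3530, 37.2973]])
dz = np.array([16.21, 9.69])          # body, tail differences

beta = np.linalg.solve(G, dz)
print(beta.round(3))                  # [0.414 0.134]

# Partition each response into its direct fraction,
# e.g. G11*beta1 / (G11*beta1 + G12*beta2) for body vertebrae.
body_direct = G[0, 0] * beta[0] / (G[0, 0] * beta[0] + G[0, 1] * beta[1])
tail_direct = G[1, 1] * beta[1] / (G[1, 0] * beta[0] + G[1, 1] * beta[1])
print(round(body_direct, 2), round(tail_direct, 2))   # 0.91 0.51
```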

Applying the same technique to looking at the differences between populations within
the inland site and within the coastal site we find cumulative selection gradients of

βbody = 0.035
βtail = 0.038
6
P is the phenotypic variance-covariance matrix and s is the vector of selection differentials.

for the coastal site and

βbody = 0.035
βtail = −0.004

for the inland site.

The conclusions
“To account for divergence between inland and coastal California, we must invoke cumulative
forces of selection that are 7 to 11 times stronger than the forces needed to account for
differentiation of local populations.”
Furthermore, recall that the selection gradients can be used to partition the overall
response to selection in a character into the portion due to the direct effects of that character
alone and the portion due to the indirect effects of selection on a correlated character. In
this case the overall response to selection in number of body vertebrae is given by

G11 β1 + G12 β2 ,

where G11 β1 is the direct effect of body vertebral number and G12 β2 is the indirect effect of
tail vertebral number. Similarly, the overall response to selection in number of tail vertebrae
is given by
G12 β1 + G22 β2 ,
where G22 β2 is the direct effect of tail vertebral number and G12 β1 is the indirect effect of
body vertebral number. Using these equations it is straightforward to calculate that 91%
of the total divergence in number of body vertebrae is a result of direct selection on this
character. In contrast, only 51% of the total divergence in number of tail vertebrae is a result
of direct selection on this character, i.e., 49% of the difference in number of tail vertebrae is
attributable to indirect selection as a result of its correlation with number of body vertebrae.

The caveats
While the approach Arnold suggests is intriguing, there are a number of caveats that must
be kept in mind in trying to apply it.

• This approach assumes that the G matrix remains constant.

• This approach cannot distinguish strong selection that happened over a short period
of time from weak selection that happened over a long period of time.

• This approach assumes that the observed differences in populations are the result of
selection, but populations isolated from one another will diverge from one another even
in the absence of selection simply as a result of genetic drift.

– Small amount of differentiation between populations within sites could reflect


relatively recent divergence of those populations from a common ancestral popu-
lation.
– Large amount of differentiation between populations from inland versus coastal
sites could reflect a more ancient divergence from a common ancestral population.

Chapter 20

Association mapping: the background


from two-locus genetics

One approach to understanding more about the genetics of quantitative traits takes advan-
tage of the increasing number of genetic markers available as a result of recent advances in
molecular genetics. Suppose you have two inbred lines that differ in a trait that interests
you, say body weight or leaf width. Call one of them the “high” line and the other the
“low” line.1 Further suppose that you have a whole bunch of molecular markers that differ
between the two lines, and designate the genotype in the “high” line A1 A1 and the genotype
in the low line A2 A2 .2 One last supposition: Suppose that at loci influencing the phenotype
you’re studying the genotype in the “high” line is Q1 Q1 and the genotype in the “low” line
is Q2 Q2 . Each of these loci is what we call a quantitative trait locus or QTL. Now do the
following experiment:

• Cross the “high” line and the “low” line to construct an F1 .

• Intercross individuals in the F1 generation to form an F2 .3

• “Walk” through the genome4 calculating a likelihood score for a QTL at a particular
map position, using what we know about the mathematics of recombination rates and
1 Corresponding to whether the body weight or leaf width is large or small.
2 Since these are inbred lines, I can assume that they are homozygous at the marker loci I've chosen.
3 Note: You could also backcross to either or both of the parental inbred lines. Producing an F2, however, allows you to estimate both the additive and dominance effects associated with each QTL.
4 I forgot to mention another supposition. I am supposing that you either have already constructed a genetic map using your markers, or that you will construct a genetic map using segregation in the F2 before you start looking for QTL loci.

Mendelian genetics. In calculating the likelihood score we maximize the likelihood of
the data assuming that there is a QTL at this position and estimating the corresponding
additive and dominance effects of the allele. We then identify QTLs as those loci where
there are “significant” peaks in the map of likelihood scores.5

The result is a genetic map showing where QTLs are in the genome and indicating the
magnitude of their additive and dominance effects.
QTL mapping is wonderful — provided that you’re working with an organism where it’s
possible to design a breeding program and where the information derived from that breeding
program is relevant to variation in natural populations. Think about it. If we do a QTL
analysis based on segregation in an F2 population derived from two inbred lines, all we re-
ally know is which loci are associated with phenotypic differences between those two lines.
Typically what we really want to know, if we’re evolutionary biologists, is which loci are
associated with phenotypic differences between individuals in the population we’re studying.
That’s where association mapping comes in. We look for statistical associations between
phenotypes and genotypes across a whole population. We expect there to be such associa-
tions, if we have a dense enough map, because some of our marker loci will be closely linked
to loci responsible for phenotypic variation.

A digression into two-locus population genetics6


It’s pretty obvious that if two loci are closely linked, alleles at those loci are likely to be
closely linked, but let’s take a closer look at exactly what that means.
One of the most important properties of a two-locus system is that it is no longer sufficient
to talk about allele frequencies alone, even in a population that satisfies all of the assumptions
necessary for genotypes to be in Hardy-Weinberg proportions at each locus. To see why
consider this. With two loci and two alleles there are four possible gametes:7

Gamete A1 B1 A1 B2 A2 B1 A2 B2
Frequency x11 x12 x21 x22

If alleles are arranged randomly into gametes then,

x11 = p1 p2
x12 = p1 q2
x21 = q1 p2
x22 = q1 q2 ,

where p1 = freq(A1), p2 = freq(B1), q1 = freq(A2), and q2 = freq(B2). But alleles need not be arranged randomly into gametes. They may covary so that when a gamete contains A1 it is more likely to contain B1 than a randomly chosen gamete, or they may covary so that a gamete containing A1 is less likely to contain B1 than a randomly chosen gamete. This covariance could be the result of the two loci being in close physical association, but it doesn't have to be. Whenever the alleles covary within gametes

x11 = p1 p2 + D
x12 = p1 q2 − D
x21 = q1 p2 − D
x22 = q1 q2 + D ,

where D = x11 x22 − x12 x21 is known as the gametic disequilibrium.8 When D ≠ 0 the alleles within gametes covary, and D measures statistical association between them. It does not (directly) measure the physical association. Similarly, D = 0 does not imply that the loci are unlinked, only that the alleles at the two loci are arranged into gametes independently of one another.

5 See http://darwin.eeb.uconn.edu/eeb348/lecture-notes/qtl-intro.pdf for more details.
6 Note: We'll go over only a small part of this section in lecture. I'm providing all the details here so you can find them in the future if you ever need them.
7 Think of drawing the Punnett square for a dihybrid cross, if you want.
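If you'd like to check these relationships numerically, here is a short Python sketch (the function names are mine, chosen for this example):

```python
def gametic_disequilibrium(x11, x12, x21, x22):
    """Gametic disequilibrium D = x11*x22 - x12*x21."""
    return x11 * x22 - x12 * x21


def gamete_frequencies(p1, p2, D):
    """Gamete frequencies (x11, x12, x21, x22) from allele frequencies and D.

    p1 = freq(A1), p2 = freq(B1); q1 and q2 are the complementary frequencies.
    """
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (p1 * p2 + D, p1 * q2 - D, q1 * p2 - D, q1 * q2 + D)
```

For example, gamete_frequencies(0.6, 0.4, 0.0) returns (0.24, 0.36, 0.16, 0.24), and feeding any set of frequencies produced this way back into gametic_disequilibrium recovers the D you started with.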

A little diversion

It probably isn't obvious why we can get away with only one D for all of the gamete frequencies. The short answer is:

There are four gametes. That means we need three parameters to describe the four frequencies. p1 and p2 are two. D is the third.

Another way is to do a little algebra to verify that the definition is self-consistent.

D = x11 x22 − x12 x21
  = (p1 p2 + D)(q1 q2 + D) − (p1 q2 − D)(q1 p2 − D)
  = [p1 q1 p2 q2 + D(p1 p2 + q1 q2) + D²] − [p1 q1 p2 q2 − D(p1 q2 + q1 p2) + D²]
  = D(p1 p2 + q1 q2 + p1 q2 + q1 p2)
  = D(p1(p2 + q2) + q1(q2 + p2))
  = D(p1 + q1)
  = D .

8 You will sometimes see D referred to as the linkage disequilibrium, but that's misleading. Alleles at different loci may be non-randomly associated even when they are not linked.

Transmission genetics with two loci


I’m going to construct a reduced version of a mating table to see how gamete frequencies
change from one generation to the next. There are ten different two-locus genotypes (if
we distinguish coupling, A1 B1 /A2 B2 , from repulsion, A1 B2 /A2 B1 , heterozygotes as we must
for these purposes). So a full mating table would have 100 rows. If we assume all the
conditions necessary for genotypes to be in Hardy-Weinberg proportions apply, however, we
can get away with just calculating the frequency with which any one genotype will produce
a particular gamete.9

                                         Gametes
Genotype       Frequency    A1B1       A1B2       A2B1       A2B2
A1B1/A1B1      x11²         1          0          0          0
A1B1/A1B2      2x11x12      1/2        1/2        0          0
A1B1/A2B1      2x11x21      1/2        0          1/2        0
A1B1/A2B2      2x11x22      (1−r)/2    r/2        r/2        (1−r)/2
A1B2/A1B2      x12²         0          1          0          0
A1B2/A2B1      2x12x21      r/2        (1−r)/2    (1−r)/2    r/2
A1B2/A2B2      2x12x22      0          1/2        0          1/2
A2B1/A2B1      x21²         0          0          1          0
A2B1/A2B2      2x21x22      0          0          1/2        1/2
A2B2/A2B2      x22²         0          0          0          1

9 We're assuming random union of gametes rather than random mating of genotypes.

Where do (1 − r)/2 and r/2 come from?

Consider the coupling double heterozygote, A1B1/A2B2. When recombination doesn't happen, A1B1 and A2B2 occur in equal frequency (1/2), and A1B2 and A2B1 don't occur at all. When recombination happens, the four possible gametes occur in equal frequency (1/4). So the recombination frequency,10 r, is half the crossover frequency,11 c, i.e., r = c/2. Now the results of crossing over can be expressed in this table:

Frequency    A1B1        A1B2    A2B1    A2B2
1 − c        1/2         0       0       1/2
c            1/4         1/4     1/4     1/4
Total        (2−c)/4     c/4     c/4     (2−c)/4
           = (1−r)/2     r/2     r/2     (1−r)/2

Changes in gamete frequency

We can use the mating table as we did earlier to calculate the frequency of each gamete in the next generation. Specifically,

x11′ = x11² + x11x12 + x11x21 + (1 − r)x11x22 + rx12x21
     = x11(x11 + x12 + x21 + x22) − r(x11x22 − x12x21)
     = x11 − rD
x12′ = x12 + rD
x21′ = x21 + rD
x22′ = x22 − rD .
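These recursions are easy to iterate. A minimal Python version (my own naming, for illustration):

```python
def next_gamete_frequencies(x, r):
    """One generation of random union of gametes with recombination frequency r.

    x is a tuple (x11, x12, x21, x22); returns the next generation's gamete
    frequencies using x11' = x11 - r*D, x12' = x12 + r*D, and so on.
    """
    x11, x12, x21, x22 = x
    D = x11 * x22 - x12 * x21
    return (x11 - r * D, x12 + r * D, x21 + r * D, x22 - r * D)
```

Notice that x11′ + x12′ = x11 + x12, so the frequency of A1 is unchanged, as the next section shows algebraically.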

No changes in allele frequency

We can also calculate the frequencies of A1 and B1 after this whole process:

p1′ = x11′ + x12′
    = x11 − rD + x12 + rD
    = x11 + x12
    = p1
p2′ = p2 .

Since each locus is subject to all of the conditions necessary for Hardy-Weinberg to apply
at a single locus, allele frequencies don’t change at either locus. Furthermore, genotype
frequencies at each locus will be in Hardy-Weinberg proportions. But the two-locus gamete
frequencies change from one generation to the next.
10 The frequency of recombinant gametes in double heterozygotes.
11 The frequency of cytological crossover during meiosis.
Changes in D

You can probably figure out that D will eventually become zero, and you can probably even guess that how quickly it becomes zero depends on how frequent recombination is. But I'd be astonished if you could guess exactly how rapidly D decays as a function of r. It takes a little more algebra, but we can say precisely how rapid the decay will be.

D′ = x11′x22′ − x12′x21′
   = (x11 − rD)(x22 − rD) − (x12 + rD)(x21 + rD)
   = x11x22 − rD(x11 + x22) + r²D² − (x12x21 + rD(x12 + x21) + r²D²)
   = x11x22 − x12x21 − rD(x11 + x12 + x21 + x22)
   = D − rD
   = D(1 − r)
Notice that even if loci are unlinked, meaning that r = 1/2, D does not reach 0 immediately.
That state is reached only asymptotically. The two-locus analogue of Hardy-Weinberg is
that gamete frequencies will eventually be equal to the product of their constituent allele
frequencies.
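Iterating D′ = D(1 − r) gives D after t generations as D0(1 − r)^t. A one-line Python helper (my own naming) makes it easy to see how slow the decay is:

```python
def disequilibrium_after(D0, r, t):
    """Gametic disequilibrium after t generations of random mating.

    D decays geometrically: D_t = D0 * (1 - r)**t, approaching but never
    exactly reaching zero.
    """
    return D0 * (1.0 - r) ** t
```

Even with unlinked loci (r = 1/2), an eighth of the original disequilibrium remains after three generations, and with r = 0.01 more than 90% remains after ten.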

D in a finite population

In the absence of mutation, D will eventually decay to 0, although the course of that decay isn't as regular as what I've just shown [35]. If we allow recurrent mutation at both loci, however, with A1 ⇌ A2 at rates µ1 (forward) and ν1 (backward) and B1 ⇌ B2 at rates µ2 and ν2, then it can be shown [66] that the expected value of D²/p1(1 − p1)p2(1 − p2) is

E(D²)/E(p1(1 − p1)p2(1 − p2))
  = 1/[3 + 4Ne(r + µ1 + ν1 + µ2 + ν2) − 2/(2.5 + Ne(r + µ1 + ν1 + µ2 + ν2) + Ne(µ1 + ν1 + µ2 + ν2))]
  ≈ 1/(3 + 4Ne r) .

Bottom line: In a finite population, we don't expect D to go to 0, and the magnitude of D² is inversely related to the amount of recombination between the two loci. The less recombination there is between two loci, i.e., the smaller r is, the larger the value of D² we expect.
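If you just want a feel for the magnitudes, the approximation 1/(3 + 4Ne r) is easy to explore. A small Python helper (the function name is mine; mutation rates are assumed negligible relative to r, as in the approximation):

```python
def expected_d2_ratio(Ne, r):
    """Approximate E(D^2) / E(p1(1-p1)p2(1-p2)) at drift-recombination balance.

    Uses the approximation 1 / (3 + 4*Ne*r): smaller r means larger expected D^2.
    """
    return 1.0 / (3.0 + 4.0 * Ne * r)
```

With Ne = 1000, tightly linked loci (r = 10⁻⁴) give a ratio near 0.29, while unlinked loci (r = 0.5) give about 0.0005.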
This has all been a long way12 of showing that our initial intuition is correct. If we can detect a statistical association between a marker locus and a phenotypic trait, it suggests that the marker locus and a locus influencing expression of the trait are physically linked. So how do we detect such an association and why do I say that it suggests the loci are physically linked?

12 OK. You can say it. A very long way.

                    Gamete frequencies              Allele frequencies
Population    A1B1    A1B2    A2B1    A2B2        pi1     pi2        D
1             0.24    0.36    0.16    0.24        0.60    0.40     0.00
2             0.14    0.56    0.06    0.24        0.70    0.20     0.00
Combined      0.19    0.46    0.11    0.24        0.65    0.30    −0.005

Table 20.1: Gametic disequilibrium in a combined population sample.

Population structure with two loci


You can probably guess where this is going. With one locus I showed you that there’s
a deficiency of heterozygotes in a combined sample even if there’s random mating within
all populations of which the sample is composed. The two-locus analog is that you can
have gametic disequilibrium in your combined sample even if the gametic disequilibrium is
zero in all of your constituent populations. Table 20.1 provides a simple numerical example
involving just two populations in which the combined sample has equal proportions from
each population.

The gory details

You knew that I wouldn't be satisfied with a numerical example, didn't you? You knew there had to be some algebra coming, right? Well, here it is. Let

Di = x11,i − p1i p2i
Dt = x̄11 − p̄1 p̄2 ,

where x̄11 = (1/K) Σ_{k=1}^K x11,k, p̄1 = (1/K) Σ_{k=1}^K p1k, and p̄2 = (1/K) Σ_{k=1}^K p2k. Given these definitions, we can now calculate Dt.

Dt = x̄11 − p̄1 p̄2
   = (1/K) Σ_{k=1}^K x11,k − p̄1 p̄2
   = (1/K) Σ_{k=1}^K (p1k p2k + Dk) − p̄1 p̄2
   = (1/K) Σ_{k=1}^K (p1k p2k − p̄1 p̄2) + D̄
   = Cov(p1, p2) + D̄ ,

where Cov(p1, p2) is the covariance in allele frequencies across populations and D̄ is the mean within-population gametic disequilibrium. Suppose Di = 0 for all subpopulations. Then D̄ = 0, too (obviously). But that means that

Dt = Cov(p1, p2) .

So if allele frequencies covary across populations, i.e., Cov(p1, p2) ≠ 0, then there will be non-random association of alleles into gametes in the sample, i.e., Dt ≠ 0, even if there is random association of alleles into gametes within each population.13
Returning to the example in Table 20.1,

Cov(p1, p2) = 0.5(0.6 − 0.65)(0.4 − 0.3) + 0.5(0.7 − 0.65)(0.2 − 0.3)
            = −0.005
x̄11 = (0.65)(0.30) − 0.005 = 0.19
x̄12 = (0.65)(0.70) + 0.005 = 0.46
x̄21 = (0.35)(0.30) + 0.005 = 0.11
x̄22 = (0.35)(0.70) − 0.005 = 0.24 .
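The same bookkeeping can be automated. This Python sketch (my own helper, written for equal-sized samples from each population) reproduces the numbers in Table 20.1:

```python
def combined_disequilibrium(pops):
    """Dt for a pooled sample with equal weight on each subpopulation.

    pops is a list of (p1, p2, D) tuples, one per subpopulation.
    Returns (Dt, cov, Dbar), which satisfy Dt = cov + Dbar.
    """
    K = len(pops)
    p1bar = sum(p1 for p1, p2, D in pops) / K
    p2bar = sum(p2 for p1, p2, D in pops) / K
    x11bar = sum(p1 * p2 + D for p1, p2, D in pops) / K
    cov = sum((p1 - p1bar) * (p2 - p2bar) for p1, p2, D in pops) / K
    Dbar = sum(D for p1, p2, D in pops) / K
    return x11bar - p1bar * p2bar, cov, Dbar
```

Calling combined_disequilibrium([(0.6, 0.4, 0.0), (0.7, 0.2, 0.0)]) gives Dt = −0.005, with all of it coming from the covariance term.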

13 Well, duh! Covariation of allele frequencies across populations means that alleles are non-randomly associated across populations. What other result could you possibly expect?

Association mapping

So what does any of this have to do with QTL mapping? Imagine that we have a well-mixed population segregating both for a lot of molecular markers spread throughout the genome and for loci influencing a trait we're interested in, like body weight or leaf width. Let's call our measurement of that trait yi in the ith individual. Let xij be the genotype of individual i at the jth locus.14 Then to do association mapping, we simply fit the following regression model:

yi = xij βj + εij ,

where εij is the residual error in our regression estimate and βj is our estimate of the effect of substituting one allele for another at locus j, i.e., the additive effect of an allele at locus j. If βj is significantly different from 0, we have evidence that there is a locus linked to this marker that influences the phenotype we're interested in.
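Here's what that fit looks like in practice, as a small simulation in Python with numpy (the sample size and effect size are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

n = 1000
genotype = rng.binomial(2, 0.4, size=n)  # x_ij: 0, 1, or 2 copies of one allele
beta_true = 0.8                          # additive effect chosen for the example
phenotype = beta_true * genotype + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares fit of y_i = x_ij * beta_j + e_ij (plus an intercept)
X = np.column_stack([np.ones(n), genotype])
coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
beta_hat = coef[1]
```

beta_hat comes out close to 0.8; a genome scan simply repeats this fit marker by marker and asks where the estimate differs significantly from 0.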
Notice that I claimed we have evidence that the locus is linked. That’s a bit of sleight of
hand. What we have evidence for directly is that the locus is associated. As we’ve just seen,
though, that association could reflect population structure rather than physical linkage. So
in practice the regression model we fit is a bit more complicated than the one I showed.
The simplest case is when individuals fall into obvious groups, e.g., samples from different populations. Then yi(k) is the trait value for individual i; the superscript (k) indicates that this individual belongs to group k:

yi(k) = xij βj + φ(k) + εij .

The difference between this model and the one above is that we include a random effect of
group, φ(k) , to account for the fact that individuals may have similar phenotypes not because
of similarity in genotypes at loci close to those we’ve scored but because of their similarity
at other loci that differ among groups.
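To see why the group term matters, here's a deliberately extreme Python simulation: the marker has no effect at all, but allele frequencies and trait means both differ between two populations. For simplicity I fit the group term as a fixed effect, a stand-in for the random effect φ(k) in the model above:

```python
import numpy as np

rng = np.random.default_rng(7)

group = np.repeat([0, 1], 300)              # two populations
p = np.where(group == 0, 0.2, 0.8)          # allele frequencies differ by group
genotype = rng.binomial(2, p)               # marker has NO effect on the trait
phenotype = 2.0 * group + rng.normal(0.0, 1.0, size=group.size)

# Naive fit: intercept + genotype only
X_naive = np.column_stack([np.ones(group.size), genotype])
b_naive = np.linalg.lstsq(X_naive, phenotype, rcond=None)[0][1]

# Structure-aware fit: add the group indicator
X_group = np.column_stack([np.ones(group.size), genotype, group])
b_group = np.linalg.lstsq(X_group, phenotype, rcond=None)[0][1]
```

b_naive comes out large and spuriously "significant," while b_group is near zero: the apparent marker effect was population structure, not linkage.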

14 To keep things simple I'm assuming that we're dealing with biallelic loci, e.g., SNPs, and we can then order the genotypes as 0, 1, 2 depending on how many copies of the most frequent allele they carry.

Part V

Molecular evolution

Chapter 21

Introduction to molecular population genetics

The study of evolutionary biology is commonly divided into two components: study of the
processes by which evolutionary change occurs and study of the patterns produced by those
processes. By “pattern” we mean primarily the pattern of phylogenetic relationships among
species or genes.1 Studies of evolutionary processes often don't devote much attention to evolutionary patterns, except insofar as it is often necessary to take account of
evolutionary history in determining whether or not a particular feature is an adaptation.
Similarly, studies of evolutionary pattern sometimes try not to use any knowledge of evo-
lutionary processes to improve their guesses about phylogenetic relationships, because the
relationship between process and pattern can be tenuous.2 Those who take this approach
argue that invoking a particular evolutionary process seems often to be a way of making sure
that you get the pattern you want to get from the data.
Or at least that’s the way it was in evolutionary biology when evolutionary biologists were
concerned primarily with the evolution of morphological, behavioral, and physiological traits
and when systematists used primarily anatomical, morphological, and chemical features (but
not proteins or DNA) to describe evolutionary patterns. With the advent of molecular
1 In certain cases it may make sense to talk about a phylogeny of populations within species, but in many cases it doesn't. We'll discuss this further when we get to phylogeography in a couple of weeks.
2 This approach is much less common than it used to be. In the "old days" (meaning when I was a young assistant professor), we had vigorous debates about whether or not it was reasonable to incorporate some knowledge of evolutionary processes into the methods we use for inferring evolutionary patterns. Now it's pretty much taken for granted that we should. One way of justifying a strict parsimony approach to cladistics, however, is by arguing (a) that by minimizing character state changes on a tree you're merely trying to find a pattern of character changes as consistent as possible with the data you've gathered and (b) that evolutionary processes should be invoked only to explain that pattern, not to construct it.

biology after the Second World War and its application to an increasing diversity of organisms
in the late 1950s and early 1960s, that began to change. Goodman [27] used the degree of
immunological cross-reactivity between serum proteins as an indication of the evolutionary
distance among primates. Zuckerkandl and Pauling [91] proposed that after species diverged,
their proteins diverged according to a “molecular clock,” suggesting that molecular similarities could be used to reconstruct evolutionary history. In 1966, Harris [30] and Lewontin and Hubby [40, 55] showed that human populations and populations of Drosophila pseudoobscura, respectively, contained surprising amounts of genetic diversity.
In this course, we’ll focus on advances made in understanding the processes of molecular
evolution and pay relatively little attention to the ways in which inferences about evolu-
tionary patterns can be made from molecular data. Up to this point in the course we’ve
completely ignored evolutionary pattern.3 As you’ll see in what follows, however, any discus-
sion of molecular evolution, even if it focuses on understanding the processes, cannot avoid
some careful attention to the pattern.

Types of data
Before we delve any further into our study of molecular evolution, it's probably useful to back up and talk a bit about the types of data that are available to molecular evolutionists.
We’ve already encountered a couple of these (microsatellites and SNPs), but there are a
variety of important categories into which we can group data used for molecular evolutionary
analyses. Even though studies of molecular evolution in the last 10-15 years have focused on
data derived from DNA sequence or copy number variation, modern applications of molecular
markers evolved from earlier applications. Those markers had their limitations, but analyses
of them also laid the groundwork for most or all of what’s going on in analyses of molecular
evolution today. Thus, it’s useful to remind everyone what those groups are and to agree on
some terminology for the ones we’ll say something about. Let’s talk first about the physical
basis of the underlying data. Then we’ll talk about the laboratory methods used to reveal
variation.

The physical basis of molecular variation


With the exception of RNA viruses, the hereditary information in all organisms is carried in
DNA. Ultimately, differences in any of the molecular markers we study (and of genetically-
3 Why should I bother to tell you much of anything about inferring phylogenies from molecular data when Paul Lewis and Chris Simon teach courses on that very subject?
based morphological, behavioral, or physiological traits) is associated with some difference
in the physical structure of DNA, and molecular evolutionists study a variety of its aspects.

Nucleotide sequence A difference in nucleotide sequence is the most obvious way in which
two homologous stretches of DNA may differ. The differences may be in translated
portions of protein genes (exons), portions of protein genes that are transcribed but
not translated (e.g., introns, 5’ or 3’ untranslated regions), non-transcribed functional
regions (e.g., promoters), or regions without apparent function.
Protein sequence Because of redundancy in the genetic code, a difference in nucleotide
sequence at a protein-coding locus may or may not result in proteins with a different
amino acid sequence. Important note: Don’t forget that some loci code for RNA
that has an immediate function without being translated to a protein, e.g., ribosomal
RNA and various small nuclear RNAs.
Secondary, tertiary, and quaternary structure Differences in amino acid sequence
may or may not lead to a different distribution of α-helices and β-sheets, to a dif-
ferent three-dimensional structure, or to different multisubunit combinations.
Imprinting At certain loci in some organisms the expression pattern of a particular allele
depends on whether that allele was inherited from the individual’s father or its mother.
Expression Functional differences among individuals may arise because of differences in the
patterns of gene expression, even if there are no differences in the primary sequences
of the genes that are expressed.4
Sequence organization Particular genes may differ between organisms because of differ-
ences in the position and number of introns. At the whole genome level, there may
be differences in the amount and kind of repetitive sequences, in the amount and type
of sequences derived from transposable elements, in the relative proportion of G-C
relative to A-T, or even in the identity and arrangement of genes that are present. In
microbial species, only a subset of genes are present in all strains. For example, in
Streptococcus pneumoniae the “core genome” contains only 73% of the loci present in
one fully sequenced reference strain [65]. Similarly, a survey of 20 strains of Escherichia
coli and one of E. fergusonii, E. coli's closest relative, identified only 2,000 homologous loci present in all strains out of 18,000 orthologous loci [78].
4 Of course, differences in expression must ultimately be the result of a DNA sequence (or at least a methylation) difference somewhere, e.g., in a promoter sequence or the locus encoding a promoter or repressor protein, if it is a genetic difference, or the result of an epigenetic modification of the sequence, e.g., by methylation.

Copy number variation Even within diploid genomes, there may be substantial differ-
ences in the number of copies of particular genes. In humans, for example, 76 copy-
number polymorphisms (CNPs) were identified in a sample of only 20 individuals, and
individuals differed from one another by an average of 11 CNPs [73].

It is worth remembering that in nearly all eukaryotes there are two different genomes whose
characteristics may be analyzed: the nuclear genome and the mitochondrial genome. In
plants there is a third: the chloroplast genome. In some protists, there may be even more,
because of secondary or tertiary endosymbiosis. The mitochondrial and chloroplast genomes
are typically inherited only through the maternal line, although some instances of biparental
inheritance are known.

Revealing molecular variation


The diversity of laboratory techniques used to reveal molecular variation is even greater
than the diversity of underlying physical structures. Various techniques involving direct
measurement of aspects of DNA sequence variation are by far the most common today, so
I’ll mention only the techniques that have been most widely used.

Immunological distance Some molecules, notably protein molecules, induce an immune response in common laboratory mammals. The extent of cross-reactivity between
an antigen raised to humans and chimps, for example, can be used as a measure
of evolutionary distance. The immunological distance between humans and chimps is
smaller than it is between humans and orangutans, suggesting that humans and chimps
share a more recent common ancestor.

DNA-DNA hybridization Once repetitive sequences of DNA have been “subtracted out”,5 the rate and temperature at which DNA from two different species anneals reflect the average percent sequence divergence between them. The percent
sequence divergence can be used as a measure of evolutionary distance. Immunological
distances and DNA-DNA hybridization were once widely used to identify phylogenetic
relationships among species. Neither is now widely used in molecular evolution studies.

Isozymes Biochemists recognized in the late 1950s that many soluble enzymes occurred in
multiple forms within a single individual. Population geneticists, notably Hubby and
Lewontin, later recognized that in many cases, these different forms corresponded to
different alleles at a single locus, allozymes. Allozymes are relatively easy to score in
5 See below for a description of some of these repetitive sequences.

most macroscopic organisms, they are typically co-dominant (the allelic composition
of heterozygotes can be inferred), and they allow investigators to identify both variable
and non-variable loci.6 Patterns of variation at allozyme loci may not be representative
of genetic variation that does not result from differences in protein structure or that
are related to variation in proteins that are insoluble.

RFLPs In the 1970s molecular geneticists discovered restriction enzymes, enzymes that
cleave DNA at specific 4, 5, or 6 base pair sequences, the recognition site. A single
nucleotide change in a recognition site is usually enough to eliminate it. Thus, presence
or absence of a restriction site at a particular position in a genome provides compelling
evidence of an underlying difference in nucleotide sequence at that position.

RAPDs, AFLPs, ISSRs With the advent of the polymerase chain reaction in the late 1980s, several related techniques were developed for the rapid assessment of genetic variation in organisms for which little or no prior genetic information was available. These methods differ in details of how the laboratory procedures are performed, but they are similar in that they (a) use PCR to amplify anonymous stretches of DNA, (b) generally
produce larger amounts of variation than allozyme analyses of the same taxa, and
(c) are bi-allelic, dominant markers. They have the advantage, relative to allozymes,
that they sample more or less randomly through the genome. They have the disadvan-
tage that heterozygotes cannot be distinguished from dominant homozygotes, meaning
that it is difficult to use them to obtain information about levels of within population
inbreeding.7

Microsatellites Satellite DNA, highly repetitive DNA associated with heterochromatin, had been known since biochemists first began to characterize the large-scale structure of genomes in DNA-DNA hybridization studies. In the mid-late 1980s several
investigators identified smaller repetitive units dispersed throughout many genomes.
Microsatellites, which consist of short (2-6) nucleotide sequences repeated many times,
have proven particularly useful for analyses of variation within populations since the
mid-1990s. Because of high mutation rates at each locus, they commonly have many
alleles. Moreover, they are typically co-dominant, making them more generally useful
6 Classical Mendelian genetics, and quantitative genetics too for that matter, depend on genetic variation in traits to identify the presence of a gene.
7 To be fair, it is possible to distinguish heterozygotes from homozygotes with AFLPs, if you are very careful with your PCR technique [42]. That being said, few people are careful enough with their PCR to be able to score AFLPs reliably as codominant markers, and I am unaware of anyone who has done so outside of a controlled breeding program.

than dominant markers. Identifying variable microsatellite loci is more laborious than
identifying AFLPs, RAPDs, or ISSRs.

Nucleotide sequence The advent of automated sequencing has greatly increased the
amount of population-level data available on nucleotide sequences. Nucleotide se-
quence data has an important advantage over most of the types of data discussed so
far: allozymes, RFLPs, AFLPs, RAPDs, and ISSRs may all hide variation. Nucleotide
sequence differences need not be reflected in any of those markers. On the other hand,
each of those markers provides information on variation at several or many, indepen-
dently inherited loci. Nucleotide sequence information reveals differences at a location
that rarely extends more than 2-3kb. Of course, as next generation sequencing tech-
niques become less expensive and more widely available, we will see more and more
examples of nucleotide sequence variation from many loci within individuals.

Single nucleotide polymorphisms In organisms that are genetically well-characterized it may be possible to identify a large number of single nucleotide positions that harbor
polymorphisms. These SNPs potentially provide high-resolution insight into patterns
of variation within the genome. For example, the HapMap project has identified ap-
proximately 3.2M SNPs in the human genome, or about one every kb [12].

As you can see from these brief descriptions, each of the markers reveals different aspects
of underlying hereditary differences among individuals, populations, or species. There is no
single “best” marker for evolutionary analyses. Which is best depends on the question you
are asking. In many cases in molecular evolution, the interest is intrinsically in the evolution
of the molecule itself, so the choice is based not on what those molecules reveal about the
organism that contains them but on what questions about which molecules are the most
interesting.

Divergence of nucleotide sequences


Underlying much of what we’re going to discuss in this last part of the course is the idea
that we should be able to describe the degree of difference between nucleotide sequences,
proteins, or anything else as a result of some underlying evolutionary processes. To illustrate
the principle, let’s start with nucleotide sequences and develop a fairly simple model that
describes how they become different over time.8
8 By now you should realize that when I write that something is “fairly simple”, I mean that it's fairly simple to someone who's comfortable with mathematics.

Let qt be the probability that two homologous nucleotides are identical after having been evolving for t generations independently since the gene in which they were found was replicated in their common ancestor. Let λ be the probability of a substitution9 occurring at this nucleotide position in either of the two genes during a small time interval, ∆t. Then

qt+∆t = (1 − λ∆t)²qt + 2(1 − λ∆t)(λ∆t)(1/3)(1 − qt) + o(∆t²)
      = (1 − 2λ∆t)qt + (2/3)λ∆t(1 − qt) + o(∆t²)
qt+∆t − qt = (2/3)λ∆t − (8/3)λ∆tqt + o(∆t²)
(qt+∆t − qt)/∆t = (2/3)λ − (8/3)λqt + o(∆t)
lim_{∆t→0} (qt+∆t − qt)/∆t = dqt/dt = (2/3)λ − (8/3)λqt
qt = 1 − (3/4)(1 − e^(−8λt/3))

The expected number of nucleotide substitutions separating the two sequences at any one position since they diverged is d = 2λt.10 Thus,

qt = 1 − (3/4)(1 − e^(−4d/3))
d = −(3/4) ln(1 − (4/3)(1 − qt))

This is the simplest model of nucleotide substitution possible — the Jukes-Cantor model. It
assumes

• that substitutions are equally likely at all positions and

• that substitution among all nucleotides is equally likely.
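Under those two assumptions, the correction is easy to apply in code. A small Python function, where p is the observed proportion of sites that differ between the two sequences, i.e., 1 − qt:

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor distance from the observed proportion p of differing sites.

    d = -(3/4) * ln(1 - (4/3) * p); only defined for p < 0.75, the saturation
    level at which sequences look no more alike than random ones.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

For small p the correction barely matters (p = 0.05 gives d of about 0.052), but it grows quickly: p = 0.5 gives d of about 0.82, more than one and a half times the raw difference.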


9 Notice that I wrote “substitution,” not “mutation.” We'll come back to this distinction later. It turns out to be really important.
10 The factor 2 is there because λt substitutions are expected on each branch. In fact you will usually see the equation for qt written as qt = 1 − (3/4)(1 − e^(−4αt/3)), where α = 2λ. α is also referred to as the substitution rate, but it refers to the rate of substitution between the two sequences, not to the rate of substitution between each sequence and their common ancestor. If mutations are neutral λ equals the mutation rate, while α equals twice the mutation rate.

Let’s examine the second of those assumptions first. Observed differences between nu-
cleotide sequences shows that some types of substitutions, i.e., transitions (A ⇐⇒ G
[purine to purine], C ⇐⇒ T [pyrimidine to pyrimidine]), occur much more frequently than
others, i.e., transversions (A ⇐⇒ T , A ⇐⇒ C, G ⇐⇒ C, G ⇐⇒ T [purine to
pyrimidine or vice versa]). There are a variety of different substitution models correspond-
ing to different assumed patterns of substitution: Kimura 2 parameter (K2P), Felsenstein
1984 (F84), Hasegawa-Kishino-Yano 1985 (HKY85), Tamura and Nei (TrN), and generalized
time-reversible (GTR). The GTR is, as its name suggests, the most general time-reversible
model. It allows substitution rates to differ between each pair of nucleotides. That’s why it’s
general. It still requires, however, that the substitution rate be the same in both directions.
That’s what it means to say that it’s time reversible. While it is possible to construct a
model in which the substitution rate differs depending on the direction of substitution, it
leads to something of a paradox: with non-reversible substitution models the distance be-
tween two sequences A and B depends on whether we measure the distance from A to B or
from B to A.
There are two ways in which the rate of nucleotide substitution can be allowed to vary
from position to position — the phenomenon of among-site rate variation. First, we expect
the rate of substitution to depend on codon position in protein-coding genes. The sequence
can be divided into first, second, and third codon positions and rates calculated separately
for each of those positions. Second, we can assume a priori that there is a distribution
of different rates possible and that this distribution is described by one of the standard
distributions from probability theory. We then imagine that the substitution rate at any
given site is determined by a random draw from the given probability distribution. The
gamma distribution is widely to describe the pattern of among-site rate variation, because
it can approximate a wide variety of different distributions (Figure 21.1).11
The mean substitution rate in each curve above is 0.1. The curves differ only in the
value of a parameter, α, called the “shape parameter.” The shape parameter gives a nice
numerical description of how much rate variation there is, except that it’s backwards. The
larger the parameter, the less among-site rate variation there is.
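The curves in Figure 21.1 all have mean 0.1, so the only thing the shape parameter changes is the spread. A tiny sketch (mine, not from any standard package) makes the "backwards" behavior concrete: with the mean held fixed, a gamma distribution's variance is mean²/α, so the coefficient of variation of rates is 1/√α.

```python
import math

def gamma_rate_cv(shape, mean=0.1):
    """Coefficient of variation of a gamma-distributed substitution rate.
    With the mean fixed, variance = mean**2 / shape, so CV = 1/sqrt(shape):
    the larger the shape parameter, the less among-site rate variation."""
    variance = mean ** 2 / shape
    return math.sqrt(variance) / mean

for alpha in (0.8, 1.0, 2.0, 2.5):
    print(f"alpha = {alpha:3.1f}: CV of rates = {gamma_rate_cv(alpha):.2f}")
```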

The neutral theory of molecular evolution


I didn’t make a big deal of it in what we just went over, but in deriving the Jukes-Cantor
equation I used the phrase “substitution rate” instead of the phrase “mutation rate.”12 As
a preface to what is about to follow, let me explain the difference.
11
And, to be honest, because it is mathematically convenient to work with.
12
In fact, I just mentioned the distinction in passing in two different footnotes.

[Plot: four gamma probability density curves with shape parameters α = 0.8, 1, 2.0, and 2.5, all with mean rate 0.1, over rates from 0.1 to 0.5]

Figure 21.1: Examples of a gamma distribution.

• Mutation rate refers to the rate at which changes are incorporated into a nucleotide
sequence during the process of replication, i.e., the probability that an allele differs
from the copy of that allele in its parent from which it was derived. Mutation rate
refers to the rate at which mutations arise.

• An allele substitution occurs when a newly arisen allele is incorporated into a popula-
tion, e.g., when a newly arisen allele becomes fixed in a population. Substitution rate
refers to the rate at which allele substitutions occur.

Mutation rates and substitution rates are obviously related — substitutions can’t
happen unless mutations occur, after all — but it’s important to remember that they refer
to different processes.

Early empirical observations


By the early 1960s amino acid sequences of hemoglobins and cytochrome c for many mam-
mals had been determined. When the sequences were compared, investigators began to
notice that the number of amino acid differences between different pairs of mammals seemed

to be roughly proportional to the time since they had diverged from one another, as inferred
from the fossil record. Zuckerkandl and Pauling [91] proposed the molecular clock hypothesis
to explain these results. Specifically, they proposed that there was a constant rate of amino
acid substitution over time. Sarich and Wilson [71, 87] used the molecular clock hypothesis
to propose that humans and apes diverged approximately 5 million years ago. While that
proposal may not seem particularly controversial now, it generated enormous controversy at
the time, because many paleoanthropologists then interpreted the evidence to indicate that
humans diverged from apes as much as 30 million years ago.
One year after Zuckerkandl and Pauling’s paper, Harris [30] and Hubby and Lewontin [40,
55] showed that protein electrophoresis could be used to reveal surprising amounts of genetic
variability within populations. Harris studied 10 loci in human populations, found three of
them to be polymorphic, and identified one locus with three alleles. Hubby and Lewontin
studied 18 loci in Drosophila pseudoobscura, found seven to be polymorphic, and five that
had three or more alleles.
Both sets of observations posed real challenges for evolutionary geneticists. It was difficult
to imagine an evolutionary mechanism that could produce a constant rate of substitution.
It was similarly difficult to imagine that natural selection could maintain so much
polymorphism within populations. The “cost of selection,” as Haldane called it, would simply be too
high.

Neutral substitutions and neutral variation


Kimura [43] and King and Jukes [44] proposed a way to solve both empirical problems. If
the vast majority of amino acid substitutions are selectively neutral, then substitutions will
occur at approximately a constant rate (assuming that mutation rates don’t vary over time)
and it will be easy to maintain lots of polymorphism within populations because there will be
no cost of selection. I’ll develop both of those points in a bit more detail in just a moment,
but let me first be precise about what the neutral theory of molecular evolution actually
proposes. More specifically, let me first be precise about what it does not propose. I’ll do
so specifically in the context of protein evolution for now, although we’ll broaden the scope
later.

• The neutral theory asserts that alternative alleles at variable protein loci are selectively
neutral. This does not mean that the locus is unimportant, only that the alternative
alleles found at this locus are selectively neutral.

– Glucose-phosphate isomerase is an essential enzyme. It catalyzes the first step
of glycolysis, the conversion of glucose-6-phosphate into fructose-6-phosphate.

– Natural populations of many, perhaps most, species of plants and animals
are polymorphic at this locus, i.e., they have two or more alleles with different
amino acid sequences.
– The neutral theory asserts that the alternative alleles are essentially equivalent
in fitness, in the sense that genetic drift, rather than natural selection, dominates
the dynamics of frequency changes among them.

• By selectively neutral we do not mean that the alternative alleles have no effect on
physiology or fitness. We mean that the selection among different genotypes at this
locus is sufficiently weak that the pattern of variation is determined by the interaction
of mutation, drift, mating system, and migration. This is roughly equivalent to saying
that Ne s < 1, where Ne is the effective population size and s is the selection coefficient
on alleles at this locus.

– Experiments in Colias butterflies, and other organisms have shown that different
electrophoretic variants of GPI have different enzymatic capabilities and different
thermal stabilities. In some cases, these differences have been related to differences
in individual performance.
– If populations of Colias are large and the differences in fitness associated with dif-
ferences in genotype are large, i.e., if Ne s > 1, then selection plays a predominant
role in determining patterns of diversity at this locus, i.e., the neutral theory of
molecular evolution would not apply.
– If populations of Colias are small or the differences in fitness associated with
differences in genotype are small, or both, then drift plays a predominant role in
determining patterns of diversity at this locus, i.e., the neutral theory of molecular
evolution applies.

In short, the neutral theory of molecular evolution really asserts only that observed amino acid
substitutions and polymorphisms are effectively neutral, not that the loci involved are unimportant
or that allelic differences at those loci have no effect on fitness.

The rate of molecular evolution


We’re now going to calculate the rate of molecular evolution, i.e., the rate of allelic sub-
stitution, under the hypothesis that mutations are selectively neutral. To get that rate we
need two things: the rate at which new mutations occur and the probability with which new

mutations are fixed. In a word equation
# of substitutions/generation = (# of mutations/generation) × (probability of fixation)
λ = µ0 p0 .
Surprisingly,13 it’s pretty easy to calculate both µ0 and p0 from first principles.
In a diploid population of size N , there are 2N gametes. The probability that any one
of them mutates is just the mutation rate, µ, so
µ0 = 2N µ . (21.1)
To calculate the probability of fixation, we have to say something about the dynamics of
alleles in populations. Let’s suppose that we’re dealing with a single population, to keep
things simple. Now, you have to remember a little of what you learned about the properties
of genetic drift. If the current frequency of an allele is p0 , what’s the probability that it is
eventually fixed? p0 . When a new mutation occurs there’s only one copy of it,14 so the
frequency of a newly arisen mutation is 1/2N and

p0 = 1/2N . (21.2)
Putting (21.1) and (21.2) together we find
λ = µ0 p0
  = (2N µ) (1/2N )
  = µ .
In other words, if mutations are selectively neutral, the substitution rate is equal to the
mutation rate. Since mutation rates are (mostly) governed by physical factors that remain
relatively constant, mutation rates should remain constant, implying that substitution rates
should remain constant if substitutions are selectively neutral. In short, if mutations are
selectively neutral, we expect a molecular clock.
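If you don't believe that a new neutral mutation fixes with probability 1/(2N), you can check it by brute force. Here's a small Wright-Fisher simulation I've added as a sketch (the population size, replicate number, and seed are arbitrary choices, not from the notes):

```python
import random

def fixation_fraction(n_diploid, reps, seed=1):
    """Estimate the fraction of new neutral mutations that drift to fixation
    in a Wright-Fisher population of n_diploid individuals (2N gene copies).
    Theory predicts 1/(2N)."""
    random.seed(seed)
    two_n = 2 * n_diploid
    fixed = 0
    for _ in range(reps):
        count = 1  # a brand-new mutation starts as a single copy
        while 0 < count < two_n:
            p = count / two_n
            # binomial sampling of the next generation's 2N gene copies
            count = sum(random.random() < p for _ in range(two_n))
        fixed += (count == two_n)
    return fixed / reps

est = fixation_fraction(n_diploid=50, reps=3000)
print(est)  # theory: 1/(2 * 50) = 0.01
```

Most replicates are lost within a handful of generations; only about one in 2N drifts all the way to fixation.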

Diversity in populations
Protein-coding genes consist of hundreds or thousands of nucleotides, each of which could
mutate to one of three other nucleotides.15 That’s not an infinite number of possibilities,
13
Or perhaps not.
14
By definition. It’s new.
15
Why three when there are four nucleotides? Because if the nucleotide at a certain position is an A, for
example, it can only change to a C, G, or T.

but it’s pretty large.16 It suggests that we could treat every mutation that occurs as if it
were completely new, a mutation that has never been seen before and will never be seen
again. Does that description ring any bells? Does the infinite alleles model sound familiar?
It should, because it exactly fits the situation I’ve just described.
Having remembered that this situation is well described by the infinite alleles model, I’m
sure you’ll also remember that we can calculate the equilibrium inbreeding coefficient for the
infinite alleles model, i.e.,
f = 1/(4Ne µ + 1) .
What’s important about this for our purposes, is that to the extent that the infinite alleles
model is appropriate for molecular data, then f is the frequency of homozygotes we should
see in populations and 1 − f is the frequency of heterozygotes. So in large populations we
should find more diversity than in small ones, which is roughly what we do find. Notice,
however, that here we’re talking about heterozygosity at individual nucleotide positions,17
not heterozygosity of haplotypes.
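A quick numerical sketch (the function is mine, and the per-site mutation rate is just an illustrative value) shows how the expected heterozygosity, 1 − f, grows with effective population size:

```python
def expected_heterozygosity(n_e, mu):
    """Expected heterozygosity, 1 - f, at drift-mutation equilibrium under
    the infinite alleles model, where f = 1/(4*N_e*mu + 1)."""
    theta = 4 * n_e * mu
    return theta / (theta + 1)

# mu = 1e-8 per nucleotide per generation is an illustrative value only.
for n_e in (1e4, 1e5, 1e6):
    h = expected_heterozygosity(n_e, 1e-8)
    print(f"N_e = {n_e:.0e}: expected per-site heterozygosity = {h:.2e}")
```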

Conclusions
In broad outline then, the neutral theory does a pretty good job of dealing with at least
some types of molecular data. I’m sure that some of you are already thinking, “But what
about third codon positions versus first and second?” or “What about the observation
that histone loci evolve much more slowly than interferons or MHC loci?” Those are good
questions, and those are where we’re going next. As we’ll see, molecular evolutionists have
elaborated the framework extensively18 in the last thirty years, but these basic principles
underlie every investigation that’s conducted. That’s why I wanted to spend a fair amount
of time going over the logic and consequences. Besides, it’s a rare case in population genetics
where the fundamental mathematics that lies behind some important predictions are easy
to understand.19

16
If a protein consists of 400 amino acids, that’s 1200 nucleotides. There are 4^1200 ≈ 10^722 different
sequences that are 1200 nucleotides long.
17
Since the mutation rate we’re talking about applies to individual nucleotide positions.
18
That means they’ve made it more complicated.
19
It’s the concepts that get tricky, not the algebra, or at least that’s what I think.

Chapter 22

Patterns of nucleotide and amino acid substitution

So I’ve just suggested that the neutral theory of molecular evolution explains quite a bit, but
it also ignores quite a bit.1 The derivations we did assumed that all substitutions are equally
likely to occur, because they are selectively neutral. That isn’t plausible. We need look no
further than sickle cell anemia to see an example of a protein polymorphism in which a single
amino acid difference has a very large effect on fitness. Even reasoning from first principles
we can see that it doesn’t make much sense to think that all nucleotide substitutions are
created equal. Just as it’s unlikely that you’ll improve the performance of your car if you
pick up a sledgehammer, open its hood, close your eyes, and hit something inside, so it’s
unlikely that picking a random amino acid in a protein and substituting it with a different
one will improve the function of the protein.2

The genetic code


Of course, not all nucleotide sequence substitutions lead to amino acid substitutions in
protein-coding genes. There is redundancy in the genetic code. Table 22.1 is a list of the
codons in the universal genetic code.3 Notice that there are only two amino acids, methionine
1
I won’t make my bikini joke, because it doesn’t conceal as much as quantitative genetics. But still the
“pure” version of the neutral theory of molecular evolution makes a lot of simplifying assumptions.
2
Obviously it happens sometimes. If it didn’t, there wouldn’t be any adaptive evolution. It’s just that,
on average, mutations are more likely to decrease fitness than to increase it.
3
By the way, the “universal” genetic code is not universal. There are at least eight, but all of them have
similar redundancy properties.

Amino Amino Amino Amino
Codon Acid Codon Acid Codon Acid Codon Acid
UUU Phe UCU Ser UAU Tyr UGU Cys
UUC Phe UCC Ser UAC Tyr UGC Cys
UUA Leu UCA Ser UAA Stop UGA Stop
UUG Leu UCG Ser UAG Stop UGG Trp

CUU Leu CCU Pro CAU His CGU Arg
CUC Leu CCC Pro CAC His CGC Arg
CUA Leu CCA Pro CAA Gln CGA Arg
CUG Leu CCG Pro CAG Gln CGG Arg

AUU Ile ACU Thr AAU Asn AGU Ser
AUC Ile ACC Thr AAC Asn AGC Ser
AUA Ile ACA Thr AAA Lys AGA Arg
AUG Met ACG Thr AAG Lys AGG Arg

GUU Val GCU Ala GAU Asp GGU Gly
GUC Val GCC Ala GAC Asp GGC Gly
GUA Val GCA Ala GAA Glu GGA Gly
GUG Val GCG Ala GAG Glu GGG Gly

Table 22.1: The universal genetic code.

and tryptophan, that have a single codon. All the rest have at least two. Serine, arginine,
and leucine have six.
Moreover, most of the redundancy is in the third position, where we can distinguish 2-fold
from 4-fold redundant sites (Table 22.2). 2-fold redundant sites are those at which either
one of two nucleotides can be present in a codon for a single amino acid. 4-fold redundant
sites are those at which any of the four nucleotides can be present in a codon for a single
amino acid. In some cases there is redundancy in the first codon position, e.g., both AGA
and CGA are codons for arginine. Thus, many nucleotide substitutions at third positions do
not lead to amino acid substitutions, and some nucleotide substitutions at first positions do
not lead to amino acid substitutions. But every nucleotide substitution at a second codon
position leads to an amino acid substitution. Nucleotide substitutions that do not lead to
amino acid substitutions are referred to as synonymous substitutions, because the codons
involved are synonymous, i.e., code for the same amino acid. Nucleotide substitutions that

Amino
Codon Acid Redundancy
CCU Pro 4-fold
CCC
CCA
CCG
AAU Asn 2-fold
AAC
AAA Lys 2-fold
AAG

Table 22.2: Examples of 4-fold and 2-fold redundancy in the 3rd position of the universal
genetic code.

do lead to amino acid substitutions are non-synonymous substitutions.
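Classifying a single-nucleotide change as synonymous or non-synonymous is just a table lookup. A minimal sketch using a few codons from Table 22.1 (the helper function and its name are mine):

```python
# A handful of codons from Table 22.1 is enough for illustration.
CODON_TABLE = {
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "AAU": "Asn", "AAC": "Asn", "AAA": "Lys", "AAG": "Lys",
    "UGU": "Cys", "UGC": "Cys", "UGG": "Trp",
}

def classify(codon_from, codon_to):
    """Label a codon change as synonymous or non-synonymous by comparing
    the amino acids the two codons encode."""
    same = CODON_TABLE[codon_from] == CODON_TABLE[codon_to]
    return "synonymous" if same else "non-synonymous"

print(classify("CCU", "CCG"))  # 4-fold redundant 3rd position: synonymous
print(classify("AAU", "AAA"))  # Asn to Lys: non-synonymous
print(classify("UGU", "UGG"))  # Cys to Trp: non-synonymous
```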

Rates of synonymous and non-synonymous substitution


By using a modification of the simple Jukes-Cantor model we encountered before, it is
possible to make separate estimates of the number of synonymous substitutions and of the
number of non-synonymous substitutions that have occurred since two sequences diverged
from a common ancestor. If we combine an estimate of the number of differences with
an estimate of the time of divergence we can estimate the rates of synonymous and non-
synonymous substitution (number/time). Table 22.3 shows some representative estimates
for the rates of synonymous and non-synonymous substitution in different genes studied in
mammals.
Two very important observations emerge after you’ve looked at this table for a while. The
first won’t come as any shock. The rate of non-synonymous substitution is generally lower
than the rate of synonymous substitution. This is a result of my “sledgehammer principle.”
Mutations that change the amino acid sequence of a protein are more likely to reduce that
protein’s functionality than to increase it. As a result, they are likely to lower the fitness of
individuals carrying them, and they will have a lower probability of being fixed than those
mutations that do not change the amino acid sequence.
The second observation is more subtle. Rates of non-synonymous substitution vary by
more than two orders of magnitude: 0.02 substitutions per nucleotide per billion years in
ribosomal protein S14 to 3.06 substitutions per nucleotide per billion years in γ-interferon,

Locus Non-synonymous rate Synonymous rate
Histone
H4 0.00 3.94
H2 0.00 4.52
Ribosomal proteins
S17 0.06 2.69
S14 0.02 2.16
Hemoglobins & myoglobin
α-globin 0.56 4.38
β-globin 0.78 2.58
Myoglobin 0.57 4.10
Interferons
γ 3.06 5.50
α1 1.47 3.24
β1 2.38 5.33

Table 22.3: Representative rates of synonymous and non-synonymous substitution in
mammalian genes (from [57]). Rates are expressed as the number of substitutions per 10^9 years.

while rates of synonymous substitution vary only by a factor of two (2.16 in ribosomal protein
S14 to 4.52 in histone H2). If synonymous substitutions are neutral, as they probably are to
a first approximation,4 then the rate of synonymous substitution should equal the mutation
rate. Thus, the rate of synonymous substitution should be approximately the same at
every locus, which is roughly what we observe. But proteins differ in the degree to which
their physiological function affects the performance and fitness of the organisms that carry
them. Some, like histones and ribosomal proteins, are intimately involved with chromatin
or translation of messenger RNA into protein. It’s easy to imagine that just about any
change in the amino acid sequence of such proteins will have a detrimental effect on its
function. Others, like interferons, are involved in responses to viral or bacterial pathogens.
It’s easy to imagine not only that the selection on these proteins might be less intense, but
that some amino acid substitutions might actually be favored by natural selection because
they enhance resistance to certain strains of pathogens. Thus, the probability that a non-
synonymous substitution will be fixed is likely to vary substantially among genes, just as we

4
We’ll see that they may not be completely neutral a little later, but at least it’s reasonable to believe that
the intensity of selection to which they are subject is less than that to which non-synonymous substitutions
are subject.

observe.

Revising the neutral theory


So we’ve now produced empirical evidence that many mutations are not neutral. Does this
mean that we throw the neutral theory of molecular evolution away? Hardly. We need only
modify it a little to accommodate these new observations.

• Most non-synonymous substitutions are deleterious. We can actually generalize this
assertion a bit and say that most mutations that affect function are deleterious. After
all, organisms have been evolving for about 3.5 billion years. Wouldn’t you expect
their cellular machinery to work pretty well by now?

• Most molecular variability found in natural populations is selectively neutral. If most
function-altering mutations are deleterious, it follows that we are unlikely to find much
variation in populations for such mutations. Selection will quickly eliminate them.

• Natural selection is primarily purifying. Although natural selection for variants that
improve function is ultimately the source of adaptation, even at the molecular level,
most of the time selection is simply eliminating variants that are less fit than the norm,
not promoting the fixation of new variants that increase fitness.

• Alleles enhancing fitness are rapidly incorporated. They do not remain polymorphic
for long, so we aren’t likely to find them when they’re polymorphic.

As we’ll see, even these revisions aren’t entirely sufficient, but what we do from here on
out is more to provide refinements and clarifications than to undertake wholesale revisions.

Detecting selection on nucleotide polymorphisms


At this point, we’ve refined the neutral theory quite a bit. Our understanding of how
molecules evolve now recognizes that some substitutions are more likely than others, but
we’re still proceeding under the assumption that most nucleotide substitutions are neutral
or detrimental. So far we’ve argued that variation like what Hubby and Lewontin [40, 55]
found is not likely to be maintained by natural selection. But we have strong evidence
that heterozygotes for the sickle-cell allele are more fit than either homozygote in human
populations where malaria is prevalent. That’s an example where selection is acting to

maintain a polymorphism, not to eliminate it. Are there other examples? How could we
detect them?
In the 1970s a variety of studies suggested that a polymorphism in the locus coding for
alcohol dehydrogenase in Drosophila melanogaster might not only be subject to selection
but that selection may be acting to maintain the polymorphism. As DNA sequencing be-
came more practical at about the same time,5 population geneticists began to realize that
comparative analyses of DNA sequences at protein-coding loci could provide a powerful tool
for unraveling the action of natural selection. Synonymous sites within a protein-coding
sequence provide a natural standard of comparison. Regardless of

• the demographic history of the population from which the sequences were collected,

• the length of time that populations have been evolving under the same conditions and
whether it has been long enough for the population to have reached a drift-migration-
mutation-selection equilibrium, or

• the actual magnitude of the mutation rate, the migration rate, or the selection coeffi-
cients

the synonymous positions within the sequence provide an internal control on the amount
and pattern of differentiation that should be expected when substitutions are selectively neutral.6 Thus, if we see
different patterns of nucleotide substitution at synonymous and non-synonymous sites, we
can infer that selection is having an effect on amino acid substitutions.

Nucleotide sequence variation at the Adh locus in Drosophila melanogaster
Kreitman [49] took advantage of these ideas to provide additional insight into whether natu-
ral selection was likely to be involved in maintaining the polymorphism at Adh in Drosophila
melanogaster. He cloned and sequenced 11 alleles at this locus, each a little less than 2.4kb
in length.7 If we restrict our attention to the coding region, a total of 765bp, there were
6 distinct sequences that differed from one another at between 1 and 13 sites. Given the
observed level of polymorphism within the gene, there should be 9 or 10 amino acid differ-
ences observed as well, but only one of the nucleotide differences results in an amino acid
5
It was still vastly more laborious than it is now.
6
Ignoring, for the moment, the possibility that there may be selection on codon usage.
7
Think about how the technology has changed since then. This work represented a major part of his
Ph.D. dissertation, and the results were published as an article in Nature.

difference, the amino acid difference associated with the already recognized electrophoretic
polymorphism. Thus, there is significantly less amino acid diversity than expected if
nucleotide substitutions were neutral, consistent with my assertion that most mutations are
deleterious and that natural selection will tend to eliminate them. In other words, another
example of the “sledgehammer principle.”
Does this settle the question? Is the Adh polymorphism another example of allelic vari-
ants being neutral or selected against? Would I be asking these questions if the answer were
“Yes”?

Kreitman and Aguadé


A few years after Kreitman [49] appeared, Kreitman and Aguadé [50] published an anal-
ysis in which they looked at levels of nucleotide diversity in the Adh region, as revealed
through analysis of RFLPs, in D. melanogaster and the closely related D. simulans. Why
the comparative approach? Well, Kreitman and Aguadé recognized that the neutral theory
of molecular evolution makes two predictions that are related to the underlying mutation
rate:

• If mutations are neutral, the substitution rate is equal to the mutation rate.

• If mutations are neutral, the diversity within populations should be about
4Ne µ/(4Ne µ + 1).

Thus, if variation at the Adh locus in D. melanogaster is selectively neutral, the amount of
divergence between D. melanogaster and D. simulans should be related to the amount of
diversity within each. What they found instead is summarized in Table 22.4.
Notice that there is substantially less divergence at the Adh locus than would be expected,
based on the average level of divergence across the entire region. That’s consistent with the
earlier observation that most amino acid substitutions are selected against. On the other
hand, there is more nucleotide diversity within D. melanogaster than would be expected
based on the levels of diversity seen across the entire region. What gives?
Time for a trip down memory lane. Remember something called “coalescent theory?”
It told us that for a sample of neutral genes from a population, the expected time back to
a common ancestor for all of them is about 4Ne for a nuclear gene in a diploid population.
That means there’s been about 4Ne generations for mutations to occur. Suppose, however,
that the electrophoretic polymorphism were being maintained by natural selection. Then we
might well expect that it would be maintained for a lot longer than 4Ne generations. If so,
there would be a lot more time for diversity to accumulate. Thus, the excess diversity could
be accounted for if there is balancing selection at ADH.
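You can see the logic with the raw numbers from Table 22.4. Under neutrality, diversity within species and divergence between species should both be proportional to the local mutation rate, so their ratio should be roughly constant across the three regions. A quick calculation (mine, an informal version of the comparison) shows the Adh locus stands out:

```python
# Observed counts from Table 22.4: (polymorphic sites within D. melanogaster,
# nucleotide differences between D. melanogaster and D. simulans).
regions = {
    "5' flanking": (9, 86),
    "Adh locus": (14, 48),
    "3' flanking": (2, 31),
}

# If variation were neutral everywhere, these ratios should be similar.
for name, (diversity, divergence) in regions.items():
    print(f"{name:12s} diversity/divergence = {diversity / divergence:.3f}")
```

The ratio at the Adh locus is roughly three to four times larger than in either flanking region, which is the excess-polymorphism signal discussed above.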

             5’ flanking   Adh locus   3’ flanking
Diversity^1
  Observed        9            14            2
  Expected       10.8         10.8          3.4
Divergence^2
  Observed       86            48           31
  Expected       55           76.9         33.1
^1 Number of polymorphic sites within D. melanogaster
^2 Number of nucleotide differences between D. melanogaster and D. simulans

Table 22.4: Diversity and divergence in the Adh region of Drosophila (from [50]).

Kreitman and Hudson


Kreitman and Hudson [51] extended this approach by looking more carefully within the
region to see where they could find differences between observed and expected levels of
nucleotide sequence diversity. They used a “sliding window” of 100 silent base pairs in their
calculations. By “sliding window” what they mean is that first they calculate statistics for
bases 1-100, then for bases 2-101, then for bases 3-102, and so on until they hit the end of
the sequence. It’s rather like walking a chromosome for QTL mapping, and the results are
rather pretty (Figure 22.1).
To me there are two particularly striking things about this figure. First, the position of
the single nucleotide substitution responsible for the electrophoretic polymorphism is clearly
evident. Second, the excess of polymorphism extends for only 200-300 nucleotides in each
direction. That means that the rate of recombination within the gene is high enough to
randomize the nucleotide sequence variation farther away.8
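The sliding-window calculation itself is simple. Here's a toy version (a sketch with made-up numbers, not Kreitman and Hudson's data, and whole-sequence windows rather than their 100 silent sites):

```python
def sliding_window_diversity(per_site_pi, width=100):
    """Average diversity in overlapping windows: positions 1-100, then
    2-101, then 3-102, and so on to the end of the sequence."""
    return [sum(per_site_pi[i:i + width]) / width
            for i in range(len(per_site_pi) - width + 1)]

# Toy data: uniform low per-site diversity with a localized peak, loosely
# mimicking the excess polymorphism around the Adh fast/slow site.
pi = [0.01] * 500
for j in range(240, 260):
    pi[j] = 0.15
windows = sliding_window_diversity(pi)
print(max(windows), windows[0])  # the peak window stands out from background
```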

Detecting selection in the human genome


I’ve already mentioned the HapMap project [12], a collection of genotype data at roughly
3.2M SNPs in the human genome. The data in phase II of the project were collected from
four populations:

• Yoruba (Ibadan, Nigeria)


8
Think about what that means for association mapping. In organisms with a large effective population
size, associations due to physical linkage may fall off very rapidly, meaning that you would have to have a
very dense map to have a hope of finding associations.

Figure 22.1: Sliding window analysis of nucleotide diversity in the Adh-Adh-dup region of
Drosophila melanogaster. The arrow marks the position of the single nucleotide substitution
that distinguishes Adh-F from Adh-S (from [51]).

• Japanese (Tokyo, Japan)

• Han Chinese (Beijing, China)

• ancestry from northern and western Europe (Utah, USA)

We expect genetic drift to result in allele frequency differences among populations, and
we can summarize the extent of that differentiation at each locus with FST . If all HapMap
SNPs are selectively neutral,9 then all loci should have the same FST within the bounds
of statistical sampling error and the evolutionary sampling due to genetic drift. A scan of
human chromosome 7 reveals both a lot of variation in individual-locus estimates of FST and
a number of loci where there is substantially more differentiation among populations than
is expected by chance (Figure 22.2). At very fine genomic scales we can detect even more
outliers (Figure 22.3), suggesting that human populations have been subject to divergent
selection pressures at many different loci [28].
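A back-of-the-envelope version of this kind of outlier scan, using invented allele frequencies and the simple variance formula for FST rather than the estimator actually used in [28]:

```python
def fst(freqs):
    """Single-locus F_ST from one allele's frequency in several populations,
    using the simple variance formula F_ST = Var(p) / (p_bar * (1 - p_bar))."""
    n = len(freqs)
    p_bar = sum(freqs) / n
    var_p = sum((p - p_bar) ** 2 for p in freqs) / n
    return var_p / (p_bar * (1.0 - p_bar))

# Hypothetical SNP frequencies in four populations: most loci show little
# differentiation, one is strongly differentiated (an outlier candidate).
loci = [
    [0.50, 0.52, 0.48, 0.51],
    [0.30, 0.28, 0.33, 0.31],
    [0.10, 0.85, 0.15, 0.80],
]
for freqs in loci:
    print(round(fst(freqs), 3))
```

In a real scan you would compute this for every SNP and flag loci whose FST falls far outside the genome-wide distribution.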

9
And unlinked to sites that are under selection.

Figure 22.2: Single-locus estimates of FST along chromosome 7 in the HapMap data set.
Blue dots denote outliers. Adjacent SNPs in this sample are separated, on average, by about
52kb. (from [28])

[Plot: posterior mean of θi for each SNP plotted against position (126,000,000-128,500,000 bp) on chromosome 7, with outlier and non-outlier SNPs distinguished and the locations of genes such as GRM8, FSCN3, LEP, SND1, GCC1, CALU, NYD-SP18, and KIAA0828 marked]

Figure 22.3: Single-locus estimates of FST along a portion of chromosome 7 in the HapMap
data set. Black dots denote outliers. Solid bars refer to previously identified genes. Adjacent
SNPs in this sample are separated, on average, by about 1kb. (from [28])

Tajima’s D, Fu’s FS, Fay and Wu’s H, and Zeng et al.’s E
So far we’ve been comparing rates of synonymous and non-synonymous substitution to detect
the effects of natural selection on molecular polymorphisms. Tajima [77] proposed a method
that builds on the foundation of the neutral theory of molecular evolution in a different
way. I’ve already mentioned the infinite alleles model of mutation several times. When
thinking about DNA sequences a closely related approximation is to imagine that every time
a mutation occurs, it occurs at a different site.10 If we do that, we have an infinite sites
model of mutation.

Tajima’s D
When dealing with nucleotide sequences in a population context there are two statistics of
potential interest:

• The number of nucleotide positions at which a polymorphism is found or, equivalently,
the number of segregating sites, k.

• The average per nucleotide diversity, π, where π is estimated as

π = Σi,j xi xj δij /N .

In this expression, xi is the frequency of the ith haplotype, δij is the number of
nucleotide sequence differences between haplotypes i and j, and N is the total length
of the sequence.11
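Both statistics are easy to compute from a small sample of haplotypes. Here's a sketch (the sequences are invented); note that averaging over distinct pairs, as below, is the frequency formula multiplied by the n/(n − 1) bias correction mentioned in the footnote:

```python
from itertools import combinations

def segregating_sites(seqs):
    """k: number of positions at which the sample is polymorphic."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def nucleotide_diversity(seqs):
    """pi: mean pairwise difference per nucleotide over all distinct pairs
    of sampled haplotypes (the bias-corrected version of the frequency
    formula in the text)."""
    length = len(seqs[0])
    diffs = [sum(a != b for a, b in zip(s1, s2))
             for s1, s2 in combinations(seqs, 2)]
    return sum(diffs) / len(diffs) / length

sample = ["ACGTACGTAC", "ACGTACGAAC", "ACGAACGTAC", "ACGTACGTAC"]
print(segregating_sites(sample))      # 2
print(nucleotide_diversity(sample))   # 0.1
```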

The quantity 4Ne µ comes up a lot in mathematical analyses of molecular evolution.
Population geneticists, being a lazy bunch, get tired of writing that down all the time,
10
Of course, we know this isn’t true. Multiple substitutions can occur at any site. That’s why the percent
difference between two sequences isn’t equal to the number of substitutions that have happened at any
particular site. We’re simply assuming that the sequences we’re comparing are closely enough related that
nearly all mutations have occurred at different positions.
11
I lied, but you must be getting used to that by now. This isn’t quite the way you estimate it. To get an
unbiased estimate of π, you have to multiply this equation by n/(n−1), where n is the number of haplotypes
in your sample. And, of course, if you’re Bayesian you’ll be even a little more careful. You’ll estimate xi
using an appropriate prior on haplotype frequencies and you’ll estimate the probability that haplotypes i
and j are different at a randomly chosen position given the observed number of differences and the sequence
length. That probability will be close to δij /N , but it won’t be identical.

210
so they invented the parameter θ = 4Ne µ to save themselves a little time.12 Under the
infinite-sites model of DNA sequence evolution, it can be shown that

E(π) = θ
E(k) = θ Σ_{i=1}^{n−1} 1/i ,

where n is the number of haplotypes in your sample.13 This suggests that there are two ways
to estimate θ, namely

θ̂π = π̂
θ̂k = k / Σ_{i=1}^{n−1} 1/i ,

where π̂ is the average heterozygosity at nucleotide sites in our sample and k is the observed
number of segregating sites in our sample.14 If the nucleotide sequence variation among
our haplotypes is neutral and the population from which we sampled is in equilibrium with
respect to drift and mutation, then θ̂π and θ̂k should be statistically indistinguishable from
one another. In other words,
D̂ = θ̂π − θ̂k
should be indistinguishable from zero. If it is either negative or positive, we can infer that
there’s some departure from the assumptions of neutrality and/or equilibrium. Thus, D̂ can
be used as a test statistic to assess whether the data are consistent with the population being
at a neutral mutation-drift equilibrium. Consider the value of D under the following scenarios:

Neutral variation If the variation is neutral and the population is at a drift-mutation equilibrium, then D̂ will be statistically indistinguishable from zero.

Overdominant selection Overdominance will allow alleles belonging to the different classes
to become quite divergent from one another. δij within each class will be small, but δij
12
This is not the same θ we encountered when discussing F -statistics. Weir and Cockerham’s θ is a
different beast. I know it’s confusing, but that’s the way it is. When reading a paper, the context should
make it clear which conception of θ is being used. Another thing to be careful of is that sometimes authors
think of θ in terms of a haploid population. When they do, it’s 2Ne µ. Usually the context makes it clear
which definition is being used, but you have to remember to pay attention to be sure.
13
The “E” refers to expectation. It is the average value of a random variable. E(π) is read as “the
expectation of π.”
14
If your memory is really good, you may recognize that those estimates are method of moments estimates,
i.e., parameter estimates obtained by equating sample statistics with their expected values.

between classes will be large and both classes will be in intermediate frequency, leading
to large values of θπ . There won’t be a similar tendency for the number of segregating
sites to increase, so θk will be relatively unaffected. As a result, D̂ will be positive.

Population bottleneck If the population has recently undergone a bottleneck, then π will
be little affected unless the bottleneck was prolonged and severe.15 k, however, may
be substantially reduced. Thus, D̂ should be positive.

Purifying selection If there is purifying selection, mutations will occur and accumulate at
silent sites, but they aren’t likely ever to become very common. Thus, there are likely
to be lots of segregating sites, but not much heterozygosity, meaning that θ̂k will be
large, θ̂π will be small, and D̂ will be negative.

Population expansion Similarly, if the population has recently begun to expand, muta-
tions that occur are unlikely to be lost, increasing θ̂k , but it will take a long time before
they contribute to heterozygosity, θ̂π . Thus, D̂ will be negative.

In short, D̂ provides a different avenue for insight into the evolutionary history of a
particular nucleotide sequence. But interpreting it can be a little tricky.

D̂ = 0: We have no evidence for changes in population size or for any particular pattern of
selection at the locus.16

D̂ < 0: The population size may be increasing or we may have evidence for purifying selec-
tion at this locus.

D̂ > 0: The population may have suffered a recent bottleneck (or be decreasing) or we may
have evidence for overdominant selection at this locus.

If we have data available for more than one locus, we may be able to distinguish changes in
population size from selection at any particular locus. After all, all loci will experience the
same demographic effects, but we might expect selection to act differently at different loci,
especially if we choose to analyze loci with different physiological function.
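To make the two estimators concrete, here is a minimal sketch (the function name is mine) that computes θ̂π as the average number of pairwise differences, θ̂k from the number of segregating sites, and their difference D̂. It works on a per-locus scale (total differences rather than per-site diversity) and omits the variance normalization Tajima used to turn D̂ into a formal test statistic, so it illustrates the logic rather than reproducing the published test.

```python
import itertools

def tajima_d_sketch(seqs):
    """Rough D-hat = theta_pi - theta_k for aligned haplotypes of equal
    length, one string per sampled sequence (no variance normalization)."""
    n = len(seqs)
    # theta_pi: average number of pairwise differences; averaging over
    # all n(n-1)/2 pairs builds in the n/(n-1) bias correction
    pair_diffs = [sum(a != b for a, b in zip(s, t))
                  for s, t in itertools.combinations(seqs, 2)]
    theta_pi = sum(pair_diffs) / len(pair_diffs)
    # theta_k: number of segregating sites divided by a_n = sum of 1/i
    k = sum(len(set(site)) > 1 for site in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    theta_k = k / a_n
    return theta_pi, theta_k, theta_pi - theta_k
```

In practice you would compare D̂ against its sampling distribution (as Tajima did) rather than simply looking at its sign.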
A quick search in Google Scholar reveals that the paper in which Tajima described this
approach [77] has been cited over 5300 times. Clearly it has been widely used for interpreting
15
Why? Because most of the heterozygosity is due to alleles of moderate to high frequency, and those are
not the ones likely to be lost in a bottleneck. See the Appendix for more details.
16
Please remember that the failure to detect a difference from 0 could mean that your sample size is too
small to detect an important effect. If you can’t detect a difference, you should try to assess what values of
D are consistent with your data and be appropriately circumspect in your conclusions.

patterns of nucleotide sequence variation. Although it is a very useful statistic, Zeng et al. [90]
point out that there are important aspects of the data that Tajima’s D does not consider.
As a result, it may be less powerful, i.e., less able to detect departures from neutrality, than
some alternatives.

Fu’s FS
Fu [24] proposes a different statistic based on the infinite sites model of mutation. He
suggests estimating the probability of observing a random sample with a number of alleles
equal to or greater than the observed value, given the observed level of diversity and
the assumption that all of the alleles are selectively neutral. If we call this probability Ŝ,
then

FS = ln(Ŝ/(1 − Ŝ)) .
A negative value of FS is evidence for an excess number of alleles, as would be expected
from a recent population expansion or from genetic hitchhiking. A positive value of FS is
evidence for a deficiency of alleles, as would be expected from a recent population bottleneck
or from overdominant selection. Fu’s simulations suggest that FS is a more sensitive indicator
of population expansion and genetic hitchhiking than Tajima’s D. Those simulations also
suggest that the conventional P-value of 0.05 corresponds to a P-value from the coalescent
simulation of 0.02. In other words, FS should be regarded as significant if P < 0.02.
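Under the neutral infinite-alleles model, the number of distinct alleles K in a sample of n follows the Ewens sampling distribution, P(K = k) = |s(n, k)| θ^k / (θ(θ+1)···(θ+n−1)), where |s(n, k)| is an unsigned Stirling number of the first kind. Here is a hedged sketch of the calculation (function names are mine, and Fu’s actual procedure conditions on the observed diversity more carefully than this):

```python
from math import log
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling1(n, k):
    """Unsigned Stirling numbers of the first kind, |s(n, k)|."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)

def fu_fs(n, k_obs, theta):
    """Fu's FS for n sequences with k_obs distinct alleles, where S-hat is
    the Ewens probability of k_obs or more alleles given theta.
    (Undefined when k_obs is so small that S-hat = 1.)"""
    rising = 1.0
    for i in range(n):
        rising *= theta + i          # theta (theta+1) ... (theta+n-1)
    s_hat = sum(stirling1(n, k) * theta ** k
                for k in range(k_obs, n + 1)) / rising
    return log(s_hat / (1.0 - s_hat))
```

An excess of alleles (large k_obs) makes Ŝ small and FS strongly negative, matching the interpretation above.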

Fay and Wu’s H


Let ξi be the number of sites at which a sequence occurring i times in the sample differs from
the sequence of the most recent common ancestor for all the sequences. Fu [23] showed that

E(ξi ) = θ/i .
Remember that i is the number of times this haplotype occurs in the sample. Using this
result, we can rewrite θ̂π and θ̂k as
θ̂π = (n(n − 1)/2)^{−1} Σ_{i=1}^{n−1} i(n − i) ξ̂i

θ̂k = (1/an ) Σ_{i=1}^{n−1} ξ̂i ,

where an = Σ_{i=1}^{n−1} 1/i.

There are also at least two other statistics that could be used to estimate θ from these data:
θH = (n(n − 1)/2)^{−1} Σ_{i=1}^{n−1} i² ξ̂i

θL = (1/(n − 1)) Σ_{i=1}^{n−1} i ξ̂i .

Notice that to estimate θH or θL , you’ll need information on the sequence of an ancestral haplotype. To get this you’ll need an outgroup. As we’ve already seen, we can get estimates
of θπ and θk without an outgroup.
Fay and Wu [21] suggest using the statistic

H = θ̂π − θH

to detect departures from neutrality. So what’s the difference between Fay and Wu’s H
and Tajima’s D? Well, notice that there’s an i² term in θH . The largest contributions
to this estimate of θ are coming from alleles in relatively high frequency, i.e., those with
lots of copies in our sample. In contrast, intermediate-frequency alleles contribute most to
estimates of θπ . Thus, H measures departures from neutrality that are reflected in the difference between high-frequency and intermediate-frequency alleles. In contrast, D measures
departures from neutrality that are reflected in the difference between low-frequency and
intermediate frequency alleles. Thus, while D is sensitive to population expansion (because
the number of segregating sites responds more rapidly to changes in population size than
the nucleotide heterozygosity), H will not be. As a result, combining both tests may allow
you to distinguish population expansion from purifying selection.

Zeng et al.’s E
So if we can use D to compare estimates of θ from intermediate- and low-frequency variants
and H to compare estimates from intermediate- and high-frequency variants, what about
comparing estimates from high-frequency and low-frequency variants? Funny you should
ask. Zeng et al. [90] suggest looking at

E = θL − θk .

E doesn’t put quite as much weight on high frequency variants as H,17 but it still provides
a useful contrast between estimates of θ derived from high-frequency variants and low-frequency variants. For example, suppose a new favorable mutation occurs and sweeps to
17
Because it has an i rather than an i² in its formula

fixation. All alleles other than those carrying the new allele will be eliminated from the
population. Once the new variant is established, neutral variation will begin to accumulate.
The return to neutral expectations after such an event, however, happens much more rapidly
in low frequency variants than in high-frequency ones. Thus, a negative E may provide
evidence of a recent selective sweep at the locus being studied. For similar reasons, it will
be a sensitive indicator of recent population expansion.
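All four estimators are linear functions of the unfolded site-frequency spectrum, so they are easy to compute side by side. Here is a sketch (the function name is mine; xi is indexed so that xi[0] counts singletons), which also makes the identity θπ + θH = 2θL easy to verify:

```python
def sfs_thetas(xi):
    """Four theta estimates from the unfolded site-frequency spectrum.
    xi[i - 1] = number of sites where the derived allele appears i times
    in a sample of n sequences, so len(xi) == n - 1."""
    n = len(xi) + 1
    pairs = n * (n - 1) / 2                       # n choose 2
    a_n = sum(1.0 / i for i in range(1, n))
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(xi, start=1)) / pairs
    theta_k = sum(xi) / a_n
    theta_h = sum(i * i * x for i, x in enumerate(xi, start=1)) / pairs
    theta_l = sum(i * x for i, x in enumerate(xi, start=1)) / (n - 1)
    return {"pi": theta_pi, "k": theta_k, "H": theta_h, "L": theta_l}
```

The test statistics discussed above are then just differences of these estimates — D from θπ − θk, H from θπ − θH, and E from θL − θk — each divided, in practice, by an estimate of its standard deviation.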

Appendix
I noted earlier that π will be little affected by a population bottleneck unless it is prolonged
and severe. Here’s one way of thinking about it that might make that counterintuitive
assertion a little clearer.
Remember that π is defined as π = Σij xi xj δij /N . Unless one haplotype in the population
happens to be very divergent from all other haplotypes in the population, the magnitude of π
will be approximately equal to the average difference between any two nucleotide sequences
times the probability that two randomly chosen sequences represent different haplotypes.
Thus, we can treat haplotypes as alleles and ask what happens to heterozygosity as a result
of a bottleneck. Here we recall the relationship between identity by descent and drift, and
we pretend that homozygosity is the same thing as identity by descent. If we do, then the
heterozygosity after a bottleneck is
Ht = (1 − 1/(2Ne ))^t H0 .
So consider a really extreme case: a population reduced to one male and one female for 5
generations. Ne = 2, so H5 ≈ 0.24H0 , so the population would retain roughly 24% of its
original diversity even after such a bottleneck. Suppose it were less severe, say, five males
and five females for 10 generations, then Ne = 10 and H10 ≈ 0.6.
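The arithmetic behind these two scenarios takes only a line to reproduce; a quick sketch:

```python
def het_retained(Ne, t):
    """Fraction of initial heterozygosity H0 remaining after t generations
    of drift alone in a population of effective size Ne."""
    return (1 - 1 / (2 * Ne)) ** t

print(het_retained(2, 5))    # two individuals for 5 generations: ~0.24
print(het_retained(10, 10))  # ten individuals for 10 generations: ~0.60
```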

Part VI

Phylogeography

Chapter 23

AMOVA and Statistical phylogeography

The notation now becomes just a little bit more complicated. We will now use xik to refer
to the frequency of the ith haplotype in the kth population. Then
xi· = (1/K) Σ_{k=1}^{K} xik
is the mean frequency of haplotype i across all populations, where K is the number of
populations. We can now define
πt = Σij xi· xj· δij
πs = (1/K) Σ_{k=1}^{K} Σij xik xjk δij ,
where πt is the nucleotide sequence diversity across the entire set of populations and πs is
the average nucleotide sequence diversity within populations. Then we can define
Φst = (πt − πs )/πt , (23.1)
which is the direct analog of Wright’s Fst for nucleotide sequence diversity. Why? Well, that
requires you to remember stuff we covered eight or ten weeks ago.
To be a bit more specific, refer back to http://darwin.eeb.uconn.edu/eeb348/
lecture-notes/wahlund/node4.html. If you do, you’ll see that we defined
FIT = 1 − Hi /Ht ,

219
where Hi is the average heterozygosity in individuals and Ht is the expected panmictic
heterozygosity. Defining Hs as the average panmictic heterozygosity within populations, we
then observed that
1 − FIT = Hi /Ht
        = (Hi /Hs )(Hs /Ht )
        = (1 − FIS )(1 − FST )

1 − FST = (1 − FIT )/(1 − FIS )

FST = ((1 − FIS ) − (1 − FIT ))/(1 − FIS )
    = ((Hi /Hs ) − (Hi /Ht ))/(Hi /Hs )
    = 1 − Hs /Ht .
In short, another way to think about FST is
FST = (Ht − Hs )/Ht . (23.2)

Now if you compare equation (23.1) and equation (23.2), you’ll see the analogy.
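The parallel between equations (23.1) and (23.2) can be made concrete with a few lines of code. This sketch (the function name is mine) takes per-population haplotype frequencies and a matrix of pairwise distances δij between haplotypes and returns ΦST:

```python
def phi_st(freqs, delta):
    """Phi_ST from per-population haplotype frequencies (freqs[k][i] is the
    frequency of haplotype i in population k) and a symmetric matrix
    delta[i][j] of distances between haplotypes."""
    K = len(freqs)
    H = len(delta)

    def diversity(x):
        # pi for a vector of haplotype frequencies: sum of x_i x_j delta_ij
        return sum(x[i] * x[j] * delta[i][j]
                   for i in range(H) for j in range(H))

    x_mean = [sum(freqs[k][i] for k in range(K)) / K for i in range(H)]
    pi_t = diversity(x_mean)                               # total diversity
    pi_s = sum(diversity(freqs[k]) for k in range(K)) / K  # mean within-pop diversity
    return (pi_t - pi_s) / pi_t
```

With δij set to 0 for identical haplotypes and 1 otherwise, this reduces to an ordinary FST in the sense of equation (23.2).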
So far I’ve motivated this approach by thinking about δij as the fraction of sites at which
two haplotypes differ and πs and πt as estimates of nucleotide diversity. But nothing in the
algebra leading to equation (23.1) requires that assumption. Excoffier et al. [20] pointed out
that other types of molecular data can easily be fit into this framework. We simply need
an appropriate measure of the “distance” between different haplotypes or alleles. Even with
nucleotide sequences the appropriate δij may reflect something about the mutational pathway
likely to connect sequences rather than the raw number of differences between them. For
example, the distance might be a Jukes-Cantor distance or a more general distance measure
that accounts for more of the properties we know are associated with nucleotide substitution.
The idea is illustrated in Figure 23.1. Once we have δij for all pairs of haplotypes or alleles
in our sample, we can use the ideas lying behind equation (23.1) to partition diversity — the
average distance between randomly chosen haplotypes or alleles — into within and among
population components.1 This procedure for partitioning diversity in molecular markers is
1
As with F -statistics, the actual estimation procedure is more complicated than I describe here. Standard approaches to AMOVA use method of moments calculations analogous to those introduced by Weir and Cockerham for F -statistics [84]. Bayesian approaches are possible, but they are not yet widely available (meaning, in part, that I know how to do it, but I haven’t written the necessary software yet).

Figure 23.1: Converting raw differences in sequence (or presence and absence of restriction sites) into a minimum spanning tree and a mutational measure of distance for an analysis of molecular variance (from [20]).

referred to as an analysis of molecular variance or AMOVA (by analogy with the ubiquitous statistical procedure analysis of variance, ANOVA). Like Wright’s F -statistics, the analysis can include several levels in the hierarchy.

An AMOVA example
Excoffier et al. [20] illustrate the approach by presenting an analysis of restriction haplotypes in human mtDNA. They analyze a sample of 672 mitochondrial genomes representing two populations in each of five regional groups (Figure 23.2). They identified 56 haplotypes in that sample. A minimum spanning tree illustrating the relationships and the relative frequency of each haplotype is presented in Figure 23.3.
It’s apparent from the figure that haplotype 1 is very common. In fact, it is present in

Figure 23.2: Locations of human mtDNA samples used in the example analysis (from [20]).

Figure 23.3: Minimum spanning network of human mtDNA samples in the example. The
size of each circle is proportional to its frequency (from [20]).

Component of differentiation Φ-statistics
Among regions ΦCT = 0.220
Among populations within regions ΦSC = 0.044
Among all populations ΦST = 0.246

Table 23.1: AMOVA results for the human mtDNA sample (from [20]).

substantial frequency in every sampled population. An AMOVA using the minimum span-
ning network in Figure 23.3 to measure distance produces the results shown in Table 23.1.
Notice that there is relatively little differentiation among populations within the same geo-
graphical region (ΦSC = 0.044). There is, however, substantial differentiation among regions
(ΦCT = 0.220). In fact, differences among populations in different regions are responsible for
nearly all of the differences among populations (ΦST = 0.246). Notice also that Φ-statistics
follow the same rules as Wright’s F -statistics, namely
1 − ΦST = (1 − ΦSC )(1 − ΦCT )
0.754 = (0.956)(0.78) ,
within the bounds of rounding error.2

An extension
As you may recall,3 Slatkin [75] pointed out that there is a relationship between coalescence
time and Fst . Namely, if mutation is rare then
FST ≈ (t̄ − t̄0 )/t̄ ,

where t̄ is the average time to coalescence for two genes drawn at random without respect
to population and t̄0 is the average time to coalescence for two genes drawn at random
from the same populations. Results in [37] show that when δij is linearly proportional to
the time since two sequences have diverged, ΦST is a good estimator of FST when FST is
thought of as a measure of the relative excess of coalescence time resulting from dividing a
species into several populations. This observation suggests that the combination of haplotype
frequency differences and evolutionary distances among haplotypes may provide insight into
the evolutionary relationships among populations of the same species.
2
There wouldn’t be any rounding error if we had access to the raw data.
3
Look back at http://darwin.eeb.uconn.edu/eeb348/lecture-notes/coalescent/node6.html for
the details.

Statistical phylogeography: a preview
Nested clade analysis represented the earliest attempt to develop a formal approach to using
an estimate of phylogenetic relationships among haplotypes to infer something both about
the biogeographic history of the populations in which they are contained and the evolutionary
processes associated with the pattern of diversification implied by the phylogenetic relation-
ships among haplotypes and their geographic distribution. The statistical parsimony part
of NCA depends heavily on coalescent theory for calculating the “limits” of parsimony. As
a result, NCA combines aspects of pure phylogenetic inference — parsimony — with aspects
of pure population genetics — coalescent theory — to develop a set of inferences about the
phylogeographic history of populations within species. But well before NCA was developed,
Pamilo and Nei [68] pointed out that the phylogenetic relationships of a single gene might
be different from those of the populations from which the samples were collected.

Gene trees versus population trees


There are several reasons why gene trees might not match population trees.

• It could simply be a problem of estimation. Given a particular set of gene sequences, we estimate a phylogenetic relationship among them. But our estimate could be wrong.
In fact, given the astronomical number of different trees possible with 50 or 60 distinct
sequences, it’s virtually certain to be wrong somewhere. We just don’t know where. It
could be that if we had the right gene tree it would match the species tree.

• There might have been a hybridization event in the past so that the phylogenetic
history of the gene we’re studying is different from that of the populations from which
we sampled. Hybridization is especially likely to have a large impact if the locus for
which we have information is uniparentally inherited, e.g., mitochondrial or chloroplast
DNA. A single hybridization event in the distant past in which the maternal parent
was from a different population will give mtDNA or cpDNA a very different phylogeny
than nuclear genes that underwent a lot of backcrossing after the hybridization event.

• If the ancestral population was polymorphic at the time the initial split occurred alleles
that are more distantly related might, by chance, end up in the same descendant
population (see Figure 23.4)

As Pamilo and Nei showed, it’s possible to calculate the probability of discordance be-
tween the gene tree and the population tree using some basic ideas from coalescent theory.

Figure 23.4: Discordance between gene and population trees as a result of ancestral polymorphism (from [47]).

That leads to a further refinement, using coalescent theory directly to examine alternative
biogeographic hypotheses.

Phylogeography of montane grasshoppers


Lacey Knowles studied grasshoppers in the genus Melanoplus. She collected 1275bp of DNA
sequence data from cytochrome oxidase I (COI) from 124 individuals of M. oregonensis and
two outgroup species. The specimens were collected from 15 “sky-island” sites in the northern
Rocky Mountains (see Figure 23.5; [47]). Two alternative hypotheses had been proposed to
describe the evolutionary relationships among these grasshoppers (refer to Figure 23.6 for a
pictorial representation):

• Widespread ancestor: The existing populations might represent independently derived remnants of a single, widespread population. In this case all of the populations would be equally related to one another.

• Multiple glacial refugia: Populations that shared the same refugium will be closely
related while those that were in different refugia will be distantly related.

As is evident from Figure 23.6, the two hypotheses have very different consequences for
the coalescent history of alleles in the sample. Since the interrelationships between divergence

Figure 23.5: Collection sites for Melanoplus oregonensis in the northern Rocky Mountains (from [47]).

Figure 23.6: Pictorial representations of the “widespread ancestor” (top) and “glacial refugia” (bottom) hypotheses (from [47]).

times and time to common ancestry differ so markedly between the two scenarios, the pattern
of sequence differences found in relation to the geographic distribution will differ greatly
between the two scenarios.
Using techniques described in Knowles and Maddison [48], Knowles simulated gene trees
under the widespread ancestor hypothesis. She then placed them within a population tree
representing the multiple glacial refugia hypothesis and calculated a statistic, s, that mea-
sures the discordance between a gene tree and the population tree that contains it. This
gave her a distribution of s under the widespread ancestor hypothesis. She compared the s
estimated from her actual data with this distribution and found that the observed value of
s was only 1/2 to 1/3 the size of the value observed in her simulations.4 In short, Knowles
presented strong evidence that her data are not consistent with the widespread ancestor
hypothesis.
A few years before Knowles [47] appeared Beerli and Felsenstein [6, 7] proposed a
coalescent-based method to estimate migration rates among populations. As with other
analytical methods we’ve encountered in this course, the details can get pretty hairy, but
the basic idea is (relatively) simple.
Recall that in a single population we can describe the coalescent history of a sample
without too much difficulty. Specifically, given a sample of n alleles in a diploid population
with effective size Ne , the probability that the first coalescent event took place t generations
ago is
P (t|n, Ne ) = (n(n − 1)/(4Ne )) (1 − n(n − 1)/(4Ne ))^{t−1} . (23.3)
Now suppose that we have a sample of alleles from K different populations. To keep things
(relatively) simple, we’ll imagine that we have a sample of n alleles from every one of these
populations and that every population has an effective size of Ne . In addition, we’ll imagine
that there is migration among populations, but again we’ll keep it really simple. Specifically,
we’ll assume that the probability that a given allele in our sample from one population had
its ancestor in a different population in the immediately preceding generation is m.5 Under
this simple scenario, we can again construct the coalescent history of our sample. How?
Funny you should ask.
We start by using the same logic we used to construct equation (23.3). Specifically, we
ask “What’s the probability of an ‘event’ in the immediately preceding generation?” The
4
The discrepancy was largest when divergence from the widespread ancestor was assumed to be very
recent.
5
In other words, m is the backwards migration rate, the probability that a gene in one population came
from another population in the preceding generation. This is the same migration rate we encountered weeks
ago when we discussed the balance between drift and migration.

complication is that there are two kinds of events possible: (1) a coalescent event and (2)
a migration event. As in our original development of the coalescent process, we’ll assume
that the population sizes are large enough that the probability of two coalescent events in a
single time step is so small as to be negligible. In addition, we’ll assume that the number
of populations and the migration rates are small enough that the probability of more than
one event of either type is so small as to be negligible. That means that all we have to do
is to calculate the probability of either a coalescent event or a migration event and combine
them to calculate the probability of an event. It turns out that it’s easiest to calculate the
probability that there isn’t an event first and then to calculate the probability that there is
an event as one minus that.
We already know that the probability of a coalescent event in population k, is
Pk (coalescent|n, Ne ) = n(n − 1)/(4Ne ) ,
so the probability that there is not a coalescent event in any of our K populations is
P (no coalescent|n, Ne , K) = (1 − n(n − 1)/(4Ne ))^K .
If m is the probability that there was a migration event for a particular allele, then the
probability that there is not a migration event involving any of our nK alleles6 is

P (no migration|m, K) = (1 − m)nK .

So the probability that there is an event of some kind is

P (event|n, m, Ne , K) = 1 − P (no coalescent|n, Ne , K)P (no migration|m, K) .

Now we can calculate the time back to the first event

P (event at t|n, m, Ne , K) = P (event|n, m, Ne , K) (1 − P (event|n, m, Ne , K))^{t−1} .

We can then use Bayes theorem to calculate the probability that the event was a coalescence
or a migration and the populations involved. Once we’ve done that, we have a new population
configuration and we can start over. We continue until all of the alleles have coalesced into
a single common ancestor, and then we have the complete coalescent history of our sample.7
6
K populations each with n alleles
7
This may not seem very simple, but just think about how complicated it would be if I allowed every
population to have a different effective size and if I allowed each pair of populations to have different migration
rates between them.
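The per-generation bookkeeping described above can be sketched directly. The function names and the single-event approximation are mine; Beerli and Felsenstein's actual machinery allows unequal population sizes and pairwise migration rates, so treat this as an illustration of the logic only:

```python
import random

def event_probs(n, Ne, m, K):
    """Probability that *some* event (coalescence or migration) happens in a
    generation, and the conditional chance that it is a coalescence, for K
    populations of effective size Ne each holding n sampled lineages."""
    p_no_coal = (1 - n * (n - 1) / (4 * Ne)) ** K
    p_no_mig = (1 - m) ** (n * K)      # nK lineages, each migrates with prob m
    p_event = 1 - p_no_coal * p_no_mig
    # single-event approximation: normalize the two (rare) event probabilities
    p_coal = (1 - p_no_coal) / ((1 - p_no_coal) + (1 - p_no_mig))
    return p_event, p_coal

def next_event(n, Ne, m, K, rng):
    """Geometric waiting time to the next event, plus its type."""
    p_event, p_coal = event_probs(n, Ne, m, K)
    t = 1
    while rng.random() > p_event:
        t += 1
    return t, ("coalescence" if rng.random() < p_coal else "migration")
```

Repeating `next_event`, updating the sample configuration after each draw, builds up one complete coalescent history of the kind the text describes.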

That’s roughly the logic that Beerli and Felsenstein use to construct coalescent histories for
a sample of alleles from a set of populations — except that they allow effective population
sizes to differ among populations and they allow migration rates to differ among all pairs of
populations. As if that weren’t bad enough, now things start to get even more complicated.
There are lots of different coalescent histories possible for a sample consisting of n alleles
from each of K different populations, even when we fix m and Ne . Worse yet, given any
one coalescent history, there are a lot of different possible mutational histories possible. In
short, there are a lot of different possible sample configurations consistent with a given set
of migration rates and effective population size. Nonetheless, some combinations of m and
Ne will make the data more likely than others. In other words, we can construct a likelihood
for our data:
P (data|m, Ne ) ∝ f (n, m, Ne , K) ,

where f (n, m, Ne , K) is some very complicated function of the probabilities we derived above.
In fact, the function is so complicated, we can’t even write it down. Beerli and Felsenstein,
being very clever people, figured out a way to simulate the likelihood, and Migrate provides
a (relatively) simple way that you can use your data to estimate m and Ne for a set of
populations. In fact, Migrate will allow you to estimate pairwise migration rates among
all populations in your sample, and since it can simulate a likelihood, if you put priors on
the parameters you’re interested in, i.e., m and Ne , you can get Bayesian estimates of those
parameters rather than maximum likelihood estimates, including credible intervals around
those estimates so that you have a good sense of how reliable your estimates are.8
There’s one further complication I need to mention, and it involves a lie I just told you.
Migrate can’t give you estimates of m and Ne . Remember how every time we’ve dealt
with drift and another process we always end up with things like 4Ne m, 4Ne µ, and the like.
Well, the situation is no different here. What Migrate can actually estimate are the two
parameters 4Ne m and θ = 4Ne µ.9 How did µ get in here when I only mentioned it in passing?
Well, remember that I said that once the computer has constructed a coalescent history, it
has to apply mutations to that history. Without mutation, all of the alleles in our sample
would be identical to one another. Mutation is what produces the diversity. So what
we get from Migrate isn’t the fraction of a population that’s composed of migrants. Rather,
we get an estimate of how much migration contributes to local population diversity relative
to mutation. That’s a pretty interesting estimate to have, but it may not be everything that
we want.
8
If you’d like to see a comparision of maximum likelihood and Bayesian approaches, Beerli [4] provides
an excellent overview.
9
Depending on the option you pick when you run Migrate you can either get θ and 4Ne m or θ and
M = m/µ.

There’s a further complication to be aware of. Think about the simulation process I
described. All of the alleles in our sample are descended from a single common ancestor.
That means we are implicitly assuming that the set of populations we’re studying have been
around long enough and have been exchanging migrants with one another long enough that
we’ve reached a drift-mutation-migration equilibrium. If we’re dealing with a relatively small
number of populations in a geographically limited area, that may not be an unreasonable
assumption, but what if we’re dealing with populations of grasshoppers spread across all of the
northern Rocky Mountains? And what if we haven’t sampled all of the populations that
exist?10 In many circumstances, it may be more appropriate to imagine that populations
diverged from one another at some time in the not too distant past, have exchanged genes
since their divergence, but haven’t had time to reach a drift-mutation-migration equilibrium.
What do we do then?

Divergence and migration


Nielsen and Wakeley [62] consider the simplest generalization of Beerli and Felsenstein [6, 7]
you could imagine (Figure 23.7). They consider a situation in which you have samples from
only two populations and you’re interested in determining both how long ago the populations
diverged from one another and how much gene exchange there has been between the pop-
ulations since they diverged. As in Migrate mutation and migration rates are confounded
with effective population size, and the relevant parameters become:
• θa , which is 4Ne µ in the ancestral population.
• θ1 , which is 4Ne µ in the first population.
• θ2 , which is 4Ne µ in the second population.
• M1 , which is 2Ne m in the first population, where m is the fraction of the first population
composed of migrants from the second population.
• M2 , which is 2Ne m in the second population.
• T , which is the time since the populations diverged. Specifically, if there have been t
generations since the two populations diverged, T = t/(2N1 ), where N1 is the effective size of
the first population.

10 Beerli [5] discusses the impact of “ghost” populations. He concludes that you have to be careful about which populations you sample, but that you don’t necessarily need to sample every population. Read the paper for the details.
Figure 23.7: The simple model developed by Nielsen and Wakeley [62]. θa is 4Ne µ in the
ancestral population; θ1 and θ2 are 4Ne µ in the descendant populations; M1 and M2 are
2Ne m, where m is the backward migration rate; and T is the time since divergence of the
two populations.

Given that set of parameters, you can probably imagine that you can calculate the
likelihood of the data for a given set of parameters.11 Once you can do that you can either
obtain maximum-likelihood estimates of the parameters by maximizing the likelihood, or
you can place prior distributions on the parameters and obtain Bayesian estimates from the
posterior distribution. Either way, armed with estimates of θa , θ1 , θ2 , M1 , M2 , and T you
can say something about: (1) the effective population sizes of the two populations relative to
one another and relative to the ancestral population, (2) the relative frequency with which
migrants enter each of the two populations from the other, and (3) the time at which the
two populations diverged from one another. Keep in mind, though, that the estimates of
M1 and M2 confound local effective population sizes with migration rates. So if M1 > M2 ,
for example, it does not mean that the fraction of migrants incorporated into population 1
exceeds the fraction incorporated into population 2. It means that the impact of migration
has been felt more strongly in population 1 than in population 2.

11 As with Migrate, you can’t calculate the likelihood explicitly, but you can approximate it numerically. See [62] for details.
An example
Orti et al. [67] report the results of phylogenetic analyses of mtDNA sequences from 25
populations of threespine stickleback, Gasterosteus aculeatus, in Europe, North America,
and Japan. The data consist of sequences from a 747bp fragment of cytochrome b. Nielsen
and Wakeley [62] analyze these data using their approach. Their analyses show that “[a] model
of moderate migration and very long divergence times is more compatible with the data than
a model of short divergence times and low migration rates.” By “very long divergence times”
they mean T > 4.5, i.e., t > 9N1 (since T = t/2N1 ). Focusing on populations in the western (population 1)
and eastern Pacific (population 2), they find that the maximum likelihood estimate of M1 is 0,
indicating that there is little if any gene flow from the eastern Pacific (population 2) into the
western Pacific (population 1). In contrast, the maximum likelihood estimate of M2 is about
0.5, indicating that one individual is incorporated into the eastern Pacific population from
the western Pacific population every other generation. The maximum-likelihood estimates
of θ1 and θ2 indicate that the effective size of the eastern Pacific population is
about 3.0 times greater than that of the western Pacific population.

Extending the approach to multiple populations


Several years ago, Jody Hey announced the release of IMa2. Building on work described
in Hey and Nielsen [33, 34], IMa2 allows you to estimate relative divergence times, relative
effective population sizes, and relative pairwise migration rates for more than two populations
at a time. That flexibility comes at a cost, of course. In particular, you have to specify the
phylogenetic history of the populations before you begin the analysis.

Approximate Bayesian computation


Just when you thought it was safe to go back into the water, I’m going to complicate things
even further.12 Nielsen, Wakeley, and Hey introduced a very flexible and very powerful approach for making inferences about population histories, including the history of migration
among populations [62, 33, 34]. It uses coalescent theory to calculate likelihoods and esti-
mate times of population divergence, migration rates, and populations sizes in a surprisingly
flexible way, but even it doesn’t cover all possible scenarios. It allows for non-equilibrium
scenarios in which the populations from which we sampled diverged from one another at
different times, but suppose that we think our populations have dramatically increased in
size over time (as in humans) or dramatically changed their distribution (as with an invasive
species). Is there a way to use genetic data to gain some insight into those processes? Would
I be asking that question if the answer were “No”?

12 Look on the bright side. The semester is nearly over. Besides, you need to know a little about approximate Bayesian computation in order to write up your final problem.

An example
Let’s change things up a bit this time and start with an example of a problem we’d like to
solve first. Once you see what the problem is, then we can talk about how we might go about
solving it. The case we’ll discuss is the cane toad, Bufo marinus, in Australia.
You may know that the cane toad is native to the American tropics. It was purposely
introduced into Australia in 1935 as a biocontrol agent, where it has spread across an area
of more than 1 million km2 . Its range is still expanding in northern Australia and to a lesser
extent in eastern Australia (Figure 23.8).13 Estoup et al. [19] collected microsatellite data
from 30 individuals in each of 19 populations along roughly linear transects in the northern
and eastern expansion areas.
With these data they wanted to distinguish among five possible scenarios describing the
geographic spread:
• Isolation by distance: As the expansion proceeds, each new population is founded
by or immigrated into by individuals with a probability proportional to the distance
from existing populations.
• Differential migration and founding: Identical to the preceding model except
that the probability of founding a population may be different from the probability of
immigration into an existing population.
• “Island” migration and founding: New populations are established from existing
populations without respect to the geographic distances involved, and migration occurs
among populations without respect to the distances involved.
• Stepwise migration and founding with founder events: Both migration and
founding of populations occur only among immediately adjacent populations. Moreover, when a new population is established, the number of individuals involved may
be very small.
• Stepwise migration and founding without founder events: Identical to the
preceding model except that when a population is founded its size is assumed to be
equal to the effective population size.
13 All of this information is from the introduction to [19].
233
Figure 23.8: Maps showing the expansion of the cane toad population in Australia since its
introduction in 1935 (from [19]).

That’s a pretty complex set of scenarios. Clearly, you could use Migrate or IMa2 to
estimate parameters from the data Estoup et al. [19] report, but would those parameters
allow you to distinguish those scenarios? Not in any straightforward way that I can see.
Neither Migrate nor IMa2 distinguishes between founding and migration events for example.
And with IMa2 we’d have to specify the relationships among our sampled populations before
we could make any of the calculations. In this case we want to test alternative hypotheses
of population relationship. So what do we do?

Approximate Bayesian Computation


Well, in principle we could take an approach similar to what Migrate and IMa2 use. Let’s
start by reviewing what we did last time14 with Migrate and IMa2. In both cases, we knew
how to simulate data given a set of mutation rates, migration rates, local effective population
sizes, and times since divergence. Let’s call that whole, long string of parameters φ and our
big, complicated data set X. If we run enough simulations, we can keep track of how many
of those simulations produce data identical to the data we collected. With those results in
hand, we can estimate P (X|φ), the likelihood of the data, as the fraction of simulations
that produce data identical to the data we collected.15 In principle, we could take the
same approach in this, much more complicated, situation. But the problem is that there
are an astronomically large number of different possible coalescent histories and different
allelic configurations possible with any one population history both because the population
histories being considered are pretty complicated and because the coalescent history of every
locus will be somewhat different from the coalescent history at other loci. As a result, the
chances of getting any simulated samples that match our actual samples are virtually nil, and
we can’t estimate P (X|φ) in the way we have so far.
Approximate Bayesian computation is an approach that allows us to get around this
problem. It was introduced by Beaumont et al. [3] precisely to allow investigators to get
approximate estimates of parameters and data likelihoods in a Bayesian framework. Again,
the details of the implementation get pretty hairy,16 but the basic idea is relatively straight-
forward.17

1. Calculate “appropriate” summary statistics for your data set, e.g., pairwise estimates of φST (possibly one for every locus if you’re using microsatellite or SNP data), estimates of within-population diversity, and counts of the number of segregating sites (for nucleotide sequence data, both within each population and across the entire sample). Call that set of summary statistics S.

14 More accurately, what Peter Beerli, Joe Felsenstein, Rasmus Nielsen, John Wakeley, and Jody Hey did.
15 The actual implementation is a bit more involved than this, but that’s the basic idea.
16 You’re welcome to read the Methods in [3], and feel free to ask questions if you’re interested.
17 OK. Maybe calling it “relatively straightforward” is misleading. Even this simplified outline is fairly complicated, but compared to some of what you’ve already survived in this course, it may not look too awful.

2. Specify a prior distribution for the unknown parameters, φ.

3. Pick a random set of parameter values, φ0 from the prior distribution and simulate a
data set for that set of parameter values.

4. Calculate the same summary statistics for the simulated data set as you calculated for
your actual data. Call that set of statistics S 0 .

5. Calculate the Euclidean distance between S and S 0 . Call it δ. If it’s less than some
value you’ve decided on, δ ∗ , keep track of S 0 and the associated φ0 and δ. Otherwise,
throw all of them away and forget you ever saw them.

6. Return to step 3 and repeat until you have accepted a large number of pairs of S 0 and φ0 .

Now you have a bunch of S 0 s and a bunch of φ0 s that produced them. Let’s label them
Si and φi , and let’s remember what we’re trying to do. We’re trying to estimate φ for our
real data. What we have from our real data is S. So far it seems as if we’ve worked our
computer pretty hard, but we haven’t made any progress.
Here’s where the trick comes in. Suppose we fit a regression to the data we’ve simulated

φi = α + Si β + ε ,

where α is an intercept, β is a vector of regression coefficients relating each of the summary statistics to φ, and ε is an error vector.18 Now we can use that regression relationship to predict what φ should be in our real data, namely

φ = α + Sβ .

If we throw in some additional bells and whistles, we can approximate the posterior distri-
bution of our parameters. With that we can get not only a point estimate for φ, but also
credible intervals for all of its components.
18 I know what you’re thinking to yourself now. This doesn’t sound very simple. Trust me. It is as simple as I can make it. The actual procedure involves local linear regression. I’m also not telling you how to go about picking δ or how to pick “appropriate” summary statistics. There’s a fair amount of “art” involved in that.
Back to the real world19
OK. So now that we know how to do ABC, how do we apply it to the cane toad data? Well, using
the additional bells and whistles I mentioned, we end up with a whole distribution of δ for
each of the scenarios we try. The scenario with the smallest δ provides the best fit of the
model to the data. In this case, that corresponds to model 4, the stepwise migration with
founder model, although it is only marginally better than model 1 (isolation by distance)
and model 2 (isolation by distance with differential migration and founding) in the northern
expansion area (Figure 23.9).
Of course, we also have estimates for various parameters associated with this model:
• Nes : the effective population size when the population is stable.
• Nef : the effective population size when a new population is founded.
• FR : the founding ratio, Nes /Nef .
• m: the migration rate.
• Nes m: the effective number of migrants per generation.
The estimates are summarized in Table 23.2. Although the credible intervals are fairly
broad,20 there are a few striking features that emerge from this analysis.
• Populations in the northern expansion area are larger than those in the eastern expansion region. Estoup et al. [19] suggest that this is consistent with other evidence
suggesting that ecological conditions are more homogeneous in space and more favor-
able to cane toads in the north than in the east.
• A smaller number of individuals is responsible for founding new populations in the
east than in the north, and the ratio of “equilibrium” effective size to the size of the
founding population is bigger in the east than in the north. (The second assertion is
only weakly supported by the results.)
• Migration among populations is more limited in the east than in the north.
As Estoup et al. [19] suggest, results like these could be used to motivate and calibrate
models designed to predict the future course of the invasion, incorporating a balance between
gene flow (which can reduce local adaptation), natural selection, drift, and colonization of
new areas.
19 Or at least something resembling the real world.
20 And notice that these are 90% credible intervals, rather than the conventional 95% credible intervals, which would be even broader.
Figure 23.9: Posterior distribution of δ for the five models considered in Estoup et al. [19].

Parameter area mean (5%, 90%)
Nes east 744 (205, 1442)
north 1685 (526, 2838)
Nef east 78 (48, 118)
north 311 (182, 448)
FR east 10.7 (2.4, 23.8)
north 5.9 (1.6, 11.8)
m east 0.014 (6.0 × 10−6 , 0.064)
north 0.117 (1.4 × 10−4 , 0.664)
Nes m east 4.7 (0.005, 19.9)
north 188 (0.023, 883)

Table 23.2: Posterior means and 90% credible intervals for parameters of model 4 in the
eastern and northern expansion areas of Bufo marinus.

Limitations of ABC
If you’ve learned anything by now, you should have learned that there is no perfect method.
An obvious disadvantage of ABC relative to either Migrate or IMa2 is that it is much more
computationally intensive.

• Because the scenarios that can be considered are much more complex, it simply takes
a long time to simulate all of the data.

• In the last few years, one of the other disadvantages — that you had to know how to do some moderately complicated scripting to piece together several different packages in order to run an analysis — has become less of a problem. popABC (http://code.google.com/p/popabc/) and DIYABC (http://www1.montpellier.inra.fr/CBGP/diyabc/) make it relatively easy21 to perform the simulations.

• Selecting an appropriate set of summary statistics isn’t easy, and it turns out that which set is most appropriate may depend on the value of the parameters that you’re trying to estimate and on which of the scenarios that you’re trying to compare is closest to the actual scenario applying to the populations from which you collected the data. Of course, if you knew what the parameter values were and which scenario was closest to the actual scenario, you wouldn’t need to do ABC in the first place.
21 Emphasis on “relatively”.
• In the end, ABC allows you to compare a small number of evolutionary scenarios. It
can tell you which of the scenarios you’ve imagined provides the best combination of
fit to the data and parsimonious use of parameters (if you choose model comparison
statistics that include both components), but it takes additional work to determine
whether the model is adequate, in the sense that it does a good job of explaining the
data. Moreover, even if you determine that the model is adequate, you can’t exclude
the possibility that there are other scenarios that might be equally adequate — or even
better.

Chapter 24

Population genomics

In the past decade, the development of high-throughput methods for genomic sequencing
(next-generation sequencing: NGS) has revolutionized how many geneticists collect data.
It is now possible to produce so much data so rapidly that simply storing and processing the
data poses great challenges [61]. The Nekrutenko and Taylor review [61] doesn’t even discuss
the new challenges that face population geneticists and evolutionary biologists as they start
to take advantage of those tools, nor did it discuss the promise these data hold for providing
new insight into long-standing questions, but the challenges and the promise are at least as
great as those they do describe.

To some extent the most important opportunity provided by NGS is simply
that we now have a lot more data to answer the same questions. For example, using a
technique like RAD sequencing [2] or genotyping-by-sequencing (GBS: [17]), it is now possible
to identify thousands of polymorphic SNP markers in non-model organisms, even if you don’t
have a reference genome available. As we’ve seen several times this semester, the variance
associated with drift is enormous. Many SNPs identified through RAD-Seq or GBS are likely
to be independently inherited. Thus, the amount and pattern of variation at each locus will
represent an independent sample from the underlying evolutionary process. As a result,
we should be able to get much better estimates of fundamental parameters like θ = 4Ne µ,
M = 4Ne m, and R = 4Ne r and to have much greater power to discriminate among different
evolutionary scenarios. Willing et al. [86], for example, present simulations suggesting that
accurate estimates of FST are possible with sample sizes as small as 4–6 individuals per
population, so long as the number of markers used for inference is greater than 1000.

A quick overview of NGS methods
I won’t review the chemistry used for next-generation sequencing. It changes very rapidly,
and I can’t keep up with it. Suffice it to say that 454 Life Sciences, Illumina, PacBio, and
probably other companies I don’t know about each have different approaches to very high
throughput DNA sequencing. What they all have in common is that the whole genome
is broken into small fragments and sequenced and that a single run through the machine produces an enormous amount of data, 900-1800 Gb from a HiSeq X for example (http://www.illumina.com/systems/sequencing.html; accessed 26 April 2015).1

RAD sequencing
Baird et al. [2] introduced RAD about 7 years ago. One of its great attractions for evolutionary
geneticists is that RAD-seq can be used in any organism from which you can extract DNA
and the laboratory manipulations are relatively straightforward.

• Digest genomic DNA from each individual with a restriction enzyme, and ligate an
adapter to the resulting fragments. The adapter includes a forward amplification
primer, a sequencing primer and a “barcode” used to identify the individual from
which the DNA was extracted.

• Pool the individually barcoded samples (“normalizing” the mixture so that roughly equal amounts of DNA from each individual are present), shear them, and select those of a size appropriate for the sequencing platform you are using.

• Ligate a second adapter to the sample, where the second adapter is the reverse com-
plement of the reverse amplification primer.

• PCR amplification will enrich only DNA fragments having both the forward and reverse
amplification primer.

The resulting library consists of sequences within a relatively small distance from restric-
tion sites.
1 In NGS applications for phylogeny, a strategy of targeted enrichment is often used. In this approach, pre-identified parts of the genome are “baited” using primers and those parts of the genome are enriched through PCR before the sequencing library is constructed [53].
Genotyping-by-sequencing
Genotyping-by-sequencing (GBS) is a similar approach.
• Digest genomic DNA with a restriction enzyme and ligate two adapters to the genomic
fragments. One adapter contains a barcode and the other does not.
• Pool the samples.
• PCR amplify and sequence. Not all ligated fragments will be sequenced because some
will contain only one adapter and some fragments will be too long for the NGS platform.
Once an investigator has her sequenced fragments back, she can either map the fragments
back to a reference genome or she can assemble the fragments into orthologous sequences de
novo. I’m not going to discuss either of those processes, but you can imagine that there’s a
lot of bioinformatic processing going on. What I want to focus on is what you do with the
data and how you interpret it.

Next-generation phylogeography
The American pitcher plant mosquito Wyeomyia smithii has been extensively studied for
many years. It’s a model organism for ecology, but its genome has not been sequenced. An
analysis of COI from 20 populations and two outgroups produced the set of relationships you
see in Figure 24.1 [18]. As you can see, this analysis allows us to distinguish a northern
group of populations from a southern group of populations, but it doesn’t provide us any
reliable insight into finer scale relationships.
Using the same set of samples, the authors used RAD sequencing to identify 3741 SNPs.
That’s more than 20 times the number of variable sites found in COI, 167. Not surprisingly,
the large number of additional sites allowed the authors to produce a much more highly
resolved phylogeny (Figure 24.2). With this phylogeny it’s easy to see that southern popu-
lations are divided into two distinct groups, those from North Carolina and those from the
Gulf Coast. Similarly, the northern group of populations is subdivided into those from the
Appalachians in North Carolina, those from the mid-Atlantic coast, and those from further
north. The glacial history of North America means that both the mid-Atlantic populations and the populations farther north must have been derived from one or more southern populations after
the height of the last glaciation. Given the phylogenetic relationships recovered here, it
seems clear that they are most closely related to populations in the Appalachians of North
Carolina.
That’s the promise of NGS for population genetics. What are the challenges? Funny you
should ask.

Figure 24.1: Maximum-likelihood phylogenetic tree depicting relationships among populations of W. smithii relative to the outgroups W. vanduzeei and W. mitchelli (from [18]).
Figure 24.2: A. Geographical distribution of samples included in the analysis. B. Phyloge-
netic relationship of samples included in the analysis.

Estimates of nucleotide diversity2
Beyond the simple challenge of dealing with all of the short DNA fragments that emerge
from high-throughput sequencing, there are at least two challenges that don’t arise with data
obtained in more traditional ways.

1. Most studies involve “shotgun” sequencing of entire genomes. In large diploid genomes,
this leads to variable coverage. At sites where coverage is low, there’s a good chance
that all of the reads will be derived from only one of the two chromosomes present, and
a heterozygous individual will be scored as homozygous. “Well,” you might say, “let’s
just throw away all of the sites that don’t have at least 8× coverage.”3 That would
work, but you would also be throwing out a lot of potentially valuable information.4
It seems better to develop an approach that lets us use all of the data we collect.

2. Sequencing errors are more common with high-throughput methods than with tradi-
tional methods, and since so much data is produced, it’s not feasible to go back and
resequence apparent polymorphisms to see if they reflect sequencing error rather than
real differences. Quality scores can be used, but they only reflect the quality of the
reads from the sequencing reaction, not errors that might be introduced during sample
preparation. Again, we might focus on high-coverage sites and ignore “polymorphisms”
associated with single reads, but we’d be throwing away a lot of information.
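The coverage problem in the first point is easy to quantify. If each read at a heterozygous site is equally likely to come from either chromosome, the chance that every read comes from the same chromosome (so the heterozygote is scored as homozygous) is 2 × (1/2)^n at coverage n; the (1/2)^8 in the footnote is the probability that one particular chromosome is the one missed. A quick sketch:

```python
def p_het_missed(n):
    # Probability that all n reads at a heterozygous site come from the
    # same one of the two chromosomes, so the site looks homozygous.
    # The factor of 2 counts either chromosome as the one missed.
    return 2 * 0.5 ** n

for coverage in (2, 4, 8, 16):
    print(coverage, p_het_missed(coverage))
```

Even at 8x coverage roughly one heterozygous site in 128 would be miscalled, which is far from negligible when you are scoring millions of sites.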

A better approach than setting arbitrary thresholds and throwing away data is to develop an
explicit model of how errors can arise during sequencing and to use that model to interpret
the data we’ve collected. That’s precisely the approach that Lynch [59] adopts. Here’s how
it works assuming that we have a sample from a single, diploid individual:

• Any particular site will have a sequence profile, (n1 , n2 , n3 , n4 ), corresponding to the
number of times an A, C, G, or T was observed. n = n1 + n2 + n3 + n4 is the depth of
coverage for that site.

• Let ε be the probability of a sequencing error at any site, and assume that all errors are equiprobable, e.g., there’s no tendency for an A to be miscalled as a C rather than a T when it’s miscalled.5
2 This section draws heavily on [59].
3 If both chromosomes have an equal probability of being sequenced, the probability that one of them is missed with 8× coverage is (1/2)^8 = 1/256.
4 It’s valuable information, providing you know how to deal with it properly.
5 It wouldn’t be hard, conceptually, to allow different nucleotides to have different error rates, e.g., εA, εC, εG, εT, but the notation would get really complicated.
• If the site in question were homozygous A, the probability of getting our observed sequence profile is:

P(n1, n2, n3, n4 | homozygous A, ε) = (n choose n1) (1 − ε)^n1 ε^(n − n1) .

A similar relationship holds if the site were homozygous C, G, or T. Thus, we can calculate the probability of our data if it were homozygous as6

P(n1, n2, n3, n4 | homozygous, ε) = Σi=1..4 [ pi^2 / Σj=1..4 pj^2 ] (n choose ni) (1 − ε)^ni ε^(n − ni) .

• If the site in question were heterozygous, the probability of getting our observed sequence profile is a bit more complicated. Let k1 be the number of reads from the first chromosome and k2 be the number of reads from the second chromosome (n = k1 + k2). Then

P(k1, k2) = (n choose k1) (1/2)^k1 (1/2)^k2 = (n choose k1) (1/2)^n .

Now consider the ordered genotype xi xj, where xi refers to the nucleotide on the first chromosome and xj refers to the nucleotide on the second chromosome. The probability of getting our observed sequence profile from this genotype is:

P(n1, n2, n3, n4 | xi, xj, k1, k2) = Σl=1..4 Σm=0..k1 (k1 choose m) (k2 choose ni − m) (1 − δil)^m δil^(k1 − m) (1 − δjl)^(ni − m) δjl^(k2 − (ni − m)) ,

where δil = 1 − ε if i = l and δil = ε if i ≠ l. We can use Bayes’ Theorem7 to get

P(n1, n2, n3, n4 | xi, xj, ε) = P(n1, n2, n3, n4 | xi, xj, k1, k2, ε) P(k1, k2) ,

and with that in hand we can get

P(n1, n2, n3, n4 | heterozygous, ε) = Σi=1..4 Σj≠i [ pi pj / (1 − Σl=1..4 pl^2) ] P(n1, n2, n3, n4 | xi, xj) .
6 This expression looks a little different from the one in [59], but I’m pretty sure it’s equivalent.
7 Ask me for details if you’re interested.
• Let π be the probability that any site is heterozygous. Then the probability of getting our data is:

P(n1, n2, n3, n4 | π, ε) = π P(n1, n2, n3, n4 | heterozygous, ε) + (1 − π) P(n1, n2, n3, n4 | homozygous, ε) .

• What we’ve just calculated is the probability of the configuration we observed at a particular site. The probability of our data is just the product of this probability across all of the sites in our sample:

P(data | π, ε) = Πs=1..S P(n1^(s), n2^(s), n3^(s), n4^(s) | π, ε) ,

where the superscript (s) is used to index each site in the data.

• What we now have is the likelihood of the data in terms of ε, which isn’t very interesting since it’s just the average sequencing error rate in our sample, and π, which is interesting, because it’s the genome-wide nucleotide diversity. Now we “simply” maximize that likelihood, and we have maximum-likelihood estimates of both parameters. Alternatively, we could supply priors for ε and π and use MCMC to get Bayesian estimates of ε and π.
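A stripped-down version of this likelihood can be put to work numerically. The sketch below is a deliberate simplification of Lynch's model: it collapses the four-nucleotide profile to a single count k of variant reads per site, treats a heterozygote as a 50:50 read mixture, treats a homozygote's variant reads as sequencing errors, and recovers (π, ε) from simulated profiles by a crude grid search. All the numbers are invented:

```python
import math
import random

def site_loglike(k, n, pi, eps):
    # Mixture over the site's hidden state: heterozygous (reads split
    # 50:50 between chromosomes) or homozygous (variant reads are errors).
    comb = math.comb(n, k)
    p_het = comb * 0.5 ** n
    p_hom = comb * eps ** k * (1 - eps) ** (n - k)
    return math.log(pi * p_het + (1 - pi) * p_hom)

def loglike(profiles, pi, eps):
    # Product over sites, on the log scale.
    return sum(site_loglike(k, n, pi, eps) for k, n in profiles)

# Simulate a toy genome: 5000 sites, each with 10x coverage.
random.seed(42)
true_pi, true_eps, cov = 0.02, 0.005, 10
profiles = []
for _ in range(5000):
    if random.random() < true_pi:
        k = sum(random.random() < 0.5 for _ in range(cov))       # het site
    else:
        k = sum(random.random() < true_eps for _ in range(cov))  # errors only
    profiles.append((k, cov))

# Crude grid-search maximum likelihood for (pi, eps).
pi_grid = [i / 1000 for i in range(1, 61)]
eps_grid = [i / 2000 for i in range(1, 41)]
pi_hat, eps_hat = max(((p, e) for p in pi_grid for e in eps_grid),
                      key=lambda pe: loglike(profiles, *pe))
print(pi_hat, eps_hat)
```

With this much data the grid search recovers values close to the truth; the point is that heterozygosity and sequencing error leave distinguishable signatures (reads split near 50:50 versus a sprinkling of singleton variants), which is what makes joint estimation of π and ε possible at all.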

Notice that this genome-wide estimate of nucleotide diversity is obtained from a sample
derived from a single diploid individual. Lynch [59] develops similar methods for estimating
gametic disequilibrium as a function of genetic distance for a sample from a single diploid
individual. He also extends that method to samples from a pair of individuals, and he
describes how to estimate mutation rates by comparing sequences derived from individuals
in mutation accumulation lines with consensus sequences.8
Haubold et al. [31] describe a program implementing these methods. Recall that under
the infinite sites model of mutation π = 4Ne µ. They analyzed data sets from the sea squirt
Ciona intestinalis and the water flea Daphnia pulex (Table 24.1). Notice that the sequencing
error rate in D. pulex is indistinguishable from the nucleotide diversity.

8 Mutation accumulation lines are lines propagated through several generations (sometimes up to hundreds) in which population sizes are repeatedly reduced to one or a few individuals, allowing drift to dominate the dynamics and alleles to “accumulate” with little regard to their fitness effects.
Taxon                 4Ne µ      4Ne µ (low coverage)    ε
Ciona intestinalis    0.0111     0.012                   0.00113
Daphnia pulex         0.0011     0.0012                  0.00121

Table 24.1: Estimates of nucleotide diversity and sequencing error rate in Ciona intestinalis
and Daphnia pulex (results from [31]).

Next-generation AMOVA9
What we’ve discussed so far gets us estimates of some population parameters (4Ne µ, 4Ne r),
but they’re derived from the sequences in a single diploid individual. That’s not much of a
population sample, and it certainly doesn’t tell us anything about how different populations
are from one another. Gompert and Buerkle [26] describe an approach to estimate statistics
very similar to ΦST from AMOVA. Since they take a Bayesian approach to developing their
estimates, they refer to the approach as BAMOVA, Bayesian models for analysis of molecular
variance. They propose several related models.
• NGS-individual model: This model assumes that sequencing errors are negligible.10
Under this model, the only trick is that we may or may not pick up both sequences
from a heterozygote. The probability of not seeing both sequences in a heterozygote
is related to the depth of coverage.
• NGS-population model: In some NGS experiments, investigators pool all of the
samples from a population into a single sample. Again, Gompert and Buerkle assume
that sequencing errors are negligible. Here we assume that the number of reads for one
of two alleles at a particular SNP site in a sample is related to the underlying allele
frequency at that site. Roughly speaking, the likelihood of the data at that site is then
P(xi | pi, ni, ki) = (ni choose ki) pi^ki (1 − pi)^(ni − ki) ,
where pi is the allele frequency at this site, ni is the sample size, and ki is the count of
one of the alleles in the sample. The likelihood of the data is just the product across
the site-specific likelihoods.11
9 This section depends heavily on [26].
10 Or that they’ve already been corrected. We don’t care how they might have been corrected. We care only that we can assume that the reads we get from a sequencing run faithfully reflect the sequences present on each of the chromosomes.
11 The actual model they use is a bit more complicated than this, but the principles are the same.
Then, as we did way back when we used a Bayesian approach to estimate FST [38], we
put a prior on the pi and the parameters of this prior are defined in terms of ΦST (among
other things).12 They also propose a method for detecting SNP loci13 that have unusually
large or small values of ΦST .
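One standard way to build such a prior (a simplified version of the construction, similar in spirit to the one used in [38]; the numbers below are invented) is the beta "F-model": with ancestral frequency π and θ = (1 − ΦST)/ΦST, taking pi ~ Beta(θπ, θ(1 − π)) gives Var(pi) = ΦST π(1 − π), so ΦST directly controls how widely population frequencies scatter around π.

```python
def beta_prior_params(pi, phi_st):
    """Beta(a, b) parameters for a population allele frequency when the
    ancestral frequency is pi and divergence is phi_st (0 < phi_st < 1)."""
    theta = (1.0 - phi_st) / phi_st
    return theta * pi, theta * (1.0 - pi)

def beta_variance(a, b):
    """Variance of a Beta(a, b) random variable."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))

pi, phi_st = 0.3, 0.08
a, b = beta_prior_params(pi, phi_st)
# Because a + b = (1 - phi_st) / phi_st, the prior variance works out
# to phi_st * pi * (1 - pi): larger phi_st, more divergence.
print(beta_variance(a, b), phi_st * pi * (1 - pi))
```

The identity follows because Var = π(1 − π)/(θ + 1) and θ + 1 = 1/ΦST.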

BAMOVA example
Gompert and Buerkle [26] used data derived from two different human population data sets:

• 316 fully sequenced genes in an African population and a population with European
ancestry. With these data, they didn’t have to worry about the sequencing errors that
their model neglects and they could simulate pooled samples allowing them to compare
estimates derived from pooled versus individual-level data.

• 12,649 haplotype regions and 11,866 genes derived from 597 individuals across 33 widely
distributed human populations.

In analysis of the first data set, they estimated ΦST = 0.08. Three loci were identified as
having unusually high values of ΦST .

• HSD11B2: ΦST = 0.32(0.16, 0.48). Variants at this locus are associated with an
inherited form of high blood pressure and renal disease. A microsatellite in an intron
of this locus is weakly associated with type 1 diabetes.

• FOXA2: ΦST = 0.32(0.12, 0.51). This gene is involved in regulation of insulin
sensitivity.

• POLG2: ΦST = 0.33(0.18, 0.48). This locus was identified as a target of selection in
another study.

In analysis of the 33-population data set, they found similar values of ΦST on each
chromosome, ranging from 0.083 (0.075, 0.091) on chromosome 22 to 0.11 (0.10, 0.12) on
chromosome 16. ΦST for the X chromosome was higher: 0.14 (0.13,0.15). They detected 569
outlier loci, of which 518 were high outliers and 51 were low outliers. Several of the loci they
detected as outliers had been previously identified as targets of selection. The loci they
identified as candidates for balancing selection had not previously been suggested as targets
of such selection.
12 Again, the actual model is a bit more complicated than what I’m describing here, but the principle is
the same.
13 Or sets of SNP loci that are parts of a single contig.

Estimating population structure
In addition to FST we saw that a principal components analysis of genetic data might some-
times be useful. Fumagalli et al. [25] develop a method for PCA that, like Lynch’s [59]
method for estimating nucleotide diversity, uses all of the information available in NGS
data rather than imposing an artificial threshold for calling genotypes. They calculate the
pairwise entries of the covariance matrix by integrating across the genotype probability at
each site as part of the calculation and weighting the contribution of each site to the analysis
by the probability that it is variable.14 As shown in Figure 24.3, this approach to
PCA recovers the structure much better than approaches that simply call genotypes at each
locus, whether or not outliers are excluded. The authors also describe approaches to esti-
mating FST that take account of the characteristics of NGS data. Their software (ANGSD:
http://popgen.dk/wiki/index.php/ANGSD) implements these and other useful statistical
analysis tools for next-generation sequencing data, including Tajima’s D. They also provide
NgsAdmix for Structure-like analyses of NGS data (http://www.popgen.dk/software/
index.php/NgsAdmix).
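The core idea can be sketched as follows (a simplification of the approach in [25]: it uses expected genotype dosages but omits the weighting by the probability that a site is variable, and the genotype probabilities are invented). Each individual contributes its expected dosage E[g] = P(1) + 2 P(2) at each site, the individual-by-individual covariance matrix is built from the centered dosages, and its leading eigenvector gives the first principal component.

```python
def expected_dosages(geno_probs):
    """geno_probs[i][s] = (P(0), P(1), P(2)) for individual i at site s.
    Returns the expected allele dosage E[g] = P(1) + 2 * P(2)."""
    return [[p1 + 2.0 * p2 for (_, p1, p2) in ind] for ind in geno_probs]

def leading_pc(geno_probs, n_iter=200):
    """First principal component of the individual-by-individual
    covariance matrix of centered expected dosages (power iteration)."""
    d = expected_dosages(geno_probs)
    n, m = len(d), len(d[0])
    means = [sum(row[s] for row in d) / n for s in range(m)]
    c = [[row[s] - means[s] for s in range(m)] for row in d]
    # covariance between individuals i and j across sites
    cov = [[sum(c[i][s] * c[j][s] for s in range(m)) / m
            for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # start off-center, not uniform
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Invented genotype probabilities: 4 individuals, 2 sites;
# individuals 0-1 and 2-3 form two distinct groups.
gp = [
    [(0.9, 0.1, 0.0), (0.8, 0.2, 0.0)],
    [(0.8, 0.2, 0.0), (0.7, 0.3, 0.0)],
    [(0.0, 0.2, 0.8), (0.1, 0.2, 0.7)],
    [(0.0, 0.1, 0.9), (0.0, 0.3, 0.7)],
]
pc1 = leading_pc(gp)
print(pc1)  # opposite signs on PC1 separate the two groups
```

A production analysis would of course use a linear-algebra library for the eigendecomposition; the point here is only that no hard genotype call is ever made.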

14 See [25] for details.

Figure 24.3: The “true genotypes” PCA is based on the actual, simulated genotypes (20
individuals in each population; 10,000 sites in the sample with 10% variable; FST between
the purple population and either the red or the green population was 0.4, and between the
green and red populations was 0.15; coverage was simulated at 2×) (from [25]).

Chapter 25

Genetic structure of human populations in Great Britain

As we’ve seen several times in this course, the amount of genetic data available on humans is
vastly greater than what is available for any other organism. As a result, it’s possible to use
these data to gain unusually deep insight into the recent history of many human populations.
Today’s example comes from Great Britain, courtesy of a very large consortium [54].

Data
• 2039 individuals with four grandparents born within 80km of one another, effectively
studying alleles sampled from grandparents (ca. 1885).

• 6209 samples from 10 countries in continental Europe.

• Autosomal SNPs genotyped in both samples (ca. 500K).

Results
Very little evidence of population structure within British sample

• Average pairwise FST : 0.0007

• Maximum pairwise FST : 0.003
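As a reminder of what a pairwise FST calculation looks like, here is a minimal sketch using Hudson's estimator (one common choice, not necessarily the estimator used in [54]; the allele frequencies below are invented):

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST estimator (ratio of averages) from per-SNP allele
    frequencies p1, p2 in two populations, with n1, n2 the numbers of
    sampled chromosomes in each population."""
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # numerator corrects the squared frequency difference for
        # within-population sampling variance
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den

# Invented frequencies at five SNPs in two samples of 100 chromosomes
# each; nearly identical frequencies give a tiny FST, as in these data.
p_pop1 = [0.50, 0.10, 0.30, 0.80, 0.25]
p_pop2 = [0.51, 0.11, 0.29, 0.79, 0.26]
print(hudson_fst(p_pop1, p_pop2, 100, 100))
```

Note that the sampling correction can make the estimate slightly negative when the true differentiation is essentially zero.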

Individual assignment analysis of genotypes used fineSTRUCTURE. It works on the same
principle as STRUCTURE, but it models the correlations among SNPs resulting from gametic
disequilibrium, rather than treating each locus as being independently inherited. The
analysis is on haplotypes rather than on alleles. In addition, it clusters populations
hierarchically (Figure 25.1).
Analysis of the European data identifies 52 groups. The authors used Chromopainter
to construct each of the haplotypes detected in their sample of 2039 individuals from the
UK as a mosaic of haplotypes derived from those found in their sample of 6209 individuals
from continental Europe. Since they know (a) the UK cluster to which each UK individual
belongs and (b) the European group from which each individual contributing to the UK
mosaic belongs, they can estimate (c) the proportion of ancestry for each UK cluster derived
from each European group. The results are shown in Figure 25.2.
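The bookkeeping behind step (c) is a cross-tabulation: sum the copied segment lengths by (UK cluster, European donor group) and normalize each cluster's row. A toy sketch (the cluster and group labels are invented, not those of [54]):

```python
from collections import defaultdict

def ancestry_profiles(assignments, mosaics):
    """assignments: individual -> UK cluster.
    mosaics: individual -> list of (donor_group, segment_length) pairs.
    Returns cluster -> {donor_group: proportion of copied ancestry}."""
    totals = defaultdict(lambda: defaultdict(float))
    for ind, cluster in assignments.items():
        for group, length in mosaics[ind]:
            totals[cluster][group] += length
    profiles = {}
    for cluster, counts in totals.items():
        grand = sum(counts.values())
        profiles[cluster] = {g: c / grand for g, c in counts.items()}
    return profiles

# Invented example: two UK clusters, donor groups "FRA" and "NOR";
# segment lengths are in arbitrary units (e.g. cM).
assignments = {"ind1": "ClusterA", "ind2": "ClusterA", "ind3": "ClusterB"}
mosaics = {
    "ind1": [("FRA", 30.0), ("NOR", 10.0)],
    "ind2": [("FRA", 25.0), ("NOR", 15.0)],
    "ind3": [("NOR", 35.0), ("FRA", 5.0)],
}
print(ancestry_profiles(assignments, mosaics))
```

Each row of the resulting table is what Figure 25.2 displays for a real cluster.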

Figure 25.1: fineSTRUCTURE analysis of genotypes from Great Britain (from [54]).

Figure 25.2: European ancestry of the 17 clusters identified in the UK (from [54]).

Literature cited

[1] S J Arnold. Quantitative genetics and selection in natural populations: microevolution
of vertebral numbers in the garter snake Thamnophis elegans. In B S Weir, E J Eisen,
M M Goodman, and G Namkoong, editors, Proceedings of the Second International
Conference on Quantitative Genetics, pages 619–636. Sinauer Associates, Sunderland,
MA, 1988.
[2] Nathan A Baird, Paul D Etter, Tressa S Atwood, Mark C Currey, Anthony L Shiver,
Zachary A Lewis, Eric U Selker, William A Cresko, and Eric A Johnson. Rapid SNP
discovery and genetic mapping using sequenced RAD markers. PLoS ONE, 3(10):e3376,
2008.
[3] Mark A Beaumont, Wenyang Zhang, and David J Balding. Approximate Bayesian
computation in population genetics, 2002.
[4] P Beerli. Comparison of Bayesian and maximum-likelihood estimation of population
genetic parameters. Bioinformatics, 22:341–345, 2006.
[5] Peter Beerli. Effect of unsampled populations on the estimation of population sizes and
migration rates between sampled populations, 2004.
[6] Peter Beerli and Joseph Felsenstein. Maximum-likelihood estimation of migration rates
and effective population numbers in two populations using a coalescent approach, 1999.
[7] Peter Beerli and Joseph Felsenstein. Maximum likelihood estimation of a migration
matrix and effective population sizes in n subpopulations by using a coalescent approach,
2001.
[8] R L Cann, M Stoneking, and A C Wilson. Mitochondrial DNA and human evolution.
Nature, 325:31–36, 1987.
[9] R Ceppellini, M Siniscalco, and C A B Smith. The estimation of gene frequencies in a
random-mating population. Annals of Human Genetics, 20:97–115, 1955.

[10] F B Christiansen. Studies on selection components in natural populations using popu-
lation samples of mother-offspring combinations. Hereditas, 92:199–203, 1980.

[11] T E Cleghorn. MNS gene frequencies in English blood donors. Nature, 187:701, 1960.

[12] The International HapMap Consortium. A second generation human haplotype map of
over 3.1 million SNPs. Nature, 449(7164):851–861, 2007.

[13] J F Crow and M Kimura. An Introduction to Population Genetics Theory. Burgess
Publishing Company, Minneapolis, Minn., 1970.

[14] A P Dempster, N M Laird, and D B Rubin. Maximum likelihood from incomplete data.
Journal of the Royal Statistical Society Series B, 39:1–38, 1977.

[15] T Dobzhansky and C Epling. Contributions to the genetics, taxonomy, and ecology
of Drosophila pseudoobscura and its relatives. Publication 554. Carnegie Institution of
Washington, Washington, DC, 1944.

[16] Th. Dobzhansky. Genetics of natural populations. XIV. A response of certain gene
arrangements in the third chromosome of Drosophila pseudoobscura to natural selection.
Genetics, 32:142–160, 1947.

[17] Robert J Elshire, Jeffrey C Glaubitz, Qi Sun, Jesse A Poland, Ken Kawamoto, Edward
Buckler, and Sharon E Mitchell. A Robust, Simple Genotyping-by-Sequencing (GBS)
Approach for High Diversity Species. PLoS ONE, 6(5):e19379, May 2011.

[18] Kevin Emerson, Clayton Merz, Julian Catchen, Paul A Hohenlohe, William Cresko,
William Bradshaw, and Christina Holzapfel. Resolving postglacial phylogeography using
high-throughput sequencing. Proceedings of the National Academy of Sciences of the
United States of America, 107(37):16196–16200, 2010.

[19] Arnaud Estoup, Mark A Beaumont, Florent Sennedot, Craig Moritz, and Jean-Marie
Cornuet. Genetic analysis of complex demographic scenarios: spatially expanding pop-
ulations of the cane toad, Bufo marinus, 2004.

[20] L Excoffier, P E Smouse, and J M Quattro. Analysis of molecular variance inferred
from metric distances among DNA haplotypes: application to human mitochondrial
DNA restriction data. Genetics, 131(2):479–491, 1992.

[21] J C Fay and C.-I. Wu. Hitchhiking under positive Darwinian selection. Genetics,
155:1405–1413, 2000.

[22] R Fu, A E Gelfand, and K E Holsinger. Exact moment calculations for genetic models
with migration, mutation, and drift. Theoretical Population Biology, 63:231–243, 2003.

[23] Y X Fu. Statistical properties of segregating sites. Theoretical Population Biology,
48:172–197, 1995.

[24] Y.-X. Fu. Statistical tests of neutrality of mutations against population growth, hitch-
hiking, and background selection. Genetics, 147:915–925, 1997.

[25] Matteo Fumagalli, F G Vieira, Thorfinn Sand Korneliussen, Tyler Linderoth, Emilia
Huerta-Sánchez, Anders Albrechtsen, and Rasmus Nielsen. Quantifying population
genetic differentiation from next-generation sequencing data. Genetics, 195(3):979–992,
November 2013.

[26] Zachariah Gompert and C Alex Buerkle. A hierarchical Bayesian model for next-
generation population genomics. Genetics, 187(3):903–917, March 2011.

[27] M Goodman. Immunocytochemistry of the primates and primate evolution. Annals of
the New York Academy of Sciences, 102:219–234, 1962.

[28] Feng Guo, Dipak K Dey, and Kent E Holsinger. A Bayesian hierarchical model for anal-
ysis of SNP diversity in multilocus, multipopulation samples. Journal of the American
Statistical Association, 104(485):142–154, March 2009.

[29] Thomas M Hammond, David G Rehard, Hua Xiao, and Patrick K T Shiu. Molecular
dissection of Neurospora Spore killer meiotic drive elements. Proceedings of the National
Academy of Sciences of the United States of America, 109(30):12093–12098, July 2012.

[30] H Harris. Enzyme polymorphisms in man. Proceedings of the Royal Society of London,
Series B, 164:298–310, 1966.

[31] Bernhard Haubold, Peter Pfaffelhuber, and Michael Lynch. mlRho - a program for
estimating the population mutation and recombination rates from shotgun-sequenced
diploid genomes. Molecular Ecology, 19:277–284, March 2010.

[32] P W Hedrick. Genetics of Populations. Jones and Bartlett Publishers, Sudbury, MA,
2nd ed. edition, 2000.

[33] Jody Hey and Rasmus Nielsen. Multilocus methods for estimating population sizes,
migration rates and divergence time, with applications to the divergence of Drosophila
pseudoobscura and D. persimilis, 2004.

[34] Jody Hey and Rasmus Nielsen. Integration within the Felsenstein equation for improved
Markov chain Monte Carlo methods in population genetics. Proceedings of the National
Academy of Sciences, 104(8):2785–2790, 2007.

[35] W G Hill and A Robertson. Linkage disequilibrium in finite populations. Theoretical
and Applied Genetics, 38:226–231, 1968.

[36] K E Holsinger. The population genetics of mating system evolution in homosporous
plants. American Fern Journal, pages 153–160, 1990.

[37] K E Holsinger and R J Mason-Gamer. Hierarchical analysis of nucleotide diversity in
geographically structured populations. Genetics, 142(2):629–639, 1996.

[38] K E Holsinger and L E Wallace. Bayesian approaches for the analysis of population
structure: an example from Platanthera leucophaea (Orchidaceae). Molecular Ecology,
13:887–894, 2004.

[39] Kent E. Holsinger and Bruce S. Weir. Genetics in geographically structured populations:
defining, estimating, and interpreting FST . Nature Reviews Genetics, 10:639–650, 2009.

[40] J L Hubby and R C Lewontin. A molecular approach to the study of genic heterozy-
gosity in natural populations. I. The number of alleles at different loci in Drosophila
pseudoobscura. Genetics, 54:577–594, 1966.

[41] S H James, A P Wylie, M S Johnson, S A Carstairs, and G A Simpson. Complex
hybridity in Isotoma petraea V. Allozyme variation and the pursuit of hybridity.
Heredity, 51(3):653–663, 1983.

[42] R C Jansen, H Geerlings, A J VanOeveren, and R C VanSchaik. A comment on
codominant scoring of AFLP markers. Genetics, 158(2):925–926, 2001.

[43] M Kimura. Evolutionary rate at the molecular level. Nature, 217:624–626, 1968.

[44] J L King and T L Jukes. Non-Darwinian evolution. Science, 164:788–798, 1969.

[45] J F C Kingman. On the genealogy of large populations. Journal of Applied Probability,
19A:27–43, 1982.

[46] J F C Kingman. The coalescent. Stochastic Processes and their Applications, 13:235–
248, 1982.

[47] L Knowles. Did the Pleistocene glaciations promote divergence? Tests of explicit refugial
models in montane grasshopprers, 2001.

[48] L Knowles and Wayne P Maddison. Statistical phylogeography, 2002.

[49] M Kreitman. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila


melanogaster. Nature, 304:412–417, 1983.

[50] M Kreitman and M Aguadé. Excess polymorphism at the alcohol dehydrogenase locus
in Drosophila melanogaster. Genetics, 114:93–110, 1986.

[51] M Kreitman and R R Hudson. Inferring the evolutionary history of the Adh and Adh-
dup loci in Drosophila melanogaster from patterns of polymorphism and divergence.
Genetics, 127:565–582, 1991.

[52] R Lande and S J Arnold. The measurement of selection on correlated characters.
Evolution, 37:1210–1226, 1983.

[53] Alan R Lemmon, Sandra A Emme, and Emily Moriarty Lemmon. Anchored Hybrid
Enrichment for Massively High-Throughput Phylogenomics. Systematic Biology, 2012.

[54] Stephen Leslie, Bruce Winney, Garrett Hellenthal, Dan Davison, Abdelhamid Boumer-
tit, Tammy Day, Katarzyna Hutnik, Ellen C Royrvik, Barry Cunliffe, Consortium
Wellcome Trust Case Control, Consortium International Multiple Sclerosis Genetics,
Daniel J Lawson, Daniel Falush, Colin Freeman, Matti Pirinen, Simon Myers, Mark
Robinson, Peter Donnelly, and Walter Bodmer. The fine-scale genetic structure of the
British population. Nature, 519(7543):309–314, 2015.

[55] R C Lewontin and J L Hubby. A molecular approach to the study of genic heterozygosity
in natural populations. II. Amount of variation and degree of heterozygosity in natural
populations of Drosophila pseudoobscura. Genetics, 54:595–609, 1966.

[56] C C Li. First Course in Population Genetics. Boxwood Press, Pacific Grove, CA, 1976.

[57] W.-H. Li. Molecular Evolution. Sinauer Associates, Sunderland, MA, 1997.

[58] J D Lubell, M H Brand, J M Lehrer, and K E Holsinger. Detecting the influence
of ornamental Berberis thunbergii var. atropurpurea in invasive populations of Berberis
thunbergii (Berberidaceae) using AFLP. American Journal of Botany, 95(6):700–705,
2008.

[59] Michael Lynch. Estimation of nucleotide diversity, disequilibrium coefficients, and mu-
tation rates from high-coverage genome-sequencing projects. Molecular biology and
evolution, 25(11):2409–2419, November 2008.

[60] M Nei and R K Chesser. Estimation of fixation indices and gene diversities. Annals of
Human Genetics, 47:253–259, 1983.

[61] Anton Nekrutenko and James Taylor. Next-generation sequencing data interpretation:
enhancing reproducibility and accessibility. Nature Reviews Genetics, 13(9):667–672,
September 2012.

[62] Rasmus Nielsen and J Wakeley. Distinguishing migration from isolation: a Markov
chain Monte Carlo approach, 2001.

[63] John Novembre, Toby Johnson, Katarzyna Bryc, Zoltán Kutalik, Adam R Boyko, Adam
Auton, Amit Indap, Karen S King, Sven Bergmann, Matthew R Nelson, Matthew
Stephens, and Carlos D Bustamante. Genes mirror geography within Europe. Nature,
456(7218):98–101, 2008.

[64] John Novembre and Matthew Stephens. Interpreting principal component analyses of
spatial population genetic variation. Nat Genet, 40(5):646–649, 2008.

[65] Caroline Obert, Jack Sublett, Deepak Kaushal, Ernesto Hinojosa, Theresa Barton,
Elaine I Tuomanen, and Carlos J Orihuela. Identification of a Candidate Strepto-
coccus pneumoniae Core Genome and Regions of Diversity Correlated with Invasive
Pneumococcal Disease. Infection and Immunity, 74(8):4766–4777, 2006.

[66] Tomoko Ohta and Motoo Kimura. Linkage disequilibrium at steady state determined
by random genetic drift and recurrent mutation. Genetics, 63:229–238, 1969.

[67] Guillermo Orti, Michael A Bell, Thomas E Reimchen, and Axel Meyer. Global sur-
vey of mitochondrial DNA sequences in the threespine stickleback: evidence for recent
migrations. Evolution, 48(3):608–622, 1994.

[68] P Pamilo and M Nei. Relationships between gene trees and species trees. Molecular
Biology and Evolution, 5(5):568–583, 1988.

[69] Jonathan Pritchard, Matthew Stephens, and Peter Donnelly. Inference of Population
Structure Using Multilocus Genotype Data. Genetics, 155(2):945–959, 2000.

[70] Noah A Rosenberg, Jonathan K Pritchard, James L Weber, Howard M Cann, Ken-
neth K Kidd, Lev A Zhivotovsky, and Marcus W Feldman. Genetic structure of human
populations. Science, 298(5602):2381–2385, 2002.
[71] V M Sarich and A C Wilson. Immunological time scale for hominid evolution. Science,
158:1200–1203, 1967.
[72] Sven J Saupe. A fungal gene reinforces Mendel’s laws by counteracting genetic cheat-
ing. Proceedings of the National Academy of Sciences of the United States of America,
109(30):11900–11901, July 2012.
[73] Jonathan Sebat, B Lakshmi, Jennifer Troge, Joan Alexander, Janet Young, Par Lundin,
Susanne Maner, Hillary Massa, Megan Walker, Maoyen Chi, Nicholas Navin, Robert
Lucito, John Healy, James Hicks, Kenny Ye, Andrew Reiner, T Conrad Gilliam, Bar-
bara Trask, Nick Patterson, Anders Zetterberg, and Michael Wigler. Large-Scale Copy
Number Polymorphism in the Human Genome. Science, 305(5683):525–528, 2004.
[74] Montgomery Slatkin. Inbreeding coefficients and coalescence times. Genetical Research,
58:167–175, 1991.
[75] Montgomery Slatkin. Inbreeding coefficients and coalescence times. Genetical Research,
58:167–175, 1991.
[76] David J Spiegelhalter, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde.
Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 64(4):583–639, 2002.
[77] F Tajima. Statistical method for testing the neutral mutation hypothesis by DNA
polymorphism. Genetics, 123:585–595, 1989.
[78] Marie Touchon, Claire Hoede, Olivier Tenaillon, Valérie Barbe, Simon Baeriswyl,
Philippe Bidet, Edouard Bingen, Stéphane Bonacorsi, Christiane Bouchier, Odile Bou-
vet, Alexandra Calteau, Hélène Chiapello, Olivier Clermont, Stéphane Cruveiller, An-
toine Danchin, Médéric Diard, Carole Dossat, Meriem El Karoui, Eric Frapy, Louis
Garry, Jean Marc Ghigo, Anne Marie Gilles, James Johnson, Chantal Le Bougénec,
Mathilde Lescat, Sophie Mangenot, Vanessa Martinez-Jéhanne, Ivan Matic, Xavier
Nassif, Sophie Oztas, Marie Agnès Petit, Christophe Pichon, Zoé Rouy, Claude Saint
Ruf, Dominique Schneider, Jérôme Tourret, Benoit Vacherie, David Vallenet, Claudine
Médigue, Eduardo P C Rocha, and Erick Denamur. Organised Genome Dynamics in
the Escherichia coli Species Results in Highly Diverse Adaptive Paths. PLoS Genet,
5(1):e1000344, 2009.

[79] Peter A Underhill, Peidong Shen, Alice A Lin, Li Jin, Giuseppe Passarino, Wei H
Yang, Erin Kauffman, Batsheva Bonne-Tamir, Jaume Bertranpetit, Paolo Francalacci,
Muntaser Ibrahim, Trefor Jenkins, Judith R Kidd, S Qasim Mehdi, Mark T Seielstad,
R Spencer Wells, Alberto Piazza, Ronald W Davis, Marcus W Feldman, L Luca Cavalli-
Sforza, and Peter J Oefner. Y chromosome sequence variation and the history of human
populations. Nature Genetics, 26(3):358–361, 2000.

[80] Robert Verity and Richard A Nichols. What is genetic differentiation, and how should
we measure it- GST, D, neither or both? Molecular ecology, 23(17):4216–4225, 2014.

[81] S Wahlund. Zusammensetzung von Population und Korrelationserscheinung vom
Standpunkt der Vererbungslehre aus betrachtet. Hereditas, 11:65–106, 1928.

[82] C Wedekind, T Seebeck, F Bettens, and A J Paepke. MHC-dependent mate preferences
in humans. Proceedings of the Royal Society of London, Series B, 260:245–249, 1995.

[83] B S Weir. Genetic Data Analysis II. Sinauer Associates, Sunderland, MA, 1996.

[84] B S Weir and C C Cockerham. Estimating F -statistics for the analysis of population
structure. Evolution, 38:1358–1370, 1984.

[85] B S Weir and W G Hill. Estimating F -statistics. Annual Review of Genetics, 36:721–
750, 2002.

[86] Eva-Maria Willing, Christine Dreyer, and Cock van Oosterhout. Estimates of Genetic
Differentiation Measured by FST Do Not Necessarily Require Large Sample Sizes When
Using Many SNP Markers. PLoS ONE, 7(8):e42649, August 2012.

[87] A C Wilson and V M Sarich. A molecular time scale for human evolution. Proceedings
of the National Academy of Sciences U.S.A., 63:1088–1093, 1969.

[88] Sewall Wright. Evolution and the Genetics of Populations, volume 2. University of
Chicago Press, Chicago, IL, 1969.

[89] Sewall Wright. Evolution and the Genetics of Populations., volume 4. University of
Chicago Press, Chicago, IL, 1978.

[90] K Zeng, Y.-X. Fu, S Shi, and C.-I. Wu. Statistical tests for detecting positive selection
by utilizing high-frequency variants. Genetics, 174:1431–1439, 2006.

[91] E Zuckerkandl and L Pauling. Evolutionary divergence and convergence in proteins. In
V Bryson and H J Vogel, editors, Evolving Genes and Proteins, pages 97–166. Academic
Press, New York, NY, 1965.

Index

additive effect, 146, 149
  Hardy-Weinberg assumption, 146
additive genetic variance, 149
additive genotypic value, 145
Adh, 238, 252
  balancing selection, 239, 253
  purifying selection, 239, 252
alcohol dehydrogenase, 238, 252
allele fixation, 85, 87
allele frequency distribution, 54
allele genealogy, 135
Ambystoma
  tigrinum, 300, 301
Amia
  calva, 294
among-site rate variation, 220
  shape parameter, 220
AMOVA, 288, 303
  example, 289, 305
ancestral polymorphism, 308
Approximate Bayesian Computation, 327
  limitations, 331
  regression, 328
association mapping
  BAMD priors, 208
  linear mixed model, 207
  relatedness, 208
assortative mating, 21
BAMOVA, 341
  example, 342
Bayesian inference, 14
Beta distribution, 54
breeders equation, 170, 173, 174
Bufo
  marinus, 325, 329
Cionia
  intestinalis, 340
clade distance, 298
coalescent, 135
  balancing selection, 239, 253
  diverging populations, 317, 322
  estimating migration, 317, 322
  estimating migration rates, 316, 321
  F-statistics, 140
  migration, 313, 319
  mitochondrial Eve, 139
  multiple alleles, 138
  time to coalescence, 138
  two alleles, 136
coalescent events, 136
components of selection, 78
components of variance
  causal, 158, 160
  observational, 158, 160
covariance, 155
  half-siblings, 155
  relatives, 157
cumulative selection gradient, 176
  caveats, 178
Daphnia
  pulex, 340
Deviance Information Criterion, 36
directional selection, 85
disassortative mating, 79
disruptive selection, 85
diversity-divergence, 239, 253
dominance genetic variance, 149
Drosophila
  melanogaster, 238, 252, 296
  pseudoobscura, 80
E-matrix, 174
effective neutrality, 223, 229
effectively neutral, 134
EM algorithm, 11
environmental variance, 143
equilibrium, 24, 87
  monomorphic, 87
  polymorphic, 87
  unstable, 87
estimate, 53
evolutionary pattern, 213
evolutionary process, 213
F-statistics, 42, 44
  Gst, 49
  coalescent, 140
  notation, 51
  outliers, 241, 255
  Weir and Cockerham, 50
Fay and Wu's H, 247, 266
fecundity selection, 78
fertility selection, 78
First law of population genetics, 6
Fisher's Fundamental Theorem of Natural Selection, 85
fitness
  regression on phenotype, 168, 169
Fu's FS, 247, 266
full-sib analysis, 153, 158
fundamental theorem of natural selection, 171
G-matrix, 174
gamete competition, 78
gametic disequilibrium, 199
  drift, 202
gene tree, 308
genetic code, 234
  redundancy, 235
genetic composition of populations, 7
genetic drift, 111
  allele frequency variance, 115
  binomial distribution, 114
  effective population size, 118
  effective population size, limitations, 119
  effective population size, separate sexes, 120
  effective population size, variable population size, 122
  effective population size, variation in offspring number, 123
  effectively neutral, 134
  fixation of deleterious alleles, 133
  fixation probability, 132
  fixation time, 116
  ideal population, 117
  inbreeding analogy, 116
  inbreeding effective size, 118
  loss of beneficial alleles, 131
  migration, 128
  mutation, 125
  mutation, recurrent, 127
  mutation, stationary distribution, 126, 127
  population size, 128
  properties, 113, 114, 116
  properties with selection, 133
  uncertainty in allele frequencies, 112
  variance effective size, 118
genetic variance, 143
  additive, 150
  components, 149
  dominance, 149
genotypic value, 145, 166
  additive, 145
  fitness, 166
genotyping-by-sequencing, 333, 335
geographic structure, 39
half-sib analysis, 154
Hardy-Weinberg assumptions, 8
Hardy-Weinberg principle, 10
Hardy-Weinberg proportions
  multiple alleles, 95
heritability, 157, 158, 160
  broad sense, 144
  narrow sense, 144
HGDP-CEPH, 62
  humans, 62
identity by descent, 27
identity by type, 27
immunological distance, 214
inbreeding, 21, 28
  consequences, 23
  partial self-fertilization, 23
  self-fertilization, 22
  types, 21
inbreeding coefficient, 25
  equilibrium, 26
  population, 26
individual assignment, 61
  application, 61
JAGS, 15
Jukes-Cantor distance, 218
  assumptions, 219
linkage disequilibrium, 199
marginal fitness, 84
mating table, 8
  self-fertilization, 22
maximum-likelihood estimates, 13
MCMC sampling, 15
mean fitness, 81
Melanopus, 309
MHC
  conservative and non-conservative substitutions, 261
  synonymous and non-synonymous substitutions, 260
MHC polymorphism, 259
migration
  estimating, 316, 321
migration rate
  backward, 129
  forward, 129
molecular clock, 214, 222, 228
  derivation, 224, 229
molecular variation
  markers, 216
  physical basis, 215
monomorphic, 85
mother-offspring pairs, 154
mutation
  infinite alleles model, 125, 225, 231
  infinite sites model, 244, 263
mutation rate, 221, 227
natural selection, 78
  components of selection, 78
  disassortative mating, 79
  fertility selection, 78, 98
  fertility selection, fertility matrix, 98
  fertility selection, properties, 99
  fertility selection, protected polymorphism, 99
  gamete competition, 78
  multiple alleles, marginal viability, 96
  patterns, 84
  segregation distortion, 78
  sexual selection, 79, 100
  viability selection, 79
nature vs. nurture, 144
nested clade analysis, 294
  clade distance, 298
  constructing nested clades, 296
  nested clade distance, 299
  statistical parsimony, 295
neutral alleles, 222, 228
neutral theory
  effective neutrality, 223, 229
  modifications, 237
next-generation sequencing, 333
  estimating FST, 333
  estimating nucleotide diversity, 338
  partitioning diversity, 341
  phylogeography, 335
non-synonymous substitutions, 235
nucleotide diversity, 244, 263, 287
  partitioning, 288, 303
nucleotide substitutions
  selection against, 261
  selection for, 261
P-matrix, 174
parameter, 53
parent-offspring regression, 153, 157
phenotypic variance
  partitioning, 143
phenylketonuria, 144
Φst, 288, 303
phylogeography, 293
population tree, 308
QTL, 197
QTL mapping
  caveats, 186
  inbred lines, 185
  outline, 181
quantitative trait locus, 181, 197
R/qtl, 189
  data format, 190
  estimating QTL effects, 194
  identifying QTLs, 193
  permutation test, 193
  QTL analysis, 191
  visualizing QTL effects, 194
RAD sequencing, 333, 334
recombination frequency, 201
reference population, 28
relative fitness, 83
resemblance between relatives, 153
response to selection, 145, 166, 170, 173
sampling
  genetic, 53
  statistical, 53
sampling error, 49
segregating sites, 244, 263
segregation distortion, 78
selection
  directional selection, 85
  disruptive, 85
  multivariate example, 175
  stabilizing, 88
selection coefficient, 85
selection differential, 170, 173
selection equation, 82
self-fertilization, 22
  partial, 23
sexual selection, 21, 79
sledgehammer principle, 233, 235, 239, 252, 260
stabilizing selection, 88
statistical expectation, 46
statistical parsimony, 295
  example, 296
  haplotype network, 295
statistical phylogeography
  example, 311
substitution rate, 221, 227
substitution rates, 235
synonymous substitutions, 235
Tajima's D, 244, 263
  interpretation, 245, 265
TCS parsimony, 295
testing Hardy-Weinberg, 32
  Bayesian approach, 33
  goodness of fit, 32
two-locus genetics
  decay of disequilibrium, 202
  drift, 202
  gamete frequencies, 198
  gametic disequilibrium, 199
  Hardy-Weinberg, 201
  recombination, 201
  transmission, 200
unbiased estimate, 47
unbiased estimates, 46
viability
  absolute, 83
  estimating absolute, 89
  estimating relative, 90
  relative, 83
viability selection, 79
  genetics, 79
virility selection, 78
Wahlund effect, 39, 40
  properties, 41
  theory, 41
  two loci, 203
Wyeomyia
  smithii, 335
Zeng et al.'s E, 248, 267
zero force laws, 5
Zoarces viviparus, 3
