Statistics for Master’s students

Basics from Stochastics


Dirk Metzler
June 4, 2024

Contents
1  Random Variables and Distributions

2  Conditional Probabilities and the Bayes-Formula

3  The binomial distribution

4  Expected value

5  Variance and Correlation

6  Applications in Quantitative Genetics

7  Application Example about Codon Bias

8  Normal distribution

9  Normal approximation

10 The z-Test
You sample an individual from a population and measure its length X.
X is a random variable because it depends on random sampling.
Its expected value is in this case the population mean µ:

EX = µ

If you sample n individuals, all their lengths X1 , X2 , . . . , Xn are random variables.


Also their mean value X̄ = (1/n) Σ_i Xi and s = √((1/(n−1)) Σ_i (Xi − X̄)²) are random variables.
Assume a small population of 100 individuals, and a neutral allele A that has frequency 0.3 in this
generation.

What will be the frequency X of A in the next generation?

We don’t know, as X is a random variable .

However, we can ask, for example, for


EX , the expected value of X, or for
Pr(X = 0.32) , the probability that X takes a value of 0.32.
Even these values (especially the second one) depend on our model assumptions.


1 Random Variables and Distributions

We start with a simpler example: rolling a die; W is the result of the next trial.

S = {1, 2, . . . , 6},   Pr(W = 1) = · · · = Pr(W = 6) = 1/6
( Pr(W = x) = 1/6 for all x ∈ {1, . . . , 6} )

A Random Variable is a result of a random incident or experiment.

The state space S of a random variable is the set of possible values.

The distribution of a random variable X assigns to each set A ⊆ S the probability Pr(X ∈ A) that X takes a
value in A.

In general, we use capitals for random variables (X, Y, Z, . . . ), and small letters (x, y, z, . . . ) for (possible) fixed
values.

Writing events like sets


The event that X takes a value in A can be written with curly brackets:

{X ∈ A}

We can interpret this as the set of results (elementary events) for which the event is fulfilled. The intersection

{X ∈ A} ∩ {X ∈ B} = {X ∈ A, X ∈ B}

is then the event that X takes a value that is in A and in B.


The union
{X ∈ A} ∪ {X ∈ B}
is the event that X takes a value in A or in B (or both).
Sometimes the curly brackets are not written:

Pr(X ∈ A, X ∈ B) = Pr({X ∈ A, X ∈ B})

Of course, we can also give events names, e.g.:

U := {X ∈ A}, V := {X ∈ B}
⇒ U ∩ V = {X ∈ A ∩ B}

Note that if two events contradict each other, e.g.

U = {X ∈ {1, 2}} V = {X ∈ {3, 4}}

then
U ∩ V = ∅ = {X ∈ ∅}
where ∅ is the (impossible) empty event (for which we use the same symbol as for the empty set).
In fact, events are (certain) subsets of a so-called sample space Ω. For example, if X is the result of
rolling a die, then

Ω = { {X = 1}, {X = 2}, {X = 3}, {X = 4}, {X = 5}, {X = 6} }

• In cases like this with a finite Ω, all subsets of Ω are also events, and their probabilities are just the
sums of the probabilities of their elementary events.

• For infinite Ω things become more complicated:
– Events can have non-zero probability even if all their elements have zero probability.
– We cannot assume that all subsets of Ω are events (mathematical details are complicated).
• A probability distribution assigns to each event U ⊆ Ω a probability Pr(U ) that the event takes
place.

Example for an infinite state space


Uniform distribution on [0, 1]
If U is one of the closed, half-open or open intervals [a, b], (a, b], [a, b) or (a, b) with 0 ≤ a ≤ b ≤ 1, let
Pr(X ∈ U ) = b − a.
The set of events consists of all events of the form {X ∈ V }, where V is a countable union of intervals.
Note that probabilities of “elementary events” {X = y} do not help to define Pr(X ∈ V ), because
Pr(X = y) = Pr(X ∈ [y, y]) = y − y = 0
Pr(X ∈ V ) is defined due to countable additivity, see below.

An important axiom for infinite state spaces


Countable additivity (also called “sigma additivity”)
If A1 , A2 , A3 , . . . ⊂ Ω is a sequence of events such that Pr(Ai ) is defined for each i ∈ {1, 2, 3, . . . } and
Ai ∩ Aj = ∅ holds for each pair (i, j) with i ≠ j, then

Pr(A1 ∪ A2 ∪ A3 ∪ . . . ) = Σ_{i=1}^∞ Pr(Ai ) = lim_{n→∞} Σ_{i=1}^n Pr(Ai ).

Back to finite state spaces:


Example: Rolling a die W :

Pr(W ∈ {2, 3}) = 2/6 = 1/6 + 1/6 = Pr(W = 2) + Pr(W = 3)

Pr(W ∈ {1, 2} ∪ {3, 4}) = 4/6 = 2/6 + 2/6 = Pr(W ∈ {1, 2}) + Pr(W ∈ {3, 4})

Caution:

Pr(W ∈ {2, 3}) + Pr(W ∈ {3, 4}) = 2/6 + 2/6 = 4/6 ≠ Pr(W ∈ {2, 3, 4}) = 3/6

Example: rolling two dice (W1 , W2 ): Let W1 and W2 be the results of die 1 and die 2.

Pr(W1 ∈ {4}, W2 ∈ {2, 3, 4}) = Pr((W1 , W2 ) ∈ {(4, 2), (4, 3), (4, 4)})
                             = 3/36 = 1/6 · 3/6
                             = Pr(W1 ∈ {4}) · Pr(W2 ∈ {2, 3, 4})

In general:
Pr(W1 ∈ A, W2 ∈ B) = Pr(W1 ∈ A) · Pr(W2 ∈ B)

for all sets A, B ⊆ {1, 2, . . . , 6}

If S = W1 + W2 is the sum of the results, what is the probability that S = 5 if die 1 shows W1 = 2?

Pr(S = 5 | W1 = 2) = Pr(W2 = 3) = 1/6 = (1/36) / (1/6) = Pr(S = 5, W1 = 2) / Pr(W1 = 2)

What is the probability of S ∈ {4, 5} under the condition W1 ∈ {1, 6}?

Pr(S ∈ {4, 5} | W1 ∈ {1, 6}) = Pr(S ∈ {4, 5}, W1 ∈ {1, 6}) / Pr(W1 ∈ {1, 6})
                             = Pr(W2 ∈ {3, 4}, W1 = 1) / Pr(W1 ∈ {1, 6})
                             = (2/36) / (2/6) = 1/6
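This conditional probability can also be checked by simulation in R (an added sketch, not part of the original handout):

# Monte Carlo check of Pr(S in {4,5} | W1 in {1,6})
set.seed(1)                              # for reproducibility
w1 <- sample(1:6, 1e6, replace=TRUE)     # die 1
w2 <- sample(1:6, 1e6, replace=TRUE)     # die 2
s <- w1 + w2
mean(s[w1 %in% c(1,6)] %in% c(4,5))      # close to 1/6 = 0.1667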

Calculation rules:
We consider events from a sample space Ω.

• 0 ≤ Pr(U ) ≤ 1 for all events U ⊆ Ω

• Ω and the impossible event ∅ are events, and Pr(Ω) = 1 and Pr(∅) = 0.

• If U, V ⊂ Ω are disjoint, that is U ∩ V = ∅, in other words, they contradict each other, then U ∪ V is also an event and:

Pr(U ∪ V ) = Pr(U ) + Pr(V )

• If U ∩ V ≠ ∅, then U ∪ V is still an event and the inclusion-exclusion formula holds:

Pr(U ∪ V ) = Pr(U ) + Pr(V ) − Pr(U ∩ V )

• Definition of conditional probabilities: The probability of U under the condition V is

Pr(U |V ) := Pr(U, V ) / Pr(V )

(“conditional probability of U given V ”).  Note: Pr(U, V ) = Pr(V ) · Pr(U |V )

How to say
Pr(U, V ) = Pr(V ) · Pr(U |V )
in words:
The probability that both U and V take place can be computed in two steps:
• For U ∩ V to occur, the event V must take place.
• Multiply the probability of V by the conditional probability of U , given that V is already known to take
place. (The time points at which it turns out that U or V take place are not relevant.)

Stochastic Independence

Definition 1 (stochastic independence) Two events U , V are (stochastically) independent if

Pr(U, V ) = Pr(U ) · Pr(V ).

Two random variables X and Y are (stochastically) independent, if all pairs of events of the form (X ∈ A, Y ∈ B)
for all possible A and B are stochastically independent.

Example:

• Tossing two dice: X = result of die 1, Y = result of die 2.

Pr(X = 2, Y = 5) = 1/36 = 1/6 · 1/6 = Pr(X = 2) · Pr(Y = 5)

If X is a random variable with values in S and f : S → R is a function (or, more generally, a map),
then f (X) is a random variable that depends on X. If X takes the value x, f (X) takes the value f (x).
This implies:
Pr(f (X) ∈ U ) = Pr(X ∈ f −1 (U )),
where f −1 (U ) is the inverse image of U , that is, the set of all x such that f (x) ∈ U ; formally:

f −1 (U ) = {x : f (x) ∈ U }

(Note the difference between f −1 ({y}) and f −1 (y). The latter only exists if f is invertible, and is then
a number. The former is a set of numbers. Note also that {y} is not a number but a set containing one
number.)



[Figure: f (X) = (X − 3)² plotted against X for X ∈ {1, 2, 3, 4, 5, 6}]

The function f : x ↦ (x − 3)² for x ∈ {1, 2, 3, 4, 5, 6} is not invertible.
Thus, f −1 (4) is not defined, and indeed f (1) = 4 = f (5).
However, in f −1 ({4}), the f −1 is not an inverse function but the inverse image function, which operates
on sets:

f −1 ({4}) = {x : f (x) ∈ {4}} = {1, 5}

Or, e.g.:

f −1 ([0.5, 5]) = {x : f (x) ∈ [0.5, 5]} = {1, 2, 4, 5}



Example: Let f be the function f (x) = (x − 3)², and let X be the result of rolling a die. (Imagine a
game in which you may move forward f (x) steps if the die shows x pips.) Then

f −1 ({1}) = {2, 4},

and therefore

Pr(f (X) = 1) = Pr(f (X) ∈ {1}) = Pr(X ∈ f −1 ({1})) = Pr(X ∈ {2, 4}) = 1/3.
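The little computation can be verified in R by enumerating the state space of the die (an added sketch, not from the original text):

f <- function(x) (x - 3)^2
x <- 1:6                  # possible results of the die
x[f(x) == 1]              # inverse image of {1}: gives 2 and 4
mean(f(x) == 1)           # Pr(f(X) = 1) for a fair die: 2/6 = 1/3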

2 Conditional Probabilities and the Bayes-Formula


Example: Medical Test
Data about breast cancer mammography:
• 0.8% of 50 year old women have breast cancer.
• The mammogram detects breast cancer for 90% of the diseased patients.
• For 7% of the healthy patients, the mammogram gives false alarm.

In an early detection examination with a 50 year old patient, the mammogram indicates breast cancer.
What is the probability that the patient really has breast cancer?
This background information was given and the question was put to 24 experienced medical practitioners.¹
• 8 of them answered: 90%

• 8 answered: 50 to 80%
• 8 answered: 10% or less.
This is a question about a conditional probability: How high is the conditional probability of having
cancer, given that the mammogram indicates it?
We can compute conditional probabilities with the Bayes-Formula.

Let A, B be events. The conditional probability of A, given B (assuming Pr(B) > 0), is

Pr(A|B) = Pr(A ∩ B) / Pr(B)

(A ∩ B := A and B occur)

The theorem of total probability (with B^c := {B does not occur}):

Pr(A) = Pr(B) Pr(A|B) + Pr(B^c ) Pr(A|B^c )

[Portrait: Thomas Bayes, 1702–1761]

Bayes-Formula:

Pr(B|A) = Pr(B) Pr(A|B) / Pr(A)

¹ Hoffrage, U. & Gigerenzer, G. (1998) Using natural frequencies to improve diagnostic inferences. Academic Medicine 73: 538–540

Example: Let W ∈ {1, 2, 3, 4, 5, 6} be the result of rolling a die. How probable is W ≥ 5 if W is an
even number?

A := {W ≥ 5}
B := {W is even}
A ∩ B = {W is even and ≥ 5}

[Figure: Venn diagram of the events A, A^c, B, B^c]

Pr(A|B) = Pr(A ∩ B) / Pr(B) = (1/6) / (3/6) = 1/3

Pr(B|A) = Pr(B) · Pr(A|B) / Pr(A) = (1/2 · 1/3) / (1/3) = 1/2

Now back to mammography. Define events:


A: The mammogram indicates breast cancer.
B: The patient has breast cancer.
The (unconditioned) probability Pr(B) is called the prior probability of B, i.e. the probability that you
would assign to B before seeing “the data” A. In our case Pr(B) = 0.008 is the probability that a patient
coming to the early detection examination has breast cancer. The conditional probability Pr(B|A)
is called the posterior probability of B. This is the probability that you assign to B after seeing the data A.
The conditional probability that a patient has cancer, given that the mammogram indicates it, is

Pr(B|A) = Pr(B) · Pr(A|B) / Pr(A)
        = Pr(B) · Pr(A|B) / (Pr(B) · Pr(A|B) + Pr(B^c ) · Pr(A|B^c ))
        = 0.008 · 0.9 / (0.008 · 0.9 + 0.992 · 0.07) ≈ 0.0939.
Thus, the probability that a patient for whom the mammogram indicates cancer has cancer is only 9.4%.
The right answer “approximately 10%” was only given by 4 of the 24 medical practitioners. Two of them
gave an explanation that was so fallacious that we have to assume that they gave the right answer only
by accident.
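The posterior probability is easy to reproduce in R with the numbers given above (an added sketch):

prior <- 0.008                    # Pr(B): prevalence of breast cancer
sens  <- 0.9                      # Pr(A|B): detection probability of the mammogram
falsealarm <- 0.07                # Pr(A|B^c): false-alarm rate for healthy patients
prior * sens / (prior * sens + (1 - prior) * falsealarm)    # Pr(B|A), approx. 0.0939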

The Monty Hall problem


The Monty Hall problem (the goat problem)

• In the US-American TV-Show Let’s Make a Deal the candidate can win a sports car at the end of
the show if he or she selects the right one of three doors.
• Behind the two wrong doors there are goats.
• The candidate first selects one of the three doors, let’s say door 1.
• The host of the show, Monty Hall, then says “I show you something” and opens one of the two
other doors, let’s say door 2. A goat is standing behind this door.
• The candidate can then stay with door 1 or switch to door 3.
• Should they switch to door 3?

We assume that the candidate (first) chose door 1, and the placement of the car is purely random.

A : The host opens door 2.

B : The car is behind door 3.


C : The car is behind door 1.
D : The car is behind door 2.

Pr(B) = Pr(C) = Pr(D) = 1/3,   Pr(A|B) = 1, Pr(A|C) = 1/2, Pr(A|D) = 0.

Pr(B|A) = Pr(B) · Pr(A|B) / (Pr(B) · Pr(A|B) + Pr(C) · Pr(A|C) + Pr(D) · Pr(A|D))
        = (1/3 · 1) / (1/3 · 1 + 1/3 · 1/2 + 1/3 · 0)
        = 2/3

Thus, it is advisable to switch to door 3.
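Readers who distrust the computation can simulate the game. The following R sketch (not part of the original handout) assumes the candidate always picks door 1 and conditions on the host opening door 2:

set.seed(1)
car <- sample(1:3, 1e5, replace=TRUE)                           # door hiding the car
opened <- ifelse(car == 2, 3, 2)                                # host avoids door 1 and the car
opened[car == 1] <- sample(2:3, sum(car == 1), replace=TRUE)    # free choice if the car is behind door 1
mean(car[opened == 2] == 3)                                     # approx. 2/3: switching to door 3 wins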

3 The binomial distribution


Bernoulli distribution

A Bernoulli experiment is an experiment with two possible outcomes “success” and “fail”, or 1 or 0.

Bernoulli random variable X: State space S = {0, 1}. Distribution:


Pr(X = 1) = p
Pr(X = 0) = 1 − p

The parameter p ∈ [0, 1] is the success probability.

Bernoulli distribution
Examples:
• Tossing a coin: Possible outcomes are “head” and “tail”
• Does the Drosophila have a mutation that causes white eyes? Possible outcomes are “yes” or “no”.
Assume a Bernoulli experiment (for example tossing a coin) with success probability p is repeated n times
independently.
What is the probability that it...

1. ...always succeeds?
   p · p · p · · · p = p^n

2. ...always fails?
   (1 − p) · (1 − p) · · · (1 − p) = (1 − p)^n

3. ...first succeeds k times and then fails n − k times?
   p^k · (1 − p)^(n−k)

4. ...succeeds in total k times and fails the other n − k times?
   (n choose k) · p^k · (1 − p)^(n−k)

Note: (n choose k) = n! / (k! · (n − k)!) is the number of possibilities to choose the k successful trials among the n trials.

Binomial distribution
Let X be the number of successes in n independent trials with success probability p each. Then

Pr(X = k) = (n choose k) · p^k · (1 − p)^(n−k)

holds for all k ∈ {0, 1, . . . , n}, and X is said to be binomially distributed, for short:

X ∼ bin(n, p).

[Figure: the probabilities of bin(n = 10, p = 0.2) and of bin(n = 100, p = 0.2), plotted against k]

With the binomial distribution we can treat our initial question


Assume in a small population of n = 100 individuals the neutral allele A has a frequency of 0.3.
How probable is it that X, the frequency of A in the next generation, is 0.32?

Pr(X = 0.32) =?
We can only answer this on the basis of a probabilistic model, and the answer will depend on how we
model the population.

Modeling approach
We make a few simplifying assumptions:
• Discrete generations

• The population is haploid, that is, each individual has exactly one parent in the generation before.
• constant population size n = 100

Pr(X = 0.32) still depends on whether few individuals have many offspring or whether all individuals
have similar offspring numbers. Pr(X = 0.32) is only defined with additional assumptions, e.g.:
• Each individual chooses its parent purely randomly in the generation before.

“purely randomly” means independently of all others, and each potential parent is chosen with the same probability.
Our assumptions imply that each individual of the next generation has a probability of 0.3 of obtaining
allele A, and that the individuals get their alleles independently of each other.
Therefore, the number K of individuals who get allele A is binomially distributed with n = 100 and
p = 0.3:
K ∼ bin(n = 100, p = 0.3)

For X = K/n it follows:

Pr(X = 0.32) = Pr(K = 32) = (100 choose 32) · p^32 · (1 − p)^(100−32)
             = (100 choose 32) · 0.3^32 · 0.7^68 ≈ 0.078
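In R this probability is obtained directly with dbinom():

dbinom(32, size=100, prob=0.3)    # approx. 0.078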

Some of the things you should be able to explain

• Concepts of events, random variables and probabilities, and their notations


• Inclusion-exclusion formula
• How to apply a function to a random variable

• Conditional probabilities
• Stochastic independence of events, and of random variables
• Bayes formula and how to apply it

• Binomial distribution and (n choose k)




4 Expected value
Example: genetic and environmental effects
Example: In a population on a continent, the skin pigmentation S of an individual depends on

• genetic effects G
• environmental effects E (e.g. due to local amount of sunshine)
• random effects R

Simple Model:
S =G+E+R
S, G, E, R are random variables if they refer to a randomly chosen individual from the population.

Question
Is the population mean of S the sum of the population means of G, E and R?

We need to formalize what population mean means.


General concept: The expected value of a random variable.

Definition 2 (Expected value) Let X be a random variable with finite or countable state space S = {x1 , x2 , x3 , . . . } ⊆
R. The expected value of X is defined by

EX = Σ_{x∈S} x · Pr(X = x)

It is also common to write µX instead of EX.

If we replace probabilities by relative frequencies in this definition, we get the formula for the mean value (of
a sample).


Examples:
• Let X be Bernoulli distributed with success probability p ∈ [0, 1]. Then we get

EX = 1 · Pr(X = 1) + 0 · Pr(X = 0) = Pr(X = 1) = p

• Let W be the result of rolling a die. Then we get

EW = 1 · Pr(W = 1) + 2 · Pr(W = 2) + . . . + 6 · Pr(W = 6)
   = 1 · 1/6 + 2 · 1/6 + . . . + 6 · 1/6 = 21/6 = 3.5

Calculating with expectations

Theorem 1 (Linearity of Expectation) If X and Y are random variables with values in R and if a ∈ R, we
get:
• E(a · X) = a · EX
• E(X + Y ) = EX + EY

Theorem 2 (Only if independent!) If X and Y are stochastically independent random variables with val-
ues in R, we get
• E(X · Y ) = EX · EY .

But in general E(X · Y ) ≠ EX · EY . Example:

E(W · W ) = 91/6 ≈ 15.167 > 12.25 = 3.5 · 3.5 = EW · EW

Proof of E(a · X) = a · EX:

Let S be the state space of X and let R = {a · x | x ∈ S} = {y | y/a ∈ S} be the state space of a · X. Then

E(a · X) = Σ_{y∈R} y · Pr(a · X = y)
         = Σ_{y∈R} y · Pr(X = y/a)
         = a · Σ_{y∈R} (y/a) · Pr(X = y/a)
         = a · Σ_{x∈S} x · Pr(X = x)
         = a · EX

Theorem 3 If X is a random variable with finite state space S ⊂ R, and if f : R → R is a function, we obtain

E(f (X)) = Σ_{x∈S} f (x) · Pr(X = x)

Exercise: prove this.


With this, the proof of E(a · X) = a · EX can be written as follows:

Let S be the state space of X and define f (x) = a · x. Then

E(a · X) = E(f (X)) = Σ_{x∈S} f (x) · Pr(X = x)
         = Σ_{x∈S} a · x · Pr(X = x)
         = a · Σ_{x∈S} x · Pr(X = x)
         = a · EX

If X and Y are random variables, and Y has a countable state space S, then

Pr(X = x) = Pr(X = x, Y ∈ S) = Σ_{y∈S} Pr(X = x, Y = y).

We will use this in the next proof.


Proof of E(X + Y ) = EX + EY : To simplify notation we assume that X and Y have the same state space S.
We apply the same theorem as before, this time with f (x, y) = x + y, and obtain:

E(X + Y ) = E(f (X, Y )) = Σ_{(x,y)∈S²} f (x, y) · Pr((X, Y ) = (x, y))
          = Σ_{x∈S} Σ_{y∈S} (x + y) · Pr(X = x, Y = y)
          = Σ_{x∈S} Σ_{y∈S} x · Pr(X = x, Y = y) + Σ_{y∈S} Σ_{x∈S} y · Pr(X = x, Y = y)
          = Σ_{x∈S} x · Σ_{y∈S} Pr(X = x, Y = y) + Σ_{y∈S} y · Σ_{x∈S} Pr(X = x, Y = y)
          = Σ_{x∈S} x · Pr(X = x) + Σ_{y∈S} y · Pr(Y = y)
          = E(X) + E(Y )

Proof of the product formula: Let S be the state space of X and Y , and let X and Y be (stochastically)
independent.

E(X · Y ) = Σ_{x∈S} Σ_{y∈S} (x · y) Pr(X = x, Y = y)
          = Σ_{x∈S} Σ_{y∈S} (x · y) Pr(X = x) Pr(Y = y)
          = Σ_{x∈S} x Pr(X = x) · Σ_{y∈S} y Pr(Y = y)
          = EX · EY

Expectation of the binomial distribution


Let Y1 , Y2 , . . . , Yn be the indicator variables of the n independent trials, that is,

Yi = 1 if trial i succeeds,   Yi = 0 if trial i fails.

Then the Yi are Bernoulli distributed and X = Y1 + · · · + Yn is binomially distributed with parameters (n, p),
where p is the success probability of the trials.

Linearity of expectation implies:

EX = E(Y1 + · · · + Yn )
= EY1 + · · · + EYn
= p + · · · + p = np

Thus, we obtain:
X ∼ bin(n, p) ⇒ EX = n · p

Probability distributions on continuous ranges are defined by densities instead of probabilities of single
values. Compare, e.g.:
p(k) = (100 choose k) · 0.3^k · 0.7^(100−k)          f (x) = e^(−(x−30)²/42) / √(42 · π)

[Figure: the probabilities p(k), plotted against k, next to the density f (x), plotted against x]

In this case, the sum in the definition of E turns into an integral:


E(K) = Σ_k k · Pr(K = k)          E(X) = ∫ x · f (x) dx

The calculation rules for E still apply in the continuous case.

5 Variance and Correlation


Question: (for skin pigmentation example)
How does the standard deviation of S depend on the standard deviations of G, E and R?
How to infer σS , σG , σE and σR ?
σS can be estimated from individuals sampled from the whole population (same probability for each
individual).
σR can in principle be estimated from genetically identical individuals living in the same environment.
But how to measure σG and σE ?

Definition 4 (Variance, Covariance and Correlation) The Variance of an R-valued random variable X is

VarX = σX² = E[(X − EX)²].

σX = √(Var X) is the Standard Deviation.
If Y is another R-valued random variable,

Cov(X, Y ) = E[(X − EX) · (Y − EY )]

is the Covariance of X and Y .

The Correlation of X and Y is

Cor(X, Y ) = Cov(X, Y ) / (σX · σY ).

The Variance

VarX = E[(X − EX)²]

is the average squared deviation from the expectation.


The Correlation

Cor(X, Y ) = Cov(X, Y ) / (σX · σY )

is always in the range from −1 to 1. The random variables X and Y are
• positively correlated, if X and Y tend to be both above average or both below average.
• negatively correlated, if X and Y tend to deviate from their expected values in opposite directions.
If X and Y are independent, they are also uncorrelated, that is, Cor(X, Y ) = 0.

Example: rolling dice


Variance of the result of rolling a die W :

Var(W ) = E[(W − EW )²]
        = E[(W − 3.5)²]
        = (1 − 3.5)² · 1/6 + (2 − 3.5)² · 1/6 + . . . + (6 − 3.5)² · 1/6
        = 17.5/6 ≈ 2.9167
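The same computation in R (a one-line check of the value above):

w <- 1:6
sum((w - mean(w))^2 / 6)     # 17.5/6, approx. 2.9167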

Example: Empirical Distribution


If x1 , . . . , xn ∈ R are data and if X is the result of a random draw from the data (such that Pr(X = x) = nx /n,
where nx is the number of xi that are equal to x, formally nx = |{i : xi = x}|), we get:

EX = Σ_x x · nx /n = (1/n) Σ_x x · nx = (1/n) Σ_{i=1}^n xi = x̄

and

Var X = E[(X − EX)²] = (1/n) Σ_{i=1}^n (xi − x̄)²

If (x1 , y1 ), . . . , (xn , yn ) ∈ R × R are data and if (X, Y ) is drawn from the data such that
Pr((X, Y ) = (x, y)) = |{i : (xi , yi ) = (x, y)}| / n, we get

Cov(X, Y ) = E[(X − EX)(Y − EY )] = (1/n) Σ_{i=1}^n (xi − x̄)(yi − ȳ)
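Note that R's built-in var() and cov() divide by n − 1 instead of n, so they differ slightly from the formulas above. A small sketch with made-up numbers shows the relation:

x <- c(1, 3, 5, 7); y <- c(2, 3, 7, 8)        # hypothetical data
n <- length(x)
mean((x - mean(x)) * (y - mean(y)))           # covariance with denominator n, as in the formula: 5.5
cov(x, y) * (n - 1) / n                       # R's cov() rescaled gives the same value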

Why Cov(X, Y ) = E([X − EX][Y − EY ])?

[Figures: scatter plots illustrating that (X − EX) · (Y − EY ) is positive in the upper-right and lower-left
quadrants around (EX, EY ) and negative in the other two quadrants, together with example data sets:
Cov(X, Y ) = 1.11;  Cov(X, Y ) = −0.78;
σX = 0.95, σY = 0.92, Cov(X, Y ) = −0.06, Cor(X, Y ) = −0.069;
σX = 1.14, σY = 0.78, Cov(X, Y ) = 0.78, Cor(X, Y ) = 0.71;
σX = 1.03, σY = 0.32, Cov(X, Y ) = 0.32, Cor(X, Y ) = 0.95;
σX = 1.13, σY = 1.2, Cov(X, Y ) = −1.26, Cor(X, Y ) = −0.92;
σX = 0.91, σY = 0.88, Cov(X, Y ) = 0, Cor(X, Y ) = 0]

Calculation rules for Covariances


Cov(X, Y ) = E[(X − EX) · (Y − EY )]

• If X and Y are independent, then Cov(X, Y ) = 0 (but not the other way around!)

• Cov(X, Y ) = Cov(Y, X)
• Cov(X, Y ) = E(X · Y ) − EX · EY (Exercise!)
• Cov(a · X, Y ) = a · Cov(X, Y ) = Cov(X, a · Y )

• Cov(X + Z, Y ) = Cov(X, Y ) + Cov(Z, Y )


• Cov(X, Z + Y ) = Cov(X, Z) + Cov(X, Y )

The last three rules describe the bilinearity of covariance.

Calculation rules for Correlations


Cor(X, Y ) = Cov(X, Y ) / (σX · σY )

• −1 ≤ Cor(X, Y ) ≤ 1
• Cor(X, Y ) = Cor(Y, X)
• Cor(X, Y ) = Cov(X/σX , Y /σY )

• Cor(X, Y ) = 1 if and only if Y is an increasing, affine-linear function of X, that is, if Y = a · X + b
for appropriate a > 0 and b ∈ R.
• Cor(X, Y ) = −1 if and only if Y is a decreasing, affine-linear function of X, that is, if Y = a · X + b
for appropriate a < 0 and b ∈ R.

Calculation rules for variances


VarX = E[(X − EX)2 ]

• VarX = Cov(X, X)
• VarX = E(X 2 ) − (EX)2 (Exercise!)

• Var(a · X) = a2 · VarX
• Var(X + Y ) = VarX + VarY + 2 · Cov(X, Y )
• Var(Σ_{i=1}^n Xi ) = Σ_{i=1}^n Var Xi + 2 · Σ_{j=1}^n Σ_{i=1}^{j−1} Cov(Xi , Xj )

• If X and Y are stochastically independent, we get:

Var(X + Y ) = VarX + VarY

Question for skin pigmentation example: How does the standard deviation of S depend on the standard
deviations of G, E and R?
Answer: σS = √(Var(S)), and

Var(S) = Var(G) + Var(E) + Var(R) + 2 · Cov(G, E) + 2 · Cov(G, R) + 2 · Cov(E, R)

Perhaps we may assume Cov(G, R) = Cov(E, R) = 0, but Cov(G, E) > 0 is plausible, as individuals
who live in sunnier areas may have genes for darker pigmentation.

So, how to measure σG and σE ?
(at least in principle)
Var(R): infer from genetically identical individuals in the same environment
Var(G + R): infer from individuals sampled from the whole population but exposed to the same environment
Var(E + R): infer from genetically identical individuals exposed to random environments
If Cov(G, R) = Cov(E, R) = 0, then

σG = √(Var(G + R) − Var(R)) and
σE = √(Var(E + R) − Var(R)).

With these rules we can prove:

Theorem 4 If X1 , X2 , . . . , Xn are independent R-valued random variables with expected value µ and variance σ²,
we get for X̄ = (1/n) Σ_{i=1}^n Xi :

EX̄ = µ

and

Var X̄ = σ²/n,

that is,

σX̄ = σ/√n

In particular: The standard error s/√n is an estimator of the standard deviation σX̄ of the sample mean X̄ of
(X1 , X2 , . . . , Xn ).
The sample standard deviation s is an estimator of the standard deviation σ in the entire population.

Proof : Linearity of the expected value implies

EX̄ = E((1/n) Σ_{i=1}^n Xi ) = (1/n) Σ_{i=1}^n E(Xi ) = (1/n) Σ_{i=1}^n µ = µ.

The independence of the Xi helps to simplify the variance:

Var X̄ = Var((1/n) Σ_{i=1}^n Xi ) = (1/n²) Var(Σ_{i=1}^n Xi )
       = (1/n²) Σ_{i=1}^n Var Xi = (1/n²) Σ_{i=1}^n σ² = (1/n) σ²
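Theorem 4 can be illustrated by simulation in R (a sketch with arbitrarily chosen µ = 10, σ = 2 and n = 25):

set.seed(1)
n <- 25; sigma <- 2
xbars <- replicate(1e4, mean(rnorm(n, mean=10, sd=sigma)))    # many sample means
sd(xbars)                                                     # close to sigma/sqrt(n) = 0.4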

Bernoulli distribution
A Bernoulli distributed random variable Y with success probability p ∈ [0, 1] has expected value

EY = p

and variance
Var Y = p · (1 − p)

Proof : From Pr(Y = 1) = p and Pr(Y = 0) = 1 − p follows

EY = 1 · p + 0 · (1 − p) = p.

Variance:

Var Y = E(Y ²) − (EY )²
      = 1² · p + 0² · (1 − p) − p² = p · (1 − p)

Binomial distribution
Let Y1 , . . . , Yn be independent Bernoulli distributed random variables with success probability p. Then

Σ_{i=1}^n Yi =: X ∼ bin(n, p)

and we get:

Var X = Var(Σ_{i=1}^n Yi ) = Σ_{i=1}^n Var Yi = n · p · (1 − p)

Binomial distribution

Theorem 5 (Expected value and variance of the binomial distribution) If X is binomially distributed
with parameters (n, p), we get:
EX = n · p
and
Var X = n · p · (1 − p)

Example: Genetic Drift


In a haploid population of n individuals, let p be the frequency of some allele A. We assume (under
simplifying assumptions as above) that the absolute frequency K of A in the next generation is (n, p)-
binomially distributed.
For X = K/n, the relative frequency in the next generation, it follows that

Var(X) = Var(K/n) = Var(K)/n² = n · p · (1 − p)/n² = p · (1 − p)/n

Example: Genetic Drift


If we consider the change of allele frequencies over m generations, the variances add up. If m is a
small number, such that p will not change much over the m generations, the variance of the change of allele
frequencies is approximately

m · Var(X) = m · p · (1 − p)/n

(because the changes per generation are independent of each other), and thus the standard deviation is
about

√(m · p · (1 − p)/n)
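For example, with n = 100, p = 0.3 and m = 5 generations this standard deviation is about 0.1 (a quick check in R):

n <- 100; p <- 0.3; m <- 5
sqrt(m * p * (1 - p) / n)     # approx. 0.102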

Some of the things you should be able to explain

• Definitions of E, Var, Cov, Cor for random variables


• Calculation rules for E, Var, Cov, Cor and how to use them

• Difference between correlation and stochastic dependence


• E and Var (and SD) of the binomial distribution
• how genetic drift depends on population size and allele frequency

• basic principles and ideas of the proofs in this section

6 Applications in Quantitative Genetics


Quantitative Traits

continuous traits: weight, size, growth rate. . .


discrete traits: number of offspring, bristle number,. . .

traits with quantitative thresholds: environment and genes determine whether a character is ex-
pressed

Quantitative Genetics

• natural selection needs phenotypic variation to operate


• many traits are influenced by few major and many minor genes
• Q.G. has been successfully applied in animal and plant breeding
• application to evolutionary and ecological processes not trivial

• no exact knowledge of genetic mechanisms, rather statistical approach


• QTL analysis to search for genomic regions that influence a trait

Aims for now

• use formulas for Var and Cov to understand how


– natural variation and
– correlation of a trait with fitness
– heritability of the trait

influence the effect of selection


• based on these theoretical considerations: how can the effect of selection be predicted from data?
• Results will be summarized in

– Robertson-Price identity
– breeder’s equation

Recommended Books

References
[LW98] M. Lynch, B. Walsh (1998) Genetics and Analysis of Quantitative Traits Sinauer Associates,
Inc., Sunderland, MA, USA
[BB+07] N.H. Barton, D.E.G. Briggs, J.A. Eisen, D.B. Goldstein, N.H. Patel (2007) Evolution Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA

Selection on quantitative trait


Parent population before and after selection
[Figure: parent population before and after selection; survival probability as a function of the phenotype
(size in cm), with histograms of all individuals and of the survivors. Next generation: ???]

Origin of the word “Regression”


Sir Francis Galton (1822–1911): Regression toward the mean.

Tall fathers tend to have sons that are slightly shorter than their fathers.
Sons of short fathers are on average taller than their fathers.
[Figure: three scatter plots of son’s body height against father’s body height, illustrating regression toward the mean]

Similar effects

• In sports: The champion of the season will tend to fall short of the high expectations in the next year.

• In school: If the worst 10% of the students get extra lessons and are not the worst 10% in the next
year, then this does not prove that the extra lessons are useful.

Phenotype vs. genotype of survivors


[Figure: histograms of the phenotype before selection, the phenotype of the survivors, and the genotype of
the survivors, together with the survival probability (%) as a function of size (cm); produced by the R code below]

genotype <- rnorm(1000, 10, 1.5)        # genotypic values
environment <- rnorm(1000, 0, 1.5)      # environmental deviations
phenotype <- genotype + environment     # simple additive model

hist(phenotype, col="lightblue", breaks=4:36/2)

survival.prob <- function(x) {          # survival probability as a function of the phenotype
    1 - 1/(1 + exp(-x + 7))
}
lines(20:180/10, survival.prob(20:180/10)*100, lwd=2)    # survival probability, scaled to percent

survivors <- rbinom(1000, size=1, prob=survival.prob(phenotype))

hist(phenotype[survivors==1], add=TRUE, col="blue", breaks=4:36/2)
hist(genotype[survivors==1], add=TRUE, col="orange", breaks=4:36/2)

[Figure: scatter plot of phenotype against genotype for all individuals and for the survivors]

Parent population before and after selection


[Figure: parent population before and after selection (survival probability, all individuals, survivors), and the
phenotype distribution of the next generation]

Classical estimates of heritabilities, after Falconer (1981) Introduction to Quantitative Genetics

Species   Trait                  Heritability
humans    stature                0.65
humans    serum immunoglobulin   0.45
cattle    body weight            0.65
cattle    milk yield             0.35
poultry   body weight            0.40
poultry   egg production         0.10

We will now derive two equations:


(Robertson-)Price equation: how selection shifts the mean phenotype (within the same generation)

breeders’ equation:
• Predict the change from one generation to the next
• Account for selection and heritability
• Use a measure of heritability that can be estimated from parent-offspring comparisons

µ    mean phenotype before selection
µs   mean phenotype after selection but before reproduction
S = µs − µ    directional selection differential
µo   mean phenotype in the offspring generation
∆µ = µo − µ
W (z)    individual fitness: probability that an individual with phenotype z will survive to reproduce
p(z)     density of phenotype z before selection
W̄ = ∫ W (z) · p(z) dz    mean individual fitness
w(z) = W (z)/W̄    relative individual fitness
ps (z) = w(z) · p(z)    density of phenotype z after selection but before reproduction (density in a stochastic
sense, i.e. it integrates to 1)
Let Z be the phenotype of an individual drawn randomly from the parent population before selection. Then
µ = EZ and E(w(Z)) = 1, and

µS = ∫ z · pS (z) dz = ∫ z · w(z) · p(z) dz = E(Z · w(Z))

⇒ S = µS − µ = E(Z · w(Z)) − E(Z) · E(w(Z)) = Cov(Z, w(Z))

Thus, we obtain:

Theorem 6 (Robertson-Price identity; Robertson 1966; Price 1970/72)

S = Cov(Z, w(Z))
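The identity can be illustrated numerically. The following R sketch assumes a normally distributed phenotype and reuses the survival function from the simulation code above; both ways of computing the selection differential give the same value:

set.seed(1)
z <- rnorm(1e5, mean=10, sd=2)             # phenotypes before selection (assumed distribution)
W <- 1 - 1/(1 + exp(-z + 7))               # individual fitness W(z), as in the simulation above
w <- W / mean(W)                           # relative fitness w(z)
weighted.mean(z, W) - mean(z)              # S: shift of the mean phenotype due to selection
mean((z - mean(z)) * (w - mean(w)))        # Cov(Z, w(Z)) (denominator n): the same value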
Assume we can partition the phenotypic value Z into a genotypic value G and an environmental (or
random) deviation E:

Z = G + E

Then

Cov(Z, G) = Cov(G + E, G) = Var(G) + Cov(E, G)

and

Cor(Z, G) = (Var(G) + Cov(G, E)) / (σG · σZ ).

In the special case of Cov(G, E) = 0, we obtain for the genetic contribution to the phenotypic variance

Cor²(G, Z) = Var(G) / Var(Z).

(Note that if Cov(G, E) = 0, then Var(Z) = Var(G) + Var(E).)
Note that if E is really due to environmental effects, Cov(G, E) may not be 0 if the population is
genetically and spatially structured (and for many other possible reasons).

In any case,

Var(G) / Var(Z) =: H²

is called heritability in the broad sense.

Problem: Var(G) and thus also H 2 are parameters that are hard to estimate.

narrow-sense heritability
Let Zm , Zf , Zo be the phenotypes of a mother, a father and one of their offspring, sampled as a triplet
from the population. The narrow-sense heritability h² is defined by

h² := Cov((Zm + Zf )/2, Zo ) / Var((Zm + Zf )/2).

It is the slope of the regression line that predicts Zo from the mid-parental phenotype (Zm + Zf )/2 and can
be estimated from a sample of many parent-offspring triplets.
We will see later in this semester: the line that predicts Y from X has slope Cov(X, Y )/Var(X).
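A minimal sketch of this estimation with simulated, purely hypothetical parent-offspring data; the regression slope recovers the h² built into the simulation:

set.seed(1)
n <- 500
zm <- rnorm(n, 10, 2); zf <- rnorm(n, 10, 2)        # parental phenotypes (hypothetical)
midparent <- (zm + zf) / 2
zo <- 5 + 0.5 * midparent + rnorm(n, 0, 1.5)        # offspring phenotypes, true slope h^2 = 0.5
coef(lm(zo ~ midparent))[2]                         # estimated h^2, close to 0.5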

[Figure: regression line of the offspring phenotype on the mid-parental phenotype, with intercept a]

If there was selection, then:

µo = EZo = E(a + h² · (Zm + Zf )/2) = a + h² · E((Zm + Zf )/2) = a + h² · µS

If the values Z̃m , Z̃f , Z̃o stem from a population with no selection, we assume that the mean phenotype
is the same in the two generations:

µ = EZ̃o = a + h² · E((Z̃m + Z̃f )/2) = a + h² · µ

This implies: ∆µ = µo − µ = (a + h² · µS ) − (a + h² · µ) = h² · (µS − µ) = h² · S

Theorem 7 (breeders’ equation)


∆µ = h2 S

Equivalent definition of h²
Assume that Zm and Zf are independent and have the same distribution as Z. Then

Var((Zm + Zf )/2) = (1/4) Var(Zm + Zf ) = (1/4)(Var(Zm ) + Var(Zf )) = (1/2) Var(Z),

and

Cov((Zm + Zf )/2, Zo ) = (1/2) Cov(Zm + Zf , Zo ) = (Cov(Zm , Zo ) + Cov(Zf , Zo ))/2.

And thus

h² = Cov((Zm + Zf )/2, Zo ) / Var((Zm + Zf )/2) = (Cov(Zm , Zo ) + Cov(Zf , Zo )) / Var(Z)

Equivalent definition of h² under certain assumptions
Let Gm and Gf be the phenotypic effects of the genes transmitted by the mother and by the father to
the offspring.
If mating is random, if there are no correlations (between parental genotypes and environmental
effects etc.), and if genetic effects are additive, we obtain

Cov((Zm + Zf )/2, Zo ) = Cov((Gm + Gf )/2, Gm + Gf ) = (VarGm + VarGf )/2,

and thus

h² = Cov((Zm + Zf )/2, Zo ) / ((1/2) Var(Z)) = (VarGm + VarGf ) / Var(Z)

How to define h2 if genetic effects are not additive


Let A and B be the alleles at one locus and let zAA , zAB and zBB be the average phenotypes of
individuals with genotypes AA, AB and BB.
What if zAB ≠ (zAA + zBB )/2?
Then, decompose the genetic effects for each (u, v) ∈ {(A, A), (A, B), (B, B)} as follows:

zuv = µ + G(u) + G(v) + D(u, v)

by setting these components as follows, where p is the population frequency of A:

µ = p² · zAA + 2p(1 − p) · zAB + (1 − p)² · zBB
G(A) = p · zAA + (1 − p) · zAB − µ
G(B) = p · zAB + (1 − p) · zBB − µ
D(u, v) = zuv − µ − G(u) − G(v)

If U and V are sampled independently from {A, B} according to the population allele frequencies p
and 1 − p, we obtain that G(U ), G(V ) and D(U, V ) are random variables with the following properties:

• their expected values EG(U ), EG(V ) and ED(U, V ) are 0.


• G(U ), G(V ) and D(U, V ) are uncorrelated

see e.g. Felsenstein (2019+) Theoretical Evolutionary Genetics, https://evolution.gs.washington.edu/pgbook/pgbook.pdf

G(A) and G(B) are called additive effects


D(u, v) is called dominance deviation

Assume now we have n unlinked loci with additive effects G1 (·), G2 (·), . . . , Gn (·) and dominance
deviations D1 (·, ·), D2 (·, ·), . . . , Dn (·, ·), and the effects are additive among the loci, that is, there is no
epistasis. (Otherwise, for how to separate additive from non-additive locus interactions, see e.g. Falconer,
Mackay (1996) Introduction to Quantitative Genetics, 4th ed.)
Then the phenotypic variance is the sum of
the so-called additive variance VA = Var(G1 (U1 ) + G1 (V1 ) + G2 (U2 ) + G2 (V2 ) + · · · + Gn (Un ) + Gn (Vn )),
the so-called dominance variance VD = Var(D1 (U1 , V1 ) + D2 (U2 , V2 ) + · · · + Dn (Un , Vn )),
and the environmental variance VE .

We can then define the narrow-sense heritability as the fraction of the phenotypic variation that is due to
additive genetic effects,

h² = VA / (VA + VD + VE ),

and this is still Cov((Zm + Zf )/2, Zo ) / Var((Zm + Zf )/2), see e.g. Felsenstein (2019+) for details.

Example

References
[1] Galen (1996) Rates of floral evolution: adaptation to bumblebee pollination in an alpine wildflower,
Polemonium viscosum Evolution 50(1): 120–125

• long-term experiment, trait is corolla flare

• S was measured as
– 7% when estimated from number of seeds
– 17% when estimated from number of surviving offspring after 6 years
• h2 ≈ 1

• Change of trait 9% in one generation

Some of the things you should be able to explain

• Robertson-Price identity

• Breeder’s equation
• Why is the narrow-sense heritability and not the broad-sense heritability used in the breeder’s
equation
• Connection between narrow-sense heritability and additive genetic effects

– the definition of additive genetic effects depends on the population allele frequencies

– additivity between loci is still required

7 Application Example about Codon Bias


References
[1] E.N. Moriyama (2003) Codon Usage. In: Encyclopedia of the Human Genome. Macmillan Publishers Ltd.

Moriyama (2003) [1] examines 9497 human genes for codon bias.

In these genes the amino acid proline is coded 16710 times by the codon CCT and 18895 times by the
codon CCC.
Does it depend only on pure randomness which codon is used?

Then the number X of CCC codons would be binomially distributed with p = 1/2 and n = 16710 + 18895 =
35605. So assume the number X (= 18895) of CCC is binomially distributed with p = 1/2 and n = 35605. Then

EX = n · p = 17802.5
σX = √(n · p · (1 − p)) ≈ 94.34
18895 − 17802.5 = 1092.5 ≈ 11.6 · σX
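The same numbers in R (a short check of the calculation above):

n <- 16710 + 18895; p <- 0.5
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
c(mu, sigma, (18895 - mu) / sigma)    # 17802.5, about 94.3, and a deviation of about 11.6 sigma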
Does this look purely random?

The question is:

How small is the probability of a deviation from the expectation of at least ≈ 11.6 · σX , if everything is purely random?

We have to calculate

Pr(|X − EX| ≥ 11.6 · σX ).

A problem with the binomial distribution is that calculating (n choose k) precisely is slow for large n. Therefore:

The binomial distribution can be approximated by other distributions.

8 Normal distribution

A binomial distribution with large n looks like a normal distribution:


0.025

0.025


●●
●● ●
●●
●●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
dbinom(400:600, 1000, 0.5)

dbinom(400:600, 1000, 0.5)


0.020

0.020

● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
0.015

0.015

● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
0.010

0.010

● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
0.005

0.005

● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
●● ●● ●● ●●
●● ●● ●● ●●
●● ●● ●● ●●
0.000

0.000


●● ●●
● ●
●● ●●


●●
●●
● ●
●●
●●
● ●
●●
●●
● ●
●●
●●


●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
● ●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
● ●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
● ●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●
●●

400 450 500 550 600 400 450 500 550 600

400:600 400:600

Density of the standard normal distribution

A random variable Z with the density

f (x) = (1/√(2π)) · e^(−x²/2)       (“Gaussian bell curve”)

is called standard-normally distributed; for short: Z ∼ N (0, 1). It has EZ = 0 and Var Z = 1.

[Figure: density of the standard normal distribution]
If Z is N (0, 1) distributed, then X = σ · Z + µ is normally distributed with mean µ and variance σ 2 , for short:

X ∼ N (µ, σ 2 )

X has the density

f (x) = (1/(σ√(2π))) · e^(−(x−µ)²/(2σ²)).

Always have in mind:

If Z ∼ N (µ, σ²), we get:
• Pr(|Z − µ| > σ) ≈ 33%
• Pr(|Z − µ| > 1.96 · σ) ≈ 5%
• Pr(|Z − µ| > 3 · σ) ≈ 0.3%

[Figure: the density f (x) = (1/(σ√(2π))) · e^(−(x−µ)²/(2σ²)) with µ − σ, µ and µ + σ marked on the x-axis]

Densities need Integrals

If Z is a random variable with density f (x),

[Figure: a density curve with the area under the curve between a and b shaded]

we get

Pr(Z ∈ [a, b]) = ∫_a^b f (x) dx.

Note: the probability density f is not the probability distribution of Z, but the probability distribution

A ↦ Pr(Z ∈ A)

can be calculated from the probability density:

A ↦ Pr(Z ∈ A) = ∫_A f (x) dx

Question: How to compute Pr(Z = 5)?

Answer: For each x ∈ R we have Pr(Z = x) = 0 (area of width 0).

What happens with EZ = Σ_{x∈S} x · Pr(Z = x) ?

For a continuous random variable with density f we define:

EZ := ∫_{−∞}^{∞} x · f (x) dx

The E-based definitions of Var, Cov, Cor still apply, e.g.:

Var(Z) = E(Z − EZ)2 = EZ 2 − (EZ)2

The normal distribution in R


dnorm(): density of the normal distribution
rnorm(): drawing a random sample
pnorm(): cumulative distribution function of the normal distribution
qnorm(): quantile function of the normal distribution

example: density of the standard normal distribution:


> plot(dnorm,from=-4,to=4)

[Figure: plot of the standard normal density from −4 to 4]

> dnorm(0)
[1] 0.3989423
> dnorm(0,mean=1,sd=2)
[1] 0.1760327

example: drawing a sample

Draw a sample of length 6 from the standard normal distribution:

> rnorm(6)
[1] -1.24777899  0.03288728  0.19222813  0.81642692 -0.62607324 -1.09273888

Draw a sample of length 7 from a normal distribution with expected value 5 and standard deviation 3:

> rnorm(7,mean=5,sd=3)
[1]  2.7618897  6.3224503 10.8453280 -0.9829688  5.6143127  0.6431437  8.123570

example: Computing probabilities: Let Z ∼ N (µ = 0, σ² = 1) be standard normally distributed.

Pr(Z < a) can be computed in R by pnorm(a):

> pnorm(0.5)
[1] 0.6914625

[Figure: standard normal density with the area to the left of 0.5 shaded]

example: Computing probabilities: Let Z ∼ N (µ = 5, σ² = 2.25).

Computing Pr(Z ∈ [3, 4]):

Pr(Z ∈ [3, 4]) = Pr(Z < 4) − Pr(Z < 3)

> pnorm(4,mean=5,sd=1.5)-pnorm(3,mean=5,sd=1.5)
[1] 0.1612813

example: Computing quantiles: Let Z ∼ N (µ = 0, σ² = 1) be standard normally distributed. For which
value z does Pr(|Z| > z) = 5% hold?

[Figure: standard normal density with 2.5% shaded in each tail]

From the symmetry around the y-axis it follows that

Pr(|Z| > z) = Pr(Z < −z) + Pr(Z > z) = 2 · Pr(Z < −z)

So find z > 0 such that Pr(Z < −z) = 2.5%:

> qnorm(0.025,mean=0,sd=1)
[1] -1.959964

Answer: z ≈ 1.96, just below 2 standard deviations.

9 Normal approximation
Normal approximation
For large n and p which are not too close to 0 or 1, we can approximate the binomial distribution by a normal
distribution with the corresponding mean and variance.

If X ∼ bin(n, p) and Z ∼ N (µ = n · p, σ 2 = n · p · (1 − p)), we get

Pr(X ∈ [a, b]) ≈ Pr(Z ∈ [a, b])

(rule of thumb: Usually okay if n · p · (1 − p) ≥ 9)
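A quick numerical comparison in R for n = 1000, p = 0.5 (an added sketch; the two values are close):

n <- 1000; p <- 0.5
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
sum(dbinom(470:530, n, p))                        # exact binomial probability of X in [470, 530]
pnorm(530, mu, sigma) - pnorm(470, mu, sigma)     # normal approximation of the same probability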


n = 1000, p = 0.5, n · p · (1 − p) = 250

[Figure: dbinom(400:600, 1000, 0.5) together with the approximating normal density]

n = 10, p = 0.2, n · p · (1 − p) = 1.6

[Figure: dbinom(0:10, 10, 0.2) together with the approximating normal density]
Theorem 8 (Central Limit Law) If the R-valued random variables X1 , X2 , . . . are independent and identically
distributed with finite variance 0 < Var Xi < ∞, and if

Zn := X1 + X2 + · · · + Xn

is the sum of the first n variables, then the centered and rescaled sum is in the limit n → ∞ standard-normally
distributed:

(Zn − EZn ) / √(Var Zn ) ∼ N (µ = 0, σ² = 1)

for n → ∞. Formally: For all −∞ ≤ a < b ≤ ∞,

lim_{n→∞} Pr(a ≤ (Zn − EZn ) / √(Var Zn ) ≤ b) = Pr(a ≤ Z ≤ b),

where Z is a standard-normally distributed random variable.

In other words: For large n,

Zn is approximately N (µ = EZn , σ² = Var Zn )-distributed.


The requirements “independent” and “identically distributed” can be relaxed.

Usually the following holds:
If Y is the sum of many small contributions, most of which are independent of each other, then Y is approximately
normally distributed, that is,

Y is approximately N (µ = EY, σ² = Var Y )-distributed.
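A small R illustration of the central limit law (an added sketch, not from the handout): sums of 20 uniform random variables already look quite normal.

set.seed(1)
z <- replicate(1e4, sum(runif(20)))       # sums of 20 uniform(0,1) random variables
hist(z, breaks=40, freq=FALSE)
curve(dnorm(x, mean=20*0.5, sd=sqrt(20/12)), add=TRUE, lwd=2)    # EZ_n = 10, Var Z_n = 20/12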


10 The z-Test

Back to the example with the proline codons in the human genome.

CCT is used k = 16710 times; CCC is used n − k = 18895 times.

Does this look purely random?

We say: No.

Doubters may say: Just random.

The hypothesis
“purely random” / “no difference”
is called the null hypothesis.
To convince the doubters, we have to find arguments against the null hypothesis. We show: If the null
hypothesis is true the observation is very improbable.

CCT is used k = 16710 times, CCC is used n − k = 18895 times. Under the null hypothesis “just random” the
number X of CCC is bin(n, p)-distributed with n = 35605 and p = 0.5.

Normal approximation: X is approximately N (µ, σ²)-distributed with

µ = n · p = 17802.5 ≈ 17800

and

σ = √(n · p · (1 − p)) ≈ 94.34 ≈ 95

Question: Is it plausible that a random variable X that has taken the value k = 18895 is approximately
N (17800, 95²)-distributed?

If the null hypothesis H0 holds, then

Pr(X = 17800) = 0

But that does not imply anything, because Pr(X = k) = 0 holds for every value k!

Relevant is the probability (assuming the null hypothesis H0 ) that X takes a value at least as extreme as k:

Pr(|X − µ| ≥ |k − µ|) = Pr(|X − µ| ≥ 1092.5) ≈ Pr(|X − µ| ≥ 11.6 · σ)

We have memorized already:

Pr(|X − µ| ≥ 3 · σ) ≈ 0.003

This means that Pr(|X − µ| ≥ 11.6 · σ) must be extremely small.


Indeed:

> 2 * pnorm(18895,mean=17800,sd=95,lower.tail=FALSE)
[1] 9.721555e-31

Without normal approximation:

> pbinom(16710,size=35605,p=0.5) +
+   pbinom(18895-1,size=35605,p=0.5,lower.tail=FALSE)
[1] 5.329252e-31

We can argue that such a deviation from the expected value can hardly be explained by pure chance.

Thus we can reject the null hypothesis “just random” and search for alternative explanations, such as
differences in the efficiency of CCC and CCT or in the availability of C and T.

Summary z-Test
Null hypothesis H0 (what we want to reject): the observed value x comes from a normal distribution with
mean µ and known variance σ².
p-value = Pr(|X − µ| ≥ |x − µ|), where X ∼ N (µ, σ²): the probability of a deviation at least as large as
the observed one.
Significance level α: usually 0.05. If the p-value is smaller than α, we reject the null hypothesis at the
significance level α and look for an alternative explanation.
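The two-sided p-value of the z-test can be wrapped in a small helper function (a sketch, not part of the original handout):

z.test.pvalue <- function(x, mu, sigma) {
    2 * pnorm(abs(x - mu) / sigma, lower.tail=FALSE)    # probability of a deviation at least as large
}
z.test.pvalue(18895, mu=17802.5, sigma=94.34)           # approx. 5e-31, in line with the codon example above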

Limitations of the z-Test
The z-Test can only be applied if the variance of the normal distribution is known or at least assumed to be
known according to the null hypothesis.

This is usually not the case when the normal distribution is applied in statistical tests.

Usually the variance has to be estimated from the data. In this case we have to apply the famous

t-Test
instead of the z-Test.

Some of the things you should be able to explain

• Probability densities and how to get probability distributions from them


• when and how to approximate binomial by normal distribution
• Properties of the normal distribution (µ,σ 2 , important quantiles,. . . )

• normal distribution of a · X + b if X is normally distributed


• meaning of the central limit law
• R commands to deal with probability distributions
• z-test, H0 , p-value, significance level α

Note also the corresponding lists at the ends of the earlier sections.
