Stochbasics Handout
Contents
1 Random Variables and Distributions
4 Expected value
8 Normal distribution
9 Normal approximation
10 The z-Test
You sample an individual from a population and measure its length X.
X is a random variable because it depends on random sampling.
Its expected value is in this case the population mean µ:
EX = µ
1 Random Variables and Distributions
We start with a simpler example: rolling a die. Let W be the result of the next roll.
S = {1, 2, . . . , 6},   Pr(W = 1) = · · · = Pr(W = 6) = 1/6
(that is, Pr(W = x) = 1/6 for all x ∈ {1, . . . , 6})
The distribution of a random variable X assigns to each set A ⊆ S the probability Pr(X ∈ A) that X takes a
value in A.
In general, we use capitals for random variables (X, Y, Z, . . . ), and small letters (x, y, z, . . . ) for (possible) fixed
values.
{X ∈ A}
We can interpret this as the set of results (elementary events) for which the event is fulfilled. The intersection
{X ∈ A} ∩ {X ∈ B} = {X ∈ A, X ∈ B}
U := {X ∈ A}, V := {X ∈ B}
⇒ U ∩ V = {X ∈ A ∩ B}
If A ∩ B = ∅, then
U ∩ V = ∅ = {X ∈ ∅},
where ∅ is the (impossible) empty event (for which we use the same symbol as for the empty set).
In fact, events are (certain) subsets of a so-called sample space Ω. For example, if X is the result of
rolling a die, then
Ω = { {X = 1}, {X = 2}, {X = 3}, {X = 4}, {X = 5}, {X = 6} }
• In cases like this with a finite Ω, all subsets of Ω are also events, and their probabilities are just the
sums of the probabilities of their elements.
• For infinite Ω things become more complicated:
– Events can have non-zero probability even if all their elements have zero probability.
– We cannot assume that all subsets of Ω are events (mathematical details are complicated).
• A probability distribution assigns to each event U ⊆ Ω a probability Pr(U ) that the event takes
place.
Caution:
Pr(W ∈ {2, 3}) + Pr(W ∈ {3, 4}) = 2/6 + 2/6 = 4/6
≠ Pr(W ∈ {2, 3, 4}) = 3/6
Example: rolling two dice (W1, W2): Let W1 and W2 be the results of die 1 and die 2.
Pr(W1 ∈ {4}, W2 ∈ {2, 3, 4})
= Pr((W1 , W2 ) ∈ {(4, 2), (4, 3), (4, 4)})
= 3/36 = 1/6 · 3/6
= Pr(W1 ∈ {4}) · Pr(W2 ∈ {2, 3, 4})
In general:
Pr(W1 ∈ A, W2 ∈ B) = Pr(W1 ∈ A) · Pr(W2 ∈ B)
If S = W1 + W2 is the sum of the results, what is the probability that S = 5, given that die 1 shows W1 = 2?
Pr(S = 5 | W1 = 2) = Pr(W2 = 3) = 1/6 = (1/36) / (1/6) = Pr(S = 5, W1 = 2) / Pr(W1 = 2)
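The two-dice calculations above can be checked by enumerating all 36 equally likely outcomes. Here is a small R sketch (the variable names are ours, not from the handout):

outcomes <- expand.grid(W1 = 1:6, W2 = 1:6)   # all 36 equally likely outcomes

## product rule: Pr(W1 = 4, W2 in {2,3,4}) = Pr(W1 = 4) * Pr(W2 in {2,3,4})
lhs <- mean(outcomes$W1 == 4 & outcomes$W2 %in% c(2, 3, 4))
rhs <- mean(outcomes$W1 == 4) * mean(outcomes$W2 %in% c(2, 3, 4))
c(lhs, rhs)                                   # both are 3/36

## conditional probability: Pr(S = 5 | W1 = 2) = Pr(S = 5, W1 = 2) / Pr(W1 = 2)
S <- outcomes$W1 + outcomes$W2
mean(S == 5 & outcomes$W1 == 2) / mean(outcomes$W1 == 2)   # = 1/6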
Calculation rules:
We consider events from a sample space Ω.
• Ω and the impossible event ∅ are events, and Pr(Ω) = 1 and Pr(∅) = 0.
• If U, V ⊂ Ω are disjoint, that is U ∩ V = ∅ (in other words, they contradict each other), then U ∪ V is also an event and
Pr(U ∪ V) = Pr(U) + Pr(V).
• The conditional probability of U given V (for Pr(V) > 0) is defined as
Pr(U | V) := Pr(U, V) / Pr(V)
How to say
Pr(U, V ) = Pr(V ) · Pr(U |V )
in words:
The probability that both U and V take place can be computed in two steps:
• First, the event V must take place (this is necessary for U ∩ V).
• Then multiply the probability of V by the conditional probability of U, given that V is already known to take place. (The time points at which it turns out that U or V take place are not relevant.)
Stochastic Independence
Two random variables X and Y are (stochastically) independent if the events {X ∈ A} and {Y ∈ B} are stochastically independent
for all possible A and B, that is, if Pr(X ∈ A, Y ∈ B) = Pr(X ∈ A) · Pr(Y ∈ B).
Example:
• Rolling two dice: X = result of die 1, Y = result of die 2.
Pr(X = 2, Y = 5) = 1/36 = 1/6 · 1/6 = Pr(X = 2) · Pr(Y = 5)
If X is a random variable with values in S and f : S → R is a function (or, more generally, a map),
then f (X) is a random variable that depends on X. If X takes the value x, f (X) takes the value f (x).
This implies:
Pr(f (X) ∈ U ) = Pr(X ∈ f −1 (U )),
Where f −1 (U ) is the inverse image of U , that is the set of all x such that f (x) ∈ U , formally:
f −1 (U ) = {x : f (x) ∈ U }
(Note the difference between f⁻¹({y}) and f⁻¹(y). The latter only exists if f is invertible, and is then
a number. The former is a set of numbers. Note also that {y} is not a number but a set containing one
number.)
[Figure: f(X) = (X − 3)² plotted against X for X ∈ {1, 2, 3, 4, 5, 6}]
The function f : x ↦ (x − 3)² for x ∈ {1, 2, 3, 4, 5, 6} is not invertible.
Example: Let f be the function f(x) = (x − 3)², and let X be the result of rolling a die. (Imagine a
game in which you may move forward f(x) steps if the die shows x pips.) Then f⁻¹({1}) = {2, 4}, and therefore
Pr(f(X) = 1) = Pr(f(X) ∈ {1}) = Pr(X ∈ f⁻¹({1})) = Pr(X ∈ {2, 4}) = 1/3.
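A quick R check of this computation by enumerating the six die values (a sketch with our own variable names):

x <- 1:6                      # possible results of the die, each with probability 1/6
f <- function(x) (x - 3)^2    # the map from the example
x[f(x) == 1]                  # inverse image f^{-1}({1}): 2 and 4
mean(f(x) == 1)               # Pr(f(X) = 1) = 2/6 = 1/3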
In an early detection examination of a 50-year-old patient, the mammogram indicates breast cancer.
What is the probability that the patient really has breast cancer?
The background information (as used in the calculation below): about 0.8% of women of this age have breast cancer; if a woman has breast cancer, the mammogram indicates it with probability 90%; if she does not, the mammogram nevertheless indicates cancer with probability 7%.
This background information was given, and the question was asked, to 24 experienced medical practitioners.¹
• 8 of them answered: 90%
• 8 answered: 50 to 80%
• 8 answered: 10% or less.
This is a question about a conditional probability: how high is the conditional probability of having
cancer, given that the mammogram indicates it?
We can compute conditional probabilities with the Bayes-Formula.
Let A and B be events. The conditional probability of A given B (assuming Pr(B) > 0) is
Pr(A | B) = Pr(A ∩ B) / Pr(B)
(A ∩ B := both A and B occur)
Bayes-Formula:
Pr(B | A) = Pr(B) · Pr(A | B) / Pr(A)
¹ Hoffrage, U. & Gigerenzer, G. (1998) Using natural frequencies to improve diagnostic inferences. Academic Medicine, 73, 538–540.
Example: Let W ∈ {1, 2, 3, 4, 5, 6} be the result of rolling a die. How probable is W ≥ 5 if W is an even number?
A := {W ≥ 5}
B := {W is even}
A ∩ B = {W is even and ≥ 5}
Pr(A | B) = Pr(A ∩ B) / Pr(B) = (1/6) / (3/6) = 1/3
Pr(B | A) = Pr(B) · Pr(A | B) / Pr(A) = (1/2 · 1/3) / (1/3) = 1/2
Back to the mammogram example. Let B be the event that the patient has breast cancer and A the event that the mammogram indicates cancer. Then
Pr(B | A) = Pr(B) · Pr(A | B) / Pr(A)
= Pr(B) · Pr(A | B) / (Pr(B) · Pr(A | B) + Pr(Bᶜ) · Pr(A | Bᶜ))
= 0.008 · 0.9 / (0.008 · 0.9 + 0.992 · 0.07) ≈ 0.0939.
Thus, the probability that a patient for whom the mammogram indicates cancer has cancer is only 9.4%.
The right answer “approximately 10%” was only given by 4 of the 24 medical practitioners. Two of them
gave an explanation that was so fallacious that we have to assume that they gave the right answer only
by accident.
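The Bayes computation above can be reproduced in a few lines of R; the numbers are the ones used in the handout (prevalence 0.008, sensitivity 0.9, false-positive rate 0.07), and the variable names are our own:

prior       <- 0.008   # Pr(B): patient has breast cancer
sensitivity <- 0.9     # Pr(A | B): positive mammogram given cancer
false_pos   <- 0.07    # Pr(A | B^c): positive mammogram given no cancer

## Bayes formula with the law of total probability in the denominator
posterior <- prior * sensitivity /
  (prior * sensitivity + (1 - prior) * false_pos)
posterior              # approximately 0.0939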
• In the US-American TV-Show Let’s Make a Deal the candidate can win a sports car at the end of
the show if he or she selects the right one of three doors.
• Behind the two wrong doors there are goats.
• The candidate first selects one of the three doors, let’s say door 1.
• The host of the show, Monty Hall, then says “I show you something” and opens one of the two
other doors, let’s say door 2. A goat is standing behind this door.
• The candidate can then stay with door 1 or switch to door 3.
• Should they switch to door 3?
We assume that the candidate (first) chose door 1, and the placement of the car is purely random.
A: the host opens door 2.  B: the car is behind door 3.  C: the car is behind door 1.  D: the car is behind door 2.
Pr(B | A) = Pr(B) · Pr(A | B) / (Pr(B) · Pr(A | B) + Pr(C) · Pr(A | C) + Pr(D) · Pr(A | D))
= (1/3 · 1) / (1/3 · 1 + 1/3 · 1/2 + 1/3 · 0)
= 2/3
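The 2/3 can also be checked by simulation. A minimal R sketch (function and variable names are ours): the candidate always picks door 1, the host opens a goat door, and we count how often switching wins.

set.seed(1)
simulate_switch_wins <- function(n = 100000) {
  car <- sample(1:3, n, replace = TRUE)        # door hiding the car
  choice <- 1                                  # candidate always picks door 1
  wins <- 0
  for (i in 1:n) {
    ## host opens a door that is neither the chosen one nor the car
    openable <- setdiff(1:3, c(choice, car[i]))
    opened <- if (length(openable) == 1) openable else sample(openable, 1)
    switched_to <- setdiff(1:3, c(choice, opened))
    wins <- wins + (switched_to == car[i])
  }
  wins / n
}
simulate_switch_wins()   # close to 2/3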
A Bernoulli experiment is an experiment with two possible outcomes, "success" and "failure", coded as 1 and 0.
Bernoulli distribution
Examples:
• Tossing a coin: Possible outcomes are “head” and “tail”
• Does the Drosophila have a mutation that causes white eyes? Possible outcomes are “yes” or “no”.
Assume a Bernoulli experiment (for example tossing a coin) with success probability p is repeated n times
independently.
What is the probability that it...
1. ...always succeeds?
p · p · p · · · p = pn
2. ...always fails?
(1 − p) · (1 − p) · · · (1 − p) = (1 − p)n
3. ...first succeeds k times and then fails n − k times?
pk · (1 − p)n−k
Note: (n choose k) = n! / (k! · (n − k)!) ("n choose k") is the number of possibilities to choose k successes in n trials.
Binomial distribution
Let X be the number of successes in n independent trials with success probability of p each. Then,
Pr(X = k) = (n choose k) · p^k · (1 − p)^(n−k)
holds for all k ∈ {0, 1, . . . , n} and X is said to be binomially distributed, for short:
X ∼ bin(n, p).
[Figure: probability weights Pr(X = k) of two binomial distributions, plotted against k = 0, . . . , 10 (left) and k = 0, . . . , 100 (right)]
Example: In a population, an allele A currently has frequency 0.3. Let X be the frequency of A in the next generation. Pr(X = 0.32) = ?
We can only answer this on the basis of a probabilistic model, and the answer will depend on how we
model the population.
Modeling approach
We make a few simplifying assumptions:
• Discrete generations
• The population is haploid, that is, each individual has exactly one parent in the generation before.
• constant population size n = 100
Pr(X = 0.32) still depends on whether few individuals have many offspring or whether all individuals
have similar offspring numbers. Pr(X = 0.32) is only defined with additional assumptions, e.g.:
• Each individual chooses its parent purely randomly in the generation before.
Here "purely randomly" means: independently of all others, and each potential parent with the same probability.
Our assumptions imply that each individual of the next generation has a probability of 0.3 of obtaining
allele A, and that the individuals get their alleles independently of each other.
Therefore, the number K of individuals who get allele A is binomially distributed with n = 100 and
p = 0.3:
K ∼ bin(n = 100, p = 0.3)
For X = K/n it follows that
Pr(X = 0.32) = Pr(K = 32) = (100 choose 32) · p^32 · (1 − p)^(100−32)
= (100 choose 32) · 0.3^32 · 0.7^68 ≈ 0.078
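In R this probability can be evaluated directly with dbinom; a one-line check of the ≈ 0.078 stated above, and of the explicit formula:

dbinom(32, size = 100, prob = 0.3)     # approximately 0.0776
choose(100, 32) * 0.3^32 * 0.7^68      # same value from the explicit formula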
• Conditional probabilities
• Stochastic independence of events, and of random variables
• Bayes formula and how to apply it
4 Expected value
Example: genetic and environmental effects
Example: In a population on a continent, the skin pigmentation S of an individual depends on
• genetic effects G
• environmental effects E (e.g. due to local amount of sunshine)
• random effects R
Simple Model:
S =G+E+R
S, G, E, R are random variables if they refer to a randomly chosen individual from the population.
Question
Is the population mean of S the sum of the population means of G, E and R?
Definition 2 (Expected value) Let X be a random variable with finite or countable state space S = {x1 , x2 , x3 . . . } ⊆
R. The expected value of X is defined by
EX = ∑_{x∈S} x · Pr(X = x)
If we replace probabilities by relative frequencies in this definition, we get the formula for the mean value (of
a sample).
Examples:
• Let X be Bernoulli distributed with success probability p ∈ [0, 1]. Then we get
EX = 1 · p + 0 · (1 − p) = p.
Theorem 1 (Linearity of Expectation) If X and Y are random variables with values in R and if a ∈ R, we
get:
• E(a · X) = a · EX
• E(X + Y ) = EX + EY
Theorem 2 (Only if independent!) If X and Y are stochastically independent random variables with val-
ues in R, we get
• E(X · Y ) = EX · EY .
Let S be the state space of X and define f(x) = a · x. Then
E(a · X) = E(f(X)) = ∑_{x∈S} f(x) · Pr(X = x)
= ∑_{x∈S} a · x · Pr(X = x)
= a · ∑_{x∈S} x · Pr(X = x)
= a · EX
If X and Y are random variables with values in the countable state space S, then
Pr(X = x) = Pr(X = x, Y ∈ S) = ∑_{y∈S} Pr(X = x, Y = y).
With f(x, y) = x + y we get
E(X + Y) = E(f(X, Y)) = ∑_{(x,y)∈S²} f(x, y) · Pr((X, Y) = (x, y))
= ∑_{x∈S} ∑_{y∈S} (x + y) · Pr(X = x, Y = y)
= ∑_{x∈S} ∑_{y∈S} x · Pr(X = x, Y = y) + ∑_{y∈S} ∑_{x∈S} y · Pr(X = x, Y = y)
= ∑_{x∈S} x · ∑_{y∈S} Pr(X = x, Y = y) + ∑_{y∈S} y · ∑_{x∈S} Pr(X = x, Y = y)
= ∑_{x∈S} x · Pr(X = x) + ∑_{y∈S} y · Pr(Y = y)
= E(X) + E(Y)
Proof of the product formula: Let S be the state space of X and Y , and let X and Y be (stochastically)
independent.
E(X · Y) = ∑_{x∈S} ∑_{y∈S} (x · y) · Pr(X = x, Y = y)
= ∑_{x∈S} ∑_{y∈S} (x · y) · Pr(X = x) · Pr(Y = y)
= (∑_{x∈S} x · Pr(X = x)) · (∑_{y∈S} y · Pr(Y = y))
= EX · EY.
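These rules are easy to illustrate numerically. A small R sketch that approximates the expectations by averages over many simulated values (all names and parameters are our own choices); the product rule only works here because X and Y are simulated independently:

set.seed(123)
n <- 1e6
X <- sample(1:6, n, replace = TRUE)    # die rolls
Y <- rnorm(n, mean = 2, sd = 1)        # an independent random variable

c(mean(3 * X), 3 * mean(X))            # E(aX) = a * EX
c(mean(X + Y), mean(X) + mean(Y))      # E(X + Y) = EX + EY
c(mean(X * Y), mean(X) * mean(Y))      # E(XY) = EX * EY (X, Y independent)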
Now let Y1, . . . , Yn indicate the outcomes of n independent trials with success probability p (Yi = 1 for success, Yi = 0 for failure).
Then the Yi are Bernoulli distributed and X = Y1 + · · · + Yn is binomially distributed with parameters (n, p).
EX = E(Y1 + · · · + Yn )
= EY1 + · · · + EYn
= p + · · · + p = np
Thus, we obtain:
X ∼ bin(n, p) ⇒ EX = n · p
Probability distributions on continuous ranges are defined by densities instead of probabilities of single
values. Compare, e.g.:
p(k) = (100 choose k) · 0.3^k · 0.7^(100−k)        f(x) = e^(−(x−30)²/42) / √(42·π)
[Figure: the probability weights p(k) and the density f(x), plotted side by side for values between 15 and 45]
Definition 4 (Variance, Covariance and Correlation) The variance of an R-valued random variable X is
Var X = σX² = E[(X − EX)²].
σX = √(Var X) is the standard deviation.
If Y is another R-valued random variable, the covariance is
Cov(X, Y) = E[(X − EX) · (Y − EY)]
and the correlation is
Cor(X, Y) = Cov(X, Y) / (σX · σY).
The variance
Var X = E[(X − EX)²]
becomes, if X is drawn from data x1, . . . , xn such that Pr(X = x) = |{i : xi = x}|/n,
Var X = E[(X − EX)²] = (1/n) · ∑_{i=1}^{n} (xi − x̄)².
If (x1, y1), . . . , (xn, yn) ∈ R × R are data and (X, Y) is drawn from the data such that Pr((X, Y) = (x, y)) = |{i : (xi, yi) = (x, y)}|/n, we get
Cov(X, Y) = E[(X − EX) · (Y − EY)] = (1/n) · ∑_{i=1}^{n} (xi − x̄)(yi − ȳ).
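In R this empirical covariance is the mean of the products of the centered coordinates. Note that R's built-in cov() divides by n − 1 rather than n, so it differs slightly from the 1/n version used here. A small sketch with made-up data:

x <- c(1, 2, 4, 5, 7)
y <- c(2, 3, 3, 6, 8)

mean((x - mean(x)) * (y - mean(y)))       # covariance with denominator n, as in the formula above
cov(x, y)                                 # R's cov() uses denominator n - 1
mean((x - mean(x)) * (y - mean(y))) * length(x) / (length(x) - 1)   # matches cov(x, y)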
[Figures: scatter plots of Y against X illustrating covariance and correlation. The first plots mark the regions where X − EX and Y − EY are positive or negative and where the product (X − EX)·(Y − EY) is positive or negative; the displayed examples have Cov(X, Y) = 1.11 and Cov(X, Y) = −0.78. Further examples:
σX = 1.03, σY = 0.32, Cov(X, Y) = 0.32, Cor(X, Y) = 0.95;
σX = 1.13, σY = 1.2, Cov(X, Y) = −1.26, Cor(X, Y) = −0.92;
σX = 0.91, σY = 0.88, Cov(X, Y) = 0, Cor(X, Y) = 0.]
• If X and Y are independent, then Cov(X, Y ) = 0 (but not the other way around!)
• Cov(X, Y ) = Cov(Y, X)
• Cov(X, Y ) = E(X · Y ) − EX · EY (Exercise!)
• Cov(a · X, Y ) = a · Cov(X, Y ) = Cov(X, a · Y )
• −1 ≤ Cor(X, Y ) ≤ 1
• Cor(X, Y ) = Cor(Y, X)
• Cor(X, Y ) = Cov(X/σX , Y /σY )
• VarX = Cov(X, X)
• VarX = E(X 2 ) − (EX)2 (Exercise!)
• Var(a · X) = a2 · VarX
• Var(X + Y ) = VarX + VarY + 2 · Cov(X, Y )
• Var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} Var Xi + 2 · ∑_{j=1}^{n} ∑_{i=1}^{j−1} Cov(Xi, Xj)
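The rule Var(X + Y) = Var X + Var Y + 2·Cov(X, Y) is an algebraic identity and therefore also holds exactly for the sample versions computed by R's var() and cov() (both use the denominator n − 1). A quick check with simulated, correlated data (names are ours):

set.seed(42)
x <- rnorm(1000)
y <- 0.5 * x + rnorm(1000)            # correlated with x

var(x + y)                            # left-hand side
var(x) + var(y) + 2 * cov(x, y)       # right-hand side: identical up to rounding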
Question for skin pigmentation example: How does the standard deviation of S depend on the standard
deviations of G, E and R?
Answer: σS = √(Var(S)), and
Var(S) = Var(G) + Var(E) + Var(R) + 2·Cov(G, E) + 2·Cov(G, R) + 2·Cov(E, R).
Perhaps we may assume Cov(G, R) =Cov(E, R) = 0, but Cov(G, E) > 0 is plausible as individuals
who live in more sunny areas may have genes for darker pigmentation.
So, how to measure σG and σE ?
(at least in principle)
Var(R): infer from genetically identical individuals in the same environment
Var(G + R): infer from individuals sampled from the whole population but exposed to the same environment
Var(E + R): infer from genetically identical individuals exposed to random environments
If Cov(G, R) = Cov(E, R) = 0, then
σG = √(Var(G + R) − Var(R)) and
σE = √(Var(E + R) − Var(R)).
In particular: the standard error s/√n is an estimator of the standard deviation σX̄ of the sample mean X̄ of
(X1, X2, . . . , Xn).
The sample standard deviation s is an estimator of the standard deviation σ in the entire population.
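A short R sketch of the standard error as an estimate of the standard deviation of the sample mean; the data are simulated, so the true value here is σ/√n = 2/√25 = 0.4 (all numbers are our own example choices):

set.seed(7)
x <- rnorm(25, mean = 10, sd = 2)    # a sample of size n = 25
n <- length(x)
sd(x) / sqrt(n)                      # standard error s/sqrt(n), estimates sigma/sqrt(n) = 0.4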
Bernoulli distribution
A Bernoulli distributed random variable Y with success probability p ∈ [0, 1] has expected value
EY = p
and variance
Var Y = p · (1 − p)
Proof: From Pr(Y = 1) = p and Pr(Y = 0) = 1 − p it follows that
EY = 1 · p + 0 · (1 − p) = p.
Variance:
Var Y = E(Y²) − (EY)²
= 1² · p + 0² · (1 − p) − p² = p · (1 − p)
Binomial distribution
Let Y1, · · · , Yn be independent and Bernoulli distributed with success probability p. Then
∑_{i=1}^{n} Yi =: X ∼ bin(n, p)
and we get:
Var X = Var(∑_{i=1}^{n} Yi) = ∑_{i=1}^{n} Var Yi = n · p · (1 − p)
Binomial distribution
Theorem 5 (Expected value and variance of the binomial distribution) If X is binomially distributed
with parameters (n, p), we get:
EX = n · p
and
Var X = n · p · (1 − p).
For the relative frequency X/n this gives Var(X/n) = p · (1 − p)/n.
Some of the things you should be able to explain
• traits with quantitative thresholds: environment and genes determine whether a character is expressed
• quantitative genetics
– Robertson-Price identity
– breeder's equation
Recommended Books
References
[LW98] M. Lynch, B. Walsh (1998) Genetics and Analysis of Quantitative Traits. Sinauer Associates, Inc., Sunderland, MA, USA.
[BB+07] N.H. Barton, D.E.G. Briggs, J.A. Eisen, D.B. Goldstein, N.H. Patel (2007) Evolution. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA.
[Figure: histograms of all individuals and of the survivors, together with the survival probability curve (legend: "all", "survivors", "survival prob")]
Tall fathers tend to have sons that are slightly smaller than the fathers.
Sons of small fathers are on average larger than their fathers.
[Figure: three "Body Height" panels plotting the son's height against the father's height (both roughly 1.4 to 2.2 m), illustrating regression to the mean]
Similar effects
• In sports: The champion of the season will tend to fall short of the high expectations in the next year.
• In school: If the worst 10% of the students get extra lessons and are not among the worst 10% in the next
year, this does not prove that the extra lessons are useful.
[Figure: histogram of size (in cm) for all individuals and for the survivors, with the survival probability curve; produced by the following R code]
# phenotype, genotype, survivors and survival.prob are assumed to be defined earlier
hist(phenotype, col="lightblue", breaks=4:36/2)                      # all individuals
lines(20:180/10, survival.prob(20:180/10)*100, lwd=2)                # (scaled) survival probability
hist(phenotype[survivors==1], add=TRUE, col="blue", breaks=4:36/2)   # surviving individuals
hist(genotype[survivors==1], add=TRUE, col="orange", breaks=4:36/2)  # genotypic values of the survivors
[Figure: scatter plot of phenotype against genotype (values roughly 4 to 16) for all individuals and for the survivors; below it, histograms of all individuals and of the survivors together with the survival probability curve]
• Predict change from one generation to the next
• Account for selection and heritability
• use a measure of heritability that can be estimated from parent-offspring comparisons
µ mean phenotype before selection
µs mean phenotype after selection but before reproduction
S = µs − µ directional selection differential
µo mean phenotype in offspring generation
∆µ = µo − µ
W (z) individual fitness: probability that individual with phenotype z will survive to reproduce
p(z) density of phenotype z before selection
W̄ = ∫ W(z) · p(z) dz   mean individual fitness
w(z) = W(z)/W̄   relative individual fitness
ps (z) = w(z)p(z) density of phenotype z after selection but before reproduction (density in a stochastic
sense, i.e. integrates to 1)
Let Z be the phenotype of an individual drawn randomly from the parent population before selection.
µ = EZ,   E(w(Z)) = 1
µS = ∫ z · pS(z) dz = ∫ z · w(z) · p(z) dz = E(Z · w(Z))
Hence S = µS − µ = E(Z · w(Z)) − EZ · E(w(Z)) = Cov(Z, w(Z)); this is the Robertson-Price identity.
In any case,
Var(G)/Var(Z) =: H²
is called heritability in the broad sense.
Problem: Var(G), and thus also H², are parameters that are hard to estimate.
narrow-sense heritability
Let Zm, Zf, Zo be the phenotypes sampled from a triplet of mother, father and one of their offspring, sampled
from the population. The narrow-sense heritability h² is defined by
h² := Cov((Zm + Zf)/2, Zo) / Var((Zm + Zf)/2).
It is the slope of the regression line to predict Zo from the mid-parental phenotype (Zm + Zf)/2 and can be
estimated from a sample of many parent-offspring triplets.
We will see later in this semester: The line that predicts Y from X has slope Cov(X, Y )/Var(X).
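A simulation sketch of how h² can be estimated from parent-offspring triplets: we simulate additive genetic values and environmental noise (all parameter values and variable names are our own choices, not from the handout) and then regress the offspring phenotype on the mid-parent phenotype.

set.seed(1)
n  <- 5000
vA <- 1          # additive genetic variance
vE <- 1          # environmental variance; true h^2 = vA / (vA + vE) = 0.5

Gm <- rnorm(n, 0, sqrt(vA)); Gf <- rnorm(n, 0, sqrt(vA))       # parental genetic values
Zm <- Gm + rnorm(n, 0, sqrt(vE)); Zf <- Gf + rnorm(n, 0, sqrt(vE))
Go <- (Gm + Gf) / 2 + rnorm(n, 0, sqrt(vA / 2))                # offspring genetic value (Mendelian segregation term)
Zo <- Go + rnorm(n, 0, sqrt(vE))

midparent <- (Zm + Zf) / 2
cov(midparent, Zo) / var(midparent)     # estimate of h^2, close to 0.5
coef(lm(Zo ~ midparent))[2]             # same estimate as a regression slope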
Equivalent definition of h2
Assume that Zm and Zf are independent and have the same distribution as Z. Then
Var((Zm + Zf)/2) = (1/4) · Var(Zm + Zf) = (1/4) · (Var(Zm) + Var(Zf)) = (1/2) · Var(Z),
and
Cov((Zm + Zf)/2, Zo) = (1/2) · Cov(Zm + Zf, Zo) = (Cov(Zm, Zo) + Cov(Zf, Zo)) / 2.
And thus
h² = Cov((Zm + Zf)/2, Zo) / Var((Zm + Zf)/2) = (Cov(Zm, Zo) + Cov(Zf, Zo)) / Var(Z).
Equivalent definition of h2 under certain assumptions
Let Gm and Gf be the phenotypic effects of the genes transmitted by the mother and the father to
the offspring.
If mating is random, if there are no correlations (between parental genotypes and environmental
effects etc.), and if genetic effects are additive, we obtain
Cov((Zm + Zf)/2, Zo) = Cov((Gm + Gf)/2, Gm + Gf) = (Var Gm + Var Gf)/2,
and thus
h² = Cov((Zm + Zf)/2, Zo) / ((1/2) · Var(Z)) = (Var Gm + Var Gf) / Var(Z).
If U and V are sampled independently from {A, B} according to the population allele frequencies p
and 1 − p, we obtain that G(U ), G(V ) and D(U, V ) are random variables with the following properties:
Assume now we have n unlinked loci with additive effects G1 (.), G2 (.), . . . , Gn (.) and dominance
deviations D1 (., .), D2 (., .), . . . , Dn (., .), and the effects are additive among the loci, that is, no
epistasis. (Otherwise, for how to separate additive from non-additive interactions between loci, see e.g. Falconer &
Mackay (1996) Introduction to Quantitative Genetics, 4th ed.)
Then, the phenotypic variance is the sum of
the so-called additive variance VA = Var (G1 (U1 ) + G1 (V1 ) + G2 (U2 ) + G2 (V2 ) + · · · + Gn (Un ) + Gn (Vn )),
the so-called dominance variance VD = Var(D1(U1, V1) + D2(U2, V2) + · · · + Dn(Un, Vn)),
and the environmental variance VE.
We can then define narrow-sense heritability as the fraction of phenotypic variation that is due to
additive genetic effects
h² = VA / (VA + VD + VE),
and this is still Cov((Zm + Zf)/2, Zo) / Var((Zm + Zf)/2); see e.g. Felsenstein (2019+) for details.
Example
References
[1] Galen (1996) Rates of floral evolution: adaptation to bumblebee pollination in an alpine wildflower, Polemonium viscosum. Evolution 50(1): 120–125.
• S was measured as
– 7% when estimated from number of seeds
– 17% when estimated from number of surviving offspring after 6 years
• h2 ≈ 1
• Robertson-Price identity
• Breeder’s equation
• Why is the narrow-sense heritability, and not the broad-sense heritability, used in the breeder's equation?
• Connection of narrow-sense heritability and additive genetics effects
References
[1] E.N. Moriyama (2003) Codon Usage. In: Encyclopedia of the Human Genome, Macmillan Publishers Ltd.
The two proline codons CCT and CCC occur 16710 and 18895 times, respectively, in the data considered here.
If both codons were used equally frequently ("purely random"), the number X of CCC would be binomially
distributed with p = 1/2 and n = 16710 + 18895 = 35605. So assume that the number X (= 18895) of CCC is
binomially distributed with p = 1/2 and n = 35605.
EX = n · p = 17802.5
σX = √(n · p · (1 − p)) ≈ 94.34
18895 − 17802.5 = 1092.5 ≈ 11.6 · σX
Does this look purely random?
How small is the probability of a deviation from the expectation of at least ≈ 11.6 · σX , if it is all purely random?
We have to calculate
Pr(|X − EX| ≥ 11.6 · σX).
A problem with the binomial distribution is that calculating (n choose k) precisely is slow for large n. Therefore:
8 Normal distribution
[Figure: the binomial probability weights dbinom(400:600, 1000, 0.5) plotted against k = 400, . . . , 600]
A random variable Z with the density
f(x) = (1/√(2π)) · e^(−x²/2)
("Gaussian bell curve") is called standard-normally distributed, for short:
Z ∼ N(0, 1).
It satisfies EZ = 0 and Var Z = 1.
[Figure: the density of the standard normal distribution on the interval [−4, 4]]
If Z is N (0, 1) distributed, then X = σ · Z + µ is normally distributed with mean µ and variance σ 2 , for short:
X ∼ N (µ, σ 2 )
Its density is
f(x) = (1/(√(2π) · σ)) · e^(−(x−µ)²/(2σ²))
[Figure: the density of N(µ, σ²), with µ − σ, µ and µ + σ marked on the x-axis]
[Figure: a normal density with the area between a and b shaded]
For a random variable Z with density f we get
Pr(Z ∈ [a, b]) = ∫_a^b f(x) dx.
Note: the probability density f is not the probability distribution of Z, but the probability distribution
A ↦ Pr(Z ∈ A)
can be calculated from the probability density:
A ↦ Pr(Z ∈ A) = ∫_A f(x) dx
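In R the density, the distribution function, and probabilities of intervals are available directly. A short sketch for a N(µ = 5, σ² = 9) variable (our own example numbers):

mu <- 5; sigma <- 3

dnorm(5, mean = mu, sd = sigma)        # density f(x) at x = 5
pnorm(8, mean = mu, sd = sigma)        # Pr(X <= 8)

## Pr(X in [2, 8]) via the distribution function and via numerical integration of the density
pnorm(8, mean = mu, sd = sigma) - pnorm(2, mean = mu, sd = sigma)
integrate(dnorm, lower = 2, upper = 8, mean = mu, sd = sigma)$value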
Draw a sample of length 7 from a normal distribution with mean 5 and standard deviation 3:
> rnorm(7, mean=5, sd=3)
[1] 2.7618897 6.3224503 10.8453280 -0.9829688 5.6143127 0.6431437 8.123570
[Figure: the standard normal density with the outermost 2.5% on each side shaded]
From the symmetry around the y-axis it follows that
Pr(|Z| > z) = Pr(Z < −z) + Pr(Z > z) = 2 · Pr(Z < −z).
So we look for z > 0 such that Pr(Z < −z) = 2.5%:
> qnorm(0.025, mean=0, sd=1)
[1] -1.959964
Answer: z ≈ 1.96, just below 2 standard deviations.
9 Normal approximation
For large n and p which are not too close to 0 or 1, we can approximate the binomial distribution by a normal
distribution with the corresponding mean and variance.
[Figures: the binomial weights dbinom(400:600, 1000, 0.5) plotted against 400:600 and dbinom(0:10, 10, 0.2) plotted against 0:10, each compared with the corresponding normal approximation]
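The quality of the normal approximation can be inspected directly in R by comparing the binomial weights with the corresponding normal density; a sketch for n = 1000, p = 0.5, as in the figure:

n <- 1000; p <- 0.5
k <- 400:600

binom  <- dbinom(k, size = n, prob = p)
normal <- dnorm(k, mean = n * p, sd = sqrt(n * p * (1 - p)))

max(abs(binom - normal))     # maximal difference between the two curves (very small)

plot(k, binom, type = "h", xlab = "k", ylab = "probability")
lines(k, normal, lwd = 2)    # normal density on top of the binomial weights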
Theorem 8 (Central Limit Theorem) If the R-valued random variables X1, X2, . . . are independent and identically
distributed with finite variance 0 < Var Xi < ∞, and if
Zn := X1 + X2 + · · · + Xn
is the sum of the first n variables, then the centered and rescaled sum is, in the limit n → ∞, standard-normally
distributed:
(Zn − EZn) / √(Var Zn) ∼ N(µ = 0, σ² = 1) for n → ∞.
Formally: for all −∞ ≤ a < b ≤ ∞,
lim_{n→∞} Pr(a ≤ (Zn − EZn)/√(Var Zn) ≤ b) = Pr(a ≤ Z ≤ b),
where Z is a standard-normally distributed random variable.
The following usually holds: if Y is the sum of many small contributions, most of which are independent of
each other, then Y is approximately normally distributed, that is,
Y ≈ N(µ = EY, σ² = Var Y).
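A small simulation sketch of the central limit theorem: sums of independent uniform random variables, centered and rescaled, look standard-normal (the numbers of summands and repetitions are our own choices):

set.seed(1)
n <- 50                     # number of summands per sum
m <- 10000                  # number of simulated sums

Zn <- replicate(m, sum(runif(n)))                 # sums of n uniform(0,1) variables
standardized <- (Zn - n * 0.5) / sqrt(n / 12)     # E(Zn) = n/2, Var(Zn) = n/12

hist(standardized, breaks = 50, freq = FALSE)
curve(dnorm(x), add = TRUE, lwd = 2)              # standard normal density for comparison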
10 The z-Test
Back to the example with the proline codons in the human genome.
The hypothesis
"purely random" (no difference between the codon frequencies)
is called the null hypothesis.
To convince the doubters, we have to find arguments against the null hypothesis. We show: if the null
hypothesis were true, the observation would be very improbable.
CCT is used 16710 times and CCC is used 18895 times. Under the null hypothesis "purely random", the
number X of CCC codons is bin(n, p)-distributed with n = 35605 and p = 0.5.
Question: Is it plausible that a random variable X that has taken the value k = 18895 is bin(35605, 0.5)-distributed?
Relevant is the probability (assuming the null hypothesis H0) that X takes a value at least as far from the expectation as k:
Pr(|X − µ| ≥ |k − µ|) = Pr(|X − µ| ≥ 1092.5) ≈ Pr(|X − µ| ≥ 11.6 · σ)
We have already memorized:
Pr(|X − µ| ≥ 3 · σ) ≈ 0.003
We can argue that such a deviation from the expected value can hardly be explained by chance alone.
Thus we can reject the null hypothesis "purely random" and search for alternative explanations, such as
differences in the efficiency of CCC and CCT or in the availability of C and T.
Summary z-Test
Null hypothesis H0 (what we want to reject): the observed value x comes from a normal distribution with
mean µ and known variance σ².
p-value = Pr(|X − µ| ≥ |x − µ|), where X ∼ N(µ, σ²): the probability of a deviation that is at least as large as
the observed one.
Significance level α : usually 0.05. If the p-value is smaller than α, we reject the null hypothesis on the
significance level α and look for an alternative explanation.
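For the codon example the z-test can be carried out in a few lines of R, using the numbers from the handout; the p-value is far below any usual significance level (variable names are ours):

n <- 35605; p <- 0.5
x <- 18895                          # observed number of CCC

mu    <- n * p                      # 17802.5
sigma <- sqrt(n * p * (1 - p))      # approx. 94.34
z     <- (x - mu) / sigma           # approx. 11.6

p_value <- 2 * pnorm(-abs(z))       # two-sided p-value under X ~ N(mu, sigma^2)
c(z = z, p_value = p_value)         # p-value is vanishingly small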
Limitations of the z-Test
The z-Test can only be applied if the variance of the normal distribution is known or at least assumed to be
known according to the null hypothesis.
This is usually not the case when the normal distribution is applied in statistical tests.
Usually the variance has to be estimated from the data. In this case we have to apply the famous
t-Test
instead of the z-Test.