Conditional Probability
Will Monroe
July 3, 2017
with materials by Mehran Sahami and Chris Piech
image: mattbuck
Announcements: Problem Set #1
Due this Wednesday!
4.c is particularly challenging.
Announcements: Python!
Handout on website
Tutorial: Wed. 7/5, 2:30pm
Review: What is a probability?
P(E) = lim (n→∞) #(E) / n
Review: Meaning of probability
A quantification of ignorance
image: Frank Derks
Review: Meaning of probability
A quantification of ignorance
image: www.yourbestdigs.com
Review: Axioms of probability
(1) 0 ≤ P(E) ≤ 1
(2) P(S) = 1
(3) If E ∩ F = ∅, then P(E ∪ F) = P(E) + P(F)
(Sum rule, but with probabilities!)
Review: Corollaries
P(Eᶜ) = 1 − P(E)
If E ⊆ F, then P(E) ≤ P(F)
P(E ∪ F) = P(E) + P(F) − P(EF)
(Principle of inclusion/exclusion, but with probabilities!)
Review: Inclusion/exclusion with
more than two sets
prob. of OR = add or subtract (based on subset size) probs. of ANDs:

P(⋃_{i=1}^n Eᵢ) = Σ_{r=1}^n (−1)^(r+1) Σ_{i₁<⋯<i_r} P(⋂_{j=1}^r E_{i_j})

outer sum: over subset sizes r
inner sum: over all subsets of that size
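The alternating sum can be checked by brute force. A quick sketch (not from the course materials; the sample space and events are made up) that compares the direct probability of the union against the inclusion/exclusion sum:

```python
from itertools import combinations

# Small sample space of equally likely outcomes and three arbitrary events.
S = set(range(10))
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}]

def prob(event):
    return len(event) / len(S)

# Left side: probability of the union, computed directly.
lhs = prob(set().union(*events))

# Right side: sum over subset sizes r, with alternating sign (-1)^(r+1),
# of the probabilities of the r-way intersections (the ANDs).
rhs = 0.0
n = len(events)
for r in range(1, n + 1):
    for subset in combinations(events, r):
        rhs += (-1) ** (r + 1) * prob(set.intersection(*subset))

print(lhs, rhs)  # the two sides agree
```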
Equally likely outcomes
Coin flip S = {Heads, Tails}
Two coin flips S = {(H, H), (H, T), (T, H), (T, T)}
Roll of 6-sided die S = {1, 2, 3, 4, 5, 6}
P(each outcome) = 1 / |S|

P(E) = |E| / |S|    (counting!)
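The counting formula is easy to try out. A small sketch (my own example, not from the slides) for the two-coin-flip sample space:

```python
from itertools import product

# Sample space for two coin flips: 4 equally likely outcomes.
S = list(product("HT", repeat=2))       # ('H','H'), ('H','T'), ('T','H'), ('T','T')
E = [o for o in S if "H" in o]          # event: at least one head

p = len(E) / len(S)                     # P(E) = |E| / |S|
print(p)  # 0.75
```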
Review: How do I get started?
For word problems involving probability,
start by defining events!
Review: Getting rid of ORs
Finding the probability of an OR of events
can be nasty. Try using De Morgan's laws
to turn it into an AND!
P(A ∪ B ∪ ⋯ ∪ Z) = 1 − P(Aᶜ Bᶜ ⋯ Zᶜ)
Birthdays
Wolfram Alpha
Review: Flipping cards
● Shuffle deck.
● Reveal cards from the top until
we get an Ace. Put Ace aside.
● What is P(next card is the
Ace of Spades)?
● P(next card is the 2 of Clubs)?
P(Ace of Spades) = P(2 of Clubs)
Definition of conditional probability
The conditional probability P(E | F) is the
probability that E happens, given that F
has happened. F is the new sample space.
P(E|F) = P(EF) / P(F)

[Venn diagram: overlapping circles E and F; intersection EF]
Equally likely outcomes
If all outcomes are equally likely:
P(E|F) = |EF| / |F|
Rolling two dice
E: {all outcomes such that the sum of the two dice is 4}
What should you hope for D1 to be?
A) 2
B) 1 and 3 are equally good
C) 1, 2, 3 are equally good
D) other
https://bit.ly/1a2ki4G → https://b.socrative.com/login/student/
Room: CS109SUMMER17
Rolling two dice
P(D₁ + D₂ = 4) = ?
E: {all outcomes such that the sum of the two dice is 4}
Rolling two dice
P(D₁ + D₂ = 4) = 3/36 = 1/12

|E| = 3,  |S| = 36

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
Rolling two dice
P(D₁ + D₂ = 4 | D₁ = 2) = ?
E: {all outcomes such that the sum of the two dice is 4}
F: {all outcomes such that the first die is 2}
Rolling two dice
P(E|F) = ?
E: {all outcomes such that the sum of the two dice is 4}
F: {all outcomes such that the first die is 2}
Rolling two dice
P(E|F) = ?

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
Rolling two dice
P(E|F) = 1/6

|EF| = 1,  |F| = 6

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
Definition of conditional probability
The conditional probability P(E | F) is the
probability that E happens, given that F
has happened. F is the new sample space.
P(E|F) = P(EF) / P(F)
Rolling two dice
P(E|F) = 1/6 = (1/36) / (6/36)

|EF| = 1,  |F| = 6

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
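The whole dice calculation can be reproduced by enumerating the 36 outcomes. A quick sketch (variable names are mine):

```python
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
S = list(product(range(1, 7), repeat=2))

E = [(d1, d2) for (d1, d2) in S if d1 + d2 == 4]   # sum of the dice is 4
F = [(d1, d2) for (d1, d2) in S if d1 == 2]        # first die is 2
EF = [o for o in E if o in F]                      # both

print(len(E) / len(S))    # P(E)   = 3/36
print(len(EF) / len(F))   # P(E|F) = 1/6
```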
Juniors
What is the probability that a randomly chosen
(rising) junior in CS 109 comes to class?
C: event that randomly chosen student comes to class
J: event that randomly chosen student is a junior
P(C|J) = P(CJ) / P(J) = (?/65) / (16/65)
What if P(F) = 0?
P(E|F) = P(EF) / P(F)

ZeroDivisionError: float division by zero
Congratulations! You've observed the impossible!
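The error message on the slide is literal Python. A minimal helper (the function name is hypothetical, not from the course materials) makes the failure mode concrete:

```python
def cond_prob(p_ef, p_f):
    """P(E|F) = P(EF) / P(F); undefined when P(F) = 0."""
    return p_ef / p_f  # raises ZeroDivisionError when p_f == 0.0

print(cond_prob(1 / 36, 6 / 36))  # 1/6, from the dice example

try:
    cond_prob(0.0, 0.0)           # conditioning on an impossible event
except ZeroDivisionError as e:
    print(e)                      # float division by zero
```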
Definition of conditional probability
The conditional probability P(E | F) is the
probability that E happens, given that F
has happened. F is the new sample space.
P(E|F) = P(EF) / P(F)
Chain rule of probability
The probability of both events happening
is the probability of one happening times
the probability of the other happening
given the first one.
P(EF) = P(F) P(E|F)

[diagram: EF's share of S = F's share of S × EF's share of F]
General chain rule of probability
The probability of all events happening
is the probability of the first happening
times the prob. of the second given the first
times the prob. of the third given the first two
...etc.
P(EFG⋯) = P(E) P(F|E) P(G|EF) ⋯
Four piles of cards
● Divide deck randomly into 4
piles of 13 cards each.
● What is P(one Ace in each pile)?
S: ways of labeling 52 cards with 4 types of labels
E: ways resulting in all Aces getting different labels
|E| = 4! · (48 choose 12, 12, 12, 12)

|S| = (52 choose 13, 13, 13, 13)

P(E) = |E| / |S| ≈ 0.105
Four piles of cards
● Divide deck randomly into 4
piles of 13 cards each.
● What is P(one Ace in each pile)?
E1: Ace of Spades goes in any one pile
E2: Ace of Clubs goes in different pile from Spades
E3: Ace of Hearts goes in different pile from first two
E4: Ace of Diamonds goes in different pile from first three
P(E) = P(E₁E₂E₃E₄) = P(E₁) P(E₂|E₁) P(E₃|E₁E₂) P(E₄|E₁E₂E₃)
     = (52/52) · (39/51) · (26/50) · (13/49) ≈ 0.105
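Both calculations can be sanity-checked by simulation. A sketch (assuming the four aces are cards 0-3 of a 52-card deck; all names are mine), which should land near the exact answer of ≈0.105:

```python
import random

def one_ace_per_pile():
    """Shuffle 52 card ids (0-3 are the aces) into 4 piles of 13;
    return True if each pile gets exactly one ace."""
    deck = list(range(52))
    random.shuffle(deck)
    piles = [deck[i * 13:(i + 1) * 13] for i in range(4)]
    return all(sum(card < 4 for card in pile) == 1 for pile in piles)

random.seed(109)
trials = 100_000
estimate = sum(one_ace_per_pile() for _ in range(trials)) / trials
print(estimate)  # close to 0.105
```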
Law of total probability
You can compute an overall probability
by adding up the case when an event
happens and when it doesn't happen.
P(F) = P(EF) + P(EᶜF)

P(F) = P(E) P(F|E) + P(Eᶜ) P(F|Eᶜ)

[diagram: F = EF + EᶜF]
Majoring in CS
What is the probability that a randomly chosen
student in CS 109 is a (declared) CS major (M)?
P(M) = |M| / |S| = 7/65
Majoring in CS
What is the probability that a randomly chosen
student in CS 109 is a (declared) CS major (M)?
P(M) = P(JM) + P(JᶜM)
     = |JM|/|S| + |JᶜM|/|S|
     = 3/65 + 4/65 = 7/65

(J: juniors)
Majoring in CS
What is the probability that a randomly chosen
student in CS 109 is a (declared) CS major (M)?
P(M) = P(J) P(M|J) + P(Jᶜ) P(M|Jᶜ)
     = (16/65)(3/16) + (49/65)(4/49)
     = 7/65

(J: juniors)
General law of total probability
You can compute an overall probability
by summing over mutually exclusive and
exhaustive sub-cases.
P(F) = Σᵢ P(Eᵢ F)

P(F) = Σᵢ P(Eᵢ) P(F|Eᵢ)

[diagram: S partitioned into E₁, E₂, E₃, E₄; F split into E₁F, E₂F, E₃F, E₄F]
Majoring in CS
What is the probability that a randomly chosen
student in CS 109 is a (declared) CS major (M)?
P(M) = P(J) P(M|J) + P(Jᶜ) P(M|Jᶜ)
     = (16/65)(3/16) + (49/65)(4/49)
     = 7/65

(J: juniors)
Majoring in CS
What is the probability that a randomly chosen
student in CS 109 is a (declared) CS major (M)?
P(M) = P(MΣ) + P(MJ) + P(MS) + P(MG) + P(MO)
     = 0/65 + 3/65 + 2/65 + 2/65 + 0/65
     = 7/65

(class years: sophomores Σ, juniors J, seniors S, grads G, other O)
Majoring in CS
What is the probability that a randomly chosen
student in CS 109 is a (declared) CS major (M)?
P(M) = P(Σ) P(M|Σ) + P(J) P(M|J) + P(S) P(M|S) + P(G) P(M|G) + P(O) P(M|O)
     = (7/65)(0/7) + (16/65)(3/16) + (5/65)(2/5) + (23/65)(2/23) + (14/65)(0/14)
     = 7/65

(class years: sophomores Σ, juniors J, seniors S, grads G, other O)
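The arithmetic above can be verified with exact fractions. A sketch (the dictionary layout is mine; the counts come from the slide):

```python
from fractions import Fraction as F

# Class-year breakdown of the 65 students: (group size, CS majors in group).
groups = {
    "sophomore": (7, 0),
    "junior":    (16, 3),
    "senior":    (5, 2),
    "grad":      (23, 2),
    "other":     (14, 0),
}
total = sum(size for size, _ in groups.values())  # 65

# Law of total probability: P(M) = sum_i P(E_i) * P(M | E_i).
p_m = sum(F(size, total) * F(majors, size) for size, majors in groups.values())
print(p_m)  # 7/65
```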
General law of total probability
You can compute an overall probability
by summing over mutually exclusive and
exhaustive sub-cases.
P(F) = Σᵢ P(Eᵢ F)

P(F) = Σᵢ P(Eᵢ) P(F|Eᵢ)

[diagram: S partitioned into E₁, E₂, E₃, E₄; F split into E₁F, E₂F, E₃F, E₄F]
Break time!
Bayes' theorem
You can “flip” a conditional probability
if you multiply by the probability of
the hypothesis and divide by the
probability of the observation.
P(E|F) = P(F|E) P(E) / P(F)
Probabilistic inference
Beliefs: P(E|F)
Unobserved truth: E
Evidence: F
Probabilistic inference
P(E|F) = P(EF) / P(F)          (definition of conditional probability)

P(E|F) = P(F|E) P(E) / P(F)    (chain rule — aka the same definition, plus algebra)
Finding the denominator
If you don't know P(F) on the bottom,
try using the law of total probability.
P(E|F) = P(F|E) P(E) / (P(F|E) P(E) + P(F|Eᶜ) P(Eᶜ))

P(E|F) = P(F|E) P(E) / Σᵢ P(F|Eᵢ) P(Eᵢ)
Zika testing
0.08% of people have Zika
P(Z )=0.0008
90% of people with Zika test positive
P(T|Z )=0.90
7% of people without Zika test positive
P(T|Zᶜ) = 0.07
Someone tests positive. What's the probability they have Zika?
Z: event that person has Zika
T: event that person tests positive
P(Z|T) = P(T|Z) P(Z) / P(T)

P(T) = ?
Zika testing
0.08% of people have Zika
P(Z )=0.0008
90% of people with Zika test positive
P(T|Z )=0.90
7% of people without Zika test positive
P(T|Zᶜ) = 0.07
Someone tests positive. What's the probability they have Zika?
Z: event that person has Zika
T: event that person tests positive
P(Z|T) = P(T|Z) P(Z) / (P(T|Z) P(Z) + P(T|Zᶜ) P(Zᶜ))
Zika testing
0.08% of people have Zika
P(Z )=0.0008
90% of people with Zika test positive
P(T|Z )=0.90
7% of people without Zika test positive
P(T|Zᶜ) = 0.07
Someone tests positive. What's the probability they have Zika?
Z: event that person has Zika
T: event that person tests positive
P(Z|T) = P(T|Z) P(Z) / (P(T|Z) P(Z) + P(T|Zᶜ) P(Zᶜ))
       = (0.90 · 0.0008) / (0.90 · 0.0008 + 0.07 · 0.9992)
       ≈ 0.01
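The same computation in Python (variable names are mine):

```python
# Bayes' theorem, with the law of total probability in the denominator.
p_z = 0.0008            # prior: P(Z)
p_t_given_z = 0.90      # likelihood: P(T|Z)
p_t_given_not_z = 0.07  # false-positive rate: P(T|Z^c)

# Normalizing constant P(T) = P(T|Z) P(Z) + P(T|Z^c) P(Z^c).
p_t = p_t_given_z * p_z + p_t_given_not_z * (1 - p_z)

# Posterior P(Z|T).
p_z_given_t = p_t_given_z * p_z / p_t
print(round(p_z_given_t, 4))  # 0.0102 — only about 1%, despite the positive test
```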
Bayes: Terminology
P(Z|T) = P(T|Z) P(Z) / P(T)

hypothesis: Z;  observation: T
Bayes: Terminology
P(Z|T) = P(T|Z) P(Z) / P(T)

posterior: P(Z|T);  likelihood: P(T|Z);  prior: P(Z);  normalizing constant: P(T)
Bayes' theorem
You can “flip” a conditional probability
if you multiply by the probability of
the hypothesis and divide by the
probability of the observation.
P(E|F) = P(F|E) P(E) / P(F)
Thomas Bayes
Rev. Thomas Bayes (~1701-1761):
British mathematician and Presbyterian minister
[citation needed]
images: (left) unknown, (right) Martin ormazabal
Implicatures
Work is work.
Examples adapted from Grice (1970)
Implicatures
Will produced a series of sounds that corresponded
closely to the tune of “Hey Jude.”
Examples adapted from Grice (1970)
Implicatures: Grice's maxims
● “Make your contribution as informative as required [...]
● Do not make your contribution more informative than is
required. [...]
● Do not say what you believe to be false. [...]
● Avoid obscurity of expression.
● Avoid ambiguity.
● Be brief (avoid unnecessary prolixity).”
(Grice, 1970)
Image credit: Siobhan Chapman
Implicatures: Grice's maxims
Work is work.
“Make your contribution as informative as required”
Implicatures: Grice's maxims
Will produced a series of sounds that corresponded
closely to the tune of “Hey Jude.”
“Be brief (avoid unnecessary prolixity).”
Implicatures: Grice's maxims
How do you like
my new haircut?
...It's shorter in
the back!
“Be relevant.”
Image credit: Gage Skidmore
Implicatures
1 2 3
Image credit: Chris Potts
Implicatures
1 2 3
“glasses”
Image credit: Chris Potts
Implicatures
1 2 3
“person”
Image credit: Chris Potts
RSA: Bayesian pragmatic reasoning
“hat”
“glasses”
“person”
RSA: Bayesian pragmatic reasoning
literal (naive) listener:

            referent 1   referent 2   referent 3
“hat”          1            0            0
“glasses”      0.5          0            0.5
“person”       0.33         0.33         0.33
RSA: Bayesian pragmatic reasoning
literal (naive) speaker:

            referent 1   referent 2   referent 3
“hat”          0.33         0            0
“glasses”      0.33         0            0.5
“person”       0.33         1            0.5
RSA: Bayesian pragmatic reasoning
pragmatic listener:  L(t|m) = S(m|t) P(t) / P(m)

            referent 1   referent 2   referent 3
“hat”          1            0            0
“glasses”      0.4          0            0.6
“person”       0.18         0.55         0.27
(Frank and Goodman, 2012)
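The tables can be reproduced with a few lines of code. A sketch (assuming the speaker is uniform over the messages true of each referent, which matches the numbers on the slides; all names are mine):

```python
# RSA-style pragmatic reasoning over 3 referents and 3 messages.
# Truth table: which messages are literally true of which referents.
semantics = {
    "hat":     [1, 0, 0],   # only referent 1 wears a hat
    "glasses": [1, 0, 1],   # referents 1 and 3 wear glasses
    "person":  [1, 1, 1],   # everyone is a person
}
messages = list(semantics)
n_referents = 3

def speaker(m, t):
    """S(m|t): uniform over the messages that are true of referent t."""
    true_messages = [msg for msg in messages if semantics[msg][t]]
    return (1 / len(true_messages)) if semantics[m][t] else 0.0

def pragmatic_listener(m):
    """L(t|m) ∝ S(m|t) P(t), with a uniform prior P(t) over referents."""
    scores = [speaker(m, t) * (1 / n_referents) for t in range(n_referents)]
    total = sum(scores)  # this is P(m), the normalizing constant
    return [s / total for s in scores]

print([round(p, 2) for p in pragmatic_listener("glasses")])  # [0.4, 0.0, 0.6]
print([round(p, 2) for p in pragmatic_listener("person")])   # [0.18, 0.55, 0.27]
```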
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
0 ≤ P(E) ≤ 1
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
0 ≤ P(E|G) ≤ 1
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(E) = 1 − P(Eᶜ)
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(E|G) = 1 − P(Eᶜ|G)
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(E|G) = 1 − P(Eᶜ|G)
(the G on both sides must be the same!)
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(EF) = P(E|F) P(F)
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(EF|G) = P(E|FG) P(F|G)
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(EF|G) = P(E|FG) P(F|G)
(P(E|FG) is the same as “P((E|F) | G)”: “conditioning twice” = conditioning on the AND)
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(E|F) = P(F|E) P(E) / P(F)
Conditional probabilities
are still probabilities
Everything you know about some set of events is still true
if you condition consistently on some other event!
P(E|FG) = P(F|EG) P(E|G) / P(F|G)
Let's play a game
$1 $20 $1
Let's Make a Deal
A has $20 (P = 1/3): staying wins
B has $20 (P = 1/3): switching wins
C has $20 (P = 1/3): switching wins

Important assumption: the host must open a $1 door!
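The table can be checked by simulation. A sketch (here the $20 stays behind the middle door and the player's pick is random, which gives the same probabilities; the tie-break for which $1 door the host opens is arbitrary):

```python
import random

def play(switch):
    """One round of the $1/$20/$1 game: host opens a $1 door you didn't pick."""
    doors = ["$1", "$20", "$1"]
    pick = random.randrange(3)
    # The host must open a $1 door that isn't the player's pick.
    host = next(d for d in range(3) if d != pick and doors[d] == "$1")
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in range(3) if d != pick and d != host)
    return doors[pick] == "$20"

random.seed(109)
trials = 100_000
stay = sum(play(switch=False) for _ in range(trials)) / trials
swap = sum(play(switch=True) for _ in range(trials)) / trials
print(round(stay, 2), round(swap, 2))  # staying wins ~1/3, switching ~2/3
```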