Mathematical Foundations of Computer Science
Lecture Outline
November 8, 2018
Theorem. If X and Y are independent real-valued random variables then
Var[X + Y] = Var[X] + Var[Y] and E[X · Y] = E[X] · E[Y]
The result can be extended to a finite number of random variables.
Note that the converse of the above statement is not true as illustrated by the following
example. Let Ω = {a, b, c}, with all three outcomes equally likely. Let X and Y be
random variables defined as follows: X(a) = 1, X(b) = 0, X(c) = −1 and Y(a) = 0, Y(b) = 1, Y(c) = 0. Note that X and Y are not independent since
Pr[X = 0 ∧ Y = 0] = 0, but Pr[X = 0] · Pr[Y = 0] = (1/3) · (2/3) = 2/9 ≠ 0.
Note that for all ω ∈ Ω, X(ω)Y(ω) = 0. Also, E[X] = 0 and E[Y] = 1/3. Thus we have
E[XY] = 0 = E[X]E[Y]
It is also easy to verify that Var[X + Y] = Var[X] + Var[Y].
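As a quick sanity check, here is a small Python sketch (not part of the original notes; the outcome labels, values, and probabilities are copied from the example above) that enumerates the sample space and verifies these equalities directly:

outcomes = ['a', 'b', 'c']
prob = {w: 1/3 for w in outcomes}        # all three outcomes equally likely
X = {'a': 1, 'b': 0, 'c': -1}
Y = {'a': 0, 'b': 1, 'c': 0}

def E(f):
    # Expectation of a function of the outcome.
    return sum(prob[w] * f(w) for w in outcomes)

def Var(f):
    # Variance of a function of the outcome.
    return E(lambda w: f(w) ** 2) - E(f) ** 2

print(E(lambda w: X[w] * Y[w]), E(lambda w: X[w]) * E(lambda w: Y[w]))        # both 0
print(Var(lambda w: X[w] + Y[w]), Var(lambda w: X[w]) + Var(lambda w: Y[w]))  # both 8/9
# X and Y are still not independent:
print(sum(prob[w] for w in outcomes if X[w] == 0 and Y[w] == 0))   # 0
print(sum(prob[w] for w in outcomes if X[w] == 0)
      * sum(prob[w] for w in outcomes if Y[w] == 0))               # 2/9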
Example (Chebyshev’s Inequality). Let X be a random variable. Show that for any
a > 0,
Pr[|X − E[X]| ≥ a] ≤ Var[X]/a^2
Solution. The inequality that we proved in the earlier homework is called Markov's inequality. We will use it to prove the above tail bound, called Chebyshev's inequality.
Pr[|X − E[X]| ≥ a] = Pr[(X − E[X])^2 ≥ a^2]
                   ≤ E[(X − E[X])^2] / a^2        (using Markov's Inequality)
                   = Var[X] / a^2
Example. Use Chebyshev’s inequality to bound the probability of obtaining at least 3n/4
heads in a sequence of n fair coin flips.
Solution. Let X be the random variable denoting the total number of heads in n flips of a fair coin. For 1 ≤ i ≤ n, let X_i be the random variable that is 1 if the ith flip results in heads and 0 otherwise. Thus,
X = X_1 + X_2 + · · · + X_n
By the linearity of expectation, E[X] = n/2. Since the random variables X_i are independent, we have
Var[X] = Σ_{i=1}^{n} Var[X_i] = Σ_{i=1}^{n} (1/2 − 1/4) = n/4
Using Chebyshev’s inequality, we get
Pr[X ≥ 3n/4] = Pr[X − n/2 ≥ n/4]
             = Pr[X − E[X] ≥ n/4]
             = (1/2) · Pr[|X − E[X]| ≥ n/4]        (since the distribution of X is symmetric about E[X] = n/2)
             ≤ (1/2) · Var[X]/(n/4)^2
             = (1/2) · (n/4)/(n^2/16)
             = 2/n
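As a rough empirical check, the Python sketch below (the choices n = 40 and 100,000 trials are arbitrary) estimates Pr[X ≥ 3n/4] by simulation and compares it with the 2/n bound; for such n the empirical frequency is typically much smaller than the bound, illustrating that Chebyshev's inequality can be quite loose.

import random

def estimate_tail(n, trials=100_000):
    # Empirical estimate of Pr[X >= 3n/4] over many simulated runs of n fair flips.
    hits = 0
    for _ in range(trials):
        heads = sum(1 for _ in range(n) if random.random() < 0.5)
        if heads >= 3 * n / 4:
            hits += 1
    return hits / trials

n = 40
print("empirical Pr[X >= 3n/4]:", estimate_tail(n))
print("Chebyshev bound 2/n:    ", 2 / n)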
Probability Distributions
Tossing a coin is an experiment with exactly two outcomes: heads ("success") with some probability, say p, and tails ("failure") with probability 1 − p. Such an experiment
is called a Bernoulli trial. Let Y be a random variable that is 1 if the experiment succeeds
and is 0 otherwise. Y is called a Bernoulli or an indicator random variable. For such a
variable we have
E[Y] = p · 1 + (1 − p) · 0 = p = Pr[Y = 1]
Thus, for a fair coin, if we consider heads as "success", then the expected value of the corresponding indicator random variable is 1/2.
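A minimal Python sketch (the value p = 0.3 and the sample count are arbitrary choices) illustrating that the empirical mean of an indicator random variable approaches Pr[Y = 1] = p:

import random

p = 0.3                                             # an arbitrary success probability
samples = [1 if random.random() < p else 0 for _ in range(100_000)]
print(sum(samples) / len(samples))                  # close to p = E[Y]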
A sequence of Bernoulli trials means that the trials are independent and each has a prob-
ability p of success. We will study two important distributions that arise from Bernoulli
trials: the geometric distribution and the binomial distribution.
The Geometric Distribution
Consider the following question. Suppose we have a biased coin with heads probability p
that we flip repeatedly until it lands on heads. What is the distribution of the number
of flips? This is an example of a geometric distribution. It arises in situations where we
perform a sequence of independent trials until the first success, where each trial succeeds with probability p.
Note that the sample space Ω consists of all sequences that end in H and have exactly one
H. That is
Ω = {H, TH, TTH, TTTH, TTTTH, . . .}
For any ω ∈ Ω of length i, Pr[ω] = (1 − p)^{i−1} p.
Definition. A geometric random variable X with parameter p is given by the following
distribution for i = 1, 2, . . . :
Pr[X = i] = (1 − p)^{i−1} p
We can verify that the geometric random variable admits a valid probability distribution
as follows:
Σ_{i=1}^{∞} (1 − p)^{i−1} p = p Σ_{i=1}^{∞} (1 − p)^{i−1} = (p/(1 − p)) Σ_{i=1}^{∞} (1 − p)^i = (p/(1 − p)) · ((1 − p)/(1 − (1 − p))) = 1
Note that to obtain the second-last term we have used the fact that Σ_{i=1}^{∞} c^i = c/(1 − c), for |c| < 1.
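The same fact can be checked numerically. In the Python sketch below (the value of p and the truncation point are arbitrary choices), the partial sums of Pr[X = i] approach 1:

p = 0.25                                   # an arbitrary choice of parameter
total = 0.0
for i in range(1, 200):                    # truncate the infinite sum at i = 199
    total += (1 - p) ** (i - 1) * p        # Pr[X = i]
print(total)                               # very close to 1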
Let’s now calculate the expectation of a geometric random variable, X. We can do this in
several ways. One way is to use the definition of expectation.
E[X] = Σ_{i=1}^{∞} i · Pr[X = i]
     = Σ_{i=1}^{∞} i (1 − p)^{i−1} p
     = (p/(1 − p)) Σ_{i=1}^{∞} i (1 − p)^i
     = (p/(1 − p)) · ((1 − p)/(1 − (1 − p))^2)        (∵ Σ_{k=1}^{∞} k x^k = x/(1 − x)^2, for |x| < 1)
     = (p/(1 − p)) · ((1 − p)/p^2)
     = 1/p
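A quick numerical check of this calculation (Python sketch; the choice p = 0.2 and the truncation point are arbitrary): the truncated series Σ_i i(1 − p)^{i−1} p comes out very close to 1/p.

p = 0.2                                                  # an arbitrary choice of parameter
expectation = sum(i * (1 - p) ** (i - 1) * p for i in range(1, 500))
print(expectation, 1 / p)                                # both approximately 5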
Another way to compute the expectation is to note that X is a random variable that takes on only non-negative integer values. From a theorem proved in the last class we know that if X takes on only non-negative integer values then
E[X] = Σ_{i=1}^{∞} Pr[X ≥ i]
Using this result, we can calculate the expectation of the geometric random variable X with parameter p. First, for any i ≥ 1,
Pr[X ≥ i] = Σ_{j=i}^{∞} (1 − p)^{j−1} p = (1 − p)^{i−1} p Σ_{j=0}^{∞} (1 − p)^j = (1 − p)^{i−1} p · (1/(1 − (1 − p))) = (1 − p)^{i−1}
Therefore
E[X] = Σ_{i=1}^{∞} Pr[X ≥ i] = Σ_{i=1}^{∞} (1 − p)^{i−1} = (1/(1 − p)) Σ_{i=1}^{∞} (1 − p)^i = (1/(1 − p)) · ((1 − p)/(1 − (1 − p))) = 1/p
Memoryless Property. For a geometric random variable X with parameter p, and for any n > 0 and k ≥ 0,
Pr[X = n + k | X > k] = Pr[X = n]
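This follows directly from the distribution and the tail formula derived above: Pr[X = n + k | X > k] = Pr[X = n + k]/Pr[X > k] = (1 − p)^{n+k−1} p / (1 − p)^k = (1 − p)^{n−1} p = Pr[X = n]. The Python sketch below (the values of p, n, and k are arbitrary choices) checks the identity numerically:

p, n, k = 0.3, 4, 6                         # arbitrary choices of parameter and indices

def pmf(i):
    # Pr[X = i] for a geometric random variable with parameter p.
    return (1 - p) ** (i - 1) * p

pr_gt_k = (1 - p) ** k                      # Pr[X > k] = Pr[X >= k + 1] = (1 - p)^k
lhs = pmf(n + k) / pr_gt_k                  # Pr[X = n + k | X > k]
rhs = pmf(n)                                # Pr[X = n]
print(lhs, rhs)                             # both equal 0.3 * 0.7^3 = 0.1029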
Conditional Expectation. The following is the definition of conditional expectation.
E[Y | Z = z] = Σ_y y · Pr[Y = y | Z = z],
where the summation is over all possible values y that the random variable Y can assume.
Example. For any random variables X and Y ,
E[X] = Σ_y Pr[Y = y] · E[X | Y = y]
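For instance, for the random variables X and Y from the first example, E[X | Y = 1] = X(b) = 0 and E[X | Y = 0] = (X(a) + X(c))/2 = 0, so Σ_y Pr[Y = y] · E[X | Y = y] = (2/3) · 0 + (1/3) · 0 = 0 = E[X], as expected.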
We can also calculate the expectation of a geometric random variable X using the memoryless property. Let Y be a random variable that is 0 if the first flip results in tails and 1 if the first flip results in heads. Using conditional expectation, we have
E[X] = Pr[Y = 0] · E[X | Y = 0] + Pr[Y = 1] · E[X | Y = 1]
     = (1 − p)(E[X] + 1) + p · 1        (using the memoryless property)
∴ pE[X] = 1, and hence E[X] = 1/p.
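Finally, to connect the formula E[X] = 1/p back to the coin-flipping description, here is a small Monte Carlo sketch in Python (the value p = 0.25 and the number of trials are arbitrary choices) that repeatedly flips a biased coin until the first heads and averages the number of flips:

import random

def flips_until_heads(p):
    # Flip a coin with heads probability p until it lands heads; return the flip count.
    count = 1
    while random.random() >= p:             # this flip came up tails
        count += 1
    return count

p = 0.25                                    # an arbitrary choice of parameter
trials = 100_000
average = sum(flips_until_heads(p) for _ in range(trials)) / trials
print(average, 1 / p)                       # the empirical mean is close to 1/p = 4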