Module 7: Discrete Probability
Theme 1: Elementary Probability Theory
Probability is usually associated with an outcome of an experiment. For example, the experiment may be a throw of a coin, while the two possible outcomes are heads and tails. Roughly speaking, probability will estimate our chance that the next outcome of this experiment is either a head or a tail (here we assume that head and tail are equally likely, that is, the probability of tossing a head or a tail is equal to $1/2$).
An experiment is a procedure that gives a set of possible outcomes. In fact, the set of all possible outcomes is called the sample space (e.g., in the experiment with a coin, the sample space is $S = \{\text{head}, \text{tail}\}$). Finally, an event is a subset of the sample space (e.g., a head). When there is a finite number of equally likely outcomes, Laplace suggested the following definition of the probability:

The probability of an event $E$ (which is a subset of a finite sample space $S$) of equally likely outcomes is
$$P(E) = \frac{|E|}{|S|},$$
where $|E|$ and $|S|$ are the cardinalities of the sets $E$ and $S$, respectively. We often call the events in $E$ favorable events, while the events in $S$ are all possible events.
Example 1: A box has 5 black balls and 7 green balls. What is the probability of selecting a green ball? The sample space consists of $12$ balls. The event "select a green ball" has seven elements. Therefore
$$P(E) = \frac{7}{12} \approx 0.58.$$
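For readers who like to verify such counts mechanically, here is a small sketch that reproduces the answer of Example 1 by direct enumeration.

```python
from fractions import Fraction

# Sample space of Example 1: 5 black balls and 7 green balls.
balls = ["black"] * 5 + ["green"] * 7

favorable = sum(1 for ball in balls if ball == "green")
probability = Fraction(favorable, len(balls))

print(probability)         # 7/12
print(float(probability))  # 0.5833...
```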
Example 2: Let two dice be rolled (we recall that a die has six sides, and each side has one, two, ..., or six dots). What is the probability that the sum of the numbers on the two dice is $11$?

Let us first build the probability space $S$. It consists of pairs $(i, j)$, where $1 \le i, j \le 6$ (since every die has six outcomes, two dice together have $6 \cdot 6 = 36$ outcomes). The event "sum is equal to 11" consists of $E = \{(5, 6), (6, 5)\}$; therefore,
$$P(E) = \frac{2}{36} = \frac{1}{18}.$$
The counting problems encountered so far were very simple. Consider now the following problem.

Example 3: Find the probability that a hand of five cards in poker contains four cards of one kind?
We recall that there are $52$ cards in a deck; there are $13$ different kinds of cards, with 4 cards of each kind. These kinds are twos, threes, ..., tens, jacks, queens, kings, and aces. There are also four suits: spades, clubs, hearts, and diamonds. The number of ways to choose five cards out of $52$ is
$$\binom{52}{5} = 2{,}598{,}960$$
(which is a large number). This is the cardinality of the sample space. Let us now consider the event $E$ that a hand has four cards of one kind. By the multiplication rule, the number of hands of five cards with four cards of the same kind is the number of ways to pick one kind ($\binom{13}{1} = 13$) times the number of ways to pick the fifth card, which is $48$. Therefore, by the above definition,
$$P(E) = \frac{13 \cdot 48}{\binom{52}{5}} = \frac{624}{2{,}598{,}960} \approx 0.00024$$
(in words, about one in every $4165$ hands), since there are $2{,}598{,}960$ possible outcomes and $624$ favorable outcomes for $E$.
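The counting argument of Example 3 is easy to reproduce with Python's `math.comb`; the sketch below simply evaluates the numbers quoted above.

```python
from math import comb

hands = comb(52, 5)           # size of the sample space: 2,598,960
favorable = 13 * comb(48, 1)  # pick the kind, then the fifth card: 624

probability = favorable / hands
print(hands, favorable, probability)  # 2598960 624 0.00024009...
print(round(hands / favorable))       # roughly one hand in 4165
```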
Sometimes, we know the probability of events $A$ and $B$ and need to know the probability of combinations of events such as $A \cup B$ (i.e., at least one event occurs), $A \cap B$ (both events must occur), or event $\overline{A}$ ($A$ does not occur). Let us start with the probability of the complementary event $\overline{A}$. We claim that
$$P(\overline{A}) = 1 - P(A).$$
Indeed, since $|\overline{A}| = |S| - |A|$, we obtain
$$P(\overline{A}) = \frac{|S| - |A|}{|S|} = 1 - \frac{|A|}{|S|} = 1 - P(A).$$
Example 4: What is the probability that among five randomly generated bits at least one is $0$? This is exactly the case when it is easier to compute $P(\overline{E})$ than $P(E)$: here $\overline{E}$ (i.e., all bits are $1$) is the complementary event. Since there are $2^5 = 32$ possible binary strings of length five, and only one of them consists of all $1$s, we find
$$P(\overline{E}) = \frac{1}{32},$$
since there are $32$ binary strings of length five and there is only one string with all $1$s. Hence
$$P(E) = 1 - P(\overline{E}) = 1 - \frac{1}{32} = \frac{31}{32}.$$
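Example 4 can also be confirmed by brute force over all $2^5$ strings; the following sketch does exactly that.

```python
from itertools import product

strings = list(product("01", repeat=5))           # all 2^5 = 32 binary strings
at_least_one_zero = [s for s in strings if "0" in s]

print(len(at_least_one_zero), "/", len(strings))  # 31 / 32
print(len(at_least_one_zero) / len(strings))      # 0.96875 = 31/32
```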
Let us now compute $P(A \cup B)$. From previous modules we know that
$$|A \cup B| = |A| + |B| - |A \cap B|;$$
therefore, by the definition of probability,
$$P(A \cup B) = \frac{|A \cup B|}{|S|} = \frac{|A| + |B| - |A \cap B|}{|S|} = P(A) + P(B) - P(A \cap B).$$
In summary, we proved that
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
In words, the probability of the union of two events is the sum of the probabilities of both events minus the probability of the product of the events. When the events are disjoint (i.e., $A \cap B = \emptyset$), then
$$P(A \cup B) = P(A) + P(B).$$
Example 5: What is the probability that a randomly selected positive integer smaller than or equal to $100$ is divisible either by $2$ or by $5$?

Let $A$ be the event that the integer is divisible by $2$, and $B$ the event that the integer is divisible by $5$. Observe that the event we are looking for is $A \cup B$. In order to compute it we need $P(A)$, $P(B)$, and $P(A \cap B)$. Clearly
$$P(A) = \frac{50}{100}, \qquad P(B) = \frac{20}{100}, \qquad P(A \cap B) = \frac{10}{100},$$
since there are ten numbers in the range $1$ to $100$ that are divisible by $10$ (i.e., by both $2$ and $5$). Therefore, by the definition of probability we have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{50}{100} + \frac{20}{100} - \frac{10}{100} = \frac{60}{100} = 0.6.$$
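The three counts used in Example 5 are quickly confirmed by enumeration (with the range 1 to 100 and the divisors 2 and 5 as read above).

```python
N = 100
A = {k for k in range(1, N + 1) if k % 2 == 0}  # divisible by 2
B = {k for k in range(1, N + 1) if k % 5 == 0}  # divisible by 5

print(len(A), len(B), len(A & B))  # 50 20 10
print(len(A | B) / N)              # 0.6 = P(A) + P(B) - P(A and B)
```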
Exercise 7A: What is the probability of generating a given binary string of length seven, provided $0$s and $1$s are equally likely?
Theme 2: Probability Theory
In the previous section, we assumed that all outcomes of the sample space are equally likely. This led us to the Laplace definition of probability. Here we generalize it. Let $S$ be a probability space. Throughout, we assume that $S$ is finite, and often we just list all outcomes (e.g., $S = \{s_1, s_2, \ldots, s_n\}$). Any subset $E$ of $S$ will be called an event. We now define probability $P$ as a function from the set of subsets of $S$ into the interval $[0, 1]$ such that the following three properties hold (below $P(E)$ denotes the probability of the event $E$):

1. $0 \le P(E) \le 1$ for every event $E \subseteq S$;
2. $P(S) = 1$;
3. if $E_1 \cap E_2 = \emptyset$, then $P(E_1 \cup E_2) = P(E_1) + P(E_2)$.
The above three properties say that the probability of any event must be nonnegative, that the probability of a sure event (i.e., $S$) is equal to one, and finally that the probability of the union of disjoint events is the sum of the probabilities of the corresponding events. Using these three assumptions one can prove many properties of probability (that we already encountered in the previous section). For example, let $\overline{E}$ be the complementary event to $E$ (that is, $\overline{E}$ is the same as "not $E$"). We claim that $P(\overline{E}) = 1 - P(E)$. Indeed, observe that $E \cup \overline{E} = S$ and that $E$ and $\overline{E}$ are disjoint, hence by property 3 we find
$$1 = P(S) = P(E \cup \overline{E}) = P(E) + P(\overline{E}),$$
which proves our claim that $P(\overline{E}) = 1 - P(E)$. By the way, as a corollary we see that $P(\emptyset) = 1 - P(S) = 0$.
Let now all outcomes in $S$ be equally likely. To be more precise, let $S = \{s_1, s_2, \ldots, s_n\}$ and
$$P(s_1) = P(s_2) = \cdots = P(s_n) = \frac{1}{n},$$
since by the second property above we have $P(s_1) + P(s_2) + \cdots + P(s_n) = P(S) = 1$ (all events sum up to one). Let now $E \subseteq S$, that is, $E = \{s_{i_1}, s_{i_2}, \ldots, s_{i_k}\}$ with $k = |E|$. By the third property of the probability definition and the above we have
$$P(E) = P\big(\{s_{i_1}\} \cup \{s_{i_2}\} \cup \cdots \cup \{s_{i_k}\}\big) = P(s_{i_1}) + P(s_{i_2}) + \cdots + P(s_{i_k}) = \frac{k}{n} = \frac{|E|}{|S|}.$$
In the above we first observe that the event $E$ is a union of the elementary events $\{s_{i_j}\}$. All elementary events are disjoint, hence we can sum probabilities, as the second equality above shows. Finally, since every elementary event is equally likely and there are $n$ of them, $P(s_{i_j}) = 1/n$. We have just recovered Laplace's definition of probability for equally likely outcomes.

Example 6: Find the probability that a randomly selected $n$-digit decimal number is also a valid octal number whose digits are between $0$ and $7$. First, a length-$n$ number can be represented as $d_1 d_2 \cdots d_n$, where $0 \le d_i \le 9$ if the number is decimal, and $0 \le d_i \le 7$ if the number is octal. The number of decimal numbers of length $n$ is $10^n$ (just apply the multiplication rule). The number of valid octal numbers of length $n$ is $8^n$. Therefore, the probability is
$$P = \frac{8^n}{10^n} = (0.8)^n.$$
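For small $n$ the answer of Example 6 can be checked against a brute-force count of digit strings; the helper below is just such a sketch.

```python
from itertools import product

def octal_fraction(n):
    """Fraction of length-n digit strings whose digits all lie between 0 and 7."""
    return 8 ** n / 10 ** n

for n in range(1, 5):
    # Brute-force check for small n: count digit strings with every digit <= 7.
    count = sum(1 for digits in product(range(10), repeat=n)
                if all(d <= 7 for d in digits))
    assert count / 10 ** n == octal_fraction(n)
    print(n, octal_fraction(n))   # 0.8, 0.64, 0.512, 0.4096
```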
Conditional Probability
Consider the case when you know that event $B$ has occurred, and knowing this you want to compute the probability of event $A$. This is known as the conditional probability and denoted as $P(A \mid B)$.
Example 7: There are five black balls and ten green balls in a box. You select randomly a ball, and it happens to be a green ball. You do not return this ball to the box. What is the probability that in the second selection you pick up a green ball?

If $B$ is the event of selecting a green ball in the first pick, and $A$ is the event of choosing another green ball in the second pick, then the probability we are seeking is denoted as $P(A \mid B)$. In our case it is
$$P(A \mid B) = \frac{9}{14},$$
since after the first selection there are only nine green balls in the box containing $14$ balls. (Here we used explicitly the fact that after picking a green ball there are only $14$ balls left with $9$ green balls.)

We can compute this probability in a different way. Observe that
$$P(B) = \frac{10}{15}.$$
Let us now compute the probability of $A \cap B$. Event $B$ can occur in $10$ ways out of $15$, while $A$ can occur in $9$ ways out of $14$, since one ball was already taken out from the box in the first pick. Hence
$$P(A \cap B) = \frac{10}{15} \cdot \frac{9}{14} = \frac{3}{7},$$
and then we define (see below for additional explanations) the conditional probability as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{3/7}{10/15} = \frac{9}{14}.$$
Thus, we obtain the same result as the one computed directly. It suggests a definition of conditional probability that we shall discuss next.
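As a sanity check, the two computations in Example 7 can be reproduced by enumerating ordered pairs of distinct balls; the sketch below is only illustrative.

```python
from fractions import Fraction
from itertools import permutations

# 5 black balls and 10 green balls, drawn twice without replacement (Example 7).
balls = ["black"] * 5 + ["green"] * 10
pairs = list(permutations(range(len(balls)), 2))  # ordered pairs of distinct balls

B = [(i, j) for (i, j) in pairs if balls[i] == "green"]
AB = [(i, j) for (i, j) in pairs if balls[i] == "green" and balls[j] == "green"]

P_B = Fraction(len(B), len(pairs))
P_AB = Fraction(len(AB), len(pairs))

print(P_B)         # 2/3  (= 10/15)
print(P_AB)        # 3/7  (= 10/15 * 9/14)
print(P_AB / P_B)  # 9/14
```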
Let us generalize this example. Consider a sample space $S$ and two events $A$ and $B$. Assume event $B$ has occurred, therefore, we must restrict the occurrence of event $A$ to those outcomes that fall into $B$. Then the sample space effectively reduces to $B$. In a sense, $B$ is the new sample space. In other words, the number of favorable outcomes is not $|A|$ but $|A \cap B|$. Therefore for equally likely events we compute $P(A \mid B)$ as follows
$$P(A \mid B) = \frac{|A \cap B|}{|B|}.$$
Observe, however, that
$$
\begin{aligned}
P(A \mid B) &= \frac{|A \cap B|}{|B|} \\
&= \frac{|A \cap B| / |S|}{|B| / |S|} \\
&= \frac{P(A \cap B)}{P(B)}.
\end{aligned}
$$
In the second line above, we divide both the numerator and the denominator by $|S|$, and then observe in the third line that we have the probabilities $P(A \cap B)$ and $P(B)$. Actually, the last expression is used as a definition of the conditional probability. Let $A$ and $B$ be events with $P(B) > 0$. The conditional probability of $A$ given $B$, denoted as $P(A \mid B)$, is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Example 8: A box contains chips made by two companies: some by company $A$, the rest by company $B$. It is known that a certain fraction of the chips made by company $A$ are defective, while only a smaller fraction of the chips made by company $B$ are defective. Compute the probability that if you pick up a defective chip it comes from company $A$.

Let $A$ be the event that a chip is made by company $A$, and $D$ the event that a chip is defective. We need to find $P(A \mid D)$, that is, the probability that, provided a chip is defective, it comes from company $A$. For this we need $P(A \cap D)$ and $P(D)$, both of which follow from the chip counts and the defect rates. Then
$$P(A \mid D) = \frac{P(A \cap D)}{P(D)} = \frac{1}{3},$$
that is, one out of every three.
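The specific chip counts and defect rates are not reproduced above, so the sketch below uses purely hypothetical numbers (100 chips from company $A$ with a 10% defect rate, 400 chips from company $B$ with a 5% defect rate), chosen only because they also lead to the answer of one in three; the point is the structure of the computation, not the data.

```python
from fractions import Fraction

# Hypothetical counts (not from the original example): 100 chips from company A,
# 400 chips from company B; 10% of A's chips and 5% of B's chips are defective.
chips_A, chips_B = 100, 400
defective_A = Fraction(10, 100) * chips_A  # 10 defective chips from A
defective_B = Fraction(5, 100) * chips_B   # 20 defective chips from B

total = chips_A + chips_B
P_D = (defective_A + defective_B) / total  # P(chip is defective)
P_A_and_D = defective_A / total            # P(made by A and defective)

print(P_A_and_D / P_D)                     # 1/3
```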
Independence
If $P(A \mid B) = P(A)$, then the knowledge of $B$ does not change the probability of $A$. We say that $A$ and $B$ are independent events. Observe that the above condition is equivalent to
$$P(A \cap B) = P(A) \cdot P(B),$$
which serves as a definition.

Two events $A$ and $B$ are said to be independent if and only if
$$P(A \cap B) = P(A) \cdot P(B).$$
Example 8: Consider a five-bit binary string. The probability of generating a zero is equal to $p$. Bits are generated independently. What is the probability of getting a specific string with two $0$s and three $1$s? Since we have independence we easily compute
$$p \cdot p \cdot (1-p) \cdot (1-p) \cdot (1-p) = p^2 (1-p)^3,$$
since $1 - p$ is the probability of generating a one.

Exercise 7B: Show that if $A$ and $B$ are independent events, then the complementary events $\overline{A}$ and $\overline{B}$ are also independent events.
Binomial Distribution and Bernoulli Trials
In the last example, we generated five bits and asked for the probability of getting a specific string with two $0$s and three $1$s. Thus this probability is equal to $p^2 (1-p)^3$. However, if we ask for the probability of generating two $0$s and three $1$s, the situation is different. This time we do not specify where the two $0$s and three $1$s are located. Therefore, strings like $00111$, $01011$, etc. satisfy the description of the event. In fact, we have $\binom{5}{2} = 10$ ways to select two zeros out of five. Thus this probability is equal to
$$\binom{5}{2} p^2 (1-p)^3 = 10\, p^2 (1-p)^3,$$
and this should be compared with the answer to the previous example. For instance, if $p = 1/2$, then the above becomes $10/2^5 = 10/32$.
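A direct enumeration separates the two situations discussed above: one particular string with two $0$s versus any string with two $0$s. The snippet assumes fair bits ($p = 1/2$), matching the numerical instance above.

```python
from itertools import product
from math import comb

p = 0.5   # probability of a 0 (assumed fair bits, as in the instance above)

strings = list(product("01", repeat=5))
two_zeros = [s for s in strings if s.count("0") == 2]

print(len(two_zeros))                  # 10 = C(5, 2)
print(len(two_zeros) / len(strings))   # 10/32 = 0.3125
print(comb(5, 2) * p**2 * (1 - p)**3)  # same value from the formula
```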
We shall generalize the last situation, and introduce the so-called Bernoulli trials and the binomial distribution. Consider an experiment that has two outcomes, called successes and failures. Let the probability of a success be $p$, while the probability of a failure is $1 - p$. This experiment is called the Bernoulli trial. Let us repeat it $n$ times. Many problems in probability can be solved by asking what is the probability of $k$ successes in $n$ Bernoulli trials. The last example can be viewed as five Bernoulli trials with a success being a generation of a zero.

Let us now consider $n$ independent Bernoulli trials with the probability of a success equal to $p$. What is the probability of obtaining $k$ successes? Since the outcomes are independent, a particular sequence of trials with $k$ successes (and $n - k$ failures) has the probability $p^k (1-p)^{n-k}$. But we can choose in $\binom{n}{k}$ ways the positions of the $k$ successes out of $n$ trials; therefore, the probability of $k$ successes in $n$ independent Bernoulli trials is
$$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{n-k}. \qquad (1)$$
Considered as a function of $k$, we call the above function the binomial distribution and denote it as $b(k; n, p)$. Observe that (1) is a probability distribution since by the definition of probability it sums up to one. More precisely, by Newton's summation formula discussed in Module 5
$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = \big(p + (1-p)\big)^n = 1,$$
as needed.

Example 9: A biased coin is thrown four times. The probability of throwing a tail is $p$. What is the probability of throwing three tails in four trials? Clearly, we have Bernoulli trials with the success being a throw of a tail. Hence, the probability is equal to
$$\binom{4}{3} p^3 (1-p) = 4\, p^3 (1-p)$$
after substituting $n = 4$ and $k = 3$ in (1).
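Formula (1) is easy to wrap in a small helper. The sketch below checks that the probabilities sum to one (Newton's formula) and evaluates the three-tails-in-four-throws case; the bias value 0.6 is only a placeholder, since the coin's actual bias is not specified above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Formula (1): probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.6   # placeholder bias for the coin in Example 9

# Newton's summation formula: the probabilities over k = 0..n sum to one.
assert abs(sum(binom_pmf(k, 4, p) for k in range(5)) - 1.0) < 1e-12

print(binom_pmf(3, 4, p))  # probability of three tails in four throws
print(4 * p**3 * (1 - p))  # same thing, written out as in the text
```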
Random Variables
Many problems are concerned with numerical values associated with the outcome of an experiment. For example, we can assign value $1$ to the tail when throwing a coin, and value $0$ when throwing a head. Such a numerical value assigned to an outcome is known as a random variable.

A random variable is a function from the sample space of an experiment to the set of real numbers.

Example 10: Let us flip a coin three times. Define a random variable $X(t)$ to be the number of tails that appear when $t$ is the outcome. We have
$$X(HHH) = 0, \qquad X(HHT) = X(HTH) = X(THH) = 1,$$
$$X(HTT) = X(THT) = X(TTH) = 2, \qquad X(TTT) = 3.$$
Having defined a random variable, we can now introduce the probability mass function. Let $\{X = r\} = \{t \in S : X(t) = r\}$, that is, $\{X = r\}$ is the subset of $S$ (an event) on which $X$ takes the value $r$. Then
$$P(X = r) = \sum_{t:\, X(t) = r} P(t),$$
since $\{X = r\}$ is a disjoint union of elementary events $t$ such that $X(t) = r$.
Let us now discuss an important notion of probability theory, namely, the expected value of an experiment. For example, one expects about $n/2$ tails when flipping an unbiased coin $n$ times. We are now in a position to define it precisely.

The expected value (also known as the mean value) of a random variable $X$ over $S$ taking values in a set $\mathcal{X}$ is defined as
$$E[X] = \sum_{t \in S} X(t)\, P(t) = \sum_{r \in \mathcal{X}} r\, P(X = r).$$
The above formula extends the definition of average value known from high school. Indeed, let all events $t_1, \ldots, t_n$ be equally likely, and assume that $X(t_i) = x_i$. We learned in high school to compute the average (expected value) as follows
$$E[X] = \frac{x_1 + x_2 + \cdots + x_n}{n},$$
which coincides with the above definition.

Example 11: We shall continue Example 10 assuming that the coin is fair (i.e., the probability of a head or a tail is $1/2$). From the previous example we find that
$$P(X = 0) = \frac{1}{8}, \qquad P(X = 1) = \frac{3}{8}, \qquad P(X = 2) = \frac{3}{8}, \qquad P(X = 3) = \frac{1}{8},$$
since, for example, three out of the $8$ outcomes satisfy $X = 1$ (i.e., the number of tails is equal to one): $X(HHT) = X(HTH) = X(THH) = 1$. Therefore,
$$E[X] = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = \frac{12}{8} = 1.5,$$
that is, on average we have $1.5$ tails per three throws.
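The expectation in Example 11 can be recomputed by summing $X(t)P(t)$ over the eight equally likely outcomes; the sketch below performs that enumeration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes
P = Fraction(1, len(outcomes))

X = {t: t.count("T") for t in outcomes}   # X = number of tails

E = sum(X[t] * P for t in outcomes)
print(E)         # 3/2
print(float(E))  # 1.5 tails per three throws on average
```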
Let us now compute the expected value of the binomial distribution defined above. We define $X$ as the number of successes in $n$ Bernoulli trials. Then
$$
\begin{aligned}
E[X] &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} \\
&= \sum_{k=1}^{n} k \binom{n}{k} p^k (1-p)^{n-k} \\
&= \sum_{k=1}^{n} n \binom{n-1}{k-1} p^k (1-p)^{n-k} \\
&= np \sum_{j=0}^{n-1} \binom{n-1}{j} p^{j} (1-p)^{n-1-j} \\
&= np\, \big(p + (1-p)\big)^{n-1} = np.
\end{aligned}
$$
The first line is just the definition of the binomial distribution and the expected value. In the third line we use the following property of the binomial coefficients (see Modules 4 and 6):
$$k \binom{n}{k} = n \binom{n-1}{k-1}.$$
In the fourth line above we change the index of summation from $k$ to $j = k - 1$, while in the fifth line we apply the Newton summation formula, discussed in Module 4, which we recall below
$$(x + y)^m = \sum_{j=0}^{m} \binom{m}{j} x^j y^{m-j}.$$
(In our case, $m = n - 1$, $x = p$, and $y = 1 - p$.)
Expectation has some nice properties. For example,
$$E[X + Y] = E[X] + E[Y],$$
that is, the expectation of the sum of random variables is the sum of expectations. This is a very important result! Let us derive it. We have
$$
\begin{aligned}
E[X + Y] &= \sum_{r} \sum_{t} (r + t)\, P(X = r,\ Y = t) \\
&= \sum_{r} \sum_{t} r\, P(X = r,\ Y = t) + \sum_{r} \sum_{t} t\, P(X = r,\ Y = t) \\
&= \sum_{r} r \sum_{t} P(X = r,\ Y = t) + \sum_{t} t \sum_{r} P(X = r,\ Y = t) \\
&= \sum_{r} r\, P(X = r) + \sum_{t} t\, P(Y = t) \\
&= E[X] + E[Y].
\end{aligned}
$$
This derivation is quite long and can be omitted in the first reading. We shall re-derive the same result in Example 13 using simpler arguments.
Example 13: We just computed that $E[X] = np$ for binomially distributed $X$. We needed a long chain of computations. But we can prove the same result using the above property in a much easier way. Observe that
$$X = X_1 + X_2 + \cdots + X_n,$$
where $X_i$ is equal to $1$ when a success occurs in the $i$th trial and $0$ otherwise. Such a random variable is called the Bernoulli random variable or, more precisely, a Bernoulli distributed random variable. Clearly, $E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p$. Since the expectation of a sum of random variables is the sum of expectations, we have
$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = np,$$
as before, but this time we derive it in a simple way.

However, in general $E[X \cdot Y]$ is not equal to $E[X] \cdot E[Y]$. To assure this is true one must assume that $X$ and $Y$ are independent, defined as follows: Two random variables $X$ and $Y$ on the same sample space are independent if
$$P(X = r,\ Y = t) = P(X = r) \cdot P(Y = t)$$
for all values $r$ and $t$.

Example 14: Let us roll two dice. What is the probability of getting a given value $i$ on the first die and a given value $j$ on the second die? Let $X$ represent the number obtained on the first die and $Y$ the number rolled on the second die. Since the events are independent, we have
$$P(X = i,\ Y = j) = P(X = i) \cdot P(Y = j) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}.$$
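Both the independence condition of Example 14 and the product rule for expectations proved next (Theorem 1) can be verified by enumerating the 36 equally likely outcomes of two dice; this is only a sketch of that check.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes (x, y)
P = Fraction(1, len(rolls))

# Joint probability equals the product of marginals for every pair of values.
for i in range(1, 7):
    for j in range(1, 7):
        joint = sum(P for (x, y) in rolls if x == i and y == j)
        assert joint == Fraction(1, 6) * Fraction(1, 6)   # 1/36

# E[XY] = E[X] * E[Y] for independent X and Y.
E_X = sum(x * P for (x, y) in rolls)
E_Y = sum(y * P for (x, y) in rolls)
E_XY = sum(x * y * P for (x, y) in rolls)
print(E_X, E_Y, E_XY)   # 7/2 7/2 49/4
assert E_XY == E_X * E_Y
```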
We now prove the following result.

Theorem 1. Let $X$ and $Y$ be independent random variables. Then
$$E[X \cdot Y] = E[X] \cdot E[Y].$$

Proof. We have
$$
\begin{aligned}
E[X \cdot Y] &= \sum_{r} \sum_{t} r\, t\; P(X = r,\ Y = t) \\
&= \sum_{r} \sum_{t} r\, t\; P(X = r) \cdot P(Y = t) \\
&= \Big( \sum_{r} r\, P(X = r) \Big) \Big( \sum_{t} t\, P(Y = t) \Big) = E[X] \cdot E[Y],
\end{aligned}
$$
where in the second line we used independence, while in the third line we computed two independent sums.

Finally, we shall discuss variance. The expected value of a random variable tells us its average value but says nothing about its variability. The reader should not forget that $X$ is a random variable and it (randomly) varies. While we would like to find one synthetic number (e.g., the expected value) to describe this random variable, such a characterization is usually very poor. Therefore, we try to introduce some parameters that can tell us (in a simplified way) more about the random variable. The variance, roughly speaking, determines how widely a random variable is distributed around the expected value. Formally:

Let $X$ be a random variable defined on a sample space $S$. The variance of $X$, denoted $\mathrm{Var}[X]$, is
$$\mathrm{Var}[X] = E\big[(X - E[X])^2\big].$$
That is, the variance is the expected value of the random variable $(X - E[X])^2$. Since we expect that $X$ is more likely to concentrate around $E[X]$, the random variable $(X - E[X])^2$ tells us about variations of $X$ around the expected value.
We can compute the variance using the following formula
$$\mathrm{Var}[X] = E[X^2] - \big(E[X]\big)^2. \qquad (2)$$
Indeed,
$$
\begin{aligned}
\mathrm{Var}[X] &= E\big[(X - E[X])^2\big] \\
&= E\big[X^2 - 2 X E[X] + (E[X])^2\big] \\
&= E[X^2] - 2 E[X] \cdot E[X] + (E[X])^2 \\
&= E[X^2] - \big(E[X]\big)^2,
\end{aligned}
$$
where above we used the fact that the expected value of a sum of random variables is the sum of the expected values and the following identity (let's call it the square of sum identity)
$$(a + b)^2 = a^2 + 2ab + b^2,$$
known from high school.
Example 15: Consider a Bernoulli random variable $X_i$ taking value $1$ with probability $p$ and zero otherwise. What is the variance of $X_i$? We observe first that
$$E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p.$$
Then we compute
$$E[X_i^2] = 1^2 \cdot p + 0^2 \cdot (1 - p) = p.$$
Thus, a straightforward computation gives us
$$\mathrm{Var}[X_i] = E[X_i^2] - \big(E[X_i]\big)^2 = p - p^2 = p(1 - p).$$
Unlike the expectation, the variance of a sum of two random variables is not the sum of variances. For this to hold, we need additional assumptions, as shown below.

Theorem 2. Let $X$ and $Y$ be independent random variables. Then
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y].$$
In general, if $X_1, X_2, \ldots, X_n$ are pairwise independent random variables, then
$$\mathrm{Var}[X_1 + X_2 + \cdots + X_n] = \mathrm{Var}[X_1] + \mathrm{Var}[X_2] + \cdots + \mathrm{Var}[X_n].$$

Proof. From (2) we have
$$\mathrm{Var}[X + Y] = E\big[(X + Y)^2\big] - \big(E[X + Y]\big)^2.$$
But
$$
\begin{aligned}
E\big[(X + Y)^2\big] &= E\big[X^2 + 2XY + Y^2\big] \\
&= E[X^2] + 2 E[XY] + E[Y^2] \\
&= E[X^2] + 2 E[X] E[Y] + E[Y^2],
\end{aligned}
$$
where in the second line we use the square of sum identity and the linearity of expectation, and in the third line we apply Theorem 1 (i.e., $E[XY] = E[X] E[Y]$ by the independence of $X$ and $Y$). Summing up, we obtain
$$
\begin{aligned}
\mathrm{Var}[X + Y] &= E\big[(X + Y)^2\big] - \big(E[X] + E[Y]\big)^2 \\
&= E[X^2] + 2 E[X] E[Y] + E[Y^2] - \big(E[X]\big)^2 - 2 E[X] E[Y] - \big(E[Y]\big)^2 \\
&= \Big(E[X^2] - \big(E[X]\big)^2\Big) + \Big(E[Y^2] - \big(E[Y]\big)^2\Big) \\
&= \mathrm{Var}[X] + \mathrm{Var}[Y],
\end{aligned}
$$
which completes the proof. In the first line we use the fact that $E[X + Y] = E[X] + E[Y]$ (derived above), then we use again the square of sum identity, then we rearrange terms of the sum, and finally obtain the desired identity.
Example 16: Let us compute the variance of the binomial distribution. We use the representation of the binomial distribution from Example 13, that is,
$$X = X_1 + X_2 + \cdots + X_n,$$
where the $X_i$ are Bernoulli distributed with
$$\mathrm{Var}[X_i] = p(1 - p),$$
as computed in Example 15. Therefore, by the last theorem
$$\mathrm{Var}[X] = \mathrm{Var}[X_1] + \mathrm{Var}[X_2] + \cdots + \mathrm{Var}[X_n] = n\, p(1 - p).$$
That is, the variance of the sum of $n$ Bernoulli distributed random variables is the sum of variances of individual random variables, and it is equal to $np(1 - p)$.
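As a final check, $E[X] = np$ and $\mathrm{Var}[X] = np(1-p)$ can be verified numerically from formula (1); the values $n = 10$ and $p = 0.3$ below are illustrative only.

```python
from math import comb

def binom_pmf(k, n, p):
    """Formula (1): probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3   # illustrative values only

mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
second_moment = sum(k * k * binom_pmf(k, n, p) for k in range(n + 1))
variance = second_moment - mean**2        # formula (2)

print(mean, n * p)                        # both 3.0 (up to rounding)
print(variance, n * p * (1 - p))          # both 2.1 (up to rounding)
```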