BMAT202L: Probability and Statistics
Mohit Kumar
VIT Chennai
Basics of Probability Theory
Probability
What is Probability?
▶ Probability is a branch of mathematics that deals with the study
of random events or experiments.
▶ It is the measure of chance that a particular event will occur.
▶ The probability of an event can be calculated by dividing the
number of favourable outcomes by the total number of possible
outcomes.
How can we define it?
▶ In order to define probability mathematically, we need to set
appropriate context.
▶ We will do so by defining the elements of probability.
Elements of Probability
Experiment
An experiment is a systematic and controlled process or activity carried out
to gather data or information about a particular phenomenon or system.
Deterministic Experiment
▶ An experiment or a process in which the outcome can be predicted
with certainty before it is actually observed or performed.
▶ For instance, if we add 5 and 3, we know the outcome will always be 8,
and there is no randomness involved.
Random Experiment
▶ An experiment or a process in which the outcome cannot be predicted
with certainty.
▶ For example, tossing a coin is a random experiment because we cannot
predict the outcome of the coin toss with certainty.
Elements of Probability
Sample Space
▶ The set of all possible outcomes of a random experiment is called the
sample space and is represented by the symbol S.
▶ For example, the sample space of a coin toss is S = {H, T}, where H
represents the outcome of getting heads and T represents the outcome
of getting tails.
▶ Each outcome in a sample space is called an element or a member of
the sample space, or simply a sample point.
Event
▶ An event is a subset of a sample space.
▶ For example, the event of getting heads in a coin toss is {H}.
Equally Likely Events
▶ Equally likely events are events that have the same theoretical
probability (or likelihood) of occurring.
Elements of Probability
Intuitive Definition of Probability
▶ Intuitively, the probability of an event A is the likelihood of the event A in S, so it can be defined as
P(A) = #A / #S,
where #A denotes the number of elements in A and P(A) denotes the probability of A.
▶ For example, the probability of getting heads in a fair coin toss is 1/2.
Is this definition okay?
▶ The elements mentioned above form the basis of probability, but in order to define it rigorously we need to develop a strong foundation using set theory and counting principles.
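The counting definition above can be illustrated with a short sketch. The following Python snippet (an illustrative aside, not part of the original slides; the die experiment is an assumed example) computes P(A) = #A / #S for the event "an even number is rolled" on a fair six-sided die.

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}               # sample space of one die roll
A = {x for x in S if x % 2 == 0}     # event: an even number is rolled

P_A = Fraction(len(A), len(S))       # P(A) = #A / #S
print(P_A)                           # 1/2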
Set Theory
Intersection of Two Events
▶ The intersection of two events A and B, denoted by the symbol
A ∩ B, is the event containing all elements that are common to A
and B.
▶ For example, if A = {1, 2, 3} and B = {3, 4, 5}, then
A ∩ B = {3}.
Union of Two Events
▶ The union of the two events A and B, denoted by the symbol
A ∪ B, is the event containing all the elements that belong to A or
B or both.
▶ For example, if A = {1, 2, 3} and B = {3, 4, 5}, then
A ∪ B = {1, 2, 3, 4, 5}.
Set Theory
Complement of an Event
▶ The complement of an event A with respect to S is the subset of
all elements of S that are not in A.
▶ We denote the complement of A by the symbol A^c.
▶ For example, if A = {1, 2, 3} and S = {1, 2, 3, 4, 5, 6}, then A^c = {4, 5, 6}.
Mutually Exclusive or Disjoint Events
Two events A and B are mutually exclusive, or disjoint, if
A ∩ B = ∅,
that is, if A and B have no elements in common.
Counting Principles
Rule 1: Multiplication Rule
▶ If an operation can be performed in n1 ways, and if for each of
these ways a second operation can be performed in n2 ways, then
the two operations can be performed together in n1 × n2 ways.
▶ Example: Suppose you want to choose a fruit and a drink for
breakfast, and you have 2 fruit options (apple and orange) and 3
drink options (coffee, milk, and tea). How many different
breakfast combinations can you create?
▶ The total number of breakfast combinations is 2 × 3 = 6.
Counting Principles
Rule 2: Generalized Multiplication Rule
▶ If an operation can be performed in n1 ways, and if for each of
these a second operation can be performed in n2 ways, and for
each of the first two a third operation can be performed in n3
ways, and so forth, then the sequence of k operations can be
performed in n1 × n2 × . . . × nk ways.
▶ Example: A lock has 4 digits, each of which can be any number
from 0 to 9 (inclusive). If you are trying to guess the
combination, how many possible combinations are there?
▶ The total number of possible combinations is 10 × 10 × 10 × 10 = 10,000.
Counting Principles
Permutation
▶ A permutation is an arrangement of a set of objects in a specific
order.
▶ In other words, it is a way of selecting and arranging a subset of
elements from a larger set, where the order of selection matters.
▶ The total number of permutations of a set of n distinct objects
can be calculated using the formula n! (pronounced as “n
factorial”), where n! = n × (n − 1) × (n − 2) × · · · × 2 × 1, with
special case 0! = 1.
▶ The number of permutations of n objects arranged in a circle is
(n − 1)!.
Counting Principles
▶ The number of permutations of n distinct objects taken r at a time is
nPr = n! / (n − r)!.
▶ For example, the number of ways to fill 3 distinct positions (order matters) from a group of 6 people is
6P3 = 6! / (6 − 3)! = 6 × 5 × 4 = 120.
▶ The number of distinct permutations of n things of which n1 are of one kind, n2 of a second kind, . . . , nk of a kth kind is
n! / (n1! n2! · · · nk!).
Counting Principles
Combination
▶ A combination is a way of selecting a subset of objects from a larger set, where the order of the selection does not matter.
▶ The number of combinations of k objects chosen from a set of n distinct objects is
nCk = n! / (k!(n − k)!).
▶ For example, given a set of 5 objects, the number of ways to choose 3 of them without regard to order is
5C3 = 5! / (3!(5 − 3)!) = (5 × 4) / (2 × 1) = 10.
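These counting formulas can be checked with Python's standard math module. The sketch below (an illustrative aside, not part of the slides) reproduces the lock, permutation, and combination examples from this section.

import math

# Generalized multiplication rule: a 4-digit lock, each digit 0-9.
print(10 ** 4)                                      # 10000 possible codes

# Permutations: nPr = n!/(n-r)!, e.g. filling 3 distinct positions from 6 people.
print(math.perm(6, 3))                              # 120
print(math.factorial(6) // math.factorial(6 - 3))   # same value, straight from the formula

# Combinations: nCk = n!/(k!(n-k)!), e.g. choosing 3 objects out of 5.
print(math.comb(5, 3))                              # 10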
Counting Principles
Partitioning
▶ Partitioning refers to the process of dividing a larger set into
smaller subsets or partitions.
▶ Each partition is a subset of the original set, and the union of all
partitions is equal to the original set.
▶ The number of ways of partitioning a set of n objects into k cells with n1 elements in the first cell, n2 elements in the second, and so forth is
(n choose n1, n2, . . . , nk) = n! / (n1! n2! · · · nk!),
where n1 + n2 + · · · + nk = n.
Probability of an Event
Classical Definition
▶ If a random experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is
P(A) = n / N.
Relative Frequency Definition
▶ If there is no basis to assume the outcomes are equally likely, then we can repeat the experiment n times, record the number of outcomes favourable to event A, say nA, and take the likelihood of the event A with sample space S as
P(A) = nA / n.
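The relative frequency definition lends itself to simulation. The sketch below (an aside, not from the slides) estimates P(heads) for a fair coin by repeating the experiment n times and recording the relative frequency of heads.

import random

random.seed(0)                       # reproducible run
n = 100_000                          # number of repetitions of the experiment
nA = sum(random.random() < 0.5 for _ in range(n))   # favourable outcomes (heads)

print(nA / n)                        # relative frequency, close to the classical value 0.5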
Probability of an Event
Axiomatic Definition
The axiomatic definition of probability includes three axioms:
1. For any event A,
P(A) ≥ 0
2. The probability of the entire sample space
P(S) = 1
3. For any collection of mutually exclusive events A1 , A2 , A3 , . . .,
P(A1 ∪ A2 ∪ . . .) = P(A1 ) + P(A2 ) + . . .
Properties of Probability
▶ If A and B are two events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
▶ Example: Let the sample space be S = A ∪ B, where P(A) = 0.8
and P(B) = 0.5. Find P(A ∩ B).
▶ Ans: Since S = A ∪ B, we have P(A ∪ B) = P(S) = 1, so P(A ∩ B) = P(A) + P(B) − 1 = 0.8 + 0.5 − 1 = 0.3.
▶ For any three events A, B, C,
P(A∪B∪C) = P(A)+P(B)+P(C)−P(A∩B)−P(A∩C)−P(B∩C)+P(A∩B∩C).
▶ If A1 , A2 , . . . , An are mutually exclusive, then
P(A1 ∪ A2 ∪ · · · ∪ An ) = P(A1 ) + P(A2 ) + · · · + P(An ).
Properties of Probability
▶ A collection of events {A1 , A2 , . . . , An } of a sample space S is
called a partition of S if A1 , A2 , . . . , An are mutually exclusive
and A1 ∪ A2 ∪ · · · ∪ An = S.
▶ If A1 , A2 , . . . , An partition the sample space S, then
P(A1 ∪A2 ∪· · ·∪An ) = P(A1 )+P(A2 )+· · ·+P(An ) = P(S) = 1.
▶ For any event A, P(A^c) = 1 − P(A).
▶ In particular, the probability of the null set is P(∅) = 1 − P(S) = 0.
▶ For any events A and B, if A ⊆ B, then P(A) ≤ P(B).
▶ For any event A ⊆ S, 0 ≤ P(A) ≤ 1.
Conditional Probability and Product Rule
Conditional Probability
▶ The conditional probability of event B, given that A has occurred, is denoted by P(B|A) and defined as
P(B|A) = P(B ∩ A) / P(A),
provided that P(A) > 0.
Product Rule
▶ For any two events A and B,
P(A ∩ B) = P(A)P(B|A),
provided P(A) > 0.
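As a small sketch of the definition P(B|A) = P(B ∩ A)/P(A), the snippet below (an illustrative aside; the two-dice experiment is an assumed example) computes a conditional probability by counting outcomes: A is "the sum of two fair dice is at least 10" and B is "the first die shows 6".

from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))        # sample space of two dice
A = [(i, j) for (i, j) in S if i + j >= 10]     # event A: sum is at least 10
B_and_A = [(i, j) for (i, j) in A if i == 6]    # event B ∩ A: first die is 6 and sum >= 10

P_A = Fraction(len(A), len(S))
P_B_and_A = Fraction(len(B_and_A), len(S))
print(P_B_and_A / P_A)                          # P(B|A) = 1/2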
Independence of Events
▶ Two events A and B are independent if and only if
P(B|A) = P(B) or P(A|B) = P(A),
assuming the existence of the conditional probabilities.
▶ Two events A and B are independent if and only if
P(A ∩ B) = P(A)P(B).
▶ Example: The probability that A hits the target is 1/4 and the
probability B hits is 2/5. What is the probability the target will be
hit if A and B each shoot at the target?
▶ Ans: By independence, P(target is hit) = 1 − P(both miss) = 1 − (3/4)(3/5) = 1 − 9/20 = 11/20.
Generalized Product Rule
▶ If, in an experiment, the events A1 , A2 , . . . , Ak can occur, then
P(A1 ∩ A2 ∩ . . . ∩ Ak ) = P(A1 )P(A2 |A1 )P(A3 |A1 ∩ A2 ) · · · P(Ak |A1 ∩ A2 ∩ · · · ∩ Ak−1 ).
▶ Example: A box contains 12 items, of which 4 are defective. Three items are drawn at random from the box, one after the other. Find the probability that all three are non-defective.
▶ Ans: (8/12)(7/11)(6/10) = 14/55.
▶ Example: A box contains 20 balls, of which 5 are red and 15 are white. Three balls are drawn at random in succession without replacement. Find the probability that all three balls selected are red.
▶ Ans: (5/20)(4/19)(3/18) = 1/114.
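The first example can also be checked by simulation. The sketch below (illustrative only, not part of the slides) computes the exact value with the product rule and then estimates it by repeatedly drawing three items without replacement from a box of 12 items containing 4 defectives.

import random
from fractions import Fraction

# Exact value from the generalized product rule.
exact = Fraction(8, 12) * Fraction(7, 11) * Fraction(6, 10)
print(exact)                                    # 14/55

# Monte Carlo check.
random.seed(1)
box = ["defective"] * 4 + ["good"] * 8
trials = 100_000
hits = sum(all(item == "good" for item in random.sample(box, 3)) for _ in range(trials))
print(hits / trials)                            # close to 14/55 ≈ 0.2545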
Independence of More Than Two Events
▶ A collection of events A = {A1 , . . . , An } is (mutually)
independent, if, for any sub-collection {Ai1 , · · · , Aik }, for k ≤ n,
P(Ai1 ∩ · · · ∩ Aik ) = P(Ai1 ) · · · P(Aik ).
▶ If A1 , A2 , . . . , Ak are independent, then
P(A1 ∩ A2 ∩ . . . ∩ Ak ) = P(A1 )P(A2 ) · · · P(Ak ).
Law of Total Probability
▶ If the events B1 , B2 , . . . , Bk constitute a partition of the sample space S such that P(Bi ) ≠ 0 for i = 1, 2, . . . , k, then for any event A of S,
P(A) = Σ_{i=1}^{k} P(A ∩ Bi ) = Σ_{i=1}^{k} P(Bi )P(A|Bi ).
Bayes’ Rule
▶ If the events B1 , B2 , · · · , Bk constitute a partition of the sample space S such that P(Bi ) ≠ 0 for i = 1, 2, . . . , k, then for any event A in S such that P(A) ≠ 0,
P(Br |A) = P(Br ∩ A) / Σ_{i=1}^{k} P(Bi ∩ A) = P(Br )P(A|Br ) / Σ_{i=1}^{k} P(Bi )P(A|Bi ),
for r = 1, 2, . . . , k.
▶ Example: A box contains 3 blue and 2 red marbles, while another box contains 2 blue and 5 red. A marble drawn at random from one of the boxes turns out to be blue. What is the probability that it came from the first box?
▶ Ans: 21/31.
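The marble example can be worked out directly from Bayes' rule. The sketch below (an illustrative aside; the prior of 1/2 per box reflects the box being chosen at random, as in the example) reproduces the answer 21/31.

from fractions import Fraction

priors = {"box1": Fraction(1, 2), "box2": Fraction(1, 2)}   # P(B_r): either box equally likely
p_blue = {"box1": Fraction(3, 5), "box2": Fraction(2, 7)}   # P(A|B_r): blue given the box

p_A = sum(priors[b] * p_blue[b] for b in priors)            # law of total probability
posterior_box1 = priors["box1"] * p_blue["box1"] / p_A      # Bayes' rule
print(posterior_box1)                                       # 21/31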
Random Variables
Random Variable
What is a Random Variable?
▶ A random variable is a function that associates a real number
with each element in the sample space.
Example:
▶ Suppose a coin is tossed twice so the sample space is
S = {HH, HT, TH, TT}.
▶ Let X represent the number of heads obtained in the two tosses; then X assigns the values 2, 1, 1, 0 to the outcomes HH, HT, TH, TT, respectively.
Types of Random Variables
Discrete Random Variable
▶ A discrete random variable is a random variable that takes on a
countable number of possible values.
▶ The values that a discrete random variable can take on are
typically integers or a finite set of values.
▶ Example: The number of defective items in a production run, or the number of heads in a series of coin flips, are discrete random variables.
Continuous Random Variable
▶ A random variable is called a continuous random variable if it takes values on a continuous scale.
▶ Example: The height or weight of a person, or the amount of rainfall on a given day, are continuous random variables.
Discrete Probability Distributions
Probability Mass Function (PMF)
▶ The set of ordered pairs (x, f (x)) is a probability function, probability mass function, or probability distribution of a discrete random variable X if, for every x,
▶ f (x) ≥ 0,
▶ Σ_x f (x) = 1,
▶ PX (x) = P(X = x) = f (x).
Cumulative Distribution Function (CDF)
▶ Let X be a discrete random variable with PMF f (x). The cumulative distribution function F(x) of X is defined as
F(x) = P(X ≤ x) = Σ_{t ≤ x} f (t), for −∞ < x < ∞.
Continuous Probability Distributions
Probability Density Function (PDF)
▶ The function f (x) is a probability density function of the continuous random variable X defined over the set of real numbers if
▶ f (x) ≥ 0, for x ∈ R,
▶ ∫_{−∞}^{∞} f (x) dx = 1,
▶ P(a < X < b) = ∫_a^b f (x) dx.
Cumulative Distribution Function (CDF)
▶ The cumulative distribution function F(x) of a continuous random variable X with density function f (x) is defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f (t) dt, for −∞ < x < ∞.
▶ The PDF is the derivative of the CDF, i.e., f (x) = (d/dx) F(x) = F′(x).
Properties of CDF
▶ The CDF is a non-decreasing function, i.e., if x1 < x2 , then F(x1 ) ≤ F(x2 ).
▶ The CDF is a right-continuous function.
▶ limx→−∞ F(x) = 0.
▶ limx→∞ F(x) = 1.
▶ The range of the CDF is [0,1], i.e., the CDF takes on values
between 0 and 1, inclusive.
▶ P(a < X ≤ b) = F(b) − F(a); for a continuous random variable this also equals P(a ≤ X ≤ b).
▶ P(X > a) = 1 − P(X ≤ a) = 1 − F(a).
Examples
▶ Example 1: Let X be a discrete random variable with PMF
f (x) = 1/2, x = 1
       1/3, x = 2
       1/6, x = 3.
Find the CDF of X.
▶ Ans: The CDF of X is
F(x) = 0,   x < 1
       1/2, 1 ≤ x < 2
       5/6, 2 ≤ x < 3
       1,   x ≥ 3.
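The step from a PMF to its CDF is a running sum of the probabilities. The sketch below (an aside, not from the slides) reproduces Example 1.

from fractions import Fraction

pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

cdf = {}
running = Fraction(0)
for x in sorted(pmf):            # accumulate f(t) for t <= x
    running += pmf[x]
    cdf[x] = running

print(cdf)                       # {1: 1/2, 2: 5/6, 3: 1}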
Examples
▶ Example 2: A shipment of 8 similar computers to a retail outlet contains 3 defective ones. A school makes a random purchase of 2 computers. Find the probability distribution of the number of defectives.
▶ Ans: Let X be the number of defective computers purchased by the school. Then X can take values from the set {0, 1, 2}. Therefore, the PMF of X is
f (x) = 5/14,  x = 0
       15/28, x = 1
       3/28,  x = 2.
Examples
▶ Example 3: A continuous random variable X has a density function
f (x) = Ce^(−3x), x > 0
       0,        x ≤ 0.
(i) Find the value of C.
(ii) Find P(1 < X < 2).
(iii) Find P(X ≥ 3).
(iv) Find P(X ≤ 1).
▶ Ans:
(i) C = 3.
(ii) P(1 < X < 2) = e^(−3) − e^(−6).
(iii) P(X ≥ 3) = e^(−9).
(iv) P(X ≤ 1) = 1 − e^(−3).
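Example 3 can be verified numerically. The sketch below (illustrative only; it uses a crude midpoint Riemann sum rather than any particular integration library) checks that C = 3 normalizes the density and recovers the stated probabilities.

import math

def f(x):
    """Density of Example 3 with C = 3."""
    return 3 * math.exp(-3 * x) if x > 0 else 0.0

def integrate(a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(0, 50))                                # ≈ 1, so C = 3 normalizes f
print(integrate(1, 2), math.exp(-3) - math.exp(-6))    # P(1 < X < 2)
print(integrate(3, 50), math.exp(-9))                  # P(X ≥ 3), tail truncated at 50
print(integrate(0, 1), 1 - math.exp(-3))               # P(X ≤ 1)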
Examples
▶ Example 4: Let X be a discrete random variable and
f (x) = [4! / (x!(4 − x)!)] (1/2)^4, x = 0, 1, 2, 3, 4.
Is f a PMF? If so, find P({0, 1}).
▶ Ans: Since f (x) ≥ 0 for each x = 0, 1, 2, 3, 4 and
f (0) + f (1) + f (2) + f (3) + f (4) = 1,
f is a PMF. Also, P({0, 1}) = f (0) + f (1) = 5/16.
Examples
▶ Example 5: Let X be a continuous random variable with PDF
f (x) = e^(−x), 0 < x < ∞
       0,      otherwise.
Find P(0 < X < 1).
▶ Ans: P(0 < X < 1) = 1 − 1/e.
Examples
▶ Example 6: Determine the value of k and the CDF of the continuous random variable X, whose PDF is
f (x) = 0,       x < 0
       kx,      0 ≤ x ≤ 1
       k,       1 ≤ x ≤ 2
       3k − kx, 2 ≤ x ≤ 3
       0,       x > 3.
▶ Ans: k = 1/2 and the CDF is
F(x) = 0,                    x < 0
       x²/4,                 0 ≤ x ≤ 1
       x/2 − 1/4,            1 ≤ x ≤ 2
       −x²/4 + 3x/2 − 5/4,   2 ≤ x ≤ 3
       1,                    x > 3.
Joint Distributions
Joint Probability Distributions
Joint Probability Mass Function
▶ The function f (x, y) is a joint probability distribution or probability mass function of the discrete random variables X and Y if
▶ f (x, y) ≥ 0 for all (x, y),
▶ Σ_x Σ_y f (x, y) = 1,
▶ P(X = x, Y = y) = f (x, y).
▶ For any region A in the xy plane, P[(X, Y) ∈ A] = Σ Σ_A f (x, y).
Joint Density Function
▶ The function f (x, y) is a joint density function of the continuous random variables X and Y if
▶ f (x, y) ≥ 0, for all (x, y),
▶ ∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = 1,
▶ P[(X, Y) ∈ A] = ∬_A f (x, y) dx dy, for any region A in the xy plane.
Marginal Distributions
Discrete Case
▶ For discrete random variables X and Y,
▶ The marginal distribution of X is g(x) = fX (x) = Σ_y f (x, y),
▶ The marginal distribution of Y is h(y) = fY (y) = Σ_x f (x, y).
Continuous Case
▶ For continuous random variables X and Y,
▶ The marginal distribution of X is g(x) = fX (x) = ∫_{−∞}^{∞} f (x, y) dy,
▶ The marginal distribution of Y is h(y) = fY (y) = ∫_{−∞}^{∞} f (x, y) dx.
Conditional Distributions
▶ Let X and Y be two random variables, discrete or continuous. The conditional distribution of the random variable Y given that X = x is
f (y|x) = fY|X (y|x) = f (x, y) / g(x),
provided g(x) > 0.
▶ Similarly, the conditional distribution of X given that Y = y is
f (x|y) = fX|Y (x|y) = f (x, y) / h(y),
provided h(y) > 0.
Independence or Statistical Independence
▶ Let X and Y be two random variables, discrete or continuous,
with joint probability distribution f (x, y) and marginal
distributions g(x) and h(y), respectively.
▶ The random variables X and Y are said to be statistically
independent if and only if
f (x, y) = g(x)h(y) = fX (x)fY (y)
for all (x, y) within their range.
Example 1:
Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens, and 3 green pens. Let X be the number of blue pens selected and Y the number of red pens selected.
(a) Find the joint probability function f (x, y).
(b) Find P[(X, Y) ∈ A], where A is the region {(x, y)|x + y ≤ 1}.
(c) Find the marginal distributions of X and Y.
(d) Find the conditional distribution of X, given that Y = 1, and use
it to determine P(X = 0|Y = 1).
(e) Show that the random variables X and Y are not statistically
independent.
Example 2:
Let the joint density function of the continuous random variables X and Y be
f (x, y) = (2/5)(2x + 3y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
          0,              otherwise.
(a) Show that ∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = 1.
(b) Find P[(X, Y) ∈ A], where A = {(x, y) | 0 < x < 1/2, 1/4 < y < 1/2}.
(c) Find the marginal distributions of X and Y.
(d) Find the conditional densities f (y|x), f (x|y), and then evaluate P(1/4 < X < 1/2 | Y = 1/3).
(e) Show that X and Y are not statistically independent.
Example 3:
The joint density of the random variables X and Y is given as
f (x, y) = 4xy e^(−(x² + y²)), x, y ≥ 0
          0,                  elsewhere.
Test whether X and Y are statistically independent.
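For Example 3, the test can be carried out symbolically: compute the marginals and compare their product with the joint density. The sketch below (an aside, assuming the SymPy library is available) follows exactly that route.

import sympy as sp

x, y = sp.symbols("x y", positive=True)
f = 4 * x * y * sp.exp(-(x**2 + y**2))      # joint density on x, y >= 0

g = sp.integrate(f, (y, 0, sp.oo))          # marginal of X: 2*x*exp(-x**2)
h = sp.integrate(f, (x, 0, sp.oo))          # marginal of Y: 2*y*exp(-y**2)

print(sp.simplify(g * h - f) == 0)          # True: f(x, y) = g(x)h(y), so X and Y are independent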
Mathematical Expectation
Mean of a Random Variable
▶ For a discrete random variable X with probability mass function f (x), the expected value or mean is
µ = E(X) = Σ_x x f (x).
▶ For a continuous random variable X with probability density function f (x), the expected value or mean is
µ = E(X) = ∫_{−∞}^{∞} x f (x) dx.
Mean of a function of a Random Variable
Law of the Unconscious Statistician (LOTUS)
Let X be a random variable with probability distribution f (x).
▶ If X is discrete, the expected value of the random variable g(X) is
µ_g(X) = E[g(X)] = Σ_x g(x) f (x).
▶ If g(X) = X², then µ_X² = E(X²) = Σ_x x² f (x).
▶ If X is continuous, the expected value of the random variable g(X) is
µ_g(X) = E[g(X)] = ∫_{−∞}^{∞} g(x) f (x) dx.
▶ If g(X) = X², then µ_X² = E(X²) = ∫_{−∞}^{∞} x² f (x) dx.
Mean of a function of two Random Variables
Let X and Y be random variables with joint probability distribution f (x, y).
▶ If X and Y are discrete, the mean, or expected value, of the random variable g(X, Y) is
µ_g(X,Y) = E[g(X, Y)] = Σ_x Σ_y g(x, y) f (x, y).
▶ If X and Y are continuous, the mean, or expected value, of the random variable g(X, Y) is
µ_g(X,Y) = E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f (x, y) dx dy.
Variance and Standard Deviation of a Random Variable
▶ Let X be a random variable with probability distribution f (x) and mean µ.
▶ If X is discrete, the variance of X is
Var(X) = σ² = E[(X − µ)²] = Σ_x (x − µ)² f (x).
▶ If X is continuous, the variance of X is
Var(X) = σ² = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f (x) dx.
▶ The variance of the random variable X is also given by
σ² = E(X²) − µ² = E(X²) − [E(X)]².
▶ The positive square root of the variance, σ, is called the standard deviation of X.
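For a concrete discrete case, the sketch below (an aside; it reuses the PMF of Example 1 from the earlier section as an assumed input) computes µ = Σ x f(x) and σ² = E(X²) − µ².

from fractions import Fraction

pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}   # PMF of Example 1

mu = sum(x * p for x, p in pmf.items())          # E(X)
ex2 = sum(x**2 * p for x, p in pmf.items())      # E(X^2)
var = ex2 - mu**2                                # Var(X) = E(X^2) - [E(X)]^2

print(mu, var)                                   # 5/3 and 5/9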
Variance of a Function of a Random Variable
Let X be a random variable with probability distribution f (x).
▶ If X is discrete, the variance of the random variable g(X) is
Var[g(X)] = σ²_g(X) = E[(g(X) − µ_g(X))²] = Σ_x (g(x) − µ_g(X))² f (x).
▶ If X is continuous, the variance of the random variable g(X) is
Var[g(X)] = σ²_g(X) = E[(g(X) − µ_g(X))²] = ∫_{−∞}^{∞} (g(x) − µ_g(X))² f (x) dx.
Covariance of Random Variables
▶ Let X and Y be random variables with joint probability distribution f (x, y), and with means µX and µY , respectively.
▶ If X and Y are discrete, the covariance of X and Y is
Cov(X, Y) = σXY = E[(X − µX )(Y − µY )] = Σ_x Σ_y (x − µX )(y − µY ) f (x, y).
▶ If X and Y are continuous, the covariance of X and Y is
Cov(X, Y) = σXY = E[(X − µX )(Y − µY )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µX )(y − µY ) f (x, y) dx dy.
Covariance of Random Variables
▶ The covariance of two random variables X and Y with means µX
and µY , respectively, is given by
σXY = E(XY) − µX µY .
▶ We can also rewrite it as
Cov(X, Y) = E(XY) − E(X)E(Y).
▶ Moreover, if a, b, c, d are constants, then
Cov(aX + b, cY + d) = ac Cov(X, Y).
The Correlation Coefficient
▶ Let X and Y be random variables with covariance σXY and standard deviations σX and σY , respectively. The correlation coefficient of X and Y is
ρXY = σXY / (σX σY ).
▶ If X and Y are independent, then ρXY = 0.
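The covariance and correlation formulas can be evaluated directly from a joint PMF. The sketch below uses a small made-up joint table (a purely hypothetical example introduced for illustration, not one of the slide examples).

from fractions import Fraction
import math

# Hypothetical joint PMF f(x, y) on {0, 1} x {0, 1}, chosen only for illustration.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - mu_x * mu_y                          # sigma_XY = E(XY) - mu_X * mu_Y

var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
rho = cov / math.sqrt(var_x * var_y)              # correlation coefficient

print(cov, rho)                                   # 1/16 and about 0.258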
Properties of Mean and Variance
▶ For any constants a and b, E(aX + b) = aE(X) + b.
▶ If a = 0, then E(b) = b.
▶ If b = 0, then E(aX) = aE(X).
▶ If a = 1, then E(X + b) = E(X) + b.
▶ E[g(X) ± h(X)] = E[g(X)] ± E[h(X)].
▶ E[g(X, Y) ± h(X, Y)] = E[g(X, Y)] ± E[h(X, Y)].
▶ E[X ± Y] = E[X] ± E[Y].
▶ If X and Y are independent random variables, then
E(XY) = E(X)E(Y).
Moreover, σXY = Cov(X, Y) = 0.
Properties of Mean and Variance
▶ If X and Y are random variables with joint probability distribution f (x, y) and a, b, and c are constants, then
σ²_(aX+bY+c) = a²σX² + b²σY² + 2ab σXY .
▶ If b = 0, then σ²_(aX+c) = a²σX².
▶ If a = 1 and b = 0, then σ²_(X+c) = σX².
▶ If b = c = 0, then σ²_(aX) = a²σX².
▶ If a = b = 0, then σ²_c = 0.
▶ For independent random variables X and Y,
▶ σ²_(aX+bY) = a²σX² + b²σY².
▶ σ²_(aX−bY) = a²σX² + b²σY².
Moment Generating Function
Moments
▶ The rth moment about the origin of a random variable X is denoted by µ′r and defined by
µ′r = E(X^r) = Σ_x x^r f (x), if X is discrete,
      ∫_{−∞}^{∞} x^r f (x) dx, if X is continuous.
▶ If r = 0, then µ′0 = E(X⁰) = E(1) = 1.
▶ If r = 1, then µ′1 = E(X) = µ.
▶ If r = 2, then µ′2 = E(X²) = Var(X) + [E(X)]² = σ² + µ². Therefore,
σ² = µ′2 − (µ′1)².
Moment Generating Function (MGF)
▶ The MGF of a random variable completely describes the nature of the distribution.
▶ The moment-generating function of the random variable X is given by E(e^(tX)) and is denoted by MX (t). Hence,
MX (t) = E(e^(tX)) = Σ_x e^(tx) f (x), if X is discrete,
         ∫_{−∞}^{∞} e^(tx) f (x) dx, if X is continuous.
▶ Moment Formula: Let X be a random variable with moment generating function MX (t). Then
(d^r / dt^r) MX (t) |_(t=0) = µ′r .
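The moment formula can be checked symbolically. The sketch below (an aside, assuming the SymPy library; it reuses the Example 1 PMF as an assumed input) builds MX(t), differentiates it, and evaluates at t = 0.

import sympy as sp

t = sp.symbols("t")
pmf = {1: sp.Rational(1, 2), 2: sp.Rational(1, 3), 3: sp.Rational(1, 6)}   # PMF of Example 1

M = sum(p * sp.exp(t * x) for x, p in pmf.items())   # M_X(t) = E(e^{tX})

mu1 = sp.diff(M, t, 1).subs(t, 0)    # first moment  mu'_1 = E(X)   = 5/3
mu2 = sp.diff(M, t, 2).subs(t, 0)    # second moment mu'_2 = E(X^2) = 10/3
print(mu1, mu2, mu2 - mu1**2)        # variance 5/9 via mu'_2 - (mu'_1)^2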
Properties of MGF
▶ Let X be a random variable and a be a constant.
▶ MX+a (t) = e^(at) MX (t).
▶ MaX (t) = MX (at).
▶ Uniqueness Theorem: Let X and Y be two random variables
with moment generating functions MX (t) and MY (t),
respectively. If MX (t) = MY (t) for all values of t, then X and Y
have the same probability distribution.
▶ If X and Y are two independent random variables with moment generating functions MX (t) and MY (t), respectively, then MX+Y (t) = MX (t)MY (t).