Introduction to Probability Theory
K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay
July 28, 2017
LECTURES 3 - 4
Theorem 0.0.1 (Properties of probability measure) Let (Ω, F, P) be a probability space and let A, B, A1, A2, . . . be in F. Then
(1) P (Ac ) = 1 − P (A) .
(2) Monotonicity: if A ⊆ B, then
P (A) ≤ P (B) .
(3) Inclusion-exclusion formula:
P(∪_{k=1}^n A_k) = Σ_{k=1}^n P(A_k) − Σ_{1≤i<j≤n} P(A_i A_j) + Σ_{1≤i<j<k≤n} P(A_i A_j A_k) − · · · + (−1)^{n+1} P(A_1 A_2 · · · A_n) .
(4) Finite sub-additivity:
P (A ∪ B) ≤ P (A) + P (B) .
(5) Continuity property:
(i) For A1 ⊆ A2 ⊆ · · · ,
P(∪_{n=1}^∞ A_n) = lim_{n→∞} P(A_n) .
(ii) For A1 ⊇ A2 ⊇ · · · ,
P(∩_{n=1}^∞ A_n) = lim_{n→∞} P(A_n) .
(6) Boole's inequality (countable sub-additivity):
P(∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ P(A_n) .
Proof. Since
Ω = A ∪ A^c ∪ ∅ ∪ ∅ ∪ · · · , we have 1 = P(A) + P(A^c) + P(∅) + P(∅) + · · · .
Since the RHS is a convergent series, we get P(∅) = 0, so 1 = P(A) + P(A^c) and hence (1) follows.
Now
A ⊆ B =⇒ B = A ∪ (B \ A) .
Therefore
P (B) = P (A) + P (B \ A) ⇒ P (B) ≥ P (A) ,
since P (B \ A) ≥ 0. This proves (2).
We prove (3) by induction. For n = 2,
A1 ∪ A2 = A1 ∪ (A2 \ A1)
and
A2 = (A2 \ A1) ∪ (A1 A2) .
(Here A1 A2 = A1 ∩ A2.) Hence we have
P(A1 ∪ A2) = P(A1) + P(A2 \ A1)
and
P(A2) = P(A2 \ A1) + P(A1 A2) .
Combining the above, we have
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 A2) .
Assume that the equality holds for n ≤ m. Consider
P(∪_{k=1}^{m+1} A_k) = P([∪_{k=1}^{m} A_k] ∪ A_{m+1})
= P(∪_{k=1}^{m} A_k) + P(A_{m+1}) − P([∪_{k=1}^{m} A_k] ∩ A_{m+1})
= P(∪_{k=1}^{m} A_k) + P(A_{m+1}) − P(∪_{k=1}^{m} A_k A_{m+1})
= Σ_{k=1}^{m} P(A_k) − Σ_{1≤i<j≤m} P(A_i A_j) + · · · + (−1)^{m+1} P(A_1 · · · A_m) + P(A_{m+1})
  − [ Σ_{k=1}^{m} P(A_k A_{m+1}) − Σ_{1≤i<j≤m} P(A_i A_j A_{m+1}) + · · · + (−1)^{m+1} P(A_1 A_2 · · · A_m A_{m+1}) ]
= Σ_{k=1}^{m+1} P(A_k) − [ Σ_{1≤i<j≤m} P(A_i A_j) + Σ_{k=1}^{m} P(A_k A_{m+1}) ]
  + [ Σ_{1≤i<j<k≤m} P(A_i A_j A_k) + Σ_{1≤i<j≤m} P(A_i A_j A_{m+1}) ] + · · · + (−1)^{m+2} P(A_1 A_2 · · · A_m A_{m+1})
= Σ_{k=1}^{m+1} P(A_k) − Σ_{1≤i<j≤m+1} P(A_i A_j) + Σ_{1≤i<j<k≤m+1} P(A_i A_j A_k) − · · · + (−1)^{m+2} P(A_1 · · · A_{m+1}) .
Here the fourth equality uses the induction hypothesis on ∪_{k=1}^{m} A_k and on ∪_{k=1}^{m} A_k A_{m+1}, together with the identity (A_i A_{m+1})(A_j A_{m+1}) = A_i A_j A_{m+1}.
Therefore the result is true for n = m + 1. Hence, by induction, property (3) follows.
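The inclusion-exclusion formula can be verified numerically on a small finite sample space with the uniform measure; the following Python sketch, with three arbitrarily chosen events for illustration, compares both sides for n = 3.

```python
from itertools import combinations

# Uniform probability on a finite sample space: P(A) = |A| / |Omega|.
omega = set(range(20))
P = lambda A: len(A) / len(omega)

# Three illustrative events (arbitrary choices for the check).
A = [set(range(0, 12)), set(range(6, 16)), {1, 5, 9, 13, 17}]

# Left side: probability of the union.
lhs = P(A[0] | A[1] | A[2])

# Right side: inclusion-exclusion, alternating over intersection sizes.
rhs = 0.0
for r in range(1, len(A) + 1):
    for idx in combinations(range(len(A)), r):
        inter = set.intersection(*(A[i] for i in idx))
        rhs += (-1) ** (r + 1) * P(inter)

print(abs(lhs - rhs) < 1e-12)  # True
```

The same loop works for any n, since the formula only alternates over the sizes of the index subsets.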
From property (3), we have
P (A ∪ B) = P (A) + P (B) − P (AB) .
Hence
P(A ∪ B) ≤ P(A) + P(B) .
Thus we have (4).
Now we prove (5)(i). Set
B1 = A1 , Bn = An \ An−1 , n = 2, 3, · · · .
Then
B_n ∈ F for all n = 1, 2, · · · , the B_n's are pairwise disjoint, and
A_n = ∪_{k=1}^n B_k , n ≥ 1 . (0.0.1)
Also
∪_{n=1}^∞ B_n = ∪_{n=1}^∞ A_n . (0.0.2)
Using (0.0.1), we get
P(A_n) = Σ_{k=1}^n P(B_k) .
Hence, from the definition of series convergence, we get
lim_{n→∞} P(A_n) = lim_{n→∞} Σ_{k=1}^n P(B_k) = Σ_{n=1}^∞ P(B_n) . (0.0.3)
Similarly, using (0.0.2), we get
P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(B_n) .
Therefore, from (0.0.3), we have
P(∪_{n=1}^∞ A_n) = lim_{n→∞} P(A_n) .
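The continuity property (5)(i) can be illustrated with a concrete increasing sequence (an illustrative choice, not from the text): take Ω = {1, 2, . . .} with P({k}) = 2^{−k} and A_n = {1, . . . , n}, so that P(A_n) = 1 − 2^{−n} increases to P(∪_{n=1}^∞ A_n) = P(Ω) = 1.

```python
# Continuity from below, checked numerically: Omega = {1, 2, ...} with
# P({k}) = 2**-k, and the increasing events A_n = {1, ..., n}.
def P_An(n):
    return sum(2.0 ** -k for k in range(1, n + 1))  # equals 1 - 2**-n

# The union of all A_n is Omega, whose probability is 1, and indeed
# P(A_n) increases toward 1 as n grows.
for n in (1, 5, 20, 60):
    print(n, P_An(n))
```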
Proof of (5)(ii) is as follows. Note that
A_1^c ⊆ A_2^c ⊆ · · · .
Now, using (5)(i), we have
lim_{n→∞} P(A_n^c) = P(∪_{n=1}^∞ A_n^c) ,
i.e.,
1 − lim_{n→∞} P(A_n) = P([∩_{n=1}^∞ A_n]^c) = 1 − P(∩_{n=1}^∞ A_n) .
Hence
lim_{n→∞} P(A_n) = P(∩_{n=1}^∞ A_n) .
From property (4), applied repeatedly, it follows that
P(A_1 ∪ A_2 ∪ · · · ∪ A_n) ≤ P(A_1) + · · · + P(A_n) for all n ≥ 1 ,
i.e.,
P(∪_{k=1}^n A_k) ≤ Σ_{k=1}^n P(A_k) for all n ≥ 1 .
Therefore
P(∪_{k=1}^n A_k) ≤ Σ_{k=1}^∞ P(A_k) for all n ≥ 1 . (0.0.4)
Set
Bn = ∪nk=1 Ak .
Then B_1 ⊆ B_2 ⊆ · · · and each B_n ∈ F. Also
∪_{n=1}^∞ A_n = ∪_{n=1}^∞ B_n .
Hence
P(∪_{n=1}^∞ A_n) = P(∪_{n=1}^∞ B_n) = lim_{n→∞} P(B_n) = lim_{n→∞} P(∪_{k=1}^n A_k) . (0.0.5)
Here the second equality follows from the continuity property (5)(i). Letting n → ∞ in (0.0.4) and using (0.0.5), we have
P(∪_{n=1}^∞ A_n) ≤ Σ_{k=1}^∞ P(A_k) .
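Boole's inequality can likewise be checked on a finite space; the overlapping events below are an arbitrary illustration of how the sum over-counts the union.

```python
# Boole's inequality on a finite space: uniform measure on {0, ..., 99}.
omega_size = 100
P = lambda A: len(A) / omega_size

# Overlapping events: A_n = {10n, ..., 10n + 14}, clipped to the space.
events = [set(range(10 * n, 10 * n + 15)) & set(range(omega_size))
          for n in range(10)]

union_prob = P(set().union(*events))
sum_prob = sum(P(A) for A in events)

# The overlaps are counted once on the left and repeatedly on the right.
print(union_prob <= sum_prob)  # True
```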
Let us end this chapter with the formulation and solution of the problem of points, essentially in the spirit of the solution by Pascal, initiated by Fermat.
Note that since A needs m points and B needs l points, one can consider the random experiment of m + l − 1 tosses of a fair coin. This gives Ω as
Ω = {(ω_1, · · · , ω_{m+l−1}) | ω_i ∈ {H, T}}
and #Ω = 2^{m+l−1}.
Let E denote the event that A gets at least m points. Then E^c denotes the event that B gets at least l points. Writing C(n, k) for the binomial coefficient, we have
#E = C(m+l−1, m) + C(m+l−1, m+1) + · · · + C(m+l−1, m+l−1) ,
#E^c = C(m+l−1, 0) + C(m+l−1, 1) + · · · + C(m+l−1, m−1) .
This gives the probabilities p and q for the wins of A and B, respectively. Hence clearly
p / q = #E / #E^c .
This gives the division of the prize money in the ratio p : q.
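The division of the prize money can be computed directly from the binomial counts above; the following Python sketch (the helper name `division_ratio` is ours) reproduces, for example, the ratio 11 : 5 when A needs 2 points and B needs 3.

```python
from math import comb

def division_ratio(m, l):
    """Player A needs m more points, B needs l more; one fair toss per round.
    Returns (#E, #E^c): counts of the 2**(m+l-1) toss sequences won by A and B."""
    n = m + l - 1
    count_A = sum(comb(n, k) for k in range(m, n + 1))  # at least m heads: A wins
    count_B = sum(comb(n, k) for k in range(0, m))      # fewer than m heads: B wins
    return count_A, count_B

# A needs 2 points, B needs 3: the prize is divided 11 : 5 in A's favour.
print(division_ratio(2, 3))  # (11, 5)
```

Since every sequence is won by exactly one player, the two counts always sum to 2^{m+l−1}.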
For additional reading (not a part of the syllabus)
Recall that all the examples of probability spaces we had seen till now
are with sample space finite or countable and the σ-field as the power set of
the sample space. Now let us look at a random experiment with uncountable
sample space and the σ-field as a proper subset of the power set.
Consider the random experiment in Example 1.0.5, i.e., pick a point 'at random' from the interval (0, 1]. Since the point is picked 'at random', the probability measure should satisfy the following:
P [a, b] = P (a, b] = P [a, b) = P (a, b) = b − a . (0.0.6)
The σ-field we are using to define P is B(0, 1] , the σ-field generated by all
intervals in (0, 1]. B(0, 1] is called the Borel σ-field of subsets of (0, 1].
Our aim is to define P for all elements of B(0, 1], preserving (0.0.6). Set
B0 := all finite unions of intervals in (0, 1] of the form (a, b], 0 ≤ a ≤ b ≤ 1.
Clearly Ω = (0, 1] ∈ B0 .
Let A ∈ B0. Then A can be represented as
A = ∪_{i=1}^m I_i ,
where I_i = (a_i, b_i] with
0 ≤ a_1 < b_1 ≤ a_2 < b_2 ≤ · · · ≤ a_m < b_m ≤ 1 .
Then
A^c = J_1 ∪ J_2 ∪ · · · ∪ J_m ∪ J_{m+1} ,
where
J_1 = (0, a_1], J_i = (b_{i−1}, a_i], i = 2, 3, · · · , m, J_{m+1} = (b_m, 1] .
Therefore A^c ∈ B0.
For A, B ∈ B0 , it follows from the definition of B0 that A ∪ B ∈ B0 .
Hence B0 is a field.
Define P on B0 as follows:
P(A) = Σ_{i=1}^m P(J_i) , (0.0.7)
where
A = ∪_{i=1}^m J_i
and the J_i's are pairwise disjoint intervals of the form (a, b].
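The set function (0.0.7) is easy to compute for a concrete member of B0; a minimal Python sketch, assuming the intervals are given as disjoint (a, b] pairs and using P((a, b]) = b − a from (0.0.6):

```python
# (0.0.7) in code: P on the field B0, for A a finite disjoint union of
# intervals (a, b] in (0, 1], with P((a, b]) = b - a from (0.0.6).
def P_B0(intervals):
    """intervals: list of pairwise disjoint pairs (a, b) standing for (a, b]."""
    return sum(b - a for a, b in intervals)

A = [(0.0, 0.2), (0.5, 0.6), (0.9, 1.0)]
print(round(P_B0(A), 10))  # 0.4
```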
The extension of P from B0 to B(0, 1] follows from the extension theorem of Carathéodory. To understand the statement of the extension theorem, we need the following definition.
Definition 1.8 (Probability measure on a field) Let Ω be a nonempty
set and F be a field. Then P : F → [0, 1] is said to be a probability
measure on F if
(i) P (Ω) = 1
(ii) if A_1, A_2, · · · ∈ F are pairwise disjoint and ∪_{n=1}^∞ A_n ∈ F, then
P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n) .
Example 0.0.1 The set function P given by (0.0.7) is a probability measure
on the field B0 .
Theorem 0.0.2 (Extension Theorem) A probability measure defined on a
field F has a unique extension to σ(F).
Using Theorem 0.0.2, one can extend P defined by (0.0.7) to σ(B0 ).
Since
σ(B0 ) = B(0, 1] ,
there exists a unique probability measure P on B(0, 1] preserving (0.0.6).
Chapter 1
Random Variables-General
Facts
Key words: Random variable, Borel σ-field of subsets of R, σ-field generated
by random variable.
This chapter explains some general ideas related to random variables.
In many situations, one is interested in only some aspects of a random experiment/phenomenon. For example, consider the experiment of tossing 3 unbiased coins when we are only interested in the number of 'Heads' that turn up. The probability space corresponding to the experiment of tossing 3 coins is given by
Ω = {(ω_1, ω_2, ω_3) | ω_i ∈ {H, T}} , F = P(Ω) ,
and P is described by
P({(ω_1, ω_2, ω_3)}) = 1/8 .
Our interest is in knowing the number of 'Heads', i.e., in the map (ω_1, ω_2, ω_3) → I_{ω_1 = H} + I_{ω_2 = H} + I_{ω_3 = H}. In general, we are interested in functions of the sample space. Also, in many cases, one never observes the underlying random phenomenon directly, but observes 'measurements' coming from it, and these measurements can be modeled as functions of the sample space. So, in short, our interest mainly lies in functions of the sample space and their analysis, in order to make deductions about the random phenomenon. But to study a function associated with a random phenomenon, one should be able to assign probabilities to a 'reasonably large' class of events associated with the function. In general, we can't do this for all functions defined on the sample space.
So one needs to restrict oneself to a certain class of functions of the sample space for which this can be done, and we call them random variables. In short, random variables are nothing but 'measurable observations' from a random phenomenon.
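For the 3-coin example above, the 'number of Heads' map can be tabulated by brute force; a short Python sketch (the names are illustrative):

```python
from itertools import product

# Sample space for 3 fair coin tosses; each outcome has probability 1/8.
omega = list(product("HT", repeat=3))

# The map of interest: number of Heads in the outcome.
X = lambda w: sum(1 for c in w if c == "H")

# Push the uniform measure forward through X.
dist = {}
for w in omega:
    dist[X(w)] = dist.get(X(w), 0) + 1 / len(omega)

print({k: dist[k] for k in sorted(dist)})  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```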
Definition 2.1 Let Ω be a sample space and F a σ-field of subsets of Ω. A
function X : Ω → R is said to be a random variable (with respect to F) if
{ω ∈ Ω | X(ω) ≤ x} ∈ F for all x ∈ R .
Now on, we denote {ω ∈ Ω | X(ω) ≤ x} by {X ≤ x}.
Remark 1.0.1 (1) Sometimes we refer to X as a random variable on (Ω, F), and if no ambiguity arises, we simply call X a random variable.
(2) In the definition of a random variable, the probability measure P is not in the picture.
(3) Also, a function that fails to be a random variable with respect to one σ-field may become a random variable with respect to another σ-field.
Example 1.0.2 Let Ω = {H, T }, F = P(Ω). Define X : Ω → R as
follows:
X(H) = 1 , X(T ) = 0 .
Then X is a random variable.
Note that any map Z : Ω → R is a random variable for the space (Ω, F) given in Example 1.0.2.
Now we give an example where this is not the case.
Example 1.0.3 Let Ω = {1, 2, 3, 4, 5, 6} and F = σ({{2}, {4}, {6}, {1, 3, 5}}).
Then X defined by X(1) = 1 and X(ω) = 0 otherwise is not a random variable with respect to F.
That X is a random variable implies that the basic events X^{-1}((−∞, x]) associated with X are in F. Does this imply that a larger class of events associated with X can be assigned probabilities (i.e., are in F)? To examine this, one needs the following σ-field.
Definition 2.2 The σ-field generated by the collection of all open sets1 in
R is called the Borel σ-field of subsets of R and is denoted by BR .
Lemma 1.0.1 Let I1 = {(−∞, x] | x ∈ R}. Then σ(I1) = BR.
Proof. For x ∈ R,
(−∞, x] = ∩_{n=1}^∞ (−∞, x + 1/n) ∈ BR .
Therefore
I1 ⊆ BR .
Hence
σ(I1 ) ⊆ BR .
For x ∈ R,
(−∞, x) = ∪_{n=1}^∞ (−∞, x − 1/n] .
Therefore
{(−∞, x) | x ∈ R} ⊆ σ(I1 ) .
Also, for a, b ∈ R, a < b,
(a, b) = (−∞, b) \ (−∞, a] ∈ σ(I1 ) .
Therefore, σ(I1) contains all open intervals of the form (a, b). Since any
open set in R can be written as a countable union of open intervals with
rational end points, it follows that
O ⊆ σ(I1 ) ,
where O denote the set of all open sets in R. Thus,
BR ⊆ σ(I1 ) .
This completes the proof.
Theorem 1.0.3 Let X : Ω → R be a function. Then X is a random
variable on (Ω, F, P ) iff X −1 (B) ∈ F for all B ∈ BR .
1. O ⊆ R is said to be an open set if for each x ∈ O, there exists an ε > 0 such that B(x, ε) := (x − ε, x + ε) ⊆ O. A set is closed in R if its complement in R is open in R.
Any open set in R can be written as a countable union of open intervals with rational end
points.
Proof. Suppose X is a random variable, i.e.,
{X ≤ x} = X^{-1}((−∞, x]) ∈ F for all x ∈ R. (1.0.1)
Set
A = {B ∈ BR | X −1 (B) ∈ F} .
From (1.0.1), we have
I1 ⊆ A ⊆ BR . (1.0.2)
Note that X −1 (R) = Ω. Hence R ∈ A.
Now
B ∈ A ⇒ X^{-1}(B) ∈ F
⇒ [X^{-1}(B)]^c ∈ F
⇒ X^{-1}(B^c) ∈ F (since [X^{-1}(B)]^c = X^{-1}(B^c))
⇒ B^c ∈ A .
Also
B_1, B_2, · · · ∈ A ⇒ X^{-1}(B_n) ∈ F for all n = 1, 2, 3, · · ·
⇒ ∪_{n=1}^∞ X^{-1}(B_n) ∈ F
⇒ X^{-1}(∪_{n=1}^∞ B_n) ∈ F
⇒ ∪_{n=1}^∞ B_n ∈ A .
Hence A is a σ-field. Now, from (1.0.2) and Lemma 1.0.1, it follows that A = BR, i.e., X^{-1}(B) ∈ F for all B ∈ BR.
The converse statement follows from the observation
I1 ⊆ BR .
This completes the proof.
Remark 1.0.2 Theorem 1.0.3 tells us that if X^{-1}((−∞, x]) ∈ F for all x ∈ R, then X^{-1}(B) ∈ F for all B ∈ BR.
Example 1.0.4 Let Ω = (0, 1], F = B(0, 1]. Define X(ω) = 3ω + 1.
Here B(0, 1] is the σ-field generated by all open sets in (0, 1]².
For x ∈ R,
{X ≤ x} = {ω ∈ Ω | 3ω + 1 ≤ x}
= ∅ if x < 1 ,
= (0, (x − 1)/3] if 1 ≤ x ≤ 4 ,
= (0, 1] if x > 4 .
2. A set O in (0, 1] is open if O = (0, 1] ∩ G, where G is an open set in R.
Since
∅ , (0, (x − 1)/3] , (0, 1] ∈ B(0, 1] ,
X is a random variable.
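The case analysis in Example 1.0.4 can be double-checked by brute force on a grid of points of (0, 1]; the helper below (our own naming) returns the right endpoint b with {X ≤ x} = (0, b], encoding ∅ as b = 0.

```python
# X(omega) = 3*omega + 1 on Omega = (0, 1]; {X <= x} as computed above.
def preimage_right_endpoint(x):
    """Returns b such that {X <= x} = (0, b] (b = 0 encodes the empty set)."""
    return min(1.0, max(0.0, (x - 1.0) / 3.0))

# Brute-force check of the case analysis on a grid of sample points.
for x in (0.5, 2.5, 7.0):
    b = preimage_right_endpoint(x)
    for k in range(1, 1001):
        w = k / 1000.0  # grid points in (0, 1]
        assert (3 * w + 1 <= x) == (w <= b), (x, w)
print("consistent")
```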
Lemma 1.0.2 Let X : Ω → R. Define
σ(X) = {X −1 (B) | B ∈ BR } .
Then σ(X) is a σ-field.
Proof.
X −1 (R) = Ω .
Hence Ω ∈ σ(X).
For A ∈ σ(X), there exists B ∈ BR such that
A = X^{-1}(B) ,
and then
A^c = X^{-1}(B^c) .
Hence A ∈ σ(X) implies A^c ∈ σ(X). Similarly, from
X^{-1}(∪_{n=1}^∞ B_n) = ∪_{n=1}^∞ X^{-1}(B_n) ,
it follows that
A_1, A_2, · · · ∈ σ(X) ⇒ ∪_{n=1}^∞ A_n ∈ σ(X) .
This completes the proof.
Definition 2.3 Let X be a random variable with respect to a σ-field F.
Then σ(X) is called the σ-field generated by X.
Remark 1.0.3 (1) The occurrence or non-occurrence of the event X^{-1}(B) tells whether a realization X(ω) is in B or not. Thus σ(X) collects all such information. Hence σ(X) can be viewed as the information available with the random variable.
(2) For a function X defined on a sample space Ω, σ(X) is a σ-field of subsets of Ω and is the smallest σ-field with respect to which X is a random variable.
Example 1.0.5 Let X be as in Example 1.0.3, i.e., Ω = {1, 2, 3, 4, 5, 6} and X = I_{1}. Then
X^{-1}(B) = ∅ if B ∩ {0, 1} = ∅ ,
= {2, 3, 4, 5, 6} if B ∩ {0, 1} = {0} ,
= {1} if B ∩ {0, 1} = {1} ,
= Ω if B ∩ {0, 1} = {0, 1} .
Hence σ(X) = σ({1}). It is easy to see that X is a random variable with respect to any σ-field containing σ({1}). Now see again (2) of Remark 1.0.3.
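The claim that σ(X) has exactly these four member sets can be confirmed by enumerating the preimages in Python (an illustrative check): since X takes only the values 0 and 1, X^{-1}(B) depends only on B ∩ {0, 1}, so running over subsets of {0, 1} enumerates all of σ(X).

```python
from itertools import combinations

# Example 1.0.5: Omega = {1, ..., 6}, X the indicator of {1}.
omega = {1, 2, 3, 4, 5, 6}
X = lambda w: 1 if w == 1 else 0

# Enumerate preimages X^{-1}(B) over all subsets of the range {0, 1}.
sigma_X = set()
for r in range(3):
    for sub in combinations([0, 1], r):
        sigma_X.add(frozenset(w for w in omega if X(w) in sub))

print(len(sigma_X))  # 4: the empty set, {1}, {2,...,6}, and Omega
```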