Probability Theory Lecture Notes
Lecture 1: Introduction
We will concentrate on the rigorous mathematical aspects, but we will try not to forget the
connections to the intuitive notion of real-life probability. These connections will enhance
our intuition, and they make probability an extremely useful tool in all the sciences. And
they make the study of probability much more fun, too! A note of caution is in order,
though: Mathematical models are only as good as the assumptions they are based on. So
probability can be used, and it can be (and quite frequently is) abused...
1.2 The algebra of events
A central notion in probability is that of the algebra of events (we’ll clarify later what
the word “algebra” means in this context). We begin with an informal discussion. We
imagine that probability is a function, denoted P, that takes as its argument an “event”
(i.e., occurrence of something in a real-life situation involving uncertainty) and returns a
number in [0, 1] representing how likely this event is to occur. For example, if a fair coin is
tossed 10 times and we denote the results of the tosses by X1, X2, . . . , X10 (where each Xi
is 0 or 1, signifying “tails” or “heads”), then we can write statements like
Do such questions make sense? (And if they do, can you guess what the answers are?)
Maybe it is not enough to have an informal discussion to answer this...
Example 1.1 (a) An urn initially contains a white ball and a black ball. A ball is drawn
out at random from the urn, then added back and another white ball is added to the urn.
2
This procedure is repeated infinitely many times, so that after step n the urn contains 1
black ball and n + 1 white balls. For each n ≥ 1, let An denote the event that at step n the
black ball was drawn. Now let A∞ denote the event

A∞ = “in total, the black ball was selected infinitely many times”

(in other words, the event that infinitely many of the events An occurred).

(b) Imagine now that the same experiment is performed, independently, in a second room.
For each n ≥ 1, let Bn denote the event that at step n of this second experiment the black
ball was drawn, and let B∞ denote the event

B∞ = “in total, the black ball was selected infinitely many times in
the second copy of the experiment”

(in other words, the event that infinitely many of the events Bn occurred).
(c) For each n ≥ 1, let Cn be the event that both An and Bn occurred, i.e., Cn = An ∧ Bn,
and let C∞ denote the event “Cn occurred for infinitely many values of n”.
Theorem 1. We have P(A∞) = P(B∞) = 1 and P(C∞) = 0.
Proof. These claims are consequences of the Borel-Cantelli lemmas which we will learn
later in the course. Here is a sketch of the proof that P (C∞ ) = 0 (remember, this is still
an “informal discussion”, so our “proof” is really more of an exploration of what formal
assumptions are needed to make the claim hold). For each n we have
P(An) = P(Bn) = 1/(n + 1),
since at time n each of the urns contains n + 1 balls, only one of which is black. Moreover,
the choices in both rooms are made independently, so we have
P(Cn) = P(An ∧ Bn) = P(An)P(Bn) = 1/(n + 1)².
It turns out that to prove that P(C∞) = 0, the only relevant bit of information is that the
infinite series ∑_{n=1}^∞ P(Cn) is a convergent series; the precise values of the probabilities
are irrelevant. Indeed, we can try to do various manipulations on the definition of the event
C∞, as follows:
For any N ≥ 1, denote the event “Cn occurred for some n ≥ N ” by DN . Then

C∞ = D1 ∧ D2 ∧ D3 ∧ . . .

In particular, in order for the event C∞ to happen, DN must happen for any fixed value of
N (for example, D100 must happen, D101 must happen, etc.). It follows that C∞ is at most
as likely to happen as any of the DN ’s; in other words we have

P(C∞) ≤ P(DN ) for all N ≥ 1.
Now, what can we say about P(DN )? Looking at the definition of DN , we see that it too
can be written as an infinite disjunction of events, namely
DN = CN ∨ CN+1 ∨ CN+2 ∨ . . . (infinite disjunction)
   = ∨_{n=N}^∞ Cn (shorthand for infinite disjunction).
If this were a finite disjunction, we could say that the likelihood for at least one of the
events to happen is at most the sum of the likelihoods (for example, the probability that it
will rain next weekend is at most the probability that it will rain next Saturday, plus the
probability that it will rain next Sunday; of course it might rain on both days, so the sum
of the probabilities can be strictly greater than the probability of the disjunction). What
can we say for an infinite disjunction? Since this is an informal discussion, it is impossible
to answer this without being more formal about the precise mathematical model and its
assumptions. As it turns out, the correct thing to do (in the sense that it leads to the most
interesting and natural mathematical theory) is to assume that this fact, which holds for finite
disjunctions, also holds for infinite ones. Whether this has any relevance to real life is a
different question! If we make this assumption, we get for each N ≥ 1 the bound
P(DN ) ≤ ∑_{n=N}^∞ P(Cn).

Since the series ∑_n P(Cn) converges, the right-hand side tends to 0 as N → ∞, and
combining this with the bound P(C∞) ≤ P(DN ) we conclude that P(C∞) = 0.
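Though nothing is formal yet, the convergence heuristic can be probed numerically. The
following is a minimal Python simulation sketch (our addition, not part of the notes; the
function name and parameters are ours): it estimates the expected number of steps at which
both rooms draw the black ball, and compares it with the partial sum of ∑_n 1/(n + 1)².

```python
import random

# Simulation sketch: at step n each room independently draws one of n + 1
# balls, exactly one of which is black, so C_n occurs with probability
# 1/(n+1)^2.  The expected total number of "joint black draws" is the
# convergent series sum_{n>=1} 1/(n+1)^2 = pi^2/6 - 1, which is why
# P(C_infinity) = 0 is plausible.
def mean_joint_draws(steps=2_000, trials=500, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for n in range(1, steps + 1):
            a = rng.randrange(n + 1) == 0  # room 1 draws the black ball
            b = rng.randrange(n + 1) == 0  # room 2 draws the black ball
            if a and b:
                total += 1
    return total / trials  # empirical mean of #{n <= steps : C_n occurred}

exact = sum(1 / (n + 1) ** 2 for n in range(1, 2_001))
print(mean_joint_draws(), "vs partial sum", exact)  # both close to 0.64
```

In a typical run the empirical mean settles near π²/6 − 1 ≈ 0.645 and does not grow as the
number of steps increases, consistent with P(C∞) = 0.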
A collection F of subsets of a sample space Ω is called an algebra if it satisfies properties
(A1)–(A3) below, and a σ-algebra if it satisfies all four of the properties

∅ ∈ F, (A1)
A ∈ F =⇒ Ω \ A ∈ F, (A2)
A, B ∈ F =⇒ A ∪ B ∈ F, (A3)
A1, A2, A3, . . . ∈ F =⇒ ∪_{n=1}^∞ An ∈ F. (A4)

A pair (Ω, F) in which F is a σ-algebra of subsets of Ω is called a measurable space.
Example 2.2 If Ω is any set, then {∅, Ω} is a σ-algebra – in fact it is the smallest possible
σ-algebra of subsets of Ω. Similarly, the power set P(Ω) of all subsets of Ω is a σ-algebra,
and is (obviously) the largest σ-algebra of subsets of Ω.
Definition 5 (Probability measure). Given a measurable space (Ω, F), a probability mea-
sure on (Ω, F) is a function P : F → [0, 1] that satisfies the properties:

1. P(∅) = 0 and P(Ω) = 1;

2. (σ-additivity) if A1, A2, . . . ∈ F are pairwise disjoint, then P(∪_{n=1}^∞ An) = ∑_{n=1}^∞ P(An).
Definition 6 (Probability space). A probability space is a triple (Ω, F, P), where (Ω, F)
is a measurable space, and P is a probability measure on (Ω, F).
Probability theory can be described loosely as the study of probability spaces (this is of course
a gross oversimplification...). A more general mathematical theory called measure theory
studies measure spaces, which are like probability spaces except that the measures can take
values in [0, ∞] instead of [0, 1], and the total measure of the space is not necessarily equal to
1 (such measures are referred to as σ-additive nonnegative measures). Measure theory is
an important and non-trivial theory, and studying it requires a separate concentrated effort.
We shall content ourselves with citing and using some of its most basic results. For proofs
and more details, refer to the appendix in Durrett’s book or to a measure theory textbook.
The following basic properties of a probability measure follow from the definition:

(i) Monotonicity: If A, B ∈ F, A ⊂ B then P(A) ≤ P(B).

(ii) Sub-additivity: If A1, A2, . . . ∈ F then P(∪_{n=1}^∞ An) ≤ ∑_{n=1}^∞ P(An).

(iii) Continuity from below: If A1 ⊂ A2 ⊂ . . . are in F, then

P(∪_{n=1}^∞ An) = lim_{n→∞} P(An).

(iv) Continuity from above: If A1 ⊃ A2 ⊃ . . . are in F, then

P(∩_{n=1}^∞ An) = lim_{n→∞} P(An).
Example 2.3 Discrete probability spaces. Let Ω be a countable set and let p : Ω → [0, 1]
be a function such that

∑_{ω∈Ω} p(ω) = 1.
This corresponds to the intuitive notion of a probabilistic experiment with a finite or count-
ably infinite number of outcomes, where each individual outcome ω has a probability p(ω) of
occurring. We can put such an “elementary” or “discrete” experiment in our more general
framework by defining the σ-algebra of events F to be the set of subsets of Ω, and defining
the probability measure P by
P(A) = ∑_{ω∈A} p(ω), A ∈ F.
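As a concrete illustration (our addition; the die example and all names are ours), a discrete
probability space can be realized directly in code, with P(A) computed by summing the
weights p(ω) over ω ∈ A:

```python
from fractions import Fraction

# A discrete probability space: Omega is the outcome of one fair die roll,
# F is (implicitly) the power set of Omega, and P(A) = sum of p(w) for w in A.
omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}   # p : Omega -> [0, 1]
assert sum(p.values()) == 1              # the weights sum to 1

def P(A):
    """Probability of an event A, i.e. of any subset of Omega."""
    return sum(p[w] for w in A)

print(P({2, 4, 6}))  # the event "the roll is even" has probability 1/2
```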
Example 2.4 A uniform random number in (0, 1). To model the experiment of choosing
a uniform random number between 0 and 1, we are looking for a hypothetical probability
space (Ω, F, P), in which the sample space Ω is
simply (0, 1), F is some σ-algebra of subsets of (0, 1), and P is a probability measure that
corresponds to our notion of a “uniform” choice of a random number. One plausible way to
formalize this is to require that intervals of the form (a, b) ⊂ (0, 1) be considered as events,
and that the probability for our “uniform” number to fall in such an interval should be equal
to its length b − a. In other words, we shall require that

(a, b) ∈ F for all 0 ≤ a < b ≤ 1,

and that

P(a, b) = b − a, (0 ≤ a < b ≤ 1). (1)
How do we generate a σ-algebra of subsets of (0, 1) that contains all the intervals? We
already saw that the set of all subsets of (0, 1) will work. But that is too large! If we take all
subsets, we will see in an exercise later that it will be impossible to construct the probability
measure P to satisfy our requirements. So let’s try to build the smallest possible σ-algebra.
One way (which can perhaps be described as the bottom-up approach) would be to start
with the intervals, then take all countable unions of such and add them to our collection of
sets, then add all countable intersections of such sets, then add all countable unions, etc.
Will this work? In principle it can be made to work, but doing so is a bit difficult and requires
knowing something about transfinite induction. Fortunately there is a more elegant way
(but somewhat more abstract and less intuitive) of constructing the minimal σ-algebra, which
is outlined in the next exercise below, and can be thought of as the top-down approach.
The resulting σ-algebra of subsets of (0, 1) is called the Borel σ-algebra; its elements are
called Borel sets.
What about the probability measure P? Here we will simply cite a result from measure
theory that says that the measure we are looking for exists, and is unique. This is not too
difficult to prove, but doing so would take us a bit too far off course.
Theorem 9. Let B be the σ-algebra of Borel sets on (0, 1), the minimal σ-algebra containing
all the sub-intervals of (0, 1), proved to exist in the exercise below. There exists a unique
probability measure P on the measurable space ((0, 1), B) satisfying (1), called Lebesgue
measure on (0, 1).
Exercise 10 (The σ-algebra generated by a set of subsets of Ω). (i) Let Ω be a set, and let
{Fi }i∈I be some collection of σ-algebras of subsets of Ω, indexed by some index set I. Prove
that the intersection of all the Fi ’s (i.e., the collection of subsets of Ω that are elements of
all the Fi ’s) is also a σ-algebra.
(ii) Let Ω be a set, and let A be a collection of subsets of Ω. Prove that there exists a unique
σ-algebra σ(A) of subsets of Ω that satisfies the following two properties:

1. σ(A) contains all the elements of A, i.e., A ⊂ σ(A);

2. σ(A) is the minimal σ-algebra satisfying property 1 above, in the sense that if F is any
other σ-algebra that contains all the elements of A, then σ(A) ⊂ F.
Hint for (ii): Let (Fi )i∈I be the collection of all σ-algebras of subsets of Ω that contain A.
This is a non-empty collection, since it contains for example P(Ω), the set of all subsets of
Ω. Any σ-algebra σ(A) that satisfies the two properties above is necessarily a subset of any
of the Fi ’s, hence it is also contained in the intersection of all the Fi ’s, which is a σ-algebra
by part (i) of the exercise.
Definition 11. If A is a collection of subsets of a set Ω, the σ-algebra σ(A) discussed above
is called the σ-algebra generated by A.
Example 2.5 The space of infinite coin toss sequences. Another archetypical exper-
iment in probability theory is that of a sequence of independent fair coin tosses, so let’s try
to model this experiment with a suitable probability space. If for convenience we represent
the result of each coin toss as a binary value of 0 or 1, then the sample space Ω is simply the
set of infinite sequences of 0’s and 1’s, namely

Ω = {(x1, x2, x3, . . .) : xi ∈ {0, 1}, i = 1, 2, . . .} = {0, 1}^N.
What about the σ-algebra F? We will take the same approach as we did in the previous
example, which is to require certain natural sets to be events, and to take as our σ-algebra
the σ-algebra generated by these “elementary” events. In this case, surely, for each n ≥ 1,
we would like the set
An (1) := {x = (x1 , x2 , . . .) ∈ Ω : xn = 1} (2)
to be an event (in words, this represents the event “the coin toss xn came out Heads”).
Therefore we take F to be the σ-algebra generated by the collection of sets of this form.
Finally, the probability measure P should conform to our notion of a sequence of inde-
pendent fair coin tosses. Generalizing the notation in (2), for a ∈ {0, 1} define

An(a) := {(x1, x2, . . .) ∈ Ω : xn = a}.

The measure P should then satisfy

P(An(a)) = 1/2, (n ≥ 1, a ∈ {0, 1}). (3)

This experiment is a special case of the general construction of a product of probability
spaces, described in the following theorem.

Theorem 12 (Product of probability spaces). Let (Ωn, Fn, Pn), n = 1, 2, . . ., be a sequence
of probability spaces. Let Ω = Ω1 × Ω2 × . . ., and let F be the σ-algebra of subsets of Ω
generated by the sets of the form

{(x1, x2, . . .) ∈ Ω : xn ∈ A}

for some n ≥ 1 and set A ∈ Fn. Then there exists a unique probability measure P on (Ω, F)
such that for any n ≥ 1 and any finite sequence

(A1, A2, . . . , An) ∈ F1 × F2 × . . . × Fn

the equation

P({(x1, x2, . . .) ∈ Ω : x1 ∈ A1, x2 ∈ A2, . . . , xn ∈ An}) = ∏_{k=1}^n Pk(Ak)

holds.
Exercise 13. Explain why the “infinite sequence of coin tosses” experiment is a special case
of a product of probability spaces, and why the existence and uniqueness of a probability
measure satisfying (3) follows from Theorem 12.
In an upcoming homework exercise we will show an alternative way of proving the ex-
istence of the probability space of infinite coin toss sequences using Lebesgue measure on
(0, 1).
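The connection goes through binary expansions: the binary digits of a uniform random
number in (0, 1) behave like independent fair coin tosses. Here is a small Python sketch of
this correspondence (our addition; the function name is ours):

```python
import random

def coin_tosses_from_uniform(u, n):
    """Return the first n binary digits of u in (0, 1)."""
    digits = []
    for _ in range(n):
        u *= 2
        d = int(u)        # next binary digit, 0 or 1
        digits.append(d)
        u -= d
    return digits

# Ten "fair coin tosses" extracted from one uniform random number.
print(coin_tosses_from_uniform(random.random(), 10))
```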
Definition 14. If (Ω1 , F1 ) and (Ω2 , F2 ) are two measurable spaces, a function X : Ω1 → Ω2
is called measurable if for any set E ∈ F2 , the set
X⁻¹(E) = {ω ∈ Ω1 : X(ω) ∈ E}
is in F1.

Definition 15 (Random variable). A random variable on a probability space (Ω, F, P) is a
function X : Ω → R that is measurable with respect to the σ-algebras F and B, where B
denotes the Borel σ-algebra on R.
Exercise 16. Let (Ω1 , F1 ) and (Ω2 , F2 ) be two measurable spaces such that F2 is the σ-
algebra generated by a collection A of subsets of Ω2 . Prove that a function X : Ω1 → Ω2 is
measurable if and only if X⁻¹(A) ∈ F1 for all A ∈ A.
It follows that the random variables are exactly those real-valued functions on Ω for
which the question
“What is the probability that a < X < b?”
has a well-defined answer for all a < b. This observation makes it easier in practice to check
if a given function is a random variable or not, since working with intervals is much easier
than with the rather unwieldy (and mysterious, until you get used to them) Borel sets.
What can we say about the behavior of a random variable X defined on a probability
space (Ω, F, P)? All the information is contained in a new probability measure µX on the
measurable space (R, B) that is induced by X, defined by

µX(A) = P(X⁻¹(A)) = P(X ∈ A), (A ∈ B).

The number µX(A) is the probability that X “falls in A” (or “takes its value in A”).
Exercise 17. Verify that µX is a probability measure on (R, B). This measure is called the
distribution of X, or sometimes referred to more fancifully as the law of X. In some
textbooks it is denoted LX .
Definition 18. If X and Y are two random variables (possibly defined on different probability
spaces), we say that X and Y are identically distributed (or equal in distribution) if
µX = µY (meaning that µX(A) = µY (A) for any Borel set A ⊂ R). We denote this
X =ᵈ Y.
How can we check if two random variables are identically distributed? Once again,
working with Borel sets can be difficult, but since the Borel sets are generated by the intervals,
a simpler criterion involving just this generating family of sets exists. The following lemma
is a consequence of basic facts in measure theory, which can be found in the Measure Theory
appendix in Durrett’s book.
Lemma 19. Two probability measures µ1, µ2 on the measurable space (R, B) are equal if and
only if they are equal on the generating set of intervals, namely if

µ1(a, b) = µ2(a, b) for all a < b.
3.2 Distribution functions
Instead of working with distributions of random variables (which are probability measures on
the measurable space (R, B) and are themselves quite unwieldy objects), we will encode them
in a simpler object called a distribution function (sometimes referred to as a cumulative
distribution function, or c.d.f.): the distribution function of a random variable X is the
function FX : R → [0, 1] defined by

FX(x) = P(X ≤ x) = µX((−∞, x]), (x ∈ R).
Note that we have introduced here a useful notational device that will be used again
many times in the following sections: if A is a Borel set, we will often write {X ∈ A} as
shorthand for the set {ω ∈ Ω : X(ω) ∈ A}. In words, we may refer to this as “the event that
X falls in A”. When discussing its probability, we may omit the curly braces and simply
write P (X ∈ A). Of course, one should always remember that on the formal level this is
just the set-theoretic inverse image of a set by a function!
The distribution function FX of any random variable X has the following properties:

(i) F is nondecreasing.

(ii) lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.

(iii) F is right-continuous, i.e., F(x+) := lim_{y↓x} F(y) = F(x) for all x ∈ R.

Conversely, any function F : R → [0, 1] satisfying these properties will be called a distribution
function.
Theorem 23. If F is a distribution function, then there exists a random variable X such
that F = FX .
This fact has a measure-theoretic proof similar to the proof of Theorem 9, but fortunately
in this case, there is a more probabilistic proof that relies only on the existence of Lebesgue
measure. (This is one of many examples of probabilistic ideas turning out to be useful to
prove facts in analysis and measure theory). This involves the probabilistic concept of a
percentile that we frequently encounter in the media.
Definition 24. If X is a random variable on a probability space (Ω, F, P) and 0 < p < 1 is
a number, then a real number x is called a p-percentile of X if the inequalities
P(X ≤ x) ≥ p,
P(X ≥ x) ≥ 1 − p
hold.
Note that the question of whether x is a p-percentile of X can be answered just by knowing
the distribution function FX of X: since P(X ≤ x) = FX(x) and P(X ≥ x) = 1 − FX(x−),
we can write the conditions above as

FX(x−) ≤ p ≤ FX(x).
Lemma 25. A p-percentile for X always exists. Moreover, the set of p-percentiles of X is
equal to the (possibly degenerate) closed interval [ap, bp], where

ap = sup{x : FX(x) < p}, bp = inf{x : FX(x) > p}.
Proof of Theorem 23. Let ((0, 1), B, P) be the unit interval with Lebesgue measure, repre-
senting the experiment of drawing a uniform random number in (0, 1). We shall construct
our random variable X on this space. Inspired by the discussion of percentiles above, we
define
X(p) = sup{y : F (y) < p}.
If F were the distribution function of a random variable, then X(p) would be its (minimal)
p-percentile. Note that X is a monotone nondecreasing function on (0, 1), hence measurable,
so it is in fact a random variable. We need to show that F is its distribution function. We
will show that for each p ∈ (0, 1) and x ∈ R, we have that X(p) ≤ x if and only if p ≤ F (x).
This will imply that for every x ∈ R we have the equality of sets

{p : X(p) ≤ x} = {p : p ≤ F(x)},

a sub-interval of (0, 1) of length F(x), and therefore P(X ≤ x) = F(x), so that FX = F as
claimed. (Both directions of the claimed equivalence follow from the definition of X together
with the monotonicity and right-continuity of F.)
The function X defined in the proof above is sometimes referred to as the (lower) per-
centile function of the distribution F . Note that if F is a strictly increasing function then
X is simply its set-theoretic inverse function.
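The construction in the proof is precisely the inverse-transform sampling method used in
simulation: feeding a uniform random number p into the percentile function produces a
sample with distribution function F. A short Python sketch (our addition; we take F to be
the exponential distribution F(x) = 1 − e^{−x}, whose percentile function is explicit — for a
general F one could evaluate X(p) by bisection instead):

```python
import math
import random

def exp1_percentile(p):
    # Percentile (quantile) function of F(x) = 1 - exp(-x): solve F(x) = p.
    return -math.log(1.0 - p)

# Inverse-transform sampling: X = X(U) with U uniform in (0, 1).
samples = [exp1_percentile(random.random()) for _ in range(100_000)]

# Empirical check: P(X <= 1) should be close to F(1) = 1 - e^{-1} ≈ 0.632.
print(sum(s <= 1.0 for s in samples) / len(samples))
```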
3.3 Examples
Example 3.6 Indicator random variables. If A is an event in a probability space
(Ω, F, P), its indicator random variable is the r.v. 1A defined by

1A(ω) = 1 if ω ∈ A, and 1A(ω) = 0 if ω ∉ A.
The above discussion shows that to specify the behavior of a random variable, it is enough
to specify its distribution function. Another useful concept is that of a density function.
If F = FX is a distribution function such that for some nonnegative function f : R → R we
have

F(x) = ∫_{−∞}^x f(y) dy, (x ∈ R), (4)
then we say that X has a density function f . Note that f determines F but is itself only
determined by F up to “small” changes that do not affect the integral in (4) (the precise
term is the measure-theoretic term “up to measure 0”). For example, changing the value of f
at a finite number of points results in a density function that is equally valid for computing
F.
More generally, if a < b we say that X is a uniform random variable in the interval
(a, b) if it has the (respective) distribution and density functions

F(x) = 0 for x ≤ a,   F(x) = (x − a)/(b − a) for a ≤ x ≤ b,   F(x) = 1 for x ≥ b;

f(x) = 1/(b − a) for a ≤ x ≤ b,   f(x) = 0 otherwise.
Example 3.9 Standard normal distribution. The normal (or gaussian) distribution is
given in terms of its density function
f(x) = (1/√(2π)) e^{−x²/2}.
The cumulative distribution function is denoted by
Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy.
This integral cannot be evaluated explicitly in terms of more familiar functions, but Φ is an
important special function of mathematics nonetheless.
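Although Φ has no elementary closed form, it is easy to evaluate numerically; for instance, in
terms of the error function one has Φ(x) = (1 + erf(x/√2))/2. A quick Python sketch (our
addition):

```python
import math

def Phi(x):
    # Standard normal c.d.f. expressed via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(Phi(0.0))   # 0.5, by the symmetry of the density
print(Phi(1.96))  # ≈ 0.975, a value familiar from statistics
```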
Definition 27. The Borel σ-algebra B on Rd is defined in one of the following equivalent
ways:

(i) It is the σ-algebra generated by the open subsets of Rd.

(ii) It is the σ-algebra generated by the open boxes (a1, b1) × (a2, b2) × . . . × (ad, bd).

(iii) It is the σ-algebra generated by the products B1 × B2 × . . . × Bd, where each Bi is a
Borel subset of R.

(iv) It is the minimal σ-algebra of subsets of Rd such that the coordinate functions πi :
Rd → R defined by

πi(x) = xi, i = 1, 2, . . . , d

are all measurable (where measurability is with respect to the Borel σ-algebra on the target
space R).
Exercise 28. Check that the definitions above are indeed all equivalent.
Definition 29. A random (d-dimensional) vector (or vector random variable)
X = (X1 , X2 , . . . , Xd ) on a probability space (Ω, F, P) is a function X : Ω → Rd that is
measurable (as a function between the measurable spaces (Ω, F) and (Rd, B)).
Lemma 30. X = (X1, . . . , Xd) is a random vector if and only if Xi is a random variable for
each i = 1, . . . , d.
Exercise 31. (i) Prove that any continuous function f : Rm → Rn is measurable (when
each of the spaces is equipped with the respective Borel σ-algebra).
(ii) Prove that the composition g ◦ f of measurable functions f : (Ω1 , F1 ) → (Ω2 , F2 ) and
g : (Ω2 , F2 ) → (Ω3 , F3 ) (where (Ωi , Fi ) are measurable spaces for i = 1, 2, 3) is a measurable
function.
(iii) Deduce that the sum X1 + . . . + Xd of random variables is a random variable.
Exercise 32. Prove that if X1, X2, . . . is a sequence of random variables (all defined on the
same probability space), then the functions

sup_n Xn,  inf_n Xn,  lim sup_{n→∞} Xn,  lim inf_{n→∞} Xn

are all random variables. Note: Part of the question is to generalize the notion of random
variable to a function taking values in R = R ∪ {−∞, +∞}, or you can solve it first with
the additional assumption that all the Xi ’s are uniformly bounded by some constant M .
The distribution of a random vector X = (X1, . . . , Xd) on a probability space (Ω, F, P) is
the probability measure µX on (Rd, B) defined by µX(A) = P(X ∈ A) for A ∈ B, exactly as
in the one-dimensional case. The measure µX is also called the joint distribution
(or joint law) of the random variables X1, . . . , Xd.
Once again, to avoid having to work with measures, we introduce the concept of a d-
dimensional distribution function.
FX(x1, x2, . . . , xd) = P(X1 ≤ x1, X2 ≤ x2, . . . , Xd ≤ xd)
           = µX((−∞, x1] × (−∞, x2] × . . . × (−∞, xd]).
Theorem 34. The distribution function FX of a d-dimensional random vector X has the
following properties:

(i) F is nondecreasing in each coordinate.

(ii) lim_{xi→−∞} F(x1, . . . , xd) = 0 for each 1 ≤ i ≤ d (the other coordinates being held
fixed).

(iii) F(x1, . . . , xd) → 1 as x1, . . . , xd → ∞ simultaneously.

(iv) F is right-continuous, i.e., F(x+) := lim_{y↓x} F(y) = F(x), where here y ↓ x means
that yi ↓ xi in each coordinate.

(v) For 1 ≤ i ≤ d and a < b, denote by Δ^{xi}_{a,b} the differencing operator in the variable
xi, which takes a function f of the real variable xi (and possibly also dependent on other
variables) and returns the value

Δ^{xi}_{a,b} f = f|_{xi=b} − f|_{xi=a}.

Then for any a1 < b1, . . . , ad < bd we have

Δ^{x1}_{a1,b1} Δ^{x2}_{a2,b2} · · · Δ^{xd}_{ad,bd} F ≥ 0.
Theorem 35. Any function F satisfying the properties in Theorem 34 above is a distribution
function of some random vector X.
4.3 Independence
Definition 36. Events A, B ∈ F in a probability space (Ω, F, P) are called independent if
P(A ∩ B) = P(A)P(B).
More generally, a family A = (Ai)i∈I of events in a probability space (Ω, F, P) is called
an independent family if for any finite subset Ai1, Ai2, . . . , Aik ∈ A of distinct events in the
family we have that

P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik) = ∏_{j=1}^k P(Aij).
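As a tiny sanity check of the definition (our addition), here is the product rule verified
exactly on a discrete space, with A = “the first of two fair coin tosses is heads” and B =
“the second toss is heads”:

```python
from fractions import Fraction
from itertools import product

# Omega = outcomes of two fair coin tosses, each outcome having weight 1/4.
omega = list(product([0, 1], repeat=2))

def P(A):
    return Fraction(sum(1 for w in omega if w in A), len(omega))

A = {w for w in omega if w[0] == 1}  # first toss is heads
B = {w for w in omega if w[1] == 1}  # second toss is heads

print(P(A & B) == P(A) * P(B))  # True: A and B are independent
```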
Definition 37. Random variables X, Y on a probability space (Ω, F, P) are called indepen-
dent if
P(X ∈ E, Y ∈ F ) = P(X ∈ E)P(Y ∈ F )
for any Borel sets E, F ⊂ R. In other words any two events representing possible statements
about the behaviors of X and Y are independent events.
It follows from the above definitions that r.v.’s X, Y are independent if and only if the
σ-algebras σ(X), σ(Y ) generated by them are independent σ-algebras, where σ(X) :=
{X⁻¹(B) : B ∈ B} is the smallest sub-σ-algebra of F with respect to which X is measurable.
Definition 40. If (Ω, F, P) is a probability space and (Fi)i∈I is some family of sub-σ-
algebras of F (i.e., σ-algebras that are subsets of F), we say that (Fi)i∈I is an independent
family of σ-algebras if for any distinct i1, i2, . . . , ik ∈ I and events A1 ∈ Fi1, A2 ∈ Fi2,
. . . , Ak ∈ Fik, the events A1, . . . , Ak are independent.
Definition 41. A family (Xi )i∈I of random variables defined on some common probability
space (Ω, F, P) is called an independent family of random variables if the σ-algebras
{σ(Xi )}i∈I form an independent family of σ-algebras.
Unraveling these somewhat abstract definitions, we see that (Xi)i∈I is an independent
family of r.v.’s if and only if we have

P(Xi1 ∈ A1, . . . , Xik ∈ Ak) = ∏_{j=1}^k P(Xij ∈ Aj)

for any distinct indices i1, . . . , ik ∈ I and any Borel sets A1, . . . , Ak ⊂ R.
Theorem 42. If (Fi )i∈I are a family of sub-σ-algebras of the σ-algebra of events F in
a probability space, and for each i ∈ I, the σ-algebra Fi is generated by a family Ai of
subsets of Ω, and each family Ai is closed under taking the intersection of two sets (such a
family is called a π-system), then the family (Fi)i∈I is independent if and only if for each
distinct i1, . . . , ik ∈ I, every finite sequence of events A1 ∈ Ai1, A2 ∈ Ai2, . . . , Ak ∈ Aik is
independent.
Proof. This uses Dynkin’s π–λ theorem from measure theory. See [Durrett], Theorem (4.2),
p. 24.
Lemma 43. If X1, . . . , Xd are random variables defined on a common probability space, then
they are independent if and only if for all x1, . . . , xd ∈ R we have that

F(X1,...,Xd)(x1, . . . , xd) = FX1(x1) FX2(x2) · · · FXd(xd).
Exercise 44. (i) Show that if for all x1, . . . , xd ∈ R we have

F(X1,...,Xd)(x1, . . . , xd) = F1(x1) F2(x2) · · · Fd(xd)

for some functions F1, . . . , Fd : R → R (not assumed to be the distribution functions of the
Xi’s), then X1, . . . , Xd are independent.

(ii) Show that if X1, . . . , Xd are random variables taking values in a countable set S, then in
order for X1, . . . , Xd to be independent it is enough that for all x1, . . . , xd ∈ S we have

P(X1 = x1, X2 = x2, . . . , Xd = xd) = ∏_{j=1}^d P(Xj = xj).
Given a sequence of events (An)∞n=1, define

lim sup An = ∩_{N=1}^∞ ∪_{n=N}^∞ An,

the event that infinitely many of the events An occurred. For this reason, the event lim sup An
is often denoted by {An infinitely often} or {An i.o.}.
The definition of the event lim inf An can similarly be given meaning by writing

lim inf An = ∪_{N=1}^∞ ∩_{n=N}^∞ An,

the event that all but finitely many of the An occurred; it is accordingly denoted
{An eventually} or {An ev.}.
Theorem 46 (Borel-Cantelli lemmas).
(i) If ∑_{n=1}^∞ P(An) < ∞ then P(An i.o.) = 0.

(ii) If ∑_{n=1}^∞ P(An) = ∞ and (An)∞n=1 are independent then P(An i.o.) = 1.
Proof. We essentially already proved part (i) in the first lecture, but here is a more general
repetition of the same argument.
P(An i.o.) = P(∩_{N=1}^∞ ∪_{n=N}^∞ An) ≤ inf_{N≥1} P(∪_{n=N}^∞ An) ≤ inf_{N≥1} ∑_{n=N}^∞ P(An).
Since we assumed that ∑_{n=1}^∞ P(An) converges, this last expression is equal to 0.
Proof of (ii): Consider the complementary event that the An’s did not occur for infinitely
many values of n. Using De Morgan’s laws, we get

P({An i.o.}ᶜ) = P((∩_{N=1}^∞ ∪_{n=N}^∞ An)ᶜ) = P(∪_{N=1}^∞ ∩_{n=N}^∞ Anᶜ)
           ≤ ∑_{N=1}^∞ P(∩_{n=N}^∞ Anᶜ).
So, to show that this is 0 (under the assumptions that ∑_{n=1}^∞ P(An) = ∞ and that the
events are independent), we show that P(∩_{n=N}^∞ Anᶜ) = 0 for all N ≥ 1. Since the events
are independent, the probability of the intersection is the product of the probabilities, so we
need to show that

∏_{n=N}^∞ (1 − P(An)) = 0,

or equivalently (taking minus the logarithm) that ∑_{n=N}^∞ −log(1 − P(An)) = ∞.
But −log(1 − x) ≥ x for all x ∈ [0, 1), so this follows from the assumption that the series of
probabilities diverges.
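The dichotomy in the two lemmas is easy to observe numerically. A Python sketch (our
addition; names and parameters are ours) with independent events An of probability
1/(n + 1) (divergent sum, part (ii)) versus 1/(n + 1)² (convergent sum, part (i)):

```python
import random

def count_occurrences(prob, steps=100_000, seed=0):
    """Count how many of the independent events A_1, ..., A_steps occur,
    where A_n has probability prob(n)."""
    rng = random.Random(seed)
    return sum(rng.random() < prob(n) for n in range(1, steps + 1))

print(count_occurrences(lambda n: 1 / (n + 1)))       # grows like log(steps)
print(count_occurrences(lambda n: 1 / (n + 1) ** 2))  # stays bounded
```

Increasing steps makes the first count keep growing (those events occur infinitely often),
while the second stabilizes after finitely many occurrences.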
Lecture 6: A brief excursion into measure theory
Here we briefly mention some of the basic definitions and results from measure theory, and
point out how we used them in the previous lectures. The relevant material is covered in
Appendices A.1 and A.2 in Durrett’s book (pages 437-448). It is not required reading, but if
you read it you are perhaps more likely to attain a good understanding of the material that
is required...
Definition 47. (i) A π-system is a collection P of subsets of a set Ω that is closed under
intersection of two sets, i.e., if A, B ∈ P then A ∩ B ∈ P.
(ii) A λ-system is a collection L of subsets of a set Ω such that: 1. Ω ∈ L; 2. If A, B ∈ L
and A ⊂ B then B \ A ∈ L; 3. If (An)∞n=1 are all in L and An ↑ A then A ∈ L.
The following is a somewhat technical result that turns out to be quite useful:

Theorem (Dynkin’s π–λ theorem). If P is a π-system, L is a λ-system, and P ⊂ L, then
σ(P) ⊂ L.

Theorem (Uniqueness theorem). If µ1, µ2 are two probability measures on a measurable
space (Ω, F) that agree on a π-system P ⊂ F with σ(P) = F, then µ1 = µ2.
The uniqueness theorem implies for example that to check if random variables X and Y
are equal in distribution, it is enough to check that they have the same distribution functions.
Both of the above results are used in the proof of this important theorem in measure
theory:

Theorem (Carathéodory’s extension theorem). A σ-additive probability measure defined on
an algebra A of subsets of a set Ω extends uniquely to a probability measure on the σ-algebra
σ(A) generated by A.
Carathéodory’s extension theorem is the main tool used in measure theory for construct-
ing measures: one always starts out by defining the measure on some relatively small family of
sets and then extending to the generated σ-algebra (after verifying σ-additivity, which often
requires using topological arguments, e.g., involving compactness). Applications include the
construction of Lebesgue measure on (0, 1) (Theorem 9) and the construction of product
measures (Theorem 12).
Note that Durrett’s book also talks about measures that are not probability measures,
i.e., the total measure of the space is not 1 and may even be infinite. In this setting, the
theorems above can be formulated in greater generality.
End of Part I