
MTH451: Postulates, Definitions, Theorems, and Expansions

April 17, 2009

1 Postulates of Probability
• Postulate 1: The probability of an event is a non-negative real number;
that is, P (A) ≥ 0 for any subset A of S.

• Postulate 2: P (S) = 1.
• Postulate 3: If $A_1, A_2, A_3, \ldots$ is a finite or infinite sequence of mutually exclusive events of S, then

$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots$$

2 Definitions
• Definition 2.1: If A and B are any two events in a sample space S and P(A) ≠ 0, the conditional probability of B given A is

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

• Definition 2.2: Two events A and B are independent if and only if

P (A ∩ B) = P (A) · P (B)

• Definition 2.3: Events $A_1, A_2, \ldots, A_k$ are independent if and only if the probability of the intersection of any 2, 3, ..., or k of these events equals the product of their respective probabilities.

• Definition 2.4: A countable collection $B_1, B_2, \ldots, B_i, \ldots$ of events is called a partition of a sample space S if the events are pairwise disjoint (i.e., $B_i \cap B_j = \emptyset$ for $i \neq j$) and $\bigcup_i B_i = S$.
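Definitions 2.1 and 2.2 can be checked by direct counting on a small sample space. The sketch below assumes two fair six-sided dice with equally likely outcomes; the event names and helper functions are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered outcomes of two fair dice (36 equally likely points).
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for an event given as a subset of S."""
    return Fraction(len(event & S), len(S))

def cond_prob(b, a):
    """P(B | A) = P(A ∩ B) / P(A), defined only when P(A) != 0 (Definition 2.1)."""
    if prob(a) == 0:
        raise ValueError("P(A) must be nonzero")
    return prob(a & b) / prob(a)

def independent(a, b):
    """Definition 2.2: A and B are independent iff P(A ∩ B) = P(A) · P(B)."""
    return prob(a & b) == prob(a) * prob(b)

first_is_six = {s for s in S if s[0] == 6}
second_is_even = {s for s in S if s[1] % 2 == 0}
sum_is_twelve = {s for s in S if sum(s) == 12}

print(cond_prob(sum_is_twelve, first_is_six))      # P(sum = 12 | first die is 6)
print(independent(first_is_six, second_is_even))   # events on different dice
print(independent(first_is_six, sum_is_twelve))    # the sum depends on the first die
```

Exact fractions are used so that the equality test in Definition 2.2 is not affected by floating-point rounding.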
• Definition 3.1: If S is a sample space with a probability measure and X
is a real-valued function defined over the elements of S, then X is called
a random variable.
• Definition 3.2: If X is a discrete random variable, the function given
by f (x) = P (X = x) for each x within the range of X is called the
probability distribution of X.

• Definition 3.3: If X is a discrete random variable, the function given by

$$F(x) = P(X \le x) = \sum_{t \le x} f(t), \quad -\infty < x < \infty$$

where f(t) is the value of the probability distribution of X at t, is called the distribution function, or the cumulative distribution, of X.

• Definition 3.4: A function with values f(x), defined over the set of all real numbers, is called a probability density function of the continuous random variable X if and only if

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

for any real constants a and b with a ≤ b.

• Definition 3.5: If X is a continuous random variable and the value of its probability density at t is f(t), then the function given by

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$$

is called the cumulative distribution function of X.


• Definition 3.6: If X and Y are discrete random variables, the function
given by f (x, y) = P (X = x, Y = y) for each pair of values (x, y) within
the range of X and Y is called the joint probability distribution of X
and Y .
• Definition 3.7: If X and Y are discrete random variables, the function given by

$$F(x, y) = P(X \le x, Y \le y) = \sum_{s \le x} \sum_{t \le y} f(s, t)$$

for −∞ < x < ∞ and −∞ < y < ∞, where f(s, t) is the value of the joint probability distribution of X and Y at (s, t), is called the joint cumulative distribution of X and Y.
• Definition 3.8: A bivariate function with values f(x, y), defined over the xy-plane, is called a joint probability density function of the continuous random variables X and Y if and only if

$$P[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy$$

for any region A in the xy-plane. Given n continuous random variables $X_1, \ldots, X_n$, the multivariate function with values $f(x_1, \ldots, x_n)$, defined over n-dimensional space, is a joint probability density function if and only if

$$P[(X_1, \ldots, X_n) \in A] = \int \cdots \int_A f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$$

for any region A in n-dimensional space.


• Definition 3.9: If X and Y are continuous random variables, the function given by

$$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\,ds\,dt, \quad x, y \in (-\infty, \infty)$$

where f(s, t) is the value of the joint probability density of X and Y at (s, t), is called the joint distribution function of X and Y. In the n-dimensional case, the joint distribution function is given by

$$F(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(t_1, \ldots, t_n)\,dt_1 \cdots dt_n, \quad x_i \in (-\infty, \infty) \text{ for } i = 1, \ldots, n$$

• Definition 3.10: If X and Y are discrete random variables and f(x, y) is the value of their joint probability distribution at (x, y), the function given by

$$g(x) = \sum_{y} f(x, y)$$

for each x within the range of X is called the marginal distribution of X. Correspondingly, the function given by

$$h(y) = \sum_{x} f(x, y)$$

for each y within the range of Y is called the marginal distribution of Y. These can be generalized to n dimensions.
• Definition 3.11: If X and Y are continuous random variables and f(x, y) is the value of their joint probability density at (x, y), the function given by

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

for each x within the range of X is called the marginal density of X. Correspondingly, the function given by

$$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

for each y within the range of Y is called the marginal density of Y. These can be generalized to n dimensions.
• Definition 3.12: If f(x, y) is the value of the joint probability distribution of the discrete random variables X and Y at (x, y), and h(y) is the value of the marginal distribution of Y at y, the function given by

$$f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad h(y) \neq 0$$

for each x within the range of X, is called the conditional distribution of X given Y = y. Correspondingly, if g(x) is the value of the marginal distribution of X at x, the function given by

$$w(y \mid x) = \frac{f(x, y)}{g(x)}, \quad g(x) \neq 0$$

for each y within the range of Y, is called the conditional distribution of Y given X = x.
• Definition 3.13: If f(x, y) is the value of the joint density of the continuous random variables X and Y at (x, y), and h(y) is the value of the marginal density of Y at y, the function given by

$$f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad h(y) \neq 0$$

for x ∈ (−∞, +∞), is called the conditional density of X given Y = y. Correspondingly, if g(x) is the value of the marginal density of X at x, the function given by

$$w(y \mid x) = \frac{f(x, y)}{g(x)}, \quad g(x) \neq 0$$

for y ∈ (−∞, +∞), is called the conditional density of Y given X = x.
• Definition 3.14: If $f(x_1, x_2, \ldots, x_n)$ is the value of the joint probability distribution of the n discrete random variables $X_1, X_2, \ldots, X_n$ at $(x_1, x_2, \ldots, x_n)$, and $f_i(x_i)$ is the value of the marginal distribution of $X_i$ at $x_i$ for i = 1, 2, ..., n, then the n random variables are independent if and only if

$$f(x_1, x_2, \ldots, x_n) = f_1(x_1) \cdot f_2(x_2) \cdot \ldots \cdot f_n(x_n)$$

for all $(x_1, x_2, \ldots, x_n)$ within their range.
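The marginal, conditional, and independence definitions for discrete random variables (Definitions 3.10, 3.12, and 3.14) can be traced on a small table. The joint distribution below is a made-up example chosen for illustration; the dictionary layout and function name are assumptions of this sketch.

```python
from fractions import Fraction

# Hypothetical joint probability distribution f(x, y) of two discrete random variables.
f = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
    (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3),
}
xs = sorted({x for x, _ in f})
ys = sorted({y for _, y in f})

# Marginal distributions: sum the joint distribution over the other variable.
g = {x: sum(f[(x, y)] for y in ys) for x in xs}
h = {y: sum(f[(x, y)] for x in xs) for y in ys}

def cond_x_given_y(x, y):
    """f(x | y) = f(x, y) / h(y), defined only when h(y) != 0."""
    if h[y] == 0:
        raise ValueError("h(y) must be nonzero")
    return f[(x, y)] / h[y]

# Independence: the joint distribution factors into the marginals at every point.
independent = all(f[(x, y)] == g[x] * h[y] for x in xs for y in ys)

print(g, h, cond_x_given_y(0, 1), independent)
```

For this particular table the joint distribution factors, so the conditional distribution of X given Y = y coincides with the marginal distribution of X.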


• Definition 4.1: If X is a discrete random variable and f(x) is the value of its probability distribution at x, the expected value of X is

$$E(X) = \sum_{x} x \cdot f(x)$$

Correspondingly, if X is a continuous random variable and f(x) is the value of its probability density at x, the expected value of X is

$$E(X) = \int_{-\infty}^{+\infty} x \cdot f(x)\,dx$$

• Definition 4.2: The rth moment about the origin of a random variable X, denoted by $\mu'_r$, is the expected value of $X^r$; symbolically,

$$\mu'_r = E(X^r) = \sum_{x} x^r \cdot f(x)$$

for r = 0, 1, 2, ... when X is discrete, and

$$\mu'_r = E(X^r) = \int_{-\infty}^{+\infty} x^r \cdot f(x)\,dx$$

when X is continuous.
• Definition 4.3: $\mu'_1$ is called the mean of the distribution of X, or simply the mean of X, and it is denoted by µ.
• Definition 4.4: The rth moment about the mean of a random variable X, denoted by $\mu_r$, is the expected value of $(X - \mu)^r$; symbolically,

$$\mu_r = E[(X - \mu)^r] = \sum_{x} (x - \mu)^r \cdot f(x)$$

for r = 0, 1, 2, ... when X is discrete, and

$$\mu_r = E[(X - \mu)^r] = \int_{-\infty}^{+\infty} (x - \mu)^r \cdot f(x)\,dx$$

when X is continuous.
• Definition 4.5: $\mu_2$ is called the variance of the distribution of X, or simply the variance of X, and it is denoted by $\sigma^2$, var(X), or V(X); σ, the positive square root of the variance, is called the standard deviation.
• Definition 4.6: The moment-generating function of a random variable X, where it exists, is given by

$$M_X(t) = E(e^{tX}) = \sum_{x} e^{tx} \cdot f(x)$$

when X is discrete and

$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx} \cdot f(x)\,dx$$

when X is continuous.
• Definition 4.7: The rth and sth product moment about the origin of the random variables X and Y, denoted by $\mu'_{r,s}$, is the expected value of $X^r Y^s$; symbolically,

$$\mu'_{r,s} = E(X^r Y^s) = \sum_{x} \sum_{y} x^r y^s \cdot f(x, y)$$

for r = 0, 1, 2, ... and s = 0, 1, 2, ... when X and Y are discrete, and

$$\mu'_{r,s} = E(X^r Y^s) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^r y^s \cdot f(x, y)\,dx\,dy$$

when X and Y are continuous.


• Definition 4.8: The rth and sth product moment about the means of the random variables X and Y, denoted by $\mu_{r,s}$, is the expected value of $(X - \mu_X)^r (Y - \mu_Y)^s$; symbolically,

$$\mu_{r,s} = E[(X - \mu_X)^r (Y - \mu_Y)^s] = \sum_{x} \sum_{y} (x - \mu_X)^r (y - \mu_Y)^s \cdot f(x, y)$$

for r = 0, 1, 2, ... and s = 0, 1, 2, ... when X and Y are discrete, and

$$\mu_{r,s} = E[(X - \mu_X)^r (Y - \mu_Y)^s] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x - \mu_X)^r (y - \mu_Y)^s \cdot f(x, y)\,dx\,dy$$

when X and Y are continuous.

• Definition 4.9: $\mu_{1,1}$ is called the covariance of X and Y, and it is denoted by $\sigma_{XY}$, cov(X, Y), or C(X, Y).
• Definition 4.10: If X is a discrete random variable and f(x|y) is the value of the conditional probability distribution of X given Y = y at x, the conditional expectation of u(X) given Y = y is

$$E[u(X) \mid y] = \sum_{x} u(x) \cdot f(x \mid y)$$

Correspondingly, if X is a continuous random variable and f(x|y) is the value of the conditional probability density of X given Y = y at x, the conditional expectation of u(X) given Y = y is

$$E[u(X) \mid y] = \int_{-\infty}^{\infty} u(x) \cdot f(x \mid y)\,dx$$
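The discrete forms of Definitions 4.1, 4.2, 4.4, and 4.6 translate almost line for line into sums over a probability table. The sketch below assumes a fair six-sided die as the distribution; the function names are illustrative.

```python
from fractions import Fraction
import math

# Assumed example: probability distribution of a fair six-sided die.
f = {x: Fraction(1, 6) for x in range(1, 7)}

def moment_about_origin(f, r):
    """r-th moment about the origin: sum of x^r * f(x)."""
    return sum(x**r * p for x, p in f.items())

def moment_about_mean(f, r):
    """r-th moment about the mean: sum of (x - mu)^r * f(x)."""
    mu = moment_about_origin(f, 1)
    return sum((x - mu)**r * p for x, p in f.items())

def mgf(f, t):
    """Moment-generating function M_X(t) = sum of e^(t x) * f(x), in floating point."""
    return sum(math.exp(t * x) * float(p) for x, p in f.items())

mu = moment_about_origin(f, 1)    # mean (Definition 4.3)
var = moment_about_mean(f, 2)     # variance (Definition 4.5)
print(mu, var, mgf(f, 0.0))
```

The moments are computed with exact fractions; only the moment-generating function uses floats, because it involves the exponential.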

• Definition 5.1: A random variable X has a discrete uniform distribution and it is referred to as a discrete uniform random variable if and only if its probability distribution is given by

$$f(x) = \frac{1}{k} \quad \text{for } x \in \{x_1, x_2, \ldots, x_k\}$$

where $x_i \neq x_j$ when $i \neq j$.
• Definition 5.2: A random variable X has a Bernoulli distribution and it is referred to as a Bernoulli random variable if and only if its probability distribution is given by

$$f(x; \theta) = \theta^x (1 - \theta)^{1 - x}, \quad x = 0, 1$$

for θ ∈ (0, 1).


• Definition 5.3: A random variable X has a binomial distribution and it is referred to as a binomial random variable if and only if its probability distribution is given by

$$b(x; n, \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}, \quad x = 0, 1, 2, \ldots, n$$

• Definition 5.4: A random variable X has a negative binomial distribution and it is referred to as a negative binomial random variable if and only if its probability distribution is given by

$$b^*(x; k, \theta) = \binom{x - 1}{k - 1} \theta^k (1 - \theta)^{x - k}$$

for x = k, k + 1, k + 2, ....

• Definition 5.5: A random variable X has a geometric distribution and it is referred to as a geometric random variable if and only if its probability distribution is given by

$$g(x; \theta) = \theta (1 - \theta)^{x - 1}$$

for x = 1, 2, 3, ....
• Definition 5.6: A random variable X has a hypergeometric distribution and it is referred to as a hypergeometric random variable if and only if its probability distribution is given by

$$h(x; n, N, M) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}}$$

for x = 0, 1, 2, ..., n; x ≤ M; and n − x ≤ N − M.


• Definition 5.7: A random variable X has a Poisson distribution and it is referred to as a Poisson random variable if and only if its probability distribution is given by

$$p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$

for x = 0, 1, 2, ....
• Definition 5.8: The random variables $X_1, X_2, \ldots, X_k$ have a multinomial distribution and are referred to as multinomial random variables if and only if their joint probability distribution is given by

$$f(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k) = \binom{n}{x_1, x_2, \ldots, x_k} \cdot \theta_1^{x_1} \cdot \theta_2^{x_2} \cdot \ldots \cdot \theta_k^{x_k}$$

for $x_i = 0, 1, \ldots, n$ for each i, where $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} \theta_i = 1$.
• Definition 5.9: The random variables $X_1, X_2, \ldots, X_k$ have a multivariate hypergeometric distribution and are referred to as multivariate hypergeometric random variables if and only if their joint probability distribution is given by

$$f(x_1, x_2, \ldots, x_k; n, M_1, M_2, \ldots, M_k) = \frac{\binom{M_1}{x_1} \binom{M_2}{x_2} \cdots \binom{M_k}{x_k}}{\binom{N}{n}}$$

for $x_i = 0, 1, \ldots, n$ and $x_i \le M_i$ for each i, where $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} M_i = N$.
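The discrete distributions in Definitions 5.3 through 5.7 can be written directly with the standard library. This is a minimal sketch; the function names are illustrative, and the negative binomial uses the exponent x − k on (1 − θ), matching x − k failures before the kth success.

```python
import math

def binomial_pmf(x, n, theta):
    """b(x; n, theta) for x = 0, 1, ..., n."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def negative_binomial_pmf(x, k, theta):
    """b*(x; k, theta): the k-th success occurs on trial x, for x = k, k+1, ..."""
    return math.comb(x - 1, k - 1) * theta**k * (1 - theta)**(x - k)

def geometric_pmf(x, theta):
    """g(x; theta): the first success occurs on trial x, for x = 1, 2, ..."""
    return theta * (1 - theta)**(x - 1)

def hypergeometric_pmf(x, n, N, M):
    """h(x; n, N, M); math.comb returns 0 outside the support."""
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

def poisson_pmf(x, lam):
    """p(x; lambda) for x = 0, 1, 2, ..."""
    return lam**x * math.exp(-lam) / math.factorial(x)

# Each distribution should sum to 1 over its range (truncated for Poisson).
print(sum(binomial_pmf(x, 10, 0.3) for x in range(11)))
print(sum(hypergeometric_pmf(x, 5, 20, 8) for x in range(6)))
print(sum(poisson_pmf(x, 3.0) for x in range(60)))
```

Setting k = 1 in the negative binomial gives the geometric distribution, which is a quick consistency check between Definitions 5.4 and 5.5.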

• Definition 6.1: A random variable has a uniform distribution and it is referred to as a continuous uniform random variable if and only if its probability density is given by

$$u(x; \alpha, \beta) = \frac{1}{\beta - \alpha}$$

for x ∈ (α, β) and u(x; α, β) = 0 elsewhere.

• Definition 6.2: A random variable has a gamma distribution and it is referred to as a gamma random variable if and only if its probability density is given by

$$g(x; \alpha, \beta) = \frac{1}{\beta^{\alpha} \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta}$$

for x > 0 and g(x; α, β) = 0 elsewhere, where α > 0 and β > 0.
• Definition 6.3: A random variable has an exponential distribution and it is referred to as an exponential random variable if and only if its probability density is given by

$$g(x; \theta) = \frac{1}{\theta} e^{-x/\theta}$$

for x > 0 and g(x; θ) = 0 elsewhere, where θ > 0.
• Definition 6.4: A random variable X has a chi-square distribution and it is referred to as a chi-square random variable if and only if its probability density is given by

$$f(x) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} x^{\frac{\nu - 2}{2}} e^{-\frac{x}{2}}$$

for x > 0 and f(x) = 0 elsewhere. The parameter ν is referred to as the number of degrees of freedom, or simply the degrees of freedom.
• Definition 6.5: A random variable has a beta distribution and it is referred to as a beta random variable if and only if its probability density is given by

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$$

for x ∈ (0, 1) and f(x) = 0 elsewhere, where α > 0 and β > 0.
• Definition 6.5X: A random variable has a Cauchy distribution and it is referred to as a Cauchy random variable if and only if its probability density is given by

$$f(x) = \frac{\beta/\pi}{(x - \alpha)^2 + \beta^2}$$

for x ∈ (−∞, ∞).
• Definition 6.6: A random variable X has a normal distribution and it is referred to as a normal random variable if and only if its probability density is given by

$$n(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$$

for x ∈ (−∞, ∞), where σ > 0.

• Definition 6.7: The normal distribution with µ = 0 and σ = 1 is referred
to as the standard normal distribution.
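• Definition 6.8: A pair of random variables X and Y have a bivariate normal distribution and they are referred to as jointly normally distributed random variables if and only if their joint probability density is given by

$$f(x, y) = \frac{\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x - \mu_1}{\sigma_1} \right) \left( \frac{y - \mu_2}{\sigma_2} \right) + \left( \frac{y - \mu_2}{\sigma_2} \right)^2 \right] \right\}}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}$$

for x ∈ (−∞, ∞) and y ∈ (−∞, ∞), where $\sigma_1 > 0$, $\sigma_2 > 0$, and −1 < ρ < 1.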
• Definition 8.1: If $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables, we say that they constitute a random sample from the infinite population given by their common distribution.
• Definition 8.2: If $X_1, X_2, \ldots, X_n$ constitute a random sample, then

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

is called the sample mean and

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

is called the sample variance.
• Definition 8.3: If $X_1$ is the first value drawn from a finite population of size N, $X_2$ is the second value drawn, ..., $X_n$ is the nth value drawn, and the joint probability distribution of the n random variables is given by

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{N(N - 1) \cdots (N - n + 1)}$$

for each ordered n-tuple of values of these random variables, then $X_1, X_2, \ldots, X_n$ are said to constitute a random sample from the given finite population.
• Definition 8.4: The mean and the variance of the finite population $\{c_1, c_2, \ldots, c_N\}$ are

$$\mu = \sum_{i=1}^{N} c_i \cdot \frac{1}{N}, \qquad \sigma^2 = \sum_{i=1}^{N} (c_i - \mu)^2 \cdot \frac{1}{N}$$

3 Theorems
• Theorem 1.1 - Multiplication of Choices: If an operation consists of two steps, of which the first can be done in $n_1$ ways and for each of these the second can be done in $n_2$ ways, then the whole operation can be done in $n_1 \cdot n_2$ ways.
• Theorem 1.2 - Multiplication of Choices (Generalized): If an operation consists of k steps, of which the first can be done in $n_1$ ways, for each of these the second step can be done in $n_2$ ways, for each of the first two the third step can be done in $n_3$ ways, and so forth, then the whole operation can be done in $n_1 \cdot n_2 \cdot \ldots \cdot n_k$ ways.
• Theorem 1.3 - Permutations of n distinct objects: The number of
permutations of n distinct objects is n!.
• Theorem 1.4 - Permutations of n distinct objects taken r at a time: The number of permutations of n distinct objects taken r at a time is

$${}_nP_r = \frac{n!}{(n - r)!}$$

for r = 0, 1, 2, ..., n.
• Theorem 1.5 - Permutations of n distinct objects arranged on a
circle: The number of permutations of n distinct objects arranged in a
circle is (n − 1)!.
• Theorem 1.6 - Permutations of n objects of which some are alike: The number of permutations of n objects of which $n_1$ are of one kind, $n_2$ are of a second kind, ..., $n_k$ are of a kth kind, and $n_1 + n_2 + \cdots + n_k = n$ is

$$\frac{n!}{n_1! \cdot n_2! \cdot \ldots \cdot n_k!}$$

• Theorem 1.7 - Combinations of n distinct objects taken r at a time: The number of combinations of n distinct objects taken r at a time is

$$\binom{n}{r} = \frac{n!}{r!(n - r)!}$$

for r = 0, 1, 2, ..., n.
• Theorem 1.8 - Partitions of n distinct objects into k subsets: The number of ways in which a set of n distinct objects can be partitioned into k subsets with $n_1$ objects in the first subset, $n_2$ objects in the second subset, ..., and $n_k$ objects in the kth subset is

$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1! \cdot n_2! \cdot \ldots \cdot n_k!}$$

• Theorem 1.9 - Binomial Expansion: For any positive integer n

$$(x + y)^n = \sum_{r=0}^{n} \binom{n}{r} x^{n - r} y^r$$

• Theorem 1.10 - An identity for binomial coefficients: For any positive integers n and r = 0, 1, 2, ..., n,

$$\binom{n}{r} = \binom{n}{n - r}$$


• Theorem 1.11 - A second identity for binomial coefficients: For any positive integers n and r = 0, 1, 2, ..., n − 1,

$$\binom{n}{r} = \binom{n - 1}{r} + \binom{n - 1}{r - 1}$$

• Theorem 1.12 - A third identity for binomial coefficients:

$$\sum_{r=0}^{k} \binom{m}{r} \binom{n}{k - r} = \binom{m + n}{k}$$
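The counting formulas and binomial-coefficient identities in Theorems 1.4 through 1.12 are easy to spot-check numerically. The sketch below uses only the standard library; the helper names and the sample word are arbitrary illustrative choices.

```python
import math
from itertools import permutations

def n_P_r(n, r):
    """Permutations of n distinct objects taken r at a time: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

def multinomial(*parts):
    """n! / (n1! * n2! * ... * nk!) with n = n1 + ... + nk."""
    result = math.factorial(sum(parts))
    for p in parts:
        result //= math.factorial(p)
    return result

def vandermonde_lhs(m, n, k):
    """Left-hand side of Theorem 1.12: sum over r of C(m, r) * C(n, k - r)."""
    return sum(math.comb(m, r) * math.comb(n, k - r) for r in range(k + 1))

# Theorem 1.6 by brute force: distinct arrangements of a word with repeated letters.
word = "banana"                      # 3 a's, 2 n's, 1 b
print(len(set(permutations(word))), multinomial(3, 2, 1))

# Theorems 1.10 to 1.12 on small arguments.
print(math.comb(9, 4) == math.comb(9, 5))
print(math.comb(9, 4) == math.comb(8, 4) + math.comb(8, 3))
print(vandermonde_lhs(5, 7, 4) == math.comb(12, 4))
```

Brute-force enumeration is only practical for short words, but it gives an independent check that the closed-form count is being applied correctly.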

• Theorem 2.1 - Probabilities of events and individual outcomes: If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of the individual outcomes comprising A.
• Theorem 2.2 - Probability formula for equally likely outcomes: If an experiment can result in any one of N different equally likely outcomes, and if n of these outcomes together constitute event A, then the probability of event A is

$$P(A) = \frac{n}{N}$$

• Theorem 2.3 - Probabilities of complementary events: If A and $A^c$ are complementary events in a sample space S, then

$$P(A^c) = 1 - P(A)$$

• Theorem 2.4 - Probability of empty set: P(∅) = 0 for any sample space S.
• Theorem 2.5 - Relationship between probabilities of A and B
when A ⊂ B: If A and B are events in a sample space S and A ⊂ B,
then P (A) ≤ P (B).

• Theorem 2.6 - Maximum and minimum values of probabilities:
0 ≤ P (A) ≤ 1 for any event A.
• Theorem 2.7 - Addition rule for two events: If A and B are any
two events in a sample space S, then

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

• Theorem 2.8 - Addition rule for three events: If A, B, and C are any three events in a sample space S, then

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$

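• Theorem 2.8x - Addition rule for n events: If $A_1, A_2, \ldots, A_n$ are any n events in a sample space S, then

$$P\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} (-1)^{i-1} \left[ \sum_{1 \le j_{i,1} < j_{i,2} < \cdots < j_{i,i} \le n} P\left( \bigcap_{k=1}^{i} A_{j_{i,k}} \right) \right]$$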

• Theorem 2.9 - Multiplication rule for two events: If A and B are any two events in a sample space S and P(A) ≠ 0, then

$$P(A \cap B) = P(A) \cdot P(B \mid A)$$

• Theorem 2.10 - Multiplication rule for three events: If A, B, and C are any three events in a sample space S such that P(A ∩ B) ≠ 0, then

$$P(A \cap B \cap C) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \cap B)$$

• Theorem 2.11 - Independence of events and their complements: If A and B are independent, then A and $B^c$ are also independent.
• Theorem 2.12 - Rule of total probability (rule of elimination): If the events $B_1, B_2, \ldots, B_k$ constitute a partition of the sample space S and $P(B_i) \neq 0$ for all i, then for any event A in S

$$P(A) = \sum_{i=1}^{k} P(B_i) \cdot P(A \mid B_i)$$

• Theorem 2.13 - Bayes' Theorem: If $B_1, B_2, \ldots, B_k$ constitute a partition of the sample space S and $P(B_i) \neq 0$ for all i, then for any event A in S such that P(A) ≠ 0

$$P(B_r \mid A) = \frac{P(B_r) \cdot P(A \mid B_r)}{\sum_{i=1}^{k} P(B_i) \cdot P(A \mid B_i)}$$

for r = 1, 2, ..., k.
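The rule of total probability and Bayes' theorem (Theorems 2.12 and 2.13) combine naturally in one computation. The numbers below are invented for illustration: three machines $B_1, B_2, B_3$ partition a factory's output, and A is the event that an item is defective.

```python
from fractions import Fraction

# Hypothetical inputs: P(B_i) for a partition B_1, B_2, B_3, and P(A | B_i).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(1, 50), Fraction(3, 100)]

def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(B_i) * P(A | B_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods):
    """P(B_r | A) for every r; requires P(A) != 0."""
    p_a = total_probability(priors, likelihoods)
    if p_a == 0:
        raise ValueError("P(A) must be nonzero")
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

p_a = total_probability(priors, likelihoods)
posteriors = bayes(priors, likelihoods)
print(p_a, posteriors)
```

The posterior probabilities always sum to 1, because the denominator in Bayes' theorem is exactly the sum of the numerators over the partition.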
• Theorem 3.1 - Conditions for function to serve as probability distribution: A function can serve as the probability distribution of a discrete random variable X if and only if its values, f(x), satisfy the conditions
– 1. f(x) ≥ 0 for each value within its domain;
– 2. $\sum_x f(x) = 1$, where the summation extends over all the values within its domain.
• Theorem 3.2 - Conditions satisfied by values of distribution func-
tion: The values of F (x) of the distribution function of a discrete random
variable X satisfy the conditions
– 1. F (−∞) = 0 and F (∞) = 1;
– 2. if a < b, then F (a) ≤ F (b) for any real numbers a and b.

• Theorem 3.3 - Values of probability distribution expressed in terms of values of cumulative distribution function: If the range of a random variable X consists of the values $x_1 < x_2 < x_3 < \cdots < x_n$, then $f(x_1) = F(x_1)$ and

$$f(x_i) = F(x_i) - F(x_{i-1}), \quad i = 2, 3, \ldots, n$$

• Theorem 3.4 - For continuous random variables, inclusion and exclusion of endpoints of interval: If X is a continuous random variable and a and b are real constants with a ≤ b, then

$$P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b)$$

• Theorem 3.5 - Conditions for function to serve as probability density: A function can serve as a probability density of a continuous random variable X if its values, f(x), satisfy the conditions
– 1. f(x) ≥ 0 for −∞ < x < ∞;
– 2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

• Theorem 3.6 - Relationship between values of probability density and distribution function: If f(x) and F(x) are the values of the probability density and the cumulative distribution of X at x, then

$$P(a \le X \le b) = F(b) - F(a)$$

for any real constants a and b with a ≤ b, and

$$f(x) = \frac{d}{dx} F(x)$$

where the derivative exists.

• Theorem 3.7 - Conditions for bivariate function to serve as joint probability distribution: A bivariate function can serve as the joint probability distribution of a pair of discrete random variables X and Y if and only if its values, f(x, y), satisfy the conditions
– 1. f(x, y) ≥ 0 for each pair of values (x, y) within its domain;
– 2. $\sum_x \sum_y f(x, y) = 1$, where the double summation extends over all possible pairs (x, y) within its domain.
• Theorem 3.8 - Conditions for bivariate function to serve as joint probability density: A bivariate function can serve as a joint probability density function of a pair of continuous random variables X and Y if its values, f(x, y), satisfy the conditions
– 1. f(x, y) ≥ 0 for all x, y ∈ (−∞, ∞);
– 2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
This can be generalized to the n-dimensional case, i.e., n continuous random variables $X_1, \ldots, X_n$ with $f(x_1, \ldots, x_n)$ such that
– 1. $f(x_1, \ldots, x_n) \ge 0$ for all $x_i \in (-\infty, \infty)$;
– 2. $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = 1$.
• Theorem 3.8x - Partial Derivatives: Analogous to Theorem 3.6, Definition 3.9 leads, in the 2-dimensional case, to

$$f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F(x, y)$$

and, in the n-dimensional case, to

$$f(x_1, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F(x_1, \ldots, x_n)$$

• Theorem 4.1 - Expected value of a function g(X) defined over X: If X is a discrete random variable and f(x) is the value of its probability distribution at x, the expected value of g(X) is given by

$$E[g(X)] = \sum_{x} g(x) \cdot f(x)$$

Correspondingly, if X is a continuous random variable and f(x) is the value of its probability density at x, the expected value of g(X) is given by

$$E[g(X)] = \int_{-\infty}^{+\infty} g(x) \cdot f(x)\,dx$$

• Theorem 4.2 - Expected value of a linear function of X: If a and b are constants, then

$$E(aX + b) = aE(X) + b$$

• Theorem 4.3 - Expected value of a linear combination of functions, $g_i(X)$, defined over X: If $c_1, c_2, \ldots, c_n$ are constants, then

$$E\left[ \sum_{i=1}^{n} c_i g_i(X) \right] = \sum_{i=1}^{n} c_i E[g_i(X)]$$

• Theorem 4.4 - Expected values of two random variables X and Y: If X and Y are discrete random variables and f(x, y) is the value of their joint probability distribution at (x, y), the expected value of g(X, Y) is

$$E[g(X, Y)] = \sum_{x} \sum_{y} g(x, y) \cdot f(x, y)$$

Correspondingly, if X and Y are continuous random variables and f(x, y) is the value of their joint probability density at (x, y), the expected value of g(X, Y) is

$$E[g(X, Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y) \cdot f(x, y)\,dx\,dy$$

• Theorem 4.5 - Expected value of a linear combination of k functions defined over n discrete random variables: If $c_1, c_2, \ldots, c_k$ are constants, then

$$E\left[ \sum_{i=1}^{k} c_i g_i(X_1, X_2, \ldots, X_n) \right] = \sum_{i=1}^{k} c_i E[g_i(X_1, X_2, \ldots, X_n)]$$

• Theorem 4.6 - Relationship between the variance of a random variable X and the first and second moments about the origin of X:

$$\sigma^2 = \mu'_2 - \mu^2$$

• Theorem 4.7 - Variance of linear function of a random variable X:

$$V(aX + b) = a^2 \sigma^2$$

• Theorem 4.8A - Markov's Inequality: If X is a random variable with mean µ and a probability distribution function f(x) with f(x) = 0 for x < 0, then for any positive constant a,

$$P(X \ge a) \le \frac{\mu}{a}$$

• Theorem 4.8B - Chebyshev's Inequality: If µ and σ are the mean and standard deviation, respectively, of the random variable X, then for any positive constant k

$$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}, \quad \sigma \neq 0$$

• Theorem 4.9 - Moment-generating function and moments of a random variable X:

$$\left. \frac{d^r M_X(t)}{dt^r} \right|_{t=0} = \mu'_r$$

• Theorem 4.10 - Moment-generating functions of linear functions of a random variable X: If a and b are constants, then
– 1. $M_{X+a}(t) = E[e^{(X+a)t}] = e^{at} \cdot M_X(t)$;
– 2. $M_{bX}(t) = E(e^{bXt}) = M_X(bt)$;
– 3. $M_{\frac{X+a}{b}}(t) = E\left[e^{\left(\frac{X+a}{b}\right)t}\right] = e^{\frac{a}{b}t} \cdot M_X\left(\frac{t}{b}\right)$.

• Theorem 4.11 - Relationship between covariance and moments about the origin:

$$\sigma_{XY} = \mu'_{1,1} - \mu_X \mu_Y$$

• Theorem 4.12 - Covariance of independent random variables: If X and Y are independent, then E(XY) = E(X) · E(Y) and $\sigma_{XY} = 0$.

• Theorem 4.13 - Expected value of product of independent random variables: If $X_1, X_2, \ldots, X_n$ are independent, then

$$E(X_1 X_2 \cdot \ldots \cdot X_n) = E(X_1) \cdot E(X_2) \cdot \ldots \cdot E(X_n)$$

• Theorem 4.14 - Mean and variance of linear combination of random variables: If $X_1, X_2, \ldots, X_n$ are random variables and

$$Y = \sum_{i=1}^{n} a_i X_i$$

where $a_1, a_2, \ldots, a_n$ are constants, then

$$E(Y) = \sum_{i=1}^{n} a_i E(X_i)$$

and

$$\operatorname{var}(Y) = \sum_{i=1}^{n} a_i^2 \cdot \operatorname{var}(X_i) + 2 \sum_{i<j} \sum a_i a_j \cdot \operatorname{cov}(X_i, X_j)$$

where the double summation extends over all values of i and j, from 1 to n, for which i < j.
• Theorem 4.15 - Covariance of two linear combinations of random variables: If $X_1, X_2, \ldots, X_n$ are random variables and

$$Y_1 = \sum_{i=1}^{n} a_i X_i, \qquad Y_2 = \sum_{i=1}^{n} b_i X_i$$

where $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are constants, then

$$\operatorname{cov}(Y_1, Y_2) = \sum_{i=1}^{n} a_i b_i \cdot \operatorname{var}(X_i) + \sum_{i<j} \sum (a_i b_j + a_j b_i) \cdot \operatorname{cov}(X_i, X_j)$$
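Theorem 4.6 and the two inequalities in Theorems 4.8A and 4.8B can be verified exactly on a small distribution. The sketch below assumes a fair six-sided die; to avoid taking a square root, the Chebyshev event |X − µ| < kσ is tested in the equivalent squared form (X − µ)² < k²σ², which holds for positive k.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die (nonnegative, so Markov's inequality applies).
f = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in f.items())
second_moment = sum(x**2 * p for x, p in f.items())
var = second_moment - mu**2            # Theorem 4.6: sigma^2 = mu'_2 - mu^2

def markov_holds(a):
    """Check P(X >= a) <= mu / a for a positive constant a."""
    tail = sum(p for x, p in f.items() if x >= a)
    return tail <= mu / a

def chebyshev_holds(k):
    """Check P(|X - mu| < k*sigma) >= 1 - 1/k^2 for a positive constant k."""
    inside = sum(p for x, p in f.items() if (x - mu)**2 < k**2 * var)
    return inside >= 1 - 1 / k**2

print(mu, var)
print(all(markov_holds(a) for a in range(1, 10)))
print(all(chebyshev_holds(Fraction(k, 2)) for k in range(1, 9)))
```

Both inequalities are typically far from tight, so the check confirms the bounds hold rather than how sharp they are.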

• Theorem 5.1 - Relationship between binomial probabilities of x and n − x:

$$b(x; n, \theta) = b(n - x; n, 1 - \theta)$$

• Theorem 5.2 - Mean and variance of binomial distribution: The mean and the variance of the binomial distribution are

$$\mu = n\theta, \qquad \sigma^2 = n\theta(1 - \theta)$$

• Theorem 5.3 - Mean and variance of X/n, where X has binomial distribution with parameters n and θ: If X has a binomial distribution with the parameters n and θ and Y = X/n, then

$$E(Y) = \theta, \qquad \sigma_Y^2 = \frac{\theta(1 - \theta)}{n}$$

• Theorem 5.4 - Moment-generating function of binomial distribution: The moment-generating function of the binomial distribution is given by

$$M_X(t) = \left[ 1 + \theta(e^t - 1) \right]^n$$


• Theorem 5.5 - Relationship between binomial and negative binomial distributions:

$$b^*(x; k, \theta) = \frac{k}{x} \cdot b(k; x, \theta)$$

• Theorem 5.6 - Mean and variance of negative binomial distributions: The mean and the variance of the negative binomial distribution are

$$\mu = \frac{k}{\theta}, \qquad \sigma^2 = \frac{k}{\theta} \left( \frac{1}{\theta} - 1 \right)$$

• Theorem 5.7 - Mean and variance of hypergeometric distribution: The mean and the variance of the hypergeometric distribution are

$$\mu = \frac{nM}{N}, \qquad \sigma^2 = \frac{nM(N - M)(N - n)}{N^2 (N - 1)}$$

• Theorem 5.8 - Mean and variance of Poisson distribution: The mean and the variance of the Poisson distribution are given by

$$\mu = \lambda, \qquad \sigma^2 = \lambda$$

• Theorem 5.9 - Moment-generating function of Poisson distribution: The moment-generating function of the Poisson distribution is given by

$$M_X(t) = e^{\lambda(e^t - 1)}$$
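Several of the binomial results (Theorems 5.1, 5.2, and 5.5) can be confirmed exactly by summing the probability distribution with rational arithmetic. The parameter values below are arbitrary illustrative choices, and the negative binomial is taken with exponent x − k on (1 − θ).

```python
from fractions import Fraction
import math

def b(x, n, theta):
    """Binomial probability b(x; n, theta)."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def b_star(x, k, theta):
    """Negative binomial probability b*(x; k, theta), for x = k, k+1, ..."""
    return math.comb(x - 1, k - 1) * theta**k * (1 - theta)**(x - k)

n, theta = 8, Fraction(3, 10)      # hypothetical parameters

# Mean and variance computed directly from the distribution (Theorem 5.2).
mean = sum(x * b(x, n, theta) for x in range(n + 1))
variance = sum(x**2 * b(x, n, theta) for x in range(n + 1)) - mean**2
print(mean == n * theta, variance == n * theta * (1 - theta))

# Symmetry between x and n - x (Theorem 5.1).
print(all(b(x, n, theta) == b(n - x, n, 1 - theta) for x in range(n + 1)))

# Link to the negative binomial (Theorem 5.5): b*(x; k, theta) = (k/x) b(k; x, theta).
k = 3
print(all(b_star(x, k, theta) == Fraction(k, x) * b(k, x, theta) for x in range(k, 15)))
```

Because `Fraction` arithmetic is exact, the comparisons use `==` rather than a tolerance.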

• Theorem 6.1 - Mean and variance of uniform density: The mean and the variance of the uniform distribution are given by

$$\mu = \frac{\alpha + \beta}{2}, \qquad \sigma^2 = \frac{1}{12} (\beta - \alpha)^2$$

• Theorem 6.2 - rth moment about origin of gamma distribution: The rth moment about the origin of the gamma distribution is given by

$$\mu'_r = \frac{\beta^r \Gamma(\alpha + r)}{\Gamma(\alpha)}$$

• Theorem 6.3 - Mean and variance of gamma distribution: The mean and the variance of the gamma distribution are

$$\mu = \alpha\beta, \qquad \sigma^2 = \alpha\beta^2$$

• Theorem 6.4 - Moment-generating function of gamma distribution: The moment-generating function of the gamma distribution is given by

$$M_X(t) = (1 - \beta t)^{-\alpha}$$

• Theorem 6.5 - Mean and variance of beta distribution: The mean and the variance of the beta distribution are given by

$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

• Theorem 6.6 - Moment-generating function of normal distribution: The moment-generating function of the normal distribution is given by

$$M_X(t) = e^{\mu t + \frac{1}{2} \sigma^2 t^2}$$

• Theorem 6.7 - Transforming random variable having normal distribution to random variable having standard normal distribution: If X has a normal distribution with the mean µ and the standard deviation σ, then

$$Z = \frac{X - \mu}{\sigma}$$

has the standard normal distribution.

• Theorem 6.8 - Normal approximation to binomial distribution: If X is a random variable having a binomial distribution with the parameters n and θ, then the moment-generating function of

$$Z = \frac{X - n\theta}{\sqrt{n\theta(1 - \theta)}}$$

approaches that of the standard normal distribution when n → ∞.


• Theorem 6.9 - Means and variances of conditional densities of random variables having bivariate normal distribution: If X and Y have a bivariate normal distribution, the conditional density of Y given X = x is a normal distribution with the mean

$$\mu_{Y|x} = \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x - \mu_1)$$

and the variance

$$\sigma_{Y|x}^2 = \sigma_2^2 (1 - \rho^2)$$

and the conditional density of X given Y = y is a normal distribution with the mean

$$\mu_{X|y} = \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (y - \mu_2)$$

and the variance

$$\sigma_{X|y}^2 = \sigma_1^2 (1 - \rho^2)$$

• Theorem 6.10 - Condition for independence of random variables having bivariate normal distribution: If two random variables have a bivariate normal distribution, they are independent if and only if ρ = 0.

• Theorem 7.1 - Formula for transforming probability density of random variable to that of function of the random variable: Let f(x) be the value of the probability density of the continuous random variable X at x. If the function given by y = u(x) is differentiable and either increasing or decreasing for all values within the range of X for which f(x) ≠ 0, then, for these values of x, the equation y = u(x) can be uniquely solved for x to give x = w(y), and for the corresponding values of y the probability density of Y = u(X) is given by

$$g(y) = f[w(y)] \cdot |w'(y)|$$

provided u'(x) ≠ 0. Elsewhere, g(y) = 0.

• Theorem 7.2 - Formula for transforming the joint probability density of two random variables to that of two functions of the random variables: Let $f(x_1, x_2)$ be the value of the joint probability density of the continuous random variables $X_1$ and $X_2$ at $(x_1, x_2)$. If the functions given by $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$ are partially differentiable with respect to both $x_1$ and $x_2$ and represent a one-to-one transformation for all the values within the range of $X_1$ and $X_2$ for which $f(x_1, x_2) \neq 0$, then, for these values of $x_1$ and $x_2$, the equations $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$ can be uniquely solved for $x_1$ and $x_2$ to give $x_1 = w_1(y_1, y_2)$ and $x_2 = w_2(y_1, y_2)$, and for the corresponding values of $y_1$ and $y_2$, the joint probability density of $Y_1 = u_1(X_1, X_2)$ and $Y_2 = u_2(X_1, X_2)$ is given by

$$g(y_1, y_2) = f[w_1(y_1, y_2), w_2(y_1, y_2)] \cdot |J|$$

Here, J, called the Jacobian of the transformation, is the determinant of the partial derivatives $\partial x_i / \partial y_j$:

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\ \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix}$$

Elsewhere, $g(y_1, y_2) = 0$.
• Theorem 7.3 - Moment-generating function of sum of independent random variables: If $X_1, X_2, ..., X_n$ are independent random variables and $Y = X_1 + X_2 + ... + X_n$, then
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)$$
where $M_{X_i}(t)$ is the value of the moment-generating function of $X_i$ at $t$.
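One way to see Theorem 7.3 at work is with n independent Bernoulli variables, whose sum is binomial. The sketch below multiplies the Bernoulli moment-generating functions and compares the product with E(e^{tY}) computed directly from the binomial probabilities. The function names and the values of n, θ and t are assumptions for illustration.

```python
import math

def bernoulli_mgf(t, theta):
    """M_X(t) = 1 + theta * (e^t - 1) for a single Bernoulli(theta) trial."""
    return 1 + theta * (math.exp(t) - 1)

def binomial_mgf_direct(t, n, theta):
    """E(e^{tY}) summed over the binomial(n, theta) probabilities."""
    return sum(math.exp(t * y) * math.comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(n + 1))

n, theta, t = 6, 0.3, 0.4
product = math.prod(bernoulli_mgf(t, theta) for _ in range(n))
direct = binomial_mgf_direct(t, n, theta)
print(product, direct)
```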


• Theorem 8.1 - Mean and variance of $\bar{X}$ for random sample from infinite population: If $X_1, X_2, ..., X_n$ constitute a random sample from an infinite population with the mean $\mu$ and the variance $\sigma^2$, then
$$E(\bar{X}) = \mu, \quad V(\bar{X}) = \frac{\sigma^2}{n}$$

• Theorem 8.2 - Law of large numbers: For any positive constant $c$, the probability that $\bar{X}$ will take on a value between $\mu - c$ and $\mu + c$ is at least
$$1 - \frac{\sigma^2}{nc^2}$$
When $n \to \infty$, this probability approaches 1.
• Theorem 8.3 - Central limit theorem: If $X_1, X_2, ..., X_n$ constitute a random sample from an infinite population with the mean $\mu$, the variance $\sigma^2$, and the moment-generating function $M_X(t)$, then the limiting distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution.
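A seeded simulation gives a rough feel for the central limit theorem. The population (uniform on (0, 1), with mean 1/2 and variance 1/12), the sample size, the number of repetitions, and the seed are all assumptions chosen for this sketch.

```python
import math
import random

def standardized_mean(sample, mu, sigma):
    """Z = (x-bar - mu) / (sigma / sqrt(n)) for one sample."""
    n = len(sample)
    xbar = sum(sample) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

rng = random.Random(451)              # fixed seed so the run is repeatable
mu, sigma = 0.5, math.sqrt(1 / 12)    # mean and s.d. of the uniform(0, 1) population
n, reps = 30, 2000

zs = [standardized_mean([rng.random() for _ in range(n)], mu, sigma)
      for _ in range(reps)]

# For a standard normal variable, about 95% of values fall within +/- 1.96.
inside = sum(abs(z) <= 1.96 for z in zs) / reps
print(f"share of |Z| <= 1.96: {inside:.3f}")
```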

• Theorem 8.4 - Distribution of $\bar{X}$ for random sample from normal population: If $\bar{X}$ is the mean of a random sample of size $n$ from a normal population with the mean $\mu$ and the variance $\sigma^2$, its sampling distribution is a normal distribution with the mean $\mu$ and the variance $\sigma^2/n$.
• Theorem 8.5 - Covariance of rth and sth values of random sample from finite population: If $X_r$ and $X_s$, with $r \neq s$, are the $r$th and $s$th random variables of a random sample of size $n$ drawn from the finite population $\{c_1, c_2, ..., c_N\}$, then
$$C(X_r, X_s) = -\frac{\sigma^2}{N - 1}$$

• Theorem 8.6 - Mean and variance of $\bar{X}$ for random sample from finite population: If $\bar{X}$ is the mean of a random sample of size $n$ from a finite population of size $N$ with the mean $\mu$ and the variance $\sigma^2$, then

$$E(\bar{X}) = \mu, \quad V(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$$
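For a very small finite population, the mean and variance of the sample mean can be verified by listing every possible sample drawn without replacement. The population {1, 2, 3, 4, 5} and the sample size n = 2 are assumptions for this sketch; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]          # illustrative finite population
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(c) - mu) ** 2 for c in population) / N   # population variance

# Every unordered sample of size n is equally likely.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
e_xbar = sum(means) / len(means)
v_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)

formula = sigma2 / n * Fraction(N - n, N - 1)
print(e_xbar, v_xbar, formula)
```

The factor (N − n)/(N − 1) is what separates this result from the infinite-population variance σ²/n.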

• Theorem 8.7 - Distribution of square of random variable having standard normal distribution: If $X$ has the standard normal distribution, then $X^2$ has the chi-square distribution with $\nu = 1$ degree of freedom.
• Theorem 8.8 - Distribution of sum of squares of independent random variables having standard normal distribution: If $X_1, X_2, ..., X_n$ are independent random variables having standard normal distributions, then
$$Y = \sum_{i=1}^{n} X_i^2$$
has the chi-square distribution with $\nu = n$ degrees of freedom.

• Theorem 8.9 - Distribution of sum of independent random variables having chi-square distributions: If $X_1, X_2, ..., X_n$ are independent random variables having chi-square distributions with $\nu_1, \nu_2, ..., \nu_n$ degrees of freedom, then
$$Y = \sum_{i=1}^{n} X_i$$
has the chi-square distribution with $\nu_1 + \nu_2 + ... + \nu_n$ degrees of freedom.


• Theorem 8.10 - If two random variables are independent, and the first and their sum have chi-square distributions, then the second has a chi-square distribution: If $X_1$ and $X_2$ are independent random variables, $X_1$ has a chi-square distribution with $\nu_1$ degrees of freedom, and $X_1 + X_2$ has a chi-square distribution with $\nu > \nu_1$ degrees of freedom, then $X_2$ has a chi-square distribution with $\nu - \nu_1$ degrees of freedom.
• Theorem 8.11 - Joint distribution of mean and variance for random sample from normal population: If $\bar{X}$ and $S^2$ are the mean and the variance of a random sample of size $n$ from a normal population with the mean $\mu$ and the standard deviation $\sigma$, then
– 1.: $\bar{X}$ and $S^2$ are independent;
– 2.: the random variable $\frac{(n-1)S^2}{\sigma^2}$ has a chi-square distribution with $n - 1$ degrees of freedom.
• Theorem 8.12 - Derivation of t distribution: If $Y$ and $Z$ are independent random variables, $Y$ has a chi-square distribution with $\nu$ degrees of freedom, and $Z$ has the standard normal distribution, then the distribution of
$$T = \frac{Z}{\sqrt{Y/\nu}}$$
is given by
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \cdot \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad t \in (-\infty, \infty)$$
and it is called the t distribution with $\nu$ degrees of freedom.
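The t density can be evaluated with the gamma function from the standard library. The function name `t_pdf` is a hypothetical choice for this sketch. With ν = 1 the density at t = 0 reduces to 1/π, which gives a convenient check.

```python
import math

def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom at t."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)

print(t_pdf(0.0, 1), 1 / math.pi)       # these two values should agree
print(t_pdf(1.3, 5), t_pdf(-1.3, 5))    # the density is symmetric about 0
```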


• Theorem 8.13 - Distribution of standardized mean of random sample from normal population: If $\bar{X}$ and $S^2$ are the mean and the variance of a random sample of size $n$ from a normal population with the mean $\mu$ and the variance $\sigma^2$, then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has the t distribution with $n - 1$ degrees of freedom.
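• Theorem 8.14 - Derivation of F distribution: If $U$ and $V$ are independent random variables having chi-square distributions with $\nu_1$ and $\nu_2$ degrees of freedom, then
$$F = \frac{U/\nu_1}{V/\nu_2}$$
is a random variable having an F distribution, that is, a random variable whose probability density is given by
$$g(f) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \cdot f^{\frac{\nu_1}{2}-1} \left(1 + \frac{\nu_1}{\nu_2}f\right)^{-\frac{1}{2}(\nu_1+\nu_2)}$$
for $f > 0$ and $g(f) = 0$ elsewhere.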
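The F density can be coded term by term from the formula. The name `f_pdf` is hypothetical. With ν1 = ν2 = 2 the expression collapses to 1/(1 + f)^2, so the value at f = 1 should be 0.25.

```python
import math

def f_pdf(f, nu1, nu2):
    """Density of the F distribution with nu1 and nu2 degrees of freedom at f."""
    if f <= 0:
        return 0.0
    coeff = math.gamma((nu1 + nu2) / 2) / (math.gamma(nu1 / 2) * math.gamma(nu2 / 2))
    return (coeff
            * (nu1 / nu2) ** (nu1 / 2)
            * f ** (nu1 / 2 - 1)
            * (1 + nu1 / nu2 * f) ** (-(nu1 + nu2) / 2))

print(f_pdf(1.0, 2, 2))    # reduces to 1 / (1 + f)^2 when nu1 = nu2 = 2
```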

• Theorem 8.15 - Distribution of ratio of variances of independent random samples from normal populations, divided by respective population variances: If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ from normal populations with the variances $\sigma_1^2$ and $\sigma_2^2$, then

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

is a random variable having an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
• Theorem 8.16 - Formula for probability density of rth order statistic: For random samples of size $n$ from an infinite population whose probability density has the value $f(x)$ at $x$, the probability density of the $r$th order statistic $Y_r$ is given by
$$g_r(y_r) = \frac{n!}{(r - 1)!(n - r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}$$

for $y_r \in (-\infty, \infty)$.
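The two integrals in Theorem 8.16 are the distribution function F(y_r) and its complement 1 − F(y_r), so the density can be computed from any density and distribution function pair. The sketch below assumes a uniform(0, 1) population; all function names are hypothetical.

```python
import math

def order_stat_pdf(y, r, n, pdf, cdf):
    """Density of the r-th order statistic of a sample of size n, evaluated at y."""
    coeff = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return coeff * cdf(y) ** (r - 1) * pdf(y) * (1 - cdf(y)) ** (n - r)

def uniform_pdf(x):
    """Density of the uniform(0, 1) population (an illustrative choice)."""
    return 1.0 if 0 <= x <= 1 else 0.0

def uniform_cdf(x):
    """Distribution function of the uniform(0, 1) population."""
    return min(max(x, 0.0), 1.0)

# Median of a sample of size 3, evaluated at y = 0.5: 6 * 0.5 * 1 * 0.5
median_density = order_stat_pdf(0.5, 2, 3, uniform_pdf, uniform_cdf)
# Smallest value of a sample of size 3, evaluated at y = 0.2: 3 * 1 * 0.8^2
min_density = order_stat_pdf(0.2, 1, 3, uniform_pdf, uniform_cdf)
print(median_density, min_density)
```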


• Theorem 8.17 - Large-sample approximation to sampling distribution of median: For large $n$, the sampling distribution of the median for random samples of size $2n + 1$ is approximately normal with the mean $\tilde{\mu}$ and the variance
$$\frac{1}{8[f(\tilde{\mu})]^2 n}$$
where $\tilde{\mu}$ is the population median.

4 Power Series
• $\frac{1}{1-x}$ for $|x| < 1$:

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + ... = \sum_{j=0}^{\infty} x^j$$

• $e^x$ for all $x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + ... = \sum_{j=0}^{\infty} \frac{x^j}{j!}$$

• $\sin(x)$ for all $x$:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + ... = \sum_{j=0}^{\infty} \frac{(-1)^j x^{2j+1}}{(2j+1)!}$$

• $\cos(x)$ for all $x$:

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + ... = \sum_{j=0}^{\infty} \frac{(-1)^j x^{2j}}{(2j)!}$$

• $(1 + x)^P$ for $|x| < 1$:

$$(1 + x)^P = 1 + Px + \frac{P(P-1)}{2!}x^2 + \frac{P(P-1)(P-2)}{3!}x^3 + ... = \sum_{j=0}^{\infty} \binom{P}{j} x^j$$

• $\ln(1 + x)$ for $|x| < 1$:

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + ... = \sum_{j=0}^{\infty} (-1)^j \frac{x^{j+1}}{j+1}$$
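Partial sums of these series can be compared with the library functions as a sanity check. The function names and the number of terms are assumptions for this sketch. The alternating logarithm series x − x²/2 + x³/3 − ... is checked against ln(1 + x), which is the function it sums to for |x| < 1.

```python
import math

def geometric_series(x, terms=200):
    """Partial sum of 1 + x + x^2 + ..., valid for |x| < 1."""
    return sum(x**j for j in range(terms))

def exp_series(x, terms=30):
    """Partial sum of the series for e^x."""
    return sum(x**j / math.factorial(j) for j in range(terms))

def sin_series(x, terms=20):
    """Partial sum of the series for sin(x)."""
    return sum((-1)**j * x**(2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(terms))

def cos_series(x, terms=20):
    """Partial sum of the series for cos(x)."""
    return sum((-1)**j * x**(2 * j) / math.factorial(2 * j) for j in range(terms))

def log1p_series(x, terms=200):
    """Partial sum of x - x^2/2 + x^3/3 - ..., which converges to ln(1 + x) for |x| < 1."""
    return sum((-1)**j * x**(j + 1) / (j + 1) for j in range(terms))

x = 0.5
print(geometric_series(x), 1 / (1 - x))
print(exp_series(x), math.exp(x))
print(sin_series(x), math.sin(x))
print(cos_series(x), math.cos(x))
print(log1p_series(x), math.log(1 + x))
```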

5 Special Probability Distributions
5.1 The Discrete Uniform Distribution
• Distribution:
$$f(x) = \frac{1}{k} \quad \text{for } x = x_1, x_2, ..., x_k$$
• Mean:
$$\mu = \sum_{i=1}^{k} x_i \cdot \frac{1}{k}$$

• Variance:
$$\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \cdot \frac{1}{k}$$

5.2 The Bernoulli Distribution


• Distribution:
$$f(x; \theta) = \theta^x (1 - \theta)^{1-x} \quad \text{for } x = 0, 1$$

5.3 The Binomial Distribution


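• Distribution:
$$b(x; n, \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n-x} \quad \text{for } x = 0, 1, 2, ..., n$$

• Mean:
$$\mu = n\theta$$
• Variance:
$$\sigma^2 = n\theta(1 - \theta)$$
• Moment-Generating Function:
$$M_X(t) = \left[1 + \theta(e^t - 1)\right]^n$$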
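The binomial mean and variance can be checked by summing over the probability function b(x; n, θ) = C(n, x) θ^x (1 − θ)^(n − x). The function name and the parameter values are assumptions for this sketch.

```python
import math

def binomial_pmf(x, n, theta):
    """b(x; n, theta) = C(n, x) * theta^x * (1 - theta)^(n - x)."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 10, 0.3
pmf = [binomial_pmf(x, n, theta) for x in range(n + 1)]

binom_total = sum(pmf)
binom_mean = sum(x * p for x, p in enumerate(pmf))
binom_var = sum((x - binom_mean) ** 2 * p for x, p in enumerate(pmf))
print(binom_total, binom_mean, binom_var)   # compare with 1, n*theta, n*theta*(1 - theta)
```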


5.4 The Negative Binomial Distribution


Sometimes called the binomial waiting-time distribution or Pascal distribution:
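• Distribution:
$$b^*(x; k, \theta) = \binom{x-1}{k-1} \theta^k (1 - \theta)^{x-k} \quad \text{for } x = k, k+1, k+2, ...$$

• Mean:
$$\mu = \frac{k}{\theta}$$
• Variance:
$$\sigma^2 = \frac{k}{\theta}\left(\frac{1}{\theta} - 1\right)$$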
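For the negative binomial distribution, x counts the trial on which the k-th success occurs, so the support starts at x = k. The sketch below sums the probability function far enough into the tail that the truncation error is negligible, then compares the results with μ = k/θ and σ² = (k/θ)(1/θ − 1). The names, parameters, and cutoff of 400 are assumptions.

```python
import math

def neg_binomial_pmf(x, k, theta):
    """b*(x; k, theta): probability that the k-th success occurs on trial x."""
    if x < k:
        return 0.0
    return math.comb(x - 1, k - 1) * theta**k * (1 - theta)**(x - k)

k, theta = 3, 0.4
xs = range(k, 400)    # truncated support; the neglected tail is negligible here

nb_total = sum(neg_binomial_pmf(x, k, theta) for x in xs)
nb_mean = sum(x * neg_binomial_pmf(x, k, theta) for x in xs)
nb_var = sum((x - nb_mean) ** 2 * neg_binomial_pmf(x, k, theta) for x in xs)
print(nb_total, nb_mean, nb_var)
```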

5.5 The Geometric Distribution
• Distribution:
$$g(x; \theta) = \theta(1 - \theta)^{x-1} \quad \text{for } x = 1, 2, 3, ...$$

5.6 The Hypergeometric Distribution


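• Distribution:
$$h(x; n, N, M) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
$$\text{for } x = 0, 1, 2, ..., n, \quad x \leq M, \quad n - x \leq N - M$$

• Mean:
$$\mu = \frac{nM}{N}$$
• Variance:
$$\sigma^2 = \frac{nM(N - M)(N - n)}{N^2(N - 1)}$$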
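The hypergeometric mean and variance can be confirmed by summing over the probability function. The function name and the values n = 5, N = 20, M = 8 are assumptions for this sketch.

```python
import math

def hypergeom_pmf(x, n, N, M):
    """h(x; n, N, M): x successes in n draws without replacement, M successes among N."""
    if x < 0 or x > n or x > M or n - x > N - M:
        return 0.0
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

n, N, M = 5, 20, 8
hg_total = sum(hypergeom_pmf(x, n, N, M) for x in range(n + 1))
hg_mean = sum(x * hypergeom_pmf(x, n, N, M) for x in range(n + 1))
hg_var = sum((x - hg_mean) ** 2 * hypergeom_pmf(x, n, N, M) for x in range(n + 1))

hg_var_formula = n * M * (N - M) * (N - n) / (N**2 * (N - 1))
print(hg_total, hg_mean, hg_var, hg_var_formula)
```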

5.7 The Poisson Distribution


• Distribution:
$$p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \quad \text{for } x = 0, 1, 2, ...$$
• Mean:
$$\mu = \lambda$$

• Variance:
$$\sigma^2 = \lambda$$

• Moment-Generating Function:
$$M_X(t) = e^{\lambda(e^t - 1)}$$
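The Poisson moment-generating function e^{λ(e^t − 1)} can be compared with a direct evaluation of E(e^{tX}) over a truncated support. The function name, the values of λ and t, and the cutoff at 100 terms are assumptions for this sketch.

```python
import math

def poisson_pmf(x, lam):
    """p(x; lambda) = lambda^x * e^(-lambda) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam, t = 2.5, 0.3
support = range(100)    # truncated support; the remaining tail is negligible here

pois_mean = sum(x * poisson_pmf(x, lam) for x in support)
mgf_direct = sum(math.exp(t * x) * poisson_pmf(x, lam) for x in support)
mgf_formula = math.exp(lam * (math.exp(t) - 1))
print(pois_mean, mgf_direct, mgf_formula)
```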

5.8 The Multinomial Distribution


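• Distribution:
$$f(x_1, x_2, ..., x_k; n, \theta_1, \theta_2, ..., \theta_k) = \binom{n}{x_1, x_2, ..., x_k} \cdot \theta_1^{x_1} \cdot \theta_2^{x_2} \cdots \theta_k^{x_k}$$
$$x_i = 0, 1, ..., n, \quad \sum_{i=1}^{k} x_i = n, \quad \sum_{i=1}^{k} \theta_i = 1$$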
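The multinomial coefficient is n! divided by the product of the x_i!, so the probability function needs only factorials. The function name and the example probabilities below are assumptions for this sketch.

```python
import math
from itertools import product

def multinomial_pmf(xs, thetas):
    """Multinomial probability of the counts xs under cell probabilities thetas."""
    n = sum(xs)
    coeff = math.factorial(n)
    for x in xs:
        coeff //= math.factorial(x)    # each division is exact, so the result stays an integer
    prob = 1.0
    for x, theta in zip(xs, thetas):
        prob *= theta**x
    return coeff * prob

thetas = (0.2, 0.3, 0.5)
n = 4
# Sum over every (x1, x2, x3) with x1 + x2 + x3 = n; the total should be 1.
multi_total = sum(multinomial_pmf(xs, thetas)
                  for xs in product(range(n + 1), repeat=3) if sum(xs) == n)
print(multi_total, multinomial_pmf((1, 1), (0.5, 0.5)))
```

With k = 2 the formula reduces to the binomial distribution, which is what the second printed value illustrates.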

5.9 The Multivariate Hypergeometric Distribution
• Distribution:
$$f(x_1, x_2, ..., x_k; n, M_1, M_2, ..., M_k) = \frac{\binom{M_1}{x_1}\binom{M_2}{x_2} \cdots \binom{M_k}{x_k}}{\binom{N}{n}}$$
$$x_i = 0, 1, ..., n, \quad x_i \leq M_i, \quad \sum_{i=1}^{k} x_i = n, \quad \sum_{i=1}^{k} M_i = N$$
