Probability and Statistics II - Study Notes
for CAT 2
Author: Manus AI
Date: August 4, 2025
Table of Contents
1. Basic PDF and PMF
2. Marginal and Joint Distributions
3. Exponential Random Variables
4. Uniform Random Variables
5. Bivariate Random Variables
6. Moment Generating Functions (MGF)
7. Transformation of Random Variables
8. Conditional Probability
9. Expected Values
10. Covariance and Correlation Coefficient
Basic PDF and PMF
Discrete Random Variables (PMF)
A random variable is a numerical description of the outcome of a statistical
experiment. A discrete random variable is one that can take on only a finite or
countable number of distinct values. The probability distribution of a discrete random
variable is described by its Probability Mass Function (PMF).
Definition: The Probability Mass Function (PMF) of a discrete random variable X,
denoted by P(X=x) or p(x), is a function that gives the probability that X takes on a
specific value x. The PMF must satisfy the following properties [1]:
1. 0 <= p(x) <= 1 for all values of x.
2. sum(p(x)) = 1 over all possible values of x.
Example: Consider the experiment of flipping a fair coin three times. Let X be the
number of tails that appear. The possible outcomes and the corresponding values of X
are:
Outcome X (Number of Tails)
HHH 0
HHT 1
HTH 1
THH 1
HTT 2
THT 2
TTH 2
TTT 3
The PMF for X is:
x P(X=x)
0 1/8
1 3/8
2 3/8
3 1/8
This satisfies the properties of a PMF: all probabilities are between 0 and 1, and their
sum is 1/8 + 3/8 + 3/8 + 1/8 = 8/8 = 1.
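As a quick sanity check (not part of the original notes), the following Python sketch enumerates the eight equally likely outcomes of three fair coin flips, tabulates the number of tails, and confirms that the resulting PMF sums to 1:

from itertools import product
from collections import Counter

# Enumerate all 2^3 equally likely outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))

# X = number of tails in each outcome.
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

# PMF: divide each count by the total number of outcomes (8).
pmf = {x: count / len(outcomes) for x, count in sorted(tail_counts.items())}
print(pmf)                 # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(sum(pmf.values()))   # 1.0, as a PMF requires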
Continuous Random Variables (PDF)
A continuous random variable is one that can take any value within a given interval.
The probability distribution of a continuous random variable is described by its
Probability Density Function (PDF).
Definition: The Probability Density Function (PDF) of a continuous random variable X,
denoted by f(x), is a function such that the probability that X falls within a certain
interval [a, b] is given by the integral of f(x) over that interval. The PDF must satisfy the
following properties [1]:
1. f(x) >= 0 for all x.
2. integral(f(x)dx) from -infinity to +infinity = 1.
3. P(a <= X <= b) = integral(f(x)dx) from a to b.
Remarks:
* For a continuous random variable, the probability that X takes on any single specific value is 0, i.e., P(X = a) = 0, because the integral from a to a is always zero.
* Consequently, P(a <= X <= b) = P(a < X < b) = P(a <= X < b) = P(a < X <= b).
Example: Let X be a continuous random variable with the PDF:
f(x) = 1/2 * x, for 0 <= x <= 2
f(x) = 0, otherwise
To verify this is a PDF:
1. f(x) >= 0 for 0 <= x <= 2.
2. integral(1/2 * x dx) from 0 to 2 = [1/4 * x^2] from 0 to 2 = 1/4 * (2^2) - 1/4 * (0^2) = 1 - 0 = 1.
To compute P(0 <= X < 1):
P(0 <= X < 1) = integral(1/2 * x dx) from 0 to 1 = [1/4 * x^2] from 0 to 1 = 1/4 * (1^2) - 1/4 * (0^2) = 1/4.
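A minimal numerical check of this example (a sketch, assuming SciPy is available): integrate f(x) = x/2 over [0, 2] to confirm it is a valid PDF, and over [0, 1] to recover P(0 <= X < 1) = 1/4.

from scipy.integrate import quad

f = lambda x: 0.5 * x        # the PDF f(x) = x/2 on [0, 2]

total, _ = quad(f, 0, 2)     # should equal 1 for a valid PDF
prob, _ = quad(f, 0, 1)      # P(0 <= X < 1)

print(total)                 # 1.0
print(prob)                  # 0.25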
Cumulative Distribution Function (CDF)
The Cumulative Distribution Function (CDF), denoted by F(x), gives the probability that
a random variable X takes on a value less than or equal to x. It is applicable to both
discrete and continuous random variables.
Definition:
For a discrete random variable X: F(x) = P(X <= x) = sum(p(t)) for all t <=
x.
For a continuous random variable X: F(x) = P(X <= x) = integral(f(t)dt)
from -infinity to x.
Properties of any CDF [1]:
1. lim(x -> -infinity) F(x) = 0
2. lim(x -> +infinity) F(x) = 1
3. F(x) is a non-decreasing function.
4. F(x) is a right-continuous function.
Relationship between PDF/PMF and CDF:
* For a continuous random variable, f(x) = d/dx F(x).
* For a discrete random variable taking integer values, p(x) = F(x) - F(x-1).
Example (Discrete CDF): Let X be a discrete random variable with PMF:
f(x) = (1/20) * (1 + x), for x = 1, 2, 3, 4, 5
f(x) = 0, otherwise
To find the CDF, F(x):
For x < 1 , F(x) = 0 .
For x = 1 , F(1) = P(X=1) = (1/20)*(1+1) = 2/20 = 1/10 .
For x = 2 , F(2) = P(X=1) + P(X=2) = 2/20 + (1/20)*(1+2) = 2/20 + 3/20
= 5/20 = 1/4 .
And so on. The general form is F(x) = x(x+3)/40 for x = 1, 2, 3, 4, 5 .
For x > 5 , F(x) = 1 .
Example (Continuous CDF): Suppose X is a continuous random variable whose PDF
f(x) is given by:
f(x) = 1/2 * x, for 0 <= x <= 2
f(x) = 0, otherwise
To obtain the CDF, F(x):
For x < 0 , F(x) = 0 .
For 0 <= x <= 2 , F(x) = integral(1/2 * t dt) from 0 to x = [1/4 * t^2]
from 0 to x = 1/4 * x^2 .
For x > 2 , F(x) = integral(1/2 * t dt) from 0 to 2 = 1 .
So, the CDF is:
F(x) = 0, for x < 0
F(x) = x^2/4, for 0 <= x <= 2
F(x) = 1, for x > 2
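The closed-form CDF can be compared against a direct integration of the PDF; a small sketch (the helper names are illustrative, not from the source):

from scipy.integrate import quad

pdf = lambda t: 0.5 * t if 0 <= t <= 2 else 0.0

def cdf_closed_form(x):
    # Piecewise CDF derived above: 0, x^2/4, or 1.
    if x < 0:
        return 0.0
    if x <= 2:
        return x**2 / 4
    return 1.0

for x in [0.5, 1.0, 1.5, 2.0]:
    numeric, _ = quad(pdf, 0, x)
    print(x, cdf_closed_form(x), round(numeric, 6))   # the two values agree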
[1] LectureNotesCombined_231026_194010.pdf, pages 1-21
Marginal and Joint Distributions
When dealing with multiple random variables, we often need to understand their joint
behavior as well as the individual behavior of each variable. This is where joint and
marginal distributions come into play.
Joint Probability Mass Function (JPMF) for Discrete Random Variables
For two discrete random variables X and Y, their joint probability mass function,
denoted by P(X=x, Y=y) or f(x, y) , gives the probability that X takes on a specific
value x AND Y takes on a specific value y simultaneously.
Properties of JPMF:
1. 0 <= f(x, y) <= 1 for all pairs (x, y).
2. sum(sum(f(x, y))) = 1 over all possible pairs (x, y).
Marginal Probability Mass Function (MPMF) for Discrete Random
Variables
The marginal PMF of a discrete random variable (say, X) from a joint distribution of X
and Y is the probability distribution of X alone, ignoring the values of Y. It is obtained by
summing the joint probabilities over all possible values of the other variable (Y).
Definition:
* f_X(x) = P(X=x) = sum(f(x, y)) over all possible y.
* f_Y(y) = P(Y=y) = sum(f(x, y)) over all possible x.
Example (Discrete Joint and Marginal PMFs): Consider the joint probability distribution of X (deductible on home-owners' insurance) and Y (deductible on automobile insurance) from the provided Ass1g.pdf [1]:
X\Y 1000 500 100 Marginal P(X=x)
100 0.05 0.10 0.15 0.30
500 0.10 0.20 0.05 0.35
1000 0.20 0.10 0.05 0.35
Marginal P(Y=y) 0.35 0.40 0.25 1.00
From this table:
* f(100, 1000) = 0.05
* Marginal PMF for X:
  P(X=100) = 0.05 + 0.10 + 0.15 = 0.30
  P(X=500) = 0.10 + 0.20 + 0.05 = 0.35
  P(X=1000) = 0.20 + 0.10 + 0.05 = 0.35
* Marginal PMF for Y:
  P(Y=1000) = 0.05 + 0.10 + 0.20 = 0.35
  P(Y=500) = 0.10 + 0.20 + 0.10 = 0.40
  P(Y=100) = 0.15 + 0.05 + 0.05 = 0.25
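The marginal sums are easy to reproduce programmatically. A sketch, assuming the joint PMF is stored in a Python dictionary keyed by (x, y):

# Joint PMF from the insurance example, keyed by (x, y).
joint = {
    (100, 1000): 0.05, (100, 500): 0.10, (100, 100): 0.15,
    (500, 1000): 0.10, (500, 500): 0.20, (500, 100): 0.05,
    (1000, 1000): 0.20, (1000, 500): 0.10, (1000, 100): 0.05,
}

# Marginal PMFs: sum the joint probabilities over the other variable.
marginal_x, marginal_y = {}, {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p
    marginal_y[y] = marginal_y.get(y, 0) + p

print(marginal_x)   # {100: 0.30, 500: 0.35, 1000: 0.35} (up to float rounding)
print(marginal_y)   # {1000: 0.35, 500: 0.40, 100: 0.25}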
Joint Probability Density Function (JPDF) for Continuous Random
Variables
For two continuous random variables X and Y, their joint probability density function,
denoted by f(x, y) , is a function such that the probability that X falls within an
interval [x1, x2] AND Y falls within an interval [y1, y2] is given by the double integral of
f(x, y) over that region.
Properties of JPDF [2]:
1. f(x, y) >= 0 for all (x, y).
2. double_integral(f(x, y) dx dy) over all x and y = 1.
3. P(a <= X <= b, c <= Y <= d) = integral_a^b integral_c^d f(x, y) dy dx.
Marginal Probability Density Function (MPDF) for Continuous Random
Variables
The marginal PDF of a continuous random variable (say, X) from a joint distribution of
X and Y is the probability distribution of X alone, ignoring the values of Y. It is obtained
by integrating the joint PDF over all possible values of the other variable (Y).
Definition [2]:
* f_X(x) = integral(f(x, y) dy) over all possible y.
* f_Y(y) = integral(f(x, y) dx) over all possible x.
Example (Continuous Joint and Marginal PDFs): Let the joint density function of X
and Y be given by [2]:
f(x, y) = 2, for 0 < y < 1 - x; 0 < x < 1
f(x, y) = 0, elsewhere
To find the marginal density of X:
f_X(x) = integral_0^(1-x) 2 dy = [2y]_0^(1-x) = 2(1-x), for 0 < x < 1.
To find the marginal density of Y:
f_Y(y) = integral_0^(1-y) 2 dx = [2x]_0^(1-y) = 2(1-y), for 0 < y < 1.
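To double-check these marginals, one can integrate the constant joint density f(x, y) = 2 over the triangular support numerically; a sketch assuming SciPy:

from scipy.integrate import quad

def marginal_x(x):
    # f_X(x) = integral of 2 dy for y from 0 to 1 - x
    value, _ = quad(lambda y: 2.0, 0, 1 - x)
    return value

for x in [0.1, 0.5, 0.9]:
    print(x, marginal_x(x), 2 * (1 - x))   # numeric integral matches 2(1 - x)

total, _ = quad(lambda x: 2 * (1 - x), 0, 1)
print(total)                               # 1.0, so f_X is a valid density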
[1] Ass1g.pdf, page 1 [2] ContinuousBivariateDistribution.pdf, page 11
Exponential Random Variables
The exponential distribution is a continuous probability distribution that describes the
time between events in a Poisson process, i.e., a process in which events occur
continuously and independently at a constant average rate. It is often used to model
the lifetime of electronic components, the time between arrivals at a service station, or
the time between machine breakdowns [1].
Definition and PDF
A continuous random variable X is said to have an exponential distribution with
parameter λ (lambda), where λ > 0, if its Probability Density Function (PDF) is given by
[1]:
f(x) = λe^(-λx), for x > 0
f(x) = 0, otherwise
Here, λ is the rate parameter, representing the average number of events per unit of
time. The reciprocal of λ, 1/λ , represents the mean time between events.
Mean and Variance
For an exponential random variable X with parameter λ:
Mean (Expected Value): E(X) = 1/λ [1]
Derivation:
E(X) = integral_0^infinity x * f(x) dx = integral_0^infinity x * λe^(-λx) dx
Using integration by parts (u = x, dv = λe^(-λx) dx, so v = -e^(-λx)):
E(X) = [-x * e^(-λx)]_0^infinity + integral_0^infinity e^(-λx) dx
E(X) = (0 - 0) + [-1/λ * e^(-λx)]_0^infinity
E(X) = 0 + (0 - (-1/λ)) = 1/λ
Variance: Var(X) = 1/λ^2 [1]
Derivation:
Var(X) = E(X^2) - [E(X)]^2
First, find E(X^2) = integral_0^infinity x^2 * λe^(-λx) dx
Using integration by parts (u = x^2, dv = λe^(-λx) dx):
E(X^2) = [-x^2 * e^(-λx)]_0^infinity + 2 * integral_0^infinity x * e^(-λx) dx
We know integral_0^infinity x * e^(-λx) dx = 1/λ^2 (the E(X) integral without the leading λ).
So, E(X^2) = 2 * (1/λ^2) = 2/λ^2
Therefore, Var(X) = 2/λ^2 - (1/λ)^2 = 2/λ^2 - 1/λ^2 = 1/λ^2
Moment Generating Function (MGF)
The Moment Generating Function (MGF) of an exponential random variable X with
parameter λ is given by [1]:
M_X(t) = λ / (λ - t) , for t < λ
Derivation:
M_X(t) = E(e^(tX)) = integral_0^infinity e^(tx) * λe^(-λx) dx
M_X(t) = integral_0^infinity λe^((t-λ)x) dx
M_X(t) = [λ / (t-λ) * e^((t-λ)x)]_0^infinity
For the integral to converge, t - λ must be negative, i.e., t < λ. In that case, e^((t-λ)x) approaches 0 as x approaches infinity.
M_X(t) = 0 - (λ / (t-λ) * e^0) = -λ / (t-λ) = λ / (λ-t)
Using MGF to find Mean and Variance:
Mean: E(X) = M_X'(0)
M_X'(t) = d/dt [λ(λ-t)^(-1)] = λ * (-1) * (λ-t)^(-2) * (-1) = λ / (λ-t)^2
E(X) = M_X'(0) = λ / (λ-0)^2 = λ / λ^2 = 1/λ
Variance: Var(X) = M_X''(0) - [M_X'(0)]^2
M_X''(t) = d/dt [λ(λ-t)^(-2)] = λ * (-2) * (λ-t)^(-3) * (-1) = 2λ / (λ-t)^3
M_X''(0) = 2λ / (λ-0)^3 = 2λ / λ^3 = 2/λ^2
Var(X) = 2/λ^2 - (1/λ)^2 = 2/λ^2 - 1/λ^2 = 1/λ^2
Example: If jobs arrive every 15 seconds on average, what is the probability of waiting
less than or equal to 30 seconds (0.5 minutes)?
Given that the average inter-arrival time is 15 seconds, 1/λ = 15 seconds = 0.25 minutes, so λ = 1/0.25 = 4 arrivals per minute. We want P(T <= 0.5):
P(T <= 0.5) = integral_0^0.5 4e^(-4t) dt = [-e^(-4t)]_0^0.5 = -e^(-2) - (-e^0) = 1 - e^(-2) approx 0.8647
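The waiting-time probability can be reproduced with the closed-form CDF 1 - e^(-λt), with SciPy's exponential distribution, or by simulation; a sketch assuming NumPy and SciPy:

import numpy as np
from scipy import stats

lam = 4.0                                    # rate: 4 arrivals per minute

# Closed form: P(T <= 0.5) = 1 - exp(-lam * 0.5)
print(1 - np.exp(-lam * 0.5))                # about 0.8647

# SciPy parameterizes the exponential by its mean (scale = 1/lambda).
print(stats.expon(scale=1/lam).cdf(0.5))     # same value

# Monte Carlo check: fraction of simulated waits at or below 0.5 minutes.
rng = np.random.default_rng(0)
waits = rng.exponential(scale=1/lam, size=200_000)
print(waits.mean(), (waits <= 0.5).mean())   # mean near 0.25, probability near 0.8647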
[1] LectureNotesCombined_231026_194010.pdf, pages 121-135
Uniform Random Variables
The uniform distribution, also known as the rectangular distribution, is a continuous
probability distribution where all outcomes within a given interval are equally likely.
This means that the probability density function is constant over the interval and zero
elsewhere [1].
Definition and PDF
A continuous random variable X has a uniform distribution over the interval [a, b] ,
denoted as X ~ U(a, b) , if its Probability Density Function (PDF) is given by [1]:
f(x) = 1 / (b - a), for a <= x <= b
f(x) = 0, otherwise
Here, a is the minimum value and b is the maximum value that the random variable
can take.
Mean and Variance
For a uniform random variable X over the interval [a, b] :
Mean (Expected Value): E(X) = (a + b) / 2 [1]
Derivation:
E(X) = integral_a^b x * f(x) dx = integral_a^b x * (1 / (b - a)) dx
E(X) = (1 / (b - a)) * [x^2 / 2]_a^b
E(X) = (1 / (b - a)) * (b^2 - a^2) / 2
E(X) = (1 / (b - a)) * ((b - a)(b + a)) / 2
E(X) = (a + b) / 2
Variance: Var(X) = (b - a)^2 / 12 [1]
Derivation:
Var(X) = E(X^2) - [E(X)]^2
First, find E(X^2) = integral_a^b x^2 * f(x) dx = integral_a^b x^2 * (1 / (b - a)) dx
E(X^2) = (1 / (b - a)) * [x^3 / 3]_a^b = (b^3 - a^3) / (3(b - a))
E(X^2) = ((b - a)(b^2 + ab + a^2)) / (3(b - a)) = (b^2 + ab + a^2) / 3
Now, substitute into the variance formula:
Var(X) = (b^2 + ab + a^2) / 3 - ((a + b) / 2)^2
Var(X) = (4(b^2 + ab + a^2) - 3(a^2 + 2ab + b^2)) / 12
Var(X) = (b^2 - 2ab + a^2) / 12 = (b - a)^2 / 12
Moment Generating Function (MGF)
The Moment Generating Function (MGF) of a uniform random variable X over the
interval [a, b] is given by [1]:
M_X(t) = (e^(tb) - e^(ta)) / (t(b - a)), for t != 0
M_X(t) = 1, for t = 0
Derivation:
M_X(t) = E(e^(tX)) = integral_a^b e^(tx) * f(x) dx = integral_a^b e^(tx) * (1 / (b - a)) dx
M_X(t) = (1 / (b - a)) * [e^(tx) / t]_a^b
M_X(t) = (1 / (b - a)) * (e^(tb) / t - e^(ta) / t)
M_X(t) = (e^(tb) - e^(ta)) / (t(b - a))
Example: Assume the time of arrival is uniformly distributed on the interval from 12:00
noon to 12:30 p.m. (0 to 30 minutes). Find the probability that Joseph waits at least 15
minutes for the bus to arrive.
Here, a = 0 and b = 30, so the PDF is f(t) = 1 / (30 - 0) = 1/30 for 0 <= t <= 30. We want P(T > 15):
P(T > 15) = integral_15^30 (1/30) dt = [t/30]_15^30 = 30/30 - 15/30 = 1 - 1/2 = 1/2
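A short simulation (a sketch assuming NumPy) that checks both the waiting probability and the U(0, 30) mean and variance formulas:

import numpy as np

a, b = 0.0, 30.0
rng = np.random.default_rng(1)
t = rng.uniform(a, b, size=200_000)     # simulated arrival times in minutes

print((t > 15).mean())                  # near 0.5 = P(T > 15)
print(t.mean(), (a + b) / 2)            # near 15 = (a + b)/2
print(t.var(), (b - a)**2 / 12)         # near 75 = (b - a)^2/12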
[1] Uniformdistribution(1).pdf, pages 1-3
Bivariate Random Variables
A bivariate random variable is a pair of random variables, say (X, Y), that are defined on
the same probability space. Understanding their joint behavior is crucial in many
statistical analyses. We have already introduced the concepts of Joint and Marginal
Distributions in the 'Marginal and Joint Distributions' section. Here, we will briefly
recap and then delve into conditional distributions.
Joint and Marginal Distributions (Recap)
As discussed previously, the joint probability distribution (either PMF for discrete or
PDF for continuous) describes the probability of X and Y taking on specific values or
falling within specific ranges simultaneously. The marginal distributions describe the
probability distribution of each variable independently, obtained by summing (for
discrete) or integrating (for continuous) over the values of the other variable.
Conditional Distributions
Conditional distributions describe the probability distribution of one random variable
given that the other random variable has taken on a specific value. This is a
fundamental concept for understanding the relationship between two random
variables.
Conditional Probability Mass Function (CPMF) for Discrete Random Variables
For discrete random variables X and Y, the conditional PMF of Y given X=x is defined as:
P(Y=y | X=x) = P(X=x, Y=y) / P(X=x) = f(x, y) / f_X(x) , provided f_X(x) >
0.
Similarly, the conditional PMF of X given Y=y is:
P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y) = f(x, y) / f_Y(y) , provided f_Y(y) >
0.
Example (Discrete Conditional PMF): Consider the joint probability distribution of X
and Y from the 'Marginal and Joint Distributions' section:
X\Y 1000 500 100 Marginal P(X=x)
100 0.05 0.10 0.15 0.30
500 0.10 0.20 0.05 0.35
1000 0.20 0.10 0.05 0.35
Marginal P(Y=y) 0.35 0.40 0.25 1.00
Let's find the conditional PMF of Y given X=100: P(Y=y | X=100) = f(100, y) / f_X(100), where f_X(100) = 0.30.
P(Y=1000 | X=100) = f(100, 1000) / 0.30 = 0.05 / 0.30 = 1/6
P(Y=500 | X=100) = f(100, 500) / 0.30 = 0.10 / 0.30 = 1/3
P(Y=100 | X=100) = f(100, 100) / 0.30 = 0.15 / 0.30 = 1/2
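The conditional PMF is simply the relevant row of the joint table divided by the marginal; a sketch reusing the dictionary layout from the earlier marginal example:

joint = {
    (100, 1000): 0.05, (100, 500): 0.10, (100, 100): 0.15,
    (500, 1000): 0.10, (500, 500): 0.20, (500, 100): 0.05,
    (1000, 1000): 0.20, (1000, 500): 0.10, (1000, 100): 0.05,
}

def conditional_y_given_x(x):
    # P(Y=y | X=x) = f(x, y) / f_X(x)
    row = {y: p for (xx, y), p in joint.items() if xx == x}
    f_x = sum(row.values())
    return {y: p / f_x for y, p in row.items()}

print(conditional_y_given_x(100))
# {1000: 0.1667, 500: 0.3333, 100: 0.5}, i.e. 1/6, 1/3, 1/2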
Conditional Probability Density Function (CPDF) for Continuous Random
Variables
For continuous random variables X and Y, the conditional PDF of Y given X=x is defined
as [1]:
f(y | x) = f(x, y) / f_X(x) , provided f_X(x) > 0 .
Similarly, the conditional PDF of X given Y=y is:
f(x | y) = f(x, y) / f_Y(y) , provided f_Y(y) > 0 .
Example (Continuous Conditional PDF): Let the joint density function of X and Y be
given by [1]:
f(x, y) = x + y, for 0 < x < 1, 0 < y < 1
f(x, y) = 0, elsewhere
First, find the marginal PDFs:
f_X(x) = integral_0^1 (x + y) dy = [xy + y^2/2]_0^1 = x + 1/2 = (2x + 1) / 2, for 0 < x < 1.
f_Y(y) = integral_0^1 (x + y) dx = [x^2/2 + xy]_0^1 = y + 1/2 = (1 + 2y) / 2, for 0 < y < 1.
Now, find the conditional PDF of Y given X=x:
f(y | x) = f(x, y) / f_X(x) = (x + y) / ((2x + 1) / 2) = 2(x + y) / (2x + 1), for 0 < y < 1.
And the conditional PDF of X given Y=y:
f(x | y) = f(x, y) / f_Y(y) = (x + y) / ((1 + 2y) / 2) = 2(x + y) / (1 + 2y), for 0 < x < 1.
[1] ContinuousBivariateDistribution.pdf, pages 7-9
Moment Generating Functions (MGF)
The Moment Generating Function (MGF) is a powerful tool in probability theory for
characterizing probability distributions and deriving their moments (like mean and
variance). It provides an alternative way to define a distribution and can simplify
calculations for moments, especially for complex distributions.
Definition and Properties
The Moment Generating Function (MGF) of a random variable X, denoted by M_X(t) , is
defined as the expected value of e^(tX) for any real number t for which the
expectation exists [1]:
M_X(t) = E[e^(tX)]
For a discrete random variable X: M_X(t) = sum(e^(tx) * P(X=x)) over all
possible values of x.
For a continuous random variable X: M_X(t) = integral(e^(tx) * f(x) dx)
from -infinity to +infinity.
Key Properties of MGF:
1. Uniqueness: If two random variables have the same MGF, then they have the same probability distribution.
2. Moments: The moments of a random variable can be obtained by differentiating the MGF and evaluating at t=0. The k-th moment about the origin, E(X^k), is given by the k-th derivative of M_X(t) evaluated at t=0: E(X^k) = M_X^(k)(0) = d^k/dt^k M_X(t) |_(t=0)
Using MGF to find Moments, Mean, and Variance
As mentioned above, the MGF is particularly useful for finding the moments of a
random variable.
First Moment (Mean): E(X) = M_X'(0)
Second Moment: E(X^2) = M_X''(0)
Variance: Once E(X) and E(X^2) are found, the variance can be calculated
using the formula: Var(X) = E(X^2) - [E(X)]^2 Therefore, Var(X) =
M_X''(0) - [M_X'(0)]^2
Example (Discrete MGF): Find the MGF of a random variable X whose PMF is given by
[1]:
P(X = x) = (1/6) * (5/6)^x, for x = 0, 1, 2, 3, ...
P(X = x) = 0, otherwise
This is a geometric distribution with p = 1/6 .
M_X(t) = E(e^(tX)) = sum_(x=0)^infinity e^(tx) * (1/6) * (5/6)^x
M_X(t) = (1/6) * sum_(x=0)^infinity ((5/6)e^t)^x
This is a geometric series with first term 1 and common ratio r = (5/6)e^t. The sum converges if |r| < 1, i.e., (5/6)e^t < 1, which implies e^t < 6/5, or t < ln(6/5).
M_X(t) = (1/6) * (1 / (1 - (5/6)e^t)) = 1 / (6 - 5e^t)
Now, let's find the mean and variance using this MGF:
Mean:
M_X'(t) = d/dt [(6 - 5e^t)^(-1)] = -1 * (6 - 5e^t)^(-2) * (-5e^t) = 5e^t / (6 - 5e^t)^2
E(X) = M_X'(0) = 5e^0 / (6 - 5e^0)^2 = 5 / (6 - 5)^2 = 5
Second Moment:
M_X''(t) = d/dt [5e^t * (6 - 5e^t)^(-2)]
Using the product rule with u = 5e^t and v = (6 - 5e^t)^(-2):
u' = 5e^t, v' = -2 * (6 - 5e^t)^(-3) * (-5e^t) = 10e^t * (6 - 5e^t)^(-3)
M_X''(t) = u'v + uv' = 5e^t / (6 - 5e^t)^2 + 50e^(2t) / (6 - 5e^t)^3
E(X^2) = M_X''(0) = 5 / 1^2 + 50 / 1^3 = 5 + 50 = 55
Variance: Var(X) = E(X^2) - [E(X)]^2 = 55 - 5^2 = 55 - 25 = 30
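Because the differentiation above is mechanical, it is a good candidate for a symbolic check; a sketch assuming SymPy is available:

import sympy as sp

t = sp.symbols('t')
M = 1 / (6 - 5 * sp.exp(t))          # the MGF derived above

M1 = sp.diff(M, t)                   # first derivative
M2 = sp.diff(M, t, 2)                # second derivative

EX = M1.subs(t, 0)                   # E(X) = M'(0)
EX2 = M2.subs(t, 0)                  # E(X^2) = M''(0)

print(EX, EX2, EX2 - EX**2)          # 5, 55, 30 -> mean 5, variance 30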
Example (Continuous MGF): Let X have the density function f(x) = e^(-x) , where
0 <= x <= infinity . Calculate the MGF of X and find the mean and variance [1].
M_X(t) = E(e^(tX)) = integral_0^infinity e^(tx) * e^(-x) dx = integral_0^infinity e^((t-1)x) dx
For the integral to converge, t - 1 must be negative, i.e., t < 1.
M_X(t) = [e^((t-1)x) / (t-1)]_0^infinity = 0 - (1 / (t-1)) = 1 / (1-t)
Now, let's find the mean and variance using this MGF:
Mean: M_X'(t) = d/dt [(1-t)^(-1)] = (1-t)^(-2) = 1 / (1-t)^2, so E(X) = M_X'(0) = 1 / (1-0)^2 = 1
Second Moment: M_X''(t) = d/dt [(1-t)^(-2)] = 2 * (1-t)^(-3) = 2 / (1-t)^3, so E(X^2) = M_X''(0) = 2 / (1-0)^3 = 2
Variance: Var(X) = E(X^2) - [E(X)]^2 = 2 - 1^2 = 1
[1] LectureNotesCombined_231026_194010.pdf, pages 22-42
Transformation of Random Variables
In probability and statistics, it is often necessary to find the probability distribution of
a function of one or more random variables. This process is known as the
transformation of random variables. The method used depends on whether the
transformation is for a single random variable or multiple random variables.
Single Variable Transformation
If Y is a function of a single random variable X, say Y = g(X) , and we know the PDF (or
PMF) of X, we can find the PDF (or PMF) of Y.
For Discrete Random Variables:
If X is a discrete random variable with PMF P(X=x) , and Y = g(X) is a one-to-one
transformation, then the PMF of Y is simply:
P(Y=y) = P(X=g^(-1)(y))
If g(X) is not one-to-one, then P(Y=y) is the sum of P(X=x) for all x such that g(x)
= y.
Example (Discrete Transformation): Let X be a discrete random variable with PMF:
x P(X=x)
1 0.2
2 0.3
3 0.5
Let Y = X^2 . The possible values for Y are 1^2=1 , 2^2=4 , 3^2=9 .
P(Y=1) = P(X=1) = 0.2
P(Y=4) = P(X=2) = 0.3
P(Y=9) = P(X=3) = 0.5
So the PMF of Y is:
y P(Y=y)
1 0.2
4 0.3
9 0.5
For Continuous Random Variables:
If X is a continuous random variable with PDF f_X(x) , and Y = g(X) is a one-to-one
and differentiable transformation, then the PDF of Y, f_Y(y) , can be found using the
formula:
f_Y(y) = f_X(g^(-1)(y)) * |d/dy g^(-1)(y)|
This is also known as the change of variable technique. The term |d/dy g^(-1)(y)| is
the absolute value of the derivative of the inverse function, which accounts for the
scaling of the probability density.
Example (Continuous Transformation): Let X have the probability density function
f(x) = 3(1 - x)^2 , for 0 < x < 1 [1]. Find the PDF of Y = (1 - X)^2 .
First, find the inverse function X = g^(-1)(Y):
Y = (1 - X)^2, so sqrt(Y) = 1 - X (since 0 < x < 1, 1 - x is positive, so we take the positive square root), giving X = 1 - sqrt(Y).
So, g^(-1)(y) = 1 - sqrt(y).
Next, find the derivative of the inverse function with respect to y:
d/dy g^(-1)(y) = d/dy (1 - y^(1/2)) = -1/2 * y^(-1/2) = -1 / (2 * sqrt(y))
Now, apply the formula for f_Y(y):
f_Y(y) = f_X(1 - sqrt(y)) * |-1 / (2 * sqrt(y))|
f_Y(y) = 3(1 - (1 - sqrt(y)))^2 * (1 / (2 * sqrt(y)))
f_Y(y) = 3y * (1 / (2 * sqrt(y)))
f_Y(y) = (3/2) * sqrt(y), for 0 < y < 1.
To determine the range of Y: when X = 0, Y = (1 - 0)^2 = 1, and when X = 1, Y = (1 - 1)^2 = 0, so the range of Y is 0 < Y < 1.
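A Monte Carlo sanity check of this transformation (a sketch assuming NumPy): draw X from the density 3(1 - x)^2 by inverting its CDF F_X(x) = 1 - (1 - x)^3, set Y = (1 - X)^2, and compare the empirical CDF of Y with the CDF implied by f_Y(y) = (3/2) * sqrt(y), namely F_Y(y) = y^(3/2).

import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=200_000)

# Inverse-CDF sampling: F_X(x) = 1 - (1 - x)^3, so X = 1 - (1 - U)^(1/3).
x = 1 - (1 - u) ** (1 / 3)
y = (1 - x) ** 2                              # the transformed variable

# Compare the empirical CDF of Y with F_Y(y) = y^(3/2) at a few points.
for point in [0.1, 0.25, 0.5, 0.9]:
    print(point, (y <= point).mean(), point ** 1.5)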
Multiple Variable Transformation (Jacobian)
When transforming multiple random variables, say from (X1, X2) to (Y1, Y2) ,
where Y1 = g1(X1, X2) and Y2 = g2(X1, X2) , we use the Jacobian of the
transformation.
If (X1, X2) are continuous random variables with joint PDF f_X1,X2(x1, x2) , and
the transformation to (Y1, Y2) is one-to-one and differentiable, then the joint PDF of
(Y1, Y2) , f_Y1,Y2(y1, y2) , is given by:
f_Y1,Y2(y1, y2) = f_X1,X2(h1(y1, y2), h2(y1, y2)) * |J|
Where h1 and h2 are the inverse functions such that x1 = h1(y1, y2) and x2 =
h2(y1, y2) , and J is the Jacobian determinant of the transformation, defined as:
J = det([[dx1/dy1, dx1/dy2], [dx2/dy1, dx2/dy2]])
J = (dx1/dy1 * dx2/dy2) - (dx1/dy2 * dx2/dy1)
Example (Bivariate Transformation with Jacobian): Let X1 and X2 be independent
exponential random variables with rate λ . Their joint PDF is f_X1,X2(x1, x2) = λ^2
* e^(-λ(x1 + x2)) for x1 > 0, x2 > 0 .
Let Y1 = X1 + X2 and Y2 = X1 / (X1 + X2) . We want to find the joint PDF of (Y1,
Y2) .
First, find the inverse transformations. From Y1 = X1 + X2, we have X2 = Y1 - X1. Substituting X2 into Y2 = X1 / (X1 + X2) gives Y2 = X1 / Y1, so X1 = Y1 * Y2. Then X2 = Y1 - X1 = Y1 * (1 - Y2).
So, x1 = y1 * y2 and x2 = y1 * (1 - y2).
Next, calculate the partial derivatives for the Jacobian:
dx1/dy1 = y2, dx1/dy2 = y1
dx2/dy1 = 1 - y2, dx2/dy2 = -y1
Now, compute the Jacobian determinant:
J = (y2 * (-y1)) - (y1 * (1 - y2)) = -y1*y2 - y1 + y1*y2 = -y1
The absolute value of the Jacobian is |J| = |-y1| = y1 (since y1 = x1 + x2 > 0).
Finally, substitute into the formula for f_Y1,Y2(y1, y2):
f_Y1,Y2(y1, y2) = f_X1,X2(y1 * y2, y1 * (1 - y2)) * |J|
f_Y1,Y2(y1, y2) = λ^2 * e^(-λ(y1 * y2 + y1 * (1 - y2))) * y1
f_Y1,Y2(y1, y2) = λ^2 * y1 * e^(-λy1)
Determine the range of Y1 and Y2: since x1 > 0 and x2 > 0, Y1 = x1 + x2 > 0, and Y2 = x1 / (x1 + x2) satisfies 0 < Y2 < 1.
So, the joint PDF of (Y1, Y2) is:
f_Y1,Y2(y1, y2) = λ^2 * y1 * e^(-λy1), for y1 > 0, 0 < y2 < 1
f_Y1,Y2(y1, y2) = 0, otherwise
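The factored form λ^2 * y1 * e^(-λy1), with a constant density of 1 on 0 < y2 < 1, means Y1 follows a Gamma(2, λ) distribution, Y2 is Uniform(0, 1), and the two are independent. A short simulation sketch (assuming NumPy) is consistent with this:

import numpy as np

lam = 2.0
rng = np.random.default_rng(3)
x1 = rng.exponential(scale=1/lam, size=200_000)
x2 = rng.exponential(scale=1/lam, size=200_000)

y1 = x1 + x2                    # sum of the two exponentials
y2 = x1 / (x1 + x2)             # proportion contributed by x1

print(y1.mean(), 2 / lam)                   # Gamma(2, lam) mean = 2/lam
print(y2.mean(), y2.var(), 1/2, 1/12)       # Uniform(0, 1) mean 1/2, variance 1/12
print(np.corrcoef(y1, y2)[0, 1])            # near 0, consistent with independence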
[1] Ass1g.pdf, page 2
Conditional Probability
Conditional probability is a measure of the probability of an event occurring given that
another event has already occurred. In the context of random variables, this extends to
conditional probability distributions and conditional expectations.
Conditional PMF/PDF
As discussed in the 'Bivariate Random Variables' section, conditional probability mass
functions (for discrete variables) and conditional probability density functions (for
continuous variables) describe the probability distribution of one random variable
given the value of another.
For Discrete Random Variables: The conditional PMF of Y given X=x is: P(Y=y |
X=x) = f(x, y) / f_X(x) , provided f_X(x) > 0 .
For Continuous Random Variables: The conditional PDF of Y given X=x is: f(y |
x) = f(x, y) / f_X(x) , provided f_X(x) > 0 .
These definitions are crucial for understanding how the outcome of one variable
influences the probabilities of another.
Conditional Expectation
The conditional expectation of a random variable is its expected value given that some
other random variable has taken on a specific value. It is a function of the conditioning
variable.
For Discrete Random Variables: The conditional expectation of Y given X=x is:
E(Y | X=x) = sum(y * P(Y=y | X=x)) over all possible values of y.
For Continuous Random Variables: The conditional expectation of Y given X=x
is: E(Y | X=x) = integral(y * f(y | x) dy) over the range of y.
Properties of Conditional Expectation:
1. E(c | X) = c for any constant c.
2. E(g(X) | X) = g(X) for any function g.
3. E(aY + bZ | X) = aE(Y | X) + bE(Z | X) for constants a, b.
4. E(g(X)Y | X) = g(X)E(Y | X).
5. Law of Total Expectation (or Law of Iterated Expectations): E(Y) = E[E(Y | X)]. This property is particularly important, as it allows us to compute the overall expected value of a random variable by first conditioning on another variable and then taking the expectation over the conditioning variable.
Example (Conditional Expectation - Continuous): Let the joint p.d.f of X and Y be
given by f(x,y) = 8xy , for 0 <= x <= 1, 0 <= y <= x [1]. Find the conditional
expected value of Y given X, E(Y | X) .
First, find the marginal PDF of X:
f_X(x) = integral_0^x 8xy dy = [4xy^2]_0^x = 4x(x^2) = 4x^3, for 0 <= x <= 1.
Next, find the conditional PDF of Y given X=x:
f(y | x) = f(x, y) / f_X(x) = 8xy / (4x^3) = 2y / x^2, for 0 <= y <= x.
Now, calculate the conditional expectation of Y given X=x:
E(Y | X=x) = integral_0^x y * f(y | x) dy = integral_0^x (2y^2 / x^2) dy = (2 / x^2) * [y^3 / 3]_0^x = (2 / x^2) * (x^3 / 3) = 2x / 3
So, E(Y | X) = 2X / 3.
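A quick numerical check of E(Y | X=x) = 2x/3 (a sketch assuming SciPy), integrating y * f(y | x) over 0 <= y <= x for a few values of x:

from scipy.integrate import quad

def conditional_mean(x):
    # E(Y | X=x) = integral of y * (2y / x^2) dy over [0, x]
    value, _ = quad(lambda y: y * (2 * y / x**2), 0, x)
    return value

for x in [0.3, 0.6, 0.9]:
    print(x, conditional_mean(x), 2 * x / 3)   # numeric result matches 2x/3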
[1] Ass1g.pdf, page 2
Expected Values
The expected value (or expectation, or mean) of a random variable is a fundamental
concept in probability theory. It represents the average value of the random variable
over a large number of trials or observations. It is a measure of the central tendency of
the distribution.
Definition for Discrete Random Variables
Let X be a discrete random variable with probability mass function (PMF) p(x) . The
expected value of X, denoted as E(X) or μ , is given by [1]:
E(X) = sum(x * p(x)) over all possible values of x.
Example (Discrete Expected Value): Given a probability distribution of X as below [1]:
x P(X=x)
0 1/8
1 1/4
2 3/8
3 1/4
E(X) = (0 * 1/8) + (1 * 1/4) + (2 * 3/8) + (3 * 1/4)
E(X) = 0 + 2/8 + 6/8 + 6/8 = 14/8 = 7/4 = 1.75
Definition for Continuous Random Variables
Let X be a continuous random variable with probability density function (PDF) f(x) .
The expected value of X, denoted as E(X) or μ , is given by [1]:
E(X) = integral(x * f(x) dx) from -infinity to +infinity.
Example (Continuous Expected Value): For an exponential random variable X with
parameter λ , the PDF is f(x) = λe^(-λx) for x > 0 . The expected value is [2]:
E(X) = integral_0^infinity x * λe^(-λx) dx
As derived in the 'Exponential Random Variables' section, E(X) = 1/λ .
Properties of Expectation
The expected value has several important properties that simplify calculations and
theoretical derivations [1]:
1. Expected value of a constant: If c is a constant, then E(c) = c .
2. Linearity of Expectation: For any random variables X and Y, and constants a
and b : E(aX + bY) = aE(X) + bE(Y) This property extends to any finite
number of random variables: E(sum(a_i * X_i)) = sum(a_i * E(X_i))
3. Expected value of a function of a random variable:
For discrete X: If g(X) is a function of X, then E(g(X)) = sum(g(x) *
p(x)) .
For continuous X: If g(X) is a function of X, then E(g(X)) =
integral(g(x) * f(x) dx) .
4. Expected value of a product of independent random variables: If X and Y are
independent random variables, then E(XY) = E(X)E(Y) .
[1] Discreteprobabilitydistributions.pdf, pages 4-5 [2]
LectureNotesCombined_231026_194010.pdf, page 123
Covariance and Correlation Coefficient
When dealing with two random variables, it is often useful to quantify the nature and
strength of their linear relationship. Covariance and the correlation coefficient are two
key measures that serve this purpose.
Covariance
Covariance measures the extent to which two random variables change together. A
positive covariance indicates that the variables tend to increase or decrease together,
while a negative covariance suggests that one variable tends to increase when the
other decreases. A covariance near zero implies a weak linear relationship.
Definition: The covariance between two random variables X and Y, denoted as Cov(X,
Y) or σ_XY , is defined as [1]:
Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
An equivalent and often more convenient formula for calculation is:
Cov(X, Y) = E(XY) - E(X)E(Y)
For Discrete Random Variables: E(XY) = sum(sum(xy * f(x, y))) over all
possible pairs (x, y).
For Continuous Random Variables: E(XY) = double_integral(xy * f(x, y)
dx dy) over the entire range of X and Y.
Properties of Covariance:
1. Cov(X, X) = Var(X): The covariance of a random variable with itself is its variance.
2. Cov(X, Y) = Cov(Y, X): Covariance is symmetric.
3. Cov(aX + b, cY + d) = ac * Cov(X, Y) for constants a, b, c, d [1].
4. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z): Covariance is linear in its arguments [1].
5. If X and Y are independent, then Cov(X, Y) = 0. However, the converse is not necessarily true (i.e., Cov(X, Y) = 0 does not imply independence) [1].
Example (Discrete Covariance): Consider the joint probability distribution of X and Y
from the insurance example in the 'Marginal and Joint Distributions' section:
X\Y 1000 500 100 Marginal P(X=x)
100 0.05 0.10 0.15 0.30
500 0.10 0.20 0.05 0.35
1000 0.20 0.10 0.05 0.35
Marginal P(Y=y) 0.35 0.40 0.25 1.00
First, calculate E(X) and E(Y):
E(X) = 100*(0.30) + 500*(0.35) + 1000*(0.35) = 30 + 175 + 350 = 555
E(Y) = 1000*(0.35) + 500*(0.40) + 100*(0.25) = 350 + 200 + 25 = 575
Next, calculate E(XY):
E(XY) = (100*1000*0.05) + (100*500*0.10) + (100*100*0.15) + (500*1000*0.10) + (500*500*0.20) + (500*100*0.05) + (1000*1000*0.20) + (1000*500*0.10) + (1000*100*0.05)
E(XY) = 5000 + 5000 + 1500 + 50000 + 50000 + 2500 + 200000 + 50000 + 5000
E(XY) = 369000
Now, calculate Cov(X, Y):
Cov(X, Y) = E(XY) - E(X)E(Y) = 369000 - (555 * 575) = 369000 - 319125 = 49875
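The same numbers fall out of a few lines of code; a sketch reusing the joint-PMF dictionary from the marginal-distribution example:

joint = {
    (100, 1000): 0.05, (100, 500): 0.10, (100, 100): 0.15,
    (500, 1000): 0.10, (500, 500): 0.20, (500, 100): 0.05,
    (1000, 1000): 0.20, (1000, 500): 0.10, (1000, 100): 0.05,
}

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())

print(EX, EY, EXY)      # 555, 575, 369000 (up to float rounding)
print(EXY - EX * EY)    # Cov(X, Y) = 49875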
Correlation Coefficient
The correlation coefficient, denoted by ρ_XY (rho), is a standardized measure of the
linear relationship between two random variables. It is a dimensionless quantity that
ranges from -1 to +1, making it easier to interpret the strength and direction of the
relationship compared to covariance.
Definition: The correlation coefficient between two random variables X and Y is
defined as [1]:
ρ_XY = Cov(X, Y) / (σ_X * σ_Y)
Where σ_X is the standard deviation of X and σ_Y is the standard deviation of Y.
Interpretation of ρ_XY:
* ρ_XY = +1: Perfect positive linear relationship.
* ρ_XY = -1: Perfect negative linear relationship.
* ρ_XY = 0: No linear relationship (though a non-linear relationship might exist).
* Values close to +1 or -1 indicate a strong linear relationship.
* Values close to 0 indicate a weak linear relationship.
Relationship with Independence
As mentioned under covariance properties, if two random variables X and Y are
independent, then their covariance is zero, and consequently, their correlation
coefficient is also zero. This means that there is no linear relationship between
independent variables.
However, the converse is not true: a correlation coefficient of zero (or zero covariance)
does not necessarily imply that the random variables are independent. It only implies
that there is no linear relationship. There could still be a strong non-linear relationship
between them.
Example: Consider a random variable X uniformly distributed between -1 and 1, and let Y = X^2. Then E(X) = 0 and E(Y) = E(X^2) = Var(X). Also, E(XY) = E(X^3) = 0, since x^3 is an odd function and the distribution of X is symmetric about 0. Thus, Cov(X, Y) = E(XY) - E(X)E(Y) = 0 - 0 * E(Y) = 0, so ρ_XY = 0. However, X and Y are clearly not independent, as Y is perfectly determined by X.
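This zero-correlation-but-dependent behaviour is easy to see empirically; a sketch assuming NumPy:

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=200_000)
y = x ** 2                               # Y is completely determined by X

print(np.corrcoef(x, y)[0, 1])           # near 0: no linear relationship with X
print(np.corrcoef(np.abs(x), y)[0, 1])   # close to 1: the dependence appears once X is transformed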
[1] ProductMomentsofBivariateRandomVariables.pdf, pages 1-2