M.Sc. Statistics & Data Science.
Distribution Theory-Supportive Notes (Unit 2)
Bivariate discrete and continuous distributions:
Two-dimensional random variable.
Definition:
Let X and Y be two random variables defined on the sample space S. Then the function
(X, Y), which assigns to each element of S a point in R2 (= R × R), is called a two-dimensional random variable.
Let (X, Y) be a two-dimensional random variable defined on the sample space S and let
w ∈ S. The value of (X, Y) at w is given by the pair of real numbers [X(w), Y(w)]. The
notation [X ≤ a, Y ≤ b] denotes the event of elements w ∈ S such that X(w) ≤ a and
Y(w) ≤ b. The probability of the event [X ≤ a, Y ≤ b] is denoted by P(X ≤ a, Y ≤ b).
A two-dimensional random variable is said to be discrete if it takes at most a
countable number of values in R2, e.g. the outcome of rolling a pair of dice.
Two random variables X and Y are said to be jointly distributed if they are defined on
the same probability space. The sample points consist of 2-tuples. If the joint
probability function is denoted by PXY(x, y), then the probability of an event E
is given by P(E) = P[(X, Y) ∈ E].
Two-dimensional or joint probability mass function.
Let X and Y be two discrete random variables. Then the joint probability mass function
of X and Y is defined as
P (x, y) = P (X= x, Y =y) ∀ (x, y) ∈ S
Properties: i) 0 ≤ p (x, y) ≤ 1 ∀ (x, y) ∈ S
ii) ∑𝑥 ∑𝑦 𝑝(𝑥, 𝑦) = 1
iii) If A1 ⊂ S, then P(A1) = ∑∑_{(x, y) ∈ A1} p(x, y)
The Marginal and conditional probability functions:
If P (x, y) is a joint prob. mass function of discrete random variables X and Y, then
marginal pmf of X is defined as
P(X=x) = P(x) = ∑𝑦 𝑃(𝑥, 𝑦)
and marginal pmf of Y is defined as
P(Y=y) = P(y) = ∑𝑥 𝑃(𝑥, 𝑦)
The conditional prob. mass function of X given Y=y is defined as
P(X=x│Y=y) =P (x│y) = 𝑃(𝑥, 𝑦)/𝑃(𝑦) , P(y) > 0
The conditional prob. mass function of Y given X=x is defined as
P(Y=y│X=x) =P(y│x) = 𝑃(𝑥, 𝑦)/𝑃(𝑥) , P(x) > 0
A necessary and sufficient condition for two discrete random variables to be
independent is 𝑃(𝑥, 𝑦) = P(x). P(y) ∀ (x, y)
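A minimal numerical sketch of these definitions (assuming Python with NumPy; the joint pmf p(x, y) = (2x + y)/27 for x, y = 0, 1, 2 from Question 1 of the question bank is used for illustration):

```python
import numpy as np

# Joint pmf p(x, y) = (2x + y) / 27 for x, y in {0, 1, 2} (Question 1)
x_vals = np.arange(3)
y_vals = np.arange(3)
p = np.array([[(2 * x + y) / 27 for y in y_vals] for x in x_vals])

assert np.isclose(p.sum(), 1.0)          # property ii): total probability is 1

p_x = p.sum(axis=1)                      # marginal pmf of X: sum over y
p_y = p.sum(axis=0)                      # marginal pmf of Y: sum over x

# Conditional pmf of X given Y = 2: p(x, 2) / P(Y = 2)
p_x_given_y2 = p[:, 2] / p_y[2]

# X and Y are independent iff p(x, y) = P(X = x) P(Y = y) for all (x, y)
independent = np.allclose(p, np.outer(p_x, p_y))
print("P(X=x):", p_x, "\nP(Y=y):", p_y)
print("P(X=x | Y=2):", p_x_given_y2, "\nindependent:", independent)
```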
Two-dimensional or joint probability density function
If (X, Y) is a two-dimensional continuous random variable, then the joint probability
density function of X and Y is defined by
P(x − dx/2 ≤ X ≤ x + dx/2, y − dy/2 ≤ Y ≤ y + dy/2) = f(x, y) dx dy
⇒ f(x, y) = P(x − dx/2 ≤ X ≤ x + dx/2, y − dy/2 ≤ Y ≤ y + dy/2) / (dx dy)
Properties: i) f (x, y) ≥ 0 ∀ (x, y) ∈ S
ii) ∬_{(x, y) ∈ S} f(x, y) dx dy = 1
iii) If A1 ⊂ S, then P(A1) = ∬_{(x, y) ∈ A1} f(x, y) dx dy
The Marginal and conditional probability functions:
If f(x, y) is the joint pdf of continuous random variables X and Y, then the marginal
pdf of X is defined as
f(x) = ∫_{−∞}^{∞} f(x, y) dy
and the marginal pdf of Y is defined as
f(y) = ∫_{−∞}^{∞} f(x, y) dx
The conditional pdf of X given Y=y is defined as
f(x│y) = f (x, y)/f(y) , f(y) > 0
The conditional pdf of Y given X=x is defined as
f(y│x) = f (x, y)/f(x) , f(x) > 0
A necessary and sufficient condition for two continuous random variables to be
independent is 𝑓(𝑥, 𝑦) = f(x).f(y) ∀ (x,y)
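A similar sketch for the continuous case (again assuming Python with NumPy, and approximating the integrals by a crude grid sum; the joint pdf of Question 2 with k = 2 on the triangle 0 < y < x < 1 is used):

```python
import numpy as np

# Joint pdf from Question 2: f(x, y) = k on 0 < y < x < 1, 0 elsewhere (k = 2)
k = 2.0
def f(x, y):
    return np.where((y > 0) & (y < x) & (x < 1), k, 0.0)

# Crude grid integration, purely to illustrate the definitions
grid = np.linspace(0, 1, 1001)
dx = dy = grid[1] - grid[0]
X, Y = np.meshgrid(grid, grid, indexing="ij")
F = f(X, Y)

print("total mass ≈", F.sum() * dx * dy)             # should be ≈ 1

f_x = F.sum(axis=1) * dy                             # f(x) = ∫ f(x, y) dy ≈ 2x
f_y = F.sum(axis=0) * dx                             # f(y) = ∫ f(x, y) dx ≈ 2(1 - y)
print("f_X(0.5) ≈", f_x[np.searchsorted(grid, 0.5)])   # ≈ 1.0
print("f_Y(0.5) ≈", f_y[np.searchsorted(grid, 0.5)])   # ≈ 1.0

# Independence would require f(x, y) = f_X(x) f_Y(y) everywhere; here it fails
print("independent:", np.allclose(F, np.outer(f_x, f_y), atol=1e-2))
```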
Conditional Expectation and Conditional Variance.
Conditional Mean.
Let (X, Y) be a two-dimensional continuous random variable with joint probability density
function f(x, y). The conditional mean of X given Y = y is defined as
E[X | Y = y] = E(X/y) = ∫_{−∞}^{∞} x f(x/y) dx, where f(x/y) = f(x, y)/f(y) and
f(y) = ∫_{−∞}^{∞} f(x, y) dx
The conditional variance of X given Y = y is defined as
V(X/Y=y) = V(X/y) = E[{X − E(X/y)}² | y] = E(X²/y) − [E(X/y)]²,
where E(X²/y) = ∫_{−∞}^{∞} x² f(x/y) dx and E(X/y) = ∫_{−∞}^{∞} x f(x/y) dx
The conditional mean of Y given X = x is defined as
E[Y | X = x] = E(Y/x) = ∫_{−∞}^{∞} y f(y/x) dy, where f(y/x) = f(x, y)/f(x) and
f(x) = ∫_{−∞}^{∞} f(x, y) dy
The conditional variance of Y given X = x is defined as
V(Y/X=x) = V(Y/x) = E[{Y − E(Y/x)}² | x] = E(Y²/x) − [E(Y/x)]²,
where E(Y²/x) = ∫_{−∞}^{∞} y² f(y/x) dy and E(Y/x) = ∫_{−∞}^{∞} y f(y/x) dy
Remark: E[Y | X = x] = E[Y | x] is called the regression curve of Y on x and is denoted
by µ_{Y|x}. In particular, if E[Y | x] is linear in x, it represents the regression equation
(line) of Y on x.
E[X | Y = y] = E[X | y] is called the regression curve of X on y and is denoted by µ_{X|y}.
In particular, if E[X | y] is linear in y, it represents the regression equation (line) of X
on y.
Theorem: If X and Y are any two random variables, then
(I) E(Y) = E{E(Y/X)}
(II) V(Y) = E{V(Y/X)} + V{E(Y/X)}
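A small simulation check of this theorem (assuming Python with NumPy; the hierarchical model of Question 4, X ~ Poisson(m) and Y | X = x ~ Binomial(x, p), is used for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 4.0, 0.3, 1_000_000

# Hierarchical model from Question 4: X ~ Poisson(m), Y | X = x ~ Binomial(x, p)
x = rng.poisson(m, size=n)
y = rng.binomial(x, p)

# (I)  E(Y) = E{E(Y/X)}: here E(Y|X) = Xp, so both sides should be ≈ m p
print(y.mean(), (x * p).mean(), m * p)

# (II) V(Y) = E{V(Y/X)} + V{E(Y/X)}: V(Y|X) = Xp(1-p), E(Y|X) = Xp
lhs = y.var()
rhs = (x * p * (1 - p)).mean() + (x * p).var()
print(lhs, rhs)   # both ≈ m p (in fact Y ~ Poisson(m p))
```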
The Joint distribution Function (Joint CDF):
The joint distribution function (joint cdf) is defined as
F(x, y) = P(X ≤ x, Y ≤ y) = ∑_{u ≤ x} ∑_{v ≤ y} P(u, v), if X and Y are discrete r.v.s
with joint pmf P(x, y)
= ∫_{−∞}^{y} ∫_{−∞}^{x} f(u, v) du dv, if X and Y are
continuous r.v.s with joint pdf f(x, y)
Properties:
1. For real numbers a1, a2, b1, b2
P (a1 < X ≤ b1, a2 < Y ≤ b2) = F (b1, b2) + F (a1, a2) - F (a1, b2) - F (b1, a2)
2. 0 ≤ F (x, y) ≤ 1 ∀ (x, y)
3. F (x, y) is a monotonic non-decreasing function.
i.e. if a1 < b1 and a2 < b2, then F(b1, a2) ≥ F(a1, a2) and F(a1, b2) ≥ F(a1, a2)
4. F (-∞, y) = 0, F (x, -∞) = 0, F (∞, ∞) = 1.
5. If X and Y are continuous random variables, then
f(x, y) = ∂²F(x, y) / ∂x ∂y
6. The marginal distribution function of X: F(x) = P(X ≤ x, Y < ∞) = lim_{y→∞} F(x, y) = F(x, ∞).
The marginal distribution function of Y: F(y) = P(X < ∞, Y ≤ y) = lim_{x→∞} F(x, y) = F(∞, y).
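As an illustration of property 1 (a sketch assuming Python with NumPy), the joint cdf of Question 7, F(x, y) = (1 − e^{−x})(1 − e^{−y}) for x, y > 0, is compared against a Monte Carlo estimate of the rectangle probability:

```python
import numpy as np

# Joint cdf from Question 7: F(x, y) = (1 - e^{-x})(1 - e^{-y}), x, y > 0
def F(x, y):
    return (1 - np.exp(-x)) * (1 - np.exp(-y)) if x > 0 and y > 0 else 0.0

# Property 1: P(a1 < X <= b1, a2 < Y <= b2)
a1, b1, a2, b2 = 0.5, 1.5, 0.2, 1.0
rect = F(b1, b2) + F(a1, a2) - F(a1, b2) - F(b1, a2)

# Monte Carlo check: this F corresponds to independent Exp(1) variables
rng = np.random.default_rng(1)
x, y = rng.exponential(size=(2, 1_000_000))
mc = np.mean((a1 < x) & (x <= b1) & (a2 < y) & (y <= b2))
print(rect, mc)   # the two values should agree to about 3 decimal places
```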
Moments:
The r-th raw moment about a point A, denoted by µ′_r(A), is defined as
µ′_r(A) = E(X − A)^r = ∑_x (x − A)^r p(x), if X is a discrete r.v. with pmf p(x)
= ∫_{−∞}^{∞} (x − A)^r f(x) dx, if X is a continuous r.v. with pdf f(x)
In particular, if A = 0, µ′_r = E(X^r) = ∑_x x^r p(x), if X is a discrete r.v. with pmf p(x)
= ∫_{−∞}^{∞} x^r f(x) dx, if X is a continuous r.v. with pdf f(x)
If A = µ = E(X), the r-th central moment, denoted by µ_r, is defined as
µ_r = E(X − µ)^r = ∑_x (x − µ)^r p(x)
= ∫_{−∞}^{∞} (x − µ)^r f(x) dx
µ_r = E(X − µ)^r = E{∑_{k=0}^{r} (−1)^k C(r, k) (µ′_1)^k X^{r−k}}
= ∑_{k=0}^{r} (−1)^k C(r, k) (µ′_1)^k µ′_{r−k}
Relationship between central moments and raw-moments:
µ_1 = 0
µ_2 = µ′_2 − (µ′_1)²
µ_3 = µ′_3 − 3µ′_2 µ′_1 + 2(µ′_1)³
µ_4 = µ′_4 − 4µ′_3 µ′_1 + 6µ′_2 (µ′_1)² − 3(µ′_1)⁴
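A short sketch (assuming Python) that applies the general formula µ_r = ∑_k (−1)^k C(r, k) (µ′_1)^k µ′_{r−k} to convert raw moments into central moments; the helper name central_from_raw is purely illustrative:

```python
from math import comb

def central_from_raw(raw):
    """Central moments µ_1..µ_r from raw moments µ'_1..µ'_r (about the origin).

    raw[j-1] holds µ'_j.  Uses µ_r = Σ_k (-1)^k C(r, k) (µ'_1)^k µ'_{r-k},
    with the convention µ'_0 = 1.
    """
    mean = raw[0]
    raw0 = [1.0] + list(raw)                      # raw0[j] = µ'_j
    central = []
    for r in range(1, len(raw) + 1):
        mu_r = sum((-1) ** k * comb(r, k) * mean ** k * raw0[r - k]
                   for k in range(r + 1))
        central.append(mu_r)
    return central

# Example: X ~ Poisson(λ) with λ = 2 has raw moments 2, 6, 22, 94;
# the formula returns the central moments 0, 2, 2, 14 (i.e. 0, λ, λ, 3λ² + λ).
print(central_from_raw([2, 6, 22, 94]))
```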
-----------------------------------------------------------------------
Moment Generating Function (MGF):
P_X(s) = E(s^X) = ∑_{x=0}^{∞} s^x p(x). Putting s = e^t gives
M_X(t) = E(e^{tX}) = ∑_x e^{tx} p(x), if X is a discrete r.v. with pmf p(x)
= ∫_{−∞}^{∞} e^{tx} f(x) dx, if X is a continuous r.v. with pdf f(x)
Definition: The moment generating function (mgf) of a random variable X (about the
origin) is denoted by M_X(t) and is defined as M_X(t) = E(e^{tX}), provided the RHS exists
for values of t in some interval −h < t < h, h ∈ R. Thus
M_X(t) = E(e^{tX}) = ∑_x e^{tx} p(x), if X is a discrete r.v. with pmf p(x)
= ∫_{−∞}^{∞} e^{tx} f(x) dx, if X is a continuous r.v. with pdf f(x)
Now M_X(t) = E(e^{tX}) = E(1 + tX + t²X²/2! + t³X³/3! + … + t^r X^r/r! + …)
= 1 + t µ′_1 + (t²/2!) µ′_2 + … + (t^r/r!) µ′_r + …
= ∑_{r=0}^{∞} (t^r/r!) µ′_r
Hence µ′_r = coefficient of t^r/r! in M_X(t), provided µ′_r exists for r = 1, 2, …
Also µ′_r = (d^r/dt^r) M_X(t) |_{t=0}
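A minimal illustration (assuming Python) of recovering µ′_r by differentiating the mgf at t = 0, here by finite differences for X ~ Exp(1), whose mgf is M(t) = 1/(1 − t) for t < 1:

```python
# Moments from the mgf by numerical differentiation at t = 0.
# For X ~ Exp(1), M(t) = 1/(1 - t), so µ'_1 = 1, µ'_2 = 2, variance = 2 - 1² = 1.
def M(t):
    return 1.0 / (1.0 - t)

h = 1e-4
mu1 = (M(h) - M(-h)) / (2 * h)             # first raw moment  ≈ M'(0)
mu2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # second raw moment ≈ M''(0)
print(mu1, mu2, mu2 - mu1**2)              # ≈ 1, 2, 1
```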
● Existence of the mgf for some values of t does not always imply that all moments
exist (for the moments to be recovered, the mgf must exist in a neighbourhood of t = 0).
Let P(X = x) = 1/[x(x + 1)], x = 1, 2, … (pmf of Yule's distribution).
E(X) = ∑_{x=1}^{∞} x p(x) = ∑_{x=1}^{∞} x · 1/[x(x + 1)] = ∑_{x=1}^{∞} 1/(x + 1) = 1/2 + 1/3 + 1/4 + …,
a divergent series, so E(X) does not exist and hence no moments of Yule's distribution
exist.
Now M_X(t) = 1 − (1 − e^{−t}) log(1 − e^t) exists if 0 < e^t < 1, i.e. t < 0.
Hence M_X(t) exists for t < 0; however, E(X) does not exist.
Properties of MGF:
i) M cX (t) = Mx(ct), c being a constant
ii) If Xi’s are independent, i =1, 2, ……, n.
then MX1+X2+……+Xn(t) = MX1(t).MX2(t)…….MXn(t) =∏𝑛𝑖=1 𝑀𝑥𝑖(𝑡) .
iii) Effect of change of origin and scale,
Let Y= (X-a)/h, where a and h are constants.
MY(t) = e-at/hMX(t/h)
iv) Uniqueness theorem:
The moment generating function of a distribution, if it exists, uniquely determines the
distribution. In other words, corresponding to a given probability distribution there is
only one mgf (provided it exists), and corresponding to a given mgf there is only one
probability distribution.
Joint Moment Generating Function.
The joint moment generating function of a bivariate distribution, denoted by
M(t1, t2), is defined as
M(t1, t2) = M_{X,Y}(t1, t2) = E[e^{t1 X + t2 Y}]
= ∑_x ∑_y e^{t1 x + t2 y} p(x, y), if X and Y are discrete
random variables with joint pmf p(x, y)
= ∫_y ∫_x e^{t1 x + t2 y} f(x, y) dx dy, if X and Y are continuous
random variables with joint pdf f(x, y).
Properties of joint mgf.
1. M (0, 0) = 1
2. M (t1, 0) = E [ 𝑒 𝑡1𝑋 ] = MX(t1), M (0, t2) = E [ 𝑒 𝑡2𝑌 ] = MY(t2)
3. Effect of change of origin and scale on the joint mgf:
Let U = (X − a)/c and V = (Y − b)/d, where a, b, c, d are constants. Then
M_{U,V}(t1, t2) = e^{−[(a/c) t1 + (b/d) t2]} M_{X,Y}(t1/c, t2/d)
4. The random variables X and Y are independent if and only if,
M (t1, t2) = M (t1, 0). M (0, t2)
Transformations:
One Dimensional transformation.
Let X be a continuous random variable with pdf f(x).
Consider the transformation Y = g(X), and let y = g(x) be a one-to-one transformation
between the values of x and y.
Then the pdf of Y is given by f(y) = f[w(y)] |J|, where y = g(x) ⇒ x = w(y) and
J = ∂x/∂y is called the Jacobian of the transformation.
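A quick simulation sketch of the one-dimensional change-of-variable formula (assuming Python with NumPy): for X ~ Uniform(0, 1) and Y = −ln X, the Jacobian formula gives f_Y(y) = e^{−y}, which is compared with a histogram of simulated values:

```python
import numpy as np

# One-to-one transformation Y = g(X) = -ln(X) with X ~ Uniform(0, 1).
# Inverse: x = w(y) = e^{-y}, Jacobian |dx/dy| = e^{-y},
# so f_Y(y) = f_X(e^{-y}) * e^{-y} = e^{-y}, an Exp(1) density.
rng = np.random.default_rng(2)
x = rng.uniform(size=1_000_000)
y = -np.log(x)

# Compare a histogram of the simulated Y with the pdf from the Jacobian formula
hist, edges = np.histogram(y, bins=50, range=(0, 10), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - np.exp(-mid))))   # should be small (of order 1e-2 or less)
```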
Two- Dimensional Transformation.
Let X and Y be two continuous random variables with joint pdf f (x, y).
Consider a transformation U = g1(X, Y) and V= g2(X, Y), let u= g1(x, y) and
v= g2(x, y) be one to one transformation between the pairs of values (x, y) and (u, v),
so that the equations u = g1(x, y) and v = g2(x, y) give a unique solution for x and y in
terms of u and v. Let x = w1(u, v) and y = w2(u, v). Then the joint pdf of U and V is given by
f(u, v) = f{w1(u, v), w2(u, v)} |J|, where
J = ∂(x, y)/∂(u, v) = det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ]
is called the Jacobian of the transformation.
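A simulation sketch of the two-dimensional case (assuming Python with NumPy), using the transformation of Question 13, U = X + Y and V = X − Y with X, Y independent N(0, 1); the Jacobian calculation gives |J| = 1/2 and a joint pdf that factorizes into two N(0, 2) densities:

```python
import numpy as np

# Transformation from Question 13: X, Y independent N(0, 1); U = X + Y, V = X - Y.
# Inverse: x = (u + v)/2, y = (u - v)/2, so |J| = 1/2 and
# f(u, v) = (1/2)(1/2π) exp{-(u² + v²)/4}, i.e. U and V are independent N(0, 2).
rng = np.random.default_rng(3)
x, y = rng.standard_normal((2, 1_000_000))
u, v = x + y, x - y

print(u.var(), v.var())          # each ≈ 2
print(np.corrcoef(u, v)[0, 1])   # ≈ 0: consistent with independence (joint normality)
```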
Convolutions:
Let X and Y be non-negative independent integer-valued random variables with
probabilities P(X = j) = a_j and P(Y = j) = b_j. The event (X = j, Y = k) has probability a_j b_k.
The sum Z = X + Y is a new random variable, and the event Z = r is the union of the
mutually exclusive events
(X = 0, Y = r), (X = 1, Y = r − 1), …, (X = r, Y = 0). Therefore the distribution c_r = P(Z = r) is
given by c_r = a_0 b_r + a_1 b_{r−1} + a_2 b_{r−2} + … + a_{r−1} b_1 + a_r b_0     (1)
The operation (1), leading from two sequences {ak} and {bk} to a new sequence {ck}
is called the convolution of {ak} and {bk}.
Definition: Let {ak} and {bk} be any two numerical sequences (not necessarily
probability distributions). The new sequence {ck} defined as
ck = a0bk + a1bk-1 + a2bk-2 +………….+ ak-1b1 + akb0 and will be denoted by
{ck} ={ak}*{bk}
Examples: i) If ak = 1, bk =1 for all k ≥ 0, then ck= k +1.
ii) If ak = k, bk =1 for all k ≥ 0, then ck= k(k +1)/2
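A direct translation of the definition into code (assuming Python; the helper name convolve is illustrative), verifying examples i) and ii):

```python
def convolve(a, b):
    """Convolution {c_k} = {a_k} * {b_k}: c_k = sum_{i=0}^{k} a_i b_{k-i}."""
    n = min(len(a), len(b))          # only c_0 .. c_{n-1} are fully determined
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

ones = [1] * 6                             # a_k = 1
ident = list(range(6))                     # a_k = k

print(convolve(ones, ones))                # example i):  c_k = k + 1      -> [1, 2, 3, 4, 5, 6]
print(convolve(ident, ones))               # example ii): c_k = k(k + 1)/2 -> [0, 1, 3, 6, 10, 15]
```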
Theorem: If {a_k} and {b_k} are sequences with generating functions
A(s) = ∑_{k=0}^{∞} a_k s^k and B(s) = ∑_{k=0}^{∞} b_k s^k, and {c_k} is their convolution, then the
generating function is C(s) = A(s)B(s).
Proof: If {c_k} is the convolution of {a_k} and {b_k}, then
c_k = a_0 b_k + a_1 b_{k−1} + a_2 b_{k−2} + … + a_{k−1} b_1 + a_k b_0
and the generating function of {c_k} is C(s) = ∑_{k=0}^{∞} c_k s^k.
Now A(s)B(s) = (∑_{i=0}^{∞} a_i s^i)(∑_{j=0}^{∞} b_j s^j) = ∑_{i,j} a_i b_j s^{i+j}, so the coefficient of
s^k in A(s)B(s) is a_0 b_k + a_1 b_{k−1} + a_2 b_{k−2} + … + a_{k−1} b_1 + a_k b_0 = c_k, which is the
k-th term of the convolution of {a_k} and {b_k}, whose generating function is C(s).
Hence C(s) = A(s)B(s).
Remark: Let {a_k}, {b_k}, {c_k}, {d_k}, … be any sequences. We can form the
convolution {a_k}*{b_k} and then the convolution of this new sequence with {c_k}, etc. The
generating function of {a_k}*{b_k}*{c_k}*{d_k} is A(s)B(s)C(s)D(s), and the order in
which the convolutions are performed is immaterial.
Remark: If the Xi are iid and {a_i} is the common probability distribution of the Xi, then the
distribution of Sn = X1 + X2 + … + Xn is {a_i}^{n*} (the n-fold convolution of {a_i}).
Example: The pgf of the Bernoulli distribution b(k; 1, p) = p^k (1 − p)^{1−k} is
(q + ps), and the pgf of the binomial distribution b(k; n, p) is (q + ps)^n.
Now (q + ps)^n = (q + ps)(q + ps) … (q + ps) (n factors)
⇒ {b(k; n, p)} = {b(k; 1, p)}^{n*}
⇒ Hence the n-fold convolution of the Bernoulli distribution is the binomial distribution.
Also (q + ps)^m (q + ps)^n = (q + ps)^{m+n}
⇒ {b(k; m, p)} * {b(k; n, p)} = {b(k; m+n, p)}
⇒ The convolution of binomials is again a binomial when p, the probability of success in
each trial, remains the same.
For the Poisson distribution P(k; λ) = e^{−λ} λ^k / k!, the pgf is e^{−λ(1−s)}.
Now e^{−λ(1−s)} · e^{−µ(1−s)} = e^{−(λ+µ)(1−s)}, which is the pgf of the Poisson distribution
with parameter λ + µ
⇒ {P(k; λ)} * {P(k; µ)} = {P(k; λ+µ)}
⇒ The convolution of Poisson distributions is again a Poisson distribution.
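A numerical check of the Poisson result (assuming Python), convolving the two pmfs term by term and comparing with the Poisson(λ + µ) pmf:

```python
from math import exp, factorial

def pois(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam, mu, K = 2.0, 3.0, 10

# c_r = sum_{j=0}^{r} P(j; λ) P(r - j; µ)  -- the convolution of the two pmfs
conv = [sum(pois(j, lam) * pois(r - j, mu) for j in range(r + 1)) for r in range(K)]
direct = [pois(r, lam + mu) for r in range(K)]

print(max(abs(c - d) for c, d in zip(conv, direct)))   # ≈ 0: convolution is Poisson(λ + µ)
```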
Geometric and negative binomial distributions.
Let X be a random variable with f(k; p) = P(X = k) = q^k p, k = 0, 1, 2, …
The pgf of X is p/(1 − qs).
Let X be a random variable with f(k; r, p) = P(X = k) = C(−r, k)(−q)^k p^r, k = 0, 1, 2, …
The pgf of X is [p/(1 − qs)]^r.
Now [p/(1 − qs)]^r = [p/(1 − qs)] [p/(1 − qs)] … [p/(1 − qs)] (r factors)
⇒ {f(k; r, p)} = {f(k; p)} * {f(k; p)} * … * {f(k; p)} = {f(k; p)}^{r*}
⇒ The r-fold convolution of the geometric distribution is the negative binomial distribution.
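A numerical check of this result (assuming Python), using the equivalent form C(k + r − 1, k) q^k p^r of the negative binomial pmf:

```python
from math import comb

p, q, r, K = 0.4, 0.6, 3, 12

geom = [q**k * p for k in range(K)]                     # f(k; p) = q^k p

def convolve(a, b):
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(min(len(a), len(b)))]

# r-fold convolution {f(k; p)}^{r*}
conv = geom
for _ in range(r - 1):
    conv = convolve(conv, geom)

# Negative binomial pmf: f(k; r, p) = C(k + r - 1, k) q^k p^r
nbin = [comb(k + r - 1, k) * q**k * p**r for k in range(K)]
print(max(abs(a - b) for a, b in zip(conv, nbin)))      # ≈ 0
```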
……………………………………………………………………………………..
Question Bank- Unit 2
1) Let P (x, y) = (2x+y)/27 x = 0, 1, 2. y = 0, 1, 2
= 0 otherwise.
i) Find marginal pmf of X and Y.
ii) Conditional pmf of X given Y=2 and conditional pmf of Y given X=1
iii) Are X and Y independent?
2) Let f(x, y) = k 0 < x < 1, 0 < y < x.
= 0 otherwise.
Find k, f(x), f(y), f(x/y), f(y/x). Are X and Y independent?
3) Let f (x, y) = k [1 + x y] |x| < 1, |y| < 1.
= 0 otherwise
Find k. Show that X and Y are dependent but X2 and Y2 are independent.
4) Let X~ P(m) and Y/x ~ Binomial (x, p). Find prob. mass function of Y.
5) Let P (x, y) = [λ^x e^{−λ} p^y (1 − p)^{x−y}] / [y! (x − y)!], y = 0, 1, 2, …, x; x = 0, 1, 2, …
= 0 otherwise
Where λ and p are constants. λ > 0, 0 < p < 1. Find P(x), P(y), P (x/y) and
P(y/x)
6) f (x, y) = e-(x+y) I(0, ∞)(x) I(0, ∞) (y)
Find (i) Check if X and Y are independent
(ii) P (X< Y/X< 2Y )
(iii) P(1< X+Y <2)
7) If the joint distribution of X and Y is given by
F(x,y) = 1 – e-x –e-y +e-(x+y) ; x>0,y>0
= 0 , otherwise
(i) Find marginal distribution of X and Y. Are X and Y independent?
(ii) P (X ≤ 1, Y ≤ 1)
(iii) Find P(X + Y < 2)
8) Prove the theorem: If X and Y are any two random variables, then
i) E(Y) = E{E(Y/X)}
ii) V(Y) = E{V(Y/X)} + V{E(Y/X)}
9) Let f (x, y) = e-y 0 < x < y < ∞, be the joint pdf of X and Y.
Obtain the joint mgf of X and Y. Find the correlation coefficient ρ(X, Y).
10) Let f (x, y) = 3x 0 < y < x < 1,
=0 otherwise.
Find pdf of U = X -Y
11) Let f (x, y, z) = e-(x+y+z) 0 < x< ∞; 0 < y< ∞; 0 < z< ∞;
be the joint pdf of X and Y. Obtain pdf of U= ( X+Y +Z)/3
12) Let X ~ exp.(mean=1) and Y ~ exp.(mean=1). X and Y are independent.
Find pdf of X-Y.
13) If X and Y are independent N (0, 1) random variates then,
show that U= X + Y and V = X - Y are independently distributed using
Jacobian transformation.
14) Let X ~ N (0, 1) and Y ~ N (0, 1).
X and Y are independent. Find pdf of X/Y.
15) Let X ~ β2(m, n). Find the pdf of Y = (1 + X)^{−1}