Chapter 4: Multiple Random Variables
We study the joint distribution of two or more random variables, called
a random vector, such as (X, Y ), (X, Y, Z), (X1 , · · · , Xn ), and the distribution
of functions of them, such as X + Y , XY Z, or X1 + X2 + · · · + Xn .
1     Bivariate Random Variables
Assume both X and Y are random. We treat (X, Y ) as a two-dimensional
random vector and study their relationship.
1.1    Discrete Case
Assume that both X and Y are discrete random variables, with sample
spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively.
Joint pmf:
$$f_{X,Y}(x, y) = P(X = x, Y = y), \qquad \forall x \in \mathcal{X},\ y \in \mathcal{Y}.$$
Properties:
    • $f_{X,Y}(x, y) \ge 0$;
    • $\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} f_{X,Y}(x, y) = 1$.
The probability of a set A is given by
$$P((X, Y) \in A) = \sum_{(x,y) \in A} f_{X,Y}(x, y).$$
Marginal pmf: If the joint distribution of (X, Y ) is known, the marginal
pmfs are
$$f_X(x) = P(X = x) = \sum_{y \in \mathcal{Y}} f_{X,Y}(x, y),$$
$$f_Y(y) = P(Y = y) = \sum_{x \in \mathcal{X}} f_{X,Y}(x, y).$$
Example 1. Two fair dice are thrown. Let X = maximum, Y = sum.
  Possible values:
  X: 1, 2, 3, 4, 5, 6.
  Y : 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
  Can write the probabilities in a table.
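The table can be built by enumerating the 36 equally likely outcomes. The following is a minimal sketch in Python using exact fractions; the variable names (`joint`, `f_X`, `f_Y`) are illustrative and not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice.
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (max(a, b), a + b)          # (X, Y) = (maximum, sum)
    joint[key] = joint.get(key, 0) + Fraction(1, 36)

# Marginals: sum the joint pmf over the other coordinate.
f_X, f_Y = {}, {}
for (x, y), p in joint.items():
    f_X[x] = f_X.get(x, 0) + p
    f_Y[y] = f_Y.get(y, 0) + p

# Print the joint pmf as a table: rows x = 1..6, columns y = 2..12.
print("x\\y " + " ".join(f"{y:>5}" for y in range(2, 13)))
for x in range(1, 7):
    row = " ".join(f"{str(joint.get((x, y), 0)):>5}" for y in range(2, 13))
    print(f"{x:>3} " + row)
```

Row sums of the table give the marginal pmf of X and column sums give the marginal pmf of Y.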
Remark:
   • Joint distribution determines the marginal distribution.
   • Marginals do not determine the joint distribution.
Example: Define the joint pmf by
$$f(0, 0) = f(0, 1) = \tfrac{1}{6}; \quad f(1, 0) = f(1, 1) = \tfrac{1}{3}; \quad f(x, y) = 0 \text{ otherwise}.$$
Consider another joint pmf given by
$$f(0, 0) = \tfrac{1}{12}; \quad f(1, 0) = \tfrac{5}{12}; \quad f(0, 1) = f(1, 1) = \tfrac{3}{12}; \quad f(x, y) = 0 \text{ otherwise}.$$
They share the same marginal distributions, but not the same joint distribution!
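The claim can be checked by summing out each coordinate. A small sketch, assuming the two pmfs above are stored as dictionaries keyed by (x, y); the helper `marginals` is a hypothetical name introduced here.

```python
from fractions import Fraction as F

# The two joint pmfs from the example, keyed by (x, y).
f1 = {(0, 0): F(1, 6), (0, 1): F(1, 6), (1, 0): F(1, 3), (1, 1): F(1, 3)}
f2 = {(0, 0): F(1, 12), (1, 0): F(5, 12), (0, 1): F(3, 12), (1, 1): F(3, 12)}

def marginals(joint):
    """Return (f_X, f_Y) obtained by summing out the other variable."""
    f_x, f_y = {}, {}
    for (x, y), p in joint.items():
        f_x[x] = f_x.get(x, 0) + p
        f_y[y] = f_y.get(y, 0) + p
    return f_x, f_y

print(marginals(f1))
print(marginals(f2))
# Same marginals, different joint pmfs.
print(marginals(f1) == marginals(f2), f1 == f2)
```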
1.2   Continuous Case
Assume that both X and Y are continuous random variables.
Joint pdf: A function fX,Y (x, y) is called a joint probability density function
of (X, Y ) if
$$P((X, Y) \in A) = \iint_{(x,y) \in A} f_{X,Y}(x, y)\,dx\,dy, \qquad \forall A \subset \mathbb{R}^2.$$
The joint pdf satisfies:
   • $f_{X,Y}(x, y) \ge 0$,
   • $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$.
Joint cdf: The joint distribution of (X, Y ) can be completely described
by the joint cdf
$$F(x, y) = P(X \le x, Y \le y), \qquad \forall (x, y) \in \mathbb{R}^2.$$
Relationship between joint pdf and joint cdf: if F is differentiable with
respect to both x and y, then
$$F(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv,$$
$$\frac{\partial^2}{\partial x\,\partial y} F(x, y) = f_{X,Y}(x, y).$$
Marginal pdf: If the joint pdf of (X, Y ) is given, the marginal pdfs of X
and Y are given by
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy,$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx.$$
Review of Double Integration:
  Compute $\iint_D f(x, y)\,dx\,dy$ using iterated integrals.
Example. Check whether the following function is a valid pdf:
$$f(x, y) = y e^{-(x+y)}\, I\{0 < x < y\}.$$
Example. Show that $f(x, y) = 2\,I\{0 \le x \le y \le 1\}$ is a valid pdf.
Example. Assume $f(x, y) = e^{-y}\, I\{0 < x < y\}$.
  (i) Show that f (x, y) is a valid pdf.
 (ii) What is the marginal distribution of X?
(iii) What is the marginal distribution of Y ?
(iv) Compute P (X + Y ≥ 1).
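One way to work through this example, as a sketch using only the definitions above (the density is positive on the region 0 < x < y, and part (iv) uses the complement event X + Y < 1):

```latex
\begin{align*}
\text{(i)}\quad & \int_0^\infty \int_0^y e^{-y}\,dx\,dy = \int_0^\infty y e^{-y}\,dy = 1,
  \quad\text{and } f \ge 0. \\
\text{(ii)}\quad & f_X(x) = \int_x^\infty e^{-y}\,dy = e^{-x}, \quad x > 0. \\
\text{(iii)}\quad & f_Y(y) = \int_0^y e^{-y}\,dx = y e^{-y}, \quad y > 0. \\
\text{(iv)}\quad & P(X + Y \ge 1) = 1 - \int_0^{1/2} \int_x^{1-x} e^{-y}\,dy\,dx
  = 1 - \int_0^{1/2} \left(e^{-x} - e^{-(1-x)}\right) dx \\
  & \phantom{P(X + Y \ge 1)} = 2e^{-1/2} - e^{-1} \approx 0.845.
\end{align*}
```

So X is Exponential with mean 1, and Y is Gamma with shape 2 and scale 1.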
1.3   Expectation of Functions of Random Vector
Assume g is a real-valued function of two random variables g(X, Y ).
If X and Y are both discrete, then
$$E(g(X, Y)) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} g(x, y)\, f_{X,Y}(x, y).$$
If X and Y are both continuous, then
$$E(g(X, Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\,dx\,dy.$$
Properties:
   • E(aX + bY + c) = aE(X) + bE(Y ) + c.
   • E(ag1 (X, Y ) + bg2 (X, Y ) + c) = aE(g1 (X, Y )) + bE(g2 (X, Y )) + c.
   • In general, E(XY ) ≠ E(X)E(Y ); equality holds when X and Y are independent.
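Joint mgf: $M_{X,Y}(t, s) = E(e^{tX + sY})$. Note
$$M_{X,Y}(t, 0) = M_X(t), \qquad M_{X,Y}(0, s) = M_Y(s),$$
$$\left.\frac{\partial^k M_{X,Y}(t, s)}{\partial t^k}\right|_{(0,0)} = E(X^k), \qquad \left.\frac{\partial^k M_{X,Y}(t, s)}{\partial s^k}\right|_{(0,0)} = E(Y^k).$$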
Discrete Example: Two fair dice are thrown. Let X = maximum, Y = sum.
Compute E(XY ).
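Since all 36 outcomes are equally likely, E(g(X, Y )) is the average of g over the outcomes. A minimal sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# All 36 outcomes of two fair dice are equally likely, so E(g(X, Y)) is the
# average of g(max, sum) over the outcomes.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

E_X = sum(max(a, b) * p for a, b in outcomes)
E_Y = sum((a + b) * p for a, b in outcomes)
E_XY = sum(max(a, b) * (a + b) * p for a, b in outcomes)

print(E_X, E_Y, E_XY)        # exact fractions
print(E_XY == E_X * E_Y)     # X and Y are dependent, so equality need not hold
```

Here E(XY ) differs from E(X)E(Y ), which is consistent with the maximum and the sum being dependent.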
Ex. $f(x, y) = e^{-y}\, I\{0 < x < y\}$. Compute $E(X)$, $E(Y)$, $E(XY)$, $M_{X,Y}(t, s)$.
2     Conditional Distributions
Oftentimes (X, Y ) are related. For example, let X be a person’s height and
Y be a person’s weight. Knowledge about the value of X gives us some
information about the value of Y . It turns out conditional probabilities
of Y given knowledge of X can be computed from their joint distribution
fX,Y (x, y).
2.1      Discrete Case
Assume both X and Y are discrete. For any x such that P (X = x) > 0, the
conditional pmf of Y given X = x is defined as
$$f_{Y|X}(y|x) = P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}, \qquad \forall y \in \mathcal{Y}.$$
We can define fX|Y (x|y) similarly.
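Remark: The function fY |X (y|x) is indeed a pmf, since for any fixed x it
satisfies
    • $f_{Y|X}(y|x) \ge 0$ for any y.
    • $\sum_y f_{Y|X}(y|x) = 1$.
Proof: Nonnegativity is immediate, and $\sum_y f_{Y|X}(y|x) = \sum_y f_{X,Y}(x, y) / f_X(x) = f_X(x)/f_X(x) = 1$.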
Example. The two dice example, X=maximum, Y =sum.
    fY |X (y|3).
    fX|Y (x|7).
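Both conditional pmfs follow from dividing the joint pmf by the relevant marginal. A minimal sketch, assuming the joint pmf is built by enumeration; the helper names `cond_Y_given_X` and `cond_X_given_Y` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (X, Y) = (maximum, sum) for two fair dice.
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (max(a, b), a + b)
    joint[key] = joint.get(key, 0) + Fraction(1, 36)

def cond_Y_given_X(x0):
    """f_{Y|X}(y | x0) = f(x0, y) / f_X(x0)."""
    f_x0 = sum(p for (x, _), p in joint.items() if x == x0)
    return {y: p / f_x0 for (x, y), p in joint.items() if x == x0}

def cond_X_given_Y(y0):
    """f_{X|Y}(x | y0) = f(x, y0) / f_Y(y0)."""
    f_y0 = sum(p for (_, y), p in joint.items() if y == y0)
    return {x: p / f_y0 for (x, y), p in joint.items() if y == y0}

print(cond_Y_given_X(3))   # conditional pmf of the sum given maximum 3
print(cond_X_given_Y(7))   # conditional pmf of the maximum given sum 7
```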
2.2   Continuous Case
Assume both X and Y are continuous. For any x such that fX (x) > 0, the
conditional pdf of Y given X = x is defined as
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \qquad \forall y \in \mathcal{Y}.$$
We can define fX|Y (x|y) similarly.
Remark: The function fY |X (y|x) is indeed a pdf, since for any fixed x it
satisfies
   • fY |X (y|x) ≥ 0 for any y.
   • $\int_{-\infty}^{\infty} f_{Y|X}(y|x)\,dy = 1$.
Example. Assume $f(x, y) = e^{-y}\, I\{0 < x < y\}$. Compute $f_{Y|X}(y|x)$.
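A short worked sketch of this example: first integrate out y to get the marginal of X, then divide.

```latex
\begin{align*}
f_X(x) &= \int_x^\infty e^{-y}\,dy = e^{-x}, \qquad x > 0, \\
f_{Y|X}(y|x) &= \frac{f(x, y)}{f_X(x)} = \frac{e^{-y}}{e^{-x}} = e^{-(y - x)}, \qquad y > x.
\end{align*}
```

In words, given X = x, the excess Y − x has an Exponential distribution with mean 1.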
2.3   Conditional Mean and Variance
For discrete random variables:
$$E(Y \mid X = x) = \sum_y y\, f_{Y|X}(y|x),$$
$$\mathrm{Var}(Y \mid X = x) = \sum_y \{y - E(Y \mid X = x)\}^2 f_{Y|X}(y|x).$$
For continuous random variables:
$$E(Y \mid X = x) = \int y\, f_{Y|X}(y|x)\,dy,$$
$$\mathrm{Var}(Y \mid X = x) = \int \{y - E(Y \mid X = x)\}^2 f_{Y|X}(y|x)\,dy.$$
Remark 1: As before, we have
$$\mathrm{Var}(Y \mid X = x) = E(Y^2 \mid X = x) - \{E(Y \mid X = x)\}^2.$$
Example. Two dice example, X=max, Y =sum. Compute E(Y |X = 3).
Ex. $f(x, y) = e^{-y}\, I\{0 < x < y\}$. Find E(Y |X = x) and Var(Y |X = x).
Remark 2: Note E(Y |X = x) is a function of x. Therefore, E(Y |X) is a
random variable as a function of X.
   • E(g(X)|X) = g(X).
Theorem:
   • Conditional Expectation Identity
                             E(Y ) = E(E(Y |X)).
   • Conditional Variance Identity
                   Var(Y ) = E(Var(Y |X)) + Var(E(Y |X)).
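Both identities can be checked exactly on the two-dice example (X = maximum, Y = sum). The following is a minimal sketch; the helper `cond_moment` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (X, Y) = (maximum, sum) for two fair dice.
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (max(a, b), a + b)
    joint[key] = joint.get(key, 0) + Fraction(1, 36)

f_X = {}
for (x, _), p in joint.items():
    f_X[x] = f_X.get(x, 0) + p

def cond_moment(x0, k):
    """E(Y^k | X = x0), computed from the conditional pmf f(x0, y) / f_X(x0)."""
    return sum(y**k * p for (x, y), p in joint.items() if x == x0) / f_X[x0]

cond_mean = {x: cond_moment(x, 1) for x in f_X}
cond_var = {x: cond_moment(x, 2) - cond_moment(x, 1) ** 2 for x in f_X}

# Unconditional mean and variance of Y, directly from the joint pmf.
E_Y = sum(y * p for (_, y), p in joint.items())
Var_Y = sum(y**2 * p for (_, y), p in joint.items()) - E_Y**2

# Right-hand sides of the two identities.
E_of_cond_mean = sum(cond_mean[x] * f_X[x] for x in f_X)
E_of_cond_var = sum(cond_var[x] * f_X[x] for x in f_X)
Var_of_cond_mean = sum(cond_mean[x] ** 2 * f_X[x] for x in f_X) - E_of_cond_mean**2

print(E_Y, E_of_cond_mean)
print(Var_Y, E_of_cond_var + Var_of_cond_mean)
```

Because the arithmetic uses exact fractions, the two sides of each identity agree exactly rather than up to rounding.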
Remark 3:
   • E(g(X, Y )) = E(E(g(X, Y )|Y )) = E(E(g(X, Y )|X)).
   • Conditional expectation as projection:
$$E\left[\{Y - E(Y \mid X)\}^2\right] \le E\left[\{Y - g(X)\}^2\right] \qquad \text{for every function } g.$$
     So E(Y |X) is “closest” (in the above sense) to Y among all functions
     of X.
3    Independence
Def: Let (X, Y ) be a bivariate random vector with joint pdf/pmf fX,Y (x, y).
Then X and Y are called independent random variables if for every x, y ∈ R
                              fX,Y (x, y) = fX (x)fY (y).
Example. Consider the discrete bivariate random vector (X, Y ) with joint
pmf given by
$$f(10, 1) = f(20, 1) = f(20, 2) = \tfrac{1}{10}, \qquad f(10, 2) = f(10, 3) = \tfrac{1}{5}, \qquad f(20, 3) = \tfrac{3}{10}.$$
Are X and Y independent?
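The definition can be checked point by point: compute both marginals and compare f(x, y) with fX (x)fY (y) everywhere. A minimal sketch (the names `joint` and `violations` are illustrative):

```python
from fractions import Fraction as F
from itertools import product

# Joint pmf from the example, keyed by (x, y).
joint = {
    (10, 1): F(1, 10), (20, 1): F(1, 10), (20, 2): F(1, 10),
    (10, 2): F(1, 5), (10, 3): F(1, 5), (20, 3): F(3, 10),
}

f_X, f_Y = {}, {}
for (x, y), p in joint.items():
    f_X[x] = f_X.get(x, 0) + p
    f_Y[y] = f_Y.get(y, 0) + p

# Independence requires f(x, y) = f_X(x) f_Y(y) at EVERY point (x, y).
violations = [
    (x, y) for x, y in product(f_X, f_Y)
    if joint.get((x, y), 0) != f_X[x] * f_Y[y]
]
independent = not violations
print(f_X, f_Y)
print(independent, violations)
```

A single point where the product rule fails is enough to conclude that X and Y are not independent; for instance f(10, 3) = 1/5 while fX (10)fY (3) = 1/4.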
Lemma. Let (X, Y ) be a bivariate random vector with joint pdf or pmf
fX,Y (x, y). Then X and Y are independent random variables if and only
if there exist functions g(x) and h(y) such that for every x ∈ R and y ∈ R,
                                fX,Y (x, y) = g(x)h(y).
In other words, the joint pmf/pdf is factorizable. (We do not need to compute
the marginal pdfs.)
Example. Consider the continuous bivariate random vector (X, Y ) with
joint pdf given by
$$f(x, y) = \frac{1}{384}\, x^2 y^4 e^{-y - (x/2)}, \qquad x > 0,\ y > 0.$$
Are X and Y independent?
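By the lemma it is enough to exhibit a factorization. One possible choice of g and h (any constant can be moved between the two factors) is sketched here:

```latex
\[
f(x, y) = \underbrace{\tfrac{1}{16}\, x^2 e^{-x/2}\, I\{x > 0\}}_{g(x)}
          \;\cdot\;
          \underbrace{\tfrac{1}{24}\, y^4 e^{-y}\, I\{y > 0\}}_{h(y)},
\qquad 16 \cdot 24 = 384 .
\]
```

So X and Y are independent. With this particular split, g is the Gamma density with shape 3 and scale 2, and h is the Gamma density with shape 5 and scale 1.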
Theorem: If X and Y are independent, then
 (i) E(Y |X) = E(Y ).
 (ii) The events {X ∈ A} and {Y ∈ B} are independent:
$$P(X \in A, Y \in B) = P(X \in A)\,P(Y \in B), \qquad \forall A \subset \mathbb{R},\ B \subset \mathbb{R}.$$
(iii) $E(g(X)h(Y)) = E(g(X))\,E(h(Y))$.
        In particular, E(XY ) = E(X)E(Y ).
(iv) In addition, $M_{X,Y}(t, s) = E(e^{tX + sY}) = M_X(t)\,M_Y(s)$, and
$$M_{X+Y}(t) = E(e^{t(X+Y)}) = M_X(t)\,M_Y(t).$$
        If the right-hand side can be identified as the mgf of some standard
        distribution, then the distribution of the sum of two independent
        variables is easy to find.
Example 1.       X ∼ Bin(n1 , p), Y ∼ Bin(n2 , p), and they are independent.
Example 2.      X ∼ Poisson(λ1 ), Y ∼ Poisson(λ2 ), and they are independent.
Example 3.   X ∼ NB(r1 , p), Y ∼ NB(r2 , p), and they are independent.
Example 4.   $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$, and they are independent.
Example 5.   X ∼ Gamma(α1 , β), Y ∼ Gamma(α2 , β), and they are independent.
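For Example 1, the mgf argument suggests X + Y ∼ Bin(n1 + n2 , p). This can be cross-checked without mgfs by convolving the two pmfs directly. A minimal sketch; the parameter values and the helpers `binom_pmf` and `convolve` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    """pmf of Bin(n, p) as a list indexed by k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(f, g):
    """pmf of X + Y for independent X ~ f and Y ~ g on {0, 1, 2, ...}."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

# Illustrative parameter values (not from the notes).
n1, n2, p = 3, 5, Fraction(1, 4)
sum_pmf = convolve(binom_pmf(n1, p), binom_pmf(n2, p))
print(sum_pmf == binom_pmf(n1 + n2, p))
```

The same convolution check applies to the other examples whenever the pmfs have finite support or can be truncated safely.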
4     Bivariate transformation
In this section, we mainly consider a continuous bivariate random vector (X, Y );
the discrete case is treated first, in Section 4.1.
Consider the following bivariate transformation of (X, Y ):
                       U = g1 (X, Y ),        V = g2 (X, Y ).
4.1   Transformation for Discrete Random Variables
Assume that (X, Y ) is a discrete bivariate random vector with the support
A, i.e. P (X = x, Y = y) > 0 on A. Consider the bivariate transformation
                       U = g1 (X, Y ),        V = g2 (X, Y ).
Then the support of (U, V ) is
        B = {(u, v) : u = g1 (x, y), v = g2 (x, y) for some (x, y) ∈ A}.
For any (u, v) ∈ B, define $A_{(u,v)} = \{(x, y) \in A : g_1(x, y) = u,\ g_2(x, y) = v\}$.
Then the joint pmf of (U, V ) is given by
$$f_{U,V}(u, v) = P(U = u, V = v) = P((X, Y) \in A_{(u,v)}) = \sum_{(x,y) \in A_{(u,v)}} f_{X,Y}(x, y).$$
Example 1: Assume X ∼ Poisson(λ) and Y ∼ Poisson(θ), and they are
independent. Let U = X + Y and V = Y . Find the joint pmf of (U, V ) and
the marginal pmf of U .
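A worked sketch of this example, writing U = X + Y and V = Y . The map is one-to-one, so each set A(u,v) contains the single point (x, y) = (u − v, v):

```latex
\begin{align*}
f_{U,V}(u, v) &= P(X = u - v,\ Y = v)
  = \frac{e^{-\lambda} \lambda^{u-v}}{(u-v)!} \cdot \frac{e^{-\theta} \theta^{v}}{v!},
  \qquad v = 0, 1, \ldots, u,\quad u = 0, 1, 2, \ldots \\
f_U(u) &= \sum_{v=0}^{u} f_{U,V}(u, v)
  = \frac{e^{-(\lambda+\theta)}}{u!} \sum_{v=0}^{u} \binom{u}{v} \lambda^{u-v} \theta^{v}
  = \frac{e^{-(\lambda+\theta)} (\lambda+\theta)^{u}}{u!} .
\end{align*}
```

So U = X + Y ∼ Poisson(λ + θ), in agreement with the mgf argument for sums of independent Poisson variables.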
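4.2   One-to-One Transformation for Continuous Random Variables
Assume that the map (x, y) → (g1 (x, y), g2 (x, y)) is continuous, differentiable,
and one-to-one. Therefore, we can define the inverse transformation as
                      X = h1 (U, V ),        Y = h2 (U, V ).
Def: Jacobian matrix and determinant. With x = h1 (u, v) and y = h2 (u, v),
$$J = \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2mm] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{pmatrix}$$
is the Jacobian matrix and det(J) is the Jacobian determinant, or simply
the Jacobian.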
Example 1. Linear transform.
                        U = X + Y,           V = X − Y.
Example 2. Polar transform. Assume (X, Y ) ∈ R2 . Consider
                         x = r cos θ,        y = r sin θ,
where r ∈ (0, ∞) and θ ∈ (0, 2π). How to express (r, θ) in terms of (x, y)?
Theorem. If fX,Y (x, y) is the joint density of (X, Y ), then
               fU,V (u, v) = fX,Y (h1 (u, v), h2 (u, v))| det(J)|.
   Proof follows from change of variable rules for integration — omitted.
Example. Assume X, Y ∼ N (0, 1) and they are independent. Let U =
X + Y, V = X − Y . Find the joint and marginal distributions of (U, V ).
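A worked sketch of this example, using the inverse map x = (u + v)/2, y = (u − v)/2 and the Jacobian as defined in this section:

```latex
\begin{align*}
\det(J) &= \det\begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 \end{pmatrix} = -\tfrac12,
  \qquad |\det(J)| = \tfrac12, \\
x^2 + y^2 &= \frac{(u+v)^2 + (u-v)^2}{4} = \frac{u^2 + v^2}{2}, \\
f_{U,V}(u, v) &= \frac{1}{2\pi} \exp\!\left(-\frac{x^2 + y^2}{2}\right) \cdot \frac12
  = \frac{1}{\sqrt{4\pi}} e^{-u^2/4} \cdot \frac{1}{\sqrt{4\pi}} e^{-v^2/4} .
\end{align*}
```

The joint density factorizes, so U and V are independent, each with the N(0, 2) distribution.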
   Example. Polar transform of independent normals.
Example. Assume X ∼ Gamma(α1 , β) and Y ∼ Gamma(α2 , β), and they
are independent. Let $U = X + Y$, $V = \frac{X}{X+Y}$. Find the joint and marginal
distributions of (U, V ).
Example. Assume X, Y ∼ N (0, 1) and they are independent. Let U =
X/Y . Find the distribution of U .
Example. Assume X ∼ Beta(α, β) and Y ∼ Beta(α + β, γ) and they are
independent. Find the distribution of XY .
4.3    Piecewise One-to-One Transformation
Assume (X, Y ) takes values in A = A0 ∪ A1 ∪ · · · ∪ Ak , where P ((X, Y ) ∈
A0 ) = 0. Also assume that U = g1i (X, Y ), V = g2i (X, Y ) is a one-to-one
transformation from Ai onto B, for i = 1, · · · , k, with inverse x = h1i (u, v),
y = h2i (u, v) and Jacobian matrix Ji . Then
$$f_{U,V}(u, v) = \sum_{i=1}^{k} f_{X,Y}(h_{1i}(u, v), h_{2i}(u, v))\,|\det(J_i)|.$$
5     Hierarchical Mixtures.
Recall that
       E(Y ) = E(E(Y |X)),          var(Y ) = E(var(Y |X)) + var(E(Y |X)).
Example. (binomial-Poisson). An insect lays a large number of eggs, each
surviving with probability p. On the average, how many eggs will survive?
   Let X=the number of eggs that survive
   Let Y =the total number of eggs laid by the insect.
    1. Describe their distributions.
    2. Find the joint distribution of (X, Y ).
    3. Find the marginal distribution of X, E(X) and Var(X).
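A numerical sketch of step 3, under the assumption (suggested by the example's name but not stated explicitly) that Y ∼ Poisson(λ) and X|Y ∼ Bin(Y, p). The values of λ and p, the truncation point `y_max`, and the helper names are illustrative choices.

```python
from math import comb, exp, factorial

def poisson_pmf(k, mean):
    """pmf of a Poisson distribution with the given mean, evaluated at k."""
    return exp(-mean) * mean**k / factorial(k)

def marginal_X(x, lam, p, y_max=150):
    """P(X = x) = sum over y >= x of P(X = x | Y = y) P(Y = y), truncated at y_max."""
    return sum(
        comb(y, x) * p**x * (1 - p) ** (y - x) * poisson_pmf(y, lam)
        for y in range(x, y_max + 1)
    )

lam, p = 10.0, 0.3   # illustrative values, not from the notes
pmf_X = [marginal_X(x, lam, p) for x in range(40)]
mean_X = sum(x * q for x, q in enumerate(pmf_X))
var_X = sum(x**2 * q for x, q in enumerate(pmf_X)) - mean_X**2

print(mean_X, var_X)   # both should be close to lam * p
print(max(abs(q - poisson_pmf(x, lam * p)) for x, q in enumerate(pmf_X)))
```

Under these assumptions the computed marginal matches the Poisson(λp) pmf up to floating-point error, so E(X) = Var(X) = λp, which also follows from E(X) = E(E(X|Y )) = E(pY ).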
Example: Assume X|Λ ∼ Poisson(Λ), Λ ∼ Gamma(α, β).
Example: Assume X|p ∼ Binomial(n,p), p ∼ Beta(α, β).
Example: binomial-Poisson-gamma (optional). Assume X|Y ∼ Bin(Y, p),
Y |Λ ∼ Poisson(Λ), Λ ∼Gamma(α, β).
6   Covariance and Correlation.
Covariance: A measure of joint variation.
               Cov(X, Y ) = E[{X − E(X)}{Y − E(Y )}].
Note: the outer expectation is with respect to the joint distribution of
(X, Y ). And we have
                  Cov(X, Y ) = E(XY ) − E(X)E(Y ).
Correlation:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$
Remark: If X and Y are independent, then cov(X, Y ) = 0 and ρX,Y = 0.
But the converse is not true!
Example. $X \sim N(0, 1)$, $Y = X^2$.
Example. X = X1 + X3 , Y = X2 + X3 , where X1 , X2 , X3 are pairwise
independent with common variance $\sigma^2$. Compute ρ.
Example. $X \sim \mathrm{Unif}(0, 1)$, $Z \sim \mathrm{Unif}(0, \tfrac{1}{10})$, and they are independent. Let
Y = X + Z. Compute ρ.
Example. $Y = X^2 + Z$, where X and Z are independent, X is symmetric about
0, and Z has any distribution. Compute Cov(X, Y ).
One Important Equation:
$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y).$$
One Important Inequality: Cauchy-Schwarz Inequality
                           |Cov(X, Y )| ≤ σX σY
with equality iff X and Y are linearly related.
Corollary.
                              −1 ≤ ρX,Y ≤ 1.
And |ρX,Y | = 1 iff Y = aX + b with probability 1 for some constants a ≠ 0
and b, where a > 0 iff ρX,Y = 1 and a < 0 iff ρX,Y = −1.
(Proofs can be found in the textbook and are omitted here.)
7     Bivariate Normal.
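We say $(X, Y) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ if
$$f(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)} \left\{ \left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho \left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2 \right\}\right].$$
We will show that
$$X \sim N(\mu_1, \sigma_1^2), \qquad Y \sim N(\mu_2, \sigma_2^2), \qquad \rho_{X,Y} = \rho, \qquad aX + bY \text{ is normal}.$$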
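Conditional distribution for Bivariate normal.
  Suppose $(X, Y) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, then
$$X \mid Y = y \sim N\left(\mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(y - \mu_2),\ \sigma_1^2 (1 - \rho^2)\right),$$
$$Y \mid X = x \sim N\left(\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1),\ \sigma_2^2 (1 - \rho^2)\right).$$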
8      Multivariate Distributions
Several variables (X1 , . . . , Xn ).
8.1      Discrete Case
Joint pmf
$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n),$$
satisfying $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) \ge 0$ and $\sum_{x_1,\ldots,x_n} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = 1$.
For any subset A of $\mathbb{R}^n$, we have
$$P\{(X_1, \ldots, X_n) \in A\} = \sum_{(x_1,\ldots,x_n) \in A} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$
Expectation:
$$E\,g(X_1, \ldots, X_n) = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n)\, f_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$
Marginal distribution of $(X_{i_1}, \ldots, X_{i_k})$:
$$f_{X_{i_1},\ldots,X_{i_k}}(x_{i_1}, \ldots, x_{i_k}) = \sum_{\text{other indices}} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$
One-dimensional marginals:
$$f_{X_i}(x_i) = \sum_{x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$
Conditional distribution:
$$f_{X_{k+1},\ldots,X_n \mid X_1,\ldots,X_k}(x_{k+1}, \ldots, x_n \mid x_1, \ldots, x_k) = \frac{f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)}{f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)}.$$
Covariance: Cov(Xi , Xj ), based on the pairwise distribution of (Xi , Xj ).
Independence: X1 , · · · , Xn are called mutually independent random variables
if their joint pmf is the product of the marginals:
$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i), \qquad \forall (x_1, \ldots, x_n).$$
Remark: If X1 , · · · , Xn are mutually independent, then
  (1) Any pair Xi and Xj (i ≠ j) are independent.
  (2) Functions g1 (X1 ), · · · , gn (Xn ) are independent, and
$$E\left(\prod_{i=1}^{n} g_i(X_i)\right) = \prod_{i=1}^{n} E(g_i(X_i)).$$
   (3) The joint mgf is the product of the individual mgfs:
$$M_{X_1,\ldots,X_n}(t_1, \ldots, t_n) = \prod_{i=1}^{n} M_{X_i}(t_i).$$
   (4) Let Z = X1 + · · · + Xn ; then the mgf of Z is
$$M_Z(t) = \prod_{i=1}^{n} M_{X_i}(t).$$
In particular, if X1 , · · · , Xn all have the same distribution with mgf MX (t),
then
$$M_Z(t) = [M_X(t)]^n.$$
Applications:
  (i) A sum of independent normals is normal; the means and the variances add up.
 (ii) A sum of independent gammas with the same scale parameter is gamma
      with the same scale parameter and the shape parameters added up. In
      particular, a sum of independent exponentials with a common scale is gamma.
(iii) A sum of independent Poissons is Poisson with the parameters added up.
 (iv) A sum of independent geometrics with the same success probability is
      negative binomial.
Multinomial distribution. There are n categories, and each item belongs to one
and only one category. Sample m times independently from the categories
with probabilities p1 , . . . , pn , where p1 + · · · + pn = 1. Let Xi = the count of
the ith category. Let x1 , . . . , xn be non-negative integers adding up to m.
Then
$$P(X_1 = x_1, \ldots, X_n = x_n) = \frac{m!}{x_1!\, x_2! \cdots x_n!}\; p_1^{x_1} p_2^{x_2} \cdots p_n^{x_n}.$$
The probabilities add up to one, as they are the terms in the expansion of $(p_1 + \cdots + p_n)^m$.
    (i) Marginals are (lower order) multinomial. One-dimensional: Xi ∼ Bin(m, pi ).
    (ii) Conditionals:
    (iii) Merging: (X1 + X2 , X3 , · · · , Xn ) ∼ Multinomial(m; p1 + p2 , p3 , · · · , pn ).
    (iv) Var(Xi ) = mpi (1 − pi ) and Cov(Xi , Xj ) = −mpi pj for all i ≠ j.
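The normalization and property (iv) can be checked by brute force for a small case. A minimal sketch with exact fractions; the values m = 4 and (p1 , p2 , p3 ) = (1/2, 1/3, 1/6) and the helper `multinomial_pmf` are assumptions chosen for illustration.

```python
from fractions import Fraction as F
from itertools import product
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_n = x_n) for counts summing to m."""
    value = F(factorial(sum(counts)))
    for x, p in zip(counts, probs):
        value = value / factorial(x) * p**x
    return value

# Illustrative values (not from the notes): m = 4 draws, n = 3 categories.
m, probs = 4, [F(1, 2), F(1, 3), F(1, 6)]
support = [c for c in product(range(m + 1), repeat=len(probs)) if sum(c) == m]
pmf = {c: multinomial_pmf(c, probs) for c in support}

total = sum(pmf.values())
mean = [sum(c[i] * q for c, q in pmf.items()) for i in range(3)]
var_1 = sum(c[0] ** 2 * q for c, q in pmf.items()) - mean[0] ** 2
cov_12 = sum(c[0] * c[1] * q for c, q in pmf.items()) - mean[0] * mean[1]

print(total)            # probabilities add up to one
print(var_1, cov_12)    # compare with m p1 (1 - p1) and -m p1 p2
```

The negative covariance reflects the constraint X1 + · · · + Xn = m: a larger count in one category leaves fewer draws for the others.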
8.2      Continuous Case
Joint pdf of (X1 , . . . , Xn ): a function $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$ satisfying $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) \ge 0$
and $\int_{\mathbb{R}^n} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = 1$.
Probabilities are obtained by
$$P\{(X_1, \ldots, X_n) \in A\} = \int_{(x_1,\ldots,x_n) \in A} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$
Expectation:
$$E\,g(X_1, \ldots, X_n) = \int \cdots \int g(x_1, \ldots, x_n)\, f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$
Marginal of $(X_{i_1}, \ldots, X_{i_k})$: integrate the joint pdf over the other coordinates,
$$f_{X_{i_1},\ldots,X_{i_k}}(x_{i_1}, \ldots, x_{i_k}) = \int_{\text{other indices}} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$
One-dimensional marginals:
$$f_{X_i}(x_i) = \int f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n.$$
Conditional:
$$f_{X_{k+1},\ldots,X_n \mid X_1,\ldots,X_k}(x_{k+1}, \ldots, x_n \mid x_1, \ldots, x_k) = \frac{f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)}{f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)}.$$
Covariance: Cov(Xi , Xj ), based on the pairwise distribution of (Xi , Xj ).
Independence: the joint pdf is the product of the marginals. Equivalently, the
joint mgf is the product of the individual mgfs.
    Example 1. Uniform over the unit ball.
$$f(x_1, x_2, x_3) = \frac{3}{4\pi}\, I\{x_1^2 + x_2^2 + x_3^2 < 1\}.$$
   Example 2. Dirichlet.
$$f(x_1, \ldots, x_{k-1}) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_k)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)}\; x_1^{\alpha_1 - 1} \cdots x_{k-1}^{\alpha_{k-1} - 1}\, x_k^{\alpha_k - 1}\, I\{x_i > 0,\ x_1 + \cdots + x_k = 1\}.$$
   Properties:
   (i) marginals are (lower order) Dirichlet. One dimensionals are beta.
   (ii) Conditionals:
   (iii) Merging of categories:
   (iv) Covariances:
Example 3. Let n = 4 and let the joint density of (X1 , X2 , X3 , X4 ) be
$$f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4) = \tfrac{3}{4}\,(x_1^2 + x_2^2 + x_3^2 + x_4^2), \qquad 0 < x_i < 1,\ i = 1, 2, 3, 4,$$
and 0 otherwise.
  (i) Show that this is a valid pdf.
 (ii) Compute $P(X_1 < \tfrac{1}{2},\ X_2 < \tfrac{3}{4},\ X_4 > \tfrac{1}{2})$.
 (iii) Obtain the marginal pdf of (X1 , X2 ).
 (iv) Find the conditional pdf of (X3 , X4 ) given $X_1 = \tfrac{1}{3}$ and $X_2 = \tfrac{2}{3}$.
  (v) Compute E(X1 X2 ).
8.3     Multivariate Transformation
Let (X1 , · · · , Xn ) be a random vector with pdf fX1 ,···,Xn (x1 , · · · , xn ). Let
A = {x : fX (x) > 0}. A new random vector (U1 , · · · , Un ) is defined by
                                    U1 = g1 (X1 , · · · , Xn ),
                                    U2 = g2 (X1 , · · · , Xn ),
                                    ···          ···
                                    Un = gn (X1 , · · · , Xn ).
The transformation is one-to-one from A onto B. The inverse of gi ’s are
                                    X1 = h1 (U1 , · · · , Un ),
                                    X2 = h2 (U1 , · · · , Un ),
                                    ···          ···
                                    Xn = hn (U1 , · · · , Un ).
Let J be the Jacobian from the inverse. The joint pdf of U1 , · · · , Un is then
   fU1 ,···,Un (u1 , · · · , un ) = fX1 ,···,Xn (h1 (u1 , · · · , un ), · · · , hn (u1 , · · · , un ))|J|.
    Example: Let (X1 , X2 , X3 , X4 ) have the joint pdf
$$f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4) = 24\, e^{-x_1 - x_2 - x_3 - x_4}, \qquad 0 < x_1 < x_2 < x_3 < x_4 < \infty.$$
Consider the transformation
$$U_1 = X_1, \qquad U_2 = X_2 - X_1, \qquad U_3 = X_3 - X_2, \qquad U_4 = X_4 - X_3.$$
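A worked sketch of this example. The inverse map is triangular with unit diagonal, so the Jacobian is 1 and only the exponent needs to be rewritten:

```latex
\begin{align*}
x_1 &= u_1, \quad x_2 = u_1 + u_2, \quad x_3 = u_1 + u_2 + u_3, \quad x_4 = u_1 + u_2 + u_3 + u_4,
  \qquad |J| = 1, \\
x_1 + x_2 + x_3 + x_4 &= 4u_1 + 3u_2 + 2u_3 + u_4, \\
f_{U_1,\ldots,U_4}(u_1, \ldots, u_4) &= 24\, e^{-4u_1 - 3u_2 - 2u_3 - u_4}
  = \left(4e^{-4u_1}\right)\left(3e^{-3u_2}\right)\left(2e^{-2u_3}\right)\left(e^{-u_4}\right),
  \qquad u_i > 0 .
\end{align*}
```

The density factorizes, so U1 , U2 , U3 , U4 are independent exponential random variables with rates 4, 3, 2, 1.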
Example 3. Multivariate normal.
   Joint density of Y = (Y1 , · · · , Yn ), for a positive definite Σ:
$$f_Y(y_1, \ldots, y_n) = \frac{1}{(2\pi)^{n/2} (\det(\Sigma))^{1/2}} \exp\left\{-\frac{1}{2}(y - \mu)^T \Sigma^{-1} (y - \mu)\right\}.$$
Generation of multivariate normal N (µ, Σ):
   (1) Let X1 , · · · , Xn be iid N (0, 1).
   (2) Write $X = (X_1, \ldots, X_n)^T$ and let
                                   Y = AX + µ,
where $\Sigma = AA^T$ (for example, A from the Cholesky decomposition of Σ). Then Y is
multivariate normal with the above density function, with E(Y ) = µ and Var(Y ) = Σ.
Marginals and conditionals are also (multivariate) normal.
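The generation recipe can be sketched in plain Python. This is an illustrative implementation only: the hand-written `cholesky` routine, the helper `sample_mvn`, and the values of µ and Σ are assumptions made for the example, and in practice a linear algebra library would normally supply the factorization.

```python
import math
import random

def cholesky(S):
    """Lower-triangular A with A A^T = S, for a symmetric positive definite S."""
    n = len(S)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(S[i][i] - s)
            else:
                A[i][j] = (S[i][j] - s) / A[j][j]
    return A

def sample_mvn(mu, A, rng):
    """One draw of Y = A X + mu with X_1, ..., X_n iid N(0, 1)."""
    x = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(a * xk for a, xk in zip(row, x)) for m, row in zip(mu, A)]

# Illustrative parameters (not from the notes).
mu = [1.0, -2.0]
Sigma = [[4.0, 1.2], [1.2, 1.0]]
A = cholesky(Sigma)

rng = random.Random(0)   # fixed seed so the draws are reproducible
draws = [sample_mvn(mu, A, rng) for _ in range(5)]
print(A)
print(draws)
```

Any matrix A with AAᵀ = Σ works in step (2); the Cholesky factor is a convenient choice because it is triangular and cheap to compute.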
9    Some Useful Inequalities.
a. Cauchy-Schwarz
$$(E(XY))^2 \le E(X^2)\,E(Y^2).$$
b. Hölder
$$|E(XY)| \le (E|X|^p)^{1/p}\,(E|Y|^q)^{1/q},$$
where p, q > 1 and $p^{-1} + q^{-1} = 1$.
c. Minkowski
$$(E|X + Y|^p)^{1/p} \le (E|X|^p)^{1/p} + (E|Y|^p)^{1/p},$$
for p ≥ 1.
d. Jensen
    A function ψ is called convex if ψ(at + (1 − a)s) ≤ aψ(t) + (1 − a)ψ(s)
    for all t, s and all a ∈ [0, 1].
    For any convex function ψ,
                           E(ψ(X)) ≥ ψ(E(X)).