Notes 5 Multivariate Distributions
Distribution Modeling
Multivariate Distributions
Recall
Let X = (X1 , . . . , Xn ) be an n-dimensional vector of random variables. We have the following definitions and statements.
Definition 0.1 (Joint CDF). For all x = (x1 , . . . , xn )⊤ ∈ Rn , the joint cumulative distribution function (CDF) of X satisfies
$$F_X(x) = F_X(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n).$$
Definition 0.2 (Marginal CDF). For a fixed i, the marginal CDF of Xi satisfies
$$F_{X_i}(x_i) = P(X_i \le x_i) = F_X(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty).$$
It is straightforward to generalize the previous definition to joint marginal distributions. For example, the joint marginal distribution of Xi and Xj satisfies
$$F_{X_i, X_j}(x_i, x_j) = P(X_i \le x_i, X_j \le x_j) = F_X(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty, x_j, \infty, \ldots, \infty).$$
If the joint CDF is absolutely continuous, then it has an associated probability density function (PDF)
so that
$$F_X(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(u_1, \ldots, u_n)\, du_1 \cdots du_n.$$
Similar statements also apply to the marginal CDFs. A collection of random variables is independent
if the joint CDF (or PDF if it exists) can be factored into the product of the marginal CDFs (or
PDFs).
The mean vector of X, assuming it exists, is E[X] := (E[X1 ], . . . , E[Xn ])⊤ , whereas, again assuming it exists, the covariance matrix of X satisfies
$$\mathrm{Cov}(X) := \Sigma := E\left[(X - E[X])(X - E[X])^\top\right],$$
so that the (i, j)th element of Σ is simply the covariance of Xi and Xj . Note that the covariance matrix is symmetric so that Σ⊤ = Σ, its diagonal elements satisfy Σi,i ≥ 0, and it is positive semi-definite so that x⊤ Σx ≥ 0 for all x ∈ Rn .
Similarly, the correlation matrix of X has (i, j)th element
$$\rho_{ij} := \mathrm{Corr}(X_i, X_j).$$
It is also symmetric and positive semi-definite, and has 1's along the diagonal.
The density function of the p-variate multivariate normal distribution Np (µ, Σ) is
$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left\{ -\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu) \right\}.$$
The term (x − µ)⊤ Σ−1 (x − µ) appearing inside the exponent is a quadratic form. This particular quadratic form is also called the squared Mahalanobis distance between the vector x and the mean vector µ. Thus, the density depends on x only through the squared Mahalanobis distance (x − µ)⊤ Σ−1 (x − µ), which is the equation for a hyper-ellipse centered at µ. The multivariate normal density is therefore constant on ellipsoids of the form
$$(x - \mu)^\top \Sigma^{-1} (x - \mu) = c^2, \qquad c > 0.$$
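As a minimal illustration (a NumPy sketch; the mean vector, covariance matrix and point below are made-up example values), the squared Mahalanobis distance can be computed directly from its definition:

```python
import numpy as np

# Hypothetical mean vector and covariance matrix (illustrative values only).
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = np.array([2.5, 1.0])

# Squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu).
diff = x - mu
d2 = diff @ np.linalg.solve(Sigma, diff)
print(d2)
```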
If the variables are uncorrelated then the variance-covariance matrix will be a diagonal matrix with
variances of the individual variables appearing on the main diagonal of the matrix and zeros everywhere
else:
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_p^2 \end{pmatrix}.$$
Let X ∼ Np (µ, Σ) be p-variate multivariate normal with mean µ and variance-covariance matrix Σ,
where
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}, \qquad
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}.$$
(iii) Conditional Distributions. Subdivide the vector X into two subsets X1 and X2 ,
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \qquad
X_1 = \begin{pmatrix} X_{11} \\ X_{12} \\ \vdots \\ X_{1p} \end{pmatrix}, \qquad
X_2 = \begin{pmatrix} X_{21} \\ X_{22} \\ \vdots \\ X_{2q} \end{pmatrix},$$
and partition µ and Σ conformably as µ = (µ1⊤ , µ2⊤ )⊤ and
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$
Then the conditional distribution of X1 given X2 = x2 is again multivariate normal, with
$$X_1 \mid X_2 = x_2 \sim N_p\!\left( \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right).$$
(iv) If two variates of the multivariate normal, say X1 and X2 , are uncorrelated, i.e. ρ12 = 0 (equivalently σ12 = 0), then X1 and X2 are independent. This property is not true in general for other distributions. However, it is always true that if two variates are independent, then they are uncorrelated, no matter what their joint distribution is.
(v) The characteristic function of a multivariate normal distribution with mean µ and covariance
matrix Σ ≥ 0 is, for t ∈ Rp ,
$$\phi(t) = \exp\!\left( i\, t'\mu - \tfrac{1}{2} t'\Sigma t \right).$$
(vi) Linear Combinations. If X ∼ Np (µ, Σ) and Y = AX + b for A(q × p) and b(q × 1), then
Y ∼ Nq (Aµ + b, AΣA′ ).
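The linear-combination property (vi) is easy to check empirically. The following sketch (with arbitrary illustrative choices of µ, Σ, A and b) draws a large sample of X, forms Y = AX + b, and compares the sample moments of Y with Aµ + b and AΣA′:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])   # q = 2, p = 3
b = np.array([0.5, -0.5])

X = rng.multivariate_normal(mu, Sigma, size=100_000)   # rows are draws of X
Y = X @ A.T + b                                        # Y_i = A X_i + b

print(Y.mean(axis=0), A @ mu + b)   # sample mean vs A mu + b
print(np.cov(Y, rowvar=False))      # compare with A Sigma A^T
print(A @ Sigma @ A.T)
```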
Generating Multivariate Normally Distributed Random Vectors
Suppose we wish to generate a random vector X = (X1 , . . . , Xn ) with distribution X ∼ MNn (0, Σ). The case E[X] ≠ 0 can easily be accommodated afterwards. Let Z = (Z1 , . . . , Zn )⊤ , where the Zi are i.i.d. N (0, 1) random variables for i = 1, . . . , n. If C is an (n × m) matrix, then
$$C^\top Z \sim MN(0,\, C^\top C).$$
Thus, our task reduces to finding a matrix C such that C⊤ C = Σ. This can be achieved using
the Cholesky decomposition of Σ. A standard result from linear algebra states that any symmetric
positive-definite matrix M can be written as
M = U⊤ DU,
where U is an upper triangular matrix and D is a diagonal matrix with strictly positive diagonal
entries. Since Σ is symmetric positive-definite, we can express it as
$$\Sigma = U^\top D U = (U^\top \sqrt{D})(\sqrt{D}\, U) = (\sqrt{D}\, U)^\top (\sqrt{D}\, U).$$
Therefore, the matrix $C = \sqrt{D}\, U$ satisfies $C^\top C = \Sigma$, and is called the Cholesky decomposition of Σ.
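A minimal NumPy sketch of this procedure (the Σ below is just an illustrative positive-definite matrix). Note that np.linalg.cholesky returns a lower-triangular factor L with Σ = L L⊤, so L plays the role of C⊤ above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative positive-definite covariance matrix and mean vector.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])
mu = np.zeros(3)   # the case E[X] != 0 is handled by adding mu at the end

L = np.linalg.cholesky(Sigma)             # lower triangular, Sigma = L @ L.T

n_samples = 100_000
Z = rng.standard_normal((3, n_samples))   # i.i.d. N(0, 1) entries
X = (mu[:, None] + L @ Z).T               # each row is one draw from N(mu, Sigma)

print(np.cov(X, rowvar=False))            # should be close to Sigma
```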
Maximum Likelihood Estimation
Suppose now that Y1 , . . . , Yn are i.i.d. observations from Np (µ, Σ). The log-likelihood function is
$$\ell(\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n} (Y_i - \mu)^\top \Sigma^{-1} (Y_i - \mu).$$
Setting the gradient with respect to µ equal to zero gives
$$\partial_\mu \ell(\mu, \Sigma) = \sum_{i=1}^{n} \Sigma^{-1} (Y_i - \mu) = 0.$$
Since $\sum_{i=1}^{n} \Sigma^{-1}(Y_i - \mu) = \Sigma^{-1} \sum_{i=1}^{n} (Y_i - \mu)$, we deduce that µ̂ is the empirical mean:
$$\hat\mu = \bar Y = \begin{pmatrix} n^{-1}\sum_{i=1}^{n} Y_{i,1} \\ \vdots \\ n^{-1}\sum_{i=1}^{n} Y_{i,p} \end{pmatrix}.$$
By using the properties of the trace function, the concentrated log-likelihood function becomes:
$$\begin{aligned}
\ell(\hat\mu, \Sigma) &= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n} (Y_i - \hat\mu)^\top \Sigma^{-1} (Y_i - \hat\mu) \\
&= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n} \operatorname{tr}\!\left( (Y_i - \bar Y)^\top \Sigma^{-1} (Y_i - \bar Y) \right) \\
&= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\!\left( \Sigma^{-1} \sum_{i=1}^{n} (Y_i - \bar Y)(Y_i - \bar Y)^\top \right) \\
&= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\!\left( \Sigma^{-1} S \right),
\end{aligned}$$
where
$$S = \sum_{i=1}^{n} (Y_i - \bar Y)(Y_i - \bar Y)^\top.$$
Differentiating with respect to Σ−1 and setting the derivative equal to zero gives
$$\frac{\partial \ell(\hat\mu, \Sigma)}{\partial \Sigma^{-1}} = \frac{n}{2}\Sigma - \frac{1}{2}S = 0,$$
so that
$$\hat\Sigma = \frac{1}{n} S = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar Y)(Y_i - \bar Y)^\top.$$
The simplest and most common method of estimating a multivariate normal distribution is to take the sample mean vector and sample covariance matrix as our estimators of µ and Σ, respectively. It is easy to justify this choice since they are the maximum likelihood estimators. It is also common to take n/(n − 1) times Σ̂, i.e. to divide S by n − 1 rather than n, as an estimator of Σ, since this estimator is known to be unbiased.
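A short NumPy sketch of these estimators on simulated data (the true µ and Σ below are arbitrary illustrative values): the MLE divides the centered outer products by n, while np.cov with its default denominator n − 1 gives the unbiased version.

```python
import numpy as np

rng = np.random.default_rng(1)

mu_true = np.array([0.0, 1.0])
Sigma_true = np.array([[1.0, 0.4],
                       [0.4, 2.0]])
Y = rng.multivariate_normal(mu_true, Sigma_true, size=500)   # rows = observations
n = Y.shape[0]

mu_hat = Y.mean(axis=0)                     # MLE of mu (sample mean)
resid = Y - mu_hat
Sigma_mle = resid.T @ resid / n             # MLE: divide by n
Sigma_unbiased = resid.T @ resid / (n - 1)  # unbiased: divide by n - 1

print(np.allclose(Sigma_unbiased, np.cov(Y, rowvar=False)))  # np.cov uses n - 1 by default
```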
However, the multivariate normal distribution has several well-known shortcomings as a model for real data:
(i) The tails of its univariate marginal distributions are too thin; they do not assign enough weight to extreme events.
(ii) The joint tails of the distribution do not assign enough weight to joint extreme outcomes.
(iii) The distribution has a strong form of symmetry, known as elliptical symmetry.
Multivariate Gaussian Mixture Model
Multivariate Gaussian mixture model. The probability density function of the random vector X of dimension d is defined as a weighted sum of Gaussian densities:
$$f(x) = \sum_{j=1}^{K} \pi_j\, \phi_d(x; \mu_j, \Sigma_j),$$
where K is the number of mixture components, µj and Σj are the mean vector and the covariance
matrix of the Gaussian distribution associated with the j th component, and πj is the mixture weight
such that $\sum_{j=1}^{K} \pi_j = 1$. The log-likelihood function of the sample X = {X1 , . . . , XN } is:
$$\ell(\theta) = \sum_{i=1}^{N} \ln\!\left( \sum_{j=1}^{K} \pi_j\, \phi_d(X_i; \mu_j, \Sigma_j) \right).$$
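The mixture density and log-likelihood translate directly into code. The sketch below (component parameters and data are made up for illustration) evaluates ℓ(θ) for a d-dimensional sample using scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, weights, means, covs):
    """Log-likelihood of a sample X (N x d) under a K-component Gaussian mixture."""
    # densities[:, j] = phi_d(X_i; mu_j, Sigma_j)
    densities = np.column_stack([
        multivariate_normal.pdf(X, mean=m, cov=c) for m, c in zip(means, covs)
    ])
    return np.sum(np.log(densities @ np.asarray(weights)))

# Illustrative two-component mixture in d = 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
weights = [0.6, 0.4]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_log_likelihood(X, weights, means, covs))
```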
One-dimensional Case
In the one-dimensional case, each component is a univariate Gaussian and the parameters to be estimated are:
• Mixture weights πk .
• Means µk .
• Variances σk2 .
In the EM algorithm, we maximize the expected complete-data log-likelihood, which is the log-
likelihood considering both the observed data X and the latent variables Z = {z1 , z2 , . . . , zN } that
indicate which Gaussian component generated each data point.
$$\log L(\pi, \mu, \sigma^2) = \sum_{i=1}^{N} \sum_{k=1}^{K} I(z_i = k)\left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \sigma_k^2) \right],$$
where I(zi = k) is an indicator function that equals 1 if zi = k (i.e., if the i-th data point was generated
by the k-th Gaussian component) and 0 otherwise.
Since we don’t know the values of zi (they are latent variables), we take the expectation with respect
to their posterior distribution, which leads to the expected complete-data log-likelihood:
$$Q(\pi, \mu, \sigma^2) = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik}\left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \sigma_k^2) \right].$$
E-Step
In the E-step of the EM algorithm, we calculate the responsibilities, which represent the probability
that each data point xi was generated by each Gaussian component k. These responsibilities are
denoted as γik and are defined as:
$$\gamma_{ik} = P(z_i = k \mid x_i),$$
where zi is the latent variable indicating which Gaussian component generated the data point xi . The responsibilities γik can be computed using Bayes' theorem:
$$\gamma_{ik} = \frac{P(z_i = k)\, p(x_i \mid z_i = k)}{\sum_{j=1}^{K} P(z_i = j)\, p(x_i \mid z_i = j)}.$$
In the context of a Gaussian Mixture Model, P(zi = k) is the mixing coefficient πk , and p(xi | zi = k) is the probability density function of the k-th Gaussian, given by:
$$p(x_i \mid z_i = k) = \mathcal{N}(x_i \mid \mu_k, \sigma_k^2) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left( -\frac{(x_i - \mu_k)^2}{2\sigma_k^2} \right).$$
Substituting, the responsibilities become
$$\gamma_{ik} = \frac{\pi_k\, \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left( -\frac{(x_i - \mu_k)^2}{2\sigma_k^2} \right)}{\sum_{j=1}^{K} \pi_j\, \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\!\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)}.$$
This formula gives the responsibility γik that the k-th Gaussian component has for explaining the i-th
data point xi .
The responsibilities γik are then used in the M-step to update the parameters πk , µk , and σk2 .
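A minimal NumPy/SciPy sketch of this E-step for the one-dimensional case (the function name e_step and the parameter values below are hypothetical, chosen only for illustration):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, pi, mu, sigma2):
    """Responsibilities gamma[i, k] = P(z_i = k | x_i) for a 1-D Gaussian mixture."""
    # Unnormalized weights pi_k * N(x_i | mu_k, sigma_k^2), shape (N, K).
    weighted = pi * norm.pdf(x[:, None], loc=mu, scale=np.sqrt(sigma2))
    return weighted / weighted.sum(axis=1, keepdims=True)

x = np.array([-0.2, 0.1, 2.8, 3.1])
gamma = e_step(x, pi=np.array([0.5, 0.5]),
               mu=np.array([0.0, 3.0]), sigma2=np.array([1.0, 1.0]))
print(gamma.sum(axis=1))   # each row sums to 1
```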
M-Step
In the M-step, we maximize Q(π, µ, σ 2 ) with respect to the parameters πk , µk , and σk2 .
To find the optimal πk , we maximize Q(π, µ, σ 2 ) with respect to πk , subject to the constraint that $\sum_{k=1}^{K} \pi_k = 1$. The relevant part of Q is:
$$Q(\pi) = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \log \pi_k.$$
We introduce a Lagrange multiplier λ for the constraint and set up the Lagrangian:
$$\mathcal{L} = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \log \pi_k + \lambda\!\left( \sum_{k=1}^{K} \pi_k - 1 \right).$$
Setting $\partial \mathcal{L} / \partial \pi_k = \sum_{i=1}^{N} \gamma_{ik} / \pi_k + \lambda = 0$ and using the constraint yields
$$\pi_k^{\text{new}} = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}.$$
This is the average responsibility that component k takes over all data points.
Update of Means µk :
To find the optimal µk , we maximize Q(π, µ, σ 2 ) with respect to µk . The relevant part of Q is:
$$Q(\mu) = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \log \mathcal{N}(x_i \mid \mu_k, \sigma_k^2),$$
where
$$\log \mathcal{N}(x_i \mid \mu_k, \sigma_k^2) = -\frac{1}{2}\log(2\pi\sigma_k^2) - \frac{(x_i - \mu_k)^2}{2\sigma_k^2}.$$
Keeping only the terms that depend on µk ,
$$Q(\mu_k) = -\frac{1}{2}\sum_{i=1}^{N} \gamma_{ik}\, \frac{(x_i - \mu_k)^2}{\sigma_k^2}.$$
Setting the derivative equal to zero,
$$\frac{\partial Q(\mu_k)}{\partial \mu_k} = \sum_{i=1}^{N} \gamma_{ik}\, \frac{x_i - \mu_k}{\sigma_k^2} = 0
\quad\Longrightarrow\quad
\mu_k^{\text{new}} = \frac{\sum_{i=1}^{N} \gamma_{ik}\, x_i}{\sum_{i=1}^{N} \gamma_{ik}}.$$
This is the weighted average of the data points, where the weights are the responsibilities.
Update of Variances σk2 :
To find the optimal σk2 , we maximize Q(π, µ, σ 2 ) with respect to σk2 . The relevant part of Q is:
$$Q(\sigma_k^2) = -\frac{1}{2}\sum_{i=1}^{N} \gamma_{ik}\left[ \log \sigma_k^2 + \frac{(x_i - \mu_k)^2}{\sigma_k^2} \right].$$
Taking the derivative with respect to σk2 and setting it to zero gives:
$$\frac{\partial Q(\sigma_k^2)}{\partial \sigma_k^2} = -\frac{1}{2}\sum_{i=1}^{N} \gamma_{ik}\left[ \frac{1}{\sigma_k^2} - \frac{(x_i - \mu_k)^2}{\sigma_k^4} \right] = 0
\quad\Longrightarrow\quad
(\sigma_k^2)^{\text{new}} = \frac{\sum_{i=1}^{N} \gamma_{ik}\, (x_i - \mu_k^{\text{new}})^2}{\sum_{i=1}^{N} \gamma_{ik}}.$$
This is the weighted variance, where the weights are the responsibilities.
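The three update formulas are straightforward to implement. The sketch below (continuing the hypothetical e_step function from the E-step section; m_step is likewise an illustrative name) performs one M-step given the responsibilities:

```python
import numpy as np

def m_step(x, gamma):
    """One M-step for a 1-D Gaussian mixture, given responsibilities gamma (N x K)."""
    Nk = gamma.sum(axis=0)                   # effective number of points per component
    pi_new = Nk / len(x)                     # average responsibility
    mu_new = (gamma * x[:, None]).sum(axis=0) / Nk                       # weighted means
    sigma2_new = (gamma * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nk   # weighted variances
    return pi_new, mu_new, sigma2_new
```

Alternating e_step and m_step until the log-likelihood stabilizes yields the EM algorithm for the one-dimensional mixture.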
Multi-dimensional Case
In the multi-dimensional case the complete-data log-likelihood is
$$\log L(\pi, \mu, \Sigma) = \sum_{i=1}^{N} \sum_{k=1}^{K} I(z_i = k)\left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right],$$
where I(zi = k) is an indicator function that equals 1 if zi = k (i.e., if the i-th data point was generated
by the k-th Gaussian component) and 0 otherwise. Since we don’t know the values of zi (they are
latent variables), we take the expectation with respect to their posterior distribution, which leads to
the expected complete-data log-likelihood:
$$Q(\pi, \mu, \Sigma) = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik}\left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right].$$
E-step
Similar to the one-dimensional case, given the current parameters at the t-th iteration, we compute the responsibilities (posterior probabilities of the latent variables):
$$\gamma_{ik}^{(t+1)} = \frac{\pi_k^{(t)}\, \mathcal{N}\!\left( x_i \mid \mu_k^{(t)}, \Sigma_k^{(t)} \right)}{\sum_{j=1}^{K} \pi_j^{(t)}\, \mathcal{N}\!\left( x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)} \right)}.$$
M-Step
To find the optimal πk , we maximize Q(π, µ, Σ) with respect to πk , subject to the constraint that $\sum_{k=1}^{K} \pi_k = 1$. The relevant part of Q is:
$$Q(\pi) = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \log \pi_k.$$
We introduce a Lagrange multiplier λ for the constraint and set up the Lagrangian:
$$\mathcal{L} = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \log \pi_k + \lambda\!\left( \sum_{k=1}^{K} \pi_k - 1 \right).$$
Taking the derivative with respect to πk ,
$$\frac{\partial \mathcal{L}}{\partial \pi_k} = \frac{\sum_{i=1}^{N} \gamma_{ik}}{\pi_k} + \lambda = 0.$$
Solving for πk and using the constraint $\sum_{k=1}^{K} \pi_k = 1$, we get:
$$\pi_k^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}^{(t+1)}.$$
This is the average responsibility that component k takes over all data points.
Update of Means µk :
To find the optimal µk , we maximize Q(π, µ, Σ) with respect to µk . The relevant part of Q is:
$$Q(\mu) = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k),$$
where
$$\log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) = -\frac{D}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k).$$
Keeping only the terms that depend on µk ,
$$Q(\mu_k) = -\frac{1}{2}\sum_{i=1}^{N} \gamma_{ik}\, (x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k).$$
Setting the derivative equal to zero,
$$\frac{\partial Q(\mu_k)}{\partial \mu_k} = \sum_{i=1}^{N} \gamma_{ik}\, \Sigma_k^{-1} (x_i - \mu_k) = 0
\quad\Longrightarrow\quad
\mu_k^{(t+1)} = \frac{\sum_{i=1}^{N} \gamma_{ik}^{(t+1)}\, x_i}{\sum_{i=1}^{N} \gamma_{ik}^{(t+1)}}.$$
This is the weighted average of the data points, where the weights are the responsibilities.
Update of Covariances Σk :
To find the optimal Σk , we maximize Q(π, µ, Σ) with respect to Σk . The relevant part of Q now involves both the log-determinant and the quadratic form:
$$Q(\Sigma_k) = -\frac{1}{2}\sum_{i=1}^{N} \gamma_{ik}\left[ \log|\Sigma_k| + (x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k) \right].$$
Taking the derivative with respect to Σk and setting it to zero,
$$\frac{\partial Q(\Sigma_k)}{\partial \Sigma_k} = -\frac{1}{2}\sum_{i=1}^{N} \gamma_{ik}\left[ \Sigma_k^{-1} - \Sigma_k^{-1}(x_i - \mu_k)(x_i - \mu_k)^\top \Sigma_k^{-1} \right] = 0
\quad\Longrightarrow\quad
\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{N} \gamma_{ik}^{(t+1)}\, (x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^\top}{\sum_{i=1}^{N} \gamma_{ik}^{(t+1)}}.$$
This is the weighted covariance matrix, where the weights are the responsibilities.
In summary, with n observations and responsibilities γj,i for component j and observation i, the M-step updates at iteration k + 1 are:
$$\pi_j^{(k+1)} = \frac{1}{n}\sum_{i=1}^{n} \gamma_{j,i}^{(k+1)},$$
$$\mu_j^{(k+1)} = \frac{\sum_{i=1}^{n} \gamma_{j,i}^{(k+1)}\, x_i}{\sum_{i=1}^{n} \gamma_{j,i}^{(k+1)}},$$
$$\Sigma_j^{(k+1)} = \frac{\sum_{i=1}^{n} \gamma_{j,i}^{(k+1)}\, (x_i - \mu_j^{(k+1)})(x_i - \mu_j^{(k+1)})^\top}{\sum_{i=1}^{n} \gamma_{j,i}^{(k+1)}}.$$
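In practice this multi-dimensional EM iteration is available off the shelf. As a usage sketch (the simulated clusters below are purely illustrative), scikit-learn's GaussianMixture fits such a model by an EM iteration of this form (it also adds a small regularization term to the covariance diagonals for numerical stability):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated data from two illustrative 2-D Gaussian clusters.
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], np.eye(2), size=300),
    rng.multivariate_normal([4.0, 4.0], [[1.0, 0.6], [0.6, 1.0]], size=200),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

print(gmm.weights_)            # estimated pi_k
print(gmm.means_)              # estimated mu_k
print(gmm.covariances_)        # estimated Sigma_k
resp = gmm.predict_proba(X)    # responsibilities gamma_ik from the final E-step
```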
Mean-Variance Mixture Distributions
We now generalize the multivariate normal to obtain multivariate normal mixture distributions. The
random vector X is said to have a (multivariate) normal mean-variance mixture distribution if
$$X \stackrel{d}{=} m(W) + \sqrt{W}\, AZ, \qquad (4)$$
where
(i) Z ∼ Nk (0, Ik );
(ii) W ≥ 0 is a non-negative, scalar-valued random variable which is independent of Z;
(iii) A ∈ Rd×k is a matrix of constants and m : [0, ∞) → Rd is a measurable function.
Conditional on W = w, X is then multivariate normal, X | W = w ∼ Nd (m(w), wΣ), where Σ = AA′ . For instance, a possible concrete specification for the function m(W ) is
$$m(W) = \mu + W\gamma. \qquad (5)$$
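A simulation sketch of (4) with the specification (5). The choice of an exponentially distributed mixing variable W and all parameter values below are illustrative assumptions, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for X = mu + W*gamma + sqrt(W) * A @ Z  (eqs. (4)-(5)).
mu = np.array([0.0, 0.0])
gamma = np.array([0.5, -0.2])
A = np.linalg.cholesky(np.array([[1.0, 0.3],
                                 [0.3, 1.0]]))   # so Sigma = A @ A.T

n = 100_000
W = rng.exponential(scale=1.0, size=n)           # assumed mixing distribution (E[W] = 1)
Z = rng.standard_normal((n, 2))
X = mu + W[:, None] * gamma + np.sqrt(W)[:, None] * (Z @ A.T)

print(X.mean(axis=0))   # should be close to mu + E[W] * gamma
```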
Appendix: EM-algorithm
The expectation–maximization (EM) algorithm is an iterative method to find the maximum likelihood
estimate when the statistical model depends on unobserved latent variables.
We denote by Y the sample of observed data and by Z the sample of unobservable data. We have:
$$\begin{aligned}
\ell(Y, Z; \theta) &= \sum_{i=1}^{n} \ln f(y_i, z_i; \theta) \\
&= \sum_{i=1}^{n} \ln\big( f(z_i; \theta)\, f(y_i \mid z_i; \theta) \big) \\
&= \sum_{i=1}^{n} \ln f(z_i; \theta) + \sum_{i=1}^{n} \ln f(y_i \mid z_i; \theta).
\end{aligned}$$
To overcome the latency of the Zi data, the EM algorithm is used. This is an iterative procedure consisting of an E-step, or expectation step (where essentially Zi is replaced by an estimate given the observed data and current parameter estimates), and an M-step, or maximization step (where the parameter estimates are updated). The EM algorithm consists of iteratively applying the two steps:
(E–Step) We calculate the conditional expectation of the so-called augmented log-likelihood (in the above equation) given the data Y1 , . . . , Yn and the current parameter values θ(k) . This results in the objective function
$$Q(\theta; \theta^{(k)}) = E\left[ \ell(Y, Z; \theta) \mid Y, \theta^{(k)} \right].$$
In practice, performing the E-step amounts to replacing any functions g(Wi ) of the latent mixing variables by the quantities E[g(Wi ) | Yi ; θ(k) ].
(M–Step) We maximize Q(θ; θ(k) ) with respect to the parameter vector θ to obtain the updated estimate θ(k+1) .
Alternating between these steps, the EM algorithm produces improved parameter estimates at each iteration, in the sense that the value of the original likelihood never decreases, and the iterates converge to a (local) maximum of the likelihood, which under suitable regularity conditions gives the maximum likelihood (ML) estimates.