
Multivariate Discrete Distributions

Chapter 2 discusses multivariate discrete distributions, focusing on joint, marginal, and conditional distributions of discrete random variables. It emphasizes the importance of understanding the joint behavior of multiple random variables through examples such as coin tossing and dice rolls. The chapter also introduces concepts like joint probability mass functions (pmf) and cumulative distribution functions (CDF) for discrete variables.

Uploaded by

Nitin Chaturvedi

Chapter 2

Multivariate Discrete Distributions

We have provided a detailed overview of distributions of one discrete or one continuous random variable in the previous chapter. But often in applications, we are naturally interested in two or more random variables simultaneously. We may be interested in them simultaneously because they provide information about each other, or because they arise together as part of the data in some scientific experiment. For instance, on a doctor's visit, the physician may check someone's blood pressure, pulse rate, blood cholesterol level, and blood sugar level, because together they give information about the general health of the patient. In such cases, it becomes essential to know how to operate with many random variables simultaneously. This is done by using joint distributions. Joint distributions naturally lead to considerations of marginal and conditional distributions. We study joint, marginal, and conditional distributions for discrete random variables in this chapter. The concepts for continuous random variables are not different, but the techniques are mathematically more sophisticated; the continuous case is treated in the next chapter.

2.1 Bivariate Joint Distributions and Expectations of Functions

We present the fundamentals of joint distributions of two variables in this section. The concepts in the multivariate case are the same, although the technicalities are somewhat more involved; we treat the multivariate case in a later section. The idea is that there is still an underlying experiment, with an associated sample space $\Omega$. But now we have two or more random variables on the sample space $\Omega$. Since random variables are functions on the sample space $\Omega$, we now have multiple functions, say $X(\omega), Y(\omega), \ldots$. We want to study their joint behavior.

Example 2.1 (Coin Tossing). Consider the experiment of tossing a fair coin three times. Let $X$ be the number of heads among the first two tosses, and $Y$ the number of heads among the last two tosses. If we consider $X$ and $Y$ individually, we realize immediately that they are each $\mathrm{Bin}(2, .5)$ random variables. But the individual distributions hide part of the full story. For example, if we knew that $X$ was 2, then that would imply that $Y$ must be at least 1. Thus, their joint behavior cannot be fully understood from their individual distributions; we must study their joint distribution.

A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics, Springer Texts in Statistics, DOI 10.1007/978-1-4419-9634-3_2, © Springer Science+Business Media, LLC 2011
Here is what we mean by their joint distribution. The sample space $\Omega$ of this experiment is

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

Each sample point has an equal probability $\frac18$. Denoting the sample points as $\omega_1, \omega_2, \ldots, \omega_8$, we see that if $\omega_1$ prevails, then $X(\omega_1) = Y(\omega_1) = 2$, but if $\omega_2$ prevails, then $X(\omega_2) = 2$, $Y(\omega_2) = 1$. The combinations of all possible values of $(X, Y)$ are

$$(0,0),\ (0,1),\ (0,2),\ (1,0),\ (1,1),\ (1,2),\ (2,0),\ (2,1),\ (2,2).$$

The joint distribution of $(X, Y)$ provides the probability $p(x, y) = P(X = x, Y = y)$ for each such combination of possible values $(x, y)$. Indeed, by direct counting using the eight equally likely sample points, we see that

$$p(0,0) = \tfrac18,\quad p(0,1) = \tfrac18,\quad p(0,2) = 0,\quad p(1,0) = \tfrac18,\quad p(1,1) = \tfrac14;$$
$$p(1,2) = \tfrac18,\quad p(2,0) = 0,\quad p(2,1) = \tfrac18,\quad p(2,2) = \tfrac18.$$
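The direct count can be reproduced mechanically. The following is a minimal Python sketch, not part of the original text, that enumerates the eight equally likely outcomes and accumulates $p(x, y)$ exactly with the standard-library `fractions` module; the variable names `outcomes` and `joint` are our own.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# The 8 equally likely outcomes of three tosses of a fair coin, e.g. ('H', 'H', 'T').
outcomes = list(product("HT", repeat=3))

joint = defaultdict(Fraction)  # joint[(x, y)] = p(x, y); missing keys read as 0
for w in outcomes:
    x = w[:2].count("H")  # X: heads among the first two tosses
    y = w[1:].count("H")  # Y: heads among the last two tosses
    joint[(x, y)] += Fraction(1, len(outcomes))

for x in range(3):
    print(x, [str(joint[(x, y)]) for y in range(3)])
```

Each printed row lists $p(x, 0), p(x, 1), p(x, 2)$ for one value of $x$.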

For example, why is $p(0,1) = \frac18$? This is because the combination $(X = 0, Y = 1)$ is favored by only one sample point, namely $TTH$. It is convenient to present these nine different probabilities in the form of a table as follows.

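$$\begin{array}{c|ccc} X \backslash Y & 0 & 1 & 2 \\ \hline 0 & \frac18 & \frac18 & 0 \\ 1 & \frac18 & \frac14 & \frac18 \\ 2 & 0 & \frac18 & \frac18 \end{array}$$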

Such a layout is a convenient way to present the joint distribution of two discrete random variables with a small number of values. The distribution itself is called the joint pmf; here is a formal definition.

Definition 2.1. Let $X, Y$ be two discrete random variables with respective sets of values $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$, defined on a common sample space $\Omega$. The joint pmf of $X, Y$ is defined to be the function $p(x_i, y_j) = P(X = x_i, Y = y_j)$, $i, j \ge 1$, and $p(x, y) = 0$ at any other point $(x, y)$ in $\mathbb{R}^2$.

The requirements of a joint pmf are that
(i) $p(x, y) \ge 0 \ \ \forall (x, y)$;
(ii) $\sum_i \sum_j p(x_i, y_j) = 1$.
Thus, if we write the joint pmf in the form of a table, then all entries should be nonnegative, and the sum of all the entries in the table should be one.
As in the case of a single variable, we can define a CDF for more than one variable also. For the case of two variables, here is the definition of a CDF.
Definition 2.2. Let $X, Y$ be two discrete random variables, defined on a common sample space $\Omega$. The joint CDF, or simply the CDF, of $(X, Y)$ is a function $F: \mathbb{R}^2 \to [0, 1]$ defined as $F(x, y) = P(X \le x, Y \le y)$, $x, y \in \mathbb{R}$.
Like the joint pmf, the CDF also characterizes the joint distribution of two discrete random variables. But it is neither very convenient nor especially interesting to work with the CDF in the case of discrete random variables; it is much preferable to work with the pmf.
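Example 2.2 (Maximum and Minimum in Dice Rolls). Suppose a fair die is rolled twice, and let $X, Y$ be the larger and the smaller of the two rolls (note that $X$ can be equal to $Y$). Each of $X, Y$ takes the individual values $1, 2, \ldots, 6$, but we necessarily have $X \ge Y$. The sample space of this experiment is

$$\{11, 12, 13, \ldots, 64, 65, 66\}.$$

By direct counting, for example, $p(2, 1) = \frac{2}{36}$. Indeed, $p(x, y) = \frac{2}{36}$ for each $x, y = 1, 2, \ldots, 6$ with $x > y$, and $p(x, y) = \frac{1}{36}$ for $x = y = 1, 2, \ldots, 6$. Here is what the joint pmf looks like in the form of a table:

$$\begin{array}{c|cccccc} X \backslash Y & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline 1 & \frac{1}{36} & 0 & 0 & 0 & 0 & 0 \\ 2 & \frac{1}{18} & \frac{1}{36} & 0 & 0 & 0 & 0 \\ 3 & \frac{1}{18} & \frac{1}{18} & \frac{1}{36} & 0 & 0 & 0 \\ 4 & \frac{1}{18} & \frac{1}{18} & \frac{1}{18} & \frac{1}{36} & 0 & 0 \\ 5 & \frac{1}{18} & \frac{1}{18} & \frac{1}{18} & \frac{1}{18} & \frac{1}{36} & 0 \\ 6 & \frac{1}{18} & \frac{1}{18} & \frac{1}{18} & \frac{1}{18} & \frac{1}{18} & \frac{1}{36} \end{array}$$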

The individual pmfs of $X, Y$ are easily recovered from the joint distribution. For example,

$$P(X = 1) = \sum_{y=1}^{6} P(X = 1, Y = y) = \frac{1}{36}, \quad \text{and} \quad P(X = 2) = \sum_{y=1}^{6} P(X = 2, Y = y) = \frac{1}{18} + \frac{1}{36} = \frac{1}{12},$$

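and so on. The individual pmfs are obtained by summing the joint probabilities over all values of the other variable. They are:

$$\begin{array}{c|cccccc} x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline p_X(x) & \frac{1}{36} & \frac{3}{36} & \frac{5}{36} & \frac{7}{36} & \frac{9}{36} & \frac{11}{36} \end{array}$$

$$\begin{array}{c|cccccc} y & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline p_Y(y) & \frac{11}{36} & \frac{9}{36} & \frac{7}{36} & \frac{5}{36} & \frac{3}{36} & \frac{1}{36} \end{array}$$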
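As a sketch of the same bookkeeping in code (our illustration, assuming exact arithmetic via Python's `fractions` module; the names `joint`, `p_X`, `p_Y`, `E_X`, `E_Y` are hypothetical), one can build the joint pmf of the maximum and minimum by counting the 36 ordered pairs, and then recover the marginals by summing over the other variable:

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of X = max and Y = min of two rolls of a fair die,
# obtained by counting the 36 equally likely ordered pairs (r1, r2).
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

# Marginal pmfs: sum the joint pmf over all values of the other variable.
p_X = defaultdict(Fraction)
p_Y = defaultdict(Fraction)
for (x, y), prob in joint.items():
    p_X[x] += prob
    p_Y[y] += prob

# Expectations from the marginals.
E_X = sum(x * prob for x, prob in p_X.items())
E_Y = sum(y * prob for y, prob in p_Y.items())
print([str(p_X[k]) for k in range(1, 7)], E_X, E_Y)
```

Because `Fraction` reduces to lowest terms, $3/36$ prints as 1/12 and $9/36$ as 1/4. The two expectations agree with the values of $E(X)$ and $E(Y)$ obtained in the text.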

From the individual pmf of $X$, we can find the expectation of $X$. Indeed,

$$E(X) = 1 \cdot \frac{1}{36} + 2 \cdot \frac{3}{36} + \cdots + 6 \cdot \frac{11}{36} = \frac{161}{36}.$$

Similarly, $E(Y) = \frac{91}{36}$. The individual pmfs are called marginal pmfs, and here is the formal definition.

Definition 2.3. Let $p(x, y)$ be the joint pmf of $(X, Y)$. The marginal pmf of a function $Z = g(X, Y)$ is defined as $p_Z(z) = \sum_{(x, y):\, g(x, y) = z} p(x, y)$. In particular,

$$p_X(x) = \sum_y p(x, y); \qquad p_Y(y) = \sum_x p(x, y),$$

and for any event $A$,

$$P(A) = \sum_{(x, y) \in A} p(x, y).$$
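Definition 2.3 translates directly into code. The helpers below, `pmf_of_function` and `prob_of_event` (hypothetical names of our own), are a minimal sketch assuming the joint pmf is stored as a Python dict mapping $(x, y)$ to an exact `Fraction`; they are illustrated on the coin-tossing pmf of Example 2.1.

```python
from fractions import Fraction
from collections import defaultdict

def pmf_of_function(joint, g):
    """p_Z(z) = sum of p(x, y) over all (x, y) with g(x, y) == z."""
    p_Z = defaultdict(Fraction)
    for (x, y), prob in joint.items():
        p_Z[g(x, y)] += prob
    return dict(p_Z)

def prob_of_event(joint, A):
    """P(A) = sum of p(x, y) over the points (x, y) satisfying the predicate A."""
    return sum((prob for (x, y), prob in joint.items() if A(x, y)), Fraction(0))

# The coin-tossing joint pmf of Example 2.1 (zero-probability points omitted).
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (1, 0): Fraction(1, 8),
         (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 8), (2, 1): Fraction(1, 8),
         (2, 2): Fraction(1, 8)}

p_X = pmf_of_function(joint, lambda x, y: x)        # marginal pmf of X
p_sum = pmf_of_function(joint, lambda x, y: x + y)  # pmf of Z = X + Y
print(p_X, p_sum, prob_of_event(joint, lambda x, y: x > y))
```

The marginal of $X$ comes out as $\mathrm{Bin}(2, .5)$, as noted in Example 2.1.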

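Example 2.3. Consider a joint pmf given by the formula

$$p(x, y) = c(x + y), \quad 1 \le x, y \le n,$$

where $c$ is a normalizing constant.
First of all, we need to evaluate $c$ by equating

$$\begin{aligned} \sum_{x=1}^{n} \sum_{y=1}^{n} p(x, y) = 1 &\iff c \sum_{x=1}^{n} \sum_{y=1}^{n} (x + y) = 1 \\ &\iff c \sum_{x=1}^{n} \left[ nx + \frac{n(n+1)}{2} \right] = 1 \\ &\iff c \left[ \frac{n^2(n+1)}{2} + \frac{n^2(n+1)}{2} \right] = 1 \\ &\iff c\, n^2(n+1) = 1 \\ &\iff c = \frac{1}{n^2(n+1)}. \end{aligned}$$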

The joint pmf is symmetric between $x$ and $y$ (because $x + y = y + x$), and so $X, Y$ have the same marginal pmf. For example, $X$ has the pmf

$$p_X(x) = \sum_{y=1}^{n} p(x, y) = \frac{1}{n^2(n+1)} \sum_{y=1}^{n} (x + y) = \frac{1}{n^2(n+1)} \left[ nx + \frac{n(n+1)}{2} \right] = \frac{x}{n(n+1)} + \frac{1}{2n}, \quad 1 \le x \le n.$$
Suppose now we want to compute $P(X > Y)$. This can be found by summing $p(x, y)$ over all combinations for which $x > y$. But this longer calculation can be avoided by using a symmetry argument that is often very useful. Note that because the joint pmf is symmetric between $x$ and $y$, we must have $P(X > Y) = P(Y > X) = p$ (say). But also,

$$P(X > Y) + P(Y > X) + P(X = Y) = 1 \implies 2p + P(X = Y) = 1 \implies p = \frac{1 - P(X = Y)}{2}.$$

Now,

$$P(X = Y) = \sum_{x=1}^{n} p(x, x) = c \sum_{x=1}^{n} 2x = \frac{1}{n^2(n+1)}\, n(n+1) = \frac{1}{n}.$$

Therefore, $P(X > Y) = p = \frac{n-1}{2n} \approx \frac12$ for large $n$.
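The closed forms in Example 2.3 can be checked exactly for a few values of $n$. This is a minimal sketch of our own (the function name `example_2_3` is hypothetical); it compares exact rational sums against $c = 1/(n^2(n+1))$, the marginal formula, $P(X = Y) = 1/n$, and $P(X > Y) = (n-1)/(2n)$.

```python
from fractions import Fraction

def example_2_3(n):
    """Exact checks for the joint pmf p(x, y) = c (x + y), 1 <= x, y <= n."""
    c = Fraction(1, n * n * (n + 1))  # normalizing constant derived in the text
    p = {(x, y): c * (x + y) for x in range(1, n + 1) for y in range(1, n + 1)}
    total = sum(p.values())
    # Marginal of X by summing over y, compared with x/(n(n+1)) + 1/(2n).
    marginal_ok = all(
        sum(p[(x, y)] for y in range(1, n + 1))
        == Fraction(x, n * (n + 1)) + Fraction(1, 2 * n)
        for x in range(1, n + 1)
    )
    p_eq = sum(p[(x, x)] for x in range(1, n + 1))          # P(X = Y)
    p_gt = sum(v for (x, y), v in p.items() if x > y)       # P(X > Y), summed directly
    return total, marginal_ok, p_eq, p_gt

for n in (2, 5, 50):
    total, marginal_ok, p_eq, p_gt = example_2_3(n)
    print(n, total, marginal_ok, p_eq, p_gt, float(p_gt))
```

The last printed column shows $P(X > Y)$ approaching $\frac12$ as $n$ grows.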
Example 2.4 (Dice Rolls Revisited). Consider again the example of two rolls of a fair die, and suppose $X, Y$ are the larger and the smaller of the two rolls. We have worked out the joint distribution of $(X, Y)$ in Example 2.2. Suppose we want to find the distribution of the difference, $X - Y$. The possible values of $X - Y$ are $0, 1, \ldots, 5$, and we find $P(X - Y = k)$ by using the joint distribution of $(X, Y)$:

$$\begin{aligned} P(X - Y = 0) &= p(1,1) + p(2,2) + \cdots + p(6,6) = \tfrac16; \\ P(X - Y = 1) &= p(2,1) + p(3,2) + \cdots + p(6,5) = \tfrac{5}{18}; \\ P(X - Y = 2) &= p(3,1) + p(4,2) + p(5,3) + p(6,4) = \tfrac29; \\ P(X - Y = 3) &= p(4,1) + p(5,2) + p(6,3) = \tfrac16; \\ P(X - Y = 4) &= p(5,1) + p(6,2) = \tfrac19; \\ P(X - Y = 5) &= p(6,1) = \tfrac{1}{18}. \end{aligned}$$

There is no way to find the distribution of $X - Y$ except by using the joint distribution of $(X, Y)$.

Suppose now we also want to know the expected value of $X - Y$. Now that we have the distribution of $X - Y$ worked out, we can find the expectation by directly using the definition of expectation:

$$E(X - Y) = \sum_{k=0}^{5} k\, P(X - Y = k) = \frac{5}{18} + \frac49 + \frac12 + \frac49 + \frac{5}{18} = \frac{35}{18}.$$

But we can also use linearity of expectation and find $E(X - Y)$ as

$$E(X - Y) = E(X) - E(Y) = \frac{161}{36} - \frac{91}{36} = \frac{35}{18}$$

(see Example 2.2 for $E(X), E(Y)$).


A third possible way to compute $E(X - Y)$ is to treat $X - Y$ as a function of $(X, Y)$ and use the joint pmf of $(X, Y)$ to find $E(X - Y)$ as $\sum_x \sum_y (x - y)\, p(x, y)$. In this particular example, this is an unnecessarily laborious calculation, because we can find $E(X - Y)$ by quicker means, as we just saw. But in general, one has to resort to the joint pmf to calculate the expectation of a function of $(X, Y)$. Here is the formal formula.
Theorem 2.1 (Expectation of a Function). Let $(X, Y)$ have the joint pmf $p(x, y)$, and let $g(X, Y)$ be a function of $(X, Y)$. We say that the expectation of $g(X, Y)$ exists if $\sum_x \sum_y |g(x, y)|\, p(x, y) < \infty$, in which case

$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\, p(x, y).$$
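The three routes to $E(X - Y)$ in Example 2.4 can be compared side by side. A minimal sketch in Python, our own illustration with hypothetical names (`p_D`, `expect`), using exact fractions:

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y) = (max, min) of two fair die rolls (Example 2.2).
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

# Way 1: distribution of D = X - Y, then the definition of expectation.
p_D = defaultdict(Fraction)
for (x, y), prob in joint.items():
    p_D[x - y] += prob
e1 = sum(k * prob for k, prob in p_D.items())

# Way 2: linearity, E(X - Y) = E(X) - E(Y).
e2 = (sum(x * prob for (x, y), prob in joint.items())
      - sum(y * prob for (x, y), prob in joint.items()))

# Way 3: Theorem 2.1 with g(x, y) = x - y, summing over the joint pmf directly.
def expect(g, joint):
    return sum(g(x, y) * prob for (x, y), prob in joint.items())

e3 = expect(lambda x, y: x - y, joint)
print(e1, e2, e3)  # prints: 35/18 35/18 35/18
```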

2.2 Conditional Distributions and Conditional Expectations

Sometimes we want to know the expected value of one of the variables, say $X$, if we knew the value of the other variable $Y$. For example, in the die tossing experiment above, what should we expect the larger of the two rolls to be if the smaller roll is known to be 2?
To answer this question, we have to find the probabilities of the various values of $X$, conditional on knowing that $Y$ equals some given $y$, and then average by using these conditional probabilities. Here are the formal definitions.
Definition 2.4 (Conditional Distribution). Let $(X, Y)$ have the joint pmf $p(x, y)$. For any $y$ with $p_Y(y) > 0$, the conditional distribution of $X$ given $Y = y$ is defined to be

$$p(x \mid y) = P(X = x \mid Y = y) = \frac{p(x, y)}{p_Y(y)},$$

and the conditional expectation of $X$ given $Y = y$ is defined to be

$$E(X \mid Y = y) = \sum_x x\, p(x \mid y) = \frac{\sum_x x\, p(x, y)}{p_Y(y)} = \frac{\sum_x x\, p(x, y)}{\sum_x p(x, y)}.$$

The conditional distribution of $Y$ given $X = x$ and the conditional expectation of $Y$ given $X = x$ are defined analogously, by switching the roles of $X$ and $Y$ in the above definitions.
We often casually write $E(X \mid y)$ to mean $E(X \mid Y = y)$.
Two easy facts that are nevertheless often useful are the following.

Proposition. Let $X, Y$ be random variables defined on a common sample space $\Omega$. Then,
(a) $E(g(Y) \mid Y = y) = g(y)\ \forall y$, and for any function $g$;
(b) $E(X g(Y) \mid Y = y) = g(y)\, E(X \mid Y = y)\ \forall y$, and for any function $g$.
Recall that in Chapter 1, we defined two random variables to be independent if $P(X \le x, Y \le y) = P(X \le x)\, P(Y \le y)\ \forall x, y \in \mathbb{R}$. This is of course a correct definition; but in the case of discrete random variables, it is more convenient to think of independence in terms of the pmf. The definition below puts together some equivalent definitions of independence of two discrete random variables.

Definition 2.5 (Independence). Let $(X, Y)$ have the joint pmf $p(x, y)$. Then $X, Y$ are said to be independent if

$$\begin{aligned} & p(x \mid y) = p_X(x) \quad \forall x, y \text{ such that } p_Y(y) > 0; \\ \iff\ & p(y \mid x) = p_Y(y) \quad \forall x, y \text{ such that } p_X(x) > 0; \\ \iff\ & p(x, y) = p_X(x)\, p_Y(y) \quad \forall x, y; \\ \iff\ & P(X \le x, Y \le y) = P(X \le x)\, P(Y \le y) \quad \forall x, y. \end{aligned}$$

The third equivalent condition in the above list is usually the most convenient one to verify and use.
One more frequently useful fact about conditional expectations is the following.

Proposition. Suppose $X, Y$ are independent random variables. Then, for any function $g(X)$ such that the expectations below exist, and for any $y$,

$$E[g(X) \mid Y = y] = E[g(X)].$$

2.2.1 Examples on Conditional Distributions and Expectations

Example 2.5 (Maximum and Minimum in Dice Rolls). In the experiment of two rolls of a fair die, we have worked out the joint distribution of $X, Y$, where $X$ is the larger and $Y$ the smaller of the two rolls. Using this joint distribution, we can now find the conditional distributions. For instance,

$$\begin{aligned} P(Y = 1 \mid X = 1) &= 1; \quad P(Y = y \mid X = 1) = 0 \text{ if } y > 1; \\ P(Y = 1 \mid X = 2) &= \frac{1/18}{1/18 + 1/36} = \frac23; \\ P(Y = 2 \mid X = 2) &= \frac{1/36}{1/18 + 1/36} = \frac13; \\ P(Y = y \mid X = 2) &= 0 \text{ if } y > 2; \\ P(Y = y \mid X = 6) &= \frac{1/18}{5/18 + 1/36} = \frac{2}{11} \text{ if } 1 \le y \le 5; \\ P(Y = 6 \mid X = 6) &= \frac{1/36}{5/18 + 1/36} = \frac{1}{11}. \end{aligned}$$
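The conditional pmfs above are ratios of joint to marginal probabilities. The following sketch (ours; `cond_pmf_Y_given_X` is a hypothetical helper) computes them exactly for the dice example and refuses to condition on a zero-probability event:

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y) = (max, min) of two fair die rolls; only x >= y occurs.
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

def cond_pmf_Y_given_X(joint, x):
    """p(y | x) = p(x, y) / p_X(x); requires p_X(x) > 0."""
    p_x = sum(prob for (xx, _), prob in joint.items() if xx == x)
    if p_x == 0:
        raise ValueError("conditioning event has probability zero")
    return {y: prob / p_x for (xx, y), prob in joint.items() if xx == x}

print(cond_pmf_Y_given_X(joint, 2))
print(cond_pmf_Y_given_X(joint, 6))
```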

Example 2.6 (Conditional Expectation in a $2 \times 2$ Table). Suppose $X, Y$ are binary variables, each taking only the values $0, 1$, with the following joint distribution.

$$\begin{array}{c|cc} X \backslash Y & 0 & 1 \\ \hline 0 & s & t \\ 1 & u & v \end{array}$$

We want to evaluate the conditional expectation of $X$ given $Y = 0$ and given $Y = 1$. By using the definition of conditional expectation,

$$\begin{aligned} E(X \mid Y = 0) &= \frac{0 \cdot p(0,0) + 1 \cdot p(1,0)}{p(0,0) + p(1,0)} = \frac{u}{s + u}; \\ E(X \mid Y = 1) &= \frac{0 \cdot p(0,1) + 1 \cdot p(1,1)}{p(0,1) + p(1,1)} = \frac{v}{t + v}. \end{aligned}$$

Therefore,

$$E(X \mid Y = 1) - E(X \mid Y = 0) = \frac{v}{t + v} - \frac{u}{s + u} = \frac{vs - ut}{(t + v)(s + u)}.$$

It follows that we can now write the single formula

$$E(X \mid Y = y) = \frac{u}{s + u} + \frac{vs - ut}{(t + v)(s + u)}\, y, \quad y = 0, 1.$$

We now realize that the conditional expectation of $X$ given $Y = y$ is a linear function of $y$ in this example. This is the case whenever both $X, Y$ are binary variables, as they were in this example (any function defined on the two points $y = 0, 1$ can be written as $\alpha + \beta y$).
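The linear formula can be spot-checked with exact arithmetic. In this sketch of our own, the table entries $s, t, u, v$ are arbitrary made-up values (any positive entries summing to one would do), and both helper names are hypothetical.

```python
from fractions import Fraction

def cond_mean_X_given_Y(s, t, u, v, y):
    """E(X | Y = y) for binary X, Y with p(0,0)=s, p(0,1)=t, p(1,0)=u, p(1,1)=v."""
    if y == 0:
        return u / (s + u)  # p(1,0) / [p(0,0) + p(1,0)]
    return v / (t + v)      # p(1,1) / [p(0,1) + p(1,1)]

def linear_formula(s, t, u, v, y):
    """The single formula u/(s+u) + (vs - ut)/((t+v)(s+u)) * y from the text."""
    return u / (s + u) + (v * s - u * t) / ((t + v) * (s + u)) * y

# Illustrative table entries (our choice).
s, t, u, v = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)
for y in (0, 1):
    print(y, cond_mean_X_given_Y(s, t, u, v, y), linear_formula(s, t, u, v, y))
```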

Example 2.7 (Conditional Expectation in Dice Experiment). Consider again the example of the joint distribution of the maximum and the minimum of two rolls of a fair die. Let $X$ denote the maximum, and $Y$ the minimum. We find $E(X \mid Y = y)$ for various values of $y$.
By using the definition of $E(X \mid Y = y)$, we have, for example,

$$E(X \mid Y = 1) = \frac{1 \cdot \frac{1}{36} + \frac{1}{18}[2 + \cdots + 6]}{\frac{1}{36} + \frac{5}{18}} = \frac{41}{11} \approx 3.73;$$

as another example,

$$E(X \mid Y = 3) = \frac{3 \cdot \frac{1}{36} + \frac{1}{18} \cdot 15}{\frac{1}{36} + \frac{3}{18}} = \frac{33}{7} \approx 4.71;$$

and

$$E(X \mid Y = 5) = \frac{5 \cdot \frac{1}{36} + 6 \cdot \frac{1}{18}}{\frac{1}{36} + \frac{1}{18}} = \frac{17}{3} \approx 5.67.$$

We notice that $E(X \mid Y = 5) > E(X \mid Y = 3) > E(X \mid Y = 1)$; in fact, $E(X \mid Y = y)$ is increasing in $y$ in this example. This does make intuitive sense.
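The same conditional expectations, for every $y$, can be computed exactly with a short sketch of our own (the helper name is hypothetical). It confirms $E(X \mid Y = 5) = 17/3 \approx 5.67$ and that $E(X \mid Y = y)$ increases in $y$.

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y) = (max, min) of two fair die rolls.
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

def cond_mean_X_given_Y(joint, y):
    """E(X | Y = y) = sum_x x p(x, y) / sum_x p(x, y)."""
    num = sum(x * prob for (x, yy), prob in joint.items() if yy == y)
    den = sum(prob for (x, yy), prob in joint.items() if yy == y)
    return num / den

for y in range(1, 7):
    m = cond_mean_X_given_Y(joint, y)
    print(y, m, round(float(m), 2))
```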
Just as in the case of a distribution of a single variable, we often also want a measure of variability in addition to a measure of average for conditional distributions. This motivates defining a conditional variance.

Definition 2.6 (Conditional Variance). Let $(X, Y)$ have the joint pmf $p(x, y)$. Let $\mu_X(y) = E(X \mid Y = y)$. The conditional variance of $X$ given $Y = y$ is defined to be

$$\mathrm{Var}(X \mid Y = y) = E[(X - \mu_X(y))^2 \mid Y = y] = \sum_x (x - \mu_X(y))^2\, p(x \mid y).$$

We often casually write $\mathrm{Var}(X \mid y)$ to mean $\mathrm{Var}(X \mid Y = y)$.

Example 2.8 (Conditional Variance in Dice Experiment). We work out the conditional variance of the maximum of two rolls of a die given the minimum. That is, suppose a fair die is rolled twice, and $X, Y$ are the larger and the smaller of the two rolls; we want to compute $\mathrm{Var}(X \mid y)$.
For example, if $y = 3$, then $\mu_X(y) = E(X \mid Y = y) = E(X \mid Y = 3) = 4.71$ (see the previous example). Therefore,

$$\mathrm{Var}(X \mid y) = \sum_x (x - 4.71)^2\, p(x \mid 3) = \frac{(3 - 4.71)^2 \cdot \frac{1}{36} + (4 - 4.71)^2 \cdot \frac{1}{18} + (5 - 4.71)^2 \cdot \frac{1}{18} + (6 - 4.71)^2 \cdot \frac{1}{18}}{\frac{1}{36} + \frac{1}{18} + \frac{1}{18} + \frac{1}{18}} = 1.06.$$

To summarize, given that the minimum of two rolls of a fair die is 3, the expected value of the maximum is 4.71 and the variance of the maximum is 1.06.
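Computed exactly rather than with the rounded mean 4.71, the conditional variance given $Y = 3$ is $52/49 \approx 1.06$. A minimal sketch of our own (the helper name is hypothetical):

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y) = (max, min) of two fair die rolls.
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

def cond_var_X_given_Y(joint, y):
    """Var(X | Y = y) = sum_x (x - mu_X(y))^2 p(x | y), computed exactly."""
    den = sum(prob for (x, yy), prob in joint.items() if yy == y)
    cond = {x: prob / den for (x, yy), prob in joint.items() if yy == y}  # p(x | y)
    mu = sum(x * q for x, q in cond.items())                              # mu_X(y)
    return sum((x - mu) ** 2 * q for x, q in cond.items())

v3 = cond_var_X_given_Y(joint, 3)
print(v3, round(float(v3), 2))
```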
These two values, $E(X \mid y)$ and $\mathrm{Var}(X \mid y)$, change as we change the given value $y$. Thus $E(X \mid y)$ and $\mathrm{Var}(X \mid y)$ are functions of $y$, and for each separate $y$, a new calculation is needed. If $X, Y$ happen to be independent, then of course, whatever $y$ is, $E(X \mid y) = E(X)$ and $\mathrm{Var}(X \mid y) = \mathrm{Var}(X)$.
The next result is an important one in many applications.

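Theorem 2.2 (Poisson Conditional Distribution). Let $X, Y$ be independent Poisson random variables, with means $\lambda, \mu$. Then the conditional distribution of $X$ given $X + Y = t$ is $\mathrm{Bin}(t, p)$, where $p = \frac{\lambda}{\lambda + \mu}$.

Proof. Clearly, $P(X = x \mid X + Y = t) = 0\ \forall x > t$. For $x \le t$,

$$\begin{aligned} P(X = x \mid X + Y = t) &= \frac{P(X = x, X + Y = t)}{P(X + Y = t)} = \frac{P(X = x, Y = t - x)}{P(X + Y = t)} \\ &= \frac{e^{-\lambda} \lambda^x}{x!} \cdot \frac{e^{-\mu} \mu^{t - x}}{(t - x)!} \cdot \frac{t!}{e^{-(\lambda + \mu)} (\lambda + \mu)^t} \end{aligned}$$

(on using the fact that $X + Y \sim \mathrm{Poi}(\lambda + \mu)$; see Chapter 1)

$$= \frac{t!}{x!\,(t - x)!} \cdot \frac{\lambda^x \mu^{t - x}}{(\lambda + \mu)^t} = \binom{t}{x} \left( \frac{\lambda}{\lambda + \mu} \right)^x \left( \frac{\mu}{\lambda + \mu} \right)^{t - x},$$

which is the pmf of the $\mathrm{Bin}\left(t, \frac{\lambda}{\lambda + \mu}\right)$ distribution. □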
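Theorem 2.2 can be sanity-checked numerically. This is a sketch of our own, with arbitrary illustrative values of $\lambda$, $\mu$, and $t$; it computes $P(X + Y = t)$ by direct convolution (not by the Poisson-sum fact) and compares the conditional pmf with the $\mathrm{Bin}(t, \lambda/(\lambda + \mu))$ pmf. `math.comb` requires Python 3.8 or later.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poi(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binom_pmf(k, t, p):
    """P(B = k) for B ~ Bin(t, p)."""
    return math.comb(t, k) * p ** k * (1 - p) ** (t - k)

lam, mu, t = 2.0, 3.5, 7  # illustrative parameter values (our choice)

# P(X + Y = t) by direct convolution of the two independent Poisson pmfs.
p_sum = sum(poisson_pmf(x, lam) * poisson_pmf(t - x, mu) for x in range(t + 1))

# Conditional pmf of X given X + Y = t, and the claimed binomial pmf.
cond = [poisson_pmf(x, lam) * poisson_pmf(t - x, mu) / p_sum for x in range(t + 1)]
binom = [binom_pmf(x, t, lam / (lam + mu)) for x in range(t + 1)]
print(max(abs(c - b) for c, b in zip(cond, binom)))  # should be at rounding level
```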

2.3 Using Conditioning to Evaluate Mean and Variance

Conditioning is often an extremely effective tool to calculate probabilities, means, and variances of random variables with a complex or clumsy joint distribution. Thus, in order to calculate the mean of a random variable $X$, it is sometimes greatly convenient to follow an iterative process, whereby we first evaluate the mean of $X$ after conditioning on the value $y$ of some suitable random variable $Y$, and then average over $y$. The random variable $Y$ has to be chosen judiciously, but is often clear from the context of the specific problem. Here are the precise results on how this technique works; it is important to note that the next two results hold for any kind of random variables, not just discrete ones.

Theorem 2.3 (Iterated Expectation Formula). Let $X, Y$ be random variables defined on the same probability space $\Omega$. Suppose $E(X)$ and $E(X \mid Y = y)$ exist for each $y$. Then,

$$E(X) = E_Y[E(X \mid Y = y)];$$

thus, in the discrete case,

$$E(X) = \sum_y \mu_X(y)\, p_Y(y),$$

where $\mu_X(y) = E(X \mid Y = y)$.

Proof. We prove this for the discrete case. By definition of conditional expectation,

$$\mu_X(y) = \frac{\sum_x x\, p(x, y)}{p_Y(y)}$$
$$\implies \sum_y \mu_X(y)\, p_Y(y) = \sum_y \sum_x x\, p(x, y) = \sum_x \sum_y x\, p(x, y) = \sum_x x \sum_y p(x, y) = \sum_x x\, p_X(x) = E(X). \quad □$$

The corresponding variance calculation formula is the following. The proof of this uses the iterated mean formula above, and applies it to $(X - \mu_X)^2$, where $\mu_X = E(X)$.

Theorem 2.4 (Iterated Variance Formula). Let $X, Y$ be random variables defined on the same probability space $\Omega$. Suppose $\mathrm{Var}(X)$ and $\mathrm{Var}(X \mid Y = y)$ exist for each $y$. Then,

$$\mathrm{Var}(X) = E_Y[\mathrm{Var}(X \mid Y = y)] + \mathrm{Var}_Y[E(X \mid Y = y)].$$
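Both iterated formulas can be verified exactly on the dice example, taking $X$ to be the maximum and conditioning on the minimum $Y$. The following is a minimal sketch of our own; helper names such as `mean_var` are hypothetical.

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y) = (max, min) of two fair die rolls, as in Example 2.2.
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

def mean_var(pmf):
    """Mean and variance of a pmf given as {value: probability}."""
    m = sum(v * q for v, q in pmf.items())
    return m, sum((v - m) ** 2 * q for v, q in pmf.items())

p_X, p_Y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), prob in joint.items():
    p_X[x] += prob
    p_Y[y] += prob

# Conditional mean mu_X(y) and conditional variance Var(X | Y = y) for each y.
cond_mean, cond_var = {}, {}
for y, py in p_Y.items():
    cond_pmf = {x: prob / py for (x, yy), prob in joint.items() if yy == y}
    cond_mean[y], cond_var[y] = mean_var(cond_pmf)

EX, VarX = mean_var(p_X)
iter_mean = sum(cond_mean[y] * p_Y[y] for y in p_Y)                       # E_Y[E(X | Y)]
iter_var = (sum(cond_var[y] * p_Y[y] for y in p_Y)                        # E_Y[Var(X | Y)]
            + sum((cond_mean[y] - iter_mean) ** 2 * p_Y[y] for y in p_Y))  # Var_Y[E(X | Y)]
print(EX, iter_mean, VarX, iter_var)
```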

Remark. These two formulas for iterated expectation and iterated variance are valid
for all types of variables, not just the discrete ones. Thus, these same formulas still
hold when we discuss joint distributions for continuous random variables in the next
chapter.
Some operational formulas that one should be familiar with are summarized below.

Conditional Expectation and Variance Rules.

$$\begin{aligned} & E(g(X) \mid X = x) = g(x); \quad E(g(X) h(Y) \mid Y = y) = h(y)\, E(g(X) \mid Y = y); \\ & E(g(X) \mid Y = y) = E(g(X)) \text{ if } X, Y \text{ are independent}; \\ & \mathrm{Var}(g(X) \mid X = x) = 0; \quad \mathrm{Var}(g(X) h(Y) \mid Y = y) = h^2(y)\, \mathrm{Var}(g(X) \mid Y = y); \\ & \mathrm{Var}(g(X) \mid Y = y) = \mathrm{Var}(g(X)) \text{ if } X, Y \text{ are independent}. \end{aligned}$$

Let us see some applications of the two iterated expectation and iterated variance
formulas.

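Example 2.9 (A Two-Stage Experiment). Suppose $n$ fair dice are rolled. Those that show a six are rolled again. What are the mean and the variance of the number of sixes obtained in the second round of this experiment?
Define $Y$ to be the number of dice in the first round that show a six, and $X$ the number of dice in the second round that show a six. Given $Y = y$, $X \sim \mathrm{Bin}(y, \frac16)$, and $Y$ itself is distributed as $\mathrm{Bin}(n, \frac16)$. Therefore,

$$E(X) = E[E(X \mid Y = y)] = E_Y\left[ \frac{y}{6} \right] = \frac{n}{36}.$$

Also,

$$\begin{aligned} \mathrm{Var}(X) &= E_Y[\mathrm{Var}(X \mid Y = y)] + \mathrm{Var}_Y[E(X \mid Y = y)] \\ &= E_Y\left[ y \cdot \frac16 \cdot \frac56 \right] + \mathrm{Var}_Y\left[ \frac{y}{6} \right] \\ &= \frac{5}{36} \cdot \frac{n}{6} + \frac{1}{36} \cdot n \cdot \frac16 \cdot \frac56 \\ &= \frac{5n}{216} + \frac{5n}{1296} = \frac{35n}{1296}. \end{aligned}$$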
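For a concrete $n$, the exact pmf of $X$ can be obtained by summing over the first-round count $y$, which gives an independent check of $n/36$ and $35n/1296$. A sketch of our own with hypothetical helper names; $n = 10$ is an arbitrary choice.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Exact Bin(n, p) pmf at k (p is a Fraction)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def second_round_pmf(n):
    """Exact pmf of X, where Y ~ Bin(n, 1/6) and X | Y = y ~ Bin(y, 1/6)."""
    p = Fraction(1, 6)
    return {k: sum(binom_pmf(y, n, p) * binom_pmf(k, y, p) for y in range(k, n + 1))
            for k in range(n + 1)}

n = 10
pmf = second_round_pmf(n)
mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())
print(mean, var)
```

The exact pmf in fact coincides with $\mathrm{Bin}(n, 1/36)$, since each die independently shows a six in both rounds with probability $1/36$.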
Example 2.10. Suppose a chicken lays a Poisson number of eggs per week with mean $\lambda$. Each egg, independently of the others, has a probability $p$ of fertilizing. We want to find the mean and the variance of the number of eggs fertilized in a week.
Let $N$ denote the number of eggs laid and $X$ the number of eggs fertilized. Then $N \sim \mathrm{Poi}(\lambda)$, and given $N = n$, $X \sim \mathrm{Bin}(n, p)$. Therefore,

$$E(X) = E_N[E(X \mid N = n)] = E_N[np] = \lambda p,$$

and

$$\mathrm{Var}(X) = E_N[\mathrm{Var}(X \mid N = n)] + \mathrm{Var}_N(E(X \mid N = n)) = E_N[np(1 - p)] + \mathrm{Var}_N(np) = \lambda p(1 - p) + p^2 \lambda = \lambda p.$$

Interestingly, the number of eggs actually fertilized has the same mean and variance, $\lambda p$. (Can you see why?)
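A numerical sketch of our own (illustrative $\lambda = 4$, $p = 0.3$; truncating $N$ at 100 is an assumption that makes the neglected Poisson tail negligible here) computes the pmf of $X$ by conditioning on $N$ and recovers mean and variance $\lambda p$:

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poi(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def fertilized_pmf(k, lam, p, n_max=100):
    """P(X = k) = sum over n of P(N = n) P(Bin(n, p) = k), with N truncated at n_max."""
    return sum(poisson_pmf(n, lam) * math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for n in range(k, n_max + 1))

lam, p = 4.0, 0.3  # illustrative values (our choice)
pmf = [fertilized_pmf(k, lam, p) for k in range(60)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(mean, var, lam * p)
```

The computed pmf also matches the $\mathrm{Poi}(\lambda p)$ pmf term by term, which is one way to answer the question above.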
Remark. In all of these examples, it was important to choose wisely the variable $Y$ on which to condition; the efficiency of the technique depends crucially on this choice.
Sometimes a formal generalization of the iterated expectation formula, when a third variable $Z$ is present, is useful. It is particularly useful in hierarchical statistical modeling of distributions, where an ultimate marginal distribution for some $X$ is constructed by first conditioning on a number of auxiliary variables, and then gradually unconditioning them. We state the more general iterated expectation formula; its proof is exactly similar to that of the usual iterated expectation formula.

Theorem 2.5 (Higher-Order Iterated Expectation). Let $X, Y, Z$ be random variables defined on the same sample space $\Omega$. Assume that each conditional expectation below and the marginal expectation $E(X)$ exist. Then,

$$E(X) = E_Y\left[ E_{Z \mid Y} \{ E(X \mid Y = y, Z = z) \} \right].$$

2.4 Covariance and Correlation

We know that variance is additive for independent random variables; that is, if $X_1, X_2, \ldots, X_n$ are independent random variables, then $\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)$. In particular, for two independent random variables $X, Y$, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. However, in general, variance is not additive. Let us do the general calculation for $\mathrm{Var}(X + Y)$.

$$\begin{aligned} \mathrm{Var}(X + Y) &= E(X + Y)^2 - [E(X + Y)]^2 \\ &= E(X^2 + Y^2 + 2XY) - [E(X) + E(Y)]^2 \\ &= E(X^2) + E(Y^2) + 2E(XY) - [E(X)]^2 - [E(Y)]^2 - 2E(X)E(Y) \\ &= E(X^2) - [E(X)]^2 + E(Y^2) - [E(Y)]^2 + 2[E(XY) - E(X)E(Y)] \\ &= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2[E(XY) - E(X)E(Y)]. \end{aligned}$$

We thus have the extra term $2[E(XY) - E(X)E(Y)]$ in the expression for $\mathrm{Var}(X + Y)$; of course, when $X, Y$ are independent, $E(XY) = E(X)E(Y)$, and so the extra term drops out. But in general, one has to keep the extra term. The quantity $E(XY) - E(X)E(Y)$ is called the covariance of $X$ and $Y$.
Definition 2.7 (Covariance). Let $X, Y$ be two random variables defined on a common sample space $\Omega$, such that $E(XY), E(X), E(Y)$ all exist. The covariance of $X$ and $Y$ is defined as

$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E[(X - E(X))(Y - E(Y))].$$
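The two forms of the covariance, and the expansion of $\mathrm{Var}(X + Y)$ derived above, can be checked exactly on the dice example of Example 2.2. A minimal sketch of our own; the helper `E` (a hypothetical name) evaluates $E[g(X, Y)]$ from the joint pmf as in Theorem 2.1.

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y) = (max, min) of two fair die rolls.
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

def E(g):
    """E[g(X, Y)] computed from the joint pmf (Theorem 2.1)."""
    return sum(g(x, y) * prob for (x, y), prob in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov_short = E(lambda x, y: x * y) - EX * EY               # E(XY) - E(X)E(Y)
cov_centered = E(lambda x, y: (x - EX) * (y - EY))         # E[(X - EX)(Y - EY)]
var_X = E(lambda x, y: (x - EX) ** 2)
var_Y = E(lambda x, y: (y - EY) ** 2)
var_sum = E(lambda x, y: (x + y - EX - EY) ** 2)           # Var(X + Y)
print(cov_short, cov_centered, var_sum, var_X + var_Y + 2 * cov_short)
```

Here $X + Y$ is just the sum of the two rolls, so $\mathrm{Var}(X + Y) = 2 \cdot \frac{35}{12} = \frac{35}{6}$, which the identity reproduces with $\mathrm{Cov}(X, Y) = 1225/1296$.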

Remark. Covariance is a measure of whether two random variables $X, Y$ tend to increase or decrease together. If larger values of $X$ generally go together with larger values of $Y$, then often (but not always) they have a positive covariance. For example, taller people tend to weigh more than shorter people, and height and weight usually have a positive covariance.
Unfortunately, however, covariance can take arbitrary positive and arbitrary negative values. Therefore, by looking at its value in a particular problem, we cannot judge whether it is a large value; there is no standard against which to compare a covariance. A renormalization of the covariance cures this problem, and calibrates it to a scale of $-1$ to $+1$. We can judge such a quantity as large, small, or moderate; for example, .95 would be large positive, .5 moderate, and .1 small. The renormalized quantity is the correlation coefficient, or simply the correlation, between $X$ and $Y$.

Definition 2.8 (Correlation). Let $X, Y$ be two random variables defined on a common sample space $\Omega$, such that $\mathrm{Var}(X), \mathrm{Var}(Y)$ are both finite and positive. The correlation between $X, Y$ is defined to be

$$\rho_{X, Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\, \sqrt{\mathrm{Var}(Y)}}.$$

Some important properties of covariance and correlation are put together in the next theorem.

Theorem 2.6 (Properties of Covariance and Correlation). Provided that the required variances and the covariances exist,
(a) $\mathrm{Cov}(X, c) = 0$ for any $X$ and any constant $c$;
(b) $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$ for any $X$;
(c) $\mathrm{Cov}\left( \sum_{i=1}^{n} a_i X_i, \sum_{j=1}^{m} b_j Y_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j\, \mathrm{Cov}(X_i, Y_j)$,
and in particular,
$$\mathrm{Var}(aX + bY) = \mathrm{Cov}(aX + bY, aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab\, \mathrm{Cov}(X, Y),$$
and
$$\mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum_{1 \le i < j \le n} \mathrm{Cov}(X_i, X_j);$$
(d) For any two independent random variables $X, Y$, $\mathrm{Cov}(X, Y) = \rho_{X, Y} = 0$;
(e) $\rho_{a + bX,\, c + dY} = \mathrm{sgn}(bd)\, \rho_{X, Y}$, where $\mathrm{sgn}(bd) = 1$ if $bd > 0$, and $= -1$ if $bd < 0$;
(f) Whenever $\rho_{X, Y}$ is defined, $-1 \le \rho_{X, Y} \le 1$;
(g) $\rho_{X, Y} = 1$ if and only if for some $a$ and some $b > 0$, $P(Y = a + bX) = 1$; $\rho_{X, Y} = -1$ if and only if for some $a$ and some $b < 0$, $P(Y = a + bX) = 1$.

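Proof. For part (a), $\mathrm{Cov}(X, c) = E(cX) - E(c)E(X) = cE(X) - cE(X) = 0$. For part (b), $\mathrm{Cov}(X, X) = E(X^2) - [E(X)]^2 = \mathrm{Var}(X)$. For part (c),

$$\begin{aligned} \mathrm{Cov}\left( \sum_{i=1}^{n} a_i X_i, \sum_{j=1}^{m} b_j Y_j \right) &= E\left[ \sum_{i=1}^{n} a_i X_i \cdot \sum_{j=1}^{m} b_j Y_j \right] - E\left( \sum_{i=1}^{n} a_i X_i \right) E\left( \sum_{j=1}^{m} b_j Y_j \right) \\ &= E\left( \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j X_i Y_j \right) - \left[ \sum_{i=1}^{n} a_i E(X_i) \right] \left[ \sum_{j=1}^{m} b_j E(Y_j) \right] \end{aligned}$$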

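$$\begin{aligned} &= \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j E(X_i Y_j) - \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j E(X_i) E(Y_j) \\ &= \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j [E(X_i Y_j) - E(X_i) E(Y_j)] \\ &= \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j\, \mathrm{Cov}(X_i, Y_j). \end{aligned}$$

Part (d) follows on noting that $E(XY) = E(X)E(Y)$ if $X, Y$ are independent. For part (e), first note that $\mathrm{Cov}(a + bX, c + dY) = bd\, \mathrm{Cov}(X, Y)$ by using part (a) and part (c). Also, $\mathrm{Var}(a + bX) = b^2 \mathrm{Var}(X)$ and $\mathrm{Var}(c + dY) = d^2 \mathrm{Var}(Y)$, so

$$\rho_{a + bX,\, c + dY} = \frac{bd\, \mathrm{Cov}(X, Y)}{\sqrt{b^2 \mathrm{Var}(X)}\, \sqrt{d^2 \mathrm{Var}(Y)}} = \frac{bd\, \mathrm{Cov}(X, Y)}{|b| \sqrt{\mathrm{Var}(X)}\, |d| \sqrt{\mathrm{Var}(Y)}} = \frac{bd}{|bd|}\, \rho_{X, Y} = \mathrm{sgn}(bd)\, \rho_{X, Y}.$$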

The proof of part (f) uses the Cauchy–Schwarz inequality (see Chapter 1) that for any two random variables $U, V$, $[E(UV)]^2 \le E(U^2)\, E(V^2)$. Let

$$U = \frac{X - E(X)}{\sqrt{\mathrm{Var}(X)}}, \quad V = \frac{Y - E(Y)}{\sqrt{\mathrm{Var}(Y)}}.$$

Then $E(U^2) = E(V^2) = 1$, and

$$\rho_{X, Y} = E(UV) \le \sqrt{E(U^2)\, E(V^2)} = 1.$$

The lower bound $\rho_{X, Y} \ge -1$ follows similarly.

Part (g) uses the condition for equality in the Cauchy–Schwarz inequality: in order that $\rho_{X, Y} = \pm 1$, one must have $[E(UV)]^2 = E(U^2)\, E(V^2)$ in the argument above, which implies the statement in part (g). □

Example 2.11 (Correlation Between Minimum and Maximum in Dice Rolls). Consider again the experiment of rolling a fair die twice, and let $X, Y$ be the maximum and the minimum of the two rolls. We want to find the correlation between $X, Y$. The joint distribution of $(X, Y)$ was worked out in Example 2.2. From the joint distribution,

$$E(XY) = \frac{1}{36} + \frac{2}{18} + \frac{4}{36} + \frac{3}{18} + \frac{6}{18} + \frac{9}{36} + \cdots + \frac{30}{18} + \frac{36}{36} = \frac{49}{4}.$$

The marginal pmfs of $X, Y$ were also worked out in Example 2.2. From the marginal pmfs, by direct calculation, $E(X) = 161/36$, $E(Y) = 91/36$, $\mathrm{Var}(X) = \mathrm{Var}(Y) = 2555/1296$. Therefore,

$$\rho_{X, Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{Var}(X)}\, \sqrt{\mathrm{Var}(Y)}} = \frac{49/4 - \frac{161}{36} \cdot \frac{91}{36}}{2555/1296} = \frac{35}{73} \approx .48.$$

The correlation between the maximum and the minimum is in fact positive for any number of rolls of a die, although the correlation will converge to zero as the number of rolls tends to $\infty$.
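The value $35/73$ and the remark about more rolls can be explored by exact enumeration over all $6^k$ outcomes of $k$ rolls. A sketch of our own (the function name is hypothetical); enumeration grows as $6^k$, so only small $k$ are tried.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def corr_max_min(k):
    """Correlation between the max and the min of k rolls of a fair die, by exact enumeration."""
    outcomes = list(product(range(1, 7), repeat=k))
    w = Fraction(1, len(outcomes))  # each outcome is equally likely
    EX = sum(max(o) for o in outcomes) * w
    EY = sum(min(o) for o in outcomes) * w
    cov = sum(max(o) * min(o) for o in outcomes) * w - EX * EY
    var_x = sum(max(o) ** 2 for o in outcomes) * w - EX ** 2
    var_y = sum(min(o) ** 2 for o in outcomes) * w - EY ** 2
    # Moments are exact; convert to float only for the square root.
    return float(cov) / sqrt(float(var_x) * float(var_y))

for k in (2, 3, 4, 5):
    print(k, round(corr_max_min(k), 3))
```

Comparing the printed values for several $k$ gives a feel for the remark above.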

Example 2.12 (Correlation in the Chicken–Eggs Example). Consider again the example of a chicken laying a Poisson number of eggs $N$ with mean $\lambda$, and each egg fertilizing, independently of others, with probability $p$. If $X$ is the number of eggs actually fertilized, we want to find the correlation between the number of eggs laid and the number fertilized, that is, the correlation between $X$ and $N$.
First,

$$E(XN) = E_N[E(XN \mid N = n)] = E_N[n\, E(X \mid N = n)] = E_N[n^2 p] = p(\lambda + \lambda^2).$$

Next, from our previous calculations, $E(X) = \lambda p$, $E(N) = \lambda$, $\mathrm{Var}(X) = \lambda p$, $\mathrm{Var}(N) = \lambda$. Therefore,

$$\rho_{X, N} = \frac{E(XN) - E(X)E(N)}{\sqrt{\mathrm{Var}(X)}\, \sqrt{\mathrm{Var}(N)}} = \frac{p(\lambda + \lambda^2) - \lambda^2 p}{\sqrt{\lambda p}\, \sqrt{\lambda}} = \sqrt{p}.$$

Thus, the correlation goes up with the fertility rate of the eggs.
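A numerical sketch of our own, with illustrative parameter values and $N$ truncated at 100 (an assumption), computes $\rho_{X, N}$ from the joint pmf $P(N = n, X = k)$ directly, without using the moment formulas above, and compares it with $\sqrt{p}$:

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poi(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def corr_X_N(lam, p, n_max=100):
    """Correlation of N ~ Poi(lam) and X | N = n ~ Bin(n, p), from the joint pmf of (N, X).

    N is truncated at n_max; for moderate lam the neglected tail mass is negligible.
    """
    EN = EX = EN2 = EX2 = EXN = 0.0
    for n in range(n_max + 1):
        pn = poisson_pmf(n, lam)
        for k in range(n + 1):
            q = pn * math.comb(n, k) * p ** k * (1 - p) ** (n - k)  # P(N = n, X = k)
            EN += n * q
            EX += k * q
            EN2 += n * n * q
            EX2 += k * k * q
            EXN += k * n * q
    cov = EXN - EX * EN
    return cov / math.sqrt((EX2 - EX ** 2) * (EN2 - EN ** 2))

for lam, p in [(4.0, 0.3), (2.5, 0.8)]:  # illustrative values (our choice)
    print(lam, p, corr_X_N(lam, p), math.sqrt(p))
```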

Example 2.13 (Best Linear Predictor). Suppose $X$ and $Y$ are two jointly distributed random variables, and either by necessity or by omission, the variable $Y$ was not observed. But $X$ was observed, and there may be some information in the $X$ value about $Y$. The problem is to predict $Y$ by using $X$. Linear predictors, because of their functional simplicity, are appealing. The mathematical problem is to choose the best linear predictor $a + bX$ of $Y$, where best is defined as the predictor that minimizes the mean squared error $E[Y - (a + bX)]^2$. We show that the answer has something to do with the covariance between $X$ and $Y$.
By expanding the square,

$$R(a, b) = E[Y - (a + bX)]^2 = a^2 + b^2 E(X^2) + 2ab\, E(X) - 2a\, E(Y) - 2b\, E(XY) + E(Y^2).$$



To minimize this with respect to $a, b$, we partially differentiate $R(a, b)$ with respect to $a, b$, and set the derivatives equal to zero:

$$\begin{aligned} \frac{\partial}{\partial a} R(a, b) &= 2a + 2b\, E(X) - 2E(Y) = 0 \iff a + b\, E(X) = E(Y); \\ \frac{\partial}{\partial b} R(a, b) &= 2b\, E(X^2) + 2a\, E(X) - 2E(XY) = 0 \iff a\, E(X) + b\, E(X^2) = E(XY). \end{aligned}$$

Simultaneously solving these two equations, we get

$$b = \frac{E(XY) - E(X)E(Y)}{\mathrm{Var}(X)}, \quad a = E(Y) - \frac{E(XY) - E(X)E(Y)}{\mathrm{Var}(X)}\, E(X).$$

These values do minimize $R(a, b)$ by an easy application of the second derivative test. So, the best linear predictor of $Y$ based on $X$ is

$$\text{best linear predictor of } Y = E(Y) - \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}\, E(X) + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}\, X = E(Y) + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}\, [X - E(X)].$$

The best linear predictor is also known as the regression line of $Y$ on $X$. It is of widespread use in statistics.
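For the dice example (predicting the minimum $Y$ from the maximum $X$), the formulas give $b = 35/73$ and $a = 28/73$. The following sketch of our own computes these exactly and checks that small perturbations of $(a, b)$ do not lower the mean squared error; the helper names `E` and `mse` are hypothetical.

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y) = (max, min) of two fair die rolls.
joint = defaultdict(Fraction)
for r1 in range(1, 7):
    for r2 in range(1, 7):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

def E(g):
    """E[g(X, Y)] from the joint pmf."""
    return sum(g(x, y) * prob for (x, y), prob in joint.items())

def mse(a, b):
    """R(a, b) = E[Y - (a + bX)]^2 for predicting Y (min) from X (max)."""
    return E(lambda x, y: (y - (a + b * x)) ** 2)

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - EX * EY
var_X = E(lambda x, y: x * x) - EX ** 2
b = cov / var_X      # slope Cov(X, Y) / Var(X)
a = EY - b * EX      # intercept E(Y) - b E(X)

h = Fraction(1, 100)
nudged = [mse(a + da, b + db) for da in (-h, 0, h) for db in (-h, 0, h)]
print(a, b, mse(a, b), min(nudged) == mse(a, b))
```

At the optimum the mean squared error equals $\mathrm{Var}(Y) - \mathrm{Cov}(X, Y)^2/\mathrm{Var}(X)$, which the sketch's numbers can be compared against.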

Example 2.14 (Zero Correlation Does Not Mean Independence). If $X, Y$ are independent, then necessarily $\mathrm{Cov}(X, Y) = 0$, and hence the correlation is also zero. The converse is not true. Take a three-valued random variable $X$ with the pmf $P(X = 1) = P(X = -1) = p$, $P(X = 0) = 1 - 2p$, $0 < p < \frac12$. Let the other variable be $Y = X^2$. Then $E(XY) = E(X^3) = 0$, and $E(X)E(Y) = 0$, because $E(X) = 0$. Therefore, $\mathrm{Cov}(X, Y) = 0$. But $X, Y$ are certainly not independent; for example, $P(Y = 0 \mid X = 0) = 1$, but $P(Y = 0) = 1 - 2p \neq 1$.
Indeed, if $X$ has a distribution symmetric around zero, and if $X$ has three finite moments, then $X$ and $X^2$ always have zero correlation, although in general they are not independent.
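The counterexample can be checked mechanically. The sketch below, which assumes the hypothetical value $p = \frac15$, builds the joint pmf of $(X, Y) = (X, X^2)$ with exact fractions, confirms that the covariance is zero, and compares $P(Y = 0 \mid X = 0)$ with $P(Y = 0)$.

```python
from fractions import Fraction as F

p = F(1, 5)  # hypothetical value of p; any 0 < p < 1/2 works
pmf_x = {-1: p, 0: 1 - 2 * p, 1: p}

# Joint pmf of (X, Y) with Y = X^2: all mass sits on the points (x, x^2).
joint = {(x, x * x): q for x, q in pmf_x.items()}


def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(g(x, y) * q for (x, y), q in joint.items())


cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

p_y0 = sum(q for (x, y), q in joint.items() if y == 0)
p_x0 = sum(q for (x, y), q in joint.items() if x == 0)
p_y0_given_x0 = joint[(0, 0)] / p_x0

print(cov_xy, p_y0, p_y0_given_x0)  # 0 3/5 1
```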

2.5 Multivariate Case

The extension of the concepts for the bivariate discrete case to the multivariate discrete case is straightforward. We give the appropriate definitions and an important example, namely that of the multinomial distribution, an extension of the binomial distribution.

Definition 2.9. Let $X_1, X_2, \ldots, X_n$ be discrete random variables defined on a common sample space $\Omega$, with $X_i$ taking values in some countable set $\mathcal{X}_i$. The joint pmf of $(X_1, X_2, \ldots, X_n)$ is defined as $p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n)$, $x_i \in \mathcal{X}_i$, and zero otherwise.

Definition 2.10. Let $X_1, X_2, \ldots, X_n$ be random variables defined on a common sample space $\Omega$. The joint CDF of $X_1, X_2, \ldots, X_n$ is defined as $F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$, $x_1, x_2, \ldots, x_n \in \mathbb{R}$.
The requirements of a joint pmf are the usual:
(i) $p(x_1, x_2, \ldots, x_n) \ge 0 \;\; \forall\, x_1, x_2, \ldots, x_n \in \mathbb{R}$;
(ii) $\sum_{x_1 \in \mathcal{X}_1, \ldots, x_n \in \mathcal{X}_n} p(x_1, x_2, \ldots, x_n) = 1$.

The requirements of a joint CDF are somewhat more complicated. They are that
(i) $0 \le F \le 1 \;\; \forall\, (x_1, \ldots, x_n)$;
(ii) $F$ is nondecreasing in each coordinate;
(iii) $F$ equals zero if one or more of the $x_i = -\infty$;
(iv) $F$ equals one if all the $x_i = +\infty$;
(v) $F$ assigns a nonnegative probability to every $n$-dimensional rectangle
$$[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n].$$

This last condition, (v), is notationally clumsy to write down. If $n = 2$, it reduces to the simple inequality
$$F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2) \ge 0 \quad \forall\, a_1 \le b_1,\ a_2 \le b_2.$$
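For $n = 2$, the left side of this inequality is exactly $P(a_1 < X_1 \le b_1,\ a_2 < X_2 \le b_2)$, which is why it cannot be negative. A small Python sketch checking this identity over a grid of rectangle corners, for a hypothetical joint pmf with $p(x_1, x_2)$ proportional to $x_1 + x_2$ on $\{1, 2, 3\}^2$:

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint pmf of (X1, X2) on {1, 2, 3} x {1, 2, 3}, proportional to x1 + x2.
support = list(product((1, 2, 3), repeat=2))
total = sum(x1 + x2 for x1, x2 in support)
pmf = {(x1, x2): F(x1 + x2, total) for x1, x2 in support}


def cdf(s, t):
    """Joint CDF F(s, t) = P(X1 <= s, X2 <= t)."""
    return sum(p for (x1, x2), p in pmf.items() if x1 <= s and x2 <= t)


def rectangle_mass(a1, b1, a2, b2):
    """F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)."""
    return cdf(b1, b2) - cdf(a1, b2) - cdf(b1, a2) + cdf(a1, a2)


# Compare with the direct probability P(a1 < X1 <= b1, a2 < X2 <= b2).
grid = (0, 1, 1.5, 2, 3, 4)
for a1, b1, a2, b2 in product(grid, repeat=4):
    if a1 <= b1 and a2 <= b2:
        direct = sum(p for (x1, x2), p in pmf.items() if a1 < x1 <= b1 and a2 < x2 <= b2)
        assert rectangle_mass(a1, b1, a2, b2) == direct
        assert direct >= 0
```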

Once again, we mention that it is not convenient or interesting to work with the
CDF for discrete random variables; for discrete variables, it is preferable to work
with the pmf.

2.5.1 Joint MGF

Analogous to the case of one random variable, we can define the joint mgf for several random variables. The definition is the same for all types of random variables, discrete or continuous, or other mixed types. As in the one-dimensional case, the joint mgf of several random variables is a very useful tool. First, we repeat the definition of the expectation of a function of several random variables; see Chapter 1, where it was first introduced and defined. The definition below is equivalent to the one given in Chapter 1.

Definition 2.11. Let $X_1, X_2, \ldots, X_n$ be discrete random variables defined on a common sample space $\Omega$, with $X_i$ taking values in some countable set $\mathcal{X}_i$.

Let the joint pmf of $X_1, X_2, \ldots, X_n$ be $p(x_1, \ldots, x_n)$. Let $g(x_1, \ldots, x_n)$ be a real-valued function of $n$ variables. We say that $E[g(X_1, X_2, \ldots, X_n)]$ exists if $\sum_{x_1 \in \mathcal{X}_1, \ldots, x_n \in \mathcal{X}_n} |g(x_1, \ldots, x_n)|\,p(x_1, \ldots, x_n) < \infty$, in which case the expectation is defined as
$$E[g(X_1, X_2, \ldots, X_n)] = \sum_{x_1 \in \mathcal{X}_1, \ldots, x_n \in \mathcal{X}_n} g(x_1, \ldots, x_n)\,p(x_1, \ldots, x_n).$$

A corresponding definition when $X_1, X_2, \ldots, X_n$ are all continuous random variables is given in the next chapter.
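For a joint pmf with finitely many support points, Definition 2.11 is a plain weighted sum, and the absolute-summability condition holds automatically. A minimal sketch, using as a hypothetical example the joint pmf of three independent fair dice and $g(x_1, x_2, x_3) = \max(x_1, x_2, x_3)$; the function name `expectation` is illustrative only.

```python
from fractions import Fraction
from itertools import product


def expectation(g, pmf):
    """E[g(X_1, ..., X_n)] as the sum of g(x) p(x) over a finitely supported joint pmf.

    With finite support, the absolute-summability condition of the definition holds automatically.
    """
    return sum(g(*x) * p for x, p in pmf.items())


# Joint pmf of three independent fair dice (X1, X2, X3): each of the 216 outcomes has mass 1/216.
dice3 = {x: Fraction(1, 216) for x in product(range(1, 7), repeat=3)}

print(expectation(lambda x1, x2, x3: max(x1, x2, x3), dice3))  # 119/24
```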
Definition 2.12. Let $X_1, X_2, \ldots, X_n$ be $n$ random variables defined on a common sample space $\Omega$. The joint moment-generating function of $X_1, X_2, \ldots, X_n$ is defined to be
$$\psi(t_1, t_2, \ldots, t_n) = E\left[e^{t_1X_1 + t_2X_2 + \cdots + t_nX_n}\right] = E\left[e^{t'X}\right],$$
provided the expectation exists, and where $t'X$ denotes the inner product of the vectors $t = (t_1, \ldots, t_n)$, $X = (X_1, \ldots, X_n)$.
Note that the joint moment-generating function (mgf) always exists at the origin, namely, $t = (0, \ldots, 0)$, and equals one at that point. It may or may not exist at other points $t$. If it does exist in a nonempty open rectangle containing the origin, then many important characteristics of the joint distribution of $X_1, X_2, \ldots, X_n$ can be derived by using the joint mgf. Here is the moment-generating property of a joint mgf.
Theorem 2.7. Suppose $\psi(t_1, t_2, \ldots, t_n)$ exists in a nonempty open rectangle containing the origin $t = 0$. Then partial derivatives of $\psi(t_1, t_2, \ldots, t_n)$ of every order with respect to each $t_i$ exist in that open rectangle, and furthermore,
$$E\left(X_1^{k_1}X_2^{k_2}\cdots X_n^{k_n}\right) = \frac{\partial^{k_1 + k_2 + \cdots + k_n}}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}}\,\psi(t_1, t_2, \ldots, t_n)\Big|_{t_1 = 0,\, t_2 = 0,\, \ldots,\, t_n = 0}.$$

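A corollary of this result is sometimes useful in determining the covariance between two random variables.
Corollary. Let $X, Y$ have a joint mgf $\psi(t_1, t_2)$ in some open rectangle around the origin $(0, 0)$. Then,
$$\mathrm{Cov}(X, Y) = \frac{\partial^2}{\partial t_1\,\partial t_2}\psi(t_1, t_2)\Big|_{0,0} - \left(\frac{\partial}{\partial t_1}\psi(t_1, t_2)\Big|_{0,0}\right)\left(\frac{\partial}{\partial t_2}\psi(t_1, t_2)\Big|_{0,0}\right).$$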
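The corollary can be checked numerically. The sketch below is an illustration under stated assumptions, not the text's method: it approximates the derivatives of $\psi$ at the origin by central finite differences with a hypothetical step `h`, for a hypothetical joint pmf on $\{0, 1\}^2$, and compares the result with $E(XY) - E(X)E(Y)$ computed directly.

```python
import math

# Hypothetical joint pmf of (X, Y) on {0, 1} x {0, 1}.
pmf = {(0, 0): 0.3, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.4}


def mgf(t1, t2):
    """Joint mgf psi(t1, t2) = E[exp(t1 X + t2 Y)]."""
    return sum(math.exp(t1 * x + t2 * y) * p for (x, y), p in pmf.items())


def cov_from_mgf(psi, h=1e-3):
    """Approximate psi_12(0, 0) - psi_1(0, 0) psi_2(0, 0) by central differences."""
    d1 = (psi(h, 0) - psi(-h, 0)) / (2 * h)
    d2 = (psi(0, h) - psi(0, -h)) / (2 * h)
    d12 = (psi(h, h) - psi(h, -h) - psi(-h, h) + psi(-h, -h)) / (4 * h * h)
    return d12 - d1 * d2


ex = sum(x * p for (x, _), p in pmf.items())
ey = sum(y * p for (_, y), p in pmf.items())
exy = sum(x * y * p for (x, y), p in pmf.items())
print(cov_from_mgf(mgf), exy - ex * ey)  # both approximately 0.1
```

Finite differences only approximate the derivatives (the error here is of order $h^2$); with a symbolic or closed-form mgf, one would differentiate exactly as in the corollary.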

We also have the distribution-determining property, as in the one-dimensional case.
Theorem 2.8. Suppose $(X_1, X_2, \ldots, X_n)$ and $(Y_1, Y_2, \ldots, Y_n)$ are two sets of jointly distributed random variables, such that their mgfs $\psi_X(t_1, t_2, \ldots, t_n)$ and $\psi_Y(t_1, t_2, \ldots, t_n)$ exist and coincide in some nonempty open rectangle containing the origin. Then $(X_1, X_2, \ldots, X_n)$ and $(Y_1, Y_2, \ldots, Y_n)$ have the same joint distribution.

Remark. It is important to note that the last two theorems are not limited to discrete random variables; they are valid for general random variables. The proofs of these two theorems follow the same arguments as in the one-dimensional case, namely that when an mgf exists in a nonempty open rectangle, it can be differentiated infinitely often with respect to each variable $t_i$ inside the expectation; that is, the order of the derivative and the expectation can be interchanged.

2.5.2 Multinomial Distribution

One of the most important multivariate discrete distributions is the multinomial distribution. The multinomial distribution corresponds to $n$ balls being distributed to $k$ cells, independently, with each ball having the probability $p_i$ of being dropped into the $i$th cell. The random variables under consideration are $X_1, X_2, \ldots, X_k$, where $X_i$ is the number of balls that get dropped into the $i$th cell. Then their joint pmf is the multinomial pmf defined below.
Definition 2.13. A multivariate random vector $(X_1, X_2, \ldots, X_k)$ is said to have a multinomial distribution with parameters $n, p_1, p_2, \ldots, p_k$ if it has the pmf
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}, \qquad x_i \ge 0,\ \sum_{i=1}^k x_i = n,$$
$$p_i \ge 0, \qquad \sum_{i=1}^k p_i = 1.$$
We write $(X_1, X_2, \ldots, X_k) \sim \mathrm{Mult}(n, p_1, \ldots, p_k)$ to denote a random vector with a multinomial distribution.
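A direct transcription of this pmf into Python might look as follows. This is a minimal sketch; the function name `multinomial_pmf` and the parameters used are hypothetical, and exact `Fraction` arithmetic is used so that the pmf can be checked to sum to one exactly.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod


def multinomial_pmf(xs, n, ps):
    """P(X_1 = x_1, ..., X_k = x_k) for (X_1, ..., X_k) ~ Mult(n, p_1, ..., p_k)."""
    if any(x < 0 for x in xs) or sum(xs) != n:
        return 0
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)  # exact integer n! / (x_1! ... x_k!)
    return coef * prod(p ** x for p, x in zip(ps, xs))


# Hypothetical parameters; exact arithmetic with Fraction avoids rounding.
n, ps = 5, (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
total = sum(multinomial_pmf(xs, n, ps) for xs in product(range(n + 1), repeat=len(ps)))
print(total)  # 1

# With k = 2 cells the pmf reduces to the binomial pmf C(n, x) p^x (1 - p)^(n - x).
q = Fraction(1, 3)
assert multinomial_pmf((2, 3), 5, (q, 1 - q)) == comb(5, 2) * q ** 2 * (1 - q) ** 3
```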
Example 2.15 (Dice Rolls). Suppose a fair die is rolled 30 times. We want to find the probabilities that
(i) Each face is obtained exactly five times.
(ii) The number of sixes is at least five.
If we denote the number of times face number $i$ is obtained as $X_i$, then $(X_1, X_2, \ldots, X_6) \sim \mathrm{Mult}(n, p_1, \ldots, p_6)$, where $n = 30$ and each $p_i = \frac16$. Therefore,
$$P(X_1 = 5, X_2 = 5, \ldots, X_6 = 5) = \frac{30!}{(5!)^6}\left(\frac16\right)^5\cdots\left(\frac16\right)^5 = \frac{30!}{(5!)^6}\left(\frac16\right)^{30} = .0004.$$

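Next, each of the 30 rolls will either be a 6 or not, independently of the other rolls, with probability $\frac16$ of being a 6, and so $X_6 \sim \mathrm{Bin}(30, \frac16)$. Therefore,
$$P(X_6 \ge 5) = 1 - P(X_6 \le 4) = 1 - \sum_{x=0}^{4}\binom{30}{x}\left(\frac16\right)^x\left(\frac56\right)^{30-x} = .5757.$$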
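Both answers can be reproduced in a few lines of Python. This is a sketch using only the standard library's `math.factorial` and `math.comb`; the variable names are illustrative.

```python
from math import comb, factorial

# (i) Each face exactly five times in 30 rolls: 30! / (5!)^6 * (1/6)^30.
p_all_five = factorial(30) / factorial(5) ** 6 / 6 ** 30

# (ii) X6 ~ Bin(30, 1/6), so P(X6 >= 5) = 1 - sum_{x=0}^{4} C(30, x) (1/6)^x (5/6)^(30 - x).
p_six_at_least_5 = 1 - sum(comb(30, x) * (1 / 6) ** x * (5 / 6) ** (30 - x) for x in range(5))

print(round(p_all_five, 4), round(p_six_at_least_5, 4))  # 0.0004 0.5757
```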

Example 2.16 (Bridge). Consider a Bridge game with four players, North, South, East, and West. We want to find the probability that North and South together have two or more aces. Let $X_i$ denote the number of aces in the hand of player $i$, $i = 1, 2, 3, 4$; we let $i = 1, 2$ mean North and South. Then, we want to find $P(X_1 + X_2 \ge 2)$.
The joint distribution of $(X_1, X_2, X_3, X_4)$ is $\mathrm{Mult}(4, \frac14, \frac14, \frac14, \frac14)$ (think of each ace as a ball, and the four players as cells). Then, $(X_1 + X_2, X_3 + X_4) \sim \mathrm{Mult}(4, \frac12, \frac12)$. Therefore,
$$P(X_1 + X_2 \ge 2) = \frac{4!}{2!\,2!}\left(\frac12\right)^4 + \frac{4!}{3!\,1!}\left(\frac12\right)^4 + \frac{4!}{4!\,0!}\left(\frac12\right)^4 = \frac{11}{16}.$$
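Under the multinomial model used in this example, the answer can be cross-checked in two ways: by summing the full $\mathrm{Mult}(4, \frac14, \frac14, \frac14, \frac14)$ pmf over all outcomes with $x_1 + x_2 \ge 2$, and by using the collapsed $\mathrm{Mult}(4, \frac12, \frac12)$ distribution. A minimal sketch with exact fractions; the helper name `mult_pmf` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod

quarter = Fraction(1, 4)


def mult_pmf(xs, n, ps):
    """Mult(n, p_1, ..., p_k) pmf at xs, assuming sum(xs) == n and all entries are >= 0."""
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)
    return coef * prod(p ** x for p, x in zip(ps, xs))


# Route 1: sum the Mult(4, 1/4, 1/4, 1/4, 1/4) pmf over outcomes with x1 + x2 >= 2.
route1 = sum(mult_pmf(xs, 4, (quarter,) * 4)
             for xs in product(range(5), repeat=4)
             if sum(xs) == 4 and xs[0] + xs[1] >= 2)

# Route 2: (X1 + X2, X3 + X4) ~ Mult(4, 1/2, 1/2), i.e. X1 + X2 ~ Bin(4, 1/2).
route2 = sum(comb(4, s) * Fraction(1, 2) ** 4 for s in range(2, 5))

print(route1, route2)  # 11/16 11/16
```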
Important formulas and facts about the multinomial distribution are given in the
next theorem.

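Theorem 2.9. Let $(X_1, X_2, \ldots, X_k) \sim \mathrm{Mult}(n, p_1, p_2, \ldots, p_k)$. Then,
(a) $E(X_i) = np_i$; $\mathrm{Var}(X_i) = np_i(1 - p_i)$;
(b) $\forall\, i$, $X_i \sim \mathrm{Bin}(n, p_i)$;
(c) $\mathrm{Cov}(X_i, X_j) = -np_ip_j$, $\forall\, i \ne j$;
(d) $\rho_{X_i, X_j} = -\sqrt{\dfrac{p_ip_j}{(1 - p_i)(1 - p_j)}}$, $\forall\, i \ne j$;
(e) $\forall\, m$, $1 \le m < k$, $(X_1, X_2, \ldots, X_m) \mid (X_{m+1} + X_{m+2} + \cdots + X_k) = s \sim \mathrm{Mult}(n - s, \pi_1, \pi_2, \ldots, \pi_m)$, where $\pi_i = \dfrac{p_i}{p_1 + p_2 + \cdots + p_m}$.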

Proof. Define $W_{ir}$ as the indicator of the event that the $r$th ball lands in the $i$th cell. Note that for a given $i$, the variables $W_{ir}$ are independent. Then,
$$X_i = \sum_{r=1}^{n} W_{ir},$$
and therefore, $E(X_i) = \sum_{r=1}^{n} E[W_{ir}] = np_i$, and $\mathrm{Var}(X_i) = \sum_{r=1}^{n} \mathrm{Var}(W_{ir}) = np_i(1 - p_i)$. Part (b) follows from the definition of a multinomial experiment (the trials are identical and independent, and each ball either lands or does not land in the $i$th cell). For part (c),
$$\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}\Big(\sum_{r=1}^{n} W_{ir}, \sum_{s=1}^{n} W_{js}\Big) = \sum_{r=1}^{n}\sum_{s=1}^{n} \mathrm{Cov}(W_{ir}, W_{js}) = \sum_{r=1}^{n} \mathrm{Cov}(W_{ir}, W_{jr})$$
(because $\mathrm{Cov}(W_{ir}, W_{js})$ is zero when $s \ne r$, different balls being dropped independently)
$$= \sum_{r=1}^{n} [E(W_{ir}W_{jr}) - E(W_{ir})E(W_{jr})] = \sum_{r=1}^{n} [0 - p_ip_j] = -np_ip_j,$$
where $E(W_{ir}W_{jr}) = 0$ because a single ball cannot land in two different cells. Part (d) follows immediately from part (c) and part (a). Part (e) is a calculation, and is omitted. $\square$
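Parts (a) to (c) can also be confirmed by brute force in small cases. The sketch below enumerates the whole support of a $\mathrm{Mult}(4, \frac12, \frac13, \frac16)$ vector (hypothetical parameters) and checks the formulas with exact rational arithmetic; it is a sanity check, not a substitute for the proof.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod


def multinomial_pmf(xs, n, ps):
    """Mult(n, p_1, ..., p_k) pmf at xs, assuming the entries of xs are nonnegative integers."""
    if sum(xs) != n:
        return Fraction(0)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)
    return coef * prod(p ** x for p, x in zip(ps, xs))


n, ps = 4, (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))  # hypothetical parameters
k = len(ps)
pmf = {xs: multinomial_pmf(xs, n, ps)
       for xs in product(range(n + 1), repeat=k) if sum(xs) == n}


def expect(g):
    """E[g(X_1, ..., X_k)] by exact enumeration of the support."""
    return sum(g(xs) * p for xs, p in pmf.items())


for i in range(k):
    mean_i = expect(lambda xs: xs[i])
    var_i = expect(lambda xs: xs[i] ** 2) - mean_i ** 2
    assert mean_i == n * ps[i] and var_i == n * ps[i] * (1 - ps[i])  # part (a)
    for x in range(n + 1):                                            # part (b)
        marginal = sum(p for xs, p in pmf.items() if xs[i] == x)
        assert marginal == comb(n, x) * ps[i] ** x * (1 - ps[i]) ** (n - x)
    for j in range(k):                                                # part (c)
        if j != i:
            cov_ij = expect(lambda xs: xs[i] * xs[j]) - mean_i * expect(lambda xs: xs[j])
            assert cov_ij == -n * ps[i] * ps[j]
print("parts (a), (b), (c) hold exactly for n =", n, "and p =", [str(p) for p in ps])
```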
Example 2.17 (MGF of the Multinomial Distribution). Let $(X_1, X_2, \ldots, X_k) \sim \mathrm{Mult}(n, p_1, p_2, \ldots, p_k)$. Then the mgf $\psi(t_1, t_2, \ldots, t_k)$ exists at all $t$, and a formula follows easily. Indeed,
$$E\left[e^{t_1X_1 + \cdots + t_kX_k}\right] = \sum_{x_i \ge 0,\ \sum_{i=1}^k x_i = n} \frac{n!}{x_1!\cdots x_k!}\,e^{t_1x_1}e^{t_2x_2}\cdots e^{t_kx_k}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$$
$$= \sum_{x_i \ge 0,\ \sum_{i=1}^k x_i = n} \frac{n!}{x_1!\cdots x_k!}\,(p_1e^{t_1})^{x_1}(p_2e^{t_2})^{x_2}\cdots(p_ke^{t_k})^{x_k} = (p_1e^{t_1} + p_2e^{t_2} + \cdots + p_ke^{t_k})^n,$$
by the multinomial expansion identity
$$(a_1 + a_2 + \cdots + a_k)^n = \sum_{x_i \ge 0,\ \sum_{i=1}^k x_i = n} \frac{n!}{x_1!\cdots x_k!}\,a_1^{x_1}a_2^{x_2}\cdots a_k^{x_k}.$$
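The closed form can be compared with the defining sum numerically. A minimal sketch, with hypothetical values of $n$, the $p_i$, and the $t_i$; the function names are illustrative only.

```python
import math
from itertools import product


def mgf_brute_force(ts, n, ps):
    """E[exp(t_1 X_1 + ... + t_k X_k)] by summing over all (x_1, ..., x_k) with sum n."""
    total = 0.0
    for xs in product(range(n + 1), repeat=len(ps)):
        if sum(xs) != n:
            continue
        coef = math.factorial(n)
        for x in xs:
            coef //= math.factorial(x)
        total += coef * math.prod((p * math.exp(t)) ** x for p, t, x in zip(ps, ts, xs))
    return total


def mgf_closed_form(ts, n, ps):
    """(p_1 e^{t_1} + ... + p_k e^{t_k})^n."""
    return sum(p * math.exp(t) for p, t in zip(ps, ts)) ** n


n, ps, ts = 5, (0.2, 0.3, 0.5), (0.4, -1.0, 0.7)  # hypothetical values
brute, closed = mgf_brute_force(ts, n, ps), mgf_closed_form(ts, n, ps)
print(brute, closed)  # equal up to floating-point rounding
```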

2.6 The Poissonization Technique

Calculation of complex multinomial probabilities often gets technically simplified by taking the number of balls to be a random variable, specifically, a Poisson random variable. We give the Poissonization theorem and some examples in this section.

Theorem 2.10. Let $N \sim \mathrm{Poi}(\lambda)$, and suppose given $N = n$, $(X_1, X_2, \ldots, X_k) \sim \mathrm{Mult}(n, p_1, p_2, \ldots, p_k)$. Then, marginally, $X_1, X_2, \ldots, X_k$ are independent Poisson, with $X_i \sim \mathrm{Poi}(\lambda p_i)$.

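Proof. By the total probability formula,
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \sum_{n=0}^{\infty} P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k \mid N = n)\,\frac{e^{-\lambda}\lambda^n}{n!}$$
$$= \sum_{n=0}^{\infty} \frac{(x_1 + x_2 + \cdots + x_k)!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}\,I_{n = x_1 + x_2 + \cdots + x_k}\,\frac{e^{-\lambda}\lambda^n}{n!}$$
$$= e^{-\lambda}\,\frac{\lambda^{x_1}\lambda^{x_2}\cdots\lambda^{x_k}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}}{x_1!\,x_2!\cdots x_k!} = e^{-\lambda}\,\frac{(\lambda p_1)^{x_1}(\lambda p_2)^{x_2}\cdots(\lambda p_k)^{x_k}}{x_1!\,x_2!\cdots x_k!} = \prod_{i=1}^{k}\frac{e^{-\lambda p_i}(\lambda p_i)^{x_i}}{x_i!},$$
where the last step uses $e^{-\lambda} = \prod_{i=1}^k e^{-\lambda p_i}$, which holds because $\sum_{i=1}^k p_i = 1$. This establishes that the joint marginal pmf of $(X_1, X_2, \ldots, X_k)$ is the product of $k$ Poisson pmfs, and so $X_1, X_2, \ldots, X_k$ must be marginally independent, with $X_i \sim \mathrm{Poi}(\lambda p_i)$. $\square$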
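Theorem 2.10 can be checked numerically for $k = 3$. The sketch below evaluates $P(X_1 = x_1, X_2 = x_2)$ from the total probability series (summing out $X_3$ and truncating the series at a hypothetical cutoff `n_max`) and compares it with the product of the $\mathrm{Poi}(\lambda p_1)$ and $\mathrm{Poi}(\lambda p_2)$ pmfs. The values of $\lambda$ and the $p_i$ are hypothetical.

```python
import math


def poisson_pmf(x, mu):
    """P(Z = x) for Z ~ Poi(mu)."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


def multinomial_pmf(xs, n, ps):
    """Mult(n, p_1, ..., p_k) pmf at xs (entries assumed to be nonnegative integers)."""
    if sum(xs) != n:
        return 0.0
    coef = math.factorial(n)
    for x in xs:
        coef //= math.factorial(x)
    return coef * math.prod(p ** x for p, x in zip(ps, xs))


def poissonized_pair(x1, x2, lam, ps, n_max=100):
    """P(X_1 = x1, X_2 = x2) when N ~ Poi(lam) and, given N = n, (X_1, X_2, X_3) ~ Mult(n, p_1, p_2, p_3).

    Sums the total-probability series over n with X_3 = n - x1 - x2 summed out;
    the cutoff n_max is a hypothetical truncation of the infinite series.
    """
    return sum(poisson_pmf(n, lam) * multinomial_pmf((x1, x2, n - x1 - x2), n, ps)
               for n in range(x1 + x2, n_max + 1))


lam, ps = 3.0, (0.2, 0.3, 0.5)  # hypothetical values
lhs = poissonized_pair(1, 2, lam, ps)
rhs = poisson_pmf(1, lam * ps[0]) * poisson_pmf(2, lam * ps[1])
print(lhs, rhs)  # agree up to floating-point rounding
```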

Corollary. Let $A$ be a set in the $k$-dimensional Euclidean space $\mathbb{R}^k$. Let $(Y_1, Y_2, \ldots, Y_k) \sim \mathrm{Mult}(n, p_1, p_2, \ldots, p_k)$. Then $P((Y_1, Y_2, \ldots, Y_k) \in A)$ equals $n!\,c(n)$, where $c(n)$ is the coefficient of $\lambda^n$ in the power series expansion of $e^{\lambda}P((X_1, X_2, \ldots, X_k) \in A)$. Here $X_1, X_2, \ldots, X_k$ are as above: they are independent Poisson variables, with $X_i \sim \mathrm{Poi}(\lambda p_i)$.
The corollary is simply a restatement of the identity
$$P((X_1, X_2, \ldots, X_k) \in A) = \sum_{n=0}^{\infty} \frac{e^{-\lambda}\lambda^n}{n!}\,P((Y_1, Y_2, \ldots, Y_k) \in A),$$
in which the $n$th term uses $(Y_1, Y_2, \ldots, Y_k) \sim \mathrm{Mult}(n, p_1, p_2, \ldots, p_k)$.

Example 2.18 (No Empty Cells). Suppose $n$ balls are distributed independently and at random into $k$ cells. We want to find a formula for the probability that no cell remains empty.
We use the Poissonization technique to solve this problem. We want a formula for $P(Y_1 \ne 0, Y_2 \ne 0, \ldots, Y_k \ne 0)$, where $(Y_1, Y_2, \ldots, Y_k) \sim \mathrm{Mult}(n, \frac1k, \ldots, \frac1k)$.

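Marginally, each $X_i \sim \mathrm{Poi}(\frac{\lambda}{k})$, and therefore,
$$P(X_1 > 0, X_2 > 0, \ldots, X_k > 0) = (1 - e^{-\lambda/k})^k$$
$$\Rightarrow\ e^{\lambda}P(X_1 > 0, X_2 > 0, \ldots, X_k > 0) = e^{\lambda}(1 - e^{-\lambda/k})^k = \sum_{x=0}^{k}\binom{k}{x}(-1)^x e^{\lambda(1 - x/k)}$$
$$= \sum_{x=0}^{k}\binom{k}{x}(-1)^x\sum_{n=0}^{\infty}\frac{(\lambda(1 - x/k))^n}{n!} = \sum_{n=0}^{\infty}\frac{\lambda^n}{n!}\left[\sum_{x=0}^{k}(-1)^x\binom{k}{x}(1 - x/k)^n\right].$$
Therefore, by the above corollary,
$$P(Y_1 \ne 0, Y_2 \ne 0, \ldots, Y_k \ne 0) = \sum_{x=0}^{k}(-1)^x\binom{k}{x}(1 - x/k)^n.$$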
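The final formula can be checked against brute-force enumeration of all $k^n$ equally likely assignments of balls to cells. A minimal sketch with exact fractions, for a few hypothetical small values of $n$ and $k$; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_no_empty_formula(n, k):
    """Sum_{x=0}^{k} (-1)^x C(k, x) (1 - x/k)^n, computed exactly."""
    return sum((-1) ** x * comb(k, x) * (1 - Fraction(x, k)) ** n for x in range(k + 1))


def p_no_empty_brute_force(n, k):
    """Fraction of the k^n equally likely ball-to-cell assignments that leave no cell empty."""
    hits = sum(1 for cells in product(range(k), repeat=n) if len(set(cells)) == k)
    return Fraction(hits, k ** n)


for n, k in [(3, 3), (5, 3), (6, 4), (4, 5)]:
    assert p_no_empty_formula(n, k) == p_no_empty_brute_force(n, k)
print(p_no_empty_formula(5, 3))  # 50/81
```

For $n = 5$, $k = 3$, the formula reads $1 - 3(\frac23)^5 + 3(\frac13)^5 = \frac{150}{243} = \frac{50}{81}$.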

Exercises

Exercise 2.1. Consider the experiment of picking one word at random from the
sentence
ALL IS WELL IN THE NEWELL FAMILY
Let $X$ be the length of the word selected and $Y$ the number of Ls in it. Find in a tabular form the joint pmf of $X$ and $Y$, their marginal pmfs, means, and variances, and the correlation between $X$ and $Y$.
Exercise 2.2. A fair coin is tossed four times. Let $X$ be the number of heads, $Z$ the number of tails, and $Y = |X - Z|$. Find the joint pmf of $(X, Y)$, and $E(Y)$.
Exercise 2.3. Consider the joint pmf $p(x, y) = cxy$, $1 \le x \le 3$, $1 \le y \le 3$.
(a) Find the normalizing constant $c$.
(b) Are $X, Y$ independent? Prove your claim.
(c) Find the expectations of $X$, $Y$, $XY$.
Exercise 2.4. Consider the joint pmf $p(x, y) = cxy$, $1 \le x \le y \le 3$.
(a) Find the normalizing constant $c$.
(b) Are $X, Y$ independent? Prove your claim.
(c) Find the expectations of $X$, $Y$, $XY$.
Exercise 2.5. A fair die is rolled twice. Let $X$ be the maximum and $Y$ the minimum of the two rolls. By using the joint pmf of $(X, Y)$ worked out in the text, find the pmf of $\frac{X}{Y}$, and hence the mean of $\frac{X}{Y}$.
Exercise 2.6. A hat contains four slips of paper, numbered 1, 2, 3, and 4. Two slips are drawn at random, without replacement. $X$ is the number on the first slip and $Y$ the sum of the two numbers drawn. Write in a tabular form the joint pmf of $(X, Y)$. Hence find the marginal pmfs. Are $X, Y$ independent?

Exercise 2.7 * (Conditional Expectation in Bridge). Let $X$ be the number of clubs in the hand of North and $Y$ the number of clubs in the hand of South in a Bridge game. Write a general formula for $E(X \mid Y = y)$, and compute $E(X \mid Y = 3)$. How about $E(Y \mid X = 3)$?

Exercise 2.8. A fair die is rolled four times. Find the probabilities that:
(a) At least one six is obtained;
(b) Exactly one six and exactly one two are obtained;
(c) Exactly one six, one two, and two fours are obtained.

Exercise 2.9 (Iterated Expectation). A household has a Poisson number of cars with mean 1. Each car that a household possesses has, independently of the other cars, a 20% chance of being an SUV. Find the mean number of SUVs a household possesses.

Exercise 2.10 (Iterated Variance). Suppose $N \sim \mathrm{Poi}(\lambda)$, and given $N = n$, $X$ is uniformly distributed on $\{0, 1, \ldots, n\}$. Find the variance of the marginal distribution of $X$.

Exercise 2.11. Suppose $X$ and $Y$ are independent $\mathrm{Geo}(p)$ random variables. Find $P(X \ge Y)$; $P(X > Y)$.

Exercise 2.12. * Suppose $X$ and $Y$ are independent $\mathrm{Poi}(\lambda)$ random variables. Find $P(X \ge Y)$; $P(X > Y)$.

Hint: This involves a Bessel function of a suitable kind.

Exercise 2.13. Suppose $X$ and $Y$ are independent and take the values 1, 2, 3, 4 with probabilities .2, .3, .3, .2. Find the pmf of $X + Y$.

Exercise 2.14. Two random variables have the joint pmf $p(x, x + 1) = \frac{1}{n+1}$, $x = 0, 1, \ldots, n$. Answer the following questions with as little calculation as possible.
(a) Are $X, Y$ independent?
(b) What is the variance of $Y - X$?
(c) What is $\mathrm{Var}(Y \mid X = 1)$?

Exercise 2.15 (Binomial Conditional Distribution). Suppose $X, Y$ are independent random variables, and that $X \sim \mathrm{Bin}(m, p)$, $Y \sim \mathrm{Bin}(n, p)$. Show that the conditional distribution of $X$ given $X + Y = t$ is a hypergeometric distribution; identify the parameters of this hypergeometric distribution.

Exercise 2.16 * (Poly-Hypergeometric Distribution). A box has $D_1$ red, $D_2$ green, and $D_3$ blue balls. Suppose $n$ balls are picked at random without replacement from the box. Let $X, Y, Z$ be the number of red, green, and blue balls selected. Find the joint pmf of $(X, Y, Z)$.

Exercise 2.17 (Bivariate Poisson). Suppose $U, V, W$ are independent Poisson random variables, with means $\lambda, \mu, \nu$. Let $X = U + W$; $Y = V + W$.
(a) Find the marginal pmfs of $X, Y$.
(b) Find the joint pmf of $(X, Y)$.

Exercise 2.18. Suppose a fair die is rolled twice. Let $X, Y$ be the two rolls. Find the following with as little calculation as possible:
(a) $E(X + Y \mid Y = y)$.
(b) $E(XY \mid Y = y)$.
(c) $\mathrm{Var}(X^2Y \mid Y = y)$.
(d) $\rho_{X+Y,\, X-Y}$.

Exercise 2.19 (A Waiting Time Problem). In repeated throws of a fair die, let $X$ be the throw in which the first six is obtained, and $Y$ the throw in which the second six is obtained.
(a) Find the joint pmf of $(X, Y)$.
(b) Find the expectation of $Y - X$.
(c) Find $E(Y - X \mid X = 8)$.
(d) Find $\mathrm{Var}(Y - X \mid X = 8)$.

Exercise 2.20 * (Family Planning). A couple want to have a child of each sex, but they will have at most four children. Let $X$ be the total number of children they will have and $Y$ the number of girls at the second childbirth. Find the joint pmf of $(X, Y)$, and the conditional expectation of $X$ given $Y = y$, $y = 0, 2$.

Exercise 2.21 (A Standard Deviation Inequality). Let $X, Y$ be two random variables. Show that $\sigma_{X+Y} \le \sigma_X + \sigma_Y$.

Exercise 2.22 * (A Covariance Fact). Let $X, Y$ be two random variables. Suppose $E(X \mid Y = y)$ is nondecreasing in $y$. Show that $\rho_{X,Y} \ge 0$, assuming the correlation exists.

Exercise 2.23 (Another Covariance Fact). Let $X, Y$ be two random variables. Suppose $E(X \mid Y = y)$ is a finite constant $c$. Show that $\mathrm{Cov}(X, Y) = 0$.

Exercise 2.24 (Two-Valued Random Variables). Suppose $X, Y$ are both two-valued random variables. Prove that $X$ and $Y$ are independent if and only if they have a zero correlation.

Exercise 2.25 * (A Correlation Inequality). Suppose $X, Y$ each have mean 0 and variance 1, and a correlation $\rho$. Show that $E(\max\{X^2, Y^2\}) \le 1 + \sqrt{1 - \rho^2}$.

Exercise 2.26 (A Covariance Inequality). Let $X$ be any random variable, and $g(X), h(X)$ two functions such that they are both nondecreasing or both nonincreasing. Show that $\mathrm{Cov}(g(X), h(X)) \ge 0$.

Exercise 2.27 (Joint MGF). Suppose a fair die is rolled four times. Let $X$ be the number of ones and $Y$ the number of sixes. Find the joint mgf of $X$ and $Y$, and hence the covariance between $X$ and $Y$.

Exercise 2.28 (MGF of Bivariate Poisson). Suppose $U, V, W$ are independent Poisson random variables, with means $\lambda, \mu, \nu$. Let $X = U + W$; $Y = V + W$. Find the joint mgf of $X, Y$, and hence $E(XY)$.

Exercise 2.29 (Joint MGF). In repeated throws of a fair die, let $X$ be the throw in which the first six is obtained, and $Y$ the throw in which the second six is obtained. Find the joint mgf of $X, Y$, and hence the covariance between $X$ and $Y$.

Exercise 2.30 * (Poissonization). A fair die is rolled 30 times. By using the Poissonization theorem, find the probability that the maximum number of times any face appears is 9 or more.

Exercise 2.31 * (Poissonization). Individuals can be of one of three genotypes in a population. Each genotype has the same percentage of individuals. A sample of $n$ individuals from the population will be taken. What is the smallest $n$ for which, with probability $\ge .9$, there are at least five individuals of each genotype in the sample?
