Université Paris Dauphine 2019–2020
Département MIDO
Licence 3 – Statistical modelling
Tutorial n°1
stoehr@ceremade.dauphine.fr
Exercise 1 (Probability or Statistics?). Let X_1, ..., X_n be i.i.d. random variables and x_1, ..., x_n be
realisations, or observations, of the latter random variables. Which of the following quantities are random?
1. max{x_1, ..., x_n},
2. the sample size n,
3. (1/n) ∑_{i=1}^{n} X_i,
4. min{X_1, ..., X_n},
5. (1/(n−1)) ∑_{i=1}^{n} (x_i − x̄_n)².
Solution Exercise 1. Quantities 3. and 4. are functions of the random variables X_1, ..., X_n, hence they
are random. The other quantities are functions of the observed data: they are real numbers, not random.
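The distinction can be illustrated with a short simulation (a Python sketch; the normal distribution and the sample sizes are arbitrary choices, not part of the exercise): a summary of one observed dataset is a fixed number, while the same summary recomputed on fresh samples varies from sample to sample.

```python
import random

random.seed(0)

# One observed dataset: its summaries are plain real numbers
x = [random.gauss(0, 1) for _ in range(10)]
observed_max = max(x)   # fixed, identical every time we recompute it

# The estimator (1/n) sum X_i is a random variable: it varies across samples
sample_means = []
for _ in range(5):
    fresh = [random.gauss(0, 1) for _ in range(10)]
    sample_means.append(sum(fresh) / len(fresh))
```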
Exercise 2 (True or False?). Let X and Y be integrable random variables. Which of the following
statements are correct? Justify your answer with a brief proof or a counterexample.
1. If X is symmetric with respect to 0, then E[X] = 0.
2. E[1/X] = 1/E[X].
3. E[X]² ≤ E[X²].
4. E[XY] = E[X] E[Y].
5. Var[X + Y] = Var[X] + Var[Y].
Solution Exercise 2.
1. True. If X is symmetric with respect to 0, then X and −X have the same distribution. Then
E [X ] = E [−X ] = −E [X ] and hence E [X ] = 0.
2. False. Let X follow the uniform distribution on [0, 1]. Then E[1/X] = ∫_0^1 (1/x) dx = +∞, while
1/E[X] = 2.
3. True. Jensen's inequality applied to the convex function x ↦ x² gives the result.
Alternative solution. The variance satisfies Var[X] = E[(X − E[X])²] ≥ 0 and expands to
Var[X] = E[X²] − E[X]². Hence E[X²] − E[X]² ≥ 0.
4. False. This result is only true under the independence assumption. For instance, set X = Y ∼
B(p) with p ∈ ]0, 1[. Then E[XY] = E[X²] = p ≠ p² = E[X]² = E[X]E[Y].
5. False. This result is only true under the independence assumption. For instance, let X be
a random variable with Var[X] > 0 and set Y = X. Then Var[X + Y] = Var[2X] = 4 Var[X] ≠
2 Var[X] = Var[X] + Var[Y].
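The two counterexamples can be checked by direct computation (a Python sketch; the value p = 0.3 is an arbitrary choice in ]0, 1[):

```python
# Counterexample for statement 4: X = Y ~ B(p)
p = 0.3
E_XY = p                  # XY = X^2 = X for a 0/1 variable, so E[XY] = p
E_X_times_E_Y = p * p     # E[X] E[Y] = p^2

# Counterexample for statement 5: Y = X, so X + Y = 2X
var_X = p * (1 - p)
var_sum = 4 * var_X          # Var[X + Y] = Var[2X] = 4 Var[X]
var_if_additive = 2 * var_X  # Var[X] + Var[Y] = 2 Var[X]
```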
Exercise 3 (Multinomial distribution). A population is divided into K groups. We denote by p_1, ..., p_K
the proportions of individuals in each group, with p_1, ..., p_K ∈ [0, 1] and ∑_{i=1}^{K} p_i = 1. We draw
n individuals from this population with replacement. Let N_i denote the number of individuals belonging
to group i, i = 1, ..., K, among the n individuals drawn.
1. Give, with a justification, the distribution of (N1 , . . . , NK ).
2. Give the marginal distribution of Ni , i = 1, . . . , K .
3. Give the R command to use to run this experiment.
Solution Exercise 3.
1. We (randomly) observe n individuals in the population and denote by X_i, i = 1, ..., n, the group
of individual i. Then (X_1, ..., X_n) is a set of i.i.d. random variables with distribution
P[X_i = k] = p_k,  k = 1, ..., K.
For any k = 1, ..., K,
N_k = ∑_{i=1}^{n} 1{X_i = k}   and   n = ∑_{k=1}^{K} N_k.
We first need to determine the number of possible orderings with N_1 = n_1 individuals
from group 1, ..., N_K = n_K individuals from group K. If all the elements were distinct, the
number of permutations of (X_1, ..., X_n) would be n!. But we need to take into account repetitions
in (X_1, ..., X_n): individuals belonging to the same group can be exchanged in the sequence.
Hence the number of possible orderings is
n! / (n_1! ⋯ n_K!).
The probability of any specific ordering with N_1 = n_1 individuals from group 1, ..., N_K = n_K
individuals from group K is p_1^{n_1} ⋯ p_K^{n_K}. The distribution of (N_1, ..., N_K) is then given by
P[N_1 = n_1, ..., N_K = n_K] = (n! / (n_1! ⋯ n_K!)) p_1^{n_1} ⋯ p_K^{n_K}.
2. The marginal distribution of N_i, i = 1, ..., K, is the Binomial distribution B(n, p_i): each of the n independent draws belongs to group i with probability p_i.
3. We can use the function sample.
sample(1:K, n, replace = TRUE, prob = c(p1, ..., pK))
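For readers working in Python rather than R, the same draw-with-replacement experiment can be sketched with the standard library (the proportions and sample size below are illustrative placeholders, not values from the exercise):

```python
import random

random.seed(1)

# Hypothetical group proportions p_1, ..., p_K (placeholders)
p = [0.2, 0.3, 0.5]
K, n = len(p), 1000

# One multinomial experiment: n draws with replacement, then count per group
draws = random.choices(range(1, K + 1), weights=p, k=n)
counts = [draws.count(k) for k in range(1, K + 1)]
# counts is one realisation of (N_1, ..., N_K); the counts always sum to n,
# and counts[i] fluctuates around its Binomial mean n * p[i]
```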
Exercise 4. We consider a system made of two different machines working in series, that is the system
works as long as both machines work. Let X_1 and X_2 denote the lifetimes of the two machines and Z the
lifetime of the system. We assume that the random variables X 1 and X 2 are independent and follow an
exponential distribution with respective parameters λ1 and λ2 .
1. Compute the probability that the system breaks down after time t ≥ 0 and deduce the distribution of
Z.
2. Compute the probability that the breakdown is due to a failure of machine 1.
3. Let Y be a random variable such that Y = 1 if the failure is due to machine 1 and Y = 0 otherwise.
(a) Compute P [Z > t , Y = 1] for all t ≥ 0.
(b) Deduce that Z and Y are independent.
4. We have n identical systems working independently and we observe their lifetimes Z1 , . . . , Zn . Which
statistical model would you choose to perform a statistical analysis of lifetime performances of such
a system?
Solution Exercise 4.
1. Given t ∈ ℝ₊, since X_1 and X_2 are independent,
P[Z > t] = P[X_1 > t, X_2 > t] = P[X_1 > t] P[X_2 > t] = exp(−{λ_1 + λ_2}t).
Hence Z is distributed according to an exponential distribution E(λ_1 + λ_2).
2. The probability that the breakdown is due to a failure of machine 1 is given by
P[X_2 > X_1] = ∫_{ℝ₊²} 1{x_2 > x_1} λ_1 e^{−λ_1 x_1} λ_2 e^{−λ_2 x_2} dx_2 dx_1   (X_1 and X_2 are independent)
= ∫_{ℝ₊} λ_1 e^{−λ_1 x_1} ( ∫_{x_1}^{+∞} λ_2 e^{−λ_2 x_2} dx_2 ) dx_1
= ∫_{ℝ₊} λ_1 e^{−(λ_1 + λ_2) x_1} dx_1 = λ_1 / (λ_1 + λ_2).
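Both results can be cross-checked by Monte Carlo simulation (a Python sketch; the rates λ_1 = 1 and λ_2 = 2 are arbitrary choices, not fixed by the exercise):

```python
import random

random.seed(2)
lam1, lam2 = 1.0, 2.0   # hypothetical failure rates
N = 100_000

z_sum = 0.0
machine1_first = 0
for _ in range(N):
    x1 = random.expovariate(lam1)   # lifetime of machine 1
    x2 = random.expovariate(lam2)   # lifetime of machine 2
    z_sum += min(x1, x2)            # system lifetime Z
    machine1_first += (x1 < x2)     # breakdown caused by machine 1

mean_z = z_sum / N               # should approach 1 / (lam1 + lam2) = 1/3
p_machine1 = machine1_first / N  # should approach lam1 / (lam1 + lam2) = 1/3
```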
3. (a) For all t ≥ 0,
P[Z > t, Y = 1] = P[X_2 > X_1 > t] = ∫_{t}^{+∞} λ_1 e^{−λ_1 x_1} ( ∫_{x_1}^{+∞} λ_2 e^{−λ_2 x_2} dx_2 ) dx_1
= (λ_1 / (λ_1 + λ_2)) exp(−{λ_1 + λ_2}t).
(b) From the answers to questions 1 and 2, we have
P[Z > t, Y = 1] = P[Z > t] P[X_2 > X_1] = P[Z > t] P[Y = 1].
The law of total probability then implies that P[Z > t, Y = 0] = P[Z > t] P[Y = 0], and
hence the random variables Z and Y are independent (the cumulative distribution function
of (Z, Y) factorises as F(t, s) = P[Z ≤ t] P[Y ≤ s] for all (t, s) ∈ ℝ²).
4. By question 1, the lifetimes Z_1, ..., Z_n are i.i.d. exponential random variables. A natural
statistical model is thus the parametric model of n i.i.d. observations from E(λ), with unknown
parameter λ > 0.
¦ To do at Home ¦
Exercise 5. Let X be a real random variable with density
f_X(x) = (5/x²) 1{x > 5}.
Compute the following quantities:
1. P [X > 20], 2. F X (t ), for all t ∈ R, 3. E [X ].
Solution Exercise 5.
1. P[X > 20] = ∫_{20}^{+∞} f_X(x) dx = ∫_{20}^{+∞} (5/x²) dx = 1/4.
2. For all t ∈ ℝ,
F_X(t) = ∫_{−∞}^{t} f_X(x) dx = 0 if t < 5, and F_X(t) = 1 − 5/t otherwise.
3. The expectation is not finite: x ↦ x f_X(x) = 5/x is not integrable in a neighbourhood of +∞, so E[X] = +∞.
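The value of question 1 can be checked numerically by sampling from the density through the inverse of the cumulative distribution function computed in question 2 (a Python sketch): for u ∈ [0, 1), solving u = 1 − 5/t gives t = 5/(1 − u).

```python
import random

random.seed(3)
N = 100_000

# Inverse-CDF sampling: F_X(t) = 1 - 5/t for t > 5, so F_X^{-1}(u) = 5 / (1 - u);
# feeding uniform draws through it yields samples from the density 5/x^2 on (5, inf)
samples = [5.0 / (1.0 - random.random()) for _ in range(N)]

p_gt_20 = sum(x > 20 for x in samples) / N   # should approach P[X > 20] = 1/4
# Note: the sample mean of `samples` does NOT stabilise, since E[X] is infinite
```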
Exercise 6. Let X be a random variable following the uniform distribution on [−π/2 , π/2]. Determine
the distribution of tan(X ).
Solution Exercise 6. Let g be a measurable and positive function on ℝ. Since the change of
variable t ↦ arctan(t) is a C¹-diffeomorphism between the open sets ℝ and ]−π/2, π/2[,
E[g(tan(X))] = ∫_{−π/2}^{π/2} g(tan(x)) (1/π) dx = ∫_ℝ g(t) · 1/(π(1 + t²)) dt,
where the integrand t ↦ g(t)/(π(1 + t²)) is measurable and positive. Since this holds for every
such g, tan(X) is distributed according to a Cauchy distribution with location parameter x_0 = 0
and scale parameter a = 1.
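A quick Monte Carlo check (a Python sketch): the empirical distribution of tan(X) should match the standard Cauchy cumulative distribution function F(c) = 1/2 + arctan(c)/π, so in particular about half the draws fall below 0 and about three quarters below 1.

```python
import math
import random

random.seed(4)
N = 100_000

# X uniform on ]-pi/2, pi/2[; tan(X) should follow the standard Cauchy law
t = [math.tan(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(N)]

# Standard Cauchy CDF: F(c) = 1/2 + arctan(c)/pi, so F(0) = 1/2 and F(1) = 3/4
p_le_0 = sum(v <= 0 for v in t) / N
p_le_1 = sum(v <= 1 for v in t) / N
```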
Exercise 7 (Characteristic function). Let Y be a real random variable and Z a random variable,
independent of Y , such that
P[Z = 1] = P[Z = −1] = 1/2.
1. (a) Show that the law of X = Z Y is symmetric.
(b) Compute the characteristic function of X in terms of the characteristic function of Y.
2. Let X be a random variable following the standard Laplace law:
f X (x) = 0.5 exp(−|x|).
Show that, for every real t,
Φ_X(t) = 1 / (1 + t²).
Solution Exercise 7.
1. (a) Since Z and Y are independent and {Z = 1} ∩ {Z = −1} = ∅ with P[Z = 1] + P[Z = −1] = 1,
we have, for all t ∈ ℝ,
P[ZY ≤ t] = (1/2) P[Y ≤ t] + (1/2) P[−Y ≤ t] = (1/2) P[Y ≤ t] + (1/2) P[Y ≥ −t] = P[ZY ≥ −t] = P[−ZY ≤ t].
Hence X = ZY and −X have the same distribution: the law of X is symmetric with respect to 0.
(b) Denote by φ_X and φ_Y the characteristic functions of X and Y, respectively. For all t ∈ ℝ,
φ_X(t) = E[e^{itX}] = ∫_{ℝ²} e^{itzy} dP_{(Y,Z)}(y, z) = ∫_{ℝ²} e^{itzy} dP_Y(y) ⊗ dP_Z(z)   (Y and Z are independent)
= (1/2) ∫_ℝ e^{ity} dP_Y(y) + (1/2) ∫_ℝ e^{−ity} dP_Y(y)
= (1/2) {φ_Y(t) + φ_Y(−t)}.
2. For all t ∈ ℝ, since lim_{x→−∞} exp(itx + x) = lim_{x→+∞} exp(itx − x) = 0,
φ_X(t) = ∫_ℝ (1/2) exp(itx − |x|) dx = ∫_{−∞}^{0} (1/2) exp(itx + x) dx + ∫_{0}^{+∞} (1/2) exp(itx − x) dx
= (1/2) (1/(it + 1) − 1/(it − 1))
= 1 / (1 + t²).
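The result can be cross-checked by simulation (a Python sketch): a standard Laplace variable can be written as the difference of two independent Exp(1) variables, and since its law is symmetric the characteristic function is real and reduces to E[cos(tX)].

```python
import math
import random

random.seed(5)
N = 100_000

# A standard Laplace variable as the difference of two independent Exp(1) draws
x = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(N)]

def phi_hat(t):
    # The law of X is symmetric, so its characteristic function is E[cos(tX)]
    return sum(math.cos(t * v) for v in x) / N

approx_at_1 = phi_hat(1.0)   # should approach 1 / (1 + 1) = 0.5
approx_at_2 = phi_hat(2.0)   # should approach 1 / (1 + 4) = 0.2
```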