DA1004-Problem Set 4
May 1, 2025
1. A statistician wants to estimate the mean height h (in meters) of a pop-
ulation, based on n independent samples X1 , . . . , Xn chosen uniformly from
the entire population. He uses the sample mean
Mn = (X1 + · · · + Xn )/n
as the estimate of h and a rough guess of 1.0 m for the standard deviation
of the samples Xi .
(a) How large should n be so that the standard deviation of Mn is at most
1 centimeter?
(b) How large should n be so that Chebyshev’s inequality guarantees the
estimate is within 5 centimeters of h with probability at least 0.99?
(c) The statistician realizes that all persons in the population have heights
between 1.4 m and 2.0 m, and revises the standard-deviation figure
using the bound of Example 5.3. How should the values of n obtained
in parts (a) and (b) be revised?
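(Optional numerical aid, not part of the problem: the Python sketch below evaluates the standard deviation of Mn and the Chebyshev bound on Pr(|Mn − h| ≥ ε) for a few candidate sample sizes, taking the rough guess σ = 1.0 m from the statement; the candidate values of n are illustrative only.)

import math

# Rough guess for the per-sample standard deviation (from the statement).
sigma = 1.0  # metres

def std_of_mean(n):
    # Standard deviation of the sample mean M_n of n i.i.d. samples.
    return sigma / math.sqrt(n)

def chebyshev_bound(n, eps):
    # Chebyshev upper bound on Pr(|M_n - h| >= eps): sigma^2 / (n * eps^2).
    return sigma ** 2 / (n * eps ** 2)

for n in (1_000, 10_000, 40_000):  # illustrative candidate sample sizes
    print(n, std_of_mean(n), chebyshev_bound(n, 0.05))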
2. The Chernoff bound is a powerful tool that relies on the transform asso-
ciated with a random variable and provides bounds on the probabilities of
certain tail events.
(a) Show that
Pr(X ≥ a) ≤ e−sa M (s)
holds for every a and every s ≥ 0, where
M (s) = E[e^{sX}]
is the moment-generating transform of X, assumed to be finite in a
small open interval containing s = 0.
(b) Show that
Pr(X ≤ a) ≤ e−sa M (s)
holds for every a and every s ≤ 0.
(c) Show that
Pr(X ≥ a) ≤ e−ϕ(a)
holds for every a, where
ϕ(a) = max{ sa − ln M (s) : s ≥ 0 }.
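(Optional illustration, not required for the proofs: for a standard normal X the transform is M (s) = e^{s²/2}, a standard fact, and the sketch below compares the bound of part (a), minimised over a grid of s ≥ 0, with the exact tail probability.)

import math

def normal_tail(a):
    # Exact Pr(X >= a) for X ~ N(0, 1).
    return 0.5 * math.erfc(a / math.sqrt(2))

def chernoff_bound(a):
    # min over a grid of s >= 0 of e^{-s a} M(s), with M(s) = exp(s^2 / 2).
    s_values = [i / 100 for i in range(1001)]
    return min(math.exp(-s * a + s * s / 2) for s in s_values)

for a in (1.0, 2.0, 3.0):
    print(a, normal_tail(a), chernoff_bound(a))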
3. A twice-differentiable real-valued function f of a single variable is called
convex if its second derivative (d²f /dx²)(x) is non-negative for all x in its
domain of definition.
(a) Show that the functions f (x) = eαx , f (x) = − ln x, and f (x) = x4 are
all convex.
(b) Show that if f is twice differentiable and convex, then the first-order
Taylor approximation of f is an underestimate of the function, that is,
f (a) + (x − a) (df /dx)(a) ≤ f (x)
for every a and x.
(c) Show that if f has the property in part (b) and if X is a random variable,
then
f (E[X]) ≤ E[f (X)].
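(Optional Monte Carlo illustration of part (c): the choice f (x) = x⁴ and X uniform on [−1, 1] is arbitrary and only for illustration.)

import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100_000)]
f = lambda x: x ** 4          # an arbitrary convex function for the check
mean_x = sum(xs) / len(xs)    # Monte Carlo estimate of E[X]
mean_f = sum(f(x) for x in xs) / len(xs)   # Monte Carlo estimate of E[f(X)]
print(f(mean_x), "<=", mean_f)  # f(E[X]) should not exceed E[f(X)]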
4. In order to estimate f , the true fraction of smokers in a large
population, Alvin selects n people at random.
His estimator Mn is obtained by dividing Sn , the number of smokers in his
sample, by n, i.e. Mn = Sn /n. Alvin chooses the sample size n to be the
smallest possible number for which Chebyshev’s inequality yields a guarantee
that
Pr(|Mn − f | ≥ ϵ) ≤ δ,
where ϵ and δ are some prespecified tolerances. Determine how the value of
n recommended by Chebyshev’s inequality changes in the following cases.
(a) The value of ϵ is reduced to half its original value.
(b) The probability δ is reduced to half its original value.
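(Optional numerical aid: under the worst-case variance bound f (1 − f ) ≤ 1/4, Chebyshev's inequality suggests n = ⌈1/(4ε²δ)⌉; the sketch below tabulates this for illustrative tolerances and for the halved values in parts (a) and (b). The specific ε and δ are assumptions made only for illustration.)

import math

def chebyshev_n(eps, delta):
    # Smallest n with (1/4) / (n * eps^2) <= delta, i.e. n = ceil(1 / (4 eps^2 delta)).
    return math.ceil(1 / (4 * eps ** 2 * delta))

eps, delta = 0.1, 0.05                  # illustrative tolerances, not given in the problem
print(chebyshev_n(eps, delta))          # original recommendation
print(chebyshev_n(eps / 2, delta))      # part (a): eps halved
print(chebyshev_n(eps, delta / 2))      # part (b): delta halved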
5. Let X1 , X2 , . . . be i.i.d. random variables uniformly distributed
over [−1, 1].
Show that the sequence Y1 , Y2 , . . . converges in probability to some limit, and
identify the limit, for each of the following cases:
(a) Yn = Xn /n.
(b) Yn = (Xn )^n .
(c) Yn = X1 · X2 · · · Xn .
(d) Yn = max{X1 , . . . , Xn }.
6. Consider two sequences of random variables X1 , X2 , . . . and Y1 , Y2 , . . . that
converge in probability to some constants. Let c be another constant. Show
that the following also converge in probability to the corresponding limits:
cXn , Xn + Yn , max{0, Xn }, |Xn |, and Xn Yn .
7. A sequence Xn of random variables is said to converge to a number c in
the mean square if
E[(Xn − c)²] → 0 as n → ∞.
(a) Show that convergence in the mean square implies convergence in prob-
ability.
(b) Give an example that shows that convergence in probability does not
imply convergence in the mean square.
8. Before starting to play roulette in a casino, you want to look for biases
that you can exploit. You therefore watch 100 rounds that result in a number
between 1 and 36, and count the number of rounds for which the result is odd.
If the count exceeds 55, you decide that the roulette wheel is not fair. Assuming
that the wheel is fair, find an approximation for the probability that you
will make the wrong decision.
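(Optional numerical check: assuming a fair wheel so that each round is odd with probability 1/2, the count of odd results over 100 rounds has mean 50 and standard deviation 5; the sketch below compares a simulation with a normal approximation using a continuity correction.)

import math, random

random.seed(0)
trials = 20_000
wrong = sum(
    1 for _ in range(trials)
    if sum(random.randint(1, 36) % 2 for _ in range(100)) > 55
)
print("simulated:", wrong / trials)

# Normal approximation to Pr(count > 55), continuity-corrected at 55.5.
z = (55.5 - 50) / 5
print("normal approx:", 0.5 * math.erfc(z / math.sqrt(2)))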
9. During each day, the probability that your computer’s operating system
crashes at least once is 5%, independent of every other day.
You are interested in the probability of at least 45 crash-free days out of the
next 50 days.
(a) Find the probability of interest by using the normal approximation to
the binomial.
(b) Repeat part (a), this time using the Poisson approximation to the bi-
nomial.
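(Optional companion calculation: "at least 45 crash-free days" is the same event as "at most 5 crash days", and counting crash days with probability 0.05 each is usually more convenient. The exact binomial value below gives a reference point for the approximations in (a) and (b).)

import math

n, p = 50, 0.05   # 50 days, daily crash probability
# Exact binomial probability of at most 5 crash days.
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(6))
print("exact binomial:", exact)

# Poisson approximation with lambda = n * p = 2.5, as in part (b).
lam = n * p
poisson = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(6))
print("Poisson approx:", poisson)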
10. A factory produces Xn gadgets on day n, where the Xn are independent
and identically distributed random variables with mean 5 and variance 9.
(a) Find an approximation to the probability that the total number of gad-
gets produced in 100 days is less than 440.
(b) Find (approximately) the largest value of n such that
Pr(X1 + · · · + Xn ≥ 200 + 5n) ≤ 0.05.
(c) Let N be the first day on which the total number of gadgets produced
exceeds 1000. Calculate an approximation to the probability that N ≥
220.
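(Optional numerical aid: by the central limit theorem the 100-day total has approximately a normal distribution with mean 500 and standard deviation 30; the sketch below evaluates the corresponding approximation for the event in part (a).)

import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_per_day, var_per_day, days = 5, 9, 100
total_mean = mean_per_day * days            # 500
total_std = math.sqrt(var_per_day * days)   # 30
print(normal_cdf((440 - total_mean) / total_std))  # approx. Pr(total < 440)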
11. Let X1 , Y1 , X2 , Y2 , . . . be independent random variables, each uniformly
distributed on [0, 1], and let
W = ((X1 + · · · + X16 ) − (Y1 + · · · + Y16 ))/16.
Find a numerical approximation to the quantity
Pr(|W − E[W ]| < 0.001).
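(Optional simulation check: E[W ] = 0 by symmetry, and the sketch below estimates the requested probability by Monte Carlo; the number of trials is an arbitrary choice.)

import random

random.seed(0)
trials = 200_000   # an arbitrary number of Monte Carlo trials
hits = 0
for _ in range(trials):
    w = (sum(random.random() for _ in range(16))
         - sum(random.random() for _ in range(16))) / 16
    if abs(w) < 0.001:
        hits += 1
print(hits / trials)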
12. The number of minutes between successive bus arrivals at Alvin’s bus
stop is exponentially distributed with parameter Θ.
Alvin’s prior PDF of Θ is
fΘ (θ) = 10θ for θ ∈ [0, 1/5], and fΘ (θ) = 0 otherwise.
(a) Alvin arrives on Monday at the bus stop and has to wait 30 minutes
for the bus to arrive. What is the posterior PDF, and the MAP and
conditional-expectation estimates of Θ?
(b) Following his Monday experience, Alvin decides to estimate Θ more
accurately, and records his waiting times for five days. These are 30,
25, 15, 40, and 20 minutes, and Alvin assumes that his observations are
independent. What is the posterior PDF, and the MAP and conditional-
expectation estimates of Θ given the five-day data?
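(Optional numerical sketch: the posterior is proportional to the prior above times the exponential likelihoods of the observed waiting times, and a grid evaluation can locate the MAP and conditional-expectation estimates numerically. The grid resolution is an arbitrary choice, and times are in minutes.)

import math

waits = [30.0]                      # part (a): Monday's observation
# waits = [30.0, 25.0, 15.0, 40.0, 20.0]   # part (b): the five-day data

def prior(theta):
    # Prior PDF from the statement: 10*theta on [0, 1/5], zero elsewhere.
    return 10 * theta if 0 <= theta <= 0.2 else 0.0

def unnormalised_posterior(theta):
    likelihood = math.prod(theta * math.exp(-theta * x) for x in waits)
    return prior(theta) * likelihood

grid = [i / 10_000 for i in range(1, 2001)]      # theta values in (0, 0.2]
weights = [unnormalised_posterior(t) for t in grid]
map_estimate = grid[max(range(len(grid)), key=weights.__getitem__)]
mean_estimate = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
print(map_estimate, mean_estimate)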
13. Students in a probability class take a multiple-choice test with 10 ques-
tions and 3 choices per question.
A student who knows the answer to a question will answer it correctly, while
a student who does not will guess, answering correctly with probability 1/3. Each
student is equally likely to belong to one of three categories i = 1, 2, 3; a
category-i student knows the answer to any given question with probability
θ1 = 0.3, θ2 = 0.7, θ3 = 0.95,
respectively, independently across questions. Suppose that a randomly chosen student
answers k questions correctly.
(a) For each possible value of k, derive the MAP estimate of the category
that this student belongs to.
(b) Let M be the number of questions that the student knows how to an-
swer. Derive the posterior PMF, and the MAP and LMS estimates of
M given that the student answered 5 questions correctly.
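(Optional numerical sketch for part (a): with equal priors, the MAP category maximises the binomial likelihood of k correct answers, where a category-i student answers any single question correctly with probability θi + (1 − θi )/3.)

import math

thetas = {1: 0.3, 2: 0.7, 3: 0.95}
# Probability that a category-i student answers one question correctly.
p_correct = {i: t + (1 - t) / 3 for i, t in thetas.items()}

def map_category(k, n=10):
    # Equal priors, so it suffices to compare the binomial likelihoods.
    likelihoods = {i: math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                   for i, p in p_correct.items()}
    return max(likelihoods, key=likelihoods.get)

print({k: map_category(k) for k in range(11)})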
14. Consider a biased coin.
Assume the probability of heads Θ is distributed over [0, 1] according to
fΘ (θ) = 2 − 4 |1/2 − θ|,   0 ≤ θ ≤ 1.
Find the MAP estimate of Θ when n independent tosses yield k heads and
n − k tails.
15. Professor May B. Hard is unsure whether a quiz problem is difficult (Θ =
1) or not difficult (Θ = 2). Her prior is P (Θ = 1) = 0.3. The TA’s solution
time X (minutes) has conditional PDFs
fX|Θ (x | Θ = 1) = c1 e−0.04x for 5 ≤ x ≤ 60 (and 0 otherwise),
fX|Θ (x | Θ = 2) = c2 e−0.16x for 5 ≤ x ≤ 60 (and 0 otherwise),
with normalising constants c1 , c2 .
(a) If the TA’s time is 20 min, which hypothesis does the MAP rule accept,
and what is the error probability?
(b) Four more independent TAs take 10, 25, 15, 35 min. Based on all five
times, which hypothesis is accepted and what is the new error proba-
bility?
16. Two-box problem.
Box 1 has one black and two white balls; box 2 has two black and one white.
Choose a box with probability p for box 1, then draw one ball.
(a) Give the MAP rule for deciding which box was chosen based on the
colour drawn.
(b) For p = 1/2, find the probability of an incorrect decision and compare it
to the error probability when no ball is drawn.
17. Sequential coin with unknown bias q0 vs. q1 .
A coin’s head probability is q0 (H0 ) or q1 (H1 ), 0 < q0 < q1 < 1. Toss until
the first tail; let k be the number of heads observed.
(a) With priors P (H0 ) = P (H1 ) = 1/2, compute P (H1 | k).
(b) Decide H1 if k ≥ k ∗ . Give Perr (k ∗ ; q0 , q1 ) and the k ∗ that minimises it.
Can any other rule do better?
(c) Take q0 = 0.3, q1 = 0.7. As P (H1 ) rises from 0.7 to 1.0, how does the
optimal k ∗ change?
18. Police radar bias.
The radar reading exceeds the true speed by U ∼ Unif(0, 5) mi/h. True
speeds are Unif(55, 75) mi/h. Find the LMS estimate of the true speed given
the radar measurement.
19. Shopping-cart estimator.
The number of carts Θ ∼ Unif{1, . . . , 100}. You observe the first cart number
X (uniform on 1, . . . , Θ). Find and plot the MAP and LMS estimators for
Θ. (Hint: cf. Example 8.2.)
20. Multiple-observation uniform problem.
Given Θ = θ, the observations X1 , . . . , Xn are i.i.d. Unif[0, θ]; the prior is Θ ∼ Unif[0, 1]; n > 3.
(a) Derive the LMS estimator of Θ for data x1 , . . . , xn .
(b) For n = 5, plot the conditional MSE of MAP and LMS estimators vs.
x̄ = max{xi }.
(c) With x̄ = 0.5 fixed, describe how the estimators and their conditional
MSEs behave as n → ∞.
21. Conditional expectation identities.
(a) For i.i.d. Y1 , . . . , Yn with Y = Y1 + · · · + Yn , show that E[Y1 | Y ] = Y /n.
(b) Let Θ, W be independent N (0, k) and N (0, m) with integer k, m. Use
(a) to find E[Θ | Θ + W ] and connect with Example 8.3.
(c) Repeat (b) when Θ, W are independent Poisson(λ) and Poisson(µ).
22. Alice models the weekly time spent on homework by T ∼ Exp(θ) with
unknown rate θ. Times in five weeks were 10, 14, 18, 8, 20 hours. What is the
ML estimate of θ?
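(Optional check: for i.i.d. exponential data the ML estimate of the rate is the reciprocal of the sample mean, a standard result; the snippet below evaluates it for the five recorded times.)

times = [10, 14, 18, 8, 20]            # recorded weekly times, in hours
theta_ml = len(times) / sum(times)     # ML rate estimate: n / (x_1 + ... + x_n)
print(theta_ml)                        # equals 1 / (sample mean)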
23. Consider independent coin tosses with head probability θ.
(a) Fix k and let N be the number of tosses until the k-th head occurs.
Find the ML estimator of θ based on N .
(b) Fix n and let K be the number of heads in n tosses. Find the ML
estimator of θ based on K.
24. Sampling and estimation of sums.
A box contains k balls; k̄ of them are white and k − k̄ are red. Each white
ball bears a (non-zero) number; red balls bear 0. We wish to estimate the
total of all ball numbers.
(a) Draw n balls with replacement, recording the numbers X1 , . . . , Xn . Also
record Yi , the number on the i-th white ball drawn, for an auxiliary
sample of size m. Define
Ŝ = (k/n)(X1 + · · · + Xn ),   S = (k̄/N )(X1 + · · · + Xn ),   S̃ = (k̄/m)(Y1 + · · · + Ym ),
where N is the (random) number of white balls in the first n draws.
Show that all three estimators are unbiased for the total sum.
(b) Compute var(S) and var(S̃). Prove that they are approximately equal when
m ≈ np/(p + r(1 − p)),   where p = k̄/k and r = E[Y1²]/var(Y1 ).
Further show that when m = n, var(S̃)/var(S) = p (p + r(1 − p)).
25. Mixture models.
Let the PDF of X be the mixture
fX (x) = p1 fY1 (x) + · · · + pm fYm (x),   where p1 + · · · + pm = 1 and pj ≥ 0.
Assume each Yj ∼ N (µj , σj² ) and the data X1 , . . . , Xn are i.i.d. with PDF
fX .
(a) Write the likelihood and log-likelihood.
(b) For m = 2, n = 1 with µ1 , µ2 , σ1 , σ2 known, find the ML estimators of
p1 , p2 .
(c) For m = 2, n = 1 with p1 , p2 , σ1 , σ2 known, find the ML estimators of
µ1 , µ2 .
(d) For general m and n with all parameters unknown, show that the like-
lihood can be made arbitrarily large by setting µ1 = x1 and σ1² → 0.
(Illustrates a pathology of ML estimation.)
26. Unstable particles.
Decay distances X1 , . . . , Xn are observed only when they lie in [m1 , m2 ]; the
true distribution is Exp(θ).
(a) Give the likelihood and log-likelihood.
(b) With m1 = 1, m2 = 20, n = 6 and x = (1.5, 2, 3, 4, 5, 12), plot the
log-likelihood versus θ and estimate the ML value of θ from the plot.
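(Optional numerical companion: assuming each observation has the Exp(θ) density conditioned on lying in [m1 , m2 ], the log-likelihood below can be evaluated on a grid; the grid search is only a stand-in for the requested plot.)

import math

m1, m2 = 1.0, 20.0
xs = [1.5, 2, 3, 4, 5, 12]

def log_likelihood(theta):
    # Log-likelihood assuming each x_i has the Exp(theta) density
    # conditioned on the interval [m1, m2].
    log_norm = math.log(math.exp(-theta * m1) - math.exp(-theta * m2))
    return sum(math.log(theta) - theta * x - log_norm for x in xs)

grid = [i / 1000 for i in range(1, 2001)]    # theta values in (0, 2]
best = max(grid, key=log_likelihood)
print(best, log_likelihood(best))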
27. Heights of middle-school students.
Ten heights (cm) were recorded: 164, 167, 163, 158, 170, 183, 176, 159, 170, 167.
Assume female heights ∼ N (µ1 , σ1² ), male heights ∼ N (µ2 , σ2² ), and a student
is female/male with probability 1/2.
(a) Treat µ1 , µ2 , σ1 , σ2 as unknown; write the likelihood.
(b) Given σ1² = 9 and µ1 = 164, compute ML estimates of σ2² and µ2.
(c) Given σ1² = σ2² = 9, compute ML estimates of µ1 , µ2.
(d) Using the estimates from (c) as true values, describe the MAP rule for
deciding gender from a single height measurement.
28. Estimating the parameter of a Poisson random variable.
For i.i.d. X1 , . . . , Xn ∼ Poisson(λ),
(a) derive the ML estimator of λ;
(b) discuss unbiasedness and consistency.
29. Uniform parameter estimation I.
Given i.i.d. X1 , . . . , Xn ∼ Unif(0, θ):
(a) find the ML estimator of θ;
(b) determine whether it is consistent, unbiased, or asymptotically unbi-
ased;
(c) construct an alternative unbiased estimator if possible.
30. Uniform parameter estimation II.
Given i.i.d. X1 , . . . , Xn ∼ Unif(θ, θ + 1):
(a) find an ML estimator of θ;
(b) discuss consistency, unbiasedness, and asymptotic unbiasedness.
31. Photon counts and temperature inference.
Each trigger emits K photons with PMF pK (k; θ) = c(θ)e−θk , k = 0, 1, 2, . . .,
where θ is the inverse temperature.
(a) Determine the normalising factor c(θ).
(b) Find E[K] and var(K).
(c) Let ψ = 1/θ. Derive the ML estimator of ψ based on K1 , . . . , Kn .
(d) Prove that the estimator in (c) is consistent.
32. Let X be a normal random variable with mean µ and unit variance. We
want to test the hypothesis µ = 5 at the 5% level of significance, using n
independent samples of X.
(a) What is the range of values of the sample mean for which the hypothesis
is accepted?
(b) Let n = 10. Calculate the probability of accepting the hypothesis µ = 5
when the true value of µ is 4.
33. We have five observations drawn independently from a normal distribu-
tion with unknown mean µ and unknown variance σ 2 .
(a) Estimate µ and σ 2 if the observation values are 8.47, 10.91, 10.87, 9.46,
10.40.
(b) Use the t-distribution tables to test the hypothesis µ = 9 at the 5% level of
significance, using the estimates of part (a).
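(Optional numerical companion for part (b): the sketch below computes the sample mean, the unbiased sample variance, and the t statistic for the hypothesis µ = 9, to be compared with the appropriate quantile from the t-distribution tables.)

import math

xs = [8.47, 10.91, 10.87, 9.46, 10.40]
n = len(xs)
mean = sum(xs) / n
s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)     # unbiased sample variance
t_stat = (mean - 9) / math.sqrt(s2 / n)             # t statistic for H0: mu = 9
print(mean, s2, t_stat)   # compare |t_stat| with the t(n - 1) table value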
34. A plant grows on two distant islands. Suppose that its life span (mea-
sured in days) on the first (or the second) island is normally distributed with
unknown mean µX (or µY ) and known variance σX² = 32 (or σY² = 29, respec-
tively). We wish to test the hypothesis µX = µY , based on 10 independent
samples from each island. The corresponding sample means are x̄ = 181 and
ȳ = 177. Do the data support the hypothesis at the 5% level of significance?
35. A company considers buying a machine to manufacture a certain item.
When tested, 28 out of 600 items produced by the machine were found defec-
tive. Do the data support the hypothesis that the defect rate of the machine
is smaller than 3 percent, at the 5% significance level?
36. The values of five independent samples of a Poisson random variable
turned out to be 34, 35, 29, 31, and 30. Test the hypothesis that the mean
is equal to 35 at the 5% level of significance.
37. A surveillance camera periodically checks a certain area and records a
signal X = W if there is no intruder (this is the null hypothesis H0 ). If there
is an intruder the signal is X = θ + W , where θ is unknown with θ > 0.
We assume that W is a normal random variable with mean 0 and known
variance v = 0.5.
(a) We obtain a single signal value X = 0.96. Should H0 be rejected at the
5% level of significance?
(b) We obtain five independent signal values X = 0.96, −0.34, 0.85, 0.51, −0.24.
Should H0 be rejected at the 5% level of significance?
(c) Repeat part (b), using the t-distribution, and assuming the variance v
is unknown.