RobustStats Practice Problems

The document presents a series of problems and solutions related to robust statistics, including the effects of adding a value to a dataset on standard deviation and median, properties of the exponential distribution, and variance estimations for mean and median. It also discusses the influence function of the Huber estimator, conditions for M-estimates, and properties of L-estimates. Additionally, it covers topics such as the breakdown points of statistical measures and the asymptotic behavior of trimmed means.

Uploaded by vrishti.godhwani
Copyright © All Rights Reserved
Robust Statistics
Sravan Danda
November 29, 2024

1. Show that if a value $x_0$ is added to a dataset $\{x_1, \cdots, x_n\}$, where $-\infty < x_0 < \infty$, then the standard deviation of the modified dataset ranges between a value smaller than the standard deviation of the original dataset and $\infty$.
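A quick numerical illustration, offered only as a sketch: it assumes the sample standard deviation with denominator "number of points minus one" (Python's `statistics.stdev`) and uses an arbitrary made-up dataset. Appending $x_0$ changes the sum of squared deviations by $\frac{n}{n+1}(x_0 - \bar x)^2$, so the smallest value is reached at $x_0 = \bar x$ and the value grows without bound as $|x_0| \to \infty$.

```python
import math
import statistics

def sd_with_added_point(data, x0):
    """SD of the n + 1 values data + [x0] via statistics.stdev (denominator n)."""
    return statistics.stdev(list(data) + [x0])

def sd_with_added_point_closed_form(data, x0):
    """Same quantity from SS_new = SS + n / (n + 1) * (x0 - mean)^2."""
    n = len(data)
    mean = statistics.fmean(data)
    ss = sum((x - mean) ** 2 for x in data)
    ss_new = ss + n / (n + 1) * (x0 - mean) ** 2
    return math.sqrt(ss_new / n)  # n + 1 points, so divide by (n + 1) - 1 = n

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # arbitrary illustrative data
mean = statistics.fmean(data)
print("original SD:       ", statistics.stdev(data))
print("minimum, x0 = mean:", sd_with_added_point(data, mean))
for x0 in (1e2, 1e4, 1e6):
    print(x0, sd_with_added_point(data, x0))  # grows without bound
```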
2. Consider the situation of the previous problem.
(a) Show that if $n$ is even, the maximum change in the sample median when $x_0$ ranges from $-\infty$ to $\infty$ is the distance from the median of the original dataset to the adjacent order statistics, i.e. $(x_{(n/2+1)} - x_{(n/2)})/2$.
(b) What is the maximum change when $n$ is odd?
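A brute-force check, as a sketch: every order statistic of the enlarged sample is nondecreasing in $x_0$, so the extreme medians are attained once $x_0$ lies below the minimum or above the maximum of the data, and it suffices to try those two positions. The function name and the datasets are illustrative.

```python
import statistics

def max_median_change(data):
    """Largest |Med(data + [x0]) - Med(data)| over all real x0.

    The median of the enlarged sample is nondecreasing in x0, so its extreme
    values are reached for any x0 below min(data) or above max(data).
    """
    med = statistics.median(data)
    span = max(data) - min(data) + 1.0
    low = statistics.median(list(data) + [min(data) - span])
    high = statistics.median(list(data) + [max(data) + span])
    return max(abs(low - med), abs(high - med))

even = [1.0, 3.0, 6.0, 10.0]        # n even: middle order statistics 3 and 6
odd = [1.0, 3.0, 6.0, 10.0, 20.0]   # n odd: median 6, neighbours 3 and 10
print(max_median_change(even))  # → 1.5, half the gap between 3 and 6
print(max_median_change(odd))   # → 2.0
```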
3. Show that the median of the exponential distribution with rate $\lambda$ is $\frac{\ln 2}{\lambda}$, and hence that $\ln 2$ divided by the sample median is a consistent estimator of $\lambda$.
Solution:
To show that the median of the exponential distribution with rate parameter $\lambda$ is $\frac{\ln 2}{\lambda}$, we can follow these steps:
The cumulative distribution function (CDF) of an exponential distribution with rate parameter $\lambda$ is:
$$F(x) = 1 - e^{-\lambda x}, \quad x \ge 0.$$

By definition, the median $m$ of the distribution satisfies $F(m) = 0.5$.

So, we set $F(m) = 0.5$ and solve for $m$:
$$1 - e^{-\lambda m} = 0.5$$
Rearranging, we get:
$$e^{-\lambda m} = 0.5$$
Taking the natural logarithm of both sides:
$$-\lambda m = \ln(0.5)$$
Recognizing that $\ln(0.5) = -\ln(2)$, we have:
$$m = \frac{\ln(2)}{\lambda}$$

Thus, the population median of the exponential distribution is $\frac{\ln(2)}{\lambda}$.
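Since the exponential density is positive at $m$, the sample median $m_n$ converges in probability to the population median $m = \frac{\ln 2}{\lambda}$ as the number of samples increases to infinity. The map $t \mapsto \frac{\ln 2}{t}$ is continuous at $m > 0$, so by the continuous mapping theorem $\frac{\ln 2}{m_n} \to \frac{\ln 2}{m} = \lambda$ in probability, i.e. $\frac{\ln 2}{m_n}$ is a consistent estimator of $\lambda$.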
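A small simulation makes the consistency argument concrete. This is a sketch only: the rate $\lambda = 1.5$, the seed and the sample sizes are arbitrary choices, and `random.Random.expovariate` is called with the rate, matching the parametrization above.

```python
import math
import random
import statistics

def rate_from_median(sample):
    """Estimate the exponential rate lambda by ln 2 / (sample median)."""
    return math.log(2) / statistics.median(sample)

rng = random.Random(0)   # fixed seed so the run is reproducible
true_rate = 1.5          # illustrative value of lambda
for n in (100, 10_000, 100_000):
    sample = [rng.expovariate(true_rate) for _ in range(n)]
    print(n, rate_from_median(sample))  # should approach 1.5 as n grows
```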

These are selected problems from Robust Statistics: Theory and Methods, Ricardo A. Maronna, R. Douglas Martin
and Victor J. Yohai, 2006, John Wiley and Sons.

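4. Let $F = (1 - \varepsilon)N(\mu, 1) + \varepsilon N(\mu, \tau^2)$. Show that
(a) the variance of the mean estimator is given by
$$\mathrm{Var}(\bar X) = \frac{(1 - \varepsilon) + \varepsilon\tau^2}{n} \tag{1}$$
(b) the variance of the median estimator is given by
$$\mathrm{Var}(\mathrm{Med}(\mathbf{X})) \approx \frac{\pi}{2n\,(1 - \varepsilon + \varepsilon/\tau)^2} \tag{2}$$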
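A Monte Carlo comparison with (1) and (2), as a sketch: the settings $n = 101$, $\varepsilon = 0.1$, $\tau = 10$, the number of replications and the seed are illustrative assumptions, and $\mu$ is taken to be 0. Formula (2) is asymptotic, so only approximate agreement should be expected at finite $n$.

```python
import math
import random
import statistics

def var_mean_formula(n, eps, tau):
    """Equation (1)."""
    return ((1 - eps) + eps * tau ** 2) / n

def var_median_formula(n, eps, tau):
    """Equation (2), an asymptotic approximation."""
    return math.pi / (2 * n * (1 - eps + eps / tau) ** 2)

def simulate(n, eps, tau, reps, seed=0):
    """Monte Carlo variances of the sample mean and median under F (mu = 0)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        # each point is N(0, tau^2) with probability eps, otherwise N(0, 1)
        x = [rng.gauss(0.0, tau if rng.random() < eps else 1.0) for _ in range(n)]
        means.append(statistics.fmean(x))
        medians.append(statistics.median(x))
    return statistics.pvariance(means), statistics.pvariance(medians)

n, eps, tau = 101, 0.1, 10.0   # illustrative settings
sim_mean, sim_median = simulate(n, eps, tau, reps=2000)
print("mean:  ", sim_mean, "vs (1):", var_mean_formula(n, eps, tau))
print("median:", sim_median, "vs (2):", var_median_formula(n, eps, tau))
```

With heavy contamination ($\tau = 10$) the median's variance stays close to its uncontaminated value while the mean's is inflated by the $\varepsilon\tau^2$ term.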

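5. Consider the family of Student's $t$ distributions with $v$ degrees of freedom. The density is given by
$$f_v(x) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)}\left(1 + \frac{x^2}{v}\right)^{-\frac{v+1}{2}} \tag{3}$$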

This family covers a whole range of heavy-tailedness: as $v \to \infty$ the distribution tends to the standard Gaussian, and for $v = 1$ it is the Cauchy distribution. Find the values of $v$ for which the $t$ distribution has finite moments of order $k$.
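A small sketch for exploring the question: it evaluates (3) on the log scale with `math.lgamma` and `math.log1p`, checks the Cauchy ($v = 1$) and near-Gaussian (large $v$) cases, and estimates the log-log slope of $f_v$ far in the tail. Relating that slope to the integrability of $|x|^k f_v(x)$ is left as the exercise; the evaluation point $x = 0.7$ and the tail point $10^6$ are arbitrary.

```python
import math

def t_density(x, v):
    """Student t density (3), evaluated on the log scale for numerical stability."""
    log_const = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)
    return math.exp(log_const - (v + 1) / 2 * math.log1p(x * x / v))

def tail_slope(v, x=1e6):
    """Slope of log f_v against log|x| between x and 2x, far in the tail."""
    return (math.log(t_density(2 * x, v)) - math.log(t_density(x, v))) / math.log(2)

cauchy = 1 / (math.pi * (1 + 0.7 ** 2))
gauss = math.exp(-0.7 ** 2 / 2) / math.sqrt(2 * math.pi)
print(t_density(0.7, 1), cauchy)    # v = 1 is the Cauchy density
print(t_density(0.7, 1e6), gauss)   # large v is close to N(0, 1)
for v in (1, 2, 3, 5):
    print(v, tail_slope(v))         # compare with -(v + 1)
```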
6. Show that if $\hat\mu$ is a solution of
$$\sum_{i=1}^{n} \psi(x_i - \hat\mu) = 0 \tag{4}$$
then $\hat\mu + c$ is a solution of the same equation with $x_i + c$ in place of $x_i$. Here $\psi = \rho'$, where $\rho = -\log f_0$ and $f_0$ is the density of the probability distribution from which the samples are generated.
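A numerical sanity check of shift equivariance, under assumptions not in the problem: Huber's $\psi_k$ with $k = 1.345$ stands in for $\rho'$ (it is $\rho'$ for the density proportional to $e^{-\rho_k}$, Gaussian in the centre with exponential tails), and (4) is solved by bisection, which works because $\mu \mapsto \sum_i \psi(x_i - \mu)$ is nonincreasing for a nondecreasing $\psi$. Function names and data are illustrative.

```python
def huber_psi(r, k=1.345):
    """Huber's psi: identity on [-k, k], clipped to +-k outside."""
    return max(-k, min(k, r))

def m_estimate(data, psi=huber_psi, tol=1e-12):
    """Solve sum psi(x_i - mu) = 0 by bisection (psi assumed nondecreasing)."""
    lo, hi = min(data) - 1.0, max(data) + 1.0   # sum > 0 at lo, < 0 at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(psi(x - mid) for x in data) > 0:
            lo = mid   # the root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [0.0, 1.0, 1.5, 2.0, 2.2, 9.0, 15.0]   # illustrative sample
c = 100.0
print(m_estimate(data), m_estimate([x + c for x in data]) - c)   # equal up to rounding
```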
7. Show that if $X = \mu_0 + U$, where the distribution of $U$ is symmetric about 0 and $\psi$ is odd, then $\mu_0$ is a solution of
$$E_F[\psi(X - \mu_0)] = 0 \tag{5}$$

8. Verify
$$E_\Phi[\psi_k(x)^2] = 2\left[k^2(1 - \Phi(k)) + \Phi(k) - 0.5 - k\phi(k)\right] \tag{6}$$
where $\Phi$ and $\phi$ denote the cumulative distribution function and the density function of the standard Gaussian, respectively. $\psi_k$ is Huber's function defined by
$$\psi_k(x) = \begin{cases} x & \text{if } |x| \le k \\ \operatorname{sgn}(x)\,k & \text{if } |x| > k \end{cases} \tag{7}$$

Solution:
To prove
$$E_\Phi[\psi_k(x)^2] = 2\left[k^2(1 - \Phi(k)) + \Phi(k) - 0.5 - k\phi(k)\right], \tag{8}$$
where $\Phi$ and $\phi$ denote the CDF and PDF of the standard Gaussian distribution, respectively, and $\psi_k(x)$ is the Huber function defined by
$$\psi_k(x) = \begin{cases} x, & \text{if } |x| \le k, \\ \operatorname{sgn}(x)\,k, & \text{if } |x| > k, \end{cases} \tag{9}$$
we proceed as follows:
$$E_\Phi[\psi_k(x)^2] = \int_{-\infty}^{\infty} \psi_k(x)^2 \phi(x)\,dx.$$

Since $\psi_k(x)$ behaves differently over $|x| \le k$ and $|x| > k$, split the integral:
$$E_\Phi[\psi_k(x)^2] = \int_{-k}^{k} x^2 \phi(x)\,dx + \int_{|x|>k} k^2 \phi(x)\,dx.$$

First part (for $|x| \le k$): since $x^2\phi(x)$ is even,
$$\int_{-k}^{k} x^2 \phi(x)\,dx = 2\int_0^k x^2 \phi(x)\,dx.$$

Second part (for $|x| > k$): by the symmetry of $\phi$,
$$\int_{|x|>k} k^2 \phi(x)\,dx = 2k^2 \int_k^{\infty} \phi(x)\,dx = 2k^2(1 - \Phi(k)).$$

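Using integration by parts for $\int_0^k x^2 \phi(x)\,dx$, with $u = x$ and $dv = x\phi(x)\,dx$ (so $v = -\phi(x)$, because $\phi'(x) = -x\phi(x)$):
$$\int_0^k x^2 \phi(x)\,dx = \left[-x\phi(x)\right]_0^k + \int_0^k \phi(x)\,dx.$$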

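Since $\left[-x\phi(x)\right]_0^k = -k\phi(k)$, where $\phi(k) = \frac{e^{-k^2/2}}{\sqrt{2\pi}}$, and $\int_0^k \phi(x)\,dx = \Phi(k) - 0.5$, doubling gives:
$$\int_{-k}^{k} x^2 \phi(x)\,dx = 2(\Phi(k) - k\phi(k) - 0.5).$$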
−k

Substitute back to get:

EΦ [ψk (x)2 ] = 2 k 2 (1 − Φ(k)) + Φ(k) − 0.5 − kφ(k) .


 

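A numerical cross-check of (6), as a sketch: the closed form is compared with a midpoint-rule integral of $\psi_k(x)^2\phi(x)$ over $[-12, 12]$ (the truncation and step count are arbitrary choices), with $\Phi$ computed from `math.erf`. As $k \to \infty$ the right-hand side should tend to $E_\Phi[x^2] = 1$.

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard Gaussian CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def huber_psi(x, k):
    return max(-k, min(k, x))

def closed_form(k):
    """Right-hand side of (6)."""
    return 2 * (k * k * (1 - Phi(k)) + Phi(k) - 0.5 - k * phi(k))

def numerical(k, lim=12.0, steps=100_000):
    """Midpoint-rule approximation of the integral of psi_k(x)^2 phi(x) over [-lim, lim]."""
    h = 2 * lim / steps
    total = 0.0
    for i in range(steps):
        x = -lim + (i + 0.5) * h
        total += huber_psi(x, k) ** 2 * phi(x)
    return total * h

for k in (0.5, 1.345, 2.0):
    print(k, closed_form(k), numerical(k))
```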
One can use this result to analyze the Huber estimator further. The influence function describes the effect of a small contamination at a point $x$ on the estimator. For a location M-estimator with fixed scale, evaluated at the standard Gaussian, it is given by
$$IF(x) = \frac{\psi_k(x)}{E_\Phi[\psi_k'(x)]} = \frac{\psi_k(x)}{2\Phi(k) - 1},$$
since $\psi_k'(x) = 1$ for $|x| < k$ and $0$ for $|x| > k$, so that $E_\Phi[\psi_k'(x)] = P(|x| \le k) = 2\Phi(k) - 1$.

Evaluating $IF(x)$ in different regions:

- For $|x| \le k$:
$$IF(x) = \frac{x}{2\Phi(k) - 1}.$$

- For $|x| > k$:
$$IF(x) = \frac{k\,\operatorname{sgn}(x)}{2\Phi(k) - 1}.$$

The previous result enters through the asymptotic variance, which equals $E_\Phi[IF(x)^2]$:
$$v(k) = \frac{E_\Phi[\psi_k(x)^2]}{(2\Phi(k) - 1)^2} = \frac{2\left[k^2(1 - \Phi(k)) + \Phi(k) - 0.5 - k\phi(k)\right]}{(2\Phi(k) - 1)^2}.$$

Since $|IF(x)| \le k/(2\Phi(k) - 1)$ for all $x$, the influence function is bounded, which indicates that the Huber estimator is robust to outliers.
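A minimal sketch of these quantities, assuming the standard location M-estimator forms $IF(x) = \psi_k(x)/(2\Phi(k) - 1)$ and asymptotic variance $E_\Phi[\psi_k^2]/(2\Phi(k) - 1)^2$, with $E_\Phi[\psi_k^2]$ taken from (6). The tuning constant $k = 1.345$ is a commonly used choice that gives about 95% efficiency at the Gaussian; it is used here only as an example.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def huber_if(x, k):
    """IF(x) = psi_k(x) / (2 Phi(k) - 1) at the standard Gaussian."""
    return max(-k, min(k, x)) / (2 * Phi(k) - 1)

def asymptotic_variance(k):
    """E[psi_k^2] from (6) divided by (E[psi_k'])^2 = (2 Phi(k) - 1)^2."""
    e_psi_sq = 2 * (k * k * (1 - Phi(k)) + Phi(k) - 0.5 - k * phi(k))
    return e_psi_sq / (2 * Phi(k) - 1) ** 2

k = 1.345
print("IF at 1, 5, 500:", [huber_if(x, k) for x in (1.0, 5.0, 500.0)])  # bounded
print("sup |IF| =", k / (2 * Phi(k) - 1))
print("efficiency at the Gaussian:", 1 / asymptotic_variance(k))
```

Larger $k$ pushes the efficiency towards 1 (the mean) at the price of a larger bound on $|IF|$.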

9. Show that if $\psi$ is odd, then the M-estimate $\hat\mu$ with fixed $\sigma$ satisfies the following conditions:
• If $x_i \ge 0$ for all $i$, then $\hat\mu \ge 0$.
• If $x_i = c$ for all $i$, then $\hat\mu = c$.
• $\hat\mu(-\mathbf{x}) = -\hat\mu(\mathbf{x})$.

10. Show that L-estimates are shift and scale equivariant and also satisfy
• If $x_i \ge 0$ for all $i$, then $\hat\mu \ge 0$.
• If $x_i = c$ for all $i$, then $\hat\mu = c$.
• $\hat\mu(-\mathbf{x}) = -\hat\mu(\mathbf{x})$.
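A numerical illustration, as a sketch under stated assumptions: an L-estimate is taken to be $\sum_i a_i x_{(i)}$ with nonnegative weights summing to 1, and the weights are assumed symmetric ($a_i = a_{n+1-i}$), which is what the property $\hat\mu(-\mathbf{x}) = -\hat\mu(\mathbf{x})$ needs. The weight vector (a trimmed mean dropping one point at each end) and the data are hypothetical examples.

```python
def l_estimate(data, weights):
    """L-estimate sum_i a_i x_(i); weights assumed >= 0, summing to 1, symmetric."""
    x = sorted(data)
    if len(weights) != len(x):
        raise ValueError("need one weight per observation")
    return sum(a * xi for a, xi in zip(weights, x))

weights = [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0]   # hypothetical example weights
x = [3.1, -0.4, 2.2, 8.0, 1.5, 0.9, 2.7]
est = l_estimate(x, weights)
print(est)
print(l_estimate([xi + 5 for xi in x], weights) - 5)   # shift: same as est
print(l_estimate([3 * xi for xi in x], weights) / 3)   # scale: same as est
print(-l_estimate([-xi for xi in x], weights))         # sign flip: same as est
print(l_estimate([4.0] * 7, weights))                  # constant sample: 4 up to rounding
```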
11. Let $[a, b]$, where $a, b$ depend on the data, be the shortest interval containing at least half of the data.
(a) The Shorth (shortest half) location estimate is defined as the midpoint
$$\hat\mu = \frac{a + b}{2} \tag{10}$$
Show that
$$\hat\mu = \arg\min_{\mu} \operatorname{Med}_{1 \le i \le n} |x_i - \mu| \tag{11}$$
(b) Show that the difference $b - a$ is a dispersion estimate.
(c) For a distribution $F$, let $[a, b]$ be the shortest interval with probability 0.5. Find this interval for $N(\mu, \sigma^2)$.
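A brute-force sketch for parts (a) and (b), under two assumptions: $n$ is odd, and "at least half of the data" means $h = (n+1)/2$ points, so that $\operatorname{Med}_i |x_i - \mu|$ is simply the $h$-th smallest absolute deviation. The midpoint found by scanning windows of $h$ consecutive order statistics is compared with a grid search over $\mu$ in (11); the data and grid are illustrative.

```python
import statistics

def shorth(data):
    """Midpoint and length of the shortest window of h = (n + 1) // 2 order statistics.

    Assumes n is odd; ties between equally short windows go to the leftmost one.
    """
    x = sorted(data)
    n = len(x)
    h = (n + 1) // 2
    i = min(range(n - h + 1), key=lambda j: x[j + h - 1] - x[j])
    a, b = x[i], x[i + h - 1]
    return (a + b) / 2, b - a

def lms_objective(data, mu):
    """Med_i |x_i - mu|, the criterion minimised in (11)."""
    return statistics.median([abs(xi - mu) for xi in data])

data = [1.0, 2.0, 2.5, 3.1, 3.3, 7.0, 20.0]   # illustrative, n = 7
mid, width = shorth(data)
grid_best = min((i / 100 for i in range(2101)), key=lambda m: lms_objective(data, m))
print(mid, width)                                  # (a) estimate, (b) dispersion b - a
print(grid_best, lms_objective(data, grid_best))   # grid minimiser of (11)
```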
12. Let $\hat\mu$ be a location M-estimator. Show that if the distribution of the $x_i$'s is symmetric about $\mu$, then so is the distribution of $\hat\mu$, and that the same holds for trimmed means.
13. Recall that the Newton-Raphson procedure is a widely used iterative method for numerically solving nonlinear equations. To solve $h(t) = 0$, at each iteration $h$ is linearized, i.e. replaced by its first-order Taylor expansion about the current approximation. Thus, if at iteration $m$ we have the approximation $t_m$, then the next value $t_{m+1}$ is the solution of
$$h(t_m) + h'(t_m)(t_{m+1} - t_m) = 0 \tag{12}$$
Geometrically, at every current estimate we draw the tangent to the curve $(t, h(t))$, and the updated estimate is the $t$-coordinate where this tangent cuts the $t$-axis. In the context of location M-estimators, $h(\mu) = \sum_{i=1}^n \psi(x_i - \mu)$, so $h'(\mu) = -\sum_{i=1}^n \psi'(x_i - \mu)$ and the update is given by
$$\mu_{m+1} = \mu_m + \frac{\sum_{i=1}^n \psi(x_i - \mu_m)}{\sum_{i=1}^n \psi'(x_i - \mu_m)} \tag{13}$$
(a) Argue that if the sequence $\{\mu_m\}$ converges, then the limit is the solution to
$$\sum_{i=1}^n \psi(x_i - \mu) = 0 \tag{14}$$
(b) Can you find an example of $\psi$ for which the sequence does not converge?
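A sketch of the Newton-Raphson iteration with Huber's $\psi_k$ ($k = 1.345$, an illustrative choice). Applying (12) to $h(\mu) = \sum_i \psi(x_i - \mu)$, whose derivative is $-\sum_i \psi'(x_i - \mu)$, each step adds $\sum_i \psi(x_i - \mu_m) / \sum_i \psi'(x_i - \mu_m)$ to $\mu_m$. It also shows one way the iteration can fail, a possible angle on (b): since $\psi_k'$ vanishes outside $[-k, k]$, a starting value farther than $k$ from every observation makes the denominator zero. Function names and data are hypothetical.

```python
def huber_psi(r, k=1.345):
    return max(-k, min(k, r))

def huber_dpsi(r, k=1.345):
    """Derivative of Huber's psi (taken as 1 at the kinks |r| = k)."""
    return 1.0 if abs(r) <= k else 0.0

def newton_location(data, mu0, tol=1e-10, max_iter=100):
    """Newton steps: mu <- mu + sum psi(x_i - mu) / sum psi'(x_i - mu)."""
    mu = mu0
    for _ in range(max_iter):
        num = sum(huber_psi(x - mu) for x in data)
        den = sum(huber_dpsi(x - mu) for x in data)
        if den == 0.0:
            raise ArithmeticError(f"sum of psi' is zero at mu = {mu}")
        step = num / den
        mu += step
        if abs(step) < tol:
            return mu
    raise RuntimeError("no convergence within max_iter iterations")

data = [0.0, 1.0, 1.5, 2.0, 2.2, 9.0, 15.0]   # illustrative sample
mu_hat = newton_location(data, mu0=2.0)       # start at the sample median
print(mu_hat, sum(huber_psi(x - mu_hat) for x in data))
try:
    newton_location(data, mu0=100.0)          # every |x_i - mu0| > k
except ArithmeticError as err:
    print("iteration fails:", err)
```

Starting from a robust initial value such as the median is the usual way to keep all residuals in the region where $\psi'$ is informative.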
14. Verify that the breakdown points of the standard deviation and of the median absolute deviation about the median are $0$ and $\frac{1}{2}$, respectively.
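A finite-sample illustration, as a sketch: replacing observations by arbitrarily large (distinct) values shows the standard deviation breaking down with a single point, a fraction $1/n \to 0$, while the MAD about the median stays bounded until about half the sample is replaced. The MAD here carries no consistency constant; helper names and data are illustrative.

```python
import statistics

def mad(x):
    """Median absolute deviation about the median (no consistency constant)."""
    med = statistics.median(x)
    return statistics.median([abs(xi - med) for xi in x])

def replace_with_outliers(data, count, big=1e9):
    """Replace the first `count` points by distinct, arbitrarily large values."""
    return [big * (j + 1) for j in range(count)] + list(data[count:])

data = [float(v) for v in range(11)]   # n = 11, illustrative
for count in (0, 1, 5, 6):
    bad = replace_with_outliers(data, count)
    print(count, statistics.stdev(bad), mad(bad))
```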
15. Show that the asymptotic breakdown point of α-trimmed mean is α.
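A finite-sample illustration, as a sketch: with the trimming convention assumed here ($\lfloor \alpha n \rfloor$ points removed from each end), replacing up to $\lfloor \alpha n \rfloor$ observations by huge values leaves the estimate bounded, while one more makes it arbitrarily large; the breakdown fraction $(\lfloor \alpha n \rfloor + 1)/n$ tends to $\alpha$. The data and $\alpha = 0.1$ are illustrative.

```python
import math

def trimmed_mean(data, alpha):
    """alpha-trimmed mean: drop floor(alpha * n) points from each end, average the rest."""
    x = sorted(data)
    m = math.floor(alpha * len(x))
    kept = x[m:len(x) - m]
    return sum(kept) / len(kept)

def replace_with_outliers(data, count, big=1e9):
    """Replace the first `count` points by distinct, arbitrarily large values."""
    return [big * (j + 1) for j in range(count)] + list(data[count:])

data = [float(v) for v in range(20)]   # n = 20, illustrative
alpha = 0.1                            # floor(0.1 * 20) = 2 points trimmed per end
for count in range(5):
    print(count, trimmed_mean(replace_with_outliers(data, count), alpha))
```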
16. Show that the breakdown point of equivariant dispersion estimates is $\le 0.5$.
17. Let the density $f(x)$ be a decreasing function of $|x|$. Show that the shortest interval covering a given probability is symmetric about zero. Use this result to calculate the influence function of the Shorth estimate for data with distribution $f$.
18. For the exponential family given by
$$f_\theta(x) = \frac{1}{\theta} e^{-x/\theta} I_{\{x \ge 0\}} \tag{15}$$
show that the estimate with the smallest gross-error sensitivity is $\frac{\mathrm{Med}\{x_i\}}{\log 2}$. Find its efficiency with respect to the MLE.
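A Monte Carlo sketch for checking an analytic answer to the efficiency question: the MLE of $\theta$ in (15) is the sample mean, and the efficiency is estimated as the ratio of its simulated variance to that of $\mathrm{Med}\{x_i\}/\log 2$. The values $\theta = 2$, $n = 201$, the number of replications and the seed are arbitrary assumptions, so the printed figure is only approximate.

```python
import math
import random
import statistics

def simulate_efficiency(theta=2.0, n=201, reps=2000, seed=0):
    """Monte Carlo efficiency of Med/log 2 relative to the MLE (the sample mean)."""
    rng = random.Random(seed)
    med_est, mle = [], []
    for _ in range(reps):
        x = [rng.expovariate(1 / theta) for _ in range(n)]   # mean theta, as in (15)
        med_est.append(statistics.median(x) / math.log(2))
        mle.append(statistics.fmean(x))
    return statistics.pvariance(mle) / statistics.pvariance(med_est)

eff = simulate_efficiency()
print(eff)
```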
