Lecture - 12
January 29, 2012
Introduction
Nonparametric methods typically involve some sort of approximation or smoothing. Some of the main methods are kernels, series, and splines. Nonparametric methods are typically indexed by a bandwidth or tuning parameter which controls the degree of complexity. The choice of bandwidth is often critical to implementation, so data-dependent rules for determining the bandwidth are essential for nonparametric methods. A nonparametric method that requires a bandwidth but lacks a rigorous rule for selecting it is incomplete. Unfortunately this is quite common, due to the difficulty of developing rigorous rules for bandwidth selection. Often in these cases the bandwidth is selected based on a related statistical problem; this is a feasible yet worrisome compromise. Many nonparametric problems are generalizations of univariate density estimation, so we will start with this simple setting and explore it in considerable detail.
Kernel Density Estimation
Discrete Estimator
Let $X$ be a random variable with continuous distribution function $F(x)$ and density $f(x) = \frac{d}{dx}F(x)$. The goal is to estimate $f(x)$ from a random sample $\{X_1, \ldots, X_n\}$. The distribution function $F(x)$ is naturally estimated by the empirical distribution function (EDF)

$$\hat{F}(x) = \frac{1}{n}\sum_{i=1}^{n} 1(X_i \le x).$$

It might seem natural to estimate the density $f(x)$ as the derivative of $\hat{F}(x)$, $\frac{d}{dx}\hat{F}(x)$, but this estimator would be a set of mass points, and as such is not a useful estimate of $f(x)$. Instead, consider a discrete derivative. For some small $h > 0$, let

$$\hat{f}(x) = \frac{\hat{F}(x+h) - \hat{F}(x-h)}{2h}.$$

We can write this as

$$\hat{f}(x) = \frac{1}{2nh}\sum_{i=1}^{n} 1(x - h < X_i \le x + h) = \frac{1}{2nh}\sum_{i=1}^{n} 1\left(\frac{|X_i - x|}{h} \le 1\right) = \frac{1}{nh}\sum_{i=1}^{n} k\left(\frac{X_i - x}{h}\right),$$

where

$$k(u) = \begin{cases} \frac{1}{2}, & |u| \le 1 \\ 0, & |u| > 1 \end{cases}$$

is the uniform density function on $[-1, 1]$. The estimator $\hat{f}(x)$ counts the percentage of observations which are close to the point $x$. If many observations are near $x$, then $\hat{f}(x)$ is large; conversely, if only a few $X_i$ are near $x$, then $\hat{f}(x)$ is small. The bandwidth $h$ controls the degree of smoothing.

$\hat{f}(x)$ is a special case of what is called a kernel estimator. The general case is

$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} k\left(\frac{X_i - x}{h}\right),$$

where $k(u)$ is a kernel function.
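As an illustration (not part of the lecture itself), the kernel estimator above can be sketched in a few lines of Python; the function names here are illustrative, not standard library calls.

```python
import numpy as np

def uniform_kernel(u):
    # k(u) = 1/2 for |u| <= 1, 0 otherwise: the uniform density on [-1, 1]
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

def kde(x, data, h, kernel=uniform_kernel):
    # f_hat(x) = (1 / (n h)) * sum_i k((X_i - x) / h)
    data = np.asarray(data, dtype=float)
    return float(kernel((data - x) / h).sum() / (len(data) * h))

# With a single observation at x, the estimate is k(0)/h = 1/(2h)
print(kde(0.0, [0.0], h=1.0))  # 0.5

# Estimate a standard normal density at 0 from a simulated sample
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(kde(0.0, sample, h=0.5))  # roughly 0.4 = 1/sqrt(2*pi)
```

Increasing `h` averages over a wider window (more smoothing); shrinking `h` toward zero recovers the mass-point behavior of the raw EDF derivative.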
Kernel Functions
A kernel function $k(u) : \mathbb{R} \to \mathbb{R}$ is any function which satisfies $\int k(u)\,du = 1$. A non-negative kernel satisfies $k(u) \ge 0$ for all $u$; in this case, $k(u)$ is a probability density function. The moments of a kernel are

$$\kappa_j(k) = \int u^j k(u)\,du.$$

A symmetric kernel function satisfies $k(u) = k(-u)$ for all $u$; in this case, all odd moments are zero. Most nonparametric estimation uses symmetric kernels, and we focus on this case.

The order of a kernel, $\nu$, is defined as the order of the first non-zero moment. For example, if $\kappa_1(k) = 0$ and $\kappa_2(k) > 0$, then $k$ is a second-order kernel and $\nu = 2$. If $\kappa_1(k) = \kappa_2(k) = \kappa_3(k) = 0$ and $\kappa_4(k) > 0$, then $k$ is a fourth-order kernel and $\nu = 4$. The order of a symmetric kernel is always even. Symmetric non-negative kernels are second-order kernels. A kernel is a higher-order kernel if $\nu > 2$; these kernels have negative parts and are not probability densities. They are also referred to as bias-reducing kernels.

Common Second-order Kernels

Kernel        | Equation                                            | $R(k)$             | $\kappa_2(k)$
------------- | --------------------------------------------------- | ------------------ | -------------
Uniform       | $k_0(u) = \frac{1}{2}\,1(|u| \le 1)$                | $1/2$              | $1/3$
Epanechnikov  | $k_1(u) = \frac{3}{4}(1 - u^2)\,1(|u| \le 1)$       | $3/5$              | $1/5$
Biweight      | $k_2(u) = \frac{15}{16}(1 - u^2)^2\,1(|u| \le 1)$   | $5/7$              | $1/7$
Triweight     | $k_3(u) = \frac{35}{32}(1 - u^2)^3\,1(|u| \le 1)$   | $350/429$          | $1/9$
Gaussian      | $k_4(u) = \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}$        | $1/(2\sqrt{\pi})$  | $1$
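To make the moment conditions concrete, here is a small numerical check (a sketch, not from the lecture) that the Epanechnikov kernel integrates to one, has zero first moment, and has $\kappa_2 = 1/5$, so it is second-order. The helper names are illustrative.

```python
import numpy as np

def epanechnikov(u):
    # k(u) = (3/4)(1 - u^2) for |u| <= 1, 0 otherwise
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def kernel_moment(k, j, lo=-1.0, hi=1.0, n=200_000):
    # kappa_j(k) = integral of u^j k(u) du, via a midpoint Riemann sum
    width = (hi - lo) / n
    u = lo + (np.arange(n) + 0.5) * width
    return float(np.sum(u**j * k(u)) * width)

print(round(kernel_moment(epanechnikov, 0), 6))  # 1.0  (it is a density)
print(round(kernel_moment(epanechnikov, 1), 6))  # 0.0  (symmetric: odd moments vanish)
print(round(kernel_moment(epanechnikov, 2), 6))  # 0.2  (first non-zero moment, so order 2)
```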
In addition to each kernel's formula, the table lists its roughness $R(k)$ and second moment $\kappa_2(k)$. The roughness of a function $g$ is

$$R(g) = \int g^2(u)\,du.$$

The most commonly used kernels are the Epanechnikov and the Gaussian.
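As a quick numerical sanity check on the table entries (a sketch, not part of the notes), the roughness values for the Epanechnikov and Gaussian kernels can be recovered by integrating $k(u)^2$; the Gaussian integral is truncated at $\pm 10$, where the tails are negligible.

```python
import numpy as np

def roughness(k, lo, hi, n=400_000):
    # R(k) = integral of k(u)^2 du, via a midpoint Riemann sum
    width = (hi - lo) / n
    u = lo + (np.arange(n) + 0.5) * width
    return float(np.sum(k(u)**2) * width)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def gaussian(u):
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

print(round(roughness(epanechnikov, -1, 1), 4))  # 0.6     = 3/5
print(round(roughness(gaussian, -10, 10), 4))    # 0.2821  ~ 1/(2*sqrt(pi))
```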