Izenman 1991

The article reviews recent advancements in nonparametric density estimation, highlighting the impact of computational tools on statistical research. It discusses various methods including kernel estimators, maximum penalized likelihood estimators, and adaptive estimators, along with their applications in multivariate cases. The review also addresses related areas such as nonparametric regression and density estimation for censored data.


Journal of the American Statistical Association

ISSN: 0162-1459 (Print) 1537-274X (Online) Journal homepage: http://www.tandfonline.com/loi/uasa20

Review Papers: Recent Developments in Nonparametric Density Estimation

Alan Julian Izenman

To cite this article: Alan Julian Izenman (1991) Review Papers: Recent Developments in
Nonparametric Density Estimation, Journal of the American Statistical Association, 86:413,
205-224, DOI: 10.1080/01621459.1991.10475021

To link to this article: https://doi.org/10.1080/01621459.1991.10475021

Published online: 27 Feb 2012.

Recent Developments in Nonparametric
Density Estimation
ALAN JULIAN IZENMAN*

Advances in computation and the fast and cheap computational facilities now available to statisticians have had a significant impact upon statistical research, and especially the development of nonparametric data analysis procedures. In particular, theoretical and applied research on nonparametric density estimation has had a noticeable influence on related topics, such as nonparametric regression, nonparametric discrimination, and nonparametric pattern recognition. This article reviews recent developments in nonparametric density estimation and includes topics that have been omitted from review articles and books on the subject. The early density estimation methods, such as the histogram, kernel estimators, and orthogonal series estimators are still very popular, and recent research on them is described. Different types of restricted maximum likelihood density estimators, including order-restricted estimators, maximum penalized likelihood estimators, and sieve estimators, are discussed, where restrictions are imposed upon the class of densities or on the form of the likelihood function. Nonparametric density estimators that are data-adaptive and lead to locally smoothed estimators are also discussed; these include variable partition histograms, estimators based on statistically equivalent blocks, nearest-neighbor estimators, variable kernel estimators, and adaptive kernel estimators. For the multivariate case, extensions of methods of univariate density estimation are usually straightforward but can be computationally expensive. A method of multivariate density estimation that did not spring from a univariate generalization is described, namely, projection pursuit density estimation, in which both dimensionality reduction and density estimation can be pursued at the same time. Finally, some areas of related research are mentioned, such as nonparametric estimation of functionals of a density, robust parametric estimation, semiparametric models, and density estimation for censored and incomplete data, directional and spherical data, and density estimation for dependent sequences of observations.

KEY WORDS: Adaptive estimators; Censored data; Delta sequences; Directional data; Histograms; Kernel estimators; Maximum penalized likelihood; Method of sieves; Multivariate density estimation; Nearest neighbor methods; Order-restricted maximum likelihood methods; Orthogonal series; Projection pursuit density estimation; Statistically equivalent blocks.

1. INTRODUCTION

The field of nonparametrics has broadened its appeal in recent years with an array of new tools for statistical analysis. These new tools offer sophisticated alternatives to traditional parametric models for exploring large amounts of univariate or multivariate data without making specific distributional assumptions. As one of those tools, nonparametric density estimation has become a prominent statistical research topic. If $X_1, X_2, \ldots, X_n$ is a random $d$-dimensional sample from a continuous probability density function $f$, where

$$f(x) \ge 0, \qquad \int_{\mathbb{R}^d} f(x)\,dx = 1, \tag{1.1}$$

the general problem is to estimate $f$ when no formal parametric structure is specified. In other words, $f$ is taken to belong to a large enough family of densities so that it cannot be represented through a finite number of parameters. "Smoothness" conditions are usually imposed on $f$ and its derivatives, although there are applications (e.g., X-ray transmission tomography) in which discontinuities in $f$ (tissue density) are natural (see Johnstone and Silverman 1990).

Perhaps the earliest nonparametric estimator of a univariate density $f$ was the histogram. Further breakthroughs, initially with the kernel, orthogonal series, and nearest-neighbor methods, were inspired by application to nonparametric discrimination and developments in spectral density estimation for stationary time series. Later, methods such as penalized likelihood, polynomial spline, variable kernel, sieves, and projection pursuit were introduced with other objectives in mind. What has helped make nonparametric density estimation (and related methods) popular today can be traced to a combination of circumstances: the growing importance of computers in statistical research, the public availability of quality statistical software, and a general awareness of the advantages of high-level graphics.

For example, in comparing data from two independent samples, nonparametric density estimates can be very helpful. In a study by Kasser and Bruce (1969) of coronary heart disease patients and age-matched "normals," a number of variables were recorded on 117 men in each group. These variables included heart rates recorded at rest and at their maximum following exercise. Figure 1 shows kernel density estimates of resting heart rate and maximum heart rate for both groups. Notice that the maximum heart rate density estimate for the patient group appears to be bimodal, while for the normal group, the density estimate is essentially unimodal. The opposite appears to be the case for resting heart rate. Figures 2 and 3 show a contour plot and a perspective plot, respectively, of the bivariate density estimate of resting and maximum heart rates for both groups. The shapes of both bivariate density estimates, especially the direction and extent of bimodality, could be used to classify future males into one of the two diagnostic groups.

* Alan Julian Izenman is Associate Professor and Director of the Statistical Computing Laboratory, Department of Statistics, Temple University, Philadelphia, PA 19122. The author thanks Luc Devroye, M. C. Jones, David Scott, Simon Sheather, Bernard Silverman, Mike Steele, Michael Tarter, and Ed Wegman for their detailed comments, suggestions, preprints, reprints, and references, and Partha Bagchi and Albert Cheung for assistance with the graphs. Thanks also go to the coordinating editor and referees, whose comments helped greatly in revising this article. © 1991 American Statistical Association. Journal of the American Statistical Association, March 1991, Vol. 86, No. 413, Review Paper.

[Figure 1 omitted: two panels plotting density against (a) resting heart rate and (b) maximum heart rate.]

Figure 1. Gaussian Kernel Density Estimates of (a) Resting Heart Rate and (b) Maximum Heart Rate Following Exercise for a Group of 117 Male Heart Patients (Dotted Lines) and for a Group of 117 Age-Matched Male "Normals" (Solid Lines) in a Study of Coronary Heart Disease (Kasser and Bruce 1969). For each density estimate, the window-width was taken to reflect sample variation. Note especially the bimodal density estimate for maximum heart rate for the patient group and the bimodal density estimate for resting heart rate for the normal group. Source of data: Kronmal and Tarter (1973).
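The kind of two-group comparison shown in Figure 1 can be sketched in a few lines. The following is a minimal illustration, not the computation behind the figure: it evaluates a univariate Gaussian kernel density estimate, $\hat f_h(x) = (nh)^{-1}\sum_j K((x - X_j)/h)$ with $K$ the standard normal density, for two synthetic samples of 117 values each. The data, the window width `h = 8.0`, and the helper names `gaussian_kde_1d` and `count_modes` are assumptions made for illustration; the Kasser and Bruce (1969) measurements are not reproduced here.

```python
import numpy as np


def gaussian_kde_1d(x_grid, sample, h):
    """Univariate Gaussian kernel density estimate evaluated at each point of x_grid."""
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (x_grid[:, None] - sample[None, :]) / h          # scaled distances, shape (m, n)
    kernel_values = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel_values.sum(axis=1) / (sample.size * h)


def count_modes(density):
    """Number of strict interior local maxima of a density tabulated on a grid."""
    d = np.asarray(density, dtype=float)
    return int(np.sum((d[1:-1] > d[:-2]) & (d[1:-1] > d[2:])))


rng = np.random.default_rng(0)
# Synthetic stand-ins for the two groups (hypothetical values, not the study data).
group_a = rng.normal(165.0, 12.0, size=117)
group_b = np.concatenate([rng.normal(120.0, 8.0, size=58),
                          rng.normal(170.0, 8.0, size=59)])

grid = np.linspace(40.0, 260.0, 881)    # heart-rate axis, step 0.25
h = 8.0                                 # hypothetical window width
density_a = gaussian_kde_1d(grid, group_a, h)
density_b = gaussian_kde_1d(grid, group_b, h)

dx = grid[1] - grid[0]
print("integral of estimate A:", float(density_a.sum() * dx))
print("modes found in A and B:", count_modes(density_a), count_modes(density_b))
```

Because the Gaussian kernel is itself a density, each estimate is nonnegative and integrates to one, which the Riemann sum above approximates. The number of modes reported depends on the window width, so a count obtained this way is descriptive rather than a formal test.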

[Figure 2 omitted: contour plot with resting heart rate on the horizontal axis and maximum heart rate on the vertical axis.]

Figure 2. Equal Probability Contours of Bivariate Gaussian Kernel Density Estimates of Resting Heart Rate and Maximum Heart Rate From Figure 1. The normals-group density contours are shown as solid lines and the patient-group density contours are shown as dotted lines. Notice that the bimodal orientations of the density contours of the two groups appear orthogonal to each other.

[Figure 3 omitted: two perspective plots, panels (a) and (b), over the axes resting heart rate and maximum heart rate.]

Figure 3. Three-Dimensional Perspective Plots of Bivariate Gaussian Kernel Density Estimates of Resting Heart Rate and Maximum Heart Rate From Figure 1. The normals group is displayed in (a) and the patient group in (b).
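A bivariate estimate of the kind contoured in Figure 2 can be sketched with a product of univariate Gaussian kernels, one per coordinate. This is a minimal sketch on synthetic data, assuming a separate window width for each coordinate (`h = (h1, h2)`), which is a choice made here for illustration only; the function name `product_gaussian_kde_2d`, the sample, and the evaluation grid are likewise hypothetical.

```python
import numpy as np


def product_gaussian_kde_2d(points, sample, h):
    """Bivariate kernel density estimate built from a product of Gaussian kernels.

    points: (m, 2) evaluation points; sample: (n, 2) data; h: two window widths.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    sample = np.atleast_2d(np.asarray(sample, dtype=float))
    h = np.asarray(h, dtype=float)
    u = (points[:, None, :] - sample[None, :, :]) / h        # shape (m, n, 2)
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)            # univariate kernel values
    return k.prod(axis=2).sum(axis=1) / (sample.shape[0] * h.prod())


rng = np.random.default_rng(1)
# Synthetic (resting, maximum) pairs; hypothetical values, not the study data.
sample = np.column_stack([rng.normal(75.0, 10.0, size=117),
                          rng.normal(165.0, 15.0, size=117)])

resting_axis = np.linspace(20.0, 130.0, 56)     # step 2
maximum_axis = np.linspace(80.0, 250.0, 86)     # step 2
xx, yy = np.meshgrid(resting_axis, maximum_axis, indexing="ij")
grid_points = np.column_stack([xx.ravel(), yy.ravel()])

density = product_gaussian_kde_2d(grid_points, sample, h=(5.0, 7.5)).reshape(xx.shape)
cell_area = (resting_axis[1] - resting_axis[0]) * (maximum_axis[1] - maximum_axis[0])
print("integral over the grid:", float(density.sum() * cell_area))
```

The array `density` holds the estimate on a rectangular grid, which is the form a contour or perspective plotting routine would typically take as input.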

Researchers have thus found nonparametric density estimates effective in the following situations: (a) in exploratory analysis, descriptive features of the density estimate, such as multimodality, tail behavior, and skewness, are of special interest, and a nonparametric approach may be more flexible than the traditional parametric methods; (b) in confirmatory analysis, nonparametric density estimates are used in decision making, such as nonparametric discrimination and classification analysis, testing for modes, and random variate testing; and (c) for presentational purposes, statistical peculiarities of the data often can be readily explained to clients through simple graphical displays of estimated density curves (see Silverman 1981a). There is a very revealing example of (a) by Park and Marron (1990) where they display a sequence of annual lognormal density estimates for net income data that indicated unimodal densities hardly changing from year to year, while nonparametric density estimates indicated at least two modes and significant changes in shape over time. Further published applications of nonparametric density estimation can be found listed and briefly described in Table 1.

Table 1. Case Studies Involving Nonparametric Density Estimation

Reference | Topic | Method | Remarks
Silverman (1978c) | Identifying the causes of "cot death" | MPL | Univariate data; assessing bimodality
Scott, Gotto, Cole, and Gorry (1978) | Coronary heart disease | Kernel | Bivariate data; classification problem
Good and Gaskins (1980) | High-energy physics and "bump-hunting" | MPL | Univariate grouped data; assessing a bump in a mass spectrum histogram
Dubuisson and Lavison (1982) | Surveillance of a nuclear reactor | Kernel | Multivariate data; classification problem
Scott and Thompson (1983) | Remote sensing of satellite agricultural crop data | ASH | Trivariate data; exploratory analysis
Aitchison and Lauder (1985) | Compositional data for geology and consumer demand analysis | Kernel | Multivariate data vectors of proportions summing to unity
De Jager, Swanepoel, and Raubenheimer (1986) | Gamma-ray astronomy for estimating light curves and identifying periodic sources | Kernel | Univariate data; assessing whether light curve differs from uniform density
Izenman and Sommer (1988) | Identifying the components of a philatelic mixture | Kernel | Univariate data; assessing multimodality; comparison with parametric mixture

The last two decades have seen a consolidation and a critical assessment of nonparametric density estimation methods. Several review articles (Bean and Tsokos 1980; Fryer 1977; Leonard 1978; Rosenblatt 1971; Tarter and Kronmal 1976; and Wegman 1972, 1982) and an extensive bibliography (Wertz and Schneider 1979) were published, as well as nine books (Devroye 1987; Devroye and Gyorfi 1985; Hand 1982; Nadaraya 1989; Prakasa Rao 1983; Silverman 1986; Tapia and Thompson 1978; Van Es 1990; and Wertz 1978); certain books emphasized density estimation methods preferred by the authors, while others were more comprehensive in their treatment of the diverse material. As with most statistical research, much of what has been written on the subject of nonparametric density estimation, including most of these books, has been completely theoretical; some books (such as Silverman 1986), however, contain discussions of real-data examples, simulation studies, and computational issues. References to JASA reviews of some of these books are listed in Table 2. See also the book review by Silverman (1985). The successful development of nonparametric density estimation techniques led, in turn, to the formulation of nonparametric regression (Eubank 1988; Muller 1988; Nadaraya 1989), including the nonparametric analysis of growth curves, and nonparametric statistical pattern recognition (Devijver and Kittler 1982; Fukunaga 1972, chap. 6).

Table 2. Citations of Reviews in JASA of Books on Nonparametric Density Estimation

Author | Source of review | Reviewer | General comments
Wertz (1978) | JASA, 75 (1980), 241 | K.-S. Lii | Emphasizes kernel methods; theoretical
Tapia and Thompson (1978) | no JASA review | | Emphasizes MPL method; theoretical; Monte Carlo simulations
Hand (1982) | JASA, 78 (1983), 990-991 | J. D. Knoke | Kernel methods only; some applications; univariate and multivariate approaches
Prakasa Rao (1983) | JASA, 81 (1986), 264 | V. Susarla | Comprehensive; theoretical; applications to different topics
Devroye and Gyorfi (1985) | JASA, 82 (1987), 344 | J. R. Thompson | Comprehensive; theoretical; $L_1$ viewpoint
Silverman (1986) | JASA, 83 (1988), 269-270 | A. J. Izenman | Comprehensive; numerous real-data applications; univariate and multivariate approaches; computational details
Devroye (1987) | no JASA review | | Emphasizes kernel methods; theoretical; $L_1$ viewpoint
Nadaraya (1989) | JASA, 85 (1990), 598 | D. W. Scott | Emphasizes kernel methods; theoretical

This article surveys recent developments in nonparametric density estimation, as well as topics that were omitted from previous review articles and books. Section 2 discusses desirable statistical properties of nonparametric density estimates, followed in Sections 3-9 by reviews of the various estimation methods. Finally, in Section 10, some remarks are made about related research areas. Note that the references, though numerous, should not be regarded as exhaustive.

2. STATISTICAL PROPERTIES OF DENSITY ESTIMATORS

Like any statistical procedure, nonparametric density estimators are recommended only if they possess desirable properties. Finite-sample properties of nonparametric density estimators are available for special situations (Deheuvels 1977; Fryer 1976), but, in general, research emphasis has settled on developing large-sample properties.

2.1 Unbiasedness

Consider, for example, unbiasedness. An estimator $\hat f$ of a probability density function $f$ is unbiased for $f$ if, for all $x \in \mathbb{R}^d$, $E_f[\hat f(x)] = f(x)$. Although unbiased estimators of parametric densities, such as the normal, Poisson, exponential, and geometric, do exist (Ghurye and Olkin 1969), no bona fide density estimator [that is, satisfying (1.1)] can exist that is unbiased for all continuous densities (Rosenblatt 1956). Hence attention has since focused on sequences $\{\hat f_n\}$ of nonparametric density estimators that are asymptotically unbiased for $f$; that is, for all $x \in \mathbb{R}^d$, $E_f[\hat f_n(x)] \to f(x)$ as $n \to \infty$.

2.2 Consistency

A more important property is consistency. The simplest notion of consistency of a density estimator is where $\hat f$ is (weakly) pointwise consistent for a univariate $f$ if $\hat f(x) \to f(x)$ in probability for every $x \in \mathbb{R}$, and is strongly pointwise consistent for $f$ if convergence holds almost surely. Other types of consistency depend upon the error criterion ($L_1$ or $L_2$, in general); see Hall (1989b).

The $L_2$ Approach. If $f$ is assumed square integrable, then the performance of $\hat f$ at $x \in \mathbb{R}$ is measured by the mean squared error,

$$\mathrm{MSE}(x) = E_f[\hat f(x) - f(x)]^2 = \operatorname{var}[\hat f(x)] + \{\operatorname{bias}[\hat f(x)]\}^2, \tag{2.1}$$

where $\operatorname{var}[\hat f(x)] = E_f\{\hat f(x) - E_f[\hat f(x)]\}^2$ and $\operatorname{bias}[\hat f(x)] = E_f[\hat f(x)] - f(x)$. If $\mathrm{MSE}(x) \to 0$ for all $x \in \mathbb{R}$ as $n \to \infty$, then $\hat f$ is said to be a pointwise consistent estimator of $f$ in quadratic mean. A more important performance criterion relates to how well the entire curve $\hat f$ estimates $f$. One such measure of goodness of fit is found by integrating (2.1) over all values of $x$, yielding the integrated mean squared error,

$$\mathrm{IMSE} = \int_{-\infty}^{\infty} E_f[\hat f(x) - f(x)]^2\,dx. \tag{2.2}$$

Another measure commonly used is integrated squared error (or $L_2$ norm),

$$\mathrm{ISE} = \int_{-\infty}^{\infty} [\hat f(x) - f(x)]^2\,dx. \tag{2.3}$$

Taking expectations over $f$ in (2.3) gives the mean integrated squared error, $\mathrm{MISE} = E_f(\mathrm{ISE})$. Note that MISE = IMSE. ISE is often preferred as a criterion, rather than its expected value MISE, since ISE determines how closely $\hat f$ approximates $f$ for a given data set, whereas MISE is concerned with the average over all possible data sets. Under mild conditions, ISE has been shown to be a reasonably random approximation to MISE (Marron and Hardle 1986), while, in certain situations, MISE may actually be a better performance criterion than ISE (Hall and Marron 1988). Farrell (1972) showed that for bona fide density estimates, the best possible asymptotic rate of convergence for MISE is $O(n^{-4/5})$, and Boyd and Steele (1978) proved that no $\hat f$ can exist with a MISE better than $O(n^{-1})$, even if $f$ is a normal density.

The $L_1$ Approach. One problem with the $L_2$ approach to nonparametric density estimation is that the tail behavior of a density becomes less important, possibly resulting in peculiarities in the tails of the density estimate. Further objections to the $L_2$ approach can be found in Donoho and Johnstone (1989). In two books (Devroye 1987; Devroye and Gyorfi 1985), and in a host of articles, an alternative $L_1$ theory of nonparametric density estimation was vigorously pursued by Devroye and his colleagues.
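The $L_2$ quantities in (2.1)-(2.3) can be checked numerically. The sketch below is an illustration under stated assumptions, not a procedure from the article: the true density is taken to be standard normal, the estimator is a Gaussian kernel estimate with a fixed, arbitrarily chosen window width `h = 0.4`, and expectations are approximated by averaging over simulated samples. All names (`kde`, `normal_pdf`, `replications`) are hypothetical.

```python
import numpy as np


def normal_pdf(x):
    """Standard normal density, used both as the true f and as the kernel."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)


def kde(x, sample, h):
    """Gaussian kernel density estimate at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (x[:, None] - sample[None, :]) / h
    return normal_pdf(u).sum(axis=1) / (sample.size * h)


rng = np.random.default_rng(2)
n, h, x0, replications = 100, 0.4, 0.5, 500
grid = np.linspace(-6.0, 6.0, 601)
dx = grid[1] - grid[0]

estimates_at_x0 = np.empty(replications)
ise_values = np.empty(replications)
for r in range(replications):
    sample = rng.standard_normal(n)
    estimates_at_x0[r] = kde(x0, sample, h)[0]
    # Riemann-sum approximation to the ISE (2.3) for this particular sample.
    ise_values[r] = np.sum((kde(grid, sample, h) - normal_pdf(grid)) ** 2) * dx

bias = estimates_at_x0.mean() - normal_pdf(x0)
variance = estimates_at_x0.var()                         # divides by the number of replications
mse = np.mean((estimates_at_x0 - normal_pdf(x0)) ** 2)   # Monte Carlo version of (2.1)
mise = ise_values.mean()                                 # Monte Carlo approximation to MISE

print("MSE(x0):", float(mse), " var + bias^2:", float(variance + bias ** 2))
print("ISE of first sample:", float(ise_values[0]), " approximate MISE:", float(mise))
```

The simulated MSE and the sum of simulated variance and squared bias agree up to rounding, mirroring the identity in (2.1). The spread of `ise_values` around `mise` illustrates why ISE, which is tied to one data set, and MISE, which averages over data sets, can lead to different assessments.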
Specifically, Devroye and Gyorfi (1985, p. 1) claimed that $L_1$ is "the natural space for densities," and showed that the integrated absolute error (also known as the total variation or the $L_1$ norm),

$$\mathrm{IAE} = \int_{-\infty}^{\infty} |\hat f(x) - f(x)|\,dx, \tag{2.4}$$

is always well defined as a norm on that space, is invariant under monotone transformations, and $0 \le \mathrm{IAE} \le 2$. If $\mathrm{IAE} \to 0$ in probability as $n \to \infty$, then $\hat f$ is said to be a consistent estimator of $f$; strong consistency of $\hat f$ occurs when convergence holds almost surely. The distance IAE is related to Kullback-Leibler relative entropy and Hellinger distance; see Devroye and Gyorfi (1985, chap. 8) for details. The expectation of (2.4) over all densities $f$ yields the mean integrated absolute error, $\mathrm{MIAE} = E_f[\mathrm{IAE}]$. Some quite remarkable results were proved by Devroye and his colleagues concerning the asymptotic behavior of IAE and MIAE under little or no assumptions on $f$. Hall and Wand (1988) derived a general asymptotic expression for MIAE and showed that its minimization reduced to numerically solving a particular equation. One thing, however, is clear: The technical labor needed to get $L_1$ results is substantially more difficult than that needed to obtain analogous $L_2$ results.

2.3 Bona Fide Density Estimates

Of the density estimation methods currently available, some always yield bona fide density estimates, while others generally yield density estimates that contain negative ordinates (especially in the tails) or have an infinite integral. Negativity can occur naturally, as a result of data sparseness in certain regions (Boneva, Kendall, and Stefanov 1971; Kronmal and Tarter 1968), or it can be caused by relaxing the nonnegativity constraint in (1.1) in order to improve the rate of convergence of an estimator of $f$. Moreover, in the quest for faster convergence rates of estimators, some researchers have chosen to relax the integral constraint in (1.1) rather than the nonnegativity constraint; see Terrell and Scott (1980). There are several ways to alleviate such problems. The density estimate may be truncated to its positive part and renormalized; alternatively, one might estimate a transformed version of $f$, say $\log f$ or $f^{1/2}$, and then transform back to get a nonnegative estimate of $f$. Gajek (1986) proposed a simple improvement scheme by which any density estimator that was not a bona fide density could be made to converge to a bona fide density.

3. THE HISTOGRAM

Traditionally, the histogram has been used to provide a visual clue to the general shape of $f$. Suppose $f$ has support $\Omega = [a, b]$, where $a$ and $b$ are usually taken to encompass the observed data. Partition $[a, b]$ into a grid (or mesh) of $m$ nonoverlapping bins (or cells) $T_i = [t_{n,i}, t_{n,i+1})$ $(i = 1, 2, \ldots, m)$, where $a = t_{n,1} < t_{n,2} < \cdots < t_{n,m+1} = b$, and the bin edges $\{t_{n,i}\}$ are shown depending on the sample size $n$. This is generally termed a fixed partition of $\Omega$. Let $I_{T_i}$ be the indicator function of the $i$th bin and let $N_i$ be the number of sample values falling into $T_i$ $(i = 1, 2, \ldots, m)$, where $\sum_{i=1}^{m} N_i = n$. Then, the histogram, defined by

$$\hat f(x) = \sum_{i=1}^{m} \frac{N_i/n}{t_{n,i+1} - t_{n,i}}\, I_{T_i}(x), \tag{3.1}$$

satisfies (1.1). If $h_n = t_{n,i+1} - t_{n,i}$ $(i = 1, 2, \ldots, m)$, is a common bin width, then (3.1) reduces to

$$\hat f_{h_n}(x) = \frac{1}{n h_n} \sum_{i=1}^{m} N_i\, I_{T_i}(x). \tag{3.2}$$

As a density estimator, however, the histogram leaves much to be desired, with defects that include "the fixed nature of the cell structure, the discontinuities at cell boundaries, and the fact that it is zero outside a certain range" (Hand 1982, p. 15). A much more serious defect relates to the sensitivity of histogram shapes to the choice of origin; see Silverman (1986, sec. 2.2) for an example.

3.1 The Histogram As a Maximum Likelihood Estimator

Let $H(\Omega)$ be a specified class of real-valued functions defined on $\Omega$. The maximum likelihood (ML) problem is to find an $f$ to maximize the likelihood function $L(f) = \prod_{j=1}^{n} f(X_j)$, or its logarithm, subject to $f \in H(\Omega)$, $\int_{\Omega} f(t)\,dt = 1$, and $f(t) \ge 0$ $(\forall t \in \Omega)$. If $H(\Omega)$ is finite dimensional, then a (not necessarily unique) solution to this problem exists and is called an ML estimator of $f$. The uniqueness of the solution depends upon the specification of $H(\Omega)$. The histogram is the unique ML estimator based on the random sample $X_1, \ldots, X_n$, where $H$ consists of functions of the form $\sum_{i=1}^{m} y_i I_{T_i}$ $(y_i \in \mathbb{R})$. See de Montricher, Tapia, and Thompson (1975), where the histogram was also described as a polynomial spline of degree 0 (functions which are piecewise constant) with knots at the points $t_{n,1}, \ldots, t_{n,m+1}$. More generalized versions of the histogram using polynomial splines of higher degree appear in Tapia and Thompson (1978, chap. 3).

3.2 Statistical Properties

Under different sets of conditions on $f$ and (3.2), Scott (1979) and Freedman and Diaconis (1981b) showed that if $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$, then $\mathrm{IMSE} \to 0$, and that IMSE is asymptotically minimized if $h_n^* = [6/R(f')]^{1/3}\, n^{-1/3}$, where $R(g) = \int_{-\infty}^{\infty} [g(x)]^2\,dx$. For Gaussian data with variance $\sigma^2$, for example, $h_n^* = 3.49\,\sigma\, n^{-1/3}$. The optimal IMSE convergence rate of $O(n^{-2/3})$ is substantially slower than most other kinds of density estimators, such as kernel estimators, and gives a more technical reason why histograms should not be used as density estimators. Devroye and Gyorfi (1985, secs. 3.3 and 5.4) showed that the histogram (3.2) was strongly consistent for all $f$ and that MIAE was of order $O(n^{-1/3})$. See also Freedman and Diaconis (1981a).

3.3 Choice of Bin Width

Since $h_n^*$ depends upon the unknown $f$ through $R(f')$, an estimate $\hat f$ of $f$ can be "plugged into" $h_n^*$. For example, Scott (1979) found that the approximate optimal bin width $\hat h_n^* = 3.49\,s\,n^{-1/3}$, where $s$ is the sample standard deviation, worked well for Gaussian samples, while it led to overly large bin widths and hence oversmoothing otherwise.
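The equal-bin-width histogram (3.2) and the Gaussian-reference bin width $3.49\,s\,n^{-1/3}$ can be sketched as follows. This is an illustrative sketch, assuming bins of the form $[t_0 + ih,\; t_0 + (i+1)h)$ that extend over the whole line rather than over a fixed interval $[a, b]$; the function names, the origin `t0`, and the synthetic sample are hypothetical.

```python
import numpy as np


def histogram_density(x, sample, t0, h):
    """Histogram estimate N_i / (n h) for the bin [t0 + i h, t0 + (i + 1) h) containing x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    x_bins = np.floor((x - t0) / h).astype(int)
    sample_bins = np.floor((sample - t0) / h).astype(int)
    counts = (x_bins[:, None] == sample_bins[None, :]).sum(axis=1)   # N_i for each x
    return counts / (sample.size * h)


def gaussian_reference_bin_width(sample):
    """Bin width 3.49 * s * n**(-1/3), with s the sample standard deviation."""
    sample = np.asarray(sample, dtype=float)
    return 3.49 * sample.std(ddof=1) * sample.size ** (-1.0 / 3.0)


rng = np.random.default_rng(3)
sample = rng.normal(0.0, 1.0, size=500)          # synthetic Gaussian data
h = gaussian_reference_bin_width(sample)

grid = np.linspace(-3.0, 3.0, 13)
# Same data and bin width, two different origins: the two estimates generally differ.
estimate_origin_0 = histogram_density(grid, sample, t0=0.0, h=h)
estimate_shifted = histogram_density(grid, sample, t0=h / 2.0, h=h)
print("bin width:", float(h))
print(np.round(estimate_origin_0, 3))
print(np.round(estimate_shifted, 3))
```

Evaluating the estimate with two origins is a simple way to see the sensitivity to bin placement noted in Section 3: only the partition changes, yet the tabulated heights at the same points need not agree.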
Freedman and Diaconis (1981b) suggested a "simple, robust rule [that] often gives quite reasonable results," namely, $\hat h_n^* = 2(\mathrm{IQR})\,n^{-1/3}$, where IQR is the interquartile range of the data. Numerical comparisons by Emerson and Hoaglin (1983) of the Scott and Freedman-Diaconis rules showed the Freedman-Diaconis rule led to narrower bin widths, although "in practical applications the two rules will often lead to the same choice of interval width." Terrell and Scott (1985) and Terrell (1990) argued that $h_n$ should be chosen conservatively by restricting the choice of bin width to the value that yields the smoothest density, subject to a given measure of spread (such as the standard deviation or range). Information-based methods for the histogram were studied by Taylor (1987), who used Akaike's information criterion for determining an optimal histogram bin width, and by Rodriguez and van Ryzin (1985), who defined maximum entropy histograms. Scott (1988) studied hexagonal and square bin shapes for bivariate histograms.

3.4 Related Estimators

By modifying the block-like shape of the histogram, a faster rate of IMSE convergence of $O(n^{-4/5})$ (or close to it) can be attained by the following estimators.

The averaged shifted histogram (ASH) of Scott and Thompson (1983) and Scott (1985a) is constructed by averaging several histograms with equal bin widths but different bin locations and was motivated by the need to resolve the problem of a choice of bin origin; its computational efficiency in the multivariate case has made the ASH popular among many researchers.

The classical frequency polygon (FP), studied by Scott (1985b), is constructed by connecting the mid-bin values of the histogram with straight lines. The FP was especially recommended for interpolating the ASH, leading to the ASH-FP. Jones (1989) studied discretization and interpolation problems related to the ASH and ASH-FP.

The histospline of Boneva, Kendall, and Stefanov (1971) is a cardinal quadratic spline fitted to the histogram and is obtained by interpolating the knots of the sample distribution function $F_n(x) = n^{-1} \sum_{i=1}^{n} I_{[X_i \le x]}$ and then differentiating the cubic spline estimator of the distribution function $F$.

A weighted histogram estimator of $f$, also referred to as a Bernstein polynomial-type approximation, was proposed by Vitale (1975) and Gawronski and Stadtmuller (1980), where the bin counts were weighted by empirical Poisson probabilities.

4. KERNEL DENSITY ESTIMATION

The multivariate kernel density estimator of $f$ has the form

$$\hat f_h(x) = (n h^d)^{-1} \sum_{j=1}^{n} K\!\left(\frac{x - X_j}{h}\right), \tag{4.1}$$

where the choice of kernel function $K$ and the window width $h = h_n > 0$ determine the performance of $\hat f_h$ as an estimator of $f$. It is interesting to note that Cacoullos (1966) appears to have been the first to call $K$ in (4.1) a kernel function; previously, $K$ was referred to as a weight function. Note that the same amount of smoothing is used in (4.1) for each of the $d$ dimensions. The fast Fourier transform is recommended for computing (4.1) in the univariate case ($d = 1$); see Silverman (1982a) and Jones and Lotwick (1984). Since (4.1) shows that $\hat f_h$ inherits whatever properties the kernel $K$ possesses, it is important that $K$ have desirable properties.

The simplest class of kernels consists of probability density functions that satisfy

$$K(x) \ge 0, \qquad \int_{\mathbb{R}^d} K(x)\,dx = 1. \tag{4.2}$$

If a kernel $K$ from this class is used in (4.1), then $\hat f_h$ will always be a bona fide probability density. Popular choices of univariate kernels include the Gaussian kernel with unbounded support,

$$K(x) = (2\pi)^{-1/2} e^{-x^2/2}, \qquad x \in \mathbb{R}, \tag{4.3}$$

and the compactly supported "polynomial" kernels,

$$K(x) = \kappa_{rs}\,(1 - |x|^r)^s\, I_{[|x| \le 1]}, \qquad \kappa_{rs} = \frac{r}{2\,\mathrm{Beta}(s + 1, 1/r)}, \qquad r > 0,\; s \ge 0. \tag{4.4}$$

The rectangular kernel obtains in (4.4) if $s = 0$ ($\kappa_{r0} = 1/2$); the triangular kernel if $r = 1$, $s = 1$ ($\kappa_{11} = 1$); the Bartlett-Epanechnikov kernel if $r = 2$, $s = 1$ ($\kappa_{21} = 3/4$); the biweight kernel if $r = 2$, $s = 2$ ($\kappa_{22} = 15/16$); the triweight kernel if $r = 2$, $s = 3$ ($\kappa_{23} = 35/32$); and, after a suitable rescaling, the Gaussian kernel if $r = 2$, $s = \infty$. The triangular kernel density estimate is asymptotically related to the ASH since the former is obtained as a limit of the latter as the number of shifted histograms becomes infinite. For $x \in \mathbb{R}^d$, multivariate kernels are usually radially symmetric unimodal densities such as the Gaussian $K(x) = (2\pi)^{-d/2} e^{-(1/2) x^T x}$, and the Bartlett-Epanechnikov, $K(x) = ((d + 2)/2c_d)(1 - x^T x)\, I_{[x^T x \le 1]}$, $c_d = \pi^{d/2}/\Gamma((d/2) + 1)$.

In certain situations (Cacoullos 1966), product kernels may be appropriate, where $K(x) = \prod_{i=1}^{d} K(x_i)$ is a product of univariate kernel functions. For example, Figures 2 and 3 were computed using bivariate product Gaussian kernel density estimates. In a similar study, Scott, Gotto, Cole, and Gorry (1978) used bivariate product biweight kernel density estimates.

4.1 Statistical Properties

Deriving asymptotic properties of kernel density estimates depends on the particular viewpoint considered. Devroye (1983), using the $L_1$ approach, proved the remarkably simple result that if $K$ satisfies (4.2), then the kernel estimator (4.1) will be a strongly consistent estimator of $f$ if and only if $h_n \to 0$ and $n h_n^d \to \infty$, as $n \to \infty$, without any conditions on $f$. Devroye and Penrod (1984) also showed that, for the univariate case, MIAE was of order $O(n^{-2/5})$, better than the $L_1$ rate for histograms. Explicit formulas for minimum MIAE and asymptotically optimal smoothing parameters for kernel estimators were obtained by Hall and Wand (1988).
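The normalizing constants quoted for the polynomial kernels (4.4) can be verified directly, and any kernel satisfying (4.2) can be used in the univariate form of (4.1). The following sketch assumes the kernel family has the form $K(x) = \kappa_{rs}(1 - |x|^r)^s$ on $|x| \le 1$ with $\kappa_{rs} = r / (2\,\mathrm{Beta}(s+1, 1/r))$, which is consistent with the constants listed in the text; the function names and the small data set are hypothetical.

```python
import math
import numpy as np


def kappa(r, s):
    """Normalizing constant r / (2 * Beta(s + 1, 1 / r))."""
    beta = math.gamma(s + 1.0) * math.gamma(1.0 / r) / math.gamma(s + 1.0 + 1.0 / r)
    return r / (2.0 * beta)


def polynomial_kernel(u, r, s):
    """kappa(r, s) * (1 - |u|**r)**s on |u| <= 1 and zero elsewhere."""
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) <= 1.0
    base = np.where(inside, 1.0 - np.abs(u) ** r, 0.0)
    return np.where(inside, kappa(r, s) * base ** s, 0.0)


def kernel_estimate(x, sample, h, kernel):
    """Univariate kernel estimate (n h)^(-1) * sum_j K((x - X_j) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (x[:, None] - sample[None, :]) / h
    return kernel(u).sum(axis=1) / (sample.size * h)


for name, r, s in [("triangular", 1, 1), ("Bartlett-Epanechnikov", 2, 1),
                   ("biweight", 2, 2), ("triweight", 2, 3)]:
    print(name, kappa(r, s))

sample = np.array([-1.2, -0.4, 0.1, 0.3, 1.5])        # hypothetical data
grid = np.linspace(-4.0, 4.0, 8001)
estimate = kernel_estimate(grid, sample, h=0.8,
                           kernel=lambda u: polynomial_kernel(u, 2, 1))
print("integral of the estimate:", float(estimate.sum() * (grid[1] - grid[0])))
```

Passing the kernel as a function keeps the estimator independent of the kernel choice, so the same `kernel_estimate` can be reused with the biweight or triweight members of the family by changing `r` and `s`.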
For the $L_2$ approach, under regularity conditions on $K$ and $f$, Parzen (1962) showed that if $h_n \to 0$ as $n \to \infty$, then the univariate kernel estimator was both asymptotically unbiased and asymptotically normal. Cacoullos (1966) showed that the asymptotic expression for IMSE for the $d$-dimensional case was minimized over all $h$ satisfying the above conditions by $h^*_{\mathrm{IMSE}} = \alpha(K)\beta(f)\,n^{-1/(d+4)}$, where $\alpha(K)$ depends only on the kernel $K$ and $\beta(f)$ depends only on $f$; furthermore, $\mathrm{IMSE} \to 0$ at rate $O(n^{-4/(d+4)})$. The results show clearly the dimensionality effect, since these convergence rates become slower as $d$ increases. In the univariate case, if $K$ is the standard Gaussian kernel (4.3) and $f$ is a Gaussian density with variance $\sigma^2$, then $h^*_{\mathrm{IMSE}} = 1.06\,\sigma\,n^{-1/5}$ would be the optimal window width. Additional consistency results were obtained by Hall and Hannan (1988).

4.2 Choice of Kernel

It has been known for some time that although the Bartlett-Epanechnikov kernel minimizes the optimal asymptotic IMSE with respect to $K$, IMSE is quite insensitive to the shape of the kernel. Marron and Nolan (1987) gave further results in this direction. As a result, more exotic types of kernels are now being studied. The most important of these developments concerns a hierarchy of classes of kernels defined by the existence of certain moments of $K$. In this scheme, those univariate symmetric kernels $K$ that integrate to unity are called order 0 kernels, while order $s$ kernels, for some positive integer $s$, are those order 0 kernels whose first $s - 1$ moments vanish but whose $s$th moment is finite. Thus second-order kernels have zero mean and finite variance and include all compactly supported kernels. Order $s$ kernels, for $s \ge 3$, have zero variance, which can be achieved only if $K$ takes on negative values. Such

dow width is cross-validation (CV). The basic algorithm involves removing a single value, say $X_i$, from the sample, computing the appropriate density estimate at that $X_i$ from the remaining $n - 1$ sample values,

$$\hat f_{h,i}(X_i) = \frac{1}{(n - 1)h} \sum_{j \ne i} K\!\left(\frac{X_i - X_j}{h}\right), \tag{4.5}$$

and then choosing $h$ to optimize some given criterion involving all values of $\hat f_{h,i}(X_i)$ $(i = 1, 2, \ldots, n)$. Two different versions of CV have been used in density estimation: likelihood cross-validation and least squares cross-validation. For likelihood cross-validation, $h_{\mathrm{LCV}}$ is that $h$ that maximizes the "pseudo-likelihood" $L(h) = \prod_{i=1}^{n} \hat f_{h,i}(X_i)$. For least squares cross-validation, $h_{\mathrm{LSCV}}$ is that $h$ that minimizes $LS(h) = R(\hat f_h) - (2/n) \sum_{i=1}^{n} \hat f_{h,i}(X_i)$, which is exactly unbiased for $\mathrm{MISE} - R(f)$. Marron (1987b) provided an excellent survey of these and other automatic smoothing parameter methods.

Mixed results have been obtained for CV methods in kernel density estimation. It has been shown, for example, that when using compactly supported kernels [such as (4.4)], likelihood CV produces consistent estimates of compactly supported densities (Chow, Geman, and Wu 1983) but does not necessarily do so for estimating infinitely supported densities (Schuster and Gregory 1981). The complex influence that the tails of both $K$ and $f$ have on likelihood CV was studied by Hall (1987a) in terms of the Kullback-Leibler norm. Broniatowski, Deheuvels, and Devroye (1989) related such convergence problems to the stability of the extreme order statistics. Simulation studies by Scott and Factor (1981) indicated that, depending upon the type of kernel employed, likelihood CV could lead to either a severely undersmoothed or oversmoothed density estimate. Furthermore, the criterion $L(h)$ was found to be very sensitive to outliers. Obvious modifications of $L(h)$, including
kernels are important for bias reduction and improving the truncating f, have been considered; see Hall (1982) and
IMSE convergence rate. For example, if K is an order s Marron (1985).
kernel, then the fastest asymptotic rate of MSE conver- Least squares CV does not seem to display the peculiar
gence of]tofis O(n- 2s / (2s + 1) ; thus, for a fourth-order ker- behavior exhibited by likelihood CV. Indeed, very mild tail
nel, which cannot be nonnegative, the minimum asymptotic conditions on f and K are needed to prove asymptotic op-
MSE convergence rate of]tofis of order O(n- S/ 9 ) , which timality results for least squares CV. See, for example, Hall
is faster than the best such rate, O(n- 4 / 5 ) , for nonnegative (1983a) and Stone (1984), who showed that hLS CV asymp-
kernels (see Gasser, Muller, and Mammitzsch 1985). Hall totically minimized ISE. Bowman (1984) also showed, via
and Marron (1988) considered optimal selection of the or- simulation, that least squares CV achieved satisfactory re-
der s. Cline (1988) defined the admissibility of kernel es- sults for long-tailed f. Hall and Marron (1987a, b) proved
timators and showed that while the Bartlett-Epanechnikov that h LSCV performed asymptotically as well as the optimal
kernel is not admissible among all kernels, it is admissible (but unattainable) window width hIMSE; they then went on
among all nonnegative kernels. to show that although hLS CV converged very slowly, the least
squares CV choice of window width could not be improved
4.3 Choice of Window Width upon asymptotically. Scott and Terrell (1987) introduced a
Early work on the kernel method emphasized asymptotic version of the criterion LS(h) that was biased for MISE and
results, whereas determining an optimal h is the main re- showed that although large asymptotic performance gains
search focus today. Since the optimal window width, could be obtained from such a biased CV procedure, no
h~SE, depends explicitly on the unknown f through (3(f), currently available (biased or unbiased) CV procedure could
it cannot be computed exactly. Several "plug-in" proce- be considered highly reliable for very small samples.
dures were proposed whereby (3(j) was used to estimate The high sampling variability of CV estimates led Terrell
(3(f), but these were generally unsatisfactory (e.g., see Scott (1990) to propose that the smoothest density estimate be
and Terrell 1987). chosen that is compatible with the estimated scale of the
An automatic method for determining the optimal win- density. Taylor (1989) and Hall (1990) showed that the
212 Journal of the American Statistical Association, March 1991

bootstrap also works well for selecting $h$ in large samples and if resampling is carried out with a reduced sample size.

4.4 Related Estimators

Applying the ideas of sequential analysis to kernel density estimation led to the development of sequential density estimators by Deheuvels (1973), Davies and Wegman (1975), and Carroll (1976); for this type of estimator, sequential sampling is carried out, and the kernel estimator is computed at each sample size until the conditions of a given stopping rule are satisfied, so that sample size is random. A related estimator is the recursive density estimator, where the kernel density estimator $\hat f_n$ is calculated recursively from $\hat f_{n-1}$; this estimator was introduced independently by Wolverton and Wagner (1969) and Yamato (1971), and further studied by Devroye (1979) and Wegman and Davies (1979). See Prakasa Rao (1983, chap. 5).

5. LOCAL ADAPTIVE SMOOTHING

The methods for nonparametric density estimation so far described are quite insensitive to local peculiarities in the data, such as data clumping in certain regions and data sparseness in others, particularly the tails. In this section, we describe attempts at constructing nonparametric density estimators that are more sensitive to the clustering of sample values.

5.1 Variable Partition Histograms

The results described in Section 3 were restricted to the fixed partition case. Some work has appeared in which the histogram concept has been made more data-sensitive as an estimator of $f$. This development, which led to the variable partition histogram, was originally suggested by Wegman (1969, 1975). Variable partition histograms are constructed in a similar manner as fixed partition histograms, but in this case the partition depends upon the gaps between the order statistics $X_{(1)}, \ldots, X_{(n)}$. Choose an integer $m \in [2, n]$ to be the number of bins of the histogram and then set $k = [n/m]$. A partition $P = \{P_{in}\}$ can be obtained by defining $P_{1n} = [X_{(1)}, X_{(k)}]$, $P_{2n} = (X_{(k)}, X_{(2k)}]$, ..., $P_{mn} = (X_{((m-1)k)}, X_{(n)}]$, so that each interval contains about $k$ sample values. Then, for any $x \in [X_{(1)}, X_{(n)}]$, estimate $f$ by

$$\hat f(x) = \sum_{i=1}^{m} \frac{k/n}{X_{(ik)} - X_{((i-1)k+1)}}\, I_{P_{in}}(x). \tag{5.1}$$

Clearly, $\hat f$ is constant on the intervals $\{P_{in}\}$ and is, therefore, a histogram-type estimator of $f$. Wahba (1971) and Van Ryzin (1973) indicated that variable partition histograms were related to polynomial spline estimators. In the $L_1$ approach, Devroye and Gyorfi (1983, sec. 7.5) showed that if $k = k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$, then $\hat f$ in (5.1) is a strongly consistent estimator of $f$. Similar results for the $L_2$ case can be found in Prakasa Rao (1983, sec. 2.4), Lecoutre (1986), and Kogure (1987). Note that the results of Lecoutre are not valid when $f$ is Gaussian. The rate of convergence for MISE of the estimator (5.1) is $O(n^{-2/3})$, the same order as for the fixed partition case. Kanazawa's (1988) results used the Hellinger distance approach and a dynamic programming algorithm, but gave no asymptotic rate of convergence for the estimator.

5.2 Estimators Based on Statistically Equivalent Blocks

A multivariate version of the variable partition histogram was constructed by Gessaman (1970) and applied to nonparametric discrimination in Gessaman and Gessaman (1972). See also Quesenberry and Gessaman (1968). This estimator was defined over a partitioning of the sample space into statistically equivalent blocks (a term introduced by Tukey and abbreviated 'se-blocks'). An se-block is a multivariate analog of the gap between two adjacent order statistics, and was originally used for constructing nonparametric tolerance regions (Anderson 1966; Fraser 1951, 1953, 1957, sec. 4.3; Fraser and Guttman 1956; Tukey 1947, 1948; Wald 1943; and Wilks 1962, sec. 8.7). Since this estimator does not appear in any book or review of nonparametric density estimation, some detail is provided here.

Let $X_1, X_2, \ldots, X_n$ be a random sample on $X \in R^d$. The procedure for constructing se-blocks depends on a sequence, $h_1(x), \ldots, h_n(x)$, of $n$ real-valued functions of $x$, not necessarily different, and a set of integers, $(j_1, j_2, \ldots, j_n)$, that forms a permutation of $(1, 2, \ldots, n)$. Typically, $h_a(x) = x_k$, the $k$th coordinate of $x$.

At the first step, $h_{j_1}(x)$ is used to order the $\{X_a\}$. Define $X_{(j_1)}$ as that $X_a$ for which $h_{j_1}(x_{(j_1)})$ is the $j_1$st smallest of the $h_{j_1}(x_a)$ values. The cut $h_{j_1}(x) = h_{j_1}(x_{(j_1)})$ creates two disjoint blocks $B_{1 \ldots j_1} = \{x : h_{j_1}(x) \le h_{j_1}(x_{(j_1)})\}$ and $B_{j_1+1 \ldots n+1} = \{x : h_{j_1}(x) > h_{j_1}(x_{(j_1)})\}$. Thus, there are exactly $j_1 - 1$ $X_a$ in $B_{1 \ldots j_1}$ and exactly $n - j_1$ in $B_{j_1+1 \ldots n+1}$.

At the second step, if $j_2 < j_1$, then $h_{j_2}(x)$ is used to order the $j_1 - 1$ $X_a$'s in $B_{1 \ldots j_1}$. Define $X_{(j_2)}$ as that $X_a$ for which $j_2 - 1$ $X_a$'s satisfy $h_{j_2}(x_a) < h_{j_2}(x_{(j_2)})$ and $h_{j_1}(x_a) < h_{j_1}(x_{(j_1)})$, and $j_1 - j_2 - 1$ $X_a$'s satisfy $h_{j_2}(x_a) > h_{j_2}(x_{(j_2)})$ and $h_{j_1}(x_a) < h_{j_1}(x_{(j_1)})$. The cut $h_{j_2}(x) = h_{j_2}(x_{(j_2)})$ divides the block $B_{1 \ldots j_1}$ into subblocks $B_{1 \ldots j_2} = B_{1 \ldots j_1} \cap \{x : h_{j_2}(x) \le h_{j_2}(x_{(j_2)})\}$ and $B_{j_2+1 \ldots j_1} = B_{1 \ldots j_1} \cap \{x : h_{j_2}(x) > h_{j_2}(x_{(j_2)})\}$. If, on the other hand, $j_1 < j_2$, then the block $B_{j_1+1 \ldots n+1}$ is divided into subblocks $B_{j_1+1 \ldots j_2} = B_{j_1+1 \ldots n+1} \cap \{x : h_{j_2}(x) \le h_{j_2}(x_{(j_2)})\}$ and $B_{j_2+1 \ldots n+1} = B_{j_1+1 \ldots n+1} \cap \{x : h_{j_2}(x) > h_{j_2}(x_{(j_2)})\}$. This is done by ranking the $n - j_1$ $X_a$'s in $B_{j_1+1 \ldots n+1}$ according to $h_{j_2}(x)$ and letting $X_{(j_2)}$ be the $(j_2 - j_1)$th smallest in the ranking.

This procedure is continued. At the $m$th step, the block that is divided is the one having $j_m$ in its index set, and the $X_a$ in that block are ordered by $h_{j_m}(x)$ and the $(j_m - j_{m(l)})$th smallest value chosen to represent the cut, where $j_{m(l)}$ is the largest of the $j_1, \ldots, j_{m-1}$ that are less than $j_m$. After $n$ steps there will be $n + 1$ se-blocks, $B_1, B_2, \ldots, B_{n+1}$. The map of se-blocks is completely determined by the functions $\{h_a\}$ and the permutation used.

To construct the density estimator, consider the bivariate case [$d = 2$, where $X = (X_1, X_2)$]. Let $k_n > 0$ be an integer (Gessaman suggested $k_n = [n^{1/3}]$). Superimposed over the map of se-blocks, make $[(n/k_n)^{1/2}] - 1$ evenly spaced vertical line cuts at the ordered $X_1$-observations. After deleting the observations used to make the cuts, make a further $[(n/k_n)^{1/2}] - 1$ evenly spaced horizontal line cuts at the ordered $X_2$-observations. The plane will now be partitioned into $[(n/k_n)^{1/2}]^2$ subblocks or probability squares (Gessaman and
Gessaman 1972). Each probability square will be the union of about $k_n$ se-blocks and, therefore, will contain about $k_n$ observations. If $B_n$ is a bounded probability square and $x \in B_n$, set

$$\hat f(x) = \frac{k_n/(n+1)}{\mathrm{area}(B_n)}. \tag{5.2}$$

On unbounded probability squares, estimate $f$ as 0. Gessaman (1970) showed that if $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$, then the estimator (5.2) was weakly consistent for $f$. Convergence rates and some optimal choice for $k_n$ in (5.2) have yet to be determined, however.

5.3 Nearest Neighbor Methods

Fix and Hodges (1951) proposed the nearest neighbor estimator in the context of nonparametric discrimination. See Silverman and Jones (1988) for a modern interpretation. At a fixed point $x$ and for fixed integer $k$, let $D_k(x)$ be the Euclidean distance from $x$ to its $k$th nearest neighbor among the $X_1, X_2, \ldots, X_n$, and let $\mathrm{vol}_k(x) = c_d[D_k(x)]^d$ be the volume of the $d$-dimensional sphere of radius $D_k(x)$, where $c_d$ is the volume of the unit $d$-dimensional sphere. The $k$th nearest neighbor ($k$-NN) density estimator is then given by

$$\hat f(x) = \frac{k/n}{\mathrm{vol}_k(x)}. \tag{5.3}$$

Tukey and Tukey (1981, sec. 11.3.2) called (5.3) the balloon density estimate of $f$. An advantage of the $k$-NN estimator is that it is always positive, even in regions of sparse data. Loftsgaarden and Quesenberry (1965) proved (5.3) was consistent if $k = k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$. Abramson (1984) proposed that in the $d$-dimensional case, $k_n$ should be chosen proportional to $n^{4/(d+4)}$, the constant of proportionality depending on $x$. The $k$-NN estimator (5.3) can be written as a kernel density estimator by setting

$$\hat f(x) = \frac{1}{n[D_k(x)]^d} \sum_{j=1}^{n} K\!\left(\frac{x - X_j}{D_k(x)}\right), \tag{5.4}$$

where the smoothing parameter is now $k$ and the kernel $K$ is the rectangular kernel. Moore and Yackel (1977) and Mack and Rosenblatt (1979) analyzed the bias and variance of (5.3). Rosenblatt (1979) studied the global behavior of generalized nearest neighbor estimates of $f$. See also Mack (1980) and Abramson (1984). Although the $k$-NN estimator appeared reasonable for estimating a density at a point, it was not particularly successful for estimating the entire density function $f$. Indeed, the estimator was not a bona fide density since (5.3) was discontinuous and had an infinite integral due to very heavy tails. Devroye and Gyorfi (1985, p. 21) noted that, because of these difficulties, "it is impossible to study its properties in $L_1$."

5.4 Variable Kernel Estimators

The variable kernel estimator, which was an attempt to avoid the problems associated with the $k$-NN estimator, was defined by setting

$$\hat f(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{H_{jk}^{d}} K\!\left(\frac{x - X_j}{H_{jk}}\right), \tag{5.5}$$

where the variable window width $H_{jk} = hD_k(X_j)$ does not depend on $x$ as did (5.4), $h$ is a smoothing parameter, and $k$ controls the local behavior of $H_{jk}$. The estimator (5.5) is a bona fide density if the kernel $K$ satisfies (4.2). It was apparently first considered by Meisel in 1973 in the context of pattern recognition and then studied empirically by Breiman, Meisel, and Purcell (1977), who listed its advantages as having the smoothness properties of kernel estimators, the data-adaptive character of the $k$-NN approach, and very little computational penalty. In their simulation studies, the estimator (5.5) performed very poorly unless $k$ was large, on the order of $0.10n$. Conditions for consistency of the variable kernel estimator were obtained by Wagner (1975) and Devroye (1985); Devroye and Penrod (1986) proved the strong uniform consistency of (5.5).

5.5 Adaptive Kernel Estimators

The variable kernel estimator (5.5) led, in turn, to the adaptive kernel estimator. Abramson (1982a,b), who was concerned with estimating $f$ at a point, proposed a two-step algorithm for computing a data-adaptive window width. First, a clipped (or winsorized) version $\tilde f$ is constructed from a pilot kernel density estimate $\hat f$ with fixed window width $h$, and then the adaptive kernel estimator is defined as

$$\hat f_h(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h_j^{d}} K\!\left(\frac{x - X_j}{h_j}\right), \tag{5.6}$$

where $h_j = h[\tilde f(X_j)]^{-1/2}$. Two modifications of Abramson's $h_j$ have been suggested. Silverman (1986, sec. 5.3) set $h_j = h[(1/g)\hat f(X_j)]^{-\alpha}$, where $g$ is a scale factor [such as the geometric mean of the $\hat f(X_i)$, $i = 1, 2, \ldots, n$] and $0 \le \alpha \le 1$ reflects the sensitivity of the window width to variations in the pilot estimate; examples of Silverman's adaptive window widths and $\alpha = 1/2$ were also given that demonstrated better tail behavior than the corresponding fixed window width kernel estimator. Hall and Marron (1988) set $h_j = h_F[\hat f_P(X_j)]^{-1/2}$ in (5.6), where $h_P$ was the smoothing parameter of the pilot estimate and $h_F$ was the smoothing parameter of the final estimate; they showed that their modification had a very fast rate of MSE convergence.

6. ORTHOGONAL SERIES ESTIMATORS

Orthogonal series density estimators were introduced by Cencov (1962) and have since been applied to several different areas, especially pattern recognition and discrimination and classification; see Greblicki and Pawlak (1981). The method has been used to estimate multivariate densities for dichotomous (Ott and Kronmal 1976), polychotomous (Butler and Kronmal 1985), and mixed continuous and discrete variables (Hall 1983b).

6.1 Arbitrary Orthogonal Expansions

This method assumes that a square-integrable $f$ can be represented as a convergent orthogonal series expansion,

$$f(x) = \sum_{k=-\infty}^{\infty} a_k \varphi_k(x), \qquad x \in \Omega, \tag{6.1}$$

where $\{\varphi_k\}$ is a complete orthonormal system of functions
on a set $\Omega$ of the real line [that is, satisfying $\int_{\Omega} \varphi_j(x)\varphi_k^{*}(x)\,dx = \delta_{jk}$, where $\delta_{jk}$ is the Kronecker delta] and $\{a_k\}$ are coefficients defined by $a_k = E_f[\varphi_k^{*}(X)]$, where $\varphi_k^{*}$ is the complex conjugate of $\varphi_k$. This formulation allows for systems of real- or complex-valued orthonormal functions. Orthonormal systems proposed for $\{\varphi_k\}$ are those with compact support (such as the Fourier, trigonometric, and Haar systems on [0, 1], and Legendre system on [-1, 1]) and those with unbounded support [such as the Hermite system on $R$ and Laguerre system on $[0, \infty)$].

Given an independent sample, $X_1, X_2, \ldots, X_n$, from $f$ and a system $\{\varphi_k\}$, the $\{a_k\}$ can be estimated unbiasedly by

$$\hat a_k = \frac{1}{n} \sum_{j=1}^{n} \varphi_k^{*}(X_j). \tag{6.2}$$

The obvious estimator of $f$, obtained by plugging (6.2) into (6.1) in place of $a_k$, may not be well defined: It has infinite variance and is not consistent in the ISE sense. Tapered estimators of the form

$$\hat f(x) = \sum_{k=-\infty}^{\infty} b_k \hat a_k \varphi_k(x), \qquad x \in \Omega, \tag{6.3}$$

have been studied, where $0 < b_k < 1$ is a symmetric weight ($b_{-k} = b_k$) that shrinks $\hat a_k$ towards the origin, and $\sum |b_k| < \infty$ is needed for pointwise convergence of (6.3). See, for example, Watson (1969), Rosenblatt (1971), Brunk (1978), and Hall (1986). Tapered orthogonal series estimators were used by Johnstone and Silverman (1990) to estimate bivariate glucose density within the brain. The choice $b_k = 1$ for $-r \le k \le r$ and 0 otherwise leads to the partial sums of (6.1) being approximated by

$$\hat f_r(x) = \sum_{k=-r}^{r} \hat a_k \varphi_k(x), \qquad x \in \Omega, \tag{6.4}$$

where $\{\hat a_k\}$ are given by (6.2). Wahba (1981) considered a two-parameter system of weights, $b_k = (1 + \lambda(2\pi k)^{2m})^{-1}$ for $-r \le k \le r$, where $\lambda > 0$ is a smoothing parameter and $m > 1/2$ is a shape parameter. Other systems of weights were discussed by Hall (1987) and Lock (1990). To estimate the $\{b_k\}$, likelihood cross-validation was proposed by Wahba (1981) and least squares cross-validation by Hall (1987b). In related work, Anderson and de Figueiredo (1980) developed an adaptive orthogonal series estimator.

6.2 Statistical Properties

The most popular orthogonal series estimator for densities with unbounded support, usually $R$ or $[0, \infty)$, is the Hermite series estimator. The normalized Hermite functions given by $\varphi_k(x) = c_k(x)H_k(x)$ $(k = 0, 1, 2, \ldots)$, where $c_k(x) = e^{-x^2/2}/(2^k k!\,\pi^{1/2})^{1/2}$ and $H_k(x) = (-1)^k e^{x^2}(d^k/dx^k)(e^{-x^2})$ is the $k$th Hermite polynomial, form an orthonormal basis for an $L_2$ approach. They are heavily weighted in the tails by $e^{-x^2/2}$ and provide sufficient protection against unusual tail behavior of $X$; see Hall (1987b). Schwartz (1967) showed that if $r = r_n$ in (6.4) satisfies $r_n/n \to 0$ as $r_n \to \infty$, then IMSE $\to 0$ as $n \to \infty$; moreover, if $r_n = O(n^{1/q})$ for $q \ge 2$, then IMSE $= O(n^{-(1-1/q)})$. Walter (1977) improved this last result slightly. Note that the IMSE convergence rate is independent of the dimension of the data, which gives the Hermite series estimator an advantage over the kernel estimator for multivariate density estimation. The Hermite system does not form a basis for the $L_1$ approach, however, and the Hermite series estimator is neither translation invariant nor consistent in the $L_1$ sense.

If $f$ has compact support [0, 1], say, the popular Fourier (or trigonometric) series estimate, which is the real part of (6.4), is formed from the system of discrete Fourier functions, defined by $\varphi_k(x) = e^{2\pi i k x}$ [$i = (-1)^{1/2}$, $k = 0, 1, 2, \ldots$]. See Wahba (1975a, 1975b, 1981) and Hall (1981) for details and comments about the influence of periodicity and the Gibbs phenomenon on Fourier series density estimates. Devroye and Gyorfi (1985, sec. 12.4) proved that for the Fourier series estimator, under suitable conditions on $f$ and if $r_n/n \to 0$ as $r_n \to \infty$, then MIAE $\to 0$ as $n \to \infty$.

Arguments about the relative merits of the Hermite system versus the Fourier system can be found in Walter (1977) and Good and Gaskins (1980). Wahba (1981) suggested that "in many applications it might be preferable to assume the true density has compact support and to scale the data to the interior of [0, 1]."

6.3 Choice of Number of Terms

The performance and smoothness of the orthogonal series density estimate (6.4) depend on $r$, the number of terms in the expansion. Kronmal and Tarter (1968) proposed a term-by-term optimal stopping rule for choosing $r$ by minimizing an estimated MISE criterion. Disadvantages of that rule were pointed out by Crain (1973), who suggested that it might not yield the optimal $r$; by Hart (1985), who noted from simulation studies that the rule tended to stop too soon, thus yielding oversmoothed density estimates; and by Diggle and Hall (1986), who warned about the possible poor performance and inconsistency of the rule in multimodal situations. Improvements were suggested by Hart (1985) and Diggle and Hall (1986), and Lock (1990) combined choice of the number of terms with a tapered estimator and showed its advantages in a simulation study.

7. DELTA SEQUENCE DENSITY ESTIMATORS

Many of the different methods described so far for nonparametric density estimation are special cases of the following general class of density estimators. Let $\delta_\lambda(x, y)$ $(x, y \in R)$ be a bounded function indexed by a smoothing parameter $\lambda > 0$. The sequence $\{\delta_\lambda(x, y)\}$ is called a delta sequence on $R$ if $\int_{-\infty}^{\infty} \delta_\lambda(x, y)\phi(y)\,dy \to \phi(x)$ as $\lambda \to \infty$ for every infinitely differentiable function $\phi$ on $R$. Any estimator that can be written in the form

$$\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_\lambda(x, X_i), \qquad x \in R, \tag{7.1}$$

where $\{\delta_\lambda(x, y)\}$ is a delta sequence, is called a delta sequence density estimator. Thus histograms, kernel estimators, and orthogonal series estimators can each be written in the form (7.1):
histograms: $\delta_m(x, X) = \sum_{i=1}^{m} (t_{i+1} - t_i)^{-1} I_{T_i}(x) I_{T_i}(X)$ [see (3.1)]

kernels: $\delta_h(x, X) = \frac{1}{h} K((x - X)/h)$ [see (4.1)]

orthogonal series: $\delta_r(x, X) = \sum_{k=-r}^{r} \varphi_k(x)\varphi_k^{*}(X)$ [see (6.2), (6.4)]

In some cases (such as histograms and orthogonal series estimators), $\lambda$ will be integer-valued as in the number of terms in an expansion, while in others (such as kernel estimators), $\lambda$ will be real-valued. Such general density estimators were first studied by Whittle (1958). Watson and Leadbetter (1964) called them $\delta$-function sequences and showed that they were asymptotically unbiased as density estimators. Further work along the same lines was carried out by Foldes and Revesz (1974). Walter and Blum (1979) and Prakasa Rao (1983, sec. 2.8) gave a long list of special cases and established MSE rates of convergence; but, see Hall (1981) for a cautionary note. Silverman (1986, sec. 2.9) referred to (7.1) as a general weight function estimator. Marron (1987a) used delta sequence estimators as a means of comparing different density estimators.

8. RESTRICTED MAXIMUM LIKELIHOOD ESTIMATORS

The ML method of Section 3.1 fails miserably when the class of densities $H$ over which the likelihood $L$ is to be maximized is otherwise unrestricted. For that case, the likelihood is maximized by a linear combination of Dirac delta functions (or "spikes") at the $n$ sample values, resulting in a value of $+\infty$ for the likelihood. In this section, approaches to the ML problem are described in which restrictions are placed either on $H$ or $L$.

8.1 Order-Restricted Methods

Consider, first, an order restriction on $H$. For example, densities that are monotone decreasing over the range $[0, \infty)$ are especially important in survival analysis; see Denby and Vardi (1986). Grenander (1956) showed that the ML estimator for a nonincreasing density on $[0, \infty)$ was a step function with jumps at the order statistics $\{X_{(i)}\}$. Specifically, if $F_n$ is the sample distribution function, then the ML estimator of a nonincreasing density is the slope of the least concave majorant of $F_n$; namely,

$$\hat f(x) = \min_{s \le i-1}\ \max_{t \ge i}\ \frac{F_n(X_{(t)}) - F_n(X_{(s)})}{X_{(t)} - X_{(s)}}, \qquad X_{(i-1)} < x \le X_{(i)}, \tag{8.1}$$

and 0 for $x < 0$ and $x > X_{(n)}$. Figure 4 displays the least concave majorant for a sample of size $n = 15$. The Grenander estimator (8.1) is strongly consistent for monotone decreasing $f$ (Groeneboom 1983) with an MIAE convergence rate of $O(n^{-1/3})$ (Devroye 1987, chap. 8). It is also reasonably well behaved when $f$ is close to decreasing (Birge 1986, 1989). Some modifications have been suggested to improve the performance of (8.1), including smoothing in the neighborhood of zero. For different approaches to computing (8.1), see Barlow, Bartholomew, Bremner, and Brunk (1972, chap. 5) and Denby and Vardi (1986). Alternative approaches to estimating a decreasing density were given by Birge (1987a,b).

A related order restriction concerns unimodal densities. First, without loss of generality, assume that the mode $M = 0$ is known. Since a unimodal density $f$ is nondecreasing in $x$ prior to the mode and nonincreasing thereafter, it suffices to consider only ML estimation of $f^{+}$, the conditional density on $[0, \infty)$, since a similar argument holds for $f^{-}$, the conditional density on $(-\infty, 0)$. The ML estimate of $f$ is then given by $\hat f = \Delta \hat f^{+} + (1 - \Delta)\hat f^{-}$, where $\hat f^{+}$ is the slope of the least concave majorant of $F_n$, $\hat f^{-}$ is the slope of the greatest convex minorant of $F_n$, and $0 \le \Delta \le 1$ is the proportion of sample values that fall into $[0, \infty)$. See, for example, Robertson, Wright, and Dykstra (1988, chap. 7). Robertson (1967) showed that the ML estimate for a univariate, unimodal density with known mode can also be expressed as a conditional expectation given the $\sigma$-lattice of all intervals that contained the mode, together with the empty set, and demonstrated that isotonic regression algorithms can efficiently compute the ML estimate. When the mode is unknown, Wegman (1969) obtained the appropriate ML estimator and showed consistency; in this case the $\sigma$-lattice was defined in terms of all intervals that contained a consistent estimate of the mode. Sager (1982) generalized the results of Robertson and Wegman and illustrated his results by estimating the contours of a bivariate density applied to a problem in cartography. See also Sager (1986). A related minimum-distance estimator for unimodal densities was studied by Reiss (1976).

8.2 Method of Sieves

The method of sieves is another restricted ML density estimation method in which $H$ is restricted. It is different, however, in that the choice of "sieve" determines the density estimation method. The essence of the method of sieves is the following: For each $h > 0$, select a subset $S_h$ of densities for which a ML estimator does exist; next, find the restricted ML density estimator $\hat f_h$ by maximizing the likelihood function

$$L_h(f) = \prod_{i=1}^{n} f(X_i), \tag{8.2}$$

and, finally, let the subset $S_h$ grow (in some sense) with the sample size $n$, while allowing $h = h_n \to 0$ as $n \to \infty$ in such a way as to ensure that the ML estimator converges to a density function. The sequence $\{S_h\}$ of these subsets is called a sieve, $h$ is called the sieve parameter or mesh size, and the estimation procedure is called the method of sieves. For specific sieves, this method produced the histogram, MPL, and orthogonal series estimators, but, surprisingly, not the Gaussian kernel estimator.

The method was introduced by Grenander (1981, part III), motivated by his work in pattern analysis and "based on an idea of Wald refined by Bahadur." It was further developed by Geman and Hwang (1982) and Walter and Blum (1984). See also Wegman (1975). As with density estimators in
[Figure 4: plot of $F_n$ against $x$.]

Figure 4. The Empirical Distribution Function $F_n$ and Its Least Concave Majorant for a Sample of Size n = 15.
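
The least concave majorant shown in Figure 4 can be computed as the upper convex hull of the points $(0, 0), (X_{(1)}, 1/n), \ldots, (X_{(n)}, 1)$, and its slopes then give the estimator (8.1). The following is a minimal Python sketch under that reading, not code from any of the cited references; the function names `grenander` and `density_at` and the three-point sample are illustrative assumptions.

```python
def grenander(sample):
    """Slopes of the least concave majorant of the empirical distribution function.

    Returns (knots, slopes); the estimate equals slopes[i] on (knots[i], knots[i + 1]].
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0 or xs[0] <= 0:
        raise ValueError("expected a nonempty sample of positive values")
    # Vertices of F_n, preceded by the origin (the density lives on [0, inf)).
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for px, py in points:
        # Drop the last vertex while it lies on or below the chord joining
        # its predecessor to the new point; this keeps the hull concave.
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            if (bx - ax) * (py - ay) - (by - ay) * (px - ax) >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    knots = [x for x, _ in hull]
    slopes = [(y1 - y0) / (x1 - x0)
              for (x0, y0), (x1, y1) in zip(hull, hull[1:])]
    return knots, slopes


def density_at(x, knots, slopes):
    """Evaluate the step-function estimate; it is 0 outside (0, X_(n)]."""
    for left, right, slope in zip(knots, knots[1:], slopes):
        if left < x <= right:
            return slope
    return 0.0


# Hypothetical toy sample, for illustration only.
knots, slopes = grenander([1.0, 2.0, 4.0])
```

For this toy sample the majorant has slope 1/3 up to $x = 2$ and slope 1/6 from there to $x = 4$, so the estimate is nonincreasing and integrates to one, as a Grenander-type estimate should.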

general which depend upon a smoothing parameter, the performance of the method of sieves estimator depends particularly upon the sequence of sieve parameters which should decrease to zero "at a sufficiently slow rate" (Grenander 1981, p. 426). It has been shown that this method leads to consistent estimators in the $L_1$ sense, although exact rates of convergence have not yet been determined. To date, the method has been studied only theoretically.

8.3 Maximum Penalized Likelihood Method

The most popular method for restricted ML density estimation, however, involves penalizing the likelihood function $L$ for producing density estimates that are "too rough." See Good and Gaskins (1971). Thus, if $\Phi$ is a given nonnegative (roughness) penalty functional defined on $H$, then the $\Phi$-penalized likelihood of $f$ is defined to be

$$L(f) = \Bigl[\prod_{i=1}^{n} f(X_i)\Bigr] e^{-\Phi(f)}. \tag{8.3}$$

The optimization problem calls for $L(f)$ in (8.3), or its logarithm, to be maximized subject to $f \in H(\Omega)$, $\int_{\Omega} f(t)\,dt = 1$, and $f(t) \ge 0$ ($\forall t \in \Omega$). If it exists, a solution, $\hat f$, of that problem is called a maximum penalized likelihood (MPL) estimate of $f$ corresponding to the penalty function $\Phi$ and class of functions $H$. For example, $\Phi(f) = \alpha \int_{-\infty}^{\infty} [f''(x)]^2\,dx$ is used in the International Mathematical and Statistical Libraries, Inc. (1987) routine DESPL, where $\alpha > 0$ is a smoothing parameter. Based on this penalty function, Figure 5 shows MPL density estimates with different $\alpha$ using $n = 63$ observations of Buffalo snowfall recorded during 1910-1972. Good and Gaskins observed that the MPL method could, for certain types of problems, be interpreted as "quasi-Bayesian" since (8.3) resembles a posterior density for a parametric estimation problem. Furthermore, the MPL method is closely related to Tikhonov's method of regularization used for solving ill-posed inverse problems (O'Sullivan 1986).

De Montricher, Tapia, and Thompson (1975) rigorously established the existence and uniqueness of MPL density estimates, and showed that the MPL method was intimately related to spline methods. For example, if $f$ has finite support $\Omega$ and $H(\Omega)$ is a suitable class of smooth functions on $\Omega$, then the MPL estimate $\hat f$ exists, is unique, and is a polynomial spline with join points (or "knots") only at the sample values.

The case when $f$ has infinite support is more complicated. Good and Gaskins (1971) proposed penalty functionals designed to estimate the root-density, $\gamma = f^{1/2}$, so that $\hat f = \hat\gamma^2$ would be a nonnegative (and bona fide) estimator of $f$.
[Figure 5: two panels, (a) and (b), each plotting density against annual snowfall (in inches).]

Figure 5. Maximum Penalized Likelihood Density Estimates of the 63 Annual Observations on Buffalo Snowfall, 1910-1972. The data are given in Scott (1985a). The penalty function used was $\Phi(f) = \alpha \int [f''(x)]^2\,dx$, and the smoothing-parameter values were (a) $\alpha = 10^7$, and (b) $\alpha = 10^6$. The trimodal shape [see (b)] is generally regarded as the most reasonable density estimate for these data.
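
To make the two competing terms in (8.3) concrete, the following minimal Python sketch evaluates the penalized log-likelihood, $\sum_i \log f(X_i) - \alpha \int [f''(x)]^2\,dx$, for two candidate Gaussian densities. It only scores given candidates; it is not an MPL solver and makes no claim about how the DESPL routine works. The function names, the finite-difference grid, and the toy data are all assumptions introduced for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the N(mu, sigma^2) distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def roughness(f, lo, hi, steps=4000):
    """Approximate the integral of [f''(x)]^2 over [lo, hi].

    f'' is replaced by a central second difference and the integral by a
    Riemann sum on the same grid.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(1, steps):
        x = lo + i * h
        second = (f(x - h) - 2.0 * f(x) + f(x + h)) / (h * h)
        total += second * second * h
    return total


def penalized_loglik(f, data, alpha, lo, hi):
    """Log of (8.3): sum of log f(X_i) minus alpha times the roughness."""
    return sum(math.log(f(x)) for x in data) - alpha * roughness(f, lo, hi)


def wide(x):
    return normal_pdf(x, 0.0, 1.0)


def narrow(x):
    return normal_pdf(x, 0.0, 0.5)


# Hypothetical toy data (not the Buffalo snowfall series).
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
r_wide = roughness(wide, -10.0, 10.0)
r_narrow = roughness(narrow, -10.0, 10.0)
score_wide = penalized_loglik(wide, data, 1.0, -10.0, 10.0)
score_narrow = penalized_loglik(narrow, data, 1.0, -10.0, 10.0)
```

For a Gaussian density with standard deviation $\sigma$ the penalty integral has the closed form $3/(8\sqrt{\pi}\,\sigma^5)$, which gives a check on the numerical approximation and shows how sharply the penalty grows as the candidate becomes more peaked.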

The penalty functionals were

$$\Phi_1(f) = 4\alpha \int_{-\infty}^{\infty} [\gamma'(x)]^2\,dx, \qquad \alpha > 0, \tag{8.4}$$

$$\Phi_2(f) = 4\alpha \int_{-\infty}^{\infty} [\gamma'(x)]^2\,dx + \beta \int_{-\infty}^{\infty} [\gamma''(x)]^2\,dx, \qquad \alpha \ge 0,\ \beta \ge 0, \tag{8.5}$$

where the hyperparameters $\alpha$ and $\beta$, with $\alpha + \beta > 0$ in (8.5), control the amount of smoothing. Motivation for $\Phi_1$ and $\Phi_2$ rested on how best to represent the "roughness" of $f$. Good and Gaskins preferred (8.5) to (8.4), arguing that curvature as well as slope of the density estimate should be penalized. In follow-up papers, Good and Gaskins (1980) and Good and Deaton (1981) set $\alpha = 0$ in (8.5) and used $\beta \int [\gamma''(x)]^2\,dx$ as the measure of roughness of $f$, where $\beta$ was to be determined from the data. Klonias and Nash (1983) and Klonias (1984) investigated a very general class of penalty functionals [that included (8.4) and (8.5) as special cases] whose primary motivation was to improve estimation of peaks and valleys of $f$.

For the penalty function (8.4) and a given value of $\alpha$, De Montricher et al. (1975) showed that, if the optimization problem is set up correctly, then the resulting estimator $\hat\gamma_\alpha$, say, exists, is unique, and is a positive exponential spline with knots only at the sample values. An exponential spline rather than a polynomial spline is the price to be paid for requiring nonnegativity of the density estimate. The MPL estimator is then given by $\hat f_\alpha = \hat\gamma_\alpha^2$. Klonias (1982) demonstrated consistency of $\hat f_\alpha$ in a number of different norms, including $L_1$ and $L_2$. As for determining the value of $\alpha$, Silverman (1978c) suggested, in a slightly different setup, that $\alpha$ be chosen informally using graphical methods. If the penalty function is (8.5) and given values of $\alpha$ and $\beta$, then, provided the optimization problem is set up correctly, the resulting estimate $\hat\gamma_{\alpha,\beta}$ exists and is unique. The MPL estimate of $f$ is given by $\hat f_{\alpha,\beta} = \hat\gamma_{\alpha,\beta}^2$. Good and Gaskins also gave some recommendations for $(\alpha, \beta)$ that performed well in their examples.

Another way of guaranteeing a bona fide density estimate using the MPL method was devised by Silverman (1982b), who used a roughness penalty based on $g = \log f$, and showed that this approach led to a wide range of possible density estimates. Solving the appropriate optimization problem yielded an estimator $\hat g$ of $g$, so that a nonnegative MPL estimate for $f$ was given by $\hat f = e^{\hat g}$. Silverman developed a very general theory of penalty functionals based on $\log f$, and then proved the existence, consistency, and asymptotic normality of the resulting estimators. This approach was studied further by Silverman (1984).

Implementation of the MPL method depends upon the quality of the numerical solutions to the restricted optimization problems. Since $\gamma = f^{1/2}$ is square-integrable, Good and Gaskins (1980) suggested using mixtures of orthonormal expansions for $\gamma$, terminating the expansions at some finite number of terms. Scott, Tapia, and Thompson (1980) studied a discrete approximation to the spline solutions of the MPL problems, and proved that the resulting discrete MPL estimator exists, is unique, converges to the spline MPL estimator, and is a strongly pointwise consistent estimator of $f$. Further computational work on the discrete MPL estimator was carried out by Good and Deaton (1981).

9. PROJECTION PURSUIT DENSITY ESTIMATION

Multivariate kernel density estimators tend to be poor performers when it comes to dealing with high-dimensional data since extremely large sample sizes are needed to match the sort of numerical accuracy that is possible in low dimensions. In light of this, Friedman and Stuetzle (1982) and Friedman, Stuetzle, and Schroeder (1984) developed projection pursuit density estimation (PPDE). The PPDE method has been shown in simulations to possess excellent properties, and several quite striking applications of PPDE to real data have also been published.
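
The sample-size difficulty in high dimensions can be made concrete with a standard back-of-the-envelope calculation, which is not taken from the papers cited above. Assuming data uniformly distributed on the unit hypercube $[0, 1]^d$, a sub-cube that captures a fixed fraction of the observations must have edge length equal to that fraction raised to the power $1/d$. The hypothetical helper `edge_fraction` below evaluates this.

```python
def edge_fraction(mass_fraction: float, d: int) -> float:
    """Edge length of a sub-cube of [0, 1]^d holding the given fraction of uniform data."""
    return mass_fraction ** (1.0 / d)


# Edge length needed to capture 1% of the data in several dimensions.
for d in (1, 2, 5, 10):
    print(d, round(edge_fraction(0.01, d), 3))
```

In ten dimensions the edge length is about 0.63, so a neighborhood holding only 1% of the data already spans most of the range of every coordinate. Local averaging of the kernel type is then no longer local unless the sample is very large.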
218 Joumal of the American Statistical Association, March 1991

9.1 The PPDE Paradigm

When dealing with small samples of high-dimensional data, the PPDE procedure may be jump-started by restricting attention to the subspace spanned by the first few significant principal components; see Friedman (1987) and Jee (1987) for examples. A PPDE of $f$ is then formed using the following stepwise procedure. First, transform the data to have center the origin and covariance matrix the identity. Second, choose $\hat f^{(0)}$ to be an initial multivariate density estimate of $f$, usually taken to be standard multivariate Gaussian. Third, find the direction $a_1 \in \mathbb{R}^d$ for which the (model) marginal $\hat f_{a_1}$ along $a_1$ differs most from the current estimated (data) marginal $f_{a_1}$ along $a_1$. Choice of direction $a_1$ will not generally be unique. Fourth, given $a_1$, define a univariate "augmenting function" $g_1(a_1^T x)$ as the ratio of the two marginals, namely, $g_1(a_1^T x) = f_{a_1}(a_1^T x)/\hat f_{a_1}(a_1^T x)$, and update the initial estimate so that $\hat f^{(1)}(x) = \hat f^{(0)}(x)\, g_1(a_1^T x)$. Repeat this procedure on the modified density $\hat f^{(1)}$ so that a second direction $a_2 \in \mathbb{R}^d$ and augmenting function $g_2(a_2^T x) = f_{a_2}(a_2^T x)/\hat f_{a_2}(a_2^T x)$ are found, and the density is again modified to be $\hat f^{(2)}(x) = \hat f^{(1)}(x)\, g_2(a_2^T x)$. Repeat the procedure as many times as necessary so that, at the $k$th iteration,

$$\hat f^{(k)}(x) = \hat f^{(0)}(x) \prod_{j=1}^{k} g_j(a_j^T x) = \hat f^{(k-1)}(x)\, g_k(a_k^T x) \tag{9.1}$$

will be the current multivariate density estimate, where

$$g_j(a_j^T x) = \frac{f_{a_j}(a_j^T x)}{\hat f_{a_j}(a_j^T x)}, \qquad j = 1, 2, \ldots, k. \tag{9.2}$$

In (9.1), the vectors $\{a_j\}$ are unit length directions in $\mathbb{R}^d$, and the augmenting (or ridge) functions $\{g_j\}$ are used to build up the structure of $\hat f^{(0)}$ so that $\hat f^{(k)}$ converges to $f$ in some appropriate sense as $k \to \infty$. The number of iterations $k$ operates as a smoothing parameter and a stopping rule is determined by balancing bias against the variance of the estimate. Friedman et al. (1984) suggested graphical inspection of the augmenting functions [plotting $g_j(a_j^T x)$ against $a_j^T x$ for $j = 1, 2, \ldots, k$] as a termination criterion for the iterative procedure.

Computation of the augmenting functions (9.2) has been discussed by Friedman et al. (1984), Huber (1985, sec. 15) and discussants Buja and Stuetzle (especially pp. 487-489), and Jones and Sibson (1987, sec. 3). Given $a_j$, estimate $f_{a_j}$ by first projecting the sample data along the direction $a_j$, thus obtaining $z_i = a_j^T x_i$ ($i = 1, 2, \ldots, n$) and then compute a kernel density estimate from the $\{z_i\}$. Monte Carlo sampling is used to compute $\hat f_{a_j}$, followed by kernel density estimation. Alternatives to kernel smoothing include cubic spline functions (Friedman et al. 1984) and average shifted histograms (Jee 1987).

9.2 Projection Indexes

PPDE is driven by a projection index usually of the form

$$I(f) = \int J(f(z))\, f(z)\,dz = E_f[J(f)], \tag{9.3}$$

where $J$ is a smooth real-valued functional and $z$ is a one-dimensional projected version of $x$. As a functional on $f$, $I(f)$ should be absolutely continuous with easily computable first derivatives. "Interesting" projections should correspond to large values of $I(f)$, while small values of $I(f)$ should correspond to random or unstructured projections. Estimates of $I(f)$ should be amenable to fast computation, unaffected by the overall covariance structure of the data and by outliers or heavy tails; see Huber (1985, sec. 4). Friedman (1987) stressed that a very reliable and thorough numerical optimizer was absolutely essential for finding "substantive" maxima of $I(f)$, since sampling fluctuations tend to trap ineffective optimizers within a multitude of local maxima. If $\{z_i\}$ are the projected data, then (9.3) is estimated by $\hat I(f) = \int J(\hat f(z))\,dF_n(z) = (1/n) \sum_{i=1}^{n} J(\hat f(z_i))$. Thus if $J(f(z)) = f(z)$, then $I(f) = \int [f(z)]^2\,dz$ can be estimated by $\hat I(f) = (1/n) \sum_{i=1}^{n} \hat f_h(z_i)$, where $\hat f_h$ is a kernel estimate with window width $h$; see Friedman and Tukey (1974) and Tukey and Tukey (1981). Another choice is to take $J(f(z)) = \log f(z)$, so that $I(f) = \int f(z) \log f(z)\,dz$, which is (negative) cross-entropy, and (9.3) can be estimated at the $k$th iteration by $(1/n) \sum_{i=1}^{n} \log \hat f^{(k)}(z_i)$; see Friedman et al. (1984). Joe (1987) discussed kernel estimation of functionals such as (9.3) and showed that, for moderate-sized samples, statistical properties of $\hat I$ were improved either through bias corrections or by using a rescaled kernel.

Other projection indexes that have also been used include a moment index based on the sum of squares of the third and fourth sample cumulants of the projected data (Jones and Sibson 1987), and the ISE criterion (Friedman 1987; Hall 1989a). The latter approaches, though related, differed on whether or not to first transform the projected data. Friedman used ISE between the transformed projected data density and the uniform density, while Hall's version used ISE between the untransformed projected data density and the standard normal. Both Friedman and Hall used orthogonal series density estimators (Legendre polynomials and Hermite functions, respectively) to study their projection indexes.

Each of these indexes was designed to search for deviations from "uninterestingness," whose definition depended on the application in question. Thus, the Friedman-Tukey index searched for evidence of "clottedness" as well as departures from a parabolic density; the entropy index searched for departures of the projected data from normality since the normal distribution maximizes entropy; and the moment index and ISE criteria also set up normality as the least interesting data feature. Other indexes are also being studied for specific applications.

10. RELATED TOPICS

Functionals of a Density. Examples of functionals, $a(F)$, say, of the distribution function $F$ associated with a density $f$ include the quantile function $F^{-1}$, the hazard function $\lambda = f/(1 - F)$, any $L_p$-norm of the derivatives of $f$, Shannon negative entropy $\int f \log f$, and Fisher information $\int (f')^2/f$. Certain of these are used as projection indexes in PPDE. Typically, "plug-in" estimators of the form $a(\hat F)$ are used, where $\hat F$ is taken to be a smoothed version of $F_n$. Note that estimating $F$ using the kernel method requires less smoothing than that best suited for estimating $f$. Kernel estimation of the hazard rate was discussed by Singpurwalla and Wong (1983) and Hassani, Sarda, and Vieu (1986), and that of the quantile function $\xi_p = F^{-1}(p)$, $0 < p < 1$, by Parzen (1979), Falk (1984), and Sheather and Marron (1988). The bootstrap and its smoothed versions have been used to estimate $a(F)$ directly, especially for kernel quantile estimation. See Silverman and Young (1987), Yang (1985), Hall, Diciccio, and Romano (1989), and Hall (1990). Note, however, that bootstrap smoothing using a non-bona fide kernel density estimator of a nonnegative quantity, such as a probability or a variance, can make a nonnegative estimate negative.

Assessing Multimodality. Integer-valued nonlinear functionals of $f$, such as the number of mixture components needed to represent $f$, and the number of modes of $f$, are also of interest, and different nonparametric approaches to determining the values of such functionals have been considered. Donoho (1988) developed a general theory for determining nonparametric lower bounds on such functionals. Good and Gaskins (1980) used the MPL method together with certain "bump hunting" surgical techniques to assess the existence of any "real" dips and bumps in mass spectra obtained from scattering experiments. Silverman (1981b, 1983) used the kernel method together with the smoothed bootstrap procedure to develop a confirmatory test of the most probable number of modes in a density; see Silverman (1986, sec. 6.6) and Izenman and Sommer (1988).

Robust Estimation. Nonparametric density estimation has been used to obtain robust estimators for parametric inference. The main tool has been the use of Hellinger distance between two probability densities $f$ and $g$, namely,

$$HD(f, g) = \frac{1}{2} \int_{-\infty}^{\infty} \left([f(x)]^{1/2} - [g(x)]^{1/2}\right)^2 dx. \tag{10.1}$$

The minimum Hellinger distance (MHD) estimator is that value $\hat\theta$ of $\theta$ that minimizes $HD(\hat f, f_\theta)$, where $\hat f$ is a nonparametric density estimator of $f$ and $f_\theta$, $\theta \in \Theta$, is a member of some parametric family. The distance HD is always finite and is invariant under strictly monotone transformations. Beran (1977a,b), Birge (1986), Tamura and Boos (1986), and Simpson (1987, 1989) proved asymptotic results and established impressive robustness properties of MHD location estimators based on the kernel density estimator. For related work on minimum distance estimators of densities, see Reiss (1976) and Birge (1983).

Semiparametric Models. Olkin and Spiegelman (1987) developed an approach to density estimation that combined parametric and nonparametric approaches. Their density estimator was given by

$$\hat f_\pi(x) = \pi f_{\hat\theta}(x) + (1 - \pi)\hat f(x), \tag{10.2}$$

where $f_{\hat\theta}$ is a ML parametric estimator of $f$, $\hat f$ is a kernel estimator of $f$, and $0 \le \pi \le 1$ is unknown. The parameter $\pi$ was chosen to minimize the Hellinger distance, $HD(\hat f_\pi, f)$, and asymptotic results were obtained under regularity conditions on $f$. Figure 6 shows the semiparametric density estimate constructed from annual wind speed measurements from Olkin and Spiegelman. For that example, the parametric model appeared to be appropriate.

Directional Data. In astronomy, geology, and studies of animal behavior, it is often of interest to estimate the

[Figure 6 (plot not reproduced): three density curves; vertical axis, density, 0.01 to 0.07; horizontal axis, 40 to 65.]
Figure 6. Density Estimates for 20 Measurements on Annual Maximum Wind Speeds in the N. Direction Taken in Sheridan, Wyoming, During 1958-1977. Reproduced from Olkin and Spiegelman (1987). The dotted-and-dashed line shows the kernel density estimate with smoothing parameter $h = .7s$, where $s$ is the sample standard deviation; the dashed line shows the parametric density estimate; and the solid line shows the semiparametric density estimate with estimated weight $\hat\pi = .8$.
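
The Hellinger distance (10.1) and a mixture of the form (10.2) can be evaluated numerically on a grid, as in the following minimal sketch. It is not Olkin and Spiegelman's procedure: the normal parametric family, SciPy's default kernel bandwidth, the trapezoidal integration, and the function names are all assumptions made for illustration, and the weight is simply supplied by the caller rather than estimated.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm


def trapezoid(values, grid):
    """Trapezoidal-rule integral of values sampled on grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def hellinger_distance(f_vals, g_vals, grid):
    """HD(f, g) as in (10.1): one half the integral of (sqrt(f) - sqrt(g))^2."""
    return 0.5 * trapezoid((np.sqrt(f_vals) - np.sqrt(g_vals)) ** 2, grid)


def semiparametric_density(data, grid, weight):
    """Mixture as in (10.2): weight * parametric fit + (1 - weight) * kernel estimate."""
    data = np.asarray(data, dtype=float)
    # Assumed parametric family: normal, fitted by maximum likelihood.
    parametric = norm.pdf(grid, loc=data.mean(), scale=data.std())
    # Assumed nonparametric part: Gaussian kernel estimate, SciPy default bandwidth.
    kernel = gaussian_kde(data)(grid)
    return weight * parametric + (1.0 - weight) * kernel
```

With the factor of one half in (10.1), the distance lies between 0 (identical densities) and 1 (densities with disjoint support), which makes values for different fits directly comparable.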

[Figure 7 (plots not reproduced): two perspective plots, panels (a) and (b).]

Figure 7. Perspective Plots for 685 Measurements on the Orbits of all Known Comets. Reproduced from Hall, Watson, and Cabrera (1987).
Smoothing was obtained by (a) likelihood cross-validation, and (b) least squares cross-validation. Notice that likelihood CV produces a smoother
density estimate having lower peaks than least squares CV. With permission of the Biometrika trustees.
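
For readers who want to experiment with directional kernel estimates of the kind shown in Figure 7, an estimator of the form (10.3) can be sketched as follows. The sketch assumes data on the unit sphere in $\mathbb{R}^3$ and the exponential kernel $K_1(t) = e^t$, for which $c(\kappa) = \kappa/(4\pi \sinh \kappa)$ makes each kernel integrate to 1 over the sphere (the von Mises-Fisher normalization). The kernel choice and the function name are illustrative assumptions, not the choices made by Hall, Watson, and Cabrera (1987), and no data-driven selection of $\kappa$ is attempted.

```python
import numpy as np


def directional_kde(points, data, kappa):
    """Kernel estimate of form (10.3) on the unit sphere in R^3 with K1(t) = exp(t).

    points: (m, 3) unit vectors at which to evaluate the estimate.
    data:   (n, 3) unit vectors (the observations).
    kappa:  smoothing parameter, kappa > 0 (moderate values, to avoid overflow in sinh).
    """
    # Normalizing constant so that each kernel integrates to 1 over the sphere.
    c = kappa / (4.0 * np.pi * np.sinh(kappa))
    # Average the kernels centered at the observations.
    return c * np.exp(kappa * (points @ data.T)).mean(axis=1)
```

Larger values of `kappa` concentrate each kernel more tightly around its observation, so in this parametrization a large `kappa` corresponds to little smoothing.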

density $f$ of measurements, $X_1, \ldots, X_n$, observed on the surface of a $d$-dimensional unit sphere $S_d$, $d \ge 2$. Kernel density estimators for such "directional data" have the forms

$$\hat f_{\kappa,K_1}(x) = n^{-1} c(\kappa) \sum_{i=1}^{n} K_1(\kappa\, x^T X_i), \tag{10.3}$$

$$\hat f_{\kappa,K_2}(x) = n^{-1} d(\kappa) \sum_{i=1}^{n} K_2(\kappa(1 - x^T X_i)), \tag{10.4}$$

where $K_1$ and $K_2$ are known kernel functions typically defined on $[0, \infty)$, $\kappa > 0$ is an unknown smoothing parameter, $c(\kappa)$ and $d(\kappa)$ are positive numbers, and $x \in S_d$. Asymptotic properties of (10.3) and (10.4) were studied by Hall, Watson, and Cabrera (1987) and Bai, Rao, and Zhao (1988). For a discussion of the related problem of nonparametric density estimation on Riemannian manifolds using Fourier transform methods, see Hendriks (1990). As an example, three-dimensional perspective plots of kernel density estimators of different cometary orbits regarded as directional data are given in Figure 7 using likelihood and least squares cross-validation for determining the smoothing parameter.

Censored Data. Often, in biomedical and industrial studies, censored survival or lifetime data are recorded, and it is of interest to estimate density and hazard functions for such data. Padgett and McNichols (1984) provided an excellent survey paper on this topic. Since then, the kernel (Marron and Padgett 1987), nearest-neighbor (Mielniczuk 1986), and penalized likelihood (Lubecke and Padgett 1985) methods have been used to obtain nonparametric estimates of the density $f$ in the presence of censored data. The hazard function (intensity function, failure rate) was estimated for censored data by the kernel method (Blum and Susarla 1980; Liu and Van Ryzin 1985; Schafer 1985; Tanner 1983; Tanner and Wong 1983; Yandell 1983) and by the MPL method (Anderson and Senthilselvan 1980; Bartoszynski, Brown, McBride, and Thompson 1981).

Incomplete Data. Kernel density estimation from incomplete data was considered by Titterington and Mill (1983).

Time Series Data. For dependent observations generated by a strictly stationary process, kernel density estimators were studied by Roussas (1969), Rosenblatt (1970, 1971), Nguyen (1979), and Hart (1984), recursive density estimators were studied by Masry (1986, 1989) and Masry and Gyorfi (1987), and survival function and hazard rate estimators were studied by Roussas (1989, 1990) and Izenman and Tran (1990).

REFERENCES

Abramson, I. S. (1982a), "On Bandwidth Variation in Kernel Estimates-A Square Root Law," The Annals of Statistics, 10, 1217-1223.
- - - (1982b), "Arbitrariness of the Pilot Estimate in Adaptive Kernel Methods," Journal of Multivariate Analysis, 12, 562-567.
- - - (1984), "Adaptive Density Flattening-A Metric Distortion Principle for Combating Bias in Nearest Neighbor Methods," The Annals of Statistics, 12, 880-886.
Aitchison, J., and Lauder, I. J. (1985), "Kernel Density Estimation for Compositional Data," Applied Statistics, 34, 129-137.
Anderson, G. L., and de Figueiredo, R. J. P. (1980), "An Adaptive Orthogonal-Series Estimator for Probability Density Functions," The Annals of Statistics, 8, 347-376.
Anderson, J. A., and Senthilselvan, A. (1980), "Smooth Estimates for the Hazard Function," Journal of the Royal Statistical Society, Ser. B, 42, 322-327.
Anderson, T. W. (1966), "Some Nonparametric Multivariate Procedures Based on Statistically Equivalent Blocks," in Multivariate Analysis: Proceedings of an International Symposium, ed. P. R. Krishnaiah, New York: Academic Press, pp. 5-27.
Bai, Z. D., Rao, C. R., and Zhao, L. C. (1988), "Kernel Estimators of Density Function of Directional Data," Journal of Multivariate Analysis, 27, 24-39.
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. (1972), Statistical Inference Under Order Restrictions, New York: John Wiley.
Bartoszynski, R., Brown, B. W., McBride, C. M., and Thompson, J. R. (1981), "Some Nonparametric Techniques for Estimating the Intensity Function of a Cancer Related Nonstationary Poisson Process," The Annals of Statistics, 9, 1050-1060.
Bean, S. J., and Tsokos, C. P. (1980), "Developments in Nonparametric Density Estimation," International Statistical Review, 48, 267-287.
Beran, R. (1977a), "Robust Location Estimates," The Annals of Statistics, 5, 431-444.
- - - (1977b), "Minimum Hellinger Distance Estimates for Parametric Models," The Annals of Statistics, 5, 445-463.
Birge, L. (1983), "Approximation Dans Les Espaces Metriques et Theorie de l'Estimation," (in French), Zeitschrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete, 65, 181-237.
- - - (1986), "On Estimating a Density Using Hellinger Distance and

Some Other Strange Facts," Probability Theory and Related Fields, 71, 271-291.
- - - (1987a), "Estimating a Density Under Order Restrictions: Nonasymptotic Minimax Risk," The Annals of Statistics, 15, 995-1012.
- - - (1987b), "On the Risk of Histograms for Estimating Decreasing Densities," The Annals of Statistics, 15, 1013-1022.
- - - (1989), "The Grenander Estimator: A Nonasymptotic Approach," The Annals of Statistics, 17, 1532-1549.
Blum, J. R., and Susarla, V. (1980), "Maximal Deviation Theory of Density and Failure Rate Function Estimates Based on Censored Data," in Multivariate Analysis V, ed. P. R. Krishnaiah, Amsterdam: North-Holland, pp. 213-222.
Boneva, L. I., Kendall, D. G., and Stefanov, I. (1971), "Spline Transformations: Three New Diagnostic Aids for the Statistical Data-Analyst" (with discussion), Journal of the Royal Statistical Society, Ser. B, 33, 1-70.
Bowman, A. W. (1984), "An Alternative Method of Cross-Validation for the Smoothing of Density Estimates," Biometrika, 71, 353-360.
Boyd, D. W., and Steele, J. M. (1978), "Lower Bounds for Nonparametric Density Estimation Rates," The Annals of Statistics, 6, 932-934.
Breiman, L., Meisel, W., and Purcell, E. (1977), "Variable Kernel Estimates of Multivariate Densities," Technometrics, 19, 135-144.
Broniatowski, M., Deheuvels, P., and Devroye, L. (1989), "On the Relationship Between Stability of Extreme Order Statistics and Convergence of the Maximum Likelihood Kernel Density Estimate," The Annals of Statistics, 17, 1070-1086.
Brunk, H. B. (1978), "Univariate Density Estimation by Orthogonal Series," Biometrika, 65, 521-528.
Butler, W. J., and Kronmal, R. A. (1985), "Discrimination with Polychotomous Predictor Variables Using Orthogonal Functions," Journal of the American Statistical Association, 80, 443-448.
Cacoullos, T. (1966), "Estimation of a Multivariate Density," Annals of the Institute of Statistical Mathematics, 18, 178-189.
Carroll, R. J. (1976), "On Sequential Density Estimation," Zeitschrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete, 36, 136-151.
Cencov, N. N. (1962), "Evaluation of an Unknown Distribution Density From Observations," Soviet Mathematics, 3, 1559-1562.
Chow, Y. S., Geman, S., and Wu, L. D. (1983), "Consistent Cross-Validated Density Estimation," The Annals of Statistics, 11, 25-38.
Cline, D. (1988), "Admissible Kernel Estimators of a Multivariate Density," The Annals of Statistics, 16, 1421-1427.
Crain, B. R. (1973), "A Note on Density Estimation Using Orthogonal Expansions," The Annals of Statistics, 2, 454-463.
Davies, H. I., and Wegman, E. J. (1975), "Sequential Nonparametric Density Estimation," IEEE Transactions on Information Theory, 21, 619-628.
Deheuvels, P. (1973), "Sur l'Estimation Sequentielle de la Densite," Comptes Rendus de l'Academie des Sciences de Paris, 276, 1119-1121.
- - - (1977), "Estimation nonparametrique de la densite par histogrammes generalises," Revue de Statistique Appliquee, 25/3, 5-42.
De Jager, O. C., Swanepoel, J. W. H., and Raubenheimer, B. C. (1986), "Kernel Density Estimators Applied to Gamma Ray Light Curves," Astronomy and Astrophysics, 170, 187-196.
De Montricher, G. M., Tapia, R. A., and Thompson, J. R. (1975), "Nonparametric Maximum Likelihood Estimation of Probability Densities by Penalty Function Methods," The Annals of Statistics, 3, 1329-1348.
Denby, L., and Vardi, Y. (1986), "The Survival Curve With Decreasing Density," Technometrics, 28, 359-367.
Devijver, P. A., and Kittler, J. (1982), Pattern Recognition: A Statistical Approach, London: Prentice-Hall.
Devroye, L. (1979), "On the Pointwise and Integral Convergence of Recursive Kernel Estimates of Probability Densities," Utilitas Mathematica, 15, 113-128.
- - - (1983), "The Equivalence of Weak, Strong, and Complete Convergence in $L_1$ For Kernel Density Estimates," The Annals of Statistics, 11, 896-904.
- - - (1985), "A Note on the $L_1$ Consistency of Variable Kernel Estimates," The Annals of Statistics, 13, 1041-1049.
- - - (1987), A Course in Density Estimation, Boston: Birkhauser.
Devroye, L., and Gyorfi, L. (1985), Nonparametric Density Estimation: The $L_1$ View, New York: John Wiley.
Devroye, L., and Penrod, C. S. (1984), "The Consistency of Automatic Kernel Density Estimates," The Annals of Statistics, 12, 1231-1249.
- - - (1986), "The Strong Uniform Convergence of Multivariate Variable Kernel Estimates," The Canadian Journal of Statistics, 14, 211-219.
Diggle, P. J., and Hall, P. (1986), "The Selection of Terms in an Orthogonal Series Density Estimator," Journal of the American Statistical Association, 81, 230-233.
Donoho, D. L. (1988), "One-Sided Inference About Functionals of a Density," The Annals of Statistics, 16, 1390-1420.
Donoho, D. L., and Johnstone, I. M. (1989), "Projection-Based Approximation and a Duality with Kernel Methods," The Annals of Statistics, 17, 58-106.
Dubuisson, B., and Lavison, P. (1980), "Surveillance of a Nuclear Reactor by Use of a Pattern Recognition Methodology," IEEE Transactions on Systems, Man, and Cybernetics, 10, 603-609.
Emerson, J. D., and Hoaglin, D. C. (1983), "Stem and Leaf Displays," in Understanding Robust and Exploratory Data Analysis, eds. D. C. Hoaglin, F. Mosteller, and J. W. Tukey, New York: John Wiley.
Eubank, R. L. (1988), Spline Smoothing and Nonparametric Regression, New York: Marcel Dekker.
Falk, M. (1984), "Relative Deficiency of Kernel Type Estimators of Quantiles," The Annals of Statistics, 12, 261-268.
Farrell, R. H. (1972), "On the Best Obtainable Asymptotic Rates of Convergence in Estimation of a Density Function at a Point," The Annals of Mathematical Statistics, 43, 170-180.
Fix, E., and Hodges, J. L. (1951), "Discriminatory Analysis, Nonparametric Estimation: Consistency Properties," Report No. 4, Project No. 21-49-004, Randolph Field, Texas: USAF School of Aviation Medicine.
Foldes, A., and Revesz, P. (1974), "A General Method for Density Estimation," Studia Scientiarum Mathematicarum Hungarica, 9, 81-92.
Fraser, D. A. S. (1951), "Sequentially Determined Statistically Equivalent Blocks," The Annals of Mathematical Statistics, 22, 372-381.
- - - (1953), "Nonparametric Tolerance Regions," The Annals of Mathematical Statistics, 24, 44-55.
- - - (1957), Nonparametric Methods in Statistics, New York: John Wiley.
Fraser, D. A. S., and Guttman, I. (1956), "Tolerance Regions," The Annals of Mathematical Statistics, 27, 162-179.
Freedman, D., and Diaconis, P. (1981a), "On the Maximum Deviation Between the Histogram and the Underlying Density," Zeitschrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete, 58, 139-167.
- - - (1981b), "On the Histogram as a Density Estimator: $L_2$ Theory," Zeitschrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete, 57, 453-476.
Friedman, J. H. (1987), "Exploratory Projection Pursuit," Journal of the American Statistical Association, 82, 249-266.
Friedman, J. H., and Stuetzle, W. (1982), "Projection Pursuit Methods for Data Analysis," in Modern Data Analysis, eds. R. L. Launer and A. F. Siegel, New York: Academic Press, pp. 123-147.
Friedman, J. H., Stuetzle, W., and Schroeder, A. (1984), "Projection Pursuit Density Estimation," Journal of the American Statistical Association, 79, 599-608.
Friedman, J. H., and Tukey, J. W. (1974), "A Projection Pursuit Algorithm for Exploratory Data Analysis," IEEE Transactions on Computing, 23, 881-890.
Fryer, M. J. (1976), "Some Errors Associated With the Nonparametric Estimation of Density Functions," Journal of the Institute of Mathematics and Its Applications, 18, 371-380.
- - - (1977), "A Review of Some Nonparametric Methods of Density Estimation," Journal of the Institute of Mathematics and Its Applications, 20, 335-354.
Fukunaga, K. (1972), Introduction to Statistical Pattern Recognition, London: Academic Press.
Gajek, L. (1986), "On Improving Density Estimators Which Are Not Bona Fide Functions," The Annals of Statistics, 14, 1612-1618.
Gasser, T., Muller, H.-G., and Mammitzsch, V. (1985), "Kernels for Nonparametric Curve Estimation," Journal of the Royal Statistical Society, Ser. B, 47, 238-252.
Gawronski, W., and Stadtmuller, U. (1980), "On Density Estimation by Means of Poisson's Distribution," Scandinavian Journal of Statistics, 7, 90-94.
Geman, S., and Hwang, C.-R. (1982), "Nonparametric Maximum Likelihood Estimation by the Method of Sieves," The Annals of Statistics, 10, 401-414.
Gessaman, M. P. (1970), "A Consistent Nonparametric Multivariate Density Estimator Based on Statistically Equivalent Blocks," The Annals of Mathematical Statistics, 41, 1344-1346.
Gessaman, M. P., and Gessaman, P. H. (1972), "A Comparison of Some Multivariate Discrimination Procedures," Journal of the American Statistical Association, 67, 468-472.
Ghurye, S. G. and Olkin, I. (1969), "Unbiased Estimation of Some Multivariate Probability Densities and Related Functions," The Annals of Mathematical Statistics, 40, 1261-1271.
Good, I. J., and Deaton, M. L. (1981), "Recent Advances in Bump Hunting," in Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface, ed. W. F. Eddy, New York: Springer-Verlag, pp. 92-104.
Good, I. J., and Gaskins, R. A. (1971), "Nonparametric Roughness Penalties for Probability Densities," Biometrika, 58, 255-277.
- - - (1980), "Density Estimation and Bump-Hunting by the Penalized Likelihood Method Exemplified by Scattering and Meteorite Data" (with discussion), Journal of the American Statistical Association, 75, 42-73.
Greblicki, W., and Pawlak, M. (1981), "Classification Using the Fourier Series Estimate of Multivariate Density Functions," IEEE Transactions on Systems, Man, and Cybernetics, 11, 726-730.
Grenander, U. (1956), "On the Theory of Mortality Measurement. Part II," Skandinavisk Aktuarietidskrift, 39, 125-153.
- - - (1981), Abstract Inference, New York: John Wiley.
Groeneboom, P. (1983), "Estimating a Monotone Density," Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, eds. L. M. LeCam and R. A. Olshen, 2, 539-555. Belmont, CA: Wadsworth.
Hall, P. (1981), "On Trigonometric Series Estimates of Densities," The Annals of Statistics, 9, 683-685.
- - - (1982), "Cross-Validation in Density Estimation," Biometrika, 69, 383-390.
- - - (1983a), "Large Sample Optimality of Least Squares Cross-Validation in Density Estimation," The Annals of Statistics, 11, 1156-1174.
- - - (1983b), "Orthogonal Series Methods for Both Qualitative and Quantitative Data," The Annals of Statistics, 11, 1004-1007.
- - - (1986), "On the Rate of Convergence of Orthogonal Series Density Estimators," Journal of the Royal Statistical Society, Ser. B, 48, 115-122.
- - - (1987a), "On Kullback-Leibler Loss and Density Estimation," The Annals of Statistics, 15, 1491-1519.
- - - (1987b), "Cross-Validation and the Smoothing of Orthogonal Series Density Estimators," Journal of Multivariate Analysis, 21, 189-206.
- - - (1989a), "On Polynomial-Based Projection Indices for Exploratory Projection Pursuit," The Annals of Statistics, 17, 589-605.
- - - (1989b), "On Convergence Rates in Nonparametric Problems," International Statistical Review, 57, 45-58.
- - - (1990), "Using the Bootstrap to Estimate Mean Squared Error and Select Smoothing Parameter in Nonparametric Problems," Journal of Multivariate Analysis, 32, 177-203.
Hall, P., Diciccio, T. J., and Romano, J. P. (1989), "On Smoothing and the Bootstrap," The Annals of Statistics, 17, 692-704.
Hall, P., and Hannan, E. J. (1988), "On Stochastic Complexity and Nonparametric Density Estimation," Biometrika, 75, 705-714.
Hall, P., and Marron, J. S. (1987a), "Extent to Which Least-Squares Cross-Validation Minimises Integrated Square Error in Nonparametric Density Estimation," Probability Theory and Related Fields, 74, 567-581.
- - - (1987b), "On the Amount of Noise Inherent in Bandwidth Selection for a Kernel Density Estimator," The Annals of Statistics, 15, 163-181.
- - - (1988), "Choice of Kernel Order in Density Estimation," The Annals of Statistics, 16, 161-173.
Huber, P. J. (1985), "Projection Pursuit" (with discussion), The Annals of Statistics, 13, 435-525.
International Mathematical and Statistical Libraries, Inc. (1987), STAT/LIBRARY (Version 1.0), Houston, TX: Author.
Izenman, A. J., and Sommer, C. J. (1988), "Philatelic Mixtures and Multimodal Densities," Journal of the American Statistical Association, 83, 941-953.
Izenman, A. J., and Tran, L. T. (1990), "Kernel Estimation of the Survival Function and Hazard Rate Under Weak Dependence," Journal of Statistical Planning and Inference, 24, 233-247.
Jee, J. R. (1987), "Exploratory Projection Pursuit Using Nonparametric Density Estimation," Proceedings of the Statistical Computing Section of the American Statistical Association, 335-339.
Joe, H. (1987), "Estimation of Entropy and Other Functionals of a Multivariate Density," Technical Report, University of British Columbia.
Johnstone, I. M., and Silverman, B. W. (1990), "Speed of Estimation in Positron Emission Tomography and Related Inverse Problems," The Annals of Statistics, 18, 251-280.
Jones, M. C. (1989), "Discretized and Interpolated Kernel Density Estimates," Journal of the American Statistical Association, 84, 733-741.
Jones, M. C., and Lotwick, H. W. (1984), "A Remark on Algorithm AS 176. Kernel Density Estimation Using the Fast Fourier Transform," Applied Statistics, 33, 120-122.
Jones, M. C. and Sibson, R. (1987), "What is Projection Pursuit?" (with discussion), Journal of the Royal Statistical Society, Ser. A, 150, 1-36.
Kanazawa, Y. (1988), "An Optimal Variable Cell Histogram," Communications in Statistics, 17, 1401-1422.
Kasser, I. S., and Bruce, R. A. (1969), "Comparative Effects of Aging and Coronary Heart Disease on Submaximal and Maximal Exercise," Circulation, 39, 759-774.
Klonias, V. K. (1982), "Consistency of Two Nonparametric Maximum Penalized Estimators of the Probability Density Function," The Annals of Statistics, 10, 811-824.
- - - (1984), "On a Class of Nonparametric Density and Regression Estimators," The Annals of Statistics, 12, 1263-1284.
Klonias, V. K., and Nash, S. G. (1983), "On the Computation of a Class of Maximum Penalized Likelihood Estimators of the Probability Density Function," in Computer Science and Statistics: The Interface, ed. J. E. Gentle, Amsterdam: North-Holland, pp. 310-314.
Kogure, A. (1987), "Asymptotically Optimal Cells for a Histogram," The Annals of Statistics, 15, 1023-1030.
Kronmal, R. and Tarter, M. (1968), "The Estimation of Probability Densities and Cumulatives by Fourier Series Methods," Journal of the American Statistical Association, 63, 925-952.
- - - (1973), "The Use of Density Estimates Based on Orthogonal Expansions," in Exploring Data Analysis: The Computer Revolution in Statistics, eds. W. J. Dixon and W. L. Nicholson, Los Angeles: University of California Press, pp. 365-395.
Lecoutre, J.-P. (1986), "The Histogram with Random Partition," in New Perspectives in Theoretical and Applied Statistics, eds. M. L. Puri, J. P. Vilaplana, and W. Wertz, New York: John Wiley, pp. 265-276.
Leonard, T. (1978), "Density Estimation, Stochastic Processes, and Prior Information" (with discussion), Journal of the Royal Statistical Society, Ser. B, 40, 113-146.
Liu, R. Y. C., and Van Ryzin, J. (1985), "A Histogram Estimator of the Hazard Rate With Censored Data," The Annals of Statistics, 13, 592-605.
Hall, P., and Wand, M. P. (1988), "Minimizing L, Distance in Non- Lock, M. D. (1990), "Optimizing Density Estimates Based On Un-
parametric Density Estimation," Journal of Multivariate Analysis, 26, weighted and Weighted Mean Integrated Squared Error," unpublished
59-88. Ph.D. dissertation, University of California, Berkeley, Group in Bio-
Hall, P., Watson, G. S., and Cabrera, J. (1987), "Kernel Density Es- statistics.
timation With Spherical Data," Biometrika, 74, 751-762. Loftsgaarden, D.O., and Quesenberry, C. P. (1965), "A Nonparametric
Hand, D. J. (1982), Kernel Discriminant Analysis. Chichester, U.K.: Estimate of a Multivariate Density Function," The Annals of Mathe-
Research Studies Press. matical Statistics, 36, 1049-1051.
Hart, J. D. (1984), "Efficiency of a Kernel Density Estimator Under an Lubecke, A. M., and Padgett, W. J. (1985), "Nonparametric Maximum
Autoregressive Dependence Model," Journal of the American Statis- Penalized Likelihood Estimation of a Density from Arbitrarily Right-
tical Association, 79, 110-117. Censored Observations," Communications in Statistics, Part A-The-
- - - (1985), "On the Choice of Truncation Point in Fourier Series ory and Methods, 14, 257-271 (corrigendum, p. 2007).
Density Estimation," Journal of Statistical Computation and Simula- Mack, Y. P. (1980), "Asymptotic Normality of Multivariate K-NN Den-
tion, 21,95-116. sity Estimates," Sankhya, Ser. A, 42, 53-63.
Hassani, S., Sarda, P., and Vieu, P. (1986), "Nonparametric Approaches Mack, Y. P., and Rosenblatt, M. (1979), "Multivariate K-Nearest Neigh-
to Hazard Functions: Bibliographical Review (in French)," Revue de bor Density Estimates," Journal of Multivariate Analysis, 9, 1-15.
Statistique Appliquee, 34/4, 27-42. Marron, J. S. (1985), "An Asymptotically Efficient Solution to the Band-
Hendriks, H. (1990), "Nonparametric Estimation of a Probability Density width Problem of Kernel Density Estimation," The Annals ofStatistics,
on a Riemannian Manifold Using Fourier Expansions," The Annals of 13, 1011-1023.
Statistics, 18, 832-849. - - - (1987a), "A Comparison of Cross- Validation Techniques in Den-
Izenman: Recent Developments in Nonparametric Density Estimation 223

sity Estimation," The Annals of Statistics, 15, 152-162. timation (Lecture Notes in Mathematics No. 757), eds. T. Gasser and
- - - (1987b), "Automatic Smoothing Parameter Selection: A Sur- M. Rosenblatt, Berlin: Springer-Verlag, pp. 181-190.
vey," Empirical Economics, 13, 187-208. Roussas, G. (1969), "Nonpararnetric Estimation of the Transition Dis-
Marron, J. S., and Hardle, W. (1986), "Random Approximations to Some tribution Function of a Markov Process," The Annals of Mathematical
Measures of Accuracy in Nonparametric Curve Estimation," Journal Statistics, 40, 1386-1400.
of Multivariate Analysis, 20,91-113. - - - (1989), "Hazard Rate Estimation Under Dependence Condi-
Marron, J. S., and Nolan, D. (1987), "Canonical Kernels for Density tions," Journal of Statistical Planning and Inference, 22, 81-93.
Estimation," Technical Report, University of North Carolina, Chapel - - - (1990), "Asymptotic Normality of the Kernel Estimate Under
Hill. Dependence Conditions: Application to Hazard Rate," Journal of Sta-
Marron, 1. S., and Padgett, W. J. (1987), "Asymptotically Optimal tistical Planning and Inference, 25, 81-104.
Bandwidth Selection for Kernel Density Estimators from Randomly Sager, T. W. (1982), "Nonparametric Maximum Likelihood Estimation
Right-Censored Samples," The Annals of Statistics, 15,1520-1535. of Spatial Patterns," The Annals of Statistics, 10, 1125-1136.
Masry, E. (1986), "Recursive Probability Density Estimation for Weakly - - - (1986), "An Application of Isotonic Regression to Multivariate
Dependent Stationary Processes," IEEE Transactions on Information Density Estimation," in Advances in Order Restricted Statistical In-
Theory, 32, 254-267. ference (Springer Lecture Notes in Statistics, Vol. 37), eds. R.· Dyk-
- - - (1989), "Nonparametric Estimation of Conditional Probability stra, T. Robertson, and F. T. Wright, New York: Springer-Verlag, pp.
Densities and Expectations of Stationary Processes: Strong Consistency 69-90.
and Rates," Stochastic Processes and Their Applications, 32, 109- Schafer, H. (1985), "A Note on Data-Adaptive Kernel Estimation of the
128. Hazard and Density Function in the Random Censorship Situation,"
Masry, E., and Gyorfi, L. (1987), "Strong Consistency and Rates for The Annals of Statistics, 13,818-820.
Recursive Probability Density Estimators of Stationary Processes," Schuster, E. F., and Gregory, C. G. (1981), "On the Nonconsistency of
Journal of Multivariate Analysis, 22, 79-93. Maximum Likelihood Nonparametric Density Estimators," in Com-
Mielniczuk, J. (1986), "Some Asymptotic Properties of Kernel Esti- puter Science and Statistics: Proceedings of the 13th Symposium on
mators in Case of Censored Data," The Annals of Statistics, 14, 766- the Interface, ed. W. F. Eddy, New York: Springer-Verlag pp. 295-
773. 298.
Moore, D. S., and Yackel, J. W. (1977), "Consistency Properties of Schwartz, S. C. (1967), "Estimation of Probability Density by an Or-
Nearest Neighbor Density Function Estimators," The Annals of Statis- thogonal Series," The Annals of Mathematical Statistics, 38, 1261-
tics, 5, 143-154. 1265.
Muller, H.-G. (1988), Nonparametric Regression Analysis of Longitu- Scott, D. W. (1979), "On Optimal and Data-Based Histograms," Biom-
dinal Data (Springer Lecture Notes in Statistics), New York: Springer- etrika, 66, 605-610.
Verlag pp. 295-298. - - - (1985), "Average Shifted Histograms: Effective Nonparametric
Nadarya, E. A. (1989), Nonparametric Estimation of Probability Den- Density Estimators in Several Dimensions," The Annals of Statistics,
sities and Regression Curves. Dordrecht, Neth.: Kluwer Academic 13, 1024-1040.
Publishers. - - - (1985b), "Frequency Polygons," Journal of the American Sta-
Nguyen, H. T. (1979), "Density Estimation in a Continuous-Time Sta- tistical Association, 80, 348-354.
tionary Markov Process," The Annals of Statistics, 7, 341-348. - - - (1988), "A Note on Choice of Bivariate Histogram Bin Shape,"
OIkin, 1., and Spiegelman, C. H. (1987), "A Semiparametric Approach Journal of Official Statistics, 4, 47-51.
to Density Estimation," Journal of the American Statistical Associa- Scott, D. W., and Factor, L. E. (1981), "Monte Carlo Study of Three
tion, 82, 858-865. Data-Based Nonparametric Density Estimators," Journal of the Amer-
O'Sullivan, F. (1986), "A Statistical Perspective on Ill-Posed Inverse ican Statistical Association, 76, 9-15.
Problems," Statistical Science, I, 502-527. Scott, D. W., Gotto, A. M., Cole, J. S., and Gorry, G. A. (1978),
Ott, J., and Kronmal, R. A. (1976), "Some Classification Procedures for "Plasma Lipids as Collateral Risk Factors in Coronary Artery Dis-
Multivariate Binary Data Using Orthogonal Functions," Journal of the ease-A Study of 371 Males With Chest Pain," Journal of Chronic
American Statistical Association, 71, 391-399. Diseases, 31, 337-345.
Padgett, W. J., and McNichols, D. T. (1984), "Nonparametric Density Scott, D. W., Tapia, R. A., and Thompson, J. R. (1980), "Nonpara-
Estimation From Censored Data," Communications in Statistics-The- metric Probability Density Estimation by Discrete Maximum Penal-
ory and Methods, 13, 1581-1611. ized-Likelihood Criteria," The Annals of Statistics, 8, 820-832.
Park, B. U., and Marron, J. S. (1990), "Comparison of Data-Driven Scott, D. W., and Terrell, G. R. (1987), "Biased and Unbiased Cross-
Bandwidth Selectors," Journal of the American Statistial Association, Validation in Density Estimation," Journal of the American Statistical
85,66-72. Association, 82, 1131-1146.
Parzen, E. (1962), "On Estimation of a Probability Density Function and Scott, D. W., and Thompson, J. R. (1983), "Probability Density Esti-
Mode," The Annals of Mathematical Statistics, 33, 1065-1076. mation in Higher Dimensions," in Computer Science and Statistics:
- - - (1979), "Nonparametric Statistical Data Modeling" (with dis- Proceedings of the Fifteenth Symposium on the Interface, ed. J. E.
cussion), Journal of the American Statistical Association, 74, 105-131. Gentle, Amsterdam: North-Holland, pp. 173-179.
Prakasa Rao, B. L. S. (1983), Nonparametric Functional Estimation. Sheather, S. J., and Marron, J. S. (1988), "Kernel Quantile Estimators,"
New York: Academic Press. Working Paper 88-012, Australian Graduate School of Management,
Quesenberry, C. P., and Gessaman, M. P. (1968), "Nonparametric Dis- The University of New South Wales, Australia.
crimination Using Tolerance Regions," The Annals of Mathematical Silverman, B. W. (1978c), "Density Ratios, Empirical Likelihood, and
Statistics, 39, 664-673. Cot Death," Applied Statistics, 27, 26-33.
Reiss, R.-D. (1976), "On Minimum Distance Estimators for Unimodal - - - (198Ia), "Density Estimation for Univariate and Bivariate Data,"
Densities," Metrika, 23,7-14. in Interpreting Multivariate Data, ed. V. Barnett, New York: John
Robertson, T. (1967), "On Estimating a Density Which is Measurable Wiley, Ch. 3, pp. 37-53.
With Respect to a a Lattice," The Annals of Mathematical Statistics, - - - (198Ib), "Using Kernel Density Estimates to Investigate Multi-
33, 482-493. modality," Journal of the Royal Statistical Society, Ser. B, 43, 97-
Robertson, T., Wright, F. T., and Dykstra, R. L. (1988), Order Re- 99.
stricted Statistical Inference, New York: John Wiley. - - - (1982a), "Algorithm AS 176. Kernel Density Estimation Using
Rodriguez, C. C., and van Ryzin, J. (1985), "Maximum Entropy His- the Fast Fourier Transform," Applied Statistics, 31,93-97.
tograms," Statistics and Probability Letters, 3, 117-120. - - - (1982b), "On the Estimation of a Probability Density Function
Rosenblatt, M. (1956), "Remarks on Some Nonparametric Estimates of by the Maximum Penalized Likelihood Method," The Annals of Sta-
a Density Function," The Annals of Mathematical Statistics, 27, 832- tistics, 10, 795-810.
837. - - - (1983), "Some Properties of a Test for Multimodality Based on
- - - (1970), "Density Estimates and Markov Sequences," in Non- Kernel Density Estimates," in Probability, Statistics, and Analysis, eds.
parametric Techniques in Statistical Inference, ed. M. L. Puri, Cam- J. F. C. Kingman and G. E. H. Reuter, 248-259, Cambridge: Cam-
bridge, U.K.: Cambridge University Press, pp. 199-210. bridge University Press.
- - - (1971), "Curve Estimates," The Annals of Mathematical Statis- - - - (1984), "Spline Smoothing: The Equivalent Variable Kernel
tics, 42, 1815-1842. Method," The Annals of Statistics, 12, 898-916.
- - - (1979), "Global Measures of Deviation for Kernel and Nearest - - - (1985), "Two Books on Density Estimation," The Annals of Sta-
Neighbor Density Estimates," in Smoothing Techniques for Curve Es- tistics, 13, 1630-1638.
224 Journal of the American Statistical Association, March 1991

- - - (1986), Density Estimationfor Statistics and Data Analysis, New Tract, Amsterdam: Centre for Mathematics and Computer Science.
York: Chapman and Hall. Van Ryzin, J. (1973), "On a Histogram Method of Density Estimation,"
Silverman, B. W., and Jones, M. C. (1988), "E. Fix and J. L. Hodges Communications in Statistics, 2, 493-506.
(1951): An Important Unpublished Contribution to Nonparametric Dis- Vitale, R. A. (1975), "A Bernstein Polynomial Approach to Density Es-
criminant Analysis and Density Estimation," Technical Report, Uni- timation," in Statistical Inference and Related Topics (Vol. 2), ed. M.
versity of Bath. L. Puri, San Francisco: Academic Press, pp. 87-99.
Silverman, B. W., and Young, G. A. (1987), "The Bootstrap: To Smooth Wagner, T. J. (1975), "Nonparametric Estimates of Probability Densi-
or Not To Smooth?" Biometrika, 74, 469-479. ties," IEEE Transactions on Information Theory, 21, 438-440.
Simpson, D. G. (1987), "Minimum Hellinger Distance Estimation for Wahba, G. (1971), "A Polynomial Algorithm for Density Estimation,"
the Analysis of Count Data," Journal of the American Statistical As- The Annals of Mathematical Statistics, 42, 1870-1886.
sociation, 82, 802-807. - - - (l975a), "Optimal Convergence Properties of Variable Knot,
- - - (1989), "Hellinger Deviance Tests: Efficiency, Breakdown Points, Kernel, and Orthogonal Series Methods for Density Estimation," The
and Examples," Journal of the American Statistical Association, 84, Annals of Statistics, 3, 15-29.
107-113. - - - (l975b), "Interpolating Spline Methods for Density Estimation
Singpnrwalla, N. D., and Wong, M.-Y. (1983), "Estimation of the Fail- I. Equi-Spaced Knots," The Annals of Statistics, 3, 30-48.
ure Rate-A Survey of Nonparametric Methods, Part I: Non-Bayesian - - - (1981), "Data-Based Optimal Smoothing of Orthogonal Series
Methods," Communications in Statistics, Part A-Theory and Meth- Density Estimates," The Annals of Statistics, 9, 146-156.
ods, 12, 559-588. Wald, A. (1943), "An Extension ofWilk's Method for Setting Tolerance
Stone, C. J. (1984), "An Asymptotically Optimal Window Selection Rule Limits," The Annals of Mathematical Statistics, 14, 45-55.
for Kernel Density Estimates," The Annals of Statistics, 12, 1285- Walter, G. (1977), "Properties of Hermite Series Estimation of Proba-
1297. bility Density," The Annals of Statistics, 5, 1258-1264. (Addendum:
Tamura, R. N., and Boos, D. D. (1986), "Minimum Hellinger Distance Annals of Statistics, 8,454-455 (1980)J.
Estimation for Multivariate Location and Covariance," Journal of the Walter, G., and Blum, J. R. (1979), "Probability Density Estimation
American Statistical Association, 81, 223-229. Using Delta Sequences," The Annals of Statistics, 7, 328-340.
Tanner, M. A. (1983), "A Note on the Variable Kernel Estimator of the - - - (1984), "A Simple Solution to a Nonparametric Maximum Like-
Hazard Function from Randomly Censored Data," The Annals of Sta- lihood Estimation Problem," The Annals of Statistics, 12, 372-379.
tistics, 11,994-998. Watson, G. S. (1969), "Density Estimation by Orthogonal Series," The
Tanner, M. A., and Wong, W. H. (1983), "The Estimation of the Hazard Annals of Mathematical Statistics, 40, 1496-1498.
Function from Randomly Censored Data by the Kernel Method;" The Watson, G. S., and Leadbetter, M. R. (1964), "Hazard Analysis 1,"
Annals of Statistics, 11,989-993. Biometrika, 51, 175-184.
Tapia, R. A., and Thompson, J. R. (1978), Nonparametric Probability Wegman, E. J. (1969), "Maximum Likelihood Histograms," Technical
Density Estimation, Baltimore, MD: Johns Hopkins University Press. Report, University of North Carolina, Chapel Hill.
Tarter, M. E., and Kronmal, R. A. (1976), "An Introduction to the Im- - - - (1972), "Nonparametric Probability Density Estimation: I. A
plementation and Theory of Nonparametric Density Estimation," The Summary of Available Methods," Technometrics, 14, 533-546.
American Statistician, 30, 105-112. - - - (1975), "Maximum Likelihood Estimation of a Probability Den-
Taylor, C. C. (1987), "Akaike's Information Criterion and the Histo- sity Function," Sankhya, Ser. A, 37, 211-224.
gram," Biometrika, 74, 636-639. - - - (1982), "Density Estimation," in Encyclopedia of Statistical Sci-
- - - (1989), "Bootstrap Choice of the Smoothing Parameter in Kernel ences, (Vol. 2), eds. S. Kotz and N. L. Johnson, New York: John
Density Estimation," Biometrika, 76,705-712. Wiley, pp. 309-315.
Terrell, G. R. (1990), "The Maximal Smoothing Principle in Density Wegman, E. J., and Davies, H. I. (1979), "Remarks on Some Recursive
Estimation, " Journal of the American Statistical Association, 85, 470- Estimators of a Probability Density," The Annals of Statistics, 7, 316-
477. 327.
Terrell, G. R., and Scott, D. W. (1980), "On Improving Convergence Wertz, W. (1978), Statistical Density Estimation: A Survey, Gottingen,
Rates for Nonnegative Kernel Density Estimators," The Annals of Sta- F.R.G.: Vanderhoeck and Ruprecht.
tistics, 8, 1160-1163. Wertz, W., and Schneider, B. (1979), "Statistical Density Estimation: A
- - - (1985), "Oversmoothed Nonpararnetric Density Estimates," Journal Bibliography," International Statistical Review, 47, 155-175.
of the American Statistical Association, 80, 209-214. Whittle, P. (1958), "On the Smoothing of Probability Density Func-
Titterington, D. M., and Mill, G. M. (1983), "Kernel-Based Density tions," Journal of the Royal Statistical Society, Ser. B, 20, 334-343.
Estimates from Incomplete Data," Journal of the Royal Statistical So- Wilks, S. S. (1962), Mathematical Statistics, New York: John Wiley.
ciety, Ser. B, 45, 258-266. Wolverton, C. T., and Wagner, T. J. (1969), "Recursive Estimates of
Tukey, J. W. (1947), "Non-Parametric Estimation II. Statistically Equiv- Probability Densities," IEEE Transactions on Systems, Science, and
alent Blocks and Tolerance Regions-The Continuous Case," The An- Cybernetics, 5, 307.
nals of Mathematical Statistics, 18, 529-539. Yamato, H. (1971), "Sequential Estimation of a Continuous Probability
- - - (1948), "Nonparametric Estimation, III. Statistically Equivalent Density Function and the Mode," Bulletin of Mathematical Statistics,
Blocks and Multivariate Tolerance Regions-The Discontinuous Case," 14, 1-12.
The Annals of Mathematical Statistics, 19, 30-39. Yandell, B. S. (1983), "Nonparametric Inference for Rates With Cen-
Tukey, P. A., and Tukey, J. W. (1981), "Data-Driven View Selection; sored Survival Data," The Annals of Statistics, 11, 1119-1135.
Agglomeration and Sharpening," in Interpreting Multivariate Data, ed. Yang, S.-S. (1985), "A Smooth Nonparametric Estimator of a Quantile
V. Bamett, New York: John Wiley, Ch. II, pp. 215-243. Function," Journal of the American Statistical Association, 80, 1004-
Van Es, A. J. (1990), Aspects of Nonparametric Density Estimation, CWI 1011.


