
Nonparametric Density Estimation

October 1, 2018
Introduction

- If we can't fit a distribution to our data, then we use nonparametric density estimation.
- Start with a histogram.
- But there are problems with using histograms for density estimation.
- A better method is kernel density estimation.
- Let's consider an example in which we predict whether someone has diabetes based on their glucose concentration.
- We can also use kernel density estimation with naive Bayes or other probabilistic learners.
Introduction

- Plot of plasma glucose concentration (GLU) for a population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix, Arizona, with no evidence of diabetes:

[Figure: histogram of GLU for the no-diabetes group; x-axis GLU (0-250), y-axis Counts (0-14)]
Introduction

- Assume we want to determine if a person's GLU is abnormal.
- The population was tested for diabetes according to World Health Organization criteria.
- The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases.
- First, are these data distributed normally?
- No, according to a χ² test of goodness of fit.
Histograms

- A histogram is a first (and rough) approximation to an unknown probability density function.
- We have a sample of n observations, X_1, ..., X_i, ..., X_n.
- An important parameter is the bin width, h.
- Effectively, it determines the width of each bar.
- We can have thick bars or thin bars, obviously.
- h determines how much we smooth the data.
- Another parameter is the origin, x_0.
- x_0 determines where we start binning data.
- This obviously affects the number of points in each bin.
- We can plot a histogram as
  - the number of items in each bin or
  - the proportion of the total for each bin.
Histograms

- We define the bins (intervals) as

    [x_0 + mh, \; x_0 + (m+1)h] \quad \text{for } m \in \mathbb{Z}

  (i.e., the integers).
- But for our purposes, it's best to plot the relative frequency

    \hat{f}(x) = \frac{1}{nh} \, (\text{number of } X_i \text{ in same bin as } x)

- Notice that this is the density estimate for x (a code sketch of this estimator follows below).
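As a minimal sketch of this estimator (the class and method names here are illustrative assumptions, not from the slides), the bin lookup and the count can be coded directly:

import java.util.List;

public class HistogramDensity {
    private final List<Double> X; // the sample X_1, ..., X_n
    private final double x0;      // origin: where binning starts
    private final double h;       // bin width

    public HistogramDensity( List<Double> X, double x0, double h ) {
        this.X = X;
        this.x0 = x0;
        this.h = h;
    }

    // Index m of the bin [x0 + m*h, x0 + (m+1)*h) containing x.
    private long bin( double x ) {
        return (long) Math.floor( (x - this.x0) / this.h );
    }

    // Relative-frequency estimate: (count of X_i in x's bin) / (n * h).
    public double getDensity( double x ) {
        long m = bin( x );
        int count = 0;
        for ( double xi : this.X ) {
            if ( bin( xi ) == m ) count++;
        }
        return count / ( this.X.size() * this.h );
    }
}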
Problems with Histograms

- One problem with using histograms as an estimate of the PDF is that there can be discontinuities.
- For example, if we have a bin with no counts, then its probability is zero.
- This is also a problem "at the tails" of the distribution, the left and right sides of the histogram.
- First off, with real PDFs, there are no impossible events (i.e., events with probability zero).
- There are only events with extremely small probabilities.
- The histogram is discrete, rather than continuous, so depending on the smoothing factor, there could be large jumps in the density with very small changes in x.
- And depending on the bin width, the density may not change at all with reasonably large changes to x.
Kernel Density Estimator: Motivation

- Research has shown that a kernel density estimator for continuous attributes improves the performance of naive Bayes over Gaussian distributions [John and Langley, 1995].
- KDE is more expensive in time and space than a Gaussian estimator, and the result is somewhat intuitive: if the data do not follow the distributional assumptions of your model, then performance can suffer.
- With KDE, we start with a histogram, but when we estimate the density of a value, we smooth the histogram using a kernel function.
- Again, start with the histogram.
- A generalization of the histogram method is to use a function to smooth the histogram.
- We get rid of discontinuities.
- If we do it right, we get a continuous estimate of the PDF.
Kernel Density Estimator
[McLachlan, 1992, Silverman, 1998]

- Given the sample X_i and the observation x,

    \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - X_i}{h} \right),

  where h is the window width, smoothing parameter, or bandwidth.
- K is a kernel function, such that

    \int_{-\infty}^{\infty} K(x) \, dx = 1

- One popular choice for K is the Gaussian kernel

    K(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}.

- One of the most important decisions is the bandwidth (h).
- We can just pick a number based on what looks good.
Kernel Density Estimator

[Figure: illustration of kernel density estimation. Source: https://en.wikipedia.org/wiki/Kernel_density_estimation]
Algorithm for KDE

- Representation: The sample X_i for i = 1, ..., n.
- Learning: Add a new sample to the collection.
- Performance:

    \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - X_i}{h} \right),

  where h is the window width, smoothing parameter, or bandwidth, and K is a kernel function, such as the Gaussian kernel

    K(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}.
Kernel Density Estimator

public double getProbability( double x ) {
    int n = this.X.size();
    double sum = 0.0;
    for ( int i = 0; i < n; i++ ) {
        // Add the kernel evaluated at the scaled distance from x to sample i.
        sum += Gaussian.pdf( (x - this.X.get(i)) / this.h );
    } // for
    return sum / ( n * this.h );
} // KDE::getProbability
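For illustration, here is a self-contained sketch of how such a KDE class might look end to end; the constructor, the gaussianPdf helper (standing in for the Gaussian.pdf call above), and the sample values are assumptions, not the original code.

import java.util.Arrays;
import java.util.List;

public class KDE {
    private final List<Double> X; // the sample X_1, ..., X_n
    private final double h;       // bandwidth

    public KDE( List<Double> X, double h ) {
        this.X = X;
        this.h = h;
    }

    // Standard normal pdf; stands in for the Gaussian.pdf helper used above.
    static double gaussianPdf( double t ) {
        return Math.exp( -0.5 * t * t ) / Math.sqrt( 2.0 * Math.PI );
    }

    public double getProbability( double x ) {
        double sum = 0.0;
        for ( double xi : this.X ) {
            sum += gaussianPdf( (x - xi) / this.h );
        }
        return sum / ( this.X.size() * this.h );
    }

    public static void main( String[] args ) {
        // Hypothetical GLU readings; h = 7.95 echoes the Silverman value used later.
        KDE kde = new KDE( Arrays.asList( 85.0, 99.0, 105.0, 110.0, 142.0 ), 7.95 );
        System.out.println( kde.getProbability( 100.0 ) ); // density estimate at GLU = 100
    }
}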
Automatic Bandwidth Selection

- Ideally, we'd like to set h based on the data.
- This is called automatic bandwidth selection.
- Silverman's [1998] rule-of-thumb method estimates h as

    \hat{h}_0 = \left( \frac{4 \hat{\sigma}^5}{3n} \right)^{1/5} \approx 1.06 \, \hat{\sigma} \, n^{-1/5},

  where σ̂ is the sample standard deviation and n is the number of samples (see the sketch after this list).
- Silverman's rule of thumb assumes that the kernel is Gaussian and that the underlying distribution is normal.
- This latter assumption may not be true, but we get a simple expression that evaluates in constant time, and it seems to perform well.
- Evaluating in constant time doesn't include the time it takes to compute σ̂, but we can compute σ̂ as we read the samples.
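A minimal sketch of the rule (assuming a List<Double> sample; the class and method names are illustrative assumptions, not from the slides):

import java.util.List;

final class Bandwidth {
    // Silverman's rule of thumb: h0 = (4*sigma^5 / (3n))^(1/5), roughly 1.06 * sigma * n^(-1/5).
    static double silverman( List<Double> X ) {
        int n = X.size();
        double mean = 0.0;
        for ( double xi : X ) mean += xi;
        mean /= n;
        double ss = 0.0;
        for ( double xi : X ) ss += ( xi - mean ) * ( xi - mean );
        double sigma = Math.sqrt( ss / ( n - 1 ) ); // sample standard deviation
        return Math.pow( 4.0 * Math.pow( sigma, 5.0 ) / ( 3.0 * n ), 0.2 );
    }
}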
Automatic Bandwidth Selection

- Sheather and Jones' [1991] solve-the-equation plug-in method is a bit more complicated.
- It's O(n²), and we have to solve a set of equations numerically, which could fail.
- It is regarded, theoretically and empirically, as the best method we have.
Simple KDE Example

- Determine if a person's GLU is abnormal.

[Figure: histogram of GLU for the no-diabetes group; x-axis GLU (0-250), y-axis Counts (0-14)]
Simple KDE Example

- Green line: Fixed value, h = 1
- Magenta line: Sheather and Jones' method, h = 1.5
- Blue line: Silverman's method, h = 7.95

[Figure: observations and estimated densities for the no-diabetes group under the three bandwidths; x-axis GLU (0-250), y-axis Est. Density (0-0.04)]
Simple KDE Example

- Assume h = 7.95.
- f̂(100) = 0.018
- f̂(250) = 3.3 × 10⁻¹⁴
- P(0 ≤ x ≤ 100) = \int_0^{100} \hat{f}(x) \, dx
- We can approximate this numerically by the sum \sum_{x=0}^{100} \hat{f}(x) \, \Delta x with step Δx = 1 (a code sketch follows below).
- P(0 ≤ x ≤ 100) ≈ 0.393
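A hedged sketch of that numerical approximation, reusing the illustrative KDE class from earlier (a left Riemann sum; the method name is an assumption):

// Approximate P(lo <= x <= hi), the integral of fhat(x) dx, by a Riemann sum.
static double probability( KDE kde, double lo, double hi, double dx ) {
    double p = 0.0;
    for ( double x = lo; x < hi; x += dx ) {
        p += kde.getProbability( x ) * dx; // density times interval width
    }
    return p;
}

// Usage: probability( kde, 0.0, 100.0, 1.0 ) gives roughly 0.393 with the GLU data and h = 7.95.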
Naive Bayes with KDEs

- Assume we have GLU measurements for women with and without diabetes.
- Plot of women with diabetes:

[Figure: histogram of GLU for the diabetes group; x-axis GLU (0-250), y-axis Counts (0-6)]
Naive Bayes with KDEs

- Plot of women without:

[Figure: histogram of GLU for the no-diabetes group; x-axis GLU (0-250), y-axis Counts (0-14)]
Naive Bayes with KDEs

- The task is to determine, given a woman's GLU measurement, whether it is more likely that she has diabetes or that she does not.
- For this, we can use Bayes' rule.
- Like before, we build a kernel density estimator for both sets of data.
Naive Bayes with KDEs

- Without diabetes:

[Figure: observations and estimated densities for the no-diabetes group with h = 1, Sheather (h = 1.5), and Silverman (h = 7.95); x-axis GLU (0-250), y-axis Est. Density (0-0.04)]

- Silverman's rule of thumb gives ĥ₀ = 7.95.
Naive Bayes with KDEs

- With diabetes:

[Figure: observations and estimated densities for the diabetes group with h = 1, Sheather (h = 1.5), and Silverman (h = 11.77); x-axis GLU (0-250), y-axis Est. Density (0-0.035)]

- Silverman's rule of thumb gives ĥ₁ = 11.77.
Naive Bayes with KDEs

- All together:

[Figure: the two estimated class-conditional densities on one plot; x-axis GLU (0-250), y-axis Est. Density (0-0.018)]
Naive Bayes with KDEs

- Now that we've built these kernel density estimators, they give us P(GLU | Diabetes = true) and P(GLU | Diabetes = false).
Naive Bayes with KDEs

- We now need to calculate the base rate, or the prior probability, of each class.
- There are 355 samples of women without diabetes and 177 samples of women with diabetes.
- Therefore,

    P(\text{Diabetes} = \text{true}) = \frac{177}{177 + 355} = 0.332

- And,

    P(\text{Diabetes} = \text{false}) = \frac{355}{177 + 355} = 0.668

- Or,

    P(\text{Diabetes} = \text{false}) = 1 - P(\text{Diabetes} = \text{true}) = 1 - 0.332 = 0.668
Naive Bayes with KDEs

- Bayes' rule (a code sketch follows below):

    P(D \mid \text{GLU}) = \frac{P(D) \, P(\text{GLU} \mid D)}{P(D) \, P(\text{GLU} \mid D) + P(\neg D) \, P(\text{GLU} \mid \neg D)}
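To show how the estimators and priors combine, here is a minimal sketch of the posterior computation; it reuses the illustrative KDE class from earlier, and the method name and arguments are assumptions, not the authors' code.

// Posterior probability of diabetes given GLU, via Bayes' rule over two KDEs.
static double posterior( KDE diseased, KDE healthy, double priorD, double glu ) {
    double likeD = diseased.getProbability( glu ); // P(GLU | D)
    double likeH = healthy.getProbability( glu );  // P(GLU | not D)
    double evidence = priorD * likeD + ( 1.0 - priorD ) * likeH;
    return priorD * likeD / evidence;              // P(D | GLU)
}

// Usage: posterior( kdeDiabetes, kdeNoDiabetes, 0.332, 175.0 ) gives roughly 0.854 with the slides' data.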
Naive Bayes with KDEs

- Plot of the posterior distribution:

[Figure: posterior probability of diabetes as a function of GLU; x-axis GLU (0-250), y-axis Probability (0-1)]
Naive Bayes with KDEs

- P(D | GLU = 50)?

    P(D \mid \text{GLU} = 50) = \frac{(0.332)(2.73 \times 10^{-5})}{(0.332)(2.73 \times 10^{-5}) + (0.668)(3.39 \times 10^{-4})} = 0.0385

- P(D | GLU = 175)?

    P(D \mid \text{GLU} = 175) = \frac{(0.332)(0.009)}{(0.332)(0.009) + (0.668)(7.65 \times 10^{-4})} = 0.854
References

G. H. John and P. Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 338-345, San Francisco, CA, 1995. Morgan Kaufmann.

G. J. McLachlan. Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New York, NY, 1992.

S. J. Sheather and M. C. Jones. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B (Methodological), 53(3):683-690, 1991.

B. W. Silverman. Density Estimation for Statistics and Data Analysis, volume 26 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, FL, 1998.
