Non-Parametric Methods

The document discusses non-parametric density estimation methods, specifically histograms and kernel density estimation. Histograms estimate density by counting observations within bins of a fixed width. Kernel density estimation uses a kernel function centered on each observation to estimate the density at any given point as the average of the kernel values. The bandwidth of the kernel determines the smoothness of the estimated density, and must be chosen to balance resolution and variability.

Uploaded by

bill.morrisson


CS 509: Pattern Recognition

Non-parametric Methods

Dr. Mohammed Ayoub Alaoui Mhamdi

Bishop's University
Sherbrooke, Qc, Canada
malaoui@ubishops.ca
Introduction
• Density estimation with parametric models assumes that the forms of the underlying density functions are known.
• However, common parametric forms do not always fit the densities actually encountered in practice.
• In addition, most of the classical parametric densities are unimodal, whereas many practical problems involve multimodal densities.
• Non-parametric methods can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known.
• Histograms.
• Kernel Density Estimation / Parzen Windows.
• k-Nearest Neighbor Density Estimation.
• Real Example in Figure-Ground Segmentation.
Histograms

Histogram Density Representation
• Consider a single continuous variable $x$ and let’s say we have a set $\mathcal{D}$ of $n$ of them, $\{x_1, \dots, x_n\}$. Our goal is to model $p(x)$ from $\mathcal{D}$.
• Standard histograms simply partition $x$ into distinct bins of width $\Delta_i$ and then count the number $n_i$ of observations falling into bin $i$.
• To turn this count into a normalized probability density, we simply divide by the total number of observations $n$ and by the width $\Delta_i$ of the bins.
• This gives us:
$$p_i = \frac{n_i}{n \Delta_i}$$
• Hence the model for the density $p(x)$ is constant over the width of each bin. (And often the bins are chosen to have the same width $\Delta_i = \Delta$.)
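The histogram estimate can be sketched in a few lines of Python. This is a minimal illustration assuming equal-width bins on a fixed range; the function name `histogram_density` and the toy data are hypothetical, not part of the original slides.

```python
def histogram_density(samples, x_min, x_max, num_bins):
    """Return (edges, densities) with densities[i] = n_i / (n * width)."""
    n = len(samples)
    width = (x_max - x_min) / num_bins
    counts = [0] * num_bins
    for x in samples:
        if x_min <= x <= x_max:
            # A sample exactly on the right edge is assigned to the last bin.
            i = min(int((x - x_min) / width), num_bins - 1)
            counts[i] += 1
    edges = [x_min + i * width for i in range(num_bins + 1)]
    densities = [c / (n * width) for c in counts]
    return edges, densities

samples = [0.1, 0.2, 0.25, 0.4, 0.6, 0.65, 0.7, 0.9]
edges, densities = histogram_density(samples, 0.0, 1.0, num_bins=4)

# The piecewise-constant density integrates to one over the binned range.
total_mass = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(densities))
```

Dividing by both $n$ and the bin width is what makes the bar heights a density rather than a count, so the total mass over the bins is one.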
Histogram Density as a Function of Bin Width
[Figure: histogram density estimates for three bin widths (top, middle, bottom).]
• The green curve is the underlying true density from which the samples were drawn. It is a mixture of two Gaussians.
• When $\Delta$ is very small (top), the resulting density is quite spiky and hallucinates a lot of structure not present in $p(x)$.
• When $\Delta$ is very big (bottom), the resulting density is quite smooth and consequently fails to capture the bimodality of $p(x)$.
• It appears that the best results are obtained for some intermediate value of $\Delta$, which is given in the middle figure.
• In principle, a histogram density model is also dependent on the choice of the edge location of each bin.
Analyzing the Histogram Density
What are the advantages and disadvantages of the histogram density estimator?
Advantages:
• Simple to evaluate and simple to use.
• One can throw away $\mathcal{D}$ once the histogram is computed.
• Can be computed sequentially if data continues to come in.
Disadvantages:
• The estimated density has discontinuities due to the bin edges rather than any property of the underlying density.
• Scales poorly (curse of dimensionality): we would have $M^d$ bins if we divided each variable in a $d$-dimensional space into $M$ bins.
What can we learn from Histogram Density Estimation?
Lesson 1: To estimate the probability density at a particular location, we should consider the data points that lie within some local neighborhood of that point.
• This requires that we define some distance measure.
• There is a natural smoothness parameter describing the spatial extent of the regions (this was the bin width for the histograms).
Lesson 2: The value of the smoothing parameter should be neither too large nor too small in order to obtain good results.
With these two lessons in mind, we proceed to kernel density estimation and nearest neighbor density estimation, two closely related methods for density estimation.
The Space-Averaged / Smoothed Density
Consider again samples $x$ from underlying density $p(x)$.
Let $\mathcal{R}$ denote a small region containing $x$.
The probability mass associated with $\mathcal{R}$ is given by
$$P = \int_{\mathcal{R}} p(x')\,dx'$$
Suppose we have $n$ samples $x \in \mathcal{D}$. The probability of each sample falling into $\mathcal{R}$ is $P$.
How will the total number $k$ of points falling into $\mathcal{R}$ be distributed?
This will be a binomial distribution:
$$P_k = \binom{n}{k} P^k (1 - P)^{n-k}$$
The expected value for $k$ is thus
$$E[k] = nP$$
The binomial for $k$ peaks very sharply about the mean.
So, we expect $k/n$ to be a very good estimate for the probability $P$ (and thus for the space-averaged density).
This estimate is increasingly accurate as $n$ increases.
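The claim that the binomial peaks sharply about its mean can be made quantitative with the standard binomial moments. This is a short worked step added for illustration, with $n$ samples, $k$ of them falling in the region, and region probability $P$:

$$E\left[\frac{k}{n}\right] = P, \qquad \mathrm{Var}\left[\frac{k}{n}\right] = \frac{P(1-P)}{n}$$

The standard deviation of $k/n$ therefore shrinks like $1/\sqrt{n}$, which is why the ratio becomes an increasingly reliable estimate of $P$ as $n$ grows.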
Assuming $p(x)$ is continuous and that $\mathcal{R}$ is so small that $p(x)$ does not appreciably vary within it, we can write:
$$\int_{\mathcal{R}} p(x')\,dx' \simeq p(x)\,V$$
where $x$ is a point within $\mathcal{R}$ and $V$ is the volume enclosed by $\mathcal{R}$.
After some rearranging, we get the following estimate for $p(x)$:
$$p(x) \simeq \frac{k}{nV}$$
Example
Simulated an example of estimating the density at $x = 0.5$ for an underlying zero-mean, unit-variance Gaussian.
Varied the volume used to estimate the density.
Red=1000, Green=2000, Blue=3000, Yellow=4000, Black=5000.
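A simulation in the spirit of this example can be sketched as follows. It is a rough illustration, assuming the region is an interval of length $V$ centered at the query point; the sample size, the seed, and the list of volumes are arbitrary choices made here, not values taken from the slides.

```python
import math
import random

def estimate_density(samples, x, volume):
    """Estimate p(x) as k / (n * V), where the region is an interval of length V centered at x."""
    half = volume / 2.0
    k = sum(1 for s in samples if abs(s - x) <= half)
    return k / (len(samples) * volume)

def true_gaussian_density(x):
    """Density of a zero-mean, unit-variance Gaussian."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]

x0 = 0.5
estimates = {v: estimate_density(samples, x0, v) for v in (0.01, 0.1, 0.5, 2.0)}
target = true_gaussian_density(x0)
```

Very small volumes capture few samples and give noisy estimates, while very large volumes average the density over a wide interval and drift away from the value at the query point.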
Practical Concerns
The validity of our estimate depends on two contradictory assumptions:
1. The region $\mathcal{R}$ must be sufficiently small that the density is approximately constant over the region.
2. The region $\mathcal{R}$ must be sufficiently large that the number $k$ of points falling inside it is sufficient to yield a sharply peaked binomial.
Another way of looking at it is to fix the volume $V$ and increase the number of training samples. Then, the ratio $k/n$ will converge as desired. But, this will only yield an estimate of the space-averaged density ($P/V$).
We want $p(x)$, so we need to let $V$ approach 0. However, with a fixed $n$, $\mathcal{R}$ will become so small that no points will fall into it and our estimate would be useless: $p(x) \simeq 0$.
Note that in practice, we cannot let $V$ become arbitrarily small because the number of samples is always limited.
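How can we skirt these limitations when an unlimited number of samples is available?
• To estimate the density at $x$, form a sequence of regions $\mathcal{R}_1, \mathcal{R}_2, \dots$ containing $x$, with $\mathcal{R}_1$ having 1 sample ($n = 1$), $\mathcal{R}_2$ having 2 samples ($n = 2$), and so on.
• Let $V_n$ be the volume of $\mathcal{R}_n$, $k_n$ be the number of samples falling in $\mathcal{R}_n$, and $p_n(x)$ be the $n$th estimate for $p(x)$:
$$p_n(x) = \frac{k_n}{n V_n} \qquad (*)$$
• If $p_n(x)$ is to converge to $p(x)$ we need the following three conditions:
$$\lim_{n \to \infty} V_n = 0, \qquad \lim_{n \to \infty} k_n = \infty, \qquad \lim_{n \to \infty} \frac{k_n}{n} = 0$$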
• $\lim_{n \to \infty} V_n = 0$ ensures that our space-averaged density will converge to $p(x)$.
• $\lim_{n \to \infty} k_n = \infty$ basically ensures that the frequency ratio will converge to the probability $P$ (the binomial will be sufficiently peaked).
• $\lim_{n \to \infty} k_n / n = 0$ is required for $p_n(x)$ to converge at all. It also says that although a huge number of samples will fall within the region $\mathcal{R}_n$, they will form a negligibly small fraction of the total number of samples.
There are two common ways of obtaining regions that satisfy these conditions:
1. Shrink an initial region by specifying the volume $V_n$ as some function of $n$ such as $V_n = 1/\sqrt{n}$. Then, we need to show that $p_n(x)$ converges to $p(x)$. (This is like the Parzen window we’ll talk about next.)
2. Specify $k_n$ as some function of $n$ such as $k_n = \sqrt{n}$. Then, we grow the volume $V_n$ until it encloses $k_n$ neighbors of $x$. (This is the k-nearest-neighbor method.)
Both of these methods converge...
Parzen Windows
Let’s temporarily assume the region $\mathcal{R}_n$ is a $d$-dimensional hypercube with $h_n$ being the length of an edge.
The volume of the hypercube is given by
$$V_n = h_n^d$$
We can derive an analytic expression for $k_n$:
• Define a windowing function:
$$\varphi(u) = \begin{cases} 1 & |u_j| \le 1/2, \quad j = 1, \dots, d \\ 0 & \text{otherwise} \end{cases}$$
• This windowing function $\varphi$ defines a unit hypercube centered at the origin.
• Hence, $\varphi((x - x_i)/h_n)$ is equal to unity if $x_i$ falls within the hypercube of volume $V_n$ centered at $x$, and is zero otherwise.
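The number of samples in this hypercube is therefore given by
$$k_n = \sum_{i=1}^{n} \varphi\left(\frac{x - x_i}{h_n}\right)$$
Substituting in equation (*) yields the estimate
$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\left(\frac{x - x_i}{h_n}\right)$$
Hence, the windowing function $\varphi$, in this context called a Parzen window, tells us how to weight all of the samples in $\mathcal{D}$ to determine $p_n(x)$ at a particular $x$.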
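The hypercube Parzen estimate can be written down almost literally. The following is a minimal sketch for points stored as tuples; the names `hypercube_window` and `parzen_hypercube` and the four toy samples are illustrative assumptions.

```python
def hypercube_window(u):
    """phi(u): 1 inside the unit hypercube centered at the origin, else 0."""
    return 1.0 if all(abs(u_j) <= 0.5 for u_j in u) else 0.0

def parzen_hypercube(x, samples, h):
    """p_n(x) = (1/n) * sum_i (1/V_n) * phi((x - x_i) / h), with V_n = h**d."""
    d = len(x)
    volume = h ** d
    # k counts the samples inside the hypercube of edge length h centered at x.
    k = sum(
        hypercube_window([(x_j - s_j) / h for x_j, s_j in zip(x, s)])
        for s in samples
    )
    return k / (len(samples) * volume)

samples = [(0.0, 0.0), (0.2, 0.1), (0.9, 0.9), (1.0, 1.2)]
p_origin = parzen_hypercube((0.0, 0.0), samples, h=0.5)
p_far = parzen_hypercube((5.0, 5.0), samples, h=0.5)
```

Two of the four samples lie inside the window around the origin, so the estimate there is $2 / (4 \cdot 0.5^2) = 2$, while a query point far from all samples gets an estimate of zero.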
Example
But, what undesirable traits from histograms are inherited by Parzen window density estimates of the form we’ve just defined?
• Discontinuities...
• Dependence on the bandwidth.
Generalizing the Kernel Function
What if we allow a more general class of windowing functions rather than the hypercube?
If we think of the windowing function as an interpolator, rather than considering the window function about $x$ only, we can visualize it as a kernel sitting on each data sample $x_i$ in $\mathcal{D}$.
And, if we require the following two conditions on the kernel function $\varphi$, then we can be assured that the resulting density $p_n(x)$ will be proper: non-negative and integrating to 1.
$$\varphi(u) \ge 0, \qquad \int \varphi(u)\,du = 1$$
For our previous case of $V_n = h_n^d$, it then follows that $p_n(x)$ will also satisfy these conditions.
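Example: A Univariate Gaussian Kernel
A popular choice of the kernel is the Gaussian kernel:
$$\varphi(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$$
The resulting density is given by:
$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n \sqrt{2\pi}} \exp\left(-\frac{(x - x_i)^2}{2 h_n^2}\right)$$
It will give us smoother estimates without the discontinuities from the hypercube kernel.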
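A univariate Gaussian kernel estimate can be sketched in plain Python as below. The toy samples, the bandwidth of 0.4, and the evaluation grid are assumptions chosen only to make the example small.

```python
import math

def gaussian_kde(x, samples, h):
    """p_n(x) = (1/n) * sum_i N(x; x_i, h^2) for 1-D data."""
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - x_i) / h) ** 2) for x_i in samples)
    return norm * total / len(samples)

samples = [-1.2, -0.8, -1.0, 1.9, 2.1, 2.0]  # two small clusters

# Evaluate on a grid from -6 to 8 and check the estimate numerically integrates to ~1.
step = 0.01
grid = [-6.0 + step * i for i in range(1401)]
density = [gaussian_kde(x, samples, h=0.4) for x in grid]
area = sum(density) * step
```

Because each kernel is itself a normalized density, the average of the kernels is also normalized, and the estimate is smooth everywhere rather than piecewise constant.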
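Effect of the Window Width
An important question is what effect the window width $h_n$ has on $p_n(x)$.
Define $\delta_n(x)$ as
$$\delta_n(x) = \frac{1}{V_n} \varphi\left(\frac{x}{h_n}\right)$$
and rewrite $p_n(x)$ as the average
$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i)$$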
• $h_n$ clearly affects both the amplitude and the width of $\delta_n(x)$.
Effect of Window Width (And, hence, Volume $V_n$)
But, for any value of $h_n$, the distribution is normalized:
$$\int \delta_n(x - x_i)\,dx = \int \frac{1}{V_n} \varphi\left(\frac{x - x_i}{h_n}\right) dx = \int \varphi(u)\,du = 1$$
If $V_n$ is too large, the estimate will suffer from too little resolution.
If $V_n$ is too small, the estimate will suffer from too much variability.
In theory (with an unlimited number of samples), we can let $V_n$ slowly approach zero as $n$ increases and then $p_n(x)$ will converge to the unknown $p(x)$. But, in practice, we can, at best, seek some compromise.
[Figure: Example: Revisiting the Univariate Gaussian Kernel]
[Figure: Example: A Bimodal Distribution]
Parzen Window-Based Classifiers
Estimate the densities for each category.
Classify a query point by the label corresponding to the maximum posterior (i.e., one can include priors).
As you may have guessed, the decision regions for a Parzen window-based classifier depend upon the kernel function.
During training, we can make the error arbitrarily low by making the window sufficiently small, but this will have an ill effect during testing (which is our ultimate need).
Can you think of any systematic rules for choosing the kernel?
One possibility is to use cross-validation. Break up the data into a training set and a validation set. Then, perform training on the training set with varying bandwidths. Select the bandwidth that minimizes the error on the validation set.
There is little theoretical justification for choosing one window width over another.
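The train/validation procedure can be sketched as follows. This is a rough illustration, assuming 1-D features, Gaussian kernels, two synthetic classes, and a hand-picked list of candidate bandwidths; all names and data here are hypothetical.

```python
import math
import random

def gaussian_kde(x, samples, h):
    """1-D Gaussian Parzen estimate at x with window width h."""
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / len(samples)

def classify(x, class_samples, h):
    """Pick the label maximizing prior * class-conditional Parzen density."""
    n_total = sum(len(s) for s in class_samples.values())
    scores = {
        label: (len(s) / n_total) * gaussian_kde(x, s, h)
        for label, s in class_samples.items()
    }
    return max(scores, key=scores.get)

def error_rate(points, class_samples, h):
    """Fraction of (x, label) pairs that the classifier gets wrong."""
    wrong = sum(1 for x, label in points if classify(x, class_samples, h) != label)
    return wrong / len(points)

rng = random.Random(1)

def draw(label, n):
    """Synthetic data: class 'a' is centered at -1, class 'b' at +1."""
    mean = -1.0 if label == "a" else 1.0
    return [(rng.gauss(mean, 1.0), label) for _ in range(n)]

train = draw("a", 100) + draw("b", 100)
valid = draw("a", 100) + draw("b", 100)
class_samples = {
    "a": [x for x, lab in train if lab == "a"],
    "b": [x for x, lab in train if lab == "b"],
}

# Try several window widths and keep the one with the lowest validation error.
bandwidths = [0.01, 0.05, 0.2, 0.5, 1.0]
valid_errors = {h: error_rate(valid, class_samples, h) for h in bandwidths}
best_h = min(bandwidths, key=lambda h: valid_errors[h])
```

The densities are fit on the training split only, and the validation split is used solely to compare bandwidths, which is what guards against the arbitrarily low training error obtainable with a very narrow window.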
Nearest Neighbor Methods
• Selecting the best window / bandwidth is a severe limiting factor for Parzen window estimators.
• $k_n$-nearest-neighbor methods circumvent this problem by making the window size a function of the actual training data.
• The basic idea here is to center our window around $x$ and let it grow until it captures $k_n$ samples, where $k_n$ is a function of $n$.
• These samples are the $k_n$ nearest neighbors of $x$.
• If the density is high near $x$ then the window will be relatively small, leading to good resolution.
• If the density is low near $x$, the window will grow large, but it will stop soon after it enters regions of higher density.
• In either case, we estimate $p_n(x)$ according to
$$p_n(x) = \frac{k_n}{n V_n}$$
We want $k_n$ to go to infinity as $n$ goes to infinity, thereby assuring us that $k_n/n$ will be a good estimate of the probability that a point will fall in the window of volume $V_n$.
But, we also want $k_n$ to grow sufficiently slowly so that the size of our window will go to zero.
Thus, we want $k_n/n$ to go to zero.
Recall these conditions from the earlier discussion; these will ensure that $p_n(x)$ converges to $p(x)$ as $n$ approaches infinity.
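For 1-D data the nearest-neighbor estimate takes only a few lines. The sketch below assumes the window is the smallest interval centered at the query point that contains the $k$ nearest samples; the function name and toy data are illustrative.

```python
import math

def knn_density(x, samples, k):
    """p_n(x) = k / (n * V_n), where V_n is the length of the smallest
    interval centered at x that contains the k nearest samples (1-D case)."""
    distances = sorted(abs(x - s) for s in samples)
    radius = distances[k - 1]   # distance to the k-th nearest neighbor
    volume = 2.0 * radius
    if volume == 0.0:
        return math.inf         # x coincides with its k-th neighbor: the estimate diverges
    return k / (len(samples) * volume)

samples = [0.0, 1.0, 2.0, 4.0]
k = round(math.sqrt(len(samples)))   # k_n = sqrt(n) = 2
p_at_half = knn_density(0.5, samples, k)

# With a single sample x_1 and k = 1, the estimate reduces to 1 / (2 * |x - x_1|).
p_single = knn_density(3.0, [1.0], 1)
```

The window adapts to the data: it is narrow where samples are dense and wide where they are sparse, at the cost of an estimate that blows up when the query point sits exactly on a sample.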
Examples of $k_n$-Nearest-Neighbor Estimation
Notice the discontinuities in the slopes of the estimate.
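$k_n$-Nearest-Neighbor Estimation From 1 Sample
We don’t expect the density estimate from 1 sample to be very good, but in the case of $k_n$-nearest-neighbor estimation it will diverge!
With $n = 1$ and $k_n = \sqrt{n} = 1$, the estimate for $p_n(x)$ is
$$p_n(x) = \frac{1}{2\,|x - x_1|}$$
But, as we increase the number of samples, the estimate will improve.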
Limitations
The $k_n$-nearest-neighbor estimator suffers from a flaw analogous to the one from which the Parzen window methods suffer.
What is it? How do we specify the $k_n$?
We saw earlier that the specification of $k_n$ can lead to radically different density estimates (in practical situations where the number of training samples is limited).
• One could obtain a sequence of estimates by taking $k_n = k_1 \sqrt{n}$ and choosing different values of $k_1$.
But, like the Parzen window size, one choice is as good as another absent any additional information.
Similarly, in classification scenarios, we can base our judgement on classification error.
Posterior Estimation for Classification
• We can directly apply the $k$-nearest-neighbor methods to estimate the posterior probabilities $P(\omega_i \mid x)$ from a set of $n$ labeled samples.
• Place a window of volume $V$ around $x$ and capture $k$ samples, with $k_i$ turning out to be of label $\omega_i$.
• The estimate for the joint probability is thus
$$p_n(x, \omega_i) = \frac{k_i}{nV}$$
• A reasonable estimate for the posterior is thus
$$P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{c} p_n(x, \omega_c)} = \frac{k_i}{k}$$
• Hence, the posterior probability for $\omega_i$ is simply the fraction of samples within the window that are labeled $\omega_i$. This is a simple and intuitive result.
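The fraction-of-neighbors posterior can be sketched directly. This example assumes 1-D features and string labels; `knn_posterior` and the toy data are hypothetical names and values.

```python
from collections import Counter

def knn_posterior(x, labeled_samples, k):
    """Return {label: k_i / k} over the k samples nearest to x (1-D case)."""
    nearest = sorted(labeled_samples, key=lambda pair: abs(pair[0] - x))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}

labeled_samples = [(0.1, "w1"), (0.3, "w1"), (0.4, "w2"), (0.9, "w2"), (1.1, "w2")]
posterior = knn_posterior(0.35, labeled_samples, k=3)

# Classify by the label with the largest estimated posterior.
predicted = max(posterior, key=posterior.get)
```

The three nearest samples to 0.35 carry two "w1" labels and one "w2" label, so the estimated posteriors are 2/3 and 1/3. Labels that do not appear among the neighbors are simply absent from the returned dictionary, which corresponds to an estimated posterior of zero.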
Example: Figure-Ground Discrimination
Source: Zhao and Davis. Iterative Figure-Ground Discrimination. ICPR 2004.
Figure-ground discrimination is an important low-level vision task.
We want to separate the pixels that contain some foreground object (specified in some meaningful way) from the background.
This paper presents a method for figure-ground discrimination based on non-parametric densities for the foreground and background.
They use a subset of the pixels from each of the two regions. They propose an algorithm called iterative sampling-expectation for performing the actual segmentation.
The required input is simply a region of interest (mostly) containing the object.
Given a set of samples $\mathcal{S} = \{x_i\}$ where each $x_i$ is a $d$-dimensional vector.
We know the kernel density estimate is defined as
$$\hat{p}(y) = \frac{1}{n\,\sigma_1 \cdots \sigma_d} \sum_{i=1}^{n} \prod_{j=1}^{d} \varphi\left(\frac{y_j - x_{ij}}{\sigma_j}\right)$$
where the same kernel $\varphi$ with different bandwidth $\sigma_j$ is used in each dimension.
The Representation
• The representation used here is a function of RGB:
$$r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad s = \frac{R+G+B}{3}$$
• Separating the chromaticity from the brightness allows them to use a wider bandwidth in the brightness dimension to account for variability due to shading effects.
• And, much narrower kernels can be used on the $r$ and $g$ chromaticity channels to enable better discrimination.
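The separation of chromaticity from brightness can be illustrated with a small conversion routine. It uses the usual normalized-chromaticity coordinates $r = R/(R+G+B)$ and $g = G/(R+G+B)$; taking the brightness $s$ as the mean of the three channels, the name `rgb_to_rgs`, and the handling of black pixels are assumptions made for this sketch.

```python
def rgb_to_rgs(R, G, B):
    """Map an RGB triple to (r, g, s): two chromaticity channels and a brightness channel."""
    total = R + G + B
    if total == 0:
        # Black pixel: chromaticity is undefined; this neutral value is an arbitrary choice.
        return (1.0 / 3.0, 1.0 / 3.0, 0.0)
    return (R / total, G / total, total / 3.0)

bright = rgb_to_rgs(200, 100, 100)
shaded = rgb_to_rgs(100, 50, 50)   # same surface color at half the brightness
```

The two pixels share identical chromaticity and differ only in the brightness channel, which is why shading can be absorbed by a wide bandwidth on $s$ while the chromaticity channels keep narrow kernels.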
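The Color Density
Given a sample of pixels $\mathcal{S} = \{x_i = (r_i, g_i, s_i)\}$, the color density estimate is given by
$$\hat{P}(x = (r, g, s)) = \frac{1}{n} \sum_{i=1}^{n} K_{\sigma_r}(r - r_i)\, K_{\sigma_g}(g - g_i)\, K_{\sigma_s}(s - s_i)$$
where we have simplified the kernel definition:
$$K_{\sigma}(t) = \frac{1}{\sigma} \varphi\left(\frac{t}{\sigma}\right)$$
They use Gaussian kernels
$$K_{\sigma}(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{t}{\sigma}\right)^2\right]$$
with a different bandwidth in each dimension.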
Data-Driven Bandwidth
The bandwidth for each channel is calculated directly from the image based on sample statistics, namely the sample variance of that channel.
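One common way to turn sample statistics into a bandwidth is the normal-reference rule of thumb, $\sigma \approx 1.06\,\hat{\sigma}\,n^{-1/5}$, with $\hat{\sigma}$ the sample standard deviation. The sketch below assumes this rule; whether the paper uses exactly this constant is an assumption here, and the channel values are made up.

```python
import statistics

def rule_of_thumb_bandwidth(values):
    """sigma ~= 1.06 * sample_std * n**(-1/5) (normal-reference rule of thumb, assumed here)."""
    n = len(values)
    return 1.06 * statistics.stdev(values) * n ** (-1.0 / 5.0)

# Hypothetical values of one color channel for a handful of sampled pixels.
channel = [0.30, 0.32, 0.35, 0.31, 0.29, 0.33, 0.34, 0.30]
sigma = rule_of_thumb_bandwidth(channel)
```

A rule of this kind scales with the spread of the channel, so a channel with more variability automatically receives a wider kernel, and the bandwidth shrinks slowly as more pixels are sampled.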
Initialization: Choosing the Initial Scale
For initialization, they compute a distance between the foreground and background distributions by varying the scale of a single Gaussian kernel (on the foreground).
To evaluate the “significance” of a particular scale, they compute the normalized KL-divergence between $\hat{P}_{fg}$ and $\hat{P}_{bg}$, the density estimates for the foreground and background regions respectively. To compute each, they use only a subset of the pixels (using all of the pixels would lead to quite slow performance).
Iterative Sampling-Expectation Algorithm
Given the initial segmentation, they need to refine the models and labels to adapt better to the image.
However, this is a chicken-and-egg problem. If we knew the labels, we could compute the models, and if we knew the models, we could compute the best labels.
They propose an EM algorithm for this. The basic idea is to alternate between estimating the probability that each pixel belongs to each of the two classes and then, given this probability, refining the underlying models.
EM is guaranteed to converge (but only to a local optimum).
1. Initialize using the normalized KL-divergence.
2. Uniformly sample a set of pixels from the image to use in the kernel density estimation. This is essentially the ‘M’ step (because we have a non-parametric density).
3. Update the pixel assignment based on maximum likelihood (the ‘E’ step).
4. Repeat until stable. One can use a hard assignment of the pixels and the kernel density estimator we’ve discussed, or a soft assignment of the pixels and then a weighted kernel density estimate (the weight is between the different classes).
5. Compute the overall probability of a pixel belonging to the foreground class.
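The alternation in the steps above can be sketched on toy data. This is a heavily simplified illustration and not the paper's implementation: it assumes 1-D pixel values, a fixed Gaussian bandwidth, hard assignments, and a noisy initial labeling in place of the KL-divergence initialization. All names and numbers are hypothetical.

```python
import math
import random

def kde(x, samples, h):
    """1-D Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / len(samples)

def sampling_expectation(pixels, initial_labels, h, num_samples, max_iters, rng):
    """Alternate between sampling pixels for the two KDEs and relabeling every pixel."""
    labels = list(initial_labels)
    for _ in range(max_iters):
        fg = [p for p, lab in zip(pixels, labels) if lab == 1]
        bg = [p for p, lab in zip(pixels, labels) if lab == 0]
        if not fg or not bg:
            break  # one class vanished; nothing left to compare
        # Sampling step: draw a uniform subset from each current region.
        fg_sample = rng.sample(fg, min(num_samples, len(fg)))
        bg_sample = rng.sample(bg, min(num_samples, len(bg)))
        # Hard reassignment: pick the class with the larger estimated density.
        new_labels = [
            1 if kde(p, fg_sample, h) > kde(p, bg_sample, h) else 0 for p in pixels
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels

rng = random.Random(0)
truth = [1] * 100 + [0] * 100   # 1 = foreground, 0 = background
pixels = [rng.gauss(0.8 if t == 1 else 0.2, 0.05) for t in truth]
noisy = [1 - t if rng.random() < 0.2 else t for t in truth]   # imperfect initial segmentation

labels = sampling_expectation(pixels, noisy, h=0.05, num_samples=30, max_iters=20, rng=rng)
accuracy = sum(a == b for a, b in zip(labels, truth)) / len(truth)
```

Because the densities are non-parametric, "refitting the model" amounts to drawing a fresh sample of pixels from each current region, which is why the sampling step plays the role of the ‘M’ step.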
Results
[Figures: stability and segmentation results. Source: Zhao and Davis. Iterative Figure-Ground Discrimination. ICPR 2004.]
Summary
Advantages:
• No assumptions are needed about the distributions ahead of time (generality).
• With enough samples, convergence to an arbitrarily complicated target density can be obtained.
Disadvantages:
• The number of samples needed may be very large (number grows exponentially with the dimensionality of the feature space).
• There may be severe requirements for computation time and storage.
