Density-Based
Clustering
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
INTRODUCTION
01
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
What is Density-Based
Clustering?
Density-based clustering is a popular approach The key idea behind density-based clustering is
in data mining that aims to identify clusters of that clusters are dense regions in the data
data points based on their density distribution. space, separated by regions of lower density.
Unlike other clustering techniques, density- The algorithm identifies dense regions by
based clustering does not require the number of looking for areas with a high concentration of
clusters to be specified in advance and can data points and then expands the clusters by
handle arbitrary-shaped clusters and noise including neighboring data points that also meet
effectively. the density criteria.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Density-Based
Clustering Algorithms
DBSCAN DENCLUE OPTICS
Density-Based Spatial Clustering of DENsity CLUstering Ordering Points to Identify the Clustering
Applications with Noise Structure
DBSCAN groups data points DENCLUE is a kernel-based OPTICS is an extension of
into clusters based on density-based clustering DBSCAN that doesn't
density reachability and algorithm. It models the require specifying the
density connectivity. It overall density of the data epsilon parameter. It
requires two parameters: as the sum of influence creates an ordering of data
neighborhood radius and functions of the data points. points based on their
minimum points in a It can handle arbitrary- density reachability. The
neighborhood. It can handle shaped clusters and is resulting ordering can be
arbitrary-shaped clusters resistant to noise and used to extract clusters of
and is robust to noise and outliers. varying densities.
outliers.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
DBSCAN ALGORITHM
02
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Introduction to
DBSCAN
DBSCAN is one of the most popular and widely
used density-based clustering algorithms.
It was proposed by Martin Ester, Hans-Peter
Kriegel, Jörg Sander, and Xiaowei Xu in 1996.
It groups data points into clusters based on the
notion of density reachability and density
connectivity. Here density is the number of
points within a specified radius r (called epsilon).
Fig 1: DBSCAN density-based clustering with outlier
handling.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Key Concepts
Epsilon (ε) Neighborhood:
The ε-neighborhood of a point p is the set of
points within a radius ε from p.
Core Point:
A point p is a core point if its ε-neighborhood
contains at least minPts points (including p itself).
Border Point:
A point q is a border point if it is not a core point,
but it falls within the ε-neighborhood of a core
point.
Noise Point:
A point r is a noise point if it is neither a core Fig 2: Core, Border, and Noise
Points
point nor a border point.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Algorithm Steps
Step 1 Step 2 Step 3 Step 4
Find all the For each core Find recursively Iterate through
neighbor points point if it is not all its density- the remaining
within ε and already assigned connected points unvisited points
identify the core to a cluster, and assign them in the dataset.
points or visited create a new to the same Those points that
with more than cluster. cluster as the do not belong to
minPts neighbors. core point. any cluster are
noise.
A point a and b are said to be density connected if there exists a point c which has a sufficient number of points in its
neighbors and both points a and b are within the eps distance. This is a chaining process. So, if b is a neighbor of c, c is a
neighbor of d, and d is a neighbor of e, which in turn is neighbor of a implying that b is a neighbor of a.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Algorithm Steps
Parameter Selection: DBSCAN requires two input parameters: ε (the neighborhood radius) and minPts
(the minimum number of points required to form a dense region).
Identifying Core Points: For each point p, the algorithm calculates its ε-neighborhood and checks if it
contains at least minPts points. If so, p is labeled as a core point.
Cluster Formation: The algorithm starts with an arbitrary core point p and retrieves all points that are
density-reachable from p, forming a cluster. A point q is density-reachable from p if there is a chain of
points p1, p2, ..., pn such that p1 = p and pn = q, where each pi+1 is directly density-reachable from
pi (i.e., within the ε-neighborhood and has at least minPts points in its neighborhood).
Border Points Assignment: Border points are assigned to the cluster of their closest core point.
Noise Handling: Points that are not part of any cluster are labeled as noise points.
Repeat: The algorithm repeats above three steps for all remaining unvisited points until all points are
assigned to a cluster or labeled as noise.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Apply DBSCAN algorithm to the given datapoints and create clusters with minPts = 4 and ε = 1.9.
Point P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
x 3 4 5 6 7 6 7 8 3 2 3 2
y 7 6 5 4 3 2 2 4 3 6 5 4
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
OTHER ALGORITHMS
03
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
DENCLUE
Introduction: Compared to DBSCAN:
DENCLUE is a kernel-based density estimation DENCLUE does not require specifying
approach which models overall density as the neighborhood radius (ε) parameter.
sum of influence functions.
It uses kernel functions instead of fixed radius
It automatically identifies density attractors as neighborhoods.
cluster centers and can handle arbitrary shapes
and varying densities. It assigns points to highest density attractor
rather than density reachability.
It is robust to noise and outliers.
It is more computationally expensive for large
datasets.
It may struggle with clusters of varying densities
within a cluster.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
OPTICS
Introduction: Compared to DBSCAN:
OPTICS is an extension of DBSCAN algorithm OPTICS does not require specifying ε
which orders data points based on their density (neighborhood radius) parameter.
reachability.
It outputs an ordering rather than explicit cluster
It constructs a reachability plot to analyze assignments.
clustering structure and can identify clusters of
varying densities without specifying epsilon. It allows extracting clusters at different density
thresholds.
It is robust to noise and outliers.
It is more computationally expensive, especially
for large datasets.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Thank
You!