Density-Based
Clustering
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
INTRODUCTION
01
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
What is Density-Based
Clustering?
Density-based clustering is a popular approach in data The key idea behind density-based clustering is that
mining that aims to identify clusters of data points based clusters are dense regions in the data space, separated
on their density distribution. by regions of lower density.
Unlike other clustering techniques, density-based The algorithm identifies dense regions by looking for
clustering does not require the number of clusters to be areas with a high concentration of data points and then
specified in advance and can handle arbitrary-shaped expands the clusters by including neighboring data points
clusters and noise effectively. that also meet the density criteria.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Density-Based
Clustering Algorithms
DBSCAN DENCLUE OPTICS
Density-Based Spatial Clustering of Applications with DENsity CLUstering Ordering Points to Identify the Clustering Structure
Noise
DBSCAN groups data points into DENCLUE is a kernel-based OPTICS is an extension of
clusters based on density density-based clustering DBSCAN that doesn't require
reachability and density algorithm. It models the overall specifying the epsilon parameter.
connectivity. It requires two density of the data as the sum of It creates an ordering of data
parameters: neighborhood radius influence functions of the data points based on their density
and minimum points in a points. It can handle arbitrary- reachability. The resulting
neighborhood. It can handle shaped clusters and is resistant ordering can be used to extract
arbitrary-shaped clusters and is to noise and outliers. clusters of varying densities.
robust to noise and outliers.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
DBSCAN ALGORITHM
02
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Introduction to DBSCAN
DBSCAN is one of the most popular and widely used
density-based clustering algorithms.
It was proposed by Martin Ester, Hans-Peter Kriegel, Jörg
Sander, and Xiaowei Xu in 1996.
It groups data points into clusters based on the notion of
density reachability and density connectivity. Here density
is the number of points within a specified radius r (called
epsilon).
Fig 1: DBSCAN density-based clustering with outlier handling.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Key Concepts
Epsilon (ε) Neighborhood:
The ε-neighborhood of a point p is the set of points within
a radius ε from p.
Core Point:
A point p is a core point if its ε-neighborhood contains at
least minPts points (including p itself).
Border Point:
A point q is a border point if it is not a core point, but it falls
within the ε-neighborhood of a core point.
Noise Point:
A point r is a noise point if it is neither a core point nor a
border point. Fig 2: Core, Border, and Noise Points
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Algorithm Steps
Step 1 Step 2 Step 3 Step 4
Find all the neighbor For each core point if Find recursively all Iterate through the
points within ε and it is not already its density-connected remaining unvisited
identify the core assigned to a cluster, points and assign points in the dataset.
points or visited with create a new cluster. them to the same Those points that do
more than minPts cluster as the core not belong to any
neighbors. point. cluster are noise.
A point a and b are said to be density connected if there exists a point c which has a sufficient number of points in its neighbors and both points
a and b are within the eps distance. This is a chaining process. So, if b is a neighbor of c, c is a neighbor of d, and d is a neighbor of e, which in
turn is neighbor of a implying that b is a neighbor of a.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Algorithm Steps
Parameter Selection: DBSCAN requires two input parameters: ε (the neighborhood radius) and minPts (the minimum
number of points required to form a dense region).
Identifying Core Points: For each point p, the algorithm calculates its ε-neighborhood and checks if it contains at least
minPts points. If so, p is labeled as a core point.
Cluster Formation: The algorithm starts with an arbitrary core point p and retrieves all points that are density-reachable
from p, forming a cluster. A point q is density-reachable from p if there is a chain of points p1, p2, ..., pn such that p1 = p
and pn = q, where each pi+1 is directly density-reachable from pi (i.e., within the ε-neighborhood and has at least
minPts points in its neighborhood).
Border Points Assignment: Border points are assigned to the cluster of their closest core point.
Noise Handling: Points that are not part of any cluster are labeled as noise points.
Repeat: The algorithm repeats above three steps for all remaining unvisited points until all points are assigned to a
cluster or labeled as noise.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Apply DBSCAN algorithm to the given datapoints and create clusters with minPts = 4 and ε = 1.9.
Point P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
x 3 4 5 6 7 6 7 8 3 2 3 2
y 7 6 5 4 3 2 2 4 3 6 5 4
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Example
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
OTHER ALGORITHMS
03
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
DENCLUE
Introduction: Compared to DBSCAN:
DENCLUE is a kernel-based density estimation approach DENCLUE does not require specifying neighborhood
which models overall density as the sum of influence radius (ε) parameter.
functions.
It uses kernel functions instead of fixed radius
It automatically identifies density attractors as cluster neighborhoods.
centers and can handle arbitrary shapes and varying
densities. It assigns points to highest density attractor rather than
density reachability.
It is robust to noise and outliers.
It is more computationally expensive for large datasets.
It may struggle with clusters of varying densities within a
cluster.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
OPTICS
Introduction: Compared to DBSCAN:
OPTICS is an extension of DBSCAN algorithm which OPTICS does not require specifying ε (neighborhood
orders data points based on their density reachability. radius) parameter.
It constructs a reachability plot to analyze clustering It outputs an ordering rather than explicit cluster
structure and can identify clusters of varying densities assignments.
without specifying epsilon.
It allows extracting clusters at different density thresholds.
It is robust to noise and outliers.
It is more computationally expensive, especially for large
datasets.
Department of Computer Science and Engineering
Dr. B R Ambedkar National Institute of Technology, Jalandhar
Thank
You!