DBSCAN ALGORITHM
• INTRODUCTION
• DBSCAN, which stands for Density-Based Spatial Clustering of
Applications with Noise, is a powerful clustering algorithm
that groups points that are closely packed together in data
space.
• Unlike some other clustering algorithms, DBSCAN doesn't
require you to specify the number of clusters beforehand.
• The algorithm works by defining clusters as dense regions
separated by regions of lower density.
73
DBSCAN ALGORITHM
• KEY CONCEPTS
• Core Points: These are points that have at least a minimum
number of other points (MinPts) within a specified distance
(ε or epsilon).
• Border Points: These are points that are within the ε distance
of a core point but don't have MinPts neighbors themselves.
• Noise Points: These are points that are neither core points
nor border points. They're not close enough to any cluster to
be included.
74
DBSCAN ALGORITHM
75
DBSCAN ALGORITHM
• PARAMETERS IN DBSCAN
• ε (epsilon): The maximum distance between two points for
them to be considered as neighbors.
• MinPts: The minimum number of points required to form a
dense region.
76
DBSCAN ALGORITHM
77
DBSCAN ALGORITHM
• WORKING OF DBSCAN
• STEP-1: Parameter Selection
– Choose ε (epsilon): The maximum distance between two points for
them to be considered as neighbors.
– Choose MinPts: The minimum number of points required to form a
dense region.
• STEP-2: Select a Starting Point
– The algorithm starts with an arbitrary unvisited point in the dataset.
78
DBSCAN ALGORITHM
• STEP-3: Examine the Neighborhood
– It retrieves all points within the ε distance of the starting point.
– If the number of neighboring points is less than MinPts, the point is
labeled as noise (for now).
– If there are at least MinPts points within ε distance, the point is
marked as a core point, and a new cluster is formed.
79
DBSCAN ALGORITHM
• STEP-4: Expand the Cluster
– All the neighbors of the core point are added to the cluster.
– For each of these neighbors:
• If it's a core point, its neighbors are added to the cluster
recursively.
• If it's not a core point, it's marked as a border point, and the
expansion stops.
80
DBSCAN ALGORITHM
• STEP-5: Repeat the Process
– The algorithm moves to the next unvisited point in the dataset.
– Steps 3-4 are repeated until all points have been visited.
• STEP-6: Finalize Clusters
– After all points have been processed, the algorithm identifies all
clusters.
– Points initially labeled as noise might now be border points if they're
within ε distance of a core point.
• STEP-7: Handling Noise
– Any points not belonging to any cluster remain classified as noise. 81