DBSCAN ALGORITHM
DBSCAN CLUSTERING
DBSCAN stands for Density-Based
Spatial Clustering of Applications with Noise. It is
an unsupervised clustering algorithm.
DBSCAN can find clusters of any shape and size,
even in large datasets. It groups points by density:
a region counts as dense when it contains at least a
minimum number of points within a given radius.
WHAT IS THE DBSCAN
ALGORITHM?
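DBSCAN groups densely packed points into a single
cluster. It identifies regions of high local density,
even in large datasets, and handles outliers well by
labeling points in sparse regions as noise.
It has two parameters:
• Epsilon: the radius of the neighborhood around a point
• MinPts: the minimum number of points required within that radius for the region to count as dense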
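To show how the two parameters drive the clustering, here is a minimal sketch of DBSCAN in plain Python. The names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative choices, not part of the slides, and the sketch assumes Euclidean distance and that a point counts as its own neighbor. It uses a brute-force neighbor search, so it is meant for understanding rather than for large datasets.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Two tight groups of four points and one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With this toy data, a smaller `eps` (for example 0.5) would leave every point without enough neighbors, so all of them would be labeled as noise.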
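TYPES OF POINTS:
CORE POINT: has at least the minimum number of points (MinPts) within distance Epsilon
BORDER POINT: has fewer than MinPts points within Epsilon, but lies within Epsilon of a core point
NOISE POINT: is neither a core point nor a border point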
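The three point types can be illustrated with a short Python sketch. The function name `classify_points` and the sample data are hypothetical, and the sketch assumes Euclidean distance and that a point counts toward its own neighborhood.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Neighborhood of each point: indices within eps, including the point itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    kinds = []
    for i in range(n):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            kinds.append("border")  # not dense itself, but next to a core point
        else:
            kinds.append("noise")
    return kinds


# Made-up data: a small square, one point hanging off its edge, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (6, 6)]
kinds = classify_points(points, eps=1.1, min_pts=3)
print(kinds)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here `(2, 1)` has only one other point within `eps`, so it is not a core point, but that neighbor `(1, 1)` is a core point, which makes `(2, 1)` a border point.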
WHY DO WE USE DBSCAN?
We use DBSCAN when the clusters have irregular shapes
or when the number of clusters is not known in
advance.
WHEN SHOULD WE USE
DBSCAN OVER K-MEANS IN
CLUSTERING ANALYSIS?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and
K-Means are both clustering algorithms that group together data points with
similar characteristics. However, they work on different principles and are
suitable for different types of data. We prefer DBSCAN when the clusters are
not spherical in shape or when the number of clusters is not known beforehand.
DBSCAN AND K-MEANS

Number of clusters
DBSCAN: We need not specify the number of clusters.
K-Means: Very sensitive to the number of clusters, so it needs to be specified.

Cluster shape
DBSCAN: Clusters can be of any arbitrary shape.
K-Means: Clusters are spherical or convex in shape.

Noise and outliers
DBSCAN: Works well with datasets that contain noise and outliers.
K-Means: Does not work well with outliers. Outliers can skew the clusters to a very large extent.

Parameters
DBSCAN: Two parameters are required for training the model.
K-Means: Only one parameter is required for training the model.
ADVANTAGES
• 1. Handles Noise and Outliers: Identifies noise points and separates them
from clusters.
• 2. No Fixed Cluster Shape: Clusters can have arbitrary shapes and sizes.
• 3. No Prior Knowledge of Clusters: Doesn't require knowing the number of
clusters.
• 4. Few Hyperparameters: Only two hyperparameters to tune.
• 5. Efficient for Large Datasets: Scales well to large datasets, especially
when a spatial index is used for the neighborhood queries.
DISADVANTAGES:
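• 1. Sensitivity to Hyperparameters: Results depend strongly on the choice of
Epsilon and MinPts.
• 2. Difficulty with Varying Densities: A single Epsilon and MinPts setting
cannot fit clusters whose densities differ widely.
• 3. No Prediction: DBSCAN does not build a model that assigns new, unseen
points to clusters; new data typically requires re-running the algorithm.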
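Because DBSCAN itself offers no prediction step, any way of labeling new points is an add-on. One possible workaround, sketched below, gives a new point the label of its nearest core point if that core point lies within Epsilon, and marks it as noise otherwise. This is an illustrative heuristic, not part of the DBSCAN algorithm; the function name `assign_to_cluster`, the core points, and their labels are all made up for the example.

```python
from math import dist


def assign_to_cluster(new_point, core_points, core_labels, eps):
    """Label of the nearest core point within eps, or -1 (noise) if there is none."""
    best_label, best_distance = -1, eps
    for point, label in zip(core_points, core_labels):
        d = dist(new_point, point)
        if d <= best_distance:
            best_label, best_distance = label, d
    return best_label


# Core points and labels assumed to come from an earlier DBSCAN run (made-up values).
core_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
core_labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(assign_to_cluster((0.5, 1.8), core_points, core_labels, eps=1.5))   # 0
print(assign_to_cluster((10.5, 9.2), core_points, core_labels, eps=1.5))  # 1
print(assign_to_cluster((5, 5), core_points, core_labels, eps=1.5))       # -1
```

Note that this does not update the clusters: a new point labeled this way never becomes a core point, so results can differ from re-running DBSCAN on the full dataset.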