Density Based Spatial Clustering (DBSCAN)
With Data Analysis
Kristina Pestaria Sinaga, Ph.D
kristina.sinaga@binus.edu
kristinasinaga57@yahoo.co.id
January 08, 2021
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Summary
1 Why DBSCAN?
2 How Does DBSCAN Algorithm Works?
3 Case Analysis
2-D circular Dataset
Lidar Scan Dataset
Spiral Dataset
Aggregation Dataset
Chamelon DS4 Dataset
Chamelon DS5 Dataset
4 Conclusions
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
Why DBSCAN?
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
What is DBSCAN?
DBSCAN is a clustering algorithm that defines clusters as continuous
regions of high density and works well if all the clusters are dense
enough and well separated by low-density regions.
Source: www.mygreatlearning.com
DBSCAN requires two parameters:
1 ϵ (Epsilon): A distance measure that will be used to locate the points
or to check the density in the neighbourhood of any point.
2 n (minPts): the minimum number of points required to form a dense
region
Often used on non-linear or non-spherical datasets
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
What is non-linear or non-spherical datasets?
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
What is non-linear or non-spherical datasets?
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
What is non-linear or non-spherical datasets?
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
Advantages: DBSCAN
DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-Means, K-Medoids
DBSCAN can sort data into clusters of varying shapes as well
DBSCAN has a notion of noise, and is robust to outliers.
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
Advantages: DBSCAN
DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-Means, K-Medoids
DBSCAN can sort data into clusters of varying shapes as well
DBSCAN has a notion of noise, and is robust to outliers.
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
Advantages: DBSCAN
DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-Means, K-Medoids
DBSCAN can sort data into clusters of varying shapes as well
DBSCAN has a notion of noise, and is robust to outliers.
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
Advantages: DBSCAN
⇓
DBSCAN requires just two parameters and is mostly insensitive to the
ordering of the points in the database. (However, points sitting on the
edge of two different clusters might swap cluster membership if the
ordering of the points is changed, and the cluster assignment is unique
only up to isomorphism.)
⇓
The parameters minPts and ϵ can be set by a domain expert, if the data is
well understood.
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
Advantages: DBSCAN
⇓
DBSCAN requires just two parameters and is mostly insensitive to the
ordering of the points in the database. (However, points sitting on the
edge of two different clusters might swap cluster membership if the
ordering of the points is changed, and the cluster assignment is unique
only up to isomorphism.)
⇓
The parameters minPts and ϵ can be set by a domain expert, if the data is
well understood.
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?
Disadvantages: DBSCAN
1 DBSCAN struggles with clusters of similar density
2 Struggles with high dimensionality data
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?
How Does DBSCAN Algorithm Works?
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?
DBSCAN algorithm
Pick an arbitrary data point p as your first point.
Mark p as visited.
⇓
Extract all points present in its neighborhood (upto eps distance from the
point), and call it a set nb
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?
DBSCAN algorithm
Pick an arbitrary data point p as your first point.
Mark p as visited.
⇓
Extract all points present in its neighborhood (upto eps distance from the
point), and call it a set nb
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?
DBSCAN algorithm
Pick an arbitrary data point p as your first point.
Mark p as visited.
⇓
Extract all points present in its neighborhood (upto eps distance from the
point), and call it a set nb
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?
DBSCAN algorithm
If nb ≥ minPts, then Consider p as the first point of a new cluster,
Consider all points within eps distance (members of nb) as other points in
this cluster
⇓
else label p as noise
⇓
Repeat steps 1-5 till the entire dataset has been labeled ie the clustering is
complete
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?
DBSCAN algorithm
If nb ≥ minPts, then Consider p as the first point of a new cluster,
Consider all points within eps distance (members of nb) as other points in
this cluster
⇓
else label p as noise
⇓
Repeat steps 1-5 till the entire dataset has been labeled ie the clustering is
complete
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?
DBSCAN algorithm
If nb ≥ minPts, then Consider p as the first point of a new cluster,
Consider all points within eps distance (members of nb) as other points in
this cluster
⇓
else label p as noise
⇓
Repeat steps 1-5 till the entire dataset has been labeled ie the clustering is
complete
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis 2-D circular Dataset
Case Analysis
2-D circular Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis 2-D circular Dataset
2 and 11 Clusters of 2-D circular Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Lidar Scan Dataset
Case Analysis
Lidar Scan Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Lidar Scan Dataset
12 and 18 Clusters of Lidar Scan Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Spiral Dataset
Case Analysis
Spiral Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Spiral Dataset
3 and 4 Clusters of Spiral Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Aggregation Dataset
Case Analysis
Aggregation Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Aggregation Dataset
8 and 5 Clusters of Aggregation Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS4 Dataset
Case Analysis
Chamelon DS4 Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS4 Dataset
7 and 5 Clusters of Chamelon DS4 Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS5 Dataset
Case Analysis
Chamelon DS5 Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS5 Dataset
7 and 5 Clusters of Chamelon DS5 Dataset
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Conclusions
Conclusions
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Conclusions
Conclusions: DBSCAN
1 In DBSCAN (Density Based Spatial Clustering), a cluster is a set of
data objects spread over a contigous region of high density objects,
and objects in low-density regions are typically considered noise or
outliers.
2 DBSCAN (Density Based Spatial Clustering) can find arbitrary shape
clusters and handle noise data well, but it is slow due to
neighborhood search for each data object, and faces a difficulty in
setting the density threshold parameters properly
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
References
References
Asuncion, A. & Newman, D. UCI machine learning repository. 2007.
Chen, X. et al. Purtreeclust: A clustering algorithm for customer
segmentation from massive customer transaction data. IEEE
Transactions on Knowledge and Data Engineering 30, 559–572
(2017).
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
The End
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021