(4,8)
n= number of data points
1st Centroid
Components
2nd Centroid
Components
41
42
j=number of clusters
(k,i )= current data point
𝑫𝟒𝟏 𝑫𝟒𝟐
A tolerance can be set as: 0.01
Density Based Clustering: DBSCAN Algorithm
[Density-Based Spatial Clustering of Applications with Noise]
• Partitioning methods (K-means) and hierarchical clustering work
for finding spherical-shaped clusters or convex clusters. In other
words, they are suitable only for compact and well-separated
clusters. Moreover, they are also severely affected by the presence
of noise and outliers in the data.
• Real-life data may contain irregularities, like:
1. Clusters can be of arbitrary shape such as those shown in the
figure below.
2. Data may contain noise.
Parameters Required For DBSCAN Algorithm
1.eps: It defines the neighborhood around a data point i.e. if the distance between two points is
lower or equal to ‘eps’ then they are considered neighbors. If the eps value is chosen too small
then a large part of the data will be considered as an outlier. If it is chosen very large then the
clusters will merge and the majority of the data points will be in the same clusters. One way to
find the eps value is based on the k-distance graph.
2.MinPts: Minimum number of neighbors (data points) within eps radius. The larger the dataset,
the larger value of MinPts must be chosen. As a general rule, the minimum MinPts can be
derived from the number of dimensions D in the dataset as, MinPts >= D+1. The minimum value
of MinPts must be chosen at least 3.
In this algorithm, we have 3 types of data points.
Core Point: A point is a core point if it has more than MinPts points within eps.
Border Point: A point which has fewer than MinPts within eps but it is in the neighborhood of
a core point.
Noise or outlier: A point which is not a core point or border point.
Steps Used In DBSCAN Algorithm
1.Find all the neighbor points within eps and identify the core points or visited with
more than MinPts neighbors.
2.For each core point if it is not already assigned to a cluster, create a new cluster.
3.Find recursively all its density-connected points and assign them to the same
cluster as the core point.
A point a and b are said to be density connected if there exists a point c which
has a sufficient number of points in its neighbors and both points a and b are
within the eps distance. This is a chaining process. So, if b is a neighbor of c, c is a
neighbor of d, and d is a neighbor of e, which in turn is neighbor of a implying
that b is a neighbor of a.
4.Iterate through the remaining unvisited points in the dataset. Those points that
do not belong to any cluster are noise.
Numeric Example
Finding Nearest Neighbors
Finding Core points If a point having neighbors (including itself)>=minPts,
is a core point.
Finding Border points If a point is a neighbor of any of the core point
having neighboring points <=minPts,
then it will be a border point
Draw The cluster and Identify Core, Border, Noise Points