Data Mining

DBSCAN is a clustering algorithm that identifies core, border, and noise points based on density, using parameters Epsilon (ε) and MinPts. It effectively handles datasets with clusters of varying shapes and sizes, as demonstrated with an example dataset. The algorithm has advantages such as finding arbitrary shaped clusters and identifying noise, but it is sensitive to parameter selection and struggles with varying densities.


Bilal Hassan

36215
BSCS 5th Eve
Submitted to: Mr. Tauqeer Abbas

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a
popular clustering algorithm that groups together data points that are closely
packed and marks points in low-density regions as outliers. It is effective
for datasets with clusters of varying shapes and sizes.

Key Concepts:
Epsilon (ε): The maximum distance between two points to be considered as
neighbors.

MinPts: The minimum number of points required to form a dense region (a
cluster).
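
The two parameters can be made concrete with a minimal sketch of an ε-neighborhood query. The function name eps_neighborhood and the sample points are assumptions made for illustration, and Euclidean distance is assumed as the metric; the point itself is counted as part of its own neighborhood.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def eps_neighborhood(points, index, eps):
    """Return indices of all points within eps of points[index], itself included."""
    center = points[index]
    return [j for j, p in enumerate(points) if dist(center, p) <= eps]


# Illustrative sample points (an assumption for this sketch).
sample = [(1, 2), (2, 2), (2, 3), (8, 7)]

neighbors = eps_neighborhood(sample, 0, eps=2)
print(neighbors)            # indices of the points within eps of (1, 2)
print(len(neighbors) >= 2)  # is the region dense enough for MinPts = 2?
```

Comparing the size of this neighborhood with MinPts is the only density test DBSCAN performs.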

Point Types:
1. Core points: A point is a core point if it has at least MinPts points
(counting itself) within a distance of ε.

2. Border points: A point is a border point if it has fewer than MinPts points
within ε, but lies in the ε-neighborhood of a core point.

3. Noise points: Points that are neither core points nor border points.
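
The three definitions above can be sketched as a small classifier. This is only an illustrative sketch: the function name classify_points and the points on a line are hypothetical, Euclidean distance is assumed, and each point counts toward its own neighborhood.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    # Indices of all points within eps of each point (itself included).
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nb) >= min_pts for nb in neighborhoods]

    kinds = []
    for i, nb in enumerate(neighborhoods):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in nb):
            kinds.append("border")  # not dense itself, but near a core point
        else:
            kinds.append("noise")
    return kinds


# Hypothetical points on a line, chosen so that all three types appear.
line = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
kinds = classify_points(line, eps=1, min_pts=3)
print(kinds)  # ['border', 'core', 'core', 'border', 'noise']
```

The two end points of the chain have only two points within ε, so they are not core, but each sits next to a core point and is therefore a border point.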

Example Dataset:
Consider the following 2D dataset:

(1, 2), (2, 2), (2, 3), (8, 7), (8, 8), (25, 80)

DBSCAN Algorithm with Parameters:

ε = 2 (distance threshold for neighborhood)

MinPts = 2 (minimum number of points to form a cluster)

Steps:
1. Starting with point (1, 2):

Look for points within ε = 2 distance. Points (1, 2), (2, 2), and (2, 3) are found
within this distance (the farthest, (2, 3), is √2 ≈ 1.41 away).

This neighborhood holds 3 points, which is at least MinPts = 2, so (1, 2) is a
core point and starts a cluster.

(2, 2) and (2, 3) join the same cluster. Each of them also has at least MinPts
points within ε, so they are core points too.

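2. Moving to point (8, 7):

Points (8, 7) and (8, 8) are within ε = 2 distance (they are 1 apart), and no
point of the first cluster is within ε. Two points meet MinPts = 2, so these
form another cluster.

(8, 7) becomes a core point, and by the same count so does (8, 8).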

3. Point (25, 80):

The ε = 2 neighborhood of this point contains only the point itself, which is
fewer than MinPts = 2, and it is not within ε of any core point, so it is
marked as noise.

Final Clusters:
Cluster 1: {(1, 2), (2, 2), (2, 3)}

Cluster 2: {(8, 7), (8, 8)}

Noise: {(25, 80)}
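
The walk-through above can be reproduced with a minimal from-scratch sketch. This is not the sklearn implementation: the function names dbscan and region_query are assumptions for illustration, Euclidean distance is assumed, and each point counts toward its own neighborhood.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], itself included."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is core too: keep expanding
                queue.extend(j_neighbors)
    return labels


points = [(1, 2), (2, 2), (2, 3), (8, 7), (8, 8), (25, 80)]
labels = dbscan(points, eps=2, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

Label 0 corresponds to Cluster 1, label 1 to Cluster 2, and -1 to the noise point.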

Visual Representation:
Cluster 1 would be points near (1, 2).

Cluster 2 would be points near (8, 7).

The point (25, 80) would be considered as noise.

Python Code Example (Using sklearn):


import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

# Sample dataset
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

# DBSCAN clustering (min_samples counts the point itself)
db = DBSCAN(eps=2, min_samples=2)
labels = db.fit_predict(X)

# Plotting the clusters, colored by cluster label
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title("DBSCAN Clustering")
plt.show()

# Print labels (-1 represents noise points)
print(labels)

Output:
The clusters are marked in different colors.

The printed labels assign 0 to the first three points and 1 to (8, 7) and
(8, 8). The point (25, 80) will be labeled -1, indicating it is considered noise.
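
A label array like this can be turned back into the cluster listing with a short helper. The sketch below is illustrative only: group_by_label is a hypothetical name, and the labels are written out by hand to match the clusters found in the walk-through rather than taken from a live sklearn run.

```python
from collections import defaultdict


def group_by_label(points, labels):
    """Split points into {cluster_label: [points]} and a list of noise points."""
    clusters = defaultdict(list)
    noise = []
    for point, label in zip(points, labels):
        if label == -1:  # -1 marks noise
            noise.append(point)
        else:
            clusters[label].append(point)
    return dict(clusters), noise


points = [(1, 2), (2, 2), (2, 3), (8, 7), (8, 8), (25, 80)]
labels = [0, 0, 0, 1, 1, -1]  # assumed labels, matching the walk-through

clusters, noise = group_by_label(points, labels)
for label, members in clusters.items():
    print(f"Cluster {label + 1}: {members}")
print(f"Noise: {noise}")
```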

Advantages of DBSCAN:
Can find clusters of arbitrary shapes.

Does not require specifying the number of clusters in advance.

Can identify noise.

Disadvantages:
Sensitive to the choice of ε and MinPts.

Struggles with clusters of varying density, because a single ε and MinPts
apply to the whole dataset.
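
The sensitivity to ε can be seen on the example dataset with a small sweep. This is a rough sketch under stated assumptions: count_core_points is a hypothetical helper, the three ε values are arbitrary choices for illustration, and Euclidean distance is assumed with each point counting toward its own neighborhood.

```python
from math import dist


def count_core_points(points, eps, min_pts):
    """Count points that have at least min_pts points (itself included) within eps."""
    return sum(
        1
        for p in points
        if sum(dist(p, q) <= eps for q in points) >= min_pts
    )


points = [(1, 2), (2, 2), (2, 3), (8, 7), (8, 8), (25, 80)]

# Arbitrary eps values chosen to show three different regimes.
core_counts = {eps: count_core_points(points, eps, min_pts=2) for eps in (0.5, 2, 100)}
for eps, n_core in core_counts.items():
    print(f"eps={eps}: {n_core} core points out of {len(points)}")
```

With ε = 0.5 no point is dense enough, so everything is noise. With ε = 2 the five clustered points are core. With ε = 100 even (25, 80) becomes a core point and the outlier is no longer detected.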
