DBSCAN - Introduction in Machine Learning

DBSCAN is a clustering algorithm that classifies points into core, border, and noise categories based on two parameters: ε (epsilon) and MinPts. It can identify clusters of arbitrary shape and is robust to noise, but struggles with varying densities and high-dimensional data. The document also provides guidance on parameter selection, advantages, limitations, and a Python implementation example.


DBSCAN works with two main parameters:

1. ε (epsilon): The maximum distance between two points for them to be considered as
part of the same neighborhood.
2. MinPts: The minimum number of points required to form a dense region.

Based on these, each point is classified as one of three types (a minimal sketch follows the list):

• Core Point: Has at least MinPts points within ε (including itself).
• Border Point: Has fewer than MinPts points within ε, but lies in the neighborhood of a core point.
• Noise (Outlier): Neither a core point nor a border point.
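As a minimal sketch of these definitions (not scikit-learn's implementation; the data, function name, and parameter values are illustrative), the snippet below labels each point in a small NumPy array as core, border, or noise:

import numpy as np

def classify_points(X, eps=0.5, min_pts=4):
    """Label each point 'core', 'border', or 'noise' (illustrative sketch)."""
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Neighborhood counts include the point itself (distance 0 <= eps)
    neighbor_counts = (dists <= eps).sum(axis=1)
    is_core = neighbor_counts >= min_pts

    labels = []
    for i in range(len(X)):
        if is_core[i]:
            labels.append("core")
        elif is_core[dists[i] <= eps].any():
            # Not core itself, but within eps of some core point
            labels.append("border")
        else:
            labels.append("noise")
    return labels

X = np.array([[0, 0], [0.3, 0.1], [0.1, 0.4], [0.2, 0.2], [5, 5]])
print(classify_points(X, eps=0.5, min_pts=4))
# -> ['core', 'core', 'core', 'core', 'noise']  (the first four points are mutually within eps)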

DBSCAN Algorithm Steps

1. Start with an arbitrary unvisited point.
2. Retrieve the ε-neighborhood of the point.
3. If the neighborhood contains at least MinPts points, start a new cluster.
   o Mark the point as core.
   o Expand the cluster recursively by including all density-reachable points (those reachable from core points through a chain of neighboring core points).
4. If not, mark the point as noise (it might later be included in a cluster as a border point).
5. Repeat until all points have been visited.
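To make these steps concrete, here is a compact from-scratch sketch (pure NumPy, iterative stack-based expansion instead of literal recursion; the function name is illustrative, not scikit-learn's API):

import numpy as np

def dbscan_sketch(X, eps, min_pts):
    """Return a cluster label per point; -1 marks noise (illustrative sketch)."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighborhoods = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)            # -1 = noise / unassigned
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0

    for i in range(n):                 # step 1: pick an unvisited point
        if visited[i]:
            continue
        visited[i] = True
        if len(neighborhoods[i]) < min_pts:
            continue                   # step 4: provisionally noise; may become border later
        labels[i] = cluster_id         # step 3: i is core -> start a new cluster
        stack = list(neighborhoods[i])
        while stack:                   # expand to all density-reachable points
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster_id     # claim unassigned/noise points as members
            if not visited[j]:
                visited[j] = True
                if len(neighborhoods[j]) >= min_pts:
                    stack.extend(neighborhoods[j])  # j is core: keep expanding
        cluster_id += 1
    return labels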

Advantages of DBSCAN
• Can find clusters of arbitrary shape (unlike K-Means which assumes spherical
clusters).
• Robust to noise and outliers.
• No need to specify the number of clusters beforehand (unlike K-Means or Gaussian
Mixture Models).

Limitations
• DBSCAN struggles when the dataset has varying densities, as a global ε and MinPts
may not work well for all clusters.
• Performance can degrade with high-dimensional data due to the "curse of
dimensionality".
• Choosing optimal ε and MinPts can be non-trivial and often requires domain
knowledge or empirical testing (e.g., using a k-distance graph to find a good ε).
Comparison with Other Clustering Algorithms

Feature                 | DBSCAN                     | K-Means    | Hierarchical
Cluster shape           | Arbitrary                  | Spherical  | Arbitrary
Handles noise           | Yes                        | No         | No
Need to specify K?      | No                         | Yes        | No (but needs a linkage criterion)
Handles varying density | Not well                   | No         | No
Time complexity         | O(n log n) (with indexing) | O(n·k·d·i) | O(n²)

DBSCAN in Python with scikit-learn and matplotlib.

🧪 Step-by-Step Code Example


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Generate sample data (two moons shape – non-convex clusters)
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Apply DBSCAN
dbscan = DBSCAN(eps=0.2, min_samples=5)
labels = dbscan.fit_predict(X)

# Plotting results
plt.figure(figsize=(8, 6))
unique_labels = set(labels)

# Assign colors to each cluster, black for noise
colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black for noise (DBSCAN labels noise points as -1)
        col = [0, 0, 0, 1]

    class_member_mask = (labels == k)
    xy = X[class_member_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)

plt.title("DBSCAN Clustering on Two Moons Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()
🧠 Tips on Choosing Parameters

• ε (eps): Plot a k-distance graph (the distance from each point to its k-th nearest neighbor) and look for the "elbow" point.
• min_samples (MinPts): A common heuristic is min_samples = 2 * n_features (i.e., 2×D). A lower bound often suggested is

MinPts = D + 1

where D is the number of dimensions in your data. This ensures that core points are surrounded by at least as many points as dimensions plus one (to avoid counting noise or surface points as core).

• For 2D or 3D data, MinPts = 4 or 5 is often a good start.
• In high-dimensional datasets, set MinPts between 2×D and 4×D.
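As a quick illustration of these rules of thumb (the placeholder data and variable names are assumptions; substitute your own dataset):

import numpy as np

X = np.random.rand(300, 2)     # placeholder data; replace with your dataset
D = X.shape[1]                 # number of dimensions/features

min_pts_floor = D + 1          # lower bound: dimensions plus one
min_pts_common = 2 * D         # common heuristic (2 * n_features)
min_pts_noisy = 4 * D          # upper end for high-dimensional/noisy data
print(min_pts_floor, min_pts_common, min_pts_noisy)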

✅ When to increase MinPts:

• If your dataset has noise, increase MinPts to reduce false positives.
• A larger MinPts results in more conservative (denser) clusters.

📌 Choosing ε (Epsilon Radius)

This is more sensitive and often the parameter that requires tuning. A small change in ε can
drastically change the clustering outcome.

✅ Use a k-distance graph:

This is the most widely used technique. Here’s how it works:

1. For each point, compute the distance to its k-th nearest neighbor, with k = MinPts.
2. Sort these distances in ascending order and plot them.
3. Look for the "elbow" where the curve bends sharply upward; the distance at that elbow is a good candidate for ε.
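A short sketch of this procedure using scikit-learn's NearestNeighbors (the two-moons data and k value are illustrative; the eps you read off depends on your data):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
k = 5  # use k = MinPts (min_samples)

# +1 because each point is returned as its own nearest neighbor (distance 0)
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = nbrs.kneighbors(X)        # shape: (n_samples, k + 1)
k_distances = np.sort(distances[:, -1])  # distance to the k-th true neighbor, ascending

plt.plot(k_distances)
plt.xlabel("Points sorted by distance")
plt.ylabel(f"Distance to {k}-th nearest neighbor")
plt.title("k-distance graph: pick eps at the elbow")
plt.show()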

Summary Cheat Sheet

Parameter   | Strategy
MinPts      | Set to D + 1 or 2×D, where D = number of dimensions
ε           | Use a k-distance graph; pick the value at the "elbow"
Fine-tuning | Try a grid search on a small dataset subset (see the sketch below)
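One possible sketch of the fine-tuning step, scoring a small grid of (eps, min_samples) pairs with silhouette score on the non-noise points (the grid values and the choice of scoring metric are assumptions, not a prescribed method):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

best = None
for eps in [0.1, 0.15, 0.2, 0.25, 0.3]:
    for min_samples in [3, 5, 10]:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        mask = labels != -1                  # score only non-noise points
        if len(set(labels[mask])) < 2:
            continue                         # silhouette needs at least 2 clusters
        score = silhouette_score(X[mask], labels[mask])
        if best is None or score > best[0]:
            best = (score, eps, min_samples)

if best is not None:
    print(f"best silhouette={best[0]:.3f} at eps={best[1]}, min_samples={best[2]}")

Note that silhouette score favors convex clusters, so on non-convex data like the two moons, treat this as a rough filter rather than the final word.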

Gotchas to Avoid
• Don't set ε too small → All points become noise.
• Don't set ε too large → Merges all points into one big cluster.
• No single perfect choice → Varies by data distribution and density.
