KEMBAR78
Ciea Assignment 3 | PDF | Cluster Analysis | Algorithms
0% found this document useful (0 votes)
11 views3 pages

Ciea Assignment 3

Uploaded by

Saad Karim
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views3 pages

Ciea Assignment 3

Uploaded by

Saad Karim
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Assignment 3: kNN and DBSCAN

Your Name
October 15, 2024

1 Introduction
This assignment explores two popular clustering and classification algorithms:
K-Nearest Neighbors (kNN) and DBSCAN. Each algorithm will be described in
detail, followed by an example with graphical representations.

2 K-Nearest Neighbors (kNN)


K-Nearest Neighbors (kNN) is a supervised learning algorithm used for classi-
fication and regression tasks. It works based on the principle that similar data
points are located close to each other in the feature space.

2.1 Algorithm
1. Choose the number of neighbors k.
2. Calculate the distance between the query point and all training points
using a distance metric (e.g., Euclidean distance).
3. Sort the distances and determine the k nearest neighbors.

4. Assign the class label based on majority voting among the k neighbors.

2.2 Example
Consider a simple dataset with two classes, represented in a 2D space. Assume
the following points:

• Class A: (1, 2), (2, 3), (3, 1)

• Class B: (6, 5), (7, 7), (8, 6)

To classify a new point P (4, 4) with k = 3:

1
1. Calculate distances:
p √ √
d(P, A) = (4 − 1)2 + (4 − 2)2 = 9 + 4 = 13 ≈ 3.61
p √ √
d(P, B) = (4 − 6)2 + (4 − 5)2 = 4 + 1 = 5 ≈ 2.24
Repeat for all points.
2. Identify the 3 nearest neighbors: (3, 1), (2, 3), (6, 5).
3. Majority voting yields Class A (2 votes) and Class B (1 vote). Thus,
P (4, 4) is classified as Class A.
B

3 DBSCAN
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is an
unsupervised clustering algorithm that groups together points that are closely
packed together, marking as outliers points that lie alone in low-density regions.

3.1 Algorithm
1. Choose parameters: ϵ (the maximum distance between two samples for
them to be considered as in the same neighborhood) and minPts (the
minimum number of points to form a dense region).
2. For each point in the dataset:
(a) Retrieve all points within ϵ distance.
(b) If the number of retrieved points is greater than or equal to minPts,
create a new cluster.
(c) Otherwise, mark the point as noise.
3. Repeat the process until all points are classified.

2
3.2 Example
Consider the following points in a 2D space:

• (1, 2), (1, 3), (2, 2), (2, 3), (5, 5), (5, 6), (8, 8)

Assuming ϵ = 1.5 and minPts = 3:

1. Start with point (1, 2):

• Neighbors: (1, 2), (1, 3), (2, 2), (2, 3) → forms Cluster 1.
2. Next point (5, 5):
• Neighbors: (5, 5), (5, 6) → insufficient points, mark as noise.

3. Next point (8, 8):


• Neighbors: (8, 8) → mark as noise.

Noise

Noise

Noise

C1 C1

C1 C1

4 Conclusion
KNN and DBSCAN are powerful algorithms for classification and clustering
tasks, respectively. While kNN relies on distance metrics to classify points
based on their neighbors, DBSCAN identifies clusters based on point density
and can effectively handle noise in the data.

You might also like