Introduction to Machine Learning
- Prof. Balaraman Ravindran | IIT Madras
Problem Solving Session (Week-10)
Shreya Bansal
PMRF PhD Scholar
IIT Ropar
Week-10 Contents
1. Partitional Clustering (K-means)
2. Hierarchical Clustering (Agglomerative)
3. BIRCH Algorithm
4. CURE Algorithm
5. Density-Based Clustering (DBSCAN)
What is Clustering?
● Definition: Grouping similar data points together.
● Goal: Maximize intra-cluster similarity.
Minimize inter-cluster similarity.
● Formal Problem: Partition n data points into k clusters.
● Challenge: Number of possible partitions is huge
(combinatorial).
Applications of Clustering
● Data Mining: Categorization of unlabeled data.
● Pre-processing: Reduce large datasets to representative
points (e.g., 10M → 10k clusters).
● Visualization: Understand structure in high-dimensional
feature spaces.
● Anomaly Detection: Identify outliers (e.g., fraud detection).
● Example:
● Customer segmentation: Group users by behavior.
Clustering Algorithms Overview
● Three Main Approaches:
○ Partitional (e.g., K-means, K-medoids).
○ Hierarchical (e.g., Agglomerative).
○ Density-Based (e.g., DBSCAN).
● Key Difference:
● Partitional methods search directly for the final k-partition; hierarchical methods build a nested hierarchy of clusters; density-based methods grow clusters from dense regions.
K-Means Clustering
Steps:
1. Initialize k centroids (randomly or heuristically).
2. Assign each point to the nearest centroid.
3. Recompute centroids as cluster means.
4. Repeat until convergence.
Pros: Simple, fast.
Cons: Sensitive to initialization; assumes spherical clusters.
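As a concrete illustration of the four steps above, here is a minimal NumPy sketch of Lloyd's K-means iteration (function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means (Lloyd's algorithm) on an (n, d) array X."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize k centroids by picking k random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Step 4: stop when the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```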
K-Means Initialization Matters!
● Problem: Random seeds can lead to poor local optima.
● Example: Bad initialization → uneven clusters.
● Solution: K-means++ (smart seeding).
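A minimal sketch of the K-means++ seeding idea mentioned above: each new seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far (names are illustrative):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """K-means++ seeding: spread the initial centroids out over the data."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]                 # first seed: uniform at random
    for _ in range(k - 1):
        # squared distance of every point to its nearest already-chosen seed
        d2 = np.min(np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :],
                                   axis=2) ** 2, axis=1)
        probs = d2 / d2.sum()                             # far-away points are more likely
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)
```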
K-Medoids and PAM
● K-Medoids:
○ Uses actual data points as centroids (medoids).
○ Robust to outliers (median-like behavior).
● PAM (Partitioning Around Medoids):
○ Swap medoids with non-medoids.
○ Keep swaps that improve cluster quality.
● Trade-off: Computationally expensive (O(n²)).
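A rough sketch of one PAM swap pass, assuming a precomputed pairwise distance matrix D; the O(n²) cost comes from evaluating many medoid/non-medoid swaps (function names are illustrative):

```python
import numpy as np

def total_cost(D, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam_swap_pass(D, medoids):
    """Try swapping each medoid with each non-medoid; keep swaps that lower the cost."""
    medoids = list(medoids)
    best_cost = total_cost(D, medoids)
    for i in range(len(medoids)):
        for h in range(D.shape[0]):
            if h in medoids:
                continue
            candidate = medoids.copy()
            candidate[i] = h                      # tentative swap: medoid i -> point h
            cost = total_cost(D, candidate)
            if cost < best_cost:                  # keep only improving swaps
                best_cost, medoids = cost, candidate
    return medoids, best_cost
```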
How to Choose k?
● Methods:
● Domain Knowledge: Predefined k (e.g., 5
customer segments).
● Elbow Method: Plot k vs. distortion (sum
of squared distances).
● Pick k at the "elbow" of the curve.
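One common way to draw the elbow plot, sketched here with scikit-learn's KMeans and its inertia_ attribute (the sum of squared distances); the data is a random placeholder:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))       # placeholder data

ks = range(1, 11)
distortions = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
               for k in ks]                               # inertia_ = sum of squared distances

plt.plot(list(ks), distortions, marker="o")
plt.xlabel("k")
plt.ylabel("distortion (SSE)")
plt.title("Elbow method: pick k where the curve bends")
plt.show()
```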
Cluster Evaluation Metrics
● Diameter: Max/Avg pairwise distance in a cluster.
● Radius: Avg distance to centroid.
● Purity: % of dominant class in a cluster (if labels exist).
● Rand Index: Agreement with reference clustering.
● Formula:
● Purity = (1/N) · Σ_clusters ( max_class |cluster ∩ class| )
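A small sketch of the purity computation above (the example reproduces Question 1 from the assignment section below):

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """Purity = (1/N) * sum over clusters of the size of the dominant class."""
    N = len(class_labels)
    total = 0
    for c in set(cluster_labels):
        members = [class_labels[i] for i, lab in enumerate(cluster_labels) if lab == c]
        total += Counter(members).most_common(1)[0][1]    # count of the dominant class
    return total / N

# One cluster with 30 of class A, 15 of class B, 5 of class C -> purity = 30/50 = 0.6
print(purity([0] * 50, ["A"] * 30 + ["B"] * 15 + ["C"] * 5))
```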
Limitations of Clustering
● Ill-Posed Problem: No unique "correct" clustering.
● Distance Metrics: Euclidean fails for categorical data.
● Scalability: PAM is slow for large n.
● Workarounds:
● Use Jaccard similarity for categorical data.
● Sample data before clustering.
Example - K-Means
● Given the following 2D data points, perform K-means
clustering with K=2. Use Euclidean distance and show the first
two iterations.
● Data Points: A(1,2), B(1.5,1.8), C(5,8), D(8,8), E(1,0.5), F(9,11)
● Initial Centroids (chosen randomly):
● μ1 =(1,2) (Point A)
● μ2 =(5,8) (Point C)
Example - K-Means
● Iteration 1
● Step 1: Assign points to the nearest centroid.
● Calculate the Euclidean distance (d) from each point to both centroids:
Example - K-Means
● Cluster Assignments: Cluster 1: {A, B, E}; Cluster 2: {C, D, F}
● Step 2: Update Centroids
● Recalculate centroids as the mean of assigned points: μ1 = (1.17, 1.43), μ2 = (7.33, 9.00)
Example - K-Means
● Iteration 2
● Step 1: Reassign points to the new centroids.
Example - K-Means
Cluster Assignments:
Cluster 1: A, B, E
Cluster 2: C, D, F
Step 2: Check Convergence
Centroids remain unchanged (no reassignments).
The algorithm has converged.
Example - K-Means
Final Result
Clusters:
Cluster 1 (Red): A(1,2), B(1.5,1.8), E(1,0.5)
Cluster 2 (Blue): C(5,8), D(8,8), F(9,11)
Final Centroids:
μ1 =(1.17,1.43)
μ2 =(7.33,9.00)
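The worked example can be checked with scikit-learn by fixing the same initial centroids; this is a verification sketch, not part of the original slides:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.5], [9, 11]])   # A, B, C, D, E, F
init = np.array([[1, 2], [5, 8]])                  # mu1 = A, mu2 = C, as above

km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
print(km.labels_)            # A, B, E in one cluster; C, D, F in the other
print(km.cluster_centers_)   # approx (1.17, 1.43) and (7.33, 9.00)
```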
Hierarchical Clustering
● Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters.
● It starts with each data point as an individual cluster and successively merges the closest clusters until a single cluster remains.
Steps in Hierarchical Clustering
1. Start with each data point as a separate cluster.
2. Compute distances between clusters (initially between
individual points).
3. Merge the closest clusters.
4. Repeat until only one cluster remains.
5. Generate a dendrogram to visualize the merging process.
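The same steps can be run with SciPy's agglomerative routines; a minimal sketch on placeholder data (the "single" method corresponds to the single-link distance discussed next):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(0).normal(size=(20, 2))   # placeholder data

# Steps 1-4: agglomerative merging with single-link (nearest-pair) distances
Z = linkage(X, method="single", metric="euclidean")

# Step 5: dendrogram showing the merge order and merge distances
dendrogram(Z)
plt.ylabel("merge distance")
plt.show()
```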
Distance Measures
● There are multiple ways to measure the distance between clusters:
● Single Link Clustering:
○ Distance is defined by the closest pair of points between
clusters.
○ May result in long, chain-like clusters.
● Complete Link Clustering:
○ Distance is defined by the farthest pair of points between
clusters.
○ Tends to produce compact, well-separated clusters.
● Average Link Clustering:
○ Distance is the average of all pairwise distances between
points in two clusters.
Distance Measures
Measuring Distance Between Clusters
● Centroid-Based Distance:
○ Distance is measured between the centroids of
two clusters.
● Radius-Based Distance:
○ Clusters are merged based on the radius of the
combined cluster.
● Diameter-Based Distance:
○ Clusters are merged based on the diameter of the
combined cluster.
Distance Metrics for Data Points
● The distance measure between individual data points
depends on the type of data and can be:
● Euclidean Distance
● Manhattan Distance
● Jaccard Similarity
● Cosine Similarity
● Other domain-specific distance measures
Dendrograms
● A dendrogram is a tree-like diagram that illustrates the order
and levels at which clusters were merged.
● The height at which two clusters are joined represents the
distance between them.
● To determine the final clusters, the dendrogram is cut at a
chosen threshold.
Pros and Cons of Hierarchical Clustering
● Advantages of Hierarchical Clustering
● ✅ No need to specify the number of clusters K in advance.
● ✅ Provides a complete hierarchical decomposition of the
dataset.
● ✅ Can visualize the clustering process using a dendrogram.
● Disadvantages of Hierarchical Clustering
● ❌ Computationally expensive for large datasets.
● ❌ Sensitive to noise and outliers.
● ❌ Merging decisions are irreversible.
Choosing the Number of Clusters
● The knee method or thresholding the dendrogram can help
determine the optimal number of clusters.
● Look for a large jump in merging distances to decide where
to cut the dendrogram.
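A sketch of that idea: inspect the merge distances recorded by SciPy's linkage, find the largest jump, and cut there with fcluster (placeholder data; a heuristic, not a definitive rule):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))   # placeholder data
Z = linkage(X, method="single")

merge_dists = Z[:, 2]                               # distance at which each merge happened
jumps = np.diff(merge_dists)
threshold = merge_dists[np.argmax(jumps)]           # cut just before the largest jump

labels = fcluster(Z, t=threshold, criterion="distance")
print(labels)                                       # cluster id for each point
```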
Example : Hierarchical Clustering
● Given the following dataset with five points:
● Perform hierarchical clustering using the Euclidean distance
and single linkage.
Example : Hierarchical Clustering
● Solution Steps
● Compute the pairwise Euclidean distances:
● Example: distance between A and B = √((x_A − x_B)² + (y_A − y_B)²) = 1.41 (used in Step 1 below).
● Create an initial distance matrix (distances between all points are computed).
● Find the closest pair: A and B are closest, so merge them into one cluster.
● Recompute the distance matrix:
● Using single linkage, the distance between clusters is the minimum of all
pairwise distances.
● Repeat until only one cluster remains.
Example : Hierarchical Clustering
Step 1: d(A, B) = 1.41 is the smallest distance → merge A and B into cluster C1.
Step 2: Update the distance matrix.
Example : Hierarchical Clustering
Step 3: d(D, E) = 2.24 is now the smallest distance → merge D and E into cluster C2.
Step 4: Update the distance matrix.
Example : Hierarchical Clustering
Step 5: d(C1, C) = 3.16 is now the smallest distance → merge C1 and C into cluster C3.
Step 6: Update the distance matrix.
Step 7: Merge the remaining clusters C3 and C2 into the root cluster C4.
Final hierarchy (dendrogram):
          (C4)
         /    \
     (C3)      (C2)
     /  \      /  \
  (C1)   C    D    E
  /  \
 A    B
BIRCH Clustering for Large Datasets
● Balanced Iterative Reducing and
Clustering using Hierarchies
● Scalable clustering for massive
data
● Two-phase approach: CF-Tree
(Clustering Features) +
Refinement
● Memory-efficient hierarchical
clustering
Why BIRCH?
● Challenges with Large Data:
○ K-means requires multiple passes (expensive for disk/network I/O).
○ Hierarchical methods are O(n²) in memory.
● BIRCH Solution:
○ Phase 1: Build a CF-Tree (summarize data into tight subclusters).
○ Phase 2: Refine clusters using the CF-Tree’s summaries.
● Key Idea: Reduce n to k representatives (e.g., 1M → 10K).
Clustering Feature (CF) Vector
● Definition: A tuple summarizing a cluster: CF=(N, LS ,SS)
○ N: Number of points in the cluster.
○ LS : Linear Sum of points (vector).
○ SS: Sum of Squares (scalar).
● Example:
● For a cluster with points (1,2) and (3,4):
○ CF = (2, (1+3, 2+4), (1² + 2² + 3² + 4²)) = (2, (4, 6), 30)
● Properties:
● CFs are additive (merge clusters by adding CFs).
● Enable computing the centroid (LS/N), radius, and diameter of a cluster.
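A small sketch of a CF vector with the additive merge and the derived centroid/radius, reproducing the (1,2), (3,4) example above (class and method names are illustrative):

```python
import numpy as np

class CF:
    """Clustering Feature (N, LS, SS); CFs are additive, so clusters merge by adding fields."""
    def __init__(self, point):
        p = np.asarray(point, dtype=float)
        self.N, self.LS, self.SS = 1, p.copy(), float(p @ p)

    def merge(self, other):
        self.N += other.N
        self.LS += other.LS
        self.SS += other.SS
        return self

    def centroid(self):
        return self.LS / self.N

    def radius(self):
        # sqrt of the average squared distance of the points from the centroid
        c = self.centroid()
        return np.sqrt(max(self.SS / self.N - c @ c, 0.0))

cf = CF((1, 2)).merge(CF((3, 4)))
print(cf.N, cf.LS, cf.SS)     # 2 [4. 6.] 30.0
print(cf.centroid())          # [2. 3.]
```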
CF-Tree Construction
● Structure:
○ Leaf Nodes: Store CF entries (subclusters).
○ Non-Leaf Nodes: Store CFs summarizing child nodes.
● Algorithm Steps:
○ Insert a point into the closest CF in the leaf (based on
centroid/diameter threshold).
○ If leaf exceeds max entries (e.g., 10), split leaf and propagate CFs
upward.
○ Repeat until all points are processed.
Phase 1 Example (CF-Tree Build)
● Data Points: (1,2),(1.5,1.8),(5,8),(8,8),(9,11)
● Threshold: Max diameter = 3.0
● Key Insight: Points are grouped into tight subclusters
dynamically.
Phase 2: Global Clustering
● Input: CF-Tree leaf entries (subcluster centroids).
● Process:
○ Run any clustering algorithm (e.g., K-means, hierarchical) on the centroids.
○ Assign original data points to the nearest final centroid (1 additional pass).
● Advantages:
○ Scalable: Works with summaries, not raw data.
○ Flexible: Choose any clustering method for refinement.
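Scikit-learn's Birch follows the same two-phase idea; a minimal sketch on synthetic data (parameter values are illustrative):

```python
import numpy as np
from sklearn.cluster import Birch, KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(500, 2))
               for c in [(0, 0), (5, 5), (9, 0)]])        # synthetic data, 3 blobs

# Phase 1: build the CF-tree; 'threshold' bounds the size of each subcluster
birch = Birch(threshold=0.8, branching_factor=50, n_clusters=None).fit(X)
print(len(birch.subcluster_centers_), "subclusters summarize", len(X), "points")

# Phase 2: refine by clustering the subcluster centroids (here with K-means)
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit(birch.subcluster_centers_)
labels = final.predict(X)     # one extra pass assigns the original points
```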
BIRCH vs. K-means
Limitations of BIRCH
● Order Sensitivity: Early points influence CF-Tree structure.
● Threshold Tuning: Diameter threshold impacts cluster granularity.
● Non-Spherical Clusters: Struggles with arbitrary shapes (like
K-means).
● Workaround:
● Use BIRCH for initial reduction, then DBSCAN for refinement.
CURE: Clustering Using Representatives
1. Handles non-convex clusters
2. Sampling-based approach for scalability
3. Shrinkage-based outlier robustness
Why CURE?
● Limitations of Traditional Methods:
● K-means: Only convex clusters (spherical shapes).
● Hierarchical Methods: O(n²) complexity → infeasible for large n.
● CURE’s Solution:
● Sampling: Work with a memory-friendly subset of data.
● Representative Points: Capture cluster boundaries (not just
centroids).
● Shrinkage: Mitigate outlier influence.
Key Steps of CURE
● Sampling: Randomly partition data into subsets (fit in memory).
● Initial Clustering: Run hierarchical clustering on each subset.
● Representative Points:
● For each cluster, pick m farthest points from centroid.
● Shrink them toward centroid by factor α (e.g., α=0.3).
● Reassignment: Assign points to the closest representative.
● Merge: Combine subsets’ representatives and recluster.
Representative Points Selection
● Process:
○ Compute centroid μ of a cluster.
○ Find farthest point p1 from μ.
○ Find farthest point p2 from p1 .
○ Repeat for m points.
○ Shrink: Move each pi toward μ by α×d(pi,μ)
● Example:
○ Cluster points: (1,1), (1,2), (5,5), (6,6).
○ Centroid μ = (3.25, 3.5).
○ Farthest point: p1 = (6,6).
○ Shrunk point (α = 0.2):
○ p1′ = (6 − 0.2×(6 − 3.25), 6 − 0.2×(6 − 3.5)) ≈ (5.45, 5.5)
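A sketch of the farthest-first selection and shrinkage above; with m = 1 and α = 0.2 it reproduces p1′ ≈ (5.45, 5.5) (function name is illustrative):

```python
import numpy as np

def cure_representatives(points, m, alpha):
    """Pick m well-scattered points (farthest-first), then shrink them toward the centroid."""
    P = np.asarray(points, dtype=float)
    mu = P.mean(axis=0)
    reps = [P[np.argmax(np.linalg.norm(P - mu, axis=1))]]     # p1: farthest from centroid
    while len(reps) < m:
        d = np.min([np.linalg.norm(P - r, axis=1) for r in reps], axis=0)
        reps.append(P[np.argmax(d)])                          # next: farthest from chosen reps
    reps = np.array(reps)
    return reps + alpha * (mu - reps)                         # shrink toward the centroid

pts = [(1, 1), (1, 2), (5, 5), (6, 6)]
print(cure_representatives(pts, m=1, alpha=0.2))              # [[5.45 5.5 ]]
```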
Parallelization in CURE
● Scalability Trick:
○ Split data into k random partitions.
○ Process each partition independently (parallelizable).
○ Merge results by clustering all representatives.
● Example:
○ 1M points → 10 partitions of 100K each.
○ Each partition → 100 clusters × 4 reps = 400 points.
○ Final merge: 4K points → manageable clustering.
○ Advantage: Avoids full O(n²) computations.
CURE vs. K-means vs. BIRCH
Parameters in CURE
● Number of Representatives (m): Typically 5–20.
● Shrinkage (α): 0.2–0.5 (balances outlier robustness and boundary
accuracy).
● Sample Size: As large as memory allows.
● Trade-offs:
○ Larger m: Better boundary detection but higher overhead.
○ Smaller α: More outlier resistance but less precise
boundaries.
Limitations of CURE
● Parameter Sensitivity: Performance depends on m, α, and
sample size.
● Order Dependency: Initial sampling affects results.
● Overhead: Representative selection adds computation.
● Workaround:
● Use multiple samples and aggregate (e.g., ensemble clustering).
DBSCAN: Density-Based Clustering
Handling Arbitrary Cluster Shapes and Noise
Key concepts: Core points, density-reachability, noise
Parameters: ε (epsilon) and minPts
Advantages over K-means and hierarchical clustering
Why DBSCAN?
Limitations of Traditional Methods:
K-means: Only convex clusters (fails on non-spherical shapes).
Hierarchical: Computationally expensive (O(n²)).
DBSCAN’s Solution:
Density-based: Clusters are dense regions separated by low-density
areas.
Noise Handling: Automatically identifies outliers.
Key Definitions
ε (epsilon): Radius of neighborhood around a point.
minPts: Minimum points to define a dense region.
Core Point: A point with ≥ minPts within ε.
Border Point: A point with fewer than minPts neighbors within ε, but lying within ε of a core point.
Noise Point: Neither a core point nor a border point (not reachable from any core point).
Density-Reachability and Connectivity
Density-Reachable: Point p is reachable from q if there’s a path
of core points within ε.
Density-Connected: Points p and q are density-connected if both are density-reachable from a common core point.
Cluster Definition:
A cluster is a set of density-connected points.
DBSCAN Algorithm Steps
● Random Start: Pick an unvisited point.
● Core Check: Count neighbors in ε-radius.
● If ≥ minPts, mark as core and expand cluster.
● Expand Cluster: Recursively add density-reachable points.
● Repeat: Until all points are visited.
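These steps can be run with scikit-learn's DBSCAN; the sketch below uses the 10-point dataset from the example and Question 2 later in this deck (ε = 1.5, minPts = 3):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Same 10 points as the DBSCAN example / Question 2 below
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 3],
              [8, 8], [8, 9], [9, 8], [9, 9], [10, 10]])

db = DBSCAN(eps=1.5, min_samples=3).fit(X)
print(db.labels_)                  # cluster id per point; -1 would mark noise
print(db.core_sample_indices_)     # indices of the 8 core points
```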
Parameter Sensitivity
● Effects of ε and minPts:
● Large ε / small minPts: Fewer, larger clusters (may merge true clusters).
● Small ε / large minPts: More clusters, more noise.
● Rule of Thumb:
● minPts: Start with minPts ≥ dimensions + 1.
● ε: Use the k-distance plot (find the "knee" for the optimal ε).
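A sketch of the k-distance plot heuristic on placeholder data: sort every point's distance to its minPts-th nearest neighbor and look for the knee:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(0).normal(size=(200, 2))    # placeholder data
min_pts = 3                                           # e.g., dimensions + 1

nn = NearestNeighbors(n_neighbors=min_pts).fit(X)
kth_dist = np.sort(nn.kneighbors(X)[0][:, -1])        # distance to the min_pts-th neighbor

plt.plot(kth_dist)
plt.ylabel(f"distance to {min_pts}-th nearest neighbor")
plt.title("k-distance plot: choose eps near the knee")
plt.show()
```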
Advantages & Limitations
● Pros:
● Handles arbitrary shapes and noise.
● No need to specify cluster count (unlike K-means).
● Cons:
● Sensitive to ε and minPts.
● Struggles with varying densities.
● Alternatives:
● OPTICS: Handles varying ε (hierarchical density).
● HDBSCAN: Automates parameter selection.
DBSCAN vs. K-means vs. Hierarchical
Example: DBSCAN
● Pairwise Euclidean distances between the 10 points (✅ = within ε = 1.5, ❌ = outside ε):
Point (1,1) (1,2) (2,1) (2,2) (3,3) (8,8) (8,9) (9,8) (9,9) (10,10)
(1,1) 0 1✅ 1✅ 1.41✅ 2.83❌ 9.90❌ 10.63❌ 10.63❌ 11.31❌ 12.73❌
(1,2) 1✅ 0 1.41✅ 1✅ 2.23❌ 9.22❌ 9.90❌ 9.90❌ 10.63❌ 12.08❌
(2,1) 1✅ 1.41✅ 0 1✅ 2.23❌ 9.22❌ 9.90❌ 9.90❌ 10.63❌ 12.08❌
(2,2) 1.41✅ 1✅ 1✅ 0 1.41✅ 8.48❌ 9.22❌ 9.22❌ 9.90❌ 11.31❌
(3,3) 2.83❌ 2.23❌ 2.23❌ 1.41✅ 0 7.07❌ 7.81❌ 7.81❌ 8.48❌ 9.90❌
(8,8) 9.90❌ 9.22❌ 9.22❌ 8.48❌ 7.07❌ 0 1✅ 1✅ 1.41✅ 2.83❌
(8,9) 10.63❌ 9.90❌ 9.90❌ 9.22❌ 7.81❌ 1✅ 0 1.41✅ 1✅ 2.23❌
(9,8) 10.63❌ 9.90❌ 9.90❌ 9.22❌ 7.81❌ 1✅ 1.41✅ 0 1✅ 2.23❌
(9,9) 11.31❌ 10.63❌ 10.63❌ 9.90❌ 8.48❌ 1.41✅ 1✅ 1✅ 0 1.41✅
(10,10) 12.73❌ 12.08❌ 12.08❌ 11.31❌ 9.90❌ 2.83❌ 2.23❌ 2.23❌ 1.41✅ 0
Assignment-10 (Cs-101- 2024) (Week-10)
Source
Question-1
In a clustering evaluation, a cluster C contains 50 data points. Of these, 30 belong to class
A, 15 to class B, and 5 to class C. What is the purity of this cluster?
a) 0.5
b) 0.6
c) 0.7
d) 0.8
Question-1- Correct answer
In a clustering evaluation, a cluster C contains 50 data points. Of these, 30 belong
to class A, 15 to class B, and 5 to class C. What is the purity of this cluster?
a) 0.5
b) 0.6
c) 0.7
d) 0.8
Correct options: (b). Purity = (number of data points in the most frequent class) / (total number of data points) = 30/50 = 0.6.
Question-2
Consider the following 2D dataset with 10 points:
(1, 1),(1, 2),(2, 1),(2, 2),(3, 3),(8, 8),(8, 9),(9, 8),(9, 9),(10, 10)
Using DBSCAN with ϵ = 1.5 and MinPts = 3, how many core points are there in this dataset?
a) 4
b) 5
c) 8
d) 10
Question-2-Explanation
Point (1,1) (1,2) (2,1) (2,2) (3,3) (8,8) (8,9) (9,8) (9,9) (10,10)
(1,1) 0 1✅ 1✅ 1.41✅ 2.83❌ 9.90❌ 10.63❌ 10.63❌ 11.31❌ 12.73❌
(1,2) 1✅ 0 1.41✅ 1✅ 2.23❌ 9.22❌ 9.90❌ 9.90❌ 10.63❌ 12.08❌
(2,1) 1✅ 1.41✅ 0 1✅ 2.23❌ 9.22❌ 9.90❌ 9.90❌ 10.63❌ 12.08❌
(2,2) 1.41✅ 1✅ 1✅ 0 1.41✅ 8.48❌ 9.22❌ 9.22❌ 9.90❌ 11.31❌
(3,3) 2.83❌ 2.23❌ 2.23❌ 1.41✅ 0 7.07❌ 7.81❌ 7.81❌ 8.48❌ 9.90❌
(8,8) 9.90❌ 9.22❌ 9.22❌ 8.48❌ 7.07❌ 0 1✅ 1✅ 1.41✅ 2.83❌
(8,9) 10.63❌ 9.90❌ 9.90❌ 9.22❌ 7.81❌ 1✅ 0 1.41✅ 1✅ 2.23❌
(9,8) 10.63❌ 9.90❌ 9.90❌ 9.22❌ 7.81❌ 1✅ 1.41✅ 0 1✅ 2.23❌
(9,9) 11.31❌ 10.63❌ 10.63❌ 9.90❌ 8.48❌ 1.41✅ 1✅ 1✅ 0 1.41✅
(10,10) 12.73❌ 12.08❌ 12.08❌ 11.31❌ 9.90❌ 2.83❌ 2.23❌ 2.23❌ 1.41✅ 0
Question-2- Correct answer
Consider the following 2D dataset with 10 points (1, 1),(1, 2),(2, 1),(2, 2),(3, 3),(8, 8),(8, 9),(9, 8),(9, 9),(10, 10)
Using DBSCAN with ϵ = 1.5 and MinPts = 3, how many core points are there in this dataset?
a) 4
b) 5
c) 8
d) 10
Correct options: (c) To be a core point, it needs at least 3 points (including itself) within ϵ = 1.5
distance. There are 8 core points: (1,1), (1,2), (2,1), (2,2) from first group and (8,8), (8,9), (9,8), (9,9)
from second group.
Question-3
Question-3 - Correct answer
Correct options: (c)
Question-4
Which of the following properties are TRUE?
a) Using the CURE algorithm can lead to non-convex clusters.
b) K-means scales better than CURE for large datasets.
c) CURE is a simplification of K-means and hence scales better than
k-means for large datasets.
d) K-means being more expensive to run on large datasets, can give
non-convex clusters too.
Question-4 - Correct answer
Which of the following properties are TRUE?
a) Using the CURE algorithm can lead to non-convex clusters.
b) K-means scales better than CURE for large datasets.
c) CURE is a simplification of K-means and hence scales better than k-means for large
datasets.
d) K-means being more expensive to run on large datasets, can give non-convex clusters too.
Correct options: (a)
Question-5
The pairwise distance between 6 points is given below. Which of the option
shows the hierarchy of clusters created by single link clustering algorithm?
Question-5 - Explanation
Step 1: Connect closest pair of points. Closest pairs are:
[C1] d(P3, P4) = 1 , [C2] d(P5, P6) = 2, [C3] d(P1, P2) = 3
Step 2: Connect clusters with single link. The pair with the smallest single-link distance (marked [C4]) is combined:
d(C3, C1) = 8, [C4] d(C3, C2) = 4, d(C2, C1) = 6
Step 3: Connect the final 2 clusters
Question-5 - Correct answer
The pairwise distance between 6 points is given below. Which of the option
shows the hierarchy of clusters created by single link clustering algorithm?
Correct options: (b)
Question-6
For the pairwise distance matrix given in the previous question, which of the
following shows the hierarchy of clusters created by the complete link clustering
algorithm.
Question-6 - Explanation
Step 1: Connect closest pair of points. Closest pairs are:
[C1] d(P3, P4) = 1 , [C2] d(P5, P6) = 2, [C3] d(P1, P2) = 3
Step 2: Connect clusters with complete link. The pair with the smallest complete-link distance (marked [C4]) is combined:
d(C3, C1) = 9, d(C3, C2) = 10, [C4] d(C2, C1) = 8
Step 3: Connect the final 2 clusters
Question-6 - Correct answer
For the pairwise distance matrix given in the previous question, which of
the following shows the hierarchy of clusters created by the complete link
clustering algorithm.
Correct options: (d)
Suggestions and Feedback
Next Session:
Tuesday:
08-Apr-2025
6:00 - 8:00 PM