Unit-4: Cluster Analysis
Lecture #1: Cluster Analysis: Partitioning Methods and Hierarchical Methods
Cluster: A group of similar data objects that are distinct from objects in other groups.
Cluster Analysis: Identifies similarities in data and groups similar objects together.
Unsupervised Learning: No predefined classes; learning is based on patterns in data
rather than labeled examples.
Applications:
● Used for understanding data distribution.
● Serves as a preprocessing step for other algorithms.
Cluster Analysis Example:
1. Scenario: A retail company segments customers based on annual income and
spending behavior.
2. Dataset: Includes customer income and spending scores.
3. Choose a clustering method and the number of clusters: K-Means with K = 3 (see the sketch after this example).
i. Assign initial cluster centers.
ii. Group customers based on proximity to centroids.
iii. Recalculate centroids and repeat until stable clusters form.
4. Result:
○ Cluster 1: High-income, low-spenders.
○ Cluster 2: Low-income, high-spenders.
○ Cluster 3: Middle-income, moderate-spenders.
5. Application: Helps businesses target marketing campaigns effectively.
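The sketch below illustrates this segmentation workflow with scikit-learn's KMeans. The (income, spending score) values are made up for illustration; any small table of such pairs would work the same way.

# Sketch of the customer-segmentation example with scikit-learn's KMeans.
# The (income, spending score) values are made up for illustration.
import numpy as np
from sklearn.cluster import KMeans

# Columns: [annual income in $1000s, spending score 0-100]
customers = np.array([
    [85, 20], [90, 15], [78, 25],   # high income, low spending
    [20, 80], [25, 90], [18, 75],   # low income, high spending
    [50, 50], [55, 45], [48, 55],   # middle income, moderate spending
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("Cluster labels :", kmeans.labels_)
print("Cluster centers:\n", kmeans.cluster_centers_)

Each centroid summarizes one customer segment, which is what a marketing campaign would target.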
Partitioning vs. Hierarchical Methods
Cluster analysis is a technique used to group similar objects into clusters. Two primary
methods for clustering are partitioning methods and hierarchical methods.
1. Partitioning Methods: These methods divide the dataset into a predefined number
of clusters. Example: k-means clustering.
2. Hierarchical Methods: These build a tree-like structure of clusters, which can be
agglomerative (bottom-up) or divisive (top-down).
Partitioning Clustering Methods
K-Means
K-means is a popular clustering algorithm used to partition a dataset into K clusters. It
works by grouping similar data points together and minimizing the variance within each
cluster.
Example:
Let’s say we have the following 2D data points: (2,3),(3,3),(6,5),(8,8),(3,2),(5,7)
We want to cluster them into K=2 clusters.
# CHECK HOW MANY ITERATIONS IT TAKES TO CONVERGE
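To check how many iterations K-means needs on these points, the sketch below runs the assign/update loop directly. The initial centroids (2,3) and (8,8) are an arbitrary choice, since the notes do not specify them.

# Plain NumPy K-means on the six points above (K = 2).
# Initial centroids are an arbitrary choice; the notes do not fix them.
import numpy as np

X = np.array([(2, 3), (3, 3), (6, 5), (8, 8), (3, 2), (5, 7)], dtype=float)
centroids = X[[0, 3]].copy()              # start from (2,3) and (8,8)

for iteration in range(1, 101):
    # assign every point to its nearest centroid (Euclidean distance)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # recompute each centroid as the mean of its assigned points
    # (assumes no cluster ends up empty, which holds for this data)
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):    # converged: centroids stopped moving
        break
    centroids = new_centroids

print("Iterations to converge:", iteration)
print("Cluster labels:", labels)
print("Final centroids:\n", centroids)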
K-Medoids
K-medoids is a clustering algorithm similar to K-means, but instead of using the mean of the points in a cluster as the centroid, it uses an actual point from the dataset as the "medoid" of the cluster. The medoid is the point that minimizes the sum of distances to all other points in the cluster.
Step 1: Choose initial medoids
Select k actual data points from the dataset as the initial medoids.
Step 2: Assign each data point to the nearest medoid
Now, we assign each point to the cluster associated with the nearest medoid by calculating the distance between the data points and the medoids. We'll use the Manhattan distance for simplicity, which is the sum of the absolute differences of their coordinates.
# CHECK HOW MANY ITERATIONS IT TAKES TO CONVERGE
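As a concrete sketch of these two steps, the code below assigns points to medoids with Manhattan distance and then updates each medoid. The six points and the two initial medoids are illustrative choices (reused from the K-means example), not necessarily the values used in the worked example.

# Sketch of the K-medoids assignment and update steps with Manhattan distance.
# The data points and initial medoids are illustrative choices.
import numpy as np

X = np.array([(2, 3), (3, 3), (6, 5), (8, 8), (3, 2), (5, 7)], dtype=float)
medoid_idx = [0, 3]                       # assume (2,3) and (8,8) as initial medoids

def manhattan(a, b):
    return np.abs(a - b).sum()            # sum of absolute coordinate differences

# assignment step: attach every point to its nearest medoid
labels = np.array([min(range(len(medoid_idx)),
                       key=lambda k: manhattan(x, X[medoid_idx[k]]))
                   for x in X])

# update step: within each cluster, the point with the smallest total
# distance to the other members becomes the new medoid
for k in range(len(medoid_idx)):
    members = np.where(labels == k)[0]
    costs = [sum(manhattan(X[i], X[j]) for j in members) for i in members]
    medoid_idx[k] = int(members[np.argmin(costs)])

print("Assignments:", labels)
print("Updated medoids:", [tuple(map(float, X[i])) for i in medoid_idx])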
CLARA (Clustering Large Applications) Algorithm:
CLARA (Clustering Large Applications) is an extension of PAM (Partitioning Around
Medoids) designed to handle large datasets efficiently by working on samples instead of
the entire dataset.
Dataset: seven 2D points, P1–P7, to be grouped into k = 2 clusters.
Step 1: Initialize Parameters
● k = 2 (number of clusters).
● Sample size = 5 (choose 5 points randomly from 7).
● Number of samples = 2 (CLARA repeats the process multiple times).
Step 2: Draw a Random Sample of Points: Sample 1: {P1, P2, P4, P6, P7}
Apply PAM (Partitioning Around Medoids) on this sample to find the best k=2 medoids.
Step 3: Apply PAM on the Sample
Choose Initial Medoids
Let’s select P2 (5,4) and P6 (7,5) as initial medoids.
Step 4: Compute the Total Cost (Sum of Distances)
Cost = Sum of distances of each non-medoid to its medoid.
Step 5: Repeat for Another Sample
Since CLARA works with multiple random samples, we now select a second sample and
repeat the process.
New Sample (Sample 2)
Let's randomly pick a new subset of 5 points:
Sample 2 = {P1, P3, P5, P6, P7}
Now, we apply PAM (Partitioning Around Medoids) on this sample.
Step 6: Choose New Medoids for Sample 2: say, P3 (9,6) and P5 (8,1) as initial medoids.
Step 7: Compute the Total Cost for Sample 2
Step 8: Compare the total costs of the two samples and keep the medoids of the sample with the lower cost.
Step 9: Assign All Points to Final Medoids
Final clusters:
● Cluster 1 (P2 medoid): {P1, P2, P4}
● Cluster 2 (P6 medoid): {P3, P5, P6, P7}
❖ CLARA efficiently selects the best medoids by working with multiple samples
instead of processing the full dataset.
❖ The final clustering is determined based on the lowest cost among all samples.
❖ This approach makes CLARA scalable for large datasets compared to the costly
PAM algorithm.
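The sketch below mirrors this procedure: draw random samples, run a brute-force PAM on each sample, and keep whichever medoids give the lowest total Manhattan cost over all points. Only P2 (5,4), P3 (9,6), P5 (8,1) and P6 (7,5) are given above; the coordinates used for P1, P4 and P7 are assumptions.

# CLARA sketch: PAM on random samples, keep the lowest-cost medoids overall.
# P1, P4 and P7 coordinates are assumed; P2, P3, P5, P6 match the notes.
import itertools, random
import numpy as np

points = np.array([(1, 2),    # P1 (assumed)
                   (5, 4),    # P2
                   (9, 6),    # P3
                   (4, 5),    # P4 (assumed)
                   (8, 1),    # P5
                   (7, 5),    # P6
                   (10, 7)],  # P7 (assumed)
                  dtype=float)
k, sample_size, n_samples = 2, 5, 2

def total_cost(medoids, data):
    # sum of Manhattan distances from each point to its nearest medoid
    return sum(min(np.abs(p - m).sum() for m in medoids) for p in data)

def pam_on_sample(sample):
    # brute-force PAM: try every pair of sample points as candidate medoids
    return min(itertools.combinations(sample, k),
               key=lambda ms: total_cost(ms, sample))

random.seed(0)
best_medoids, best_cost = None, float("inf")
for _ in range(n_samples):
    idx = random.sample(range(len(points)), sample_size)
    medoids = pam_on_sample(points[idx])
    cost = total_cost(medoids, points)    # evaluate the medoids on the FULL dataset
    if cost < best_cost:
        best_medoids, best_cost = medoids, cost

print("Best medoids:", [tuple(map(float, m)) for m in best_medoids])
print("Total cost:", best_cost)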
Drawbacks of CLARA (Clustering Large Applications)
● If the sample is not representative of the entire dataset, the resulting clusters may
be suboptimal.
● Important outliers or dense regions in the full dataset may be missed in the
sampling process.
● While CLARA is more efficient than PAM, it still applies PAM to multiple samples,
making it computationally expensive.
Hierarchical Clustering:
Hierarchical clustering is a clustering method that builds a hierarchy of clusters. Unlike
k-means or k-medoids, it does not require the number of clusters (k) to be predefined.
It produces a tree-like structure called a dendrogram, which helps visualize how
clusters are merged or split.
Types of Hierarchical Clustering
1. Agglomerative (Bottom-Up)
○ Starts with each data point as its own cluster.
○ Iteratively merges the closest clusters until only one cluster remains.
○ Most commonly used.
2. Divisive (Top-Down)
○ Starts with all data points in a single cluster.
○ Iteratively splits clusters until each point is its own cluster.
○ Less common and computationally expensive.
Steps in Agglomerative Hierarchical Clustering
1. Compute Distance Matrix
○ Calculate pairwise distances between all points (e.g., using Euclidean
distance).
2. Merge Closest Clusters
○ Find the two closest clusters and merge them into one.
3. Update Distance Matrix
○ Recalculate distances between the new cluster and remaining clusters.
4. Repeat Until One Cluster Remains
○ Continue merging until all points are in a single cluster.
5. Create Dendrogram and Choose Final Clusters
○ Cut the dendrogram at a chosen threshold to determine the final number of
clusters.
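These five steps can be traced with a small script; the sketch below uses single linkage and five made-up 2D points, printing each merge and the distance at which it happens.

# Trace of the agglomerative steps (single linkage, Euclidean distance).
# The five 2D points are made up for illustration.
import numpy as np

pts = np.array([(1, 1), (2, 1), (5, 4), (6, 5), (9, 9)], dtype=float)
clusters = {i: [i] for i in range(len(pts))}     # start: every point is its own cluster

def single_linkage(a, b):
    # minimum pairwise distance between two clusters
    return min(np.linalg.norm(pts[i] - pts[j]) for i in a for j in b)

while len(clusters) > 1:
    # find the two closest clusters ...
    ka, kb = min(((x, y) for x in clusters for y in clusters if x < y),
                 key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]))
    print(f"merge {clusters[ka]} + {clusters[kb]} "
          f"at distance {single_linkage(clusters[ka], clusters[kb]):.2f}")
    # ... merge them, update, and repeat until one cluster remains
    clusters[ka] = clusters[ka] + clusters[kb]
    del clusters[kb]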
Example
Step 1: Compute the Distance Matrix
● Compute the pairwise Euclidean distances between the points P1–P5.
Step 2: Merge the Closest Clusters
● The smallest distance = 3.16 (P1 ↔ P2 and P2 ↔ P4).
● Merge P2 and P4 into a new cluster C1.
Step 3: Recalculate Distance Matrix
Use linkage methods to calculate the new cluster distance:
1. Single Linkage (minimum distance)
2. Complete Linkage (maximum distance)
3. Average Linkage (average distance)
4. Centroid Linkage (distance between centroids)
Step 4: Repeat Until All Clusters Merge
● Merge P1 and C1 (smallest distance = 3.16).
● Then merge P3 and P5 (smallest distance = 5.10).
● Continue merging until all points form a single cluster.
Dendrogram and Choosing Clusters
A dendrogram visualizes the clustering process. The height of the merge represents the
distance between clusters.
To determine the final clusters:
● Cut the dendrogram at a chosen height.
● Example: If we cut at height 4.5, we get two clusters:
○ Cluster 1: {P1, P2, P4}
○ Cluster 2: {P3, P5}
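The same workflow runs in a few lines with SciPy. The coordinates assumed below for P1–P5 are illustrative (only a few pairwise distances appear in the notes), so the resulting labels need not match the clusters above exactly; the point is the fcluster cut at height 4.5.

# SciPy sketch: linkage, dendrogram, and a cut at height 4.5.
# The P1-P5 coordinates are assumed for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

P = np.array([(1, 2), (2, 5), (8, 3), (4, 4), (9, 8)], dtype=float)   # P1..P5 (assumed)
Z = linkage(P, method='single', metric='euclidean')

labels = fcluster(Z, t=4.5, criterion='distance')   # cut the dendrogram at height 4.5
print("Cluster labels for P1..P5:", labels)

dendrogram(Z, labels=['P1', 'P2', 'P3', 'P4', 'P5'])
plt.show()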
Advantages of Hierarchical Clustering
● No need to predefine k
● Produces a dendrogram
● Works for non-spherical clusters
● Can be applied to small datasets
Drawbacks
● Computationally expensive
● Not suitable for large datasets
● Sensitive to noise and outliers
Example of Complete Linkage Clustering
# TRY: Build the single-linkage dendrogram for the same distance matrix.
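As a starting point for this exercise, the sketch below builds both the complete-linkage and single-linkage dendrograms from a distance matrix. The matrix values here are placeholders; substitute the actual distance matrix from the example above.

# Complete vs. single linkage from a precomputed distance matrix.
# The matrix below is a placeholder; replace it with the one from the example.
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

D = np.array([[ 0,  2,  6, 10],
              [ 2,  0,  5,  9],
              [ 6,  5,  0,  4],
              [10,  9,  4,  0]], dtype=float)

condensed = squareform(D)            # SciPy expects the condensed upper-triangle form
for method in ('complete', 'single'):
    Z = linkage(condensed, method=method)
    plt.figure()
    plt.title(f'{method} linkage')
    dendrogram(Z)
plt.show()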
Divisive Clustering
Divisive clustering is a type of hierarchical clustering where the algorithm starts with the entire dataset as a single cluster and recursively splits it into smaller clusters until each cluster contains only one element. This process contrasts with agglomerative clustering, where the algorithm begins with individual data points as clusters and merges them together.
Example
Consider the following 6 data points, represented as 1-dimensional values: {10, 20, 30, 40, 50, 60}
Step 1: Start with the entire dataset as a single cluster.
● Initially, all the data points { 10, 20, 30, 40, 50, 60 } are grouped together in one
cluster.
Step 2: Calculate the "best" way to split the cluster.
In this case, let’s split by using the median value of the data.
For the dataset { 10, 20, 30, 40, 50, 60 }, the median value is 35, which divides the
data into:
● Cluster 1: { 10, 20, 30 }
● Cluster 2: { 40, 50, 60 }
Step 3: Recursively divide the subclusters.
Let’s first split { 10, 20, 30 }:
● The median of { 10, 20, 30 } is 20.
● This results in two subclusters:
○ Cluster 1A: { 10 }
○ Cluster 1B: { 20, 30 }
Now, split { 40, 50, 60 }:
● The median of { 40, 50, 60 } is 50.
● This results in two subclusters:
○ Cluster 2A: { 40 }
○ Cluster 2B: { 50, 60 }
Step 4: Keep splitting until each cluster has only one element.
● We continue to recursively split clusters until each cluster contains a single data
point. Let’s split { 20, 30 } and { 50, 60 }.
● The median of { 20, 30 } is 25:
○ Cluster 1B1: { 20 }
○ Cluster 1B2: { 30 }
● The median of { 50, 60 } is 55:
○ Cluster 2B1: { 50 }
○ Cluster 2B2: { 60 }
Now all the clusters have just one data point.
Step 5: The final result.
● The final result of divisive clustering would look like this:
○ Cluster 1A: { 10 }
○ Cluster 1B1: { 20 }
○ Cluster 1B2: { 30 }
○ Cluster 2A: { 40 }
○ Cluster 2B1: { 50 }
○ Cluster 2B2: { 60 }
# TRY: Using divisive clustering, cluster the following 2D points: {(2,3), (3,3), (6,6), (8,8), (10,8), (11,9)}
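A minimal recursive sketch of the median-split rule used in this example is shown below (the printed cluster names differ from the labels above). For the 2D TRY points, one additional choice is needed that the notes do not specify, for example splitting on the coordinate with the larger spread.

# Recursive divisive clustering by median split, as in the 1-D example above.
# Assumes distinct values; each cluster splits into {x < median} and {x >= median}.
import statistics

def divisive_split(cluster, name="C"):
    if len(cluster) <= 1:                 # stop once a cluster has a single element
        print(f"Cluster {name}: {cluster}")
        return
    median = statistics.median(cluster)
    left = [x for x in cluster if x < median]
    right = [x for x in cluster if x >= median]
    divisive_split(left, name + "A")
    divisive_split(right, name + "B")

divisive_split([10, 20, 30, 40, 50, 60])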