
Hierarchical clustering

Dr. Guttikonda Manohar


Department of Mechanical
CVR College of Engineering, Hyderabad
Hierarchical Clustering

• Hierarchical clustering is an unsupervised machine learning technique used for grouping data points into clusters based on their similarity. Unlike partition-based clustering methods like K-Means, hierarchical clustering does not require the number of clusters to be predefined. Instead, it creates a hierarchy (tree-like structure) of clusters, known as a dendrogram, which allows users to determine the optimal number of clusters by cutting the tree at a desired level.
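As a concrete illustration of "cutting the tree", here is a minimal sketch using SciPy (assumed installed); the data points and the choice of two clusters are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two visually separated groups in 2-D.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9]])

# Build the full hierarchy (the dendrogram in matrix form).
Z = linkage(X, method="single")

# "Cut" the tree so that exactly 2 flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Cutting at a different level (a larger `t` with `criterion="maxclust"`, or a distance threshold with `criterion="distance"`) yields a different number of clusters from the same hierarchy.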

Types of Hierarchical Clustering

Hierarchical clustering is broadly categorized into two types:

1. Agglomerative Hierarchical Clustering (AHC) – Bottom-up approach.

2. Divisive Hierarchical Clustering (DHC) – Top-down approach.

Agglomerative Hierarchical Clustering (AHC)

• It starts with each data point as its own cluster.

• Then, it iteratively merges the closest clusters based on a distance metric until a single cluster containing all data points is formed.

• The process is visualized using a dendrogram, which helps determine the optimal number of clusters.
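A bottom-up run can be sketched with scikit-learn (assumed installed); the points and parameter choices below are illustrative only:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Illustrative data: three points near (1, 1) and two near (8, 8).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9]])

# Merge clusters bottom-up until 2 remain, using single linkage
# (minimum distance between members of two clusters).
model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)
print(labels)
```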

Divisive Hierarchical Clustering (DHC)

• It starts with all data points in a single cluster.

• The cluster is iteratively split into smaller clusters until each data point is in its own cluster.

• It is computationally expensive and less commonly used than agglomerative clustering.
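One way to sketch the top-down idea is to repeatedly bisect the largest remaining cluster. The 2-means splitting rule below is an assumption chosen for illustration; classical divisive algorithms such as DIANA use a different splitting criterion:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    """Top-down sketch: start with one cluster, keep splitting the largest."""
    clusters = [np.arange(len(X))]       # one cluster holding everything
    while len(clusters) < n_clusters:
        big = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(big)
        # Split the chosen cluster in two with 2-means (illustrative choice).
        halves = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters += [idx[halves == 0], idx[halves == 1]]
    return clusters

X = np.array([[1.0], [1.1], [5.0], [5.1], [9.0], [9.2]])
for c in divisive(X, 3):
    print(c)
```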

Steps in Agglomerative Hierarchical Clustering

Step 1: Compute the Distance Matrix
• The distance (or similarity) between all data points is calculated using a distance metric. Common metrics include:
  • Euclidean Distance (default choice)
  • Manhattan Distance
  • Cosine Similarity

Step 2: Form Initial Clusters
• Each data point is treated as its own cluster.
Step 3: Merge Closest Clusters
• The two closest clusters are merged based on a linkage criterion
(explained below).
Step 4: Update the Distance Matrix
• After merging clusters, distances are recalculated.
Step 5: Repeat Until All Data Points Form One Cluster
• This process continues until a single cluster remains or the desired
number of clusters is reached.
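The five steps can be sketched by hand with NumPy. This is a deliberately naive single-linkage implementation for teaching (SciPy's `linkage` is the practical choice), run on the sample points 18, 22, 25, 27, 42, 43 from the solved example:

```python
import numpy as np

def single_linkage_merges(X):
    # Step 1: compute the full Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 2: each data point starts as its own cluster.
    clusters = [[i] for i in range(len(X))]
    merges = []
    # Step 5: repeat until a single cluster remains.
    while len(clusters) > 1:
        # Step 3: find the two closest clusters (single linkage = min distance).
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a][:], clusters[b][:], d))
        # Step 4: distances are "updated" implicitly by merging the member lists.
        clusters[a] += clusters[b]
        del clusters[b]
    return merges

X = np.array([[18.0], [22.0], [25.0], [27.0], [42.0], [43.0]])
for left, right, dist in single_linkage_merges(X):
    print(left, "+", right, "at distance", dist)
```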

Advantages and Disadvantages of Hierarchical
Clustering
Advantages
• No need to predefine the number of clusters.
• Produces dendrograms, offering a visual representation of cluster relationships.
• Suitable for small datasets.
• Can handle different shapes of clusters better than K-Means.

Disadvantages
• Computationally expensive, making it impractical for large datasets.
• Sensitive to noise and outliers.
• Once clusters are merged or split, the decision cannot be undone (the process is not reversible).

Applications of Hierarchical Clustering

• Bioinformatics – Gene expression analysis and phylogenetic tree construction.
• Marketing – Customer segmentation based on purchasing behavior.
• Document Clustering – Grouping similar documents for information
retrieval.
• Social Network Analysis – Identifying communities in social graphs.
• Anomaly Detection – Finding outliers in datasets.

Hierarchical Clustering vs. K-Means

Feature                              Hierarchical Clustering    K-Means
Requires pre-defined clusters?       No                         Yes
Works well with non-convex shapes?   Yes                        No
Computational complexity             High                       Lower
Handles large datasets well?         No                         Yes
Output                               Dendrogram                 Fixed number of clusters
Sensitive to noise?                  Yes                        Moderate


Solved Example
• Sample points: 18, 22, 25, 27, 42, 43

Initial distance matrix (absolute differences between the sample points):

        18   22   25   27   42   43
  18     0    4    7    9   24   25
  22     4    0    3    5   20   21
  25     7    3    0    2   17   18
  27     9    5    2    0   15   16
  42    24   20   17   15    0    1
  43    25   21   18   16    1    0

The smallest off-diagonal entry is 1, between 42 and 43, so these two points are merged first.
After merging (42, 43), using single linkage (minimum distance between cluster members):

          18   22   25   27  (42,43)
  18       0    4    7    9    24
  22       4    0    3    5    20
  25       7    3    0    2    17
  27       9    5    2    0    15
(42,43)   24   20   17   15     0

Clusters formed so far: (42,43). The smallest distance is now 2, between 25 and 27.
After merging (25, 27):

          18   22  (25,27)  (42,43)
  18       0    4     7       24
  22       4    0     3       20
(25,27)    7    3     0       15
(42,43)   24   20    15        0

Clusters formed so far: (42,43), (25,27). The smallest distance is now 3, between 22 and (25,27).
After merging 22 into (25, 27):

             18  (22,25,27)  (42,43)
  18          0      4         24
(22,25,27)    4      0         15
(42,43)      24     15          0

Clusters formed so far: (42,43), ((25,27),22). The smallest distance is now 4, between 18 and (22,25,27).
After merging 18 into (22, 25, 27):

                (18,22,25,27)  (42,43)
(18,22,25,27)        0           15
(42,43)             15            0

Clusters formed so far: (42,43), (((25,27),22),18). The final merge happens at distance 15.
After the final merge, all points form a single cluster: (18, 22, 25, 27, 42, 43).

Resulting hierarchy: ((42,43), (((25,27),22),18))
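The merge order and heights from the worked example can be reproduced with SciPy (assumed installed):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# The sample points from the solved example, as a column vector.
points = np.array([[18.0], [22.0], [25.0], [27.0], [42.0], [43.0]])

# Single-linkage hierarchy; each row of Z is (cluster_i, cluster_j,
# merge distance, size of the new cluster).
Z = linkage(points, method="single")
print(Z[:, 2])   # merge heights, which should match 1, 2, 3, 4, 15
```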
Dendrogram

[Dendrogram over the leaves 18, 22, 25, 27, 42, 43, with merges at heights 1 (42 and 43), 2 (25 and 27), 3 (22 joins (25,27)), 4 (18 joins (22,25,27)), and 15 (the final merge).]
