Hierarchical Clustering Case Study

Hierarchical clustering is a technique that builds a tree structure of clusters without requiring a predefined number of clusters; it is commonly used in customer segmentation, image analysis, and bioinformatics. It can be approached in two ways: bottom-up (agglomerative) clustering, which merges clusters, and top-down (divisive) clustering, which splits them. Both approaches are visualized through dendrograms, and the method is especially useful for understanding hierarchical relationships in data.

Hierarchical Clustering Technique

What is Hierarchical Clustering?

Hierarchical clustering is a clustering technique that builds a hierarchy (tree structure) of clusters.

Unlike K-means, it doesn't require specifying the number of clusters in advance. It is mainly used in:

- Customer segmentation

- Image analysis

- Bioinformatics (gene clustering)

Principle Behind Hierarchical Clustering

The key idea is to group similar data points into clusters by using a distance metric (e.g., Euclidean

distance) and a linkage criterion (e.g., single, complete, average).

It produces a dendrogram: a tree-like diagram showing how data points are merged or split.
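As a minimal sketch (Python with NumPy, SciPy, and matplotlib assumed; the four 2-D points are made up for illustration), the snippet below computes Euclidean distances, applies an average-linkage criterion, and draws the resulting dendrogram:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy 2-D points (made-up coordinates for illustration)
points = np.array([[1.0, 1.0], [1.5, 1.2], [7.0, 7.0], [7.5, 7.2]])

# Distance metric: Euclidean; linkage criterion: average
Z = linkage(points, method="average", metric="euclidean")

# Draw the tree; the y-axis is the distance at which clusters merge
dendrogram(Z, labels=["A", "B", "C", "D"])
plt.ylabel("merge distance")
plt.show()
```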

Two Main Approaches

1. Bottom-Up (Agglomerative) Clustering

Working Principle:

- Start with each data point as a separate cluster.

- At each step, merge the two closest clusters.

- Continue until all points belong to one cluster (or a stopping condition is met).

Linkage Criteria:

- Single linkage: the minimum distance between any pair of points across the two clusters

- Complete linkage: the maximum distance between any pair of points across the two clusters

- Average linkage: the average of all pairwise distances across the two clusters
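To make the three criteria concrete, here is a small hand-computed sketch (Python with NumPy assumed; the two toy clusters are made up):

```python
import numpy as np

# Two toy clusters of 2-D points (made-up values)
cluster1 = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster2 = np.array([[4.0, 3.0], [5.0, 3.0]])

# Matrix of all pairwise distances between the two clusters
d = np.linalg.norm(cluster1[:, None, :] - cluster2[None, :, :], axis=-1)

print("single   (min of d):", d.min())    # single linkage
print("complete (max of d):", d.max())    # complete linkage
print("average  (mean of d):", d.mean())  # average linkage
```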


Example:

Given 4 data points A, B, C, D with the following pairwise distances:

| Pair | Distance |
|------|----------|
| A-B  | 2        |
| A-C  | 6        |
| A-D  | 10       |
| B-C  | 5        |
| B-D  | 9        |
| C-D  | 4        |

Steps:

1. Merge A and B into Cluster1 (A, B): the closest pair, at distance 2.

2. Next closest pair: C and D, merged into Cluster2 (C, D) at distance 4.

3. Merge Cluster1 and Cluster2 into the final cluster; under single linkage their distance is min(A-C, A-D, B-C, B-D) = 5.

Result: A dendrogram showing the merging sequence.
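The same merge sequence can be checked with SciPy (single linkage assumed here, since each merge above follows the minimum pairwise distance):

```python
from scipy.cluster.hierarchy import linkage

# Condensed distance vector in SciPy's pair order:
# A-B, A-C, A-D, B-C, B-D, C-D
dists = [2.0, 6.0, 10.0, 5.0, 9.0, 4.0]

Z = linkage(dists, method="single")
print(Z)
# Row format: [cluster_i, cluster_j, merge_distance, new_cluster_size]
# Merges: (A, B) at 2, then (C, D) at 4, then both clusters at 5 (the B-C distance)
```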

2. Top-Down (Divisive) Clustering

Working Principle:

- Start with all data points in one big cluster.

- At each step, split one cluster into its two most dissimilar subgroups.

- Continue splitting until each point is its own cluster or a stopping criterion is met.

Example:
Start with: Cluster {A, B, C, D}

1. Split into {A, B} and {C, D} based on dissimilarity

2. Further divide {A, B} into A and B, and {C, D} into C and D

3. Final result: four singleton clusters

Result: the reverse of the agglomerative process; the dendrogram is built from the top down, splitting as it goes.
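SciPy has no built-in divisive routine, so the sketch below improvises one under stated assumptions: each cluster is recursively bisected with 2-means (a bisecting-k-means style of top-down splitting), and the coordinates for A, B, C, D are made up for illustration:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def divisive_split(points, ids):
    """Recursively split a cluster in two until only singletons remain."""
    if len(ids) == 1:
        return ids[0]                           # leaf: a single data point
    # One simple splitting rule: bisect the current cluster with 2-means
    _, assignment = kmeans2(points[ids], 2, minit="++", seed=0)
    left = [i for i, a in zip(ids, assignment) if a == 0]
    right = [i for i, a in zip(ids, assignment) if a == 1]
    if not left or not right:                   # degenerate split: force one
        left, right = ids[:1], ids[1:]
    return (divisive_split(points, left), divisive_split(points, right))

# Made-up coordinates for A, B, C, D (indices 0-3)
points = np.array([[0.0, 0.0], [1.0, 0.0], [8.0, 8.0], [9.0, 8.0]])
print(divisive_split(points, list(range(len(points)))))
# e.g. ((0, 1), (2, 3)): first split {A, B} vs {C, D}, then into singletons
```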

Comparison: Bottom-Up vs Top-Down

| Feature      | Bottom-Up (Agglomerative)    | Top-Down (Divisive)           |
|--------------|------------------------------|-------------------------------|
| Initial Step | Start with individual points | Start with all points         |
| Approach     | Merge clusters               | Split clusters                |
| Complexity   | O(n² log n)                  | Higher, often exponential     |
| Common Usage | More commonly used           | Less common due to complexity |
| Output       | Dendrogram                   | Dendrogram                    |

Visualization: Dendrogram

```
        _____________
       |             |
    ___|___       ___|___
   |       |     |       |
   A       B     C       D
```

- The height of the lines represents the distance (or dissimilarity) at which merges/splits occur.

- You can cut the dendrogram at a certain height to decide the number of clusters.
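For example, with SciPy's fcluster (and the single-linkage tree from the worked example above), cutting at height 4.5 yields two flat clusters:

```python
from scipy.cluster.hierarchy import fcluster, linkage

# Single-linkage tree over the A-D distances from the worked example
Z = linkage([2.0, 6.0, 10.0, 5.0, 9.0, 4.0], method="single")

# Cut the dendrogram at height 4.5: keeps the merges at 2 and 4,
# discards the final merge at 5
labels = fcluster(Z, t=4.5, criterion="distance")
print(labels)  # e.g. [1 1 2 2] -> clusters {A, B} and {C, D}
```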
Summary

- Hierarchical clustering builds nested clusters either bottom-up or top-down.

- No need to pre-define the number of clusters.

- Useful when the hierarchical relationship between data is important.

- Visualized using dendrograms.
