Hierarchical Clustering in Data Mining

School of Data Science and Forecasting
MBA - Business Analytics
Presentation of Data Mining and Warehousing
On
Hierarchical Clustering Technique
Presented By:
Yashraj Nigam
Tanvi Bhave
Anjali Agarwal
Presented To:
Mr. Viney Sharma

CLUSTERING
Clustering is the task of dividing a population of data points into a number of groups such that
data points in the same group are more similar to one another than to the data points in other
groups. In other words, it is a grouping of objects on the basis of the similarity and dissimilarity
between them. In simple words, the aim is to separate out groups with similar traits and assign
them to clusters.
Let’s understand this with an example. Suppose you are the head of a rental store and wish to
understand your customers' preferences in order to scale up your business. Is it possible for you to
look at the details of each customer and devise a unique business strategy for each one of them?
Definitely not. What you can do, however, is cluster all of your customers into, say, 10 groups based
on their purchasing habits and use a separate strategy for the customers in each of these 10 groups.
This is what we call clustering.
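The definition above can be checked numerically. The Python sketch below uses hypothetical customer data (rentals per month and average spend per rental, invented for illustration) and a hand-picked two-group split, then compares the average distance between customers in the same group with the average distance between customers in different groups. Euclidean distance (`math.dist`) is an assumed choice of similarity measure, not one prescribed by the slides.

```python
from itertools import combinations
from math import dist

# Hypothetical customers: (rentals per month, average spend per rental)
customers = {
    "c1": (2, 10), "c2": (3, 12), "c3": (2, 11),     # occasional renters
    "c4": (12, 40), "c5": (11, 42), "c6": (13, 39),  # frequent, high spenders
}
# A hand-picked grouping into two clusters (assumed, for illustration)
clusters = [["c1", "c2", "c3"], ["c4", "c5", "c6"]]


def mean_within(groups, points):
    """Average distance between pairs of points in the same cluster."""
    pairs = [(a, b) for g in groups for a, b in combinations(g, 2)]
    return sum(dist(points[a], points[b]) for a, b in pairs) / len(pairs)


def mean_between(groups, points):
    """Average distance between pairs of points in different clusters."""
    pairs = [(a, b) for g1, g2 in combinations(groups, 2) for a in g1 for b in g2]
    return sum(dist(points[a], points[b]) for a, b in pairs) / len(pairs)


within = mean_within(clusters, customers)
between = mean_between(clusters, customers)
print(f"within-cluster: {within:.2f}, between-cluster: {between:.2f}")
```

For a good clustering the within-cluster value should be much smaller than the between-cluster value, which is exactly the "similar inside, dissimilar outside" property described above.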

CLUSTERING APPLICATIONS
Clustering algorithms can be applied in many fields, for instance:
a) Marketing: finding groups of customers with similar behavior given a large
database of customer data containing their properties and past buying records
b) Biology: classification of plants and animals given their features
c) Libraries: book ordering
d) Insurance: identifying groups of motor insurance policy holders with a high
average claim cost; identifying frauds
e) City planning: identifying groups of houses according to their house type,
value and geographical location
f) Earthquake studies: clustering observed earthquake epicenters to identify
dangerous zones

Types of Agglomerative Techniques
• Single-linkage Technique
• Complete-linkage Technique
• Average-linkage Technique
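The three techniques differ only in how the distance between two clusters is measured; the agglomerative procedure itself stays the same: start with every point as its own cluster and repeatedly merge the closest pair of clusters. The Python sketch below is a deliberately naive version of that procedure (it recomputes every cluster distance at each step), suitable only for tiny examples. The names `agglomerate` and `cluster_distance` are hypothetical, and the average-linkage rule here is assumed to be the unweighted mean of all pairwise distances between the two clusters.

```python
from itertools import combinations
from math import dist
from statistics import mean


def cluster_distance(a, b, points, linkage):
    """Distance between clusters a and b (lists of point indices)."""
    pair_dists = [dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":    # nearest neighbor: closest pair
        return min(pair_dists)
    if linkage == "complete":  # farthest neighbor: most distant pair
        return max(pair_dists)
    if linkage == "average":   # mean of all pairwise distances
        return mean(pair_dists)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, n_clusters, linkage="single"):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []  # (cluster_a, cluster_b, merge distance), in merge order
    while len(clusters) > n_clusters:
        ia, ib = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], points, linkage),
        )
        d = cluster_distance(clusters[ia], clusters[ib], points, linkage)
        merges.append((clusters[ia], clusters[ib], d))
        merged = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so delete ib first to keep ia valid
        del clusters[ia]
        clusters.append(merged)
    return clusters, merges


# Hypothetical points on a line, with gaps of 10, 12, 13 and 14
points = [(0, 0), (10, 0), (22, 0), (35, 0), (49, 0)]
for method in ("single", "complete", "average"):
    result, _ = agglomerate(points, 2, method)
    print(method, sorted(sorted(c) for c in result))
# single [[0, 1, 2, 3], [4]]
# complete [[0, 1], [2, 3, 4]]
# average [[0, 1], [2, 3, 4]]
```

On these evenly spaced points single linkage chains the first four points together, while complete and average linkage keep the two ends of the line apart; this chaining tendency is a well-known property of single linkage.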

Single-Linkage Technique
Minimum distance clustering is also called single-linkage hierarchical
clustering or nearest neighbor clustering. The distance between two clusters
is defined as the minimum distance between objects of the two clusters:
d(A, B) = min { d(a, b) : a in A, b in B }.
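As a small worked example of the single-linkage distance, the Python sketch below uses hypothetical 2-D points. It also checks a consequence of the min definition: after two clusters A and B are merged, the single-linkage distance from the merged cluster to any other cluster C is simply the smaller of the two old distances, so it never has to be recomputed from the raw points. The function name `single_link` is hypothetical.

```python
from math import dist


def single_link(cluster_a, cluster_b):
    """Single linkage: minimum distance between any a in A and any b in B."""
    return min(dist(a, b) for a in cluster_a for b in cluster_b)


# Hypothetical clusters of 2-D points
A = [(0, 0), (1, 0)]
B = [(4, 0), (6, 0)]
C = [(1, 5)]

print(single_link(A, B))  # 3.0, from the closest pair (1, 0) and (4, 0)

# After merging A and B, the distance to C is just the smaller old distance
merged = A + B
print(single_link(merged, C) == min(single_link(A, C), single_link(B, C)))  # True
```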

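Complete-Linkage Technique
Maximum distance clustering is also called complete-linkage hierarchical
clustering or farthest neighbor clustering. The distance between two clusters
is defined as the maximum distance between objects of the two clusters. The
two techniques can also be compared using a graph in which every pair of
points within a chosen distance threshold is linked. A connected component is
a maximal set of connected points such that there is a path connecting each
pair; single-linkage clusters correspond to connected components. A clique is
a set of points that are completely linked with each other; complete-linkage
clusters correspond to cliques.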
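The connected components and cliques mentioned above can be made concrete with a threshold graph, which links every pair of points whose distance is at most a chosen threshold h. In the Python sketch below (hypothetical points on a line, with integer coordinates so the distances are exact), the connected components at h = 12 are the single-linkage clusters left after all merges at heights up to 12. At h = 27 the groups {0, 1} and {2, 3, 4}, which are what complete linkage produces for these points when asked for two clusters, are both cliques, even though the whole set is a single connected component. The helper names are hypothetical.

```python
from itertools import combinations
from math import dist


def threshold_graph(points, h):
    """Edges (i, j), i < j, for every pair of points at distance <= h."""
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) <= h
    }


def connected_components(n, edges):
    """Maximal sets of nodes with a path between every pair (depth-first search)."""
    neighbors = {i: set() for i in range(n)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


def is_clique(nodes, edges):
    """True if every pair of nodes is directly linked."""
    return all(pair in edges for pair in combinations(sorted(nodes), 2))


points = [(0, 0), (10, 0), (22, 0), (35, 0), (49, 0)]  # hypothetical

print(connected_components(len(points), threshold_graph(points, 12)))
# [{0, 1, 2}, {3}, {4}]

edges = threshold_graph(points, 27)
print(connected_components(len(points), edges))  # [{0, 1, 2, 3, 4}]
print(is_clique({0, 1}, edges), is_clique({2, 3, 4}, edges))  # True True
print(is_clique({0, 1, 2, 3, 4}, edges))  # False
```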

Implementing Hierarchical Clustering in WEKA

1. SELECT THE DATASET FOR CLUSTERING

2. CLICK ON THE CLUSTER TAB AND CHOOSE HIERARCHICAL CLUSTERER

3. DOUBLE-CLICK ON HIERARCHICAL CLUSTERER TO CHANGE
THE NUMBER OF CLUSTERS AND THE DISTANCE FUNCTION

4. CLICK ON START TO INITIATE THE CLUSTERING PROCESS

5. RIGHT-CLICK ON THE RESULT AND SELECT VISUALIZE
CLUSTER ASSIGNMENTS
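Steps 3 to 5 amount to choosing the number of clusters and a distance function, running the clusterer, and reading off which cluster each instance was assigned to. The Python sketch below mimics that workflow; it is not WEKA code and does not call WEKA's HierarchicalClusterer. It assumes single linkage, offers two hypothetical distance functions (`euclidean` and `manhattan`), performs no attribute normalization, and uses a six-row dataset invented for illustration.

```python
from itertools import combinations


def euclidean(p, q):
    """Straight-line distance between two rows."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    """Sum of absolute attribute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cluster_assignments(rows, num_clusters=2, distance=euclidean):
    """Single-linkage agglomeration; returns a cluster label for each row."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > num_clusters:
        ia, ib = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                distance(rows[i], rows[j])
                for i in clusters[p[0]]
                for j in clusters[p[1]]
            ),
        )
        merged = clusters.pop(ib)  # ib > ia, so index ia stays valid
        clusters[ia].extend(merged)
    labels = [0] * len(rows)
    # Label clusters in order of their lowest row index
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical dataset: six instances with two numeric attributes
rows = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
print(cluster_assignments(rows, num_clusters=2))  # [0, 0, 0, 1, 1, 1]
print(cluster_assignments(rows, num_clusters=2, distance=manhattan))  # [0, 0, 0, 1, 1, 1]
```

Changing `num_clusters` or passing `distance=manhattan` plays the role of the options edited in step 3; on this toy dataset both distance functions give the same two groups.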
