Hierarchical Clustering in Data Mining
School of Data Science and Forecasting
MBA - Business Analytics
Presentation of Data Mining and Warehousing
On
Hierarchical Clustering Technique
Presented By:
Yashraj Nigam
Tanvi Bhave
Anjali Agarwal
Presented To:
Mr. Viney Sharma
CLUSTERING
Clustering is the task of dividing a population or set of data points into groups such that data
points in the same group are more similar to one another than to the data points in other groups.
It is, in essence, a grouping of objects on the basis of the similarity and dissimilarity between
them. In simple words, the aim is to identify objects with similar traits and assign them to
clusters.
Let’s understand this with an example. Suppose you are the head of a rental store and wish to
understand the preferences of your customers in order to scale up your business. Is it possible for
you to look at the details of each customer and devise a unique business strategy for each one of
them? Definitely not. What you can do, however, is cluster all of your customers into, say, 10 groups
based on their purchasing habits and use a separate strategy for the customers in each of these 10
groups. This is what we call clustering.
CLUSTERING APPLICATIONS
Clustering algorithms can be applied in many fields, for instance:
a) Marketing: finding groups of customers with similar behavior given a large
database of customer data containing their properties and past buying records
b) Biology: classification of plants and animals given their features
c) Libraries: book ordering
d) Insurance: identifying groups of motor insurance policy holders with a high
average claim cost; identifying fraud
e) City-planning: identifying groups of houses according to their house type,
value and geographical location
f) Earthquake studies: clustering observed earthquake epicenters to identify
dangerous zones
Types of Agglomerative Techniques
• Single-linkage Technique
• Complete-linkage Technique
• Average-linkage Technique
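The three techniques differ only in how they measure the distance between two clusters. A minimal Python sketch of the three measures follows, assuming Euclidean distance between points. The function names and the toy clusters are hypothetical and chosen only for illustration; they do not come from the slides or from WEKA.

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_linkage(a, b):
    """Smallest pairwise distance between the two clusters."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Largest pairwise distance between the two clusters."""
    return max(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean of all pairwise distances between the two clusters."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


# Hypothetical toy clusters placed on a line so the distances are easy to check.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # closest pair: (1, 0) and (3, 0)
print(complete_linkage(cluster_a, cluster_b))  # farthest pair: (0, 0) and (5, 0)
print(average_linkage(cluster_a, cluster_b))   # mean of the distances 3, 5, 2, 4
```

For the same pair of clusters the single-linkage distance is never larger than the average-linkage distance, which in turn is never larger than the complete-linkage distance.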
Single-Linkage Technique
Minimum-distance clustering is also called single-linkage hierarchical
clustering or nearest-neighbor clustering. The distance between two clusters is
defined by the minimum distance between objects of the two clusters.
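The single-linkage rule can be sketched as a simple agglomerative loop: start with every point in its own cluster and repeatedly merge the two clusters whose nearest members are closest. This is an illustrative, unoptimized sketch that assumes Euclidean distance. The function names, the stopping rule (a target number of clusters), and the sample points are assumptions made for this example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_link_distance(a, b):
    """Minimum distance between any object of a and any object of b."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate_single(points, num_clusters):
    """Merge the two nearest clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > num_clusters:
        # Pick the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # j > i, so index i stays valid
    return clusters


# Hypothetical sample: a chain of three nearby points and a distant pair.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
result = agglomerate_single(points, 2)
print(result)
```

Because only the nearest pair of objects matters, the chain (0, 0), (1, 0), (2, 0) ends up in one cluster even though its end points are two units apart. This chaining behavior is characteristic of single linkage.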
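Complete-Linkage Technique
Maximum-distance clustering is also called complete-linkage hierarchical
clustering or farthest-neighbor clustering. The distance between two clusters is
defined by the maximum distance between objects of the two clusters.
In graph terms, if points closer than a chosen distance are linked, single-linkage
clusters are connected components and complete-linkage clusters are cliques.
A connected component is a maximal set of connected points such that there
is a path connecting each pair. A clique is a set of points that are completely
linked with each other.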
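The two graph notions can be made concrete by linking every pair of points whose distance is at most a threshold t. On that graph, a group in which every pair is within t (the complete-linkage condition) is a clique, while groups that are merely chained together by links are connected components. The sketch below assumes Euclidean distance and distinct points. The threshold, the function names, and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def threshold_edges(points, t):
    """Link every pair of points whose distance is at most t."""
    return {frozenset((p, q)) for p, q in combinations(points, 2) if dist(p, q) <= t}


def connected_components(points, edges):
    """Maximal sets of points in which each pair is joined by a path of links."""
    unvisited, components = set(points), []
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            p = stack.pop()
            for q in list(unvisited):
                if frozenset((p, q)) in edges:
                    unvisited.remove(q)
                    component.add(q)
                    stack.append(q)
        components.append(component)
    return components


def is_clique(group, edges):
    """True when every pair of points in the group is directly linked."""
    return all(frozenset(pair) in edges for pair in combinations(group, 2))


# Hypothetical sample: three points in a chain plus one far-away point.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
edges = threshold_edges(points, t=1.0)
components = connected_components(points, edges)

print(components)                                   # the chain forms one component
print(is_clique([(0, 0), (1, 0), (2, 0)], edges))   # the chain's end points are not linked
print(is_clique([(0, 0), (1, 0)], edges))           # this pair is directly linked
```

The chain of three points is a single connected component but not a clique, which is why complete linkage tends to produce tighter, more compact clusters than single linkage.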
Pictorial Analysis
Implementing Hierarchical Clustering on WEKA
1. SELECT THE DATASET FOR CLUSTERING
2. CLICK ON THE CLUSTER TAB AND CHOOSE HIERARCHICAL CLUSTERER
3. DOUBLE-CLICK ON HIERARCHICAL CLUSTERER TO CHANGE THE
NUMBER OF CLUSTERS AND THE DISTANCE FUNCTION
4. CLICK ON START TO INITIATE THE CLUSTERING PROCESS
5. RIGHT-CLICK ON THE RESULT AND SELECT VISUALIZE
CLUSTER ASSIGNMENTS
6. INTERPRET THE RESULTS