Clustering

Partitioning, Hierarchical, Density-Based & Grid-Based Methods
Clustering
• Clustering is the process of partitioning a set of data objects (or observations) into subsets.
• Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters.
• Clusters: objects within a cluster have high similarity, but are very dissimilar to objects in other clusters.
• Dissimilarities and similarities are assessed based on the attribute values describing the objects.
• Clustering is known as unsupervised learning because the class label information is not present. For this reason, clustering is a form of learning by observation, rather than learning by examples.
Clustering
• Cluster analysis has been widely used in many applications, such as:
• Information retrieval: document clustering
• Land use: identification of areas of similar land use in an earth observation database
• Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
• City planning: identifying groups of houses according to their house type, value, and geographical location
• Earthquake studies: observed earthquake epicenters should be clustered along continental faults
• Climate: understanding the Earth's climate; finding patterns in atmospheric and ocean data
• Economic science: market research
Major Clustering Approaches
• There are many clustering algorithms in the literature. It is difficult to provide a crisp categorization of clustering methods because these categories may overlap, so that a method may have features from several categories.
• The major fundamental clustering methods can be classified into the following four categories:
1. Partitioning method:
• Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors.
• Typical methods: k-means, k-medoids
2. Hierarchical method:
• Create a hierarchical decomposition of the set of data (or objects) using some criterion.
• Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
Major Clustering Approaches
3. Density-based method:
• Based on connectivity and density functions
• Typical methods: DBSCAN, OPTICS, DENCLUE
4. Grid-based method:
• Based on a multiple-level granularity structure
• Typical methods: STING, WaveCluster, CLIQUE
1. Partitioning Methods
• The simplest and most fundamental version of cluster analysis is
partitioning, which organizes the objects of a set into several exclusive
groups or clusters.
• Formally, given a dataset D, of n objects, and k, the number of clusters
to form, a partitioning algorithm organizes the objects into k partitions
(k ≤ n).
• The clusters are formed to optimize an objective partitioning criterion,
such as a dissimilarity function based on distance, so that the objects
within a cluster are “similar” to one another and “dissimilar” to objects
in other clusters in terms of the data set attributes.
1. Partitioning Methods: k-Means
• A centroid-based partitioning technique uses the centroid of a cluster
Ci, to represent that cluster.
• Conceptually, the centroid of a cluster is its center point.
• The centroid can be defined in various ways such as by the mean or
medoid of the objects (or points) assigned to the cluster.
• The difference between an object p ∈ Ci and ci, the representative of the cluster, is measured by dist(p, ci), where dist(x, y) is the Euclidean distance between two points x and y.
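The centroid-based scheme just described can be sketched in a few lines of code. The following is a minimal illustration, not the algorithm listing from the slides; NumPy is assumed, and the random seeding and the four-point data set at the end are illustrative choices.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initial seeds: k points chosen at random from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each object goes to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of the objects assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # convergence: the centroids (and hence the assignments) no longer change
        centroids = new_centroids
    return labels, centroids

# The medicine example from the next slides (A, B, C, D with weight and pH index)
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels, centroids)  # e.g. A, B in one cluster and C, D in the other,
                          # with centroids (1.5, 1) and (4.5, 3.5)
```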
1. Partitioning Methods: k-Means
K-means - Example
• Problem:
• Suppose we have 4 types of medicines and each has two attributes (pH and weight index). Our goal is to group these objects into K = 2 groups of medicine.

  Medicine   Weight   pH-Index
  A          1        1
  B          2        1
  C          4        3
  D          5        4
1. Partitioning Methods: k-Means
K-means - Example
• Step 1: Use initial seed points for partitioning:
  c1 = A = (1, 1), c2 = B = (2, 1)
• Example distance computation for object D = (5, 4):
  d(D, c1) = sqrt((5 − 1)² + (4 − 1)²) = 5
  d(D, c2) = sqrt((5 − 2)² + (4 − 1)²) ≈ 4.24
• Assign each object to the cluster with the nearest seed point.
1. Partitioning Methods: k-Means
K-means - Example
• Step 2: Compute new centroids of the current partition.
  Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships:
  c1 = (1, 1)
  c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3)
1. Partitioning Methods: k-Means
K-means - Example
• Step 2 (continued): Renew the membership based on the new centroids:
  compute the distance of all objects to the new centroids and assign each object to the nearest one.
1. Partitioning Methods: k-Means
K-means - Example
• Step 3: Repeat the first two steps until convergence.
  Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships:
  c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
  c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)
1. Partitioning Methods: k-Means
K-means - Example
• Step 3 (continued): Compute the distance of all objects to the new centroids.
• Stop: there are no new assignments, i.e., the membership in each cluster no longer changes.
1. Partitioning Methods: k-Means
Strengths of k-means
• Simple: easy to understand and to implement
• Efficient: Time complexity: O(tkn), where n is the number of data
points, k is the number of clusters, and t is the number of iterations.
1. Partitioning Methods: k-Means
Weaknesses of k-means
• The algorithm is only applicable if the mean is defined.
• The user needs to specify k.
• The algorithm is sensitive to outliers
• Outliers are data points that are very far away from other data
points.
• Outliers could be errors in the data recording or some special data
points with very different values.
1. Partitioning Methods: k-Means
Weaknesses of k-means: Problems with outliers
1. Partitioning Methods: k-Means
Weaknesses of k-means: To deal with outliers
• One method is to remove some data points in the clustering process
that are much further away from the centroids than other data points.
• To be safe, we may want to monitor these possible outliers over a
few iterations and then decide to remove them.
• Another method is to perform random sampling. Since in sampling
we only choose a small subset of the data points, the chance of
selecting an outlier is very small.
• Assign the rest of the data points to the clusters by distance or
similarity comparison, or classification
1. Partitioning Methods: k-Means
Weaknesses of k-means
• The algorithm is sensitive to initial seeds.
1. Partitioning Methods: k-Means
Weaknesses of k-means
• If we use different seeds, we may get good results.
• There are some methods to help choose good seeds.
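One common family of seeding heuristics (not named on the slide) is k-means++-style seeding: pick the first seed at random, then prefer points that are far from the seeds already chosen. A minimal sketch, assuming NumPy:

```python
import numpy as np

def kmeanspp_seeds(X, k, seed=0):
    """k-means++-style seeding: spread the initial seeds out over the data."""
    rng = np.random.default_rng(seed)
    seeds = [X[rng.integers(len(X))]]            # first seed: chosen uniformly at random
    for _ in range(k - 1):
        # squared distance of every point to its closest seed so far
        d2 = np.min([np.sum((X - s) ** 2, axis=1) for s in seeds], axis=0)
        probs = d2 / d2.sum()                    # far-away points are more likely to be picked
        seeds.append(X[rng.choice(len(X), p=probs)])
    return np.array(seeds)
```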
1. Partitioning Methods: k-Medoids
The Partitioning Around Medoids (PAM) algorithm is a popular realization of k-medoids clustering.
1. Partitioning Methods: k-Medoids
K-medoids Example
• Data set (assume k = 2; select X5 and X9 as the initial medoids):

  Point:  X1   X2   X3   X4   X5   X6   X7   X8   X9   X10
  x:      2    3    3    5    6    6    7    7    8    7
  y:      6    4    8    7    2    4    3    4    5    6

• Distances from each non-medoid point to the medoids:

  Point:   X1   X2   X3   X4   X6   X7   X8   X10
  to X5:   8    5    9    7    2    2    3    5
  to X9:   7    6    8    6    3    3    2    2

• Current clustering: {X2, X5, X6, X7}, {X1, X3, X4, X8, X9, X10}
1. Partitioning Methods: k-Medoids
K-medoids Example
• Now let us choose some other point to be a medoid instead of X5 = (6, 2). Let us randomly choose X1 = (2, 6).
• The new medoid set is (2, 6) and (8, 5). Repeating the same task as earlier (replace X5 by X1):

  Point:    X1   X2   X3   X4   X5   X6   X7   X8   X9   X10
  Before:   7    5    8    6    0    2    2    2    0    2
  to X1:    0    3    3    4    8    6    8    7    0    5
  to X9:    0    6    8    6    5    3    3    2    0    2
  Change:   -7   -2   -5   -2   5    1    1    0    0    0

• Total change = −9 < 0, so the swap improves the clustering and X1 replaces X5 as a medoid.
• Current clustering: {X1, X2, X3, X4}, {X5, X6, X7, X8, X9, X10}
1. Partitioning Methods: k-Medoids
K-medoids Properties
• The complexity of each iteration is O(k(n − k)²)
• For large values of n and k, such computation becomes very costly
• Advantages
• The k-medoids method is more robust than k-means in the presence of noise and outliers
• Disadvantages
• k-medoids is more costly than the k-means method
• Like k-means, k-medoids requires the user to specify k
• It does not scale well for large data sets
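The swap test used in the example above (for each object, compare its distance to the closest medoid before and after replacing one medoid by a candidate, and accept the swap when the total change is negative) can be sketched as follows. Manhattan distance is assumed here because it reproduces the distance tables in the example; the data array repeats the ten points X1-X10.

```python
import numpy as np

def manhattan(a, b):
    return np.abs(a - b).sum()

def swap_cost(points, medoids, out_medoid, candidate):
    """Total change in clustering cost if medoid `out_medoid` is replaced by `candidate` (indices)."""
    new_medoids = [m for m in medoids if m != out_medoid] + [candidate]
    change = 0.0
    for p in points:
        before = min(manhattan(p, points[m]) for m in medoids)
        after = min(manhattan(p, points[m]) for m in new_medoids)
        change += after - before
    return change

# The ten points X1..X10 from the example (indices 0..9)
pts = np.array([[2, 6], [3, 4], [3, 8], [5, 7], [6, 2],
                [6, 4], [7, 3], [7, 4], [8, 5], [7, 6]], dtype=float)

# Try replacing medoid X5 (index 4) by X1 (index 0), keeping X9 (index 8)
delta = swap_cost(pts, medoids=[4, 8], out_medoid=4, candidate=0)
print(delta)  # a negative total change means the swap improves the clustering and is accepted
```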
2. Density-Based Methods
• Most partitioning methods cluster objects based on the distance
between objects. Such methods can find only spherical-shaped
clusters and encounter difficulty in discovering clusters of arbitrary
shapes.
• To find clusters of arbitrary shape, alternatively, we can model
clusters as dense regions in the data space, separated by sparse
regions.
2. Density-Based Methods
• The general idea is to continue growing a given cluster as long as the
density (number of objects or data points) in the “neighborhood”
exceeds some threshold.
• For example, for each data point within a given cluster, the
neighborhood of a given radius has to contain at least a minimum
number of points. Such a method can be used to filter out noise or
outliers and discover clusters of arbitrary shape.
2. Density-Based Methods
• Basic density-based clustering methods:
• DBSCAN, OPTICS and DENCLUE.

• Major features:
✓Discover clusters of arbitrary shape
✓Handle noise
✓One scan
✓Need density parameters as termination condition
2. Density-Based Methods:DBSCAN
• DBSCAN (Density-Based Spatial Clustering of Applications with
Noise)
• The density of an object o can be measured by the number of
objects close to o.
• DBSCAN finds core objects, that is, objects that have dense
neighborhoods. It connects core objects and their neighborhoods to
form dense regions as clusters.
2. Density-Based Methods:DBSCAN
• Two parameters:
• Eps (ϵ): maximum radius of the neighbourhood
• MinPts: minimum number of points in an Eps-neighbourhood of that point
• N_Eps(p) = {q ∈ D | dist(p, q) ≤ Eps}
• Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
• p belongs to N_Eps(q)
• core point condition: |N_Eps(q)| ≥ MinPts
• (Figure: q is a core point and p lies in its Eps-neighbourhood; Eps = 1 cm, MinPts = 5.)
2. Density-Based Methods:DBSCAN
• Density-Reachable and Density-Connected
• Density-reachable:
• A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi
• Density-connected:
• A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
2. Density-Based Methods:DBSCAN
• Given ϵ represents the radius of the circles, and MinPts = 3.
• Of the labeled points, m,p,o,r are core objects because each is in an
ϵ -neighborhood containing at least three points. Object q is directly
density-reachable from m. Object m is directly density-reachable
from p and vice versa.
• Object q is (indirectly) density-reachable from p because q is directly
density-reachable from m and m is directly density-reachable from p.
However, p is not density-reachable from q because q is not a core
object.
2. Density-Based Methods:DBSCAN
• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
• (Figure: core, border, and outlier points for Eps = 1 cm, MinPts = 5.)
2. Density-Based Methods:DBSCAN
• Due to the fixed neighborhood size parameterized by ϵ, the density
of a neighborhood can be measured simply by the number of objects
in the neighborhood.
• To determine whether a neighborhood is dense or not, DBSCAN uses
another user-specified parameter, MinPts, which specifies the
density threshold of dense regions.
• An object is a core object if the ϵ-neighborhood of the object
contains at least MinPts objects.
• Core objects are the pillars of dense regions.
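The core-object idea above translates into a compact procedure: find the ϵ-neighborhood of every point, mark points whose neighborhood contains at least MinPts members as core objects, and grow clusters outward from unassigned core objects. A minimal sketch (Euclidean distance, NumPy assumed), not the original DBSCAN pseudocode:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Label each point with a cluster id; -1 marks noise/outliers."""
    n = len(X)
    # eps-neighborhood of every point (the point itself is included)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.where(dists[i] <= eps)[0] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]   # core point condition

    labels = np.full(n, -1)
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # grow a new cluster from this unassigned core object
        labels[i] = cluster_id
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:                # unvisited or previously noise: join the cluster
                labels[j] = cluster_id
                if core[j]:                    # only core points expand the cluster further
                    frontier.extend(neighbors[j])
        cluster_id += 1
    return labels
```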
2. Density-Based Methods:DBSCAN
• Advantages
➢Clusters can have arbitrary shape and size
➢Number of clusters is determined automatically
➢Can separate clusters from surrounding noise
• Disadvantages
➢Input parameters may be difficult to determine
➢In some situations very sensitive to input parameter setting
2. Density-Based Methods:OPTICS
• DBSCAN required input parameters ϵ and MinPts, however, it is not
easy for the user to set these parameters.
• Such parameter settings are usually set empirically and are difficult to determine, especially for real-world, high-dimensional data sets.
• Most algorithms are sensitive to these parameter values: Slightly
different settings may lead to very different clusterings of the data.
• To overcome the difficulty in using one set of global parameters in
clustering analysis, a cluster analysis method called OPTICS was
proposed.
• OPTICS does not explicitly produce a dataset clustering. Instead, it
outputs a cluster ordering.
2. Density-Based Methods:OPTICS
• Objects in a denser cluster are listed closer to each other in the
cluster ordering.
• This ordering is equivalent to density-based clustering obtained from
a wide range of parameter settings.
• Thus, OPTICS does not require the user to provide a specific density
threshold.
• To construct the different clusterings simultaneously, the objects are
processed in a specific order.
• This order selects an object that is density-reachable with respect to
the lowest ϵ value so that clusters with higher density (lower ϵ) will
be finished first.
2. Density-Based Methods:OPTICS
OPTICS needs two important pieces of information per object:
1. The core-distance of an object p is the smallest value ϵ′ such that the ϵ′-neighborhood of p has at least MinPts objects.
• That is, ϵ′ is the minimum distance threshold that makes p a core object. If p is not a core object with respect to ϵ and MinPts, the core-distance of p is undefined.
2. The reachability-distance to object p from q is the minimum radius value that makes p density-reachable from q.
• According to the definition of density-reachability, q has to be a core object and p must be in the neighborhood of q.
• Therefore, the reachability-distance to p from q is max{core-distance(q), dist(p, q)}. If q is not a core object with respect to ϵ and MinPts, the reachability-distance to p from q is undefined.
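Both quantities follow directly from these definitions. A small sketch (Euclidean distance, NumPy assumed, with None standing for "undefined"):

```python
import numpy as np

def core_distance(X, p, eps, min_pts):
    """Smallest radius that makes point p a core object, or None if p is not core w.r.t. eps."""
    d = np.sort(np.linalg.norm(X - X[p], axis=1))   # distances from p, ascending (d[0] = 0)
    if len(d) < min_pts or d[min_pts - 1] > eps:
        return None                                  # p is not a core object
    return d[min_pts - 1]                            # distance to the MinPts-th nearest object

def reachability_distance(X, p, q, eps, min_pts):
    """Minimum radius that makes p density-reachable from q: max(core-dist(q), dist(p, q))."""
    cd = core_distance(X, q, eps, min_pts)
    if cd is None:
        return None                                  # undefined if q is not a core object
    return max(cd, np.linalg.norm(X[p] - X[q]))
```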
3. Grid-Based Methods
• Grid-based clustering method takes a space-driven approach by
partitioning the embedding space into cells independent of the
distribution of the input objects.
• The grid-based clustering approach uses a multiresolution grid data
structure.
• It quantizes the object space into a finite number of cells that form a
grid structure on which all of the operations for clustering are
performed.
• The main advantage of the approach is its fast processing time,
which is typically independent of the number of data objects.
3. Grid-Based Methods: CLIQUE
• CLIQUE (CLustering In QUEst) is a simple grid-based method for finding density-based clusters in subspaces.
• CLIQUE partitions each dimension into nonoverlapping intervals,
thereby partitioning the entire embedding space of the data objects
into cells.
• It uses a density threshold to identify dense cells and sparse ones. A
cell is dense if the number of objects mapped to it exceeds the
density threshold.
3. Grid-Based Methods: CLIQUE
• CLIQUE is a density-based and grid-based subspace clustering
algorithm
• Grid-based: It discretizes the data space through a grid and estimates
the density by counting the number of points in a grid cell
• Density-based: A cluster is a maximal set of connected dense units in
a subspace
• A unit is dense if the fraction of total data points contained in the
unit exceeds the input model parameter.
• Subspace clustering: A subspace cluster is a set of neighboring dense
cells in an arbitrary subspace. It also discovers some minimal
descriptions of the clusters
• It automatically identifies subspaces of a high-dimensional data space that allow better clustering than the original space, using the Apriori principle
3. Grid-Based Methods: CLIQUE
• Bottom-up approach
3. Grid-Based Methods: CLIQUE
• Bottom-up approach
• Apriori principle: if a collection of points S is a cluster in a k-dimensional space, then S is also part of a cluster in any (k−1)-dimensional projection of this space
3. Grid-Based Methods: CLIQUE
Major Steps of the CLIQUE Algorithm (a sketch of the first steps follows this list):
• Identify subspaces that contain clusters
• Partition the data space and find the number of points that lie inside each cell of the partition
• Identify the subspaces that contain clusters using the Apriori principle
• Identify clusters
• Determine dense units in all subspaces of interest
• Determine connected dense units in all subspaces of interest
• Generate minimal descriptions for the clusters
• Determine maximal regions that cover a cluster of connected dense units for each cluster
• Determine minimal cover for each cluster
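As a rough illustration of the first steps (count the points in each cell of a uniform one-dimensional grid, keep the dense cells, and use the Apriori principle to generate candidate two-dimensional units), here is a hedged sketch; the grid resolution and the density threshold τ are free parameters, not values prescribed by CLIQUE itself.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def dense_units_1d(X, n_intervals, tau):
    """Dense 1-D units: (dimension, interval index) cells holding more than tau * n points."""
    n, d = X.shape
    dense = set()
    for dim in range(d):
        lo, hi = X[:, dim].min(), X[:, dim].max()
        # map each value to one of n_intervals equal-width intervals
        idx = np.minimum(((X[:, dim] - lo) / (hi - lo + 1e-12) * n_intervals).astype(int),
                         n_intervals - 1)
        counts = Counter(idx)
        dense |= {(dim, i) for i, c in counts.items() if c > tau * n}
    return dense

def candidate_2d_units(dense_1d):
    """Apriori step: a 2-D unit can only be dense if both of its 1-D projections are dense."""
    return {((d1, i1), (d2, i2))
            for (d1, i1), (d2, i2) in combinations(sorted(dense_1d), 2) if d1 != d2}
```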
3. Grid-Based Methods: CLIQUE
Strengths
• Automatically finds subspaces of the highest dimensionality as long as high-density clusters exist in those subspaces
• Insensitive to the order of records in input and does not presume some canonical data distribution
• Scales linearly with the size of input and has good scalability as the number of dimensions in the data increases: O(C^k + mk)
• Simple method and interpretability of results
Weaknesses
• As in all grid-based clustering approaches, the quality of the results crucially depends on the appropriate choice of the number and width of the partitions and grid cells
4. Hierarchical
• A hierarchical clustering method works by grouping data objects
into a hierarchy or “tree” of clusters.
4. Hierarchical
Agglomerative versus Divisive Hierarchical Clustering:
• An agglomerative hierarchical clustering method uses a bottom-up
strategy.
• It typically starts by letting each object form its own cluster and
iteratively merges clusters into larger and larger clusters, until all the
objects are in a single cluster or certain termination conditions are
satisfied.

• A divisive hierarchical clustering method employs a top-down


strategy.
• It starts by placing all objects in one cluster, which is the hierarchy’s
root.
4. Hierarchical
Agglomerative versus Divisive Hierarchical Clustering:
• Agglomerative:
  Initialization: each object is a cluster
  Iteration: merge the two clusters that are most similar to each other
  Until: all objects are merged into a single cluster
• Divisive:
  Initialization: all objects stay in one cluster
  Iteration: select a cluster and split it into two sub-clusters
  Until: each leaf cluster contains only one object
4. Hierarchical
• Dendrogram representation for hierarchical clustering of data objects {a, b, c, d, e}
• Each node on the tree is a cluster; each leaf node is a singleton cluster
4. Hierarchical
How to Merge Clusters?
• How to measure the distance between clusters?
• Four common measures: single-link, complete-link, average-link, and centroid distance.
• Hint: the distance between clusters is usually defined on the basis of the distance between objects.
4. Hierarchical
How to Define Inter-Cluster Distance
• Single-link:
  d_min(Ci, Cj) = min{ d(p, q) : p ∈ Ci, q ∈ Cj }
• The distance between two clusters is represented by the distance of the closest pair of data objects belonging to different clusters.
4. Hierarchical
How to Define Inter-Cluster Distance
• Complete-link:
  d_max(Ci, Cj) = max{ d(p, q) : p ∈ Ci, q ∈ Cj }
• The distance between two clusters is represented by the distance of the farthest pair of data objects belonging to different clusters.
4. Hierarchical
How to Define Inter-Cluster Distance
• Average-link:
  d_avg(Ci, Cj) = avg{ d(p, q) : p ∈ Ci, q ∈ Cj }
• The distance between two clusters is represented by the average distance of all pairs of data objects belonging to different clusters.
4. Hierarchical
How to Define Inter-Cluster Distance
• Centroid distance:
  d_mean(Ci, Cj) = d(mi, mj), where mi and mj are the means of Ci and Cj, respectively
• The distance between two clusters is represented by the distance between the means of the clusters.
4. Hierarchical
Cluster Distance Measures
Example: Given a data set of five objects characterized by a single continuous feature, assume that there are two clusters: C1 = {a, b} and C2 = {c, d, e}.

  Object:   a   b   c   d   e
  Feature:  1   2   4   5   6

1. Calculate the distance matrix:

       a   b   c   d   e
  a    0   1   3   4   5
  b    1   0   2   3   4
  c    3   2   0   1   2
  d    4   3   1   0   1
  e    5   4   2   1   0

2. Calculate the three cluster distances between C1 and C2:

  Single link:
  dist(C1, C2) = min{d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e)} = min{3, 4, 5, 2, 3, 4} = 2

  Complete link:
  dist(C1, C2) = max{d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e)} = max{3, 4, 5, 2, 3, 4} = 5

  Average:
  dist(C1, C2) = (d(a,c) + d(a,d) + d(a,e) + d(b,c) + d(b,d) + d(b,e)) / 6 = (3 + 4 + 5 + 2 + 3 + 4) / 6 = 21/6 = 3.5
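The three cluster distances computed above can be verified in a few lines; this sketch assumes SciPy is available and treats the five values as one-dimensional points.

```python
import numpy as np
from scipy.spatial.distance import cdist

a, b, c, d, e = 1, 2, 4, 5, 6
C1 = np.array([[a], [b]], dtype=float)          # cluster C1 = {a, b}
C2 = np.array([[c], [d], [e]], dtype=float)     # cluster C2 = {c, d, e}

pairwise = cdist(C1, C2)                        # all distances between the two clusters
print(pairwise.min())    # single link   -> 2.0
print(pairwise.max())    # complete link -> 5.0
print(pairwise.mean())   # average link  -> 3.5
```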
4. Hierarchical
Agglomerative Algorithm
• The agglomerative algorithm is carried out in three steps (see the sketch after this list):
1) Convert all object features into a distance matrix
2) Set each object as a cluster (thus if we have N objects, we will have N clusters at the beginning)
3) Repeat until the number of clusters is one (or a known number of clusters):
   ▪ Merge the two closest clusters
   ▪ Update the distance matrix
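These three steps map directly onto standard library routines. The sketch below assumes SciPy; the six 2-D points are an assumed data set, chosen because their single-link merge distances (0.50, 0.71, 1.00, 1.41, 2.50) match the dendrogram discussed on the following slides.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Assumed coordinates for objects A..F (they reproduce the merge distances below)
X = np.array([[1.0, 1.0],   # A
              [1.5, 1.5],   # B
              [5.0, 5.0],   # C
              [3.0, 4.0],   # D
              [4.0, 4.0],   # E
              [3.0, 3.5]])  # F

dist = pdist(X, metric='euclidean')        # step 1: distance matrix (condensed form)
Z = linkage(dist, method='single')         # steps 2-3: repeatedly merge the two closest clusters
print(Z[:, 2])                             # merge distances: about 0.50, 0.71, 1.00, 1.41, 2.50
print(fcluster(Z, t=2, criterion='maxclust'))  # cut the tree into two clusters: {A,B} and {C,D,E,F}
```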
4. Hierarchical
• Problem: clustering analysis with the agglomerative algorithm
• (Figure: the data matrix is converted, using Euclidean distance, into a distance matrix.)
4. Hierarchical
• Merge two closest clusters (iteration 1)
4. Hierarchical
• Update distance matrix (iteration 1)
4. Hierarchical
• Merge two closest clusters (iteration 2)
4. Hierarchical
• Update distance matrix (iteration 2)
4. Hierarchical
• Merge two closest clusters/update distance matrix (iteration 3)
4. Hierarchical
• Merge two closest clusters/update distance matrix (iteration 4)
4. Hierarchical
• Final result (meeting termination condition)
4. Hierarchical
• Dendrogram tree representation (vertical axis: lifetime/merge distance; horizontal axis: objects)
1. In the beginning we have 6 clusters: A, B, C, D, E and F
2. We merge clusters D and F into cluster (D, F) at distance 0.50
3. We merge clusters A and B into (A, B) at distance 0.71
4. We merge clusters E and (D, F) into ((D, F), E) at distance 1.00
5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41
6. We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50
7. The last cluster contains all the objects, thus concluding the computation
4. Hierarchical
• Dendrogram tree representation
• For a dendrogram tree, its horizontal axis indexes all objects in a given data set, while its vertical axis expresses the lifetime of all possible cluster formations.
• The lifetime of a cluster (individual cluster) in the dendrogram is defined as the distance interval from the moment that the cluster is created to the moment that it disappears by merging with other clusters.
4. Hierarchical
Which Distance Measure is Better?
• Each method has both advantages and disadvantages; application-
dependent, single-link and complete-link are the most common
methods
• Single-link
• Can find irregular-shaped clusters
• Sensitive to outliers; suffers from the so-called chaining effect
• In order to merge two groups, only one pair of points needs to be close, irrespective of all others. Therefore clusters can be too spread out, and not compact enough
• Average-link, and Centroid distance
• Robust to outliers
• Tend to break large clusters
4. Hierarchical: AGNES
• AGNES : AGglomerative NESting
• Use single-link method
• Merge nodes that have the least dissimilarity
• Eventually all objects belong to the same cluster

• (Figure: three scatter plots showing AGNES progressively merging nearby points into larger clusters.)
4. Hierarchical: BIRCH
• BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies.
• BIRCH uses the notions of clustering feature to summarize a cluster,
and clustering feature tree (CF-tree) to represent a cluster hierarchy.
• These structures help the clustering method achieve good speed
and scalability in large or even streaming databases, and also make
it effective for incremental and dynamic clustering of incoming
objects.
4. Hierarchical: BIRCH
BIRCH: Key Components
• Clustering Feature (CF)
• Summary of the statistics for a given cluster: the 0-th, 1st and 2nd
moments of the cluster from the statistical point of view
• A CF entry has sufficient information to calculate the centroid,
radius, diameter and many other distance measures
• The additivity theorem allows us to merge sub-clusters incrementally
• CF-Tree
• height-balanced tree
• two parameters:
• number of entries in each node
• The diameter of all entries in a leaf node
• Leaf nodes are connected via prev and next pointers
4. Hierarchical: BIRCH
Clustering Feature
• Clustering Feature (CF): CF = (N, LS, SS)
• N: number of data points
• LS: linear sum of the N points, LS = Σ_{i=1..N} Xi
• SS: square sum of the N points, SS = Σ_{i=1..N} Xi²
4. Hierarchical: BIRCH
Distance Measures:
Given a cluster with N data points X1, …, XN:
• Centroid:
  X0 = (Σ_{i=1..N} Xi) / N = LS / N
• Radius (average distance from member points to the centroid):
  R = ( Σ_{i=1..N} (Xi − X0)² / N )^{1/2}
    = ( (SS − 2·X0·LS + N·X0²) / N )^{1/2}
• Diameter (average pair-wise distance within a cluster):
  D = ( Σ_{i=1..N} Σ_{j=1..N} (Xi − Xj)² / (N(N − 1)) )^{1/2}
• All of these quantities can be computed directly from the CF entry (N, LS, SS).
4. Hierarchical: BIRCH
Clustering Feature:
• Suppose cluster C1 has CF1 = (N1, LS1, SS1) and cluster C2 has CF2 = (N2, LS2, SS2).
• If we merge C1 with C2, the CF for the merged cluster C is
  CF = CF1 + CF2 = (N1 + N2, LS1 + LS2, SS1 + SS2)
• Example (2-D points):
  CF1 = ⟨3, (2+3+4, 5+2+3), (2²+3²+4², 5²+2²+3²)⟩ = ⟨3, (9, 10), (29, 38)⟩
  CF3 = CF1 + CF2 = ⟨3+3, (9+35, 10+36), (29+417, 38+440)⟩ = ⟨6, (44, 46), (446, 478)⟩
• Why CF?
• Summarized info for a single cluster
• Summarized info for two clusters
• Additivity theorem
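A clustering feature and the additivity theorem are easy to capture in code. This is a minimal sketch for scalar (one-dimensional) values, matching the worked example that follows; the radius formula is the CF-based one from the earlier slide.

```python
from dataclasses import dataclass
import math

@dataclass
class CF:
    n: int          # number of points
    ls: float       # linear sum of the points
    ss: float       # square sum of the points

    def __add__(self, other):        # additivity: CF1 + CF2 summarizes the merged cluster
        return CF(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # R = sqrt((SS - 2*x0*LS + N*x0^2) / N), with x0 = LS / N
        x0 = self.centroid()
        return math.sqrt(max(self.ss - 2 * x0 * self.ls + self.n * x0 ** 2, 0.0) / self.n)

cf1 = CF(1, 0.5, 0.25) + CF(1, 0.25, 0.0625)   # Leaf1 after x1 = 0.5 and x2 = 0.25
print(cf1)                                      # CF(n=2, ls=0.75, ss=0.3125)
print(round(cf1.radius(), 3))                   # about 0.125 (the slides report 0.126)
```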
4. Hierarchical: BIRCH
CF-Tree:
• A CF-tree is a height-balanced tree with two parameters: branching factor B and threshold T.
• Branching factor:
• B: each non-leaf node contains at most B entries of the form [CFi, childi], where i = 1, 2, …, B, and childi is a pointer to its i-th child node. CFi is the CF of the sub-cluster represented by childi.
• L: a leaf node contains at most L entries, each of the form [CFi], where i = 1, 2, …, L. In addition, each leaf node has two pointers, "prev" and "next", which are used to chain all leaf nodes together.
• A leaf node also represents a cluster made up of all the sub-clusters represented by its entries.
• Threshold T:
• The diameter (alternatively, the radius) of all entries in a leaf node is at most T.
• Leaf nodes are connected via prev and next pointers.
4. Hierarchical: BIRCH
Example of a CF-tree (B = 6, L = 5):
• Root node with entries [CF1, child1], [CF2, child2], [CF3, child3], …, [CF6, child6]
• A non-leaf node with entries [CF9, child1], [CF10, child2], [CF11, child3], …, [CF13, child5]
• Leaf nodes (e.g., one holding CF90, CF91, …, CF94 and another holding CF95, CF96, …, CF98), chained together via prev and next pointers
4. Hierarchical: BIRCH
CF Tree Insertion:
• Given entry “Ent”, it proceeds as below:
• Identifying the appropriate leaf
• recursively descending the CF tree and choosing the closest child node
according to a chosen distance metric
• Modifying the leaf:
• When it reaches a leaf node, it finds the closest leaf entry, say Li, and then tests whether Li can absorb "Ent" without violating the threshold. If so, the CF vector for Li is updated to reflect this.
• If not, a new entry for "Ent" is added to the leaf. If there is no room, split the node.
• Modifying the path:
• update CF information up the path.
4. Hierarchical: BIRCH
Example (parameters: T = 0.15, L = 2, B = 2; data: x1 = 0.5, x2 = 0.25, x3 = 0, x4 = 0.65, x5 = 1, x6 = 1.4, x7 = 1.1):
• Enter the first data value x1 = 0.5.
• The root node is initialized with the CF values of the first data value.
• A new leaf, Leaf1, is created, and BIRCH assigns the first record x1 to Leaf1.
• Leaf1 contains only one record, hence its radius is zero and thus less than T = 0.15.
• Tree: Root node with CF1: n = 1, LS = 0.5, SS = 0.25; Leaf1: R = 0, {x1 = 0.5}.
4. Hierarchical: BIRCH
Example:
• The second data value x2 = 0.25 is entered.
• BIRCH tentatively passes x2 = 0.25 to Leaf1.
• The radius of Leaf1 is now R ≈ 0.126 < T = 0.15, so x2 is assigned to Leaf1.
• Tree: Root node with CF1: n = 2, LS = 0.75, SS = 0.313; Leaf1: R ≈ 0.126, {x1 = 0.5, x2 = 0.25}.
4. Hierarchical: BIRCH
Example:
• The third data value x3 = 0 is entered.
• BIRCH tentatively passes x3 = 0 to Leaf1.
• However, the radius of Leaf1 would now increase to R ≈ 0.205 > T = 0.15. The threshold T = 0.15 is exceeded, so x3 is not assigned to Leaf1.
• Instead, a new leaf, Leaf2, is initialized, containing x3 only. Node splitting is done by choosing the farthest pair of entries as seeds and redistributing the remaining entries based on the closest criterion.
• Tree: Root node with CF1: n = 2, LS = 0.75, SS = 0.313 and CF2: n = 1, LS = 0.0, SS = 0; Leaf1: R ≈ 0.126, {x1 = 0.5, x2 = 0.25}; Leaf2: R = 0, {x3 = 0}.
4. Hierarchical: BIRCH
Example:
• The fourth data value x4 = 0.65 is entered. BIRCH compares x4 to the locations of CF1 and CF2.
• The location is measured by x̄ = LS/n. We have x̄_CF1 = 0.75/2 = 0.375 and x̄_CF2 = 0/1 = 0.
• The data point x4 = 0.65 is thus closer to CF1 than to CF2, so BIRCH tentatively passes x4 to CF1.
• The radius of CF1 would now increase to R ≈ 0.166 > T = 0.15. The threshold T = 0.15 is exceeded, so x4 is not assigned to CF1. Instead, we would like to initialize a new leaf.
• However, B = 2 means that we cannot have three leaves branching off the root node. We must therefore split the root node into (i) Node1, which has as its children Leaf1 and Leaf2, and (ii) Node2, whose only leaf, Leaf3, contains only x4.
4. Hierarchical: BIRCH
Example:
• Tree after x4 (T = 0.15, L = 2, B = 2):
  Root node: CF12: n = 3, LS = 0.75, SS = 0.313 (→ Node1); CF3: n = 1, LS = 0.65, SS = 0.423 (→ Node2)
  Node1: CF1: n = 2, LS = 0.75, SS = 0.313 → Leaf1: R ≈ 0.126, {x1 = 0.5, x2 = 0.25}; CF2: n = 1, LS = 0.0, SS = 0 → Leaf2: R = 0, {x3 = 0}
  Node2: CF3: n = 1, LS = 0.65, SS = 0.423 → Leaf3: R = 0, {x4 = 0.65}
4. Hierarchical: BIRCH
Example:
• The fifth data value x5 = 1 is entered.
• BIRCH compares x5 = 1 with the locations of CF12 and CF3.
• We have x̄_CF12 = 0.75/3 = 0.25 and x̄_CF3 = 0.65/1 = 0.65. The data point x5 = 1 is thus closer to CF3 than to CF12.
• BIRCH passes x5 to CF3. The radius of CF3 would now increase to R = 0.175 > T = 0.15, so x5 cannot be assigned to CF3.
• Instead, a new leaf node, Leaf4, is initialized, with CF entry CF4 containing x5 only. The summary statistics for CF34 are updated.
4. Hierarchical: BIRCH
Example:
• Tree after x5 (T = 0.15, L = 2, B = 2):
  Root node: CF12: n = 3, LS = 0.75, SS = 0.313 (→ Node1); CF34: n = 2, LS = 1.65, SS = 1.423 (→ Node2)
  Node1: CF1: n = 2, LS = 0.75, SS = 0.313 → Leaf1: R ≈ 0.126, {x1 = 0.5, x2 = 0.25}; CF2: n = 1, LS = 0.0, SS = 0 → Leaf2: R = 0, {x3 = 0}
  Node2: CF3: n = 1, LS = 0.65, SS = 0.423 → Leaf3: R = 0, {x4 = 0.65}; CF4: n = 1, LS = 1, SS = 1 → Leaf4: R = 0, {x5 = 1}
4. Hierarchical: BIRCH
Example:
• The sixth data value x6 = 1.4 is entered. At the root node, BIRCH compares x6 = 1.4 with the locations of CF12 and CF34.
• We have x̄_CF12 = 0.75/3 = 0.25 and x̄_CF34 = 1.65/2 = 0.825. The data point x6 = 1.4 is thus closer to CF34, and BIRCH passes x6 to CF34.
• The record descends to Node2, and BIRCH compares x6 = 1.4 with the locations of CF3 and CF4.
• We have x̄_CF3 = 0.65 and x̄_CF4 = 1. The data point x6 = 1.4 is thus closer to CF4 than to CF3. BIRCH tentatively passes x6 to CF4.
• The radius of CF4 would now increase to R = 0.2 > T = 0.15. The threshold T = 0.15 is exceeded, so x6 is not assigned to CF4. But the branching factor B = 2 means that we may have at most two leaf nodes branching off of any non-leaf node. Therefore, we need a new set of non-leaf nodes, Node2.1 and Node2.2, branching off from Node2.
• Node2.1 contains CF3 and CF4, while Node2.2 contains the desired new CF5 and the new leaf node Leaf5 as its only child, containing only the information from x6.
4. Hierarchical: BIRCH
Example:
• Tree after x6 (T = 0.15, L = 2, B = 2):
  Root node: CF12: n = 3, LS = 0.75, SS = 0.313 (→ Node1); CF345: n = 3, LS = 3.05, SS = 3.393 (→ Node2)
  Node1: CF1: n = 2, LS = 0.75, SS = 0.313 → Leaf1: R ≈ 0.126, {x1 = 0.5, x2 = 0.25}; CF2: n = 1, LS = 0.0, SS = 0 → Leaf2: R = 0, {x3 = 0}
  Node2: CF34: n = 2, LS = 1.65, SS = 1.423 (→ Node2.1); CF5: n = 1, LS = 1.4, SS = 1.96 (→ Node2.2)
  Node2.1: CF3: n = 1, LS = 0.65, SS = 0.423 → Leaf3: R = 0, {x4 = 0.65}; CF4: n = 1, LS = 1, SS = 1 → Leaf4: R = 0, {x5 = 1}
  Node2.2: CF5: n = 1, LS = 1.4, SS = 1.96 → Leaf5: R = 0, {x6 = 1.4}
4. Hierarchical: BIRCH
Example:
• Finally, the seventh data value x7 = 1.1 is entered. It descends via CF345 and CF34 to Node2.1, where it is absorbed into CF4/Leaf4 together with x5, since the radius of Leaf4 only grows to R = 0.05 < T = 0.15.
• Final tree (T = 0.15, L = 2, B = 2):
  Root node: CF12: n = 3, LS = 0.75, SS = 0.313 (→ Node1); CF345: n = 4, LS = 4.15, SS = 4.212 (→ Node2)
  Node1: CF1: n = 2, LS = 0.75, SS = 0.313 → Leaf1: R ≈ 0.126, {x1 = 0.5, x2 = 0.25}; CF2: n = 1, LS = 0.0, SS = 0 → Leaf2: R = 0, {x3 = 0}
  Node2: CF34: n = 3, LS = 2.75, SS = 2.252 (→ Node2.1); CF5: n = 1, LS = 1.4, SS = 1.96 (→ Node2.2)
  Node2.1: CF3: n = 1, LS = 0.65, SS = 0.423 → Leaf3: R = 0, {x4 = 0.65}; CF4: n = 2, LS = 2.1, SS = 2.21 → Leaf4: R = 0.05, {x5 = 1, x7 = 1.1}
  Node2.2: CF5: n = 1, LS = 1.4, SS = 1.96 → Leaf5: R = 0, {x6 = 1.4}
Summary
• Data sets are made up of data objects. A data object represents an
entity. Data objects are described by attributes. Attributes can be
nominal, binary, ordinal, or numeric.
• The values of a nominal (or categorical) attribute are symbols or names
of things, where each value represents some kind of category, code, or
state.
• Binary attributes are nominal attributes with only two possible states
(such as 1 and 0 or true and false). If the two states are equally
important, the attribute is symmetric; otherwise it is asymmetric.
• An ordinal attribute is an attribute with possible values that have a
meaningful order or ranking among them, but the magnitude between
successive values is not known.
Summary
• A numeric attribute is quantitative (i.e., it is a measurable quantity)
represented in integer or real values. Numeric attribute types can be
interval-scaled or ratio-scaled. The values of an interval-scaled attribute
are measured in fixed and equal units. Ratio-scaled attributes are
numeric attributes with an inherent zero-point. Measurements are
ratio-scaled in that we can speak of values as being an order of
magnitude larger than the unit of measurement.
• Measures of object similarity and dissimilarity are used in data mining
applications such as clustering, outlier analysis, and nearest-neighbor
classification. Such measures of proximity can be computed for each
attribute type, or for combinations of such attributes. Examples include the Jaccard coefficient for asymmetric binary attributes and the Euclidean, Manhattan, and Minkowski distances. For applications involving sparse numeric data vectors, such as term-frequency vectors, the cosine measure is often used.
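The proximity measures named in this summary can each be written in a couple of lines; a sketch follows (the Jaccard coefficient here is the asymmetric-binary variant that ignores 0-0 matches).

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def jaccard(x, y):
    """Jaccard coefficient for asymmetric binary vectors (0/1): ignores 0-0 matches."""
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    return (x & y).sum() / (x | y).sum()

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 4.0])
print(minkowski(x, y, 1), minkowski(x, y, 2), cosine_similarity(x, y))
print(jaccard([1, 0, 1, 1], [1, 1, 0, 1]))   # 2 shared 1s out of 4 attributes present -> 0.5
```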
