KEMBAR78
Clustering | PDF | Cluster Analysis | Statistical Classification
0% found this document useful (0 votes)
46 views18 pages

Clustering

This document discusses different types of clustering methods for data mining including partitioning methods like k-means and k-medoids, hierarchical clustering, grid-based clustering, density-based clustering like DBSCAN and DENCLUE, and probabilistic clustering methods. It provides details on how several of these methods work and their features.

Uploaded by

pppchan23100
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views18 pages

Clustering

This document discusses different types of clustering methods for data mining including partitioning methods like k-means and k-medoids, hierarchical clustering, grid-based clustering, density-based clustering like DBSCAN and DENCLUE, and probabilistic clustering methods. It provides details on how several of these methods work and their features.

Uploaded by

pppchan23100
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Clustering

Data Mining:
Data Mining Methods
with Dr. Qin Lv
Learning objective: Apply techniques
for clustering and explain how they work.
Evaluate and compare methods.
Clustering
Ø No predefined
classes
Ø Intra-cluster
similarity
Ø Inter-cluster
dissimilarity
Cluster Analysis
Ø Unsupervised learning
• Group similar objects into clusters
Ø Similarity measure
• Types of objects, similarity/dissimilarity
Ø Clustering method
• Quality, efficiency, incremental
Cluster Evaluation
Ø Clustering tendency
Ø Cluster cohesion & separation
Ø #clusters (e.g., silhouette coefficient)
Ø Comparison with external knowledge
Ø Comparison of two sets of clusters
Types of Clustering Methods
Ø Partitioning methods
Ø Hierarchical methods
Ø Grid-based methods
Ø Density-based methods
Ø Probabilistic methods
Partitioning Methods
Ø Given n objects and #clusters k
• Partition the n objects into k clusters
Ø Brute force approach
• Enumerate all possible partitions
Ø Heuristic methods
• k-means: cluster centroid (mean of objects)
• k-medoids: cluster medoid (“central” object)
k-means Clustering: Method
Ø 1. Pick k initial centroids (e.g., randomly)
Ø 2. Assign each object to nearest centroid
Ø 3. Update each centroid based on objects
assigned to its cluster
Ø Repeat 2. & 3. until centroids are stable
Ø O(nkt): n objects, k clusters, t iterations
k-means Clustering: Example
Ø 10 objects: {35, 69, 9, 78, 9, 23, 81, 57, 15, 48}.
Ø 2 initial centroids: 30, 60
Ø Sort: {9, 9, 15, 23, 35, 48, 57, 69, 78, 81}
Ø R1: 30 {9, 9, 15, 23, 35}, 60 {48, 57, 69, 78, 81}
Ø C_1 = (9 + 9 + 15 + 23 + 35) / 5 = 18.2
Ø C_2 = (48 + 57 + 69 + 78 + 81) / 5 = 66.6
Ø R2: 18.2 {9, 9, 15, 23, 35}, 66.6 {48, 57, 69, 78, 81}
k-means Clustering: Features
Ø Widely-used, efficient and good results
Ø Need to specify k & define centroid
Ø Choice of initial centroids
Ø Not suitable for non-convex shapes
Ø Sensitive to noise & outliers
k-medoids Clustering
Ø Similar process as k-means
Ø Cluster medoid: “central” object
Ø Less sensitive to noise & outliers
Ø Medoid update: computation expensive
Ø Speedup using randomized samples
Hierarchical Clustering: Method
Ø Dendrogram
• Tree of clusters
Ø Agglomerative
• Bottom-up merging
Ø Decisive
• Top-down splitting
Hierarchical Clustering: Features
Ø Useful in many real-world applications
Ø No need to specify #clusters
Ø Need to define cluster distance
Ø Multi-level clustering
Ø Cannot undo cluster merge/split
Grid-based Clustering
Ø Multi-resolution grid structure
• Clusters of different resolutions
• Horizonal & vertical cluster boundaries
Ø Object space => grid cells
• Depends on #cells, easy to parallelize
Ø Statistical information of grid cells
• Incremental processing
Density-based Clustering
Ø Local clusters with high density
• DBSCAN: connected dense neighborhood
• DENCLUE: sum of local influence functions
Ø Key features
• Arbitrary cluster shape, noise tolerant
• Single scan, adjustable density parameters
DBSCAN
Ø Two key parameters
• ε-neighborhood: within radius ε of p
• MinPts: min #points in p’s ε-neighborhood
for p to be considered a core object
Ø Clustering
• Core objects, border objects
• Density-connected, density-reachable
DENCLUE
Ø Influence function
• Object’s impact in its neighborhood
Ø Overall density
• Sum of all objects’ influence function
Ø Density attractors
• Clusters correspond to local maxima
Types of Clustering Methods
Ø Partitioning methods
Ø Hierarchical methods
Ø Grid-based methods
Ø Density-based methods
Ø Probabilistic methods

You might also like