UNSUPERVISED LEARNING
(Clustering and K-Means Algorithm)
Outline of Presentation
❑Clustering
❑Types of Clustering
❑Examples and explanation of Clustering
❑K-Means Clustering
❑Working of K-Means Algorithm
❑K-Means Algorithm Examples
❑Hierarchical Clustering
❑Distance Based Clustering
Machine Learning
(Figure: the three branches of machine learning)
➢ Supervised Learning: task driven, with defined labels; predicts future values (Classification, Regression)
➢ Unsupervised Learning: data driven, with no defined labels (Clustering)
➢ Reinforcement Learning: learning from the mistakes
Supervised learning vs. Unsupervised learning
Supervised learning: discover patterns in the data that relate
data attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the
target attribute in future data instances.
Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in
them.
Clustering
❑ Clustering is a technique for finding similarity groups in data,
called clusters, i.e.,
❖ It groups data instances that are similar to (near) each other in
one cluster and data instances that are very different (far away)
from each other into different clusters.
❑ Clustering is often called an unsupervised learning task as
no class values denoting an a priori grouping of the data
instances are given, unlike in supervised learning.
❑ Due to historical reasons, clustering is often considered
synonymous with unsupervised learning.
❖ In fact, association rule mining is also unsupervised
What is Clustering?
“Clustering is the process of dividing a dataset into groups
consisting of similar data points”
• Points in the same group are as similar as possible.
• Points in different groups are as dissimilar as possible.
Examples of Clustering
(Figures: items arranged in a mall; a group of diners in a restaurant)
Illustration of Clustering
The data set has three natural groups of
data points, i.e., 3 natural clusters.
Example 1: group people of similar sizes together to
make “small”, “medium” and “large” T-shirts.
• Tailor-made for each person: too expensive
• One-size-fits-all: does not fit all.
Example 2: In marketing, segment customers
according to their similarities
• To do targeted marketing.
➢ Clustering has a long history and is used in almost every field, e.g., medicine,
psychology, botany, sociology, biology, archeology, marketing, insurance,
libraries, etc.
➢ In recent years, due to the rapid increase of online documents, text
clustering has become important.
Aspects of Clustering
➢ A clustering algorithm
▪ Partitional clustering
▪ Hierarchical clustering
▪ …
➢ A distance (similarity, or dissimilarity) function
➢ Clustering quality
▪ Inter-cluster distance maximized
▪ Intra-cluster distance minimized
➢ The quality of a clustering result depends on the algorithm, the
distance function, and the application.
What is Similarity?
• In the field of cluster analysis, similarity plays an important part.
• Now, we shall learn how similarity (alternatively judged as “dissimilarity”) between any two data points can be measured.
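As a small illustration (my own, not from the slides), the snippet below computes two common dissimilarity measures, Euclidean and Manhattan distance, between two feature vectors using NumPy:

```python
import numpy as np

# Two data points described by the same numeric features.
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])

# Euclidean distance: straight-line distance in feature space.
euclidean = np.linalg.norm(p - q)

# Manhattan distance: sum of absolute per-feature differences.
manhattan = np.abs(p - q).sum()

print(euclidean, manhattan)  # ≈3.606, 5.0
```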
Applications of clustering
• Recommendation systems
• Search results
• Banking / finance / insurance
• Grouping of photos
• Movie recommendation
• Retail stores
Applications of clustering (Face Clustering)
Types of CLUSTERING
❑ Exclusive Clustering
❑ Overlapping Clustering
❑ Hierarchical Clustering
Exclusive Clustering
• Hard clustering
• A data point/item belongs exclusively to one cluster
• For example: K-Means clustering
Overlapping Clustering
• Soft clustering
• Data points/items may belong to multiple clusters
• For example: Fuzzy C-Means clustering
Hierarchical Clustering
Methods of CLUSTERING
❑ Partitioning Method
❑ Hierarchical Method
❑ Density-Based Methods
Partitioning Methods
Partitioning a database D of n objects into a set of k clusters, such that the sum of
squared distances is minimized (where $c_i$ is the centroid or medoid of cluster $C_i$):

$E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2$

k-means (MacQueen ’67, Lloyd ’57/’82): Each cluster is represented by the
center of the cluster
k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw ’87):
Each cluster is represented by one of the objects in the cluster
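As a hedged illustration (names and data are my own, not from the slides), the snippet below evaluates this objective E for a given assignment of points to centroids:

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared distances of each point to its cluster centre (the objective E)."""
    return sum(np.sum((X[labels == i] - c) ** 2) for i, c in enumerate(centroids))

# Four 2-D points partitioned into k = 2 clusters.
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == i].mean(axis=0) for i in range(2)])
print(sse(X, labels, centroids))  # 2.25
```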
K-means Clustering
➢ K-Means is a clustering algorithm whose main goal is to group similar elements
or data-points into a cluster.
➢ “K” represents the number of clusters
Examples:
• Pile of dirty clothes
• Document classifier
K-means Clustering
(A sequence of slides, not reproduced here, illustrates the working of the algorithm step by step: choose k initial centroids, assign each data point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until the assignments no longer change.)
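Since the original slide animation cannot be reproduced here, below is a minimal sketch of the standard K-means loop (Lloyd’s algorithm) in Python; all names are illustrative, not taken from the slides:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```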
How will you find the value of k?
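The slide’s figure is not reproduced here. A common approach is the elbow method: run K-means for a range of k values, plot the within-cluster sum of squared distances (inertia) against k, and pick the k at the “elbow” where the curve flattens. A sketch, assuming scikit-learn and matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # illustrative data

inertias = []
ks = range(1, 10)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("inertia")
plt.title("Elbow method for choosing k")
plt.show()
```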
Summary of K-means algorithm
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 1: Load a colour image of tissue stained with hematoxylin and eosin (H&E)
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 2: Convert the image from RGB colour space to L*a*b*
colour space
• Unlike the RGB colour model, L*a*b* colour is
designed to approximate human vision.
• There is a complicated transformation between RGB
and L*a*b*.
(L*, a*, b*) = T(R, G, B)
(R, G, B) = T’(L*, a*, b*).
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 3: Undertake clustering analysis in the (a*, b*) colour space
with the K-means algorithm
• In the L*a*b* colour space, each pixel has a
feature vector: (L*, a*, b*).
• As a form of feature selection, the L* feature is discarded. As a
result, each pixel has a feature vector (a*, b*).
• Apply the K-means algorithm to the image in the
a*b* feature space with K = 3, chosen using
domain knowledge (see the sketch below).
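A hedged sketch of Steps 2–3 (and the labelling in Step 4) using scikit-image and scikit-learn; the filename and variable names are illustrative assumptions:

```python
from skimage import io, color
from sklearn.cluster import KMeans

# Step 2: load the RGB image and convert it to L*a*b* colour space.
rgb = io.imread("he_tissue.png")   # hypothetical H&E image file
lab = color.rgb2lab(rgb)

# Step 3: discard L* and keep the (a*, b*) features of each pixel.
ab = lab[:, :, 1:3].reshape(-1, 2)

# Cluster the pixels into K = 3 colour groups (domain knowledge).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ab)

# Step 4: label every pixel with its cluster index.
pixel_labels = kmeans.labels_.reshape(lab.shape[:2])
```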
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 4: Label every pixel in the
image using the results
from K-means clustering
(indicated by three
different grey levels)
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 5: Create images that segment the H&E image by colour
• Apply the label and the colour information of each pixel to obtain
separate colour images corresponding to the three clusters:
“blue” pixels, “white” pixels, “pink” pixels
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 6: Segment the nuclei into a
separate image with the L*
feature
• In cluster 1, there are dark and light
blue objects (pixels). The dark blue
objects (pixels) correspond to nuclei
(from the domain knowledge).
• The L* feature specifies the brightness
value of each colour.
• With a threshold for L*, we obtain
an image containing the nuclei only.
Problem of the K-Means Method
• The algorithm is only applicable if the mean is defined.
• For categorical data, k-modes is used: the centroid is represented by
the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
• Outliers are data points that are very far away from other
data points.
• Outliers could be errors in the data recording or some special
data points with very different values.
Problems with outliers
Problem of the K-Means Method
• The algorithm is sensitive to initial seeds.
Problem of the K-Means Method
• If we use different seeds: good results.
▪ There are some methods to help choose good seeds, as shown in the sketch below.
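One such method (my addition; the slide does not name it) is k-means++ seeding, which spreads the initial centroids apart. scikit-learn uses it by default and can also restart from several seeds and keep the best run:

```python
from sklearn.cluster import KMeans

# init="k-means++" spreads the initial seeds apart;
# n_init=10 restarts from 10 different seeds and keeps the best result.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)  # X: an (n_samples, n_features) array, assumed given
```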
Problem of the K-Means Method
• The k-means algorithm is not suitable for discovering clusters
that are not hyper-ellipsoids (or hyper-spheres).
Hierarchical CLUSTERING
• Uses a distance matrix as the clustering criterion. This method does not
require the number of clusters k as an input, but needs a termination
condition.
(Figure: agglomerative nesting (AGNES) proceeds from Step 0 to Step 4, merging a, b, c, d, e bottom-up into {a,b}, {d,e}, {c,d,e}, and finally {a,b,c,d,e}; divisive analysis (DIANA) runs the same steps top-down, from Step 4 back to Step 0.)
Types of hierarchical clustering
➢ Agglomerative (bottom-up) clustering: It builds the
dendrogram (tree) from the bottom level, and
▪ merges the most similar (or nearest) pair of clusters,
▪ stops when all the data points are merged into a single cluster
(i.e., the root cluster).
➢ Divisive (top-down) clustering: It starts with all data
points in one cluster, the root.
▪ Splits the root into a set of child clusters; each child cluster is
recursively divided further.
▪ Stops when only singleton clusters of individual data points
remain, i.e., each cluster contains only a single point.
An example: working of the algorithm
Hierarchical CLUSTERING
Decompose data objects into several levels of nested partitionings (a tree of
clusters), called a dendrogram.
A clustering of the data objects is obtained by cutting the dendrogram at the
desired level; each connected component then forms a cluster. A sketch of this workflow follows.
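A brief sketch of this cut-the-dendrogram workflow using SciPy (assuming it is available; the data and cut threshold are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])

# Build the dendrogram bottom-up (agglomerative, Ward linkage).
Z = linkage(X, method="ward")

# Cut the dendrogram at a chosen distance; each connected
# component below the cut becomes one cluster.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)  # e.g., [1 1 2 2 3]
```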
Density-Based CLUSTERING
• Clustering based on density (a local cluster criterion), such as density-
connected points
• Major features:
➢ Discovers clusters of arbitrary shape
➢ Handles noise
➢ One scan
➢ Needs density parameters as a termination condition
• Several interesting studies:
▪ DBSCAN: Ester, et al. (KDD’96)
▪ OPTICS: Ankerst, et al. (SIGMOD’99)
▪ DENCLUE: Hinneburg & D. Keim (KDD’98)
▪ CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based)
Density-Based CLUSTERING
• Two parameters: Epsilon (Eps) and Minimum Points (MinPts)
• Eps: maximum radius of the neighbourhood
• MinPts: minimum number of points in an Eps-neighbourhood
of that point
• $N_{Eps}(p) = \{q \in D \mid dist(p, q) \le Eps\}$
• Directly density-reachable: A point p is directly density-reachable
from a point q w.r.t. Eps, MinPts if
▪ p belongs to $N_{Eps}(q)$, and
▪ q satisfies the core point condition: $|N_{Eps}(q)| \ge MinPts$
(Figure: p in the neighbourhood of core point q, with MinPts = 5 and Eps = 1 cm)
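A tiny illustration (my own, hedged) of the Eps-neighbourhood and the core point condition:

```python
import numpy as np

def eps_neighbourhood(X, p, eps):
    """Indices of all points in X within distance eps of point p, i.e. N_Eps(p)."""
    return np.flatnonzero(np.linalg.norm(X - X[p], axis=1) <= eps)

def is_core_point(X, p, eps, min_pts):
    """Core point condition: |N_Eps(p)| >= MinPts."""
    return len(eps_neighbourhood(X, p, eps)) >= min_pts
```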
Density-Reachable and Density-Connected
• Density-reachable: A point p is density-reachable from a point q
w.r.t. Eps, MinPts if there is a chain of points $p_1, \dots, p_n$,
$p_1 = q$, $p_n = p$, such that $p_{i+1}$ is directly density-reachable
from $p_i$.
• Density-connected: A point p is density-connected to a point q
w.r.t. Eps, MinPts if there is a point o such that both p and q are
density-reachable from o w.r.t. Eps and MinPts.
(Figures: a chain q, p1, …, p for density-reachability; p and q both reachable from o for density-connectivity)
DBSCAN: Density-Based Spatial
Clustering of Applications with Noise
• Relies on a density-based notion of cluster: A cluster is defined as a
maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
(Figure: core, border, and outlier points, with Eps = 1 cm and MinPts = 5)
DBSCAN: The Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p, and DBSCAN
visits the next point of the database
• Continue the process until all of the points have been processed
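A compact, self-contained sketch of this procedure (a hedged illustration, not a reference implementation; the label conventions are my own):

```python
import numpy as np

def region_query(X, i, eps):
    # All points within eps of point i (its Eps-neighbourhood).
    return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

def dbscan(X, eps, min_pts):
    UNVISITED, NOISE = -2, -1
    labels = np.full(len(X), UNVISITED)
    cluster_id = 0
    for i in range(len(X)):
        if labels[i] != UNVISITED:
            continue                          # already processed
        neighbours = region_query(X, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE                 # may later become a border point
            continue
        labels[i] = cluster_id                # i is a core point: start a cluster
        seeds = list(neighbours)
        while seeds:                          # expand via density-reachability
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id        # border point joins the cluster
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(X, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbours)
        cluster_id += 1
    return labels                             # label -1 marks noise/outliers
```

Points left with label -1 after the scan are the outliers; every other point carries the id of the cluster it is density-connected to.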