SE-807/CS-871
Machine Learning
Prof Dr. Hammad Afzal
hammad.afzal@mcs.edu.pk
Data and Text Processing Lab
www.codteem.com
Agenda
• Unsupervised Learning
– K-Means Clustering
– Agglomerative Clustering
Unsupervised Learning
CLUSTERING
Clustering is the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.
Clustering is unsupervised classification.
CLUSTERING
There is no explicit teacher; the system forms clusters, or "natural groupings", from the structure in the input patterns.
CLUSTERING
• Data WITHOUT classes or labels: x1, x2, x3, ..., xn, where each xi ∈ R^d
• Deals with finding a structure in a collection of unlabeled data.
• The process of organizing objects into groups whose members are similar in some way.
• A cluster is therefore a collection of objects which are "similar" to one another and "dissimilar" to the objects belonging to other clusters.
CLUSTERING
• In this case we can easily identify the 4 clusters into which the data can be divided.
• The similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance.
[Figure: scatter plot of 2-D points forming 4 well-separated clusters]
Types of Clustering
Hierarchical algorithms
These find successive clusters using previously established clusters.
1. Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
2. Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
Types of Clustering
• Partitional clustering
– Constructs a partition of the data set to produce several clusters at once
– The process is repeated iteratively until a termination condition is met
– Examples:
▪ K-means clustering
▪ Fuzzy c-means clustering
K-means Clustering
1. Choose the number (K) of clusters and randomly select the centroids of each cluster.
2. For each data point:
   I. Calculate the distance from the data point to each cluster centroid.
   II. Assign the data point to the closest cluster.
3. Recompute the centroid of each cluster.
4. Repeat steps 2 and 3 until there is no further change in the assignment of data points (or in the centroids).
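These steps translate almost line-for-line into NumPy. The following is a minimal sketch, not a definitive implementation: it assumes Euclidean distance and that no cluster ever becomes empty.

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    # X is an (n, d) array of n data points with d attributes each.
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: distance from every point to every centroid,
        # then assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members
        # (assumes every cluster keeps at least one member).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids (and hence assignments) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids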
K MEANS – Example 2
– Suppose we have 4 medicines, each with two attributes (weight and pH index).
– Our goal is to group these objects into K = 2 clusters of medicines.

Medicine   Weight   pH-Index
A          1        1
B          2        1
C          4        3
D          5        4
K MEANS – Example 2
– Compute the distance between all samples and the K centroids, taking c1 = A = (1, 1) and c2 = B = (2, 1). For example, for D = (5, 4):
d(D, c1) = √((5 − 1)² + (4 − 1)²) = 5
d(D, c2) = √((5 − 2)² + (4 − 1)²) = 4.24
K MEANS – Example 2
– Assign each sample to its closest cluster.
– An element in a row of the Group matrix is 1 if and only if the object is assigned to that group.
K MEANS – Example 2
– Re-calculate the K centroids: knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships.
c1 = (1, 1)
c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3) = (3.67, 2.67)
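As a hedged sketch, this first iteration can be reproduced in a few lines of NumPy (the array names are illustrative):

import numpy as np

# Medicines A-D as (weight, pH-index) points; initial centroids c1 = A, c2 = B.
X = np.array([[1., 1.], [2., 1.], [4., 3.], [5., 4.]])
centroids = np.array([[1., 1.], [2., 1.]])

# Distance of every point to every centroid, e.g. d(D, c2) = 4.24.
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)   # A -> c1;  B, C, D -> c2

# New centroids from the memberships: c1 = (1, 1), c2 = (3.67, 2.67).
centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
print(dists.round(2), labels, centroids.round(2))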
K MEANS – Example 2
• Repeat the above steps: compute the distance of all objects to the new centroids.
K MEANS – Example 2
Assign the membership to objects
K MEANS – Example 2
Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships:
c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)
K MEANS – Example 2
• We obtain G2 = G1: comparing the grouping of the last iteration with this iteration reveals that the objects do not move between groups anymore.
• Thus, the k-means clustering has converged and no more iterations are needed.
Kmeans - Examples
Data points: the RGB values of pixels. This can be used for image segmentation.
D. Comaniciu and P. Meer, Robust Analysis of Feature Spaces: Color Image Segmentation, 1997.
Kmeans - Examples
Extraction of text in degraded documents
Extraction of text in degraded documents
[Figure: original image vs. K-means result with k = 3]
Kmeans - Examples
[Figure: original image vs. K-means results with K = 5 and K = 11]
Kmeans - Examples
• Quantization of colors
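A sketch of how color quantization is commonly done with scikit-learn (the file name and k = 8 are illustrative assumptions): cluster the RGB values of all pixels, then repaint each pixel with its cluster centroid.

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.jpg"))      # placeholder file name
pixels = img.reshape(-1, 3).astype(float)      # one RGB sample per pixel

km = KMeans(n_clusters=8, n_init=10).fit(pixels)
quantized = km.cluster_centers_[km.labels_]    # replace pixels by their centroid
out = quantized.reshape(img.shape).astype(np.uint8)
Image.fromarray(out).save("photo_8colors.png")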
Hierarchical Clustering
Hierarchical clustering
Agglomerative and divisive clustering on the data set {a, b, c, d, e}:
• Agglomerative (step 0 → step 4): start from the singletons {a}, {b}, {c}, {d}, {e}; merge a and b into ab, d and e into de, then c and de into cde, and finally ab and cde into abcde.
• Divisive (step 4 → step 0): the same tree read in reverse, starting from abcde and splitting down to the single elements.
Agglomerative clustering
1. Convert object attributes to a distance matrix.
2. Set each object as a cluster (thus, if we have N objects, we will have N clusters at the beginning).
3. Repeat until the number of clusters is one (or a known number of clusters), as in the sketch below:
   a. Merge the two closest clusters.
   b. Update the distance matrix.
[Figure: points d1–d5 merged step by step: first {d1, d2} and {d4, d5}, then {d3, d4, d5}, and finally a single cluster]
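A naive sketch of this loop under single linkage (O(n³), purely for illustration; in practice one would use scipy.cluster.hierarchy, shown later):

import numpy as np

def agglomerative_single_link(X, n_clusters=1):
    # Step 1: distance matrix; Step 2: each object starts as its own cluster.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]
    # Step 3: repeat until the desired number of clusters remains.
    while len(clusters) > n_clusters:
        # 3a: find the two closest clusters (single link: smallest
        # point-to-point distance between them) ...
        best = (np.inf, 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(D[p, q] for p in clusters[i] for q in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        # ... and 3b: merge them, which implicitly updates all cluster distances.
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters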
Starting Situation
• Start with clusters of individual points and a distance/proximity matrix
[Figure: empty distance/proximity matrix with one row and one column per point p1, p2, p3, …]
Intermediate situation
• After some merging steps, we have some clusters
[Figure: clusters C1–C5 in the data and the corresponding 5 × 5 distance matrix]
Intermediate situation
• How do we compare two clusters?
[Figure: clusters C1–C5]
Inter cluster distance measures
Similarity?
• Single Link
• Average Link
• Complete Link
• Distance between centroids
Intermediate situation
• We want to merge the two closest clusters (C2 and C5) and update the distance matrix.
[Figure: clusters C1–C5 and the distance matrix before merging C2 and C5]
Single link
• Smallest distance between an element in one cluster and an element in the other:
D(ci, cj) = min { D(x, y) : x ∈ ci, y ∈ cj }
Complete link
• Largest distance between an element in one cluster and an element in the other:
D(ci, cj) = max { D(x, y) : x ∈ ci, y ∈ cj }
Average Link
• Average distance between an element in one cluster and an element in the other:
D(ci, cj) = avg { D(x, y) : x ∈ ci, y ∈ cj }
Distance between centroids
• Distance between the centroids of two clusters
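These four measures differ only in how the pairwise distances are reduced. A small sketch, assuming scipy is available (the cluster arguments are (m, d) arrays of points):

import numpy as np
from scipy.spatial.distance import cdist

def cluster_distance(ci, cj, link="single"):
    D = cdist(ci, cj)            # all pairwise distances D(x, y)
    if link == "single":         # smallest pairwise distance
        return D.min()
    if link == "complete":       # largest pairwise distance
        return D.max()
    if link == "average":        # mean over all pairs
        return D.mean()
    # otherwise: distance between the two centroids
    return np.linalg.norm(ci.mean(axis=0) - cj.mean(axis=0))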
After Merging
• Update the distance matrix: only the row and column of the merged cluster C2 ∪ C5 need to be recomputed (the entries marked "?"):

Dist      C1    C2 ∪ C5   C3    C4
C1        0.00  ?         ·     ·
C2 ∪ C5   ?     0.00      ?     ?
C3        ·     ?         0.00  ·
C4        ·     ?         ·     0.00

(· = entries carried over unchanged from the previous matrix)
Agglomerative Clustering - Example
Data matrix:
      X1    X2
A     1     1
B     1.5   1.5
C     5     5
D     3     4
E     4     4
F     3     3.5

Euclidean distance, e.g. dAB = √((1 − 1.5)² + (1 − 1.5)²) = 0.707.

Distance matrix:
Dist  A     B     C     D     E     F
A     0.00  0.71  5.66  3.61  4.24  3.20
B     0.71  0.00  4.95  2.92  3.54  2.50
C     5.66  4.95  0.00  2.24  1.41  2.50
D     3.61  2.92  2.24  0.00  1.00  0.50
E     4.24  3.54  1.41  1.00  0.00  1.12
F     3.20  2.50  2.50  0.50  1.12  0.00
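This matrix can be checked with scipy (a sketch; pdist returns the condensed pairwise distances and squareform expands them to the full matrix):

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Points A-F from the data matrix above.
X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])
D = squareform(pdist(X))        # 6 x 6 Euclidean distance matrix
print(D.round(2))               # e.g. D[0, 1] = 0.71, D[3, 5] = 0.50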
Merge two closest clusters
Agglomerative Clustering - Example
Find the two closest clusters in the distance matrix above: D and F at distance 0.50. Merge them into a single cluster (D, F).
Update Distance Matrix
Agglomerative Clustering - Example
After the merge, the row and column for (D, F) must be recomputed:

Dist  A     B     C     D,F   E
A     0.00  0.71  5.66  ?     4.24
B     0.71  0.00  4.95  ?     3.54
C     5.66  4.95  0.00  ?     1.41
D,F   ?     ?     ?     0.00  ?
E     4.24  3.54  1.41  ?     0.00
Update Distance Matrix
Agglomerative Clustering - Example
Minimum distance – single linkage:
D(D,F)→A = min(dDA, dFA) = min(3.61, 3.20) = 3.20
D(D,F)→B = min(dDB, dFB) = min(2.92, 2.50) = 2.50
D(D,F)→C = min(dDC, dFC) = min(2.24, 2.50) = 2.24
D(D,F)→E = min(dDE, dFE) = min(1.00, 1.12) = 1.00

Dist  A     B     C     D,F   E
A     0.00  0.71  5.66  3.20  4.24
B     0.71  0.00  4.95  2.50  3.54
C     5.66  4.95  0.00  2.24  1.41
D,F   3.20  2.50  2.24  0.00  1.00
E     4.24  3.54  1.41  1.00  0.00
Merge two closest clusters
Agglomerative Clustering - Example
The closest pair is now A and B at distance 0.71; merge them into (A, B):

Dist  A,B   C     D,F   E
A,B   0.00  ?     ?     ?
C     ?     0.00  2.24  1.41
D,F   ?     2.24  0.00  1.00
E     ?     1.41  1.00  0.00
Update Distance Matrix
Agglomerative Clustering - Example
D(A,B)→C = min(dCA, dCB) = min(5.66, 4.95) = 4.95
D(A,B)→(D,F) = min(dDA, dDB, dFA, dFB) = min(3.61, 2.92, 3.20, 2.50) = 2.50
D(A,B)→E = min(dAE, dBE) = min(4.24, 3.54) = 3.54

Dist  A,B   C     D,F   E
A,B   0.00  4.95  2.50  3.54
C     4.95  0.00  2.24  1.41
D,F   2.50  2.24  0.00  1.00
E     3.54  1.41  1.00  0.00
Merge two closest clusters / Update Distance Matrix
Agglomerative Clustering - Example
The closest pair is now (D, F) and E at distance 1.00; merge them into ((D, F), E) and update, e.g. D((D,F),E)→C = min(2.24, 1.41) = 1.41:

Dist     (A,B)  C     (D,F),E
(A,B)    0.00   4.95  2.50
C        4.95   0.00  1.41
(D,F),E  2.50   1.41  0.00
Merge two closest clusters / Update Distance Matrix
Agglomerative Clustering - Example
The closest pair is now ((D, F), E) and C at distance 1.41; merging them leaves two clusters:

Dist         (A,B)  ((D,F),E),C
(A,B)        0.00   2.50
((D,F),E),C  2.50   0.00
Final Result
Agglomerative Clustering - Example
The final merge joins (((D, F), E), C) with (A, B) at distance 2.50, leaving a single cluster containing all six points of the data matrix.
[Figure: the clustered points A–F]
Dendrogram Representation
Agglomerative Clustering - Example
1. In the beginning we have 6 clusters: A, B, C, D, E and F.
2. We merge clusters D and F into (D, F) at distance 0.50.
3. We merge clusters A and B into (A, B) at distance 0.71.
4. We merge cluster E and (D, F) into ((D, F), E) at distance 1.00.
5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41.
6. We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50.
7. The last cluster contains all the objects, which concludes the computation.
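This whole sequence of merges is what scipy.cluster.hierarchy computes in one call; a sketch that reproduces the dendrogram above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])
Z = linkage(X, method="single")   # list of merges, e.g. (D, F) at height 0.50
dendrogram(Z, labels=["A", "B", "C", "D", "E", "F"])
plt.show()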
Single Link Clustering
[Figure: nested clusters of six points and the corresponding single-link dendrogram]
Complete Link Clustering
[Figure: nested clusters of six points and the corresponding complete-link dendrogram]
Average Link Clustering
[Figure: nested clusters of six points and the corresponding average-link dendrogram]
Comparison
[Figure: the same six points clustered with single link, complete link, and average link, side by side]
Agglomerative Clustering
• Where to cut the tree?
[Figure: cutting the dendrogram at different heights gives a 3-cluster model or a 2-cluster model]
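One common answer is to cut for a fixed number of clusters; a sketch with scipy's fcluster on the same six points:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])
Z = linkage(X, method="single")
labels_3 = fcluster(Z, t=3, criterion="maxclust")   # 3-cluster model
labels_2 = fcluster(Z, t=2, criterion="maxclust")   # 2-cluster model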
Thank You