K-Means Clustering Algorithm in Machine Learning
K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into
clusters. K is the number of clusters, fixed in advance, that the algorithm creates: for K = 2 there
will be two clusters, for K = 3 there will be three clusters, and so on.
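The procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `k_means`, its parameters, the rule of keeping an empty cluster's old center, and the choice to stop once no point changes cluster are all assumptions made for this sketch. The distance is passed in as a function so the same loop can use Manhattan distance, as in the example that follows. One caveat: the mean is the natural update for squared Euclidean distance, and the variant that pairs Manhattan distance with a coordinate-wise median update is usually called K-medians; this sketch follows the document and uses the mean.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def manhattan(p: Point, q: Point) -> float:
    """Sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def k_means(
    points: Sequence[Point],
    centers: Sequence[Point],
    distance: Callable[[Point, Point], float] = manhattan,
    max_iter: int = 100,
) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and mean-update steps until assignments stop changing."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (ties go to the lower index).
        new_labels = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centers are final
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty (assumption)
                centers[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, labels
```

Here K is simply the number of initial centers passed in; for example, `k_means(points, [c1, c2])` produces two clusters.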
Manhattan distance
Problem
Cluster the following eight points (with (x, y) representing locations) into three clusters: A1(2, 10),
A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9). The initial cluster centers are
A1(2, 10), A4(5, 8) and A7(1, 2). Use the K-Means algorithm with Manhattan distance to find the
three cluster centers after the second iteration.
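Throughout the solution, the distance between a point and a cluster center is the Manhattan (city-block) distance: the sum of the absolute differences of the coordinates. As a worked instance, the Iteration-01 entry for A2(2, 5) and the Cluster-03 center (1, 2) is:

```latex
d\big((x_1, y_1), (x_2, y_2)\big) = |x_1 - x_2| + |y_1 - y_2|

d\big(A_2, (1, 2)\big) = |2 - 1| + |5 - 2| = 1 + 3 = 4
```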
Solution
Iteration-01
Distance from each point to each cluster center:
Point     | C1 (2, 10) | C2 (5, 8) | C3 (1, 2) | Belongs to
A1(2, 10) | 0          | 5         | 9         | C1
A2(2, 5)  | 5          | 6         | 4         | C3
A3(8, 4)  | 12         | 7         | 9         | C2
A4(5, 8)  | 5          | 0         | 10        | C2
A5(7, 5)  | 10         | 5         | 9         | C2
A6(6, 4)  | 10         | 5         | 7         | C2
A7(1, 2)  | 9          | 10        | 0         | C3
A8(4, 9)  | 3          | 2         | 10        | C2
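As a check on the Iteration-01 table, the minimal sketch below recomputes every Manhattan distance and picks the nearest center for each point. The names `POINTS`, `CENTERS`, `manhattan`, and `assign` are illustrative choices for this sketch, not part of the original solution.

```python
# Points and initial centers from the problem statement.
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
CENTERS = {"C1": (2, 10), "C2": (5, 8), "C3": (1, 2)}


def manhattan(p, q):
    """Manhattan distance |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(points, centers):
    """Return {point name: (distances to each center, nearest center name)}."""
    table = {}
    for name, p in points.items():
        dists = {c: manhattan(p, q) for c, q in centers.items()}
        # min over the dict keys, ranked by distance; first listed wins a tie
        table[name] = (dists, min(dists, key=dists.get))
    return table


for name, (dists, nearest) in assign(POINTS, CENTERS).items():
    print(name, *dists.values(), nearest)
```

Each printed line mirrors one row of the table, for example `A1 0 5 9 C1`.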
From this table, the new clusters are:
Cluster-01: A1(2, 10)
Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9)
Cluster-03: A2(2, 5), A7(1, 2)
The new center of each cluster is the mean of all the points it contains.
For Cluster-01: A1(2, 10) is the only point in Cluster-01, so its center remains (2, 10).
For Cluster-02: Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03: Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
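The update step is a coordinate-wise mean. The minimal sketch below, with a hypothetical helper `mean_center`, recomputes all three centers from the Iteration-01 clusters; a single-point cluster such as Cluster-01 simply returns that point.

```python
def mean_center(points):
    """Coordinate-wise mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Clusters obtained from the Iteration-01 assignments.
clusters = {
    "C1": [(2, 10)],
    "C2": [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    "C3": [(2, 5), (1, 2)],
}
for name, members in clusters.items():
    print(name, mean_center(members))
```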
Iteration-02
Distance from each point to each cluster center:
Point     | C1 (2, 10) | C2 (6, 6) | C3 (1.5, 3.5) | Belongs to
A1(2, 10) | 0          | 8         | 7             | C1
A2(2, 5)  | 5          | 5         | 2             | C3
A3(8, 4)  | 12         | 4         | 7             | C2
A4(5, 8)  | 5          | 3         | 8             | C2
A5(7, 5)  | 10         | 2         | 7             | C2
A6(6, 4)  | 10         | 2         | 5             | C2
A7(1, 2)  | 9          | 9         | 2             | C3
A8(4, 9)  | 3          | 5         | 8             | C1
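Comparing the Iteration-02 assignments with those of Iteration-01 shows which points actually moved. In the minimal sketch below (names `PREVIOUS`, `nearest`, and `moved` are illustrative), only A8 changes cluster, from C2 to C1.

```python
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
# Centers after Iteration-01.
CENTERS = {"C1": (2, 10), "C2": (6, 6), "C3": (1.5, 3.5)}
# Assignments from the Iteration-01 table.
PREVIOUS = {"A1": "C1", "A2": "C3", "A3": "C2", "A4": "C2",
            "A5": "C2", "A6": "C2", "A7": "C3", "A8": "C2"}


def manhattan(p, q):
    """Manhattan distance |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest(p, centers):
    """Name of the closest center; min() keeps the first one listed on a tie."""
    return min(centers, key=lambda c: manhattan(p, centers[c]))


current = {name: nearest(p, CENTERS) for name, p in POINTS.items()}
moved = [name for name in POINTS if current[name] != PREVIOUS[name]]
print(current)
print("moved:", moved)
```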
From this table, the new clusters are:
Cluster-01: A1(2, 10), A8(4, 9)
Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4)
Cluster-03: A2(2, 5), A7(1, 2)
The new center of each cluster is the mean of all the points it contains.
For Cluster-01: Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2)
= (3, 9.5)
For Cluster-02: Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4)
= (6.5, 5.25)
For Cluster-03: Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-02.
After the second iteration, the centers of the three clusters are:
• C1(3, 9.5)
• C2(6.5, 5.25)
• C3(1.5, 3.5)
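Putting the assignment and update steps together, the minimal sketch below runs exactly two rounds from the initial centers A1, A4 and A7, which is what the problem asks for. The helper name `one_iteration` and the rule of keeping an empty cluster's old center are assumptions of this sketch; no cluster becomes empty in this example.

```python
POINTS = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
INITIAL_CENTERS = [(2, 10), (5, 8), (1, 2)]  # A1, A4, A7


def manhattan(p, q):
    """Manhattan distance |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def one_iteration(points, centers):
    """One K-Means round: assign by Manhattan distance, then recompute means."""
    groups = [[] for _ in centers]
    for p in points:
        k = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        groups[k].append(p)
    new_centers = []
    for old, members in zip(centers, groups):
        if not members:
            new_centers.append(old)  # empty cluster keeps its center (assumption)
            continue
        n = len(members)
        new_centers.append((sum(x for x, _ in members) / n,
                            sum(y for _, y in members) / n))
    return new_centers


centers = INITIAL_CENTERS
for _ in range(2):
    centers = one_iteration(POINTS, centers)
print(centers)  # [(3.0, 9.5), (6.5, 5.25), (1.5, 3.5)]
```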
Iteration-03
Distance from each point to each cluster center:
Point     | C1 (3, 9.5) | C2 (6.5, 5.25) | C3 (1.5, 3.5) | Belongs to
A1(2, 10) | 1.5         | 9.25           | 7             | C1
A2(2, 5)  | 5.5         | 4.75           | 2             | C3
A3(8, 4)  | 10.5        | 2.75           | 7             | C2
A4(5, 8)  | 3.5         | 4.25           | 8             | C1
A5(7, 5)  | 8.5         | 0.75           | 7             | C2
A6(6, 4)  | 8.5         | 1.75           | 5             | C2
A7(1, 2)  | 9.5         | 8.75           | 2             | C3
A8(4, 9)  | 1.5         | 6.25           | 8             | C1
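The Iteration-03 table can also be computed as a plain distance matrix with one row per point and one column per center. In this minimal sketch (the names `distance` and `labels` are illustrative), the key change is visible in row A4: 3.5 to C1 is now smaller than 4.25 to C2, so A4 moves to Cluster-01.

```python
POINTS = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
CENTERS = [(3, 9.5), (6.5, 5.25), (1.5, 3.5)]  # after Iteration-02

# distance[i][k] = Manhattan distance from point i to center k
distance = [
    [abs(px - cx) + abs(py - cy) for (cx, cy) in CENTERS]
    for (px, py) in POINTS
]
# Cluster numbers 1..3 to match C1..C3 in the table (first minimum wins a tie).
labels = [row.index(min(row)) + 1 for row in distance]

for i, (row, lab) in enumerate(zip(distance, labels), start=1):
    print(f"A{i}", row, f"C{lab}")
```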
From this table, the new clusters are:
Cluster-01: A1(2, 10), A4(5, 8), A8(4, 9)
Cluster-02: A3(8, 4), A5(7, 5), A6(6, 4)
Cluster-03: A2(2, 5), A7(1, 2)
The new center of each cluster is the mean of all the points it contains.
For Cluster-01: Center of Cluster-01
= ((2 + 5 + 4)/3, (10 + 8 + 9)/3)
= (3.67, 9)
For Cluster-02: Center of Cluster-02
= ((8 + 7 + 6)/3, (4 + 5 + 4)/3)
= (7, 4.33)
For Cluster-03: Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-03.
After the third iteration, the centers of the three clusters are:
• C1(3.67, 9)
• C2(7, 4.33)
• C3(1.5, 3.5)
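The values 3.67 and 4.33 are rounded. The minimal sketch below recomputes the Iteration-03 centers with Python's `fractions.Fraction`, showing that the exact centers are (11/3, 9), (7, 13/3) and (3/2, 7/2). The helper name `exact_mean` is illustrative.

```python
from fractions import Fraction

# Clusters obtained from the Iteration-03 assignments.
clusters = {
    "C1": [(2, 10), (5, 8), (4, 9)],   # A1, A4, A8
    "C2": [(8, 4), (7, 5), (6, 4)],    # A3, A5, A6
    "C3": [(2, 5), (1, 2)],            # A2, A7
}


def exact_mean(points):
    """Coordinate-wise mean as exact fractions (no rounding)."""
    n = len(points)
    return (Fraction(sum(x for x, _ in points), n),
            Fraction(sum(y for _, y in points), n))


centers = {name: exact_mean(m) for name, m in clusters.items()}
for name, (x, y) in centers.items():
    print(f"{name}: ({x}, {y}) approx ({float(x):.2f}, {float(y):.2f})")
```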
Iteration-04
Distance from each point to each cluster center:
Point     | C1 (3.67, 9) | C2 (7, 4.33) | C3 (1.5, 3.5) | Belongs to
A1(2, 10) | 2.67         | 10.67        | 7             | C1
A2(2, 5)  | 5.67         | 5.67         | 2             | C3
A3(8, 4)  | 9.33         | 1.33         | 7             | C2
A4(5, 8)  | 2.33         | 5.67         | 8             | C1
A5(7, 5)  | 7.33         | 0.67         | 7             | C2
A6(6, 4)  | 7.33         | 1.33         | 5             | C2
A7(1, 2)  | 9.67         | 8.33         | 2             | C3
A8(4, 9)  | 0.33         | 7.67         | 8             | C1
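In the Iteration-04 table every point lands in the same cluster as in Iteration-03, so recomputing the means would give the same centers again; this is the usual stopping condition for K-Means. The minimal sketch below checks that using exact fractions for the centers, so rounding to 3.67 and 4.33 cannot affect any comparison. The names `ITER3_LABELS`, `iter4_labels`, and `converged` are illustrative.

```python
from fractions import Fraction

POINTS = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
# Centers after Iteration-03, kept exact instead of rounded.
CENTERS = [(Fraction(11, 3), Fraction(9)), (Fraction(7), Fraction(13, 3)),
           (Fraction(3, 2), Fraction(7, 2))]
ITER3_LABELS = [0, 2, 1, 0, 1, 1, 2, 0]  # C1, C3, C2, C1, C2, C2, C3, C1


def manhattan(p, q):
    """Manhattan distance |x1 - x2| + |y1 - y2| (works with Fraction too)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


iter4_labels = [
    min(range(len(CENTERS)), key=lambda k: manhattan(p, CENTERS[k]))
    for p in POINTS
]
converged = iter4_labels == ITER3_LABELS
print("converged:", converged)
```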