K Means Clustering Algorithm in Machine Learning.pdf
K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset
into K clusters. K is the number of clusters and is chosen before the algorithm runs:
K=2 produces two clusters, K=3 produces three clusters, and so on.
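Manhattan distance
The example below measures distance with the Manhattan metric: the distance between
(x1, y1) and (x2, y2) is |x1 - x2| + |y1 - y2|.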
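As a minimal sketch of the metric used throughout this example, the Manhattan distance between two points is the sum of the absolute coordinate differences. The function name `manhattan` is an illustrative choice, not something the original defines.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Distance from A2(2, 5) to the initial Cluster-02 center (5, 8): |2-5| + |5-8|
print(manhattan((2, 5), (5, 8)))  # 6
```

The printed value matches the A2 entry for Cluster-02 in the Iteration-01 table.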
Problem
Cluster the following eight points, where (x, y) represents a location, into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9).
The initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2). Use the K-Means algorithm
to find the three cluster centers after the second iteration.
Solution
Iteration-01
Given Points   Distance from C1 (2, 10)      Distance from C2 (5, 8)       Distance from C3 (1, 2)       Point belongs to Cluster
A1(2, 10)      0                             5                             9                             C1
A2(2, 5)       5                             6                             4                             C3
A3(8, 4)       12                            7                             9                             C2
A4(5, 8)       5                             0                             10                            C2
A5(7, 5)       10                            5                             9                             C2
A6(6, 4)       10                            5                             7                             C2
A7(1, 2)       9                             10                            0                             C3
A8(4, 9)       3                             2                             10                            C2
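The assignment step behind the table above can be sketched as follows. This is an illustrative sketch, not the author's code: the names `points`, `centers`, `manhattan` and `assign` are assumptions, and cluster indices are 0-based internally (0 is Cluster-01).

```python
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
centers = [(2, 10), (5, 8), (1, 2)]  # initial centers: A1, A4, A7


def manhattan(p, q):
    """Manhattan distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(points, centers):
    """Map each point name to the 0-based index of its nearest center."""
    labels = {}
    for name, p in points.items():
        dists = [manhattan(p, c) for c in centers]
        labels[name] = dists.index(min(dists))  # first minimum wins on ties
    return labels


labels = assign(points, centers)
for name, idx in labels.items():
    print(f"{name} -> C{idx + 1}")
```

No point in this iteration is equidistant from its two nearest centers, so the tie-breaking rule does not affect the result here.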
From here, the new clusters are:
Cluster-01: A1(2, 10)
Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9)
Cluster-03: A2(2, 5), A7(1, 2)
Each new cluster center is computed by taking the mean of all the points in that cluster.
For Cluster-01: Cluster-01 contains only the point A1(2, 10), so its center remains (2, 10).
For Cluster-02: Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03: Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
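The update step above is a coordinate-wise mean. A minimal sketch, assuming the Iteration-01 cluster memberships listed earlier; `mean_center` and `clusters` are illustrative names:

```python
def mean_center(cluster):
    """Coordinate-wise mean of a non-empty list of (x, y) points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


# Cluster memberships after the Iteration-01 assignment
clusters = {
    "C1": [(2, 10)],
    "C2": [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    "C3": [(2, 5), (1, 2)],
}

new_centers = {name: mean_center(pts) for name, pts in clusters.items()}
print(new_centers)
# {'C1': (2.0, 10.0), 'C2': (6.0, 6.0), 'C3': (1.5, 3.5)}
```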
Iteration-02
Given Points   Distance from C1 (2, 10)      Distance from C2 (6, 6)       Distance from C3 (1.5, 3.5)   Point belongs to Cluster
A1(2, 10)      0                             8                             7                             C1
A2(2, 5)       5                             5                             2                             C3
A3(8, 4)       12                            4                             7                             C2
A4(5, 8)       5                             3                             8                             C2
A5(7, 5)       10                            2                             7                             C2
A6(6, 4)       10                            2                             5                             C2
A7(1, 2)       9                             9                             2                             C3
A8(4, 9)       3                             5                             8                             C1
From here, the new clusters are:
Cluster-01: A1(2, 10), A8(4, 9)
Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4)
Cluster-03: A2(2, 5), A7(1, 2)
Each new cluster center is computed by taking the mean of all the points in that cluster.
For Cluster-01: Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2)
= (3, 9.5)
For Cluster-02: Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4)
= (6.5, 5.25)
For Cluster-03: Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-02.
After the second iteration, the centers of the three clusters are:
• C1(3, 9.5)
• C2(6.5, 5.25)
• C3(1.5, 3.5)
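The two iterations worked through so far can be reproduced with a short sketch that alternates the assignment and update steps. The helper names (`POINTS`, `manhattan`, `kmeans_step`) are illustrative, and keeping the old center for an empty cluster is an assumption the original does not address (no cluster becomes empty in this example).

```python
POINTS = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]


def manhattan(p, q):
    """Manhattan distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def kmeans_step(points, centers):
    """One iteration: assign every point to its nearest center, then re-average."""
    groups = [[] for _ in centers]
    for p in points:
        dists = [manhattan(p, c) for c in centers]
        groups[dists.index(min(dists))].append(p)
    new_centers = []
    for group, old in zip(groups, centers):
        if group:
            n = len(group)
            new_centers.append((sum(x for x, _ in group) / n,
                                sum(y for _, y in group) / n))
        else:
            new_centers.append(old)  # assumption: an empty cluster keeps its center
    return new_centers


centers = [(2, 10), (5, 8), (1, 2)]  # initial centers: A1, A4, A7
for _ in range(2):
    centers = kmeans_step(POINTS, centers)
print(centers)  # [(3.0, 9.5), (6.5, 5.25), (1.5, 3.5)]
```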
Iteration-03
Given Points   Distance from C1 (3, 9.5)     Distance from C2 (6.5, 5.25)  Distance from C3 (1.5, 3.5)   Point belongs to Cluster
A1(2, 10)      1.5                           9.25                          7                             C1
A2(2, 5)       5.5                           4.75                          2                             C3
A3(8, 4)       10.5                          2.75                          7                             C2
A4(5, 8)       3.5                           4.25                          8                             C1
A5(7, 5)       8.5                           0.75                          7                             C2
A6(6, 4)       8.5                           1.75                          5                             C2
A7(1, 2)       9.5                           8.75                          2                             C3
A8(4, 9)       1.5                           6.25                          8                             C1
From here, the new clusters are:
Cluster-01: A1(2, 10), A4(5, 8), A8(4, 9)
Cluster-02: A3(8, 4), A5(7, 5), A6(6, 4)
Cluster-03: A2(2, 5), A7(1, 2)
Each new cluster center is computed by taking the mean of all the points in that cluster.
For Cluster-01: Center of Cluster-01
= ((2 + 5 + 4)/3, (10 + 8 + 9)/3)
= (11/3, 9) ≈ (3.67, 9)
For Cluster-02: Center of Cluster-02
= ((8 + 7 + 6)/3, (4 + 5 + 4)/3)
= (7, 13/3) ≈ (7, 4.33)
For Cluster-03: Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-03.
After the third iteration, the centers of the three clusters are:
• C1(3.67, 9)
• C2(7, 4.33)
• C3(1.5, 3.5)
Iteration-04
Given Points   Distance from C1 (3.67, 9)    Distance from C2 (7, 4.33)    Distance from C3 (1.5, 3.5)   Point belongs to Cluster
A1(2, 10)      2.67                          10.67                         7                             C1
A2(2, 5)       5.67                          5.67                          2                             C3
A3(8, 4)       9.33                          1.33                          7                             C2
A4(5, 8)       2.33                          5.67                          8                             C1
A5(7, 5)       7.33                          0.67                          7                             C2
A6(6, 4)       7.33                          1.33                          5                             C2
A7(1, 2)       9.67                          8.33                          2                             C3
A8(4, 9)       0.33                          7.67                          8                             C1
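From here, the clusters are:
Cluster-01: A1(2, 10), A4(5, 8), A8(4, 9)
Cluster-02: A3(8, 4), A5(7, 5), A6(6, 4)
Cluster-03: A2(2, 5), A7(1, 2)
These assignments are identical to those of Iteration-03, so the cluster centers no longer
change and the algorithm has converged.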
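The whole procedure can be sketched as a loop that stops once the assignments repeat, which is what happens in Iteration-04. This is a sketch under stated assumptions rather than a definitive implementation: the names `kmeans` and `max_iter` are illustrative, the stopping rule (unchanged assignments) is one common choice, and an empty cluster would simply keep its previous center.

```python
POINTS = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]


def manhattan(p, q):
    """Manhattan distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def kmeans(points, centers, max_iter=100):
    """Repeat assignment and mean update until the assignments stop changing.

    Returns (centers, labels, iteration at which the assignments repeated).
    """
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: 0-based index of the nearest center for each point
        new_labels = [
            min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            return centers, labels, iteration
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points
        updated = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                n = len(members)
                updated.append((sum(x for x, _ in members) / n,
                                sum(y for _, y in members) / n))
            else:
                updated.append(old)  # assumption: keep the center of an empty cluster
        centers = updated
    return centers, labels, max_iter


centers, labels, iterations = kmeans(POINTS, [(2, 10), (5, 8), (1, 2)])
print(iterations)                                         # 4
print([(round(x, 2), round(y, 2)) for x, y in centers])   # [(3.67, 9.0), (7.0, 4.33), (1.5, 3.5)]
print(["C%d" % (lab + 1) for lab in labels])
```

The loop reports 4 because the fourth assignment pass reproduces the third, matching the tables above.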
