Artificial Intelligence Lab
Lab No: 10
Name: Rehan Munir Awan
Enrollment No: 01-132222-036
Date: 30-May-2025
Submitted To: Sir Syed Muhammad Usman
DEPARTMENT OF COMPUTER ENGINEERING
BAHRIA UNIVERSITY ISLAMABAD CAMPUS
Implementation of K-means Clustering
Objective:
The objective of this lab is to implement and understand the K-means
clustering algorithm, an unsupervised machine learning technique used to
group data into distinct clusters based on similarity. The lab provides
hands-on experience in applying K-means to real-world datasets, allowing
students to observe how data points are partitioned into k clusters by
minimizing within-cluster variance. Students will explore the steps of
initialization, centroid updating, and convergence, as well as the impact
of choosing different values of k. Additionally, the lab focuses on
evaluating clustering performance using metrics such as inertia and
silhouette score, and on visualizing the resulting clusters to gain
insight into the data's distribution and structure. This exercise will
strengthen students' understanding of unsupervised learning and the
practical challenges involved in clustering tasks.
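Although the tasks below do not compute these evaluation metrics
themselves, the following minimal sketch (using hypothetical toy data,
not the lab data) shows how inertia and silhouette score can be obtained
with scikit-learn:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical toy data with two well-separated groups
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.inertia_)                      # within-cluster sum of squares
print(silhouette_score(X, km.labels_))  # in [-1, 1]; higher is better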
Software Used:
• Visual Studio Code
• Jupyter Notebook
Introduction:
K-means clustering is one of the most widely used unsupervised
learning algorithms for partitioning a dataset into distinct groups or
clusters based on feature similarity. Unlike supervised learning,
clustering does not rely on labeled data but instead identifies inherent
patterns by grouping data points that are closer to each other in the
feature space. The K-means algorithm iteratively assigns data points
to one of k clusters by minimizing the sum of squared distances
between points and their respective cluster centroids. This approach is
valued for its simplicity, efficiency, and scalability to large datasets.
This lab focuses on implementing the K-means algorithm and exploring how
cluster centroids are initialized and updated, and how convergence is
achieved.
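Formally, the quantity K-means minimizes (the within-cluster sum of
squared distances, also called inertia) can be written as

J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2

where C_j is the set of points currently assigned to cluster j and
\mu_j is the mean of those points.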
LAB TASK
Task No: 01
Code:
import numpy as np
from scipy.spatial.distance import euclidean, cityblock

# Product data: [Region 1, Region 2]
data = {
    'A': [22, 21],
    'B': [19, 20],
    'C': [18, 22],
    'D': [1, 3],
    'E': [4, 2]
}
points = np.array(list(data.values()))
labels = list(data.keys())

# Initial centroids seeded at products A and E
init_centroids = {
    'A': np.array(data['A']),
    'E': np.array(data['E'])
}

def kmeans_custom(points, init_centroids, dist_func):
    # Start from the seed centroids
    centroids = {'A': init_centroids['A'], 'E': init_centroids['E']}
    prev_assignment = None
    while True:
        clusters = {'A': [], 'E': []}
        # Assign each point to the closest centroid
        for i, point in enumerate(points):
            dists = {k: dist_func(point, v) for k, v in centroids.items()}
            closest = min(dists, key=dists.get)
            clusters[closest].append(i)
        # Stop when assignments no longer change
        assignment = [None] * len(points)
        for cluster, indices in clusters.items():
            for i in indices:
                assignment[i] = cluster
        if assignment == prev_assignment:
            break
        prev_assignment = assignment
        # Recompute each centroid as the mean of its members
        for cluster, indices in clusters.items():
            if indices:
                centroids[cluster] = np.mean(points[indices], axis=0)
    return assignment

# Run k-means with Euclidean and Manhattan distances
euclidean_result = kmeans_custom(points, init_centroids, euclidean)
manhattan_result = kmeans_custom(points, init_centroids, cityblock)

print("Euclidean Clustering:")
for i, cluster in enumerate(euclidean_result):
    print(f"Product {labels[i]} -> Cluster {cluster}")

print("\nManhattan Clustering:")
for i, cluster in enumerate(manhattan_result):
    print(f"Product {labels[i]} -> Cluster {cluster}")
Output:
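With this data, both distance metrics converge to the same grouping,
so the script should print:

Euclidean Clustering:
Product A -> Cluster A
Product B -> Cluster A
Product C -> Cluster A
Product D -> Cluster E
Product E -> Cluster E

Manhattan Clustering:
Product A -> Cluster A
Product B -> Cluster A
Product C -> Cluster A
Product D -> Cluster E
Product E -> Cluster E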
Task No: 02
Perform Task 1 using sklearn.cluster.KMeans.
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import pairwise_distances_argmin

# Data points
X = np.array([[5, 4], [1, 4], [4, 6], [9, 2], [7, 3]])

# Initial centroids: A = [5, 4], E = [7, 3]
initial_centroids = np.array([[5, 4], [7, 3]])

# K-means using the Manhattan distance (sklearn's KMeans class
# supports only Euclidean distance, so assignment is done with
# pairwise_distances_argmin instead)
def manhattan_kmeans(X, init_centroids, max_iter=10):
    centroids = init_centroids
    for _ in range(max_iter):
        # Assign each point to the nearest centroid (Manhattan metric)
        labels = pairwise_distances_argmin(X, centroids, metric='manhattan')
        # Recompute centroids as cluster means
        new_centroids = np.array(
            [X[labels == i].mean(axis=0) for i in range(len(init_centroids))]
        )
        # Use allclose for the float-valued centroid comparison
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, final_centroids = manhattan_kmeans(X, initial_centroids)

# Plot the clustered points and final centroids
colors = ['r', 'b']
for i in range(len(X)):
    plt.scatter(X[i][0], X[i][1], color=colors[labels[i]])
plt.scatter(final_centroids[:, 0], final_centroids[:, 1],
            color='black', marker='x', s=100)
plt.title("K-Means Clustering using Manhattan Distance")
plt.xlabel("Region 1")
plt.ylabel("Region 2")
plt.grid(True)
plt.show()
Output:
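Running the script displays a scatter plot in which the first three
points appear in red, the last two in blue, and the final centroids
(approximately (3.33, 4.67) and (8, 2.5)) are marked with black x
symbols.

Since sklearn.cluster.KMeans itself supports only Euclidean distance,
the Manhattan variant above is implemented manually on top of
pairwise_distances_argmin. For comparison, a minimal sketch of the
Euclidean case with the same seed centroids, using KMeans directly
(a sketch, not part of the required task), might look like:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[5, 4], [1, 4], [4, 6], [9, 2], [7, 3]])
initial_centroids = np.array([[5, 4], [7, 3]])

# n_init=1 because explicit initial centroids are supplied
km = KMeans(n_clusters=2, init=initial_centroids, n_init=1).fit(X)
print(km.labels_)           # cluster index for each point
print(km.cluster_centers_)  # final centroid coordinates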
Conclusion:
In conclusion, this lab demonstrated the practical application of the
K-means clustering algorithm using both Euclidean and Manhattan distance
metrics. Starting from initial centroids seeded at products A and E, the
algorithm grouped the products into two meaningful clusters. The exercise
highlighted how the choice of distance metric can influence cluster
assignment, although both metrics produced the same final clusters in
this case. Through iterative centroid updates and reassignment of points,
K-means efficiently minimized within-cluster variance, showcasing its
effectiveness in unsupervised learning tasks. Overall, the lab reinforced
understanding of cluster formation, distance measures, and convergence
criteria, providing a solid foundation for applying K-means to
real-world datasets.