VISVESVARAYA TECHNOLOGICAL
UNIVERSITY
JNANA SANGAMA, BELGAVI-590018, KARNATAKA
MACHINE LEARNING
LABORATORY
(As per CBCS Scheme 2022)
Sub Code: BCSL606
PREPARED BY:
INDHUMATHI R
TEACHING ASSISTANT
DEPT OF CSE-DS, KNSIT
DEPARTMENT OF COMPUTER SCIENCE (DATA SCIENCE) AND ENGINEERING
K.N.S INSTITUTE OF TECHNOLOGY
HEGDE-NAGAR, KOGILU ROAD,
THIRUMENAHALLI, YELAHANKA,
BANGALORE-560064
PROGRAM LISTS:
1 Develop a program to create histograms for all numerical features and analyze the
distribution of each feature. Generate box plots for all numerical features and identify
any outliers. Use California Housing dataset.
2 Develop a program to compute the correlation matrix to understand the relationships
between pairs of features. Visualize the correlation matrix using a heatmap to know
which variables have strong positive/negative correlations. Create a pair plot to
visualize pairwise relationships between features. Use California Housing dataset.
3 Develop a program to implement Principal Component Analysis (PCA) for reducing
the dimensionality of the Iris dataset from 4 features to 2.
4 For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Find-S algorithm to output a description of the set of all hypotheses
consistent with the training examples.
5 Develop a program to implement k-Nearest Neighbour algorithm to classify the
randomly generated 100 values of x in the range of [0,1]. Perform the following based
on dataset generated.
a) Label the first 50 points {x1,…,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1,
else xi ∊ Class2
b) Classify the remaining points, x51,……,x100 using KNN. Perform this for
k=1,2,3,4,5,20,30
6 Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.
7 Develop a program to demonstrate the working of Linear Regression and Polynomial
Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG
Dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.
8 Develop a program to demonstrate the working of the decision tree algorithm. Use
Breast Cancer Data set for building the decision tree and apply this knowledge to
classify a new sample.
9 Develop a program to implement the Naive Bayesian classifier considering Olivetti
Face Data set for training. Compute the accuracy of the classifier, considering a few
test data sets.
10 Develop a program to implement k-means clustering using Wisconsin Breast Cancer
data set and visualize the clustering result.
PROGRAM 1
Develop a program to create histograms for all numerical features and
analyze the distribution of each feature. Generate box plots for all
numerical features and identify any outliers. Use California Housing
dataset.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
# Step 1: Load the California Housing dataset
data = fetch_california_housing(as_frame=True)
housing_df = data.frame
# Step 2: Create histograms for numerical features
numerical_features = housing_df.select_dtypes(include=[np.number]).columns
# Plot histograms
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.histplot(housing_df[feature], kde=True, bins=30, color='blue')
    plt.title(f'Distribution of {feature}')

plt.tight_layout()
plt.show()
# Step 3: Generate box plots for numerical features
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=housing_df[feature], color='orange')
    plt.title(f'Box Plot of {feature}')

# tight_layout automatically adjusts the spacing between subplots (and elements
# such as axis labels and titles) so they fit within the figure and don't overlap.
plt.tight_layout()
plt.show()
# Step 4: Identify outliers using the IQR method
print("Outliers Detection:")
outliers_summary = {}
for feature in numerical_features:
    Q1 = housing_df[feature].quantile(0.25)
    Q3 = housing_df[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Boolean mask of rows outside the IQR bounds; indexing housing_df with it
    # returns the subset of the DataFrame whose rows are outliers for this feature.
    outliers = housing_df[(housing_df[feature] < lower_bound) |
                          (housing_df[feature] > upper_bound)]

    # Record the number of outliers for this feature in the outliers_summary dictionary.
    outliers_summary[feature] = len(outliers)
    print(f"{feature}: {len(outliers)} outliers")
# Optional: Print a summary of the dataset
print("\nDataset Summary:")
print(housing_df.describe())
OUTPUT:
Outliers Detection:
MedInc: 681 outliers
HouseAge: 0 outliers
AveRooms: 511 outliers
AveBedrms: 1424 outliers
Population: 1196 outliers
AveOccup: 711 outliers
Latitude: 0 outliers
Longitude: 0 outliers
MedHouseVal: 1071 outliers
Dataset Summary:
MedInc HouseAge ... Longitude MedHouseVal
count 20640.000000 20640.000000 ... 20640.000000 20640.000000
mean 3.870671 28.639486 ... -119.569704 2.068558
std 1.899822 12.585558 ... 2.003532 1.153956
min 0.499900 1.000000 ... -124.350000 0.149990
25% 2.563400 18.000000 ... -121.800000 1.196000
50% 3.534800 29.000000 ... -118.490000 1.797000
75% 4.743250 37.000000 ... -118.010000 2.647250
max 15.000100 52.000000 ... -114.310000 5.000010
[8 rows x 9 columns]
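As a quick worked check of the IQR rule in Step 4, the 25% and 75% rows of the summary above give the MedInc bounds directly:

Q1, Q3 = 2.5634, 4.74325       # MedInc quartiles from the summary table
IQR = Q3 - Q1                  # 2.17985
lower_bound = Q1 - 1.5 * IQR   # about -0.71; since the minimum MedInc is 0.4999, no low-end outliers are possible
upper_bound = Q3 + 1.5 * IQR   # about 8.01; the 681 MedInc outliers are the districts above this value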
PROGRAM 2
Develop a program to compute the correlation matrix to understand the
relationships between pairs of features. Visualize the correlation matrix
using a heatmap to know which variables have strong positive/negative
correlations. Create a pair plot to visualize pairwise relationships between
features. Use California Housing dataset.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Step 1: Load the California Housing dataset
california_data = fetch_california_housing(as_frame=True)
data = california_data.frame

# Step 2: Compute the correlation matrix
correlation_matrix = data.corr()

# Step 3: Visualize the correlation matrix using a heatmap
# (annot=True writes the numeric correlation value inside each cell)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of California Housing Features')
plt.show()

# Step 4: Create a pair plot to visualize pairwise relationships.
# plot_kws passes keyword arguments to the scatter plots in the off-diagonal cells;
# 'alpha': 0.5 sets transparency (0 is fully transparent, 1 is fully opaque),
# which reduces overlap clutter in dense scatter plots.
sns.pairplot(data, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle('Pair Plot of California Housing Features', y=1.02)
plt.show()
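If you also want the strongest correlations as numbers rather than colours, a small follow-up on the correlation_matrix computed above (a sketch, not part of the original program) is:

# Correlation of every feature with the target, strongest first
print(correlation_matrix['MedHouseVal'].sort_values(ascending=False))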
OUTPUT:
The diagonal panels of the pair plot show KDE (kernel density estimate) curves.
PROGRAM 3
Develop a program to implement Principal Component Analysis (PCA) for
reducing the dimensionality of the Iris dataset from 4 features to 2.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# Load the Iris dataset
iris = load_iris()
data = iris.data                  # 2D NumPy array of shape (150, 4): 4 features for 150 samples
labels = iris.target              # 1D array of integers (0, 1, 2): the target class of each flower
label_names = iris.target_names   # array of class names: ['setosa', 'versicolor', 'virginica']
# Convert to a DataFrame for better visualization
iris_df = pd.DataFrame(data, columns=iris.feature_names)
# Perform PCA to reduce dimensionality to 2
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)
# Create a DataFrame for the reduced data
reduced_df = pd.DataFrame(data_reduced, columns=['Principal Component 1',
                                                 'Principal Component 2'])
reduced_df['Label'] = labels   # add a 'Label' column holding the class of each data point
# Plot the reduced data
plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b']
for i, label in enumerate(np.unique(labels)):
    # Filter reduced_df to the rows whose Label matches the current class,
    # then plot that class's points for each principal component.
    plt.scatter(
        reduced_df[reduced_df['Label'] == label]['Principal Component 1'],
        reduced_df[reduced_df['Label'] == label]['Principal Component 2'],
        label=label_names[label],   # class name, shown in the legend
        color=colors[i]
    )
plt.title('PCA on Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()
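Optionally, you can also report how much of the original variance the two components keep, using the fitted pca object above (for the Iris data the two components together typically retain well over 90% of the variance):

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())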
OUTPUT:
PROGRAM 4
For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Find-S algorithm to output a description of the set of
all hypotheses consistent with the training examples.
Download csv file: click here
import pandas as pd
def find_s_algorithm(file_path):
    data = pd.read_csv(file_path)
    print("Training data:")
    print(data)

    # data.columns[:-1] selects every column except the last one; the last
    # column is assumed to be the target (class label). Example: for columns
    # ['age', 'income', 'bought_insurance'], attributes is ['age', 'income'].
    attributes = data.columns[:-1]
    class_label = data.columns[-1]

    hypothesis = ['?' for _ in attributes]   # e.g. ['?', '?', '?', '?'] for four attributes

    for index, row in data.iterrows():
        if row[class_label] == 'Yes':                    # only positive examples update the hypothesis
            for i, value in enumerate(row[attributes]):
                if hypothesis[i] == '?' or hypothesis[i] == value:
                    # '?' accepts any value, or the value already matches:
                    # set the hypothesis entry to the example's value.
                    hypothesis[i] = value
                else:
                    # Conflict between the hypothesis and the training value:
                    # generalize this attribute to '?'.
                    hypothesis[i] = '?'

    return hypothesis
file_path = 'training_data.csv'
hypothesis = find_s_algorithm(file_path)
print("\nThe final hypothesis is:", hypothesis)
Uploading the CSV manually to Jupyter Notebook:
Open Jupyter Notebook in a browser.
Click the Upload button (top-right corner).
Select the training_data.csv file from your computer.
Click Upload and confirm.
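For reference, a training_data.csv reconstructed from the output table below would look like this (column names and values taken from that table):

Outlook,Temperature,Humidity,Windy,PlayTennis
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rain,Cold,High,False,Yes
Rain,Cold,High,True,No
Overcast,Hot,High,True,Yes
Sunny,Hot,High,False,No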
OUTPUT:
Training data:
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rain Cold High False Yes
4 Rain Cold High True No
5 Overcast Hot High True Yes
6 Sunny Hot High False No
The final hypothesis is: ['Overcast', 'Hot', 'High', '?']
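For comparison, the textbook Find-S (as in Mitchell) starts from the first positive example and only ever generalizes conflicting attributes; it never re-specializes an attribute that has already become '?'. A minimal sketch of that variant, assuming the same CSV layout, is shown below; on the training data above it would return ['?', '?', 'High', '?'].

def find_s_classic(data):
    attributes = data.columns[:-1]
    class_label = data.columns[-1]
    hypothesis = None
    for _, row in data.iterrows():
        if row[class_label] == 'Yes':
            if hypothesis is None:
                hypothesis = list(row[attributes])          # most specific: copy the first positive example
            else:
                hypothesis = [h if h == v else '?'          # keep matching values, generalize conflicts
                              for h, v in zip(hypothesis, row[attributes])]
    return hypothesis

print("Classic Find-S hypothesis:", find_s_classic(pd.read_csv(file_path)))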
PROGRAM 5
Develop a program to implement k-Nearest Neighbour algorithm to
classify the randomly generated 100 values of x in the range of [0,1].
Perform the following based on dataset generated.
a) Label the first 50 points {x1,…,x50} as follows: if (xi ≤ 0.5), then xi ∊
Class1, else xi ∊ Class2
b) Classify the remaining points, x51,……,x100 using KNN. Perform this
for k=1,2,3,4,5,20,30
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
data = np.random.rand(100)
labels = ["Class1" if x <= 0.5 else "Class2" for x in data[:50]]
def euclidean_distance(x1, x2):
    return abs(x1 - x2)

def knn_classifier(train_data, train_labels, test_point, k):
    distances = [(euclidean_distance(test_point, train_data[i]), train_labels[i])
                 for i in range(len(train_data))]
    distances.sort(key=lambda x: x[0])
    k_nearest_neighbors = distances[:k]
    k_nearest_labels = [label for _, label in k_nearest_neighbors]
    return Counter(k_nearest_labels).most_common(1)[0][0]
train_data = data[:50]
train_labels = labels
test_data = data[50:]
k_values = [1, 2, 3, 4, 5, 20, 30]
print("--- k-Nearest Neighbors Classification ---")
print("Training dataset: First 50 points labeled based on the rule (x <= 0.5 ->
Class1, x > 0.5 -> Class2)")
print("Testing dataset: Remaining 50 points to be classified\n")
results = {}
for k in k_values:
    print(f"Results for k = {k}:")
    classified_labels = [knn_classifier(train_data, train_labels, test_point, k)
                         for test_point in test_data]
    results[k] = classified_labels
    for i, label in enumerate(classified_labels, start=51):
        print(f"Point x{i} (value: {test_data[i - 51]:.4f}) is classified as {label}")
    print("\n")
print("Classification complete.\n")
for k in k_values:
    classified_labels = results[k]
    class1_points = [test_data[i] for i in range(len(test_data))
                     if classified_labels[i] == "Class1"]
    class2_points = [test_data[i] for i in range(len(test_data))
                     if classified_labels[i] == "Class2"]

    plt.figure(figsize=(10, 6))
    plt.scatter(train_data, [0] * len(train_data),
                c=["blue" if label == "Class1" else "red" for label in train_labels],
                label="Training Data", marker="o")
    plt.scatter(class1_points, [1] * len(class1_points), c="blue", label="Class1 (Test)", marker="x")
    plt.scatter(class2_points, [1] * len(class2_points), c="red", label="Class2 (Test)", marker="x")
    plt.title(f"k-NN Classification Results for k = {k}")
    plt.xlabel("Data Points")
    plt.ylabel("Classification Level")
    plt.legend()
    plt.grid(True)
    plt.show()
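Because the true class of every test point follows the same rule used for the training labels, you can optionally measure how often each k agrees with it (a small sketch reusing the results dictionary built above):

true_test_labels = ["Class1" if x <= 0.5 else "Class2" for x in test_data]
for k in k_values:
    correct = sum(p == t for p, t in zip(results[k], true_test_labels))
    print(f"k = {k}: {correct}/{len(true_test_labels)} test points match the labeling rule")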
OUTPUT:
--- k-Nearest Neighbors Classification ---
Training dataset: First 50 points labeled based on the rule (x <= 0.5 -> Class1, x
> 0.5 -> Class2)
Testing dataset: Remaining 50 points to be classified
Results for k = 1:
Point x51 (value: 0.2059) is classified as Class1
Point x52 (value: 0.2535) is classified as Class1
Point x53 (value: 0.4856) is classified as Class1
Point x54 (value: 0.9651) is classified as Class2
Point x55 (value: 0.3906) is classified as Class1
Point x56 (value: 0.8903) is classified as Class2
Point x57 (value: 0.9695) is classified as Class2
Point x58 (value: 0.2206) is classified as Class1
Point x59 (value: 0.0203) is classified as Class1
Point x60 (value: 0.1619) is classified as Class1
Point x61 (value: 0.6461) is classified as Class2
Point x62 (value: 0.6523) is classified as Class2
Point x63 (value: 0.8728) is classified as Class2
Point x64 (value: 0.5435) is classified as Class2
Point x65 (value: 0.8246) is classified as Class2
Point x66 (value: 0.9347) is classified as Class2
Point x67 (value: 0.5361) is classified as Class2
Point x68 (value: 0.7215) is classified as Class2
Point x69 (value: 0.9703) is classified as Class2
Point x70 (value: 0.8764) is classified as Class2
Point x71 (value: 0.7543) is classified as Class2
Point x72 (value: 0.1406) is classified as Class1
Point x73 (value: 0.1349) is classified as Class1
Point x74 (value: 0.9705) is classified as Class2
Point x75 (value: 0.2985) is classified as Class1
Point x76 (value: 0.9948) is classified as Class2
Point x77 (value: 0.4551) is classified as Class1
Point x78 (value: 0.2101) is classified as Class1
Point x79 (value: 0.5542) is classified as Class2
Point x80 (value: 0.3202) is classified as Class1
Point x81 (value: 0.6325) is classified as Class2
Point x82 (value: 0.9345) is classified as Class2
Point x83 (value: 0.0156) is classified as Class1
Point x84 (value: 0.8859) is classified as Class2
Point x85 (value: 0.2495) is classified as Class1
Point x86 (value: 0.6380) is classified as Class2
Point x87 (value: 0.7095) is classified as Class2
Point x88 (value: 0.4259) is classified as Class1
Point x89 (value: 0.0052) is classified as Class1
Point x90 (value: 0.6322) is classified as Class2
Point x91 (value: 0.1701) is classified as Class1
Point x92 (value: 0.3693) is classified as Class1
Point x93 (value: 0.4087) is classified as Class1
Point x94 (value: 0.8103) is classified as Class2
Point x95 (value: 0.0773) is classified as Class1
Point x96 (value: 0.8792) is classified as Class2
Point x97 (value: 0.9138) is classified as Class2
Point x98 (value: 0.5567) is classified as Class2
Point x99 (value: 0.8625) is classified as Class2
Point x100 (value: 0.9363) is classified as Class2
Results for k = 2:
Point x51 (value: 0.2059) is classified as Class1
Point x52 (value: 0.2535) is classified as Class1
Point x53 (value: 0.4856) is classified as Class1
Point x54 (value: 0.9651) is classified as Class2
Point x55 (value: 0.3906) is classified as Class1
Point x56 (value: 0.8903) is classified as Class2
Point x57 (value: 0.9695) is classified as Class2
Point x58 (value: 0.2206) is classified as Class1
Point x59 (value: 0.0203) is classified as Class1
Point x60 (value: 0.1619) is classified as Class1
Point x61 (value: 0.6461) is classified as Class2
Point x62 (value: 0.6523) is classified as Class2
Point x63 (value: 0.8728) is classified as Class2
Point x64 (value: 0.5435) is classified as Class2
Point x65 (value: 0.8246) is classified as Class2
Point x66 (value: 0.9347) is classified as Class2
Point x67 (value: 0.5361) is classified as Class2
Point x68 (value: 0.7215) is classified as Class2
Point x69 (value: 0.9703) is classified as Class2
Point x70 (value: 0.8764) is classified as Class2
Point x71 (value: 0.7543) is classified as Class2
Point x72 (value: 0.1406) is classified as Class1
Point x73 (value: 0.1349) is classified as Class1
Point x74 (value: 0.9705) is classified as Class2
Point x75 (value: 0.2985) is classified as Class1
Point x76 (value: 0.9948) is classified as Class2
Point x77 (value: 0.4551) is classified as Class1
Point x78 (value: 0.2101) is classified as Class1
Point x79 (value: 0.5542) is classified as Class2
Point x80 (value: 0.3202) is classified as Class1
Point x81 (value: 0.6325) is classified as Class2
Point x82 (value: 0.9345) is classified as Class2
Point x83 (value: 0.0156) is classified as Class1
Point x84 (value: 0.8859) is classified as Class2
Point x85 (value: 0.2495) is classified as Class1
Point x86 (value: 0.6380) is classified as Class2
Point x87 (value: 0.7095) is classified as Class2
Point x88 (value: 0.4259) is classified as Class1
Point x89 (value: 0.0052) is classified as Class1
Point x90 (value: 0.6322) is classified as Class2
Point x91 (value: 0.1701) is classified as Class1
Point x92 (value: 0.3693) is classified as Class1
Point x93 (value: 0.4087) is classified as Class1
Point x94 (value: 0.8103) is classified as Class2
Point x95 (value: 0.0773) is classified as Class1
Point x96 (value: 0.8792) is classified as Class2
Point x97 (value: 0.9138) is classified as Class2
Point x98 (value: 0.5567) is classified as Class2
Point x99 (value: 0.8625) is classified as Class2
Point x100 (value: 0.9363) is classified as Class2
Results for k = 3:
Point x51 (value: 0.2059) is classified as Class1
Point x52 (value: 0.2535) is classified as Class1
Point x53 (value: 0.4856) is classified as Class1
Point x54 (value: 0.9651) is classified as Class2
Point x55 (value: 0.3906) is classified as Class1
Point x56 (value: 0.8903) is classified as Class2
Point x57 (value: 0.9695) is classified as Class2
Point x58 (value: 0.2206) is classified as Class1
Point x59 (value: 0.0203) is classified as Class1
Point x60 (value: 0.1619) is classified as Class1
Point x61 (value: 0.6461) is classified as Class2
Point x62 (value: 0.6523) is classified as Class2
Point x63 (value: 0.8728) is classified as Class2
Point x64 (value: 0.5435) is classified as Class2
Point x65 (value: 0.8246) is classified as Class2
Point x66 (value: 0.9347) is classified as Class2
Point x67 (value: 0.5361) is classified as Class2
Point x68 (value: 0.7215) is classified as Class2
Point x69 (value: 0.9703) is classified as Class2
Point x70 (value: 0.8764) is classified as Class2
Point x71 (value: 0.7543) is classified as Class2
Point x72 (value: 0.1406) is classified as Class1
Point x73 (value: 0.1349) is classified as Class1
Point x74 (value: 0.9705) is classified as Class2
Point x75 (value: 0.2985) is classified as Class1
Point x76 (value: 0.9948) is classified as Class2
Point x77 (value: 0.4551) is classified as Class1
Point x78 (value: 0.2101) is classified as Class1
Point x79 (value: 0.5542) is classified as Class2
Point x80 (value: 0.3202) is classified as Class1
Point x81 (value: 0.6325) is classified as Class2
Point x82 (value: 0.9345) is classified as Class2
Point x83 (value: 0.0156) is classified as Class1
Point x84 (value: 0.8859) is classified as Class2
Point x85 (value: 0.2495) is classified as Class1
Point x86 (value: 0.6380) is classified as Class2
Point x87 (value: 0.7095) is classified as Class2
Point x88 (value: 0.4259) is classified as Class1
Point x89 (value: 0.0052) is classified as Class1
Point x90 (value: 0.6322) is classified as Class2
Point x91 (value: 0.1701) is classified as Class1
Point x92 (value: 0.3693) is classified as Class1
Point x93 (value: 0.4087) is classified as Class1
Point x94 (value: 0.8103) is classified as Class2
Point x95 (value: 0.0773) is classified as Class1
Point x96 (value: 0.8792) is classified as Class2
Point x97 (value: 0.9138) is classified as Class2
Point x98 (value: 0.5567) is classified as Class2
Point x99 (value: 0.8625) is classified as Class2
Point x100 (value: 0.9363) is classified as Class2
Results for k = 4:
Point x51 (value: 0.2059) is classified as Class1
Point x52 (value: 0.2535) is classified as Class1
Point x53 (value: 0.4856) is classified as Class1
Point x54 (value: 0.9651) is classified as Class2
Point x55 (value: 0.3906) is classified as Class1
Point x56 (value: 0.8903) is classified as Class2
Point x57 (value: 0.9695) is classified as Class2
Point x58 (value: 0.2206) is classified as Class1
Point x59 (value: 0.0203) is classified as Class1
Point x60 (value: 0.1619) is classified as Class1
Point x61 (value: 0.6461) is classified as Class2
Point x62 (value: 0.6523) is classified as Class2
Point x63 (value: 0.8728) is classified as Class2
Point x64 (value: 0.5435) is classified as Class2
Point x65 (value: 0.8246) is classified as Class2
Point x66 (value: 0.9347) is classified as Class2
Point x67 (value: 0.5361) is classified as Class2
Point x68 (value: 0.7215) is classified as Class2
Point x69 (value: 0.9703) is classified as Class2
Point x70 (value: 0.8764) is classified as Class2
Point x71 (value: 0.7543) is classified as Class2
Point x72 (value: 0.1406) is classified as Class1
Point x73 (value: 0.1349) is classified as Class1
Point x74 (value: 0.9705) is classified as Class2
Point x75 (value: 0.2985) is classified as Class1
Point x76 (value: 0.9948) is classified as Class2
Point x77 (value: 0.4551) is classified as Class1
Point x78 (value: 0.2101) is classified as Class1
Point x79 (value: 0.5542) is classified as Class2
Point x80 (value: 0.3202) is classified as Class1
Point x81 (value: 0.6325) is classified as Class2
Point x82 (value: 0.9345) is classified as Class2
Point x83 (value: 0.0156) is classified as Class1
Point x84 (value: 0.8859) is classified as Class2
Point x85 (value: 0.2495) is classified as Class1
Point x86 (value: 0.6380) is classified as Class2
Point x87 (value: 0.7095) is classified as Class2
Point x88 (value: 0.4259) is classified as Class1
Point x89 (value: 0.0052) is classified as Class1
Point x90 (value: 0.6322) is classified as Class2
Point x91 (value: 0.1701) is classified as Class1
Point x92 (value: 0.3693) is classified as Class1
Point x93 (value: 0.4087) is classified as Class1
Point x94 (value: 0.8103) is classified as Class2
Point x95 (value: 0.0773) is classified as Class1
Point x96 (value: 0.8792) is classified as Class2
Point x97 (value: 0.9138) is classified as Class2
Point x98 (value: 0.5567) is classified as Class2
Point x99 (value: 0.8625) is classified as Class2
Point x100 (value: 0.9363) is classified as Class2
Results for k = 5:
Point x51 (value: 0.2059) is classified as Class1
Point x52 (value: 0.2535) is classified as Class1
Point x53 (value: 0.4856) is classified as Class1
Point x54 (value: 0.9651) is classified as Class2
Point x55 (value: 0.3906) is classified as Class1
Point x56 (value: 0.8903) is classified as Class2
Point x57 (value: 0.9695) is classified as Class2
Point x58 (value: 0.2206) is classified as Class1
Point x59 (value: 0.0203) is classified as Class1
Point x60 (value: 0.1619) is classified as Class1
Point x61 (value: 0.6461) is classified as Class2
Point x62 (value: 0.6523) is classified as Class2
Point x63 (value: 0.8728) is classified as Class2
Point x64 (value: 0.5435) is classified as Class2
Point x65 (value: 0.8246) is classified as Class2
Point x66 (value: 0.9347) is classified as Class2
Point x67 (value: 0.5361) is classified as Class1
Point x68 (value: 0.7215) is classified as Class2
Point x69 (value: 0.9703) is classified as Class2
Point x70 (value: 0.8764) is classified as Class2
Point x71 (value: 0.7543) is classified as Class2
Point x72 (value: 0.1406) is classified as Class1
Point x73 (value: 0.1349) is classified as Class1
Point x74 (value: 0.9705) is classified as Class2
Point x75 (value: 0.2985) is classified as Class1
Point x76 (value: 0.9948) is classified as Class2
Point x77 (value: 0.4551) is classified as Class1
Point x78 (value: 0.2101) is classified as Class1
Point x79 (value: 0.5542) is classified as Class2
Point x80 (value: 0.3202) is classified as Class1
Point x81 (value: 0.6325) is classified as Class2
Point x82 (value: 0.9345) is classified as Class2
Point x83 (value: 0.0156) is classified as Class1
Point x84 (value: 0.8859) is classified as Class2
Point x85 (value: 0.2495) is classified as Class1
Point x86 (value: 0.6380) is classified as Class2
Point x87 (value: 0.7095) is classified as Class2
Point x88 (value: 0.4259) is classified as Class1
Point x89 (value: 0.0052) is classified as Class1
Point x90 (value: 0.6322) is classified as Class2
Point x91 (value: 0.1701) is classified as Class1
Point x92 (value: 0.3693) is classified as Class1
Point x93 (value: 0.4087) is classified as Class1
Point x94 (value: 0.8103) is classified as Class2
Point x95 (value: 0.0773) is classified as Class1
Point x96 (value: 0.8792) is classified as Class2
Point x97 (value: 0.9138) is classified as Class2
Point x98 (value: 0.5567) is classified as Class2
Point x99 (value: 0.8625) is classified as Class2
Point x100 (value: 0.9363) is classified as Class2
Results for k = 20:
Point x51 (value: 0.2059) is classified as Class1
Point x52 (value: 0.2535) is classified as Class1
Point x53 (value: 0.4856) is classified as Class1
Point x54 (value: 0.9651) is classified as Class2
Point x55 (value: 0.3906) is classified as Class1
Point x56 (value: 0.8903) is classified as Class2
Point x57 (value: 0.9695) is classified as Class2
Point x58 (value: 0.2206) is classified as Class1
Point x59 (value: 0.0203) is classified as Class1
Point x60 (value: 0.1619) is classified as Class1
Point x61 (value: 0.6461) is classified as Class2
Point x62 (value: 0.6523) is classified as Class2
Point x63 (value: 0.8728) is classified as Class2
Point x64 (value: 0.5435) is classified as Class1
Point x65 (value: 0.8246) is classified as Class2
Point x66 (value: 0.9347) is classified as Class2
Point x67 (value: 0.5361) is classified as Class1
Point x68 (value: 0.7215) is classified as Class2
Point x69 (value: 0.9703) is classified as Class2
Point x70 (value: 0.8764) is classified as Class2
Point x71 (value: 0.7543) is classified as Class2
Point x72 (value: 0.1406) is classified as Class1
Point x73 (value: 0.1349) is classified as Class1
Point x74 (value: 0.9705) is classified as Class2
Point x75 (value: 0.2985) is classified as Class1
Point x76 (value: 0.9948) is classified as Class2
Point x77 (value: 0.4551) is classified as Class1
Point x78 (value: 0.2101) is classified as Class1
Point x79 (value: 0.5542) is classified as Class1
Point x80 (value: 0.3202) is classified as Class1
Point x81 (value: 0.6325) is classified as Class2
Point x82 (value: 0.9345) is classified as Class2
Point x83 (value: 0.0156) is classified as Class1
Point x84 (value: 0.8859) is classified as Class2
Point x85 (value: 0.2495) is classified as Class1
Point x86 (value: 0.6380) is classified as Class2
Point x87 (value: 0.7095) is classified as Class2
Point x88 (value: 0.4259) is classified as Class1
Point x89 (value: 0.0052) is classified as Class1
Point x90 (value: 0.6322) is classified as Class2
Point x91 (value: 0.1701) is classified as Class1
Point x92 (value: 0.3693) is classified as Class1
Point x93 (value: 0.4087) is classified as Class1
Point x94 (value: 0.8103) is classified as Class2
Point x95 (value: 0.0773) is classified as Class1
Point x96 (value: 0.8792) is classified as Class2
Point x97 (value: 0.9138) is classified as Class2
Point x98 (value: 0.5567) is classified as Class2
Point x99 (value: 0.8625) is classified as Class2
Point x100 (value: 0.9363) is classified as Class2
Results for k = 30:
Point x51 (value: 0.2059) is classified as Class1
Point x52 (value: 0.2535) is classified as Class1
Point x53 (value: 0.4856) is classified as Class1
Point x54 (value: 0.9651) is classified as Class2
Point x55 (value: 0.3906) is classified as Class1
Point x56 (value: 0.8903) is classified as Class2
Point x57 (value: 0.9695) is classified as Class2
Point x58 (value: 0.2206) is classified as Class1
Point x59 (value: 0.0203) is classified as Class1
Point x60 (value: 0.1619) is classified as Class1
Point x61 (value: 0.6461) is classified as Class2
Point x62 (value: 0.6523) is classified as Class2
Point x63 (value: 0.8728) is classified as Class2
Point x64 (value: 0.5435) is classified as Class1
Point x65 (value: 0.8246) is classified as Class2
Point x66 (value: 0.9347) is classified as Class2
Point x67 (value: 0.5361) is classified as Class1
Point x68 (value: 0.7215) is classified as Class2
Point x69 (value: 0.9703) is classified as Class2
Point x70 (value: 0.8764) is classified as Class2
Point x71 (value: 0.7543) is classified as Class2
Point x72 (value: 0.1406) is classified as Class1
Point x73 (value: 0.1349) is classified as Class1
Point x74 (value: 0.9705) is classified as Class2
Point x75 (value: 0.2985) is classified as Class1
Point x76 (value: 0.9948) is classified as Class2
Point x77 (value: 0.4551) is classified as Class1
Point x78 (value: 0.2101) is classified as Class1
Point x79 (value: 0.5542) is classified as Class1
Point x80 (value: 0.3202) is classified as Class1
Point x81 (value: 0.6325) is classified as Class2
Point x82 (value: 0.9345) is classified as Class2
Point x83 (value: 0.0156) is classified as Class1
Point x84 (value: 0.8859) is classified as Class2
Point x85 (value: 0.2495) is classified as Class1
Point x86 (value: 0.6380) is classified as Class2
Point x87 (value: 0.7095) is classified as Class2
Point x88 (value: 0.4259) is classified as Class1
Point x89 (value: 0.0052) is classified as Class1
Point x90 (value: 0.6322) is classified as Class2
Point x91 (value: 0.1701) is classified as Class1
Point x92 (value: 0.3693) is classified as Class1
Point x93 (value: 0.4087) is classified as Class1
Point x94 (value: 0.8103) is classified as Class2
Point x95 (value: 0.0773) is classified as Class1
Point x96 (value: 0.8792) is classified as Class2
Point x97 (value: 0.9138) is classified as Class2
Point x98 (value: 0.5567) is classified as Class1
Point x99 (value: 0.8625) is classified as Class2
Point x100 (value: 0.9363) is classified as Class2
Classification complete.
PROGRAM 6
Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select appropriate data set for your experiment
and draw graphs.
import numpy as np
import matplotlib.pyplot as plt
def gaussian_kernel(x, xi, tau):
    return np.exp(-np.sum((x - xi) ** 2) / (2 * tau ** 2))

def locally_weighted_regression(x, X, y, tau):
    m = X.shape[0]
    weights = np.array([gaussian_kernel(x, X[i], tau) for i in range(m)])
    W = np.diag(weights)
    X_transpose_W = X.T @ W   # X.T is the transpose; @ is matrix multiplication
    theta = np.linalg.inv(X_transpose_W @ X) @ X_transpose_W @ y
    return x @ theta

np.random.seed(42)
X = np.linspace(0, 2 * np.pi, 100)   # np.linspace(start, stop, num): 100 evenly spaced points
y = np.sin(X) + 0.1 * np.random.randn(100)
# Add a bias (intercept) column of ones so the linear model can learn both a
# slope (weight) and an intercept (bias). np.ones(X.shape) creates an array of
# 100 ones, and np.c_ concatenates arrays column-wise.
X_bias = np.c_[np.ones(X.shape), X]

x_test = np.linspace(0, 2 * np.pi, 200)
x_test_bias = np.c_[np.ones(x_test.shape), x_test]

tau = 0.5
y_pred = np.array([locally_weighted_regression(xi, X_bias, y, tau) for xi in x_test_bias])
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', label='Training Data', alpha=0.7)
plt.plot(x_test, y_pred, color='blue', label=f'LWR Fit (tau={tau})', linewidth=2)
plt.xlabel('X', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Locally Weighted Regression', fontsize=14)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.show()
Proximity simply means closeness or nearness — how close one point is to another
in space.
In machine learning (especially in locally weighted regression), we measure how
close a test point x is to a training point xi. This closeness is usually computed using
Euclidean distance or some other metric.
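To see the effect of tau concretely, the short snippet below (reusing gaussian_kernel from above, and not part of the original program) prints how the weight of a training point decays as its distance from the query point grows; a smaller tau makes the decay sharper, so only very close points influence the local fit.

for tau in [0.1, 0.5, 1.0]:
    distances = [0.0, 0.5, 1.0, 2.0]
    weights = [gaussian_kernel(0.0, d, tau) for d in distances]
    print(f"tau={tau}: " + ", ".join(f"w(d={d})={w:.3f}" for d, w in zip(distances, weights)))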
OUTPUT:
PROGRAM 7
Develop a program to demonstrate the working of Linear Regression and
Polynomial Regression. Use Boston Housing Dataset for Linear Regression
and Auto MPG Dataset (for vehicle fuel efficiency prediction) for
Polynomial Regression.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score
def linear_regression_california():
    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["AveRooms"]]
    y = housing.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.plot(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Average number of rooms (AveRooms)")
    plt.ylabel("Median value of homes ($100,000)")
    plt.title("Linear Regression - California Housing Dataset")
    plt.legend()
    plt.show()

    print("Linear Regression - California Housing Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))
def polynomial_regression_auto_mpg():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
    column_names = ["mpg", "cylinders", "displacement", "horsepower",
                    "weight", "acceleration", "model_year", "origin"]
    data = pd.read_csv(url, sep=r'\s+', names=column_names, na_values="?")
    data = data.dropna()

    X = data["displacement"].values.reshape(-1, 1)
    y = data["mpg"].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    poly_model = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(), LinearRegression())
    poly_model.fit(X_train, y_train)
    y_pred = poly_model.predict(X_test)

    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.scatter(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Displacement")
    plt.ylabel("Miles per gallon (mpg)")
    plt.title("Polynomial Regression - Auto MPG Dataset")
    plt.legend()
    plt.show()

    print("Polynomial Regression - Auto MPG Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))
if __name__ == "__main__":
    print("Demonstrating Linear Regression and Polynomial Regression\n")
    linear_regression_california()
    polynomial_regression_auto_mpg()
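As a side note on what PolynomialFeatures(degree=2) actually feeds into the linear model: it expands each displacement value x into [1, x, x^2], so the "linear" regression at the end of the pipeline fits a quadratic curve in the original feature. A tiny illustrative check (not part of the program) is:

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

print(PolynomialFeatures(degree=2).fit_transform(np.array([[3.0]])))
# [[1. 3. 9.]]  -> bias term, x, x squared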
OUTPUT:
Demonstrating Linear Regression and Polynomial Regression
Linear Regression - California Housing Dataset
Mean Squared Error: 1.2923314440807299
R^2 Score: 0.013795337532284901
Polynomial Regression - Auto MPG Dataset
Mean Squared Error: 0.743149055720586
R^2 Score: 0.7505650609469626
PROGRAM 8
Develop a program to demonstrate the working of the decision tree
algorithm. Use Breast Cancer Data set for building the decision tree and
apply this knowledge to classify a new sample.
# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
new_sample = np.array([X_test[0]])
prediction = clf.predict(new_sample)
prediction_class = "Benign" if prediction == 1 else "Malignant"
print(f"Predicted Class for the new sample: {prediction_class}")
plt.figure(figsize=(12,8))
tree.plot_tree(clf, filled=True, feature_names=data.feature_names,
class_names=data.target_names)
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()
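If the plotted tree is hard to read at this figure size, the fitted classifier can also be dumped as plain-text rules; a small optional addition using sklearn's export_text utility:

from sklearn.tree import export_text

print(export_text(clf, feature_names=list(data.feature_names)))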
OUTPUT:
PROGRAM 9
Develop a program to implement the Naive Bayesian classifier considering
Olivetti Face Data set for training. Compute the accuracy of the classifier,
considering a few test data sets.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report,
confusion_matrix
import matplotlib.pyplot as plt
data = fetch_olivetti_faces(shuffle=True, random_state=42)
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=1))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
cross_val_accuracy = cross_val_score(gnb, X, y, cv=5, scoring='accuracy')
print(f'\nCross-validation accuracy: {cross_val_accuracy.mean() * 100:.2f}%')
fig, axes = plt.subplots(3, 5, figsize=(12, 8))
for ax, image, label, prediction in zip(axes.ravel(), X_test, y_test, y_pred):
    ax.imshow(image.reshape(64, 64), cmap=plt.cm.gray)
    ax.set_title(f"True: {label}, Pred: {prediction}")
    ax.axis('off')
plt.show()
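For context, the Olivetti faces data contains 400 grayscale images of 40 people (10 images per person), and each image is flattened to 64 x 64 = 4096 pixel values; a quick optional check:

print("Data shape:", X.shape)                    # (400, 4096)
print("Number of classes:", len(np.unique(y)))   # 40 distinct subjects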
OUTPUT:
Accuracy: 80.83%
Classification Report:
precision recall f1-score support
0 0.67 1.00 0.80 2
1 1.00 1.00 1.00 2
2 0.33 0.67 0.44 3
3 1.00 0.00 0.00 5
4 1.00 0.50 0.67 4
5 1.00 1.00 1.00 2
7 1.00 0.75 0.86 4
8 1.00 0.67 0.80 3
9 1.00 0.75 0.86 4
10 1.00 1.00 1.00 3
11 1.00 1.00 1.00 1
12 0.40 1.00 0.57 4
13 1.00 0.80 0.89 5
14 1.00 0.40 0.57 5
15 0.67 1.00 0.80 2
16 1.00 0.67 0.80 3
17 1.00 1.00 1.00 3
18 1.00 1.00 1.00 3
19 0.67 1.00 0.80 2
20 1.00 1.00 1.00 3
21 1.00 0.67 0.80 3
22 1.00 0.60 0.75 5
23 1.00 0.75 0.86 4
24 1.00 1.00 1.00 3
25 1.00 0.75 0.86 4
26 1.00 1.00 1.00 2
27 1.00 1.00 1.00 5
28 0.50 1.00 0.67 2
29 1.00 1.00 1.00 2
30 1.00 1.00 1.00 2
31 1.00 0.75 0.86 4
32 1.00 1.00 1.00 2
34 0.25 1.00 0.40 1
35 1.00 1.00 1.00 5
36 1.00 1.00 1.00 3
37 1.00 1.00 1.00 1
38 1.00 0.75 0.86 4
39 0.50 1.00 0.67 5
accuracy 0.81 120
macro avg 0.89 0.85 0.83 120
weighted avg 0.91 0.81 0.81 120
Confusion Matrix:
[[2 0 0 ... 0 0 0]
[0 2 0 ... 0 0 0]
[0 0 2 ... 0 0 1]
...
[0 0 0 ... 1 0 0]
[0 0 0 ... 0 3 0]
[0 0 0 ... 0 0 5]]
Cross-validation accuracy: 87.25%
PROGRAM 10
Develop a program to implement k-means clustering using Wisconsin
Breast Cancer data set and visualize the clustering result.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report
data = load_breast_cancer()
X = data.data
y = data.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=2, random_state=42)
y_kmeans = kmeans.fit_predict(X_scaled)
print("Confusion Matrix:")
print(confusion_matrix(y, y_kmeans))
print("\nClassification Report:")
print(classification_report(y, y_kmeans))
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1',
s=100, edgecolor='black', alpha=0.7)
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label',
palette='coolwarm', s=100, edgecolor='black', alpha=0.7)
plt.title('True Labels of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="True Label")
plt.show()
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1',
s=100, edgecolor='black', alpha=0.7)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X',
label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()
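One caution when reading the confusion matrix and classification report below: k-means cluster IDs (0/1) are arbitrary and only happen to line up with the benign/malignant label encoding here. A label-agnostic measure such as the adjusted Rand index avoids that dependence; a one-line optional addition:

from sklearn.metrics import adjusted_rand_score

print("Adjusted Rand Index:", adjusted_rand_score(y, y_kmeans))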
OUTPUT:
Confusion Matrix:
[[175 37]
[ 13 344]]
Classification Report:
precision recall f1-score support
0 0.93 0.83 0.88 212
1 0.90 0.96 0.93 357
accuracy 0.91 569
macro avg 0.92 0.89 0.90 569
weighted avg 0.91 0.91 0.91 569