Machine Learning Laboratory BCSL606
Program 1: Develop a program to create histograms for all numerical
features and analyze the distribution of each feature.
Generate box plots for all numerical features and identify any outliers.
Use the California Housing dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
# Load the California Housing dataset
housing_data = fetch_california_housing(as_frame=True)
data = housing_data['data']
print(data)
data['MedHouseVal'] = housing_data['target']  # Adding target variable for completeness

# Histograms for all numerical features
print("Creating histograms for all numerical features...")
for column in data.columns:
    plt.figure(figsize=(8, 5))
    plt.hist(data[column], bins=50, edgecolor='k', alpha=0.7)
    plt.title(f'Distribution of {column}')
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

# Box plots for all numerical features
print("Creating box plots for all numerical features to identify outliers...")
for column in data.columns:
    plt.figure(figsize=(8, 5))
    plt.boxplot(data[column], vert=False, patch_artist=True,
                boxprops=dict(facecolor='skyblue', color='blue'))
    plt.title(f'Box Plot of {column}')
    plt.xlabel(column)
    plt.grid(axis='x', linestyle='--', alpha=0.7)
    plt.show()

# Identify outliers using IQR
print("Identifying potential outliers using the IQR method...")
outliers = {}
for column in data.columns:
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers[column] = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    print(f"{column}:")
    print(f"Lower Bound: {lower_bound}, Upper Bound: {upper_bound}")
    print(f"Number of outliers: {len(outliers[column])}")
    print("---")
Output:
MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude \
0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88
1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86
2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85
3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85
4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85
... ... ... ... ... ... ... ...
20635 1.5603 25.0 5.045455 1.133333 845.0 2.560606 39.48
20636 2.5568 18.0 6.114035 1.315789 356.0 3.122807 39.49
20637 1.7000 17.0 5.205543 1.120092 1007.0 2.325635 39.43
20638 1.8672 18.0 5.329513 1.171920 741.0 2.123209 39.43
20639 2.3886 16.0 5.254717 1.162264 1387.0 2.616981 39.37
Longitude
0 -122.23
1 -122.22
2 -122.24
3 -122.25
4 -122.25
... ...
20635 -121.09
20636 -121.21
20637 -121.22
20638 -121.32
20639 -121.24
[20640 rows x 8 columns]
Creating histograms for all numerical features...
Identifying potential outliers using the IQR method...
MedInc:
Lower Bound: -0.7063750000000004, Upper Bound: 8.013024999999999
Number of outliers: 681
---
HouseAge:
Lower Bound: -10.5, Upper Bound: 65.5
Number of outliers: 0
---
AveRooms:
Lower Bound: 2.023219161170969, Upper Bound: 8.469878027106942
Number of outliers: 511
---
AveBedrms:
Lower Bound: 0.8659085155701288, Upper Bound: 1.2396965968190603
Number of outliers: 1424
---
Population:
Lower Bound: -620.0, Upper Bound: 3132.0
Number of outliers: 1196
---
AveOccup:
Lower Bound: 1.1509614824735064, Upper Bound: 4.5610405893536905
Number of outliers: 711
---
Latitude:
Lower Bound: 28.259999999999998, Upper Bound: 43.38
Number of outliers: 0
---
Longitude:
Lower Bound: -127.48499999999999, Upper Bound: -112.32500000000002
Number of outliers: 0
---
MedHouseVal:
Lower Bound: -0.9808749999999995, Upper Bound: 4.824124999999999
Number of outliers: 1071
---
Program 2: Develop a program to Compute the correlation matrix to
understand the relationships between pairs of features. Visualize the
correlation matrix using a heatmap to know which variables have strong
positive/negative correlations. Create a pair plot to visualize pairwise
relationships between features. Use the California Housing dataset.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
# Load the California Housing dataset
housing_data = fetch_california_housing(as_frame=True)
data = housing_data['data']
data['MedHouseVal'] = housing_data['target']  # Adding target variable for completeness

# Compute the correlation matrix
print("Computing the correlation matrix...")
correlation_matrix = data.corr()
print(correlation_matrix)

# Visualize the correlation matrix using a heatmap
print("Visualizing the correlation matrix using a heatmap...")
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap="coolwarm",
            cbar=True, square=True)
plt.title("Correlation Matrix Heatmap")
plt.show()

# Create a pair plot to visualize pairwise relationships between features
print("Creating a pair plot to visualize pairwise relationships between features...")
sns.pairplot(data, diag_kind='kde', corner=True)
plt.show()
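To read the strongest relationships off the heatmap programmatically, the correlations of every feature with the target can be sorted. This optional sketch reuses the correlation_matrix computed above and is not part of the required program:

target_corr = correlation_matrix['MedHouseVal'].drop('MedHouseVal')
print(target_corr.sort_values(ascending=False))  # strongest positive correlations first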
Output:
Computing the correlation matrix...
MedInc HouseAge AveRooms AveBedrms Population AveOccup \
MedInc 1.000000 -0.119034 0.326895 -0.062040 0.004834 0.018766
HouseAge -0.119034 1.000000 -0.153277 -0.077747 -0.296244 0.013191
AveRooms 0.326895 -0.153277 1.000000 0.847621 -0.072213 -0.004852
AveBedrms -0.062040 -0.077747 0.847621 1.000000 -0.066197 -0.006181
Population 0.004834 -0.296244 -0.072213 -0.066197 1.000000 0.069863
AveOccup 0.018766 0.013191 -0.004852 -0.006181 0.069863 1.000000
Latitude -0.079809 0.011173 0.106389 0.069721 -0.108785 0.002366
Longitude -0.015176 -0.108197 -0.027540 0.013344 0.099773 0.002476
MedHouseVal 0.688075 0.105623 0.151948 -0.046701 -0.024650 -0.023737
Latitude Longitude MedHouseVal
MedInc -0.079809 -0.015176 0.688075
HouseAge 0.011173 -0.108197 0.105623
AveRooms 0.106389 -0.027540 0.151948
AveBedrms 0.069721 0.013344 -0.046701
Population -0.108785 0.099773 -0.024650
AveOccup 0.002366 0.002476 -0.023737
Latitude 1.000000 -0.924664 -0.144160
Longitude -0.924664 1.000000 -0.045967
MedHouseVal -0.144160 -0.045967 1.000000
Visualizing the correlation matrix using a heatmap...
Program 3: Develop a program to implement Principal Component
Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4
features to 2.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from numpy.linalg import eig
# Load the Iris dataset
iris = load_iris()
iris_data = iris.data
iris_target = iris.target
iris_feature_names = iris.feature_names
# Convert to DataFrame
df = pd.DataFrame(iris_data, columns=iris_feature_names)
df['Target'] = iris_target
# Example Data (First 5 Samples for Explanation)
example_data = iris_data[:5]
print("Example Data (First 5 Samples):")
print(example_data)
# Step 1: Standardize the Data
scaler = StandardScaler()
iris_data_scaled = scaler.fit_transform(iris_data)
example_data_scaled = scaler.transform(example_data)
print("\nStandardized Example Data:")
print(example_data_scaled)
# Step 2: Compute Covariance Matrix Manually
n_samples = iris_data_scaled.shape[0]
mean_vector = np.mean(iris_data_scaled, axis=0)
X_centered = iris_data_scaled - mean_vector
cov_matrix_manual = (1 / (n_samples - 1)) * np.dot(X_centered.T, X_centered)
print("\nManually Computed Covariance Matrix:")
print(cov_matrix_manual)
# Step 3: Compute Eigenvalues and Eigenvectors Manually
eigenvalues_manual, eigenvectors_manual = eig(cov_matrix_manual)
print("\nManually Computed Eigenvalues:")
print(eigenvalues_manual)
print("\nManually Computed Eigenvectors:")
print(eigenvectors_manual)
# Step 4: Select Top 2 Principal Components
sorted_indices = np.argsort(eigenvalues_manual)[::-1]
top_2_indices = sorted_indices[:2]
top_2_eigenvectors = eigenvectors_manual[:, top_2_indices]
print("\nTop 2 Eigenvectors:")
print(top_2_eigenvectors)
# Step 5: Transform Data to 2D
iris_pca = np.dot(iris_data_scaled, top_2_eigenvectors)
example_pca = np.dot(example_data_scaled, top_2_eigenvectors)
print("\nReduced 2D Example Data:")
print(example_pca)
# Step 6: Visualize PCA Results
iris_pca_df = pd.DataFrame(data=iris_pca, columns=["Principal Component 1", "Principal Component 2"])
iris_pca_df['Target'] = iris_target
plt.figure(figsize=(8, 6))
sns.scatterplot(
x="Principal Component 1", y="Principal Component 2", hue="Target",
data=iris_pca_df,
palette="viridis", s=100, alpha=0.8
)
plt.title("PCA of Iris Dataset")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend(title="Target", labels=iris.target_names)
plt.grid(alpha=0.5)
plt.show()
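As an optional sanity check (not part of the required program), scikit-learn's built-in PCA should produce the same two-dimensional projection, up to the sign of each component, since eigenvector directions are arbitrary:

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
iris_pca_sklearn = pca.fit_transform(iris_data_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)
# iris_pca_sklearn should match iris_pca column-wise, possibly with flipped signs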
Output:
Program 4: For a given set of training data examples stored in a .CSV
file, implement and demonstrate the Find-S algorithm to output a
description of the set of all hypotheses consistent with the training
examples.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Implement Find-S algorithm
print("Implementing Find-S algorithm...")

def find_s_algorithm(csv_file):
    # Load the dataset
    dataset = pd.read_csv(csv_file)
    attributes = dataset.iloc[:, :-1].values
    labels = dataset.iloc[:, -1].values
    for i, label in enumerate(labels):
        if label == 'Yes':  # First positive example found
            hypothesis = list(attributes[i])
            break  # Stop after finding the first "Yes"
    for i in range(len(labels)):
        if labels[i] == 'Yes':  # Only process positive examples
            for j in range(len(hypothesis)):
                if hypothesis[j] != attributes[i][j]:
                    hypothesis[j] = '?'  # Generalize
    return hypothesis

csv_file = "/content/find_s_example.csv"  # Provide the path to your CSV file
final_hypothesis = find_s_algorithm(csv_file)
print("Final Hypothesis:", final_hypothesis)
Output:
Implementing Find-S algorithm...
Final Hypothesis: ['Sunny', 'Warm', '?', '?', '?', '?']
Program 5: Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following based on the dataset generated.
1. Label the first 50 points {x1, ..., x50} as follows: if (xi ≤ 0.5), then xi ∈ Class1, else xi ∈ Class2.
2. Classify the remaining points, x51, ..., x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
# Step 1: Generate 100 random values in the range [0,1]
np.random.seed(42) # For reproducibility
x = np.random.rand(100).reshape(-1, 1) # Reshape for sklearn compatibility
print(x[:5])
# Step 2: Label the first 50 points
labels = np.array([1 if xi <= 0.5 else 2 for xi in x[:50]])  # Class 1 if xi <= 0.5, else Class 2

# Step 3: Train KNN classifier
k_values = [1, 2, 3, 4, 5, 20, 30]
classified_labels = {}
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x[:50], labels)  # Train using first 50 points
    classified_labels[k] = knn.predict(x[50:])  # Classify remaining 50 points

# Step 4: Visualize the results
plt.figure(figsize=(10, 6))
plt.scatter(x[:50], labels, color='blue', label='Training Data')
plt.scatter(x[50:], classified_labels[1], color='red', marker='x', label='Classified Data (k=1)')
plt.xlabel('X values')
plt.ylabel('Class')
plt.title('KNN Classification of Random Values')
plt.legend()
plt.show()

# Print classification results for different k values
for k in k_values:
    print(f"Classification results for k={k}: {classified_labels[k]}")
Output:
Classification results for k=1: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 1 2 1 1 1]
Classification results for k=2: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 1 2 1 1 1]
Classification results for k=3: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=4: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=5: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=20: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=30: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 1 2 1 1 1]
Program 6: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
import numpy as np
import matplotlib.pyplot as plt
def gaussian_kernel(x, x_query, tau):
    """Compute the Gaussian weight for each training sample."""
    return np.exp(-np.square(x - x_query) / (2 * tau**2))

def locally_weighted_regression(x_train, y_train, x_query, tau):
    """Perform Locally Weighted Regression (LWR) for a given query point."""
    m = len(x_train)
    W = np.diag(gaussian_kernel(x_train, x_query, tau))  # Compute weights
    X_bias = np.c_[np.ones(m), x_train]  # Add bias term
    theta = np.linalg.pinv(X_bias.T @ W @ X_bias) @ (X_bias.T @ W @ y_train)
    return np.array([1, x_query]) @ theta  # Predict output for x_query

# Generate synthetic dataset
np.random.seed(42)
x_train = np.linspace(0, 10, 100)
y_train = np.sin(x_train) + np.random.normal(0, 0.2, 100)  # Sinusoidal data with noise

# Define tau (bandwidth parameter)
tau_values = [0.1, 0.5, 1, 5]
x_test = np.linspace(0, 10, 100)  # Test data

plt.figure(figsize=(12, 8))
for tau in tau_values:
    y_pred = np.array([locally_weighted_regression(x_train, y_train, xq, tau) for xq in x_test])
    plt.plot(x_test, y_pred, label=f'tau={tau}')
# Plot training data
plt.scatter(x_train, y_train, color='black', label='Training Data', alpha=0.5)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Locally Weighted Regression (LWR) with Different Tau Values')
plt.legend()
plt.show()
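For each query point x_q, the function above solves the weighted least-squares normal equation theta = (X^T W X)^(-1) X^T W y, where W is the diagonal matrix of Gaussian weights w_i = exp(-(x_i - x_q)^2 / (2*tau^2)), and the prediction is the dot product of [1, x_q] with theta. A single prediction can also be obtained directly; the query point and tau below are illustrative values only:

y_hat = locally_weighted_regression(x_train, y_train, 2.5, 0.5)  # prediction near x = 2.5
print(y_hat)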
Output: (plot of the training data and the LWR fits for tau = 0.1, 0.5, 1 and 5)
Program 7: Develop a program to demonstrate the working of Linear
Regression and Polynomial Regression. Use Boston Housing Dataset for
Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency
prediction) for Polynomial Regression.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
# Load Boston Housing Dataset from CSV
boston_df = pd.read_csv('/content/Boston.csv')
print("Boston CSV Columns:", boston_df.columns)
X_boston = boston_df[['rm']].values
y_boston = boston_df['medv'].values
X_train, X_test, y_train, y_test = train_test_split(X_boston, y_boston, test_size=0.2, random_state=42)
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)
y_pred = linear_reg.predict(X_test)
# Plot results
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('Housing Price')
plt.title('Linear Regression on Boston Housing Dataset')
plt.legend()
plt.show()
print(f"Mean Squared Error (Linear Regression): {mean_squared_error(y_test,
y_pred)}")
# Polynomial Regression on Auto MPG Dataset
auto_mpg_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
                'acceleration', 'model_year', 'origin']
auto_df = pd.read_csv(auto_mpg_url, delim_whitespace=True, names=column_names,
                      na_values='?', comment='\t')  # comment='\t' skips the trailing car-name field
auto_df = auto_df.dropna()  # Remove rows with missing values
X_auto = auto_df[['horsepower']].astype(float).values  # Using 'horsepower' as feature
y_auto = auto_df['mpg'].values
X_train, X_test, y_train, y_test = train_test_split(X_auto, y_auto, test_size=0.2, random_state=42)

# Polynomial Regression (degree=3)
poly_model = make_pipeline(PolynomialFeatures(degree=3), StandardScaler(), LinearRegression())
poly_model.fit(X_train, y_train)
y_poly_pred = poly_model.predict(X_test)

# Plot results (sort the test points so the curve is drawn left to right)
X_test_sorted, y_poly_pred_sorted = zip(*sorted(zip(X_test.flatten(), y_poly_pred)))
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test_sorted, y_poly_pred_sorted, color='red', linewidth=2, label='Predicted')
plt.xlabel('Horsepower')
plt.ylabel('MPG')
plt.title('Polynomial Regression on Auto MPG Dataset')
plt.legend()
plt.show()
print(f"Mean Squared Error (Polynomial Regression):
{mean_squared_error(y_test, y_poly_pred)}")
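Besides MSE, the polynomial fit can also be summarised with R² (note that y_test at this point refers to the Auto MPG test split, since the variable was reused). An optional sketch using sklearn.metrics.r2_score, not part of the required program:

from sklearn.metrics import r2_score
print(f"R^2 (Polynomial Regression): {r2_score(y_test, y_poly_pred)}")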
Output:
Mean Squared Error (Linear Regression): 46.144775347317264
Program 8: Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer data set for building the decision tree and apply this knowledge to classify a new sample.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt
from collections import Counter
data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names
target_names = data.target_names
print("Feature names:", feature_names)
print("Target names:", target_names)
def calculate_entropy(labels):
    total = len(labels)
    counts = Counter(labels)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * np.log2(p)
    return entropy

entropy_dataset = calculate_entropy(y)
print(f"\nOverall Entropy of Target (Malignant vs Benign): {entropy_dataset:.4f}")

print("\nInformation Gain for Each Feature (using median split):")
for i, feature in enumerate(feature_names):
    feature_values = X[:, i]
    median_value = np.median(feature_values)
    # Split dataset
    left_mask = feature_values <= median_value
    right_mask = feature_values > median_value
    y_left = y[left_mask]
    y_right = y[right_mask]
    entropy_left = calculate_entropy(y_left)
    entropy_right = calculate_entropy(y_right)
    weighted_entropy = (len(y_left) / len(y)) * entropy_left + (len(y_right) / len(y)) * entropy_right
    info_gain = entropy_dataset - weighted_entropy
    print(f"{feature}: IG = {info_gain:.4f}")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))

plt.figure(figsize=(20, 10))
plot_tree(clf, feature_names=feature_names, class_names=target_names, filled=True, rounded=True)
plt.title("Decision Tree Visualization for Breast Cancer Dataset")
plt.show()
new_sample = np.array([[17.99, 10.38, 122.8, 1001.0, 0.1184,
0.2776, 0.3001, 0.1471, 0.2419, 0.07871,
1.095, 0.9053, 8.589, 153.4, 0.006399,
0.04904, 0.05373, 0.01587, 0.03003, 0.006193,
25.38, 17.33, 184.6, 2019.0, 0.1622,
0.6656, 0.7119, 0.2654, 0.4601, 0.1189]])
prediction = clf.predict(new_sample)
print("\nPrediction for new sample:")
print("Class:", target_names[prediction[0]])
Output:
Feature names: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Target names: ['malignant' 'benign']
Overall Entropy of Target (Malignant vs Benign): 0.9526
Information Gain for Each Feature (using median split):
mean radius: IG = 0.3416
mean texture: IG = 0.1445
mean perimeter: IG = 0.3507
mean area: IG = 0.3416
mean smoothness: IG = 0.0660
mean compactness: IG = 0.2325
mean concavity: IG = 0.3695
mean concave points: IG = 0.3995
mean symmetry: IG = 0.0627
mean fractal dimension: IG = 0.0000
radius error: IG = 0.1824
texture error: IG = 0.0000
perimeter error: IG = 0.2192
area error: IG = 0.2910
smoothness error: IG = 0.0023
compactness error: IG = 0.0990
concavity error: IG = 0.1601
concave points error: IG = 0.1445
symmetry error: IG = 0.0037
fractal dimension error: IG = 0.0284
worst radius: IG = 0.4588
worst texture: IG = 0.1298
worst perimeter: IG = 0.4436
worst area: IG = 0.4556
worst smoothness: IG = 0.0990
worst compactness: IG = 0.1882
worst concavity: IG = 0.3792
worst concave points: IG = 0.4209
worst symmetry: IG = 0.0762
worst fractal dimension: IG = 0.0452
Classification Report:
precision recall f1-score support
0 0.97 0.91 0.94 43
1 0.95 0.99 0.97 71
accuracy 0.96 114
macro avg 0.96 0.95 0.95 114
weighted avg 0.96 0.96 0.96 114
Accuracy: 0.956140350877193
Prediction for new sample:
Class: malignant
Program 9: Develop a program to implement the Naive Bayesian classifier considering the Olivetti Face data set for training. Compute the accuracy of the classifier, considering a few test data sets.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
faces = fetch_olivetti_faces()
X = faces.data # Flattened images: 400 x 4096
y = faces.target # Labels: 0 to 39 (40 classes)
images = faces.images # Original image shapes: 64 x 64
print(f"Total samples: {X.shape[0]}")
print(f"Image shape: {images[0].shape}")
print(f"Total classes: {len(np.unique(y))}")
X_train, X_test, y_train, y_test, img_train, img_test = train_test_split(
X, y, images, test_size=0.3, random_state=42, stratify=y)
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy)
def show_predictions(images, true_labels, predicted_labels, n=8):
    plt.figure(figsize=(15, 5))
    for i in range(n):
        plt.subplot(1, n, i + 1)
        plt.imshow(images[i], cmap='gray')
        plt.title(f"True: {true_labels[i]}\nPred: {predicted_labels[i]}")
        plt.axis('off')
    plt.tight_layout()
    plt.suptitle("Sample Test Predictions", fontsize=16)
    plt.subplots_adjust(top=0.75)
    plt.show()

show_predictions(img_test, y_test, y_pred, n=8)
Output:
Classification Report:
precision recall f1-score support
0 1.00 0.67 0.80 3
1 1.00 0.67 0.80 3
2 0.43 1.00 0.60 3
3 1.00 0.33 0.50 3
4 1.00 0.33 0.50 3
5 1.00 1.00 1.00 3
6 1.00 0.67 0.80 3
7 0.60 1.00 0.75 3
8 1.00 1.00 1.00 3
9 1.00 0.33 0.50 3
10 1.00 0.67 0.80 3
11 1.00 1.00 1.00 3
12 1.00 1.00 1.00 3
13 1.00 0.67 0.80 3
14 1.00 1.00 1.00 3
15 0.50 1.00 0.67 3
16 1.00 0.33 0.50 3
17 0.00 0.00 0.00 3
18 1.00 1.00 1.00 3
19 1.00 1.00 1.00 3
20 1.00 1.00 1.00 3
21 1.00 1.00 1.00 3
22 1.00 1.00 1.00 3
23 1.00 1.00 1.00 3
24 1.00 0.67 0.80 3
25 0.75 1.00 0.86 3
26 1.00 0.67 0.80 3
27 1.00 1.00 1.00 3
28 1.00 1.00 1.00 3
29 1.00 1.00 1.00 3
30 0.75 1.00 0.86 3
31 1.00 0.67 0.80 3
32 1.00 1.00 1.00 3
33 1.00 0.67 0.80 3
34 0.43 1.00 0.60 3
35 0.75 1.00 0.86 3
36 1.00 1.00 1.00 3
37 1.00 0.33 0.50 3
38 1.00 1.00 1.00 3
39 0.33 1.00 0.50 3
accuracy 0.82 120
macro avg 0.89 0.82 0.81 120
weighted avg 0.89 0.82 0.81 120
Accuracy: 0.8166666666666667
Program 10: Develop a program to implement k-means clustering using the Wisconsin Breast Cancer data set and visualize the clustering result.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.preprocessing import StandardScaler
data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names
target_names = data.target_names
print("Data Shape:", X.shape)
print("Classes:", target_names)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_scaled)
labels_mapped = np.where(clusters == 1, 0, 1)
print("\nConfusion Matrix:")
print(confusion_matrix(y, labels_mapped))
print("Accuracy:", accuracy_score(y, labels_mapped))
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
plt.figure(figsize=(10, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=clusters, cmap='viridis', alpha=0.6)
centers_pca = pca.transform(kmeans.cluster_centers_)  # project the centroids into the same PCA plane
plt.scatter(centers_pca[:, 0], centers_pca[:, 1],
            s=250, marker='X', c='red', label='Centroids')
plt.title("K-Means Clustering of Breast Cancer Dataset (PCA-2D)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend()
plt.grid(True)
plt.show()
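The line labels_mapped = np.where(clusters == 1, 0, 1) assumes that cluster 1 corresponds to the malignant class, which depends on the random initialisation of k-means. A more robust optional alternative (a sketch, not part of the required program) assigns each cluster the majority true label of its members:

labels_mapped_auto = np.zeros_like(clusters)
for c in np.unique(clusters):
    majority = np.bincount(y[clusters == c]).argmax()  # most common true label in cluster c
    labels_mapped_auto[clusters == c] = majority
print("Accuracy (majority mapping):", accuracy_score(y, labels_mapped_auto))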
Output:
Data Shape: (569, 30)
Classes: ['malignant' 'benign']
Confusion Matrix:
[[176 36]
[ 18 339]]
Accuracy: 0.9050966608084359