MAHARAJA SURAJMAL INSTITUTE
Affiliated to GGSIP University & NAAC ‘A+’ grade accredited
DEPARTMENT OF COMPUTER APPLICATIONS
Machine Learning
PRACTICAL FILE
SUBJECT CODE – BCAP 311
Submitted by: Kanika Mittal                    Submitted to: Dr. Anamika Rana
Enrollment no.: 00121202021                    Associate Professor, MSI
Sem: 5th    Sec: A (2nd shift)                 Sign: ____________
INDEX
S.No.  Practical                                                          Date       Sign
1.     Write a program in Python to implement Linear Regression
       with one variable.                                                 11/09/23
2.     Write a program in Python to implement Linear Regression
       with multiple variables.                                           11/09/23
3.     Write a program in Python to implement Logistic Regression.        25/09/23
4.     Write a program in Python to implement an SVM Classifier.          25/10/23
5.     Write a program in Python to implement a KNN Classifier.           09/10/23
6.     Write a program in Python to implement a Decision Tree
       Classifier.                                                        09/10/23
7.     Write a program in Python to implement the Naïve Bayes
       Classifier.                                                        16/10/23
8.     Write a program in Python to implement the Random Forest
       Classifier.                                                        16/10/23
9.     Build an Artificial Neural Network (ANN) by implementing
       the Back Propagation Algorithm.                                    21/10/23
10.    Write a program in Python to implement the K-means
       Algorithm.                                                         28/10/23
11.    Write a program in Python on Self-Organising Map (SOM).            18/11/23
12.    Write a program in Python for Empirical Comparison of
       different Supervised learning techniques.                          23/11/23
13.    Write a program in Python for Empirical Comparison of
       different Unsupervised learning techniques.                        23/11/23
Practical – 1
Ques. Write a program in Python to implement Linear Regression with one variable.
Code :-
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Load the diabetes dataset and use a single feature (column 2, BMI)
diabetes = load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
# Create a linear regression model
model = LinearRegression()
# Train the model on the training set
model.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
# Plot the regression line
plt.scatter(X_test, y_test, color='red', label='Actual Data')
plt.plot(X_test, y_pred, color='blue', label='Regression Line')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.title('Linear Regression with One Variable')
plt.show()
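An optional check (not part of the original program): the fitted slope and intercept can be read straight off the trained model, which makes it easy to verify the plotted line.
# Optional: inspect the fitted parameters of the line y = b0 + b1*x
print(f"Slope (b1): {model.coef_[0]:.2f}")
print(f"Intercept (b0): {model.intercept_:.2f}")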
Output :-
Practical – 2
Ques. Write a program in Python to implement Linear Regression with multiple variables.
Code :-
# import all the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the diabetes dataset
data = load_diabetes()
X, y = data.data, data.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate the Mean Squared Error (MSE) as a measure of performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R2 Score: {r2_score(y_test, y_pred):.2f}")
# Plot predicted vs. actual values; the red diagonal is the ideal fit y = x
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, color='blue', label='Predicted Values')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)],
         color='red', linewidth=2, label='Ideal Fit (y = x)')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Linear Regression with Multiple Variables')
plt.legend()
plt.show()
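An optional extension, run after the code above: pairing each coefficient with its feature name shows which variables drive the prediction (sign and relative magnitude).
# Optional: list each feature's fitted coefficient
for name, coef in zip(data.feature_names, model.coef_):
    print(f"{name}: {coef:.2f}")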
Output :-
Practical – 3
Ques. Write a program in Python to implement Logistic Regression.
Code :-
# import all the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# Load the iris dataset
iris = load_iris()
# Selecting features (X) and target variable (y)
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Initialize the logistic regression model
model = LogisticRegression()
# scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print the classification report and accuracy
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred) * 100, "%")
# Plot the confusion matrix heatmap (iris has three classes)
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1', 'Predicted 2'],
            yticklabels=['Actual 0', 'Actual 1', 'Actual 2'])
plt.title('Confusion Matrix')
plt.show()
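An optional extension, run after the code above: classifying a single new flower. The sample values here are illustrative measurements, and the sample must be scaled with the same fitted scaler before prediction.
# Optional: classify one new (illustrative) sample
sample = scaler.transform([[5.1, 3.5, 1.4, 0.2]])  # sepal/petal measurements in cm
print("Predicted class:", iris.target_names[model.predict(sample)[0]])
print("Class probabilities:", model.predict_proba(sample).round(3))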
Output :-
Practical – 4
Ques. Write a program in Python to implement an SVM Classifier.
Code :-
# import all the necessary libraries
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# Load the wine dataset
wine = load_wine()
# Selecting features (X) and target variable (y)
X = wine.data
y = wine.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Initialize the SVM model
model = SVC()
# scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print the classification report and accuracy
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred) * 100, "%")
# Plot the confusion matrix heatmap
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1', 'Predicted 2'],
            yticklabels=['Actual 0', 'Actual 1', 'Actual 2'])
plt.title('Confusion Matrix')
plt.show()
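An optional extension, run after the code above: SVC defaults to the RBF kernel, so a quick side-by-side of kernels (a rough sketch, accuracies depend on the random split) shows how much the choice matters.
# Optional: compare SVM kernels on the same train/test split
for kernel in ['linear', 'poly', 'rbf']:
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    print(f"{kernel} kernel accuracy: {clf.score(X_test, y_test) * 100:.2f}%")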
Output :-
Practical – 5
Ques. Write a program in Python to implement a KNN Classifier.
Code :-
#WAP to implement KNN algorithm on iris dataset
# import all the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# Load the iris dataset
iris = load_iris()
# Selecting features (X) and target variable (y)
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Initialize the KNN model
model = KNeighborsClassifier()
# scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print the classification report and accuracy
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred) * 100, "%")
# Plot the confusion matrix heatmap
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1', 'Predicted 2'],
            yticklabels=['Actual 0', 'Actual 1', 'Actual 2'])
plt.title('Confusion Matrix')
plt.show()
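An optional extension, run after the code above: KNeighborsClassifier defaults to k = 5; sweeping a few values of k (a rough sketch, results vary with the split) shows how the neighbourhood size affects accuracy.
# Optional: accuracy for different values of k
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(f"k={k}: accuracy = {knn.score(X_test, y_test) * 100:.2f}%")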
Output :-
Practical – 6
Ques. Write a program in Python to implement a Decision Tree Classifier.
Code :-
# import all the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# Load the iris dataset
iris = load_iris()
# Selecting features (X) and target variable (y)
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Initialize the Decision Tree model
model = DecisionTreeClassifier()
# Scale the data (not strictly needed for decision trees, kept for consistency)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print the classification report and accuracy
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred) * 100, "%")
# Plot the confusion matrix heatmap
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1', 'Predicted 2'],
            yticklabels=['Actual 0', 'Actual 1', 'Actual 2'])
plt.title('Confusion Matrix')
plt.show()
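An optional extension, run after the code above: sklearn.tree.plot_tree draws the fitted tree directly; note the split thresholds appear in standardized units because the model was trained on scaled data.
# Optional: visualise the learned decision tree
from sklearn.tree import plot_tree
plt.figure(figsize=(12, 8))
plot_tree(model, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()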
Output :-
Practical – 7
Ques. Write a program in Python to implement the Naïve Bayes Classifier.
Code :-
#WAP to implement Naive Bayesian algorithm on iris dataset
# import all the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# Load the iris dataset
iris = load_iris()
# Selecting features (X) and target variable (y)
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Initialize the Naive Bayes model
model = GaussianNB()
# scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print the classification report and accuracy
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred) * 100, "%")
# Plot the confusion matrix heatmap
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1', 'Predicted 2'],
            yticklabels=['Actual 0', 'Actual 1', 'Actual 2'])
plt.title('Confusion Matrix')
plt.show()
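An optional peek inside the model, run after the code above: GaussianNB fits a normal distribution per feature per class, so the learned class priors and per-class feature means can be printed directly.
# Optional: inspect the fitted Naive Bayes parameters
print("Class priors:", model.class_prior_)
print("Per-class feature means:\n", model.theta_.round(2))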
Output :-
Practical – 8
Ques. Write a program in Python to implement the Random Forest Classifier.
Code :-
#WAP to implement Random Forest algorithm on iris dataset
# import all the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# Load the iris dataset
iris = load_iris()
# Selecting features (X) and target variable (y)
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
# Initialize the Random Forest model
model = RandomForestClassifier()
# scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print the classification report and accuracy
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred) * 100, "%")
# Plot the confusion matrix heatmap
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1', 'Predicted 2'],
            yticklabels=['Actual 0', 'Actual 1', 'Actual 2'])
plt.title('Confusion Matrix')
plt.show()
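An optional extension, run after the code above: a fitted Random Forest exposes feature_importances_, which ranks how much each feature contributed to the splits.
# Optional: rank the features by importance
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")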
Output :-
Practical – 9
Ques. Build an Artificial Neural Network (ANN) by implementing the Back Propagation Algorithm.
Code :-
# import all the necessary libraries
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# Load breast cancer dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target
# Normalize the data
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
# Neural Network parameters
input_size = X_train.shape[1]
hidden_size = 5
output_size = 1
learning_rate = 0.01
epochs = 1000
# Initialize weights and biases
np.random.seed(42)
weights_input_hidden = np.random.randn(input_size, hidden_size)
biases_hidden = np.zeros((1, hidden_size))
weights_hidden_output = np.random.randn(hidden_size, output_size)
biases_output = np.zeros((1, output_size))
# Sigmoid activation function and its derivative
# (the derivative is written in terms of the sigmoid's output)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)
# Train the neural network using backpropagation
for epoch in range(epochs):
    # Forward pass
    hidden_input = np.dot(X_train, weights_input_hidden) + biases_hidden
    hidden_output = sigmoid(hidden_input)
    final_input = np.dot(hidden_output, weights_hidden_output) + biases_output
    predicted_output = sigmoid(final_input)
    # Compute the error (target minus prediction)
    error = y_train.reshape(-1, 1) - predicted_output
    # Backpropagation: propagate the error back through the layers
    output_error = error * sigmoid_derivative(predicted_output)
    hidden_layer_error = (output_error.dot(weights_hidden_output.T)
                          * sigmoid_derivative(hidden_output))
    # Update weights and biases ('+=' because error = target - prediction)
    weights_hidden_output += hidden_output.T.dot(output_error) * learning_rate
    biases_output += np.sum(output_error, axis=0, keepdims=True) * learning_rate
    weights_input_hidden += X_train.T.dot(hidden_layer_error) * learning_rate
    biases_hidden += np.sum(hidden_layer_error, axis=0, keepdims=True) * learning_rate
# Make predictions on the test set
hidden_input = np.dot(X_test, weights_input_hidden) + biases_hidden
hidden_output = sigmoid(hidden_input)
final_input = np.dot(hidden_output, weights_hidden_output) + biases_output
predicted_output = sigmoid(final_input)
# Convert predicted probabilities to binary predictions (0 or 1)
binary_predictions = (predicted_output > 0.5).astype(int)
# Evaluate the accuracy
accuracy = accuracy_score(y_test, binary_predictions)
print(f"Accuracy: {100*accuracy:.2f}%")
Output :-
Practical – 10
Ques. Write a program in Python to implement the K-means Algorithm.
Code :-
# import all the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.datasets import load_wine
# Load dataset
wine = load_wine()
X = wine.data
y = wine.target
# Standardize the features
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
# Apply K-means clustering
kmeans = KMeans(n_clusters=4)
kmeans_labels = kmeans.fit_predict(X_std)
# Apply Hierarchical clustering
agg_clustering = AgglomerativeClustering(n_clusters=4)
agg_labels = agg_clustering.fit_predict(X_std)
# Evaluate clustering quality using silhouette score
kmeans_silhouette = silhouette_score(X_std, kmeans_labels)
agg_silhouette = silhouette_score(X_std, agg_labels)
# Print silhouette scores
print(f"K-means Silhouette Score: {kmeans_silhouette}")
print(f"Hierarchical Silhouette Score: {agg_silhouette}")
# Visualize the clustering results using PCA for dimensionality reduction
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
for cluster in range(4):
    plt.scatter(X_pca[kmeans_labels == cluster, 0],
                X_pca[kmeans_labels == cluster, 1],
                label=f'Cluster {cluster + 1}')
plt.title('K-means Clustering')
plt.legend()
plt.subplot(1, 2, 2)
for cluster in range(4):
    plt.scatter(X_pca[agg_labels == cluster, 0],
                X_pca[agg_labels == cluster, 1],
                label=f'Cluster {cluster + 1}')
plt.title('Hierarchical Clustering')
plt.legend()
plt.show()
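An optional extension, run after the code above: the elbow method plots within-cluster inertia against k, which gives a rough visual guide for choosing the number of clusters (n_init is set explicitly since its default varies across scikit-learn versions).
# Optional: elbow method for choosing k
inertias = [KMeans(n_clusters=k, n_init=10).fit(X_std).inertia_
            for k in range(1, 11)]
plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()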
Output :-
Practical – 11
Ques. Write a program in Python on Self-Organising Map (SOM).
Code :-
# import all the required libraries
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from minisom import MiniSom
import matplotlib.pyplot as plt
data = load_iris()
X, y = data.data, data.target
# Normalize the data
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
# SOM parameters
som_grid_size = (10, 10) # Grid size of the SOM
input_size = X.shape[1] # Number of features in the input data
learning_rate = 0.5 # Initial learning rate
sigma = 1.0 # Initial neighborhood radius
epochs = 1000 # Number of training epochs
# Initialize SOM
som = MiniSom(som_grid_size[0], som_grid_size[1], input_size,
              sigma=sigma, learning_rate=learning_rate)
# Training the SOM
som.train_random(X, epochs, verbose=True)
# Visualize the SOM
plt.figure(figsize=(8, 6))
plt.pcolor(som.distance_map().T, cmap='bone_r')  # distance map as background
plt.colorbar()
# Plot the data points on the SOM
for i, (x, label) in enumerate(zip(X, y)):
    w = som.winner(x)  # best-matching unit for this sample
    plt.text(w[0] + 0.5, w[1] + 0.5, str(label),
             color=plt.cm.rainbow(label / 2.0),
             fontdict={'weight': 'bold', 'size': 9})
plt.show()
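Note that MiniSom is a third-party package (installed with pip install minisom). As an optional check, it also provides a quantization_error method, the average distance between each sample and its best-matching unit; lower values indicate a tighter fit.
# Optional: quantization error of the trained SOM
print(f"Quantization error: {som.quantization_error(X):.4f}")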
Output :-
Practical – 12
Ques. Write a program in Python for Empirical Comparison of different Supervised learning techniques.
Code :-
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load a sample dataset (Iris dataset in this case)
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Create classifiers
svm_classifier = SVC(kernel='linear', C=1)
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
knn_classifier = KNeighborsClassifier(n_neighbors=3)
# Train classifiers
svm_classifier.fit(X_train, y_train)
rf_classifier.fit(X_train, y_train)
knn_classifier.fit(X_train, y_train)
# Predictions
svm_pred = svm_classifier.predict(X_test)
rf_pred = rf_classifier.predict(X_test)
knn_pred = knn_classifier.predict(X_test)
# Evaluate performance
print("Support Vector Machine:")
print(f"Accuracy: {accuracy_score(y_test, svm_pred)}")
print("Classification Report:")
print(classification_report(y_test, svm_pred))
print("\nRandom Forest:")
print(f"Accuracy: {accuracy_score(y_test, rf_pred)}")
print("Classification Report:")
print(classification_report(y_test, rf_pred))
print("\nk-Nearest Neighbors:")
print(f"Accuracy: {accuracy_score(y_test, knn_pred)}")
print("Classification Report:")
print(classification_report(y_test, knn_pred))
# Cross-validation for additional comparison (note: this runs on the raw,
# unscaled features; wrapping each model in a Pipeline with StandardScaler
# would make the comparison more rigorous)
svm_scores = cross_val_score(svm_classifier, X, y, cv=5)
rf_scores = cross_val_score(rf_classifier, X, y, cv=5)
knn_scores = cross_val_score(knn_classifier, X, y, cv=5)
print("\nCross-validation Scores:")
print("Support Vector Machine:", np.mean(svm_scores))
print("Random Forest:", np.mean(rf_scores))
print("k-Nearest Neighbors:", np.mean(knn_scores))
Output :-
Practical – 13
Ques. Write a program in Python for Empirical Comparison of different Unsupervised learning techniques.
Code :-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from minisom import MiniSom
from sklearn.metrics import silhouette_score
# Generate a synthetic dataset
X, _ = make_blobs(n_samples=300, centers=4, random_state=42, cluster_std=1.0)
# Standardize the features
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# Apply K-means clustering
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans_labels = kmeans.fit_predict(X_std)
# Apply Hierarchical clustering
agg_clustering = AgglomerativeClustering(n_clusters=4)
agg_labels = agg_clustering.fit_predict(X_std)
# Apply a Kohonen Self-Organizing Map (SOM)
som = MiniSom(10, 10, X_std.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(X_std, 100)
# Use only the row coordinate of each sample's winning unit as its cluster label
som_labels = np.array([som.winner(x) for x in X_std]).T[0]
# Evaluate clustering quality using silhouette score
kmeans_silhouette = silhouette_score(X_std, kmeans_labels)
agg_silhouette = silhouette_score(X_std, agg_labels)
som_silhouette = silhouette_score(X_std, som_labels)
# Print silhouette scores
print(f"K-means Silhouette Score: {kmeans_silhouette}")
print(f"Hierarchical Silhouette Score: {agg_silhouette}")
print(f"SOM Silhouette Score: {som_silhouette}")
# Visualize the clustering results
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.scatter(X_std[:, 0], X_std[:, 1], c=kmeans_labels, cmap='viridis')
plt.title('K-means Clustering')
plt.subplot(1, 3, 2)
plt.scatter(X_std[:, 0], X_std[:, 1], c=agg_labels, cmap='viridis')
plt.title('Hierarchical Clustering')
plt.subplot(1, 3, 3)
plt.scatter(X_std[:, 0], X_std[:, 1], c=som_labels, cmap='viridis')
plt.title('SOM Clustering')
plt.show()
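One caveat worth noting: taking only the row of each winning unit (.T[0] above) merges every SOM unit in the same row into one cluster. An optional alternative (illustrative only) is to flatten the full (row, col) coordinate into a single index, so each unit of the 10x10 grid counts as its own cluster.
# Optional: flatten (row, col) winner coordinates; 10 is the grid width
winners = np.array([som.winner(x) for x in X_std])
som_flat_labels = winners[:, 0] * 10 + winners[:, 1]
print(f"Distinct SOM units used: {len(np.unique(som_flat_labels))}")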
Output :-