
ML Lab

The document is a record notebook for the PG practical examinations in Machine Learning at Mohamed Sathak Engineering College. It includes various experiments, such as Linear Regression, Binary Classification Model, and K-Nearest Neighbors, detailing aims, procedures, and sample code for each experiment. The document serves as a practical guide for students to implement and understand key machine learning concepts.

Uploaded by

nagajothiviya

MOHAMED SATHAK ENGINEERING COLLEGE

KILAKARAI – 623806
(Approved by AICTE, Accredited by NAAC and NBA)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ANNA UNIVERSITY – CHENNAI – 600 025

PG PRACTICAL EXAMINATIONS
JULY-2024

RECORD NOTE BOOK

CP4252 – Machine Learning

Student Name :

Register Number :

Class & Semester : I M.E –CSE & II Semester

Semester Month & Year :


MOHAMED SATHAK ENGINEERING COLLEGE
KILAKARAI – 623806
(Approved by AICTE, Accredited by NAAC and NBA)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Name: ............................................... Class: ..................

Register No:

Certified that this is the bonafide record of work done by the above student in
CP4252 – Machine Learning during the year 2023-2024.

....................                          ....................
Signature of Lab-in-charge                    Signature of H.O.D.

Submitted for the university practical examination held on ....................

....................                          ....................
Internal Examiner                             External Examiner


TABLE OF CONTENTS

S.No.   Date   Title                                          Page No.   Remark
1.             LINEAR REGRESSION
2.             BINARY CLASSIFICATION MODEL
3.             CLASSIFICATION WITH NEAREST NEIGHBOURS
4.             EXPERIMENT WITH VALIDATION SET AND TEST SET
5.             K-MEANS CLUSTERING
6.             NAIVE BAYES CLASSIFIER
EX.NO:1 LINEAR REGRESSION

AIM:
To implement the linear regression model and to experiment with different features in
building a model.

DEFINITION:
Let us consider a dataset where we have a value of the response y for every feature x, such as
the ten (x, y) pairs used in the program of this experiment.

Now, the task is to find the line that best fits the scatter plot of these points, so that we can
predict the response for any new feature value (i.e., a value of x not present in the dataset).
This line is called a regression line.
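For reference, the regression line can be written as h(x_i) = b_0 + b_1 x_i. The least-squares coefficients (the values that minimise the sum of squared residuals) are exactly the quantities computed by `estimate_coef` in this experiment's program, where \bar{x} and \bar{y} are the sample means and n is the number of observations:

```latex
h(x_i) = b_0 + b_1 x_i

SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}, \qquad
SS_{xx} = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2

b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1\,\bar{x}
```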

PROCEDURE:

• Import the required libraries: pandas and numpy for data analysis and manipulation, and
seaborn and matplotlib for data visualization.
• Visualize the variables in order to draw business/domain inferences.
• Split the data into a training set and a test set, and fit the regression line on the training
subset.
• Rescale the features: rescaling normalizes the range of numerical variables that have very
different magnitudes.
• Perform residual analysis on the training data to see how the errors (residuals) are
distributed; for a good fit, the residuals are centred around a mean of 0.
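The program in this experiment fits the line on the full data and does not carry out the split, rescaling or residual steps listed above. Below is a minimal NumPy-only sketch of those steps; the 70/30 split ratio, the seed, min-max rescaling and the helper names `fit_line` and `split_fit_residuals` are illustrative assumptions, not part of the original record.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept b_0 and slope b_1 (same result as estimate_coef)."""
    m_x, m_y = np.mean(x), np.mean(y)
    b_1 = np.sum((x - m_x) * (y - m_y)) / np.sum((x - m_x) ** 2)
    b_0 = m_y - b_1 * m_x
    return b_0, b_1

def split_fit_residuals(x, y, train_fraction=0.7, seed=0):
    """Shuffle and split the data, min-max rescale x with the training range,
    fit a line on the training part and return the training residuals."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(round(train_fraction * len(x)))
    train_idx, test_idx = idx[:n_train], idx[n_train:]

    # Rescale with statistics of the training part only; the test part reuses them
    x_min, x_max = x[train_idx].min(), x[train_idx].max()
    x_scaled = (x - x_min) / (x_max - x_min)

    b_0, b_1 = fit_line(x_scaled[train_idx], y[train_idx])
    residuals = y[train_idx] - (b_0 + b_1 * x_scaled[train_idx])
    test_pred = b_0 + b_1 * x_scaled[test_idx]
    return residuals, test_idx, test_pred

x = np.arange(10, dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)
residuals, test_idx, test_pred = split_fit_residuals(x, y)
print("training residuals:", np.round(residuals, 3))
print("mean training residual:", residuals.mean())
```

Because min-max rescaling is a linear change of x, it changes b_0 and b_1 but not the fitted values, so the residuals are the same as without rescaling. With an intercept in the model, the training residuals average to zero up to floating-point rounding.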
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plot the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1] * x
    # plot the regression line
    plt.plot(x, y_pred, color="g")
    # axis labels
    plt.xlabel('x')
    plt.ylabel('y')
    # show the plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimate the coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plot the regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

OUTPUT:

RESULT:
Thus the program implementing the linear regression model was written and executed
successfully.

EX.NO:2 BINARY CLASSIFICATION MODEL

AIM:
To write a program to implement a binary classification model using Python.

PROCEDURE:
Step 1: Define explanatory and target variables
Step 2: Split the dataset into training and testing sets
Step 3: Normalize the data for numerical stability
Step 4: Fit a logistic regression model to the training data
Step 5: Make predictions on the testing data
Step 6: Calculate the accuracy score by comparing the actual values and predicted values.
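The procedure above describes a logistic-regression workflow, whereas the program of this experiment trains a perceptron. A minimal scikit-learn sketch that follows Steps 1 to 6 literally is given here; the built-in breast cancer dataset, the 75/25 split and `StandardScaler` for the normalisation step are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: explanatory variables X and binary target y (0 = malignant, 1 = benign)
X, y = load_breast_cancer(return_X_y=True)

# Step 2: hold out 25% of the rows as the testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Step 3: standardise features using statistics of the training set only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 4: fit a logistic regression model to the training data
model = LogisticRegression(max_iter=1000)
model.fit(X_train_s, y_train)

# Step 5: predict labels for the testing data
y_pred = model.predict(X_test_s)

# Step 6: accuracy = fraction of predicted labels that match the actual labels
print("accuracy:", accuracy_score(y_test, y_pred))
```

Fitting the scaler on the training set only keeps information about the test rows out of the normalisation step.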
PROGRAM:
import numpy as np

class Perceptron(object):
    """Perceptron classifier.

    Parameters
    ----------
    rate : float
        Learning rate (between 0.0 and 1.0).
    number_of_iterations : int
        Number of passes (epochs) over the training dataset.

    Attributes
    ----------
    weight_matrix : 1d-array
        Weights after fitting (weight_matrix[0] is the bias).
    errors_list : list
        Number of misclassifications in every epoch
        (one full training cycle on the training set).
    """
    def __init__(self, rate=0.01, number_of_iterations=100):
        self.rate = rate
        self.number_of_iterations = number_of_iterations

    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : array-like, shape = [number_of_samples, number_of_features]
            Training vectors.
        y : array-like, shape = [number_of_samples]
            Target values.

        Returns
        -------
        self : object
        """
        self.weight_matrix = np.zeros(1 + X.shape[1])
        self.errors_list = []
        for _ in range(self.number_of_iterations):
            errors = 0
            for xi, target in zip(X, y):
                # update is zero when the sample is already classified correctly
                update = self.rate * (target - self.predict(xi))
                self.weight_matrix[1:] += update * xi
                self.weight_matrix[0] += update
                errors += int(update != 0.0)
            self.errors_list.append(errors)
        return self

    def dot_product(self, X):
        """Calculate the net input (weighted sum plus bias)."""
        return np.dot(X, self.weight_matrix[1:]) + self.weight_matrix[0]

    def predict(self, X):
        """Predict the class label (0 or 1) for the input data."""
        return np.where(self.dot_product(X) >= 0.0, 1, 0)

if __name__ == '__main__':
    X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0]])
    y = np.array([0, 1, 1, 1, 1, 1, 1])
    p = Perceptron()
    p.fit(X, y)
    print("Predicting the output of [1, 1, 1] = {}".format(p.predict([1, 1, 1])))

OUTPUT:
Predicting the output of [1, 1, 1] = 1

RESULT:
Thus the program implementing the binary classification model was written and executed
successfully.

EX.NO:3 CLASSIFICATION WITH NEAREST NEIGHBOURS

AIM:
To write a program implementing the k-nearest neighbor algorithm.

ALGORITHM:
Step 1 − Every algorithm needs a dataset, so in the first step of KNN we load the training data
as well as the test data.
Step 2 − Next, we choose the value of K, i.e. the number of nearest data points to consider. K
can be any positive integer.
Step 3 − For each point in the test data, do the following −
• 3.1 − Calculate the distance between the test point and each row of the training data using a
distance metric such as Euclidean, Manhattan or Hamming distance. The most commonly used
metric is Euclidean distance.
• 3.2 − Sort the training rows in ascending order of their distance values.
• 3.3 − Choose the top K rows from the sorted array.
• 3.4 − Assign the test point the most frequent class among these K rows.
Step 4 − End
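The steps above can be sketched from scratch with NumPy as follows. This is a minimal illustration assuming Euclidean distance; `knn_predict` and the toy points are hypothetical. Ties in the vote go to the label encountered first among the sorted neighbours (a property of `Counter.most_common`), which may differ from scikit-learn's tie handling.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    predictions = []
    for point in X_test:
        # 3.1: Euclidean distance from this test point to every training row
        distances = np.sqrt(np.sum((X_train - point) ** 2, axis=1))
        # 3.2 and 3.3: sort ascending and keep the indices of the k closest rows
        nearest = np.argsort(distances)[:k]
        # 3.4: most frequent class among those k neighbours
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.1, 0.9], [5.1, 5.0]])
print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```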
PROGRAM:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
# Loading data
irisData = load_iris()
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.2, random_state=42)
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
# Predict on dataset which model has not seen before
print(knn.predict(X_test))

OUTPUT

[1 0 2 1 1 0 1 2 2 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]

PERFORMANCE

# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Loading data
irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Calculate the accuracy of the model
print(knn.score(X_test, y_test))

OUTPUT:
0.9666666666666667

MODEL ACCURACY:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt

irisData = load_iris()
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop over K values
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Compute training and test data accuracy
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Generate plot
plt.plot(neighbors, test_accuracy, label='Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label='Training dataset Accuracy')
plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()

OUTPUT:

RESULT :

Thus the program for the implementation of the k-nearest neighbor algorithm was verified and
executed successfully.

EX.NO:4 EXPERIMENT WITH VALIDATION SET AND TEST SET

AIM:

To write an experiment with validation sets and test sets for the given dataset.

PROCEDURE:

Training Dataset
The sample of data used to fit the model: the actual dataset that we use to train the model
(the weights and biases in the case of a neural network). The model sees and learns from this data.

Validation Dataset
The sample of data used to provide an unbiased evaluation of a model fit on the training dataset
while tuning model hyperparameters. The evaluation becomes more biased as skill on the
validation dataset is incorporated into the model configuration.

Test Dataset
The sample of data used to provide an unbiased evaluation of the final model fit on the training
dataset. The test dataset provides the gold standard used to evaluate the model. It is used only
once the model is completely trained (using the training and validation sets).
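To show how the three sets play different roles, here is a minimal sketch, assuming the iris data and the KNN classifier from Experiment 3: the validation set is used to choose K, and the test set is touched only once, for the final estimate. The 60/20/20 split and the search range for K are illustrative choices, not taken from this record.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 60% training, then split the remaining 40% evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.6, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)

# Tune the hyperparameter K using only the validation set
best_k, best_val_acc = None, -1.0
for k in range(1, 16):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_k, best_val_acc = k, val_acc

# Evaluate the chosen model exactly once on the untouched test set
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)
print(f"best k = {best_k}, validation accuracy = {best_val_acc:.3f}, "
      f"test accuracy = {test_acc:.3f}")
```

Some workflows refit the chosen model on the training plus validation data before the final test; this sketch keeps the training set alone for simplicity.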

PROGRAM:

# Program 1: training set of an 80/20 split
# Importing numpy & scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

# Make a dummy array to represent x, y for the example:
# x ranges from 0 to 15 and is reshaped
# into a matrix of shape 8x2
x = np.arange(16).reshape((8, 2))

# y is just a list of the numbers 0-7
# representing the target variable
y = range(8)

# Split the dataset 80/20, i.e.
# the training set is 80% of the total data
# the testing set is 20% of the total data
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    train_size=0.8,
                                                    random_state=42)

# Training set
print("Training set x: ", x_train)
print("Training set y: ", y_train)

# Program 2: testing set of the same 80/20 split
# Importing numpy & scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

# Make a dummy array to represent x, y for the example:
# x ranges from 0 to 15 and is reshaped into a matrix of shape 8x2
x = np.arange(16).reshape((8, 2))

# y is just a list of the numbers 0-7 representing the target variable
y = range(8)

# Split the dataset 80/20, i.e.
# the training set is 80% of the total data
# the testing set is 20% of the total data
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=0.2,
                                                    random_state=42)

# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)

# Program 3: training, validation and testing sets
# Importing numpy & scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

# Make a dummy array to represent x, y for the example:
# x ranges from 0 to 23 and is reshaped into a matrix of shape 8x3
x = np.arange(24).reshape((8, 3))

# y is just a list of the numbers 0-7 representing the target variable
y = range(8)

# Split the dataset 80/20, i.e.
# the training set is 80% of the total data
# the combined testing + validation set is 20% of the total data
x_train, x_Combine, y_train, y_Combine = train_test_split(x, y,
                                                          train_size=0.8,
                                                          random_state=42)

# Split the combined dataset 50/50, i.e.
# the testing set is 50% of the combined dataset
# the validation set is 50% of the combined dataset
x_val, x_test, y_val, y_test = train_test_split(x_Combine, y_Combine,
                                                test_size=0.5,
                                                random_state=42)

# Training set
print("Training set x: ", x_train)
print("Training set y: ", y_train)
print(" ")

# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)
print(" ")

# Validation set
print("Validation set x: ", x_val)
print("Validation set y: ", y_val)

OUTPUT:

Training set x: [[ 0 1]
[14 15]
[ 4 5]
[ 8 9]
[ 6 7]
[12 13]]
Training set y: [0, 7, 2, 4, 3, 6]

Testing set x: [[ 2 3]
[10 11]]
Testing set y: [1, 5]

Training set x: [[ 0 1 2]
[21 22 23]
[ 6 7 8]
[12 13 14]
[ 9 10 11]
[18 19 20]]
Training set y: [0, 7, 2, 4, 3, 6]

Testing set x: [[15 16 17]]
Testing set y: [5]

Validation set x: [[3 4 5]]
Validation set y: [1]

RESULT:
Thus the program for the implementation of an experiment with validation sets and test sets for
the given dataset was verified and executed successfully.

EX.NO:5 K-MEANS CLUSTERING

AIM:
To write a program to implement K-means clustering on the given dataset.
PROCEDURE:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points from the input
dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Compute the mean of the points in each cluster and place the new centroid of that cluster
there.
Step-5: Repeat Step-3, i.e., reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
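The steps above correspond to Lloyd's algorithm. A minimal NumPy sketch is shown below, assuming Euclidean distance and initial centroids drawn from the data points; the function name `kmeans` and the toy data are hypothetical. scikit-learn's `KMeans`, used in the program of this experiment, defaults to k-means++ initialisation, so its results can differ from this sketch on harder data.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain Lloyd's-algorithm K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step-2: use k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign every point to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step-6: finish when no point changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```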

PROGRAM
from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris

%matplotlib inline
iris = load_iris()
df = pd.DataFrame(iris.data,columns=iris.feature_names)
df.head()

OUTPUT:

df['flower'] = iris.target
df.head()
OUTPUT:

df.drop(['sepal length (cm)', 'sepal width (cm)', 'flower'],axis='columns',inplace=True)


df.head(3)

OUTPUT:

km = KMeans(n_clusters=3)

yp = km.fit_predict(df)
yp

OUTPUT:

df['cluster'] = yp
df.head(2)
OUTPUT:

df.cluster.unique()

OUTPUT:

df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]

plt.scatter(df1['petal length (cm)'],df1['petal width (cm)'],color='blue')


plt.scatter(df2['petal length (cm)'],df2['petal width (cm)'],color='green')
plt.scatter(df3['petal length (cm)'],df3['petal width (cm)'],color='yellow')
OUTPUT:

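sse = []
k_rng = range(1, 10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    # fit on the two petal features only (df also holds the 'cluster' column added above)
    km.fit(df[['petal length (cm)', 'petal width (cm)']])
    # inertia_ is the sum of squared distances of the samples to their closest centroid
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)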
OUTPUT:

RESULT :
Thus the program implementing K-means clustering on the given dataset was verified and
executed successfully.

EX.NO:6 NAIVE BAYES CLASSIFIER

AIM:
To implement a program for the Naïve Bayes model.

NAIVE BAYES CLASSIFIER ALGORITHM


➢ Naive Bayes is a very simple yet powerful classification algorithm based on Bayes' theorem,
with an assumption of independence among the predictors.
➢ The Naive Bayes classifier assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature.
➢ Naive Bayes can be used for both binary and multi-class classification problems.
Bayes Theorem
• Bayes' theorem describes the probability of an event based on prior knowledge of conditions
that may be related to the event.
• It gives a way to compute a conditional probability.
• Assume we have a hypothesis (H) and evidence (E).
• According to Bayes' theorem, the relationship between the probability of the hypothesis
before getting the evidence, P(H), and the probability of the hypothesis after getting the
evidence, P(H|E), is:

P(H|E) = P(E|H)*P(H)/P(E)
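As a worked example of the formula, P(E) can be expanded with the law of total probability, P(E) = P(E|H)P(H) + P(E|not H)P(not H). The probabilities in the sketch below are hypothetical numbers chosen only to illustrate the arithmetic, and `posterior` is an illustrative helper name.

```python
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem, with P(E) from the law of total probability."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# Hypothetical numbers: P(H) = 0.01, P(E|H) = 0.9, P(E|not H) = 0.05
# P(E) = 0.9*0.01 + 0.05*0.99 = 0.0585, so P(H|E) = 0.009 / 0.0585
print(round(posterior(0.9, 0.01, 0.05), 4))  # 0.1538
```

Even fairly strong evidence leaves the posterior low here because the prior P(H) is small.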

STEPS INVOLVED IN THE NAÏVE BAYES ALGORITHM

Step 1: Handling Data
Load the data from the .csv file and split it into training and test datasets.
Step 2: Summarizing the Data
Summarize the properties of the training dataset so that probabilities can be calculated and
predictions made.
Step 3: Making a Prediction
Use the summaries of the training dataset to make a single prediction.
Step 4: Making All the Predictions
Generate predictions for a whole test dataset using the summarized training data.
Step 5: Evaluating Accuracy
Calculate the accuracy of the prediction model on the test dataset as the percentage of correct
predictions out of all the predictions made.
Step 6: Tying It All Together
Finally, tie all the steps together to form our own Naive Bayes classifier.
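The six steps can be sketched as a small Gaussian Naive Bayes classifier in plain Python. This is a sketch under assumptions: each feature is modelled with a normal distribution per class, and synthetic data generated with `random.gauss` stands in for the .csv file, whose name and contents are not given in this record. All function names are hypothetical. Log-probabilities are summed instead of multiplying densities, to avoid numerical underflow.

```python
import math
import random

def split_dataset(rows, train_ratio=0.67, seed=1):
    """Step 1: shuffle the rows and split them into training and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_ratio)
    return rows[:n_train], rows[n_train:]

def summarize_by_class(rows):
    """Step 2: per class, the prior and the (mean, stdev) of every feature.
    The last element of each row is the class label."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[-1], []).append(row[:-1])
    summaries = {}
    for label, feature_rows in by_class.items():
        stats = []
        for column in zip(*feature_rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / (len(column) - 1)
            stats.append((mean, math.sqrt(var)))
        summaries[label] = (len(feature_rows) / len(rows), stats)
    return summaries

def log_gaussian(x, mean, stdev):
    """Log of the normal density with the given mean and standard deviation."""
    return -((x - mean) ** 2) / (2 * stdev ** 2) - math.log(math.sqrt(2 * math.pi) * stdev)

def predict(summaries, features):
    """Step 3: class maximising log P(class) + sum of log P(feature | class)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in summaries.items():
        score = math.log(prior) + sum(
            log_gaussian(x, mean, stdev) for x, (mean, stdev) in zip(features, stats))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def get_predictions(summaries, test_rows):
    """Step 4: predict a label for every test row."""
    return [predict(summaries, row[:-1]) for row in test_rows]

def accuracy(test_rows, predictions):
    """Step 5: percentage of test rows whose label was predicted correctly."""
    correct = sum(row[-1] == p for row, p in zip(test_rows, predictions))
    return 100.0 * correct / len(test_rows)

# Step 6: tie everything together on synthetic two-feature data
rng = random.Random(0)
rows = ([[rng.gauss(0, 1), rng.gauss(0, 1), 0] for _ in range(50)] +
        [[rng.gauss(4, 1), rng.gauss(4, 1), 1] for _ in range(50)])
train_rows, test_rows = split_dataset(rows)
summaries = summarize_by_class(train_rows)
predictions = get_predictions(summaries, test_rows)
print(f"accuracy: {accuracy(test_rows, predictions):.1f}%")
```

The sketch assumes every class has at least two training rows and that no feature is constant within a class; otherwise the standard deviation would be undefined or zero.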

PROGRAM:
# The tuples are (delay time of train1 in minutes, number of times)
in_time = [(0, 22), (1, 19), (2, 17), (3, 18),
           (4, 16), (5, 15), (6, 9), (7, 7),
           (8, 4), (9, 3), (10, 3), (11, 2)]
too_late = [(6, 6), (7, 9), (8, 12), (9, 17),
            (10, 18), (11, 15), (12, 16), (13, 7),
            (14, 8), (15, 5)]
%matplotlib inline
import matplotlib.pyplot as plt

X, Y = zip(*in_time)
X2, Y2 = zip(*too_late)

bar_width = 0.9
plt.bar(X, Y, bar_width, color="blue", alpha=0.75, label="in time")
bar_width = 0.8
plt.bar(X2, Y2, bar_width, color="red", alpha=0.75, label="too late")
plt.legend(loc='upper right')
plt.show()

in_time_dict = dict(in_time)
too_late_dict = dict(too_late)

def catch_the_train(minutes):
    # number of times the connection was caught with this delay
    s = in_time_dict.get(minutes, 0)
    if s == 0:
        return 0
    else:
        # number of times the connection was missed with this delay
        m = too_late_dict.get(minutes, 0)
        # relative frequency of catching the train
        return s / (s + m)

for minutes in range(-1, 13):
    print(minutes, catch_the_train(minutes))

OUTPUT:

-1 0
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 0.6
7 0.4375
8 0.25
9 0.15
10 0.14285714285714285
11 0.11764705882352941
12 0

RESULT:
Thus the program to implement the Naïve Bayes classifier has been verified and executed
successfully.
