PCTE Group of Institutes,
Ludhiana
Machine Learning
PRACTICAL FILE
(BCA 3(A) 6th - SEMESTER 2021-24)
NAME: Saurav
SUBJECT: Machine Learning
UNIV.ROLL.NO: 2123397
SUBMITTED TO: Ms. Navkiran Gill
OFFICIAL E-MAIL ADDRESS: sauravpctebca21a@gmail.com
Table of Contents
Sr No  Topics
1      Reading the csv File
2      Pre-Processing of Data
3      Linear Regression using sklearn, matplotlib and seaborn
4      Write a program to demonstrate the working of the decision tree
       algorithm. Use an appropriate data set for building the decision tree
       and apply this knowledge to classify a new sample.
5      Write a program to demonstrate the working of the Random Forest
       algorithm.
6      Write a program to implement the naïve Bayesian classifier for a
       sample training data set stored as a .CSV file. Compute the accuracy of
       the classifier, considering a few test data sets.
7      Write a program to implement k-Nearest Neighbour algorithm to
       classify the iris data set. Print both correct and wrong predictions.
       Java/Python ML library classes can be used for this problem.
8      Write a program to demonstrate the working of the K-means clustering
       algorithm.
9      Write a program to demonstrate the working of the Support Vector
       Machine for Classification Algorithm.
Task 1: Reading the csv File
import pandas as pd
data = pd.read_csv("1.csv")
data
Task 2: Pre-Processing of Data
data.head()
data.tail()
data.isnull()
data.dropna()
data.fillna(0)
# fillna(method=...) is deprecated in recent pandas releases; ffill() and bfill() are the direct equivalents
data.ffill()
data.bfill()
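To see what each pre-processing call returns, the following is a minimal sketch on a small hand-made DataFrame. The column names and values are hypothetical stand-ins, since the contents of "1.csv" are not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the contents of "1.csv" (assumption)
data = pd.DataFrame({
    "hours": [1.0, np.nan, 3.0, np.nan],
    "score": [10.0, 20.0, np.nan, 40.0],
})

missing_per_column = data.isnull().sum()  # number of NaN values in each column
dropped = data.dropna()                   # keeps only rows without any NaN
zero_filled = data.fillna(0)              # replaces every NaN with 0
forward = data.ffill()                    # fills a NaN with the previous valid value
backward = data.bfill()                   # fills a NaN with the next valid value

print(missing_per_column)
print(dropped)
print(zero_filled)
print(forward)
print(backward)
```

None of these calls changes `data` itself; each returns a new DataFrame, so the result has to be assigned to a variable to be kept. Note also that `bfill()` leaves a trailing NaN in place, because there is no later value to copy from.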
Task 3: Linear Regression using sklearn, matplotlib and seaborn
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
score_df = pd.read_csv('student_scores.csv')
score_df.head()
score_df.describe()
X = score_df.iloc[:, :-1].values
y = score_df.iloc[:, 1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Setting random_state=0 makes the random split into training and testing sets
# identical on every run with the same dataset. This reproducibility is useful
# for debugging, sharing code, and getting consistent results across runs.
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
plt.scatter(X_train, y_train,color='g')
plt.plot(X_test, y_pred,color='k')
plt.xlabel("Hours", fontsize=15)
plt.ylabel("Score", fontsize=15)
plt.show()
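The plot gives a visual impression of the fit; a numeric check can complement it. The sketch below is one possible way to do that, using synthetic data in place of student_scores.csv (an assumption, since the file is not available here) and the standard `mean_absolute_error` and `r2_score` metrics from scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for student_scores.csv (assumption):
# score is roughly 10 * hours + 2, plus random noise
rng = np.random.default_rng(0)
hours = rng.uniform(1, 9, size=50).reshape(-1, 1)
scores = 10 * hours.ravel() + 2 + rng.normal(0, 3, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    hours, scores, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

slope = regressor.coef_[0]          # estimated change in score per extra hour
intercept = regressor.intercept_    # estimated score at zero hours
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("slope:", slope)
print("intercept:", intercept)
print("mean absolute error:", mae)
print("R^2:", r2)
```

An R^2 value close to 1 and a small mean absolute error on the held-out test set suggest that the straight line describes the data well.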
Task 4: Write a program to demonstrate the working of the
decision tree algorithm. Use an appropriate data set for building
the decision tree and apply this knowledge to classify a new
sample.
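A minimal sketch of this task, assuming scikit-learn's bundled iris data as the "appropriate data set" and an illustrative flower measurement as the new sample. The entropy criterion and the depth limit are choices made here for readability, not part of the task statement.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target)

# Build the tree; max_depth=3 keeps the printed rules short (assumption)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Show the learned decision rules as indented text
print(export_text(tree, feature_names=iris.feature_names))

accuracy = accuracy_score(y_test, tree.predict(X_test))
print("Test accuracy:", accuracy)

# Classify a new sample: sepal length, sepal width, petal length, petal width (cm)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_species = iris.target_names[tree.predict(new_sample)[0]]
print("Predicted species:", predicted_species)
```

The printed rules show how the tree splits on one feature at a time, and the new sample is classified by following those splits from the root to a leaf.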
Task 5: Write a program to demonstrate the working of the
Random Forest algorithm.
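A minimal sketch of this task, assuming scikit-learn's bundled wine data set (the task does not name a data set) and 100 trees, which is also the library default.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=0, stratify=wine.target)

# Each tree is trained on a bootstrap sample and considers a random subset
# of features at every split; the forest predicts by combining all trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

accuracy = accuracy_score(y_test, forest.predict(X_test))
print("Test accuracy:", accuracy)

# Importance of each feature, averaged over all trees (values sum to 1)
importances = dict(zip(wine.feature_names, forest.feature_importances_))
for name, value in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

The feature importances give a rough indication of which measurements the forest relied on most when separating the classes.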
Task 6: Write a program to implement the naïve Bayesian classifier for
a sample training data set stored as a .CSV file. Compute the accuracy
of the classifier, considering a few test data sets.
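A minimal sketch of this task. Because the original .CSV file is not available here, the code first writes the iris data to a temporary file named training_data.csv (a hypothetical name) and assumes the class label is in the last column. "A few test data sets" is interpreted as three different random train/test splits, which is also an assumption.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Create a stand-in training file (assumption: label stored in the last column)
csv_path = os.path.join(tempfile.mkdtemp(), "training_data.csv")
load_iris(as_frame=True).frame.to_csv(csv_path, index=False)

# Read the training data back from the .CSV file
data = pd.read_csv(csv_path)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Evaluate Gaussian naive Bayes on three different test sets
accuracies = []
for seed in (0, 1, 2):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    model = GaussianNB()
    model.fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

for seed, acc in zip((0, 1, 2), accuracies):
    print(f"Test set {seed}: accuracy = {acc:.3f}")
print("Mean accuracy:", sum(accuracies) / len(accuracies))
```

`GaussianNB` suits continuous features such as these measurements; for a .CSV file with categorical columns, `CategoricalNB` with encoded features would be the closer match.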
Task 7: Write a program to implement k-Nearest Neighbour
algorithm to classify the iris data set. Print both correct and
wrong predictions. Java/Python ML library classes can be used
for this problem
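A minimal sketch of this task using scikit-learn's `KNeighborsClassifier` on the bundled iris data. The choice of k = 5 and the 70/30 split are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target)

# Each test sample is assigned the majority class of its 5 nearest training samples
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Separate the test samples into correct and wrong predictions
correct, wrong = [], []
for sample, actual, predicted in zip(X_test, y_test, y_pred):
    record = (sample.tolist(), iris.target_names[actual], iris.target_names[predicted])
    if actual == predicted:
        correct.append(record)
    else:
        wrong.append(record)

print("Correct predictions:")
for sample, actual, predicted in correct:
    print(f"  {sample} actual={actual} predicted={predicted}")

print("Wrong predictions:")
for sample, actual, predicted in wrong:
    print(f"  {sample} actual={actual} predicted={predicted}")

print("Accuracy:", len(correct) / len(y_test))
```

Depending on the split, the list of wrong predictions may be empty or contain only a few versicolor/virginica confusions, since those two species overlap in feature space.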
Task 8: Write a program to demonstrate the working of the K-means
clustering algorithm.
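A minimal sketch of this task, assuming synthetic two-dimensional data generated with `make_blobs` (the task does not name a data set) and k = 3 clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (assumption)
X, true_labels = make_blobs(
    n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# K-means alternates between assigning points to the nearest centre and
# moving each centre to the mean of its points; n_init=10 repeats this
# from 10 random starts and keeps the run with the lowest inertia
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

centers = kmeans.cluster_centers_   # final centre coordinates, one row per cluster
inertia = kmeans.inertia_           # sum of squared distances to the nearest centre

print("Cluster centres:")
print(centers)
print("Inertia:", inertia)
for cluster_id in range(3):
    print(f"Cluster {cluster_id}: {(labels == cluster_id).sum()} points")
```

K-means is unsupervised, so `true_labels` is never passed to the model; cluster numbers are arbitrary and need not match the generated group numbers.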
Task 9: Write a program to demonstrate the working of the
Support Vector Machine for Classification Algorithm.
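A minimal sketch of this task, assuming scikit-learn's bundled breast cancer data set (the task does not name one) and an RBF kernel. The features are standardised first because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=0,
    stratify=cancer.target)

# Scale the features, then fit a support vector classifier with an RBF kernel
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
n_support = model.named_steps["svc"].n_support_  # support vectors per class

print("Test accuracy:", accuracy)
print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))
print("Support vectors per class:", n_support)
```

Only the support vectors, the training points closest to the decision boundary, determine the classifier; the remaining training points could be removed without changing the predictions.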