5/10/22, 2:58 PM Lab7.ipynb - Colaboratory
Implementing K-Nearest Neighbours (KNN) for a dataset.
Importing Libraries and Dataset: -
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files
uploaded = files.upload()
Iris.csv(text/csv) - 5107 bytes, last modified: 3/17/2022 - 100% done
Saving Iris.csv to Iris.csv
Creating Data frame: -
df=pd.read_csv('Iris.csv')
Printing the first 10 rows: -
df.head(10)
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa
Printing summary information about the dataset: -
df.info()
https://colab.research.google.com/drive/17EdAX0gZZGyDlojce0QA0Dn3jdLQt0Fa?authuser=1#scrollTo=IDas4r15mL2H&printMode=true 1/5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
Checking whether the dataset contains any null values: -
df[df.isnull().any(axis=1)].head()
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
Creating the independent variables (the four feature columns): -
X=df.iloc[:,[1,2,3,4]].values
Creating the dependent variable (the Species column): -
Y=df.iloc[:,5]
Splitting the dataset: -
from sklearn.model_selection import train_test_split
train_X,test_X,train_Y,test_Y = train_test_split(X, Y, test_size=0.3, random_state=0)
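To make the 70/30 split concrete, here is a minimal pure-Python sketch of a shuffled hold-out split over row indices. The helper name `holdout_split` is hypothetical, and it shuffles with Python's `random` module, so it will not pick the same rows as `train_test_split(..., random_state=0)`; only the sizes are meant to match. It assumes the test count is `test_size * n` rounded up.

```python
import math
import random

def holdout_split(n_samples, test_size=0.3, seed=0):
    """Return (train_idx, test_idx) for a shuffled hold-out split.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    n_test = math.ceil(test_size * n_samples)  # assumption: round the test count up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    return indices[n_test:], indices[:n_test]

# 150 rows, as reported by df.info()
train_idx, test_idx = holdout_split(150, test_size=0.3, seed=0)
print(len(train_idx), len(test_idx))  # 105 45
```

With 150 rows this leaves 105 rows for training and 45 for testing, which agrees with the 45 predictions counted in the confusion matrix later in the notebook.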
Standardizing the dataset: -
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
train_X = sc.fit_transform(train_X)
test_X = sc.transform(test_X)
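The following sketch shows, for a single feature column, what the standardization step amounts to: z = (x - mean) / std, with the mean and population standard deviation learned from the training data only and then reused for the test data. The function names and the toy values are assumptions made for illustration; they are not taken from the notebook's actual split.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in column]

# Toy sepal lengths in cm; illustrative values only.
train_col = [5.1, 4.9, 6.3, 5.8, 7.1]
test_col = [5.0, 6.7]

mean, std = fit_scaler(train_col)          # fit on the training data only
train_z = transform(train_col, mean, std)
test_z = transform(test_col, mean, std)    # reuse the training statistics
```

Fitting on the training rows alone is why the notebook calls `fit_transform` on `train_X` but only `transform` on `test_X`: the test data must not influence the scaling parameters.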
Finding the optimised value of K: -
import math
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
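n = len(df.index)
li = list()    # accuracy obtained for each candidate K
li2 = list()   # candidate K values
# Try every K from 1 up to (but not including) floor(sqrt(n))
for i in range(1, int(pow(n, 1/2))):
    kclass = KNeighborsClassifier(n_neighbors = i, metric = 'minkowski', p = 2)
    kclass.fit(train_X, train_Y)
    y_pred = kclass.predict(test_X)
    # Note: accuracy is measured on the test set, so K is tuned to that set
    ac = accuracy_score(test_Y, y_pred)
    li.append(ac)
    li2.append(i)
# Find the first (smallest) K with the highest accuracy;
# max_ac avoids shadowing the built-in max()
max_ac = li[0]
index = 0
for i in range(1, len(li)):
    if li[i] > max_ac:
        max_ac = li[i]
        index = i
k = li2[index]
print("The value of K is = ", k)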
plt.plot(li2,li)
plt.title("Graph showing the Accuracy with K",size=15,fontweight="bold")
plt.xlabel("Value of K",size=12,fontweight="bold")
plt.ylabel("Accuracy",size=12,fontweight="bold")
plt.show()
The value of K is = 3
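The search above can be summarised in a few lines. This sketch uses hypothetical accuracy values (not the ones measured in the notebook) to show two details: with n = 150 the candidate range is K = 1..11, and when several K values tie for the best accuracy, the smallest one is kept because the comparison is a strict "greater than".

```python
import math

n = 150                                   # number of rows in the dataset
candidates = list(range(1, int(math.sqrt(n))))
print(candidates[0], candidates[-1])      # 1 11

# Hypothetical accuracies for K = 1..5; not the notebook's measured values.
k_values = [1, 2, 3, 4, 5]
accuracies = [0.93, 0.93, 0.97, 0.97, 0.95]

best_acc = max(accuracies)
best_k = k_values[accuracies.index(best_acc)]  # index() returns the first match
print(best_k)  # 3
```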
Importing the KNN classifier for implementing the model: -
from sklearn.neighbors import KNeighborsClassifier
kclass = KNeighborsClassifier(n_neighbors = k, metric = 'minkowski', p = 2)
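The arguments metric = 'minkowski' and p = 2 select the Minkowski distance of order 2, which is the ordinary Euclidean distance: d(a, b) = (sum_i |a_i - b_i|^p)^(1/p). A small sketch of that formula follows; the function name `minkowski` is chosen here for illustration and is not a library call.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

u, v = (0.0, 0.0), (3.0, 4.0)
d_euclid = minkowski(u, v, p=2)      # sqrt(3^2 + 4^2) = 5
d_manhattan = minkowski(u, v, p=1)   # |3| + |4| = 7
```

Setting p = 1 instead would give the Manhattan distance, so the choice of p changes which training points count as "nearest".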
Training the model: -
kclass.fit(train_X, train_Y)
KNeighborsClassifier(n_neighbors=3)
Predicting the values of Y (y_pred): -
y_pred = kclass.predict(test_X)
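As a rough picture of what a prediction involves, the sketch below classifies one query point by majority vote among its k nearest training points under Euclidean distance. It is a from-scratch illustration on toy data, not scikit-learn's implementation; the function name `knn_predict`, the toy points, and the tie-breaking rule are all assumptions.

```python
from collections import Counter
import math

def knn_predict(points, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(
        zip(points, labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common(1) returns the label with the highest count; on a tie the
    # label seen first wins (an arbitrary choice made for this sketch).
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2-D points; the labels reuse the dataset's class names.
toy_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
toy_y = ["Iris-setosa", "Iris-setosa",
         "Iris-virginica", "Iris-virginica", "Iris-virginica"]

label_a = knn_predict(toy_X, toy_y, (0.05, 0.1), k=3)  # two setosa neighbours out of three
label_b = knn_predict(toy_X, toy_y, (1.0, 1.1), k=3)   # three virginica neighbours
```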
The predicted values of Y are: -
y_pred
array(['Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
'Iris-virginica', 'Iris-setosa', 'Iris-virginica', 'Iris-setosa',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-virginica',
'Iris-versicolor', 'Iris-setosa', 'Iris-virginica',
'Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
'Iris-virginica', 'Iris-setosa', 'Iris-virginica', 'Iris-setosa',
'Iris-setosa'], dtype=object)
Performance Measure: -
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(test_Y, y_pred)
print("Confusion Matrix: -\n",cm)
ac = accuracy_score(test_Y,y_pred)
print("\nAccuracy of the model(in %) is = ",ac*100)
Confusion Matrix: -
 [[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]
Accuracy of the model(in %) is = 97.77777777777777
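The reported accuracy can be checked by hand from the confusion matrix printed above. In this sketch the rows are taken to be the true classes and the columns the predicted classes, in sorted label order (Iris-setosa, Iris-versicolor, Iris-virginica). The per-class recall and precision are an addition for illustration and were not computed in the notebook.

```python
# Confusion matrix as printed above: rows = true class, columns = predicted class.
cm = [
    [16, 0, 0],
    [0, 17, 1],
    [0, 0, 11],
]

total = sum(sum(row) for row in cm)               # 45 test samples
correct = sum(cm[i][i] for i in range(len(cm)))   # 44 on the diagonal
accuracy = correct / total                        # 44 / 45

# Recall: diagonal over row sum. Precision: diagonal over column sum.
recall = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(len(cm))]
```

The single off-diagonal entry means one Iris-versicolor sample was predicted as Iris-virginica, so 44 of the 45 test samples were classified correctly.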