Lecture 11 - K Nearest Neighbors - Part 2
K Nearest Neighbors with Python
Import Libraries
In [10]: 1 import pandas as pd
2 import seaborn as sns
3 import matplotlib.pyplot as plt
4 import numpy as np
5 %matplotlib inline
Get the Data
In [11]: 1 df = pd.read_csv('Downloads/KNN_Project_Data')
In [12]: 1 df.head()
Out[12]:
          XVPM         GWYH         TRAT        TLLZ         IGGA         HYKR         EDFS  ...
0  1636.670614   817.988525  2565.995189  358.347163   550.417491  1618.870897  2147.641254  ...
1  1013.402760   577.587332  2644.141273  280.428203  1161.873391  2084.107872   853.404981  ...
2  1300.035501   820.518697  2025.854469  525.562292   922.206261  2552.355407   818.676686  ...
3  1059.347542  1066.866418   612.000041  480.827789   419.467495   685.666983   852.867810  ...
4  1018.340526  1313.679056   950.622661  724.742174   843.065903  1370.554164   905.469453  ...
Standardize the Variables
In [4]: 1 from sklearn.preprocessing import StandardScaler
In [5]: 1 scaler = StandardScaler()
In [13]: 1 scaler.fit(df.drop('TARGET CLASS',axis=1))
Out[13]: StandardScaler()
In [16]: 1 scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
In [20]: 1 df_feat = pd.DataFrame(scaled_features,columns=df.columns[:-1])
2 df_feat.head()
Out[20]:
XVPM GWYH TRAT TLLZ IGGA HYKR EDFS GUUB MGJM
0 1.568522 -0.443435 1.619808 -0.958255 -1.128481 0.138336 0.980493 -0.932794 1.008313
1 -0.112376 -1.056574 1.741918 -1.504220 0.640009 1.081552 -1.182663 -0.461864 0.258321
2 0.660647 -0.436981 0.775793 0.213394 -0.053171 2.030872 -1.240707 1.149298 2.184784
3 0.011533 0.191324 -1.433473 -0.100053 -1.507223 -1.753632 -1.183561 -0.888557 0.162310
4 -0.099059 0.820815 -0.904346 1.609015 -0.282065 -0.365099 -1.095644 0.391419 -1.365603
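As a quick sanity check (a sketch, not part of the original notebook), every column of scaled_features should now have a mean of roughly 0 and a standard deviation of roughly 1, since StandardScaler simply computes (x - mean) / std for each feature:
In [ ]: 1 # Sketch (not from the lecture): StandardScaler output should have
        2 # mean ~0 and standard deviation ~1 in every column.
        3 print(np.round(scaled_features.mean(axis=0), 3))
        4 print(np.round(scaled_features.std(axis=0), 3))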
Train Test Split
In [21]: 1 from sklearn.model_selection import train_test_split
In [22]: 1 X_train, X_test, y_train, y_test = train_test_split(scaled_features,df['TARGET CLASS'],
         2                                                      test_size=0.30)
Using KNN
Remember that we are trying to build a model that predicts whether an observation belongs to the TARGET
CLASS or not. We'll start with k=1.
In [23]: 1 from sklearn.neighbors import KNeighborsClassifier
In [24]: 1 knn = KNeighborsClassifier(n_neighbors=1)
In [25]: 1 knn.fit(X_train,y_train)
Out[25]: KNeighborsClassifier(n_neighbors=1)
In [26]: 1 pred = knn.predict(X_test)
Predictions and Evaluations
Let's evaluate our KNN model!
In [27]: 1 from sklearn.metrics import classification_report,confusion_matrix
In [28]: 1 print(confusion_matrix(y_test,pred))
[[109 45]
[ 33 113]]
In [29]: 1 print(classification_report(y_test,pred))
precision recall f1-score support
0 0.77 0.71 0.74 154
1 0.72 0.77 0.74 146
accuracy 0.74 300
macro avg 0.74 0.74 0.74 300
weighted avg 0.74 0.74 0.74 300
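The overall accuracy in the report above can also be computed directly. The following cell is a small sketch (not part of the original notebook) that reuses the same y_test and pred from above:
In [ ]: 1 # Sketch: cross-check the 0.74 accuracy reported above.
        2 from sklearn.metrics import accuracy_score
        3 print(accuracy_score(y_test, pred))   # overall accuracy
        4 print(np.mean(pred == y_test))        # the same value computed by hand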
Choosing a K Value
Let's go ahead and use the elbow method to pick a good K Value:
In [30]: 1 error_rate = []
2
3 # Will take some time
4 for i in range(1,40):
5
6 knn = KNeighborsClassifier(n_neighbors=i)
7 knn.fit(X_train,y_train)
8 pred_i = knn.predict(X_test)
9 error_rate.append(np.mean(pred_i != y_test))
In [31]: 1 plt.figure(figsize=(10,6))
2 plt.plot(range(1,40),error_rate,color='blue', linestyle='dashed', marker='o',
3 markerfacecolor='red', markersize=10)
4 plt.title('Error Rate vs. K Value')
5 plt.xlabel('K')
6 plt.ylabel('Error Rate')
Out[31]: Text(0, 0.5, 'Error Rate')
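Rather than reading the elbow off the plot by eye, the K with the lowest test error can also be pulled straight out of error_rate. This is a sketch, not part of the original notebook:
In [ ]: 1 # Sketch: find the K value with the smallest error rate from the loop above.
        2 k_values = list(range(1,40))
        3 best_k = k_values[int(np.argmin(error_rate))]
        4 print('Lowest error rate:', min(error_rate), 'at K =', best_k)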
In [32]: 1 # FIRST A QUICK COMPARISON TO OUR ORIGINAL K=1
2 knn = KNeighborsClassifier(n_neighbors=1)
3
4 knn.fit(X_train,y_train)
5 pred = knn.predict(X_test)
6
7 print('WITH K=1')
8 print('\n')
9 print(confusion_matrix(y_test,pred))
10 print('\n')
11 print(classification_report(y_test,pred))
WITH K=1
[[109 45]
[ 33 113]]
precision recall f1-score support
0 0.77 0.71 0.74 154
1 0.72 0.77 0.74 146
accuracy 0.74 300
macro avg 0.74 0.74 0.74 300
weighted avg 0.74 0.74 0.74 300
In [35]: 1 # NOW WITH K=23
2 knn = KNeighborsClassifier(n_neighbors=23)
3
4 knn.fit(X_train,y_train)
5 pred = knn.predict(X_test)
6
7 print('WITH K=23')
8 print('\n')
9 print(confusion_matrix(y_test,pred))
10 print('\n')
11 print(classification_report(y_test,pred))
WITH K=23
[[114 40]
[ 20 126]]
precision recall f1-score support
0 0.85 0.74 0.79 154
1 0.76 0.86 0.81 146
accuracy 0.80 300
macro avg 0.80 0.80 0.80 300
weighted avg 0.81 0.80 0.80 300
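As a closing note, the scaling, splitting, and fitting steps above can be bundled into a single scikit-learn Pipeline so that the scaler is fit on the training data only. This is a minimal sketch (not from the original lecture) that assumes the raw df loaded at the start of the notebook:
In [ ]:  1 # Sketch: the same workflow wrapped in a Pipeline.
         2 from sklearn.pipeline import make_pipeline
         3
         4 X = df.drop('TARGET CLASS', axis=1)
         5 y = df['TARGET CLASS']
         6 X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30)
         7
         8 pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=23))
         9 pipe.fit(X_tr, y_tr)
        10 print(pipe.score(X_te, y_te))   # mean accuracy on the held-out data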