210596_ML_Labtask5.ipynb
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, ConfusionMatrixDisplay
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.tree import DecisionTreeClassifier as DT
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
dataset = pd.read_csv('/content/accident- DATASET ML Lab Task 3.csv')
dataset.head()
Age Gender Speed_of_Impact Helmet_Used Seatbelt_Used Survived
0 56 Female 27.0 No No 1
1 69 Female 46.0 No Yes 1
2 46 Male 46.0 Yes Yes 0
3 32 Male 117.0 No Yes 0
4 60 Female 40.0 Yes Yes 0
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 196 non-null int64
1 Gender 196 non-null object
2 Speed_of_Impact 196 non-null float64
3 Helmet_Used 196 non-null object
4 Seatbelt_Used 196 non-null object
5 Survived 196 non-null int64
dtypes: float64(1), int64(2), object(3)
memory usage: 9.3+ KB
dataset.dtypes
Age int64
Gender object
Speed_of_Impact float64
Helmet_Used object
Seatbelt_Used object
Survived int64
dtype: object
# LabelEncoder was already imported above; each categorical column gets its
# own encoder so the learned mappings can be inspected later.
encoder1 = LabelEncoder()
dataset['Gender'] = encoder1.fit_transform(dataset['Gender'])
encoder2 = LabelEncoder()
dataset['Helmet_Used'] = encoder2.fit_transform(dataset['Helmet_Used'])
encoder3 = LabelEncoder()
dataset['Seatbelt_Used'] = encoder3.fit_transform(dataset['Seatbelt_Used'])
encoder4 = LabelEncoder()
dataset['Survived'] = encoder4.fit_transform(dataset['Survived'])  # already 0/1, so this is a no-op
dataset
Age Gender Speed_of_Impact Helmet_Used Seatbelt_Used Survived
0 56 0 27.0 0 0 1
1 69 0 46.0 0 1 1
2 46 1 46.0 1 1 0
3 32 1 117.0 0 1 0
4 60 0 40.0 1 1 0
... ... ... ... ... ... ...
191 69 0 111.0 0 1 1
192 30 0 51.0 0 1 1
193 58 1 110.0 0 1 1
194 20 1 103.0 0 1 1
195 56 0 43.0 0 1 1
196 rows × 6 columns
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 196 non-null int64
1 Gender 196 non-null int64
2 Speed_of_Impact 196 non-null float64
3 Helmet_Used 196 non-null int64
4 Seatbelt_Used 196 non-null int64
5 Survived 196 non-null int64
dtypes: float64(1), int64(5)
memory usage: 9.3 KB
encoder1.classes_
array([0, 1])
encoder1.transform(encoder1.classes_)
array([0, 1])
encoder2.classes_
array([0, 1])
encoder2.transform(encoder2.classes_)
array([0, 1])
x = dataset.drop(columns=['Survived'])
y = dataset['Survived']
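Note that train_test_split is imported at the top but never used: every score in the sections below is computed on the same rows the model was fit on. A minimal held-out-split sketch for honest evaluation (the 0.2 test size, random_state, and stratify settings are illustrative choices, not from the original notebook):

# Hold out 20% of rows; the split parameters here are illustrative choices.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y)
# Fit on the training rows only, score on the unseen test rows.
model = KNN(n_neighbors=3)
model.fit(x_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, model.predict(x_test)))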
For KNN with k=1
modelKNN2=KNN(n_neighbors=1)
modelKNN2.fit(x,y)
KNeighborsClassifier(n_neighbors=1)
predict=modelKNN2.predict(x)
predict
array([1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1])
precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 1.0
Accuracy: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[ 96 0]
[ 0 100]]
confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix = confusion_test, display_labels = [0,1])
confusion_test_plot.plot()
plt.show()
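The same five metric calls are repeated after every model in this notebook. A small helper (hypothetical, not part of the original lab task) that consolidates them:

# Hypothetical helper consolidating the repeated metric cells.
def evaluate(y_true, y_pred):
    print("Precision:", precision_score(y_true, y_pred))
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print("Recall:", recall_score(y_true, y_pred))
    print("F1 Score:", f1_score(y_true, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_true, y_pred))

Each metric cell below could then be replaced by a single evaluate(y, predict) call.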
For KNN with k=3
modelKNN2=KNN(n_neighbors=3)
modelKNN2.fit(x,y)
KNeighborsClassifier(n_neighbors=3)
predict=modelKNN2.predict(x)
predict
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1])
precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 0.7551020408163265
Accuracy: 0.7448979591836735
Recall: 0.74
F1 Score: 0.7474747474747475
Confusion Matrix:
[[72 24]
[26 74]]
confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix = confusion_test, display_labels = [0,1])
confusion_test_plot.plot()
plt.show()
For KNN with k=5
modelKNN2=KNN(n_neighbors=5)
modelKNN2.fit(x,y)
KNeighborsClassifier()
predict=modelKNN2.predict(x)
predict
array([1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0,
0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 0.7184466019417476
Accuracy: 0.7193877551020408
Recall: 0.74
F1 Score: 0.729064039408867
Confusion Matrix:
[[67 29]
[26 74]]
confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix = confusion_test, display_labels = [0,1])
confusion_test_plot.plot()
plt.show()
For KNN with k=7
modelKNN2=KNN(n_neighbors=7)
modelKNN2.fit(x,y)
KNeighborsClassifier(n_neighbors=7)
predict=modelKNN2.predict(x)
predict
array([1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0,
0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1])
precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 0.7184466019417476
Accuracy: 0.7193877551020408
Recall: 0.74
F1 Score: 0.729064039408867
Confusion Matrix:
[[67 29]
[26 74]]
confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix = confusion_test, display_labels = [0,1])
confusion_test_plot.plot()
plt.show()
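The four k sections above repeat identical code. A loop sketch that reproduces the sweep (still training-set accuracy, matching the notebook's evaluation; the k values are the ones used above):

# Sweep k and report training-set accuracy, mirroring the cells above.
# These scores are optimistic because no data is held out.
for k in [1, 3, 5, 7]:
    model = KNN(n_neighbors=k)
    model.fit(x, y)
    print(f"k={k}: training accuracy = {accuracy_score(y, model.predict(x)):.4f}")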
Decision Tree Model
modelDT=DT()
modelDT.fit(x,y)
DecisionTreeClassifier()
predict=modelDT.predict(x)
predict
array([1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1])
precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)
print('Precision:', precision_test)
print('Accuracy:', accuracy_test)
print('Recall:', recall_test)
print('F1 Score:', f1_test)
print('Confusion Matrix:\n', confusion_test)
Precision: 1.0
Accuracy: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[ 96 0]
[ 0 100]]
confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix = confusion_test, display_labels = [0,1])
confusion_test_plot.plot()
plt.show()
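The conclusion below notes that trees are easy to interpret. A sketch rendering the fitted tree with sklearn.tree.plot_tree (the depth cap and class names are illustrative choices for readability):

from sklearn.tree import plot_tree
# Render only the first few levels; max_depth=2 caps the drawing, not the model.
plt.figure(figsize=(14, 6))
plot_tree(modelDT, max_depth=2, feature_names=list(x.columns),
          class_names=["Not survived", "Survived"], filled=True)
plt.show()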
Conclusion: KNN vs. Decision Tree Model
The comparison between K-Nearest Neighbors (KNN) and the Decision Tree model highlights key differences in their behavior and performance. One caveat applies throughout: every metric above was computed on the same rows the models were fit on (train_test_split was imported but never applied), so all scores are training-set scores.
1. KNN Model Performance:
The choice of k-value significantly impacts KNN's accuracy.
Lower k-values (e.g., k=1, k=3) lead to higher variance (overfitting): the model may score well on training data but generalize poorly to new data. With k=1 evaluated on the training set, every sample is its own nearest neighbor, so the 100% accuracy above is guaranteed by construction.
Higher k-values (e.g., k=5, k=7) improve generalization by reducing sensitivity to noise but may lower accuracy slightly due to
smoothing effects.
KNN is computationally expensive for large datasets since it requires distance calculations for every prediction.
2. Decision Tree Model Performance:
Decision Trees tend to be faster and easier to interpret compared to KNN.
However, they are prone to overfitting, especially if the depth is not controlled.
Pruning techniques or setting depth limits can improve Decision Tree generalization (see the sketch after this list).
Unlike KNN, Decision Trees do not rely on distance metrics, making them more robust to irrelevant features.
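As referenced in the list above, a depth-limited tree is a simple guard against memorization (max_depth=3 and the random_state are illustrative values, not tuned here):

# Depth-limited tree; max_depth=3 is an illustrative cap, not a tuned value.
pruned = DT(max_depth=3, random_state=42)
pruned.fit(x, y)
print("Training accuracy (depth-limited):", accuracy_score(y, pruned.predict(x)))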
Extracted Accuracy Values:
The accuracy scores obtained for the different models were:
KNN (k=1): Accuracy = 1.0 (100%) (guaranteed on training data; overfitting)
KNN (k=3): Accuracy = 0.7449 (74.49%)
KNN (k=5): Accuracy = 0.7194 (71.94%)
KNN (k=7): Accuracy = 0.7194 (71.94%)
Decision Tree: Accuracy = 1.0 (100%) (likely overfitting)
Comparison Based on k-Values:
k=1: Memorizes the data, causing overfitting.
k=3: Balances fit and smoothing, yielding 74.49% training accuracy.
k=5 and k=7: Accuracy settles at 71.94% for both, showing less variance but a lower training-set score.
Decision Tree: Perfect training accuracy, almost certainly overfitting, since an unpruned tree can fit the training set exactly.
Final Thoughts:
If the dataset is small and well-separated, Decision Trees may be a better choice due to their interpretability.
If the dataset is large and complex, KNN with a well-chosen k-value can offer competitive results.
Hyperparameter tuning (pruning for Decision Trees, k-value selection for KNN) is essential for optimal performance; a cross-validation sketch follows below.
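A hedged sketch of that tuning step with cross-validated grid search (the parameter grids and 5-fold CV are illustrative choices, not from the original notebook):

from sklearn.model_selection import GridSearchCV
# Illustrative parameter grids; 5-fold CV gives a generalization estimate.
knn_search = GridSearchCV(KNN(), {"n_neighbors": list(range(1, 16, 2))}, cv=5)
knn_search.fit(x, y)
print("Best k:", knn_search.best_params_, "CV accuracy:", round(knn_search.best_score_, 4))
dt_search = GridSearchCV(DT(random_state=42), {"max_depth": [2, 3, 4, 5, None]}, cv=5)
dt_search.fit(x, y)
print("Best depth:", dt_search.best_params_, "CV accuracy:", round(dt_search.best_score_, 4))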