SVM and K-Means - Iris dataset.ipynb - Colab
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
!kaggle datasets download -d uciml/iris
Dataset URL: https://www.kaggle.com/datasets/uciml/iris
License(s): CC0-1.0
Downloading iris.zip to /content
0% 0.00/3.60k [00:00<?, ?B/s]
100% 3.60k/3.60k [00:00<00:00, 7.28MB/s]
Loading the dataset and creating a DataFrame
!unzip iris.zip
Archive: iris.zip
inflating: Iris.csv
inflating: database.sqlite
df = pd.read_csv('Iris.csv')
print(df.head())
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
Converting the categorical Species labels to numeric codes
df['Species'] = df['Species'].astype('category').cat.codes
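For reference, cat.codes assigns codes in the alphabetical order of the category names. A small sketch to print that mapping (it re-reads Iris.csv only because df['Species'] has already been overwritten with the codes):
species_cat = pd.read_csv('Iris.csv')['Species'].astype('category')
print(dict(enumerate(species_cat.cat.categories)))  # {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}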
Selecting feature columns and assigning them to X and y
X = df.iloc[:, :-1].values   # all columns except Species (note: this also keeps the Id column as a feature)
y = df.iloc[:, -1].values    # the encoded Species column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
Training set shape: (120, 5)
Test set shape: (30, 5)
Training the SVM model
svm_model = SVC(kernel='linear', C=1.0, random_state=42)
Model fitting and prediction
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
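SVMs are sensitive to feature scale; the linear kernel happens to separate Iris well without scaling, but a common pattern is to standardise the features inside a pipeline so the same scaling is applied at prediction time. A small illustrative sketch, not part of the results below:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Scale the features, then fit the same linear SVM; the pipeline reuses the scaler at predict time
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0, random_state=42))
scaled_svm.fit(X_train, y_train)
print("Scaled-pipeline test accuracy:", scaled_svm.score(X_test, y_test))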
Evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Accuracy: 1.0
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 10
1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 11
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
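The 1.0 accuracy above comes from a single 30-sample test split; a quick sanity check is k-fold cross-validation on the full data. A minimal sketch using scikit-learn's cross_val_score, with the same linear SVM settings (illustrative only):
from sklearn.model_selection import cross_val_score
# 5-fold cross-validation of the same linear SVM on all 150 samples
cv_scores = cross_val_score(SVC(kernel='linear', C=1.0, random_state=42), X, y, cv=5)
print("Fold accuracies:", cv_scores)
print("Mean CV accuracy:", cv_scores.mean())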
Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(conf_matrix)
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Heatmap of the confusion matrix
sns.heatmap(conf_matrix, annot=True, cmap="YlGnBu", fmt='g')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
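As an alternative to seaborn, the same plot can be produced directly from the predictions; a short sketch, assuming scikit-learn 1.0 or newer (which provides ConfusionMatrixDisplay.from_predictions):
from sklearn.metrics import ConfusionMatrixDisplay
# Draw the confusion matrix straight from the true and predicted labels
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.title("Confusion Matrix")
plt.show()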
K-Means implementation
import numpy as np
df = pd.read_csv('/content/Iris.csv')  # reload the original data, with Species as strings, for clustering
df.head()
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 150 non-null int64
1 SepalLengthCm 150 non-null float64
2 SepalWidthCm 150 non-null float64
3 PetalLengthCm 150 non-null float64
4 PetalWidthCm 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
df.drop(['Id'], axis=1, inplace=True)
df.isnull().sum()
SepalLengthCm 0
SepalWidthCm 0
PetalLengthCm 0
PetalWidthCm 0
Species 0
dtype: int64
df.describe()
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
df.head()
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
df_imp = df.iloc[:, 0:4]   # keep only the four numeric feature columns for clustering
from sklearn.cluster import KMeans
# Elbow method: fit K-Means for k = 1..9 and record the SSE (inertia) for each k
k_meansclus = range(1, 10)
sse = []
for k in k_meansclus:
    km = KMeans(n_clusters=k)
    km.fit(df_imp)
    sse.append(km.inertia_)
plt.title('The Elbow Method')
plt.plot(k_meansclus, sse)
plt.xlabel('Number of clusters (k)')
plt.ylabel('SSE (inertia)')
plt.show()
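Reading the elbow off the SSE curve is a visual judgment; a complementary numeric check is the silhouette score, which tends to peak near a good k. A minimal sketch using sklearn.metrics.silhouette_score (illustrative only):
from sklearn.metrics import silhouette_score
# Silhouette score for k = 2..9 (the score needs at least 2 clusters); higher is better
for k in range(2, 10):
    labels = KMeans(n_clusters=k, random_state=0).fit_predict(df_imp)
    print(k, round(silhouette_score(df_imp, labels), 3))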
km1 = KMeans(n_clusters=3, max_iter=300, random_state=0)   # final model with k = 3 (one cluster per species)
y_means = km1.fit_predict(df_imp)   # fit_predict fits the model and returns each sample's cluster label
km1.cluster_centers_
array([[5.88360656, 2.74098361, 4.38852459, 1.43442623],
[5.006 , 3.418 , 1.464 , 0.244 ],
[6.85384615, 3.07692308, 5.71538462, 2.05384615]])
df_imp = np.array(df_imp)   # convert to a NumPy array so boolean masks can be used for column indexing
# Petal length (column 2) vs petal width (column 3), coloured by cluster;
# the species names in the legend follow from the cluster centres printed above
plt.scatter(df_imp[y_means == 0, 2], df_imp[y_means == 0, 3], color='g', label='Iris-versicolor')
plt.scatter(df_imp[y_means == 1, 2], df_imp[y_means == 1, 3], color='r', label='Iris-setosa')
plt.scatter(df_imp[y_means == 2, 2], df_imp[y_means == 2, 3], color='b', label='Iris-virginica')
plt.xlabel('PetalLengthCm')
plt.ylabel('PetalWidthCm')
plt.legend()
plt.show()
# Sepal length (column 0) vs sepal width (column 1), coloured by cluster
plt.scatter(df_imp[y_means == 0, 0], df_imp[y_means == 0, 1], color='g', label='Iris-versicolor')
plt.scatter(df_imp[y_means == 1, 0], df_imp[y_means == 1, 1], color='r', label='Iris-setosa')
plt.scatter(df_imp[y_means == 2, 0], df_imp[y_means == 2, 1], color='b', label='Iris-virginica')
plt.xlabel('SepalLengthCm')
plt.ylabel('SepalWidthCm')
plt.legend()
plt.show()
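The legend labels above assume a particular cluster-to-species correspondence inferred from the cluster centres; it can be checked by cross-tabulating the cluster assignments against the true Species column. A short sketch using pandas.crosstab:
# Compare K-Means cluster assignments with the actual species labels
print(pd.crosstab(df['Species'], y_means, rownames=['Species'], colnames=['Cluster']))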