SVM is one of the most popular supervised machine learning algorithms. It can be used for both classification and regression, but it is mainly used for classification.
The main goal of SVM is to find the best decision boundary that segregates the n-dimensional space into classes, so that new data points can be placed in the correct category. This best decision boundary is called the hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
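As a small illustration (a sketch added here, not part of the original article; the toy data points are made up), scikit-learn's SVC exposes the support vectors it selected after fitting:

import numpy as np
from sklearn.svm import SVC

# Tiny two-class toy dataset (illustrative values only)
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_vectors_)  # the extreme points the margin rests on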
There are mainly two types of SVM:
1. Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be separated into two classes by a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
Numerical example for Linear SVM:
Q. Positively labelled data points: (3,1), (3,-1), (6,1), (6,-1); negatively labelled data points: (1,0), (0,1), (0,-1), (-1,0).
Solution: the target output is -1 for every negatively labelled point and +1 for every positively labelled point.
Graph of the labelled data points:
Now augment each point with a bias entry of 1:
s₁ = (3,1) → s₁′ = (3,1,1) | s₂ = (3,-1) → s₂′ = (3,-1,1)
s₃ = (6,1) → s₃′ = (6,1,1) | s₄ = (6,-1) → s₄′ = (6,-1,1)
s₅ = (1,0) → s₅′ = (1,0,1) | s₆ = (0,1) → s₆′ = (0,1,1)
s₇ = (0,-1) → s₇′ = (0,-1,1) | s₈ = (-1,0) → s₈′ = (-1,0,1)
From the graph we can see that the support vectors are one negative point, (1,0,1), and two positive points, (3,1,1) and (3,-1,1). For the rest of the derivation, reindex these support vectors as s₁′ = (1,0,1), s₂′ = (3,1,1) and s₃′ = (3,-1,1).
Generalized equations (the products are dot products; the right-hand side is the class label of each support vector):
α₁ (s₁′·s₁′) + α₂ (s₁′·s₂′) + α₃ (s₁′·s₃′) = -1 → 1
α₁ (s₂′·s₁′) + α₂ (s₂′·s₂′) + α₃ (s₂′·s₃′) = 1 → 2
α₁ (s₃′·s₁′) + α₂ (s₃′·s₂′) + α₃ (s₃′·s₃′) = 1 → 3
On substituting the support vectors into these equations:
eq 1 => α₁ (1,0,1)·(1,0,1) + α₂ (1,0,1)·(3,1,1) + α₃ (1,0,1)·(3,-1,1) = -1
eq 2 => α₁ (3,1,1)·(1,0,1) + α₂ (3,1,1)·(3,1,1) + α₃ (3,1,1)·(3,-1,1) = 1
eq 3 => α₁ (3,-1,1)·(1,0,1) + α₂ (3,-1,1)·(3,1,1) + α₃ (3,-1,1)·(3,-1,1) = 1
2α₁ + 4α₂ + 4α₃ = -1 → 4
4α₁ + 11α₂ + 9α₃ = 1 → 5
4α₁ + 9α₂ + 11α₃ = 1 → 6
On solving equations 4, 5 and 6 we get
α₁ = -3.5, α₂ = 0.75 and α₃ = 0.75
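As a quick sanity check (a sketch added here, not part of the original article), the same 3×3 system can be solved with NumPy:

import numpy as np

# Gram matrix of the augmented support vectors (1,0,1), (3,1,1), (3,-1,1)
A = np.array([[2.0, 4.0, 4.0],
              [4.0, 11.0, 9.0],
              [4.0, 9.0, 11.0]])
b = np.array([-1.0, 1.0, 1.0])

alphas = np.linalg.solve(A, b)
print(alphas)  # expected: [-3.5, 0.75, 0.75]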
To find the hyperplane:
W′ = Σ αᵢ sᵢ′
W′ = -3.5 (1,0,1) + 0.75 (3,1,1) + 0.75 (3,-1,1)
W′ = (1, 0, -2)
The augmented vector W′ contains the weights and the bias of the decision function y = w·x + b, so w = (1,0) and b = -2.
The best-fit line (hyperplane) is therefore w·x + b = 0, i.e. the vertical line x₁ = 2, which splits the data points into the two classes.
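A short NumPy check of this result (again an added sketch, not from the original article):

import numpy as np

alphas = np.array([-3.5, 0.75, 0.75])
S = np.array([[1, 0, 1], [3, 1, 1], [3, -1, 1]], dtype=float)  # augmented support vectors

W = alphas @ S           # expected: [1, 0, -2]
w, b = W[:2], W[2]       # w = (1, 0), b = -2

# Decision values for all eight points: positives should come out >= +1, negatives <= -1
pts = np.array([[3, 1], [3, -1], [6, 1], [6, -1],
                [1, 0], [0, 1], [0, -1], [-1, 0]], dtype=float)
print(W, pts @ w + b)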
Final Graph for linear SVM
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# Load the Iris dataset and keep only the first two features for 2-D plotting
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)

# Build a mesh over the feature space
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # mesh step size
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# Predict on every mesh point and plot the decision regions
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with linear kernel')
plt.show()
2. Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be separated by a single straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
In non-linear SVM we use a kernel to map the low-dimensional data into a higher-dimensional feature space, where a separating hyperplane can be found.
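As a rough sketch of this idea (my own illustration, not from the original article), scikit-learn's SVC with an RBF kernel can fit the ring-like data used in the numerical example below, which no single straight line can separate:

import numpy as np
from sklearn.svm import SVC

# Outer square labelled +1, inner square labelled -1 (the points from the example below)
X = np.array([[2, 2], [2, -2], [-2, -2], [-2, 2],
              [1, 1], [1, -1], [-1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)
print(clf.predict(X))  # the RBF kernel can separate this layout; a straight line cannot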
Numerical example for Non-Linear SVM:
Q. Positively labelled data points (2,2), (2,-2), (-2,-2), (-2,2) and negatively labelled data points (1,1), (1,-1), (-1,-1), (-1,1).
Solution: We have to find a hyperplane that divides the data into two different classes. But the data is not linearly separable, so we first transform it from one feature space to another feature space (the kernel idea).
Kernel SVM condition: if √(x₁² + x₂²) > 2, map the point (x₁, x₂) to (4 - x₂ + |x₁ - x₂|, 4 - x₁ + |x₁ - x₂|); otherwise leave it unchanged.
(1) For the positively labelled data:
θ(2,2): √(2² + 2²) = √8 > 2,
so θ(x₁) = 4 - 2 + |2 - 2| = 2 and θ(x₂) = 4 - 2 + |2 - 2| = 2, giving (2,2).
θ(2,-2): √(2² + 2²) = √8 > 2,
so θ(x₁) = 4 + 2 + |2 + 2| = 10 and θ(x₂) = 4 - 2 + |2 + 2| = 6, giving (10,6).
θ(-2,-2): √(2² + 2²) = √8 > 2,
so θ(x₁) = 4 + 2 + |-2 + 2| = 6 and θ(x₂) = 4 + 2 + |-2 + 2| = 6, giving (6,6).
θ(-2,2): √(2² + 2²) = √8 > 2,
so θ(x₁) = 4 - 2 + |-2 - 2| = 6 and θ(x₂) = 4 + 2 + |-2 - 2| = 10, giving (6,10).
Hence the positively labelled data is transformed to (2,2), (10,6), (6,6) and (6,10).
(2) For the negatively labelled data points: every coordinate is ±1, so √(x₁² + x₂²) = √2 < 2 for each of them. The condition is not met, and the points remain unchanged.
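A small Python sketch of this transformation (added for illustration; the helper name phi is mine):

import numpy as np

def phi(x1, x2):
    # map points whose norm exceeds 2; leave the rest unchanged
    if np.sqrt(x1**2 + x2**2) > 2:
        return (4 - x2 + abs(x1 - x2), 4 - x1 + abs(x1 - x2))
    return (x1, x2)

positive = [(2, 2), (2, -2), (-2, -2), (-2, 2)]
negative = [(1, 1), (1, -1), (-1, -1), (-1, 1)]

print([phi(*p) for p in positive])  # (2,2), (10,6), (6,6), (6,10)
print([phi(*p) for p in negative])  # unchanged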
Plot of the transformed data points:
From the graph we can see that the support vectors are S₁ = (1,1) and S₂ = (2,2). Augmenting them with a bias entry of 1 gives S₁′ = (1,1,1) and S₂′ = (2,2,1).
α₁ (S₁′·S₁′) + α₂ (S₁′·S₂′) = -1 → 1
α₁ (S₂′·S₁′) + α₂ (S₂′·S₂′) = 1 → 2
α₁ (1,1,1)·(1,1,1) + α₂ (1,1,1)·(2,2,1) = -1 → 1
α₁ (2,2,1)·(1,1,1) + α₂ (2,2,1)·(2,2,1) = 1 → 2
which gives 3α₁ + 5α₂ = -1 and 5α₁ + 9α₂ = 1.
On solving these equations, α₁ = -7 and α₂ = 4.
To find the hyperplane:
W′ = Σ αᵢ Sᵢ′
W′ = -7 (1,1,1) + 4 (2,2,1)
W′ = (1, 1, -3)
The decision function is y = w·x + b with w = (1,1) and b = -3, so the hyperplane w·x + b = 0 is the line x₁ + x₂ = 3.
Drawing the graph, the hyperplane x₁ + x₂ = 3 separates the transformed data points into the two classes.
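A short NumPy check of the non-linear example (an added sketch, not from the original article):

import numpy as np

# Gram matrix of the augmented support vectors (1,1,1) and (2,2,1)
A = np.array([[3.0, 5.0],
              [5.0, 9.0]])
b = np.array([-1.0, 1.0])
alphas = np.linalg.solve(A, b)      # expected: [-7, 4]

S = np.array([[1, 1, 1], [2, 2, 1]], dtype=float)
W = alphas @ S                      # expected: [1, 1, -3]

# Decision values in the transformed space: positives > 0, negatives < 0
pos = np.array([[2, 2], [10, 6], [6, 6], [6, 10]], dtype=float)
neg = np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]], dtype=float)
print(alphas, W, pos @ W[:2] + W[2], neg @ W[:2] + W[2])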
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn.svm import SVC
style.use('fivethirtyeight')
# create mesh grids
def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy

# plot the contours
def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out

# load the data; the file is assumed to hold numeric values with the
# class label (0, 1 or 2) in the last column
iris = pd.read_csv("iris-data.txt").values
features = iris[0:150, 2:4]

level1 = np.zeros(150)
level2 = np.zeros(150)
level3 = np.zeros(150)
# level1 contains 1 for class1 and 0 for all others.
# level2 contains 1 for class2 and 0 for all others.
# level3 contains 1 for class3 and 0 for all others.
for i in range(150):
    if i >= 0 and i < 50:
        level1[i] = 1
    elif i >= 50 and i < 100:
        level2[i] = 1
    elif i >= 100 and i < 150:
        level3[i] = 1

# create 3 SVMs with RBF kernels (one per class, one-vs-rest)
svm1 = SVC(kernel='rbf')
svm2 = SVC(kernel='rbf')
svm3 = SVC(kernel='rbf')

# fit each SVM
svm1.fit(features, level1)
svm2.fit(features, level2)
svm3.fit(features, level3)

fig, ax = plt.subplots()
X0, X1 = iris[:, 2], iris[:, 3]
xx, yy = make_meshgrid(X0, X1)

# plot the contours of the three classifiers on top of each other
plot_contours(ax, svm1, xx, yy, cmap=plt.get_cmap('hot'), alpha=0.8)
plot_contours(ax, svm2, xx, yy, cmap=plt.get_cmap('hot'), alpha=0.3)
plot_contours(ax, svm3, xx, yy, cmap=plt.get_cmap('hot'), alpha=0.5)

color = ['r', 'b', 'g', 'k']
for i in range(len(iris)):
    plt.scatter(iris[i][2], iris[i][3], s=30, c=color[int(iris[i][4])])
plt.show()
Advantages:
It works really well when there is a clear margin of separation.
It is effective in high dimensional spaces.
It is effective in cases where the number of dimensions
is greater than the number of samples.
It uses a subset of training points in the decision
function (called support vectors), so it is also memory
efficient.
Disadvantages:
It doesn’t perform well on large datasets, because the required training time is higher.
It also doesn’t perform very well when the dataset has more noise, i.e. when the target classes overlap.
SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation, which is available through the probability option of scikit-learn's SVC class (as sketched below).
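A minimal sketch of requesting those probability estimates with scikit-learn (illustrative only):

from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# probability=True enables the internal cross-validated calibration,
# which makes training noticeably slower
clf = SVC(kernel='rbf', probability=True).fit(X, y)
print(clf.predict_proba(X[:3]))  # class probabilities for the first three samples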
Python implementation in lab
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # Taking only first two features for visualization
y = iris.target
# Split dataset into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create SVM model with RBF kernel
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale')
# Train the model
svm_model.fit(X_train, y_train)
# Predict on test data
y_pred = svm_model.predict(X_test)
# Print accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
# Plot decision boundary
def plot_decision_boundary(model, X, y):
    h = .02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM Decision Boundary')
    plt.show()

plot_decision_boundary(svm_model, X, y)
Classification Report (output):

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.88      0.78      0.82         9
           2       0.83      0.91      0.87        11

    accuracy                           0.90        30
   macro avg       0.90      0.90      0.90        30
weighted avg       0.90      0.90      0.90        30