SUPPORT VECTOR MACHINES
(SVM)
Maximum Margin Hyperplane and Core Concepts
What is SVM?
Support Vector Machine (SVM) is a
supervised learning algorithm used for
classification and regression.
Focuses on finding a hyperplane that
separates classes with maximum margin.
Effective in high-dimensional spaces and
with small/medium-sized datasets.
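To make this concrete, here is a minimal classification sketch, assuming scikit-learn is available; the toy points and parameter values are purely illustrative.

from sklearn import svm

# Two small clusters of 2-D points labeled -1 and +1.
X = [[0, 0], [1, 1], [1, 0], [3, 3], [4, 3], [3, 4]]
y = [-1, -1, -1, 1, 1, 1]

# Linear kernel: find the maximum-margin separating hyperplane.
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # should print [-1  1]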
SVM - History and Applications
Introduced by Vapnik and colleagues (1992), building on the groundwork of Vapnik and
Chervonenkis' statistical learning theory from the 1960s
Features: training can be slow but accuracy is high owing to
their ability to model complex nonlinear decision boundaries
(margin maximization)
Used for: classification and numeric prediction
Applications:
Handwritten digit recognition, object recognition, speaker
identification, benchmarking time-series prediction tests
Support Vector Machines
Classification method for both linear and nonlinear data
It uses a nonlinear mapping to transform the original
training data into a higher dimension
In the new, higher-dimensional space, it searches for the linear optimal
separating hyperplane (i.e., the "decision boundary")
With an appropriate nonlinear mapping to a sufficiently
high dimension, data from two classes can always be
separated by a hyperplane
SVM finds this hyperplane using support vectors
(“essential” training tuples) and margins (defined by the
support vectors)
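As a toy illustration of such a nonlinear mapping (a hypothetical example; in practice the mapping is chosen implicitly via a kernel, discussed later): one-dimensional data that no single threshold can split becomes linearly separable after adding a squared feature.

# Toy mapping phi(x) = (x, x**2): in 1-D no single threshold separates
# the "inner" points (|x| small) from the "outer" points (|x| large),
# but in the (x, x^2) plane any horizontal line x^2 = c with 1 < c < 4 does.
xs = [-3.0, -2.0, 2.0, 3.0, -0.5, 0.0, 0.5, 1.0]
ys = [+1, +1, +1, +1, -1, -1, -1, -1]      # +1 = outer class, -1 = inner class

mapped = [(x, x * x) for x in xs]
for (x, x2), label in zip(mapped, ys):
    print(f"x = {x:+.1f}   phi(x) = ({x:+.1f}, {x2:.2f})   label = {label:+d}")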
SVM – General Philosophy
(Figure: a small-margin vs. a large-margin separating hyperplane, with the support vectors marked)
SVM - Margins and Support Vectors
SVM - When Data Is Linearly Separable
Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple and yi is its
associated class label
There are infinitely many lines (hyperplanes) separating the two classes, but we want to find
the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., the Maximum Marginal
Hyperplane (MMH)
Support Vectors and Hyperplane
Support Vectors: Critical points closest to
the decision boundary that define the
hyperplane.
Hyperplane: The decision boundary
separating different classes (a line, plane,
or hyperplane in higher dimensions).
Maximum Margin Hyperplane
The Maximum Margin Hyperplane is the
decision boundary that maximizes the margin
between classes.
Margin: Distance between the hyperplane and
the nearest data points (support vectors).
SVM aims to maximize this margin, providing
a robust decision boundary.
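A short sketch of how the hyperplane, support vectors, and margin can be inspected after training, assuming scikit-learn and a linearly separable toy set; the margin value 2/||W|| follows from the H1/H2 definition given on the "Linearly Separable" slide.

import numpy as np
from sklearn import svm

X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 3], [3, 4]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = svm.SVC(kernel="linear", C=1000.0)   # large C ~ hard margin on separable data
clf.fit(X, y)

w = clf.coef_[0]                  # weight vector W of the hyperplane W . X + b = 0
b = clf.intercept_[0]             # bias b
margin = 2.0 / np.linalg.norm(w)  # width of the margin between H1 and H2

print("support vectors:", clf.support_vectors_)
print("W =", w, " b =", b, " margin =", margin)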
SVM - Linearly Separable
A separating hyperplane can be written as
W · X + b = 0
where W={w1, w2, …, wn} is a weight vector and b a scalar (bias)
For 2-D data it can be written as (writing the bias b as w0)
w0 + w1 x1 + w2 x2 = 0
The hyperplanes defining the sides of the margin:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
H2: w0 + w1 x1 + w2 x2 ≤ – 1 for yi = –1
Any training tuples that fall on hyperplanes H1 or H2 (i.e., the
sides defining the margin) are support vectors
This becomes a constrained (convex) quadratic optimization
problem: a quadratic objective function with linear constraints,
solved by Quadratic Programming (QP) using Lagrange multipliers
Mathematical Formulation
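In standard form, the maximum-margin problem sketched above (with the H1/H2 constraints) is the following convex quadratic program:

\begin{aligned}
\min_{W,\,b}\ \ & \tfrac{1}{2}\,\lVert W\rVert^{2} \\
\text{subject to}\ \ & y_i\,(W \cdot X_i + b) \ge 1, \qquad i = 1, \dots, |D|
\end{aligned}

Its solution gives the MMH, whose margin width is 2 / ||W||.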
Non-linear Data and Kernel Trick
(SVM - Linearly Inseparable)
For non-linearly separable data, SVM uses the
Kernel Trick to project data into higher
dimensions.
Common kernels: Polynomial, Radial Basis
Function (RBF), and Sigmoid.
Allows SVM to find a hyperplane in higher-
dimensional space.
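A sketch comparing these kernels on the same non-linearly separable toy data, assuming scikit-learn; kernel parameters are left at their defaults purely for illustration.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(f"{kernel:8s} training accuracy = {clf.score(X, y):.2f}")
# The linear kernel fails on the rings, while the RBF (and usually the
# polynomial) kernel separates them almost perfectly.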
Kernel Trick
The kernel trick is used in SVMs when the data is not
linearly separable in its original feature space.
It allows the algorithm to operate in a higher-dimensional
space without explicitly computing the coordinates of the
data in that space.
This makes it possible to find a separating hyperplane for
complex, non-linear data distributions.
The goal of the kernel trick is to transform the data into a
higher-dimensional space where a linear separation
(hyperplane) is possible, even if the data is not linearly
separable in the original space.
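A small numerical sketch of this point (a hypothetical 2-D example): the degree-2 polynomial kernel K(x, z) = (x · z)^2 returns exactly the dot product Φ(x) · Φ(z) for the explicit map Φ(x) = (x1^2, √2·x1·x2, x2^2), yet never constructs Φ(x).

import math

def kernel(x, z):                # implicit: computed in the original 2-D space
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):                      # explicit map into 3-D feature space
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
print(kernel(x, z))              # 1.0
print(dot(phi(x), phi(z)))       # 1.0 -- identical, yet phi was never needed above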
Kernel Trick …
SVM: Different Kernel functions
Instead of computing the dot product on the transformed data, it is
mathematically equivalent to apply a kernel function K(Xi, Xj) to the
original data, i.e., K(Xi, Xj) = Φ(Xi) · Φ(Xj)
Typical Kernel Functions (see below)
SVM can also be used for classifying multiple (> 2) classes and
for regression analysis (with additional parameters)
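The typical kernel functions referred to above are commonly written as follows (standard forms; h, σ, κ, and δ are user-chosen kernel parameters):

\begin{aligned}
\text{Polynomial of degree } h:\quad & K(X_i, X_j) = (X_i \cdot X_j + 1)^{h} \\
\text{Gaussian RBF}:\quad & K(X_i, X_j) = e^{-\lVert X_i - X_j \rVert^{2} / 2\sigma^{2}} \\
\text{Sigmoid}:\quad & K(X_i, X_j) = \tanh(\kappa\, X_i \cdot X_j - \delta)
\end{aligned}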
Kernel Trick …
Transform the original input data into a higher-dimensional space
Search for a linear separating hyperplane in the new space
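A closing sketch of exactly this two-step view, assuming scikit-learn: explicitly add quadratic features, then search for a linear hyperplane in the enlarged space (a kernel SVM achieves the same effect implicitly).

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

def lift(X):
    # Map (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2): an explicit higher-dimensional space.
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

flat = LinearSVC(C=1.0, max_iter=10000).fit(X, y)            # original 2-D space
lifted = LinearSVC(C=1.0, max_iter=10000).fit(lift(X), y)    # lifted 5-D space

print("linear hyperplane in original space :", flat.score(X, y))
print("linear hyperplane after the mapping :", lifted.score(lift(X), y))
# The rings become linearly separable in the lifted space; a kernel SVM
# obtains the same separation without ever materializing lift(X).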