Lecture: Classification with Support Vector Machines
CS 2XX: Mathematics for AI and ML
Chandresh Kumar Maurya
IIT Indore
https://chandreshiit.github.io
November 17, 2024
Slide credits: Yi, Yung
Warm-Up
Please watch this tutorial video by Luis Serrano on Support Vector Machines:
https://youtu.be/Lpr__X8zuE8
Roadmap
(1) Story and Separating Hyperplanes
(2) Primal SVM: Hard SVM
(3) Primal SVM: Soft SVM
(4) Dual SVM
(5) Kernels
(6) Numerical Solution
Storyline
• (Binary) classification vs. regression
• A classification predictor $f: \mathbb{R}^D \to \{+1, -1\}$, where $D$ is the dimension of the features.
• Supervised learning, as in regression, with a given dataset $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, where our task is to learn the model parameters that produce the smallest classification error.
• SVM
◦ A geometric way of thinking about supervised learning
◦ Relies on empirical risk minimization
◦ Binary classification = drawing a separating hyperplane
◦ Admits interpretations from several perspectives: the geometric view, the loss-function view, and the view from convex hulls of the data points
Hard SVM vs. Soft SVM
• Hard SVM: the data are linearly separable, and thus no classification error is allowed
• Soft SVM: the data are not linearly separable, and thus some classification error is allowed
Separating Hyperplane
• A hyperplane in $\mathbb{R}^D$ is a set $\{x \mid a^T x = b\}$, where $a \in \mathbb{R}^D$, $a \neq 0$, $b \in \mathbb{R}$ (L7(3)).
In other words, it is $\{x \mid a^T (x - x_0) = 0\}$, where $x_0$ is any point on the hyperplane, i.e., $a^T x_0 = b$.
• It divides $\mathbb{R}^D$ into two halfspaces:
$\{x \mid a^T x \leq b\}$ and $\{x \mid a^T x > b\}$
• In our problem, we consider the hyperplane $w^T x + b = 0$, where $w$ and $b$ are the parameters of the model.
• Classification logic
$$
\begin{cases}
w^T x_n + b \geq 0 & \text{when } y_n = +1 \\
w^T x_n + b < 0 & \text{when } y_n = -1
\end{cases}
\;\Longrightarrow\;
y_n (w^T x_n + b) \geq 0
$$
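As a quick illustration (a minimal sketch with made-up numbers, not taken from the slides), the combined condition $y_n (w^T x_n + b) \geq 0$ can be checked for an entire dataset in one vectorized expression:

```python
import numpy as np

# Minimal sketch: checking the classification rule y_n * (w^T x_n + b) >= 0
# for a hypothetical linear classifier (w, b) on two toy samples.
w = np.array([2.0, -1.0])      # hypothetical weight vector
b = 0.5                        # hypothetical bias
X = np.array([[1.0, 1.0],      # N x D matrix of feature vectors
              [-1.0, 2.0]])
y = np.array([+1, -1])         # labels in {+1, -1}

margins = y * (X @ w + b)      # y_n * (w^T x_n + b) for each sample
print(margins >= 0)            # True where the sample is correctly classified
```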
Distance between Two Hyperplanes
• Consider two hyperplanes $w^T x - b = 0$ and $w^T x - b = r$, where we assume $r > 0$.
• Question. What is the distance between the two hyperplanes? Answer: $\dfrac{r}{\|w\|}$
(Here, distance means the shortest distance between the two hyperplanes.)
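A one-line derivation sketch of this answer (assuming $w \neq 0$): start from a point $x_0$ on the first hyperplane and move along the normal direction $w$ until the second hyperplane is reached.

```latex
% Take x_0 with w^T x_0 = b and move along w: x = x_0 + t w.
% The second hyperplane requires w^T (x_0 + t w) = b + r, which fixes t,
% and the distance traveled is \|t w\|.
\[
  w^T (x_0 + t w) = b + r
  \;\Longrightarrow\;
  t = \frac{r}{\|w\|^2}
  \;\Longrightarrow\;
  \text{distance} = \|t\, w\| = \frac{r}{\|w\|}.
\]
```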
Hard Support Vector Machine
• Assume that the data points are linearly separable.
• Goal: Find the hyperplane that maximizes the margin between the positive and the
negative samples
• Given the training dataset $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ and a hyperplane $w^T x + b = 0$, what is the constraint that all data points are at least $\frac{r}{\|w\|}$ away from the hyperplane?
$$ y_n (w^T x_n + b) \geq \frac{r}{\|w\|} $$
• Note that $r$ and $\|w\|$ are scaled together, so if we fix $\|w\| = 1$, then
$$ y_n (w^T x_n + b) \geq r $$
Hard SVM: Formulation 1
• Maximize the margin, such that all the training data points are well-classified into
their classes (+ or −)
$$ \max_{w, b, r} \; r \quad \text{subject to} \quad y_n (w^T x_n + b) \geq r \;\; \text{for all } n = 1, \ldots, N, \quad \|w\| = 1, \quad r > 0 $$
Formulation 2 (1)
$$ \max_{w, b, r} \; r \quad \text{subject to} \quad y_n (w^T x_n + b) \geq r \;\; \text{for all } n = 1, \ldots, N, \quad \|w\| = 1, \quad r > 0 $$
• Since $\|w\| = 1$, reformulate $w$ in terms of $w'$ as: $y_n \left( \frac{w'^T}{\|w'\|} x_n + b \right) \geq r$
• Change the objective from $r$ to $r^2$.
• Define $w''$ and $b''$ by rescaling the constraint:
$$ y_n \left( \frac{w'^T}{\|w'\|} x_n + b \right) \geq r \;\Longleftrightarrow\; y_n \left( w''^T x_n + b'' \right) \geq 1, \quad \text{where } w'' = \frac{w'}{\|w'\|\, r} \text{ and } b'' = \frac{b}{r} $$
Formulation 2 (2)
• Note that $\|w''\| = \frac{1}{r}$
• Thus, we have the following reformulated problem:
$$ \max_{w'', b''} \; \frac{1}{\|w''\|^2} \quad \text{subject to} \quad y_n \left( w''^T x_n + b'' \right) \geq 1 \;\; \text{for all } n = 1, \ldots, N, $$
which is equivalent to (dropping the double primes):
$$ \min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_n (w^T x_n + b) \geq 1 \;\; \text{for all } n = 1, \ldots, N $$
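This final hard-margin problem is a convex quadratic program, so it can be handed to a generic solver. Below is a minimal sketch (made-up toy data; the cvxpy modeling library is an assumed tool, not prescribed by the slides):

```python
import numpy as np
import cvxpy as cp

# Minimal sketch of the hard-margin primal:
#   min_{w,b} (1/2)||w||^2   s.t.  y_n (w^T x_n + b) >= 1  for all n.
X = np.array([[2.0, 2.0], [3.0, 3.0],      # class +1 (hypothetical points)
              [-1.0, -1.0], [-2.0, 0.0]])  # class -1
y = np.array([1.0, 1.0, -1.0, -1.0])
N, D = X.shape

w = cp.Variable(D)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)            # maximum-margin hyperplane
print("margins:", y * (X @ w.value + b.value))   # all >= 1 (up to solver tolerance)
```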
Understanding Formulation 2 Intuitively
• Given the training dataset $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ and a hyperplane $w^T x + b = 0$, what is the constraint that all data points are at least $\frac{r}{\|w\|}$ away from the hyperplane?
$$ y_n (w^T x_n + b) \geq \frac{r}{\|w\|} $$
• Formulation 1. Note that $r$ and $\|w\|$ are scaled together, so if we fix $\|w\| = 1$, then
$$ y_n (w^T x_n + b) \geq r. $$
And, maximize $r$.
• Formulation 2. If we fix $r = 1$, then
$$ y_n (w^T x_n + b) \geq 1. $$
And, minimize $\|w\|$.
Soft SVM: Geometric View
• Now we allow some classification errors, because the data are not linearly separable.
• Introduce slack variables that quantify how much error is allowed in the optimization problem:
• ξ = (ξn : n = 1, . . . , N)
• ξn : slack for the n-th sample (xn , yn )
$$ \min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{n=1}^{N} \xi_n \quad \text{subject to} \quad y_n (w^T x_n + b) \geq 1 - \xi_n, \;\; \xi_n \geq 0, \;\; \text{for all } n $$
• $C$: trade-off between the margin width and the total slack
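As with the hard-margin case, this is a convex QP. A minimal sketch with explicit slack variables (made-up, non-separable toy data; cvxpy is an assumed tool, not prescribed by the slides):

```python
import numpy as np
import cvxpy as cp

# Minimal sketch of the soft-margin primal with explicit slack variables xi_n.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0],   # class +1 (third point is an outlier)
              [-1.0, -1.0], [-2.0, 0.0]])             # class -1
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0])
N, D = X.shape
C = 1.0                                   # trade-off between margin width and total slack

w, b, xi = cp.Variable(D), cp.Variable(), cp.Variable(N)
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("slacks:", np.round(xi.value, 3))   # nonzero entries mark margin violations
```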
Soft SVM: Loss Function View (1)
• From the perspective of empirical risk minimization
• Loss function design
◦ zero-one loss $\mathbb{1}(f(x_n) \neq y_n)$: # of mismatches between the prediction and the label
=⇒ combinatorial optimization (typically NP-hard)
◦ hinge loss: $\ell(t) = \max(0, 1 - t)$, where $t = y f(x) = y (w^T x + b)$
▶ If $x$ is well on the correct side, $t \geq 1$ → $\ell(t) = 0$
▶ If $x$ is on the correct side, but too close to the boundary, $0 < t < 1$ → $0 < \ell(t) = 1 - t < 1$
▶ If $x$ is on the wrong side, $t < 0$ → $1 < \ell(t) = 1 - t$
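A minimal sketch (not from the slides) evaluating the hinge loss on one value of $t$ for each of the three cases above:

```python
import numpy as np

# Minimal sketch: hinge loss max(0, 1 - t) with t = y * (w^T x + b).
def hinge_loss(t):
    """Elementwise hinge loss max(0, 1 - t)."""
    return np.maximum(0.0, 1.0 - t)

t = np.array([1.5, 0.4, -2.0])   # well-classified, inside the margin, misclassified
print(hinge_loss(t))             # -> [0.0, 0.6, 3.0]
```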
Soft SVM: Loss Function View (2)
$$ \min_{w, b} \; (\text{regularizer} + \text{loss}) = \min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_{n=1}^{N} \max\{0, 1 - y_n (w^T x_n + b)\} $$
• $\frac{1}{2} \|w\|^2$: L2-regularizer (margin maximization = regularization)
• $C$: regularization parameter, which here multiplies the loss term rather than the regularization term
• Why is this loss-function view the same as the geometric view?
$$ \min_{t} \; \max(0, 1 - t) \;\Longleftrightarrow\; \min_{\xi, t} \; \xi, \quad \text{subject to} \quad \xi \geq 0, \;\; \xi \geq 1 - t $$
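The equivalence can be seen pointwise: for a fixed $t$, the smallest feasible slack is exactly the hinge loss (a one-line sketch filling in the step):

```latex
% For fixed t, \xi must satisfy \xi >= 0 and \xi >= 1 - t, so the smallest
% feasible \xi is the larger of the two lower bounds.
\[
  \min_{\xi} \{\, \xi \;:\; \xi \geq 0,\ \xi \geq 1 - t \,\} = \max(0,\, 1 - t).
\]
```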
Dual SVM: Idea
$$ \min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{n=1}^{N} \xi_n \quad \text{subject to} \quad y_n (w^T x_n + b) \geq 1 - \xi_n, \;\; \xi_n \geq 0, \;\; \text{for all } n $$
• The above primal problem is a convex optimization problem.
• Let's apply Lagrange multipliers, find another formulation, and see what other nice properties emerge (L7(2), L7(4))
• Convert the problem into "≤" constraints, so as to apply the min-min-max rule:
$$ \min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{n=1}^{N} \xi_n, \quad \text{s.t.} \quad -y_n (w^T x_n + b) \leq -1 + \xi_n, \;\; -\xi_n \leq 0, \;\; \text{for all } n $$
Applying Lagrange Multipliers (1)
$$ \min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{n=1}^{N} \xi_n, \quad \text{s.t.} \quad -y_n (w^T x_n + b) \leq -1 + \xi_n, \;\; -\xi_n \leq 0, \;\; \text{for all } n $$
• Lagrangian with multipliers αn ≥ 0 and γn ≥ 0
$$ L(w, b, \xi, \alpha, \gamma) = \frac{1}{2} \|w\|^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} \alpha_n \left[ y_n (w^T x_n + b) - 1 + \xi_n \right] - \sum_{n=1}^{N} \gamma_n \xi_n $$
• Dual function: $D(\alpha, \gamma) = \inf_{w, b, \xi} L(w, b, \xi, \alpha, \gamma)$, for which the following conditions should be met:
$$ \text{(D1)}\;\; \frac{\partial L}{\partial w} = w^T - \sum_{n=1}^{N} \alpha_n y_n x_n^T = 0, \qquad \text{(D2)}\;\; \frac{\partial L}{\partial b} = -\sum_{n=1}^{N} \alpha_n y_n = 0, \qquad \text{(D3)}\;\; \frac{\partial L}{\partial \xi_n} = C - \alpha_n - \gamma_n = 0 $$
Applying Lagrange Multipliers (2)
• The dual function $D(\alpha, \gamma) = \inf_{w, b, \xi} L(w, b, \xi, \alpha, \gamma)$ with (D1) is given by:
$$ D(\alpha, \gamma) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle - \sum_{i=1}^{N} y_i \alpha_i \left\langle \sum_{j=1}^{N} y_j \alpha_j x_j,\, x_i \right\rangle - b \sum_{i=1}^{N} y_i \alpha_i + \sum_{i=1}^{N} \alpha_i + \sum_{i=1}^{N} (C - \alpha_i - \gamma_i)\, \xi_i $$
• From (D2) and (D3), the above simplifies (the $b$-term vanishes by (D2), the $\xi$-term by (D3), and the two quadratic terms combine) into:
$$ D(\alpha, \gamma) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle + \sum_{i=1}^{N} \alpha_i $$
• αi , γi ≥ 0 and C − αi − γi = 0 =⇒ 0 ≤ αi ≤ C
Dual SVM
• (Lagrangian) Dual Problem: maximize $D(\alpha, \gamma)$, or equivalently:
$$ \min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle - \sum_{i=1}^{N} \alpha_i \quad \text{subject to} \quad \sum_{i=1}^{N} y_i \alpha_i = 0, \quad 0 \leq \alpha_i \leq C, \;\; \forall i = 1, \ldots, N $$
• Primal SVM: the number of parameters scales with the number of features ($D$)
• Dual SVM
◦ the number of parameters scales with the number of training data points ($N$)
◦ depends only on the inner products of training data points $\langle x_i, x_j \rangle$ → allows the application of kernels
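As an illustration (a minimal sketch with made-up data; cvxpy is an assumed tool, not prescribed by the slides), the dual can also be handed to a generic convex solver; the quadratic term is expressed via $\frac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j \langle x_i, x_j\rangle = \frac{1}{2}\big\|\sum_n \alpha_n y_n x_n\big\|^2$:

```python
import numpy as np
import cvxpy as cp

# Minimal sketch of the dual problem on a tiny toy set.
X = np.array([[2.0, 2.0], [3.0, 3.0],
              [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = X.shape[0]
C = 10.0

alpha = cp.Variable(N)
objective = cp.Minimize(0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)) - cp.sum(alpha))
constraints = [cp.sum(cp.multiply(y, alpha)) == 0, alpha >= 0, alpha <= C]
cp.Problem(objective, constraints).solve()

a = alpha.value
w = X.T @ (a * y)                 # recover w via (D1): w = sum_n alpha_n y_n x_n
sv = np.where(a > 1e-5)[0]        # support vectors have alpha_n > 0
b = np.mean(y[sv] - X[sv] @ w)    # b from support vectors with 0 < alpha_n < C (assumed here)
print("alpha =", np.round(a, 3))
print("w =", w, "b =", b)
```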
Kernel
• Modularity: using a feature transformation $\phi(x)$, dual SVMs can be modularized:
$\langle x_i, x_j \rangle \Longrightarrow \langle \phi(x_i), \phi(x_j) \rangle$
• Similarity function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$
• Kernel matrix (Gram matrix): must be symmetric and positive semidefinite
• Examples: polynomial kernel, Gaussian
radial basis function, rational quadratic
kernel
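A minimal sketch (made-up points, not from the slides) of one of these examples, the Gaussian RBF kernel $k(x_i, x_j) = \exp\!\big(-\|x_i - x_j\|^2 / (2\sigma^2)\big)$, together with a numerical check that its Gram matrix is symmetric and positive semidefinite:

```python
import numpy as np

# Minimal sketch: Gaussian RBF kernel and its Gram matrix on three toy points.
def rbf_gram(A, B, sigma=1.0):
    """Gram matrix K with K[i, j] = k(A[i], B[j]) for the Gaussian RBF kernel."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_gram(X, X)
print(K)                                        # ones on the diagonal
print(np.allclose(K, K.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # positive semidefinite (up to tolerance)
```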
Numerical Solution
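In practice, the dual QP is solved by specialized numerical routines. A minimal sketch (made-up toy data; scikit-learn's SVC, built on the LIBSVM SMO-type solver, is an assumed tool, not prescribed by the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Minimal sketch: solving the soft-margin dual numerically with an RBF kernel.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0],
              [-1.0, -1.0], [-2.0, 0.0], [1.0, 1.5]])
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)    # the x_n with alpha_n > 0
print("dual coefficients:", clf.dual_coef_)          # y_n * alpha_n for the support vectors
print("train predictions:", clf.predict(X))
```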
Questions?
Review Questions
1)