10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University
Support Vector Machines
+
Kernels
Matt Gormley
Lecture 27
Apr. 22, 2020
1
Reminders
Homework 8: Reinforcement Learning
Out: Fri, Apr 10
Due: Wed, Apr 22 at 11:59pm
Homework 9: Learning Paradigms
Out: Wed, Apr. 22
Due: Wed, Apr. 29 at 11:59pm
Can only be submitted up to 3 days late, so that we can return grades before the final exam
Today’s In-Class Poll
http://poll.mlcourse.org
2
CONSTRAINED OPTIMIZATION
7
Constrained Optimization
8
Quadratic Program
9
SVM: Optimization Background
Whiteboard
Constrained Optimization
Linear programming
Quadratic programming
Example: 2D quadratic function with linear constraints
10
Quadratic Program (slides 11-15)
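For reference, a quadratic program has a quadratic objective with linear constraints; a common standard form (generic notation, not necessarily the exact notation used on the slides) is:

```latex
% Generic quadratic program (QP): quadratic objective, linear constraints
\min_{\mathbf{u}} \;\; \frac{1}{2}\,\mathbf{u}^T Q\,\mathbf{u} + \mathbf{c}^T\mathbf{u}
\quad \text{subject to} \quad A\mathbf{u} \le \mathbf{b}
```

When Q is positive semidefinite the problem is convex, which is what lets us hand the SVM to a black-box QP solver later in the lecture.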
SUPPORT VECTOR MACHINE
(SVM)
16
Example: Building Walls
18
https://www.facebook.com/Mondobloxx/
SVM
Whiteboard
SVM Primal (Linearly Separable Case)
This section borrows ideas from Nina Balcan’s SVM lectures at CMU and Patrick Winston’s “widest street” SVM lecture at MIT (https://www.youtube.com/watch?v=_PwhiWxHK8o).
19
SVM QP (slides 20-25)
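To make "the SVM is a QP" concrete, here is a minimal sketch that hands the hard-margin primal to a black-box convex solver. This is an illustration of my own, assuming the cvxpy package is available; the toy dataset is made up.

```python
import numpy as np
import cvxpy as cp  # assumes the cvxpy package is installed

# Toy, linearly separable data (hypothetical)
X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N, D = X.shape

w = cp.Variable(D)
b = cp.Variable()

# Hard-margin primal: minimize (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
# Support vectors are the training points whose margin constraint is tight.
margins = y * (X @ w.value + b.value)
print("support vectors:\n", X[np.isclose(margins, 1.0, atol=1e-3)])
```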
Support Vector Machines (SVMs)
Hard-margin SVM (Primal) vs. Hard-margin SVM (Lagrangian Dual)
Instead of minimizing the primal, we can maximize the dual problem.
For the SVM, these two problems give the same answer (i.e., the minimum of one is the maximum of the other).
Definition: support vectors are those points x^(i) for which α^(i) > 0.
26
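For reference, with N training examples x^(i) ∈ R^D and labels y^(i) ∈ {-1, +1}, the two formulations compared on this slide are conventionally written as:

```latex
% Hard-margin SVM (Primal)
\min_{\mathbf{w},\, b} \;\; \frac{1}{2}\|\mathbf{w}\|_2^2
\quad \text{s.t.} \quad y^{(i)}\!\left(\mathbf{w}^T\mathbf{x}^{(i)} + b\right) \ge 1,
\;\; i = 1,\dots,N

% Hard-margin SVM (Lagrangian Dual)
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{N}\alpha^{(i)}
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
    \alpha^{(i)}\alpha^{(j)} y^{(i)} y^{(j)}\,(\mathbf{x}^{(i)})^T\mathbf{x}^{(j)}
\quad \text{s.t.} \quad \alpha^{(i)} \ge 0, \;\; \sum_{i=1}^{N}\alpha^{(i)} y^{(i)} = 0
```

The primal weights are recovered from the dual as w = Σ_i α^(i) y^(i) x^(i), which is why only the support vectors (those with α^(i) > 0) determine the decision boundary.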
METHOD OF LAGRANGE MULTIPLIERS
27
Method of Lagrange Multipliers (slides 28-35; figures from http://tutorial.math.lamar.edu/Classes/CalcIII/LagrangeMultipliers.aspx)
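As a compact reference for the whiteboard material: for an equality constraint (the case shown in the linked figures), constrained optima occur where the gradient of the objective is parallel to the gradient of the constraint; for inequality constraints, the same idea leads to the Lagrangian and its dual, which the SVM derivation uses.

```latex
% Equality-constrained case: optimize f(x) subject to g(x) = c
\nabla f(\mathbf{x}) = \lambda\,\nabla g(\mathbf{x}), \qquad g(\mathbf{x}) = c

% Inequality-constrained case: minimize f(x) subject to g_i(x) <= 0
\mathcal{L}(\mathbf{x}, \boldsymbol{\alpha})
  = f(\mathbf{x}) + \sum_i \alpha_i\, g_i(\mathbf{x}), \qquad \alpha_i \ge 0
% primal:  min_x  max_{alpha >= 0}  L(x, alpha)
% dual:    max_{alpha >= 0}  min_x  L(x, alpha)
```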
SVM DUAL
36
Method of Lagrange Multipliers
Whiteboard
Lagrangian Duality
Example: SVM Dual
This section borrows ideas from Nina Balcan’s SVM lectures at CMU and Patrick Winston’s “widest street” SVM lecture at MIT (https://www.youtube.com/watch?v=_PwhiWxHK8o).
37
Support Vector Machines (SVMs)
Hard-margin SVM (Primal) vs. Hard-margin SVM (Lagrangian Dual)
Instead of minimizing the primal, we can maximize the dual problem.
For the SVM, these two problems give the same answer (i.e., the minimum of one is the maximum of the other).
Definition: support vectors are those points x^(i) for which α^(i) > 0.
39
SVM EXTENSIONS
42
Soft Margin SVM
Hard-margin SVM (Primal) vs. Soft-margin SVM (Primal)
Question: If the dataset is not linearly separable, can we still use an SVM?
Answer: Not the hard-margin version. It will never find a feasible solution.
In the soft-margin version, we add “slack variables” that allow some points to violate the large-margin constraints.
The constant C dictates how large we should allow the slack variables to be.
43
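In the conventional notation, the soft-margin primal adds one slack variable ξ^(i) ≥ 0 per training example and penalizes the total slack with the constant C:

```latex
% Soft-margin SVM (Primal)
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\;
  \frac{1}{2}\|\mathbf{w}\|_2^2 + C\sum_{i=1}^{N}\xi^{(i)}
\quad \text{s.t.} \quad
  y^{(i)}\!\left(\mathbf{w}^T\mathbf{x}^{(i)} + b\right) \ge 1 - \xi^{(i)},
  \quad \xi^{(i)} \ge 0
% Large C punishes slack heavily (closer to the hard margin);
% small C tolerates more margin violations.
```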
Soft Margin SVM
Hard-margin SVM (Primal)
Soft-margin SVM (Primal)
44
Soft Margin SVM
Hard-margin SVM (Primal) vs. Hard-margin SVM (Lagrangian Dual)
Soft-margin SVM (Primal) vs. Soft-margin SVM (Lagrangian Dual)
We can also work with the dual of the soft-margin SVM.
45
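For reference, the soft-margin dual is the standard hard-margin dual with each multiplier additionally bounded above by C:

```latex
% Soft-margin SVM (Lagrangian Dual)
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{N}\alpha^{(i)}
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
    \alpha^{(i)}\alpha^{(j)} y^{(i)} y^{(j)}\,(\mathbf{x}^{(i)})^T\mathbf{x}^{(j)}
\quad \text{s.t.} \quad 0 \le \alpha^{(i)} \le C,
\;\; \sum_{i=1}^{N}\alpha^{(i)} y^{(i)} = 0
```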
Multiclass SVMs
The SVM is inherently a binary classification method, but it can be extended to handle K-class classification in many ways.
1. one-vs-rest (a minimal code sketch follows this slide):
build K binary classifiers
train the kth classifier to predict whether an instance has label k or something else
predict the class with the largest score
2. one-vs-one:
build (K choose 2) binary classifiers
train one classifier to distinguish between each pair of labels
predict the class with the most “votes” across the classifiers
46
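A minimal sketch of the one-vs-rest scheme described above, assuming scikit-learn's LinearSVC as the binary learner (the helper function names are mine, for illustration only):

```python
import numpy as np
from sklearn.svm import LinearSVC  # assumes scikit-learn is installed

def train_one_vs_rest(X, y, num_classes):
    """Train K binary SVMs; classifier k separates label k from all other labels."""
    classifiers = []
    for k in range(num_classes):
        clf = LinearSVC()                      # binary SVM: "class k" vs. "rest"
        clf.fit(X, (y == k).astype(int))
        classifiers.append(clf)
    return classifiers

def predict_one_vs_rest(classifiers, X):
    """Predict the class whose binary classifier assigns the largest score."""
    scores = np.column_stack([clf.decision_function(X) for clf in classifiers])
    return np.argmax(scores, axis=1)
```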
Learning Objectives
Support Vector Machines
You should be able to…
1. Motivate the learning of a decision boundary with large margin
2. Compare the decision boundary learned by SVM with that of
Perceptron
3. Distinguish unconstrained and constrained optimization
4. Compare linear and quadratic mathematical programs
5. Derive the hard margin SVM primal formulation
6. Derive the Lagrangian dual for a hard margin SVM
7. Describe the mathematical properties of support vectors and provide
an intuitive explanation of their role
8. Draw a picture of the weight vector, bias, decision boundary, training
examples, support vectors, and margin of an SVM
9. Employ slack variables to obtain the soft margin SVM
10. Implement an SVM learner using a black box quadratic programming
(QP) solver
52
KERNELS
53
Kernels: Motivation
Most real-world problems exhibit data that is not linearly separable.
Example: pixel representation for Facial Recognition
Q: When your data is not linearly separable, how can you still use a linear classifier?
A: Preprocess the data to produce nonlinear features
54
Kernels: Motivation
Motivation #1: Inefficient Features
Non-linearly separable data requires a high-dimensional representation
Might be prohibitively expensive to compute or store
Motivation #2: Memory-based Methods
k-Nearest Neighbors (KNN) for facial recognition allows a distance metric between images; no need to worry about the linearity restriction at all
55
Kernel Methods
Key idea:
1. Rewrite the algorithm so that we only work with dot products x^T z of feature vectors
2. Replace the dot products x^T z with a kernel function k(x, z)
The kernel k(x, z) can be any legal definition of a dot product:
k(x, z) = φ(x)^T φ(z) for any function φ mapping R^D to a (possibly higher-dimensional) feature space
So we only compute the dot product implicitly
This “kernel trick” can be applied to many algorithms (a kernelized perceptron sketch follows this slide):
classification: perceptron, SVM, …
regression: ridge regression, …
clustering: k-means, …
57
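To make the recipe above concrete, here is a small kernelized perceptron sketch (illustrative code of my own, not from the course materials); the RBF kernel and the XOR-style toy data are assumptions chosen so a linear kernel would fail:

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def kernel_perceptron(X, y, kernel, epochs=10):
    """Kernelized perceptron: keep one coefficient per training point instead of
    an explicit weight vector, so the algorithm only ever touches k(x, z) values."""
    N = len(X)
    alpha = np.zeros(N)  # mistake counts, one per training point
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    for _ in range(epochs):
        for i in range(N):
            # prediction uses only kernel evaluations, never phi(x) explicitly
            score = np.sum(alpha * y * K[:, i])
            if y[i] * score <= 0:    # mistake: add this point to the expansion
                alpha[i] += 1
    return alpha

# Toy XOR-style data: not linearly separable in the original space
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])
alpha = kernel_perceptron(X, y, rbf_kernel)

def predict(x_new):
    k_vals = np.array([rbf_kernel(x_i, x_new) for x_i in X])
    return int(np.sign(np.sum(alpha * y * k_vals)))

print([predict(x) for x in X])  # the RBF kernel lets the perceptron fit XOR
```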
SVM: Kernel Trick
Hard-margin SVM (Primal) vs. Hard-margin SVM (Lagrangian Dual)
Suppose we do some feature engineering.
Our feature function is φ.
We apply φ to each input vector x.
58
SVM: Kernel Trick
Hard-margin SVM (Lagrangian Dual)
We could replace the dot product of the two feature vectors in the transformed space with a function k(x, z)
59
SVM: Kernel Trick
Hard-margin SVM (Lagrangian Dual)
We could replace the dot product of the two feature vectors in the transformed space with a function k(x, z)
60
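Written out, the substitution turns the hard-margin dual into a function of kernel evaluations only (standard form, shown here for reference):

```latex
% Hard-margin dual after the kernel substitution x^T z  ->  k(x, z)
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{N}\alpha^{(i)}
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
    \alpha^{(i)}\alpha^{(j)} y^{(i)} y^{(j)}\, k\!\left(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\right)
\quad \text{s.t.} \quad \alpha^{(i)} \ge 0, \;\; \sum_{i=1}^{N}\alpha^{(i)} y^{(i)} = 0
% Predictions also need only kernel values:
%   h(x) = sign( sum_i alpha^(i) y^(i) k(x^(i), x) + b )
```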
Kernel Methods
Key idea:
1. Rewrite the algorithm so that we only work with dot products x^T z of feature vectors
2. Replace the dot products x^T z with a kernel function k(x, z)
The kernel k(x, z) can be any legal definition of a dot product:
k(x, z) = φ(x)^T φ(z) for any function φ mapping R^D to a (possibly higher-dimensional) feature space
So we only compute the dot product implicitly
This “kernel trick” can be applied to many algorithms:
classification: perceptron, SVM, …
regression: ridge regression, …
clustering: k-means, …
61
Kernel Methods
Q: These are just nonlinear features, right?
A: Yes, but…
Q: Can’t we just compute the feature transformation explicitly?
A: That depends...
Q: So, why all the hype about the kernel trick?
A: Because the explicit features might either be prohibitively expensive to compute or be infinite-length vectors
62
Example: Polynomial Kernel
63
Slide from Nina Balcan
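A standard version of this example: for x, z ∈ R^2 and the degree-2 polynomial kernel, the kernel value is exactly a dot product of explicit quadratic features:

```latex
k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z})^2
  = (x_1 z_1 + x_2 z_2)^2
  = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2
  = \phi(\mathbf{x})^T\phi(\mathbf{z}),
\quad \text{where } \phi(\mathbf{x}) = \left(x_1^2,\; \sqrt{2}\,x_1 x_2,\; x_2^2\right)
% Evaluating k costs O(D); writing out phi explicitly requires roughly O(D^d)
% features for a degree-d polynomial kernel.
```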
Kernel Examples
Name | Kernel Function (implicit dot product) | Feature Space (explicit dot product)
Linear | k(x, z) = x^T z | Same as original input space
Polynomial (v1) | k(x, z) = (x^T z)^d | All polynomials of degree d
Polynomial (v2) | k(x, z) = (1 + x^T z)^d | All polynomials up to degree d
Gaussian | k(x, z) = exp(-||x - z||^2 / (2 sigma^2)) | Infinite-dimensional space
Hyperbolic Tangent (Sigmoid) Kernel | k(x, z) = tanh(a x^T z + c) | (With SVM, this is equivalent to a 2-layer neural network)
66
RBF Kernel Example (slides 67-78): a sequence of figures illustrating classification with the RBF kernel.
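For reference, the RBF (Gaussian) kernel behind these figures is conventionally written as follows (the slides may use the equivalent parameterization γ = 1/(2σ^2)):

```latex
k(\mathbf{x}, \mathbf{z})
  = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{z}\|_2^2}{2\sigma^2}\right)
% Small sigma: very local influence, wiggly decision boundaries (KNN-like).
% Large sigma: smoother, more nearly linear decision boundaries.
```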
RBF Kernel Example: KNN vs. SVM (slides 79-82): figures comparing KNN and RBF-kernel SVM decision boundaries.
Kernel Methods
Key idea:
1. Rewrite the algorithm so that we only work with dot products x^T z of feature vectors
2. Replace the dot products x^T z with a kernel function k(x, z)
The kernel k(x, z) can be any legal definition of a dot product:
k(x, z) = φ(x)^T φ(z) for any function φ mapping R^D to a (possibly higher-dimensional) feature space
So we only compute the dot product implicitly
This “kernel trick” can be applied to many algorithms:
classification: perceptron, SVM, …
regression: ridge regression, …
clustering: k-means, …
83
SVM + Kernels: Takeaways
Maximizing the margin of a linear separator is a good training criterion
Support Vector Machines (SVMs) learn a max-margin linear classifier
The SVM optimization problem can be solved with black-box Quadratic Programming (QP) solvers
The learned decision boundary is defined by its support vectors
Kernel methods allow us to work in a transformed feature space without explicitly representing that space
The kernel trick can be applied to SVMs, as well as many other algorithms
86
Learning Objectives
Kernels
You should be able to…
1. Employ the kernel trick in common learning
algorithms
2. Explain why the use of a kernel produces only
an implicit representation of the transformed
feature space
3. Use the "kernel trick" to obtain a
computational complexity advantage over
explicit feature transformation
4. Sketch the decision boundaries of a linear
classifier with an RBF kernel
87