Support Vector Machines
vs Logistic Regression
Kevin Swersky
University of Toronto CSC2515 Tutorial
Part of this tutorial is borrowed from Mark Schmidt’s excellent note on
structural SVMs: http://www.di.ens.fr/~mschmidt/Documents/ssvm.pdf
Logistic regression
• Assign probability to each outcome
P(y = 1|x) = σ(w^T x + b)
• Train to maximize likelihood
ℓ(w) = ∏_{n=1}^{N} σ(w^T x_n + b)^{y_n} (1 − σ(w^T x_n + b))^{1 − y_n}
• Linear decision boundary (with y being 0 or 1)
ŷ = I[w^T x + b ≥ 0]
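• A minimal NumPy sketch of this model and its likelihood (the arrays X, y and the parameters w, b are placeholders to be supplied):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # P(y = 1 | x) = sigma(w^T x + b) for each row of X
    return sigmoid(X @ w + b)

def neg_log_likelihood(X, y, w, b):
    # negative log of prod_n sigma(.)^{y_n} (1 - sigma(.))^{1 - y_n}
    p = predict_proba(X, w, b)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def predict(X, w, b):
    # linear decision boundary: y_hat = I[w^T x + b >= 0]
    return (X @ w + b >= 0).astype(int)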
Support vector machines
[Figure: maximum-margin separating hyperplane with support vectors. Source: Wikipedia]
Support vector machines
• Enforce a margin of separation (here, y ∈ {0, 1})
(2y_n − 1) w^T x_n ≥ 1,  ∀ n = 1 . . . N
• Train to find the maximum margin
min_{w,b}  (1/2) ||w||²
s.t.  (2y_n − 1)(w^T x_n + b) ≥ 1,  n = 1 . . . N
• Linear decision boundary
ŷ = I[w^T x + b ≥ 0]
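• As a sketch, scikit-learn's SVC with a linear kernel and a large C approximates this hard-margin program and exposes the support vectors (the toy data below is made up):

import numpy as np
from sklearn.svm import SVC

# toy linearly separable data (hypothetical)
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# a very large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.coef_, clf.intercept_)   # w and b
print(clf.support_vectors_)        # the points that define the margin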
Recap
• Logistic regression focuses on maximizing the
probability of the data. The farther the data lies from
the separating hyperplane (on the correct side), the
happier LR is.
• An SVM tries to find the separating hyperplane that
maximizes the distance of the closest points to the
margin (the support vectors). If a point is not a
support vector, it doesn’t really matter.
A different take
• Remember, in this example y ∈ {0, 1}
• Another take on the LR decision function uses the
probabilities instead:
ŷ = 1 if P(y = 1|x) ≥ P(y = 0|x), and ŷ = 0 otherwise
• where P(y = 1|x) ∝ exp(w^T x + b) and P(y = 0|x) ∝ 1.
A different take
• What if we don’t care about getting the right probability, but just want to make the right decision?
• We can express this as a constraint on the likelihood
ratio,
P(y = 1|x) / P(y = 0|x) ≥ c
• for some arbitrary constant c > 1.
A different take
• Taking the log of both sides,
log (P (y = 1|x)) − log (P (y = 0|x)) ≥ log(c)
• and plugging in the definition of P,
(w^T x + b) − 0 ≥ log(c)
⇒ w^T x + b ≥ log(c)
• c is arbitrary, so we pick it to satisfy log(c) = 1
w^T x + b ≥ 1
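• A quick numeric sanity check of this derivation (the values of w, b, and x below are made up):

import numpy as np

w, b = np.array([0.7, -0.2]), 0.3
x = np.array([2.0, 1.0])

a = w @ x + b                      # w^T x + b
p1 = np.exp(a) / (1 + np.exp(a))   # P(y = 1 | x)
p0 = 1 - p1                        # P(y = 0 | x)

# log P(y=1|x) - log P(y=0|x) equals w^T x + b
print(np.log(p1) - np.log(p0), a)  # both print 1.5

# so the constraint P(y=1|x)/P(y=0|x) >= c with log(c) = 1
# is exactly w^T x + b >= 1
print(p1 / p0 >= np.e, a >= 1)     # True, True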
A different take
• This gives a feasibility problem (specifically the perceptron
problem) which may not have a unique solution.
• Instead, put a quadratic penalty on the weights to make the
solution unique:
min_{w,b}  (1/2) ||w||²
s.t.  (2y_n − 1)(w^T x_n + b) ≥ 1,  n = 1 . . . N
• This gives us an SVM!
• We derived an SVM by asking LR to make the right decisions.
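• As a sketch, this quadratic program can be handed directly to a solver such as CVXPY (the toy data below is made up and assumed linearly separable):

import numpy as np
import cvxpy as cp

X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])
s = 2 * y - 1                      # labels in {-1, +1}

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(s, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)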
The likelihood ratio
• The key to this derivation is the likelihood ratio,
r = P(y = 1|x) / P(y = 0|x)
  = exp(w^T x + b) / 1
  = exp(w^T x + b)
• We can think of a classifier as assigning some cost to r.
• Different costs = different classifiers.
LR cost
• Pick cost(r) = log(1 + 1/r)
              = log(1 + exp(−(w^T x + b)))
• This is the LR objective (for a positive example)!
SVM with slack variables
• If the data is not linearly separable, we can change
the program to:
min_{w,b,ξ}  (1/2) ||w||² + Σ_{n=1}^{N} ξ_n
s.t.  (2y_n − 1)(w^T x_n + b) ≥ 1 − ξ_n,  n = 1 . . . N
      ξ_n ≥ 0,  n = 1 . . . N
• Now if a point n violates the margin, we incur a cost of ξ_n, its distance to the margin.
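• At the optimum each slack variable equals a hinge penalty, max(0, 1 − (2y_n − 1)(w^T x_n + b)), so the slacks can be eliminated; a small NumPy sketch of this (X, y, w, b are hypothetical inputs):

import numpy as np

def slack(X, y, w, b):
    # optimal xi_n = max(0, 1 - (2 y_n - 1)(w^T x_n + b))
    s = 2 * y - 1
    return np.maximum(0.0, 1.0 - s * (X @ w + b))

def soft_margin_objective(X, y, w, b):
    # (1/2)||w||^2 + sum_n xi_n, with the slacks eliminated
    return 0.5 * w @ w + slack(X, y, w, b).sum()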
SVM with slack variables cost
• Pick cost(r) = max(0, 1 − log(r))
              = max(0, 1 − (w^T x + b))
LR cost vs SVM cost
• Plotted in terms of r: [plot of cost(r) for LR and for the SVM]
LR cost vs SVM cost
• Plotted in terms of w^T x + b: [plot of the logistic cost log(1 + exp(−(w^T x + b))) and the hinge cost max(0, 1 − (w^T x + b))]
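• A short matplotlib sketch that reproduces these curves (the plotting range is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt

a = np.linspace(-3, 3, 200)          # a = w^T x + b
r = np.exp(a)                        # likelihood ratio

lr_cost = np.log(1 + 1 / r)          # = log(1 + exp(-a))
svm_cost = np.maximum(0, 1 - a)      # = max(0, 1 - log(r))

plt.plot(a, lr_cost, label="LR cost")
plt.plot(a, svm_cost, label="SVM (hinge) cost")
plt.xlabel("w^T x + b")
plt.ylabel("cost")
plt.legend()
plt.show()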
Exploiting this connection
• We can now use this connection to derive extensions
to each method.
• These might seem obvious (maybe not) and that’s
usually a good thing.
• The important point though is that they are
principled, rather than just hacks. We can trust that
they aren’t doing anything crazy.
Kernel trick for LR
• Recall that in its dual form, we can represent an SVM decision boundary as:
w^T φ(x) + b = Σ_{n=1}^{N} α_n K(x, x_n) = 0
• where φ(x) is an ∞-dimensional basis expansion of x.
• Plugging this into the LR cost:
log(1 + exp(− Σ_{n=1}^{N} α_n K(x, x_n)))
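• A sketch of this kernelized LR objective trained by gradient descent on the coefficients α_n, using an RBF kernel (the kernel width, learning rate, and step count are arbitrary choices):

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_lr(X, y, gamma=1.0, lr=0.01, steps=500):
    K = rbf_kernel(X, X, gamma)
    s = 2 * y - 1                          # labels in {-1, +1}
    alpha = np.zeros(len(y))
    for _ in range(steps):
        f = K @ alpha                      # sum_n alpha_n K(x, x_n) for each x
        # gradient of sum_n log(1 + exp(-s_n f_n)) with respect to alpha
        g = -K @ (s / (1 + np.exp(s * f)))
        alpha -= lr * g
    return alpha

def predict_kernel_lr(X_train, alpha, X_new, gamma=1.0):
    return (rbf_kernel(X_new, X_train, gamma) @ alpha >= 0).astype(int)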
Multi-class SVMs
• Recall for multi-class LR we have:
P(y = i|x) = exp(w_i^T x + b_i) / Σ_k exp(w_k^T x + b_k)
Multi-class SVMs
• Suppose instead we just want the decision rule to
satisfy:
P(y = i|x) / P(y = k|x) ≥ c,  ∀ k ≠ i
• Taking logs as before, this gives:
w_i^T x − w_k^T x ≥ 1,  ∀ k ≠ i
Multi-class SVMs
• This produces the following quadratic program:
min  (1/2) ||w||²
s.t.  (w_{y_n}^T x_n + b_{y_n}) − (w_k^T x_n + b_k) ≥ 1,  n = 1 . . . N,  k ≠ y_n
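• Relaxing these constraints with slack variables, as before, gives a multi-class hinge cost; a NumPy sketch of one common (Crammer-Singer style) version (W, b, X, y below are hypothetical inputs):

import numpy as np

def multiclass_hinge(W, b, X, y):
    # scores[n, k] = w_k^T x_n + b_k
    scores = X @ W.T + b
    correct = scores[np.arange(len(y)), y]
    # margin violation for every wrong class k != y_n
    margins = np.maximum(0.0, 1.0 + scores - correct[:, None])
    margins[np.arange(len(y)), y] = 0.0   # don't count the true class
    # penalize the worst violation per example
    return margins.max(axis=1).sum()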
Take-home message
• Logistic regression and support vector machines are
closely linked.
• Both can be viewed as taking a probabilistic model
and minimizing some cost associated with
misclassification based on the likelihood ratio.
• This lets us analyze these classifiers in a decision
theoretic framework.
• It also allows us to extend them in principled ways.
Which one to use?
• As always, depends on your problem.
• LR gives calibrated probabilities that can be interpreted as
confidence in a decision.
• LR gives us an unconstrained, smooth objective.
• LR can be (straightforwardly) used within Bayesian models.
• SVMs don’t penalize examples for which the correct decision is
made with sufficient confidence. This may be good for
generalization.
• SVMs have a nice dual form, giving sparse solutions when
using the kernel trick (better scalability).
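• A small scikit-learn illustration of these trade-offs (the toy data is randomly generated):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 2, rng.randn(50, 2) + 2])
y = np.array([0] * 50 + [1] * 50)

lr = LogisticRegression().fit(X, y)
print(lr.predict_proba(X[:3]))        # probabilities, usable as confidences

svm = SVC(kernel="rbf").fit(X, y)
print(svm.decision_function(X[:3]))   # margins, not probabilities
print(len(svm.support_))              # sparse set of support vectors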