IAML: Logistic Regression
Nigel Goddard
School of Informatics
Semester 1
Outline
I Logistic function
I Logistic regression
I Learning logistic regression
I Optimization
I The power of non-linear basis functions
I Least-squares classification
I Generative and discriminative models
I Relationships to Generative Models
I Multiclass classification
Decision Boundaries
I In this class we will discuss linear classifiers.
I For each class, there is a region of feature space in which
the classifier selects that class over the other.
I The decision boundary is the boundary of this region. (i.e.,
where the two classes are “tied”)
I In linear classifiers the decision boundary is a line.
Example Data
[Figure: scatter plot of the example data in the (x1, x2) plane, with one class marked 'o' and the other marked 'x'.]
Linear Classifiers
I In a two-class linear classifier, we learn a function

    F(x, w) = w^T x + w0

that represents how aligned the instance is with y = 1.
I w are parameters of the classifier that we learn from data.
I To do classification of an input x:

    x ↦ (y = 1) if F(x, w) > 0

[Figure: the example 'o'/'x' scatter plot in the (x1, x2) plane.]
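A minimal sketch of this decision rule in NumPy (the weight values are made up for illustration, not from the slides):

    import numpy as np

    def linear_classify(x, w, w0):
        """Return 1 if F(x, w) = w^T x + w0 > 0, else 0."""
        return int(np.dot(w, x) + w0 > 0)

    # Illustrative parameters and a test point.
    w = np.array([1.0, -2.0])    # weight vector (normal to the decision boundary)
    w0 = 0.5                     # bias / offset
    print(linear_classify(np.array([3.0, 1.0]), w, w0))   # 1, since F = 1.5 > 0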
A Geometric View
[Figure: the example scatter plot with the weight vector w drawn as a normal to the linear decision boundary.]
Explanation of Geometric View
I The decision boundary in this case is {x | w^T x + w0 = 0}
I w is a normal vector to this surface
I (Remember how lines can be written in terms of their
normal vector.)
I Notice that in more than 2 dimensions, this boundary will
be a hyperplane.
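As a quick numeric illustration (parameters made up), the quantity (w^T x + w0)/||w|| is the signed distance of a point from this hyperplane, and its sign tells us which side of the boundary the point lies on:

    import numpy as np

    w, w0 = np.array([1.0, -2.0]), 0.5      # illustrative parameters
    x = np.array([3.0, 1.0])

    signed_dist = (w @ x + w0) / np.linalg.norm(w)
    print(signed_dist)   # positive, so x lies on the w side of the boundary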
Two Class Discrimination
I For now consider a two class case: y ∈ {0, 1}.
I From now on we’ll write x = (1, x1, x2, ..., xd) and w = (w0, w1, ..., wd).
I We will want a linear, probabilistic model. We could try P(y = 1|x) = w^T x, but this is a bad idea: w^T x can be any real number, not a probability.
I Instead what we will do is

    P(y = 1|x) = f(w^T x)

I f must be between 0 and 1. It will squash the real line into [0, 1].
I Furthermore, the fact that probabilities sum to one means

    P(y = 0|x) = 1 − f(w^T x)
The logistic function
I We need a function that returns probabilities (i.e. stays between 0 and 1).
I The logistic function provides this
I f(z) = σ(z) ≡ 1/(1 + exp(−z))
I As z goes from −∞ to ∞, so f goes from 0 to 1, a “squashing function”
I It has a “sigmoid” shape (i.e. S-like shape)
[Figure: plot of the logistic function for z from −6 to 6, rising in an S-shape from near 0 to near 1.]
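A small sketch of the logistic function in NumPy (the clipping is just a guard against overflow in exp for very negative z):

    import numpy as np

    def sigmoid(z):
        """The logistic function sigma(z) = 1 / (1 + exp(-z))."""
        z = np.clip(z, -500, 500)        # avoid overflow for extreme z
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(0.0))                     # 0.5
    print(sigmoid(np.array([-6.0, 6.0])))   # close to 0 and close to 1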
Linear weights
I Linear weights + logistic squashing function == logistic
regression.
I We model the class probabilities as

    p(y = 1|x) = σ( Σ_{j=0}^{D} wj xj ) = σ(w^T x)

I σ(z) = 0.5 when z = 0. Hence the decision boundary is given by w^T x = 0.
I The decision boundary is an (M − 1)-dimensional hyperplane for an M-dimensional problem.
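Putting the pieces together, a sketch of the model p(y = 1|x) = σ(w^T x), with a leading 1 prepended to x so that w0 acts as the bias (as on the earlier slide); the weights here are made up:

    import numpy as np

    def predict_proba(x, w):
        """p(y = 1 | x) = sigma(w^T [1, x]), with the bias folded into w[0]."""
        x_aug = np.concatenate(([1.0], x))      # x = (1, x1, ..., xd)
        return 1.0 / (1.0 + np.exp(-(w @ x_aug)))

    w = np.array([0.5, 1.0, -2.0])              # (w0, w1, w2), illustrative values
    x = np.array([3.0, 1.0])
    p = predict_proba(x, w)
    print(p, "-> class", int(p > 0.5))          # p > 0.5 exactly when w^T x > 0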
Logistic regression
I For this slide write w̃ = (w1, w2, ..., wd) (i.e., exclude the bias w0)
I The bias parameter w0 shifts the position of the
hyperplane, but does not alter the angle
I The direction of the vector w̃ affects the angle of the
hyperplane. The hyperplane is perpendicular to w̃
I The magnitude of the vector w̃ affects how certain the classifications are
I For small ||w̃||, most of the probabilities near the decision boundary will be close to 0.5.
I For large ||w̃||, probabilities in the same region will be close to 1 or 0.
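A tiny numeric illustration of the last two points (the numbers are made up): rescaling (w0, w̃) by a common factor leaves the decision boundary where it is, but pushes the probabilities towards 0.5 or towards 0/1:

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    z = 0.4                                # value of w~^T x + w0 for a point near the boundary
    for scale in (0.1, 1.0, 10.0):         # shrink or grow the magnitude of the weights
        print(scale, sigmoid(scale * z))   # ~0.51, ~0.60, ~0.98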
Learning Logistic Regression
I Want to set the parameters w using training data.
I As before:
I Write out the model and hence the likelihood
I Find the derivatives of the log likelihood w.r.t. the parameters.
I Adjust the parameters to maximize the log likelihood.
I Assume data is independent and identically distributed.
I Call the data set D = {(x1, y1), (x2, y2), ..., (xn, yn)}
I The likelihood is

    p(D|w) = Π_{i=1}^{n} p(y = yi | xi, w)
           = Π_{i=1}^{n} p(y = 1 | xi, w)^{yi} (1 − p(y = 1 | xi, w))^{1−yi}

I Hence the log likelihood L(w) = log p(D|w) is given by

    L(w) = Σ_{i=1}^{n} [ yi log σ(w^T xi) + (1 − yi) log(1 − σ(w^T xi)) ]
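A direct sketch of this log likelihood in NumPy (the small epsilon is only there to guard against log(0); X and y are toy data):

    import numpy as np

    def log_likelihood(w, X, y, eps=1e-12):
        """L(w) = sum_i [ y_i log sigma(w^T x_i) + (1 - y_i) log(1 - sigma(w^T x_i)) ]."""
        p = 1.0 / (1.0 + np.exp(-(X @ w)))    # p(y = 1 | x_i, w) for each row x_i
        return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    # Toy data: each row of X already includes the leading 1 for the bias.
    X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
    y = np.array([1, 0, 1])
    print(log_likelihood(np.array([0.0, 1.0]), X, y))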
I It turns out that the log likelihood has a unique optimum (given sufficient training examples): it is concave, so the negative log likelihood is convex.
I How to maximize? Take the gradient:

    ∂L/∂wj = Σ_{i=1}^{n} (yi − σ(w^T xi)) xij
I (Aside: something similar holds for linear regression,

    ∂E/∂wj = Σ_{i=1}^{n} (w^T φ(xi) − yi) φj(xi)

where E is the squared error.)
I Unfortunately, you cannot maximize L(w) in closed form as you can for linear regression. You need to use a numerical optimisation method (see later).
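A sketch of maximising L(w) by batch gradient ascent using this gradient (the learning rate and iteration count are arbitrary choices, not from the slides):

    import numpy as np

    def fit_logistic(X, y, lr=0.1, n_iters=1000):
        """Gradient ascent on L(w): w_j += lr * sum_i (y_i - sigma(w^T x_i)) x_ij."""
        w = np.zeros(X.shape[1])
        for _ in range(n_iters):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted p(y = 1 | x_i)
            w += lr * (X.T @ (y - p))            # gradient of the log likelihood
        return w

    X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -2.0]])
    y = np.array([1.0, 0.0, 1.0, 0.0])
    print(fit_logistic(X, y))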
Fitting this into the general structure for learning algorithms:
I Define the task: classification, discriminative
I Decide on the model structure: logistic regression model
I Decide on the score function: log likelihood
I Decide on optimization/search method to optimize the
score function: numerical optimization routine. Note we
have several choices here (stochastic gradient descent,
conjugate gradient, BFGS).
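For example, one common choice is to hand the negative log likelihood and its gradient to an off-the-shelf optimiser such as SciPy's BFGS; a sketch on toy data:

    import numpy as np
    from scipy.optimize import minimize

    def nll_and_grad(w, X, y):
        """Negative log likelihood and its gradient (what BFGS minimises)."""
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        nll = -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        grad = -X.T @ (y - p)
        return nll, grad

    X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -2.0], [1.0, 1.5]])
    y = np.array([1.0, 0.0, 1.0, 0.0, 0.0])    # not linearly separable, so a finite optimum exists
    res = minimize(nll_and_grad, x0=np.zeros(X.shape[1]),
                   args=(X, y), jac=True, method='BFGS')
    print(res.x)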
XOR and Linear Separability
I A problem is linearly separable if we can find weights so that
I w̃^T x + w0 > 0 for all positive cases (where y = 1), and
I w̃^T x + w0 ≤ 0 for all negative cases (where y = 0)
I XOR is not linearly separable: a classic failure case for the perceptron
I XOR becomes linearly separable if we apply a non-linear transformation φ(x) of the input — can you find one? (One possibility is sketched below.)
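One possible transformation (my own choice, not given on the slide): add the product x1·x2 as an extra feature. A quick check that XOR is then linearly separable with hand-picked weights:

    import numpy as np

    # XOR truth table: not linearly separable in (x1, x2) alone.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])

    # Non-linear transformation phi(x) = (x1, x2, x1 * x2).
    Phi = np.column_stack([X, X[:, 0] * X[:, 1]])

    # Hand-picked weights that separate the transformed data.
    w, w0 = np.array([1.0, 1.0, -2.0]), -0.5
    print((Phi @ w + w0 > 0).astype(int))    # [0 1 1 0], matching the XOR labels y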
The power of non-linear basis functions
[Figure: data that is not linearly separable in the original (x1, x2) space becomes linearly separable after mapping to (φ1(x), φ2(x)), using two Gaussian basis functions φ1(x) and φ2(x). Figure credit: Chris Bishop, PRML]
As for linear regression, we can transform the input space if we want: x → φ(x)
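A sketch of Gaussian basis functions of the kind used in the figure (the centres and width here are made-up choices):

    import numpy as np

    def gaussian_basis(X, centres, width=1.0):
        """phi_k(x) = exp(-||x - c_k||^2 / (2 * width^2)) for each centre c_k."""
        # X: (n, d), centres: (K, d)  ->  features: (n, K)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * width ** 2))

    X = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
    centres = np.array([[-0.5, 0.0], [0.5, 0.5]])      # illustrative centres
    print(gaussian_basis(X, centres))                  # one (phi1, phi2) row per point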
Generative and Discriminative Models
I Notice that we have done something very different here
than with naive Bayes.
I Naive Bayes: Modelled how a class “generated” the feature vector, p(x|y). Then could classify using

    p(y|x) ∝ p(x|y) p(y)

This is called a generative approach.
I Logistic regression: Model p(y|x) directly. This is a
discriminative approach.
I Discriminative advantage: Why spend effort modelling
p(x)? Seems a waste, we’re always given it as input.
I Generative advantage: Can be good with missing data
(remember how naive Bayes handles missing data). Also
good for detecting outliers. Or, sometimes you really do
want to generate the input.
Generative Classifiers can be Linear Too
Two scenarios where naive Bayes gives you a linear classifier.
1. Gaussian data with equal covariance. If p(x|y = 1) ∼ N(µ1, Σ) and p(x|y = 0) ∼ N(µ2, Σ), then

    p(y = 1|x) = σ(w̃^T x + w0)

for some (w0, w̃) that depends on µ1, µ2, Σ and the class priors.
2. Binary data. Let each component xj be a Bernoulli variable, i.e. xj ∈ {0, 1}. Then a Naïve Bayes classifier has the form

    p(y = 1|x) = σ(w̃^T x + w0)

Exercise for keeners: prove these two results.
Multiclass classification
I Create a different weight vector wk for each class, to
classify into k and not-k.
I Then use the “softmax” function

    p(y = k|x) = exp(wk^T x) / Σ_{j=1}^{C} exp(wj^T x)

I Note that 0 ≤ p(y = k|x) ≤ 1 and Σ_{j=1}^{C} p(y = j|x) = 1
I This is the natural generalization of logistic regression to
more than 2 classes.
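A sketch of the softmax prediction (subtracting the maximum score before exponentiating is a standard numerical-stability trick; the weights are illustrative):

    import numpy as np

    def softmax_predict(x, W):
        """p(y = k | x) = exp(w_k^T x) / sum_j exp(w_j^T x), one row of W per class."""
        scores = W @ x
        scores = scores - scores.max()     # softmax is unchanged by a constant shift
        p = np.exp(scores)
        return p / p.sum()

    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # 3 classes, 2 features
    x = np.array([2.0, 0.5])
    p = softmax_predict(x, W)
    print(p, p.sum())                      # the probabilities sum to 1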
Least-squares classification
I Logistic regression is more complicated algorithmically
than linear regression
I Why not just use linear regression with 0/1 targets?
[Figure: two 2-D datasets with class boundaries; extra outlying points in the right-hand plot pull the least-squares boundary away from the data, while the logistic regression boundary is largely unaffected.]
Green: logistic regression; magenta: least-squares regression
Figure credit: Chris Bishop, PRML