Logistic Regression
Rishabh Iyer
University of Texas at Dallas
based on the slides of Nick Ruozzi and Vibhav Gogate
Last Time
• Supervised learning via naive Bayes
• Use MLE to estimate a distribution 𝑝(𝑥, 𝑦) = 𝑝(𝑦) 𝑝(𝑥|𝑦)
• Classify by looking at the conditional distribution, 𝑝(𝑦|𝑥)
• Today: logistic regression
Logistic Regression
• Learn 𝑝(𝑌|𝑋) directly from the data
• Assume a particular functional form, e.g., a hard linear classifier:
  𝑝(𝑌 = 1|𝑥) = 1 on one side of the decision boundary and 𝑝(𝑌 = 1|𝑥) = 0 on the other
• Not differentiable, which makes it difficult to learn
• Can't handle noisy labels
Logistic Regression
• Learn 𝑝(𝑦|𝑥) directly from the data
• Assume a particular functional form:

$$
p(Y = -1 \mid x) = \frac{1}{1 + \exp\big(w^{T} x + b\big)}
\qquad
p(Y = 1 \mid x) = \frac{\exp\big(w^{T} x + b\big)}{1 + \exp\big(w^{T} x + b\big)}
$$
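To make the two formulas concrete, here is a minimal sketch in Python/NumPy; the weights `w`, bias `b`, and query point `x` below are made-up values for illustration only, not anything from the slides.

```python
import numpy as np

def class_probabilities(x, w, b):
    """Return p(Y = -1 | x) and p(Y = +1 | x) under the logistic model."""
    z = np.dot(w, x) + b
    p_pos = 1.0 / (1.0 + np.exp(-z))   # equals exp(z) / (1 + exp(z))
    return {-1: 1.0 - p_pos, +1: p_pos}

# Made-up weights, bias, and query point (illustration only)
w = np.array([1.5, -2.0])
b = 0.5
x = np.array([0.2, 0.4])
print(class_probabilities(x, w, b))    # the two probabilities sum to 1
```

Computing 𝑝(𝑌 = 1|𝑥) as 1/(1 + exp(−𝑧)) is algebraically the same as the formula above, but avoids overflow when 𝑤ᵀ𝑥 + 𝑏 is large and positive.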
Logistic Function in 𝑚 Dimensions
$$
p(Y = -1 \mid x) = \frac{1}{1 + \exp\big(w^{T} x + b\big)}
$$

Can be applied to discrete and continuous features.
Functional Form: Two classes
• Given some 𝑤 and 𝑏, we can classify a new point 𝑥 by assigning the label 1 if 𝑝(𝑌 = 1|𝑥) > 𝑝(𝑌 = −1|𝑥) and −1 otherwise
• This leads to a linear classification rule:
  • Classify as 1 if 𝑤ᵀ𝑥 + 𝑏 > 0
  • Classify as −1 if 𝑤ᵀ𝑥 + 𝑏 < 0
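To see why this rule is linear, take the ratio of the two class probabilities defined earlier:

$$
\frac{p(Y = 1 \mid x)}{p(Y = -1 \mid x)} = \exp\big(w^{T} x + b\big) > 1
\quad\Longleftrightarrow\quad
w^{T} x + b > 0
$$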
Learning the Weights
• To learn the weights, we maximize the conditional likelihood
$$
(w^{*}, b^{*}) = \arg\max_{w,\, b} \prod_{i=1}^{N} p\big(y^{(i)} \mid x^{(i)}, w, b\big)
$$
• This is not the same strategy that we used in the case of naive Bayes
• For naive Bayes, we maximized the joint log-likelihood of 𝑝(𝑥, 𝑦); here we maximize the conditional likelihood
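As a concrete sketch of the objective (assuming labels 𝑦⁽ⁱ⁾ ∈ {−1, +1}), the conditional log-likelihood can be computed as below; it uses the compact identity 𝑝(𝑦|𝑥) = 1/(1 + exp(−𝑦(𝑤ᵀ𝑥 + 𝑏))), which is equivalent to the two class-probability formulas above. The data `X`, `y` are made-up for illustration.

```python
import numpy as np

def conditional_log_likelihood(X, y, w, b):
    """Sum_i ln p(y_i | x_i, w, b) for labels y_i in {-1, +1}.

    Uses the compact form p(y | x) = 1 / (1 + exp(-y * (w^T x + b))).
    """
    z = X @ w + b
    return -np.sum(np.log1p(np.exp(-y * z)))

# Made-up data (illustration only): 3 points with 2 features each
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, +1, +1])
print(conditional_log_likelihood(X, y, np.array([1.0, -1.0]), 0.0))
```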
Generative vs. Discriminative Classifiers
Generative classifiers (e.g., naïve Bayes):
• Assume some functional form for 𝑝(𝑥|𝑦) and 𝑝(𝑦)
• Estimate the parameters of 𝑝(𝑥|𝑦) and 𝑝(𝑦) directly from the training data
• Use Bayes rule to calculate 𝑝(𝑦|𝑥); the computation of 𝑝(𝑌|𝑋) is indirect
• Since 𝑝(𝑥) = Σ_𝑦 𝑝(𝑦) 𝑝(𝑥|𝑦) is available, can also generate samples of the data

Discriminative classifiers (e.g., logistic regression):
• Assume some functional form for 𝑝(𝑦|𝑥)
• Estimate the parameters of 𝑝(𝑦|𝑥) directly from the training data
• Learn 𝑝(𝑦|𝑥) directly; useful for discriminating between labels
• Cannot generate samples of the data, since 𝑝(𝑥) is not available
Learning the Weights
$$
\begin{aligned}
\ell(w, b) &= \ln \prod_{i=1}^{N} p\big(y^{(i)} \mid x^{(i)}, w, b\big) \\
&= \sum_{i=1}^{N} \ln p\big(y^{(i)} \mid x^{(i)}, w, b\big) \\
&= \sum_{i=1}^{N} \left[ \frac{y^{(i)}+1}{2} \ln p\big(Y=1 \mid x^{(i)}, w, b\big) + \left(1 - \frac{y^{(i)}+1}{2}\right) \ln p\big(Y=-1 \mid x^{(i)}, w, b\big) \right] \\
&= \sum_{i=1}^{N} \left[ \frac{y^{(i)}+1}{2} \ln \frac{p\big(Y=1 \mid x^{(i)}, w, b\big)}{p\big(Y=-1 \mid x^{(i)}, w, b\big)} + \ln p\big(Y=-1 \mid x^{(i)}, w, b\big) \right] \\
&= \sum_{i=1}^{N} \left[ \frac{y^{(i)}+1}{2} \big(w^{T} x^{(i)} + b\big) - \ln\Big(1 + \exp\big(w^{T} x^{(i)} + b\big)\Big) \right]
\end{aligned}
$$
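The last step uses two identities that follow directly from the assumed functional form:

$$
\ln \frac{p\big(Y = 1 \mid x^{(i)}, w, b\big)}{p\big(Y = -1 \mid x^{(i)}, w, b\big)} = w^{T} x^{(i)} + b,
\qquad
\ln p\big(Y = -1 \mid x^{(i)}, w, b\big) = -\ln\Big(1 + \exp\big(w^{T} x^{(i)} + b\big)\Big)
$$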
Learning the Weights
• ℓ(𝑤, 𝑏) is concave in 𝑤 and 𝑏: take derivatives and set them to zero!
• However, there is no closed-form solution, so we optimize iteratively
Learning the Weights
• Can apply gradient ascent to maximize the conditional likelihood
$$
\frac{\partial \ell}{\partial b} = \sum_{i=1}^{N} \left[ \frac{y^{(i)}+1}{2} - p\big(Y=1 \mid x^{(i)}, w, b\big) \right]
$$

$$
\frac{\partial \ell}{\partial w_j} = \sum_{i=1}^{N} x_j^{(i)} \left[ \frac{y^{(i)}+1}{2} - p\big(Y=1 \mid x^{(i)}, w, b\big) \right]
$$
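A minimal gradient-ascent sketch of these updates in Python/NumPy, assuming labels in {−1, +1}; the learning rate, iteration count, and toy data are illustrative choices, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Gradient ascent on the conditional log-likelihood (labels in {-1, +1})."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    t = (y + 1) / 2                     # (y_i + 1) / 2, i.e. {-1, +1} -> {0, 1}
    for _ in range(n_iters):
        p1 = sigmoid(X @ w + b)         # p(Y = 1 | x_i, w, b) for every i
        residual = t - p1               # (y_i + 1)/2 - p(Y = 1 | x_i, w, b)
        w += lr * (X.T @ residual)      # dl/dw_j = sum_i x_ij * residual_i
        b += lr * np.sum(residual)      # dl/db   = sum_i residual_i
    return w, b

# Made-up, linearly separable toy data (illustration only)
X = np.array([[0.0, 0.1], [0.2, 0.3], [0.9, 1.0], [1.1, 0.8]])
y = np.array([-1, -1, +1, +1])
w, b = fit_logistic_regression(X, y)
print(np.sign(X @ w + b))               # should recover y on this toy set
```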
Priors
• Can define priors on the weights to prevent overfitting
• Normal distribution, zero mean, covariance 𝜎²𝐼
$$
p(w) = \prod_{j} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{w_j^{2}}{2\sigma^{2}}\right)
$$
• “Pushes” parameters towards zero
• Regularization
• Helps avoid very large weights and overfitting
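Taking the log of this prior shows where the penalty on the next slide comes from (writing 𝜆 = 1/𝜎²):

$$
\ln p(w) = -\frac{1}{2\sigma^{2}} \sum_{j} w_j^{2} + \text{const} = -\frac{\lambda}{2} \lVert w \rVert_2^{2} + \text{const}
$$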
Priors as Regularization
• The log-MAP objective with this Gaussian prior is then

$$
\ln \prod_{i=1}^{N} p\big(y^{(i)} \mid x^{(i)}, w, b\big)\, p(w)\, p(b) = \sum_{i=1}^{N} \ln p\big(y^{(i)} \mid x^{(i)}, w, b\big) - \frac{\lambda}{2} \lVert w \rVert_2^{2} + \text{const}
$$

• Quadratic penalty: drives weights towards zero
• Adds a negative linear term (−𝜆𝑤) to the gradients
• Different priors can produce different kinds of regularization
• Sometimes called an ℓ2 regularizer
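A sketch of how the prior changes the earlier gradient-ascent update: the weight gradient picks up an extra −𝜆𝑤 term, while the bias is left unregularized here (a common but not mandatory choice). The regularization strength `lam` is an illustrative value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_l2_logistic_regression(X, y, lam=0.1, lr=0.1, n_iters=1000):
    """Gradient ascent on the L2-regularized (MAP) objective, labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    t = (y + 1) / 2
    for _ in range(n_iters):
        residual = t - sigmoid(X @ w + b)
        w += lr * (X.T @ residual - lam * w)   # prior adds a -lambda * w term
        b += lr * np.sum(residual)             # bias left unregularized here
    return w, b
```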
Regularization
[Figure: geometric comparison of ℓ1 and ℓ2 regularization]
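For reference, the two penalties compared in the figure (𝜆 controls the strength in both cases):

$$
\ell_1: \ \lambda \lVert w \rVert_1 = \lambda \sum_j \lvert w_j \rvert,
\qquad
\ell_2: \ \frac{\lambda}{2} \lVert w \rVert_2^{2} = \frac{\lambda}{2} \sum_j w_j^{2}
$$

The ℓ1 penalty tends to drive some weights exactly to zero (sparse solutions), while the ℓ2 penalty shrinks all weights smoothly without forcing exact zeros.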
Naïve Bayes vs. Logistic Regression
• Non-asymptotic analysis (for Gaussian NB)
• Convergence rate of parameter estimates as the size of the training data tends to infinity (𝑛 = # of attributes in 𝑋)
  • Naïve Bayes needs 𝑂(log 𝑛) samples; NB converges quickly to its (perhaps less helpful) asymptotic estimates
  • Logistic regression needs 𝑂(𝑛) samples; LR converges more slowly but makes no independence assumptions (typically less biased)
[Ng & Jordan, 2002]
NB vs. LR (on UCI datasets)
[Figure: naïve Bayes vs. logistic regression as a function of sample size 𝑚, on UCI datasets] [Ng & Jordan, 2002]
LR in General
• Suppose that 𝑦 ∈ {1, … , 𝑅}, i.e., that there are 𝑅 different class
labels
• Can define a collection of weights and biases as follows
• Choose a vector of biases 𝑏 and a matrix of weights 𝑊 such that, for 𝑘 ≠ 𝑅,

$$
p(Y = k \mid x) = \frac{\exp\big(b_k + \sum_i w_{ki} x_i\big)}{1 + \sum_{j < R} \exp\big(b_j + \sum_i w_{ji} x_i\big)}
$$

and

$$
p(Y = R \mid x) = \frac{1}{1 + \sum_{j < R} \exp\big(b_j + \sum_i w_{ji} x_i\big)}
$$
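A minimal sketch of these 𝑅-class probabilities in Python/NumPy; the matrix `W` (shape (𝑅−1) × 𝑑), the bias vector, and the example 𝑥 are made-up values for illustration only.

```python
import numpy as np

def multiclass_probabilities(x, W, biases):
    """p(Y = k | x) for k = 1..R, with class R as the reference class.

    W has shape (R-1, d); biases has shape (R-1,).
    """
    scores = np.exp(biases + W @ x)     # exp(b_k + sum_i w_ki x_i) for k < R
    denom = 1.0 + scores.sum()          # 1 + sum_{j < R} exp(b_j + sum_i w_ji x_i)
    return np.append(scores / denom, 1.0 / denom)   # last entry is p(Y = R | x)

# Made-up example with R = 3 classes and d = 2 features (illustration only)
W = np.array([[1.0, -1.0], [0.5, 0.5]])
biases = np.array([0.0, -0.5])
x = np.array([0.3, 0.7])
probs = multiclass_probabilities(x, W, biases)
print(probs, probs.sum())               # the R probabilities sum to 1
```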