Logistic Regression
Sargur N. Srihari
University at Buffalo, State University of New York
USA
Topics in Linear Classification using
Probabilistic Discriminative Models
• Generative vs Discriminative
1. Fixed basis functions
2. Logistic Regression (two-class)
3. Iterative Reweighted Least Squares (IRLS)
4. Multiclass Logistic Regression
5. Probit Regression
6. Canonical Link Functions
Topics in Logistic Regression
• Logistic Sigmoid and Logit Functions
• Parameters in discriminative approach
• Determining logistic regression parameters
– Error function
– Gradient of error function
– Simple sequential algorithm
– An example
• Generative vs Discriminative Training
– Naïve Bayes vs Logistic Regression
Logistic Sigmoid and Logit Functions
• In the two-class case, the posterior of class C1 can be written as a logistic sigmoid of the feature vector ϕ = [ϕ1,..,ϕM]T:
  p(C1|ϕ) = y(ϕ) = σ(wTϕ), with p(C2|ϕ) = 1 − p(C1|ϕ)
  where σ(·) is the logistic sigmoid function
• Known as logistic regression in statistics, although it is a model for classification rather than regression
• Logit function: the inverse of the sigmoid, a = ln(σ/(1−σ))
  – Also known as log odds, since it is the log of the odds ratio ln[p(C1|ϕ)/p(C2|ϕ)]
  – It links the probability to the predictor variables
• Properties of the logistic sigmoid:
  A. Symmetry: σ(−a) = 1 − σ(a)
  B. Inverse: a = ln(σ/(1−σ)), known as the logit
  C. Derivative: dσ/da = σ(1−σ)
[Figure: plot of the logistic sigmoid σ(a) versus a]
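These three properties can be checked numerically; a minimal NumPy sketch (not part of the original slides, helper names are illustrative):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def logit(s):
    return np.log(s / (1 - s))   # inverse of the sigmoid (log odds)

a = np.linspace(-5, 5, 1001)
s = sigmoid(a)

print(np.allclose(sigmoid(-a), 1 - s))                          # A. symmetry
print(np.allclose(logit(s), a))                                 # B. inverse (logit)
print(np.allclose(np.gradient(s, a), s * (1 - s), atol=1e-3))   # C. derivative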
Fewer Parameters in Linear Discriminative Model
• Discriminative approach (Logistic Regression)
  – For an M-dimensional feature space ϕ: M adjustable parameters
• Generative approach based on Gaussians (Bayes/Naïve Bayes)
  – 2M parameters for the means
  – M(M+1)/2 parameters for the shared covariance matrix
  – 1 parameter for the class prior p(C1), since p(C2) = 1 − p(C1)
  – Total of M(M+5)/2 + 1 parameters, which grows quadratically with M
• If features are assumed independent (naïve Bayes), it still needs M+3 parameters
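As a quick check of the quadratic count above, a small sketch (hypothetical helper name, not from the slides) tallies the generative-model parameters and compares them with M(M+5)/2 + 1:

def gaussian_generative_params(M):
    # Two-class Gaussian generative model with a shared covariance matrix
    means = 2 * M                  # one M-dimensional mean per class
    covariance = M * (M + 1) // 2  # symmetric shared covariance matrix
    prior = 1                      # p(C1); p(C2) = 1 - p(C1)
    return means + covariance + prior

for M in (2, 10, 100):
    print(M, gaussian_generative_params(M), M * (M + 5) // 2 + 1)  # the two counts agree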
Determining Logistic Regression parameters
• Maximum likelihood approach for two classes
• For a data set {(ϕn, tn)}, where tn ∈ {0,1} and ϕn = ϕ(xn), n = 1,..,N
• The likelihood function can be written as
  p(t|w) = ∏n=1..N yn^tn (1 − yn)^(1−tn)
  where t = (t1,..,tN)T and yn = p(C1|ϕn)
• yn is the probability that tn = 1
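As an illustration (synthetic values and variable names not from the slides), the Bernoulli likelihood above can be evaluated directly:

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 3))        # N = 5 feature vectors phi_n, M = 3
t = np.array([1, 0, 1, 1, 0])        # binary targets t_n
w = np.array([0.5, -0.2, 0.1])       # some weight vector

y = sigmoid(Phi @ w)                 # y_n = p(C1 | phi_n)
likelihood = np.prod(y**t * (1 - y)**(1 - t))
print(likelihood)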
Error Fn for Logistic Regression
• The likelihood function is
  p(t|w) = ∏n=1..N yn^tn (1 − yn)^(1−tn)
• Taking the negative logarithm gives the cross-entropy error function
  E(w) = − ln p(t|w) = − Σn=1..N { tn ln yn + (1 − tn) ln(1 − yn) }
  where yn = σ(an) and an = wTϕn
• We need to minimize E(w)
  – At its minimum, the derivative of E(w) is zero
  – So we need to solve for w in the equation ∇E(w) = 0
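A small standalone check (illustrative names and synthetic values, not from the slides) that the cross-entropy error equals the negative log of the likelihood:

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 3))
t = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
w = np.array([0.5, -0.2, 0.1])

y = sigmoid(Phi @ w)
E = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))       # cross-entropy error
likelihood = np.prod(y**t * (1 - y)**(1 - t))
print(np.isclose(E, -np.log(likelihood)))                  # E(w) = -ln p(t|w)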
What is Cross-entropy?
• Entropy of p(x) is defined as H(p) = − Σx p(x) log p(x)
  – If p(x=1|t) = t and p(x=0|t) = 1 − t, then we can write p(x) = t^x (1−t)^(1−x)
  – Then the entropy of p(x) is H(p) = −[t log t + (1−t) log(1−t)]
• Cross-entropy of p(x) and q(x) is defined as H(p,q) = − Σx p(x) log q(x)
  – If q(x=1|y) = y, then H(p,q) = −[t log y + (1−t) log(1−y)]
• In general H(p,q) = H(p) + DKL(p‖q)
  – where DKL(p‖q) = Σx p(x) log [p(x)/q(x)]
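A quick numerical check of the decomposition H(p,q) = H(p) + DKL(p‖q), sketched with an arbitrary pair of Bernoulli distributions (values are illustrative):

import numpy as np

t, y = 0.7, 0.4                       # p = Bernoulli(t), q = Bernoulli(y)
p = np.array([1 - t, t])
q = np.array([1 - y, y])

H_p  = -np.sum(p * np.log(p))         # entropy H(p)
H_pq = -np.sum(p * np.log(q))         # cross-entropy H(p, q)
D_kl =  np.sum(p * np.log(p / q))     # KL divergence D_KL(p || q)
print(np.isclose(H_pq, H_p + D_kl))   # True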
Gradient of Error Function
• Error function:
  E(w) = − ln p(t|w) = − Σn=1..N { tn ln yn + (1 − tn) ln(1 − yn) }, where yn = σ(wTϕn)
• Using the derivative of the logistic sigmoid, dσ/da = σ(1−σ), the gradient of the error function is
  ∇E(w) = Σn=1..N (yn − tn) ϕn
  – Error × Feature Vector: the contribution to the gradient from data point n is the error between target tn and prediction yn = σ(wTϕn), times the basis vector ϕn
• Proof of the gradient expression (for a single data point, dropping the subscript n):
  – Let En = −(z1 + z2), where z1 = t ln σ(wTϕ) and z2 = (1 − t) ln[1 − σ(wTϕ)]
  – Using dσ/da = σ(1 − σ) and d(ln x)/dx = 1/x:
    dz1/dw = t σ(wTϕ)[1 − σ(wTϕ)] ϕ / σ(wTϕ) = t [1 − σ(wTϕ)] ϕ
    dz2/dw = (1 − t) σ(wTϕ)[1 − σ(wTϕ)] (−ϕ) / [1 − σ(wTϕ)] = −(1 − t) σ(wTϕ) ϕ
  – Therefore dEn/dw = −(dz1/dw + dz2/dw) = (σ(wTϕ) − t) ϕ = (y − t) ϕ
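A minimal sketch (illustrative names, synthetic data, not from the slides) comparing the analytic gradient Σn (yn − tn) ϕn with a finite-difference estimate of E(w):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def error(w, Phi, t):
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

rng = np.random.default_rng(1)
Phi = rng.normal(size=(20, 4))                   # rows are phi_n
t = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=4)

grad = Phi.T @ (sigmoid(Phi @ w) - t)            # analytic: sum_n (y_n - t_n) phi_n

eps = 1e-6                                       # central finite differences
numeric = np.array([(error(w + eps * e, Phi, t) - error(w - eps * e, Phi, t)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(grad, numeric, atol=1e-5))     # True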
Simple Sequential Algorithm
• Given the gradient of the error function
  ∇E(w) = Σn=1..N (yn − tn) ϕn, where yn = σ(wTϕn)
• Solve using an iterative approach:
  w(τ+1) = w(τ) − η ∇En, where ∇En = (yn − tn) ϕn
  – This Error × Feature Vector form is precisely the same as the gradient of the sum-of-squares error for linear regression
• Samples are presented one at a time, and the weight vector is updated after each one (see the sketch below)
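A minimal sketch of this sequential update on synthetic data (all names and the data-generating weights are illustrative assumptions, not the slides' example):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

rng = np.random.default_rng(2)
N, M = 200, 3
Phi = np.hstack([np.ones((N, 1)), rng.normal(size=(N, M - 1))])  # phi_0 = 1 bias term
w_true = np.array([-0.5, 2.0, -1.0])                             # assumed data-generating weights
t = (rng.random(N) < sigmoid(Phi @ w_true)).astype(float)        # sampled binary targets

w, eta = np.zeros(M), 0.1
for epoch in range(50):
    for n in rng.permutation(N):               # present samples one at a time
        y_n = sigmoid(Phi[n] @ w)
        w = w - eta * (y_n - t[n]) * Phi[n]    # w <- w - eta * (y_n - t_n) * phi_n
print(w)                                       # should land in the vicinity of w_true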
Python Code for Logistic Regression
• Sigmoid function to produce a value between 0 and 1:
  def sigmoid(z):
      return (1 / (1 + np.exp(-z)))
• Prediction: p = sigmoid(z)
• Loss and cost function
  – The loss function is the loss for a single training example
  – The cost is the loss for the whole training set
• Updating weights and biases
  – p is our prediction and y is the correct value
• Finding db and dw
  – Derivative w.r.t. p → derivative w.r.t. z
Source: https://towardsdatascience.com/logistic-regression-from-very-scratch-ea914961f320
Logistic Regression Code in Python
Use scikit-learn to create a data set:

import sklearn.datasets
import matplotlib.pyplot as plt
import numpy as np

# Two-moons data: X has shape (2, 500), Y has shape (1, 500)
X, Y = sklearn.datasets.make_moons(n_samples=500, noise=.2)
X, Y = X.T, Y.reshape(1, Y.shape[0])

epochs = 1000
learningrate = 0.01

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

losstrack = []
m = X.shape[1]                            # number of training examples
w = np.random.randn(X.shape[0], 1) * 0.01 # small random initial weights
b = 0                                     # bias

for epoch in range(epochs):
    z = np.dot(w.T, X) + b                # linear activations
    p = sigmoid(z)                        # predicted probabilities
    # cross-entropy cost averaged over the training set
    cost = -np.sum(np.multiply(np.log(p), Y) + np.multiply((1 - Y), np.log(1 - p))) / m
    losstrack.append(np.squeeze(cost))
    dz = p - Y                            # error = prediction - target
    dw = (1 / m) * np.dot(X, dz.T)        # gradient w.r.t. weights
    db = (1 / m) * np.sum(dz)             # gradient w.r.t. bias
    w = w - learningrate * dw
    b = b - learningrate * db

plt.plot(losstrack)
Prediction: From the code above, you find p. It will be between 0 and 1; classify as class 1 when p > 0.5 and as class 0 otherwise.
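For example, a prediction step that could follow the training loop above, reusing its sigmoid, w, b, X and Y:

# Prediction, continuing with w, b, X, Y from the training loop above
p = sigmoid(np.dot(w.T, X) + b)           # probabilities in (0, 1)
predictions = (p > 0.5).astype(int)       # threshold at 0.5
print(np.mean(predictions == Y))          # training accuracy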
ML solution can over-fit
• Severe over-fitting occurs for linearly separable data
  [Figure: plot of σ(a) for a linearly separable data set]
  – Because the ML solution occurs at σ = 0.5, i.e., at a = wTϕ = 0, with σ > 0.5 and σ < 0.5 for the two classes
  – The logistic sigmoid becomes infinitely steep, a Heaviside step function, as ||w|| goes to ∞
  – Solution: penalizing the weights
• Recall in linear regression:
  ∇E(w) = − Σn=1..N { tn − wTϕ(xn) } ϕ(xn)T                   (without regularization)
  ∇E(w) = [ − Σn=1..N { tn − wTϕ(xn) } ϕ(xn)T ] + λw          (with regularization)
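By analogy, a sketch of the regularized gradient for logistic regression, assuming a quadratic penalty (λ/2)||w||² is added to E(w) so that the gradient becomes Σn (yn − tn) ϕn + λw (an assumption consistent with the regularized linear-regression form above):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def regularized_gradient(w, Phi, t, lam):
    # Gradient of the cross-entropy error plus a (lam/2) * ||w||^2 penalty
    y = sigmoid(Phi @ w)
    return Phi.T @ (y - t) + lam * w

# With lam > 0, the weights can no longer grow without bound on separable data.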
An Example of 2-class Logistic Regression
• Input Data
ϕ0(x)=1, dummy feature
Initial Weight Vector, Gradient and
Hessian (2-class)
• Weight vector
• Gradient
• Hessian
Final Weight Vector, Gradient and
Hessian (2-class)
• Weight Vector
• Gradient
• Hessian
Number of iterations : 10
Error (Initial and Final): 15.0642, 1.0000e-009
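The gradient and Hessian in this example correspond to the Newton-Raphson / IRLS update listed in the topics. A minimal sketch of that update on synthetic data (names and data are illustrative, not the example's actual values):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def irls_step(w, Phi, t):
    # One Newton-Raphson (IRLS) update: w <- w - H^(-1) * grad E
    y = sigmoid(Phi @ w)
    grad = Phi.T @ (y - t)            # gradient of the cross-entropy error
    R = np.diag(y * (1 - y))          # weighting matrix R
    H = Phi.T @ R @ Phi               # Hessian
    return w - np.linalg.solve(H, grad)

rng = np.random.default_rng(3)
Phi = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])   # phi_0 = 1 dummy feature
t = (rng.random(200) < sigmoid(Phi @ np.array([0.5, 1.0, -1.0]))).astype(float)

w = np.zeros(3)
for _ in range(10):                   # typically converges in a few iterations
    w = irls_step(w, Phi, t)
print(w)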
Generative vs Discriminative Training
Variables x = {x1,..,xM} and classifier target y

1. Generative (Naïve Bayes): estimate the parameters of the variables independently
   – Determine the joint: p(y, x) = p(y) ∏i=1..M p(xi|y)
   – From the joint, get the required conditional p(y|x)
   – For classification: simple estimation; independently estimate M sets of parameters
   – But independence is usually false; we can instead estimate an M(M+1)/2 covariance matrix
   [Graphical model: y with a directed edge to each of x1, x2, .., xM]

2. Discriminative (Naïve Markov / Logistic Regression): jointly estimate the parameters wi
   – Potential functions (log-linear): ϕi(xi, y) = exp{wi xi I{y=1}} and ϕ0(y) = exp{w0 I{y=1}}, where I has value 1 when y = 1, else 0
   – Unnormalized: P(y=1|x) = exp{w0 + Σi=1..M wi xi}, P(y=0|x) = exp{0} = 1
   – Normalized: P(y=1|x) = sigmoid(w0 + Σi=1..M wi xi), where sigmoid(z) = e^z / (1 + e^z); this is Logistic Regression
   – Jointly optimize the M parameters: more complex estimation, but correlations are accounted for
   – Can use much richer features: edges, image patches sharing the same pixels
   [Graphical model: y with an undirected edge to each of x1, x2, .., xM]
   – Multiclass form: p(yi|ϕ) = yi(ϕ) = exp(ai) / Σj exp(aj), where aj = wjTϕ
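A small sketch of the multiclass softmax form quoted above, p(yi|ϕ) = exp(ai)/Σj exp(aj) with aj = wjTϕ (the weights and feature vector are illustrative):

import numpy as np

def softmax(a):
    a = a - np.max(a)                # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum()

W = np.array([[ 0.2, -1.0,  0.5],    # one weight vector w_j per class (rows)
              [ 1.0,  0.3, -0.4],
              [-0.7,  0.6,  0.1]])
phi = np.array([1.0, 2.0, -1.0])

a = W @ phi                          # a_j = w_j^T phi
p = softmax(a)                       # p(y_j | phi) = exp(a_j) / sum_k exp(a_k)
print(p, p.sum())                    # class probabilities summing to 1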
Logistic Regression is a special architecture of a neural network
[Figure: diagram illustrating logistic regression as a neural network]
Image source: https://storage.ning.com/topology/rest/1.0/file/get/2408482975?profile=original
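As a sketch of this correspondence (illustrative NumPy, not the figure from the slide): a single unit with a sigmoid activation computes σ(wTϕ + b), which is exactly a logistic regression prediction:

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def neuron(phi, w, b):
    # One network unit: a linear combination of inputs followed by a sigmoid activation
    return sigmoid(np.dot(w, phi) + b)

w, b = np.array([0.8, -0.3]), 0.1
phi = np.array([1.5, 2.0])
print(neuron(phi, w, b))   # identical to the logistic regression output sigma(w^T phi + b)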