Logistic Regression

Uploaded by

lillyjoywin1235
11/23/24, 1:17 PM logistic Regression

https://amjadmajid.github.io/tutorials/LR.html 1/9

Logistic Regression
Classification
In a classification problem, a classifier (a piece of software) maps its inputs to a small, discrete set of
outputs. For example, a classifier might sort incoming emails into (i) the spam folder or (ii) the inbox folder.

In a binary classification problem there are only two possible outputs. We can label these groups as 1
(positive class) and 0 (negative class), y ∈ {0, 1}. Examples of binary classification problems include:

Email: Spam/Not Spam?
Online Transactions: Fraudulent (Yes/No)?
Tumor: Malignant/Benign?

Because the labels are discrete while linear regression produces unbounded real-valued outputs, using linear
regression to solve a classification problem can be a bad choice.

Hypothesis representation
We are going to use logistic regression to solve classification problems. The logistic regression hypothesis
accepts arbitrary inputs and bounds the output between zero and one, $0 < h_\theta(x) < 1$.

The logistic regression hypothesis function is constructed by feeding a linear model into a sigmoid (or
logistic) function. The sigmoid function

$$g(z) = \frac{1}{1 + e^{-z}}$$

has an S-shaped curve: it approaches 0 as $z \to -\infty$, approaches 1 as $z \to +\infty$, and equals 0.5 at $z = 0$.

By replacing $z$ with a linear (or polynomial) function $\theta_0 + \theta_1 x_1 + \dots$ (written in linear algebra notation as
$\theta^T x$) we construct the logistic regression hypothesis function,

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

As can be seen from the figure and the form of the function above, our new hypothesis takes any number of
features and any input values and bounds the output between 0 and 1.
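The hypothesis described above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code: the function names `sigmoid` and `hypothesis`, the convention that `x` starts with a bias feature x_0 = 1, and the parameter values are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x).

    Assumes x already contains the bias feature x_0 = 1 as its first entry.
    """
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Illustrative parameters (made up for this sketch): theta_0 = -1, theta_1 = 2.
theta = [-1.0, 2.0]
for x1 in (-3.0, 0.5, 4.0):
    # Every printed value lies strictly between 0 and 1.
    print(x1, hypothesis(theta, [1.0, x1]))
```

Whatever the size of the input, the returned value stays inside the interval (0, 1), which is what lets it be read as a score for the positive class.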

To turn this hypothesis function into a binary classifier we need to set a boundary and map part of the
output interval to 1 and the rest to 0. Choosing 0.5 as the boundary results in the following:

any input with $\theta^T x > 0$ gives $h_\theta(x) > 0.5$ and is mapped to the output 1 (positive class),
any input with $\theta^T x < 0$ gives $h_\theta(x) < 0.5$ and is mapped to the output 0 (negative class), and
the set of inputs with $\theta^T x = 0$, where $h_\theta(x) = g(0) = 0.5$, forms the decision boundary that separates the
area where $y = 0$ from the area where $y = 1$. The decision boundary itself can be considered part of either
area; here we choose to make it belong to the area where $y = 1$.
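The thresholding rule above can be sketched as follows. The function name `predict` and the parameter vector are hypothetical, chosen only so that the decision boundary is the line x_1 + x_2 = 3.

```python
import math


def predict(theta, x, threshold=0.5):
    """Return 1 if h_theta(x) >= threshold, else 0.

    Assumes x starts with the bias feature x_0 = 1. Using ">=" puts the
    decision boundary itself in the positive class, as chosen in the text.
    """
    z = sum(t * xi for t, xi in zip(theta, x))
    h = 1.0 / (1.0 + math.exp(-z))
    return 1 if h >= threshold else 0


# Hypothetical parameters: theta^T x = -3 + x_1 + x_2.
theta = [-3.0, 1.0, 1.0]
print(predict(theta, [1.0, 1.0, 1.0]))  # theta^T x = -1, below the boundary
print(predict(theta, [1.0, 2.0, 2.0]))  # theta^T x = +1, above the boundary
print(predict(theta, [1.0, 1.0, 2.0]))  # theta^T x = 0, exactly on the boundary
```

Because the sigmoid is monotonic, comparing h_theta(x) with 0.5 is equivalent to checking the sign of theta^T x.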

Cost Function
The squared error cost function, used with linear regression, results in many local minima when applied to
logistic regression. This can prevent gradient descent from finding the global minimum of the cost function.
Therefore, we choose the following cost function, which can be derived from maximum likelihood theory
and is convex: a bowl-shaped function with a single (global) minimum.

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$

The intuition behind choosing this cost function is as follows. Since the output y is either 1 or 0, the terms
of this cost function reduce to one of the following two cases:

case 1: where y = 0

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \log(1 - h_\theta(x^{(i)}))$$

As we can see from the figure above, when $h_\theta(x) \approx 0$ the error is also $\approx 0$, and as the prediction
$h_\theta(x)$ approaches 1 the error approaches $\infty$.

case 2: where y = 1

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \log(h_\theta(x^{(i)}))$$

Using an analogous argument we can see that, in case 2, when $h_\theta(x) \approx 1$ the error is $\approx 0$, and as
$h_\theta(x)$ approaches 0 the error approaches $\infty$.
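The behaviour of the two cases can be checked numerically. The sketch below assumes the predictions h_theta(x) have already been computed; the function name `cost` and the sample values are made up for illustration.

```python
import math


def cost(h_values, y_values):
    """Average logistic cost: -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h)).

    h_values are predictions strictly between 0 and 1; y_values are 0 or 1.
    """
    m = len(y_values)
    total = 0.0
    for h, y in zip(h_values, y_values):
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m


labels = [1, 0]
good = cost([0.9, 0.1], labels)    # confident and correct predictions
bad = cost([0.1, 0.9], labels)     # confident and wrong predictions
worse = cost([0.001, 0.999], labels)
print(good, bad, worse)
```

The cost is small when the predictions agree with the labels and grows without bound as a prediction approaches the wrong extreme.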

Gradient Descent
The Gradient Descent algorithm is

repeat: {

$$\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

}.

We can work out the derivative [1] using calculus to get:

repeat: {

$$\theta_j = \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$$

}.
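For readers who want the missing step, here is a short sketch of how the derivative in the update rule comes about. It uses the identity g'(z) = g(z)(1 - g(z)) for the sigmoid and writes h for h_θ(x) on a single training example; the final line averages over all m examples.

```latex
\begin{aligned}
g'(z) &= g(z)\,(1 - g(z)), \qquad h = h_\theta(x) = g(\theta^T x) \\
\frac{\partial}{\partial \theta_j}\Big[-y \log h - (1 - y)\log(1 - h)\Big]
  &= \left(-\frac{y}{h} + \frac{1 - y}{1 - h}\right) h\,(1 - h)\, x_j \\
  &= \big(-y\,(1 - h) + (1 - y)\,h\big)\, x_j \\
  &= (h - y)\, x_j \\
\frac{\partial}{\partial \theta_j} J(\theta)
  &= \frac{1}{m}\sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}
\end{aligned}
```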

A vectorized implementation of Gradient Descent is:

$$\theta = \theta - \frac{\alpha}{m} X^T (g(X\theta) - y)$$
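The vectorized update can be sketched with NumPy as follows. The function names, the learning rate, the iteration count, and the four-point data set are all assumptions made for this example; X is assumed to carry a leading column of ones for the bias term.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))


def gradient_descent(X, y, alpha=0.5, num_iters=5000):
    """Repeat theta = theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * X.T @ (sigmoid(X @ theta) - y)
    return theta


# Made-up one-feature data set; the first column is the bias feature x_0 = 1.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = gradient_descent(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
print(theta, preds)
```

A single matrix expression updates every θ_j at once, which replaces the explicit sum over training examples in the per-parameter rule.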
m

1. The derivative is worked out step by step here
(http://sambfok.blogspot.com/2012/08/partial-derivative-logistic-regression.html) and here
(https://math.stackexchange.com/questions/477207/derivative-of-cost-function-for-logistic-regression)

Logistic Regression for Multi-class classification problem


Some examples of multi-class classification problems include

Email tagging: tag incoming emails with labels like 'friends', 'work', 'hobby'.
Classifying patients with a stuffy nose as fine, cold, or flu.

Using the One-vs-all method we can turn a multi-class classification problem into a set of binary classification
problems. Let us consider the email tagging problem. Using the same procedure as for binary classification
problems, we need to find three hypotheses: (i) one to classify friends vs the rest, (ii) one to classify work
vs the rest, and (iii) one to classify hobby vs the rest. Any test input is fed to all three classifiers
(hypotheses). The hypothesis that produces the highest output value (the output of a logistic regression
function is bounded between 0 and 1) assigns the input to its group.
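The One-vs-all procedure can be sketched as follows, assuming plain gradient descent as the training step for each binary classifier. Everything here is illustrative: the helper names, the learning rate and iteration count, and the nine two-feature "emails" are made up so that each class is linearly separable from the rest.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_binary(X, y, alpha=0.1, num_iters=10000):
    """Fit one logistic regression classifier with gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        theta -= (alpha / len(y)) * X.T @ (sigmoid(X @ theta) - y)
    return theta


def train_one_vs_all(X, labels, classes):
    """Train one 'class c vs the rest' classifier per class."""
    return np.array([train_binary(X, (labels == c).astype(float)) for c in classes])


def predict_one_vs_all(all_theta, X, classes):
    """Pick the class whose hypothesis gives the highest output."""
    scores = sigmoid(X @ all_theta.T)  # shape: (num_inputs, num_classes)
    return [classes[i] for i in np.argmax(scores, axis=1)]


# Made-up data: bias column plus two numeric features per email.
classes = ["friends", "work", "hobby"]
X = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1],
              [1, 4, 0], [1, 5, 1], [1, 4, 1],
              [1, 0, 4], [1, 1, 5], [1, 1, 4]], dtype=float)
labels = np.array(["friends"] * 3 + ["work"] * 3 + ["hobby"] * 3)

all_theta = train_one_vs_all(X, labels, classes)
X_new = np.array([[1, 0.5, 0.5], [1, 4.5, 0.5], [1, 0.5, 4.5]])
new_tags = predict_one_vs_all(all_theta, X_new, classes)
print(new_tags)
```

Each row of `all_theta` holds the parameters of one binary classifier, so adding a class only means training one more row.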


Regularization
Overfitting refers to the problem of generating a hypothesis function that is too complex: it fits the training data very
well but does not generalize to new testing data because the hypothesis curve is very wavy.

TODO use figures to explain

This terminology applies to both linear and logistic regression. There are two main options for addressing
overfitting:

1) Reduce the number of features:

Manually select which features to keep.


Use a model selection algorithm (studied later in the course).

2) Regularization

Keep all the features, but reduce the magnitude of the parameters θj.
Small values of θ lead to a simpler hypothesis, which is, in turn, less prone to overfitting.
Regularization works well when we have a lot of slightly useful features.

Regularized cost function

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \text{cost}(h_\theta(x) - y) + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

where cost(hθ(x) − y) is the per-example cost term of either linear regression or logistic regression, and
λ is the regularization parameter, which controls how strongly large values of θ are penalized.
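The regularized cost can be sketched with NumPy as below. The sketch follows the formula above literally, including its 1/(2m) factor, and assumes the logistic per-example cost as the `cost` term; other texts scale the two terms differently. The function name, the data, and the parameter values are made up. Note that the penalty sum starts at j = 1, so the bias parameter θ_0 is left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def regularized_cost(theta, X, y, lam):
    """(1/2m) * [sum of per-example costs + lam * sum_{j>=1} theta_j^2]."""
    m = len(y)
    h = sigmoid(X @ theta)
    # Assumption: the logistic cost is used as the per-example cost term.
    per_example = -(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return (np.sum(per_example) + penalty) / (2 * m)


# Made-up data with a leading bias column, and made-up parameters.
X = np.array([[1, 0.5, 1.0], [1, 1.0, 0.2], [1, -1.0, 0.5], [1, 0.3, -0.4]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = np.array([1.0, 2.0, -3.0])

j_plain = regularized_cost(theta, X, y, lam=0.0)
j_reg = regularized_cost(theta, X, y, lam=1.0)
print(j_plain, j_reg)
```

With λ = 0 the penalty vanishes; increasing λ raises the cost of large parameters, so a minimizer is pushed toward smaller θ_j.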


Advanced Optimization Algorithms


In this section, we show how to use more advanced optimization algorithms like Conjugate Gradient, BFGS,
and L-BFGS. Octave (or Matlab) provides well-developed implementations of these algorithms. These
algorithms require a function that computes the cost function and its partial derivatives. Consider the
following example to see how to apply them.

Example:

Minimize the following cost function $J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$ over
$\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$, i.e. $\min_\theta J(\theta)$.

To use the advanced optimization algorithms, we follow this procedure:

1) Write a function that takes θ and returns the value of the cost function and its partial derivatives at the
given θ.

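function [Jval, gradient] = costFunction(theta)
  % Cost J(theta) = (theta_1 - 5)^2 + (theta_2 - 5)^2
  Jval = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  % Partial derivatives of J with respect to theta_1 and theta_2
  gradient = zeros(2,1);
  gradient(1) = 2 * (theta(1) - 5);
  gradient(2) = 2 * (theta(2) - 5);
end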

2) Set the maximum number of iterations and the initial θ values, and tell the optimizer that the cost function also returns the gradient.

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);

3) Call fminunc (short for function minimization unconstrained). @costFunction is a handle (pointer) to the
costFunction() function.

[optTheta, functionVal, exitFlag]=fminunc(@costFunction, initialTheta, options);
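For readers working in Python rather than Octave, a roughly equivalent call can be sketched with SciPy's `scipy.optimize.minimize`. This is a swapped-in alternative, not part of the original tutorial: passing `jac=True` tells SciPy that the function returns both the cost and its gradient, which plays the role of the 'GradObj' option above, and the choice of `method="BFGS"` is an assumption made for the example.

```python
import numpy as np
from scipy.optimize import minimize


def cost_function(theta):
    """Return J(theta) = (theta_1 - 5)^2 + (theta_2 - 5)^2 and its gradient."""
    j_val = (theta[0] - 5) ** 2 + (theta[1] - 5) ** 2
    gradient = np.array([2 * (theta[0] - 5), 2 * (theta[1] - 5)])
    return j_val, gradient


initial_theta = np.zeros(2)
result = minimize(
    cost_function,
    x0=initial_theta,
    jac=True,                  # cost_function returns (cost, gradient)
    method="BFGS",
    options={"maxiter": 100},
)
# result.x is the minimizer, result.fun the cost there, result.success a flag.
print(result.x, result.fun, result.success)
```

The returned `result.x`, `result.fun`, and `result.success` correspond loosely to `optTheta`, `functionVal`, and `exitFlag` in the Octave call.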


In [12]: ## Plotting the sigmoid function


import math as m
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

y=[]
x=np.arange(-10,10,0.2)
for i in x:
    y.append(1/(1+m.exp(-i)))

plt.figure(figsize=(8,3))
plt.plot(x,y)
plt.grid()
plt.savefig('sigmoid.png')


In [24]: ## Plotting the log functions


import math as m
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

y=[]
x=np.arange(0.01,1,0.01)
for i in x:
    y.append(-m.log(i))

plt.figure(figsize=(8,5))
plt.plot(x,y)
plt.grid()
plt.xlabel(r'$h_{\theta}(x)$', fontsize=20)
plt.savefig('minusLog.png')


In [25]: ## Plotting the log functions


import math as m
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

y=[]
x=np.arange(0,1,0.01)
for i in x:
    y.append(-m.log(1-i))

plt.figure(figsize=(8,5))
plt.plot(x,y)
plt.grid()
plt.xlabel(r'$h_{\theta}(x)$', fontsize=20)
plt.savefig('oneMinusLog.png')
