Logistic Regression
Dr. Dinesh Kumar Vishwakarma
Professor,
Department of Information Technology,
Delhi Technological University, Delhi
Logistic Regression: Intro
Logistic regression extends the ideas of linear regression to the situation where the dependent variable, Y, is categorical.
Now suppose the dependent variable y is binary: it takes on the two values "Success" (1) or "Failure" (0).
We are interested in predicting y from a continuous independent variable x.
This is the situation in which logistic regression is used.
Linear vs Logistic
[Figure: comparison of a linear model fit and a logistic model fit]
Example
Based on the CGPA in UG, will a student get admission in PG? Yes/No.
The values of y are 1 (Success) or 0 (Failure), while the values of x range over a continuum. Another binary example: raining or not.
A categorical variable can also divide the observations into more than two classes: for a stock action such as holding/selling/buying, the categorical variable has 3 categories, the "hold" class, the "sell" class, and the "buy" class.
Logistic regression can be used for classifying a new observation into one of the classes, based on the values of its predictor variables (called "classification").
Applications
Logistic regression is used in applications such as:
Classifying customers as returning or non-returning
(classification)
Finding factors that differentiate between male and female
top executives (profiling)
Predicting the approval or disapproval of a loan based on
information such as credit scores (classification).
Popular examples of binary response outcomes are
success/failure, yes/no, buy/don't buy, default/don't default,
and survive/die.
We code the values of a binary response Y as 0 and 1.
Introduction: Logistic Regression
The most important model for categorical response (y_i) data:
• Categorical response with 2 levels (binary: 0 and 1)
• Categorical response with ≥ 3 levels (nominal or ordinal)
Predictor variables (x_i) can take on any form: binary, categorical, and/or continuous.
Logistic Curve
[Figure: sigmoid curve, probability (0.0 to 1.0) on the y-axis versus x (1 to 21) on the x-axis]
Logistic Function

$$P(\text{"Success"} \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

[Figure: P("Success"|X) versus X, an S-shaped curve rising from 0 toward 1]
Logit Transformation
The logistic regression model is given by

$$P(Y \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

which is equivalent to

$$\ln\frac{P(Y \mid X)}{1 - P(Y \mid X)} = \beta_0 + \beta_1 X$$

This is called the Logit Transformation.
Logit Transformation
Logistic regression models work with transformed probabilities called logits:

$$\text{logit}(p_i) = \log\frac{p_i}{1 - p_i}$$

where
• i indexes all cases (observations),
• p_i is the probability that the event (a sale, for example) occurs in the i-th case, and
• log is the natural log (to the base e).
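As a minimal sketch of these definitions in Python (assuming NumPy; the probability values below are only illustrative):

```python
import numpy as np

def logit(p):
    """Logit transformation: the natural log of the odds p / (1 - p)."""
    return np.log(p / (1 - p))

def inv_logit(z):
    """Inverse logit (sigmoid): maps a log odds back to a probability."""
    return 1 / (1 + np.exp(-z))

p = np.array([0.1, 0.5, 0.9])
print(logit(p))             # [-2.197  0.     2.197]
print(inv_logit(logit(p)))  # recovers [0.1 0.5 0.9]
```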
Comparing LP and Logit Models
[Figure: two panels, the LP model and the logit model, each plotting the probability P(p_i) against the predictor; the logit panel applies the logit transform]
Logistic regression model with a
single continuous predictor
$$\text{logit}(p_i) = \log(\text{odds}) = \beta_0 + \beta_1 X_1$$

where
• logit(p_i) is the logit transformation of the probability of the event,
• β0 is the intercept of the regression line, and
• β1 is the slope of the regression line.
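A sketch of fitting this single-predictor model with scikit-learn on hypothetical CGPA/admission data, echoing the earlier example (the "true" coefficients −12 and 1.6 are assumed for illustration; note that scikit-learn's LogisticRegression applies L2 regularization by default, so the estimates are slightly shrunk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: UG CGPA as the single continuous predictor,
# PG admission (0/1) as the binary response.
rng = np.random.default_rng(0)
cgpa = rng.uniform(5, 10, size=200)
p_true = 1 / (1 + np.exp(-(-12 + 1.6 * cgpa)))  # assumed "true" beta0, beta1
admit = rng.binomial(1, p_true)

model = LogisticRegression().fit(cgpa.reshape(-1, 1), admit)
print("beta0 estimate:", model.intercept_[0])
print("beta1 estimate:", model.coef_[0, 0])
```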
The Logistic Regression Model
Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x.
The ratio

$$\frac{p}{1 - p}$$

is called the odds ratio. This quantity will also increase with the value of x, ranging from zero to infinity.
The quantity

$$\ln\frac{p}{1 - p}$$

is called the log odds ratio.
Example: odds ratio, log odds ratio
Suppose a die is rolled: Success = "roll a six", p = 1/6.
The odds ratio:

$$\frac{p}{1 - p} = \frac{1/6}{5/6} = \frac{1}{5} = 0.2$$

The log odds ratio:

$$\ln\frac{p}{1 - p} = \ln\frac{1}{5} = \ln 0.2 = -1.6094$$
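A quick check of this arithmetic in Python:

```python
import math

p = 1 / 6                   # P(roll a six)
odds = p / (1 - p)          # 1/5 = 0.2
log_odds = math.log(odds)   # ln 0.2 = -1.6094...
print(odds, log_odds)       # 0.2 -1.6094379124341003
```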
The Logistic Regression Model
Assumes the log odds ratio is linearly related to x, i.e.:

$$\ln\frac{p}{1 - p} = \beta_0 + \beta_1 x$$

In terms of the odds ratio:

$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}$$
The Logistic Regression Model
Solving for p in terms of x:

$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}$$

$$p = e^{\beta_0 + \beta_1 x}(1 - p)$$

$$p + p\,e^{\beta_0 + \beta_1 x} = e^{\beta_0 + \beta_1 x}$$

or

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
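A numeric sanity check of this algebra, with assumed illustrative coefficients: computing p from the closed form and then taking the odds p/(1 − p) recovers e^(β0 + β1 x):

```python
import numpy as np

beta0, beta1 = -2.0, 0.8            # assumed illustrative coefficients
x = np.linspace(0, 10, 11)
u = beta0 + beta1 * x               # the linear predictor
p = np.exp(u) / (1 + np.exp(u))     # the closed form derived above

# The odds computed from p recover e^u, confirming the algebra.
print(np.allclose(p / (1 - p), np.exp(u)))   # True
```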
Interpretation of the parameter β0
• β0 determines the intercept: at x = 0,

$$p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$$

[Figure: logistic curve p versus x (0 to 10), with the intercept value marked at x = 0]
Interpretation of the parameter β1
• β1 (together with β0) determines where p is 0.50:

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} = \frac{1}{1 + 1} = \frac{1}{2}$$

when

$$\beta_0 + \beta_1 x = 0 \quad\text{or}\quad x = -\frac{\beta_0}{\beta_1}$$

[Figure: logistic curve with p = 0.50 marked at x = -β0/β1]
Interpretation of the parameter β1…
Also

$$\frac{dp}{dx} = \frac{d}{dx}\left[\frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}\right]
= \frac{e^{\beta_0 + \beta_1 x}\,\beta_1\left(1 + e^{\beta_0 + \beta_1 x}\right) - e^{\beta_0 + \beta_1 x}\,\beta_1\,e^{\beta_0 + \beta_1 x}}{\left(1 + e^{\beta_0 + \beta_1 x}\right)^2}$$

$$= \frac{\beta_1\,e^{\beta_0 + \beta_1 x}}{\left(1 + e^{\beta_0 + \beta_1 x}\right)^2} = \frac{\beta_1}{4} \quad\text{when } x = -\frac{\beta_0}{\beta_1}$$

Thus β1/4 is the rate of increase in p with respect to x when p = 0.50.
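A numeric check of this result (the coefficients are assumed for illustration): a central-difference estimate of dp/dx at x = −β0/β1 matches β1/4:

```python
import numpy as np

beta0, beta1 = -2.0, 0.8        # assumed illustrative coefficients
x_mid = -beta0 / beta1          # the x where p = 0.50

def p(x):
    return 1 / (1 + np.exp(-(beta0 + beta1 * x)))

h = 1e-6
slope = (p(x_mid + h) - p(x_mid - h)) / (2 * h)  # central-difference dp/dx
print(slope, beta1 / 4)          # both approximately 0.2
```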
Interpretation of the parameter β1
• β1 determines the slope where p is 0.50:

$$\text{slope} = \frac{\beta_1}{4}$$

[Figure: logistic curve with a tangent line of slope β1/4 drawn at p = 0.50]
Binary Classification
In logistic regression we take two steps:
The first step yields estimates of the probabilities of belonging to each class. In the binary case we get an estimate of P(Y = 1), the probability of belonging to class 1 (which also gives us the probability of belonging to class 0, namely 1 − P(Y = 1)).
In the second step we apply a cutoff value to these probabilities in order to classify each case into one of the classes.
A cutoff of 0.5 means that cases with an estimated probability of P(Y = 1) > 0.5 are classified as belonging to class 1, whereas cases with P(Y = 1) < 0.5 are classified as belonging to class 0.
The cutoff need not be set at 0.5.
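A minimal sketch of the cutoff step (the probability estimates below are illustrative):

```python
import numpy as np

probs = np.array([0.12, 0.47, 0.50, 0.63, 0.91])  # illustrative P(Y = 1) estimates
cutoff = 0.5                                      # default, but adjustable
classes = (probs > cutoff).astype(int)            # ties at the cutoff fall to class 0 here
print(classes)                                    # [0 0 0 1 1]
```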
Types of Logistic Regression
Binary Logistic Regression
The categorical response has only two possible
outcomes. Example: Spam or Not Spam
Multinomial Logistic Regression
Three or more categories without ordering.
Example: Predicting which type of food is
preferred (Veg, Non-Veg, Vegan)
Ordinal Logistic Regression
Three or more categories with ordering. Example:
Movie rating from 1 to 5
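As a sketch of the multinomial case, scikit-learn's LogisticRegression also accepts a response with three or more unordered labels; the features and labels below are random and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 3-class data (e.g., Veg / Non-Veg / Vegan coded as 0 / 1 / 2).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = rng.integers(0, 3, size=300)

clf = LogisticRegression().fit(X, y)   # fits a multiclass model when y has 3 labels
print(clf.predict_proba(X[:3]))        # one probability per class; rows sum to 1
```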
Cost Function
For logistic regression the standard cost is the cross-entropy (log loss):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y_i \log h_\theta(x_i) + (1 - y_i)\log\big(1 - h_\theta(x_i)\big)\Big]$$

where h_θ(x_i) is the predicted probability that y_i = 1 and m is the number of cases.
Gradient Descent
Now the question arises: how do we reduce the cost value? This can be done by using Gradient Descent.
The main goal of gradient descent is to minimize the cost value, i.e. min J(θ).
To minimize our cost function we need to run the gradient descent update on each parameter, i.e.

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
Gradient Descent…
Objective: to minimize the cost function, we have to run the gradient descent update on each parameter.
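A minimal self-contained sketch of gradient descent for logistic regression (the data-generating coefficients, learning rate, and iteration count are assumed for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta) for logistic regression."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Run the gradient descent update on every parameter theta_j."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / len(y)   # dJ/dtheta_j for all j at once
        theta -= lr * grad              # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta

# Hypothetical data; the leading column of ones supplies the intercept beta0.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = rng.binomial(1, sigmoid(-2.0 + 0.8 * x))

theta = gradient_descent(X, y)
print("theta:", theta, "final cost:", cost(theta, X, y))
```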