ML Logistic Regression Module3 Final

Module 2 – Supervised Learning

Logistic Regression

▪ There are many important research topics for which the dependent variable (y) is "limited" or "categorical".
▪ For example: voting, morbidity or mortality, and participation data are not continuous or normally distributed.
▪ Binary logistic regression is a type of regression analysis where the dependent variable is a binary/dummy variable: coded 0 (failure) or 1 (success).
▪ The independent variable(s) may be of any kind.
Logistic Regression

Like multiple regression, logistic regression is a statistical analysis used to examine relationships between independent variables (predictors) and a dependent variable (criterion).

The main difference is that in logistic regression the criterion is nominal (predicting group membership). For example, do age and gender predict whether one signs up for swimming lessons (yes/no)? A minimal sketch of this setup appears below.
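As a concrete illustration, here is a minimal sketch of the swimming-lessons example using scikit-learn; the data, coefficients, and signup rule are invented purely for illustration.

```python
# Minimal sketch of the swimming-lessons example (synthetic, illustrative data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(5, 60, n)              # predictor 1: age in years
gender = rng.integers(0, 2, n)           # predictor 2: 0/1 dummy variable
# Invented rule: younger people are somewhat more likely to sign up.
p = 1 / (1 + np.exp(-(2.0 - 0.05 * age + 0.3 * gender)))
signed_up = rng.binomial(1, p)           # binary criterion: 1 = yes, 0 = no

X = np.column_stack([age, gender])
model = LogisticRegression().fit(X, signed_up)
print(model.intercept_, model.coef_)     # estimated beta_0 and beta_1, beta_2
```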


Logistic Regression
▪ Logistic regression estimates the probability of an event occurring, based on a given dataset of independent variables.
▪ It helps us understand the relationship between one or more independent variables and a target variable.
▪ The predicted value of the target/dependent variable is a probability, bounded between 0 and 1.
Logistic Regression
▪ We have a binary (dichotomous) response variable Y, defined as

Y = 1 if "success" ("yes")
Y = 0 if "failure" ("no")

▪ π = proportion of "success". We want to model the probability π that Y = 1.
▪ In ordinary regression the model predicts the mean Y for any combination of predictors. In logistic regression the model predicts the true proportion of success, π, at any predictor value.

π = (# of 1's) / (# of trials) = proportion of "success"
Maths behind Logistic Regression

Odds

π / (1 − π) = P(Yes) / P(No) is defined as the odds of "Yes":

odds = π / (1 − π)        π = odds / (1 + odds)
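A tiny sketch of these two identities in Python (the function names are ours):

```python
# Sketch of the odds <-> probability identities above.
def odds(p):
    """Odds of success: p / (1 - p)."""
    return p / (1 - p)

def prob(o):
    """Probability recovered from odds: o / (1 + o)."""
    return o / (1 + o)

p = 0.75
print(odds(p))         # 3.0 -> "3 to 1" odds
print(prob(odds(p)))   # 0.75 -> the round trip recovers p
```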
Die Rolling

Event      Prob   Odds
even #     1/2    1    [or 1:1]
X > 2      2/3    2    [or 2:1]
roll a 2   1/6    1/5  [or 1/5:1 or 1:5]
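The table values can be checked directly; a quick sketch using exact fractions:

```python
# Verify the die-rolling odds in the table with exact arithmetic.
from fractions import Fraction

events = [("even #", Fraction(1, 2)),
          ("X > 2", Fraction(2, 3)),
          ("roll a 2", Fraction(1, 6))]
for label, p in events:
    print(label, p / (1 - p))   # prints 1, 2, and 1/5
```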


Maths behind Logistic Regression
➢ The 'Odds' value is restricted to the range (0, +∞), but the linear predictor β₀ + β₁X can take values over the entire range (−∞, +∞).
➢ To incorporate this, we take the 'log of odds', which has a range of (−∞, +∞):

log(P / (1 − P)) = β₀ + β₁X

➢ Taking the exponential on both sides, we get

exp(log(P / (1 − P))) = exp(β₀ + β₁X)

P / (1 − P) = e^(β₀ + β₁X)

P = e^(β₀ + β₁X) − P·e^(β₀ + β₁X)
Maths behind Logistic Regression
➢ Dividing both sides by P, we get

1 = e^(β₀ + β₁X) / P − e^(β₀ + β₁X), i.e., 1 + e^(β₀ + β₁X) = e^(β₀ + β₁X) / P

P·[1 + e^(β₀ + β₁X)] = e^(β₀ + β₁X)

P = e^(β₀ + β₁X) / (1 + e^(β₀ + β₁X))

➢ Dividing the numerator and denominator by e^(β₀ + β₁X), we get

P = 1 / (1 + e^(−(β₀ + β₁X)))

➢ This is the SIGMOID function, which has an S-shaped curve ranging between 0 and 1 for different values of (β₀ + β₁X), as explained in the next slides.
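A quick numerical check that the two equivalent forms derived above agree (a sketch; z stands in for β₀ + β₁X):

```python
# Check that P = e^z / (1 + e^z) and P = 1 / (1 + e^-z) agree.
import numpy as np

z = np.linspace(-6, 6, 13)                    # stand-in for beta0 + beta1*X
ratio_form = np.exp(z) / (1 + np.exp(z))      # P = e^z / (1 + e^z)
sigmoid_form = 1 / (1 + np.exp(-z))           # P = 1 / (1 + e^-z)
print(np.allclose(ratio_form, sigmoid_form))  # True
print(sigmoid_form.round(3))                  # S-shaped values rising from ~0 to ~1
```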
Two forms of Binary Logistic Regression

▪ X = quantitative predictor, Y = binary response
π = proportion of success (Y = 1) at any X

▪ Equivalent forms of the logistic regression model:

Logit form:          ln(π / (1 − π)) = β₀ + β₁X
Probability form:    π = e^(β₀ + β₁X) / (1 + e^(β₀ + β₁X))
Binary Logistic Regression Model – Contd…

▪ The logistic distribution constrains the estimated probabilities to lie between 0 and 1.
▪ The estimated probability (π / P) is:

π = e^(β₀ + β₁X) / (1 + e^(β₀ + β₁X)) = 1 / (1 + e^(−(β₀ + β₁X)))

▪ If β₀ + β₁X = 0, then P = 0.50
▪ As β₀ + β₁X gets really big (→ +∞), P approaches 1
▪ As β₀ + β₁X gets really small (→ −∞), P approaches 0
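These limiting behaviours are easy to confirm numerically; a small sketch (z stands in for β₀ + β₁X):

```python
# Sigmoid behaviour at z = 0 and at the extremes.
import math

s = lambda z: 1 / (1 + math.exp(-z))
print(s(0))     # 0.5
print(s(10))    # ~0.99995 -> approaches 1 as z -> +inf
print(s(-10))   # ~0.00005 -> approaches 0 as z -> -inf
```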
Probability function
Odds Ratio

A common way to compare two groups is to look at the ratio of their odds:

Odds Ratio = OR = Odds₁ / Odds₂

Note: the odds ratio (OR) is similar to the relative risk (RR):

RR = p₁ / p₂        OR = RR × (1 − p₂) / (1 − p₁)

So when p is small, OR ≈ RR.
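A short sketch comparing OR and RR for invented probabilities, showing they nearly coincide when p is small:

```python
# Odds ratio vs relative risk for two groups (probabilities are made up).
def odds_ratio(p1, p2):
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

def relative_risk(p1, p2):
    return p1 / p2

print(relative_risk(0.02, 0.01), odds_ratio(0.02, 0.01))  # 2.0 vs ~2.02: close
print(relative_risk(0.6, 0.3), odds_ratio(0.6, 0.3))      # 2.0 vs 3.5: far apart
```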
Interpreting “Slope” using Odds

When X is replaced by X + 1:

odds = e^(β₀ + β₁X)

is replaced by

odds = e^(β₀ + β₁(X + 1))

So the ratio is

e^(β₀ + β₁(X + 1)) / e^(β₀ + β₁X) = e^((β₀ + β₁(X + 1)) − (β₀ + β₁X)) = e^(β₁)

When we increase X by 1, the ratio of the new odds to the old odds is e^(β₁), i.e., the odds are multiplied by e^(β₁).
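A one-line check of this slope interpretation (the β values below are invented):

```python
# Increasing X by 1 multiplies the odds by e^beta1 (beta values are invented).
import math

b0, b1 = -2.0, 0.7
odds_at = lambda x: math.exp(b0 + b1 * x)
x = 3.0
print(odds_at(x + 1) / odds_at(x))   # ratio of new odds to old odds
print(math.exp(b1))                  # e^beta1 -- the same number
```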
Maximum Likelihood Estimation (MLE)
➢ The beta parameters (coefficients) of logistic regression are estimated using Maximum Likelihood Estimation (MLE).
➢ This method tests different values of beta through multiple iterations to optimise for the best fit of the 'log odds'.
➢ Cost function in logistic regression: the prediction ŷᵢ is a nonlinear function of the parameters, given as

ŷᵢ = σ(βᵀxᵢ)

➢ As yᵢ is binary, each label can be interpreted as a Bernoulli random variable, which follows the Bernoulli distribution given as

P(yᵢ) = p^(yᵢ) · (1 − p)^(1 − yᵢ)
Maximum Likelihood Estimation (MLE)
➢ Substituting the sigmoid function for p, this can be given as

P(yᵢ) = σ(βᵀxᵢ)^(yᵢ) · (1 − σ(βᵀxᵢ))^(1 − yᵢ)

➢ The likelihood function L(β) is given as

L(β) = ∏ᵢ₌₁ⁿ σ(βᵀxᵢ)^(yᵢ) · (1 − σ(βᵀxᵢ))^(1 − yᵢ)

➢ We need to find the value of β which maximises this likelihood function.
➢ For easier calculation, take the log of both sides to get the log-likelihood function LL(β), given as

log(L(β)) = ∑ᵢ₌₁ⁿ yᵢ · log[σ(βᵀxᵢ)] + (1 − yᵢ) · log[1 − σ(βᵀxᵢ)]

➢ In order to maximise LL(β), we can minimise −LL(β), i.e.,

max[log x] = min[−log x]
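Putting these formulas together, a minimal sketch of the negative log-likelihood in Python (the function names and toy data are ours):

```python
# Negative log-likelihood -LL(beta) for logistic regression.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neg_log_likelihood(beta, X, y):
    """Minimising -LL(beta) is equivalent to maximising LL(beta)."""
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data: first column of X is the intercept term.
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
y = np.array([1, 0, 1])
print(neg_log_likelihood(np.zeros(2), X, y))  # 3*log(2) ~= 2.079 at beta = 0
```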
Cost function
🠶 Derivative of the sigmoid function: σ′(z) = σ(z) · (1 − σ(z))

Derivation of Cost function

Derivative of the cost function and the resulting gradient-descent update rule:

θ_new = θ_old − α · [σ(θᵀx) − y] · xⱼ
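A minimal gradient-descent sketch of this update rule, vectorised over a toy dataset (the learning rate, iteration count, and data are invented):

```python
# Gradient descent for logistic regression: theta <- theta - alpha * grad(-LL).
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, alpha=0.5, n_iters=2000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Mean of (sigma(theta^T x) - y) * x over all samples.
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= alpha * grad
    return theta

# Toy usage: first column is the intercept, labels follow the sign of x.
X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(fit_logistic(X, y))   # theta roughly [0, positive slope]
```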
