
Machine Learning to build Intelligent Systems

Manas Dasgupta
Understanding Logistic Regression
Structure of this Module

TOPICS
Understanding Logistic Regression through a Classification Problem (Project)
Introduction to Logistic Regression
Estimating Probabilities
Logistic Regression Cost Functions
Softmax Regression
Performance Metrics
ROC Curve and AUC
Optimising Logistic Regression Model
Understanding the Logit Model
• In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event existing such as
pass/fail, win/lose, alive/dead or healthy/sick.

• Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable. In
regression analysis, logistic regression (or logit regression) is estimating the parameters of a logistic model (a form of binary
regression).

• Mathematically, a binary logistic model has a dependent variable with two possible values, such as pass/fail which is represented
by an indicator variable, where the two values are labelled "0" and "1". In the logistic model, the log-odds (the logarithm of the
odds) for the value labelled "1" is a linear combination of one or more independent variables ("predictors"); the independent
variables can each be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value).

• The corresponding probability of the value labelled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"),
hence the labelling; the function that converts log-odds to probability is the logistic function.

• The defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds
of the given outcome at a constant rate.

• Outputs with more than two values are modelled by multinomial logistic regression and, if the multiple categories are ordered,
by ordinal logistic regression (for example the proportional odds ordinal logistic model).
Understanding the Logit Model
Let us try to understand logistic regression by considering a logistic model with given parameters, then seeing how the
coefficients can be estimated from data. Consider a model with two predictors, x1 and x2, and one binary (Bernoulli) response
variable Y, which we denote p = P(Y = 1).

We assume a linear relationship between the predictor variables and the log-odds (also called logit) of the event that Y = 1.
This linear relationship can be written in the following mathematical form (where l is the log-odds, b is the base of the logarithm, and the 𝛽i are the parameters of the model):

l = logb( p / (1 − p) ) = 𝛽0 + 𝛽1x1 + 𝛽2x2

Exponentiating both sides recovers the odds, and solving for p gives the probability:

Odds = p / (1 − p) = b^(𝛽0 + 𝛽1x1 + 𝛽2x2)

p = Sb(𝛽0 + 𝛽1x1 + 𝛽2x2) = 1 / (1 + b^−(𝛽0 + 𝛽1x1 + 𝛽2x2))

Here, Sb is the Sigmoid Function with base b. The formula shows that once the 𝛽i are fixed, we can easily compute either the log-odds or the probability that Y = 1 for a given observation.

The main use-case of a logistic model is to be given an observation (x1, x2) and estimate the probability p that Y = 1. In most applications, the base b of the logarithm is taken to be e (Euler's number, approximately 2.71828).
Understanding the Logit Model
The logistic function is a sigmoid function, which takes any real input t, and outputs a value between zero and one. For the
logit, this is interpreted as taking input log-odds and having output probability.

Here, t is a linear function of a single explanatory variable x, i.e., t = 𝛽0 + 𝛽1x (the case where t is a linear combination of multiple explanatory variables is treated similarly).

The general logistic function can be written as:

σ(t) = 1 / (1 + e^−t)

Its input t is interpreted as log-odds and its output σ(t) as a probability.
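As a quick illustration, here is a minimal NumPy sketch (the helper name sigmoid and the sample inputs are ours, not from the slides) showing how the logistic function squashes any real-valued log-odds into a probability between 0 and 1:

```python
import numpy as np

def sigmoid(t):
    """Logistic (sigmoid) function: maps any real log-odds t to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Evaluate at a few log-odds values
for t in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"t = {t:+.1f}  ->  p = {sigmoid(t):.4f}")
```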


Understanding the Logit Model
We consider an example with b (log-base) = 10, and coefficients 𝛽0 = -3, 𝛽1 = 1, 𝛽2 = 2.

The model will read as:

log10( p / (1 − p) ) = −3 + x1 + 2x2

[ Where p is the probability of the event that Y = 1 ]

This can be interpreted as follows:

• 𝛽0 is the y-intercept. It is the log-odds of the event that Y = 1 when the predictors x1 = x2 = 0.
• When 𝛽1 = 1, increasing x1 by 1 increases the log-odds for Y = 1 by 1, i.e., the odds increase by a factor of 10^1. Note that the probability of Y = 1 has also increased, but it has not increased by as much as the odds have increased.
• When 𝛽2 = 2, increasing x2 by 1 increases the log-odds for Y = 1 by 2, i.e., the odds increase by a factor of 10^2. Note how the effect of x2 on the log-odds is twice as great as the effect of x1, but the effect on the odds is 10 times greater. The effect on the probability of Y = 1, however, is not 10 times greater; only the effect on the odds is.
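To make this concrete, here is a minimal sketch (the helper names log_odds and probability are ours) that evaluates the example model at a few values of x1 with x2 held at 0; each unit increase in x1 multiplies the odds by 10, while the probability changes by much less:

```python
# Coefficients from the example above, with log base b = 10
b, beta0, beta1, beta2 = 10.0, -3.0, 1.0, 2.0

def log_odds(x1, x2):
    return beta0 + beta1 * x1 + beta2 * x2

def probability(x1, x2):
    # Invert the log-odds: p = 1 / (1 + b^(-log-odds))
    return 1.0 / (1.0 + b ** (-log_odds(x1, x2)))

# Each unit increase in x1 adds 1 to the log-odds, i.e. multiplies the odds by 10^1
for x1 in [0, 1, 2]:
    l = log_odds(x1, 0)
    print(f"x1={x1}: log-odds={l:+.1f}, odds={b**l:.3f}, p={probability(x1, 0):.4f}")
```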
Introduction to Logistic Regression
Let us look at an example of determining whether or not a person has diabetes based on Blood Sugar level reading.

[Figure: patients plotted by Blood Sugar level, labelled Diabetic (1) and Non-Diabetic (0)]

Problem statement:
Given a Blood Sugar value, say 210, what is the probability of Diabetes being 1?
Introduction to Logistic Regression

[Figure: "Plotting the Probabilities" – the Diabetic (1) and Non-Diabetic (0) points with a fitted Sigmoid Curve (Sigmoid Function)]

Challenge:

How can you find the best-fit sigmoid curve?

How do you find the combination of β0 and β1 which fits the data best?
Introduction to Logistic Regression

Cost Function?

The best fitting combination of β0 and β1 will be the one which maximises the product:

(1−P1)(1−P2)(1−P3)(1−P4)(1−P6) · (P5)(P7)(P8)(P9)(P10)

This is the Maximum Likelihood function:

[ product of (1−Pi) over all non-diabetics ] × [ product of (Pi) over all diabetics ]
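The same idea in a minimal sketch (the blood sugar readings and labels below are hypothetical, chosen only to mirror the example): the likelihood is the product of Pi over the diabetics and (1 − Pi) over the non-diabetics, and in practice we minimise the equivalent negative log-likelihood (log loss), which is numerically more stable:

```python
import numpy as np

# Hypothetical Blood Sugar readings and labels (1 = diabetic, 0 = non-diabetic)
x = np.array([100., 120., 140., 150., 170., 180., 200., 210., 230., 250.])
y = np.array([0,    0,    0,    0,    1,    0,    1,    1,    1,    1])

def probs(beta0, beta1):
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

def likelihood(beta0, beta1):
    p = probs(beta0, beta1)
    # Product of P_i for diabetics and (1 - P_i) for non-diabetics
    return np.prod(np.where(y == 1, p, 1.0 - p))

def log_loss(beta0, beta1):
    # Negative log-likelihood; maximising the likelihood == minimising this cost
    p = probs(beta0, beta1)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(likelihood(-10.0, 0.06), log_loss(-10.0, 0.06))
```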
Introduction to Logistic Regression
Minimizing the Cost with Gradient Descent

Gradient descent is an iterative optimization algorithm which finds the minimum of a cost function.

In this process, it tries different parameter values, starting from a random combination, and updates them to reach the optimal ones that minimise the output.

The update rule has the same form as the one derived from the sum of squared errors (MSE) cost in linear regression, with the sigmoid output taking the place of the linear prediction. As a result, the same gradient descent formula is used for logistic regression as well.

By iterating over the training samples until convergence, it reaches the optimal parameters leading to minimum cost.
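A minimal sketch of this procedure (illustrative only; the learning rate, iteration count and the tiny dataset are arbitrary choices of ours, not the course's):

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent for logistic regression."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend a column of 1s for the intercept
    beta = np.zeros(X.shape[1])                 # start from an arbitrary point
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # current predicted probabilities
        gradient = X.T @ (p - y) / len(y)       # same form as the linear-regression update
        beta -= lr * gradient
    return beta

# Tiny illustrative dataset: one standardised feature, hypothetical labels
X = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 0, 1, 1])
print(fit_logistic_gd(X, y))                    # [intercept, slope]
```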
Odds and Log-Odds
Equation for logistic regression:

P = 1 / (1 + e^−(β0 + β1x))

Linearising the Sigmoid Equation (taking the log of the odds):

ln( P / (1 − P) ) = β0 + β1x

Note: In the sigmoid form, the relationship between P and x is so complex that it is difficult to understand what kind of trend exists between the two. If you increase x by regular intervals of, say, 10, how will that affect the probability? Will it also increase by some regular interval? If not, what will happen? The log-odds form, being linear in x, makes this trend easy to read off.
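To answer the question in the note, here is a tiny sketch (the coefficients β0 = −10 and β1 = 0.06 are hypothetical): each regular step of 10 in x multiplies the odds by the same constant factor e^0.6 ≈ 1.82, but the change in probability is not a regular interval; it is largest around P = 0.5 and flattens out near 0 and 1.

```python
import numpy as np

beta0, beta1 = -10.0, 0.06           # hypothetical coefficients for a single predictor x
for x in range(100, 260, 10):        # increase x in regular steps of 10
    odds = np.exp(beta0 + beta1 * x) # each step multiplies the odds by exp(0.6)
    p = odds / (1.0 + odds)
    print(f"x={x}: odds={odds:9.3f}, p={p:.3f}")
```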
Logistic Regression using Python

In Python, logistic regression can be implemented using libraries such as SKLearn and statsmodels, though looking at the coefficients and the model summary is easier using statsmodels.

Python Demo
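As a hedged sketch of what the demo covers (the data below is synthetic, generated only to stand in for the diabetes example; it is not the course dataset):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the diabetes example: one Blood Sugar predictor, binary label
rng = np.random.default_rng(0)
blood_sugar = rng.uniform(80, 300, size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-(blood_sugar - 180) / 20))).astype(int)
X = pd.DataFrame({"blood_sugar": blood_sugar})

# scikit-learn: quick fit and probability predictions
sk_model = LogisticRegression().fit(X, y)
print(sk_model.intercept_, sk_model.coef_)
print(sk_model.predict_proba(X.iloc[:5])[:, 1])   # P(Diabetes = 1) for the first 5 rows

# statsmodels: detailed coefficient table and model summary
sm_model = sm.Logit(y, sm.add_constant(X)).fit()
print(sm_model.summary())
```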
Logistic Regression using Python

ROC Curve
AUC
Confusion Matrix
Accuracy
Precision
Recall
Sensitivity
Specificity
Finding Optimum Probability
Classification Errors
When making a prediction for a binary or two-class classification problem, there are
two types of errors that we could make.

• False Positive. Predict an event when there was no event.


• False Negative. Predict no event when in fact there was an event.

By predicting probabilities and calibrating a threshold, a balance of these two concerns can be chosen by the operator of the model.

For example, in a smog prediction system, we may be far more concerned with having
low false negatives than low false positives. A false negative would mean not warning
about a smog day when in fact it is a high smog day, leading to health issues in the
public that are unable to take precautions. A false positive means the public would
take precautionary measures when they didn’t need to.
Some Metrics
True Positive Rate = True Positives / (True Positives + False Negatives)

Sensitivity = True Positives / (True Positives + False Negatives)

False Positive Rate = False Positives / (False Positives + True Negatives)

Specificity = True Negatives / (True Negatives + False Positives)

False Positive Rate = 1 − Specificity

Positive Predictive Power = True Positives / (True Positives + False Positives)

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

Recall == Sensitivity

F1 Score: the harmonic mean of the precision and recall (harmonic mean because the precision and recall are rates).

[Figure: Confusion Matrix]
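These formulas are easy to verify on a small confusion matrix; a minimal sketch (the labels below are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted labels at some probability threshold
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = recall = tp / (tp + fn)     # True Positive Rate
specificity = tn / (tn + fp)              # True Negative Rate
fpr = fp / (fp + tn)                      # equals 1 - specificity
precision = tp / (tp + fp)                # Positive Predictive Power
f1 = 2 * precision * recall / (precision + recall)

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
      f"fpr={fpr:.2f}, precision={precision:.2f}, f1={f1:.2f}")
```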
Some Metrics
The True Positive Rate (Sensitivity) is calculated as the number of true positives divided by the sum of the number of true
positives and the number of false negatives. It describes how good the model is at predicting the positive class when the
actual outcome is positive.

The False Positive Rate (1 − Specificity) is calculated as the number of false positives divided by the sum of the number of
false positives and the number of true negatives.

The False Positive Rate is also referred to as the Inverted Specificity where specificity is the total number of true negatives
divided by the sum of the number of true negatives and false positives.

Precision is a ratio of the number of true positives divided by the sum of the true positives and false positives. It describes
how good a model is at predicting the positive class. Precision is referred to as the positive predictive value.

Recall is calculated as the ratio of the number of true positives divided by the sum of the true positives and the false
negatives. Recall is the same as sensitivity.
AUC-ROC Curve
The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability: it tells how much the model is capable of distinguishing between classes. The higher the AUC, the better the model is at predicting 0 classes as 0 and 1 classes as 1. By analogy, the higher the AUC, the better the model is at distinguishing between patients with the disease and patients without the disease.

ROC curves summarize the trade-off between the true positive rate and the false positive rate for a predictive model using different probability thresholds.

The ROC curve is plotted with TPR against the FPR where TPR is on the
y-axis and FPR is on the x-axis.

• A great model has an AUC close to 1

• A model with no discriminating power has an AUC close to 0.5 (no better than random guessing); an AUC close to 0 means the model is predicting the classes in reverse

A sketch of computing and plotting the ROC curve follows below.
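A minimal sketch of computing and plotting an ROC curve with scikit-learn (the data is synthetic, generated only for illustration):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, purely for illustration
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

y_prob = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_prob)
print("AUC:", roc_auc_score(y_test, y_prob))

plt.plot(fpr, tpr, label="Logistic regression")
plt.plot([0, 1], [0, 1], linestyle="--", label="No skill")
plt.xlabel("False Positive Rate (1 - Specificity)")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.legend()
plt.show()
```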
Plot Accuracy-Sensitivity-Specificity for various Probabilities
One of the Optimization challenges of a Classification problem is to determine the right Probability Threshold. This is done visually by plotting the three key Metrics – Accuracy, Sensitivity and Specificity – against Probability Thresholds.

When the probability thresholds are very low, the sensitivity is very high and specificity
is very low. Similarly, for larger probability thresholds, the sensitivity values are very low
but the specificity values are very high. And at about 0.3, the three metrics seem to be
almost equal with decent values and hence, we choose 0.3 as the optimal cut-off point.
The following graph also showcases that at about 0.3, the three metrics intersect.

We could have chosen any other cut-off point as well, based on which of these metrics we want to be high. If we want to capture the 'Positives' better, we could let go of a little accuracy and choose an even lower cut-off, and vice-versa. It is completely dependent on the situation we are in. In this case, we just chose the 'Optimal' cut-off point to give you a fair idea of how the thresholds should be chosen.
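A minimal helper for producing such a plot (the function name threshold_metrics is ours; pass in true labels and predicted probabilities, for example the y_test and y_prob arrays from the ROC sketch above):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

def threshold_metrics(y_true, y_prob, cutoffs=np.arange(0.05, 1.0, 0.05)):
    """Accuracy, sensitivity and specificity at each probability cut-off."""
    rows = []
    for c in cutoffs:
        pred = (np.asarray(y_prob) >= c).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, pred).ravel()
        rows.append({"cutoff": round(float(c), 2),
                     "accuracy": (tp + tn) / (tp + tn + fp + fn),
                     "sensitivity": tp / (tp + fn),
                     "specificity": tn / (tn + fp)})
    return pd.DataFrame(rows)

# e.g. threshold_metrics(y_test, y_prob).plot(x="cutoff")
# The cut-off where the three curves meet (around 0.3 in the example above) is a balanced choice.
```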
Precision-Recall Curve
Precision-Recall is a useful measure of success of prediction when the classes are imbalanced. The precision-recall curve shows the trade-off between precision and recall for different thresholds.

Precision-Recall curves summarize the trade-off between the true positive rate (recall) and the positive predictive value (precision) for a predictive model using different probability thresholds.

• A high area under the curve represents both high recall and high precision
• High precision relates to a low false positive rate
• High recall relates to a low false negative rate
• High scores for both show that the classifier is returning accurate results (high
precision), as well as returning a majority of all positive results (high recall)

A system with high recall but low precision returns many positive predictions, but most of its predicted labels are incorrect. A system with high precision but low recall is just the opposite: it returns few positive predictions, but most of its predicted labels are correct.

An ideal system with high precision and high recall will return many results, with
all results labelled correctly.

ROC curves are appropriate when the observations are balanced between each
class, whereas precision-recall curves are appropriate for imbalanced datasets.
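A minimal sketch of a precision-recall curve on an imbalanced synthetic dataset (roughly 10% positives; the data is generated only for illustration):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 90% negatives, 10% positives
X, y = make_classification(n_samples=2000, n_features=5, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

y_prob = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, y_prob)
print("Average precision (area under the PR curve):", average_precision_score(y_test, y_prob))

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()
```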
Precision-Recall vs Thresholds
Similar to the sensitivity-specificity trade-off, we see that there is a trade-off
between precision and recall against thresholds.

As you can see, the curve is similar to what we got for sensitivity and specificity, except that the curve for precision is quite jumpy towards the end. This is because the denominator of precision, i.e. (TP + FP), is not constant: it is the number of observations predicted as 1, which changes with the threshold. Because that count can swing wildly, you get a very jumpy curve.

NOTE: This curve is useful when you would want to determine the
Threshold for class prediction based on Precision and Recall values.
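Continuing the precision-recall sketch above, the same arrays can be plotted against the threshold to see the jumpiness of the precision curve (note that scikit-learn returns one more precision/recall value than thresholds, so the last value is dropped):

```python
# Reuses precision, recall, thresholds and plt from the precision-recall sketch above
plt.plot(thresholds, precision[:-1], label="Precision")
plt.plot(thresholds, recall[:-1], label="Recall")
plt.xlabel("Probability threshold")
plt.legend()
plt.show()
```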
Logistic Regression Steps
To summarise, the steps that you performed throughout model building and model evaluation were as follows (a minimal end-to-end sketch in Python follows the list):

• Data cleaning and preparation
  • Combining three DataFrames
  • Handling categorical variables
  • Mapping categorical variables to integers
  • Dummy variable creation
  • Handling missing values
• Test-train split and scaling
• Model Building
  • Feature elimination based on correlations
  • Feature selection using RFE (Coarse Tuning)
  • Manual feature elimination (using p-values and VIFs)
• Model Evaluation
  • Accuracy
  • Sensitivity and Specificity
  • Optimal cut-off using ROC curve
  • Precision and Recall
• Predictions on the test set
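Putting the steps together, here is a hedged end-to-end sketch (the column handling, the RFE feature count and the 0.3 cut-off are placeholders of ours, not the course's actual dataset or choices):

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.outliers_influence import variance_inflation_factor

def build_logistic_model(df, target):
    """End-to-end sketch of the steps listed above for a DataFrame with a binary 0/1 target."""
    # Data cleaning and preparation: dummies for categoricals, simple missing-value handling
    df = pd.get_dummies(df, drop_first=True)
    df = df.fillna(df.median(numeric_only=True))
    X, y = df.drop(columns=[target]), df[target]

    # Test-train split and scaling
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
    scaler = StandardScaler().fit(X_train)
    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns)

    # Coarse feature selection with RFE, then manual checks using p-values and VIFs
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=min(10, X.shape[1]))
    cols = X_train.columns[rfe.fit(X_train, y_train).support_]

    model = sm.Logit(y_train.reset_index(drop=True), sm.add_constant(X_train[cols])).fit()
    print(model.summary())                               # inspect p-values here
    vifs = [variance_inflation_factor(X_train[cols].values, i) for i in range(len(cols))]
    print(pd.Series(vifs, index=cols, name="VIF"))

    # Evaluation on the test set at a chosen probability cut-off (0.3 here, as in the slides)
    y_prob = model.predict(sm.add_constant(X_test[cols]))
    y_pred = (y_prob >= 0.3).astype(int)
    print("AUC:", roc_auc_score(y_test, y_prob),
          "Precision:", precision_score(y_test, y_pred),
          "Recall:", recall_score(y_test, y_pred))
    return model, cols
```

The printed summary and VIF table are where the manual elimination step would loop: drop a feature with a high p-value or VIF, refit, and repeat until the remaining features are all significant and stable.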
Hope you have liked this Video.
Please help us by providing your Ratings and Comments for this
Course!

Thank You!!
Manas Dasgupta

Happy Learning!!
