03 Logistic Regression
Objectives
Define Logistic Regression
Differentiate Logistic Regression from Linear Regression
Differentiate the different types of Logistic Regression
Differentiate between Maximum Likelihood Estimation (MLE) and
Ordinary Least Squares (OLS)
Build a Logistic Regression model
Perform prediction using the model
Evaluate the performance of the model
Introduction
Logistic regression is a statistical method for predicting binary
classes.
The outcome or target variable is dichotomous in nature.
It predicts the probability of occurrence of a binary event
utilizing a logistic/sigmoid function.
For example, it can be used to classify whether a tumor is
benign or malignant, whether an email is spam or not spam, or
whether a student will pass or fail a course.
The Sigmoid Function
$$p = \frac{1}{1 + e^{-y}}$$
$$y = w_0 + w_1 X_1 + w_2 X_2 + \dots + w_n X_n$$
$$p = \frac{1}{1 + e^{-(w_0 + w_1 X_1 + w_2 X_2 + \dots + w_n X_n)}}$$
The sigmoid function, also called the logistic function, gives an
'S'-shaped curve that takes any real-valued number and maps it to a
value between 0 and 1.
As the input y goes to positive infinity, the predicted probability p
approaches 1, and as y goes to negative infinity, p approaches 0.
If the output of the sigmoid function is more than 0.5, we can classify
the outcome as 1 or YES, and if it is less than 0.5, we can classify it as 0
or NO.
For example, if the output is 0.75, we can say in terms of
probability: there is a 75 percent chance that the patient has cancer.
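The mapping from the linear combination y to a probability and then to a class label can be sketched in a few lines of Python. This is only an illustration: the helper names `sigmoid`, `predict_proba`, and `classify` are hypothetical, and treating an output of exactly 0.5 as class 1 is an assumption, since the rule above does not cover that case.

```python
import math


def sigmoid(y: float) -> float:
    """Map any real number to a value between 0 and 1."""
    # Split on the sign of y so math.exp never receives a large
    # positive argument (which would overflow).
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def predict_proba(weights, x):
    """weights = [w0, w1, ..., wn]; x = [X1, ..., Xn]."""
    y = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(y)


def classify(p: float, threshold: float = 0.5) -> int:
    """Return 1 (YES) or 0 (NO). Assumption: p == threshold counts as 1."""
    return 1 if p >= threshold else 0


# Illustrative weights and input, not taken from any real model.
p = predict_proba([-1.0, 0.8, 0.5], [2.0, 1.0])
print(round(p, 3), classify(p))
```

With these made-up weights, y = -1.0 + 0.8 * 2.0 + 0.5 * 1.0 = 1.1, so the probability is above 0.5 and the example is assigned to class 1.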
Properties
The dependent variable in logistic regression follows Bernoulli
Distribution.
Estimation is done through maximum likelihood.
The ordinary R-squared of linear regression does not apply; model
fit is assessed with measures such as concordance and the
Kolmogorov-Smirnov (KS) statistic.
Logistic Regression VS Linear Regression
Linear regression gives you a continuous output, but logistic
regression provides a discrete output (a class label derived from a
predicted probability).
Linear regression is estimated using Ordinary Least Squares
(OLS) while logistic regression is estimated using Maximum
Likelihood Estimation (MLE) approach.
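To illustrate the difference in outputs, the sketch below fits a one-feature linear regression by OLS to a binary target. The data (hours studied versus pass/fail) are made up for illustration, and `fit_ols` is a hypothetical helper name. The fitted line returns continuous values that can fall below 0 or above 1, which is one reason logistic regression passes the linear combination through a sigmoid instead.

```python
def fit_ols(xs, ts):
    """Closed-form OLS for a single feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    # Slope = covariance(x, t) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_t - slope * mean_x
    return intercept, slope


# Illustrative data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_ols(hours, passed)
for h in (0.0, 3.0, 6.0):
    # The linear prediction is a real number, not a probability.
    print(h, round(b0 + b1 * h, 3))
```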
MLE VS OLS
Maximum Likelihood Estimation (MLE) is a likelihood-maximization
method, while Ordinary Least Squares (OLS) is a
distance-minimizing approximation method.
Maximizing the likelihood function determines the parameters that
are most likely to produce the observed data.
From a statistical point of view, MLE treats quantities such as the
mean and variance as the parameters to be estimated for a given
model. For a normal distribution, for example, the estimated mean
and variance can then be used to make predictions about the data.
Ordinary Least Squares estimates are computed by fitting the
regression line that minimizes the sum of the squared deviations
from the given data points (least square error).
Both can be used to estimate the parameters of a linear regression
model.
MLE assumes a joint probability mass (or density) function for the
data, while OLS doesn't require any stochastic assumptions for
minimizing distance.
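As a rough illustration of MLE for logistic regression, the sketch below maximizes the Bernoulli log-likelihood of a one-feature model with plain gradient ascent. This is a minimal sketch under stated assumptions, not how any particular library fits the model: the data, the learning rate, the iteration count, and the helper names `log_likelihood` and `fit_mle` are all chosen for illustration.

```python
import math


def sigmoid(y):
    # Sign-split form avoids overflow in math.exp.
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def log_likelihood(w0, w1, xs, ts):
    """Bernoulli log-likelihood of the labels ts under weights (w0, w1)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, t in zip(xs, ts):
        p = sigmoid(w0 + w1 * x)
        total += t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total


def fit_mle(xs, ts, lr=0.1, n_iter=5000):
    """Gradient ascent on the mean log-likelihood (lr and n_iter are assumptions)."""
    w0 = w1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Gradient of the log-likelihood: sum of (label - predicted probability),
        # weighted by the feature value for w1.
        errors = [t - sigmoid(w0 + w1 * x) for x, t in zip(xs, ts)]
        g0 = sum(errors)
        g1 = sum(e * x for e, x in zip(errors, xs))
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return w0, w1


# Illustrative data: hours studied and pass (1) / fail (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

w0, w1 = fit_mle(hours, passed)
print("log-likelihood at zero weights:", round(log_likelihood(0.0, 0.0, hours, passed), 3))
print("log-likelihood after fitting:  ", round(log_likelihood(w0, w1, hours, passed), 3))
```

The fitted weights give the observed labels a higher likelihood than the starting weights, which is the sense in which MLE picks the parameters most likely to have produced the data.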
Types of Logistic Regression
Binary Logistic Regression: The target variable has only two
possible outcomes such as Spam or Not Spam, Cancer or No
Cancer.
Multinomial Logistic Regression: The target variable has three or
more nominal categories, such as the type of wine.
Ordinal Logistic Regression: The target variable has three or more
ordinal categories, such as a restaurant or product rating from 1 to 5.
Advantages
doesn't require high computation power
easy to implement
easily interpretable
used widely by data analysts and data scientists
it doesn't strictly require scaling of features (although scaling can
help regularized models and gradient-based solvers converge)
Disadvantages
not able to handle a large number of categorical
features/variables.
vulnerable to over-fitting
can't solve non-linear problems directly, because its decision
boundary is linear; non-linear features must be transformed first
will not perform well when independent variables are uncorrelated
with the target variable, or are very similar or highly correlated
with each other (multicollinearity).
Demo: Build a Logistic Regression Model
Evaluation Metrics for Logistic Regression
Accuracy Score
Confusion Matrix
Precision
Recall
F1 Score
Receiver Operating Characteristic (ROC) Curve
Accuracy Score
It is the total number of correct predictions over the total number
of predictions
Not suitable for imbalanced datasets, because a model that always
predicts the majority class can still score highly
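A small sketch of why accuracy can mislead on imbalanced data. The dataset below is invented for illustration (5 positives out of 100), and `accuracy` is a hypothetical helper name.

```python
def accuracy(y_true, y_pred):
    """Correct predictions divided by total predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative imbalanced labels: 5 positive cases, 95 negative cases.
y_true = [1] * 5 + [0] * 95

# A "model" that never predicts the positive class.
always_negative = [0] * 100

print(accuracy(y_true, always_negative))  # 0.95
```

The score is 95 percent even though not a single positive case was detected, which is why the other metrics in this section are usually reported alongside accuracy.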
Confusion Matrix
A confusion matrix is a table that is used to evaluate the
performance of a classification model.
It can also be used to visualize the performance of an algorithm.
The fundamental idea of a confusion matrix is that the numbers of
correct and incorrect predictions are summed up class-wise.
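The class-wise counting can be sketched for the binary case as follows. The labels and predictions are made up, and `confusion_counts` is a hypothetical helper that returns the four cells of a 2x2 confusion matrix (TP, FP, FN, TN), with 1 taken to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix (positive class = 1)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1   # actual positive, predicted positive
        elif t == 0 and p == 1:
            counts["FP"] += 1   # actual negative, predicted positive
        elif t == 1 and p == 0:
            counts["FN"] += 1   # actual positive, predicted negative
        else:
            counts["TN"] += 1   # actual negative, predicted negative
    return counts


# Illustrative labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cm = confusion_counts(y_true, y_pred)
print(cm)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 5}
```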
Precision
Precision is the number of observations our model correctly
predicted as positive (TP) over all observations it predicted as
positive, whether correctly (TP) or incorrectly (FP).
Precision is about being precise: when the model predicts the
positive class, how often is it correct?
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall
Recall is the number of observations our model correctly predicted
as positive (TP) over the total number of actually positive
observations (TP + FN).
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1 Score
If we put our focus into one score, we might end up neglecting the
other.
In order to combat this we can use the F1 Score, which strikes a
balance between the Precision and Recall scores.
$$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
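The three scores can be computed directly from confusion-matrix counts. In the sketch below the counts are invented for illustration, the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption rather than a universal convention.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 if nothing was predicted positive (assumption)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 if there are no actual positives (assumption)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4

p = precision(tp, fp)   # 8 / 10
r = recall(tp, fn)      # 8 / 12
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))
```

Because F1 is a harmonic mean, it sits closer to the lower of the two scores, so a model cannot reach a high F1 by doing well on only one of them.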
ROC Curve
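The Receiver Operating Characteristic (ROC) curve is a plot of the
true positive rate against the false positive rate at different
classification thresholds.
It shows the tradeoff between sensitivity (the true positive rate)
and specificity (1 minus the false positive rate).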
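A minimal sketch of how the points of an ROC curve can be computed by sweeping the threshold over the predicted scores. The labels and scores are illustrative, `roc_points` and `auc` are hypothetical helper names, and the code assumes both classes are present in the labels. The area under the curve is approximated with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step marks more examples as positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pts = roc_points(y_true, scores)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

A curve that hugs the top-left corner (area close to 1) indicates a model that separates the classes well, while the diagonal (area 0.5) corresponds to random guessing.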
Demo: Evaluate a LR Model
References
Bagheri, R. (2019). ROC curve, a complete introduction. Retrieved from
https://towardsdatascience.com/roc-curve-a-complete-introduction-2f2da2e0434c
Galarnyk, M. (2017). Logistic regression using Python. Retrieved from
https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a
Navlani, A. (2019). Understanding logistic regression in Python tutorial. Retrieved from
https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python
Santos, M. (2020). Precision or recall: Which should you use? Retrieved from
https://towardsdatascience.com/explaining-precision-vs-recall-to-everyone-295d4848edaf
Suresh, A. (2020). What is a confusion matrix? Retrieved from
https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5