Business Analytics
(Term 3, PGDM, Jan-March 2024)
--
Topic B
Topic: Probabilistic Prediction
(Classification – Logistic Regression)
--
Rushikesh P Borse, PhD
Faculty – Analytics, Great Lakes Institute of Management, Chennai
Contents to be discussed…
• Regression
• Problems with linear regression
• Odds
• Logistic regression
• Case Study + Business Implications
• Implementation on Excel (Logit)
Where do we need to predict a class?
Different (Classification) Prediction Techniques
• Logistic Regression
• Artificial Neural Network (ANN)
• Decision Trees
• Naïve Bayes
• Support Vector Machines (SVM)
• KNN classifier
• Random forest
• Ensemble learning techniques
• Deep Learning (CNN)
Application of Logistic Regression in…
• Analyzing customer data to predict whether or not a customer is likely to purchase a particular product.
• Assessing credit risk and estimating the probability of defaulting on a loan.
• Healthcare: predicting the likelihood of a patient developing a certain disease or condition.
• Admissions: understanding how GPA, CAT score, and number of AP classes taken affect the probability of getting admission into a particular university.
• Predicting customer churn
• Hotel booking: predicting whether or not a user will cancel the booking.
• More applications…
For Regression
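Need for logistic over linear regression?
But what if the data contains an outlier? A single extreme point can tilt the fitted line and shift the cut-off used to assign classes, so points that were classified correctly can flip.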
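A minimal sketch of the outlier problem, assuming a made-up dataset (hours studied vs. pass/fail; the numbers are hypothetical and not from the lecture). A straight line is fitted by ordinary least squares and the class is read off where the line crosses 0.5. Adding one extreme, correctly labelled point moves that crossing.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def cutoff(b0, b1, level=0.5):
    """Value of x at which the fitted line reaches `level`."""
    return (level - b0) / b1


# Hypothetical data: x = hours studied, y = 1 (pass) or 0 (fail)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_line(x, y)
cut_clean = cutoff(b0, b1)

# Same data plus one extreme (but correctly labelled) observation
b0_out, b1_out = fit_line(x + [30], y + [1])
cut_outlier = cutoff(b0_out, b1_out)
value_at_5 = b0_out + b1_out * 5

print(f"cut-off without outlier: {cut_clean:.2f}")
print(f"cut-off with outlier:    {cut_outlier:.2f}")
print(f"fitted value at x = 5 with outlier: {value_at_5:.2f}")
```

With the extra point, the cut-off moves past x = 5, so a student who passed (x = 5, y = 1) now falls below 0.5 and would be labelled a fail.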
For probabilistic prediction (classification)
Logistic regression – The math behind it
• z = b0 + b1X1 (one IV)
• z = b0 + b1X1 + b2X2 + b3X3 + … (multiple IVs)
• P = exp(z) / (1 + exp(z))
• P = 1 / (1 + exp(-z)) (the same expression, with numerator and denominator divided by exp(z))
• For any values of X and b, P lies in the interval (0, 1)
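A short sketch of the formulas above, assuming hypothetical coefficients and IV values chosen only for illustration. It computes z, then evaluates both forms of P to show that they agree and stay between 0 and 1.

```python
import math


def linear_score(b, x):
    """z = b0 + b1*X1 + b2*X2 + ...; b[0] is the intercept b0."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


def prob_form_1(z):
    """P = exp(z) / (1 + exp(z))"""
    return math.exp(z) / (1 + math.exp(z))


def prob_form_2(z):
    """P = 1 / (1 + exp(-z))"""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients and IV values (not from the lecture)
b = [-4.0, 0.8, 0.5]   # b0, b1, b2
x = [3.0, 2.0]         # X1, X2

z = linear_score(b, x)  # -4.0 + 0.8*3.0 + 0.5*2.0
print(f"z = {z:.2f}, P = {prob_form_2(z):.4f}")

# Both forms give the same P, and P never leaves (0, 1)
for z_val in (-5, -1, 0, 1, 5):
    p1, p2 = prob_form_1(z_val), prob_form_2(z_val)
    print(f"z = {z_val:>2}: form 1 = {p1:.4f}, form 2 = {p2:.4f}")
```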
Logistic regression – odds function
• P = exp(z) / (1 + exp(z))
• P = 1 / (1 + exp(-z))
• Odds = probability of success / probability of failure = P / (1 - P) = exp(z)
• Log odds = ln(odds) = z
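The conversions between probability, odds and log odds can be sketched as follows. The function names are illustrative, and the probabilities are arbitrary sample values.

```python
import math


def odds_from_prob(p):
    """Odds = P / (1 - P)"""
    return p / (1 - p)


def log_odds_from_prob(p):
    """Log odds = ln(P / (1 - P)); this is z in the logistic model."""
    return math.log(odds_from_prob(p))


def prob_from_log_odds(z):
    """Inverse mapping: P = 1 / (1 + exp(-z))"""
    return 1 / (1 + math.exp(-z))


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    odds = odds_from_prob(p)
    z = log_odds_from_prob(p)
    back = prob_from_log_odds(z)  # round trip recovers p
    print(f"P = {p:.2f}  odds = {odds:.3f}  log odds = {z:+.3f}  back to P = {back:.2f}")
```

P = 0.5 corresponds to odds of 1 and log odds of 0; probabilities above 0.5 give positive log odds and probabilities below 0.5 give negative log odds.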
Relationship between probability, odds and log odds
One versus the other…
Odds and Log odds
Odds and Odds ratio – A medical example
The odds that a person having cough/cold also has fever =
The odds that a person not having cough/cold also has fever =
∴ The odds ratio =
This shows that the odds of fever for a person having cough/cold are 21 times the odds for a person not having cough/cold.
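A sketch of how such an odds ratio is computed from a 2x2 table of counts. The counts used in the lecture are not reproduced in this text, so the counts below are hypothetical, picked only so that the ratio comes out to the 21 quoted above.

```python
def odds(events, non_events):
    """Odds = number with the outcome / number without the outcome."""
    return events / non_events


# Hypothetical counts (NOT the lecture's data)
fever_with_cold, no_fever_with_cold = 30, 10
fever_without_cold, no_fever_without_cold = 10, 70

odds_with_cold = odds(fever_with_cold, no_fever_with_cold)
odds_without_cold = odds(fever_without_cold, no_fever_without_cold)
odds_ratio = odds_with_cold / odds_without_cold

print(f"odds of fever, with cough/cold:    {odds_with_cold:.3f}")
print(f"odds of fever, without cough/cold: {odds_without_cold:.3f}")
print(f"odds ratio: {odds_ratio:.1f}")
```

Note that the ratio compares odds, not probabilities: in these hypothetical counts the probability of fever is 30/40 = 0.75 with cough/cold and 10/80 = 0.125 without, which is a 6-fold difference in probability but a 21-fold difference in odds.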
Basic requirements of LR
• DV – should be binary
• IVs – should be uncorrelated with each other (multicollinearity should be low)
• Large sample sizes are good for LR
• A threshold (cut-off probability) is required to turn the predicted probability into a class
• Works well when the classes are (approximately) linearly separable
Limitations
• Sensitive to outliers
• Only significant features should be used to construct the LR model; otherwise predictions may be incorrect
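The role of the threshold can be sketched as below, assuming a hypothetical list of predicted probabilities. The model itself only outputs P; the class depends on the cut-off chosen, and 0.5 is just a common default.

```python
def classify(probs, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical predicted probabilities from some fitted LR model
predicted_probs = [0.05, 0.20, 0.35, 0.48, 0.52, 0.70, 0.91]

classes_at_50 = classify(predicted_probs, threshold=0.5)
classes_at_30 = classify(predicted_probs, threshold=0.3)

print("threshold 0.5:", classes_at_50)
print("threshold 0.3:", classes_at_30)
```

Lowering the threshold labels more cases as class 1, which may suit problems where missing a positive case (for example a defaulter) costs more than a false alarm.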
Implementation
• 1. In Excel with the admission dataset
• (with various column adjustments, significance of parameters, interpretations)
• 2. In Excel: predicting customer churn
• 3. Python implementation (Admission)
• 4. Python implementation (Disease prediction – Diabetes)
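The lecture's own Python notebooks and datasets are not reproduced here. As a stand-in, the following is a minimal sketch of fitting a one-IV logistic regression in plain Python by gradient descent, on a small hypothetical admission-style dataset. In practice a library routine (for example scikit-learn's LogisticRegression) would normally be used; the function `fit_logistic`, the learning rate, the iteration count, and the data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """P = 1 / (1 + exp(-z))"""
    return 1 / (1 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Fit P = sigmoid(b0 + b1*x) by gradient descent on the average log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # (predicted P - actual y) for each observation
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


# Hypothetical data: x = an entrance score (arbitrary scale), y = 1 if admitted
scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
admitted = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(scores, admitted)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"odds multiplier per unit of score, exp(b1) = {math.exp(b1):.3f}")

# Predicted probability and class (threshold 0.5) for a few scores
for s in (1.0, 3.0, 5.0):
    p = sigmoid(b0 + b1 * s)
    label = 1 if p >= 0.5 else 0
    print(f"score {s}: P(admit) = {p:.3f} -> class {label}")
```

Since log odds = z = b0 + b1*x, a one-unit increase in the score multiplies the odds of admission by exp(b1), which is the usual way to interpret a logistic regression coefficient.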
Activity Time – Graded quiz
Thank you
rushikesh.p@greatlakes.edu.in