1. Introduction
Logistic regression is one of the most popular algorithms in supervised
machine learning. It is used to predict the probability of a binary outcome,
such as yes/no, true/false, or success/failure, based on input variables.
Despite its simplicity, logistic regression is widely used in domains like
medical diagnosis, marketing, and the social sciences. This assignment explores
logistic regression in depth: its theory, use cases, evaluation, and coding.
2. What is Logistic Regression?
Logistic Regression is a statistical method used for binary classification
problems. It estimates the probability that an instance belongs to a certain
class.
While linear regression predicts continuous outcomes, logistic regression
maps predictions to probabilities using the sigmoid function and assigns
class 0 or 1 by thresholding that probability (usually at 0.5).
3. Mathematical Foundation
In logistic regression, we compute a weighted sum of input features:
z = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
We then apply the sigmoid function:
\sigma(z) = \frac{1}{1 + e^{-z}}
This converts the output into a probability between 0 and 1. If the result is ≥
0.5, we classify it as 1 (positive class); otherwise, 0.
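As a concrete illustration, here is a minimal NumPy sketch of this computation; the intercept, weights, and feature values are made-up numbers chosen only for demonstration:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters: intercept b0 and weights b1..bn.
b0 = -1.5
w = np.array([0.8, 2.1])

# One input instance with two features (made-up values).
x = np.array([1.0, 0.5])

z = b0 + np.dot(w, x)           # weighted sum of inputs
p = sigmoid(z)                  # probability of the positive class
label = 1 if p >= 0.5 else 0    # threshold at 0.5

print(f"z = {z:.3f}, p = {p:.3f}, predicted class = {label}")
```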
4. Sigmoid Function & Decision Boundary
The sigmoid function is:
\sigma(z) = \frac{1}{1 + e^{-z}}
At z = 0, output is 0.5
As z → ∞, output → 1
As z → -∞, output → 0
This output probability is then used to classify inputs. The decision boundary
is the set of inputs where the predicted probability equals the threshold
(σ(z) = 0.5, which occurs at z = 0); inputs on one side are classified as 1,
those on the other as 0.
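To make the boundary concrete, the short sketch below (with made-up weights, in the same NumPy style as above) shows that the boundary is exactly the set of points where z = 0, i.e. where the model is maximally uncertain:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up model: z = b0 + b1*x1 + b2*x2.
b0, b1, b2 = -1.0, 2.0, 1.0

# On the line b0 + b1*x1 + b2*x2 = 0 the probability is exactly 0.5.
x1 = 0.25
x2 = -(b0 + b1 * x1) / b2       # solve for x2 so that z == 0
z = b0 + b1 * x1 + b2 * x2
print(sigmoid(z))               # 0.5: this point lies on the boundary

# Points off the line fall on one side or the other of the boundary.
print(sigmoid(b0 + b1 * 1.0 + b2 * 1.0))    # > 0.5 -> class 1
print(sigmoid(b0 + b1 * -1.0 + b2 * -1.0))  # < 0.5 -> class 0
```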
5. Types of Logistic Regression
1. Binary Logistic Regression:
Used when the output has two categories.
Example: Is a customer likely to buy? Yes or No.
2. Multinomial Logistic Regression:
Used when the outcome has more than two unordered categories.
Example: Classifying animals as Dog, Cat, or Rabbit.
3. Ordinal Logistic Regression:
Used when the outcome has ordered categories.
Example: Rating a product as Poor, Average, Good, Excellent.
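As a sketch of the binary and multinomial variants, the snippet below uses scikit-learn's LogisticRegression on the bundled Iris data (three unordered classes). Note that scikit-learn handles the multi-class extension automatically, and that it does not ship an ordinal variant out of the box:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Binary: keep only two of the three species (classes 0 and 1).
mask = y < 2
binary_model = LogisticRegression().fit(X[mask], y[mask])
print(binary_model.predict_proba(X[mask][:1]))    # one probability per class

# Multinomial: all three species as unordered categories.
multi_model = LogisticRegression(max_iter=200).fit(X, y)
print(multi_model.predict(X[:3]))                 # predicted species labels
print(multi_model.predict_proba(X[:1]).round(3))  # probabilities sum to 1
```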
6. Assumptions of Logistic Regression
1. The dependent variable is binary or ordinal.
2. The model assumes a linear relationship between independent
variables and log-odds.
3. Observations are independent.
4. There is little to no multicollinearity among the independent variables
(a common check is sketched after this list).
5. A reasonably large sample size is needed for stable coefficient estimates.
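One common way to check assumption 4 is the variance inflation factor (VIF); this sketch assumes the statsmodels package is available and uses made-up data, with VIF values above roughly 5-10 usually read as a multicollinearity warning:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)

# Made-up feature matrix: x2 is deliberately almost a copy of x0.
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + rng.normal(scale=0.05, size=200)   # nearly collinear with x0
X = np.column_stack([x0, x1, x2])

# A high VIF flags a feature that is well predicted by the others.
for i in range(X.shape[1]):
    print(f"feature {i}: VIF = {variance_inflation_factor(X, i):.1f}")
```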
7. Applications in Real Life
Medical: Predicting if a patient has a disease.
Finance: Classifying if a transaction is fraudulent.
Marketing: Customer segmentation (likely to purchase or not).
Social Media: Spam detection.
Education: Predicting student dropout.
8. Logistic Regression vs Linear Regression
Feature        | Linear Regression          | Logistic Regression
Output Type    | Continuous value           | Probability (0 to 1)
Used For       | Regression problems        | Classification problems
Function Used  | Linear function            | Sigmoid function
Output Range   | -∞ to +∞                   | 0 to 1
Cost Function  | Mean Squared Error (MSE)   | Log Loss / Cross-Entropy
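The contrast in output ranges is easy to see in code. This minimal scikit-learn sketch (with a made-up one-feature dataset) fits both models to the same binary labels and compares their raw outputs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up one-feature dataset with binary labels.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

X_new = np.array([[-5.0], [2.5], [10.0]])
print(lin.predict(X_new))             # unbounded: can fall below 0 or above 1
print(log.predict_proba(X_new)[:, 1]) # always a probability in (0, 1)
```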
9. Evaluation Metrics
1. Accuracy:
\frac{TP + TN}{TP + TN + FP + FN}
2. Precision:
\frac{TP}{TP + FP}
3. Recall (Sensitivity):
\frac{TP}{TP + FN}
4. F1-Score:
\frac{2 \cdot Precision \cdot Recall}{Precision + Recall}
The harmonic mean of precision and recall.
5. Confusion Matrix:
A matrix showing True Positives, False Positives, True Negatives, and False
Negatives.
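All of these metrics are available in scikit-learn; the sketch below computes them for a made-up pair of true and predicted label vectors:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Made-up ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("Confusion matrix (rows = actual, cols = predicted):")
print(confusion_matrix(y_true, y_pred))
```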
10. Regularization in Logistic Regression
Regularization helps prevent overfitting:
L1 Regularization (Lasso):
Can shrink some weights to zero (feature selection).
L2 Regularization (Ridge):
Shrinks all weights but keeps all features.
The cost function with L2 becomes:
Loss = -\sum [y \log(p) + (1 - y) \log(1 - p)] + \lambda \sum_j w_j^2
Where λ is the regularization strength.
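In scikit-learn, regularization is controlled through the penalty argument and the inverse strength C (roughly C = 1/λ, so smaller C means stronger regularization). This sketch on synthetic data shows L1 zeroing out weights while L2 only shrinks them:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 3 of which are informative.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

# L1 (Lasso-like): the liblinear solver supports the l1 penalty.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

# L2 (Ridge-like): the default penalty.
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

print("L1 coefficients:", l1.coef_.round(2))   # several weights exactly 0
print("L2 coefficients:", l2.coef_.round(2))   # small but mostly nonzero
```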
11. Real-world Projects
Customer Churn Prediction:
Predict whether a user will stop using a telecom service (a minimal
end-to-end sketch follows this list).
Loan Approval:
Classify if a loan application should be approved.
Email Spam Classifier:
Identify if an email is spam or not.
Heart Disease Prediction:
Predict if a person has heart disease based on medical data.
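As an end-to-end illustration of the churn project above, here is a minimal sketch. Since real telecom data is not included here, a synthetic dataset from make_classification stands in for it, and the split size and random seeds are arbitrary choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Stand-in for real churn data: 1 = churned, 0 = stayed.
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=42)

# Hold out 25% of customers for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of churn for the first five held-out customers.
print(model.predict_proba(X_test[:5])[:, 1].round(3))
print(classification_report(y_test, model.predict(X_test)))
```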
12. Advantages & Limitations
Advantages:
Simple and easy to implement.
Efficient for binary classification.
Interpretable output.
Works well with linearly separable data.
Limitations:
Poor performance on non-linearly separable data (illustrated after this list).
Assumes linear relationship with log-odds.
Sensitive to irrelevant features and outliers.
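To illustrate the non-linearity limitation, and one common workaround, the sketch below compares a plain logistic regression against one preceded by polynomial feature expansion on scikit-learn's two-moons toy data; the polynomial degree and noise level are arbitrary choices:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

plain = LogisticRegression().fit(X_train, y_train)
poly = make_pipeline(PolynomialFeatures(degree=3),
                     LogisticRegression(max_iter=1000)).fit(X_train, y_train)

print("Plain LR accuracy:", plain.score(X_test, y_test))        # limited
print("With polynomial features:", poly.score(X_test, y_test))  # higher
```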
13. Conclusion
Logistic regression is a foundational classification technique in machine
learning. It’s widely used due to its simplicity, efficiency, and ability to
handle binary outcomes effectively. With proper evaluation and
preprocessing, logistic regression can produce powerful results in real-world
applications across industries.