Logistic Regression in Machine Learning
In our previous discussion, we explored the fundamentals of machine learning and walked
through a hands-on implementation of Linear Regression. Now, let’s take a step forward and
look at one of the first and most widely used classification algorithms: Logistic Regression.
What is Logistic Regression?
Logistic regression is a supervised machine learning algorithm used for classification
tasks, where the goal is to predict the probability that an instance belongs to a given class.
Statistically, it models the relationship between a categorical dependent variable and one or
more independent variables. This article explores the fundamentals of logistic regression, its
types, and its implementation.
Binary logistic regression uses the sigmoid function, which takes a linear combination of the
independent variables as input and produces a probability value between 0 and 1.
For example, with two classes, Class 0 and Class 1, if the value of the logistic function for
an input is greater than 0.5 (the threshold value), the input is assigned to Class 1; otherwise it
is assigned to Class 0. It is called regression because it extends linear regression, but it is
mainly used for classification problems.
Key Points:
   •   Logistic regression predicts a categorical dependent variable, so the outcome
       must be a categorical or discrete value.
   •   The outcome can be Yes or No, 0 or 1, True or False, etc., but instead of returning
       exactly 0 or 1, the model returns a probability between 0 and 1.
   •   In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic
       function whose output approaches the two limiting values 0 and 1.
Types of Logistic Regression
On the basis of the categories of the dependent variable, Logistic Regression can be classified
into three types:
   1. Binomial: In binomial Logistic regression, the dependent variable has only two
      possible values, such as 0 or 1, Pass or Fail, etc.
   2. Multinomial: In multinomial Logistic regression, the dependent variable has 3 or more
      possible unordered categories, such as “cat”, “dog”, or “sheep”.
   3. Ordinal: In ordinal Logistic regression, the dependent variable has 3 or more possible
      ordered categories, such as “Low”, “Medium”, or “High”.
Assumptions of Logistic Regression
Understanding the assumptions of logistic regression is important for applying the model
appropriately. The assumptions include:
   1. Independent observations: Each observation is independent of the others, meaning
      the outcome of one observation does not influence the outcome of another.
   2. Binary dependent variable: Binary logistic regression assumes that the dependent
      variable is dichotomous, meaning it can take only two values. For more than two
      categories, the softmax function is used.
   3. Linear relationship between independent variables and log odds: The relationship
      between the independent variables and the log odds of the dependent variable should
      be linear.
   4. No extreme outliers: The dataset should contain no extreme outliers or highly
      influential points, since they can distort the coefficient estimates.
   5. Large sample size: The sample size should be large enough for the coefficient
      estimates to be reliable.
Understanding Sigmoid Function
So far, we’ve covered the basics of logistic regression. Now let’s focus on the function
that forms its core.
   •   The sigmoid function is a mathematical function used to map the predicted values to
       probabilities.
   •   It maps any real value to a value between 0 and 1. Because the output cannot leave
       this range, the function forms an “S”-shaped curve.
   •   This S-shaped curve is called the sigmoid function or the logistic function.
   •   In logistic regression, a threshold value turns the probability into a class label:
       inputs whose probability is above the threshold are assigned to class 1, and inputs
       whose probability is below it are assigned to class 0.
Sigmoid Function
Now we apply the sigmoid function to an input z to obtain a probability between
0 and 1, i.e. the predicted y:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Figure: Sigmoid function
As the figure shows, the sigmoid function converts a continuous input into
a probability, i.e. a value between 0 and 1.
   •   $\sigma(z)$ tends towards 1 as $z \to \infty$
   •   $\sigma(z)$ tends towards 0 as $z \to -\infty$
   •   $\sigma(z)$ is always bounded between 0 and 1
where the probability of each class can be written as:
$$P(y=1) = \sigma(z)$$
$$P(y=0) = 1 - \sigma(z)$$
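The behaviour of the sigmoid and the threshold rule can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `sigmoid` and `predict_class` and the sample inputs are illustrative choices, not part of any library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equal form e^z / (1 + e^z),
    # so exp() never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(z: float, threshold: float = 0.5) -> int:
    """Assign class 1 when the probability is above the threshold, else class 0."""
    return 1 if sigmoid(z) > threshold else 0

for z in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  class={predict_class(z)}")
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and z = 0 gives exactly 0.5, which this sketch assigns to class 0 because the rule uses a strict "greater than".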
Equation of Logistic Regression:
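The odds are the ratio of the probability of something occurring to the probability of it not
occurring. This is different from probability, which is the ratio of something occurring to
everything that could possibly occur. With $p = P(y=1)$, the odds are:
$$\text{odds} = \frac{p}{1 - p}$$
Taking the natural logarithm gives the log odds, which logistic regression models as a linear
function of the inputs, with weights w and intercept b:
$$\log\left(\frac{p}{1 - p}\right) = w \cdot x + b$$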
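As a quick numeric check of these definitions, the sketch below converts a probability p to odds, p / (1 − p), takes the logarithm, and then maps the log odds back to a probability with the sigmoid. The helper names are illustrative.

```python
import math

def odds(p: float) -> float:
    """Odds of an event that has probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def log_odds(p: float) -> float:
    """Natural logarithm of the odds, also called the logit."""
    return math.log(odds(p))

def prob_from_log_odds(z: float) -> float:
    """Inverse of the logit: the sigmoid maps log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
print(odds(p))                          # about 4: four times as likely to occur as not
print(log_odds(p))                      # about 1.386, i.e. ln(4)
print(prob_from_log_odds(log_odds(p)))  # recovers about 0.8
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0, which is why a 0.5 probability threshold is the same as checking the sign of the linear score.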
Likelihood Function for Logistic Regression
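A standard way to write this likelihood is sketched below. It assumes n independent training examples $(x_i, y_i)$ with $y_i \in \{0, 1\}$ and predicted probabilities $p_i = \sigma(w \cdot x_i + b)$; the index i, the count n, and the intercept symbol b are notation introduced here for illustration.

```latex
% Likelihood: product of the probability assigned to each observed label
L(w, b) = \prod_{i=1}^{n} p_i^{\,y_i} \, (1 - p_i)^{1 - y_i}

% Log-likelihood: the logarithm turns the product into a sum
\ell(w, b) = \log L(w, b)
           = \sum_{i=1}^{n} \bigl[ \, y_i \log p_i + (1 - y_i) \log (1 - p_i) \, \bigr]
```

Each factor equals $p_i$ when $y_i = 1$ and $1 - p_i$ when $y_i = 0$, so the likelihood is large when the model assigns high probability to the labels that were actually observed.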
Gradient of the log-likelihood function
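To find the maximum likelihood estimates, we differentiate the log-likelihood with respect to w.
Setting this gradient to zero has no closed-form solution, so the estimates are found
iteratively, for example by gradient ascent.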
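For n examples $(x_i, y_i)$ with predictions $p_i = \sigma(w \cdot x_i + b)$, the gradient of the log-likelihood has the well-known form $\sum_i (y_i - p_i)\, x_i$ with respect to w and $\sum_i (y_i - p_i)$ with respect to the intercept b. The following is a minimal sketch of batch gradient ascent built on that formula. The toy data, the learning rate, the iteration count, and the name `fit_logistic` are all illustrative assumptions, not values or code from this article.

```python
import math

def sigmoid(z):
    # Overflow-safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Maximize the mean log-likelihood by batch gradient ascent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p                     # the (y_i - p_i) term
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step uphill; dividing by n averages the gradient over the data
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

# Hypothetical toy data: hours studied -> pass (1) or fail (0)
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
for hours in (0.5, 2.25, 4.0):
    print(hours, round(sigmoid(w[0] * hours + b), 3))
```

The classes overlap in this toy data, so the maximum likelihood estimate is finite. Library implementations typically use faster second-order or quasi-Newton solvers instead of plain gradient ascent.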
Terminologies involved in Logistic Regression
Here are some common terms involved in logistic regression:
   •   Independent variables: The input features or predictors used to predict the
       dependent variable.
   •   Dependent variable: The target variable in a logistic regression model, which we are
       trying to predict.
   •   Logistic function: The function that relates the independent variables to the
       dependent variable. It transforms a linear combination of the input variables into
       a probability value between 0 and 1, which represents the likelihood of the dependent
       variable being 1.
   •   Odds: The ratio of the probability of something occurring to the probability of it not
       occurring. This differs from probability, which is the ratio of something occurring to
       everything that could possibly occur.
   •   Log-odds: The log-odds, also known as the logit function, is the natural logarithm of
       the odds. In logistic regression, the log odds of the dependent variable are modeled as
       a linear combination of the independent variables and the intercept.
   •   Coefficient: An estimated parameter of the logistic regression model, which shows how
       an independent variable relates to the log odds of the dependent variable.
   •   Intercept: A constant term in the logistic regression model, which represents the log
       odds when all independent variables are equal to zero.
   •   Maximum likelihood estimation: The method used to estimate the coefficients of the
       logistic regression model by maximizing the likelihood of observing the data given
       the model.
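To make the coefficient and intercept entries concrete: because the log odds are linear in the inputs, raising one input by one unit adds its coefficient to the log odds, which multiplies the odds by e raised to that coefficient. The sketch below illustrates this with a single feature; the intercept of -1.5 and the coefficient of 0.7 are made-up values.

```python
import math

intercept = -1.5    # hypothetical: log odds when the feature equals 0
coefficient = 0.7   # hypothetical: change in log odds per unit of the feature

def probability(x: float) -> float:
    """P(y = 1) for feature value x under the hypothetical model."""
    log_odds = intercept + coefficient * x
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds(x: float) -> float:
    p = probability(x)
    return p / (1.0 - p)

odds_ratio = math.exp(coefficient)
print(f"odds ratio per unit increase: {odds_ratio:.3f}")
print(f"baseline probability at x=0:  {probability(0.0):.3f}")
for x in (0.0, 1.0, 2.0):
    print(f"x={x}: odds={odds(x):.3f}")
```

The ratio between the odds at consecutive values of x is the same everywhere, even though the change in probability is not.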
Code Implementation for Logistic Regression
So far, we’ve covered the theoretical concepts behind logistic regression. Now
let’s turn to the hands-on code implementation, which should make logistic
regression clearer. We will discuss Binomial Logistic Regression and Multinomial
Logistic Regression one by one.
Binomial Logistic regression:
The target variable can have only 2 possible values, “0” or “1”, which may represent “win” vs
“loss”, “pass” vs “fail”, “dead” vs “alive”, etc. In this case the sigmoid function, already
discussed above, is used.
This Python code shows how to use the breast cancer dataset to implement a Logistic
Regression model for classification. It imports the necessary libraries,
loads the breast cancer dataset from scikit-learn, splits it into training and testing
sets, and then trains a Logistic Regression model on the training data. The model is used to
predict the labels for the test data, and the accuracy of these predictions is calculated by
comparing the predicted values with the actual labels from the test set. Finally, the accuracy is
printed as a percentage.
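A minimal sketch of the workflow just described might look as follows. The 80/20 split, the `random_state` value, the `max_iter` setting, and the `StandardScaler` step are assumptions made here so the example runs cleanly; they are not taken from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the features (X) and binary labels (y: 0 = malignant, 1 = benign)
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows for testing (split size and seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=23
)

# Standardizing the features is an addition that helps the solver converge
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Predict labels for the test set and compare them with the true labels
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Logistic Regression model accuracy: {acc * 100:.2f}%")
```

The exact accuracy depends on the split, so no specific figure is quoted here.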
Multinomial Logistic Regression:
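The target variable can have 3 or more possible values which are not ordered (i.e. the values
have no quantitative significance), like “disease A” vs “disease B” vs “disease C”.
In this case, the softmax function is used in place of the sigmoid function. The softmax
function for K classes is:
$$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
Here, K represents the number of elements in the vector z, and i, j iterate over all the elements
in the vector.
Then the probability for class c is:
$$P(y = c \mid x) = \frac{e^{w_c \cdot x + b_c}}{\sum_{k=1}^{K} e^{w_k \cdot x + b_k}}$$
where $w_c$ and $b_c$ are the weight vector and intercept for class c.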
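The softmax mapping from K scores to K class probabilities can be sketched in a few lines of standard-library Python. The scores below are hypothetical, and subtracting the maximum score is a common numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(z):
    """Map a list of K real scores to K probabilities that sum to 1."""
    m = max(z)                            # shift by the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                  # hypothetical scores for three classes
probs = softmax(scores)
print([round(p, 3) for p in probs])       # roughly [0.659, 0.242, 0.099]
print("predicted class:", probs.index(max(probs)))
```

With K = 2 and scores [z, 0], the first softmax output equals the sigmoid of z, so binary logistic regression is a special case of this formulation.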
In Multinomial Logistic Regression, the output variable can have more than two possible
discrete outputs. Consider the Digits dataset.
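A minimal sketch on scikit-learn's digits dataset is shown below. The split size, the seed, the scaling step, and `max_iter` are assumptions made for illustration. In recent scikit-learn versions, `LogisticRegression` with its default solver fits a multinomial (softmax) model when there are more than two classes.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 8x8 digit images flattened to 64 features, with labels 0 to 9
X, y = load_digits(return_X_y=True)

# Split size and seed are assumptions, not values from the article
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# predict_proba returns one probability per class; each row sums to 1
proba = clf.predict_proba(X_test[:1])
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Logistic Regression model accuracy: {acc * 100:.2f}%")
print("class probabilities for the first test image:", proba.round(3))
```

The predicted label for an image is simply the class with the highest probability in its row.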
How to Evaluate Logistic Regression Model?
So far, we’ve covered the implementation of logistic regression. Now, let’s look at how to
evaluate a logistic regression model and why evaluation matters.
Evaluating the model helps us assess its performance and check that it generalizes well
to new data.
We can evaluate the logistic regression model using the following metrics:
   •    Accuracy: The proportion of all instances that are classified correctly.
   •    Precision: The proportion of instances predicted as positive that are actually positive.
   •    Recall (Sensitivity or True Positive Rate): Recall measures the proportion of correctly
        predicted positive instances among all actual positive instances.
   •    F1 Score: F1 score is the harmonic mean of precision and recall.
   •    Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The ROC
        curve plots the true positive rate against the false positive rate at various thresholds.
        AUC-ROC measures the area under this curve, providing an aggregate measure of a
        model’s performance across different classification thresholds.
   •    Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, AUC-PR
        measures the area under the precision-recall curve, providing a summary of a model’s
        performance across different precision-recall trade-offs.
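The first four metrics follow directly from the counts of true and false positives and negatives. The sketch below computes them by hand on a small made-up set of labels; the function name `classification_metrics` and the data are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here there are 2 true positives, 1 false positive, 2 false negatives and 5 true negatives, so accuracy is 0.7 while recall is only 0.5. In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score` and `average_precision_score` cover the metrics in this list.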
Differences Between Linear and Logistic Regression
 Linear Regression                                  Logistic Regression
 Linear regression is used to predict a             Logistic regression is used to predict a
 continuous dependent variable using a given set    categorical dependent variable using a given set
 of independent variables.                          of independent variables.
 It is used for solving regression problems.        It is used for solving classification problems.
 We predict the value of a continuous variable.     We predict the value of a categorical variable.
 We find the best-fit line.                         We find the S-curve.
 The least squares method is used to estimate       Maximum likelihood estimation is used to
 the model parameters.                              estimate the model parameters.
 The output must be a continuous value, such as     The output must be a categorical value, such as
 price, age, etc.                                   0 or 1, Yes or No, etc.
 It requires a linear relationship between the      It does not require a linear relationship between
 dependent and independent variables.               the dependent and independent variables, only
                                                    between the independent variables and the log odds.
 There may be collinearity between the              There should be little to no collinearity between
 independent variables.                             independent variables.
Logistic Regression Formula
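Unlike Linear Regression, which predicts a continuous output, Logistic Regression predicts a
probability using the sigmoid function:
$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = w \cdot x + b$$
where w holds the coefficients and b is the intercept.
To convert this probability into a classification decision:
$$\hat{y} = 1 \text{ if } P(y = 1 \mid x) > 0.5, \text{ otherwise } \hat{y} = 0$$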
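For a model with several features, the same two steps apply: compute the linear score, pass it through the sigmoid, and compare the result with the threshold. A minimal sketch, in which the coefficients, intercept, and inputs are hypothetical values chosen for illustration:

```python
import math

def predict_proba(x, w, b):
    """P(y = 1 | x) for feature list x, coefficient list w and intercept b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear score
    return 1.0 / (1.0 + math.exp(-z))              # fine for moderate |z|

def predict(x, w, b, threshold=0.5):
    """Class 1 if the probability is above the threshold, else class 0."""
    return 1 if predict_proba(x, w, b) > threshold else 0

w = [0.8, -0.4]   # hypothetical coefficients
b = -0.5          # hypothetical intercept

for x in ([2.0, 1.0], [0.0, 3.0]):
    print(x, round(predict_proba(x, w, b), 3), predict(x, w, b))
```

The first input has a linear score of 0.7 and is assigned to class 1; the second has a score of -1.7 and is assigned to class 0. Raising the threshold above 0.5 trades recall for precision.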
Types of Logistic Regression
   •   Binary Logistic Regression → Outcome has two classes (e.g., Spam/Not Spam)
   •   Multinomial Logistic Regression → Outcome has more than two unordered classes (e.g.,
       Cat/Dog/Sheep)
   •   Ordinal Logistic Regression → Outcome has more than two ordered categories (e.g.,
       Customer Satisfaction: Low < Medium < High)
Steps for Building a Logistic Regression Model
Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Step 2: Load Dataset
df = pd.read_csv("data.csv") # Replace with your dataset
print(df.head())
Step 3: Preprocess Data
   •   Handle missing values
   •   Convert categorical variables to numeric (One-Hot Encoding or Label Encoding)
   •   Scale features if necessary
df.dropna(inplace=True) # Drop missing values
df = pd.get_dummies(df, drop_first=True) # Convert categorical variables
Step 4: Split Data
X = df.drop("target", axis=1) # Independent variables
y = df["target"] # Dependent variable (0 or 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 5: Train Logistic Regression Model
model = LogisticRegression(max_iter=1000) # Raise the iteration limit (default 100) so the solver can converge
model.fit(X_train, y_train)
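Step 3 lists feature scaling, but no code for it is given. One way to sketch it is to put a `StandardScaler` and the model in a single pipeline, so the scaler is fitted on the training split only. The synthetic data from `make_classification` is a stand-in for the article's `df` and its "target" column, and all parameter values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features and binary target (an assumption)
X, y = make_classification(
    n_samples=500, n_features=8, n_clusters_per_class=1,
    class_sep=2.0, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() learns the scaling statistics from X_train only, then trains the model;
# predict() and score() reuse those statistics on new data.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_train, y_train)

print("Test accuracy:", scaled_model.score(X_test, y_test))
```

Fitting the scaler inside the pipeline avoids leaking information from the test rows into the preprocessing step.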
Step 6: Make Predictions
y_pred = model.predict(X_test)
Step 7: Evaluate the Model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Model Interpretation
   •   Accuracy: Measures overall correctness of predictions
   •   Confusion Matrix: Shows True Positives (TP), False Positives (FP), True Negatives
       (TN), and False Negatives (FN)
   •   Precision & Recall: More informative than accuracy when classes are imbalanced
   •   ROC Curve & AUC Score: Measure the model's ability to distinguish between classes
from sklearn.metrics import roc_curve, auc
y_prob = model.predict_proba(X_test)[:, 1] # Get probability scores
fpr, tpr, _ = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
When to Use Logistic Regression?
   •   When the target variable is categorical (binary or multinomial)
   •   When the log odds of the target are approximately linear in the features
   •   When you need interpretability over complex models like Neural Networks