Logistic Regression in Machine Learning
Last Updated : 23 Jul, 2025
Logistic Regression is a supervised machine learning algorithm used for classification problems. Unlike linear
regression, which predicts continuous values, it predicts the probability that an input belongs to a specific class. It
is most often used for binary classification, where the output is one of two possible categories such as Yes/No,
True/False or 0/1. It uses the sigmoid function to convert inputs into a probability value between 0 and 1. In this
article, we will see the basics of logistic regression and its core concepts.
Types of Logistic Regression
Logistic regression can be classified into three main types based on the nature of the dependent variable:
1. Binomial Logistic Regression: This type is used when the dependent variable has only two possible
   categories. Examples include Yes/No, Pass/Fail or 0/1. It is the most common form of logistic regression and
   is used for binary classification problems.
2. Multinomial Logistic Regression: This is used when the dependent variable has three or more possible
   categories that are not ordered. For example, classifying animals into categories like "cat," "dog" or "sheep." It
   extends the binary logistic regression to handle multiple classes.
3. Ordinal Logistic Regression: This type applies when the dependent variable has three or more
   categories with a natural order or ranking. Examples include ratings like "low," "medium" and "high." It takes
   the order of the categories into account when modeling.
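Multinomial logistic regression is commonly implemented by replacing the sigmoid with the softmax function, which turns one score per class into a probability distribution. The following is a minimal sketch in plain Python, not a full model: the scores are made-up illustration values and `softmax` is a helper defined here.

```python
import math

def softmax(scores):
    """Convert a list of real-valued class scores into probabilities."""
    # Subtracting the max score avoids overflow and leaves the result unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the classes "cat", "dog" and "sheep".
classes = ["cat", "dog", "sheep"]
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the one with the highest probability.
predicted = classes[probs.index(max(probs))]
print(probs, predicted)
```

With only two classes and scores `[z, 0]`, the first softmax output equals the sigmoid of `z`, which is why the binary model can be seen as a special case.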
Assumptions of Logistic Regression
Understanding the assumptions behind logistic regression is important to ensure the model is applied correctly. The
main assumptions are:
1. Independent observations: Each data point is assumed to be independent of the others, meaning there
   should be no correlation or dependence between the input samples.
2. Binary dependent variable: The dependent variable is assumed to be binary, meaning it can take only
   two values. For more than two categories, the softmax function is used instead.
3. Linear relationship between independent variables and log odds: The model assumes a
   linear relationship between the independent variables and the log odds of the dependent variable, which
   means the predictors affect the log odds in a linear way.
4. No outliers: The dataset should not contain extreme outliers as they can distort the estimation of the logistic
   regression coefficients.
5. Large sample size: It requires a sufficiently large sample size to produce reliable and stable results.
Understanding Sigmoid Function
1. The sigmoid function is an important part of logistic regression. It converts the raw output of the
model into a probability value between 0 and 1.
2. It takes any real number and maps it into the range 0 to 1, forming an "S"-shaped curve called the
sigmoid curve or logistic curve. Because probabilities must lie between 0 and 1, the sigmoid function is well suited
to this purpose.
3. In logistic regression, we use a threshold value, usually 0.5, to decide the class label:
     If the sigmoid output is equal to or above the threshold, the input is classified as Class 1.
     If it is below the threshold, the input is classified as Class 0.
This approach turns a continuous score into a discrete class prediction.
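The sigmoid and the thresholding rule above can be sketched in a few lines of plain Python. This is an illustrative sketch only: `sigmoid` and `classify` are helper names introduced here, and the input values are arbitrary.

```python
import math

def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    # Branching on the sign keeps math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def classify(z, threshold=0.5):
    """Return 1 if the sigmoid output reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Arbitrary example scores: negative, zero and positive.
for z in (-3.0, 0.0, 2.5):
    print(z, round(sigmoid(z), 4), classify(z))
```

Because the sigmoid of 0 is exactly 0.5, a threshold of 0.5 is equivalent to checking whether the raw score is non-negative.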
How does Logistic Regression work?
A logistic regression model takes the continuous output of a linear regression function and passes it through a
sigmoid function, which maps any real-valued input into a value between 0 and 1. This function is also known as
the logistic function.
Suppose we have input features represented as a matrix:
$$X = \begin{bmatrix} x_{11} & \dots & x_{1m} \\ x_{21} & \dots & x_{2m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \dots & x_{nm} \end{bmatrix}$$
and the dependent variable is Y, which takes only binary values, i.e. 0 or 1:
$$Y = \begin{cases} 0 & \text{if Class 0} \\ 1 & \text{if Class 1} \end{cases}$$
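Then, apply the multi-linear function to the input variables X. For a single observation with features $x_1, \dots, x_m$ (one row of X):
$$z = \left(\sum_{j=1}^{m} w_j x_j\right) + b$$
Here $x_j$ is the jth feature of the observation, $w = [w_1, w_2, w_3, \cdots, w_m]$ is the vector of weights (coefficients) and $b$ is the bias term,
also known as the intercept. More compactly, this is the dot product of the weights and the input, plus the bias:
$$z = w \cdot X + b$$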
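The linear step can be sketched in plain Python as follows. The data, weights and bias are hypothetical values chosen for illustration, and `linear_score` is a helper name introduced here rather than part of any library.

```python
def linear_score(x, w, b):
    """Compute z = w . x + b for one observation x (a list of m features)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical data: n = 3 observations (rows), m = 2 features (columns).
X = [[1.0, 2.0],
     [0.5, -1.0],
     [3.0, 0.0]]
w = [0.4, -0.2]   # one weight per feature
b = 0.1           # bias (intercept)

# One score z per observation.
z = [linear_score(row, w, b) for row in X]
print(z)
```

Each row of X yields its own z, so a dataset with n observations produces n scores, all computed with the same w and b.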
At this stage, z is a continuous value, exactly as in linear regression. Logistic regression then applies the sigmoid
function to z to convert it into a probability between 0 and 1, i.e. the predicted y, which can be used to predict
the class.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Figure: Sigmoid function
As shown above, the sigmoid function converts a continuous value into a probability, i.e. a value between 0 and 1:
   $\sigma(z)$ tends towards 1 as $z \to \infty$
   $\sigma(z)$ tends towards 0 as $z \to -\infty$
   $\sigma(z)$ is always bounded between 0 and 1
The probability of belonging to each class is then:
$$P(y = 1) = \sigma(z)$$
$$P(y = 0) = 1 - \sigma(z)$$
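As a quick numeric check of the two formulas above, take an arbitrary illustration value of $z = 0.8$:

$$\sigma(0.8) = \frac{1}{1 + e^{-0.8}} \approx \frac{1}{1 + 0.4493} \approx 0.690$$

$$P(y = 1) \approx 0.690, \qquad P(y = 0) \approx 1 - 0.690 = 0.310$$

With the usual threshold of 0.5, this input would be assigned to Class 1.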
Logistic Regression Equation and Odds:
It models the odds of the dependent event occurring, which is the ratio of the probability of the event to the
probability of it not occurring:
$$\frac{p(x)}{1 - p(x)} = e^{z}$$
Taking the natural logarithm of the odds gives the log-odds or logit:
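$$\begin{aligned}
\log\left[\frac{p(x)}{1 - p(x)}\right] &= z \\
\log\left[\frac{p(x)}{1 - p(x)}\right] &= w \cdot X + b \\
\frac{p(x)}{1 - p(x)} &= e^{w \cdot X + b} \quad \text{(exponentiate both sides)} \\
p(x) &= e^{w \cdot X + b} \cdot (1 - p(x)) \\
p(x) &= e^{w \cdot X + b} - e^{w \cdot X + b} \cdot p(x) \\
p(x) + e^{w \cdot X + b} \cdot p(x) &= e^{w \cdot X + b} \\
p(x)\left(1 + e^{w \cdot X + b}\right) &= e^{w \cdot X + b} \\
p(x) &= \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}}
\end{aligned}$$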
Then the final logistic regression equation will be:
$$p(X; b, w) = \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}} = \frac{1}{1 + e^{-(w \cdot X + b)}}$$
This formula represents the probability of the input belonging to Class 1.
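The relationship between probability, odds and log-odds derived above can be checked numerically. The sketch below uses an arbitrary value for `z` (standing in for w · X + b), and the helper names `sigmoid` and `logit` are introduced here for illustration.

```python
import math

def sigmoid(z):
    """Probability of Class 1 for a given linear score z."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))

z = 1.2                      # hypothetical value of w . X + b
p = sigmoid(z)               # probability of Class 1
odds = p / (1.0 - p)         # ratio of "event" to "no event"

print(p, odds, math.exp(z))  # the odds should match e^z
print(logit(p))              # the logit should recover z
```

The logit is the inverse of the sigmoid, which is why a linear model for the log-odds and a sigmoid applied to a linear score describe the same model.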
Likelihood Function for Logistic Regression
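The goal is to find weights w and bias b that maximize the likelihood of observing the data.
For each data point i:
   for y = 1, the predicted probability is p(X; b, w) = p(x)
   for y = 0, the predicted probability is 1 − p(X; b, w) = 1 − p(x)
Because the observations are assumed to be independent, the likelihood of the whole dataset is the product over all points:
$$L(b, w) = \prod_{i=1}^{n} p(x_i)^{y_i} (1 - p(x_i))^{1 - y_i}$$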
Taking natural logs on both sides:
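$$\begin{aligned}
\log(L(b, w)) &= \sum_{i=1}^{n} \left[ y_i \log p(x_i) + (1 - y_i) \log(1 - p(x_i)) \right] \\
&= \sum_{i=1}^{n} \left[ y_i \log p(x_i) + \log(1 - p(x_i)) - y_i \log(1 - p(x_i)) \right] \\
&= \sum_{i=1}^{n} \log(1 - p(x_i)) + \sum_{i=1}^{n} y_i \log \frac{p(x_i)}{1 - p(x_i)} \\
&= \sum_{i=1}^{n} \log \frac{1}{1 + e^{w \cdot x_i + b}} + \sum_{i=1}^{n} y_i (w \cdot x_i + b) \\
&= \sum_{i=1}^{n} -\log\left(1 + e^{w \cdot x_i + b}\right) + \sum_{i=1}^{n} y_i (w \cdot x_i + b)
\end{aligned}$$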
This is known as the log-likelihood function.
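To make the derivation concrete, the sketch below evaluates the log-likelihood in two ways: directly from the predicted probabilities, and with the simplified form from the last line of the derivation. The toy data and parameter values are hypothetical, and both function names are introduced here for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood_direct(X, y, w, b):
    """Sum of y_i*log(p_i) + (1 - y_i)*log(1 - p_i)."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        p = sigmoid(z)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def log_likelihood_simplified(X, y, w, b):
    """Sum of -log(1 + e^z_i) + y_i*z_i, with z_i = w . x_i + b."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total += -math.log(1.0 + math.exp(z)) + yi * z
    return total

# Hypothetical toy data (4 observations, 2 features) and parameters.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.3]]
y = [0, 1, 0, 1]
w, b = [0.8, -0.4], 0.1

direct = log_likelihood_direct(X, y, w, b)
simplified = log_likelihood_simplified(X, y, w, b)
print(direct, simplified)  # the two values should agree up to rounding
```

The log-likelihood is always zero or negative, because it is a sum of logarithms of probabilities; values closer to zero indicate a better fit.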
Gradient of the log-likelihood function
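To find the best w and b we use gradient ascent on the log-likelihood function. The gradient with respect to each
weight $w_j$ is:
$$\begin{aligned}
\frac{\partial \log(L(b, w))}{\partial w_j} &= -\sum_{i=1}^{n} \frac{1}{1 + e^{w \cdot x_i + b}} e^{w \cdot x_i + b} x_{ij} + \sum_{i=1}^{n} y_i x_{ij} \\
&= -\sum_{i=1}^{n} p(x_i; b, w) x_{ij} + \sum_{i=1}^{n} y_i x_{ij} \\
&= \sum_{i=1}^{n} (y_i - p(x_i; b, w)) x_{ij}
\end{aligned}$$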
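A minimal gradient-ascent sketch based on this gradient is shown below. It is an illustration under stated assumptions, not a production implementation: the learning rate, step count and toy data are arbitrary choices, and the bias gradient (the sum of y_i − p_i) is not derived in the text but follows from the same steps with x_ij replaced by 1.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    """P(y = 1) for one observation x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def log_likelihood(X, y, w, b):
    """Sum of y_i*log(p_i) + (1 - y_i)*log(1 - p_i)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(xi, w, b)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def gradients(X, y, w, b):
    """Gradient of the log-likelihood with respect to each w_j and to b."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        err = yi - predict_proba(xi, w, b)   # (y_i - p_i)
        for j, xij in enumerate(xi):
            grad_w[j] += err * xij
        grad_b += err                        # assumption: bias acts like a feature fixed at 1
    return grad_w, grad_b

def fit(X, y, lr=0.1, steps=1000):
    """Gradient ascent: move the parameters in the direction of the gradient."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(steps):
        gw, gb = gradients(X, y, w, b)
        w = [wj + lr * gj for wj, gj in zip(w, gw)]
        b += lr * gb
    return w, b

# Hypothetical one-feature data; the classes overlap, so the weights stay finite.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

w, b = fit(X, y)
print(w, b)
print(log_likelihood(X, y, [0.0], 0.0), log_likelihood(X, y, w, b))
```

Gradient ascent is used because the log-likelihood is being maximized; minimizing the negative log-likelihood with gradient descent is an equivalent formulation.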
Terminologies involved in Logistic Regression
Here are some common terms involved in logistic regression:
1. Independent Variables: These are the input features or predictor variables used to make predictions
   about the dependent variable.
2. Dependent Variable: This is the target variable that we aim to predict. In logistic regression, the
   dependent variable is categorical.
3. Logistic Function: This function transforms a linear combination of the independent variables into a
   probability between 0 and 1, which represents the likelihood that the dependent variable equals 1.
4. Odds: This is the ratio of the probability of an event happening to the probability of it not happening. It differs
   from probability because probability is the ratio of occurrences to total possibilities.
5. Log-Odds (Logit): The natural logarithm of the odds. In logistic regression, the log-odds are modeled as a
   linear combination of the independent variables and the intercept.
6. Coefficients: These are the parameters estimated by the logistic regression model. They show how strongly
   each independent variable affects the log-odds of the dependent variable.
7. Intercept: The constant term in the logistic regression model which represents the log-odds when all
   independent variables are equal to zero.
8. Maximum Likelihood Estimation (MLE): This method is used to estimate the coefficients of the logistic
   regression model by maximizing the likelihood of observing the given data.
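The terms odds, coefficient and intercept can be tied together with a small numeric sketch. Because the log-odds are linear in the inputs, raising one feature by one unit multiplies the odds by e raised to that feature's coefficient. The parameter values below are hypothetical, not fitted to any real data, and `odds` is a helper defined here.

```python
import math

def odds(x, w, b):
    """Odds p / (1 - p) for one observation; equals e^(w . x + b)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return p / (1.0 - p)

# Hypothetical parameters for a model with two features.
w, b = [0.7, -0.3], -1.0

x_before = [2.0, 1.0]
x_after = [3.0, 1.0]   # first feature increased by one unit

# Coefficient: the odds change by a factor of e^(w_1).
ratio = odds(x_after, w, b) / odds(x_before, w, b)
print(ratio, math.exp(w[0]))   # the two values should agree

# Intercept: with all features at zero, the log-odds equal b.
intercept_log_odds = math.log(odds([0.0, 0.0], w, b))
print(intercept_log_odds, b)
```

This is why coefficients are often reported as odds ratios: a positive coefficient raises the odds of Class 1 and a negative one lowers them.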