What is Logistic Regression:
Logistic regression is a statistical method used to analyze a data set
where there are one or more independent variables that determine
the outcome. Outcomes are measured as dichotomous variables
(only two possible outcomes). It is used to predict the probability of a
categorical dependent variable. In logistic regression, the dependent
variable is binary, meaning it only contains data coded as 1 (yes,
success, etc.) or 0 (no, failure, etc.).
Understanding Logistic Regression:
In logistic regression, we are essentially trying to find the
weights that transform the input data (independent
variables) into predictions (dependent variables).
Using a logistic function (also called a sigmoid
function), we ensure that these predictions are in the
range 0 to 1. This function maps any real number to a
value between 0 and 1. In the case of logistic
regression, it converts the output of linear regression
into probabilities.
The logistic function has an S-shaped curve, defined by
the following formula:
Event probability = 1 / (1 + e^(-y))
where y is a linear combination of input features,
weighted by model coefficients.
"e" is the base of the natural logarithm, and "y" is the
equation of the straight line (y = mx + b in simple linear
regression).
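The logistic function above can be written directly in code. A minimal sketch (the function name `sigmoid` is our own choice, not from the text):

```python
import math

def sigmoid(y):
    # Event probability = 1 / (1 + e^(-y))
    return 1.0 / (1.0 + math.exp(-y))

# The output is always squeezed into the (0, 1) range:
print(sigmoid(0))    # 0.5 -- a score of 0 maps to a 50% probability
print(sigmoid(5))    # close to 1
print(sigmoid(-5))   # close to 0
```

Note how a linear score y of any magnitude is converted into a valid probability.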
It is used to predict a categorical dependent variable using a
specific set of independent variables.
Logistic regression predicts the output of a categorical
dependent variable.
Therefore, the result must be a categorical or discrete value.
It can be yes or no, 0 or 1, true or false, etc., but instead of
giving exact values of 0 and 1, it shows a probability value
between 0 and 1.
Aside from how it is used, logistic regression is very similar to linear
regression.
Linear regression is used to solve regression problems, while
logistic regression is used to solve classification problems.
In logistic regression, instead of fitting a straight regression line, we fit
an "S"-shaped logistic function, whose output is bounded by the two
extreme values (0 and 1).
The logistic function curve gives the probability of whether a
cell becomes cancerous, whether a mouse is obese (based
on its body weight), etc.
Logistic regression is an important machine learning algorithm
because it can provide probabilities and classify new data using
both continuous and discrete data sets.
Logistic regression can be used to classify observations based
on different types of data, and the most effective variables for
classification can be easily determined.
Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function that
maps predicted values to probabilities.
It maps any real value to a value in the range 0
to 1.
The values for logistic regression must be between 0
and 1 and cannot exceed this limit, forming a curve like
an "S" shape.
The S-shaped curve is called a sigmoid function or
logistic function.
In logistic regression, we use the concept of threshold
to define the probability of 0 or 1.
For example, values above the threshold tend to 1 and
values below the threshold tend to 0.
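The threshold idea can be sketched as a small helper (the threshold value 0.5 is a common default, not one mandated by the text):

```python
def classify(probability, threshold=0.5):
    # Values at or above the threshold map to class 1, below to class 0.
    return 1 if probability >= threshold else 0

print(classify(0.8))   # 1
print(classify(0.3))   # 0
```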
Differences b/w Linear and Logistic Regression:
Linear regression predicts a continuous value; logistic regression
predicts a probability for a categorical (binary) outcome.
Linear regression fits a straight line; logistic regression fits an
"S"-shaped curve.
Linear regression is used for regression problems; logistic
regression is used for classification problems.
Terminologies involved in Logistic Regression:
Independent variables: Input features or predictor variables that are
applied to the prediction of the dependent variable.
Dependent variable: The target variable we want to predict in the
logistic regression model.
Logistic function: A formula used to express how independent and
dependent variables relate to each other.
Logistic functions convert input variables into probability values
between 0 and 1, representing the probability that the dependent
variable is 1 or 0.
Odds: The ratio of the event happening to the event not happening,
odds = p / (1 - p). It is different from probability, which is the ratio of
what happens to everything that could happen.
Log Odds: Log odds, also known as the logit function, is the natural
logarithm of the odds: log(p / (1 - p)). In logistic regression, the log
odds of the dependent variable are modeled as a linear combination
of the independent variables and the intercept.
How does Logistic Regression work?
Logistic regression models convert the continuous-valued
output of a linear regression function into a categorical-
valued output using the sigmoid function, which maps each
real-valued combination of independent variable inputs to a value
between 0 and 1. This function is called the logistic function.
Let X be the set of independent input features:
X = (x1, x2, ..., xn)
and let the dependent variable be Y, having only binary
values, i.e., 0 or 1.
Then apply the multi-linear function to the input variables:
z = w1*x1 + w2*x2 + ... + wn*xn + b
Here xi is the ith input feature, W = (w1, w2, w3, ..., wn)
is the vector of weights or coefficients, and b is the bias term, also
known as the intercept. This can simply be represented as the
dot product of the weights and the features, plus the bias:
z = w . x + b
Everything discussed up to this point is just linear regression.
Sigmoid Function
Now we apply the sigmoid function, whose input is z,
to obtain a probability between 0 and 1, i.e., the
predicted y:
sigmoid(z) = 1 / (1 + e^(-z))
The sigmoid function converts the continuous value z
into a probability between 0 and 1.
The probability of belonging to each class can then be measured
as:
P(y = 1 | x) = sigmoid(z)
P(y = 0 | x) = 1 - sigmoid(z)
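Putting the two steps together, a single prediction is just the dot product followed by the sigmoid. A sketch (the feature values, weights, and bias below are invented for illustration):

```python
import math

def predict_probability(x, w, b):
    # z = w1*x1 + w2*x2 + ... + wn*xn + b  (the linear part)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # The sigmoid squashes z into a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example: two input features, two weights, one bias
p = predict_probability(x=[2.0, 1.0], w=[0.4, -0.2], b=-0.3)
print(p)  # P(y = 1 | x); P(y = 0 | x) is 1 - p
```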
Logistic Regression Equation
The odds are the ratio of something happening to
something not happening:
odds = p(x) / (1 - p(x))
This is different from probability, which is the ratio of
what happens to everything that could happen.
Applying the natural log to the odds gives the log odds, or logit:
log(p(x) / (1 - p(x))) = w . x + b
Then the final logistic regression equation will be:
p(x) = 1 / (1 + e^(-(w . x + b)))
Likelihood function for logistic Regression
The predicted probabilities will be p(X; b, w) = p(x) for y = 1,
and for y = 0 the predicted probabilities will be 1 - p(X; b, w) = 1 - p(x).
The likelihood of the observed data is therefore:
L(b, w) = product over i of p(xi)^yi * (1 - p(xi))^(1 - yi)
Taking natural logs on both sides gives the log-likelihood:
log L(b, w) = sum over i of [ yi * log p(xi) + (1 - yi) * log(1 - p(xi)) ]
Gradient of the log-likelihood function with respect to each weight wj:
d(log L) / dwj = sum over i of (yi - p(xi)) * xij
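The gradient of the log-likelihood leads directly to a training loop: repeatedly nudge the weights in the direction of sum_i (yi - p(xi)) * xi. A from-scratch sketch on a tiny invented one-feature dataset (learning rate and iteration count are arbitrary choices, not from the text):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny invented dataset: one feature, binary labels
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   1,   1]

w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(5000):
    # Gradient of the log-likelihood: sum over i of (yi - p(xi)) * xi
    grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
    grad_b = sum((y - sigmoid(w * x + b)) for x, y in zip(xs, ys))
    # Gradient *ascent*: move in the direction that increases the likelihood
    w += learning_rate * grad_w
    b += learning_rate * grad_b

# The fitted model should separate the two groups
print(sigmoid(w * 0.5 + b))  # near 0
print(sigmoid(w * 4.0 + b))  # near 1
```

In practice one would use an optimized library routine, but the loop above is the idea behind maximum-likelihood fitting.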
The assumptions of logistic regression are as follows:
Independent Observations: Each observation is
independent of the others.
This means there is no relationship between
observations (e.g., no repeated measurements of the same subject).
Binary dependent variable: Assume that the dependent
variable must be binary or dichotomous, which means it
can only take on two values.
SoftMax function is used for more than two categories.
Linear relationship between the independent variable
and the log odds: The relationship between the
independent variable and the log odds of the dependent
variable should be linear.
No outliers: The dataset should not contain any outliers.
Large sample size: The sample size should be large enough for
reliable estimates.
Types of Logistic Regression:
On the basis of the categories, Logistic Regression can
be classified into three types:
1. Binomial: In binomial logistic regression, the
dependent variable can only be of two possible
types, such as 0 or 1, pass or fail, etc.
2. Multinomial: In multinomial logistic regression, there
can be three or more possible unordered types of
dependent variables, such as "cat", "dog", or "sheep".
3. Ordinal: In ordinal logistic regression, there can be
three or more possible ordered types of the
dependent variable, such as "Low", "Medium", or
"High".
Code implementation for logistic Regression
Binomial Logistic regression:
The target variable can only have two possible types, "0"
or "1", which can represent "win" vs. "lose", "pass" vs. "fail", "dead"
vs. "alive", etc. We use the sigmoid function,
which has been discussed above.
Import the necessary libraries according to the requirements
of the model.
This Python code shows how to implement a logistic
regression model for classification using a breast cancer
dataset.
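The referenced code does not appear in the text; a minimal sketch of what it likely looks like, using scikit-learn's built-in breast cancer dataset (the exact dataset, split, and settings used by the original are assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the binary-labelled breast cancer dataset (malignant vs. benign)
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set to evaluate the classifier
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=23)

# max_iter is raised because the default (100) may not converge on this data
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```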
Multinomial Logistic Regression:
A target variable can have three or more unordered
possible types (that is, the types have no quantitative
meaning), such as "Disease A" vs. "Disease B" vs.
"Disease C."
In this case, use the SoftMax function instead of the
sigmoid function. The SoftMax function for K classes is:
softmax(z)_i = e^(z_i) / (sum over j of e^(z_j))
Here, K represents the number of elements (classes) in the vector
z, and i, j iterate over the elements of the vector.
Then the probability of class i will be:
P(y = i | x) = softmax(z)_i
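The SoftMax function can be sketched in a few lines of pure Python (the function name and the example scores are our own):

```python
import math

def softmax(z):
    # softmax(z)_i = e^(z_i) / sum_j e^(z_j)
    # Subtracting max(z) first is a standard numerical-stability trick;
    # it does not change the result.
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

# Three-class example: raw scores become probabilities that sum to 1
probs = softmax([2.0, 1.0, 0.1])
print(probs)  # the largest score gets the largest probability
```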
Ordinal Logistic Regression:
It deals with target variables that have ordered categories. For
example, a test score can be categorized as:
"Very poor", "Poor", "Good", or "Very good". Here, each
category can be given a score like 0, 1, 2, or 3.
What is an activation function:
In an artificial neural network, a node's activation
function defines the output of that node or neuron for a
specific input or set of inputs.
That output is then used as input to the next node, and
so on until the desired solution to the original problem is
found.
It maps the resulting value to the desired range, e.g.,
between 0 and 1, or -1 and 1, etc.
This depends on the choice of activation function. For
example, using a logistic activation function maps all
inputs to a real number range from 0 to 1.
Example of a binary classification problem:
In a binary classification problem, we have an input x,
say an image, and we have to classify it as having a
correct object or not.
If it is a correct object, we will assign it a 1, else 0.
So here, we have only two outputs – either the image
contains a valid object or it does not.
This is an example of a binary classification problem.
When we multiply each of the input features by a weight
(w1, w2, ..., wm) and sum them all together, we get the node output:
node output = activation(weighted sum of inputs).
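That weighted-sum-then-activation step is the entire computation of a single node. A sketch using the logistic activation (the feature values and weights below are invented):

```python
import math

def node_output(features, weights, activation):
    # Multiply each feature by its weight, sum them, then apply the activation
    weighted_sum = sum(w * f for w, f in zip(weights, features))
    return activation(weighted_sum)

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

out = node_output([1.0, 0.5, -0.5], [0.2, 0.4, 0.1], logistic)
print(out)  # a value between 0 and 1
```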
Types of activation Function:
Activation functions are basically of two types:
1. Linear Activation Function-
Equation: f(x)=x
Range: (-infinity to infinity)
2. Non-linear Activation Function-
It makes it easy for the model to generalize to a variety
of data and to differentiate between outputs.
By simulation, it has been found that ReLUs result in much
faster training for large networks.
Non-linear means that the output cannot be reproduced
from a linear combination of the inputs.
The main terminologies needed to understand for
nonlinear functions are:
Derivative: The change along the y-axis with respect to
the change along the x-axis.
It is also known as the slope.
Monotonic function: A function which is either entirely
non-increasing or non-decreasing.
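The two families can be contrasted with the simplest member of each, taking ReLU as the non-linear example since the text singles it out:

```python
def linear(x):
    # Linear activation: f(x) = x, range (-infinity, infinity)
    return x

def relu(x):
    # ReLU, a popular non-linear activation: 0 for negative inputs,
    # the identity for non-negative inputs
    return max(0.0, x)

print(linear(-2.0), relu(-2.0))  # -2.0 0.0
print(linear(3.0), relu(3.0))    # 3.0 3.0
```

Both functions are monotonic (non-decreasing), but only ReLU is non-linear, which is what lets stacked layers model outputs that are not a linear combination of the inputs.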
In conclusion, logistic regression is a powerful
statistical technique that allows us to model
the probability of a binary event based on a
set of input variables.
It is widely used in machine learning and data
analysis, and its interpretability makes it a
valuable tool for understanding the
relationship between input variables and
output probabilities.