
Logistic Regression

Logistic regression is a statistical method used for binary classification tasks, where the target
variable has only two possible outcomes (e.g., yes/no, true/false, diabetic/not diabetic). Unlike linear
regression, which predicts continuous outcomes, logistic regression models the probability that a
given input belongs to a particular class using a logistic (sigmoid) function. This function maps any
real-valued input to a value between 0 and 1, representing the probability of the positive class (e.g.,
diabetic).

Why Binary Classification Works Well:

In healthcare data analysis, binary classification is often preferred for predicting outcomes with two
possible states, such as disease presence or absence. Logistic regression excels in this context
because it provides interpretable probabilities, allowing clinicians to assess the likelihood of a patient
having a particular condition based on their features.

Example: Predicting Diabetes with Logistic Regression

Suppose we have a dataset containing patient information, including features like blood pressure
(BP), body mass index (BMI), number of pregnancies, and the target variable indicating whether the
patient is diabetic (1) or not diabetic (0). Let's say we have eight features (BP, BMI, pregnancies,
and others), and the ninth column is the target variable indicating diabetes status.

The logistic regression model will estimate the probability of a patient being diabetic based on their
feature values. By fitting the model to the training data, it learns the relationship between the
features and the probability of diabetes, represented by a best-fit sigmoid curve and a corresponding
decision boundary in feature space.
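
As a rough sketch, the snippet below shows how such a model could be fit with scikit-learn. The data
here is randomly generated as a stand-in for a real patient dataset; only the shape (eight feature
columns, one binary target) mirrors the example above, and everything else is hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: 100 patients, 8 features (BP, BMI, pregnancies, ...),
# binary target (1 = diabetic, 0 = not diabetic).
rng = np.random.default_rng(0)
X = rng.random((100, 8))
y = rng.integers(0, 2, 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Interpretable probabilities of the positive (diabetic) class.
probs = model.predict_proba(X_test)[:, 1]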
In logistic regression, we aim to model the probability that a given input belongs to a particular class
(e.g., diabetic or not diabetic). Unlike linear regression, where the outcome is continuous, logistic
regression outputs probabilities bounded between 0 and 1. To achieve this, we use the sigmoid
function, also known as the logistic function.

The sigmoid function forms an S-shaped graph: as x approaches infinity, the output approaches 1,
and as x approaches negative infinity, the output approaches 0. The model sets a threshold that
decides what range of probability is mapped to which binary outcome. Suppose we have two possible
outcomes, true and false, and have set the threshold at 0.5. A probability less than 0.5 is mapped
to the outcome false, and a probability greater than or equal to 0.5 is mapped to the outcome true.

[Figure: Sigmoid function graph]


The formula of the sigmoid function is:

f(x) = 1/(1 + e^(-x))

and since the linear model gives y = mx + c, the probability of class 1 is:

σ(y) = 1/(1 + e^(-y))

σ(y) = 1/(1 + e^(-(mx + c)))

σ(y) = e^(mx + c)/(1 + e^(mx + c))

Probability of class 0:

1 - σ(y) = 1/(1 + e^(mx + c))

Now,

σ(y)/(1 - σ(y)) = e^(mx + c)

log_e(σ(y)/(1 - σ(y))) = (mx + c) log_e(e)

log_e(σ(y)/(1 - σ(y))) = mx + c

In other words, the log-odds (logit) of the positive class is a linear function of the input.
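
A quick numerical check of these identities in Python (assuming NumPy is available): the sigmoid
saturates toward 0 and 1 at the extremes, and taking the log-odds of its output recovers the linear
term exactly.

import numpy as np

def sigmoid(z):
    # Maps any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[0.0000454, 0.5, 0.9999546]

# log(p / (1 - p)) undoes the sigmoid and returns the linear term.
z = 1.7
p = sigmoid(z)
print(np.log(p / (1 - p)))  # 1.7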

Using the Logistic Regression Equation to Make Predictions:

Once we have trained the logistic regression model and obtained the optimal parameters θ, we can
use the model to make predictions on new data.

 We compute the linear combination y of the features and model parameters.

 We then pass y through the sigmoid function to obtain the predicted probability.

 If the predicted probability is less than 0.5, we predict the sample belongs to the negative
class (0); otherwise, we predict it belongs to the positive class (1), as sketched below.
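
A minimal sketch of these three steps (the parameter vector theta and the feature rows below are
illustrative values, not learned from real data):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta, threshold=0.5):
    z = X @ theta                            # linear combination of features and parameters
    probs = sigmoid(z)                       # predicted probability of the positive class
    return (probs >= threshold).astype(int)  # apply the decision threshold

theta = np.array([0.8, -1.2])                # hypothetical learned parameters
X_new = np.array([[1.5, 0.3],
                  [0.2, 2.0]])
print(predict(X_new, theta))  # [1 0]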

Visualizing the Best-Fit Line:

To visualize the best-fit line in logistic regression, we can plot the sigmoid function against the input
feature(s). This will show how the predicted probabilities change as the feature values vary. The
decision boundary, where the predicted probability is 0.5, separates the two classes.
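
One possible way to draw this picture with matplotlib, assuming a hypothetical one-feature model
with slope m = 2 and intercept c = -1 (the decision boundary then sits where mx + c = 0):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, c = 2.0, -1.0                       # assumed model parameters
x = np.linspace(-5, 5, 200)

plt.plot(x, sigmoid(m * x + c), label="P(class 1)")
plt.axhline(0.5, linestyle="--", label="threshold (0.5)")
plt.axvline(-c / m, linestyle=":", label="decision boundary")
plt.xlabel("feature value")
plt.ylabel("predicted probability")
plt.legend()
plt.show()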

Conclusion:

In summary, logistic regression finds the best-fit line by maximizing the likelihood of observing the
target outcomes given the feature values and model parameters. Through the sigmoid function,
logistic regression outputs probabilities bounded between 0 and 1, allowing for binary classification.
By iteratively optimizing the model parameters, logistic regression provides an effective framework
for predicting binary outcomes and drawing decision boundaries between classes.
Cost Function of Logistic Regression

A cost function is a mathematical function that measures the difference between the actual target
values (ground truth) and the values predicted by the model. It is also referred to as a loss
function or objective function. The objective of a machine learning algorithm is usually to minimize
the output of the cost function.

Log loss and Cost function for Logistic Regression

One of the popular metrics for evaluating classification models that output probabilities is log loss:

F = -(1/M) Σ(i=1 to M) [y_i log(h_θ(x_i)) + (1 - y_i) log(1 - h_θ(x_i))]

For comparison, the squared-error cost function used in linear regression can be written as:

F(θ) = (1/n) Σ(i=1 to n) (1/2)[h_θ(x_i) - y_i]^2

For logistic regression,

h_θ(x) = g(θ^T x)

where g is the sigmoid function. Substituting this into the squared-error cost above leads to a
non-convex function, which makes it unsuitable as a cost function. The cost function for logistic
regression is instead the log loss, summarized below.

cost(h_θ(x), y) = -log(h_θ(x)),     when y = 1

and

cost(h_θ(x), y) = -log(1 - h_θ(x)), when y = 0

where,

 y is the actual value of the target variable,

 h_θ(x) is the predicted probability that y = 1 given input x, parameterized by θ,

 y_i is the actual label for the i-th training example.

This cost function penalizes the model with a higher loss when its prediction diverges from the actual
label. Specifically, it imposes a large penalty when the model confidently predicts the wrong class
(i.e., high probability for the incorrect class).
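
A minimal NumPy sketch of this cost function makes the penalty structure visible; the probability
values below are made up for illustration, and the small eps guards against log(0).

import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    # Average log loss over all examples.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
confident_right = np.array([0.95, 0.05, 0.90, 0.85])
confident_wrong = np.array([0.05, 0.95, 0.10, 0.15])

print(log_loss(y_true, confident_right))  # small loss (~0.09)
print(log_loss(y_true, confident_wrong))  # large loss (~2.5)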
Why is Mean Squared Error not suitable for Logistic Regression?

Consider the Mean Squared Error (MSE) as a candidate cost function:

MSE = (1/2m) Σ(i=1 to m) (σ(x_i) - y_i)^2

In logistic regression, if we substitute the sigmoid function into the above MSE equation, we get:

MSE = (1/2m) Σ(i=1 to m) (1/(1 + e^(-θ^T x_i)) - y_i)^2

The term 1/(1 + e^(-z)) is a nonlinear transformation, and evaluating it within the Mean Squared
Error formula results in a non-convex cost function. A non-convex function has multiple local minima,
which can make it difficult to optimize using traditional gradient descent algorithms, as described
below.
Imagine you have a function that looks like a series of hills and valleys, with multiple peaks and
troughs scattered throughout. This type of function is called non-convex because it doesn't have a
single, well-defined minimum point; instead, it has multiple local minima (valleys) and potentially
even some local maxima (peaks).

When you're trying to optimize such a function, the goal is to find the lowest point, which
corresponds to the global minimum. However, because of the presence of multiple local minima,
traditional gradient descent algorithms can encounter difficulties.

Why is it challenging?

1. Getting Stuck in Local Minima: Gradient descent algorithms, like the one used in logistic
regression, work by iteratively moving in the direction of the steepest descent of the function.
However, if they start from an initial point that is not the global minimum and there are multiple
local minima, they might get trapped in one of the local minima instead of reaching the global
minimum. Once stuck in a local minimum, the algorithm cannot escape it to find the true minimum.

2. Plateaus and Saddle Points: In addition to local minima, non-convex functions may have plateaus
(flat regions) and saddle points (points where the gradient is zero but not a minimum or maximum).
These features can slow down or stall the convergence of gradient descent algorithms, making
optimization even more challenging.
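
To make this concrete, the sketch below sweeps a single parameter over a toy one-feature dataset
(the values are made up) and plots both cost curves. The squared-error curve flattens into plateaus
at the extremes and is non-convex, while the log-loss curve remains bowl-shaped.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data with mixed labels (hypothetical values).
x = np.array([-4.0, -2.0, 1.0, 3.0, 5.0])
y = np.array([1, 0, 1, 0, 1])

thetas = np.linspace(-10, 10, 400)
eps = 1e-12
mse, logloss = [], []
for t in thetas:
    p = np.clip(sigmoid(t * x), eps, 1 - eps)
    mse.append(np.mean((p - y) ** 2) / 2)
    logloss.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

plt.plot(thetas, mse, label="squared error (non-convex)")
plt.plot(thetas, logloss, label="log loss (convex)")
plt.xlabel("theta")
plt.ylabel("cost")
plt.legend()
plt.show()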
Performance Metrics

In the realm of machine learning, the ability to accurately assess the performance of a model is
paramount. Model evaluation metrics serve as the compass guiding practitioners to understand how
well their models are performing on a given task. One such fundamental tool in model evaluation is
the confusion matrix, which provides insights into the performance of a classification model. Let's
delve into the intricacies of the confusion matrix and explore the key metrics derived from it:
accuracy, precision, recall, specificity, and F1-score.

Understanding the Confusion Matrix:

A confusion matrix is a tabular representation of the performance of a classification model that
categorizes predictions into four categories:

 True Positives (TP): Instances where the model correctly predicts the positive class.

 True Negatives (TN): Instances where the model correctly predicts the negative class.

 False Positives (FP): Instances where the model incorrectly predicts the positive class (Type I
error).

 False Negatives (FN): Instances where the model incorrectly predicts the negative class (Type
II error).
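
As a quick sketch, these four counts can be read directly off scikit-learn's confusion_matrix; the
labels and predictions below are hypothetical.

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual classes
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model predictions

# For binary labels {0, 1}, ravel() returns the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1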

EXAMPLE
A machine learning model is trained to predict diabetes in patients. The test dataset consists of 100
people.

True Positive (TP) — model correctly predicts the positive class (prediction and actual both are
positive). In the above example, 10 people who have diabetes are predicted positively by the model.

True Negative (TN) — model correctly predicts the negative class (prediction and actual both are
negative). In the above example, 60 people who don’t have diabetes are predicted negatively by the
model.

False Positive (FP) — the model incorrectly predicts the positive class (predicted positive,
actually negative). In the above example, 22 people are predicted as having diabetes, although they
don't have diabetes. FP is also called a TYPE I error.

False Negative (FN) — the model incorrectly predicts the negative class (predicted negative,
actually positive). In the above example, 8 people who have diabetes are predicted as negative. FN
is also called a TYPE II error.

With the help of these four values, we can calculate the True Positive Rate (TPR), False Positive
Rate (FPR), True Negative Rate (TNR), and False Negative Rate (FNR).

Even if the data is imbalanced, these rates tell us whether our model is working well or not. For
that, the values of TPR and TNR should be high, and FPR and FNR should be as low as possible.

With the help of TP, TN, FN, and FP, other performance metrics can be calculated.

Accuracy

Accuracy measures the overall correctness of the model's predictions and is calculated as the ratio of
correct predictions (TP + TN) to the total number of predictions (TP + TN + FP + FN).

Accuracy = (TP + TN)/(TP + TN + FP + FN)


Specificity

Specificity measures the ability of the model to correctly identify negative instances out of all actual
negative instances. It is calculated as the ratio of true negatives (TN) to the total number of actual
negatives (TN + FP).

Specificity= TN / (TN + FP)

Precision:

Precision quantifies the ability of the model to correctly identify positive instances out of all
instances predicted as positive. It is calculated as the ratio of true positives (TP) to the total number
of predicted positives (TP + FP).

Precision = TP/ (TP + FP)

Recall (Sensitivity or True Positive Rate):

Recall measures the ability of the model to correctly identify positive instances out of all actual
positive instances. It is calculated as the ratio of true positives (TP) to the total number of actual
positives (TP + FN).

Recall = TP/(TP+FN)

Comparing Recall and Precision in Diabetic Prediction:

1. High Recall, Low Precision: In this scenario, the model captures a high proportion of diabetic
patients (high recall) but may also incorrectly label many non-diabetic individuals as diabetic (low
precision). While this ensures that diabetic patients are not missed, it may lead to unnecessary tests
or treatments for non-diabetic individuals.

2. High Precision, Low Recall: Conversely, in this scenario, the model correctly identifies diabetic
patients with high precision but may miss a significant number of diabetic patients (low recall). While
this minimizes unnecessary interventions for non-diabetic individuals, it increases the risk of
undiagnosed diabetes and its associated complications.

F1-Score:

The F1-score is the harmonic mean of precision and recall and provides a balanced measure of a
model's performance. It is calculated as:

F1-Score = 2 × (Precision × Recall)/(Precision + Recall)
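
Plugging the counts from the diabetes example above (TP = 10, TN = 60, FP = 22, FN = 8) into these
formulas gives a quick sanity check:

tp, tn, fp, fn = 10, 60, 22, 8

accuracy = (tp + tn) / (tp + tn + fp + fn)            # 0.70
specificity = tn / (tn + fp)                          # ~0.73
precision = tp / (tp + fp)                            # ~0.31
recall = tp / (tp + fn)                               # ~0.56
f1 = 2 * (precision * recall) / (precision + recall)  # 0.40

print(accuracy, specificity, precision, recall, f1)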

Interpreting the Confusion Matrix Metrics:

 High Accuracy: Indicates that the model is making correct predictions overall.

 High Precision: Indicates that when the model predicts positive, it is very likely to be correct.

 High Recall: Indicates that the model is able to identify most of the positive instances.

 High Specificity: Indicates that the model is able to identify most of the negative instances.

 High F1-Score: Indicates a good balance between precision and recall.
