Logistic Regression
Logistic Regression is a supervised machine learning algorithm used to predict the probability of an event or outcome. It is a popular statistical method for binary classification problems in machine learning.
Sigmoid Function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Common applications include:
- Spam detection
- Credit risk analysis
- Medical diagnosis
Despite its simplicity, Logistic Regression can be quite effective, especially when the relationship between the predictor variables and the outcome is approximately linear. However, it may not perform well with highly non-linear relationships, for which more complex models like Decision Trees or Neural Networks might be more suitable.
In Logistic Regression, every candidate set of parameters assigns a likelihood value to the observed data. The model with the highest likelihood is considered the best fit: it corresponds to the decision boundary that makes the observed labels most probable.
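To make this concrete, here is a minimal sketch comparing the likelihood that two hypothetical parameter settings assign to a tiny made-up dataset; the data and the weight values are illustrative assumptions, not from the text:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def likelihood(w, b, X, y):
    # Probability the model assigns to each observed label, multiplied together
    p = sigmoid(w * X + b)
    return np.prod(np.where(y == 1, p, 1 - p))

# Tiny illustrative dataset: one feature, binary labels
X = np.array([-2.0, -1.0, 0.5, 1.5, 2.0])
y = np.array([0, 0, 1, 1, 1])

print(likelihood(1.0, 0.0, X, y))  # larger value -> better fit
print(likelihood(0.1, 0.0, X, y))  # smaller value -> worse fit
```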
In practice, this likelihood is a product of many probabilities, so it can become extremely small, leading to underflow (i.e., the number becomes too small for the computer to represent). However, the log of a probability (which lies between 0 and 1) is always negative. To deal with both issues, we take the negative of the average log-likelihood, which gives the binary cross-entropy (log loss):

$$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

Breakdown:
- $n$: total number of samples
- $y_i$: true label for sample $i$
- $\hat{y}_i$: predicted probability of class 1 for sample $i$

This is the loss function used in binary classification problems. It measures the performance of a classification model whose output is a probability value between 0 and 1.
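As a quick sanity check, this sketch computes the formula by hand and compares it with scikit-learn's `log_loss`; the labels and probabilities are made up for illustration:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # illustrative predicted P(class 1)

# Binary cross-entropy computed directly from the formula
manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

print(manual)                    # matches the library value below
print(log_loss(y_true, y_prob))
```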
$$\hat{y}_i = \sigma(z_i)$$

Explanation:
- $\hat{y}_i$: predicted probability for the $i$-th data point
- $\sigma(z_i)$: sigmoid function applied to $z_i$
The goal is to find values of parameters β₀, β₁, β₂, ... such that the total loss L is
minimized.
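One common way to do this is batch gradient descent. Below is a minimal NumPy sketch under assumed settings (synthetic data, learning rate 0.1, 1000 iterations); it is an illustration, not the text's reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative 2-feature dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)  # slope parameters (beta_1, beta_2)
b = 0.0          # intercept (beta_0)

lr = 0.1
for _ in range(1000):
    y_hat = sigmoid(X @ w + b)
    # Gradient of the average log loss with respect to w and b
    grad_w = X.T @ (y_hat - y) / len(y)
    grad_b = np.mean(y_hat - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # learned parameters
```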
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The sigmoid function ensures that our predictions are not just hard values (0 or 1), but probabilities between 0 and 1.
- For each training data point, we calculate the probability that it belongs to a particular class (based on the sigmoid output).
- These probabilities are multiplied across all data points.
- However, multiplying many small probability values leads to very small numbers, increasing the risk of underflow (see the sketch after this list).
- We take the log of the probabilities (since log turns products into sums).
- Log values of numbers between 0 and 1 are negative, so we multiply by -1 to keep the loss positive.
- The final formula works for both classes (0 and 1) in a unified way.
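A quick numerical sketch of the underflow point: multiplying a thousand probabilities collapses to zero in floating point, while summing their logs stays representable (the values are illustrative):

```python
import numpy as np

p = np.full(1000, 0.1)    # 1,000 probabilities of 0.1 each

print(np.prod(p))         # 0.0 -- 1e-1000 underflows in float64
print(np.sum(np.log(p)))  # about -2302.6 -- perfectly representable
```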
Derivative of Sigmoid
The derivative of the sigmoid function is:

$$\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$$
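For completeness, the identity follows from the chain rule:

$$\sigma'(z) = \frac{d}{dz}\left(1 + e^{-z}\right)^{-1} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \sigma(z)\,(1 - \sigma(z))$$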
Multiclass Classification
Multiclass classification is a machine learning task that involves classifying instances into
more than two categories.
Disadvantage: a single binary Logistic Regression model cannot separate more than two classes directly, which motivates the softmax generalization below.
Softmax Function:

$$\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

$$\sum_{i=1}^{K} \sigma(z_i) = 1$$
Example:
If you have logits $z = [1, 2, 3]$:

$$\sigma(1) = \frac{e^1}{e^1 + e^2 + e^3}, \quad \sigma(2) = \frac{e^2}{e^1 + e^2 + e^3}, \quad \sigma(3) = \frac{e^3}{e^1 + e^2 + e^3}$$
The corresponding multiclass loss (categorical cross-entropy) averages the negative log of the probability assigned to each true class:

$$L = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \log(\hat{y}_{ik})$$

Where:
- $K$ = number of classes
- $n$ = number of training samples
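Here is a small sketch computing the softmax of the example logits above; subtracting the maximum logit before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the output is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs)        # [0.09003057 0.24472847 0.66524096]
print(probs.sum())  # 1.0
```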
Making Predictions:
For an input vector $x$, calculate one logit per class, $z_k = w_k^T x + b_k$, apply the softmax to obtain class probabilities, and predict the class with the highest probability.
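A minimal sketch of that prediction step; the weight matrix `W`, bias `b`, and input `x` below are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative parameters for K=3 classes and 2 features
W = np.array([[ 1.0, -0.5],
              [ 0.2,  0.8],
              [-1.0,  0.3]])
b = np.array([0.1, 0.0, -0.1])

x = np.array([0.5, 1.5])      # input vector
z = W @ x + b                 # one logit per class
probs = softmax(z)
print(probs, probs.argmax())  # class probabilities and predicted class
```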
What is MLE?
- You have observed data.
- You know the type of distribution (e.g., Bernoulli, Normal) that the data follows.
- But you don't know the parameter values of the distribution.
- Your goal is to find parameter values that maximize the likelihood of the observed data.
Steps in MLE:
1. Determine the probability distribution of the target variable $Y$ given $X$. For a binary target this is the Bernoulli distribution:

$$P(y) = p^y \cdot (1 - p)^{(1-y)}$$

2. Write the likelihood of the whole dataset as the product of these probabilities over all samples.
3. Take the log to turn the product into a sum (the log-likelihood).
4. Negate the log-likelihood so that it becomes a positive quantity to minimize.
5. Find parameter values that maximize the likelihood function (or minimize the negative log-likelihood).
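As a worked example of these steps for a plain Bernoulli sample (no features), setting the derivative of the log-likelihood to zero yields the sample mean as the estimate of $p$:

$$\log \mathcal{L}(p) = \sum_{i=1}^{n} \left[ y_i \log p + (1 - y_i) \log(1 - p) \right], \qquad \frac{\sum_i y_i}{p} - \frac{n - \sum_i y_i}{1 - p} = 0 \;\Rightarrow\; \hat{p} = \frac{1}{n} \sum_{i=1}^{n} y_i$$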
Logit Function:

$$\text{logit}(P) = \log\left(\frac{P}{1 - P}\right)$$

Used to convert the asymmetric probability scale into a symmetric log scale.
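For example, complementary probabilities map to values of equal magnitude and opposite sign:

$$\text{logit}(0.9) = \log\frac{0.9}{0.1} \approx 2.197, \qquad \text{logit}(0.1) = \log\frac{0.1}{0.9} \approx -2.197$$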
$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$$

$$P(y = 1) = \frac{1}{1 + e^{-z}}$$
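Putting the two formulas together, a tiny sketch with made-up weights, bias, and input (none of these numbers come from the text):

```python
import numpy as np

w = np.array([0.8, -0.4, 1.2])  # illustrative weights w1..w3
b = -0.5                        # illustrative bias
x = np.array([1.0, 2.0, 0.5])   # input features x1..x3

z = np.dot(w, x) + b            # linear combination
p = 1 / (1 + np.exp(-z))        # P(y = 1)
print(p, int(p >= 0.5))         # probability and hard class at the 0.5 threshold
```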
Summary
- Logistic Regression uses the principle of regression to model probabilities.
- The logistic function (sigmoid) converts a linear regression output into a probability.
- That probability is used to classify outputs into classes.
- MLE is the backbone method used to train logistic regression models.
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")

# z and sigmoid are not defined in this excerpt; define them here
z = np.linspace(-10, 10, 200)
sigmoid = 1 / (1 + np.exp(-z))

plt.figure(figsize=(6, 4))
plt.plot(z, sigmoid, label="Sigmoid Curve")
plt.axhline(0.5, color='red', linestyle='--')  # decision threshold at 0.5
plt.title("Sigmoid Function")
plt.xlabel("z (Linear Combination)")
plt.ylabel("Sigmoid(z)")
plt.grid()
plt.legend()
plt.show()
```
```python
import numpy as np
import matplotlib.pyplot as plt

# p, loss_0, and loss_1 are not defined in this excerpt; define them here
p = np.linspace(0.001, 0.999, 500)  # predicted probabilities
loss_0 = -np.log(1 - p)             # log loss when the true class is 0
loss_1 = -np.log(p)                 # log loss when the true class is 1

plt.figure(figsize=(6, 4))
plt.plot(p, loss_0, label="True class = 0")
plt.plot(p, loss_1, label="True class = 1")
plt.title("Log Loss vs Predicted Probability")
plt.xlabel("Predicted Probability")
plt.ylabel("Log Loss")
plt.legend()
plt.grid()
plt.show()
```
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

sns.set(style="whitegrid")

def plot_decision_boundary(X, y, model, title="Decision Boundary"):
    # Evaluate the model on a grid covering the feature space
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(7, 5))
    plt.contourf(xx, yy, Z, cmap="viridis", alpha=0.6)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", edgecolors='k')
    plt.title(title)
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()

# X, y, and model are not defined in this excerpt; an illustrative dataset is assumed
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
model = LogisticRegression().fit(X, y)

# Predictions
y_pred = model.predict(X)
acc = accuracy_score(y, y_pred)
print(f"Accuracy: {acc:.2f}")

plot_decision_boundary(X, y, model)
```
Accuracy: 0.92