Day.12 Logistic Regression

Logistic Regression is a statistical method for binary classification that predicts outcomes based on input features using the sigmoid function to estimate probabilities. It is advantageous for its simplicity and interpretability but has limitations in handling complex relationships and is suitable only for binary or multinomial classification. Key evaluation metrics include accuracy, precision, recall, and F1-score, which help assess model performance in various scenarios.

Logistic Regression

Logistic Regression is a statistical method used for binary classification:
predicting one of two possible outcomes based on one or more input features.
Despite the name, it's actually a classification algorithm, not a regression one.

When to Use Logistic Regression?

• The dependent variable is categorical (often binary: 0/1, Yes/No, True/False).
• You want to estimate the probability of a class occurring, such as
whether a customer will buy a product or not.

Key Concepts
1. The Logistic Function (Sigmoid Function)

The core of logistic regression is the sigmoid function, which maps any
real-valued number to the (0, 1) interval:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Where:

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

$\beta_0, \beta_1, \dots, \beta_n$ are the parameters (weights) learned from data.

The output $\sigma(z)$ can be interpreted as the probability that the
observation belongs to class 1.
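
As a minimal sketch of the two formulas above, the snippet below computes z and σ(z) with NumPy. The parameter values `beta0` and `beta` and the feature vector `x` are made up for illustration; they do not come from a fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for a model with two features (illustration only)
beta0 = -1.0                   # intercept
beta = np.array([0.8, -0.5])   # one weight per feature
x = np.array([2.0, 1.0])       # a single observation

# Linear combination: z = beta0 + beta1*x1 + beta2*x2 = -1 + 1.6 - 0.5 = 0.1
z = beta0 + beta @ x
p = sigmoid(z)

print(f"z = {z:.2f}, P(class 1) = {p:.3f}")
```

A score of z = 0 gives a probability of exactly 0.5; positive scores push the probability toward 1 and negative scores push it toward 0.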

2. Model Prediction

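The model predicts class 1 if the probability is greater than 0.5, and class 0 otherwise:

$$\hat{y} = \begin{cases} 1 & \text{if } \sigma(z) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$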

3. Loss Function

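Instead of Mean Squared Error (used in linear regression), logistic regression
uses Log Loss (also called Binary Cross-Entropy). For a single example:

$$L(y, \hat{y}) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$$

Where:

$y$ is the true label (0 or 1)

$\hat{y}$ is the predicted probability

The training objective is the average of this loss over all training examples.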
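
To make the loss concrete, here is a small sketch that averages the binary cross-entropy over a handful of made-up predictions. The function name `binary_cross_entropy` and the clipping constant `eps` are choices made for this example, not part of the original material.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average log loss over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log(0) never occurs
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return losses.mean()

y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # confident and correct
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # confident and incorrect

loss_good = binary_cross_entropy(y_true, mostly_right)
loss_bad = binary_cross_entropy(y_true, mostly_wrong)

print(f"Loss when predictions are right: {loss_good:.3f}")
print(f"Loss when predictions are wrong: {loss_bad:.3f}")
```

Confident wrong predictions are penalized far more heavily than confident right ones are rewarded, which is what pushes the model toward well-calibrated probabilities.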

Advantages

• Simple and easy to implement
• Interpretable (coefficients show the effect of features)
• Fast to train and predict
• Works well for linearly separable data
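
One way to read the coefficients, sketched here on synthetic data: exponentiating a coefficient gives an odds ratio, the factor by which the odds of class 1 change when that feature increases by one unit. The data-generating weights (1.5 and -1.0) are assumptions chosen for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data with known (assumed) effects: x1 raises the odds, x2 lowers them
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
true_logit = 0.5 + 1.5 * x1 - 1.0 * x2
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = np.column_stack([x1, x2])
model = LogisticRegression().fit(X, y)

# exp(coefficient) = multiplicative change in the odds per unit increase
odds_ratios = np.exp(model.coef_[0])
for name, coef, ratio in zip(["x1", "x2"], model.coef_[0], odds_ratios):
    print(f"{name}: coefficient = {coef:+.2f}, odds ratio = {ratio:.2f}")
```

An odds ratio above 1 means the feature makes class 1 more likely; below 1 means less likely. Note that scikit-learn applies L2 regularization by default, so the fitted coefficients are shrunk slightly toward zero.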

Limitations

• Limited to classification tasks (binary, or multiclass through its
multinomial extension); it does not predict continuous targets
• Assumes a linear relationship between input variables and the log-odds
• Not suitable for complex relationships unless features are engineered
well
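
The last limitation can be illustrated with a sketch on scikit-learn's `make_circles` toy dataset, where one class forms a ring around the other. The dataset, the degree-2 polynomial features, and all parameter values are choices made for this example.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two concentric circles: no straight line separates the classes
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Plain logistic regression on the raw coordinates
linear_model = LogisticRegression().fit(X_train, y_train)

# Same model after adding squared and interaction terms as engineered features
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(),
).fit(X_train, y_train)

acc_linear = linear_model.score(X_test, y_test)
acc_poly = poly_model.score(X_test, y_test)
print(f"Raw features:        accuracy = {acc_linear:.2f}")
print(f"Polynomial features: accuracy = {acc_poly:.2f}")
```

The model itself is unchanged; only the inputs differ. With squared terms available, the decision boundary is still linear in the engineered features but corresponds to a circle in the original space.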

Use Cases

• Email spam detection
• Medical diagnosis (e.g., predicting if a tumor is malignant)
• Customer churn prediction
• Credit scoring

Binary Classification and the Sigmoid Function in Logistic Regression

Binary Classification

Binary classification is a type of supervised learning where the goal is to classify
input data into one of two classes (e.g., spam vs. not spam, malignant vs.
benign tumor).

Target variable:

y∈{0,1}

Examples:

• Email classification (spam or not)
• Disease diagnosis (positive or negative)
• Loan default prediction (default or not)

Sigmoid Function

At the heart of logistic regression is the sigmoid (logistic) function, which maps
any real-valued number into the (0, 1) interval:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Where:

• $z = w^T x + b$ (linear combination of input features)
• $\sigma(z)$ is the predicted probability of class 1

Interpretation:

• σ(z) → 1: strong prediction for class 1
• σ(z) → 0: strong prediction for class 0

Prediction Rule

Once the sigmoid gives the probability, logistic regression classifies as:

$$\hat{y} = \begin{cases} 1 & \text{if } \sigma(z) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$
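
A short sketch of thresholding predicted probabilities into class labels. The probabilities in `probs` and the helper `classify` are made up for this example; the 0.5 cutoff is the default, and the lower cutoff shows how the rule can be adjusted when missing a positive is costly.

```python
import numpy as np

def classify(probs, threshold=0.5):
    """Return 1 where the probability exceeds the threshold, else 0."""
    return (np.asarray(probs) > threshold).astype(int)

# Hypothetical sigmoid outputs for five observations
probs = np.array([0.05, 0.35, 0.50, 0.65, 0.95])

default_labels = classify(probs)        # threshold 0.5
lenient_labels = classify(probs, 0.3)   # flags more observations as class 1

print("Threshold 0.5:", default_labels)   # [0 0 0 1 1]
print("Threshold 0.3:", lenient_labels)   # [0 1 1 1 1]
```

Lowering the threshold trades more false positives for fewer false negatives, so the right cutoff depends on which error is more costly in the application.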

Loss Function

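To train logistic regression, we minimize the binary cross-entropy loss, averaged
over the training examples. For a single example:

$$L(y, \hat{y}) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$$

Where:

• $y$ is the true label
• $\hat{y} = \sigma(z)$ is the predicted probability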

Why Use Sigmoid?

• Squashes outputs into the range between 0 and 1, which suits probability estimates.
• Smooth and differentiable, which suits gradient-based optimization such as gradient descent.
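
To show why differentiability matters, here is a minimal from-scratch sketch of batch gradient descent on the binary cross-entropy loss. The learning rate, iteration count, and synthetic data are assumptions for illustration; library implementations such as scikit-learn use more sophisticated solvers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)        # predicted probabilities
        error = y_hat - y                 # derivative of the loss w.r.t. z
        grad_w = X.T @ error / n_samples  # average gradient for the weights
        grad_b = error.mean()             # average gradient for the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data: class 1 when the two features sum to a positive number
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = fit_logistic(X, y)
accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print("Weights:", w, "Bias:", b)
print(f"Training accuracy: {accuracy:.2f}")
```

The gradient has the simple form (ŷ − y)·x because the derivative of the sigmoid cancels neatly against the derivative of the log loss.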

Model evaluation: Accuracy, Precision, Recall, F1-Score

1. Accuracy

The proportion of total correct predictions out of all predictions made.

• Use Case: Best used when the classes are balanced.

• Limitation: Misleading in imbalanced datasets (e.g., 95% one class).
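
The limitation is easy to demonstrate with a sketch: on a made-up dataset where 95% of labels are class 0, a "model" that always predicts class 0 scores 95% accuracy while finding none of the positives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A trivial baseline that always predicts the majority class
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)

print(f"Accuracy: {acc:.2f}")   # 0.95
print(f"Recall:   {rec:.2f}")   # 0.00
```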

2. Precision

The proportion of correctly predicted positive observations out of all predicted positives.

• Use Case: Important when the cost of false positives is high (e.g., spam detection).

• Interpretation: "Of all the things we predicted as positive, how many actually were?"

3. Recall (Sensitivity or True Positive Rate)

The proportion of actual positives correctly identified.

• Use Case: Important when the cost of false negatives is high (e.g., cancer detection).

• Interpretation: "Of all the actual positives, how many did we catch?"

4. F1-Score

The harmonic mean of precision and recall. It balances the two metrics.

• Use Case: Useful when you need a balance between precision and recall, especially with
imbalanced classes.

Metric    | Good For                           | Bad For
----------|------------------------------------|----------------------------------
Accuracy  | Balanced datasets                  | Imbalanced datasets
Precision | Low tolerance for false positives  | Low tolerance for false negatives
Recall    | Low tolerance for false negatives  | Low tolerance for false positives
F1-Score  | Balance of Precision and Recall    | Requires tradeoff between the two
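
As a sketch of how the four metrics relate, the snippet below computes each one by hand from confusion-matrix counts (TP, TN, FP, FN) and compares the results with scikit-learn's functions. The labels are made up for illustration.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"Accuracy:  {accuracy:.2f}  (sklearn: {accuracy_score(y_true, y_pred):.2f})")
print(f"Precision: {precision:.2f}  (sklearn: {precision_score(y_true, y_pred):.2f})")
print(f"Recall:    {recall:.2f}  (sklearn: {recall_score(y_true, y_pred):.2f})")
print(f"F1-Score:  {f1:.2f}  (sklearn: {f1_score(y_true, y_pred):.2f})")
```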

Use case: Predicting Product Purchase Based on Age and Estimated Salary

1. Import Libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

2. Generate Sample Data (or load your own CSV)

# Simulate some data
np.random.seed(0)
n_samples = 200
age = np.random.randint(18, 60, size=n_samples)
salary = np.random.randint(20000, 100000, size=n_samples)
purchased = (salary > 50000).astype(int)  # Simplified rule for demonstration

# Create DataFrame
df = pd.DataFrame({'Age': age, 'EstimatedSalary': salary, 'Purchased': purchased})

3. Prepare the Data

# Features and target
X = df[['Age', 'EstimatedSalary']]
y = df['Purchased']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (fit on the training set only, then apply to the test set)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

4. Train Logistic Regression Model

# Model training
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

5. Evaluate the Model

# Predictions
y_pred = model.predict(X_test_scaled)

# Evaluation metrics
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))

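6. Make a New Prediction

# Predict for a new customer: Age 35, Salary $60,000
# Use a DataFrame with the training column names so the scaler sees matching feature names
new_customer = pd.DataFrame([[35, 60000]], columns=['Age', 'EstimatedSalary'])
new_data = scaler.transform(new_customer)
prediction = model.predict(new_data)
print("Will the customer buy the product?", "Yes" if prediction[0] == 1 else "No")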

Python code:

# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report)

# Step 2: Create or load the dataset (example: customer purchase)
# Simulated dataset
data = {
    'Age': [22, 25, 47, 52, 46, 56, 55, 60, 62, 61],
    'Salary': [15000, 29000, 48000, 60000, 52000, 80000, 82000, 90000, 95000, 99000],
    'Purchased': [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

# Step 3: Define features (X) and target (y)
X = df[['Age', 'Salary']]
y = df['Purchased']

# Step 4: Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 5: Train logistic regression model
# Features are left unscaled here; if the solver reports a convergence
# warning, scale the features first or raise max_iter
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 6: Predict on test set
y_pred = model.predict(X_test)

# Step 7: Evaluate the model (with only 3 test rows, the scores are illustrative)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
