Experiment 8
• Implementation of Logistic
Regression using sklearn
Logistic Regression
• Logistic regression is a basic machine learning method
that is frequently used for binary classification
tasks.
• It uses the sigmoid function to model the probability
that an instance belongs to a specific class, producing
values between 0 and 1.
• Because it is interpretable, simple, and computationally
efficient, logistic regression is widely applied in fields
such as marketing, finance, and healthcare, where its
probability estimates provide useful information for
decision-making.
• Logistic regression is a statistical model for binary
classification.
• Using the sigmoid function, it predicts the probability
that an instance belongs to a particular class,
guaranteeing results between 0 and 1.
• The model computes a linear combination of the input
features, transforms it using the sigmoid, and then
optimizes its coefficients with methods like gradient
descent so as to minimize the log loss.
• These coefficients establish the decision boundary that
divides the classes.
• Because of its ease of use, interpretability, and
versatility across multiple domains, Logistic Regression
is widely used in machine learning for problems that
involve binary outcomes.
• Overfitting can be reduced by applying
regularization.
Logistic Regression uses a linear equation to combine the
input features and the sigmoid function to restrict
predictions to values between 0 and 1.
Gradient descent and other techniques are used to
optimize the model’s coefficients to minimize the log loss.
These coefficients define the resulting decision
boundary, which divides instances into two classes.
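The steps described above (linear combination, sigmoid, log loss, gradient descent) can be sketched in plain NumPy. This is only an illustration of the idea, not how sklearn fits the model (its default solver is lbfgs); the function names, learning rate, iteration count, and toy data are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Average binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the log loss (illustrative, unregularized)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)           # predicted probabilities
        error = p - y                    # gradient of the loss w.r.t. the linear score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Toy data, made up for illustration: class 1 when x0 + x1 > 0
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

w, b = fit_logistic(X_toy, y_toy)
p_toy = sigmoid(X_toy @ w + b)
toy_loss = log_loss(y_toy, p_toy)
toy_accuracy = np.mean((p_toy >= 0.5) == y_toy)
print("weights:", w, "bias:", b)
print("log loss:", toy_loss, "accuracy:", toy_accuracy)
```

The decision boundary is the set of points where the linear score `X @ w + b` equals 0, which is where the predicted probability equals 0.5.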
• For binary classification, logistic regression is a
common first choice because it is easy to understand,
straightforward to implement, and useful in a variety
of settings.
• Generalization can be improved by using regularization.
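As a rough illustration of the regularization point, sklearn's `LogisticRegression` applies an L2 penalty by default, and its `C` parameter is the inverse of the regularization strength (smaller `C` means a stronger penalty). The sketch below uses the same diabetes data as this experiment; the three `C` values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_scaled = StandardScaler().fit_transform(X)

# Smaller C = stronger L2 penalty = coefficients pulled toward zero
coef_norms = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_scaled, y_binary)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print(f"C={C:>6}: ||coef|| = {coef_norms[C]:.3f}")
```

In practice `C` would usually be chosen by cross-validation rather than fixed by hand.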
Python Code: Import Libraries
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_curve, auc)
Read and Explore the data
# Load the diabetes dataset (the target is a continuous measure of
# disease progression one year after baseline)
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
# Convert the target variable to binary
# (1 = progression above the median, 0 = at or below the median)
y_binary = (y > np.median(y)).astype(int)
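Before splitting, it can help to look at what was loaded. The following sketch prints the shape, the feature names, and the class counts after the median split; the variable `class_counts` is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
y_binary = (y > np.median(y)).astype(int)

print("Shape:", X.shape)                     # samples x features
print("Features:", diabetes.feature_names)   # column order of X
class_counts = np.bincount(y_binary)         # [count of class 0, count of class 1]
print("Class counts (0, 1):", class_counts)
```

Because the threshold is the median, the two classes come out roughly balanced, which makes accuracy a reasonable first metric.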
Splitting The Dataset: Train and Test dataset
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
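One optional variation, not used in this experiment, is a stratified split: passing `stratify=y_binary` to `train_test_split` keeps the class proportions nearly equal in the training and test sets. A minimal sketch, assuming the same data and split size:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)

# stratify=y_binary preserves the 0/1 ratio in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42, stratify=y_binary)

print("Train size:", len(X_train), "Test size:", len(X_test))
print("Share of class 1 in train:", y_train.mean())
print("Share of class 1 in test: ", y_test.mean())
```

Note that a stratified split selects different rows than the plain split above, so downstream numbers such as accuracy would not match the reported output exactly.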
Feature Scaling
# Standardize features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
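A quick sanity check on the scaling step: after `fit_transform`, every training column should have mean 0 and standard deviation 1, while the test columns will be close to, but not exactly, those values because they are transformed with the training statistics. The `_s` suffixed names are used here only to keep the scaled arrays separate.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)   # learns mean/std from training data
X_test_s = scaler.transform(X_test)         # reuses the training mean/std

train_means = X_train_s.mean(axis=0)
train_stds = X_train_s.std(axis=0)
test_means = X_test_s.mean(axis=0)
print("Train means:", np.round(train_means, 6))
print("Train stds: ", np.round(train_stds, 6))
print("Test means: ", np.round(test_means, 3))
```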
Train The Model
# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
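Since interpretability is one of the stated advantages, the fitted coefficients can be inspected directly. The sketch below repeats the steps above and lists the features by absolute coefficient size; because the inputs are standardized, the magnitudes are roughly comparable across features. The names `coefs` and `order` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
feature_names = diabetes.feature_names
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

model = LogisticRegression().fit(X_train, y_train)

coefs = model.coef_[0]                   # one weight per feature
order = np.argsort(-np.abs(coefs))       # largest magnitude first
for i in order:
    print(f"{feature_names[i]:>4}: {coefs[i]:+.3f}")
print("intercept:", model.intercept_[0])
```

A positive coefficient raises the predicted probability of class 1 as the feature increases; a negative one lowers it.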
Evaluation Metrics
# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
Output
• Accuracy: 73.03%
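To put an accuracy figure in context, it can be compared with a trivial baseline that always predicts the most frequent training class. The sketch below uses sklearn's `DummyClassifier` for this; the comparison is an addition for illustration and is not part of the original experiment.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Baseline: always predict the majority class seen in training
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

model = LogisticRegression().fit(X_train, y_train)
model_acc = accuracy_score(y_test, model.predict(X_test))

print("Baseline accuracy: {:.2f}%".format(baseline_acc * 100))
print("Model accuracy:    {:.2f}%".format(model_acc * 100))
```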
Confusion Matrix and Classification Report
# Evaluate the model
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n",
      classification_report(y_test, y_pred))
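The precision and recall values in the classification report follow directly from the four cells of the confusion matrix. The sketch below recomputes them by hand for class 1 and checks them against sklearn's `precision_score` and `recall_score`; it repeats the earlier steps so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
y_pred = LogisticRegression().fit(X_train, y_train).predict(X_test)

# For binary labels, sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
precision = tp / (tp + fp)   # of predicted positives, how many are correct
recall = tp / (tp + fn)      # of actual positives, how many are found

print("Precision (class 1):", precision)
print("Recall    (class 1):", recall)
```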
Visualizing the performance of our model.
# Visualize the test points by true class, with accuracy information
# (column 2 of the dataset is bmi, column 8 is s5; both are standardized)
plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_test[:, 2], y=X_test[:, 8], hue=y_test,
                palette={0: 'blue', 1: 'red'}, marker='o')
plt.xlabel("BMI (standardized)")
plt.ylabel("s5 (standardized)")
plt.title("Logistic Regression: Test Set by True Class\nAccuracy: {:.2f}%".format(
    accuracy * 100))
plt.legend(title="Progression above median", loc="upper right")
plt.show()
Output: scatter plot of the test set (figure not included)
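The scatter plot above shows only the data points. To actually draw a decision boundary in two dimensions, one option is to fit a separate model on just those two features, so that the boundary is a straight line. The following is a sketch under that assumption; `model_2d` and the other names are illustrative, and this two-feature model is not the ten-feature model trained earlier.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)

# Keep two columns only: index 2 = bmi, index 8 = s5
X_two = X[:, [2, 8]]
X_train, X_test, y_train, y_test = train_test_split(
    X_two, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model_2d = LogisticRegression().fit(X_train, y_train)
w_bmi, w_s5 = model_2d.coef_[0]
b = model_2d.intercept_[0]

# Boundary: w_bmi * x + w_s5 * y + b = 0  ->  y = -(w_bmi * x + b) / w_s5
x_line = np.linspace(X_test[:, 0].min(), X_test[:, 0].max(), 100)
y_line = -(w_bmi * x_line + b) / w_s5

plt.figure(figsize=(8, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap="coolwarm", edgecolor="k")
plt.plot(x_line, y_line, "k--", label="p = 0.5 boundary")
plt.xlabel("BMI (standardized)")
plt.ylabel("s5 (standardized)")
plt.title("Two-feature Logistic Regression Decision Boundary")
plt.legend(loc="upper right")
plt.show()
```

Points on one side of the dashed line are predicted as class 1 and points on the other side as class 0.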
Plotting ROC Curve
# Plot ROC Curve
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2,
         label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve\nAccuracy: {:.2f}%'.format(
    accuracy * 100))
plt.legend(loc="lower right")
plt.show()
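The arrays returned by `roc_curve` can also be used without plotting. The sketch below checks that `auc(fpr, tpr)` agrees with sklearn's `roc_auc_score`, and picks the threshold that maximizes TPR minus FPR (Youden's J statistic). Using Youden's J is one common heuristic chosen here for illustration; it is not something the original experiment does.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression().fit(X_train, y_train)

y_prob = model.predict_proba(X_test)[:, 1]   # probability of class 1
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

# Youden's J: the threshold where TPR exceeds FPR by the largest margin
j_scores = tpr - fpr
best_idx = int(np.argmax(j_scores))
best_threshold = thresholds[best_idx]

print("AUC:", roc_auc)
print("Threshold maximizing TPR - FPR:", best_threshold)
```

`model.predict` uses a fixed threshold of 0.5; a different threshold can be applied with `(y_prob >= best_threshold).astype(int)`, though a threshold tuned on the test set gives an optimistic estimate.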
Output: ROC curve plot (figure not included)