Experiment 8
• Implementation of Logistic
  Regression using sklearn
Logistic Regression
• Logistic regression is a basic machine learning method that is
  frequently used for binary classification tasks.
• It uses the sigmoid function to model the probability that an
  instance belongs to a specific class, producing
  values between 0 and 1.
• Because it is interpretable, simple, and computationally
  efficient, logistic regression is widely applied in fields
  such as marketing, finance, and healthcare, where its
  predictions support decision-making.
• Logistic regression is a statistical model for binary
  classification.
• Using the sigmoid function, it predicts the probability
  that an instance belongs to a particular class,
  which guarantees outputs between 0 and 1.
• The model computes a linear combination of the input
  features, transforms it with the sigmoid, and fits its
  coefficients by minimizing the log loss with methods
  such as gradient descent.
• These coefficients define the decision boundary that
  separates the classes.
• Because of its ease of use, interpretability, and
  versatility across domains, logistic regression
  is widely used in machine learning for problems with
  binary outcomes.
• Overfitting can be reduced by applying
  regularization.
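The steps listed above (linear combination, sigmoid, log loss, gradient descent) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not how sklearn fits the model internally: the function names, the learning rate, the iteration count, and the toy data are all made up for this example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Average binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain batch gradient descent on the log loss (no regularization)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)      # linear combination, then sigmoid
        error = p - y               # gradient of the loss w.r.t. the score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Toy data (hypothetical): one feature, class 1 tends to have larger values
rng = np.random.default_rng(0)
X_toy = np.concatenate([rng.normal(-1, 1, 100),
                        rng.normal(1, 1, 100)]).reshape(-1, 1)
y_toy = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic(X_toy, y_toy)

# Loss with all-zero coefficients (every probability is 0.5) vs. after fitting
loss_start = log_loss(y_toy, sigmoid(np.zeros(len(y_toy))))
loss_end = log_loss(y_toy, sigmoid(X_toy @ w + b))
print("weights:", w, "bias:", b)
print("log loss before: {:.3f}, after: {:.3f}".format(loss_start, loss_end))
```

The update rule uses the fact that the gradient of the log loss with respect to the linear score is simply the predicted probability minus the label.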
Logistic regression uses a linear equation to combine the
input features and the sigmoid function to restrict
predictions to values between 0 and 1.
Gradient descent and other optimization techniques are used to
fit the model’s coefficients by minimizing the log loss.
These coefficients determine the decision
boundary, which divides instances into two classes.
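To make the link between coefficients, probabilities, and the decision boundary concrete, the sketch below recomputes a fitted sklearn model's outputs by hand. The synthetic data from `make_classification` and the variable names are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=4,
                                     random_state=0)
clf = LogisticRegression().fit(X_demo, y_demo)

# Linear combination of the inputs: z = X w + b
z = X_demo @ clf.coef_.ravel() + clf.intercept_[0]

# The sigmoid squeezes z into (0, 1); compare with predict_proba for class 1
p = 1.0 / (1.0 + np.exp(-z))
same_probs = np.allclose(p, clf.predict_proba(X_demo)[:, 1])

# The decision boundary is z = 0, which corresponds to p = 0.5
same_labels = np.array_equal((z > 0).astype(int), clf.predict(X_demo))

print("probabilities match:", same_probs)
print("labels match:", same_labels)
```

Points with a positive linear score fall on the class 1 side of the boundary, and points with a negative score on the class 0 side.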
• For binary classification, logistic regression is a
  common first choice because it is easy to
  understand, straightforward to implement, and useful in a
  variety of settings.
• Generalization can be improved by using regularization.
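In sklearn, the strength of regularization is controlled by the `C` parameter of `LogisticRegression` (smaller `C` means stronger regularization, with an L2 penalty by default). The following sketch, on synthetic data chosen only for illustration, shows how the size of the coefficient vector shrinks as `C` decreases. The specific `C` values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, used only to illustrate the effect of C
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     random_state=0)

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    # max_iter is raised so that weakly regularized fits can converge
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print("C = {:<6} coefficient norm = {:.3f}".format(C, coef_norms[C]))
```

A reasonable value of `C` is usually picked by cross-validation rather than fixed in advance.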
Python Code: Import Libraries
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_curve, auc)
Read and Explore the data
# Load the diabetes dataset (the target is a quantitative measure of
# disease progression, not a diagnosis)
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
# Convert the target to binary (1 if progression is above the median,
# 0 otherwise)
y_binary = (y > np.median(y)).astype(int)
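Before modeling, it can help to look at what was loaded. The short sketch below prints the feature names, the data shape, and how many samples fall into each binary class after the median split. It repeats the loading step so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
y_binary = (y > np.median(y)).astype(int)

# Column order matters later when features are selected by index
print("Feature names:", diabetes.feature_names)
print("Shape of X:", X.shape)

# Number of samples in class 0 and class 1 after the median split
counts = np.bincount(y_binary)
print("Class counts (0, 1):", counts)
```

Because the threshold is the median, the two classes come out roughly balanced, which makes plain accuracy a reasonable first metric.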
Splitting The Dataset: Train and Test dataset
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
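As an optional variation, `train_test_split` accepts a `stratify` argument that keeps the class proportions similar in both subsets. The sketch below is an assumption about what one might try, not part of the original experiment, and it will generally produce a different split (and so different metrics) from the one above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)

# stratify=y_binary preserves the 0/1 ratio in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42, stratify=y_binary)

print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Share of class 1 in train: {:.3f}".format(y_train.mean()))
print("Share of class 1 in test:  {:.3f}".format(y_test.mean()))
```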
Feature Scaling
# Standardize features (fit on the training set only, then apply to both)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
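The scaler is fitted on the training set only, so the test set never influences the scaling statistics. The sketch below checks the consequence: the scaled training columns have mean 0 and standard deviation 1, while the scaled test columns are only approximately centered. Variable names with the `_s` suffix are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)   # learns mean/std from train only
X_test_s = scaler.transform(X_test)         # reuses the training statistics

print("Train means:", np.round(X_train_s.mean(axis=0), 6))
print("Train stds: ", np.round(X_train_s.std(axis=0), 6))
print("Test means: ", np.round(X_test_s.mean(axis=0), 3))
```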
Train The Model
# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
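Once the model is fitted, its coefficients can be inspected to see which standardized features push predictions toward class 1. The sketch below reruns the steps above so it is self-contained, then lists the features by absolute coefficient size. Reading `exp(coefficient)` as an odds ratio per one standard deviation is a standard interpretation, offered here as a rough guide only.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

model = LogisticRegression()
model.fit(X_train, y_train)

# Pair each feature name with its coefficient, largest magnitude first
ranked = sorted(zip(diabetes.feature_names, model.coef_[0]),
                key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranked:
    print("{:>4}: coef = {:+.3f}, odds ratio = {:.3f}".format(
        name, coef, np.exp(coef)))
print("intercept = {:+.3f}".format(model.intercept_[0]))
```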
Evaluation Metrics
# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
Output
• Accuracy: 73.03%
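A single train/test split gives one accuracy number that depends on the random split. As a hedged complement, the sketch below estimates accuracy with 5-fold cross-validation, wrapping the scaler and the classifier in a pipeline so that scaling is refitted inside each fold. The choice of 5 folds is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)

# The pipeline refits the scaler on the training part of every fold
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y_binary, cv=5, scoring="accuracy")

print("Fold accuracies:", np.round(scores, 3))
print("Mean accuracy: {:.2f}% (std {:.2f}%)".format(
    scores.mean() * 100, scores.std() * 100))
```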
Confusion Matrix and Classification Report
# Print the confusion matrix and the classification report
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n",
      classification_report(y_test, y_pred))
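The numbers in the classification report can be derived directly from the confusion matrix. The sketch below reruns the experiment so it is self-contained, unpacks the four cells of the matrix, and recomputes accuracy, precision, and recall for class 1 by hand.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# For binary labels, rows are true classes and columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)
precision_manual = tp / (tp + fp)   # of predicted positives, how many are right
recall_manual = tp / (tp + fn)      # of actual positives, how many are found

print("TN={}, FP={}, FN={}, TP={}".format(tn, fp, fn, tp))
print("Accuracy:  {:.4f}".format(accuracy_manual))
print("Precision: {:.4f}".format(precision_manual))
print("Recall:    {:.4f}".format(recall_manual))
```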
Visualizing the performance of our model.
# Scatter plot of two standardized test-set features, colored by true class
# (column 2 is "bmi" and column 8 is "s5" in load_diabetes)
plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_test[:, 2], y=X_test[:, 8], hue=y_test,
                palette={0: 'blue', 1: 'red'}, marker='o')
plt.xlabel("BMI (standardized)")
plt.ylabel("s5 (standardized)")
plt.title("Logistic Regression: Test Set by True Class\nAccuracy: {:.2f}%".format(
    accuracy * 100))
plt.legend(title="Class", loc="upper right")
plt.show()
OUTPUT
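A model trained on all ten features has a boundary in ten dimensions, so it cannot be drawn directly on a two-feature scatter plot. One way to show an actual boundary is to fit a separate model on two features only. The sketch below does this for "bmi" and "s5"; the two-feature model (`model2`) and its variable names are assumptions for illustration, and its accuracy will differ from the full model's.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

diabetes = load_diabetes()
y_binary = (diabetes.target > np.median(diabetes.target)).astype(int)

# Keep only two columns so the boundary can be drawn in the plane
cols = [diabetes.feature_names.index("bmi"),
        diabetes.feature_names.index("s5")]
X2 = diabetes.data[:, cols]

X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y_binary, test_size=0.2, random_state=42)
scaler2 = StandardScaler()
X2_train = scaler2.fit_transform(X2_train)
X2_test = scaler2.transform(X2_test)

model2 = LogisticRegression().fit(X2_train, y2_train)
(w1, w2), b = model2.coef_[0], model2.intercept_[0]

# Boundary: w1*x + w2*y + b = 0, so y = -(w1*x + b) / w2 (assumes w2 != 0)
xs = np.linspace(X2_test[:, 0].min(), X2_test[:, 0].max(), 50)
ys = -(w1 * xs + b) / w2

plt.figure(figsize=(8, 6))
plt.scatter(X2_test[:, 0], X2_test[:, 1], c=y2_test, cmap="bwr",
            edgecolor="k")
plt.plot(xs, ys, "k--", label="Decision boundary (p = 0.5)")
plt.xlabel("BMI (standardized)")
plt.ylabel("s5 (standardized)")
plt.title("Two-Feature Logistic Regression\nTest accuracy: {:.2f}%".format(
    model2.score(X2_test, y2_test) * 100))
plt.legend(loc="upper right")
plt.show()
```

Points on one side of the dashed line are predicted as class 1 and points on the other side as class 0.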
Plotting ROC Curve
# Plot ROC Curve
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2,
         label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve\nAccuracy: {:.2f}%'.format(
    accuracy * 100))
plt.legend(loc="lower right")
plt.show()
OUTPUT
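The arrays returned by `roc_curve` can also be used numerically. The sketch below reruns the experiment so it is self-contained, confirms that `auc(fpr, tpr)` agrees with `roc_auc_score`, and picks the threshold that maximizes TPR minus FPR (Youden's J statistic). Using Youden's J is an assumption here, one of several possible threshold rules, and not something the original experiment prescribes.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression().fit(X_train, y_train)

# Probability of class 1 for every test sample
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

# Youden's J: the threshold where TPR exceeds FPR by the largest margin
best_idx = int(np.argmax(tpr - fpr))
best_threshold = thresholds[best_idx]

print("AUC via auc(fpr, tpr):  {:.4f}".format(roc_auc))
print("AUC via roc_auc_score: {:.4f}".format(roc_auc_score(y_test, y_prob)))
print("Threshold maximizing TPR - FPR: {:.3f}".format(best_threshold))
print("TPR = {:.3f}, FPR = {:.3f} at that threshold".format(
    tpr[best_idx], fpr[best_idx]))
```

`model.predict` uses a fixed threshold of 0.5, so a different threshold has to be applied to `y_prob` manually, for example with `(y_prob >= best_threshold).astype(int)`.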